Remote Sensing Vision-Language Foundation Models without Annotations via Ground Remote Alignment
- Paper Title: Remote Sensing Vision-Language Foundation Models without Annotations via Ground Remote Alignment
- Model Architecture: VLM
- Visual Encoder: Transformer
- Model Details: Vision Encoder: CLIP ViT-B
- Task: Scene Classification, RS VQA, Semantic Segmentation, Image-text Retrieval
- Link: https://arxiv.org/abs/2312.06960
- Code/Project: -
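- Short Summary: Introduces a method for training vision-language models for remote sensing images without using any text annotations. The key idea is to use co-located internet images taken on the ground as an intermediary that connects remote sensing images to language.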
- Published in: ICLR 2024
- Notes: Proposes a method for training vision-language models for remote sensing images.
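
The alignment idea in the summary can be illustrated with a small contrastive-loss sketch. This is not the paper's implementation: it assumes each satellite image is paired with exactly one co-located ground image, that the ground embeddings come from a frozen CLIP image encoder, and that an InfoNCE-style objective is used. The function names (`l2_normalize`, `info_nce`), the temperature value, and the toy 2-D vectors are hypothetical and chosen only for illustration.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(sat_embs, ground_embs, temperature=0.07):
    """Mean InfoNCE loss over a batch.

    sat_embs[i] is the (trainable) satellite-image embedding and
    ground_embs[i] is the embedding of its co-located ground image
    (assumed here to come from a frozen CLIP image encoder).
    All other ground embeddings in the batch act as negatives.
    """
    sat = [l2_normalize(v) for v in sat_embs]
    gnd = [l2_normalize(v) for v in ground_embs]
    total = 0.0
    for i, s in enumerate(sat):
        # Cosine similarity to every ground embedding, scaled by temperature.
        logits = [sum(a * b for a, b in zip(s, g)) / temperature for g in gnd]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of the matching (co-located) pair.
        total += log_denom - logits[i]
    return total / len(sat)


# Toy example with hypothetical 2-D embeddings.
ground = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # satellite embeddings near their ground pair
swapped = [[0.1, 0.9], [0.9, 0.1]]   # satellite embeddings near the wrong pair

loss_aligned = info_nce(aligned, ground)
loss_swapped = info_nce(swapped, ground)
print(loss_aligned < loss_swapped)
```

Because the ground embeddings are assumed to live in CLIP's shared image-text space, pulling satellite embeddings toward them would let text queries be compared against satellite images without any satellite-text pairs, which is the intermediary role the summary describes.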
This post is licensed under CC BY 4.0 by the author.