RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing
- Paper Title: RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing
- Model Architecture: VLM (vision-language model)
- Visual Encoder: Transformer
- Text Encoder: Transformer
- Model Details: CLIP-based
- Task: Scene Classification, Image-text Retrieval, Semantic Localization
- Link: https://arxiv.org/abs/2306.11300
- Code/Project: -
- Survey Inclusion: awesome-remote-sensing-vision-language-models
- Published in: arXiv 2023
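Because the model follows the CLIP design, with a Transformer image encoder and a Transformer text encoder, scene classification and image-text retrieval both come down to comparing image and text embeddings in a shared space. Below is a minimal sketch under that assumption. It does not load GeoRSCLIP or call any released API. The 4-dimensional embeddings, the class prompts, the gallery names, the helper functions, and the `temperature=0.01` value are all hypothetical placeholders standing in for real encoder outputs and the model's learned temperature.

```python
import math
from typing import Dict, List, Tuple


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max logit first)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(
    image_emb: List[float],
    prompt_embs: Dict[str, List[float]],
    temperature: float = 0.01,  # assumed value, stands in for the learned temperature
) -> Tuple[str, Dict[str, float]]:
    """CLIP-style zero-shot classification: score the image against one text
    embedding per class prompt and return the best label plus probabilities."""
    labels = list(prompt_embs)
    logits = [cosine(image_emb, prompt_embs[lbl]) / temperature for lbl in labels]
    probs = softmax(logits)
    best = labels[max(range(len(labels)), key=lambda i: probs[i])]
    return best, dict(zip(labels, probs))


def retrieve(query_emb: List[float], gallery: Dict[str, List[float]], k: int = 2) -> List[str]:
    """Rank gallery items (e.g. image embeddings for a text query) by cosine similarity."""
    ranked = sorted(gallery, key=lambda name: cosine(query_emb, gallery[name]), reverse=True)
    return ranked[:k]


# Hypothetical text embeddings for three scene-class prompts.
prompts = {
    "a satellite image of an airport": [0.9, 0.1, 0.0, 0.1],
    "a satellite image of farmland": [0.1, 0.9, 0.2, 0.0],
    "a satellite image of a harbor": [0.0, 0.2, 0.9, 0.3],
}
# Hypothetical image embeddings.
image = [0.8, 0.2, 0.1, 0.1]
gallery = {
    "img_airport": [0.8, 0.2, 0.1, 0.1],
    "img_farm": [0.1, 0.8, 0.3, 0.0],
    "img_harbor": [0.1, 0.1, 0.9, 0.2],
}

label, probs = zero_shot_classify(image, prompts)
print(label)  # scene classification: the best-matching prompt
print(retrieve(prompts["a satellite image of a harbor"], gallery))  # text-to-image retrieval
```

The same cosine-similarity scoring serves both tasks. Classification picks the best text prompt for one image, and retrieval ranks many images for one text query (or the reverse), so a single pair of encoders covers both without task-specific heads.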
This post is licensed under CC BY 4.0 by the author.