VLM 10
- Remote Sensing Vision-Language Foundation Models without Annotations via Ground Remote Alignment
- RemoteCLIP: A Vision Language Foundation Model for Remote Sensing
- Large Language Models for Captioning and Retrieving Remote Sensing Images
- SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing
- Language-aware domain generalization network for cross-scene hyperspectral image classification
- VLCA: vision-language aligning model with cross-modal attention for bilingual remote sensing image captioning
- Bootstrapping Interactive Image-Text Alignment for Remote Sensing Image Captioning
- Parameter-Efficient Transfer Learning for Remote Sensing Image-Text Retrieval
- S-CLIP: Semi-supervised Vision-Language Learning using Few Specialist Captions
- RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing