RemoteCLIP: A Vision Language Foundation Model for Remote Sensing
- Paper Title: RemoteCLIP: A Vision Language Foundation Model for Remote Sensing
- Model Architecture: VLM
- Visual Encoder: CNN, Transformer
- Text Encoder: Transformer
- Model Details: CLIP architecture; Vision Encoder: ResNet-50 / ViT-Base / ViT-Large; Text Encoder: Transformer
- Task: Image-text Retrieval, Scene Classification, Object Counting
- Link: https://arxiv.org/abs/2306.11029
- Code/Project: https://github.com/ChenDelong1999/RemoteCLIP
- Survey Inclusion: awesome-remote-sensing-vision-language-models
- Published in: TGRS 2024
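Because RemoteCLIP uses the CLIP dual-encoder design, scene classification and image-text retrieval both come down to comparing image and text embeddings in a shared space. The sketch below illustrates only that scoring step, under loud assumptions: the embeddings are hypothetical 3-dimensional toy vectors rather than outputs of the released ResNet-50 or ViT checkpoints, the class names are hypothetical, and `logit_scale=100.0` is an assumed temperature of the size often seen in trained CLIP models. It is a minimal illustration, not the RemoteCLIP code.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    # Cosine similarity between two embeddings.
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))

def softmax(logits):
    # Numerically stable softmax.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def zero_shot_classify(image_emb, class_text_embs, logit_scale=100.0):
    # Score the image against one text embedding per class prompt;
    # logit_scale is an assumed temperature, not a value from the paper.
    logits = [logit_scale * cosine(image_emb, t) for t in class_text_embs]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs

def retrieve(query_emb, gallery_embs, k=2):
    # Rank gallery embeddings by cosine similarity to the query
    # (works for text-to-image or image-to-text retrieval).
    scores = [cosine(query_emb, g) for g in gallery_embs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]

# Hypothetical toy data standing in for encoder outputs.
class_names = ["airport", "farmland", "harbor"]
text_embs = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
image_embs = [[0.2, 0.9, 0.1], [0.0, 0.1, 1.0], [0.9, 0.1, 0.1]]

best, probs = zero_shot_classify(image_embs[0], text_embs)
print(class_names[best])               # farmland
print(retrieve(text_embs[2], image_embs))  # [1, 2]
```

In a real pipeline the text embeddings would come from encoding one prompt per class with the text encoder, and the image embeddings from the vision encoder; only the similarity-and-rank logic shown here carries over.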
This post is licensed under CC BY 4.0 by the author.