RemoteCLIP: A Vision Language Foundation Model for Remote Sensing

  • Paper Title: RemoteCLIP: A Vision Language Foundation Model for Remote Sensing
  • Model Architecture: VLM
  • Visual Encoder: CNN, Transformer
  • Text Encoder: Transformer
  • Model Details: CLIP. Vision Encoder: ResNet-50 / ViT-Base / ViT-Large. Text Encoder: Transformer architecture
  • Task: Image-text Retrieval, Scene Classification, Object Counting
  • Link: https://arxiv.org/abs/2306.11029
  • Code/Project: https://github.com/ChenDelong1999/RemoteCLIP
  • Survey Inclusion: awesome-remote-sensing-vision-language-models
  • Published in: TGRS 2024
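
The card above lists a CLIP-style dual encoder (one vision encoder, one text encoder) used for retrieval and scene classification. As a rough illustration of how such a model is typically applied to zero-shot scene classification, here is a minimal sketch in plain Python. It is not taken from the RemoteCLIP code base: the function names, the toy 3-dimensional embeddings, the label set, and the `logit_scale` value of 100 are all assumptions made for the example. In practice the embeddings would come from the trained encoders.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_embedding, text_embeddings, labels, logit_scale=100.0):
    """Pick the label whose text embedding is closest to the image embedding.

    logit_scale is an assumed temperature factor, not a value from the paper.
    """
    sims = [cosine_similarity(image_embedding, t) for t in text_embeddings]
    probs = softmax([logit_scale * s for s in sims])
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs


# Toy stand-ins for encoder outputs (hypothetical values, not real embeddings).
labels = ["airport", "farmland", "harbor"]
text_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_embedding = [0.1, 0.9, 0.2]

predicted, probs = zero_shot_classify(image_embedding, text_embeddings, labels)
print(predicted, [round(p, 3) for p in probs])
```

Image-text retrieval follows the same pattern: rank all candidate captions (or images) by cosine similarity to the query embedding instead of taking only the top label.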
This post is licensed under CC BY 4.0 by the author.