
RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing

  • Paper Title: RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing
  • Model Architecture: VLM
  • Visual Encoder: Transformer
  • Text Encoder: Transformer
  • Model Details: (CLIP)
  • Task: Scene Classification, Image-text Retrieval, Semantic Localization
  • Link: https://arxiv.org/abs/2306.11300
  • Code/Project: -
  • Survey Inclusion: awesome-remote-sensing-vision-language-models
  • Published in: arXiv 2023
This post is licensed under CC BY 4.0 by the author.