SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing

  • Paper title: SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing
  • Model architecture: VLM
  • Visual Encoder: Transformer
  • Text Encoder: Transformer
  • Model Details: CLIP (Vision Encoder: ViT-B / ViT-L; Text Encoder: Transformer architecture)
  • Task: Scene Classification, Image-text Retrieval
  • Link: https://arxiv.org/abs/2312.12856
  • Code/Project: https://github.com/wangzhecheng/SkyScript
  • Published in: AAAI 2024
  • Notes: The paper is mainly about constructing the dataset.
This post is licensed under CC BY 4.0 by the author.