SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing
- Paper title: SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing
- Model architecture: VLM (vision-language model)
- Visual Encoder: Transformer
- Text Encoder: Transformer
- Model Details: CLIP. Vision Encoder: ViT-B / ViT-L. Text Encoder: Transformer architecture
- Task: Scene Classification, Image-text Retrieval
- Link: https://arxiv.org/abs/2312.12856
- Code/Project: https://github.com/wangzhecheng/SkyScript
- Published in: AAAI 2024
- Notes: The main contribution is the construction of the dataset.
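
As a rough illustration of how a CLIP-style model can be applied to the scene classification task listed above, here is a minimal sketch of zero-shot classification by cosine similarity between an image embedding and one text embedding per class. This is not the code from the SkyScript repository. The class names, the three-dimensional toy vectors, and the helper names `cosine` and `zero_shot_classify` are all assumptions made for illustration; in practice the embeddings would come from the vision and text encoders.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def zero_shot_classify(image_emb, text_embs):
    """Pick the class whose text embedding is closest to the image embedding.

    image_emb: list of floats (hypothetical output of a vision encoder).
    text_embs: dict mapping class name -> list of floats
               (hypothetical output of a text encoder for a class prompt).
    """
    scores = {label: cosine(image_emb, emb) for label, emb in text_embs.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Toy embeddings, invented for this example only.
text_embs = {
    "airport": [1.0, 0.0, 0.0],
    "farmland": [0.0, 1.0, 0.0],
    "harbor": [0.0, 0.0, 1.0],
}
image_emb = [0.1, 0.9, 0.2]

predicted, scores = zero_shot_classify(image_emb, text_embs)
print(predicted)
```

Image-text retrieval can be sketched with the same similarity function by ranking a set of candidate captions (or images) by their score against a query instead of taking only the top class.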
This post is licensed under CC BY 4.0 by the author.