SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing

Posted Mar 1, 2024


Paper Title: SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing
Model Architecture: VLM
Visual Encoder: Transformer
Text Encoder: Transformer
Model Details: CLIP (Vision Encoder: ViT-B / ViT-L; Text Encoder: Transformer architecture)
Task: Scene Classification, Image-text Retrieval
Link: https://arxiv.org/abs/2312.12856
Code/Project: https://github.com/wangzhecheng/SkyScript
Published in: AAAI 2024
Notes: The main contribution is the construction of the dataset.
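
For reference, here is a minimal sketch of how a CLIP-style dual encoder is typically used for the two tasks listed above, scene classification and image-text retrieval. It is not taken from the SkyScript repository. The embeddings are hypothetical toy vectors standing in for the vision and text encoder outputs, and the class names and the temperature value are assumptions chosen only for illustration.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cosine_scores(image_emb, text_embs):
    """Cosine similarity between one image embedding and each text embedding."""
    img = l2_normalize(image_emb)
    return [sum(a * b for a, b in zip(img, l2_normalize(t))) for t in text_embs]

def softmax(scores, temperature=0.01):
    """Turn similarities into probabilities (temperature is an illustrative value)."""
    scaled = [s / temperature for s in scores]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy embeddings; a real pipeline would get these from the encoders.
class_names = ["airport", "farmland", "harbor"]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_emb = [0.1, 0.9, 0.2]

scores = cosine_scores(image_emb, text_embs)

# Scene classification: pick the class whose text embedding is most similar.
probs = softmax(scores)
predicted = class_names[probs.index(max(probs))]

# Image-to-text retrieval: rank all texts by similarity, best first.
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

print(predicted)
print(ranking)
```

Both tasks reduce to the same similarity computation: classification takes the top-scoring class prompt, while retrieval keeps the full ranking.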

This post is licensed under CC BY 4.0 by the author.
