Remote Sensing Vision-Language Foundation Models without Annotations via Ground Remote Alignment

Posted May 21, 2024


Paper Title: Remote Sensing Vision-Language Foundation Models without Annotations via Ground Remote Alignment
Model Architecture: VLM
Visual Encoder: Transformer
Model Details: Vision Encoder: CLIP ViT-B
Task: Scene Classification, RS VQA, Semantic Segmentation, Image-text Retrieval
Link: https://arxiv.org/abs/2312.06960
Code/Project: -
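Short Summary: Introduces a method for training vision-language models for remote sensing images without using any text annotations. The key idea is to use co-located internet images taken on the ground as an intermediary that connects remote sensing images to language.
Published in: ICLR 2024
Notes: Proposes a method for training vision-language models on remote sensing imagery.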
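
The ground-as-intermediary idea in the summary can be illustrated with a minimal sketch. It assumes a standard InfoNCE-style contrastive objective in which each remote sensing image embedding is pulled toward the CLIP image embedding of a ground photo taken at the same location (assumed to come from a frozen CLIP encoder), with the other pairs in the batch acting as negatives. Because CLIP's image and text embedding spaces are already aligned, a remote sensing encoder trained this way can then be compared against CLIP text embeddings of class prompts for zero-shot scene classification. The function names, the one-ground-image-per-remote-image pairing, the temperature of 0.07, and the toy vectors are all hypothetical illustrations, not the paper's exact formulation; plain Python lists stand in for real encoder outputs.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def ground_remote_contrastive_loss(remote_embs, ground_embs, temperature=0.07):
    """Hypothetical InfoNCE loss: remote_embs[i] and ground_embs[i] are a
    co-located pair; every other ground embedding in the batch is a negative.
    Returns the mean cross-entropy over the batch."""
    r = [l2_normalize(v) for v in remote_embs]
    g = [l2_normalize(v) for v in ground_embs]  # assumed frozen CLIP image features
    n = len(r)
    total = 0.0
    for i in range(n):
        # Cosine similarities scaled by the temperature.
        logits = [dot(r[i], g[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum_exp - logits[i]  # -log softmax of the positive pair
    return total / n


def zero_shot_classify(remote_emb, class_text_embs):
    """Return the index of the class whose (assumed CLIP) text embedding has
    the highest cosine similarity with the remote sensing embedding."""
    r = l2_normalize(remote_emb)
    scores = [dot(r, l2_normalize(t)) for t in class_text_embs]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    remote = [[1.0, 0.0], [0.0, 1.0]]
    aligned_ground = [[1.0, 0.0], [0.0, 1.0]]
    swapped_ground = [[0.0, 1.0], [1.0, 0.0]]
    print(ground_remote_contrastive_loss(remote, aligned_ground))  # close to 0
    print(ground_remote_contrastive_loss(remote, swapped_ground))  # large
    print(zero_shot_classify([0.9, 0.1], [[1.0, 0.0], [0.0, 1.0]]))  # prints 0
```

In this toy setup, correctly paired embeddings give a loss near zero while swapped pairs are heavily penalized, which is the pressure that pushes the remote sensing encoder into CLIP's embedding space without any text labels.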

This post is licensed under CC BY 4.0 by the author.
