
RSGPT: A Remote Sensing Vision Language Model and Benchmark

  • Paper Title: RSGPT: A Remote Sensing Vision Language Model and Benchmark
  • Model Architecture: MLLM (multimodal large language model)
  • Visual Encoder: Transformer
  • Text Encoder: Transformer
  • Model Details: Vision Encoder: EVA; Text Encoder: Vicuna
  • Task: Image Caption, RS VQA
  • Link: https://arxiv.org/abs/2307.15266
  • Code/Project: -
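  • Short Summary: 1. Proposes the RSICap dataset, built on the DOTA object detection dataset. 2. Proposes the RSIEval evaluation set. 3. Presents the RSGPT model, which is built on an off-the-shelf frozen pretrained image encoder (EVA-G) and a large language model (Vicuna-7B or Vicuna-13B) and is adapted by fine-tuning the Q-Former and a linear layer.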
  • Published in: arXiv 2023
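
The split between frozen and fine-tuned components described in the Short Summary can be sketched without any deep learning framework. This is only an illustration, not the authors' code: the `Component` class, the function names, and the data-flow order (image encoder, then Q-Former, then linear layer, then language model) are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Component:
    """Hypothetical record for one part of the model (not from the paper)."""
    name: str
    trainable: bool  # True if its weights are updated during fine-tuning


def build_rsgpt_pipeline(llm_name: str = "Vicuna-7B") -> List[Component]:
    """List the components in an assumed data-flow order, image to text."""
    return [
        Component("EVA-G image encoder", trainable=False),  # frozen, pretrained
        Component("Q-Former", trainable=True),              # fine-tuned
        Component("linear layer", trainable=True),          # fine-tuned
        Component(llm_name, trainable=False),               # frozen, pretrained
    ]


def trainable_names(pipeline: List[Component]) -> List[str]:
    """Return the names of the components that are fine-tuned."""
    return [c.name for c in pipeline if c.trainable]


if __name__ == "__main__":
    pipeline = build_rsgpt_pipeline("Vicuna-13B")
    print(trainable_names(pipeline))
```

In a real training setup the same idea is usually expressed by disabling gradient updates on the frozen modules and passing only the remaining parameters to the optimizer.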
This post is licensed under CC BY 4.0 by the author.