
H2RSVLM: Towards Helpful and Honest Remote Sensing Large Vision Language Model

  • Paper: H2RSVLM: Towards Helpful and Honest Remote Sensing Large Vision Language Model
  • 模型架构: MLLM
  • Visual Encoder: Transformer
  • Text Encoder: Transformer
  • Model Details: Vision Encoder: CLIP ViT-L; Text Encoder: Vicuna-v1.5
  • Task: Scene Classification, RS VQA, Visual Grounding
  • Link: https://arxiv.org/abs/2403.20213
  • Code/Project: https://github.com/opendatalab/H2RSVLM
  • Short Summary: (1) Created the HqDC-1.4M dataset and built two instruction-tuning datasets, HqDC-Instruct and RS-Specialized-Instruct. (2) To address hallucination, constructed RSSA, the first remote sensing self-awareness dataset, containing a series of answerable and unanswerable tasks. (3) Using the above data, trained the H2RSVLM model (targeting helpfulness and honesty) on top of LLaVA via two stages: pretraining and supervised fine-tuning.
  • Published in: arXiv 2024
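The LLaVA-style design summarized above can be sketched minimally: a vision encoder's patch features are projected into the LLM's embedding space and prepended to the text token embeddings. All class names and dimensions below are illustrative assumptions, not the paper's actual code.

```python
import torch
import torch.nn as nn

class LlavaStyleMLLM(nn.Module):
    """Illustrative sketch of a LLaVA-style MLLM such as H2RSVLM:
    visual patch features (e.g. from CLIP ViT-L) are mapped by a
    projector into the language model's embedding space, then
    concatenated with text token embeddings (Vicuna-v1.5 in the
    paper) before being fed to the LLM decoder (omitted here)."""

    def __init__(self, vis_dim=1024, llm_dim=4096, vocab=32000):
        super().__init__()
        # projector from vision-encoder space to LLM space
        self.projector = nn.Linear(vis_dim, llm_dim)
        # stands in for the LLM's token embedding table
        self.embed = nn.Embedding(vocab, llm_dim)

    def forward(self, vis_feats, text_ids):
        # vis_feats: (B, num_patches, vis_dim); text_ids: (B, seq_len)
        vis_tokens = self.projector(vis_feats)
        txt_tokens = self.embed(text_ids)
        # the combined sequence would go to the LLM decoder
        return torch.cat([vis_tokens, txt_tokens], dim=1)

model = LlavaStyleMLLM()
out = model(torch.randn(1, 16, 1024), torch.randint(0, 32000, (1, 8)))
print(out.shape)  # 16 visual + 8 text tokens -> (1, 24, 4096)
```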
This post is licensed under CC BY 4.0 by the author.