GeoChat Grounded Large Vision-Language Model for Remote Sensing
- 论文名称: GeoChat: Grounded Large Vision-Language Model for Remote Sensing
- 模型架构: MLLM
- Visual Encoder: Transformer
- Text Encoder: Transformer
- Model Details: Vision Encoder:CLIP-ViTText Encoder: Vicuna-v1.5
- Task: Scene Classification, RS VQA, Visual Grounding
- Link: https://arxiv.org/abs/2311.15826
- Code/Project: https://github.com/mbzuai-oryx/geochat
- Short Summary: 1. RS多模态数据集:多模态数据集,且提出了一个生成数据的pipeline2. GeoChat:利用已有的数据微调LLaVA-1.5,利用lora微调;除了能够处理自然语言问题之外,用户还可以提供视觉提示(bounding box),并且模型能够回答有关ROI(指定感兴趣区域)的问题
- Published in: CVPR 2024
This post is licensed under CC BY 4.0 by the author.