RS-LLaVA: Large Vision Language Model for Joint Captioning and Question Answering in Remote Sensing Imagery
Paper title: RS-LLaVA: Large Vision Language Model for Joint Captioning and Question Answering in Remote Sensing Imagery
Model architecture: MLLM
Visual Encoder: Transformer
Text Encoder: Transformer
Model D...