VLCA: Vision-Language Aligning Model with Cross-Modal Attention for Bilingual Remote Sensing Image Captioning (Jun 5, 2023)