J 1 VLCA: vision-language aligning model with cross-modal attention for bilingual remote sensing image captioning, Jun 5, 2023