Bootstrapping Interactive Image-Text Alignment for Remote Sensing Image Captioning
- Paper Title: Bootstrapping Interactive Image-Text Alignment for Remote Sensing Image Captioning
- Model Architecture: VLM
- Visual Encoder: Transformer
- Text Encoder: Transformer
- Model Details: Vision Encoder: CLIP ViT-L; Text Encoder: OPT-2.7B
- Task: Image Captioning
- Link: https://arxiv.org/abs/2312.01191
- Code/Project: https://github.com/yangcong356/BITA
- Published in: arXiv 2023
This post is licensed under CC BY 4.0 by the author.