Bootstrapping Interactive Image-Text Alignment for Remote Sensing Image Captioning

Posted Apr 22, 2023

Paper Title: Bootstrapping Interactive Image-Text Alignment for Remote Sensing Image Captioning
Model Architecture: VLM
Visual Encoder: Transformer
Text Encoder: Transformer
Model Details: Vision Encoder: CLIP ViT-L; Text Encoder: OPT-2.7B
Task: Image Captioning
Link: https://arxiv.org/abs/2312.01191
Code/Project: https://github.com/yangcong356/BITA
Published in: arXiv 2023

This post is licensed under CC BY 4.0 by the author.
