Bootstrapping Interactive Image-Text Alignment for Remote Sensing Image Captioning

  • Paper Title: Bootstrapping Interactive Image-Text Alignment for Remote Sensing Image Captioning
  • Model Architecture: VLM
  • Visual Encoder: Transformer
  • Text Encoder: Transformer
  • Model Details: Vision Encoder: CLIP ViT-L; Text Encoder: OPT-2.7B
  • Task: Image Captioning
  • Link: https://arxiv.org/abs/2312.01191
  • Code/Project: https://github.com/yangcong356/BITA
  • Published in: arXiv 2023
This post is licensed under CC BY 4.0 by the author.