
S-CLIP: Semi-supervised Vision-Language Learning using Few Specialist Captions

  • Paper Title: S-CLIP: Semi-supervised Vision-Language Learning using Few Specialist Captions
  • Model Architecture: VLM
  • Visual Encoder: Transformer
  • Text Encoder: Transformer
  • Model Details: CLIP
  • Task: Scene Classification, Image-text Retrieval
  • Link: https://arxiv.org/abs/2305.14095
  • Code/Project: https://github.com/alinlab/s-clip
  • Survey Inclusion: awesome-remote-sensing-vision-language-models
  • Short Summary: Proposes a semi-supervised learning method for training CLIP with only a small number of specialist (expert-annotated) captions. S-CLIP exploits additional unpaired images and strengthens both the contrastive objective and the language-modality training through two pseudo-labeling strategies (a minimal illustrative sketch follows this list).
  • Published in: NeurIPS 2023
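
The sketch below is a minimal, hedged illustration of the idea in the summary, not the authors' implementation: a standard CLIP contrastive loss on the few image-caption pairs, plus a pseudo-label loss that gives unpaired images soft targets over the existing captions. The function names (`pseudo_label_loss`) and the similarity-based soft-target construction are assumptions for illustration; the official repository linked above contains the actual S-CLIP objectives.

```python
# Illustrative sketch only; not the S-CLIP reference implementation.
import torch
import torch.nn.functional as F

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Standard CLIP loss on the few paired image-text embeddings."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature
    targets = torch.arange(len(img_emb), device=img_emb.device)
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

def pseudo_label_loss(unpaired_img_emb, paired_img_emb, txt_emb, temperature=0.07):
    """Assumed pseudo-labeling scheme: each unpaired image gets a soft
    distribution over the existing captions, derived from its similarity to
    the paired (labeled) images that own those captions."""
    unpaired_img_emb = F.normalize(unpaired_img_emb, dim=-1)
    paired_img_emb = F.normalize(paired_img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    # Soft targets from image-image similarity to the labeled set.
    with torch.no_grad():
        soft_targets = F.softmax(
            unpaired_img_emb @ paired_img_emb.t() / temperature, dim=-1)
    # Logits of unpaired images against the labeled captions.
    logits = unpaired_img_emb @ txt_emb.t() / temperature
    # F.cross_entropy accepts probability (soft) targets in PyTorch >= 1.10.
    return F.cross_entropy(logits, soft_targets)

# Usage example with random embeddings standing in for encoder outputs.
paired_img, txt = torch.randn(8, 512), torch.randn(8, 512)
unpaired_img = torch.randn(16, 512)
loss = clip_contrastive_loss(paired_img, txt) + pseudo_label_loss(unpaired_img, paired_img, txt)
print(loss.item())
```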