S-CLIP: Semi-supervised Vision-Language Learning using Few Specialist Captions
- Paper Title: S-CLIP: Semi-supervised Vision-Language Learning using Few Specialist Captions
- 模型架构: VLM
- Visual Encoder: Transformer
- Text Encoder: Transformer
- Model Details: CLIP
- Task: Scene Classification, Image-text Retrieval
- Link: https://arxiv.org/abs/2305.14095
- Code/Project: https://github.com/alinlab/s-clip
- Survey Inclusion: awesome-remote-sensing-vision-language-models
- Short Summary: Proposes a semi-supervised learning method for training CLIP when only a small number of expert-annotated captions are available. S-CLIP exploits additional unpaired images and strengthens contrastive learning and language-modality training through two pseudo-labeling strategies (see the sketch after this list).
- Published in: NeurIPS 2023
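
Below is a minimal, hedged sketch of how a caption-level pseudo-label term could be combined with the standard CLIP contrastive loss on a batch containing both paired and unpaired images. The function names, the 0.5 weighting, and the softmax-based assignment of unpaired images to paired captions are illustrative assumptions; the paper derives its caption-level pseudo-labels with optimal transport and additionally uses a keyword-level pseudo-label, neither of which is reproduced here.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Standard CLIP loss on paired, L2-normalized image/text embeddings."""
    logits = img_emb @ txt_emb.t() / temperature
    targets = torch.arange(img_emb.size(0), device=img_emb.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

def caption_pseudo_label_loss(unpaired_img_emb, paired_img_emb, txt_emb, temperature=0.07):
    """Simplified caption-level pseudo-labels for unpaired images:
    each unpaired image receives a soft distribution over the captions of the
    paired batch, based on its similarity to the paired images.
    (S-CLIP computes this assignment via optimal transport; a plain softmax
    stands in here for brevity.)"""
    with torch.no_grad():
        sim_to_paired = unpaired_img_emb @ paired_img_emb.t() / temperature
        soft_targets = sim_to_paired.softmax(dim=-1)            # [U, N]
    logits = unpaired_img_emb @ txt_emb.t() / temperature        # [U, N]
    return -(soft_targets * logits.log_softmax(dim=-1)).sum(dim=-1).mean()

# Toy usage with random embeddings standing in for encoder outputs.
if __name__ == "__main__":
    N, U, D = 8, 16, 512
    img = F.normalize(torch.randn(N, D), dim=-1)        # paired images
    txt = F.normalize(torch.randn(N, D), dim=-1)        # their captions
    unpaired = F.normalize(torch.randn(U, D), dim=-1)   # images without captions
    loss = clip_contrastive_loss(img, txt) + 0.5 * caption_pseudo_label_loss(unpaired, img, txt)
    print(loss.item())
```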