Advancements in Zero-Shot and Few-Shot Learning with Vision-Language Models

Recent developments in zero-shot and few-shot learning, particularly in the context of vision-language models (VLMs), mark a significant shift towards improving model generalizability and scalability. Current work focuses on exploiting the synergy between visual and textual data to tackle classification tasks without extensive labeled datasets. A notable trend is the integration of language-inspired strategies and super-class guidance to refine model predictions and facilitate knowledge transfer from seen to unseen categories. There is also a growing emphasis on iterative transduction methods that exploit the structure of the language and vision embedding spaces for better alignment and classification accuracy. These advances are not only improving state-of-the-art performance across various benchmarks but are also making progress on domain adaptation and data-scarce scenarios.
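
The common substrate for these approaches is a pretrained VLM such as CLIP, which embeds images and class-name prompts into a shared space so that zero-shot classification reduces to a nearest-prompt lookup. The following is a minimal sketch of that baseline using the Hugging Face `transformers` CLIP API; the image path, class labels, and prompt template are placeholders, and this is not the method of any specific paper covered below.

```python
# Minimal zero-shot classification with a pretrained vision-language model (CLIP).
# Generic illustration of the shared vision-language embedding space; not any
# particular paper's method.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")                      # placeholder image path
class_names = ["cat", "dog", "bird"]                   # labels expressed as text
prompts = [f"a photo of a {c}" for c in class_names]   # simple prompt template

inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Image-to-text similarity logits, turned into class probabilities
# without any task-specific training or labeled examples.
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(class_names, probs.squeeze(0).tolist())))
```

Because the class set is defined purely by text, new (unseen) categories can be added by editing the prompt list, which is the property the papers below build on.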

Noteworthy Papers

  • Super-class guided Transformer for Zero-Shot Attribute Classification: Introduces SugaFormer, leveraging super-classes for enhanced scalability and generalizability, achieving state-of-the-art performance in zero-shot settings.
  • Language-Inspired Relation Transfer for Few-shot Class-Incremental Learning: Proposes a paradigm that combines visual and textual depictions of objects to transfer relational knowledge, significantly outperforming existing models on few-shot class-incremental learning (FSCIL) benchmarks.
  • Generate, Transduct, Adapt: Iterative Transduction with VLMs: Presents GTA-CLIP, an iterative technique that improves zero-shot classification by jointly transducing in language and vision spaces, showing notable performance improvements.
  • IDEA: Image Description Enhanced CLIP-Adapter: Develops a method that enhances CLIP's adaptability to few-shot tasks by leveraging image descriptions, matching or surpassing state-of-the-art results without additional training (a generic sketch of this description-based idea follows this list).
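
To make the description-based direction concrete, below is a minimal, hypothetical sketch of training-free prompt augmentation: each class prompt is expanded with a short textual description, the resulting text embeddings are averaged into a per-class prototype, and images are classified by cosine similarity. The class names, descriptions, and file paths are placeholders; this illustrates the general idea of description-enhanced, training-free adaptation, not the actual IDEA or GTA-CLIP algorithms.

```python
# Hypothetical sketch: augment class prompts with short descriptions and classify
# by cosine similarity to the averaged text embedding per class.
# Not the IDEA or GTA-CLIP implementation.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Placeholder class names and descriptions (e.g., generated by an LLM or taken
# from dataset metadata).
classes = {
    "sparrow": ["a photo of a sparrow", "a small brown bird with streaked feathers"],
    "goldfinch": ["a photo of a goldfinch", "a small bird with bright yellow plumage"],
}

# Build one text prototype per class by averaging normalized prompt embeddings.
prototypes = []
for prompts in classes.values():
    tok = processor(text=prompts, return_tensors="pt", padding=True)
    with torch.no_grad():
        emb = model.get_text_features(**tok)
    emb = emb / emb.norm(dim=-1, keepdim=True)
    prototypes.append(emb.mean(dim=0))
prototypes = torch.stack(prototypes)                     # (num_classes, dim)

# Embed a query image and pick the closest class prototype.
image = Image.open("query.jpg")                          # placeholder image path
pix = processor(images=image, return_tensors="pt")
with torch.no_grad():
    img_emb = model.get_image_features(**pix)
img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)

scores = img_emb @ prototypes.T
pred = list(classes)[scores.argmax().item()]
print(pred, scores.softmax(dim=-1).tolist())
```

The design choice to average description embeddings keeps the pipeline entirely training-free, which mirrors the no-additional-training claim highlighted in the IDEA summary above, even though the papers' actual aggregation schemes may differ.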

Sources

Super-class guided Transformer for Zero-Shot Attribute Classification

Language-Inspired Relation Transfer for Few-shot Class-Incremental Learning

Generate, Transduct, Adapt: Iterative Transduction with VLMs

IDEA: Image Description Enhanced CLIP-Adapter
