Advances in Domain Generalization and Open-Vocabulary Object Detection

Computer vision research is moving toward more robust and generalizable models, particularly in domain generalization and open-vocabulary object detection. Recent work addresses the challenges of distribution shift, noisy labels, and limited training data. Proposed solutions include training-free frameworks for open-vocabulary object detection, language anchor-guided methods for robust noisy domain generalization, and techniques that mitigate cache noise in test-time adaptation. These advances are relevant to applications such as security screening, image classification, and attribute detection. Particularly noteworthy papers include:

  • A paper that proposes RAXO, a training-free framework for open-vocabulary object detection in X-ray imaging, reported to achieve state-of-the-art performance on a newly introduced benchmark.
  • A paper that introduces Anchor Alignment and Adaptive Weighting (A3W), a language anchor-guided method for robust noisy domain generalization, reported to improve accuracy and robustness across several datasets.
  • A paper that proposes Compositional Caching (ComCa), a training-free method for open-vocabulary attribute detection, reported to be competitive with recent training-based methods.

Sources

Superpowering Open-Vocabulary Object Detectors for X-ray Vision

A Language Anchor-Guided Method for Robust Noisy Domain Generalization

Mitigating Cache Noise in Test-Time Adaptation for Large Vision-Language Models

Explaining Domain Shifts in Language: Concept erasing for Interpretable Image Classification

Balanced Direction from Multifarious Choices: Arithmetic Meta-Learning for Domain Generalization

Compositional Caching for Training-free Open-vocabulary Attribute Detection

Attribute-formed Class-specific Concept Space: Endowing Language Bottleneck Model with Better Interpretability and Scalability

Feature Modulation for Semi-Supervised Domain Generalization without Domain Labels
