Integrated Multimodal Approaches in OOD Detection

Recent work in Out-of-Distribution (OOD) detection has shifted markedly toward leveraging both visual and textual modalities to improve robustness and accuracy. Researchers are developing adaptive, dynamic methods that better align with the underlying OOD label space, particularly in scenarios where static negative labels cause semantic misalignment. Vision-Language Models (VLMs) are being paired with frameworks that generate negative proxies dynamically at test time to better represent the actual OOD distribution. In parallel, prior knowledge and normalized energy losses are being explored to handle distribution shifts and to make OOD predictions more reliable in long-tailed recognition settings. Beyond raw detection performance, these approaches reduce dependence on manual hyperparameter tuning and on modeling auxiliary data. Overall, the field is converging on integrated, adaptive solutions that combine textual and visual knowledge, paving the way for more robust and versatile OOD detection systems.
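To make the test-time proxy idea concrete, below is a minimal sketch of negative-proxy-guided OOD scoring with a CLIP-like model. The feature dimension, the margin-based memory rule, and the softmax scoring are illustrative assumptions for exposition, not the exact AdaNeg procedure; random vectors stand in for real encoder outputs so the snippet runs on its own.

```python
# Minimal sketch of test-time negative-proxy OOD scoring (CLIP-style features).
# NOTE: the margin rule and single mean proxy are simplified stand-ins for
# the adaptive proxy mechanisms described in the cited papers.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

D, K = 512, 10                                      # feature dim, number of ID classes
id_text = F.normalize(torch.randn(K, D), dim=-1)    # stand-in for encoded ID label prompts
neg_bank = []                                       # memory of low-affinity test features

def ood_score(feat, tau=0.07, margin=0.25):
    """Return an OOD score in [0, 1]; higher means more likely out-of-distribution.

    feat: L2-normalized image feature from a CLIP-like visual encoder.
    """
    sims_id = feat @ id_text.T                      # cosine similarity to each ID label
    # Grow the negative memory from test features that match no ID label well,
    # so the proxy adapts to the OOD distribution actually seen at test time.
    if sims_id.max() < margin:
        neg_bank.append(feat)
    if neg_bank:
        neg_proxy = F.normalize(torch.stack(neg_bank).mean(0), dim=-1)
        s_neg = feat @ neg_proxy                    # similarity to the adaptive proxy
    else:
        s_neg = torch.tensor(0.0)
    # Softmax over [ID similarities, negative-proxy similarity]; the probability
    # mass assigned to the negative slot serves as the OOD score.
    logits = torch.cat([sims_id, s_neg.view(1)]) / tau
    return F.softmax(logits, dim=0)[-1].item()

# Toy usage: stream a few random "test" features through the scorer.
for _ in range(5):
    x = F.normalize(torch.randn(D), dim=-1)
    print(f"OOD score: {ood_score(x):.3f}")
```

Because the proxy is rebuilt from the running memory on every call, the decision boundary drifts toward whatever OOD data actually arrives, which is the core appeal of test-time proxies over a fixed set of negative labels.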

Sources

AdaNeg: Adaptive Negative Proxy Guided OOD Detection with Vision-Language Models

PViT: Prior-augmented Vision Transformer for Out-of-distribution Detection

Long-Tailed Out-of-Distribution Detection via Normalized Outlier Distribution Adaptation

'No' Matters: Out-of-Distribution Detection in Multimodality Long Dialogue
