Recent work in computer vision, particularly in depth estimation and segmentation, shows a clear shift toward models that generalize well and run efficiently across diverse, challenging environments. A notable trend is the emphasis on zero-shot generalization, which aims to reduce dependence on domain-specific fine-tuning and large annotated datasets. Approaches include training on synthetic data, combining mixed annotation types, and adding auxiliary tasks to enrich what the model learns. There is also a growing focus on robustness and accuracy in real-world applications, driven by advances in network architectures and self-supervised learning frameworks that adapt to varying environmental conditions.
- FoundationStereo introduces a foundation model for stereo depth estimation, achieving strong zero-shot generalization by utilizing a large-scale synthetic dataset and novel network components.
- MARIO presents a mixed supervision model for polyp segmentation, effectively utilizing various annotation types to enhance dataset usability and model performance.
- Survey on Monocular Metric Depth Estimation reviews recent advances in the field, highlighting the importance of scale-agnostic methods for zero-shot generalization.
- Enhancing Monocular Depth Estimation with Multi-Source Auxiliary Tasks demonstrates the benefits of incorporating auxiliary datasets and tasks to improve monocular depth estimation quality and data efficiency.
- PromptMono introduces a self-supervised learning framework for monocular depth estimation in challenging environments, utilizing visual prompt learning and a novel gated cross prompting attention module.
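
To make the idea of scale-agnostic depth more concrete, here is a minimal sketch of least-squares scale-and-shift alignment, a technique commonly used when comparing a relative depth prediction against metric ground truth in zero-shot evaluation. It is not taken from any of the papers listed above. The function names `align_scale_shift` and `abs_rel_error` and the toy values are assumptions made for illustration, and in practice the alignment is often applied in inverse-depth space over valid pixels only.

```python
from typing import Sequence, Tuple


def align_scale_shift(pred: Sequence[float], target: Sequence[float]) -> Tuple[float, float]:
    """Return (scale, shift) minimizing sum((scale * pred + shift - target) ** 2).

    Hypothetical helper for illustration; this is the closed-form solution of
    a one-variable linear regression of target on pred.
    """
    n = len(pred)
    if n != len(target) or n < 2:
        raise ValueError("need two equally sized sequences with at least 2 values")
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_p == 0.0:
        raise ValueError("prediction is constant, so the scale is undefined")
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    scale = cov / var_p
    shift = mean_t - scale * mean_p
    return scale, shift


def abs_rel_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute relative error, a standard depth metric (target must be > 0)."""
    return sum(abs(p - t) / t for p, t in zip(pred, target)) / len(pred)


# Toy data (assumed): a relative prediction that is correct only up to an
# unknown scale and shift with respect to the metric ground truth.
relative = [0.1, 0.2, 0.4, 0.8]
metric = [2.0 * r + 0.5 for r in relative]

scale, shift = align_scale_shift(relative, metric)
aligned = [scale * r + shift for r in relative]

print("error before alignment:", abs_rel_error(relative, metric))
print("error after alignment:", abs_rel_error(aligned, metric))
```

The alignment removes the scale and shift ambiguity before the error is computed, which is why a model that predicts only relative depth can still be scored against metric ground truth. Metric depth methods aim to make this step unnecessary by predicting absolute distances directly.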