Recent advances in Structure from Motion (SfM) and Visual SLAM (VSLAM) are shifting notably toward ground-truth-free methodologies and the integration of multi-camera systems. Ground-truth-free evaluation enables more scalable, self-supervised tuning of SfM and VSLAM systems, potentially unlocking gains similar to those that large-scale self-supervision brought to generative AI. Multi-camera setups are being developed to improve robustness and flexibility, addressing the limitations of monocular and stereo systems in textureless environments; these systems leverage learning-based feature extraction and tracking to manage the increased data-processing load and to improve pose-estimation accuracy. There is also a growing focus on dynamic-scene analysis, with new frameworks that handle complex, uncontrolled camera motions and deliver accurate, fast, and robust estimates of camera parameters and depth maps. Together, these developments push the boundaries of SfM and VSLAM, making them more adaptable to diverse real-world scenarios.
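As one concrete illustration of an ingredient that ground-truth-free evaluation can build on: mean reprojection error measures a reconstruction's internal consistency, requiring no external ground-truth trajectory. The sketch below is a minimal, hypothetical example (the function name and pinhole setup are assumptions, not taken from the cited papers):

```python
import numpy as np

def mean_reprojection_error(K, R, t, points_3d, points_2d):
    """Internal-consistency metric: project the estimated 3D points with the
    estimated camera pose and compare against the observed 2D keypoints.
    No external ground-truth trajectory is required."""
    cam = R @ points_3d.T + t.reshape(3, 1)   # world -> camera frame (3xN)
    proj = K @ cam                            # pinhole projection
    proj = proj[:2] / proj[2]                 # perspective divide
    return float(np.linalg.norm(proj.T - points_2d, axis=1).mean())

# Toy check: a perfectly consistent reconstruction yields zero error.
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)
pts3d = np.array([[0.0, 0.0, 1.0], [1.0, 1.0, 2.0]])
pts2d = np.array([[50.0, 50.0], [100.0, 100.0]])
print(mean_reprojection_error(K, R, t, pts3d, pts2d))  # → 0.0
```

A self-supervised tuning loop could minimize a statistic like this over held-out observations instead of comparing against a motion-capture trajectory.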
Noteworthy papers include one proposing a ground-truth-free evaluation methodology for SfM and VSLAM; another introducing a generic visual odometry system for arbitrarily arranged multi-camera rigs that demonstrates high flexibility and robustness; and a third presenting a system for accurate, fast, and robust estimation of camera parameters and depth maps from dynamic scenes, which outperforms existing methods.
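To make the multi-camera setting slightly more concrete: a common pattern in such systems is to estimate a single pose for the rig body and recover each camera's pose through its fixed extrinsic calibration, whatever the cameras' arrangement. A minimal sketch using 4x4 homogeneous transforms (the names and toy rig layout are illustrative assumptions, not details from the cited paper):

```python
import numpy as np

def se3(R, t):
    """Pack a rotation matrix and translation into a 4x4 homogeneous transform."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def camera_pose_in_world(T_world_body, T_body_cam):
    """Each camera's world pose is the rig-body pose composed with that
    camera's fixed extrinsic; only one body pose needs to be estimated."""
    return T_world_body @ T_body_cam

# Toy rig: body at x = 1 m in the world, one camera offset 0.2 m along the body's y-axis.
T_world_body = se3(np.eye(3), np.array([1.0, 0.0, 0.0]))
T_body_cam = se3(np.eye(3), np.array([0.0, 0.2, 0.0]))
print(camera_pose_in_world(T_world_body, T_body_cam)[:3, 3])  # camera at (1.0, 0.2, 0.0)
```

This composition is what lets a single odometry estimate serve an arbitrary number of arbitrarily placed cameras.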