3D Perception and Localization Research

Current Developments in 3D Perception and Localization Research

The field of 3D perception and localization is advancing rapidly, driven by innovations in multi-task learning, efficient data representations, and robust localization techniques. Recent work is shifting towards more integrated and efficient frameworks that leverage novel data structures and cross-modal collaboration.

General Direction of the Field

  1. Multi-Task Learning and Integration: There is a growing emphasis on frameworks that handle multiple tasks simultaneously, such as depth estimation, surface normal prediction, and semantic segmentation. These frameworks improve efficiency and accuracy by leveraging shared representations and cross-task relationships (a minimal sketch of this shared-encoder pattern follows this list).

  2. Efficient Data Representations: The use of advanced data structures like 3D Gaussian Splatting (3DGS) is becoming prevalent. These representations not only improve the quality of 3D scene maps but also enhance computational efficiency, making them suitable for real-time applications.

  3. Robust Localization Techniques: Researchers are focusing on developing robust localization methods that can operate without the need for extensive training data or precise calibration markers. These methods often incorporate state space models and visual foundation models to improve accuracy and reduce dependency on specific hardware configurations.

  4. Semi-Supervised and Self-Supervised Learning: There is a significant push towards semi-supervised and self-supervised learning approaches that can leverage both labeled and unlabeled data. These methods aim to reduce the need for extensive manual annotation and improve the generalization capabilities of models.

  5. Cross-Modal and Cross-Task Collaboration: Integration of information from different modalities (e.g., RGB-D images, point clouds) and across different tasks (e.g., geometry and semantics) is being explored to enhance the overall performance of 3D perception systems.
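
As a concrete illustration of the shared-representation idea in point 1, the sketch below wires a single encoder to separate depth, surface-normal, and segmentation heads so that all three tasks reuse the same features. This is a minimal, hypothetical PyTorch example; the toy encoder, layer sizes, and names are placeholders rather than the architecture of any paper surveyed here.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MultiTaskNet(nn.Module):
        """Toy shared-encoder network for depth, surface normals, and semantics."""
        def __init__(self, num_classes: int = 13):
            super().__init__()
            # Shared representation reused by all three dense-prediction heads.
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            )
            # Lightweight task-specific heads.
            self.depth_head = nn.Conv2d(64, 1, 1)          # 1 channel: depth map
            self.normal_head = nn.Conv2d(64, 3, 1)         # 3 channels: unit surface normal
            self.seg_head = nn.Conv2d(64, num_classes, 1)  # per-class logits

        def forward(self, rgb: torch.Tensor) -> dict:
            feats = self.encoder(rgb)
            return {
                "depth": self.depth_head(feats),
                "normals": F.normalize(self.normal_head(feats), dim=1),
                "semantics": self.seg_head(feats),
            }

    # Joint training typically minimizes a weighted sum of per-task losses on
    # these outputs, so all tasks backpropagate through the shared encoder.
    model = MultiTaskNet()
    outputs = model(torch.randn(2, 3, 64, 128))

The shared encoder is what amortizes computation across tasks; frameworks such as Elite360M add explicit cross-task interaction on top of this basic pattern.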

Noteworthy Papers

  1. Elite360M: Introduces a multi-task learning framework for 360-degree imagery that efficiently integrates depth estimation, surface normal prediction, and semantic segmentation through bi-projection fusion and cross-task collaboration.

  2. LoopSplat: Proposes a 3DGS-based approach to SLAM that improves global consistency and efficiency through online loop closure and robust pose graph optimization (a brief sketch of the underlying Gaussian-splat parameterization follows this list).

  3. MambaLoc: Applies a selective state space model to visual localization, demonstrating exceptional training efficiency and robustness in sparse-data environments and showcasing efficient feature extraction and global information capture.

  4. P3P: Addresses the challenge of scaling 3D pre-training by leveraging pseudo-3D data and introducing a linear-time-complexity token embedding strategy, achieving state-of-the-art performance in 3D classification and few-shot learning.

  5. Kalib: Offers an automatic and universal markerless hand-eye calibration pipeline using keypoint tracking and proprioceptive sensors, simplifying setup and reducing dependency on precise physical markers.
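
Several of the works above and in the sources below build on the 3DGS representation. As background, the following sketch shows one common way a Gaussian-splat scene is parameterized and how per-Gaussian covariances are recovered as Sigma = R S S^T R^T from a quaternion rotation and per-axis scales. It is an illustrative NumPy sketch with assumed field names, not the data structure of LoopSplat or any other specific system.

    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class GaussianCloud:
        means: np.ndarray       # (N, 3) Gaussian centers in world coordinates
        quats: np.ndarray       # (N, 4) unit quaternions (w, x, y, z)
        log_scales: np.ndarray  # (N, 3) per-axis scales, stored in log space
        opacities: np.ndarray   # (N,)   opacity logits
        colors: np.ndarray      # (N, 3) RGB (often spherical-harmonic coefficients in practice)

    def covariances(cloud: GaussianCloud) -> np.ndarray:
        """Per-Gaussian covariance Sigma = R S S^T R^T, shape (N, 3, 3)."""
        w, x, y, z = cloud.quats.T
        # Rotation matrices from unit quaternions (standard formula).
        R = np.stack([
            1 - 2 * (y**2 + z**2), 2 * (x*y - w*z),       2 * (x*z + w*y),
            2 * (x*y + w*z),       1 - 2 * (x**2 + z**2), 2 * (y*z - w*x),
            2 * (x*z - w*y),       2 * (y*z + w*x),       1 - 2 * (x**2 + y**2),
        ], axis=-1).reshape(-1, 3, 3)
        S = np.exp(cloud.log_scales)   # positive per-axis scales
        RS = R * S[:, None, :]         # scale the columns of R
        return RS @ RS.transpose(0, 2, 1)

Factorizing the covariance this way keeps each Gaussian symmetric and positive semi-definite during optimization, which is part of what makes the representation well suited to the mapping and localization pipelines listed in the sources.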

These papers represent significant strides in the field, highlighting the potential of integrated multi-task frameworks, efficient data representations, and robust localization techniques. They underscore the ongoing evolution towards more intelligent and adaptable 3D perception systems.

Sources

Elite360M: Efficient 360 Multi-task Learning via Bi-projection Fusion and Cross-task Collaboration

LoopSplat: Loop Closure by Registering 3D Gaussian Splats

MambaLoc: Efficient Camera Localisation via State Space Model

P3P: Pseudo-3D Pre-training for Scaling 3D Masked Autoencoders

Kalib: Markerless Hand-Eye Calibration with Keypoint Tracking

Subspace Prototype Guidance for Mitigating Class Imbalance in Point Cloud Semantic Segmentation

ShapeSplat: A Large-scale Dataset of Gaussian Splats and Their Self-Supervised Pretraining

GSLoc: Efficient Camera Pose Refinement via 3D Gaussian Splatting

Visual Localization in 3D Maps: Comparing Point Cloud, Mesh, and NeRF Representations

FUSELOC: Fusing Global and Local Descriptors to Disambiguate 2D-3D Matching in Visual Localization

Exploring Scene Coherence for Semi-Supervised 3D Semantic Segmentation

GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting

GSTran: Joint Geometric and Semantic Coherence for Point Cloud Segmentation

Semi-supervised 3D Semantic Scene Completion with 2D Vision Foundation Model Guidance

Positional Prompt Tuning for Efficient 3D Representation Learning

LiFCal: Online Light Field Camera Calibration via Bundle Adjustment

Highly Accurate Robot Calibration Using Adaptive and Momental Bound with Decoupled Weight Decay

Enhancing Sampling Protocol for Robust Point Cloud Classification

GSFusion: Online RGB-D Mapping Where Gaussian Splatting Meets TSDF Fusion

Map-Free Visual Relocalization Enhanced by Instance Knowledge and Depth Knowledge

PointDGMamba: Domain Generalization of Point Cloud Classification via Generalized State Space Model