Advancements in 3D Perception and Autonomous Systems

The field of 3D perception and autonomous systems is advancing rapidly, with work concentrated on more accurate and robust 3D object detection, scene understanding, and navigation. One direction is weakly supervised learning: Weak Cube R-CNN predicts 3D objects using only 2D bounding box annotations. Mono3R and MS-Occ target 3D reconstruction and occupancy prediction, respectively. Multi-sensor fusion is also becoming increasingly important: RCAlign, which addresses radar-camera alignment, and MS-Occ, which fuses LiDAR and camera data, report state-of-the-art performance in 3D object detection and semantic occupancy prediction, respectively. Researchers are also investigating graph-based methods such as Graph2Nav, which generates 3D object-relation graphs to support robot navigation. Noteworthy papers include Weak Cube R-CNN, which achieves higher accuracy than an annotation-time-equalized baseline, and Mono3R, which substantially improves the robustness of multi-view reconstruction systems. Together, these advances could significantly improve the performance and safety of autonomous systems in robotics, autonomous driving, and augmented reality.
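As a rough illustration of how 2D bounding boxes alone can constrain a 3D prediction, the sketch below projects a candidate 3D cuboid into the image and scores its overlap with an annotated 2D box. It is not the loss used by Weak Cube R-CNN or any other paper listed here; the function names, the axis-aligned cuboid, the camera-frame convention (x right, y down, z forward), and the pinhole intrinsics are all assumptions made for illustration.

```python
from itertools import product


def cuboid_corners(center, dims):
    """Eight corners of an axis-aligned cuboid given in camera coordinates."""
    cx, cy, cz = center
    w, h, l = dims  # extents along x, y, z (illustrative convention)
    return [
        (cx + sx * w / 2, cy + sy * h / 2, cz + sz * l / 2)
        for sx, sy, sz in product((-1, 1), repeat=3)
    ]


def project_to_box(corners, fx, fy, u0, v0):
    """Pinhole-project the corners and return the enclosing box (x1, y1, x2, y2)."""
    if any(z <= 0 for _, _, z in corners):
        raise ValueError("all corners must lie in front of the camera (z > 0)")
    us = [fx * x / z + u0 for x, _, z in corners]
    vs = [fy * y / z + v0 for _, y, z in corners]
    return (min(us), min(vs), max(us), max(vs))


def iou_2d(a, b):
    """Intersection over union of two boxes in (x1, y1, x2, y2) form."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def projection_consistency_loss(center, dims, box_2d, intrinsics):
    """1 - IoU between the projected cuboid and the annotated 2D box."""
    fx, fy, u0, v0 = intrinsics
    projected = project_to_box(cuboid_corners(center, dims), fx, fy, u0, v0)
    return 1.0 - iou_2d(projected, box_2d)


if __name__ == "__main__":
    intrinsics = (500.0, 500.0, 320.0, 240.0)  # hypothetical fx, fy, u0, v0
    annotated = project_to_box(cuboid_corners((0, 0, 10), (2, 2, 2)), *intrinsics)
    # A cuboid placed twice as far away projects to a smaller box, so the loss rises.
    print(projection_consistency_loss((0, 0, 10), (2, 2, 2), annotated, intrinsics))
    print(projection_consistency_loss((0, 0, 20), (2, 2, 2), annotated, intrinsics))
```

A signal of this kind cannot fix depth and size on its own, since a larger cuboid farther away can project to the same 2D box; weakly supervised methods therefore typically need additional cues or priors alongside it.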

Sources

Weak Cube R-CNN: Weakly Supervised 3D Detection using only 2D Bounding Boxes

Mono3R: Exploiting Monocular Cues for Geometric 3D Reconstruction

Testing the Fault-Tolerance of Multi-Sensor Fusion Perception in Autonomous Driving Systems

WeatherGen: A Unified Diverse Weather Generator for LiDAR Point Clouds via Spider Mamba Diffusion

Leveraging Automatic CAD Annotations for Supervised Learning in 3D Scene Understanding

HAECcity: Open-Vocabulary Scene Understanding of City-Scale Point Clouds with Superpoint Graph Clustering

LMPOcc: 3D Semantic Occupancy Prediction Utilizing Long-Term Memory Prior from Historical Traversals

Lightweight LiDAR-Camera 3D Dynamic Object Detection and Multi-Class Trajectory Prediction

RefComp: A Reference-guided Unified Framework for Unpaired Point Cloud Completion

V2P Collision Warnings for Distracted Pedestrians: A Comparative Study with Traditional Auditory Alerts

VoxCity: A Seamless Framework for Open Geospatial Data Integration, Grid-Based Semantic 3D City Model Generation, and Urban Environment Simulation

Occlusion-Ordered Semantic Instance Segmentation

HFBRI-MAE: Handcrafted Feature Based Rotation-Invariant Masked Autoencoder for 3D Point Cloud Analysis

Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D

Exploring Modality Guidance to Enhance VFM-based Feature Fusion for UDA in 3D Semantic Segmentation

SG-Reg: Generalizable and Efficient Scene Graph Registration

ApexNav: An Adaptive Exploration Strategy for Zero-Shot Object Navigation with Target-centric Semantic Fusion

RoboOcc: Enhancing the Geometric and Semantic Scene Understanding for Robots

VM-BHINet: Vision Mamba Bimanual Hand Interaction Network for 3D Interacting Hand Mesh Recovery From a Single RGB Image

NVSMask3D: Hard Visual Prompting with Camera Pose Interpolation for 3D Open Vocabulary Instance Segmentation

An Iterative Task-Driven Framework for Resilient LiDAR Place Recognition in Adverse Weather

ScanEdit: Hierarchically-Guided Functional 3D Scan Editing

VistaDepth: Frequency Modulation With Bias Reweighting For Enhanced Long-Range Depth Estimation

Robust and Real-time Surface Normal Estimation from Stereo Disparities using Affine Transformations

Locating and Mitigating Gradient Conflicts in Point Cloud Domain Adaptation via Saliency Map Skewness

MS-Occ: Multi-Stage LiDAR-Camera Fusion for 3D Semantic Occupancy Prediction

Understanding the Role of Covariates in Numerical Reconstructions of Real-World Vehicle-to-Pedestrian Collisions

ForesightNav: Learning Scene Imagination for Efficient Exploration

Context-Awareness and Interpretability of Rare Occurrences for Discovery and Formalization of Critical Failure Modes

MonoTher-Depth: Enhancing Thermal Depth Estimation via Confidence-Aware Distillation

Measuring Uncertainty in Shape Completion to Improve Grasp Quality

PCF-Grasp: Converting Point Completion to Geometry Feature to Enhance 6-DoF Grasp

Revisiting Radar Camera Alignment by Contrastive Learning for 3D Object Detection

DPGP: A Hybrid 2D-3D Dual Path Potential Ghost Probe Zone Prediction Framework for Safe Autonomous Driving

Gaussian Splatting is an Effective Data Generator for 3D Object Detection

Graph2Nav: 3D Object-Relation Graph Generation to Robot Navigation

Procedural Dataset Generation for Zero-Shot Stereo Matching

MobileCity: An Efficient Framework for Large-Scale Urban Behavior Simulation

Scene-Aware Location Modeling for Data Augmentation in Automotive Object Detection

AUTHENTICATION: Identifying Rare Failure Modes in Autonomous Vehicle Perception Systems using Adversarially Guided Diffusion Models

Highly Accurate and Diverse Traffic Data: The DeepScenario Open 3D Dataset

S2S-Net: Addressing the Domain Gap of Heterogeneous Sensor Systems in LiDAR-Based Collective Perception

Predict-Optimize-Distill: A Self-Improving Cycle for 4D Object Understanding

Flying through cluttered and dynamic environments with LiDAR

Improving Open-World Object Localization by Discovering Background

DiMeR: Disentangled Mesh Reconstruction Model

PICO: Reconstructing 3D People In Contact with Objects

Gripper Keypose and Object Pointflow as Interfaces for Bimanual Robotic Manipulation

The Fourth Monocular Depth Estimation Challenge

LiDPM: Rethinking Point Diffusion for Lidar Scene Completion