Recent work in 3D vision and perception shows a marked shift toward more efficient and scalable unsupervised learning methods. A notable trend is the integration of neural rendering techniques with novel geometric representations, such as 3D Gaussian splatting and planar primitives, to improve the accuracy and speed of 3D reconstruction and perception tasks. These methods use differentiable rendering to enable unsupervised pre-training on large-scale datasets, reducing the dependence on labeled data and improving generalization across tasks and datasets. There is also a growing emphasis on exploiting temporal information from LiDAR sequences to improve the robustness and accuracy of 3D object detection and scene understanding. Gains in compute and memory efficiency, together with better handling of domain shift, are paving the way for practical deployment in real-world scenarios.
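The core idea linking Gaussian splatting to unsupervised pre-training is that rendering is a smooth function of the primitives' parameters, so a photometric loss against raw images can supervise them without labels. The sketch below is a deliberately minimal, additive (alpha-free) stand-in for real splatting, with a finite-difference gradient step in place of autodiff; all function names and the toy fitting loop are illustrative assumptions, not any specific paper's method.

```python
import numpy as np

def splat_gaussians(means, sigmas, weights, grid_size=32):
    """Render 2D Gaussians onto a raster grid by additive splatting
    (a simplified, alpha-free stand-in for Gaussian splatting)."""
    ys, xs = np.mgrid[0:grid_size, 0:grid_size] / (grid_size - 1)
    img = np.zeros((grid_size, grid_size))
    for (mx, my), s, w in zip(means, sigmas, weights):
        d2 = (xs - mx) ** 2 + (ys - my) ** 2
        img += w * np.exp(-d2 / (2 * s ** 2))
    return img

def recon_loss(means, sigmas, weights, target):
    """L2 photometric loss between rendering and target image.
    Every operation is smooth, so gradients flow back to the primitives --
    this is what makes label-free pre-training possible."""
    render = splat_gaussians(means, sigmas, weights, target.shape[0])
    return np.mean((render - target) ** 2)

# Toy self-supervision: recover one Gaussian's weight from a target
# rendering via finite-difference gradient descent (a real system
# would use autodiff over millions of primitives).
target = splat_gaussians([(0.5, 0.5)], [0.1], [1.0])
w, lr, eps = 0.2, 5.0, 1e-4
for _ in range(200):
    g = (recon_loss([(0.5, 0.5)], [0.1], [w + eps], target)
         - recon_loss([(0.5, 0.5)], [0.1], [w - eps], target)) / (2 * eps)
    w -= lr * g
```

After the loop, `w` converges to the target weight of 1.0, illustrating how the rendering loss alone supervises the representation.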
Noteworthy papers include: an efficient framework for point cloud representation learning built on 3D Gaussian splatting, which achieves significant speedups and memory reductions; a novel approach to 3D planar reconstruction from monocular videos, which reaches state-of-the-art performance with a generic plane representation; and a method for unsupervised 3D representation learning via temporal forecasting for LiDAR perception, which yields substantial improvements on downstream tasks.
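The temporal-forecasting objective mentioned above can be illustrated with a minimal self-supervised loss: predict the next LiDAR sweep from past ones and score the prediction with a Chamfer distance, which requires no annotations. The constant-velocity forecaster and function names below are illustrative assumptions (a learned model would replace the extrapolation), not the cited paper's architecture.

```python
import numpy as np

def chamfer(a, b):
    """Symmetric Chamfer distance between point sets a (N,3) and b (M,3):
    mean nearest-neighbor distance in both directions. Being label-free,
    it can supervise forecasting of future sweeps."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # (N, M) pairwise
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def forecast_next_sweep(prev, curr):
    """Toy constant-velocity forecaster (an illustrative stand-in for a
    learned model): extrapolate each point by the last frame-to-frame shift."""
    return curr + (curr - prev)

# Synthetic rigid motion: a point cloud translating at constant velocity
# across three consecutive sweeps t0, t1, t2.
rng = np.random.default_rng(0)
cloud = rng.uniform(-1.0, 1.0, size=(64, 3))
v = np.array([0.1, 0.0, 0.02])
t0, t1, t2 = cloud, cloud + v, cloud + 2 * v

pred = forecast_next_sweep(t0, t1)
loss = chamfer(pred, t2)  # near zero: the forecast matches the future sweep
```

In a real pipeline the loss would update a neural forecaster over raw LiDAR sequences; the features it learns are then transferred to detection and scene-understanding heads.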