Depth Estimation and Scene Understanding

Report on Current Developments in Depth Estimation and Scene Understanding

General Direction of the Field

Recent advances in depth estimation and scene understanding are expanding what autonomous systems, particularly Unmanned Aerial Vehicles (UAVs) and robots, can achieve. The field is moving toward practical, real-time, multi-modal solutions that combine established sensors with newer computational techniques. Key trends include fusing thermal imaging, stereo vision, and LiDAR with machine learning models to improve depth perception and object recognition in complex and visually degraded environments.

A major shift is the emphasis on metric depth estimation, which is essential for accurate scene modeling and navigation. Researchers are developing methods that recover absolute scale from relative depth estimates, often by incorporating external data sources such as Global Digital Elevation Models (GDEMs). This improves both the accuracy of the resulting depth maps and the robustness of depth estimation across environmental conditions.
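
As a concrete illustration of scale recovery, the sketch below fits a global scale and shift that aligns a relative depth map with sparse metric references, which is the role a GDEM can play. This is a minimal least-squares sketch under that assumption, not TanDepth's actual method; all function and variable names are illustrative.

```python
import numpy as np

def recover_metric_scale(rel_depth, ref_depth, mask):
    """Fit scale s and shift t so that s * rel_depth + t matches sparse
    metric references (e.g., terrain heights projected from a GDEM) in a
    least-squares sense. `mask` selects pixels with a valid reference."""
    x = rel_depth[mask].ravel()
    y = ref_depth[mask].ravel()
    A = np.stack([x, np.ones_like(x)], axis=1)      # [N, 2] design matrix
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)  # closed-form fit
    return s * rel_depth + t                        # metric depth map

# Toy usage: a relative depth map plus a few metric anchor points.
rel = np.random.rand(4, 4)
ref = 10.0 * rel + 2.0                  # synthetic stand-in for GDEM values
mask = np.zeros((4, 4), dtype=bool)
mask[::2, ::2] = True                   # sparse pixels with references
metric = recover_metric_scale(rel, ref, mask)
```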

Another significant development is the fusion of multi-modal data, such as RGB and thermal imaging, to overcome the limitations of individual sensors. Combining modalities enables more comprehensive scene understanding in scenarios where a single sensor fails, such as low-light or adverse weather conditions. Physics-induced models and regularization constraints are also gaining traction, since they account for the physical characteristics of the data and yield more accurate, realistic scene reconstructions.
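
To make the fusion idea concrete, the following sketch shows the simplest strategy, early fusion, where aligned RGB and thermal channels are concatenated and fed to one network. This is a toy PyTorch illustration, not the architecture of any cited paper; `RGBTFusionNet` is a hypothetical name, and real systems use much deeper backbones and careful cross-modal alignment.

```python
import torch
import torch.nn as nn

class RGBTFusionNet(nn.Module):
    """Early-fusion sketch: concatenate 3 RGB + 1 thermal channel and
    predict a single-channel depth map."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1),
        )

    def forward(self, rgb, thermal):
        x = torch.cat([rgb, thermal], dim=1)  # [B, 4, H, W] fused input
        return self.net(x)

model = RGBTFusionNet()
depth = model(torch.rand(1, 3, 64, 64), torch.rand(1, 1, 64, 64))
```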

Real-time processing remains a critical focus, with researchers optimizing algorithms for edge computing platforms to achieve faster inference without sacrificing accuracy. This matters most in applications like autonomous driving and robotic navigation, where timely decision-making is essential.
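
One common route to fast edge inference is exporting a trained model to an exchange format that an edge runtime (e.g., ONNX Runtime or TensorRT) can compile into an optimized engine. The sketch below shows a generic ONNX export of a stand-in depth head; this is a standard deployment step, not the specific pipeline used by the papers above, and the model here is a placeholder.

```python
import torch
import torch.nn as nn

# A stand-in depth head; any trained nn.Module would go here.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 1, 3, padding=1),
).eval()

# Export to ONNX so an edge runtime can build an optimized engine
# for the target accelerator.
dummy = torch.rand(1, 3, 240, 320)
torch.onnx.export(model, dummy, "depth_head.onnx", opset_version=17,
                  input_names=["image"], output_names=["depth"])
```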

Noteworthy Papers

  1. TanDepth: Introduces a practical, online scale recovery method for metric depth estimation in UAVs, leveraging GDEMs for accurate depth maps.
  2. ThermalGaussian: Proposes the first thermal 3D Gaussian splatting approach, enabling high-quality multimodal rendering with significant storage cost reduction.
  3. Thermal3D-GS: Develops a physics-induced 3D Gaussian splatting method for thermal infrared imaging, addressing issues of floaters and indistinct edges with a new benchmark dataset.
  4. FIReStereo: Presents a stereo thermal depth perception dataset for UAS, demonstrating robust depth estimation in visually degraded environments (the disparity-to-depth relation at the heart of stereo perception is sketched just after this list).
  5. Real-time Multi-view Omnidirectional Depth Estimation System: Proposes a real-time, multi-view depth estimation system for robots and autonomous driving, achieving high accuracy and inference speed on edge platforms.
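
For reference, stereo systems such as the one targeted by FIReStereo ultimately rely on the pinhole stereo relation depth = f * B / d, with focal length f in pixels, baseline B in metres, and disparity d in pixels. A minimal sketch of that conversion, with illustrative parameter values:

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m, eps=1e-6):
    """Classic pinhole stereo relation: depth = f * B / d.
    `focal_px` is the rectified focal length in pixels, `baseline_m`
    the distance between the two cameras in metres; `eps` guards
    against division by zero at zero-disparity pixels."""
    return (focal_px * baseline_m) / np.maximum(disparity, eps)

depth_m = disparity_to_depth(np.array([[32.0, 16.0], [8.0, 4.0]]),
                             focal_px=640.0, baseline_m=0.24)
```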

Together, these papers mark significant strides in depth estimation and scene understanding for autonomous systems.

Sources

TanDepth: Leveraging Global DEMs for Metric Monocular Depth Estimation in UAVs

iKalibr-RGBD: Partially-Specialized Target-Free Visual-Inertial Spatiotemporal Calibration For RGBDs via Continuous-Time Velocity Estimation

Object Depth and Size Estimation using Stereo-vision and Integration with SLAM

ThermalGaussian: Thermal 3D Gaussian Splatting

Thermal3D-GS: Physics-induced 3D Gaussians for Thermal Infrared Novel-view Synthesis

FIReStereo: Forest InfraRed Stereo Dataset for UAS Depth Perception in Visually Degraded Environments

Real-time Multi-view Omnidirectional Depth Estimation System for Robots and Autonomous Driving on Real Scenes

Depth on Demand: Streaming Dense Depth from a Low Frame Rate Active Sensor