Recent work in monocular depth estimation (MDE) and multi-view stereo (MVS) has made depth perception more robust and accurate across a range of environments. Researchers are increasingly addressing the vulnerabilities and limitations of existing models, particularly in dynamic and adversarial scenarios. Studies of adversarial attacks on depth perception, multi-sample refinement techniques, and the activation of self-attention mechanisms in pose regression models are extending what MDE can do. In addition, the integration of physical constraints and novel network architectures, such as those leveraging dual-pixel images and cross-zone feature propagation, is contributing to more efficient and accurate depth completion methods. Together, these developments point towards more resilient and versatile depth estimation systems that can handle a wider range of real-world conditions and applications, from autonomous driving to robotics.