Recent advances in 3D scene reconstruction and monocular geometry estimation are expanding the scope of what current methods can handle. Research increasingly targets models that reconstruct large-scale scenes and dynamic surfaces with both high fidelity and efficiency. Key innovations include novel representations such as 3D Gaussians and VoxSplats, which enable more accurate and scalable reconstruction from sparse, unposed images. These methods often leverage deep learning techniques, including Vision Transformers and self-supervised learning, to improve feature extraction and cross-view alignment. In photometric stereo, physics-free approaches are simplifying surface normal recovery by removing the need for calibrated lighting and sensors.

The field is also shifting toward real-time and online processing, with models that update scene representations continuously as new data is observed. This trend is particularly evident in point-based reconstruction methods that maintain a global point cloud, which preserves view consistency and provides robustness against drift and alignment errors. Overall, the emphasis is on generalizable, efficient, high-quality solutions applicable to a wide range of real-world scenarios, from autonomous driving to augmented reality.
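To make the 3D Gaussian representation mentioned above concrete, here is a minimal sketch of a single Gaussian scene primitive, assuming the parameterization commonly used by Gaussian-splatting-style methods: a mean, a covariance factored into a rotation and per-axis scales, an opacity, and a color. The class and attribute names are illustrative, not taken from any specific paper or library.

```python
import numpy as np

class Gaussian3D:
    """Illustrative 3D Gaussian primitive (hypothetical names, not a real API)."""

    def __init__(self, mean, scales, rotation, opacity, color):
        self.mean = np.asarray(mean, dtype=float)    # center in world space
        S = np.diag(scales)                          # per-axis scale factors
        R = np.asarray(rotation, dtype=float)        # 3x3 rotation matrix
        self.cov = R @ S @ S @ R.T                   # covariance = R S S^T R^T
        self.opacity = float(opacity)                # blending weight in [0, 1]
        self.color = np.asarray(color, dtype=float)  # RGB

    def density(self, x):
        """Unnormalized Gaussian falloff at point x, weighted by opacity."""
        d = np.asarray(x, dtype=float) - self.mean
        return self.opacity * np.exp(-0.5 * d @ np.linalg.inv(self.cov) @ d)

g = Gaussian3D(mean=[0, 0, 0], scales=[1.0, 1.0, 1.0],
               rotation=np.eye(3), opacity=0.8, color=[1, 0, 0])
print(g.density([0, 0, 0]))  # peak weighted density at the center: 0.8
```

Factoring the covariance as R S Sᵀ Rᵀ guarantees it stays positive semi-definite during optimization, which is one reason this parameterization is favored over learning a raw covariance matrix directly.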