The field of computer vision is advancing rapidly in monocular 3D estimation and camera calibration, with recent research producing more generalizable and accurate methods for recovering 3D scenes from single images. Such methods benefit applications ranging from robotics to virtual and augmented reality. Notably, spherical 3D representations built from learned superpositions of spherical harmonics enable a cleaner disentanglement of camera geometry from scene geometry, while parallel advances in camera calibration yield more faithful models of camera distortion. Overall, the field is moving toward more robust and flexible pipelines for 3D estimation and camera calibration.

Noteworthy papers in this area include UniK3D, which introduces a monocular 3D estimation method capable of modeling any camera, and AlignDiff, which proposes a diffusion model for physically grounded camera alignment. Both report clear improvements over the prior state of the art and are likely to influence a broad range of downstream applications.
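To make the spherical-representation idea concrete, the toy sketch below (not taken from UniK3D or any published implementation; all function names, coefficients, and the angle parameterization are illustrative assumptions) expresses per-pixel camera rays as a learned superposition of low-order spherical harmonics and then lifts a radial depth map into a 3D point cloud, so the camera model (ray directions) and the scene geometry (radial distances) stay disentangled.

```python
# Toy sketch: per-pixel rays as a superposition of spherical harmonics,
# combined with a radial depth map to form 3D points. Illustrative only.
import numpy as np

def sh_basis(theta, phi):
    """Real spherical harmonics up to degree 1, evaluated per pixel.
    theta: polar angle from the optical axis, phi: azimuth angle."""
    y00 = 0.5 * np.sqrt(1.0 / np.pi) * np.ones_like(theta)
    y1m1 = np.sqrt(3.0 / (4.0 * np.pi)) * np.sin(theta) * np.sin(phi)
    y10 = np.sqrt(3.0 / (4.0 * np.pi)) * np.cos(theta)
    y1p1 = np.sqrt(3.0 / (4.0 * np.pi)) * np.sin(theta) * np.cos(phi)
    return np.stack([y00, y1m1, y10, y1p1], axis=-1)  # (H, W, 4)

H, W = 48, 64
v, u = np.meshgrid(np.linspace(-1, 1, H), np.linspace(-1, 1, W), indexing="ij")
phi = np.arctan2(v, u)                                     # azimuth per pixel
theta = 0.5 * np.pi * np.sqrt(u**2 + v**2) / np.sqrt(2.0)  # crude polar angle

# In a real model these coefficients would be predicted by a network for each
# image; here they are fixed to mimic a roughly pinhole-like ray bundle.
coeff_theta = np.array([0.8, 0.0, 0.2, 0.0])  # modulates the polar angle
coeff_phi = np.array([1.0, 0.0, 0.0, 0.0])    # leaves the azimuth unchanged

B = sh_basis(theta, phi)                                   # (H, W, 4)
theta_cam = np.clip((B @ coeff_theta) * theta, 0.0, np.pi / 2)
phi_cam = (B @ coeff_phi) * phi

# Unit ray directions: the "camera" half of the representation.
rays = np.stack([np.sin(theta_cam) * np.cos(phi_cam),
                 np.sin(theta_cam) * np.sin(phi_cam),
                 np.cos(theta_cam)], axis=-1)              # (H, W, 3)

# Radial depth: the "scene" half; multiplying yields metric 3D points.
radial_depth = np.full((H, W), 2.5)
points_3d = rays * radial_depth[..., None]
print(points_3d.shape)  # (48, 64, 3)
```

The design point this sketch illustrates is that swapping the camera (e.g. pinhole versus fisheye) only changes the harmonic coefficients that shape the ray bundle, while the radial depth map describing the scene is untouched.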