Recent advances in radar perception and multi-modal sensor fusion for 3D object detection and transparent surface reconstruction show significant promise. In radar perception, there is a notable shift toward exploiting temporal relations and motion consistency to improve object detection and tracking, addressing radar's inherent low spatial resolution and motion blur. This approach not only scales well but also yields substantial gains on metrics such as mean Average Precision (mAP) and Multiple Object Tracking Accuracy (MOTA). In multi-modal sensor fusion, integrating temporal information with radar and camera data has proven effective at capturing dynamic object motion, leading to more robust and accurate 3D object detection; fusion guided by motion features in particular has achieved state-of-the-art results on challenging benchmarks. Additionally, fusing visual and acoustic modalities for transparent surface reconstruction in indoor environments opens new avenues for low-cost, high-precision sensing, enabling more reliable navigation in complex scenes. Together, these developments point to a trend toward integrated, motion-aware, multi-modal approaches that improve the reliability and accuracy of perception systems across applications.
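To make the motion-guided temporal fusion idea concrete, the following is a minimal illustrative sketch, not the implementation of any surveyed method: per-frame radar and camera feature maps are aggregated over time with weights derived from motion features, then concatenated across modalities. All function names, tensor shapes, and the softmax weighting scheme are assumptions chosen for illustration.

```python
import numpy as np

def motion_guided_fusion(radar_feats, camera_feats, motion_feats):
    """Fuse per-frame radar and camera feature maps over time, weighting
    each frame by a motion score derived from motion features.

    radar_feats, camera_feats, motion_feats: arrays of shape (T, C, H, W),
    one feature map per frame. Returns a fused (2C, H, W) feature map.
    """
    # One scalar motion score per frame: mean motion-feature magnitude.
    scores = np.abs(motion_feats).mean(axis=(1, 2, 3))          # shape (T,)
    # Softmax over frames (max-subtracted for numerical stability),
    # so frames with stronger motion evidence contribute more.
    exp = np.exp(scores - scores.max())
    weights = exp / exp.sum()                                   # shape (T,)
    # Weighted temporal aggregation of each modality.
    radar_agg = np.tensordot(weights, radar_feats, axes=1)      # (C, H, W)
    camera_agg = np.tensordot(weights, camera_feats, axes=1)    # (C, H, W)
    # Channel-wise concatenation fuses the two modalities.
    return np.concatenate([radar_agg, camera_agg], axis=0)      # (2C, H, W)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, C, H, W = 4, 8, 16, 16
    fused = motion_guided_fusion(
        rng.standard_normal((T, C, H, W)),
        rng.standard_normal((T, C, H, W)),
        rng.standard_normal((T, C, H, W)),
    )
    print(fused.shape)
```

Real systems replace the scalar per-frame score with learned spatial attention, but the structure (temporal weighting followed by cross-modal concatenation) follows the same pattern.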