The field of autonomous systems is witnessing significant advances in multimodal 3D object detection, driven by the integration of sensors such as LiDAR, cameras, and 4D radar. Researchers are exploring innovative ways to fuse data from these sensors to improve detection accuracy and robustness. A key challenge is the domain gap between modalities, which is being tackled through new fusion strategies and architectures. Another area of focus is the efficient use of large-scale pre-training and foundation models to enhance multimodal fusion, and prompting techniques and latent fusion models are also gaining traction. Notably, some papers propose frameworks for predictive object detection, which aims to predict the short-term future location and dimensions of objects from current observations alone. Overall, the field is moving toward more accurate, efficient, and robust 3D object detection systems; a minimal illustrative sketch of these fusion and prediction ideas follows below.

Noteworthy papers include LiDAR-based Object Detection with Real-time Voice Specifications, which achieved 87.0% validation accuracy on a 3000-sample subset, and ZFusion, a 3D object detection method that fuses the 4D radar and vision modalities and achieves state-of-the-art mAP on the VoD dataset. Additionally, PF3Det achieved state-of-the-art results under limited training data, improving NDS by 1.19% and mAP by 2.42% on the nuScenes dataset.
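To make the fusion and predictive-detection ideas concrete, the sketch below is a minimal, hypothetical example rather than the architecture of any paper mentioned above. It assumes LiDAR and camera features have already been projected into a shared bird's-eye-view (BEV) grid, fuses them by simple concatenation and convolution, and adds an extra regression head for short-horizon future boxes predicted from current observations only; all module names, channel sizes, and the 7-parameter box encoding are assumptions for illustration.

```python
# Minimal illustrative sketch (hypothetical, not any cited paper's method).
# Assumes LiDAR and camera features are already aligned on a shared BEV grid.
import torch
import torch.nn as nn


class BEVFusionDetector(nn.Module):
    def __init__(self, lidar_channels=64, camera_channels=64,
                 fused_channels=128, num_classes=10, box_dim=7):
        super().__init__()
        # Concatenation-based fusion of the two BEV feature maps.
        self.fuse = nn.Sequential(
            nn.Conv2d(lidar_channels + camera_channels, fused_channels, 3, padding=1),
            nn.BatchNorm2d(fused_channels),
            nn.ReLU(inplace=True),
        )
        # Per-cell classification and box regression for the current frame.
        self.cls_head = nn.Conv2d(fused_channels, num_classes, 1)
        self.box_head = nn.Conv2d(fused_channels, box_dim, 1)
        # Extra head regressing short-term future boxes (predictive detection).
        self.future_box_head = nn.Conv2d(fused_channels, box_dim, 1)

    def forward(self, lidar_bev, camera_bev):
        # Both inputs: (batch, channels, H, W) on the same BEV grid.
        fused = self.fuse(torch.cat([lidar_bev, camera_bev], dim=1))
        return {
            "cls_logits": self.cls_head(fused),          # (B, num_classes, H, W)
            "boxes_now": self.box_head(fused),           # (B, box_dim, H, W)
            "boxes_future": self.future_box_head(fused), # predicted from current observations only
        }


if __name__ == "__main__":
    model = BEVFusionDetector()
    lidar_bev = torch.randn(1, 64, 128, 128)
    camera_bev = torch.randn(1, 64, 128, 128)
    out = model(lidar_bev, camera_bev)
    print({k: v.shape for k, v in out.items()})
```

In practice, published methods replace the plain concatenation with attention-based, prompt-guided, or latent fusion modules, and handle the cross-modal domain gap explicitly; this sketch only shows where such components would sit in a detection pipeline.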