Recent advances in cross-modality object detection have substantially improved detection in challenging environments, particularly under poor illumination. Researchers increasingly leverage the complementary strengths of visible and infrared images to improve detection accuracy. One notable trend is the development of dual-enhancement mechanisms that fuse the two modalities effectively while reducing mutual interference, thereby strengthening feature representations. Another is frequency-driven feature decomposition, which exploits the distinct frequency characteristics of each modality to improve detection performance. These innovations are producing state-of-the-art results on multiple benchmarks, demonstrating the potential of such methods in real-world applications.
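To make the frequency-decomposition idea concrete, here is a minimal, illustrative sketch (not the actual networks from the surveyed papers): each modality's feature map is split into a low-frequency component (global context) and a high-frequency residual (fine detail) via FFT masking, after which the low-frequency parts are shared across modalities while modality-specific detail is re-injected. The cutoff value and the averaging-based fusion are arbitrary choices for illustration.

```python
import numpy as np

def decompose(feat, cutoff=0.25):
    """Split a 2-D feature map into low- and high-frequency parts via FFT masking."""
    spectrum = np.fft.fftshift(np.fft.fft2(feat))
    h, w = feat.shape
    yy, xx = np.ogrid[:h, :w]
    # circular low-pass mask centred on the spectrum
    mask = ((yy - h / 2) ** 2 + (xx - w / 2) ** 2) <= (cutoff * min(h, w)) ** 2
    low = np.real(np.fft.ifft2(np.fft.ifftshift(spectrum * mask)))
    return low, feat - low  # high-frequency part is the residual

def fuse(rgb_feat, ir_feat):
    """Toy cross-modality fusion: share low-frequency context, keep per-modality detail."""
    rgb_low, rgb_high = decompose(rgb_feat)
    ir_low, ir_high = decompose(ir_feat)
    shared_low = 0.5 * (rgb_low + ir_low)   # complementary global context
    return shared_low + rgb_high + ir_high  # re-inject fine detail from both modalities
```

By construction the two components sum back to the original map, so the decomposition is lossless; real networks would instead learn how to reweight and recombine these bands.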
Noteworthy papers include one introducing a dual-enhancement cross-modality object detection network that significantly outperforms existing algorithms, and another proposing a frequency-driven feature decomposition network that exploits modality-specific frequency representations to enhance multimodal features.