Advances in Multi-Modal Remote Sensing
Remote sensing research is moving toward multi-modal foundation models that integrate and process several data types, such as optical, synthetic aperture radar (SAR), and multi-spectral imagery, within a single model. These models report strong results on a range of remote sensing tasks, including image interpretation, object detection, and change detection. Noteworthy papers in this area include RingMoE, which introduces a unified multi-modal remote sensing foundation model with 14.7 billion parameters, and REJEPA, which presents a joint-embedding predictive architecture for efficient remote sensing image retrieval. Other notable papers include SARLANG-1M, which introduces a large-scale benchmark for multimodal SAR image understanding, and RS-RAG, which proposes a retrieval-augmented generation framework that incorporates external knowledge into remote sensing vision-language tasks.
Sources
RingMoE: Mixture-of-Modality-Experts Multi-Modal Foundation Models for Universal Remote Sensing Image Interpretation
REJEPA: A Novel Joint-Embedding Predictive Architecture for Efficient Remote Sensing Image Retrieval
RS-RAG: Bridging Remote Sensing Imagery and Comprehensive Knowledge with a Multi-Modal Dataset and Retrieval-Augmented Generation Model