Advances in Multi-Modal Remote Sensing

Remote sensing is advancing rapidly through multi-modal foundation models that integrate and process different types of data, such as optical, synthetic aperture radar (SAR), and multi-spectral imagery. These models have shown strong performance across a range of tasks, including image interpretation, object detection, and change detection. Noteworthy papers in this area include RingMoE, which introduces a unified multi-modal remote sensing foundation model with 14.7 billion parameters, and REJEPA, which presents a joint-embedding predictive architecture for efficient remote sensing image retrieval. Also notable are SARLANG-1M, a large-scale benchmark for vision-language modeling in SAR image understanding, and RS-RAG, a retrieval-augmented generation framework that incorporates external knowledge into remote sensing vision-language tasks.

Sources

RingMoE: Mixture-of-Modality-Experts Multi-Modal Foundation Models for Universal Remote Sensing Image Interpretation

MIMRS: A Survey on Masked Image Modeling in Remote Sensing

REJEPA: A Novel Joint-Embedding Predictive Architecture for Efficient Remote Sensing Image Retrieval

SARLANG-1M: A Benchmark for Vision-Language Modeling in SAR Image Understanding

Semi-Supervised Learning for Marine Anomaly Detection on Board Satellites

Bottom-Up Scattering Information Perception Network for SAR Target Recognition

RS-RAG: Bridging Remote Sensing Imagery and Comprehensive Knowledge with a Multi-Modal Dataset and Retrieval-Augmented Generation Model

LDGNet: A Lightweight Difference Guiding Network for Remote Sensing Change Detection

iEBAKER: Improved Remote Sensing Image-Text Retrieval Framework via Eliminate Before Align and Keyword Explicit Reasoning

A Self-Supervised Framework for Space Object Behaviour Characterisation

Efficient Self-Supervised Learning for Earth Observation via Dynamic Dataset Curation