Recent work in multi-modal learning and cross-domain adaptation has made substantial progress in improving how models generalize to unseen data. A common thread is the reuse of pre-trained models for dynamic, parameter-efficient adaptation across modalities and languages. Key innovations include lightweight adapters for flexible prompt tuning, dynamic prompt generation, and semantic disentangling for cross-lingual and cross-modal retrieval. Decoupling language bias from visual and layout features has likewise proven effective for multilingual visual information extraction. Together, these techniques improve zero-shot and cross-domain performance while reducing computational cost and dependence on large-scale labeled data. Methods such as IT3A (for test-time adaptation) and UCDR-Adapter (for universal cross-domain retrieval) illustrate the reach of this adapter-and-prompt paradigm; both ideas are sketched below.
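To make the adapter-for-prompt-tuning idea concrete, here is a minimal sketch of the common bottleneck-adapter pattern applied to learnable prompt tokens. The class name `PromptAdapter`, the dimensions, and the conditioning scheme are illustrative assumptions, not the internals of IT3A or UCDR-Adapter.

```python
import torch
import torch.nn as nn


class PromptAdapter(nn.Module):
    """Hypothetical bottleneck adapter that refines learnable prompt tokens.

    Down-project -> non-linearity -> up-project with a residual connection:
    the standard adapter pattern. Only these parameters are trained; the
    backbone encoder stays frozen.
    """

    def __init__(self, dim: int = 512, bottleneck: int = 64, n_prompts: int = 8):
        super().__init__()
        # Shared learnable prompt tokens, small-variance init.
        self.prompts = nn.Parameter(torch.randn(n_prompts, dim) * 0.02)
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, context: torch.Tensor) -> torch.Tensor:
        # context: (batch, dim) pooled features from the frozen encoder.
        # Condition the shared prompts on the input so prompts become
        # instance-specific ("dynamic" prompt generation).
        p = self.prompts.unsqueeze(0) + context.unsqueeze(1)  # (batch, n_prompts, dim)
        # Residual bottleneck keeps the tunable parameter count small.
        return p + self.up(self.act(self.down(p)))
```

In use, the returned prompt tokens would be prepended to the frozen encoder's token sequence; because only the adapter and prompt parameters receive gradients, adaptation stays cheap relative to full fine-tuning.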
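For the test-time adaptation side, the sketch below shows a generic entropy-minimization loop over augmented views of a single test sample, a widely used TTA recipe. It conveys the flavor of prompt-based test-time adaptation but is an assumption-laden stand-in, not the IT3A algorithm itself; `test_time_adapt` and its arguments are hypothetical names.

```python
import torch
import torch.nn.functional as F


def test_time_adapt(model, views, optimizer, steps: int = 1):
    """Generic TTA sketch: minimize the entropy of the prediction
    averaged over augmented views of one test sample.

    model:     classifier whose adaptable parameters (e.g. prompts or
               adapters) are registered with `optimizer`
    views:     list of augmented tensors for the same test input
    optimizer: optimizer over only the adaptable parameters
    """
    model.train()
    for _ in range(steps):
        # Average the predictive distribution across views.
        probs = torch.stack(
            [F.softmax(model(v), dim=-1) for v in views]
        ).mean(dim=0)
        # Entropy of the averaged prediction; lower entropy = more
        # confident, view-consistent predictions on the shifted input.
        entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1).mean()
        optimizer.zero_grad()
        entropy.backward()
        optimizer.step()
    model.eval()
```

Restricting the optimizer to prompt or adapter parameters is what keeps this adaptation safe and inexpensive: the frozen backbone cannot drift, so a few gradient steps per test sample suffice.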