Inclusive and Robust Language Models for Low-Resource Languages

Recent research in low-resource machine translation and in adversarial attacks on language models shows significant progress on two fronts. On the translation side, there is a growing focus on underrepresented languages, particularly Austroasiatic languages such as Santali and the languages of the Indian subcontinent. These models lean on transfer learning from multilingual pretrained transformers and on data augmentation to overcome the scarcity of parallel data; a minimal fine-tuning sketch follows below.

On the robustness side, adversarial attack methods are being developed specifically for minority-language models. These methods evaluate, and ultimately help improve, a model's resilience to subtle input perturbations that can flip its predictions. For Tibetan in particular, integrating visual similarity between syllables into adversarial text generation is a novel approach (also sketched below) that underscores the need for script-aware, context-specific solutions. Overall, the field is progressing toward more inclusive and robust language models, with a strong emphasis on the distinctive challenges of low-resource and minority languages.
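As one concrete illustration of the transfer-learning-plus-augmentation recipe, here is a minimal sketch of fine-tuning mT5 for Santali-to-English translation with Hugging Face Transformers. The `google/mt5-small` checkpoint, the task prefix, the hyperparameters, and the placeholder sentence pairs are illustrative assumptions, not the cited paper's exact setup; in practice `pairs` would hold a real parallel corpus expanded by augmentation such as back-translation.

```python
# Hedged sketch: fine-tune a multilingual pretrained model (transfer learning)
# on a tiny Santali-English parallel set. Checkpoint, task prefix, learning
# rate, and the placeholder data below are assumptions for illustration.
import torch
from transformers import AutoTokenizer, MT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

# Placeholder parallel data; augmentation (e.g., back-translation) would
# expand this list with synthetic source-target pairs.
pairs = [
    ("<Santali sentence 1>", "<English sentence 1>"),
    ("<Santali sentence 2>", "<English sentence 2>"),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
model.train()
for src, tgt in pairs:
    inputs = tokenizer("translate Santali to English: " + src,
                       return_tensors="pt")
    labels = tokenizer(tgt, return_tensors="pt").input_ids
    loss = model(**inputs, labels=labels).loss  # seq2seq cross-entropy
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

model.eval()
test = tokenizer("translate Santali to English: <Santali sentence 1>",
                 return_tensors="pt")
out = model.generate(**test, max_new_tokens=40)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
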
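For the robustness thread, here is a minimal, language-agnostic sketch of a visual-similarity attack loop in the spirit of TSCheater: swap a fraction of characters for look-alike glyphs and check whether a victim classifier changes its prediction. The Latin-to-Cyrillic homoglyph table, the `perturb` and `attack` helpers, and the toy victim are hypothetical stand-ins; a real Tibetan attack would use syllable-level glyph similarity and query an actual model.

```python
# Hedged sketch of a homoglyph-substitution adversarial attack. The glyph
# table and victim below are toy stand-ins, not the cited method.
import random

HOMOGLYPHS = {
    "a": ["а"],  # Latin a -> Cyrillic a
    "e": ["е"],  # Latin e -> Cyrillic e
    "o": ["о"],  # Latin o -> Cyrillic o
}

def perturb(text: str, rate: float = 0.3, seed: int = 0) -> str:
    """Replace a fraction of substitutable characters with look-alike glyphs."""
    rng = random.Random(seed)
    chars = list(text)
    idxs = [i for i, c in enumerate(chars) if c in HOMOGLYPHS]
    rng.shuffle(idxs)
    for i in idxs[: max(1, int(rate * len(idxs)))]:
        chars[i] = rng.choice(HOMOGLYPHS[chars[i]])
    return "".join(chars)

def attack(text, victim, max_tries=20):
    """Search over random perturbations until the victim's label changes."""
    original = victim(text)
    for seed in range(max_tries):
        adv = perturb(text, seed=seed)
        if victim(adv) != original:
            return adv  # adversarial example found
    return None  # model stayed robust to all tried perturbations

# Toy victim: "classifies" a string by whether it is pure ASCII.
toy_victim = lambda s: "clean" if all(ord(c) < 128 for c in s) else "odd"
print(attack("adversarial example", toy_victim))
```

A real evaluation would additionally constrain the perturbation budget so the adversarial text stays visually close to the original.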

Sources

Towards Santali Linguistic Inclusion: Building the First Santali-to-English Translation Model using mT5 Transformer and Data Augmentation

N\"ushuRescue: Revitalization of the endangered N\"ushu Language with AI

From Priest to Doctor: Domain Adaptation for Low-Resource Neural Machine Translation

Multi-Granularity Tibetan Textual Adversarial Attack Method Based on Masked Language Model

Pay Attention to the Robustness of Chinese Minority Language Models! Syllable-level Textual Adversarial Attack on Tibetan Script

TSCheater: Generating High-Quality Tibetan Adversarial Texts via Visual Similarity

BhashaVerse: Translation Ecosystem for Indian Subcontinent Languages
