Advances in Low-Resource Language Processing

The field of natural language processing is moving toward more efficient and scalable approaches for low-resource languages. Recent studies show that text-only training can be effective for visual language models, and that incorporating morphosyntactic features can improve dependency parsing accuracy. There is also growing interest in cross-lingual transfer learning, language modeling, and machine translation for low-resource languages. Noteworthy papers include 'When Words Outperform Vision', which shows that visual language models can self-improve through text-only training, and 'COMI-LINGUA', which introduces a large-scale expert-annotated dataset for multitask NLP in Hindi-English code-mixing. These advances have the potential to improve the performance of NLP systems on low-resource languages and to enable more effective communication across them.

Sources

When Words Outperform Vision: VLMs Can Self-Improve Via Text-Only Training For Human-Centered Decision Making

Language-specific Neurons Do Not Facilitate Cross-Lingual Transfer

Investigating Recent Large Language Models for Vietnamese Machine Reading Comprehension

PAD: Towards Efficient Data Generation for Transfer Learning Using Phrase Alignment

Words as Bridges: Exploring Computational Support for Cross-Disciplinary Translation Work

Dense Retrieval for Low Resource Languages -- the Case of Amharic Language

LANGALIGN: Enhancing Non-English Language Models via Cross-Lingual Embedding Alignment

Towards Terminology Management Automation for Arabic

Enhancing Small Language Models for Cross-Lingual Generalized Zero-Shot Classification with Soft Prompt Tuning

Untangling the Influence of Typology, Data and Model Architecture on Ranking Transfer Languages for Cross-Lingual POS Tagging

Low-resource Machine Translation for Code-switched Kazakh-Russian Language Pair

Both Direct and Indirect Evidence Contribute to Dative Alternation Preferences in Language Models

Enhancing Korean Dependency Parsing with Morphosyntactic Features

Low-Resource Transliteration for Roman-Urdu and Urdu Using Transformer-Based Models

COMI-LINGUA: Expert Annotated Large-Scale Dataset for Multitask NLP in Hindi-English Code-Mixing

Non-Monotonic Attention-based Read/Write Policy Learning for Simultaneous Translation

Improving Low-Resource Retrieval Effectiveness using Zero-Shot Linguistic Similarity Transfer

Breaking Language Barriers in Visual Language Models via Multilingual Textual Regularization

Beyond Vanilla Fine-Tuning: Leveraging Multistage, Multilingual, and Domain-Specific Methods for Low-Resource Machine Translation
