Enhancing Multilingual and Endangered Language Processing with LLMs

Recent research on multilingual and endangered language processing with Large Language Models (LLMs) has made notable progress in improving model performance across diverse languages. Researchers are increasingly focusing on methods that make LLMs more adaptable and effective for languages other than English, including languages written in non-Latin scripts. Techniques such as dictionary insertion prompting, adaptive mixtures of contextualization experts for byte-based machine translation, and prompting with phonemic transcriptions are being explored to narrow the performance gap between English and other languages.

There is also a growing emphasis on building resources and models for endangered languages, with the aim of preserving and revitalizing them through NLP techniques. Other work addresses cross-lingual alignment for multilingual information extraction and the construction of Universal Dependencies treebanks for under-researched languages, such as the first treebank for Luxembourgish. These advances extend the capabilities of LLMs and also support the broader goal of preserving linguistic diversity. Particularly innovative and promising for future research are dictionary-augmented generation for disambiguating sentences in endangered Uralic languages and transformer-based models for predicting the inflection classes of words in an endangered Sami language.
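
The summary names dictionary insertion prompting without describing how it works. As a rough illustration only, the following Python sketch shows one assumed form of the idea: each word of a non-English prompt that has an entry in a bilingual dictionary is followed by its English gloss in parentheses, and the model is then asked to translate the glossed question into English before reasoning. The toy German dictionary, the parenthesis format, the instruction template, and the function names `insert_dictionary_glosses` and `build_prompt` are all hypothetical and are not taken from the cited paper.

```python
import re
from typing import Dict

# Hypothetical toy German-to-English dictionary (illustration only); a real
# system would use a full bilingual lexicon for the source language.
TOY_DICTIONARY: Dict[str, str] = {
    "hat": "has",
    "drei": "three",
    "Äpfel": "apples",
    "isst": "eats",
    "einen": "one",
}


def insert_dictionary_glosses(text: str, dictionary: Dict[str, str]) -> str:
    """Append an English gloss in parentheses after each word found in the dictionary.

    Assumed format: 'word (gloss)'. Words without an entry are left unchanged,
    and punctuation is preserved because only word tokens are rewritten.
    """
    def gloss(match: re.Match) -> str:
        word = match.group(0)
        translation = dictionary.get(word)
        return f"{word} ({translation})" if translation else word

    # For str patterns, \w matches Unicode letters, so "Äpfel" is one token.
    return re.sub(r"\w+", gloss, text)


def build_prompt(question: str, dictionary: Dict[str, str]) -> str:
    """Wrap the glossed question in a hypothetical instruction template."""
    glossed = insert_dictionary_glosses(question, dictionary)
    return (
        "Translate the following question into English using the glosses "
        "in parentheses, then solve it step by step.\n"
        f"Question: {glossed}"
    )


if __name__ == "__main__":
    print(build_prompt("Anna hat drei Äpfel und isst einen.", TOY_DICTIONARY))
```

Running the script prints a prompt in which, for example, `hat` becomes `hat (has)` while words without an entry, such as `Anna` and `und`, are left as they are. The sketch matches surface forms exactly; a practical system would likely also need lemmatization or tokenization suited to the source language, which is omitted here.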

Sources

Dictionary Insertion Prompting for Multilingual Reasoning on Multilingual Large Language Models

MoCE: Adaptive Mixture of Contextualization Experts for Byte-based Neural Machine Translation

DAG: Dictionary-Augmented Generation for Disambiguation of Sentences in Endangered Uralic Languages using ChatGPT

Prompting with Phonemes: Enhancing LLM Multilinguality for non-Latin Script Languages

Leveraging Transformer-Based Models for Predicting Inflection Classes of Words in an Endangered Sami Language

Tomato, Tomahto, Tomate: Measuring the Role of Shared Semantics among Subwords in Multilingual Language Models

AlignXIE: Improving Multilingual Information Extraction by Cross-Lingual Alignment

LuxBank: The First Universal Dependency Treebank for Luxembourgish

High Entropy Alloy property predictions using Transformer-based language model
