Recent publications in linguistics and language technology focus strongly on identifying and mitigating bias in language models and datasets, particularly gender bias. A significant trend is the development and application of large language models (LLMs) to identify, analyze, and correct gender bias in varied linguistic contexts, including speech translation systems, medical literature, and multilingual resources. These efforts aim to make language technologies more inclusive and equitable by replacing gendered language with gender-neutral alternatives. There is also growing interest in surveying and expanding linguistic resources for languages whose spoken-language data remain scarce or unrepresentative, such as Italian, to support advances in speech technology and linguistic research. Overall, the field is moving towards a more inclusive and comprehensive understanding of language, with particular emphasis on using technology to address socio-cultural biases and make linguistic datasets more representative.
Noteworthy Papers
- Women, Infamous, and Exotic Beings: What Honorific Usages in Wikipedia Reveal about the Socio-Cultural Norms: Provides a cross-linguistic analysis of honorific usage in Bengali and Hindi Wikipedia, showing how socio-cultural norms shape the choice of honorifics and revealing gender bias in that usage.
- Addressing speaker gender bias in large scale speech translation systems: Introduces a novel approach to mitigate gender bias in speech translation, significantly improving translation accuracy for female speakers through LLM-based corrections and model fine-tuning.
- Gender-Neutral Large Language Models for Medical Applications: Reducing Bias in PubMed Abstracts: Presents MOBERT, a BERT-based model designed to neutralize gendered occupational pronouns in medical literature, achieving a high rate of inclusive pronoun replacement.
- A Survey on Spoken Italian Datasets and Corpora: Offers a comprehensive review of spoken Italian datasets, addressing the scarcity and representativeness of resources for advancing Italian speech technologies and linguistic research.
- mGeNTE: A Multilingual Resource for Gender-Neutral Language and Translation: Expands the GeNTE dataset to include multiple language pairs, facilitating research in gender-neutral translation and language modeling for languages with grammatical gender.