Bias Detection and Mitigation in Large Language Models

Recent advances in bias detection and mitigation for Large Language Models (LLMs) show a clear shift toward more nuanced, context-specific approaches. Researchers are increasingly focused on identifying and addressing distinct forms of bias, including generalizations, unfairness, and stereotypes, which is critical for the ethical use of AI across diverse applications. Methodologies such as generative AI and personalized prompts are being used to build synthetic datasets and broaden the diversity of annotations, improving the accuracy and fairness of bias detection models. There is also growing attention to how demographic attributes shape model outputs, with studies examining which demographics LLMs default to and how those defaults introduce bias. In parallel, the field is moving toward robust, data-efficient models that predict individual annotator ratings, capturing disagreement and nuance that traditional label aggregation discards. Together, these developments advance the technical capabilities of bias detection and contribute to the broader goal of more equitable, socially aware AI systems.
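
To make the "personalized prompts" idea concrete, below is a minimal sketch of persona-conditioned prompting for diverse annotation. The personas, the prompt template, and the 1-to-5 rating scale are illustrative assumptions, not the setup used in any of the cited papers; the point is only that conditioning the same LLM on different simulated annotator profiles yields multiple, separately collected judgments per item instead of a single aggregate label.

```python
# Minimal sketch (illustrative assumptions, not a cited paper's setup) of
# persona-conditioned prompts for diverse LLM-based annotation.
from dataclasses import dataclass


@dataclass
class Persona:
    age: int
    occupation: str
    region: str


# Hypothetical annotator personas; real studies would sample these systematically.
PERSONAS = [
    Persona(age=24, occupation="student", region="urban"),
    Persona(age=57, occupation="nurse", region="rural"),
    Persona(age=35, occupation="software engineer", region="suburban"),
]

PROMPT_TEMPLATE = (
    "You are a {age}-year-old {occupation} living in a {region} area.\n"
    "Rate how biased the following statement is on a scale from 1 (not biased) "
    "to 5 (extremely biased). Answer with the number only.\n\n"
    "Statement: {text}"
)


def build_prompts(text: str) -> list[str]:
    """Render one persona-conditioned prompt per simulated annotator."""
    return [
        PROMPT_TEMPLATE.format(
            age=p.age, occupation=p.occupation, region=p.region, text=text
        )
        for p in PERSONAS
    ]


if __name__ == "__main__":
    for prompt in build_prompts("People from that neighborhood never pay on time."):
        # Each prompt would be sent to the model separately; keeping the
        # per-persona ratings (rather than collapsing them) is what preserves
        # annotation diversity and annotator-level disagreement.
        print(prompt, end="\n---\n")
```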

Particularly noteworthy are the paper introducing GUS-Net, which frames social bias detection around generalizations, unfairness, and stereotypes; the study of which demographics LLMs default to during annotation; and the approach that uses persona-driven LLMs to diversify data annotation. These contributions illustrate the strides being made in understanding and mitigating bias in AI.
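
As a rough illustration of the multi-label framing suggested by the GUS-Net title, the sketch below trains one binary classifier per bias type (generalization, unfairness, stereotype) over TF-IDF features. The toy examples, labels, and linear model are assumptions made for illustration; they do not reproduce the paper's architecture or data.

```python
# Minimal multi-label bias-classification sketch (toy data, linear model);
# not the GUS-Net implementation, only the same three label types.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical examples; real work would use an annotated corpus.
texts = [
    "All members of that group are lazy.",
    "She was denied the loan purely because of her accent.",
    "Women are naturally bad at math.",
    "The committee reviewed every application with the same rubric.",
    "Everyone from that city is rude and dishonest.",
    "He was paid less than his colleague for identical work.",
]
labels = [
    {"generalization"},
    {"unfairness"},
    {"generalization", "stereotype"},
    set(),
    {"generalization", "stereotype"},
    {"unfairness"},
]

binarizer = MultiLabelBinarizer(classes=["generalization", "unfairness", "stereotype"])
y = binarizer.fit_transform(labels)

# One independent binary classifier per bias type over TF-IDF features.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(texts, y)

pred = model.predict(["People like them always cause trouble."])
print(dict(zip(binarizer.classes_, pred[0])))
```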

Sources

GUS-Net: Social Bias Classification in Text with Generalizations, Unfairness, and Stereotypes

Which Demographics do LLMs Default to During Annotation?

Assessing Bias in Metric Models for LLM Open-Ended Generation Bias Benchmarks

Gender Bias in Decision-Making with Large Language Models: A Study of Relationship Conflicts

Personas with Attitudes: Controlling LLMs for Diverse Data Annotation

Bias Similarity Across Large Language Models

Accurate and Data-Efficient Toxicity Prediction when Annotators Disagree

With a Grain of SALT: Are LLMs Fair Across Social Dimensions?

Building Better: Avoiding Pitfalls in Developing Language Resources when Data is Scarce

Large Language Models and the Rationalist Empiricist Debate

Mitigating Biases to Embrace Diversity: A Comprehensive Annotation Benchmark for Toxic Language

Bias in the Mirror: Are LLMs opinions robust to their own adversarial attacks?

Aggregation Artifacts in Subjective Tasks Collapse Large Language Models' Posteriors