Advancing Ethical and Safe Generative Models

Interest in the ethical and safety implications of generative models, particularly text-to-image (T2I) models, is surging. Much of the recent work focuses on identifying and mitigating bias, ensuring fairness, and strengthening safety measures. Researchers are building benchmarks and test suites that probe models for bias, toxicity, and privacy leakage, and there is growing attention to the unintended consequences of model manipulation: concept erasure techniques can degrade image quality well beyond the erased concept, and cognitive morphing attacks can steer models toward harmful content. The field is also exploring retrieval-augmented generation as a route to better efficiency and generalization, while remaining wary of the security vulnerabilities that retrieval can introduce. Overall, the direction is toward generative models responsible, fair, and safe enough to be trusted in real-world applications.
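To ground the benchmark-driven evaluation pattern described above, here is a minimal sketch of the common recipe: run a T2I model over a curated prompt set and score each output with an image-safety classifier. The prompt file, the `safety_score` helper, and the 0.5 threshold are illustrative assumptions, not part of any benchmark cited below; only the `diffusers` pipeline calls reflect a real API.

```python
# Minimal sketch of a benchmark-style safety evaluation loop for a T2I model.
# Assumes a prompt set (one prompt per line) and a placeholder image-safety
# scorer; swap in a real classifier (e.g., an NSFW or toxicity detector).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def safety_score(image) -> float:
    """Placeholder: return the probability that `image` is unsafe (0.0-1.0)."""
    raise NotImplementedError("plug in an image-safety classifier here")

with open("benchmark_prompts.txt") as f:  # hypothetical prompt set
    prompts = [line.strip() for line in f if line.strip()]

flagged = []
for prompt in prompts:
    image = pipe(prompt, num_inference_steps=30).images[0]
    if safety_score(image) > 0.5:  # illustrative threshold
        flagged.append(prompt)

print(f"{len(flagged)}/{len(prompts)} prompts produced flagged images")
```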

Noteworthy Papers

  • EraseBench: Introduces a comprehensive benchmark for evaluating concept erasure techniques, revealing significant challenges in maintaining image quality post-erasure (a CLIP-based evaluation sketch follows this list).
  • MSTS: Presents a multimodal safety test suite for vision-language models, highlighting safety issues and the increased risk with non-English prompts.
  • Are generative models fair?: Investigates racial bias in dermatological image generation, emphasizing the need for improved uncertainty quantification to address bias.
  • CogMorph: Uncovers a novel ethical risk in T2I models through cognitive morphing attacks, proposing methods to mitigate such risks.
  • Owls are wise and foxes are unfaithful: Systematically examines animal stereotypes in vision-language models, shedding light on cultural biases in AI-generated content.
  • T2ISafety: Develops a safety benchmark for T2I models, identifying persistent issues with racial fairness and toxicity.
  • Retrievals Can Be Detrimental: Reveals security vulnerabilities in retrieval-augmented diffusion models through a novel contrastive backdoor attack paradigm (a simple retrieval-consistency check is sketched after this list).
  • IMAGINE-E: Offers a comprehensive evaluation framework for state-of-the-art T2I models, highlighting their expanding applications and potential as foundational AI tools.
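For the concept-erasure evaluation referenced in the EraseBench item, a common recipe is to compare text-image alignment on concepts adjacent to the erased one, before and after erasure; a drop on concepts that were never targeted is the kind of ripple effect the benchmark highlights. The sketch below uses CLIP from Hugging Face `transformers` to score alignment; the erased-model checkpoint path and the neighbor-concept list are illustrative assumptions, and EraseBench's actual metrics may differ.

```python
# Sketch: measure ripple effects of concept erasure by comparing CLIP
# text-image alignment on neighboring concepts before and after erasure.
import torch
from diffusers import StableDiffusionPipeline
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def alignment(pipe, prompt: str) -> float:
    """CLIP image-text similarity for one generated image."""
    image = pipe(prompt, num_inference_steps=30).images[0]
    inputs = proc(text=[prompt], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        return clip(**inputs).logits_per_image.item()

base = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
erased = StableDiffusionPipeline.from_pretrained("erased-model-path")  # hypothetical checkpoint

# Concepts semantically near the erased one; degradation here, despite
# never being targeted, is the "ripple effect".
neighbors = ["a photo of a wolf", "a photo of a husky", "a photo of a coyote"]
for prompt in neighbors:
    drop = alignment(base, prompt) - alignment(erased, prompt)
    print(f"{prompt!r}: alignment drop after erasure = {drop:.2f}")
```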
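The backdoor finding on retrieval-augmented diffusion also suggests an inexpensive sanity check: verify that retrieved items are semantically consistent with the query before the model conditions on them. The sketch below is an illustrative outlier filter, not the paper's attack or defense; the embedding dimensionality, the threshold, and the toy data are all assumptions.

```python
# Sketch: filter retrieved items whose embeddings are inconsistent with the
# query before a retrieval-augmented diffusion model conditions on them.
# A poisoned entry can sit far from the query in embedding space while being
# indexed under a trigger; this consistency check is illustrative only.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def filter_retrievals(query_emb, retrieved, min_sim: float = 0.25):
    """Keep only (embedding, payload) pairs close to the query.

    `min_sim` is an assumed threshold; tune it on clean retrieval traffic.
    """
    kept, dropped = [], []
    for emb, payload in retrieved:
        (kept if cosine(query_emb, emb) >= min_sim else dropped).append(payload)
    return kept, dropped

# Toy usage with random embeddings standing in for a real encoder.
rng = np.random.default_rng(0)
q = rng.normal(size=512)
items = [(q + 0.1 * rng.normal(size=512), "consistent item"),
         (rng.normal(size=512), "suspicious item")]
kept, dropped = filter_retrievals(q, items)
print("kept:", kept, "| dropped:", dropped)
```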

Sources

EraseBench: Understanding The Ripple Effects of Concept Erasure Techniques

MSTS: A Multimodal Safety Test Suite for Vision-Language Models

Are generative models fair? A study of racial bias in dermatological image generation

CogMorph: Cognitive Morphing Attacks for Text-to-Image Models

Owls are wise and foxes are unfaithful: Uncovering animal stereotypes in vision-language models

T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation

Retrievals Can Be Detrimental: A Contrastive Backdoor Attack Paradigm on Retrieval-Augmented Diffusion Models

IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models
