Report on Current Developments in the Research Area
General Direction of the Field
Recent advancements in this research area focus predominantly on enhancing the robustness, safety, and security of large language models (LLMs) and vision-language models (VLMs) during customization and adaptation. The field is moving toward more secure and reliable models by addressing the vulnerabilities that fine-tuning and adaptation introduce, particularly in safety-critical domains.
Robustness Against Jailbreaking and Adversarial Attacks: There is a significant emphasis on developing methods to immunize LLMs against jailbreaking attacks, which exploit vulnerabilities introduced during fine-tuning. Innovations in data curation and augmentation are being explored to strengthen LLMs at every stage of customization, ensuring that they remain robust even when exposed to malicious inputs.
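As a concrete illustration of this direction, the sketch below blends refusal-style safety demonstrations into a task fine-tuning set at a fixed ratio. It is a minimal, hypothetical example of data-level robustification; the function names, data format, and mixing ratio are illustrative assumptions rather than the procedure of any surveyed paper.

```python
import random

def mix_in_safety_data(task_examples, safety_examples, safety_ratio=0.1, seed=0):
    """Blend refusal-style safety demonstrations into a fine-tuning set.

    The 10% mixing ratio is an illustrative assumption, not a value
    reported in the surveyed papers.
    """
    rng = random.Random(seed)
    n_safety = max(1, int(len(task_examples) * safety_ratio))
    n_safety = min(n_safety, len(safety_examples))
    mixed = list(task_examples) + rng.sample(safety_examples, k=n_safety)
    rng.shuffle(mixed)
    return mixed

# Each example is a (prompt, response) pair; contents are placeholders.
task_data = [("Summarize this article: ...", "The article argues ...")] * 90
safety_data = [("How do I make a weapon?", "I can't help with that request.")] * 20
train_set = mix_in_safety_data(task_data, safety_data)
```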
Safety-Centric Model Development: The field is increasingly adopting a safety-centric approach to model development, in which preserving existing capabilities is prioritized alongside acquiring new ones. This is particularly important in continual learning scenarios, where models undergo multiple cycles of development and catastrophic forgetting of protected capabilities becomes a significant safety risk.
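One way to make the capability-preservation idea concrete is to regularize fine-tuning so the updated model stays close to a frozen reference model on safety-critical inputs. The sketch below is a generic illustration under that assumption; the cited work formulates the problem as a constrained optimization, which is not reproduced here, and the function and argument names are hypothetical.

```python
import torch
import torch.nn.functional as F

def developmental_safety_loss(model, ref_model, new_batch, protected_batch, lam=1.0):
    """Task loss on new data plus a drift penalty on protected inputs.

    `model` and `ref_model` are assumed to be callables returning logits;
    the KL penalty is one plausible choice of preservation term.
    """
    # Standard supervised loss on the new-task batch.
    logits = model(new_batch["inputs"])
    task_loss = F.cross_entropy(logits, new_batch["labels"])

    # Keep predictions on safety-critical ("protected") inputs close to
    # those of the frozen reference model.
    with torch.no_grad():
        ref_logits = ref_model(protected_batch["inputs"])
    cur_logits = model(protected_batch["inputs"])
    drift = F.kl_div(
        F.log_softmax(cur_logits, dim=-1),
        F.softmax(ref_logits, dim=-1),
        reduction="batchmean",
    )
    return task_loss + lam * drift
```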
Holistic Evaluation Frameworks: There is a growing recognition of the need for comprehensive evaluation frameworks that assess not only the performance of VLMs on specific tasks but also their fairness, robustness, and safety across various dimensions. These frameworks aim to provide a multi-dimensional view of model capabilities, enabling more informed decisions in model selection and deployment.
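The sketch below shows the aggregation step such frameworks rely on: per-benchmark scores are grouped by evaluation dimension so each model is summarized by a profile rather than a single number. The dimensions and scores are made up for illustration and do not come from VHELM or any other benchmark.

```python
from statistics import mean

# Hypothetical per-benchmark scores keyed by (model, dimension);
# all names and numbers are illustrative.
scores = {
    ("model_a", "fairness"):   [0.71, 0.65],
    ("model_a", "robustness"): [0.58],
    ("model_a", "safety"):     [0.80, 0.77],
    ("model_b", "fairness"):   [0.69, 0.74],
    ("model_b", "robustness"): [0.66],
    ("model_b", "safety"):     [0.62, 0.70],
}

def dimension_profile(scores, model):
    """Average benchmark scores within each dimension for one model."""
    dims = {d for (m, d) in scores if m == model}
    return {d: round(mean(scores[(model, d)]), 3) for d in sorted(dims)}

for m in ("model_a", "model_b"):
    print(m, dimension_profile(scores, m))
```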
Secure Fine-Tuning Strategies: Researchers are exploring novel fine-tuning strategies that mitigate security risks associated with instruction fine-tuning, even when the instructions are benign. These strategies focus on enhancing the robustness of internal model modules to preserve safety capabilities while adapting to new tasks.
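One plausible realization of module-level protection is to exclude parameters in safety-relevant modules from optimization while the rest of the model adapts. The sketch below assumes such modules have already been identified (how they are identified is method-specific and not shown); the function name and the `safety_module_names` argument are hypothetical.

```python
import torch

def build_secure_optimizer(model, safety_module_names, lr=2e-5):
    """Freeze parameters in modules deemed safety-relevant and return an
    optimizer over the remaining, task-adaptable parameters.
    """
    trainable = []
    for name, param in model.named_parameters():
        if any(name.startswith(prefix) for prefix in safety_module_names):
            param.requires_grad_(False)  # preserve safety-related behavior
        else:
            trainable.append(param)      # adapt to the new task
    return torch.optim.AdamW(trainable, lr=lr)
```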
Dynamic Data Curation for Safety Alignment: High-quality data is increasingly recognized as central to aligning LLMs with safety requirements, motivating dynamic data curation methods. These methods aim to generate data that is both diverse and aligned with safety principles, so that models remain safe and reliable in real-world applications.
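A dynamic curation loop of this kind can be sketched as follows: an LLM is asked to flag under-represented safety issues in the current data and then to generate examples targeting them. The `generate` callable, prompts, and loop structure are assumptions for illustration, not Data Advisor's exact procedure.

```python
def dynamic_safety_curation(generate, seed_data, n_rounds=3, batch_size=5):
    """Iteratively grow a safety-alignment dataset.

    `generate` is an assumed callable mapping a prompt string to generated
    text (e.g., a wrapper around any LLM API).
    """
    data = list(seed_data)
    for _ in range(n_rounds):
        # Ask the model where the current data is weakest.
        weakness = generate(
            "Existing safety examples:\n"
            + "\n".join(data[-20:])
            + "\nName one safety issue that is under-represented above."
        )
        # Generate new examples that target the identified weakness.
        for _ in range(batch_size):
            data.append(generate(
                f"Write one prompt-response pair that safely handles: {weakness}"
            ))
    return data
```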
Noteworthy Papers
Buckle Up: Robustifying LLMs at Every Customization Stage via Data Curation - This paper introduces a comprehensive framework for enhancing LLM robustness against jailbreaking attacks through data curation, demonstrating significant reductions in jailbreaking effects.
Model Developmental Safety: A Safety-Centric Method and Applications in Vision-Language Models - This work presents a safety-centric framework for vision-language models, ensuring that new capabilities are acquired without compromising existing safety features, with applications in autonomous driving and scene recognition.
VHELM: A Holistic Evaluation of Vision Language Models - This paper extends the HELM framework to VLMs, providing a comprehensive, multi-dimensional view of model capabilities across various important factors, including fairness, robustness, and safety.
Towards Secure Tuning: Mitigating Security Risks Arising from Benign Instruction Fine-Tuning - This study pioneers a novel fine-tuning strategy to mitigate security risks from benign instruction fine-tuning, significantly reducing harmfulness without impacting usability.
Data Advisor: Dynamic Data Curation for Safety Alignment of Large Language Models - This paper proposes an enhanced LLM-based method for generating safety-aligned data, demonstrating improved model safety without sacrificing utility.