Securing Large Language Models

The field of Large Language Models (LLMs) is rapidly evolving, with growing attention to the security risks that accompany their use. Recent research has highlighted the vulnerability of LLMs to a range of attacks, including prompt injection, jailbreaks, and backdoor exploits. To mitigate these risks, researchers are exploring new methods for detecting and preventing malicious behavior, such as encrypted prompts, hidden state forensics, and lightweight defense mechanisms. These advances have the potential to significantly improve the security and reliability of LLMs, enabling their safe deployment across a wide range of applications. Noteworthy papers in this area include:

  • One paper introduces a method for securing LLM applications against unauthorized actions by appending an encrypted prompt to each user prompt; permissions are verified against this token before any action is executed (see the sketch after this list).
  • Another paper reveals a critical control-plane attack surface in current LLM architectures, introducing a novel jailbreak class that weaponizes structured output constraints to bypass safety mechanisms.
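To make the encrypted-prompt idea concrete, here is a minimal sketch of how a permission token could be appended to a user prompt and checked before an agent executes a tool call. The tag name, helper functions (build_prompt, verify_action), permission format, and use of Fernet encryption are illustrative assumptions, not the paper's actual protocol.

```python
# Illustrative sketch only: a trusted client encrypts a permission list and
# appends it to the user prompt; the execution layer decrypts the token and
# refuses any requested action that is not explicitly permitted.
import json
from cryptography.fernet import Fernet  # pip install cryptography

KEY = Fernet.generate_key()   # assumed shared secret between client and verifier
cipher = Fernet(KEY)

def build_prompt(user_prompt: str, permissions: list[str]) -> str:
    """Append an encrypted permission token to the user prompt (hypothetical format)."""
    token = cipher.encrypt(json.dumps({"allow": permissions}).encode())
    return f"{user_prompt}\n<ENCRYPTED_PROMPT>{token.decode()}</ENCRYPTED_PROMPT>"

def verify_action(prompt: str, requested_action: str) -> bool:
    """Decrypt the token and check the requested action before executing it."""
    try:
        token = prompt.rsplit("<ENCRYPTED_PROMPT>", 1)[1].split("</ENCRYPTED_PROMPT>", 1)[0]
        allowed = json.loads(cipher.decrypt(token.encode()))["allow"]
    except Exception:
        return False  # missing or tampered token: deny by default
    return requested_action in allowed

prompt = build_prompt("Summarize my unread emails.", permissions=["read_email"])
assert verify_action(prompt, "read_email")        # permitted action proceeds
assert not verify_action(prompt, "send_payment")  # unauthorized action is blocked
```

Because the token is encrypted with a key the model never sees, an injected instruction in the user-visible prompt cannot forge or broaden the permission list.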

Sources

Encrypted Prompt: Securing LLM Applications Against Unauthorized Actions

Output Constraints as Attack Surface: Exploiting Structured Generation to Bypass LLM Safety Mechanisms

Integrated LLM-Based Intrusion Detection with Secure Slicing xApp for Securing O-RAN-Enabled Wireless Network Deployments

Exposing the Ghost in the Transformer: Abnormal Detection for Large Language Models via Hidden State Forensics

No Free Lunch with Guardrails

PiCo: Jailbreaking Multimodal Large Language Models via Pictorial Code Contextualization

LightDefense: A Lightweight Uncertainty-Driven Defense against Jailbreaks via Shifted Token Distribution

Evolving Security in LLMs: A Study of Jailbreak Attacks and Defenses
