Advances in Embodied AI Security and World Modeling

Research on embodied AI security is moving toward more robust and efficient defenses against jailbreak attacks and related threats. Approaches under study include representation engineering and targeted attention modification, which aim to reduce the risks that large language models introduce and to keep embodied agents operating reliably across environments. Related work on world modeling aligns LLM agents with the environments they act in. Notable papers in this area include:

  • Concept Enhancement Engineering, which proposes a novel defense framework that leverages representation engineering to enhance the safety of embodied LLMs.
  • DETAM, a finetuning-free defense that strengthens LLMs against jailbreak attacks via targeted attention modification.
  • DoomArena, a security evaluation framework for AI agents that allows for detailed threat modeling and adaptable security testing.
  • A Framework for Benchmarking and Aligning Task-Planning Safety in LLM-Based Embodied Agents, which presents an integrated framework for measuring and aligning the safety of LLM-based embodied agents' behaviors.
  • Advancing Embodied Agent Security, which introduces a novel input moderation framework and a safety benchmark tailored to embodied agents.
  • WALL-E 2.0, which proposes a training-free world alignment approach that learns symbolic knowledge of an environment to complement the LLM, improving the performance of world model-based LLM agents.
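
To make the term "representation engineering" concrete, the following is a minimal sketch of one generic form of it, difference-of-means steering on hidden-state vectors. It is not the method of Concept Enhancement Engineering or of any other paper listed here. The function names (safety_direction, steer), the two-dimensional toy vectors, and the strength value are all assumptions chosen for illustration; a real system would operate on a model's actual activations.

```python
from typing import List

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two vectors."""
    return sum(x * y for x, y in zip(a, b))


def safety_direction(safe: List[Vector], unsafe: List[Vector]) -> Vector:
    """Unit vector pointing from the mean 'unsafe' state to the mean 'safe' state.

    Hypothetical helper: 'safe' and 'unsafe' stand in for hidden states
    collected on benign and harmful prompts.
    """
    diff = [s - u for s, u in zip(mean(safe), mean(unsafe))]
    norm = dot(diff, diff) ** 0.5
    if norm == 0:
        raise ValueError("safe and unsafe means coincide; no direction to steer along")
    return [d / norm for d in diff]


def steer(hidden: Vector, direction: Vector, strength: float) -> Vector:
    """Shift a hidden state along the safety direction by 'strength'."""
    return [h + strength * d for h, d in zip(hidden, direction)]


# Toy data (illustrative only): the first coordinate separates the two groups.
safe_states = [[1.0, 0.0], [3.0, 0.0]]
unsafe_states = [[-1.0, 0.0], [-3.0, 0.0]]

direction = safety_direction(safe_states, unsafe_states)
original = [0.5, 2.0]
steered = steer(original, direction, strength=2.0)

print(direction)  # [1.0, 0.0]
print(steered)    # [2.5, 2.0]
```

The steered vector moves only along the learned direction, so its other components are unchanged. The strength parameter controls the trade-off between how strongly the state is pushed and how much the original representation is preserved.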

Sources

Concept Enhancement Engineering: A Lightweight and Efficient Robust Defense Against Jailbreak Attacks in Embodied AI

DETAM: Defending LLMs Against Jailbreak Attacks via Targeted Attention Modification

DoomArena: A framework for Testing AI Agents Against Evolving Security Threats

A Framework for Benchmarking and Aligning Task-Planning Safety in LLM-Based Embodied Agents

Advancing Embodied Agent Security: From Safety Benchmarks to Input Moderation

WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents
