Research on embodied AI is moving toward more robust and efficient defenses against jailbreak attacks and other security threats. Researchers are exploring approaches such as representation engineering and targeted attention modification to harden embodied AI systems. These methods aim to mitigate the risks that large language models introduce and to keep embodied agents operating reliably across environments. Notable papers in this area include:
- Concept Enhancement Engineering, which proposes a defense framework that uses representation engineering to enhance the safety of embodied LLMs (a generic activation-steering sketch follows this list).
- DETAM, a finetuning-free defense that improves LLMs' resistance to jailbreak attacks via targeted attention modification (see the attention sketch after this list).
- DoomArena, a security evaluation framework for AI agents that supports detailed threat modeling and adaptable security testing.
- A Framework for Benchmarking and Aligning Task-Planning Safety in LLM-Based Embodied Agents, which presents an integrated pipeline for measuring and aligning the safety of agents' task-planning behavior.
- Advancing Embodied Agent Security, which introduces an input moderation framework and a safety benchmark tailored to embodied agents (a minimal moderation-gate sketch appears below).
- WALL-E 2.0, which proposes a training-free world alignment approach that learns an environment's symbolic knowledge to complement LLMs, improving the performance of world-model-based LLM agents.
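
The representation-engineering idea behind Concept Enhancement Engineering can be pictured with generic activation steering: derive a "safe" concept direction from contrasting activations, then add it to a layer's hidden states at inference time. The sketch below is a minimal illustration of that general technique, not the paper's actual method; the hook mechanics follow standard PyTorch, while the layer index, scaling factor `alpha`, and difference-of-means construction are illustrative assumptions.

```python
import torch

def make_steering_hook(concept_vector: torch.Tensor, alpha: float = 4.0):
    """Build a forward hook that nudges hidden states along a 'safe' concept direction."""
    def hook(module, inputs, output):
        # HF-style decoder layers return a tuple; the hidden states come first.
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + alpha * concept_vector.to(hidden.device, hidden.dtype)
        return (steered,) + output[1:] if isinstance(output, tuple) else steered
    return hook

# Illustrative usage with a HuggingFace-style causal LM (layer choice is arbitrary):
# concept_vector = safe_acts.mean(0) - unsafe_acts.mean(0)  # contrastive difference of means
# handle = model.model.layers[15].register_forward_hook(make_steering_hook(concept_vector))
# model.generate(...)  # generation now runs with steered activations
# handle.remove()
```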
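
DETAM's targeted attention modification can likewise be sketched at a high level: bias the pre-softmax attention scores away from token positions flagged as part of a suspected jailbreak, so their post-softmax weight shrinks without being zeroed out. The toy function below illustrates only this general mechanism, not DETAM's actual head-selection or reallocation procedure; the penalty value, tensor shapes, and flagging mask are assumptions.

```python
import torch
import torch.nn.functional as F

def attention_with_suppression(q, k, v, suspicious_mask, penalty: float = 10.0):
    """Scaled dot-product attention that downweights flagged key positions.

    q, k, v: (batch, heads, seq, d_head); suspicious_mask: (seq,) bool over
    key positions, True where a token belongs to a suspected adversarial suffix.
    """
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5  # (batch, heads, q_len, k_len)
    # Subtract a bias on scores toward flagged positions, reducing (not zeroing)
    # the attention they receive after the softmax.
    scores = scores - penalty * suspicious_mask.to(scores.dtype)
    return F.softmax(scores, dim=-1) @ v

# Toy usage: treat the last three positions as a suspected adversarial suffix.
q, k, v = (torch.randn(1, 8, 16, 64) for _ in range(3))
mask = torch.zeros(16, dtype=torch.bool)
mask[-3:] = True
out = attention_with_suppression(q, k, v, mask)  # (1, 8, 16, 64)
```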
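
Finally, an input moderation framework of the kind Advancing Embodied Agent Security describes can be understood structurally as a gate that screens instructions before they reach the task planner. The sketch below uses a trivial pattern table purely as a stand-in for a learned safety classifier; the risk taxonomy, function names, and agent hook are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ModerationResult:
    allowed: bool
    reason: str

# Hypothetical risk taxonomy; a real system would use a learned classifier,
# not keyword matching.
BLOCKED_PATTERNS = {
    "physical_harm": ["swing the knife", "push the person"],
    "property_damage": ["break the window", "knock over the shelf"],
}

def moderate_instruction(instruction: str) -> ModerationResult:
    """Screen a natural-language instruction before it reaches the task planner."""
    text = instruction.lower()
    for category, patterns in BLOCKED_PATTERNS.items():
        if any(p in text for p in patterns):
            return ModerationResult(False, f"matched risk category: {category}")
    return ModerationResult(True, "no risk pattern matched")

# An agent loop would consult the gate first and refuse flagged instructions:
result = moderate_instruction("Go to the kitchen and break the window")
assert not result.allowed  # blocked: "matched risk category: property_damage"
```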