Advances in Explainable AI

The field of explainable AI is moving toward more transparent and trustworthy models. Recent research focuses on interpretable, uncertainty-aware reasoning that supports more reliable collaboration between humans and AI systems. Innovations in this area include compositional and probabilistic reasoning systems, as well as methods for generating natural language explanations of agent behavior. There is also growing emphasis on human-AI interaction and on treating explainability as a bidirectional process. Notable papers in this area include:

  • Bonsai, which introduces a tunable reasoning system that generates adaptable inference trees and demonstrates reliable handling of varied domains.
  • Model-Agnostic Policy Explanations with Large Language Models, which proposes a method for generating natural language explanations of agent behavior without access to the agent's underlying model (a minimal illustrative sketch of this idea follows the list).
  • Interactive Explanations for Reinforcement-Learning Agents, which presents an interactive explanation system that allows users to query the agent's behavior and identify faulty actions.
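
To make the model-agnostic explanation idea concrete, here is a minimal sketch of the general pattern: render an observed trajectory of states, actions, and rewards as plain text, then ask a language model to describe the apparent strategy. This is an illustration only, not the method from the paper; the names summarize_trajectory, explain_policy, and the llm callable are assumptions introduced here.

```python
from typing import List, Tuple

# (state description, action taken, reward) observed from outside the agent;
# no access to its policy, value function, or weights is assumed.
Transition = Tuple[str, str, float]

def summarize_trajectory(trajectory: List[Transition]) -> str:
    """Render observed behavior as plain text."""
    lines = [f"In state '{s}', the agent chose '{a}' (reward {r:+.2f})."
             for s, a, r in trajectory]
    return "\n".join(lines)

def explain_policy(trajectory: List[Transition], llm=None) -> str:
    """Ask a language model to explain the behavior in natural language.

    `llm` is any callable str -> str (hypothetical stand-in for a real model);
    if none is supplied, the constructed prompt is returned for inspection.
    """
    prompt = ("Explain, in plain language, what strategy this agent appears to follow:\n"
              + summarize_trajectory(trajectory))
    if llm is None:
        return prompt
    return llm(prompt)

if __name__ == "__main__":
    demo = [("low battery, near charger", "dock", 1.0),
            ("low battery, far from charger", "navigate to charger", 0.2)]
    print(explain_policy(demo))
```

The key design point this sketch illustrates is that the explanation pipeline consumes only externally observable behavior, which is what makes the approach model-agnostic.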

Sources

Bonsai: Interpretable Tree-Adaptive Grounded Reasoning

"You just can't go around killing people" Explaining Agent Behavior to a Human Terminator

Truthful or Fabricated? Using Causal Attribution to Mitigate Reward Hacking in Explanations

Interactive Explanations for Reinforcement-Learning Agents

Trust Through Transparency: Explainable Social Navigation for Autonomous Mobile Robots via Vision-Language Models

Model-Agnostic Policy Explanations with Large Language Models

Accessible and Pedagogically-Grounded Explainability for Human-Robot Interaction: A Framework Based on UDL and Symbolic Interfaces

Beware of "Explanations" of AI
