Advances in Theory of Mind and Social Intelligence in AI

The field of artificial intelligence is moving toward more advanced Theory of Mind (ToM) capabilities, enabling machines to better understand human intentions, beliefs, and mental states. Recent work has concentrated on evaluating and improving vision-language models on ToM tasks, with particular emphasis on egocentric domains and human-image aesthetic assessment. Reinforcement learning has proven effective at unlocking ToM capabilities in small language models, and multimodal models are being developed to deliver more explainable, visually grounded social intelligence. Noteworthy papers include EgoToM, which introduces a video question-answering benchmark for evaluating ToM in egocentric domains, and ToM-RL, which demonstrates that reinforcement learning can unlock ToM reasoning in small language models.
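
To make the flavor of these ToM evaluations concrete, the sketch below shows how a classic Sally-Anne false-belief item might be posed to a language model and scored. The `query_model` callable and the exact prompt wording are illustrative assumptions, not the protocol of any of the papers listed under Sources.

```python
# Minimal sketch of a Sally-Anne-style false-belief probe.
# `query_model` is a hypothetical stand-in for whatever completion API the
# evaluated model exposes; it is not taken from the cited papers.

FALSE_BELIEF_ITEM = {
    "story": (
        "Sally puts her marble in the basket and leaves the room. "
        "While she is away, Anne moves the marble to the box."
    ),
    "question": "When Sally returns, where will she look for her marble first?",
    "correct": "basket",   # requires tracking Sally's (false) belief
    "reality": "box",      # answer given if the model ignores her belief
}


def passes_false_belief(query_model) -> bool:
    """Return True if the model answers from Sally's belief, not from reality."""
    prompt = f"{FALSE_BELIEF_ITEM['story']}\n{FALSE_BELIEF_ITEM['question']}"
    answer = query_model(prompt).lower()
    return (
        FALSE_BELIEF_ITEM["correct"] in answer
        and FALSE_BELIEF_ITEM["reality"] not in answer
    )


if __name__ == "__main__":
    # Trivial stub model that answers from reality, so it fails the probe.
    print(passes_false_belief(lambda prompt: "She will look in the box."))
```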

Sources

How Well Can Vision-Language Models Understand Humans' Intention? An Open-ended Theory of Mind Question Evaluation Benchmark

EgoToM: Benchmarking Theory of Mind Reasoning from Egocentric Videos

HumanAesExpert: Advancing a Multi-Modality Foundation Model for Human Image Aesthetic Assessment

All You Need is Sally-Anne: ToM in AI Strongly Supported After Surpassing Tests for 3-Year-Olds

ANNEXE: Unified Analyzing, Answering, and Pixel Grounding for Egocentric Interaction

ToM-RL: Reinforcement Learning Unlocks Theory of Mind in Small LLMs

VEGAS: Towards Visually Explainable and Grounded Artificial Social Intelligence
