AI Planning and Reasoning

Report on Current Developments in AI Planning and Reasoning

General Direction of the Field

Recent advances in AI planning and reasoning, particularly with large language models (LLMs), are expanding what these models can reliably achieve. The field is moving toward more sophisticated and dependable methods for planning, scheduling, and multi-step reasoning, with a strong emphasis on integrating domain-specific knowledge and neurosymbolic approaches. This integration aims to compensate for the inherently probabilistic nature of LLM generation, which often produces inconsistent or inaccurate answers on complex problem-solving tasks.

One key trend is the development of frameworks that leverage the semantic and world knowledge of LLMs to enhance hierarchical imitation learning (HIL) and long-horizon decision-making. These frameworks pre-label states and specify sub-goal spaces without prior knowledge of the task hierarchy, improving the robustness and adaptability of sub-goal representations. There is also growing interest in pairing LLMs with external verifiers that check generated outputs before they are used, which is crucial for applications requiring reliability and precision.
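To make the verifier idea concrete, the sketch below pairs a stubbed plan generator with a symbolic checker that simulates action preconditions and rejects invalid candidates. The functions `propose_plan` and `verify` and the toy key-and-door domain are illustrative assumptions, not any paper's actual interface.

```python
# Minimal sketch of an LLM-plus-verifier loop (illustrative assumptions only).

def propose_plan(goal: str, attempt: int) -> list[str]:
    # Placeholder for an LLM sampling a candidate action sequence; a real
    # system would prompt a model and draw a fresh sample on each attempt.
    candidates = [
        ["pick(key)", "open(door)"],                  # invalid: key is in the drawer
        ["open(drawer)", "pick(key)", "open(door)"],  # valid
    ]
    return candidates[min(attempt, len(candidates) - 1)]

def verify(plan: list[str]) -> bool:
    # Symbolic verifier: simulate preconditions and effects, rejecting any
    # plan that applies an action whose preconditions do not hold.
    state = {"drawer_open": False, "holding_key": False}
    for action in plan:
        if action == "open(drawer)":
            state["drawer_open"] = True
        elif action == "pick(key)":
            if not state["drawer_open"]:
                return False
            state["holding_key"] = True
        elif action == "open(door)":
            if not state["holding_key"]:
                return False
    return True

def plan_with_verifier(goal: str, max_attempts: int = 3) -> list[str] | None:
    # Only plans accepted by the verifier are returned, so correctness rests
    # on the symbolic checker rather than on the LLM alone.
    for attempt in range(max_attempts):
        plan = propose_plan(goal, attempt)
        if verify(plan):
            return plan
    return None

print(plan_with_verifier("open the door"))
# -> ['open(drawer)', 'pick(key)', 'open(door)']
```

The design point is the division of labor: the LLM proposes, the symbolic component disposes, and failed candidates trigger regeneration rather than being passed downstream.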

Another significant development is the focus on state-tracking and reasoning for acting and planning with LLMs. This approach enhances "chain-of-thought" reasoning by maintaining an explicit representation of the current state at each step, enabling more efficient and accurate long-range reasoning. Few-shot in-context learning is also gaining traction here, since it avoids the need for additional training data or hand-crafted rules, making the approach more scalable and adaptable.
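As a rough illustration of state-tracking layered on chain-of-thought prompting (a simplified sketch of the general idea, not StateAct's exact prompt format), each turn re-asserts an explicit state line alongside the usual thought and action lines, so the model never has to reconstruct the current state from a long trajectory. The prompt layout and the `call_llm` stub below are assumptions for illustration.

```python
# Few-shot example interleaving explicit state with thoughts and actions.
FEW_SHOT = """\
task: put the apple in the fridge
state: location=kitchen, holding=nothing, fridge=closed
thought: I must pick up the apple before I can store it.
action: pick(apple)
state: location=kitchen, holding=apple, fridge=closed
thought: The fridge is closed, so open it before placing the apple.
action: open(fridge)
state: location=kitchen, holding=apple, fridge=open
action: put(apple, fridge)
"""

def call_llm(prompt: str) -> str:
    # Placeholder for a real model call; a deployed agent would send `prompt`
    # to an LLM and parse the returned state/thought/action lines.
    return "state: ...\nthought: ...\naction: ..."

def next_step(task: str, history: list[str]) -> str:
    # The tracked state is carried in-context on every turn, which keeps
    # long-range reasoning grounded without extra training data or rules.
    prompt = FEW_SHOT + f"\ntask: {task}\n" + "\n".join(history) + "\n"
    return call_llm(prompt)

print(next_step("put the milk in the fridge",
                ["state: location=kitchen, holding=nothing, fridge=closed"]))
```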

The field is also witnessing the creation of new benchmarks and datasets that specifically evaluate the multi-step reasoning and procedural planning abilities of LLMs and visual language models (VLMs). These benchmarks are designed to assess the models' ability to follow explicit instructions and solve problems that require varying numbers of steps, providing a comprehensive evaluation of their reasoning capabilities.
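The shared evaluation protocol can be summarized in a few lines: score each item and bucket accuracy by the number of steps it requires, exposing how performance degrades as problems lengthen. The toy records and field names below are invented for illustration and do not reflect any benchmark's actual schema.

```python
from collections import defaultdict

# Toy items; real benchmarks supply problems annotated with a required step
# count, gold answers, and the model's predictions.
items = [
    {"steps": 2, "answer": "7",  "prediction": "7"},
    {"steps": 2, "answer": "4",  "prediction": "4"},
    {"steps": 5, "answer": "12", "prediction": "13"},
    {"steps": 5, "answer": "9",  "prediction": "9"},
]

totals: defaultdict[int, int] = defaultdict(int)
correct: defaultdict[int, int] = defaultdict(int)
for item in items:
    totals[item["steps"]] += 1
    correct[item["steps"]] += item["prediction"] == item["answer"]

for n in sorted(totals):
    print(f"{n}-step problems: {correct[n] / totals[n]:.0%} accuracy")
```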

Noteworthy Papers

  • SEAL: Introduces a novel framework that leverages LLMs' semantic and world knowledge for hierarchical imitation learning, outperforming state-of-the-art methods in complex long-horizon tasks.
  • StateAct: Proposes a simple yet effective method for enhancing state-tracking and reasoning with LLMs, achieving new state-of-the-art results among in-context learning methods.
  • DANA: Presents a neurosymbolic architecture that integrates domain-specific knowledge to improve consistency and accuracy in complex problem-solving tasks, significantly outperforming current LLM-based systems.
  • ProcBench: Develops a benchmark focused on evaluating multi-step inference in LLMs, highlighting areas for future research in advancing their reasoning abilities.
  • ActPlan-1K: Introduces a multi-modal planning benchmark for evaluating the procedural planning ability of VLMs, emphasizing the need for further research in this area.
  • ACPBench: Provides a comprehensive benchmark for evaluating reasoning tasks in planning, revealing significant gaps in the reasoning capabilities of current LLMs.

Sources

Planning in Strawberry Fields: Evaluating and Improving the Planning and Scheduling Capabilities of LRM o1

SEAL: SEmantic-Augmented Imitation Learning via Language Model

StateAct: State Tracking and Reasoning for Acting and Planning with Large Language Models

DANA: Domain-Aware Neurosymbolic Agents for Consistency and Accuracy

ProcBench: Benchmark for Multi-Step Reasoning and Following Procedure

ActPlan-1K: Benchmarking the Procedural Planning Ability of Visual Language Models in Household Activities

ACPBench: Reasoning about Action, Change, and Planning

AAAI Workshop on AI Planning for Cyber-Physical Systems -- CAIPI24
