Reinforcement Learning and Planning

Report on Current Developments in Reinforcement Learning and Planning

General Trends and Innovations

Recent advances in reinforcement learning (RL) and classical planning show a strong emphasis on robustness, efficiency, and generalization. Researchers are increasingly focusing on methods that handle noisy data, improve exploration strategies, and adapt to diverse environments without extensive domain-specific tuning. The following trends and innovations are particularly noteworthy:

  1. Robustness Against Noisy Data: There is growing interest in techniques that learn effectively from noisy or imperfect data. This is evident in approaches that use probabilistic models and Bayesian methods to infer robust policies and reward structures from noisy execution traces, reducing the impact of data inconsistencies and improving the reliability of learned models (a toy Bayesian sketch follows this list).

  2. Efficient Exploration and Novelty Search: Novelty-based exploration techniques are being refined to make search in classical planning more efficient. Recent work introduces count-based novelty methods that maintain counts for a constant number of tuples, structuring exploration more effectively; these methods complement existing heuristics and achieve competitive results on challenging benchmarks (a minimal sketch of the counting idea follows this list).

  3. Generalization and Transfer Learning: The field is shifting towards more generalizable models that can be reused across multiple tasks. Unsupervised-to-online RL frameworks are emerging as a promising direction, enabling pre-trained models to be reused for various downstream tasks without domain-specific offline RL pre-training, which improves both performance and stability.

  4. Multi-task Learning and Skill Decomposition: Multi-task offline RL is gaining traction, particularly where diverse datasets are available. Techniques that decompose tasks into shareable skills and use quality-weighted losses to guide learning are improving the performance of RL agents across heterogeneous datasets (a sketch of a quality-weighted loss follows this list). By leveraging skills common across tasks, these methods make learning more efficient and robust.

  5. Goal Recognition and Heuristic Computation: New approaches to goal recognition map observed facts into a vector space and compute heuristic values for candidate goals from those vectors. These methods offer improved precision at reduced computational cost, making them suitable for real-world settings where exact probabilities are infeasible to obtain (a minimal fact-vector sketch follows this list).
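
To make the Bayesian flavor of trend 1 concrete, the toy sketch below assumes a transition receives repeated noisy 0/1 reward labels and uses a Beta-Bernoulli update to estimate the probability that it is truly rewarding. The function name, prior parameters, and noise model are illustrative assumptions, not the formulation of any cited paper.

```python
def reward_belief(noisy_labels, alpha=1.0, beta=1.0):
    """Posterior mean of a Beta-Bernoulli model: the probability that a
    transition is truly rewarding, given repeated noisy 0/1 labels.
    The prior (alpha, beta) and the whole setup are illustrative."""
    positives = sum(noisy_labels)
    return (alpha + positives) / (alpha + beta + len(noisy_labels))

# usage: three of four noisy labels say "rewarding"
print(reward_belief([1, 1, 0, 1]))  # ~0.67 with a uniform prior
```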
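
The counting idea behind trend 2 can be sketched as: keep counts of small atom tuples seen during search and prefer expanding states whose rarest tuple has been seen least often. This is a minimal illustration assuming a set-of-atoms state representation; `novelty_rank`, `register_state`, and the `width` bound are hypothetical names, not the cited paper's interface.

```python
from collections import Counter
from itertools import combinations

def novelty_rank(state_atoms, counts, width=2):
    # Score a state by the count of its rarest atom tuple (up to `width`);
    # lower scores mean more novel states.
    best = float("inf")
    for k in range(1, width + 1):
        for tup in combinations(sorted(state_atoms), k):
            best = min(best, counts[tup])
    return best

def register_state(state_atoms, counts, width=2):
    # Update tuple counts once a state has been expanded.
    for k in range(1, width + 1):
        for tup in combinations(sorted(state_atoms), k):
            counts[tup] += 1

# usage: expand the most novel state in a toy frontier
counts = Counter()
frontier = [{"at-A", "holding-key"}, {"at-B"}]
best_state = min(frontier, key=lambda s: novelty_rank(s, counts))
register_state(best_state, counts)
```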
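
For trend 4, one way to picture a quality-weighted loss is to weight an imitation objective by each segment's estimated return so that higher-quality demonstrations dominate learning. The sketch below is a generic weighted behavior-cloning loss under that assumption; the softmax weighting and `temperature` parameter are illustrative choices, not the cited method's exact scheme.

```python
import numpy as np

def quality_weights(returns, temperature=1.0):
    # Softmax over normalized returns: higher-return segments get larger weights.
    r = (returns - returns.mean()) / (returns.std() + 1e-8)
    w = np.exp(r / temperature)
    return w / w.sum()

def weighted_bc_loss(pred_actions, data_actions, weights):
    # Quality-weighted behavior cloning: per-segment squared action error,
    # combined with the per-segment quality weights.
    per_segment = ((pred_actions - data_actions) ** 2).mean(axis=-1)
    return float((weights * per_segment).sum())

# usage with made-up data: the higher-return segment contributes more
returns = np.array([10.0, 2.0])
loss = weighted_bc_loss(np.zeros((2, 3)), np.ones((2, 3)), quality_weights(returns))
```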
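
The fact-vector view in trend 5 can be illustrated by encoding observed facts as a 0/1 vector over a fixed fact ordering and ranking candidate goals by distance to a per-goal expected fact-probability vector. The distance choice, the fact vocabulary, and the example goals are assumptions made for illustration, not the cited paper's construction.

```python
import numpy as np

def fact_vector(observed_facts, fact_index):
    # Map the set of observed facts to a 0/1 vector over a fixed fact ordering.
    v = np.zeros(len(fact_index))
    for f in observed_facts:
        v[fact_index[f]] = 1.0
    return v

def rank_goals(observed_facts, goal_fact_probs, fact_index):
    # Score each candidate goal by the distance between its expected
    # fact-probability vector and the observation vector (lower = more likely).
    obs = fact_vector(observed_facts, fact_index)
    scores = {g: float(np.linalg.norm(p - obs)) for g, p in goal_fact_probs.items()}
    return sorted(scores, key=scores.get)

# usage with a toy fact vocabulary and two candidate goals
fact_index = {"at-kitchen": 0, "holding-cup": 1, "door-open": 2}
goal_fact_probs = {
    "make-coffee": np.array([0.9, 0.8, 0.1]),
    "leave-house": np.array([0.1, 0.0, 0.9]),
}
print(rank_goals({"at-kitchen", "holding-cup"}, goal_fact_probs, fact_index))
```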

Noteworthy Papers

  • Count-based Novelty Exploration in Classical Planning: Introduces a count-based novelty technique that improves exploration efficiency, complements existing heuristics, and achieves competitive results on challenging benchmarks.

  • Unsupervised-to-Online Reinforcement Learning: Proposes a framework that replaces domain-specific offline RL with unsupervised offline RL, enabling the reuse of pre-trained models for multiple tasks and improving performance and stability.

  • Skills Regularized Task Decomposition for Multi-task Offline Reinforcement Learning: Presents a skill-based multi-task RL technique that leverages common skills across heterogeneous datasets, improving robustness and performance in complex tasks.

These papers represent significant strides in the field, addressing key challenges and offering innovative solutions that advance the state-of-the-art in reinforcement learning and planning.

Sources

  • Count-based Novelty Exploration in Classical Planning

  • Fact Probability Vector Based Goal Recognition

  • No Regrets: Investigating and Improving Regret Approximations for Curriculum Discovery

  • Learning Robust Reward Machines from Noisy Labels

  • Unsupervised-to-Online Reinforcement Learning

  • Skills Regularized Task Decomposition for Multi-task Offline Reinforcement Learning