Lifelong Learning and Robustness in Reinforcement Learning

Recent developments in reinforcement learning (RL) reflect a significant shift towards addressing complex, real-world challenges by integrating advanced theoretical frameworks with practical applications. A notable trend is the emphasis on lifelong learning and adaptation, where RL agents are designed to learn continuously from a stream of tasks without catastrophic forgetting. This is exemplified by novel algorithms such as EPIC, which leverages PAC-Bayesian theory to adapt rapidly to new tasks while retaining knowledge from previous experience. Another emerging area is robustness under diverse data corruptions, where TRACER stands out by introducing Bayesian inference to cope with various types of corrupted offline data. The field is also seeing advances in zero-shot generalization, particularly in inventory management, where the TED framework trains agents capable of handling a broad range of inventory problems without retraining. Furthermore, hierarchical and modular approaches such as HOP and FraCOs are proving effective at mitigating catastrophic forgetting and accelerating task generalization, respectively. Collectively, these developments underscore the field's progress towards more adaptive, robust, and efficient RL agents that can handle dynamic and uncertain environments.
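EPIC's exact statistical guarantee and its "world policy" construction are specific to that paper; purely to illustrate the general flavor of PAC-Bayesian guarantees in lifelong settings, the sketch below computes the classic McAllester bound, which trades empirical loss on a new task against the KL divergence between an adapted posterior and a prior carried over from earlier tasks. All names and numbers here are illustrative assumptions, not taken from EPIC.

```python
import numpy as np

def diag_gaussian_kl(mu_q, var_q, mu_p, var_p):
    """KL( N(mu_q, diag(var_q)) || N(mu_p, diag(var_p)) ) between diagonal Gaussians."""
    return 0.5 * np.sum(np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

def mcallester_bound(emp_risk, kl, n, delta=0.05):
    """McAllester PAC-Bayes bound for a loss in [0, 1]: with probability >= 1 - delta,
    true risk <= emp_risk + sqrt((KL + ln(2*sqrt(n)/delta)) / (2n))."""
    return emp_risk + np.sqrt((kl + np.log(2.0 * np.sqrt(n) / delta)) / (2.0 * n))

# Toy usage: a posterior over policy parameters adapted from a prior retained
# across earlier tasks; keeping the KL small keeps the certified risk tight.
d = 32
mu_p, var_p = np.zeros(d), np.ones(d)               # prior carried over from past tasks
mu_q, var_q = 0.1 * np.ones(d), 0.8 * np.ones(d)    # posterior adapted to the new task
kl = diag_gaussian_kl(mu_q, var_q, mu_p, var_p)
print(mcallester_bound(emp_risk=0.15, kl=kl, n=5_000))  # roughly 0.18 certified risk
```

The point of the toy numbers is only that the bound tightens as the adapted posterior stays close to the accumulated prior, which is the mechanism by which knowledge from previous tasks pays off on new ones.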

Noteworthy papers include EPIC, which offers both theoretical guarantees and practical efficacy in lifelong RL through its world policy; TRACER, which significantly enhances robustness in offline RL by distinguishing corrupted data with an entropy-based uncertainty measure; and TED, which achieves strong empirical performance in inventory management through zero-shot generalization.
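TRACER's actual method is variational Bayesian inference over the corrupted offline dataset; the snippet below is only a minimal sketch, under an assumed ensemble-based setup and invented names, of how an entropy-style uncertainty score can be turned into per-transition weights that down-weight likely corrupted samples. It is not the paper's implementation.

```python
import numpy as np

def entropy_down_weights(q_ensemble, beta=1.0):
    """Per-transition loss weights that shrink as ensemble disagreement grows.

    q_ensemble: array of shape (n_models, n_transitions), one value estimate per
    ensemble member and transition. Each column is scored by the differential
    entropy of a Gaussian fit, 0.5 * log(2*pi*e*var); higher entropy -> smaller weight.
    """
    var = q_ensemble.var(axis=0) + 1e-8                # avoid log(0)
    entropy = 0.5 * np.log(2.0 * np.pi * np.e * var)   # Gaussian differential entropy
    weights = np.exp(-beta * entropy)
    return weights / weights.max()                     # rescale into (0, 1]

# Toy usage: the ensemble agrees on clean transitions and disagrees on corrupted ones,
# so the corrupted half of the batch receives much smaller weights.
rng = np.random.default_rng(0)
clean = rng.normal(1.0, 0.1, size=(16, 5))
corrupted = rng.normal(1.0, 2.0, size=(16, 5))
weights = entropy_down_weights(np.concatenate([clean, corrupted], axis=1))
print(np.round(weights, 3))   # first five weights are markedly larger than the last five
```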

Sources

StepCountJITAI: simulation environment for RL with application to physical activity adaptive intervention

Statistical Guarantees for Lifelong Reinforcement Learning using PAC-Bayesian Theory

Uncertainty-based Offline Variational Bayesian Reinforcement Learning for Robustness under Diverse Data Corruptions

Zero-shot Generalization in Inventory Management: Train, then Estimate and Decide

Provably and Practically Efficient Adversarial Imitation Learning with General Function Approximation

Learning in Markov Games with Adaptive Adversaries: Policy Regret, Fundamental Barriers, and Efficient Algorithms

Prompt Tuning with Diffusion for Few-Shot Pre-trained Policy Generalization

Regret of exploratory policy improvement and $q$-learning

Adaptive World Models: Learning Behaviors by Latent Imagination Under Non-Stationarity

Efficient Active Imitation Learning with Random Network Distillation

Accelerating Task Generalisation with Multi-Level Hierarchical Options

Hierarchical Orchestra of Policies

Near-Optimal Dynamic Regret for Adversarial Linear Mixture MDPs

Hybrid Transfer Reinforcement Learning: Provable Sample Efficiency from Shifted-Dynamics Data

Approximate Equivariance in Reinforcement Learning

Scaling Laws for Pre-training Agents and World Models

Enabling Adaptive Agent Training in Open-Ended Simulators by Targeting Diversity

Hypercube Policy Regularization Framework for Offline Reinforcement Learning

Convergence and Robustness of Value and Policy Iteration for the Linear Quadratic Regulator

Constrained Latent Action Policies for Model-Based Offline Reinforcement Learning

Structure Matters: Dynamic Policy Gradient

Noisy Zero-Shot Coordination: Breaking The Common Knowledge Assumption In Zero-Shot Coordination Games
