Efficient and Stable Reinforcement Learning: Recent Advances

Recent work in reinforcement learning (RL) and imitation learning (IL) shows marked progress on complex tasks with high-dimensional inputs and intricate dynamics. In online IL, reward-free world models are used to model environmental dynamics in latent spaces, achieving stable, expert-level performance across diverse benchmarks. Streaming deep RL has seen a breakthrough with the stream-x algorithms, which overcome the stream barrier and match the sample efficiency of batch RL, learning stably in a variety of environments. Offline-to-online RL has been strengthened by algorithms that make effective use of scarce demonstrations, reaching high success rates on image-based robotic grasping tasks. Action abstractions and hierarchical planning have improved sample efficiency and interpretability in discovering high-reward states. Notably, optimizing backward policies in GFlowNets and learning identifiable representations of latent dynamic systems provide both theoretical guarantees and practical gains in complex RL settings. Together, these developments point toward RL methods that are more efficient, stable, and interpretable across diverse and complex environments.
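To make the reward-free world-model idea concrete, here is a minimal sketch of learning a latent dynamics model from reward-free transitions. The module sizes, names, and squared-error objective are illustrative assumptions, not the cited paper's formulation.

```python
# Minimal sketch: a reward-free latent world model trained on (s, a, s')
# transitions only. Architecture and loss are illustrative assumptions.
import torch
import torch.nn as nn

obs_dim, latent_dim, act_dim = 16, 8, 4

encoder = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
dynamics = nn.Sequential(nn.Linear(latent_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
optim = torch.optim.Adam(list(encoder.parameters()) + list(dynamics.parameters()), lr=3e-4)

def world_model_step(obs, act, next_obs):
    """One gradient step on observed transitions; no reward signal is used."""
    z = encoder(obs)
    z_next = encoder(next_obs).detach()          # target latent state
    z_pred = dynamics(torch.cat([z, act], dim=-1))
    loss = ((z_pred - z_next) ** 2).mean()
    optim.zero_grad()
    loss.backward()
    optim.step()
    return loss.item()

# Example usage with random stand-in data.
batch = 32
world_model_step(torch.randn(batch, obs_dim),
                 torch.randn(batch, act_dim),
                 torch.randn(batch, obs_dim))
```

Once such a latent model is learned, imitation can be carried out by matching policy rollouts to expert trajectories in the latent space rather than by recovering a reward.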

Noteworthy Papers:

  • The introduction of stream-x algorithms marks a significant step in overcoming the stream barrier in deep RL, enabling stable and efficient learning when each transition is processed as it arrives (a sketch of this streaming setting follows this list).
  • The Vector-Quantized Continual Diffuser (VQ-CD) method demonstrates state-of-the-art performance in continual offline RL by aligning different state and action spaces, facilitating continual training across various tasks (a vector-quantization sketch also follows below).
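To illustrate the streaming setting these algorithms target, here is a minimal sketch of a linear TD(lambda) learner that consumes each transition exactly once, with no replay buffer or batching. It illustrates the constraint, not the stream-x algorithms themselves.

```python
# Minimal sketch of streaming value learning: one linear TD(lambda) update per
# observed transition, no replay or batching. Illustrative only.
import numpy as np

class StreamingTD:
    def __init__(self, n_features, alpha=0.01, gamma=0.99, lam=0.9):
        self.w = np.zeros(n_features)       # linear value-function weights
        self.z = np.zeros(n_features)       # eligibility trace
        self.alpha, self.gamma, self.lam = alpha, gamma, lam

    def update(self, phi, reward, phi_next, done):
        v = self.w @ phi
        v_next = 0.0 if done else self.w @ phi_next
        delta = reward + self.gamma * v_next - v          # TD error
        self.z = self.gamma * self.lam * self.z + phi     # accumulate trace
        self.w += self.alpha * delta * self.z             # update immediately
        if done:
            self.z[:] = 0.0
        return delta

# Each transition is consumed once, as it arrives, then discarded.
agent = StreamingTD(n_features=8)
phi = np.random.rand(8)
for t in range(100):
    phi_next, reward = np.random.rand(8), np.random.rand()
    agent.update(phi, reward, phi_next, done=(t % 20 == 19))
    phi = phi_next
```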
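And a minimal sketch of the vector-quantization idea of mapping differently sized state or action spaces onto one shared codebook. The projections, shapes, and nearest-neighbor lookup are illustrative assumptions, not VQ-CD's actual architecture.

```python
# Minimal sketch: align heterogeneous state/action spaces by projecting each
# into a shared embedding space and snapping to a common codebook.
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(64, 16))         # 64 shared codes, dimension 16

def quantize(x, projection):
    """Project a raw state/action vector and return the nearest code."""
    e = x @ projection                        # map into the shared latent space
    idx = int(np.argmin(np.linalg.norm(codebook - e, axis=1)))
    return idx, codebook[idx]

# Two tasks with differently sized state spaces share one codebook.
proj_task_a = rng.normal(size=(10, 16))
proj_task_b = rng.normal(size=(24, 16))
idx_a, code_a = quantize(rng.normal(size=10), proj_task_a)
idx_b, code_b = quantize(rng.normal(size=24), proj_task_b)
```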

Sources

Reward-free World Models for Online Imitation Learning

Inverse Reinforcement Learning from Non-Stationary Learning Agents

Streaming Deep Reinforcement Learning Finally Works

Offline-to-online Reinforcement Learning for Image-based Grasping with Scarce Demonstrations

Action abstractions for amortized sampling

Optimizing Backward Policies in GFlowNets via Trajectory Likelihood Maximization

In-Trajectory Inverse Reinforcement Learning: Learn Incrementally From An Ongoing Trajectory

Solving Continual Offline RL through Selective Weights Activation on Aligned Spaces

Primal-Dual Spectral Representation for Off-policy Evaluation

ImDy: Human Inverse Dynamics from Imitated Observations

Learning Versatile Skills with Curriculum Masking

Identifiable Representation and Model Learning for Latent Dynamic Systems

Reinforcement Learning under Latent Dynamics: Toward Statistical and Algorithmic Modularity

Prioritized Generative Replay

Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration

SkiLD: Unsupervised Skill Discovery Guided by Factor Interactions

Learning Transparent Reward Models via Unsupervised Feature Selection

SAMG: State-Action-Aware Offline-to-Online Reinforcement Learning with Offline Model Guidance
