The field of reinforcement learning and imitation learning is increasingly leveraging offline data and human knowledge to improve policy learning and decision-making. Researchers are exploring novel approaches to core challenges of offline reinforcement learning, such as distribution shift and suboptimal demonstrations. Techniques such as trajectory stitching and outcome-driven action constraints are being used to extract more value from imperfect data, while action masking is emerging as a practical way to encode human expertise, with promising results for both trust and performance in real-world applications.
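To make the action-masking idea concrete, the following is a minimal sketch (not drawn from any of the papers below; the mask layout and the dispatching example are illustrative assumptions). Invalid or expert-forbidden actions are excluded by setting their logits to negative infinity before sampling, so the policy can only select actions that respect the human-specified rules.

```python
import numpy as np

def masked_action_probabilities(logits: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Convert raw policy logits into a distribution over allowed actions only.

    logits: shape (num_actions,), raw policy outputs.
    mask:   shape (num_actions,), True for actions permitted by human rules.
    """
    masked_logits = np.where(mask, logits, -np.inf)   # forbid masked actions
    masked_logits -= masked_logits.max()               # numerical stability
    exp = np.exp(masked_logits)                        # exp(-inf) == 0, so masked actions get zero probability
    return exp / exp.sum()

# Hypothetical example: a dispatching problem where human rules forbid actions 1 and 3.
logits = np.array([0.2, 1.5, -0.3, 0.9, 0.1])
mask = np.array([True, False, True, False, True])
probs = masked_action_probabilities(logits, mask)
action = np.random.choice(len(probs), p=probs)         # only unmasked actions can be sampled
```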
Noteworthy papers include:
- Robust Offline Imitation Learning Through State-level Trajectory Stitching, which introduces a state-based search framework to stitch state-action pairs from imperfect demonstrations (see the sketch after this list for the general idea).
- Beyond Non-Expert Demonstrations: Outcome-Driven Action Constraint for Offline Reinforcement Learning, which proposes an outcome-driven action constraint that reduces reliance on the empirical action distribution of the behavior data while better meeting safety requirements.
- Integrating Human Knowledge Through Action Masking in Reinforcement Learning for Operations Research, which analyzes the benefits and caveats of including human knowledge via action masking in reinforcement learning.
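To give a flavor of state-level trajectory stitching, here is a minimal sketch, not the method from the paper above: transitions from several imperfect demonstrations are pooled, and a new trajectory is built greedily by repeatedly jumping to a stored transition whose start state is close to the current state. The Euclidean distance threshold and the reward-greedy selection rule are illustrative assumptions.

```python
import numpy as np

def stitch_trajectory(demos, start_state, horizon=50, tol=0.1):
    """Greedily stitch a trajectory from transitions pooled across demonstrations.

    demos: list of trajectories, each a list of (state, action, reward, next_state)
           tuples with states given as 1-D numpy arrays.
    Returns a list of (state, action, reward, next_state) forming the stitched path.
    """
    # Flatten all demonstrations into a single pool of transitions.
    pool = [t for demo in demos for t in demo]

    stitched, state = [], np.asarray(start_state, dtype=float)
    for _ in range(horizon):
        # Candidate transitions whose start state lies within tol of the current state.
        candidates = [t for t in pool if np.linalg.norm(np.asarray(t[0]) - state) <= tol]
        if not candidates:
            break
        # Illustrative greedy rule: follow the candidate with the highest immediate reward.
        s, a, r, s_next = max(candidates, key=lambda t: t[2])
        stitched.append((s, a, r, s_next))
        state = np.asarray(s_next, dtype=float)
    return stitched
```

Published stitching methods typically replace the greedy reward rule with learned value estimates and use learned state similarity rather than raw Euclidean distance, but the underlying idea of pooling and reconnecting transitions is the same.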