Advancements in Autonomous Driving: Integrating VLMs and LLMs with RL

The field of autonomous driving is increasingly integrating vision-language models (VLMs) and large language models (LLMs) with reinforcement learning (RL) to improve decision-making and safety. This trend addresses the limitations of traditional RL approaches, which often rely on manually engineered rewards and generalize poorly. By incorporating VLMs and LLMs, researchers can generate more nuanced, semantically rich reward signals, align autonomous vehicle (AV) decisions with human preferences, and bring human input directly into the driving process. This integration improves the adaptability and efficiency of AVs in complex driving scenarios while also enhancing the interpretability and safety of autonomous driving systems. In parallel, work on behavior-based neural networks and on user-friendly environment descriptions for RL is contributing to more robust and accessible autonomous driving technologies. Together, these advancements signal a shift towards more intelligent, adaptable, and human-aligned autonomous driving systems.
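
To make the reward-generation idea concrete, here is a minimal sketch that uses CLIP image-text similarity as a dense semantic reward term inside a standard RL step. It is not the exact formulation of VLM-RL or CLIP-RLDrive; the model checkpoint, prompts, and weighting coefficient are illustrative assumptions.

```python
# Minimal sketch: CLIP-based semantic reward shaping for a driving RL agent.
# The checkpoint, prompts, and blending weight below are assumptions, not the
# cited papers' settings.
import torch
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device).eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Language descriptions of desired vs. undesired driving behavior (assumed prompts).
goal_prompts = [
    "the car drives smoothly in its lane at a safe speed",
    "the car is about to collide with another vehicle",
]

@torch.no_grad()
def semantic_reward(camera_frame) -> float:
    """Return a shaped reward in [-1, 1] from CLIP similarity between the
    current camera frame (PIL image or numpy array) and the behavior prompts."""
    inputs = processor(text=goal_prompts, images=camera_frame,
                       return_tensors="pt", padding=True).to(device)
    out = clip(**inputs)
    probs = out.logits_per_image.softmax(dim=-1)[0]  # P(goal) vs. P(failure)
    return (probs[0] - probs[1]).item()

# Inside an ordinary RL loop, the semantic term is blended with the simulator reward:
# total_reward = env_reward + 0.1 * semantic_reward(obs["camera"])  # 0.1 is an assumed weight
```

Taking the softmax over one positive and one negative prompt keeps the shaping signal bounded, so it can be added to the simulator reward without destabilizing the underlying RL algorithm.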

Noteworthy Papers

  • VLM-RL: Introduces a framework combining VLMs with RL for semantic reward generation, significantly reducing collision rates and improving route completion.
  • CLIP-RLDrive: Utilizes CLIP-based reward shaping to align AV decisions with human preferences, enhancing decision-making in complex scenarios.
  • Autoware.Flex: Proposes a system that integrates human instructions into the autonomous driving system (ADS), improving the appropriateness and safety of its decisions.
  • Large Language Model guided Deep Reinforcement Learning: Presents a framework in which an LLM guides DRL, improving learning efficiency and decision-making performance (see the sketch below).
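
The sketch below shows one way an LLM's high-level suggestion can be folded into a DRL agent's action selection as a soft prior. The query_llm stub, action set, and guidance weight are hypothetical placeholders, not the cited paper's method.

```python
# Minimal sketch: LLM guidance as a log-prior over discrete driving actions.
# `query_llm`, the action set, and the guidance strength are assumptions.
import numpy as np

ACTIONS = ["keep_lane", "change_left", "change_right", "slow_down"]

def query_llm(scene_description: str) -> str:
    """Hypothetical LLM call: replace with a real API client.
    Expected to return one of the action names in ACTIONS."""
    return "slow_down"  # placeholder response

def llm_guidance_prior(scene_description: str, strength: float = 2.0) -> np.ndarray:
    """Convert the LLM's suggested maneuver into a log-prior over actions."""
    suggestion = query_llm(scene_description)
    prior = np.zeros(len(ACTIONS))
    if suggestion in ACTIONS:
        prior[ACTIONS.index(suggestion)] += strength
    return prior

def select_action(policy_logits: np.ndarray, scene_description: str) -> int:
    """Bias the DRL policy's logits with the LLM prior and sample an action."""
    guided = policy_logits + llm_guidance_prior(scene_description)
    probs = np.exp(guided - guided.max())
    probs /= probs.sum()
    return int(np.random.choice(len(ACTIONS), p=probs))

# Example: the policy slightly prefers keep_lane, but the LLM's suggestion to
# slow down (e.g., a pedestrian near a crosswalk) shifts the sampled choice.
logits = np.array([1.0, 0.2, 0.1, 0.8])
action = select_action(logits, "a pedestrian is waiting at the crosswalk ahead")
print(ACTIONS[action])
```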

Sources

VLM-RL: A Unified Vision Language Models and Reinforcement Learning Framework for Safe Autonomous Driving

CLIP-RLDrive: Human-Aligned Autonomous Driving via CLIP-Based Reward Shaping in Reinforcement Learning

Autoware.Flex: Human-Instructed Dynamically Reconfigurable Autonomous Driving Systems

Application of Multimodal Large Language Models in Autonomous Driving

Towards Selection and Transition Between Behavior-Based Neural Networks for Automated Driving

Environment Descriptions for Usability and Generalisation in Reinforcement Learning

Large Language Model guided Deep Reinforcement Learning for Decision Making in Autonomous Driving
