The field of autonomous driving is moving towards more comprehensive and integrated approaches, combining knowledge graphs, vision-language models, and multimodal editing to improve scene understanding, decision-making, and trajectory prediction. Researchers are exploring knowledge graphs that capture both sensory observations and domain knowledge, such as road topology and traffic rules, to better interpret complex driving scenes (a minimal sketch of this idea follows the list below). In parallel, fine-grained evaluation benchmarks are enabling more detailed assessment of vision-language models in autonomous driving contexts. Noteworthy papers include:
- FM4SU, which proposes a novel methodology for training a symbolic foundation model for scene understanding in autonomous driving, achieving a next-scene prediction accuracy of 86.7%.
- ORION, which presents a holistic end-to-end autonomous driving framework combining a QT-Former, a Large Language Model, and a generative planner to achieve strong closed-loop performance.
- VLADBench, which introduces a challenging, fine-grained dataset for evaluating vision-language models in autonomous driving, covering aspects such as traffic knowledge understanding and element recognition.
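
To make the knowledge-graph idea above concrete, the sketch below represents a driving scene as a small set of (subject, relation, object) triples mixing sensory observations with domain knowledge, and queries it for a rule-relevant fact. This is a minimal illustration only: the entity and relation names (ego, approaches, intersection_1, has_state, and so on) and the simple triple-store class are invented for this example and are not taken from any of the cited papers.

```python
class SceneGraph:
    """A tiny in-memory triple store over (subject, relation, object) facts."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, relation, obj):
        self.triples.add((subject, relation, obj))

    def query(self, subject=None, relation=None, obj=None):
        """Return all triples matching the given pattern (None acts as a wildcard)."""
        return [
            (s, r, o)
            for (s, r, o) in self.triples
            if (subject is None or s == subject)
            and (relation is None or r == relation)
            and (obj is None or o == obj)
        ]


scene = SceneGraph()

# Sensory observations from the current frame (illustrative).
scene.add("ego", "approaches", "intersection_1")
scene.add("pedestrian_3", "located_at", "crosswalk_1")
scene.add("traffic_light_1", "has_state", "red")

# Domain knowledge: road topology and a traffic rule (illustrative).
scene.add("intersection_1", "has_crosswalk", "crosswalk_1")
scene.add("intersection_1", "controlled_by", "traffic_light_1")
scene.add("rule_stop_on_red", "requires", "full_stop")

# A simple rule check: if the ego vehicle approaches an intersection
# whose controlling light is red, the graph implies a stop is required.
for _, _, intersection in scene.query("ego", "approaches"):
    for _, _, light in scene.query(intersection, "controlled_by"):
        if scene.query(light, "has_state", "red"):
            print(f"{intersection}: light {light} is red -> full_stop required")
```

In practice, work in this direction typically serializes such graphs with standard tooling (e.g. RDF) and feeds them, or sequences derived from them, to downstream models; the sketch only shows how heterogeneous observations and rules can live in one queryable structure.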