Advances in Robot Learning and Task Planning

The field of robotics is seeing rapid progress in robot learning and task planning, driven by large language models (LLMs), vision-language-action (VLA) models, and imitation learning. Researchers are pursuing new approaches to improve the efficiency, adaptability, and generalization of robotic systems, enabling them to perform complex tasks in diverse environments. Notable directions include coupling LLMs with the Planning Domain Definition Language (PDDL), using geometric fabrics for safe and stable imitation learning, and new frameworks for continual adaptation, visual data augmentation, and cross-task invariance. These advances stand to broaden robot capabilities across applications ranging from household tasks to industrial manufacturing.

Noteworthy papers include LLM+MAP, a bimanual planning framework that combines LLM reasoning with multi-agent planning, and TamedPUMA, an imitation learning algorithm augmented with geometric fabrics for safe and stable motion generation. Gemini Robotics and MoLe-VLA also report significant improvements: Gemini Robotics introduces a generalist vision-language-action model, while MoLe-VLA proposes a dynamic layer-skipping VLA model for efficient robot manipulation.
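The LLM-plus-PDDL pattern mentioned above can be sketched in a few lines: a language model translates a natural-language goal into a PDDL problem for a fixed domain, and an off-the-shelf symbolic planner searches for an action sequence. The sketch below is illustrative rather than drawn from any of the listed papers: `query_llm` is a stubbed stand-in for a real LLM API call, and the Fast Downward invocation assumes the planner is installed as `fast-downward.py`.

```python
# Minimal sketch of LLM-assisted PDDL task planning (assumptions noted above).
import subprocess
import tempfile
from pathlib import Path

# A fixed, hand-written PDDL domain for a toy bimanual tabletop setting.
DOMAIN_PDDL = """(define (domain bimanual-tabletop)
  (:requirements :strips :typing)
  (:types arm item)
  (:predicates (free ?a - arm) (holding ?a - arm ?o - item) (on-table ?o - item))
  (:action pick
    :parameters (?a - arm ?o - item)
    :precondition (and (free ?a) (on-table ?o))
    :effect (and (holding ?a ?o) (not (free ?a)) (not (on-table ?o)))))
"""

def query_llm(instruction: str) -> str:
    """Stub standing in for an LLM call that translates a natural-language
    goal into a PDDL problem file. Here it returns a canned problem."""
    return """(define (problem pick-cup)
  (:domain bimanual-tabletop)
  (:objects left right - arm cup - item)
  (:init (free left) (free right) (on-table cup))
  (:goal (holding left cup)))
"""

def plan(instruction: str) -> str:
    """Write domain and LLM-generated problem to disk, then run a planner."""
    problem_pddl = query_llm(instruction)
    with tempfile.TemporaryDirectory() as tmp:
        domain = Path(tmp, "domain.pddl")
        problem = Path(tmp, "problem.pddl")
        domain.write_text(DOMAIN_PDDL)
        problem.write_text(problem_pddl)
        # Hand both files to a classical planner (Fast Downward shown here);
        # the plan, if found, is printed to stdout.
        result = subprocess.run(
            ["fast-downward.py", str(domain), str(problem),
             "--search", "astar(blind())"],
            capture_output=True, text=True,
        )
        return result.stdout

if __name__ == "__main__":
    print(plan("Pick up the cup with the left arm."))
```

The appeal of this division of labor is that the LLM only has to produce a structured problem description, while plan correctness and ordering are delegated to a sound symbolic search, which is the general idea behind frameworks such as LLM+MAP.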

Sources

LLM+MAP: Bimanual Robot Task Planning using Large Language Models and Planning Domain Definition Language

TamedPUMA: Safe and Stable Imitation Learning with Geometric Fabrics

Efficient Continual Adaptation of Pretrained Robotic Policy with Online Meta-Learned Adapters

RoboEngine: Plug-and-Play Robot Data Augmentation with Semantic Robot Segmentation and Background Generation

AdaWorld: Learning Adaptable World Models with Latent Actions

CubeRobot: Grounding Language in Rubik's Cube Manipulation via Vision-Language Model

DataPlatter: Boosting Robotic Manipulation Generalization with Minimal Costly Data

Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy

Gemini Robotics: Bringing AI into the Physical World

MoLe-VLA: Dynamic Layer-skipping Vision Language Action Model via Mixture-of-Layers for Efficient Robot Manipulation

Offline Action-Free Learning of Ex-BMDPs by Comparing Diverse Datasets

OminiAdapt: Learning Cross-Task Invariance for Robust and Environment-Aware Robotic Manipulation

Cultivating Game Sense for Yourself: Making VLMs Gaming Experts

Cooking Task Planning using LLM and Verified by Graph Network

Data-Agnostic Robotic Long-Horizon Manipulation with Vision-Language-Guided Closed-Loop Feedback

CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models

REMAC: Self-Reflective and Self-Evolving Multi-Agent Collaboration for Long-Horizon Robot Manipulation

Robust Offline Imitation Learning Through State-level Trajectory Stitching

ActionStudio: A Lightweight Framework for Data and Training of Action Models