Robotic manipulation and human-object interaction research is advancing toward systems that can understand and execute complex tasks in unstructured environments. A key trend is the integration of vision-language models (VLMs) with robotic systems to bridge the gap between high-level reasoning and precise manipulation. Object-centric representations and interaction primitives make this possible by translating commonsense reasoning into actionable spatial constraints. Another notable development is the use of large language models and video-to-code synthesis for zero-shot policy code generation, which lets robots perform tasks from visual information and free-form instructions without task-specific training. There is also a growing focus on learning dynamic affordances from interactive exploration and from pre-trained video diffusion models, which improves robots' ability to understand and predict human-object interactions over time. Complementing these advances, a new affordance theory based on computational rationality redefines affordance perception as a dynamic decision-making process, offering insights into the design of adaptive and intuitive systems.
Noteworthy Papers
- OmniManip: Introduces a dual closed-loop system for robotic manipulation, leveraging object-centric interaction primitives for robust, real-time control without VLM fine-tuning.
- Robotic Programmer (RoboPro): Synthesizes executable policy code from videos and reports state-of-the-art zero-shot performance in robotic manipulation, improving on existing models.
- Learning Affordances from Interactive Exploration: Proposes an efficient exploration pipeline for learning robot-specific affordances, resulting in more accurate prediction models.
- DAViD: Models the dynamic affordances of 3D objects using pre-trained video diffusion models, outperforming baselines at generating human motion that involves human-object interaction.
- Redefining Affordance via Computational Rationality: Introduces an affordance theory that frames affordance perception as a dynamic decision-making process, validated through thought experiments and applicable across diverse contexts.