Advancements in Robotic Manipulation and Human-Object Interaction

The field of robotic manipulation and human-object interaction is advancing rapidly, particularly in systems that can understand and execute complex tasks in unstructured environments. A key trend is the integration of vision-language models (VLMs) with robotic systems to bridge high-level reasoning and precise manipulation, achieved through object-centric representations and interaction primitives that translate commonsense reasoning into actionable spatial constraints. Another notable development is the use of large language models and video-to-code synthesis for zero-shot policy code generation, enabling robots to perform tasks from visual input and free-form instructions without task-specific training. There is also a growing focus on learning dynamic affordances, both from interactive exploration and from pre-trained video diffusion models, strengthening robots' ability to understand and predict human-object interactions over time. These advances are complemented by a novel affordance theory based on computational rationality, which reframes affordance perception as a dynamic decision-making process and offers new insights into the design of adaptive, intuitive systems.
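As a concrete illustration of the object-centric idea, here is a minimal sketch, not taken from any of the papers below, of how an interaction primitive expressed in an object's frame (an interaction point plus an approach direction) could be turned into a spatial constraint on the end-effector pose. The function names and the fixed-standoff convention are illustrative assumptions.

```python
import numpy as np

def look_at_rotation(approach_dir: np.ndarray) -> np.ndarray:
    """Rotation matrix whose z-axis (gripper approach axis) aligns with approach_dir."""
    z = approach_dir / np.linalg.norm(approach_dir)
    # Any reference vector not parallel to z works for building an orthonormal frame.
    ref = np.array([0.0, 0.0, 1.0])
    if abs(ref @ z) > 0.99:
        ref = np.array([1.0, 0.0, 0.0])
    x = np.cross(ref, z)
    x /= np.linalg.norm(x)
    y = np.cross(z, x)
    return np.stack([x, y, z], axis=1)

def primitive_to_ee_pose(obj_pose: np.ndarray,
                         point_obj: np.ndarray,
                         dir_obj: np.ndarray,
                         standoff: float = 0.05) -> np.ndarray:
    """Turn an object-centric interaction primitive (interaction point and
    approach direction in the object frame) into a 4x4 end-effector target:
    approach along the primitive direction, stopping `standoff` meters short."""
    R, t = obj_pose[:3, :3], obj_pose[:3, 3]
    point_w = R @ point_obj + t              # interaction point in world frame
    dir_w = R @ dir_obj
    dir_w /= np.linalg.norm(dir_w)
    ee = np.eye(4)
    ee[:3, :3] = look_at_rotation(dir_w)
    ee[:3, 3] = point_w - standoff * dir_w   # back off along the approach axis
    return ee

# Example: a mug at (0.4, 0, 0.1) whose handle primitive points along object +x.
obj_pose = np.eye(4)
obj_pose[:3, 3] = [0.4, 0.0, 0.1]
target = primitive_to_ee_pose(obj_pose,
                              point_obj=np.array([0.06, 0.0, 0.05]),
                              dir_obj=np.array([1.0, 0.0, 0.0]))
print(target.round(3))
```

In a dual closed-loop system of the kind OmniManip describes, a target like this would be re-derived continuously as the object pose estimate updates; here it is computed once for clarity.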

Noteworthy Papers

  • OmniManip: Introduces a dual closed-loop system for robotic manipulation, leveraging object-centric interaction primitives for robust, real-time control without VLM fine-tuning.
  • Robotic Programmer (RoboPro): Achieves state-of-the-art zero-shot performance in robotic manipulation by synthesizing executable policy code from videos and free-form instructions, with significant improvements over existing models (see the sketch after this list).
  • Learning Affordances from Interactive Exploration: Proposes an efficient exploration pipeline for learning robot-specific affordances, resulting in more accurate prediction models.
  • DAViD: Models dynamic affordance of 3D objects using pre-trained video diffusion models, outperforming baselines in generating human motion with human-object interactions.
  • Redefining Affordance via Computational Rationality: Introduces a novel affordance theory that frames perception as a dynamic decision-making process, validated through thought experiments and applicable across diverse contexts.
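
To make the zero-shot policy-code-generation idea concrete, the following is a minimal sketch of the scaffolding such a system needs, not RoboPro's actual interface: a prompt is assembled from a free-form instruction, a video-derived caption, and a whitelisted skill API, and the model's returned code is executed in a namespace that exposes only those skills. All skill names (`move_to`, `grasp`, `release`, `detect`) are hypothetical.

```python
# Hypothetical skill API the generated policy code is allowed to call.
SKILL_API = [
    "move_to(pose) -> None: move the end-effector to a 6-DoF pose",
    "grasp() -> bool: close the gripper, return success",
    "release() -> None: open the gripper",
    "detect(name) -> pose | None: locate an object by name",
]

def build_prompt(instruction: str, video_caption: str) -> str:
    """Compose the code-generation prompt from the instruction, a caption
    of the reference video, and the skill API the code must stick to."""
    api_doc = "\n".join(f"- {sig}" for sig in SKILL_API)
    return (
        "You control a robot arm. Available skills:\n"
        f"{api_doc}\n"
        f"Video context: {video_caption}\n"
        f"Task: {instruction}\n"
        "Write a Python function policy() using only these skills."
    )

def run_policy(code: str, skills: dict) -> None:
    """Execute generated code in a namespace exposing only the whitelisted
    skills (a stand-in for a proper sandbox with timeouts and validation)."""
    namespace = dict(skills)
    exec(code, namespace)   # defines policy()
    namespace["policy"]()   # run it

# Usage with stub skills; a real system would bind these to the robot and
# send build_prompt(...) to a code-generation model instead of hard-coding.
stubs = {
    "move_to": lambda pose: print("move_to", pose),
    "grasp":   lambda: print("grasp") or True,
    "release": lambda: print("release"),
    "detect":  lambda name: print("detect", name) or (0.4, 0.0, 0.1),
}
generated = '''
def policy():
    pose = detect("mug")
    move_to(pose)
    grasp()
'''
run_policy(generated, stubs)
```

Restricting execution to a declared skill API is part of what makes the zero-shot setting tractable: the model needs no robot-specific training, only the contract between the prompt and the executor.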

Sources

OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints

Robotic Programmer: Video Instructed Policy Code Generation for Robotic Manipulation

Learning Affordances from Interactive Exploration using an Object-level Map

DAViD: Modeling Dynamic Affordance of 3D Objects using Pre-trained Video Diffusion Models

Redefining Affordance via Computational Rationality
