Advancements in Human-Object Interaction Synthesis

The field of human-object interaction (HOI) synthesis is evolving rapidly, with a focus on more realistic and physically plausible models. Recent research explores diffusion-based methods, vision-language models, and multimodal priors to improve the accuracy and diversity of synthesized interactions. These approaches enable the generation of more complex and realistic interactions between humans and objects, with applications in virtual reality, robotics, and animation. Several papers propose new frameworks for HOI synthesis, including autoregressive diffusion models and layout-instructed diffusion models. Notable papers:

  • Auto-Regressive Diffusion for Generating 3D Human-Object Interactions presents a novel autoregressive diffusion model that predicts the next continuous token in a sequence of human-object interactions.
  • Re-HOLD: Video Hand Object Interaction Reenactment via adaptive Layout-instructed Diffusion Model introduces a specialized layout representation for hands and objects, enabling effective disentanglement of hand modeling and object adaptation.

Sources

Auto-Regressive Diffusion for Generating 3D Human-Object Interactions

Re-HOLD: Video Hand Object Interaction Reenactment via adaptive Layout-instructed Diffusion Model

An Image-like Diffusion Method for Human-Object Interaction Detection

Human-Object Interaction with Vision-Language Model Guided Relative Movement Dynamics

HOIGPT: Learning Long Sequence Hand-Object Interaction with Language Models

Zero-Shot Human-Object Interaction Synthesis with Multimodal Priors

Guiding Human-Object Interactions with Rich Geometry and Relations

FLAM: Foundation Model-Based Body Stabilization for Humanoid Locomotion and Manipulation
