Advances in Human Pose and Motion Analysis

The field of human pose and motion analysis is advancing rapidly, driven by novel frameworks and models that enable more accurate and efficient analysis of human movement. One key direction is the use of multimodal large language models (MLLMs) to generate rich, structured pose transition descriptions, which improves annotation quality while reducing annotation cost. Another focus is efficient, explicit joint-level interaction modeling, which captures fine-grained joint-level interactions and yields more realistic human-object interactions (HOIs). Large-scale datasets such as HOIGen-1M are also playing a crucial role in advancing HOI video generation. In parallel, controllable video generation frameworks such as Any2Caption and SkyReels-A2 enable more precise control over generated video, opening up applications in areas such as drama and virtual e-commerce.

Noteworthy papers in this area include AutoComPose, which introduces a framework for automatically generating pose transition descriptions; MG-MotionLLM, which pioneers a unified framework for motion comprehension and generation across multiple granularities; and EJIM, which proposes an Efficient Explicit Joint-level Interaction Model for generating text-guided HOIs.

Sources

AutoComPose: Automatic Generation of Pose Transition Descriptions for Composed Pose Retrieval Using Multimodal LLMs

Efficient Explicit Joint-level Interaction Modeling with Mamba for Text-guided HOI Generation

VideoGen-Eval: Agent-based System for Video Generation Evaluation

HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation

Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation

AP-CAP: Advancing High-Quality Data Synthesis for Animal Pose Estimation via a Controllable Image Generation Pipeline

SkyReels-A2: Compose Anything in Video Diffusion Transformers

MG-MotionLLM: A Unified Framework for Motion Comprehension and Generation across Multiple Granularities
