Advances in Human-Centric Video Generation and Animation

The field of human-centric video generation and animation is advancing rapidly, with a focus on creating more realistic and engaging content. Recent work has centered on improving the quality and temporal coherence of generated videos, particularly in scenes involving multiple individuals and complex interactions. Researchers are applying techniques such as diffusion models and contrastive representation learning to capture facial expressions, lip movements, and body language with greater accuracy and nuance. Noteworthy papers in this area include Comprehensive Relighting, which introduces a generalizable model for monocular human relighting and harmonization, and DiTaiListener, which generates high-fidelity listener videos with controllable motion dynamics. These advances have significant implications for applications in education, entertainment, and human-computer interaction.
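To make the contrastive learning mentioned above concrete, the sketch below shows a standard InfoNCE-style loss, the kind of objective used to decouple representations (for example, separating expression from speech content). This is a minimal illustration under assumed names; the function, variables, and toy usage are not taken from any of the cited papers.

```python
# Minimal sketch of an InfoNCE-style contrastive loss (illustrative only;
# not the actual code of any paper cited in this digest).
import torch
import torch.nn.functional as F

def info_nce_loss(anchors: torch.Tensor,
                  positives: torch.Tensor,
                  temperature: float = 0.07) -> torch.Tensor:
    """anchors, positives: (batch, dim) embeddings. Row i of `positives`
    is the positive pair for row i of `anchors`; every other row serves
    as an in-batch negative."""
    a = F.normalize(anchors, dim=-1)
    p = F.normalize(positives, dim=-1)
    logits = a @ p.t() / temperature            # (batch, batch) cosine similarities
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)     # diagonal entries are the positives

# Toy usage: paired embeddings are pulled together, unpaired ones pushed apart.
if __name__ == "__main__":
    torch.manual_seed(0)
    x = torch.randn(8, 128)
    y = x + 0.1 * torch.randn(8, 128)           # noisy views of the same samples
    print(info_nce_loss(x, y).item())
```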
Sources
FluentLip: A Phonemes-Based Two-stage Approach for Audio-Driven Lip Synthesis with Optical Flow Consistency
Contrastive Decoupled Representation Learning and Regularization for Speech-Preserving Facial Expression Manipulation