Advances in Human-Centric Video Generation and Animation

The field of human-centric video generation and animation is advancing rapidly, with a focus on creating more realistic and engaging content. Recent work has centered on improving the quality and temporal coherence of generated videos, particularly in scenes involving multiple individuals and complex interactions. Researchers are applying techniques such as diffusion models and contrastive representation learning to capture facial expressions, lip movements, and body language with greater accuracy and nuance. Noteworthy papers in this area include Comprehensive Relighting, which introduces a generalizable model for monocular human relighting and harmonization, and DiTaiListener, which generates high-fidelity listener videos with controllable motion dynamics. These advances have significant implications for applications in education, entertainment, and human-computer interaction.
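To make the contrastive learning mentioned above concrete, the sketch below shows a standard InfoNCE-style loss, the kind of objective used to decouple representations (for example, separating expression from speech content). This is a minimal illustration under assumed names; the function, variables, and toy usage are not taken from any of the cited papers.

```python
# Minimal sketch of an InfoNCE-style contrastive loss (illustrative only;
# not the actual code of any paper cited in this digest).
import torch
import torch.nn.functional as F

def info_nce_loss(anchors: torch.Tensor,
                  positives: torch.Tensor,
                  temperature: float = 0.07) -> torch.Tensor:
    """anchors, positives: (batch, dim) embeddings. Row i of `positives`
    is the positive pair for row i of `anchors`; every other row serves
    as an in-batch negative."""
    a = F.normalize(anchors, dim=-1)
    p = F.normalize(positives, dim=-1)
    logits = a @ p.t() / temperature            # (batch, batch) cosine similarities
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)     # diagonal entries are the positives

# Toy usage: paired embeddings are pulled together, unpaired ones pushed apart.
if __name__ == "__main__":
    torch.manual_seed(0)
    x = torch.randn(8, 128)
    y = x + 0.1 * torch.randn(8, 128)           # noisy views of the same samples
    print(info_nce_loss(x, y).item())
```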
Sources
FluentLip: A Phonemes-Based Two-stage Approach for Audio-Driven Lip Synthesis with Optical Flow Consistency
Contrastive Decoupled Representation Learning and Regularization for Speech-Preserving Facial Expression Manipulation