Advances in Real-Time Talking Head Synthesis

Talking head synthesis is advancing rapidly, with a focus on realistic, efficient generation of 3D talking heads in real time. Recent work has brought significant improvements in lip synchronization, facial expression control, and overall perceptual accuracy. Researchers are exploring novel approaches such as autoregressive motion generation, semantic disentanglement, and audio-dependent plane decomposition to improve both the quality and the efficiency of synthesis. These advances could enable more realistic and engaging virtual communication, with applications in e-commerce, education, and entertainment. Noteworthy papers include TaoAvatar, which achieves state-of-the-art rendering quality while running in real time across various devices, and DiffusionTalker, which proposes a personalizer-guided distillation approach to make speech-driven 3D talking head generation more efficient and compact.
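To make the autoregressive idea concrete, here is a minimal toy sketch of streaming audio-to-motion generation: each motion frame is predicted from the current audio feature and the previous motion frame, so output can be emitted as audio arrives. All names, shapes, and the random linear predictor are illustrative assumptions, not taken from any of the papers listed below.

```python
import numpy as np

rng = np.random.default_rng(0)

AUDIO_DIM, MOTION_DIM = 8, 4
# Stand-in for a learned predictor: fixed random linear maps.
W_audio = rng.normal(scale=0.1, size=(AUDIO_DIM, MOTION_DIM))
W_prev = rng.normal(scale=0.1, size=(MOTION_DIM, MOTION_DIM))

def step(audio_feat, prev_motion):
    """Predict the next motion frame from one audio frame and the last motion."""
    return np.tanh(audio_feat @ W_audio + prev_motion @ W_prev)

def generate(audio_stream):
    """Autoregressively roll out motion frames for a stream of audio features."""
    motion = np.zeros(MOTION_DIM)        # neutral starting pose
    frames = []
    for audio_feat in audio_stream:      # one audio frame in, one motion frame out
        motion = step(audio_feat, motion)
        frames.append(motion)
    return np.stack(frames)

audio = rng.normal(size=(16, AUDIO_DIM))  # 16 frames of toy audio features
frames = generate(audio)
print(frames.shape)
```

The key property, shared by real streaming systems, is that the loop never looks ahead in the audio stream, so latency is bounded by one frame of feature extraction and one predictor call.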

Sources

TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting

DiffusionTalker: Efficient and Compact Speech-Driven 3D Talking Head via Personalizer-Guided Distillation

Teller: Real-Time Streaming Audio-Driven Portrait Animation with Autoregressive Motion Generation

DisentTalk: Cross-lingual Talking Face Generation via Semantic Disentangled Diffusion Model

Perceptually Accurate 3D Talking Head Generation: New Definitions, Speech-Mesh Representation, and Evaluation Metrics

Audio-Plane: Audio Factorization Plane Gaussian Splatting for Real-Time Talking Head Synthesis
