Computer Vision and Graphics for Human and Animal Avatars

Report on Current Developments in Computer Vision and Graphics for Human and Animal Avatars

General Direction of the Field

Recent work in computer vision and graphics has focused heavily on models for creating and manipulating 3D avatars of humans and animals. The trend is driven by growing demand for realistic, interactive avatars in applications such as virtual reality, gaming, and telepresence. The field is moving towards more detailed and consistent 3D reconstructions, built on machine learning techniques and large-scale datasets.

One key area of innovation is the use of Gaussian Splatting and contrastive learning to build 3D-consistent human avatars from sparse inputs. This approach improves both reconstruction quality and rendering efficiency, making it suitable for real-time applications. There is also a growing emphasis on semantic-guided methods that use human body semantic information to reconstruct the fine details of dynamic avatars.

Another significant development is the integration of synthetic data and diffusion models to improve the robustness and generalization of 3D pose estimation and action recognition. These methods are particularly useful in scenarios where real-world data is scarce or difficult to obtain, such as in wildlife monitoring and egocentric hand pose estimation.

Noteworthy Innovations

HaSPeR: An Image Repository for Hand Shadow Puppet Recognition - This pioneering dataset and research effort aims to preserve the dying art of hand shadow puppetry using computer vision. Its findings indicate that traditional convolutional models outperform attention-based transformer architectures in this domain.
CHASE: 3D-Consistent Human Avatars with Sparse Inputs via Gaussian Splatting and Contrastive Learning - CHASE introduces innovative methods to maintain 3D consistency and improve detail reconstruction with sparse inputs, outperforming current state-of-the-art methods in both full and sparse settings.
SG-GS: Photo-realistic Animatable Human Avatars with Semantically-Guided Gaussian Splatting - SG-GS leverages semantics-embedded 3D Gaussians to create photo-realistic animatable human avatars from monocular videos, achieving state-of-the-art geometry and appearance reconstruction performance.
SHARP: Segmentation of Hands and Arms by Range using Pseudo-Depth for Enhanced Egocentric 3D Hand Pose Estimation and Action Recognition - SHARP introduces a novel method for improving egocentric 3D hand pose estimation using pseudo-depth images, resulting in high accuracy in action recognition tasks.
DEGAS: Detailed Expressions on Full-Body Gaussian Avatars - DEGAS is the first method to incorporate detailed expressions into full-body avatars using 3D Gaussian Splatting, bridging the gap between 2D talking faces and 3D avatars.
ZebraPose: Zebra Detection and Pose Estimation using only Synthetic Data - This work demonstrates the potential of synthetic data for animal pose estimation, showing consistent generalization to real-world images of zebras and horses.
Multi-view Hand Reconstruction with a Point-Embedded Transformer - POEM introduces a multi-view hand mesh reconstruction model that leverages embedded basis points and diverse camera configurations for practical real-world applications.
MPL: Lifting 3D Human Pose from Multi-view 2D Poses - MPL combines 2D pose estimation with 2D-to-3D pose lifting using a transformer-based network, significantly reducing the mean per-joint position error (MPJPE) of 3D pose estimation.
HumanCoser: Layered 3D Human Generation via Semantic-Aware Diffusion Model - This method achieves state-of-the-art layered 3D human generation with complex clothing, supporting virtual try-on and layered human animation.
SynPlay: Importing Real-world Diversity for a Synthetic Human Dataset - SynPlay enhances human detection and segmentation accuracy by incorporating realistic human motions and multiple camera viewpoints, demonstrating its effectiveness in data-scarce regimes.
ZipGait: Bridging Skeleton and Silhouette with Diffusion Model for Advancing Gait Recognition - ZipGait advances gait recognition by reconstructing dense body shapes from discrete skeleton distributions, outperforming state-of-the-art methods in cross-domain and intra-domain settings.
VTON-HandFit: Virtual Try-on for Arbitrary Hand Pose Guided by Hand Priors Embedding - VTON-HandFit addresses hand occlusion issues in virtual try-on by leveraging hand priors, significantly improving try-on performance in real-world scenarios.
Sapiens: Foundation for Human Vision Models - Sapiens provides a family of models for fundamental
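
The MPL entry above reports its gains in terms of MPJPE, the mean per-joint position error. As a rough illustration of what that metric measures, the sketch below averages the Euclidean distance between predicted and ground-truth joints for a single pose. The function name, the three-joint pose, and the millimetre units are assumptions made for the example, not details taken from any of the papers listed. Evaluation protocols often align the poses (for example at a root joint) before measuring, and this sketch skips that step.

```python
import math
from typing import Sequence

Point3D = Sequence[float]


def mpjpe(predicted: Sequence[Point3D], ground_truth: Sequence[Point3D]) -> float:
    """Average Euclidean distance between matching joints of one pose."""
    if len(predicted) != len(ground_truth) or len(predicted) == 0:
        raise ValueError("poses must be non-empty and have the same number of joints")
    # math.dist computes the Euclidean distance between two points.
    total = sum(math.dist(p, g) for p, g in zip(predicted, ground_truth))
    return total / len(predicted)


# Hypothetical three-joint pose, coordinates in millimetres (illustrative only).
gt = [(0.0, 0.0, 0.0), (0.0, 100.0, 0.0), (0.0, 200.0, 0.0)]
pred = [(3.0, 4.0, 0.0), (0.0, 100.0, 0.0), (0.0, 200.0, 12.0)]

# Per-joint errors are 5, 0 and 12 mm, so the mean is 17 / 3 mm.
error_mm = mpjpe(pred, gt)
```

Lower values mean the predicted skeleton sits closer to the reference one. Benchmark results are normally averaged over every frame of a test set rather than computed for one pose.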
