Advancements in Virtual Try-On and Human Animation

Virtual try-on and human animation are advancing rapidly, driven by large multimodal models (LMMs) and diffusion models that improve the quality, controllability, and versatility of generated content. Current work concentrates on three fronts: richer text-based editability, robustness to large viewpoint changes, and lightweight yet effective model architectures. Together these yield more realistic, detailed try-on results and more dynamic, adaptable animations. LMMs in particular supply the detailed garment descriptions that keep outputs consistent across diverse scenarios, while advances in vision-language models (VLMs) are pushing garment estimation, generation, and editing toward interactive, automated workflows.
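
To make the text-conditioned diffusion mechanism concrete, here is a minimal sketch using Hugging Face's off-the-shelf Stable Diffusion inpainting pipeline: the garment region is masked out and regenerated from a clothing description. This is not the pipeline of any paper below (PromptDresser's prompt-aware masking is more involved); the model name, prompt, and file paths are placeholders.

```python
# Minimal sketch of text-prompt-conditioned garment editing via diffusion
# inpainting. This is NOT PromptDresser's method; it only illustrates the
# general mechanism of mask-guided, text-driven image editing.
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",  # generic inpainting model
    torch_dtype=torch.float16,
).to("cuda")

person = Image.open("person.png").convert("RGB")   # placeholder input image
garment_mask = Image.open("upper_body_mask.png")   # white = region to redraw

result = pipe(
    prompt="a fitted navy linen blazer, natural drape, studio lighting",
    image=person,
    mask_image=garment_mask,
).images[0]
result.save("tryon_result.png")
```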

Noteworthy Papers

  • PromptDresser: Enhances virtual try-on by leveraging detailed text prompts for high-quality and versatile clothing manipulation.
  • Free-viewpoint Human Animation: Introduces a pose-correlated reference selection strategy to support substantial viewpoint variations in human animation (a toy illustration of pose-based reference ranking follows this list).
  • DreamFit: Offers a lightweight solution for garment-centric human generation, ensuring high-quality results across diverse scenarios.
  • ChatGarment: Automates garment estimation, generation, and editing through interactive dialogue, leveraging large vision-language models.
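
As promised above, the sketch below conveys the core idea behind pose-correlated reference selection: rank candidate reference frames by how closely their 2D keypoints match the target pose, and keep the top-k. The paper's actual strategy is learned and more sophisticated; all inputs here are synthetic placeholders.

```python
# Toy sketch of pose-correlated reference selection. NOT the paper's method;
# it only illustrates ranking references by pose similarity to the target.
import numpy as np

def normalize_pose(kpts: np.ndarray) -> np.ndarray:
    """Center keypoints and scale to unit norm so that similarity is
    invariant to translation and scale. kpts has shape (num_joints, 2)."""
    centered = kpts - kpts.mean(axis=0)
    return centered / (np.linalg.norm(centered) + 1e-8)

def select_references(target_kpts, reference_kpts, k=3):
    """Return indices of the k references whose poses best match the target."""
    t = normalize_pose(target_kpts).ravel()
    sims = [float(t @ normalize_pose(ref).ravel())  # cosine similarity
            for ref in reference_kpts]
    return np.argsort(sims)[::-1][:k].tolist()

# Synthetic demo: 10 candidate frames, 17 COCO-style joints each.
rng = np.random.default_rng(0)
target = rng.normal(size=(17, 2))
candidates = [target + rng.normal(scale=s, size=(17, 2))
              for s in np.linspace(0.05, 1.0, 10)]
print(select_references(target, candidates, k=3))  # closest poses rank first
```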

Sources

PromptDresser: Improving the Quality and Controllability of Virtual Try-On via Generative Textual Prompt and Prompt-aware Mask

Free-viewpoint Human Animation with Pose-correlated Reference Selection

DreamFit: Garment-Centric Human Generation via a Lightweight Anything-Dressing Encoder

ChatGarment: Garment Estimation, Generation and Editing via Large Language Models
