The field of virtual try-on and human animation is advancing rapidly, with a clear trend toward leveraging large multimodal models (LMMs) and diffusion models to improve the quality, controllability, and versatility of generated content. Current work concentrates on three fronts: text editability, robustness to large viewpoint changes, and lightweight yet effective model architectures. These advances enable more realistic and detailed virtual try-on, as well as more dynamic and adaptable human animation. LMMs are proving crucial for generating detailed garment descriptions and maintaining output quality across diverse scenarios, while progress in vision-language models (VLMs) is driving a shift toward interactive, automated garment estimation, generation, and editing; a minimal sketch of this shared pipeline follows the paper list below.
## Noteworthy Papers
- PromptDresser: Enhances virtual try-on by leveraging detailed text prompts for high-quality and versatile clothing manipulation.
- Free-viewpoint Human Animation: Introduces a pose-correlated reference selection strategy that supports substantial viewpoint variations.
- DreamFit: Offers a lightweight architecture for garment-centric human generation that delivers high-quality results across diverse scenarios.
- ChatGarment: Automates garment estimation, generation, and editing through interactive dialogue, leveraging large vision-language models.
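Several of these works share the same high-level pipeline: a vision-language model produces a detailed garment description, which then conditions a diffusion-based try-on generator. The following is a minimal, hypothetical sketch of that pattern; every name in it (`GarmentPrompt`, `describe_garment`, `generate_try_on`) is an illustrative placeholder rather than an API from PromptDresser, DreamFit, or ChatGarment, and the stubs return fixed values so the sketch runs end to end.

```python
# Hypothetical sketch of the shared pattern: a vision-language model (VLM)
# captions the garment in detail, and that text conditions a diffusion-based
# try-on generator. All names below are illustrative placeholders, not APIs
# from any of the papers listed above.

from dataclasses import dataclass


@dataclass
class GarmentPrompt:
    """Structured garment description produced by the VLM stage."""
    category: str          # e.g. "denim jacket"
    attributes: list[str]  # e.g. ["cropped", "light wash", "oversized fit"]

    def to_text(self) -> str:
        # Flatten the structured description into a text prompt
        # suitable for conditioning a text-to-image generator.
        return f"a {', '.join(self.attributes)} {self.category}"


def describe_garment(image_path: str) -> GarmentPrompt:
    """Placeholder for a VLM call that captions the garment image.

    In practice this would query a large vision-language model; here it
    returns a fixed example so the sketch is runnable.
    """
    return GarmentPrompt(
        category="denim jacket",
        attributes=["cropped", "light wash", "oversized fit"],
    )


def generate_try_on(person_image: str, prompt: str) -> str:
    """Placeholder for a text-conditioned diffusion try-on model."""
    return f"<generated image of {person_image!r} wearing {prompt}>"


if __name__ == "__main__":
    # Stage 1: VLM turns the garment image into a detailed text prompt.
    prompt = describe_garment("garment.jpg").to_text()
    # Stage 2: the prompt conditions the try-on generator.
    print(generate_try_on("person.jpg", prompt))
```

Keeping the description stage separate from generation mirrors the trend above: the same structured prompt can drive editing, regeneration, or interactive refinement in dialogue without retraining the generator itself.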