Human-Centric and Efficient Generative AI

Recent advances in generative models, particularly in the text-to-image and text-to-video domains, reflect a shift toward more human-centric and efficient approaches. Researchers are increasingly focusing on aligning model outputs with human preferences, optimizing for perceptual quality, and improving computational efficiency without compromising the quality of generated content. Techniques gaining traction include fine-tuning with human feedback, interpretable intermediate representations, and adaptive diffusion models that allocate computational steps based on perceptual metrics. There is also a notable push toward models that are not only high-performing but lightweight enough for deployment on resource-constrained devices such as mobile phones. Collectively, these developments point toward more sustainable, user-friendly, and human-aligned generative AI systems.
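To make the first of these techniques concrete, below is a minimal sketch of reward-weighted fine-tuning in PyTorch: a scalar reward model scores samples, and those scores re-weight a standard denoising loss so that high-preference samples dominate the gradient update. Everything here (`TinyDenoiser`, `TinyRewardModel`, the linear noising schedule) is an illustrative stand-in under assumed names, not the method of any paper listed under Sources.

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-ins: in practice these would be a pretrained
# text-to-video diffusion model and a human-feedback reward model.
class TinyDenoiser(torch.nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim, 128), torch.nn.ReLU(), torch.nn.Linear(128, dim)
        )

    def forward(self, x_noisy, t):
        return self.net(x_noisy)  # predicts the added noise

class TinyRewardModel(torch.nn.Module):
    """Maps a sample to a scalar human-preference score."""
    def __init__(self, dim=64):
        super().__init__()
        self.head = torch.nn.Linear(dim, 1)

    def forward(self, x):
        return self.head(x).squeeze(-1)

def reward_weighted_step(model, reward_model, x0, optimizer):
    """One fine-tuning step: a standard denoising loss, re-weighted per
    sample by a softmax over preference scores. Here the reward model
    scores the clean samples x0 as a stand-in for scoring generations."""
    noise = torch.randn_like(x0)
    t = torch.rand(x0.size(0), 1)
    x_noisy = (1 - t) * x0 + t * noise              # simple linear noising
    pred = model(x_noisy, t)
    per_sample = F.mse_loss(pred, noise, reduction="none").mean(dim=-1)
    with torch.no_grad():
        w = torch.softmax(reward_model(x0), dim=0)  # preference weights
    loss = (w * per_sample).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

model, rm = TinyDenoiser(), TinyRewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
print(reward_weighted_step(model, rm, torch.randn(8, 64), opt))
```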

Noteworthy contributions include a method for fine-tuning text-to-video models with human feedback to better match human expectations (LiFT), a scene-layout generation approach offering fine-grained control and interpretability (SLayR), and a perceptually guided adaptive diffusion model that adapts its computation to improve efficiency (BudgetFusion). In addition, a framework for aligning and evaluating multi-view diffusion models against human preferences (MVReward) has been introduced, alongside a high-resolution text-to-image model optimized for mobile devices (SnapGen).
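As an illustration of perceptually guided adaptive sampling, the sketch below stops a toy denoising loop once successive iterates change by less than a threshold under a simple perceptual proxy (mean absolute change; a real system might use LPIPS or a just-noticeable-difference predictor). The update rule, the proxy, and the tolerance are assumptions chosen for brevity and do not reproduce BudgetFusion's actual criterion.

```python
import torch

def adaptive_denoise(model, x_T, max_steps=50, tol=1e-3):
    """Run an iterative denoiser, stopping early once successive
    iterates differ by less than `tol` under a perceptual-change proxy,
    so easy samples consume fewer steps than hard ones."""
    x = x_T
    for step in range(max_steps):
        x_next = x - 0.1 * model(x)          # one denoising update
        change = (x_next - x).abs().mean()   # perceptual-change proxy
        x = x_next
        if change < tol:                     # imperceptible change: stop
            return x, step + 1
    return x, max_steps

# Toy "denoiser": pulls samples toward zero, standing in for a trained net.
model = lambda x: x
x0, used = adaptive_denoise(model, torch.randn(1, 3, 8, 8))
print(f"stopped after {used} of 50 steps")
```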

Sources

LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment

SLayR: Scene Layout Generation with Rectified Flow

BudgetFusion: Perceptually-Guided Adaptive Diffusion Models

MVReward: Better Aligning and Evaluating Multi-View Diffusion Models with Human Preferences

SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training
