Advancements in Text-to-Image Generation: Fidelity, Personalization, and Control

Text-to-image generation is advancing rapidly, with a clear trend toward greater fidelity, personalization, and control over generated images. Recent work focuses on preserving subject identity, enabling more precise control over layouts and attributes, and tailoring generation to individual preferences. Innovations pair advanced diffusion models with new techniques for layout generation, background painting, and preference fine-tuning, addressing the challenges of maintaining subject fidelity, harmonizing composited images, and capturing nuanced user preferences. There is also growing interest in applying these methods to specialized domains, such as literary works and robotic tasks, further broadening the applicability and impact of text-to-image technology.

Noteworthy papers include:

  • SceneBooth: Introduces a novel framework for subject-preserved text-to-image generation, significantly outperforming baseline methods in subject preservation and image harmonization.
  • 3DIS-FLUX: Extends the 3DIS framework with the FLUX model for enhanced rendering capabilities, surpassing current state-of-the-art methods in performance and image quality.
  • PersonaHOI: A training- and tuning-free framework that improves personalized face generation with human-object interaction, setting a new standard for practical personalized face generation.
  • Poetry in Pixels: Proposes a PoemToPixel framework for generating images that visually represent the inherent meanings of poems, offering a fresh perspective on literary image generation.
  • Personalized Preference Fine-tuning of Diffusion Models: Introduces PPD, a multi-reward optimization objective that aligns diffusion models with personalized preferences, enabling generalization to unseen users (a rough sketch of such an objective follows this list).
  • Enhancing Image Generation Fidelity via Progressive Prompts: Develops a coarse-to-fine pipeline for regional prompt-following generation, enhancing the controllability of DiT-based image generation.
  • FDPP: Proposes fine-tuning a diffusion policy with human preferences, effectively customizing policy behavior without compromising performance.
  • SHYI: Addresses infidelity in text-to-image generation for actions involving multiple objects, showing promising results with enhanced contrastive learning techniques.
  • ObjectDiffusion: Presents a model that conditions T2I models on bounding boxes, demonstrating remarkable grounding abilities across various contexts.
  • AnyStory: Proposes a unified approach for personalized subject generation, achieving high-fidelity personalization for both single and multiple subjects.
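The PPD entry above mentions a multi-reward objective for aligning diffusion models with individual preferences. As a rough illustration only, and not the paper's actual formulation, the sketch below shows one plausible shape for such an objective: a DPO-style preference loss in which a per-user weighting combines several reward signals into a single margin. The function name, its arguments, and the use of scalar log-probabilities are all hypothetical.

```python
import torch
import torch.nn.functional as F

def personalized_dpo_loss(
    logp_pref: torch.Tensor,        # policy log-prob of the preferred sample
    logp_dispref: torch.Tensor,     # policy log-prob of the dispreferred sample
    ref_logp_pref: torch.Tensor,    # same quantities under a frozen reference model
    ref_logp_dispref: torch.Tensor,
    reward_margins: torch.Tensor,   # per-dimension preference margins, shape (K,)
    user_weights: torch.Tensor,     # this user's weights over K reward dimensions, shape (K,)
    beta: float = 0.1,
) -> torch.Tensor:
    """Hypothetical DPO-style loss combining K reward signals per user."""
    # Collapse the K reward dimensions into one scalar margin for this user.
    margin = (user_weights * reward_margins).sum()
    # Policy-vs-reference log-ratio difference, as in standard DPO.
    logits = beta * ((logp_pref - ref_logp_pref) - (logp_dispref - ref_logp_dispref))
    # Maximize the probability that the preferred sample wins by the margin.
    return -F.logsigmoid(logits + margin)

# Toy usage with scalar log-probabilities and K = 3 reward dimensions.
loss = personalized_dpo_loss(
    logp_pref=torch.tensor(-4.0),
    logp_dispref=torch.tensor(-5.0),
    ref_logp_pref=torch.tensor(-4.2),
    ref_logp_dispref=torch.tensor(-4.8),
    reward_margins=torch.tensor([0.5, 0.1, 0.3]),
    user_weights=torch.tensor([0.6, 0.3, 0.1]),
)
print(loss)  # a scalar tensor
```

In practice, diffusion preference-tuning methods such as Diffusion-DPO approximate these log-probabilities from denoising losses over sampled timesteps rather than exact likelihoods, and per-user weights would be learned or inferred from a small number of labeled preference pairs.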

Sources

SceneBooth: Diffusion-based Framework for Subject-preserved Text-to-Image Generation

3DIS-FLUX: simple and efficient multi-instance generation with DiT rendering

PersonaHOI: Effortlessly Improving Personalized Face with Human-Object Interaction Generation

Poetry in Pixels: Prompt Tuning for Poem Image Generation via Diffusion Models

Personalized Preference Fine-tuning of Diffusion Models

Enhancing Image Generation Fidelity via Progressive Prompts

FDPP: Fine-tune Diffusion Policy with Human Preference

SHYI: Action Support for Contrastive Learning in High-Fidelity Text-to-Image Generation

Grounding Text-To-Image Diffusion Models For Controlled High-Quality Image Generation

AnyStory: Towards Unified Single and Multiple Subject Personalization in Text-to-Image Generation
