Controllable and Efficient Diffusion Models for Image Synthesis and Editing

Recent advances in diffusion models have significantly pushed the boundaries of image synthesis and editing. A notable trend is the integration of semantic understanding and control into the diffusion process, enabling more precise and diverse image generation. Innovations such as diffusion-negative sampling and lifelong few-shot customization are narrowing the semantic gap between human intent and machine understanding, making image synthesis more intuitive and effective. In parallel, scalable, tokenization-free architectures are making diffusion models practical for on-device applications by addressing the computational costs of conventional designs. Image editing capabilities are also advancing rapidly: models such as SeedEdit and OmniEdit offer sophisticated, instruction-guided editing that balances fidelity to the original image with creative freedom. Training-free object insertion methods and flexible generative perception error models for autonomous driving further illustrate the versatility and practical reach of these models (a minimal sketch of the underlying negative-prompt guidance mechanism follows below). Collectively, these developments signal a shift toward more controllable, efficient, and contextually aware diffusion models, paving the way for broader real-world deployment and deeper semantic interaction between humans and machines in image synthesis and editing.
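For context, negative-prompt guidance in diffusion samplers is commonly implemented as classifier-free guidance with a negative prompt standing in for the unconditional embedding; work like diffusion-negative sampling builds on this mechanism. The sketch below illustrates that generic guidance step only, under stated assumptions: `unet`, the embedding tensors, and the function name are hypothetical stand-ins, not the cited paper's implementation or any specific library's API.

```python
import torch

def guided_noise_pred(unet, x_t, t, pos_emb, neg_emb, guidance_scale=7.5):
    """One denoising step's noise prediction with negative-prompt guidance.

    Standard classifier-free guidance update,
        eps = eps_neg + s * (eps_pos - eps_neg),
    where the negative-prompt embedding replaces the usual empty
    (unconditional) prompt, steering samples away from its content.
    `unet` is a hypothetical noise-prediction network: it takes a noisy
    latent, a timestep, and a text embedding, and returns predicted noise.
    """
    # Batch the two conditionings into a single forward pass.
    x_in = torch.cat([x_t, x_t], dim=0)
    emb_in = torch.cat([neg_emb, pos_emb], dim=0)
    eps_neg, eps_pos = unet(x_in, t, emb_in).chunk(2, dim=0)
    # Larger guidance_scale pushes harder toward the positive prompt
    # and away from the negative one, at some cost to sample diversity.
    return eps_neg + guidance_scale * (eps_pos - eps_neg)
```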

Sources

Improving image synthesis with diffusion-negative sampling

Towards Lifelong Few-Shot Customization of Text-to-Image Diffusion

Scalable, Tokenization-Free Diffusion Model Architectures with Efficient Initial Convolution and Fixed-Size Reusable Structures for On-Device Image Generation

SeedEdit: Align Image Re-Generation to Image Editing

OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision

Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models

Score-based generative diffusion with "active" correlated noise sources

Tracing the Roots: Leveraging Temporal Dynamics in Diffusion Trajectories for Origin Attribution

MureObjectStitch: Multi-reference Image Composition

Leveraging Previous Steps: A Training-free Fast Solver for Flow Diffusion

EMPERROR: A Flexible Generative Perception Error Model for Probing Self-Driving Planners

Latent Space Disentanglement in Diffusion Transformers Enables Precise Zero-shot Semantic Editing

Advancing Diffusion Models: Alias-Free Resampling and Enhanced Rotational Equivariance

DiffRoad: Realistic and Diverse Road Scenario Generation for Autonomous Vehicle Testing

Golden Noise for Diffusion Models: A Learning Framework

MagicQuill: An Intelligent Interactive Image Editing System
