Advancing Diffusion Models: Efficiency, Control, and Versatility

Recent advances in diffusion models have substantially expanded what is possible in image generation and editing. Current research focuses on improving the efficiency and quality of these models, with notable progress in model customization, personalization, and adversarial techniques such as distillation and concept erasure. Key directions include mitigating unintended alterations (forgetting) when models are adapted or customized, improving control over foreground objects in generated images, and dynamically adjusting guidance during the sampling process. There is also growing interest in the theoretical underpinnings of diffusion models, including the analysis of Wasserstein convergence for rectified flows and training-free guidance framed as optimal control. Together, these developments improve the performance of existing models and open new applications in art generation, style transfer, and personalized content creation. Frameworks such as Group Diffusion Transformers and zero-shot style-specific image variations point toward more scalable and versatile generative models, making diffusion approaches more robust, efficient, and adaptable across a wide range of tasks.
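As a rough illustration of the "dynamically adjusting guidance" theme mentioned above (see, e.g., Dynamic Negative Guidance of Diffusion Models), the minimal sketch below shows a toy sampling loop in which the classifier-free guidance weight changes per step instead of staying fixed. The `denoiser` callable, the linear weight schedule, and the simplified update rule are hypothetical placeholders for illustration only, not any paper's actual method.

```python
import numpy as np

def dynamic_guidance_scale(t, t_max, w_min=1.0, w_max=7.5):
    """Hypothetical schedule: stronger guidance at high noise levels,
    tapering off toward the end of sampling."""
    return w_min + (w_max - w_min) * (t / t_max)

def sample(denoiser, shape, t_max=50, rng=None):
    """Toy sampling loop with a time-varying classifier-free guidance weight.

    `denoiser(x, t, cond)` is assumed to predict the noise for conditional
    (cond=True) or unconditional (cond=False) generation; it is a stand-in,
    not a real library API.
    """
    rng = rng or np.random.default_rng(0)
    x = rng.standard_normal(shape)
    for t in range(t_max, 0, -1):
        w = dynamic_guidance_scale(t, t_max)
        eps_cond = denoiser(x, t, cond=True)
        eps_uncond = denoiser(x, t, cond=False)
        # Classifier-free guidance with a per-step weight w.
        eps = eps_uncond + w * (eps_cond - eps_uncond)
        # Simplified update; a real sampler would use the noise schedule's
        # alphas/sigmas here.
        x = x - eps / t_max + np.sqrt(1.0 / t_max) * rng.standard_normal(shape)
    return x
```

The point of the sketch is only that the guidance weight is recomputed inside the loop; methods like dynamic negative guidance replace this fixed schedule with a signal estimated during generation.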

Sources

Assessing Open-world Forgetting in Generative Image Model Customization

Personalized Image Generation with Large Multimodal Models

HYPNOS : Highly Precise Foreground-focused Diffusion Finetuning for Inanimate Objects

Dynamic Negative Guidance of Diffusion Models

Parallel Backpropagation for Inverse of a Convolution with Application to Normalizing Flows

Mitigating Embedding Collapse in Diffusion Models for Categorical Data

Straightness of Rectified Flow: A Theoretical Insight into Wasserstein Convergence

Adversarial Score identity Distillation: Rapidly Surpassing the Teacher in One Step

Truncated Consistency Models

Group Diffusion Transformers are Unsupervised Multitask Learners

DiffuseST: Unleashing the Capability of the Diffusion Model for Style Transfer

ConSinger: Efficient High-Fidelity Singing Voice Generation with Minimal Steps

Erasing Undesirable Concepts in Diffusion Models with Adversarial Preservation

DeepIcon: A Hierarchical Network for Layer-wise Icon Vectorization

Learning to Synthesize Graphics Programs for Geometric Artworks

Distribution Learning with Valid Outputs Beyond the Worst-Case

AttentionPainter: An Efficient and Adaptive Stroke Predictor for Scene Painting

Conjuring Semantic Similarity

Annotation-Free MIDI-to-Audio Synthesis via Concatenative Synthesis and Generative Refinement

One-Step Diffusion Distillation through Score Implicit Matching

MPDS: A Movie Posters Dataset for Image Generation with Diffusion Model

DiP-GO: A Diffusion Pruner via Few-step Gradient Optimization

How to Continually Adapt Text-to-Image Diffusion Models for Flexible Customization?

Lower Bounds for Convexity Testing

Scalable Ranked Preference Optimization for Text-to-Image Generation

Training Free Guided Flow Matching with Optimal Control

Beyond Color and Lines: Zero-Shot Style-Specific Image Variations with Coordinated Semantics

Wavetable Synthesis Using CVAE for Timbre Control Based on Semantic Label

Rectified Diffusion Guidance for Conditional Generation

Schedule Your Edit: A Simple yet Effective Diffusion Noise Schedule for Image Editing

Fast constrained sampling in pre-trained diffusion models

Towards Visual Text Design Transfer Across Languages

Diff-Instruct++: Training One-step Text-to-image Generator Model to Align with Human Preferences

Testing Support Size More Efficiently Than Learning Histograms

Stable Consistency Tuning: Understanding and Improving Consistency Models
