Advances in Diffusion Models and Text-Guided Image Manipulation

Recent advances in diffusion models and text-guided image manipulation have substantially expanded what can be generated and edited in high-quality images and videos. A common theme across the papers below is the integration of diffusion models with complementary techniques to achieve more precise and controllable results. For instance, learnable prompts combined with latent diffusion models enable finer control over generation and editing, particularly when data is scarce or highly diverse. Object-centric and hierarchical approaches improve the consistency and quality of generated content, addressing object disappearance and misaligned motion in text-driven video generation. Another notable trend is the development of unsupervised and self-supervised methods, which reduce reliance on annotated data and open the door to scalable, generalized image editing; these methods often rely on novel loss functions and alignment or consistency objectives to preserve the fidelity and coherence of edited images and videos, as sketched below. Overall, the field is moving toward more sophisticated, controllable, and scalable solutions that handle a wide range of image and video editing tasks with high precision and naturalness.
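
To make the consistency-objective idea concrete, the following is a minimal, hedged sketch of a cycle edit consistency loss in the spirit of unsupervised instruction-based editing (e.g., UIP2P): an image is edited with a forward instruction, the result is edited with the reverse instruction, and the reconstruction is penalized for drifting from the original. The `DummyEditor` module and the random instruction embeddings are placeholders for illustration only, not the architecture or training setup of any of the papers listed.

```python
# Illustrative sketch of a cycle edit consistency objective.
# The editor below is a stand-in for a text-conditioned diffusion editor;
# only the structure of the loss is shown.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DummyEditor(nn.Module):
    """Placeholder for a text-conditioned image editor (not a real diffusion model)."""

    def __init__(self, text_dim: int = 64):
        super().__init__()
        self.film = nn.Linear(text_dim, 3)              # per-channel shift derived from the instruction
        self.conv = nn.Conv2d(3, 3, kernel_size=3, padding=1)

    def forward(self, image: torch.Tensor, instruction: torch.Tensor) -> torch.Tensor:
        shift = self.film(instruction).unsqueeze(-1).unsqueeze(-1)
        return self.conv(image + shift)


def cycle_edit_consistency_loss(editor, image, fwd_instruction, rev_instruction):
    """Apply a forward edit, then the reverse edit, and penalize drift from the input."""
    edited = editor(image, fwd_instruction)           # e.g. "add sunglasses"
    reconstructed = editor(edited, rev_instruction)   # e.g. "remove sunglasses"
    return F.l1_loss(reconstructed, image)


if __name__ == "__main__":
    editor = DummyEditor()
    image = torch.rand(2, 3, 64, 64)
    fwd = torch.randn(2, 64)   # placeholder embedding of the forward instruction
    rev = torch.randn(2, 64)   # placeholder embedding of the reverse instruction
    loss = cycle_edit_consistency_loss(editor, image, fwd, rev)
    loss.backward()
    print(f"cycle consistency loss: {loss.item():.4f}")
```

In practice such a term is combined with other objectives (e.g., alignment between the edit instruction and the visual change), so that the editor cannot trivially satisfy the cycle by leaving the image untouched.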

Sources

Prompt2Perturb (P2P): Text-Guided Diffusion-Based Adversarial Attacks on Breast Ultrasound Images

TIV-Diffusion: Towards Object-Centric Movement for Text-driven Image to Video Generation

StyleDiT: A Unified Framework for Diverse Child and Partner Faces Synthesis with Style Latent Diffusion Transformer

SHMT: Self-supervised Hierarchical Makeup Transfer via Latent Diffusion Models

Re-Attentional Controllable Video Diffusion Editing

Unsupervised Region-Based Image Editing of Denoising Diffusion Models

Prompt Augmentation for Self-supervised Text-guided Image Manipulation

CA-Edit: Causality-Aware Condition Adapter for High-Fidelity Local Facial Attribute Editing

UIP2P: Unsupervised Instruction-based Image Editing via Cycle Edit Consistency
