Advances in Efficient Image and Video Generation

The field of image and video generation is advancing rapidly, with a strong focus on improving both efficiency and quality. Recent work introduces novel caching strategies, such as adaptive, profiling-based feature reuse, that significantly reduce the computational overhead of video diffusion models while preserving visual fidelity. New approaches to image editing, including program synthesis and lattice-based algorithms for object selection, aim to automate the editing process and improve accuracy. Diffusion models themselves are being improved with techniques such as concept fusion, localized refinement, and dynamic importance weighting, which enable better handling of multiple concepts, prevent attribute leakage, and enhance image synthesis. Further research targets stochastic texture filtering for improved texture quality, tuning-free image editing that balances fidelity and editability, and decoupled diffusion transformers that accelerate training convergence.

Notable papers in this area include: Model Reveals What to Cache, which introduces a novel adaptive caching strategy for video diffusion models; DyDiT++, which proposes dynamic diffusion transformers for efficient visual generation; CasTex, which investigates text-to-texture synthesis using diffusion models; and ColorizeDiffusion v2, which enhances reference-based sketch colorization by separating utilities.
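To make the caching idea above concrete, the following is a minimal sketch of feature reuse across diffusion denoising steps: a block's output is recomputed only when its input has changed beyond a threshold. The class names (TinyDiT, FeatureCache), the relative-change criterion, and the threshold value are illustrative assumptions, not the profiling-based method of the cited paper.

```python
# Hypothetical sketch of feature caching in a diffusion denoising loop.
# Names and the reuse criterion are assumptions for illustration only.
import torch
import torch.nn as nn


class TinyDiT(nn.Module):
    """Toy transformer block standing in for an expensive diffusion backbone."""

    def __init__(self, dim: int = 64):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)


class FeatureCache:
    """Reuse a block's output across timesteps when its input barely changes."""

    def __init__(self, threshold: float = 1e-2):
        self.threshold = threshold
        self.prev_input = None
        self.prev_output = None
        self.hits = 0

    def __call__(self, block: nn.Module, x: torch.Tensor) -> torch.Tensor:
        if self.prev_input is not None:
            # Relative change of the input since the last computed step.
            delta = (x - self.prev_input).norm() / (self.prev_input.norm() + 1e-8)
            if delta < self.threshold:
                self.hits += 1
                return self.prev_output  # skip the expensive block
        out = block(x)
        self.prev_input, self.prev_output = x.detach(), out.detach()
        return out


if __name__ == "__main__":
    torch.manual_seed(0)
    model, cache = TinyDiT(), FeatureCache(threshold=0.05)
    latents = torch.randn(1, 16, 64)  # (batch, tokens, dim)
    for t in range(50, 0, -1):
        features = cache(model, latents)
        latents = latents - 0.01 * features  # stand-in for a denoising update
    print(f"cached steps: {cache.hits} / 50")
```

The design choice here is the simplest possible reuse rule (a fixed relative-change threshold on the block input); adaptive or profiling-based strategies instead decide what and when to cache from the model's own behavior, which is what makes them attractive for preserving visual fidelity.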

Sources

Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models

Synthesizing Optimal Object Selection Predicates for Image Editing using Lattices

FaR: Enhancing Multi-Concept Text-to-Image Diffusion via Concept Fusion and Localized Refinement

Dynamic Importance in Diffusion U-Net for Enhanced Image Synthesis

Disentangling Instruction Influence in Diffusion Transformers for Parallel Multi-Instruction-Guided Image Editing

Improved Stochastic Texture Filtering Through Sample Reuse

Tuning-Free Image Editing with Fidelity and Editability via Unified Latent Diffusion Model

DDT: Decoupled Diffusion Transformer

Unifying Autoregressive and Diffusion-Based Sequence Generation

DyDiT++: Dynamic Diffusion Transformers for Efficient Visual Generation

CasTex: Cascaded Text-to-Texture Synthesis via Explicit Texture Maps and Physically-Based Shading

ColorizeDiffusion v2: Enhancing Reference-based Sketch Colorization Through Separating Utilities

Latent Diffusion U-Net Representations Contain Positional Embeddings and Anomalies
