Image and video generation research is advancing rapidly, with much of the recent work aimed at improving efficiency and output quality. One line of work introduces novel caching strategies, such as adaptive caching, which reuses intermediate features across denoising steps to reduce computational overhead while preserving visual fidelity. Another explores new approaches to image editing, including program synthesis and lattice-based algorithms, to automate the editing process and improve accuracy. Diffusion models themselves are being improved with techniques such as concept fusion, localized refinement, and dynamic importance, which enable better handling of multiple concepts, prevent attribute leakage, and enhance image synthesis. Further research directions include stochastic texture filtering for improved texture quality, tuning-free image editing that balances fidelity and editability, and decoupled diffusion transformers that accelerate training convergence. Notable papers in this area include: Model Reveals What to Cache, which introduces a novel adaptive caching strategy; DyDiT++, which proposes a dynamic diffusion transformer for efficient visual generation; CasTex, which investigates text-to-texture synthesis using diffusion models; and ColorizeDiffusion v2, which enhances reference-based sketch colorization by separating utilities.
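To make the caching idea concrete, the sketch below shows the general pattern behind adaptive feature caching in a diffusion-style denoising loop: an expensive block is recomputed only when its input has drifted past a threshold since the last full computation, and its cached output is reused otherwise. This is a minimal toy illustration, not the method of any paper named above; SimpleBlock, adaptive_denoise, the toy update rule, and the drift threshold are all hypothetical choices made for this example.

```python
# Minimal sketch of adaptive feature caching in a toy denoising loop.
# All names and constants here are illustrative assumptions.
import numpy as np

class SimpleBlock:
    """Stand-in for an expensive transformer block (a fixed nonlinear map)."""
    def __init__(self, dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.weight = rng.standard_normal((dim, dim)) / np.sqrt(dim)

    def __call__(self, x: np.ndarray) -> np.ndarray:
        return np.tanh(x @ self.weight)

def adaptive_denoise(x: np.ndarray, block: SimpleBlock,
                     steps: int = 50, threshold: float = 0.05):
    """Run a toy denoising loop, recomputing the block only when its
    input has drifted enough since the last cached computation."""
    cached_in, cached_out = None, None
    recomputed = 0
    for _ in range(steps):
        if cached_in is not None:
            # Relative change of the block input since the cached step.
            drift = (np.linalg.norm(x - cached_in)
                     / (np.linalg.norm(cached_in) + 1e-8))
        if cached_in is None or drift > threshold:
            # Full (expensive) computation; refresh the cache.
            cached_in, cached_out = x, block(x)
            recomputed += 1
        # Otherwise reuse the cached block output.
        x = x + 0.1 * (cached_out - x)  # toy update step
    return x, recomputed

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x0 = rng.standard_normal(64)
    _, n = adaptive_denoise(x0, SimpleBlock(64))
    # Typically far fewer than 50 recomputations once inputs drift slowly.
    print(f"block recomputed on {n}/50 steps")
```

In real systems the decision rule is the interesting part: instead of a fixed relative-change threshold, adaptive schemes derive the caching schedule from profiled per-layer or per-timestep feature dynamics, trading a small fidelity loss for skipped computation on the steps where features change least.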