Recent advances in diffusion models and text-guided image manipulation have significantly expanded what is possible in generating and editing high-quality images and videos. A common theme across these papers is the integration of diffusion models with complementary techniques to achieve more precise and controllable results. For instance, learnable prompts combined with latent diffusion models enable finer-grained control over generation and editing, particularly when training data is scarce or heterogeneous. Likewise, object-centric and hierarchical approaches have improved the consistency and quality of generated content, mitigating failure modes such as object disappearance and misaligned motion in video generation.

Another notable trend is the development of unsupervised and self-supervised methods, which reduce reliance on annotated data and open new possibilities for scalable, generalizable image editing. These methods often introduce novel loss functions and alignment techniques to improve the fidelity and coherence of edited images and videos. Overall, the field is moving toward more sophisticated, controllable, and scalable solutions that handle a wide range of image and video editing tasks with high precision and naturalness.
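To make the "learnable prompt" idea concrete, the following is a minimal, hypothetical sketch of the textual-inversion-style recipe: a small set of soft prompt embeddings is optimized against a standard epsilon-prediction denoising loss while the denoiser itself stays frozen. All module names, dimensions, and the toy noise schedule are illustrative assumptions, not the method of any specific paper; a real system would use a pretrained latent-diffusion U-Net and text encoder in place of `TinyDenoiser`.

```python
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    """Toy conditional denoiser standing in for a latent-diffusion U-Net."""
    def __init__(self, latent_dim=16, cond_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + cond_dim + 1, 64),  # +1 for the timestep
            nn.ReLU(),
            nn.Linear(64, latent_dim),
        )

    def forward(self, z_t, t, cond):
        # cond: pooled soft-prompt embedding conditioning the noise prediction
        x = torch.cat([z_t, t.unsqueeze(-1), cond], dim=-1)
        return self.net(x)

def denoising_loss(denoiser, prompt, z0):
    """Epsilon-prediction objective ||eps - eps_hat||^2 with a toy schedule."""
    t = torch.rand(z0.shape[0])                # continuous timestep in [0, 1)
    eps = torch.randn_like(z0)
    alpha = (1 - t).unsqueeze(-1)              # illustrative noise schedule
    z_t = alpha.sqrt() * z0 + (1 - alpha).sqrt() * eps
    cond = prompt.mean(dim=0).expand(z0.shape[0], -1)  # pool prompt tokens
    return ((eps - denoiser(z_t, t, cond)) ** 2).mean()

torch.manual_seed(0)
denoiser = TinyDenoiser()
for p in denoiser.parameters():                # freeze the denoiser
    p.requires_grad_(False)

# Learnable prompt: a few "pseudo-token" embeddings, the only trained weights.
prompt = nn.Parameter(torch.randn(4, 8) * 0.1)
opt = torch.optim.Adam([prompt], lr=1e-2)
z0 = torch.randn(32, 16)                       # stand-in for image latents

for step in range(200):
    opt.zero_grad()
    loss = denoising_loss(denoiser, prompt, z0)
    loss.backward()
    opt.step()
```

Because only the prompt embeddings receive gradients, this kind of setup can adapt a frozen generative model to new concepts from very little data, which is why the scarce-data scenarios mentioned above favor it.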