Recent advances in virtual try-on technology have markedly improved both the realism and the efficiency of garment transfer, particularly in video-based applications. Research is concentrating on improving temporal consistency and reducing computational overhead, both of which are critical for generating smooth, stable try-on videos even under complex human motion. Combining diffusion models with dynamic attention mechanisms has proven effective at preserving garment details while maintaining spatiotemporal coherence across frames. In parallel, multi-modal generative models and novel attention modules are enabling more flexible and precise control over the try-on process, supporting complex garment types and personalized fashion image generation. Notably, some approaches explore training-free pipelines that repurpose pretrained diffusion models without task-specific fine-tuning, simplifying the try-on pipeline and offering a more resource-efficient solution without sacrificing visual quality. Together, these innovations push the boundaries of virtual try-on, making it more accessible and realistic for real-world applications.
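
To make the idea of dynamic attention for spatiotemporal coherence more concrete, the following is a minimal, hypothetical PyTorch sketch of a temporal self-attention block of the kind often inserted into video generation backbones so that garment appearance stays consistent across frames. The module name, tensor shapes, and hyperparameters are illustrative assumptions, not the design of any particular try-on method.

```python
# Illustrative sketch only: a temporal self-attention block that attends across
# the time axis at each spatial location, encouraging frame-to-frame coherence.
# Shapes and hyperparameters are assumptions for this example.
import torch
import torch.nn as nn


class TemporalAttention(nn.Module):
    """Attends over frames independently at every spatial position."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, channels, height, width) feature maps from a video backbone.
        b, t, c, h, w = x.shape
        # Fold spatial positions into the batch so attention runs over frames only.
        tokens = x.permute(0, 3, 4, 1, 2).reshape(b * h * w, t, c)
        normed = self.norm(tokens)
        attended, _ = self.attn(normed, normed, normed)
        tokens = tokens + attended  # residual connection keeps per-frame content
        return tokens.reshape(b, h, w, t, c).permute(0, 3, 4, 1, 2)


if __name__ == "__main__":
    # Toy usage: 2 videos, 8 frames, 64-channel feature maps at 16x16 resolution.
    feats = torch.randn(2, 8, 64, 16, 16)
    out = TemporalAttention(channels=64)(feats)
    print(out.shape)  # torch.Size([2, 8, 64, 16, 16])
```

In practice, blocks like this are interleaved with the spatial layers of a diffusion U-Net or transformer so that per-frame garment detail is handled spatially while temporal attention smooths appearance across the video.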