Recent advances in virtual try-on and person image synthesis mark a clear shift toward more controllable and realistic image generation. Researchers increasingly integrate physical attributes, such as clothing size and pose, into generative models to improve the accuracy and naturalness of synthesized images. Techniques built on diffusion models and attention mechanisms have proven particularly effective, preserving fine-grained garment details while maintaining high overall image quality. In parallel, multimodal inputs and program synthesis for garment design open new avenues for translating abstract concepts into tangible, size-precise sewing patterns. These developments not only push the boundaries of current generative capabilities but also pave the way for more personalized and efficient fashion solutions. Notably, size-variable virtual try-on and the disentanglement of pose guidance for image animation stand out as especially innovative, offering substantial gains in control and realism.
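As a concrete illustration of the attention-based conditioning pattern mentioned above, the sketch below shows one way garment features can be injected into a diffusion denoising step via cross-attention. This is a minimal sketch in PyTorch, not the architecture of any specific paper: the module name `GarmentCrossAttention`, the dimensions, the stand-in noise predictor, and the simplified update rule are all illustrative assumptions.

```python
# Minimal sketch (illustrative, not a specific paper's method) of conditioning
# a diffusion denoising step on garment features via cross-attention.
import torch
import torch.nn as nn


class GarmentCrossAttention(nn.Module):
    """Person-image latents (queries) attend to encoded garment patches
    (keys/values), so fine-grained garment detail such as texture and prints
    can be injected while the denoiser controls pose and body layout."""

    def __init__(self, dim: int = 320, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, person_tokens: torch.Tensor,
                garment_tokens: torch.Tensor) -> torch.Tensor:
        attended, _ = self.attn(self.norm(person_tokens),
                                garment_tokens, garment_tokens)
        return person_tokens + attended  # residual keeps denoising stable


def denoising_step(latents, garment_feats, noise_pred_net, cross_attn, t):
    """One illustrative denoising update with garment conditioning.
    Real samplers would use a full DDPM/DDIM schedule; the scalar step
    here is a deliberate simplification."""
    conditioned = cross_attn(latents, garment_feats)
    eps = noise_pred_net(conditioned, t)  # predict noise at timestep t
    return latents - 0.1 * eps            # toy update for demonstration


if __name__ == "__main__":
    B, N, M, D = 2, 64, 77, 320  # batch, person tokens, garment tokens, dim
    cross_attn = GarmentCrossAttention(D)
    noise_net = lambda x, t: torch.randn_like(x)  # stand-in for a UNet/DiT
    latents = torch.randn(B, N, D)
    garment = torch.randn(B, M, D)
    out = denoising_step(latents, garment, noise_net, cross_attn, t=10)
    print(out.shape)  # torch.Size([2, 64, 320])
```

In a full model, a residual cross-attention layer of this kind would typically sit inside each block of the denoising network rather than being applied once, and the toy update would be replaced by a proper sampling schedule; the sketch only isolates the conditioning mechanism itself.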