Advancements in Video Generation and Control

The field of video generation and control is advancing rapidly through new frameworks and techniques. Researchers are focusing on improving the controllability and quality of generated videos, particularly in camera and human motion control, character animation, and text-to-video synthesis. Notable progress is being made on challenges such as layout discontinuity, entity identity drift, and implausible interaction dynamics. There is also a growing emphasis on the safety and security of video generation models, with defense frameworks being developed to protect against jailbreak attacks and malicious content.

Noteworthy papers in this area include Uni3C, which presents a unified 3D-enhanced framework for precise control of both camera and human motion in video generation; RealisDance-DiT, which introduces a simple yet strong baseline for controllable character animation in the wild; DyST-XL, a training-free framework that enhances off-the-shelf text-to-video models through frame-aware control; T2VShield, a comprehensive defense framework designed to protect text-to-video models from jailbreak threats; and We'll Fix it in Post, a zero-training video refinement pipeline that leverages neuro-symbolic feedback to automatically improve generated videos.

Sources

Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation

RealisDance-DiT: Simple yet Strong Baseline towards Controllable Character Animation in the Wild

DyST-XL: Dynamic Layout Planning and Content Control for Compositional Text-to-Video Generation

T2VShield: Model-Agnostic Jailbreak Defense for Text-to-Video Models

We'll Fix it in Post: Improving Text-to-Video Generation with Neuro-Symbolic Feedback
