Advances in Video Generation and Control

Video generation is advancing rapidly, with particular progress in output quality and in control over attributes such as motion and appearance. Diffusion models and transformers have driven much of this progress. One significant trend is frameworks and architectures that integrate multiple conditions and modalities, enabling more flexible and controllable generation. Another is growing interest in self-supervised and unsupervised methods that learn motion concepts and abstract object movements from video without extensive labeled data. Noteworthy papers include Enabling Versatile Controls for Video Diffusion Models, which introduces a novel framework for fine-grained control over pre-trained video diffusion models; Mask$^2$DiT, which proposes a dual mask-based diffusion transformer for multi-scene long video generation; and VideoMage, which presents a unified framework for customizing videos over both multiple subjects and their interactive motions.
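
To make the multi-condition idea concrete, the sketch below shows one common recipe: classifier-free guidance extended to several conditioning signals (here, text and motion), where the denoiser is queried once per condition and the guided noise estimate combines the per-condition deltas with separate weights. This is a minimal illustrative sketch; the toy denoiser, its interface, and the guidance weights are assumptions for exposition, not the method or API of any paper listed below.

```python
import torch
import torch.nn as nn

# Hypothetical toy video denoiser (an assumption, not any cited model):
# predicts noise for a video latent given optional text / motion embeddings,
# injected as per-channel biases. The timestep t is accepted but unused here.
class ToyVideoDenoiser(nn.Module):
    def __init__(self, channels=4, cond_dim=64):
        super().__init__()
        self.net = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.text_proj = nn.Linear(cond_dim, channels)
        self.motion_proj = nn.Linear(cond_dim, channels)

    def forward(self, x, t, text=None, motion=None):
        h = self.net(x)
        if text is not None:    # add text condition as a per-channel bias
            h = h + self.text_proj(text)[:, :, None, None, None]
        if motion is not None:  # add motion condition the same way
            h = h + self.motion_proj(motion)[:, :, None, None, None]
        return h

@torch.no_grad()
def multi_condition_guidance(model, x, t, text, motion, w_text=5.0, w_motion=2.0):
    """Classifier-free guidance over two conditions: start from the
    unconditional prediction and add a weighted delta per condition."""
    eps_uncond = model(x, t)
    eps_text = model(x, t, text=text)
    eps_motion = model(x, t, motion=motion)
    return (eps_uncond
            + w_text * (eps_text - eps_uncond)
            + w_motion * (eps_motion - eps_uncond))

# Toy usage: one video latent of 8 frames at 16x16 spatial resolution.
model = ToyVideoDenoiser()
x = torch.randn(1, 4, 8, 16, 16)  # (batch, channels, frames, height, width)
t = torch.tensor([500])           # diffusion timestep (unused by the toy net)
text = torch.randn(1, 64)         # stand-in text embedding
motion = torch.randn(1, 64)       # stand-in motion embedding
eps = multi_condition_guidance(model, x, t, text, motion)
print(eps.shape)  # torch.Size([1, 4, 8, 16, 16])
```

In practice each guidance weight trades adherence to its condition against sample diversity, which is one reason multi-condition frameworks expose them separately rather than using a single guidance scale.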

Sources

Enabling Versatile Controls for Video Diffusion Models

AnimatePainter: A Self-Supervised Rendering Framework for Reconstructing Painting Process

Decouple and Track: Benchmarking and Improving Video Diffusion Transformers for Motion Transfer

Generating, Fast and Slow: Scalable Parallel Video Generation with Video Interface Networks

RDTF: Resource-efficient Dual-mask Training Framework for Multi-frame Animated Sticker Generation

TransAnimate: Taming Layer Diffusion to Generate RGBA Video

LongDiff: Training-Free Long Video Generation in One Go

Resource-Efficient Motion Control for Video Generation via Dynamic Mask Guidance

AMD-Hummingbird: Towards an Efficient Text-to-Video Model

Video-T1: Test-Time Scaling for Video Generation

Target-Aware Video Diffusion Models

Long-Context Autoregressive Video Modeling with Next-Frame Prediction

Multi-Object Sketch Animation by Scene Decomposition and Motion Planning

EfficientMT: Efficient Temporal Adaptation for Motion Transfer in Text-to-Video Diffusion Models

Mask$^2$DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation

FullDiT: Multi-Task Video Generative Foundation Model with Full Attention

Self-Supervised Learning of Motion Concepts by Optimizing Counterfactuals

Wan: Open and Advanced Large-Scale Video Generative Models

Recovering Dynamic 3D Sketches from Videos

RecTable: Fast Modeling Tabular Data with Rectified Flow

VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models
