Advances in Video Analysis and Generation

The field of video analysis and generation is evolving rapidly, with a focus on more accurate and efficient methods for tasks such as camera model identification, 3D reconstruction, and video prediction. Recent research explores novel architectures, such as transformers and diffusion models, to improve performance on these tasks. There is also growing interest in methods that can handle complex, dynamic scenes, such as those found in autonomous driving and sports analytics. Overall, the field is moving toward more robust and generalizable methods that apply to a wide range of applications. Noteworthy papers include CoGen, which introduces a spatially adaptive generation framework for 3D-consistent video generation; LIM, which presents a transformer-based feed-forward solution for dynamic reconstruction; and Zero4D, which proposes a training-free 4D video generation method that leverages off-the-shelf video diffusion models. Other notable contributions address source camera model identification and real-time video prediction.

Sources

Camera Model Identification with SPAIR-Swin and Entropy based Non-Homogeneous Patches

CoGen: 3D Consistent Video Generation via Adaptive Conditioning for Autonomous Driving

LIM: Large Interpolator Model for Dynamic Reconstruction

Zero4D: Training-Free 4D Video Generation From Single Video Using Off-the-Shelf Video Diffusion Model

Real-time Video Prediction With Fast Video Interpolation Model and Prediction Training

HiPART: Hierarchical Pose AutoRegressive Transformer for Occluded 3D Human Pose Estimation

AnyCam: Learning to Recover Camera Poses and Intrinsics from Casual Videos

Compression Metadata-assisted RoI Extraction and Adaptive Inference for Efficient Video Analytics

Easi3R: Estimating Disentangled Motion from DUSt3R Without Training

Hierarchical Flow Diffusion for Efficient Frame Interpolation

GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors

Video Quality Assessment for Resolution Cross-Over in Live Sports

Dual-stream Transformer-GCN Model with Contextualized Representations Learning for Monocular 3D Human Pose Estimation

VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step

L-LBVC: Long-Term Motion Estimation and Prediction for Learned Bi-Directional Video Compression

Scene Splatter: Momentum 3D Scene Generation from Single Image with Video Diffusion Model