Advances in Video and Image Generation: Control, Quality, and Consistency

Recent work in video generation and editing has made clear progress on quality, temporal consistency, and controllability. A key development is the integration of point tracking into video diffusion models, which reduces appearance drift and improves temporal coherence (Track4Gen); a sketch of the underlying idea follows below. On the reconstruction side, semi-supervised frameworks that combine real and synthetic data with monocular structure priors improve multi-view stereo performance (Prism).

Control has advanced on several fronts. Precise camera control can now be added to video diffusion models without any fine-tuning (Latent-Reframe), and view synthesis can be made robust by explicitly simulating world inconsistencies (SimVS). Style control has improved through methods that model both global style and local texture to produce high-quality stylized videos (StyleMaster). Image generation and editing have also been unified under a single framework that learns from real-world video dynamics (UniReal), opening the door to universal image manipulation.

Evaluation is catching up as well: a no-reference quality assessment method tailored to scenes generated by NeRF variants and other neural view synthesis methods (NeRF-NQA) addresses the limitations of traditional full-reference metrics, which require ground-truth views that generated scenes rarely have. Together, these innovations extend what is possible in video and image generation, offering finer control, higher quality, and greater consistency in generated content.
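
To make the point-tracking idea concrete, the snippet below shows one way an auxiliary loss could tie diffusion features to point tracks: descriptors sampled along the same track are encouraged to stay stable across frames, discouraging appearance drift. This is a minimal sketch of the general concept, not Track4Gen's published objective; the function name, tensor shapes, and the choice of cosine similarity are all assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def track_consistency_loss(features, tracks, visible):
    """Penalize appearance drift along tracked points across frames.

    features: (T, C, H, W) intermediate diffusion features, one map per frame
    tracks:   (T, N, 2)    point coordinates, normalized to [-1, 1] as (x, y)
    visible:  (T, N)       boolean visibility mask per point per frame
    """
    T, N = tracks.shape[:2]
    # Bilinearly sample a C-dimensional descriptor at every tracked point.
    grid = tracks.view(T, N, 1, 2)                            # (T, N, 1, 2)
    desc = F.grid_sample(features, grid, align_corners=True)  # (T, C, N, 1)
    desc = desc.squeeze(-1).permute(0, 2, 1)                  # (T, N, C)
    # A point's descriptor should change little between consecutive frames.
    sim = F.cosine_similarity(desc[:-1], desc[1:], dim=-1)    # (T-1, N)
    mask = (visible[:-1] & visible[1:]).float()
    return ((1.0 - sim) * mask).sum() / mask.sum().clamp(min=1.0)
```

In training, a term like this would typically be added to the standard denoising loss with a small weight, so that tracking supervision shapes the features without dominating generation quality.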

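Training-free camera control, as in Latent-Reframe, operates on the video latents themselves rather than fine-tuning the model. As a loose illustration of latent-space reframing (not the paper's actual algorithm), the sketch below warps each frame's latent by a per-frame affine transform approximating a camera pan; in a real pipeline the warped latents would be handed back to the sampler for further denoising. All names, shapes, and the affine parameterization are assumptions.

```python
import torch
import torch.nn.functional as F

def reframe_latents(latents, thetas):
    """Warp per-frame video latents with per-frame 2D affine camera motions.

    latents: (T, C, h, w) video latents at some denoising step
    thetas:  (T, 2, 3)    affine matrices in normalized coordinates
    """
    grid = F.affine_grid(thetas, list(latents.shape), align_corners=False)
    return F.grid_sample(latents, grid, align_corners=False,
                         padding_mode="border")

# Example: a gradual pan to the right over 16 frames.
T, C, h, w = 16, 4, 40, 64
latents = torch.randn(T, C, h, w)
thetas = torch.zeros(T, 2, 3)
thetas[:, 0, 0] = 1.0                          # identity scale in x
thetas[:, 1, 1] = 1.0                          # identity scale in y
thetas[:, 0, 2] = torch.linspace(0.0, 0.5, T)  # growing x-translation
panned = reframe_latents(latents, thetas)
```
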
Sources

EgoPoints: Advancing Point Tracking for Egocentric Videos

DreamColour: Controllable Video Colour Editing without Training

Prism: Semi-Supervised Multi-View Stereo with Monocular Structure Priors

Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation

Latent-Reframe: Enabling Camera Control for Video Diffusion Model without Training

SimVS: Simulating World Inconsistencies for Robust View Synthesis

StyleMaster: Stylize Your Video with Artistic Generation and Translation

3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation

SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints

UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics

NeRF-NQA: No-Reference Quality Assessment for Scenes Generated by NeRF and Neural View Synthesis Methods

Neural Observation Field Guided Hybrid Optimization of Camera Placement

UFO: Enhancing Diffusion-Based Video Generation with a Uniform Frame Organizer

Owl-1: Omni World Model for Consistent Long Video Generation

Learning Camera Movement Control from Real-World Drone Videos

GenEx: Generating an Explorable World
