Integrated Frameworks and Controllable Solutions in 3D Scene Generation and Perception

Recent advances in 3D scene generation and perception increasingly rely on diffusion models to bridge generation and perception tasks. A notable trend is the emergence of unified frameworks that generate high-quality 3D scenes while improving perception models through mutual learning: by conditioning jointly trained diffusion models on semantic occupancy, these frameworks produce realistic scenes from text prompts and simultaneously strengthen perception tasks such as semantic occupancy prediction.

Multi-object novel view synthesis is also maturing, with models extended to handle indoor scenes containing multiple objects while keeping object placement and appearance consistent across views. In driving simulation, controllability and efficiency are being addressed together, with models that initialize and roll out scenes realistically while preserving inference efficiency and closed-loop realism. The synthesis of photorealistic street views from vehicle sensor data is advancing through controllable video diffusion models that offer precise camera control and real-time rendering, and diffusion priors are being enriched to support novel view synthesis of real vehicles.

Object insertion is evolving as well: affordance-aware models integrate objects seamlessly into scenes by modeling the interplay between foreground and background. Finally, view synthesis from 3D lifting is being refined through progressive techniques that improve both the quality of the 3D representation and its rendering. Together, these developments point toward more integrated, controllable, and efficient solutions for 3D scene generation and perception.
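To make the cross-task mutual-learning idea concrete, below is a minimal, hypothetical sketch of joint training between a diffusion-style scene denoiser and a semantic-occupancy head, where the perception output conditions the generator and both losses share gradients through one optimizer step. All module names, tensor shapes, and the simple linear corruption schedule are illustrative assumptions, not OccScene's actual architecture or API.

```python
# Hypothetical sketch of generation/perception mutual learning on voxel scenes.
# Shapes, modules, and the corruption schedule are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SceneDenoiser(nn.Module):
    """Toy denoiser: predicts noise for a voxelized scene, conditioned on occupancy."""
    def __init__(self, channels=8, num_classes=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(channels + num_classes, 32, 3, padding=1), nn.SiLU(),
            nn.Conv3d(32, channels, 3, padding=1),
        )

    def forward(self, noisy_scene, occupancy_logits):
        cond = F.softmax(occupancy_logits, dim=1)  # soft occupancy as conditioning
        return self.net(torch.cat([noisy_scene, cond], dim=1))

class OccupancyHead(nn.Module):
    """Toy perception head: predicts per-voxel semantic occupancy from a scene."""
    def __init__(self, channels=8, num_classes=4):
        super().__init__()
        self.net = nn.Conv3d(channels, num_classes, 3, padding=1)

    def forward(self, scene):
        return self.net(scene)

def joint_step(denoiser, perceiver, scene, occ_labels, optimizer):
    """One mutual-learning step: denoising loss and perception loss share gradients."""
    noise = torch.randn_like(scene)
    t = torch.rand(scene.size(0), 1, 1, 1, 1)   # per-sample noise level in [0, 1)
    noisy = (1 - t) * scene + t * noise         # simple linear corruption (assumption)
    occ_logits = perceiver(scene)               # perception on the clean scene
    pred_noise = denoiser(noisy, occ_logits)    # generation conditioned on occupancy
    loss = F.mse_loss(pred_noise, noise) + F.cross_entropy(occ_logits, occ_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    B, C, K, D = 2, 8, 4, 16                    # batch, channels, classes, grid size
    denoiser, perceiver = SceneDenoiser(C, K), OccupancyHead(C, K)
    opt = torch.optim.Adam(
        list(denoiser.parameters()) + list(perceiver.parameters()), lr=1e-4
    )
    scene = torch.randn(B, C, D, D, D)
    labels = torch.randint(0, K, (B, D, D, D))
    print(joint_step(denoiser, perceiver, scene, labels, opt))
```

The design point the sketch illustrates is that neither model is frozen: the perception head's predictions condition the generator, so improving one task backpropagates useful signal into the other.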

Sources

OccScene: Semantic Occupancy-based Cross-task Mutual Learning for 3D Scene Generation

MOVIS: Enhancing Multi-Object Novel View Synthesis for Indoor Scenes

SceneDiffuser: Efficient and Controllable Driving Simulation Initialization and Rollout

StreetCrafter: Street View Synthesis with Controllable Video Diffusion Models

Affordance-Aware Object Insertion via Mask-Aware Dual Diffusion

LiftRefine: Progressively Refined View Synthesis from 3D Lifting with Volume-Triplane Representations

Drive-1-to-3: Enriching Diffusion Priors for Novel View Synthesis of Real Vehicles
