Generative Models and Controllable Image Generation

Report on Current Developments in the Research Area

General Direction of the Field

Recent advances in generative models and controllable image generation are expanding what artificial intelligence can achieve in creative and visual domains. The field is shifting toward self-supervised, scalable methods that aim to mimic human-like associative and abstractive capabilities. This trend is driven by the need to reduce dependence on annotated datasets and to improve the robustness and generalization of generative models across diverse tasks and scenarios.

One key development is the integration of neural mechanisms inspired by the human brain, such as cortical modularization and hippocampal pattern completion, into AI frameworks. These approaches enable more flexible and scalable controllable generation, in which models learn to associate different visual attributes spontaneously, without extensive supervision. This both improves robustness in high-noise scenarios and makes the approach easier to scale.
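The pattern-completion idea can be illustrated with a small, self-contained sketch. The convolutional autoencoder below is a placeholder, not the paper's architecture: training masks one attribute map per sample (e.g., edges, color, and depth stacked as channels) and asks the network to complete the full stack, so the supervision signal comes entirely from the data itself.

```python
# Minimal sketch of pattern-completion-style self-supervision.
# Assumption: a simple conv autoencoder stands in for the actual model.
import torch
import torch.nn as nn

class PatternCompletionNet(nn.Module):
    """Reconstructs a full stack of visual-attribute maps from a masked one."""
    def __init__(self, channels: int = 3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels, 3, padding=1),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def pattern_completion_loss(model, attrs):
    """attrs: (B, C, H, W) stack of attribute maps (e.g. edges, color, depth).
    Zero out one randomly chosen attribute channel per sample and ask the
    model to complete the full stack -- no labels required."""
    masked = attrs.clone()
    drop = torch.randint(0, attrs.size(1), (attrs.size(0),))
    masked[torch.arange(attrs.size(0)), drop] = 0.0
    return nn.functional.mse_loss(model(masked), attrs)
```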

Another significant trend is the move towards multi-scale and multi-modal architectures that can decompose the generation process into manageable parts, each responsible for different levels of detail. This allows for the creation of high-resolution, intricately detailed images while maintaining global coherence. These architectures are being applied to a wide range of artistic expressions, from digital painting to mural art restoration, showcasing their versatility and potential for diverse applications.
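As a rough illustration of this multi-scale decomposition, the sketch below (with placeholder stage networks, not any cited paper's actual design) fixes the global layout at low resolution and then repeatedly upsamples and adds residual local detail:

```python
# Hedged sketch of a coarse-to-fine, multi-scale generation pass.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RefineStage(nn.Module):
    """Adds detail at one scale, conditioned on the upsampled coarser result."""
    def __init__(self, channels: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels, 3, padding=1),
        )

    def forward(self, coarse_up):
        return coarse_up + self.net(coarse_up)  # residual local detail

def coarse_to_fine(base, stages):
    """base: (B, 3, h, w) global layout at the lowest resolution.
    Each stage doubles the resolution and adds local detail, so global
    coherence is decided early and fine texture is added late."""
    img = base
    for stage in stages:
        img = F.interpolate(img, scale_factor=2, mode="bilinear",
                            align_corners=False)
        img = stage(img)
    return img

# Toy usage: three stages take a 32x32 layout to a 256x256 image.
stages = [RefineStage() for _ in range(3)]
out = coarse_to_fine(torch.randn(1, 3, 32, 32), stages)
print(out.shape)  # torch.Size([1, 3, 256, 256])
```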

The field is also seeing a rise in open, customizable models that offer state-of-the-art performance in specific domains such as anime illustration. These models are designed to be highly adaptable, allowing easier customization and personalization, and are often released as open source to foster community-driven improvement.

Simplified and scalable methods for object-centric learning are also gaining traction. These methods abstract raw data into reusable concepts, much as humans do when processing information. By relying on simple, fully differentiable architectures, they make object-centric learning more accessible and scalable while outperforming more complex methods on standard benchmarks.
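A minimal example of this style of architecture is a simplified slot-attention module, sketched below. The cited paper's exact priors may differ, but the ingredients are the same: a set of learned slot initializations compete for image features through iterative, fully differentiable attention, and each resulting slot acts as a reusable "concept" vector.

```python
# Simplified slot-attention-style sketch of object-centric abstraction.
import torch
import torch.nn as nn

class SlotAttention(nn.Module):
    def __init__(self, num_slots=5, dim=64, iters=3):
        super().__init__()
        self.num_slots, self.iters, self.scale = num_slots, iters, dim ** -0.5
        self.slots_mu = nn.Parameter(torch.randn(1, 1, dim))
        self.slots_sigma = nn.Parameter(torch.ones(1, 1, dim) * 0.1)
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        self.gru = nn.GRUCell(dim, dim)
        self.norm_in = nn.LayerNorm(dim)
        self.norm_slots = nn.LayerNorm(dim)

    def forward(self, inputs):  # inputs: (B, N, dim) flattened image features
        b = inputs.size(0)
        inputs = self.norm_in(inputs)
        k, v = self.to_k(inputs), self.to_v(inputs)
        # Sample slots from a learned Gaussian prior.
        slots = self.slots_mu + self.slots_sigma * torch.randn(
            b, self.num_slots, inputs.size(-1), device=inputs.device)
        for _ in range(self.iters):
            q = self.to_q(self.norm_slots(slots))
            # Softmax over slots: features compete for slot assignment.
            attn = torch.softmax(q @ k.transpose(1, 2) * self.scale, dim=1)
            attn = attn / attn.sum(dim=-1, keepdim=True)  # weighted mean
            updates = attn @ v                            # (B, S, dim)
            slots = self.gru(updates.reshape(-1, updates.size(-1)),
                             slots.reshape(-1, slots.size(-1))).view_as(slots)
        return slots  # each slot is a reusable concept vector
```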

Noteworthy Papers

  • Learning from Pattern Completion: Self-supervised Controllable Generation: Demonstrates a self-supervised approach that mimics brain-like associative capabilities, showing superior robustness and scalability.

  • Neural-Polyptych: Content Controllable Painting Recreation for Diverse Genres: Introduces a multi-scale GAN-based framework for high-resolution painting creation, enabling diverse artistic expressions.

  • Illustrious: an Open Advanced Illustration Model: Achieves state-of-the-art performance in anime illustration, offering high-resolution, dynamic color range images with open-source availability.

  • Simplified priors for Object-Centric Learning: Proposes a simple, scalable method for object-centric learning, outperforming more complex approaches on standard benchmarks.

  • ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation: Introduces the novel task of prompt-adaptive workflow generation, improving image quality through workflows tailored to each prompt (a toy sketch of workflow selection follows this list).

  • KnobGen: Controlling the Sophistication of Artwork in Sketch-Based Diffusion Models: Offers a dual-pathway framework for flexible sketch-based image generation, adapting to varying user skill levels.

  • Aggregation of Multi Diffusion Models for Enhancing Learned Representations: Proposes a novel algorithm for fine-grained control in diffusion models, enhancing feature representation without additional training.

  • Multi-Scale Fusion for Object Representation: Enhances object-centric learning through multi-scale fusion, improving performance on standard benchmarks.
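To make the prompt-adaptive workflow idea concrete, here is a hypothetical sketch in the spirit of ComfyGen: a scorer ranks a fixed set of candidate workflows against the prompt, and the best one is executed. The workflow names and the keyword scorer are purely illustrative stand-ins for a learned ranker.

```python
# Hypothetical prompt-adaptive workflow selection (names are illustrative).
from typing import Callable, Dict

def pick_workflow(prompt: str,
                  workflows: Dict[str, Callable[[str], object]],
                  score: Callable[[str, str], float]):
    """score(prompt, name) estimates how well a workflow suits the prompt,
    e.g. a learned ranker over (prompt, workflow) pairs."""
    best = max(workflows, key=lambda name: score(prompt, name))
    return best, workflows[best](prompt)

# Toy usage with a keyword heuristic standing in for a learned scorer.
workflows = {
    "photoreal": lambda p: f"run photoreal pipeline on: {p}",
    "anime":     lambda p: f"run anime pipeline on: {p}",
}
score = lambda p, name: float(name in p.lower())
name, result = pick_workflow("an anime portrait at dusk", workflows, score)
print(name, "->", result)
```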

Sources

Learning from Pattern Completion: Self-supervised Controllable Generation

Neural-Polyptych: Content Controllable Painting Recreation for Diverse Genres

Illustrious: an Open Advanced Illustration Model

Simplified priors for Object-Centric Learning

ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation

KnobGen: Controlling the Sophistication of Artwork in Sketch-Based Diffusion Models

Aggregation of Multi Diffusion Models for Enhancing Learned Representations

Multi-Scale Fusion for Object Representation
