Scalable and Secure Approaches in Computer Graphics and Vision-Language Models

Report on Current Developments in the Research Area

General Direction of the Field

Recent work in computer graphics and vision-language models (VLMs) is shifting toward more scalable, robust, and secure ways of handling complex data. In geometry processing, the field is moving away from traditional linear algebra-based methods toward frameworks that scale efficiently to high-resolution data. This shift is driven by the increasing computational power of modern hardware, which calls for approaches that can exploit it and scale effectively without compromising quality.

In mesh deformation and generation, there is growing emphasis on auto-regressive models that generate high-quality 3D meshes with fine detail and generalize well. These models compress complex 3D structures into more efficient representations, which speeds up training and inference. There is also a notable trend toward continuous representations of meshes, which allow more flexible and generalizable learning of complex geometries.
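To make the idea of feeding a mesh to an auto-regressive model concrete, the sketch below shows a deliberately naive serialization: vertex coordinates are quantized to integer bins and each triangle is written out as nine tokens. This is an illustrative assumption, not the tokenizer used by EdgeRunner or any other work listed here; the function names (quantize_vertices, mesh_to_tokens) and the n_bins parameter are hypothetical. The length of such a sequence grows quickly with face count, which is the cost that more compact mesh representations aim to reduce.

```python
def quantize_vertices(vertices, n_bins=128):
    """Map float xyz coordinates to integer bins in [0, n_bins - 1].

    The mesh is shifted to the origin and scaled by its largest
    bounding-box extent so that the aspect ratio is preserved.
    """
    mins = [min(v[i] for v in vertices) for i in range(3)]
    maxs = [max(v[i] for v in vertices) for i in range(3)]
    # Fall back to 1.0 for a degenerate mesh with zero extent.
    extent = max(hi - lo for lo, hi in zip(mins, maxs)) or 1.0
    quantized = []
    for v in vertices:
        bins = tuple(
            min(n_bins - 1, round((v[i] - mins[i]) / extent * (n_bins - 1)))
            for i in range(3)
        )
        quantized.append(bins)
    return quantized


def mesh_to_tokens(vertices, faces, n_bins=128):
    """Flatten a triangle mesh into a token list, 9 tokens per face."""
    q = quantize_vertices(vertices, n_bins)
    tokens = []
    for face in faces:
        for vertex_index in face:
            tokens.extend(q[vertex_index])
    return tokens


# Hypothetical toy input: a tetrahedron with 4 vertices and 4 faces.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
tokens = mesh_to_tokens(vertices, faces, n_bins=8)
print(len(tokens))  # 4 faces * 3 vertices * 3 coordinates = 36
```

An auto-regressive model would then be trained to predict each token from the ones before it. In this naive scheme every shared vertex is repeated once per incident face, so the sequence is far longer than the mesh strictly requires.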

Security concerns are also gaining prominence, particularly in the context of VLMs. Researchers are increasingly focusing on identifying and mitigating vulnerabilities in these models, especially against backdoor attacks. The challenge lies in developing robust defense mechanisms that can detect and neutralize malicious inputs without relying on extensive labeled data. This is crucial for maintaining the reliability and trustworthiness of VLMs in real-world applications.

Noteworthy Innovations

  1. SShaDe: Introduces a scalable and robust framework for high-resolution mesh deformation, demonstrating significant speed improvements over state-of-the-art methods.

  2. EdgeRunner: Proposes an innovative auto-regressive auto-encoder for high-quality 3D mesh generation, showcasing superior quality and generalization capabilities.

  3. TrojVLM: Presents the first backdoor attack on VLMs, highlighting a critical security vulnerability and setting the stage for future defense research.

  4. SpaceMesh: Develops a continuous representation for learning manifold surface meshes, enabling more flexible and generalizable geometry processing tasks.

  5. VLMGuard: Introduces a novel framework for detecting malicious prompts in VLMs using unlabeled data, achieving superior detection results without human annotations.

  6. LaGeM: Presents a hierarchical autoencoder for 3D representation learning, offering efficient compression and generative modeling capabilities.

  7. VLOOD: Studies backdoor attacks on VLMs that are mounted with out-of-distribution data, demonstrating effective attack strategies without access to the original training data.

Sources

SShaDe: a framework for scalable shape deformation via local representations

EdgeRunner: Auto-regressive Auto-encoder for Artistic Mesh Generation

TrojVLM: Backdoor Attack Against Vision Language Models

SpaceMesh: A Continuous Representation for Learning Manifold Surface Meshes

VLMGuard: Defending VLMs against Malicious Prompts via Unlabeled Data

LaGeM: A Large Geometry Model for 3D Representation Learning and Diffusion

Backdooring Vision-Language Models with Out-Of-Distribution Data
