Advances in Vision-Language Models and Architectural Design

The intersection of computer vision and architectural design is seeing significant advances from the integration of vision-language models (VLMs). Researchers are leveraging VLMs to improve video retrieval systems, enabling adaptive query refinement and higher retrieval accuracy, and are also applying them to architectural design to support efficient search and recommendation of design case studies. Noteworthy papers in this area include:

  • Enhancing Subsequent Video Retrieval via Vision-Language Models, which introduces a novel framework for adaptive video retrieval; a minimal sketch of the adaptive-refinement idea follows this list.
  • ArchSeek, an innovative case study search system that enables text and image queries with fine-grained control.
  • ArchCAD-400K, a large-scale CAD dataset with extended drawing diversity, broader categories, and line-grained annotations.
  • ViSketch-GPT, a novel algorithm that captures intricate details at multiple scales for sketch recognition and generation.
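To make the adaptive-refinement idea concrete, here is a minimal, illustrative sketch rather than the method from the paper: `embed_text` and `generate_caption` are hypothetical stand-ins for a real VLM text encoder and captioner, and the similarity-threshold logic is just one plausible way to fold VLM feedback back into a text-to-video query.

```python
import hashlib
import numpy as np

def embed_text(text: str, dim: int = 64) -> np.ndarray:
    # Hypothetical stand-in for a VLM text encoder: a deterministic,
    # hash-seeded random vector so the example runs without a model.
    seed = int(hashlib.sha256(text.encode()).hexdigest()[:8], 16)
    vec = np.random.default_rng(seed).normal(size=dim)
    return vec / np.linalg.norm(vec)

def generate_caption(video_id: str) -> str:
    # Hypothetical stand-in for a VLM captioner describing the video content.
    return f"caption for {video_id}"

def retrieve(query_vec: np.ndarray, video_vecs: np.ndarray, k: int = 3):
    # Cosine similarity (all vectors are unit-normalized), then top-k by score.
    scores = video_vecs @ query_vec
    top = np.argsort(-scores)[:k]
    return top, scores[top]

def adaptive_retrieve(query: str, video_ids, video_vecs,
                      threshold: float = 0.2, k: int = 3):
    q = embed_text(query)
    top, scores = retrieve(q, video_vecs, k)
    if scores[0] < threshold:
        # Weak match: refine the query by folding in a VLM caption of the
        # current best candidate, then retrieve again.
        refined_query = f"{query}. {generate_caption(video_ids[top[0]])}"
        q = embed_text(refined_query)
        top, scores = retrieve(q, video_vecs, k)
    return [(video_ids[i], float(s)) for i, s in zip(top, scores)]

if __name__ == "__main__":
    ids = [f"video_{i}" for i in range(10)]
    # Precomputed "video embeddings" (here: embeddings of placeholder captions).
    vecs = np.stack([embed_text(generate_caption(v)) for v in ids])
    print(adaptive_retrieve("a person cooking pasta", ids, vecs))
```

In a real system the refinement step would draw on the VLM's descriptions of the top-ranked clips or on user relevance feedback rather than a placeholder caption.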

Sources

Enhancing Subsequent Video Retrieval via Vision-Language Models (VLMs)

PHT-CAD: Efficient CAD Parametric Primitive Analysis with Progressive Hierarchical Tuning

ArchSeek: Retrieving Architectural Case Studies Using Vision-Language Models

Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval

ArchCAD-400K: An Open Large-Scale Architectural CAD Dataset and New Baseline for Panoptic Symbol Spotting

ViSketch-GPT: Collaborative Multi-Scale Feature Extraction for Sketch Recognition and Generation
