Advances in Document Analysis and Generation

Document analysis and generation is advancing rapidly, driven by innovations in deep learning and computer vision. Researchers are exploring new methods for understanding and generating complex documents, including handwritten notes, sketches, and historical documents. A key trend is the creation of large-scale datasets and benchmarks, which enable more accurate and robust models. Another focus is improving feature extraction and representation techniques so that models better capture fine details and structures within documents. Noteworthy papers include ArchCAD-400K, which introduces a large-scale architectural CAD dataset and a new baseline model for panoptic symbol spotting, and ViSketch-GPT, which presents a collaborative multi-scale feature extraction approach for sketch recognition and generation. Also notable is InkFM, a foundational model for full-page online handwritten note understanding that achieves state-of-the-art performance on several tasks.

Sources

ArchCAD-400K: An Open Large-Scale Architectural CAD Dataset and New Baseline for Panoptic Symbol Spotting

ViSketch-GPT: Collaborative Multi-Scale Feature Extraction for Sketch Recognition and Generation

AnnoPage Dataset: Dataset of Non-Textual Elements in Documents with Fine-Grained Categorization

InkFM: A Foundational Model for Full-Page Online Handwritten Note Understanding

StrokeFusion: Vector Sketch Generation via Joint Stroke-UDF Encoding and Latent Sequence Diffusion

The Cursive Transformer

Archival Faces: Detection of Faces in Digitized Historical Documents

Predicting Movie Production Years through Facial Recognition of Actors with Machine Learning
