The field of historical document analysis is shifting toward large language models (LLMs) and multimodal approaches to improve transcription accuracy, named entity recognition, and sentiment analysis. Researchers are exploring how well LLMs capture subtle nuances of historical language, such as irony. Comprehensive corpora and open-source datasets are also gaining traction, giving handwritten text recognition and named entity recognition research a shared, unified basis for building robust models. In addition, educational models that prioritize hands-on learning and student involvement are emerging, showing that collaborative work can produce practical tools and solutions.
Noteworthy papers include Historical Ink, which introduces a semi-automated annotation methodology for refining LLM results in historical language analysis; TRIDIS, which presents a comprehensive medieval and early modern corpus for handwritten text recognition and named entity recognition research; and Multimodal LLMs for OCR, OCR Post-Correction, and Named Entity Recognition in Historical Documents, which explores the capabilities of multimodal LLMs in optical character recognition, post-correction, and named entity recognition on historical documents.