
Our group is delighted to host Michalis Vlachos as a Sabbatical Visiting Professor. On January 28th, 2026, Michalis presented his work on “Searching Digital Libraries with Language Models” at a seminar hosted by the Centre for Computational Social Science and Humanities (CSSH) at NUS.
Abstract:
Digital libraries now hold tens of millions of pages, yet most are still accessed through keyword-based search over potentially noisy OCR text. As collections expand, traditional search interfaces struggle to support meaningful discovery, context, and evidence-based answers.
This talk explores how modern language models can enhance the entire digital library pipeline. We examine how LLMs can refine OCR output, clean historical text, and enable natural-language search through retrieval-augmented generation. Using real-world archival data, we show improvements in character and word error rates, as well as downstream gains in retrieval quality, evaluated through both standard metrics and LLM-based judging for faithfulness, correctness, and relevancy.
We also look beyond ranking results, discussing evidence-driven answers and immersive, augmented-reality interfaces that open new ways to explore large historical collections. We conclude by reflecting on how these advances can improve transparency, reduce misinformation, and reshape the future of search in digital libraries.
Below is a gallery of photos from the seminar. Photo credit goes to Min and Yisong!



