Multimodal alignment of scholarly documents and their presentations

Abstract

We present a multimodal system for aligning scholarly docu- ments to corresponding presentations in a fine-grained man- ner (i.e., per presentation slide and per paper section). Our method improves upon a state-of-the-art baseline that em- ploys only textual similarity. Based on an analysis of base- line errors, we propose a three-pronged alignment system that combines textual, image, and ordering information to establish alignment. Our results show a statistically sig- nificant improvement of 25%, confirming the importance of visual content in improving alignment accuracy.

Publication
Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries
Min-Yen Kan
Min-Yen Kan
Associate Professor

WING lead; interests include Digital Libraries, Information Retrieval and Natural Language Processing.