Lightweight Contextual Logical Structure Recovery

Abstract

Logical structure recovery in scientific articles associates text with a semantic section of the article. Previous work has disregarded the surrounding context of a line; we model this important information by employing line-level attention on top of a transformer-based scientific document processing pipeline. With the addition of loss function engineering and data augmentation techniques with semi-supervised learning, our method improves classification performance by 10% compared to a recent state-of-the-art model. Our parsimonious, text-only method achieves performance comparable to that of other works that use rich document features such as font and spatial position, while using less data, resulting in a lightweight training pipeline.

Publication
Proceedings of the Third Workshop on Scholarly Document Processing
George Po-Wei Huang
UROP Alumnus (Aug ’21)

UROP student

Abhinav Ramesh Kashyap
Doctoral Alumnus (’24)

Doctoral alumnus

Yanxia Qin
Postdoctoral Alumnus

WING alumnus; former postdoc

Min-Yen Kan
Associate Professor

WING lead; interests include Digital Libraries, Information Retrieval and Natural Language Processing.