Mining Definition of Term in Scientific Articles
Research Area: Natural Language Processing
Year: 2013
Type of Publication: Technical Report
Keywords: Definition extraction, conditional random fields, DefMiner, ACL Anthology Reference Corpus
  • Yiping Jin
Honours Year Project Report
We consider the identification, demarcation and extraction of definitions present in scholarly documents. Unlike previous approaches, we treat definition extraction as a sequence labelling task, adopting conditional random fields, a state-of-the-art machine learning methodology, to improve system performance. Our implemented definition extraction system, DefMiner, represents the state of the art in definition extraction, incorporating features that exploit different levels of natural language processing. Compared with previously published work, our system significantly improves performance as judged by F1, by 12%, achieving an F1 of 85%. We exploit DefMiner to process the sizeable ACL Anthology Reference Corpus (ACL ARC), a real-world, large-scale digital library of scientific articles in computational linguistics. The resulting large-scale, automatically acquired glossary of terms represents the combined terminology defined over several thousand individual research articles. Analyzing the glossary, we uncover interesting distributional and structural characteristics of definitions in ACL ARC. Our system performs at an F1 of about 49% over the documents in the ACL ARC.
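To illustrate what framing definition extraction as sequence labelling can look like, the following is a minimal sketch, not DefMiner itself. It assumes a hypothetical three-tag scheme (`TERM`, `DEF`, `O`), a made-up example sentence, and a token-level F1; the abstract does not specify the report's actual tag set, features, or evaluation granularity. The predicted tags are written by hand here, where a trained conditional random field would normally supply them.

```python
from typing import List

# Hypothetical tag set, for illustration only (not taken from the report):
# TERM = token is part of the term being defined
# DEF  = token is part of its definition
# O    = any other token


def extract_spans(tokens: List[str], tags: List[str], label: str) -> List[str]:
    """Return maximal runs of consecutive tokens that carry `label`."""
    spans: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == label:
            current.append(token)
        elif current:
            spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


def token_f1(gold: List[str], predicted: List[str], label: str) -> float:
    """Token-level F1 for one label (an assumed evaluation granularity)."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up example sentence with hand-written gold and predicted tags.
tokens = ["A", "trie", "is", "a", "tree", "that", "stores", "strings", "."]
gold = ["O", "TERM", "O", "DEF", "DEF", "DEF", "DEF", "DEF", "O"]
pred = ["O", "TERM", "O", "O", "DEF", "DEF", "DEF", "DEF", "O"]

term_spans = extract_spans(tokens, gold, "TERM")
def_spans = extract_spans(tokens, gold, "DEF")
def_f1 = token_f1(gold, pred, "DEF")
```

Under this framing, each token receives exactly one tag, so demarcating a definition reduces to reading off contiguous runs of `DEF` tokens, and a glossary entry pairs each `TERM` span with its `DEF` span.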