Mining Scientific Terms and their Definitions: A Study of the ACL Anthology
Research Area: Natural Language Processing
Year: 2013
Type of Publication: In Proceedings
Keywords: Definition Mining, DefMiner
  • Yiping Jin
  • Min-Yen Kan
  • Jun-Ping Ng
  • Xiangnan He
This paper presents DefMiner, a supervised sequence labeling system that identifies scientific terms and their accompanying definitions. DefMiner achieves 85% F1 on a Wikipedia benchmark corpus, significantly improving on the previous state of the art by 8%. We apply DefMiner to the ACL Anthology Reference Corpus (ARC), a large, real-world digital library of scientific articles in computational linguistics. The resulting automatically acquired glossary represents the terminology defined across several thousand individual research articles. We highlight several interesting observations: more definitions are introduced in conference and workshop papers over the years, and multiword terms account for slightly less than half of all terms. Examining a list of popular defined terms in a corpus of computational linguistics papers, we find that the concepts often fall into one of three categories: resources, methodologies, and evaluation metrics.
