This paper presents DefMiner, a supervised
sequence labeling system that identifies scientific
terms and their accompanying definitions.
DefMiner achieves 85% F1 on a Wikipedia
benchmark corpus, significantly improving
the previous state-of-the-art by 8%.
We exploit DefMiner to process the ACL Anthology
Reference Corpus (ARC) – a large,
real-world digital library of scientific articles
in computational linguistics. The resulting
automatically-acquired glossary represents
the terminology defined over several
thousand individual research articles.
We highlight several interesting observations:
more definitions are introduced for conference
and workshop papers over the years and that
multiword terms account for slightly less than
half of all terms. Obtaining a list of popular
defined terms in a corpus of computational linguistics
papers, we find that concepts can often
be categorized into one of three categories:
resources, methodologies and evaluation metrics.