ParsCit: an Open-source CRF Reference String Parsing Package

Abstract

We describe ParsCit, a freely available, open-source implementation of a reference string parsing package. At the core of ParsCit is a trained conditional random field (CRF) model used to label the token sequences in the reference string. A heuristic model wraps this core with added functionality to identify reference strings from a plain text file, and to retrieve the citation contexts. The package comes with utilities to run it as a web service or as a standalone utility. We compare ParsCit on three distinct reference string datasets and show that it compares well with other previously published work.

Publication
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Min-Yen Kan
Min-Yen Kan
Associate Professor

WING lead; interests include Digital Libraries, Information Retrieval and Natural Language Processing.