Many research fields now conduct empirical studies based on real-world datasets, yet there is no proper mechanism for finding the papers that use a given dataset or for identifying the datasets used in a given paper. Identifying important aspects of scientific publications, such as dataset mentions, matters for many downstream tasks, including indexing and search. In social science, datasets form an integral part of a study, but the same dataset is often referred to by many different surface forms. In this project, we explore different approaches to two tasks: mention extraction, i.e. identifying mentions of datasets in papers, and dataset discovery, i.e. classifying each mention to the dataset it refers to.
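To make the two tasks concrete, the following is a minimal sketch of a dictionary-based baseline, not the approach explored in this project. It assumes a small hand-written alias table (`DATASET_ALIASES`, with illustrative entries that are not taken from the competition data) and uses plain regular-expression matching. The function names `build_patterns`, `extract_mentions`, and `discover_datasets` are hypothetical.

```python
import re

# Hypothetical alias table: canonical dataset name -> surface forms seen in text.
# These entries are illustrative assumptions, not taken from the competition data.
DATASET_ALIASES = {
    "National Health and Nutrition Examination Survey": [
        "National Health and Nutrition Examination Survey",
        "NHANES",
    ],
    "General Social Survey": [
        "General Social Survey",
        "GSS",
    ],
}


def build_patterns(aliases):
    """Compile one case-sensitive, word-bounded regex per canonical dataset."""
    patterns = {}
    for canonical, forms in aliases.items():
        # Try longer surface forms first so a full name wins over a shorter overlap.
        alternatives = sorted(forms, key=len, reverse=True)
        joined = "|".join(re.escape(form) for form in alternatives)
        patterns[canonical] = re.compile(r"\b(?:" + joined + r")\b")
    return patterns


def extract_mentions(text, patterns):
    """Mention extraction: return (start, end, surface_form, canonical) tuples."""
    mentions = []
    for canonical, pattern in patterns.items():
        for match in pattern.finditer(text):
            mentions.append((match.start(), match.end(), match.group(0), canonical))
    # Sort by character offset so mentions come back in reading order.
    return sorted(mentions)


def discover_datasets(text, patterns):
    """Dataset discovery: the set of canonical datasets mentioned in the text."""
    return {canonical for _, _, _, canonical in extract_mentions(text, patterns)}


PATTERNS = build_patterns(DATASET_ALIASES)
sample = "We use NHANES data; the General Social Survey (GSS) is a robustness check."
```

On `sample`, `extract_mentions` finds three surface forms ("NHANES", "General Social Survey", "GSS"), and `discover_datasets` maps them to two canonical datasets. A fixed alias table only covers surface forms known in advance, which is the limitation that motivates learned approaches to both tasks.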
Original dataset: https://coleridgeinitiative.org/richcontextcompetition