Last updated: Wed May 1 06:45:39 SGT 2019
- Prof. Bonnie Webber, University of Edinburgh, will deliver the SRI Distinguished Keynote!
- Alex Wade from Chan-Zuckerberg Initiative (CZI) will deliver a keynote at BIRNDL!
- We are still accepting registrations, although registered teams have been building their systems since March 30! Just go ahead and register and get started!
- CL-SciSumm 2019 has been accepted to be colocated with ACM SIGIR 2019, Paris! Thanks to the PC and participants for their continued support!
- CL-SciSumm 2019 has been proposed to be colocated with ACM SIGIR 2019, Paris!
Call for Participation
You are invited to participate in the CL-SciSumm Shared Task at
BIRNDL 2019. The shared task will be on automatic paper summarization
in the Computational Linguistics (CL) domain.
This task follows up on the successful
CL-SciSumm 2018, co-located with SIGIR 2018, and three editions prior to that.
Over the four editions, a
training corpus of forty topics
from CL research papers has been released. Participants were invited to
enter their systems in a task-based evaluation. We also released the
annotated dataset, comprising ACL Computational Linguistics
research papers and summaries.
The output summaries are of two types: faceted summaries of the
traditional self-summary (the abstract) and the community summary
(the collection of citation sentences ‘citances’). We also
group the citances by the facets of the text that they refer to.
In our proposed shared task, we will expand the corpus with a
new test dataset of 10 topics (closed for evaluation).
The CL-SciSumm 2019 corpus is expected to be of interest to a broad
community including those working in computational linguistics and
natural language processing, text summarization, discourse structure
in scholarly discourse, paraphrase, textual entailment and text
simplification. As before, we will have more training data and a blind
test set for evaluation.
primarily responsible for the task's oversight.
Given: A topic consisting of a Reference Paper (RP) and Citing
Papers (CPs) that all contain citations to the RP. In each CP, the text
spans (i.e., citances) have been identified that pertain to a particular
citation to the RP.
Task 1A: For each citance, identify the spans of text (cited text
spans) in the RP that most accurately reflect the citance. These are
of the granularity of a sentence fragment, a full sentence, or several
consecutive sentences (no more than 5).
Task 1B: For each cited text span, identify what facet of the
paper it belongs to, from a predefined set of facets.
Task 2 (optional bonus task): Finally, generate a structured
summary of the RP from the cited text spans of the RP. The length of the
summary should not exceed 250 words.
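The two size limits stated in the task definitions can be checked mechanically before submission. The following is a minimal sketch, not an official validator: it assumes that sentences of the RP are identified by integer indices (a hypothetical representation) and that words are counted by whitespace splitting. It does not model sentence fragments.

```python
from typing import List

MAX_SPAN_SENTENCES = 5   # Task 1A: no more than 5 consecutive sentences
MAX_SUMMARY_WORDS = 250  # Task 2: summary length limit


def is_valid_cited_span(sentence_ids: List[int]) -> bool:
    """Return True if the IDs form one run of 1 to 5 consecutive sentences.

    Integer sentence IDs are an assumption made for this sketch.
    """
    if not 1 <= len(sentence_ids) <= MAX_SPAN_SENTENCES:
        return False
    ordered = sorted(sentence_ids)
    # Every neighbouring pair must differ by exactly one.
    return all(b - a == 1 for a, b in zip(ordered, ordered[1:]))


def is_valid_summary(summary: str) -> bool:
    """Return True if the whitespace-tokenised summary has at most 250 words."""
    return len(summary.split()) <= MAX_SUMMARY_WORDS
```

For example, `is_valid_cited_span([3, 4, 5])` holds, while `[3, 5]` (not consecutive) and a six-sentence span are both rejected.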
Evaluation: Task 1 will be scored by overlap of text spans
measured by number of sentences in the system output vs the gold standard
created by human annotators. Task 2 will be scored using the ROUGE family
of metrics between (i) the system output and the gold standard summary from
the reference spans and (ii) the system output and the abstract of the
reference paper.
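As an illustration of sentence-level overlap scoring for Task 1, the sketch below compares the set of RP sentences selected by a system with the set chosen by the annotators. The text above only says that overlap is measured in number of sentences; turning that overlap into precision, recall and F1 is an assumption of this sketch, as are the integer sentence IDs and the function name. It is not the official scoring script.

```python
from typing import Iterable, Tuple


def sentence_overlap_scores(system_ids: Iterable[int],
                            gold_ids: Iterable[int]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) over overlapping sentence IDs.

    Assumption: each cited text span is reduced to the set of RP
    sentence IDs it covers; the IDs themselves are hypothetical.
    """
    system, gold = set(system_ids), set(gold_ids)
    overlap = len(system & gold)  # sentences chosen by both
    precision = overlap / len(system) if system else 0.0
    recall = overlap / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1
```

With a system output of sentences 10 to 12 and a gold span of sentences 11 to 14, two sentences overlap, giving a precision of 2/3, a recall of 1/2 and an F1 of 4/7.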
Organizations wishing to participate in the Shared Task track
at BIRNDL 2019 are invited to register on EasyChair
by March 30.
Participants are advised to register as soon as
possible in order to receive timely access to evaluation resources,
including development and testing data. Registration for the task
does not commit you to participation, but it helps the organizers with
planning. All participants who submit system runs are welcome to
present their system at the BIRNDL Workshop.
Dissemination of CL-SciSumm work and results other than in the
workshop proceedings is welcomed, but the conditions of participation
specifically preclude any advertising claims based on these
results. Any questions about conference participation may be sent
to the organizers mentioned below.
The training set for this year's task is available on GitHub here.
You can also download the training set here.
The test set is available on GitHub: Test-Set-2018 on GitHub
The corpus for the CL-SciSumm task has been created by randomly
sampling documents from the ACL Anthology corpus and selecting
their citing papers. The training, development and testing sets will
be made publicly available at the GitHub link above on the dates listed below.
The manually annotated training set of 40 articles and citing papers is already available for download
and can be used by participants to pilot their systems. Further, this year we have introduced 1000 document sets that were automatically
annotated to be used as training data.
This training data was generated following Nomoto, 2018.
Further, for Task 2, one thousand summaries that were released as part of SciSummNet (Yasunaga et al., 2019)
have been included as human summaries to train on.
The test set of 20 articles
is available at https://github.com/WING-NUS/scisumm-corpus/tree/master/data/Test-Set-2018.
The system outputs from the test set should
be submitted to the task organizers, for the collation of the final
results to be presented at the workshop.
Please consult the BIRNDL
Workshop page for the official workshop dates.
|Milestone|Date|
|---|---|
|Training Set Release|Already Online|
|Deadline for Registration and Short System Descriptions|March 30, 2019 (we are open for further registrations although you should know that teams have started building their systems)|
|Test Set Release|Already Online|
|System Runs Due|May 24, 2019|
|Preliminary System Reports Due in EasyChair|June 9, 2019|
|Camera-Ready Contributions Due in EasyChair|July 7, 2019|
|Participant Presentations at BIRNDL 2019, Paris|July 25, 2019|
All deadlines for the CL-SciSumm shared task are calculated as 11:59pm Baker Island Time (BIT: UTC/GMT-12).
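Because the deadlines are given in UTC-12, the cut-off falls on the following calendar day in most time zones. The sketch below converts a deadline date to UTC using only the Python standard library; the helper name `deadline_in_utc` is made up for this example, and the date used is the System Runs deadline from the schedule above.

```python
from datetime import datetime, timedelta, timezone

# Baker Island Time is a fixed offset of UTC-12 with no daylight saving.
BAKER_ISLAND = timezone(timedelta(hours=-12), name="BIT")


def deadline_in_utc(year: int, month: int, day: int) -> datetime:
    """Return 11:59pm Baker Island Time on the given date, expressed in UTC."""
    local_deadline = datetime(year, month, day, 23, 59, tzinfo=BAKER_ISLAND)
    return local_deadline.astimezone(timezone.utc)


system_runs_due = deadline_in_utc(2019, 5, 24)
print(system_runs_due.isoformat())  # 2019-05-25T11:59:00+00:00
```

The same `datetime` object can be passed to `astimezone()` with any other `tzinfo` to see the deadline in a participant's local time.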
The CL-SciSumm task provides resources to encourage research in a
promising direction of scientific paper summarization, which considers
the set of citation sentences (i.e., "citances") that reference a
specific paper as a (community created) summary of a topic or paper
(Nanba, Kando and Okumura, 2011; Qazvinian and Radev, 2008). Citances
for a reference paper are considered a synopsis of its key points and
also its key contributions and importance within an academic community.
The advantage of using citances is that they are embedded with
meta-commentary and offer a contextual, interpretative layer to the
cited text. The drawback, however, is that though a collection of
citances offers a view of the cited paper, it does not consider the
context of the target user (Sparck Jones, 2007; Teufel and Moens, 2002;
Nenkova and McKeown, 2011; Jaidka, Khoo and Na, 2013a), verify the claim
of the citation or provide context from the reference paper, in terms of
the type of information cited or where it is in the referenced paper.
CL-SciSumm explores summarization of scientific research, for the
computational linguistics research domain. An ideal summary of
computational linguistics research papers would be able to summarize
previous research by drawing comparisons and contrasts between their
goals, methods and results, as well as distil the overall trends in the
state of the art and their place in the larger academic discourse.
Literature surveys and review articles in CL do help readers get the
gist of the state of the art in research for a topic. However,
writing a literature survey is labor-intensive, and a survey is
not always available for every topic of interest. What is needed are
resources that automate the synthesis and updating of
summaries of CL research papers.
Existing scientific summarization systems have automatically
generated related work sections for a target paper by instantiating a
hierarchical topic tree (Hoang and Kan, 2010), generating model citation
sentences (Mohammad et al., 2009) or implementing a literature review
framework (Jaidka et al., 2013). However, the limited availability of
evaluation resources and human-created summaries constrains research in
this area. The goal of the CL-SciSumm Shared Task Series is to highlight the
challenges and relevance of the scientific summarization problem,
support research in automatic scientific document summarization and
provide evaluation resources to push the current state of the art.
BIRNDL 2019 Workshop
The BIRNDL 2019 workshop will be held on July 25, 2019 in Paris.
The workshop is a forum both for presentation of results
(including failure analyses and system comparisons), and for more
lengthy system presentations describing techniques used, experiments
run on the data, and other issues of interest to NLP researchers. Shared Task
track participants who wish to give a presentation during the workshop
will submit a short abstract describing the experiments they
performed. As there is a limited amount of time for oral
presentations, the abstracts will be used to determine which
participants are asked to speak and which will present in a poster session.
- Muthu Kumar Chandrasekaran - firstname.lastname@example.org
is an Advanced Computer Scientist, Machine Learning at SRI International's Artificial Intelligence Center.
Previously he was a Ph.D. student at NUS School of Computing.
He is broadly interested in natural language processing, machine learning and their applications to information retrieval; specifically, in retrieving and organising information from asynchronous conversation media such as scholarly publications and discussion forums. He has been co-organizing the CL-SciSumm Shared Task series and the BIRNDL workshop series since 2014. He also reviews for ACL, EMNLP, NAACL and JCDL conferences. During his PhD he also spent time at the Allen Institute for Artificial Intelligence's Semantic Scholar research and National Institute of Informatics, Tokyo.
- Dayne Freitag - email@example.com
is the director of the Advanced Analytics group in SRI's Artificial Intelligence Center. His research seeks to apply artificial intelligence to information assimilation, management and exploitation. Specific areas of interest include natural language processing and computational linguistics; machine learning; data mining; information extraction; information retrieval; information diffusion; and information integration. Freitag has served as principal investigator for a number of research projects including several large, multi-institutional efforts. His research goals have focused on the automation of data science; the automatic extension of mechanistic models through machine reading; knowledge federation over diverse information sources through data analytics and natural language processing; explaining the spread of ideas through online communities; and novel approaches to institutional knowledge management using controlled English. Freitag holds a B.A. in English literature from Reed College, and a Ph.D. in computer science from Carnegie Mellon University.
- Michihiro Yasunaga - firstname.lastname@example.org
He is a final year undergraduate student in computer science at Yale University, conducting research in natural language processing (NLP), advised by Prof. Dragomir Radev.
His research includes natural language understanding tasks such as summarization and semantic parsing, and the robustness of machine learning techniques in NLP.
- Dragomir Radev - email@example.com
He is the A. Bartlett Giamatti Professor of Computer Science at Yale
University. His interests include Natural Language Processing (NLP), Artificial Intelligence, Computational Linguistics, Machine Learning, Information Retrieval, Text Summarization, Network Analysis, Text Mining, Applications of NLP to Bioinformatics, Social Network Analysis, Political Science, and the Humanities. He has received numerous awards including Fellow of the ACM (Association for Computing Machinery) (2015), University of Michigan Faculty Recognition Award (2013), Linguistic Society of America: Linguistics, Language and the Public Award (2011) (as co-founder and program chair of NACLO), Secretary of ACL (Association for Computational Linguistics) (2006-2015), The Gosnell Prize for Excellence in Political Methodology (shared) (2006), and University of Michigan UROP Faculty Award for Outstanding Research Mentorship (2004).
- Min-Yen Kan - firstname.lastname@example.org
My research interests fall under the areas of digital libraries,
natural language processing, information retrieval and human-computer
interaction. Specifically, they include document structure acquisition,
verb analysis, digital library resource annotation and applied
text summarization. My research aims to investigate how natural
language processing and information retrieval can be applied to
improve scholarly publication and knowledge discovery.