You are invited to participate in the 2nd Joint Workshop on Bibliometric-enhanced IR and NLP for Digital Libraries (BIRNDL), to be held as part of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2017) in Tokyo, Japan on 11th August 2017.
We are happy to announce that the past BIR and NLPIR4DL organizers are proposing this workshop at SIGIR together. In conjunction with the BIRNDL workshop, we will hold the 3rd CL-SciSumm Shared Task in Scientific Document Summarization.
Reports from the shared task systems will be featured as part of a session at the workshop.
The BIRNDL workshop is the first step to foster a reflection on interdisciplinarity, and the benefits that the disciplines of bibliometrics, IR and NLP can derive from it in a digital libraries context. The workshop is intended to stimulate IR researchers and digital library professionals to elaborate on new approaches in natural language processing, information retrieval, scientometrics, text mining and recommendation techniques that can advance the state of the art in scholarly document understanding, analysis, and retrieval at scale. Researchers are in need of assistive technologies to track developments in an area, identify the approaches used to solve a research problem over time and summarize research trends. Digital libraries require semantic search, question-answering and automated recommendation and reviewing systems to manage and retrieve answers from scholarly databases. Full document text analysis can help to design semantic search, translation and summarization systems; citation and social network analyses can help digital libraries to visualize scientific trends, bibliometrics, and the relationships and influences of works and authors. All these approaches can be supplemented with the metadata supplied by digital libraries, including usage data such as download counts.
We invite papers and presentations that incorporate insights from IR, bibliometrics and NLP to develop new techniques to address the open problems in Big Science, such as evidence-based searching, measurement of research quality, relevance and impact, the emergence and decline of research problems, identification of scholarly relationships and influences and applied problems such as language translation, question-answering and summarization. Finding relevant scholarly literature is a key point of the workshop and sets the agenda for tools and approaches to be discussed and evaluated at BIRNDL. At the workshop, we would also like to address the need for established, standardized baselines, evaluation metrics and test collections. See the proceedings of the first BIRNDL workshop at JCDL 2016 and a recent report in SIGIR Forum.
This workshop will be relevant to scholars in computer and information science specializing in IR, bibliometrics and NLP. The Shared Task is expected to be of interest to a broad community including those working in CL and NLP, especially in the sub-disciplines of text summarization, discourse structure in scholarly discourse, paraphrase, textual entailment and text simplification. The workshop will also be of importance for all stakeholders in the publication pipeline: implementers, publishers and policymakers. Formal citation metrics are increasingly a factor in decision-making by universities and funding bodies worldwide, making the need for research in applying these metrics more pressing. Today's publishers continue to provide new ways to support their consumers in disseminating and retrieving the right published works to their audience. Even when only considering the scholarly sites within Computer Science, we find that the field is well represented: ACM Portal, IEEE Xplore, Google Scholar, PSU's CiteSeerX, MSR's Academic Search, Elsevier's Mendeley, Tsinghua's ArnetMiner, Trier's DBLP and Hiroshima's PRESRI. With this workshop we hope to bring a number of these contributors together.
We invite stimulating as well as unpublished submissions on topics including, but not limited to, full-text analysis, multimedia and multilingual analysis and alignment as well as the application of citation-based NLP or information retrieval and information seeking techniques in digital libraries. Specific examples of fields of interest include (but are not limited to):
For the paper sessions, we especially invite descriptions of running projects and ongoing work as well as contributions from industry. Papers that investigate multiple themes directly are especially welcome.
The 3rd Computational Linguistics (CL) Scientific Summarization Shared Task is sponsored by Microsoft Research Asia and will be conducted as a part of this workshop. This is the first medium-scale shared task on scientific document summarization in the computational linguistics domain. The current shared task will be on automatic paper summarization in the Computational Linguistics (CL) domain. The output summaries will be of two types: faceted summaries of the traditional self-summary (the abstract) and the community summary (the collection of citation sentences, or ‘citances’). We also propose to group the citances by the facets of the text that they refer to.
This task follows up on the successful CL-SciSumm 2016 task at JCDL 2016, Newark, NJ, USA and a Pilot Task conducted as a part of the BiomedSumm Track at the Text Analysis Conference 2014 (TAC 2014). In this task, a training corpus of ten topics from CL research papers was released. Participants were invited to enter their systems in a task-based evaluation.
The CLSciSumm17 corpus is expected to be of interest to a broad community including those working in computational linguistics and natural language processing, text summarization, discourse structure in scholarly discourse, paraphrase, textual entailment and text simplification.
| Event | Date |
| --- | --- |
| Camera Ready Contributions | |
| Workshop | 11th August 2017 in Tokyo, Japan |
Check the CL-SciSumm 2017 Shared Task homepage for the dates that apply to the shared task. The shared task dates are coordinated with the workshop dates.
All deadlines for the BIRNDL workshop are calculated as 11:59pm Baker Island Time (BIT: UTC/GMT-12).
| Time | Item | Presenter(s) |
| --- | --- | --- |
| 14:00-14:15 | Introduction to the workshop | Philipp Mayr and Muthu Kumar Chandrasekaran |
| 14:15-14:45 | Keynote: Do "Future Work" sections have a real purpose? Citation links and entailment for global scientometric questions | Simone Teufel |
| 14:45-15:30 | SESSION - 1 | |
| 14:45-15:00 | "Can we do better than co-citations? Bringing Citation Proximity Analysis from idea to practice in research articles recommendation" | Petr Knoth and Anita Khadka |
| 15:00-15:15 | MultiScien: a Multilingual Natural Language Processing System for Mining and Enrichment of Scientific Collections | Horacio Saggion, Francesco Ronzano, Pablo Accuosto and Daniel Ferrés |
| 15:15-15:30 | Identifying Problems and Solutions in Scientific Text | Kevin Heffernan and Simone Teufel |
| 15:30-15:50 | COFFEE BREAK (posters will be mounted) | |
| 15:50-16:30 | SESSION - 2 | |
| 15:50-16:05 | Identifying collaborations among researchers: a pattern-based approach | Elena Baralis, Luca Cagliero, Paolo Garza and Mohammad Reza Kavoosifar |
| 16:05-16:20 | Automatic Generation of Review Matrices as Multi-document Summarization of Scientific Papers | Hayato Hashimoto, Kazutoshi Shinoda, Hikaru Yokono and Akiko Aizawa |
| 16:20-16:27 | Bibliometrics of “Information Retrieval” – A Tale of Three Databases | Judit Bar-Ilan |
| 16:27-16:35 | Analysis of Footnote Chasing and Citation Searching in an Academic Search Engine | Ameni Kacem and Philipp Mayr |
| 16:35-17:35 | SESSION - 3: CL-SciSumm | |
| 16:35-16:45 | Overview | Muthu Kumar Chandrasekaran |
| 16:45-16:55 | CL-SciSumm Task Winner Talk | TBD |
| 16:55-17:35 | SESSION - 4: Poster Session (each poster presenter will give a 1 to 2 min pitch) | |
| | Automated Generation of Timestamped Patent Abstracts at Scale to Troll Patent-Trolls | Felix Hamborg, Moustafa Elmaghraby, Corinna Breitinger and Bela Gipp |
| | K-means and Hierarchical Clustering Method to Improve our Understanding of Citation Contexts | Marc Bertin and Iana Atanassova |
| | Are Cited References Meaningful? Measuring Semantic Relatedness in Citation Analysis | Hassan Alam, Aman Kumar, Tina Werner and Manan Vyas |
| | CL-SciSumm System Poster Pitches | TBD |
| 17:35-18:00 | Summary and Outlook | |
All submissions must be written in English, following the Springer LNCS author guidelines (max. 6 pages for short papers and 12 pages for full papers, excluding references, which may run to any length) and should be submitted as PDF files to EasyChair. All submissions will be reviewed by at least two independent reviewers. Please note that at least one author per paper needs to register for the workshop and attend it to present the work. In case of a no-show, the paper (even if accepted) will be deleted from the proceedings and from the program. Submissions and reviewing will be managed by the EasyChair conference management system.
Workshop proceedings will be deposited online in the CEUR workshop proceedings publication service (ISSN 1613-0073) and on the ACL Anthology (Anthology prefix W17-33xx). This way the proceedings will be permanently available and citable (digital persistent identifiers and long-term preservation).
He is a fourth-year Ph.D. student at NUS School of Computing. He is broadly interested in natural language processing and its applications to information retrieval; specifically, in retrieving and organising information from asynchronous conversation media such as scholarly publications (the mainstream medium of scholarly communication) and discussion and debate forums. He was on the organizing committee of the CL-SciSumm 2016 Shared Task, the CL-SciSumm 2014 Pilot Task and the BIRNDL workshop. He also reviews for ACL, EMNLP and BIR. He believes communication of scholarly research needs to be summarized to avoid redundant or outdated research and ensure faster progress on pressing problems. He is currently doing his Ph.D. research on a similarly motivated problem in Massive Open Online Course (MOOC) discussion forums: recommending salient student discussions in which instructors, given their limited bandwidth, should intervene.
Dr Kokil Jaidka is a postdoctoral researcher in Computer Science and Chief Technology Officer for the World Wellbeing Project at the University of Pennsylvania. She has been the lead coordinator of all aspects of the CL-SciSumm Shared Task since 2014, and she also co-organized the 1st BIRNDL workshop. She has expertise working on large datasets using machine learning and unsupervised approaches on textual data, and in the specific areas of multi-document summarization and applied linguistics. She is a reviewer for Scientometrics, Applied Linguistics and Aslib journal of Information Processing & Management. Her PhD dissertation involved the development of a literature review framework for the summarization of research papers. Currently, she is conducting social media analyses and user language modeling for opinion mining, behavioral profiling and health outcomes.
Philipp Mayr is a deputy department head and a team leader at the GESIS - Leibniz Institute for the Social Sciences, in the department Knowledge Technologies for the Social Sciences (WTS). He was a visiting professor for knowledge representation at the University of Applied Sciences in Darmstadt, Department of Information Science and Engineering, during 2009-2011. Philipp Mayr received his PhD in applied informetrics and information retrieval from the Berlin School of Library and Information Science at Humboldt University Berlin in 2009. To date, he has been awarded substantial research funding (PI, Co-PI) from national and European funding agencies. Philipp Mayr has published in top conferences and prestigious journals in the areas of informetrics, information retrieval and digital libraries. His research group focuses on methods and techniques for interactive information retrieval. Philipp Mayr was the main organizer of the Combining Bibliometrics and Information Retrieval workshop at ISSI 2013, the BIR workshops at ECIR 2014, 2015 and 2016 and the first BIRNDL workshop at JCDL 2016.
The main organizers will be supported by our previous co-organizers:
The following committee members have agreed to review submissions to the workshop.