The Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL 2016)

Call for Papers

Current digital libraries collect and allow access to digital papers and their metadata (including citations), but mostly do not analyze the items they index. The large scale of scholarly publications poses a challenge for scholars in their search for relevant literature. Searchers of digital libraries, citation indices and journal databases are inundated with thousands of results. The community needs to develop techniques to better support both basic and higher-order information seeking and scholarly sensemaking activities.

The BIRNDL 2016 workshop is a joint scientific event gathering scholars from the BIR (Bibliometric-enhanced Information Retrieval) and the NLPIR4DL (Text and citation analysis for scholarly digital libraries) communities. BIRNDL focuses on scholarly publications and data, against the backdrop of an explosion in the production of scientific literature and the growth of the scientific enterprise, whose consistent exponential growth approaches an empirical law. The workshop will investigate how natural language processing, information retrieval, scientometric and recommendation techniques can advance the state-of-the-art in scholarly document understanding, analysis and retrieval at scale. Researchers are in need of assistive technologies to track developments in an area, identify the approaches used to solve a research problem over time and summarize research trends. Digital libraries require semantic search, question-answering and automated recommendation and reviewing systems to manage and retrieve answers from scholarly databases. Full document text analysis can help to design semantic search, translation and summarization systems; citation and social network analyses can help digital libraries to visualize scientific trends, bibliometrics and relationships and influences of works and authors. All these approaches can be supplemented with the metadata supplied by digital libraries, including usage data such as download counts.

This workshop will be relevant to scholars in the cross-disciplinary field of Computer Science and Digital Libraries, in particular in the research areas of Natural Language Processing and in Information Retrieval; it will also be important for all stakeholders in the publication pipeline: implementers, publishers and policymakers. Even when only considering the scholarly sites within Computer Science, we find that the field is well-represented - ACM Portal, IEEE Xplore, Google Scholar, PSU's CiteSeerX, MSR's Academic Search, Elsevier’s Mendeley, Tsinghua's ArnetMiner, Trier's DBLP, Hiroshima's PRESRI; with this workshop we hope to bring a number of these contributors together. Today's publishers continue to seek new ways to be relevant to their consumers, in disseminating the right published works to their audience. The fact that formal citation metrics have become an increasingly large factor in decision-making by universities and funding bodies worldwide makes the need for research in such topics and for better methods for measuring the impact of work more pressing.

This workshop is also informed by the ongoing COST Action TD1210 KnowEscape.

Workshop Topics

We invite stimulating, unpublished submissions on topics including - but not limited to - full-text analysis, multimedia and multilingual analysis and alignment as well as citation-based NLP or IR. Specific examples of fields of interest include (but are not limited to):

Information retrieval (IR) for digital libraries and scientific information portals
IR for scholarly text, e.g. citation-based IR
IR for scientific domains, e.g. social sciences, life sciences etc.
Information Seeking Behaviour
Navigation, searching and browsing in scholarly DLs; Niche search in scholarly DLs; New information access methods for scientific papers
Query expansion and relevance feedback approaches
Question-answering for scholarly DLs
Recommendations based on explicit and implicit user feedback
Recommendation for scholarly papers, reviewers, citations and publication venues
(Social) Book Search
Summarisation of scientific articles; Automatic creation of reviews and automatic qualitative assessment of submissions
Bibliometrics, citation analysis and network analysis for IR; Citation function/motivation analysis; Novel bibliographic metrics; Topical modeling analysis
Knowledge discovery and analysis of the ancestry of ideas
Metadata and controlled vocabularies for resource description and discovery; Automatic metadata discovery, such as language identification
Translation, multilingual and multimedia analysis and alignment of scholarly works
Analyses of writing style in scholarly publications
Science Modelling (both formal and empirical)
Task based user modelling, interaction, and personalisation
(Long-term) Evaluation methods and test collection design
Collaborative information handling and information sharing
Disambiguation issues in scholarly DLs using NLP or IR techniques; Data cleaning and data quality
Classification, categorisation and clustering approaches
Information extraction (including topic detection, entity and relation extraction)

The CL-SciSumm Shared Task

http://wing.comp.nus.edu.sg/cl-scisumm2016

We will be running a shared task on scholarly paper processing as part of the workshop. The current shared task will be on automatic paper summarization in the Computational Linguistics (CL) domain. The output summaries will be of two types: faceted summaries of the traditional self-summary (the abstract) and the community summary (the collection of citation sentences, or ‘citances’). We also propose to group the citances by the facets of the text that they refer to.

This task follows up on the successful CL Pilot Task conducted as a part of the BiomedSumm Track at the Text Analysis Conference 2014 (TAC 2014). In this task, a training corpus of ten topics from CL research papers was released. Participants were invited to enter their systems in a task-based evaluation. Nine teams from four countries expressed an interest in participating in the shared task; three teams submitted system descriptions and findings. We also released the SciSumm14 manually annotated dataset, comprising ACL Computational Linguistics research papers and summaries. It offers a community summary of a reference paper based on its collection of citing sentences (“citances”). Furthermore, each citance is mapped to its referenced text in the reference paper and tagged with the information facet it represents. In our proposed shared task, we will extend this by releasing pairs of training and test datasets, each pair comprising the annotated citing sentences for a research paper and the summaries of the research paper.

The CLSciSumm15 corpus is expected to be of interest to a broad community including those working in computational linguistics and natural language processing, text summarization, discourse structure in scholarly discourse, paraphrase, textual entailment and text simplification.

We have secured support for the costs of the shared task annotation from Microsoft Research Asia. The National University of Singapore will be primarily responsible for the task's oversight.

Important Dates

Event	Date
Submissions	25 April 2016 (updated; previously 15 April 2016)
Notification	24 May 2016 (updated; previously 16 May 2016)
Camera Ready Contributions	9 June 2016 (updated; previously 3 June 2016)
Workshop	23 June 2016 in Newark, New Jersey, USA

Check the CL-SciSumm 2016 Shared Task homepage for details on dates with respect to the shared task. The dates are coordinated.

All deadlines for the BIRNDL workshop are calculated as 11:59pm Baker Island Time (BIT: UTC/GMT-12).
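
As a rough illustration of what the Baker Island Time rule means in practice, the sketch below converts a deadline of 11:59pm at UTC-12 into UTC. The function name `deadline_utc` is hypothetical, and the example date is the updated submission deadline listed in the table above; treat it as a convenience for authors, not as an official statement of the deadline.

```python
from datetime import datetime, timedelta, timezone

# Baker Island Time is a fixed offset of UTC-12 with no daylight saving.
BIT = timezone(timedelta(hours=-12), name="BIT")


def deadline_utc(year: int, month: int, day: int) -> datetime:
    """Return 11:59pm BIT on the given date, expressed in UTC."""
    local = datetime(year, month, day, 23, 59, tzinfo=BIT)
    return local.astimezone(timezone.utc)


# Updated submission deadline: 25 April 2016, 11:59pm BIT.
submission = deadline_utc(2016, 4, 25)
print(submission.isoformat())  # 2016-04-26T11:59:00+00:00
```

Authors in other time zones can pass the result to `astimezone()` with their own offset to see the local cut-off.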

Please note that the early-bird rates for the BIRNDL workshop and the hosting JCDL conference have been extended until 23 May 2016; see the main JCDL registration page.

Submission Information

All submissions must be written in English, following the Springer LNCS author guidelines. Regular, long papers should have a maximum of 12 pages. Short papers should have a maximum of 6 pages. Both should be submitted as PDF files to EasyChair. All submissions will be reviewed by at least two independent reviewers. At least one author per paper needs to register for the workshop and attend the workshop to present the work. In case of a no-show, the paper (even if accepted) will be deleted from the proceedings and from the program. Submissions and reviewing will be managed by the EasyChair conference management system.


Workshop proceedings are now available online in the CEUR workshop proceedings publication service (ISSN 1613-0073) and also in the ACL Anthology. This way the proceedings will be permanently available and citable.

IJDL Special Issue


The International Journal on Digital Libraries has accepted our proposal to feature a special issue revolving around themes associated with the BIRNDL workshop. The special issue's deadlines and topics are tailored to (but not limited to) the workshop. The submission deadline for extensions of the workshop's submissions is 30 September, to allow authors ample time to broaden the scope of their works to accommodate the journal length and format.

The special issue currently targets a publication date by Summer 2017. See the call for specific details: http://static.springer.com/sgw/documents/1558268/application/pdf/Bibliometric-enhanced+IR+and+NLP+for+DL.pdf.

The following articles from the special issue have been published in IJDL so far:


Al Saied, Hazem and Dugué, Nicolas and Lamirel, Jean-Charles (2017) Automatic summarization of scientific publications using a feature selection approach
Colavizza, Giovanni and Romanello, Matteo and Kaplan, Frédéric (2017) The references of references: a method to enrich humanities library catalogs with citation data
Mariani, Joseph and Francopoulo, Gil and Paroubek, Patrick (2017) Reuse and plagiarism in Speech and Natural Language Processing publications
Koopman, Bevan and Russell, Jack and Zuccon, Guido (2017) Task-oriented search for evidence-based medicine
Cohan, Arman and Goharian, Nazli (2017) Scientific document summarization via citation contextualization and scientific discourse
White, Howard D. (2017) Bag of works retrieval: TF*IDF weighting of works co-cited with a seed
Conroy, John M. and Davis, Sashka T. (2017) Section mixture models for scientific document summarization

Programme

Updated: The programme structure is finalized. Oral talk durations are differentiated by paper length. Long papers have 20-minute slots (suggested 15 mins, plus 5 for questions); short papers have 15-minute slots (suggested 10 mins, plus 5 for questions). Please bring your own presentation laptop with VGA-capable output, and prepare a backup PDF of your slides on a USB drive.

Updated: Workshop proceedings are now available online in the CEUR workshop proceedings publication service (ISSN 1613-0073) and in the ACL Anthology. They are also indexed by DBLP.

Time	Session
9:00-9:10	Opening Remarks / BIRNDL Organising Committee
9:10-9:50	Keynote: Bibliometrics, Information Retrieval and Natural Language Processing: Natural Synergies to Support Digital Library Research / Dietmar Wolfram (University of Wisconsin-Milwaukee)
	Bibliometrics and information retrieval (IR) represent fundamental areas of study in information science. Historically, researchers have not fully capitalized on the potential synergies that exist between these two areas. Knowledge of regularities in information production and use, as well as citation relationships in bibliographic databases, which are studied in bibliometrics, can benefit IR system design and evaluation. Similarly, techniques developed for IR and database technology have made the investigation of large-scale bibliometric phenomena feasible. Both fields of study have also benefited directly from developments in natural language processing (NLP), which has provided new tools and techniques to explore research problems in bibliometrics and IR. Digital libraries, with their full text, multimedia content, searching, and browsing capabilities, represent ideal environments in which to investigate the mutually beneficial relationships that can be forged among bibliometrics, IR and NLP. This presentation will highlight the symbiotic relationship that exists between bibliometrics and IR, and will provide examples of how language-based methods have benefited IR, bibliometrics and their intersection.
9:50-10:55	Morning - Session 1
	9:50-10:10	Marc Bertin and Iana Atanassova: Multiple In-text Reference Aggregation Phenomenon (Long Paper)
	10:10-10:25	Gali Halevi and Judit Bar-Ilan: Post Retraction Citations in Context (Short Paper)
	10:25-10:40	Masaki Eto: Incorporating Satellite Documents into Co-citation Networks for Scientific Paper Searches (Short Paper)
	10:40-10:55	Francesco Ronzano, Ana Freire, Diego Saez-Trumper and Horacio Saggion: Making Sense of Massive Amounts of Scientific Publications: the Scientific Knowledge Miner Project (Short Paper)
10:55-11:10	~ ~ Break ~ ~
11:10-11:50	Morning - Session 2
	11:10-11:30	Ha Jin Kim, Juyoung An, Yoo Kyung Jeong and Min Song: Exploring the leading authors and journals in major topics by citation sentences and topic modeling (Long Paper)
	11:30-11:50	Aravind Sesagiri Raamkumar, Schubert Foo and Natalie Pang: What papers should I cite from my reading list? User evaluation of a novel task in a literature review and manuscript preparatory assistive system (Long Paper)
11:50-13:00	~ ~ Lunch ~ ~
13:00-13:55	Afternoon - Session 3
	13:00-13:20	Jevin West and Jason Portenoy: Delineating Fields Using Mathematical Jargon (Long Paper)
	13:20-13:40	Joseph Mariani, Gil Francopoulo and Patrick Paroubek: A Study of Reuse and Plagiarism in Speech and Natural Language Processing papers (Long Paper)
	13:40-13:55	Philipp Mayr: How do practitioners, PhD students and postdocs in the social sciences assess topic-specific recommendations? (Short Paper)
13:55-14:15	~ ~ Break ~ ~
14:15-15:15	CL-SciSumm Oral Session
	14:15-14:35	Kokil Jaidka, Muthu Kumar Chandrasekaran, Sajal Rustagi and Min-Yen Kan. Overview of the CL-SciSumm 2016 Shared Task
	14:35-14:55	System 8 - Liyuan Mao, Lei Li, Taiwen Huang, Yazhao Zhang, Junqi Chi, Xiaoyue Cong and Heng Peng. CIST System for CL-SciSumm 2016 Shared Task
	14:55-15:15	System 6 - Tadashi Nomoto. NEAL: A Neurally Enhanced Approach to Linking Citation and Reference
15:15-16:15	CL-SciSumm Poster and Demo Interactive Session (All participants of this session must prepare an A1-sized poster, in portrait or landscape format, to describe their work. Participants are also welcome to demonstrate their systems.)
	Peeyush Aggarwal and Richa Sharma. Lexical and Syntactic cues to identify Reference Scope of Citance
	Shahryar Baki, Daniel Lee, Luis Moraes and Rakesh Verma. University of Houston at CL-SciSumm 2016: SVMs with tree kernels and Sentence Similarity
	Stefan Klampfl, Andi Rexha and Roman Kern. Identifying Referenced Text in Scientific Publications by Summarisation and Classification Techniques
	Wenjie Li, Ziqiang Cao and Dapeng Wu. PolyU at CL-SciSumm 2016
	Kun Lu, Jin Mao, Gang Li and Jian Xu. Recognizing Reference Spans and Classifying their Discourse Facets
	Bruno Malenfant and Guy Lapalme. RALI System Description for CL-SciSumm 2016 Shared Task
	Liyuan Mao, Lei Li, Taiwen Huang, Yazhao Zhang, Junqi Chi, Xiaoyue Cong and Heng Peng. CIST System for CL-SciSumm 2016 Shared Task
	Tadashi Nomoto. NEAL: A Neurally Enhanced Approach to Linking Citation and Reference
	Horacio Saggion, Francesco Ronzano and Ahmed Abura'Ed. Trainable Citation-enhanced Summarization of Scientific Articles
16:15-16:45	CL-SciSumm Discussion: Future Directions / CL-SciSumm Organising Committee
16:45-17:00	BIRNDL Closing Session / BIRNDL Organising Committee
18:00	~ ~ Participant Dinner ~ ~ (at your own cost)

Organising Committee

Guillaume Cabanac - guillaume.cabanac@univ-tlse3.fr: Guillaume Cabanac is an associate professor at the Department of Computer Science of the University of Toulouse, France. Guillaume received his PhD in 2008 from the University of Toulouse on the topic of personal and collective information management through digital annotations. His academic interests lie in Information Science with a focus on Information Retrieval and Scientometrics. Guillaume strives to tackle compelling issues requiring an interdisciplinary expertise. He currently studies how scientists and the general public alike crowdsource and share academic literature. Guillaume serves as a referee for various conferences and journals in IR and scientometrics. He is also an editorial board member of the Scientometrics journal.
Muthu Kumar Chandrasekaran - muthu.chandra@comp.nus.edu.sg: I am broadly interested in natural language processing and its applications to information retrieval; specifically, in retrieving and organising information from asynchronous conversation media such as scholarly publications (the mainstream medium of scholarly communication), discussion forums and debate forums. The rich rhetorical discourse in their content can be effectively used to achieve a desired organisation such as summarisation. Communication of scholarly research needs to be summarised to avoid redundant or outdated research and ensure faster progress on pressing problems. I helped create a small corpus and organise a pilot task for scientific document summarisation in the 2014 Text Analysis Conference. I am currently doing my Ph.D. research on a similarly motivated problem in Massive Open Online Course (MOOC) discussion forums: recommending salient student discussions for instructors to intervene in, given their limited bandwidth.
Ingo Frommholz - ingo.frommholz@beds.ac.uk: Ingo Frommholz is a senior lecturer at the Department of Computer Science and Technology of the University of Bedfordshire in Luton, UK. Prior to this he worked as a postdoctoral research associate at the University of Glasgow, and as a research assistant at the University of Duisburg-Essen and the Fraunhofer Integrated Publications and Information Systems Institute (IPSI) in Darmstadt, Germany, where he contributed to a number of European (FP5, FP6 and FP7) and national German and British projects. He gained further practical experience as a software developer in various companies in Germany and in consulting projects on enterprise search and cultural heritage. Ingo received his PhD in 2008 at the University of Duisburg-Essen on the topic of probabilistic logic-based information retrieval models exploiting user annotations. His current research focuses on formal models for information seeking and searching, for instance based on probabilistic logics and on quantum probabilities, as well as implementing the cognitive framework of polyrepresentation. He has strong expertise in information retrieval, digital libraries and information systems. Ingo is the Managing Editor of the International Journal on Digital Libraries and a member of the steering committees of the BCS and the German Information Retrieval Specialist Groups. He has been a reviewer, co-organiser and senior PC member for several world-class workshops, conferences and journals.
Kokil Jaidka - kokil@pmail.ntu.edu.sg: Dr Kokil Jaidka holds a Ph.D. in Information Studies from Nanyang Technological University, Singapore. Her expertise is in multidocument summarization, natural language processing and applied linguistics. Currently, in her capacity as a computer scientist, she is conducting social media analyses to solve problems in user behavior profiling and topic detection. She is also exploring transaction records in the digital marketing domain for research problems related to outbound marketing. Beyond information science, she is also the data specialist for an international, collaborative political communication research project on Asian democracies.
Min-Yen Kan - kanmy@comp.nus.edu.sg: My research interests fall under the areas of digital libraries, natural language processing, information retrieval and human-computer interaction. Specifically, they include document structure acquisition, verb analysis, digital library resource annotation and applied text summarization. My research goal is to investigate how natural language processing and information retrieval can be applied to improve scholarly publication and knowledge discovery.
Philipp Mayr - Philipp.Mayr-Schlegel@gesis.org: Philipp Mayr is a team leader in the Knowledge Technologies for the Social Sciences department at GESIS – Leibniz Institute for the Social Sciences. He is a graduate of the Berlin School of Library and Information Science at Humboldt University Berlin where he finished his doctoral research in 2009. Philipp is a member of the European NKOS network and has published widely in the areas of Informetrics, Information Retrieval and Digital Libraries. He is a member of the editorial board of the journals Scientometrics and Information Wissenschaft & Praxis. His research interests include non-textual ranking in digital libraries, bibliometric methods, evaluation of information systems and knowledge organising systems, as well as applied informetrics. Philipp was the main organizer of the Combining Bibliometrics and Information Retrieval workshop at ISSI 2013 and the BIR workshops at ECIR 2014, 2015 and 2016.
Dietmar Wolfram - dwolfram@uwm.edu: Dietmar Wolfram is a professor at the School of Information Studies at the University of Wisconsin-Milwaukee. He is a graduate of the University of Western Ontario where he received a PhD in Library and Information Science in 1990. His research interests include information retrieval systems design and evaluation, user studies of IR systems, applied informetrics and the intersection of informetrics and information retrieval. He is a member of the editorial boards of several journals, including Cybermetrics, the Journal of Informetrics and the Journal of the Association for Information Science and Technology.

Programme Committee

The following committee members have agreed to review submissions to the workshop.

Akiko Aizawa, National Institute of Informatics, Japan
Iana Atanassova, Université de Franche-Comté, France
Joeran Beel, University of Konstanz, Germany
Patrice Bellot, Aix-Marseille University, France
Marc Bertin, Université du Québec à Montréal, Canada
Colin Batchelor, Royal Society of Chemistry, Cambridge, UK
Cornelia Caragea, University of North Texas
Zeljko Carevic, GESIS, Germany
Jason S Chang, National Tsing Hua University, Taiwan
John Conroy, IDA Center for Computing Sciences
Ed A. Fox, Virginia Tech, USA
C. Lee Giles, Penn State University
Bela Gipp, University of Konstanz, Germany
Nazli Goharian, Georgetown University
Sujatha Das Gollapalli, Institute for Infocomm Research, A*STAR, Singapore
Pawan Goyal, Indian Institute of Technology, Kharagpur
Daniel Hienert, GESIS, Germany
Gilles Hubert, University of Toulouse, France
Rahul Jha, Microsoft
Noriko Kando, National Institute of Informatics, Japan
Dain Kaplan, Tokyo Institute of Technology
Roman Kern, Graz University of Technology
Claus-Peter Klas, GESIS, Germany
Anna Korhonen, University of Cambridge
John Lawrence, University of Dundee
Cyril Labbé, Université Joseph Fourier, Grenoble, France
Birger Larsen, Aalborg University, Denmark
Elizabeth Liddy, Syracuse University
Chin-Yew Lin, Microsoft Research
Xiaozhong Liu, Indiana University, Bloomington
Kathy McKeown, Columbia University
Stasa Milojevic, Indiana University, USA
Prasenjit Mitra, Penn State University / Qatar Computing Research Institute
Marie-Francine Moens, KU Leuven
Peter Mutschke, GESIS, Germany
Preslav Nakov, Qatar Computing Research Institute
Doug Oard, University of Maryland, College Park
Manabu Okumura, Tokyo Institute of Technology
Byung-won On, Kunsan National University
Arzucan Ozgur, Bogazici University
Cecile Paris, The Commonwealth Scientific and Industrial Research Organisation
Philipp Schaer, GESIS, Germany
Andrea Scharnhorst, DANS, Netherlands
Henry Small, SciTech Strategies, USA
Kazunari Sugiyama, National University of Singapore
Simone Teufel, University of Cambridge
Mike Thelwall, University of Wolverhampton
Lucy Vanderwende, Microsoft Research
Vasudeva Varma, International Institute of Information Technology, Hyderabad, India
Andre Vellino, University of Toronto
Anita de Waard, Elsevier Labs
Alex Wade, Microsoft Research
Stephen Wan, CSIRO ICT Centre, Australia