23 June 2016, at the Joint Conference on Digital Libraries (JCDL '16), Newark, New Jersey, USA
4th Bibliometric-enhanced Information Retrieval (BIR)
2nd Workshop on text and citation analysis for scholarly digital libraries (NLPIR4DL)
Current digital libraries collect and allow access to digital papers and their metadata (including citations), but mostly do not analyze the items they index. The large scale of scholarly publications poses a challenge for scholars in their search for relevant literature. Searchers of digital libraries, citation indices and journal databases are inundated with thousands of results. The community needs to develop techniques to better support both basic and higher-order information seeking and scholarly sensemaking activities.
The BIRNDL 2016 workshop is a joint scientific event gathering scholars from the BIR (Bibliometric-enhanced Information Retrieval) and the NLPIR4DL (Text and citation analysis for scholarly digital libraries) communities. BIRNDL focuses on scholarly publications and data. The production of scientific literature has exploded along with the growth of the scientific enterprise, and its consistent exponential growth approaches an empirical law. The workshop will investigate how natural language processing, information retrieval, scientometric and recommendation techniques can advance the state of the art in scholarly document understanding, analysis and retrieval at scale. Researchers need assistive technologies to track developments in an area, identify the approaches used to solve a research problem over time and summarize research trends. Digital libraries require semantic search, question-answering and automated recommendation and reviewing systems to manage and retrieve answers from scholarly databases. Full document text analysis can help to design semantic search, translation and summarization systems; citation and social network analyses can help digital libraries to visualize scientific trends, bibliometrics, and the relationships and influences of works and authors. All these approaches can be supplemented with the metadata supplied by digital libraries, including usage data such as download counts.
This workshop will be relevant to scholars in the cross-disciplinary field of Computer Science and Digital Libraries, in particular in the research areas of Natural Language Processing and in Information Retrieval; it will also be important for all stakeholders in the publication pipeline: implementers, publishers and policymakers. Even when only considering the scholarly sites within Computer Science, we find that the field is well-represented - ACM Portal, IEEE Xplore, Google Scholar, PSU's CiteSeerX, MSR's Academic Search, Elsevier’s Mendeley, Tsinghua's ArnetMiner, Trier's DBLP, Hiroshima's PRESRI; with this workshop we hope to bring a number of these contributors together. Today's publishers continue to seek new ways to be relevant to their consumers, in disseminating the right published works to their audience. The fact that formal citation metrics have become an increasingly large factor in decision-making by universities and funding bodies worldwide makes the need for research in such topics and for better methods for measuring the impact of work more pressing.
This workshop is also informed by the ongoing COST Action TD1210 KnowEscape.
We invite stimulating, previously unpublished submissions on topics including - but not limited to - full-text analysis, multimedia and multilingual analysis and alignment, as well as citation-based NLP or IR. Specific examples of fields of interest include (but are not limited to):
We will be running a shared task on scholarly paper processing as part of the workshop. The current shared task will be on automatic paper summarization in the Computational Linguistics (CL) domain. The output summaries will be of two types: faceted summaries of the traditional self-summary (the abstract) and the community summary (the collection of citation sentences, or ‘citances’). We also propose to group the citances by the facets of the text that they refer to.
This task follows up on the successful CL Pilot Task conducted as a part of the BiomedSumm Track at the Text Analysis Conference 2014 (TAC 2014). In that task, a training corpus of ten topics from CL research papers was released. Participants were invited to enter their systems in a task-based evaluation. Nine teams from four countries expressed an interest in participating in the shared task; three teams submitted system descriptions and findings. We also released the SciSumm14 manually annotated dataset, comprising ACL Computational Linguistics research papers and summaries. It offers a community summary of a reference paper based on its collection of citing sentences (“citances”). Furthermore, each citance is mapped to its referenced text in the reference paper and tagged with the information facet it represents. In our proposed shared task, we will extend this by releasing pairs of training and test datasets – each pair comprising the annotated citing sentences for a research paper, and the summaries of the research paper.
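As an illustration only, the annotation scheme described above (each citance mapped to its referenced text and tagged with a facet, then grouped by facet) could be represented roughly as follows. This is a minimal sketch, not the released dataset format: the class name `Citance`, its field names, the helper `group_by_facet` and the facet labels used here are all hypothetical placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Citance:
    """One citing sentence and its annotations (hypothetical field names)."""
    citing_paper: str    # identifier of the paper containing the citance
    text: str            # the citing sentence itself
    reference_text: str  # span of the reference paper that the citance refers to
    facet: str           # information facet label (placeholder values below)


def group_by_facet(citances):
    """Group citances by the facet of the reference text they point to."""
    groups = defaultdict(list)
    for citance in citances:
        groups[citance.facet].append(citance)
    return dict(groups)


# Toy data; the texts and facet labels are invented for illustration.
citances = [
    Citance("citing-A", "We adopt the parser of [1].", "We propose a parser.", "method"),
    Citance("citing-B", "[1] report strong accuracy.", "Accuracy improves.", "result"),
    Citance("citing-C", "Following [1], we parse bottom-up.", "We propose a parser.", "method"),
]

grouped = group_by_facet(citances)
for facet, members in sorted(grouped.items()):
    print(facet, [c.citing_paper for c in members])
```

Grouping on the facet label keeps the community summary organised by what aspect of the reference paper each citing sentence discusses.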
The CLSciSumm15 corpus is expected to be of interest to a broad community including those working in computational linguistics and natural language processing, text summarization, discourse structure in scholarly discourse, paraphrase, textual entailment and text simplification.
We have secured support for the costs of the shared task annotation from Microsoft Research Asia. The National University of Singapore will be primarily responsible for the task's oversight.
|Camera Ready Contributions||Updated |
|Workshop||23 June 2016 in Newark, New Jersey, USA|
Check the CL-SciSumm 2016 Shared Task homepage for details on dates with respect to the shared task. The dates are coordinated.
All deadlines for the BIRNDL workshop are calculated as 11:59pm Baker Island Time (BIT: UTC/GMT-12).
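For authors in other time zones, the conversion can be sketched as below. This is a minimal example using only the Python standard library; the helper name `deadline_in_utc` is hypothetical, and the date passed to it is purely illustrative rather than an official workshop deadline.

```python
from datetime import datetime, timedelta, timezone

# Baker Island Time is a fixed UTC-12 offset with no daylight saving time.
BAKER_ISLAND = timezone(timedelta(hours=-12), name="BIT")


def deadline_in_utc(year: int, month: int, day: int) -> datetime:
    """Return the UTC instant of 11:59pm Baker Island Time on the given date."""
    local_deadline = datetime(year, month, day, 23, 59, tzinfo=BAKER_ISLAND)
    return local_deadline.astimezone(timezone.utc)


# Illustrative date only; substitute the deadline you care about.
example = deadline_in_utc(2016, 5, 23)
print(example.isoformat())  # 2016-05-24T11:59:00+00:00
```

In other words, a deadline date ends at 11:59 UTC on the following calendar day.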
Please note that the early-bird rates for the BIRNDL workshop and the hosting JCDL conference have been extended until 23 May 2016. Click here to reach the main JCDL registration page.
All submissions must be written in English, following the Springer LNCS author guidelines. Regular, long papers should have a maximum of 12 pages. Short papers should have a maximum of 6 pages. Both should be submitted as PDF files to EasyChair. All submissions will be reviewed by at least two independent reviewers. At least one author per paper needs to register for the workshop and attend the workshop to present the work. In case of a no-show, the paper (even if accepted) will be deleted from the proceedings and from the program. Submissions and reviewing will be managed by the EasyChair conference management system.
Workshop proceedings will be deposited online in the CEUR workshop proceedings publication service (ISSN 1613-0073) and in the ACL Anthology. This way the proceedings will be permanently available and citable (digital persistent identifiers and long term preservation).
The International Journal on Digital Libraries has accepted our proposal to feature a special issue on themes associated with the BIRNDL workshop. The special issue's deadlines and topics are tailored to (but not limited to) the workshop. The submission deadline for extensions of the workshop's submissions is 30 September, to allow authors ample time to broaden the scope of their work to fit the journal's length and format.
The special issue currently targets a publication date by Summer 2017. See the call for specific details: http://static.springer.com/sgw/documents/1558268/application/pdf/Bibliometric-enhanced+IR+and+NLP+for+DL.pdf.
Updated: The programme structure is still subject to change, but these are the tentative sessions assigned to presenters. A finalized version will be released in due course.
Keynote: Bibliometrics, Information Retrieval and Natural Language Processing: Natural Synergies to Support Digital Library Research / Dietmar Wolfram (University of Wisconsin-Milwaukee)
Bibliometrics and information retrieval (IR) represent fundamental areas of study in information science. Historically, researchers have not fully capitalized on the potential synergies that exist between these two areas. Knowledge of regularities in information production and use, as well as citation relationships in bibliographic databases, which are studied in bibliometrics, can benefit IR system design and evaluation. Similarly, techniques developed for IR and database technology have made the investigation of large-scale bibliometric phenomena feasible. Both fields of study have also benefited directly from developments in natural language processing (NLP), which has provided new tools and techniques to explore research problems in bibliometrics and IR. Digital libraries, with their full text, multimedia content, searching, and browsing capabilities, represent ideal environments in which to investigate the mutually beneficial relationships that can be forged among bibliometrics, IR and NLP. This presentation will highlight the symbiotic relationship that exists between bibliometrics and IR, and will provide examples of how language-based methods have benefited IR, bibliometrics and their intersection.
Morning - Session 1
Morning - Session 2
Afternoon - Session 3
Afternoon - CL-SciSumm Shared Task Oral Session
Afternoon - CL-SciSumm Shared Task Poster Session
Afternoon - CL-SciSumm Discussion / Future Directions
Afternoon - Closing Session
Evening - Participant Dinner (at your own expense)
Guillaume Cabanac is an associate professor at the Department of Computer Science of the University of Toulouse, France. Guillaume received his PhD in 2008 from the University of Toulouse on the topic of personal and collective information management through digital annotations. His academic interests lie in Information Science with a focus on Information Retrieval and Scientometrics. Guillaume strives to tackle compelling issues requiring interdisciplinary expertise. He currently studies how scientists and the general public alike crowdsource and share academic literature. Guillaume serves as a referee for various conferences and journals in IR and scientometrics. He is also an editorial board member of the Scientometrics journal.
I am broadly interested in natural language processing and its applications to information retrieval; specifically, in retrieving and organising information from asynchronous conversation media such as scholarly publications (the mainstream medium of scholarly communication), discussion forums and debate forums. The rich rhetorical discourse in their content can be used effectively to achieve a desired organisation, such as summarisation. Communication of scholarly research needs to be summarised to avoid redundant or outdated research and to ensure faster progress on pressing problems. I helped create a small corpus and organise a pilot task for scientific document summarisation at the 2014 Text Analysis Conference. I am currently doing my Ph.D. research on a similarly motivated problem in Massive Open Online Course (MOOC) discussion forums: recommending salient student discussions for instructors to intervene in, given their limited bandwidth.
Dr Kokil Jaidka holds a Ph.D. in Information Studies from Nanyang Technological University, Singapore. Her expertise is in multidocument summarization, natural language processing and applied linguistics. Currently, in her capacity as a computer scientist, she is conducting social media analyses to solve problems in user behavior profiling and topic detection. She is also exploring transaction records in the digital marketing domain for research problems related to outbound marketing. Beyond information science, she is also the data specialist for an international, collaborative political communication research project on Asian democracies.
My research interests fall under the areas of digital libraries, natural language processing, information retrieval and human-computer interaction. Specifically, they include document structure acquisition, verb analysis, digital library resource annotation and applied text summarization. My research goal is to investigate how natural language processing and information retrieval can be applied to improve scholarly publication and knowledge discovery.
Philipp Mayr is a team leader in the Knowledge Technologies for the Social Sciences department at GESIS – Leibniz Institute for the Social Sciences. He is a graduate of the Berlin School of Library and Information Science at Humboldt University Berlin, where he finished his doctoral research in 2009. Philipp is a member of the European NKOS network and has published widely in the areas of Informetrics, Information Retrieval and Digital Libraries. He is a member of the editorial boards of the journals Scientometrics and Information Wissenschaft & Praxis. His research interests include non-textual ranking in digital libraries, bibliometric methods, evaluation of information systems and knowledge organising systems, as well as applied informetrics. Philipp was the main organizer of the Combining Bibliometrics and Information Retrieval workshop at ISSI 2013 and the BIR workshops at ECIR 2014, 2015 and 2016.
Dietmar Wolfram is a professor at the School of Information Studies at the University of Wisconsin-Milwaukee. He is a graduate of the University of Western Ontario where he received a PhD in Library and Information Science in 1990. His research interests include information retrieval systems design and evaluation, user studies of IR systems, applied informetrics and the intersection of informetrics and information retrieval. He is a member of the editorial boards of several journals, including Cybermetrics, the Journal of Informetrics and the Journal of the Association for Information Science and Technology.
Listed below are the committee members who have agreed to review submissions to the workshop.