Downloads
Here you will find deliverables of the projects done by members of WING, exclusive of publications. If you're looking for demonstrations of systems, they are listed with each project.

Tools

These are some of the in-house NLP and IR tools that we have built to facilitate our research at WING. We hope you'll find some of them helpful. A full list of the tools we have installed for research at NUS (including many from external sources) can be found on our resource page.

JavaRAP - A Java open-source reimplementation of the famed RAP (Resolution of Anaphora Program) by Boguraev and Kennedy. Note: this program is not considered competitive for anaphora resolution by today's standards, but we have implemented it for benchmarking purposes. Feel free to download and use it for non-commercial purposes.

Daemonized Collins parser - Got more than a few sentences to parse? The Collins head-driven parser is still considered one of the best open-source English language parsers. We've taken Michael Collins's source code and wrapped it into a daemonized version that you can send sentences to through a socket service, avoiding the long initialization the parser otherwise needs on every run.

Rapi - An open-source OPAC package under the MIT license that allows you to: 1) build a Lucene index from your MARC files, 2) screen-scrape live circulation data from your own iii OPAC, and 3) wrap your OPAC with a customizable user interface. A live demo is also available.

Corpora

These are the text, image, and other datasets used in our group's experiments. Most are freely available for research use, though some may not be used commercially. Most of the links here are not yet working; please bear with us as we consolidate the URLs and distribution paths on our site.

NUS SMS Corpus - This is a corpus of about 10K Short Message Service messages from mobile phone users in Singapore. All are in English. The contributors were mostly university students who contributed messages to this corpus for a small amount of remuneration. Compiled by Yijue How.

Presentation to Document Alignment Corpus - This is a manual alignment of 20 scholarly papers from the database community to their corresponding presentations. Each alignment maps one slide to multiple paragraphs. Compiled by Eugene Ezekiel.

Light Verb / Support Verb Annotations - This is a corpus of light verb annotations (aka support verbs; e.g., "make a call"), created to train a supervised learning algorithm that differentiates light verbs from meaning-bearing (heavy) verbs. Compiled by Yee Fan Tan.

JavaScript Functionality Annotations - Over 1.8K different JavaScript units have been extracted from the WT10G standard web corpus and annotated. These are all the unique JavaScript units that we were able to detect in the entire WT10G; the original 20+K unit instances contained many duplicates. Compiled by Wei Lu.

NPIC Image Corpus - This is a large 4.7GB image collection comprising two different collections: a spidered portion gathered from the Web and another portion taken from the freely accessible Wikimedia Commons. Compiled by Fei Wang.

Keyphrase Corpus - The corpus consists of more than 200 scientific publications, each provided in 4 different formats: PDF, HTML, plain text, and XML. Compiled by Emma Thuy Dung Nguyen.
Last Updated (Wednesday, 25 June 2008)
