ForeCite Web Services


More NLP services are now becoming available on the web. Following this trend, you can send your plain-text citations to us via our web service, and we will parse them for you free of charge. Requests are processed at lower priority, as time and processing power allow.
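As a rough illustration of what a request might look like, the Python sketch below builds (but does not send) a call to extract_citations. This page does not document the wire protocol, so the sketch assumes the broker accepts SOAP 1.1 requests over HTTP POST; ENDPOINT and SERVICE_NS are hypothetical placeholders, not the real service address or namespace.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumptions (not stated on this page): the broker accepts SOAP 1.1 over
# HTTP POST. SERVICE_NS and ENDPOINT are hypothetical placeholders.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
SERVICE_NS = "urn:example-forecite"        # hypothetical namespace
ENDPOINT = "http://example.org/forecite"   # hypothetical endpoint


def build_envelope(operation: str, **params: str) -> bytes:
    """Serialize a SOAP 1.1 envelope calling `operation` with string parameters."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        # Each keyword argument becomes an unqualified child element.
        ET.SubElement(call, name).text = value
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


def build_request(operation: str, **params: str) -> urllib.request.Request:
    """Wrap the envelope in an HTTP POST; nothing is sent until urlopen()."""
    return urllib.request.Request(
        ENDPOINT,
        data=build_envelope(operation, **params),
        headers={
            "Content-Type": "text/xml; charset=utf-8",
            "SOAPAction": f'"{SERVICE_NS}#{operation}"',
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_request("extract_citations", url_or_path="http://example.org/refs.txt")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Once a real endpoint and namespace are known, the request could be sent with urllib.request.urlopen(). Python's standard library has no SOAP client, which is why the envelope is assembled by hand with xml.etree.ElementTree here.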

N.B. We log the data submitted when the web services are invoked, and use these logs to improve the accuracy and efficiency of the services.

You can check whether the web service broker is up and which services it currently offers.

  1. Mxtag: a maximum entropy part-of-speech tagger and sentence boundary detector (mxtag):
    • tag_delimited_sentence(sentence): Expects a single tokenized sentence as input.
    • tag_delimited_file(url_or_path): Expects a plain text file. Returns part-of-speech tags.
    • delimit_file(url_or_path): Expects a plain text file. Returns the input split into one sentence per line; can be run before tag_delimited_file.
  2. PDFBox: a freely available extractor that converts PDFs to plain UTF-8 text, used as the first stage of much scholarly document processing. Expects a PDF file as input:
    • extract_html(url_or_path) and
    • extract_text(url_or_path).
  3. PdfToText: another PDF-to-plain-text extractor. Expects a PDF file as input:
    • ptt_extract_html(url_or_path) and
    • ptt_extract_text(url_or_path).
  4. ParsHed: Paper header parser
    • extract_header(url_or_path)
  5. ParsCit: an open-source CRF-based reference string parsing package. It segments and labels reference strings and extracts their citation contexts from plain text files.
    • extract_citations(url_or_path)
  6. MeURLin: webpage classification without the webpage. Segments URLs and predicts their categories from the URL alone.
    • classify_websites(url_or_path, lexicon)
    • segment_urls(url_or_path)
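The operations listed above can be collected into a small client-side lookup table, for example to catch a misspelled operation name or a missing argument before a request is sent. The Python sketch below copies the operation names and parameter names from this list; the check_call helper is hypothetical and is not part of the services themselves.

```python
# Operations and parameter names as listed on this page. The outer keys are
# the service names; check_call is a hypothetical client-side helper.
CATALOGUE: dict[str, dict[str, tuple[str, ...]]] = {
    "Mxtag": {
        "tag_delimited_sentence": ("sentence",),
        "tag_delimited_file": ("url_or_path",),
        "delimit_file": ("url_or_path",),
    },
    "PDFBox": {
        "extract_html": ("url_or_path",),
        "extract_text": ("url_or_path",),
    },
    "PdfToText": {
        "ptt_extract_html": ("url_or_path",),
        "ptt_extract_text": ("url_or_path",),
    },
    "ParsHed": {"extract_header": ("url_or_path",)},
    "ParsCit": {"extract_citations": ("url_or_path",)},
    "MeURLin": {
        "classify_websites": ("url_or_path", "lexicon"),
        "segment_urls": ("url_or_path",),
    },
}


def check_call(operation: str, *args: str) -> str:
    """Return the service offering `operation`; raise if the call is malformed."""
    for service, operations in CATALOGUE.items():
        if operation in operations:
            expected = operations[operation]
            if len(args) != len(expected):
                raise ValueError(
                    f"{operation} expects {len(expected)} argument(s): "
                    + ", ".join(expected)
                )
            return service
    raise KeyError(f"unknown operation: {operation}")


if __name__ == "__main__":
    print(check_call("extract_citations", "refs.txt"))
    print(check_call("classify_websites", "urls.txt", "lexicon.txt"))
```

A flat dictionary keeps the table easy to compare against this page; if the broker's advertised list of services changes, only CATALOGUE needs updating.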

Min-Yen Kan <kanmy@comp.nus.edu.sg>
Created on: Fri Dec 24 01:48:05 SGT 2004 | Version: 1.0 | Last modified: Thu Mar 13 13:24:50 SGT 2008