Chinese Word Segmentation

This is a demonstration page for the Chinese Word Segmentation software developed at NUS by the NLP group.


Web Service

More NLP services are now being made available on the web. Following this trend, you can send your Chinese text files to our web service for segmentation. We will segment them for you free of charge, but these jobs run at lower priority, as and when time and processing power allow.

N.B. We keep logs of the text segmented in these demos, to improve the accuracy and productivity of the Chinese Word Segmentation (CWS) software.

Demonstration

You can run this demo in two different ways:

1. Cut and paste plain Chinese text (in UTF-8 encoding) to be segmented. Some sample text has been inserted; feel free to clear it out.

2. Or upload a UTF-8 encoded file to have it segmented. In this case, make sure to clear out the text field above before running the demo. You can use a sample file such as http://wing.comp.nus.edu.sg/~forecite/samples/pk-testref.utf8.txt

If you have a file you want to segment and it is in a different encoding (GB, Big5), please convert it to UTF-8 first with a utility such as iconv on Unix systems.
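
If iconv is not at hand, the same conversion can be done in a few lines of Python. The following is only a minimal sketch, not part of the segmenter itself: the function name convert_to_utf8 and the file names are hypothetical, and you need to know (or guess) the source encoding yourself. The codec names "gb18030" (a superset of GB2312 and GBK) and "big5" are standard Python codecs.

```python
from pathlib import Path


def convert_to_utf8(src, dst, source_encoding="gb18030"):
    """Re-encode the text file `src` from `source_encoding` into UTF-8 at `dst`.

    Hypothetical helper for illustration. Decoding raises UnicodeDecodeError
    if the file is not valid in the given source encoding, which is a useful
    hint that the encoding guess was wrong.
    """
    raw = Path(src).read_bytes()
    text = raw.decode(source_encoding)       # bytes in the old encoding -> str
    Path(dst).write_bytes(text.encode("utf-8"))  # str -> UTF-8 bytes on disk
    return text


if __name__ == "__main__":
    # Hypothetical file names; use "big5" for Traditional Chinese files.
    convert_to_utf8("input-gb.txt", "input-utf8.txt", source_encoding="gb18030")
```

The resulting UTF-8 file can then be uploaded as described in option 2 above.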

Publications

Group Members


Min-Yen Kan <kanmy@comp.nus.edu.sg>
Created on: Fri Oct 2 14:07:07 SGT 2009 | Version: 1.0 | Last modified: Mon Oct 5 10:57:00 2009