NPIC: Synthetic Image Classification

NPIC, the synthetic image classifier


This is the home page of NPIC, a research project that performs image classification, especially for synthetic (i.e., non-photographic) images. NPIC works by supervised machine learning on noisily labeled datasets built from image search engine results.


Enter a file containing a single image to upload and classify. Optionally, also enter the image's URL and the URL of the page where it occurs.

N.B.: The classifiers placed online do not include all of the image features used in the full classifier described in the paper. You will not be able to replicate the accuracy level of our reported results through this demo.

Image Corpus

You can download our image corpus built for NPIC. It is specifically for synthetic (i.e., non-photographic) image classification.

Download the corpus here (this will create a separate npicCorpus directory in the current working directory). As these datasets are very large, you will want to download them one at a time. Please don't attempt parallel downloads, as they eat up all of our webserver's bandwidth. Alternatively, you can write to us and let us know that you want a copy mailed to you on DVD. We will try to entertain these requests as time allows.

11409_photo_imgs/         - level 1 photo class (warning, very big file: ~2.7 GB)
250_level3_map_imgs/      - level 3 map (~299 MB)
350_level3_diagram_imgs/  - level 3 diagram (~69 MB)
5000_UROP_imgs/           - level 2 classification from Fei's earlier UROP report (warning, very big file: ~1.4 GB);
                            these images taken together form level 1 for the synthetic class
index.html                - this file
wiki_all_imgs/            - Wikipedia images used as second test set (warning, very big file: ~1.7 GB)
mapping.tsv               - the file-to-label mapping (needed for any of the data files)

Other directories whose names end in BMP contain the original images converted to BMP format for the purpose of building image vectors.
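
As an illustration of how the file-to-label mapping might be used, here is a minimal Python sketch. It assumes that mapping.tsv has two tab-separated columns, an image file path followed by its label; this layout is an assumption and is not documented on this page, so check the actual file before relying on it. The function names `load_mapping` and `files_by_label` and the sample rows are hypothetical.

```python
import csv
from collections import defaultdict


def load_mapping(lines):
    """Parse tab-separated rows into a {file_path: label} dict.

    ASSUMPTION: column 0 is the image file path and column 1 is its label.
    The real mapping.tsv layout may differ.
    """
    mapping = {}
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        mapping[row[0]] = row[1]
    return mapping


def files_by_label(mapping):
    """Group file paths by label, preserving first-seen order."""
    grouped = defaultdict(list)
    for path, label in mapping.items():
        grouped[label].append(path)
    return dict(grouped)


# Hypothetical sample rows, standing in for the contents of mapping.tsv.
sample = [
    "250_level3_map_imgs/a.gif\tmap\n",
    "350_level3_diagram_imgs/b.gif\tdiagram\n",
    "250_level3_map_imgs/c.gif\tmap\n",
]
for label, paths in files_by_label(load_mapping(sample)).items():
    print(label, len(paths))
```

To read the real file, pass an open file handle instead of the sample list, for example `open("npicCorpus/mapping.tsv", newline="")`, adjusting the path to wherever the corpus was downloaded.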

If you download this corpus and plan on using it, please do us a favor and send us a quick message. It will only take a minute of your time and will help us get a better idea of what such a corpus might be used for.

The corpus is one of the deliverables of a final year project done by Fei Wang. If you would like to use this corpus, please cite our paper in CIVR 2006, as below.

Note that these images may be copyrighted by their original owners (with the exception of the Wikipedia images, which are covered under a Creative Commons license). We are distributing them under the terms of fair use for academic research purposes. You may not use the other (non-Wikipedia) images for commercial products.

The work of creating and distributing this dataset is covered by a license derived from the Open Directory Project. The license is distributed with this corpus. The collection of this corpus was generously funded by the Faculty Research Fund of the National University of Singapore, R 252-000-245-112.




A more complete version of Fei Wang's work is available as two separate undergraduate theses:

Group Members

Min-Yen Kan
Created on: Mon May 22 12:57:43 2006 | Version: 1.0 | Last modified: Wed Jun 14 17:36:27 2006