Constructing an anonymous dataset from the personal digital photo libraries of Mac App Store users

Abstract

Personal digital photo libraries embody a large amount of information useful for research into photo organization, photo layout, and the development of novel photo browser features. Even when anonymity can be ensured, amassing a sizable dataset from these libraries is still difficult because of the visibility and cost that such a study would require. We explore using the Mac App Store to reach more users and collect data from such personal digital photo libraries. More specifically, we compare and discuss how it differs from common data collection methods, e.g. Amazon Mechanical Turk, in terms of time, cost, quantity, and design of the data collection application. We have collected a large, openly available photo feature dataset in this manner, and we illustrate the types of data that can be collected. In 60 days, we collected data from 20,778 photo sets (473,772 photos). Our study with the Mac App Store suggests that popular application distribution channels are a viable means for researchers to acquire massive data collections.

Publication
Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries
Min-Yen Kan
Associate Professor

WING lead; interests include Digital Libraries, Information Retrieval and Natural Language Processing.