[ChimeText] 17 Nov (3-4pm @ TR7) Yee Fan Tan / Cost-Sensitive Web-Based Information Acquisition for Record Matching (PhD defense)
Min-Yen Kan
knmnyn at gmail.com
Fri Nov 4 15:54:07 SGT 2011
Hi all:
Yee Fan will be defending his thesis two weeks from now, on his topic
of acquiring data from the web for the task of record matching and data
cleaning. Please do come to his thesis defense! See you!
Also, please note: the talk by Mamoru Komachi originally scheduled for
14 Nov has been cancelled.
-M
SPEAKER: Yee Fan Tan
TITLE: Cost-Sensitive Web-Based Information Acquisition for Record Matching
VENUE: TR7 (COM2 #01-07)
DATE AND TIME: 17 November 2011 (Thursday), 3:00-4:30pm
CHAIRED BY: A/P Min-Yen Kan
ABSTRACT: In many record matching problems, the input data is either
ambiguous or incomplete, making the record matching task difficult.
However, for some domains, evidence for record matching decisions is
readily available in large quantities on the Web. These resources may
be retrieved by making queries to a search engine, making the Web a
valuable resource. On the other hand, Web resources are slow to
acquire compared to data that is already available in the input. Also,
some Web resources must be acquired before others. Hence, it is
necessary to acquire Web resources selectively and judiciously, while
satisfying the acquisition dependencies between these resources.
This thesis has two major parts. In the first part, I propose methods
for using information from the Web for record matching, establishing
that acquiring Web-based resources can improve record matching tasks.
In the second and larger part, I propose approaches for selective
acquisition of Web-based resources for record matching tasks, with the
aim of balancing acquisition costs and acquisition benefits. These
approaches start from the more task-specific and move towards the more
general, culminating in a framework for solving cost-sensitive
resource acquisition problems with hierarchical dependencies. This
graphical framework is versatile and can apply to a large variety of
problems. In the context of this framework, I propose an effective
resource acquisition algorithm for record matching problems, taking
particular characteristics of such problems into account.
BIODATA: Yee Fan Tan presently holds the position of Chief System
Architect at KAI Square Pte Ltd, a company started by alumni of the School
of Computing, National University of Singapore. Prior to joining KAI
Square, he performed research work leading to a Ph.D. in the School of
Computing, and was a member of the Web Information Retrieval / Natural
Language Processing Group (WING).
Upcoming talks:
17 Nov / Yee Fan Tan / Cost-Sensitive Web-Based Information
Acquisition for Record Matching (PhD defense)