[ChimeText] 28 July (reminder, tomorrow!) Qiu Long / Context for Semantic Similarity Calculation in Scenario Template Creation
Min-Yen Kan
knmnyn at gmail.com
Sun Jul 27 15:49:23 SGT 2008
Hi all:
Qiu Long will be giving his pre-defense thesis seminar on his recent
text processing approach tomorrow morning. Please come to his talk!
Cheers,
Min
DATE: 28 Jul 2008, 9:00-11:00 am
TITLE: Context for Semantic Similarity Calculation in Scenario Template Creation
VENUE: MR6 (AS6 05-12)
Chaired by: Dr Kan Min-Yen
ABSTRACT:
Scenario Template Creation (STC) is a Natural Language Processing
(NLP) task to detect the commonalities among articles on similar
events and generalize them into an abstract representation -- a
scenario template (ST). For this task, the estimation of verb-centric
text span similarity is the key. Since text span similarity
calculation plays an important role in many NLP applications, various
approaches have been proposed. They range from bag-of-words to more
complicated ones involving thesauri and features at different
linguistic levels. However, there are still demands and opportunities
for further improvement. Contextual information, for instance, is
intuitively a source that could enhance text span similarity
estimation, but it has yet to be exploited as thoroughly as internal
features have been.
In this talk, I first discuss an intrinsic similarity measure for
predicate-argument tuples (PATs). It is applied to a Paraphrase
Recognition (PR) task, demonstrating its feasibility. Then I present a
context model that captures contexts likely to be more informative
than other surrounding tokens. With different contextual
relations defined, I hypothesize that two PATs' semantic similarity
can also be reflected by their extrinsic similarity, i.e., whether
they are connected through similar contextual relations to similar
contexts. I show
experimental results that confirm the correlation between such an
extrinsic similarity and the semantic similarity of PATs. To integrate
intrinsic and extrinsic similarities for PAT clustering, I propose a
graphical framework, using a novel core algorithm called Context
Sensitive Clustering (CSC). This clustering process is guided by the
Expectation-Maximization (EM) algorithm. I conduct experiments
comparing this EM-based CSC algorithm with the standard
K-means algorithm. Under the widely used purity and inverse purity
metrics, the proposed algorithm outperforms K-means in all the
scenarios tested.
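
For readers unfamiliar with the two evaluation metrics named above, here
is a minimal sketch of how purity and inverse purity are commonly
computed for a clustering against gold labels. It is not taken from the
thesis; the function names and the toy data are assumptions made for
illustration only.

```python
from collections import Counter

def purity(clusters, labels):
    """Fraction of items that belong to the majority gold class of their cluster."""
    counts_by_cluster = {}
    for cluster_id, gold in zip(clusters, labels):
        counts_by_cluster.setdefault(cluster_id, Counter())[gold] += 1
    # For each cluster, count the items in its best-matching gold class.
    matched = sum(max(c.values()) for c in counts_by_cluster.values())
    return matched / len(labels)

def inverse_purity(clusters, labels):
    """Same computation with the roles of clusters and gold classes swapped."""
    return purity(labels, clusters)

# Toy data (hypothetical): six items, two predicted clusters, three gold classes.
predicted = [0, 0, 0, 1, 1, 1]
gold = ["a", "a", "b", "b", "b", "c"]

p = purity(predicted, gold)
ip = inverse_purity(predicted, gold)
print(f"purity={p:.3f} inverse_purity={ip:.3f}")
```

Purity rewards clusters that contain a single gold class, while inverse
purity rewards gold classes that are kept together in a single cluster,
so the two are usually reported side by side.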
BIODATA: Long Qiu is a Doctoral Student at SoC, NUS, co-supervised by
Professor Chua Tat-Seng and Dr. Min-Yen Kan. He received his Master of
Science (SM) in Computer Science from the Singapore-MIT Alliance in
2002. He is
interested in Natural Language Processing (NLP) and the related
machine learning techniques.