[ChimeText] 9 Jul (reminder, tomorrow! in SR7 at 2-3pm) Mstislav Maslennikov / Relation Extraction for Information Extraction from Free Text
Min-Yen Kan
knmnyn at gmail.com
Tue Jul 8 11:58:31 SGT 2008
Hi all:
Just a short reminder about tomorrow's seminar. See you there.
Min
--
Speaker: Mstislav Maslennikov, NUS
Title: Relation Extraction for Information Extraction from Free Text
Date: Wednesday 9 Jul, 2-3pm
Venue: SR7 (COM1 #02-07)
ABSTRACT:
Information Extraction (IE) is the task of identifying information
(e.g. entities, relations or events) in free text. Numerous context-,
ontology-, rule- and classification-based methods have been explored
over decades of research on this task. However, the challenge of
effectively handling the flexibility of natural language remains
unresolved. In
IE, this implies the problem of sparseness of data instances, which in
turn causes the problems of paraphrasing and misalignment of context
features of the extracted information. In this thesis, we hypothesize
that such problems can be alleviated by combining relations between
entities at the phrasal, dependency, semantic and inter-clausal
discourse levels. To validate our hypothesis, we develop a 2-level
multi-resolution framework ARE (Anchors and Relations). The first
level of ARE extracts candidate phrases (anchors), while the second
level evaluates the relations among the anchors and composes possible
candidate templates.
The relations between the anchors are combined in several ways. First,
we evaluate dependency relations between anchors. We classify
dependency relation paths between the anchors into the Simple, Average
and Hard categories according to the path length and develop different
techniques to handle them. The category-specific strategies resulted
in improvements of 3% and 4% on the MUC4 (Terrorism) and MUC6
(Management Succession) domains, respectively. The increased
performance demonstrates that dependency relations are important for
handling paraphrases at the syntactic level. Second, we incorporate the
discourse relation analysis in a multi-resolution framework for IE to
handle long-distance dependency relations and possible paraphrases
at the inter-clausal level. This leads to further improvements of 3%,
7%, 3% and 4% on the MUC4, MUC6 and ACE RDC 2003 (general and specific
types) domains, respectively. Third, we explore 2 supplementary
strategies to combine relation paths between anchors. Since negative
paths between the anchors outnumber positive paths many times over,
we apply a filtering strategy to eliminate negative paths. We also
support the learning process of our dependency relation classifier
by cascading features from the discourse classifier. These 2
strategies further improve the IE
performance on the MUC4, MUC6 and ACE RDC 2003 (general and specific
types) corpora.
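As a rough illustration of the path-length categorization described above, here is a minimal sketch. The cut-off values and all names (`categorize_path`, `SIMPLE_MAX_LEN`, `AVERAGE_MAX_LEN`) are assumptions made for illustration only; the abstract does not state the actual thresholds or how ARE represents paths.

```python
from enum import Enum


class PathCategory(Enum):
    SIMPLE = "Simple"
    AVERAGE = "Average"
    HARD = "Hard"


# Hypothetical cut-offs: the abstract only says that categories depend on
# path length, not where the boundaries lie.
SIMPLE_MAX_LEN = 1
AVERAGE_MAX_LEN = 3


def categorize_path(path):
    """Assign a dependency path between two anchors to a category.

    `path` is assumed to be a list of dependency relation labels
    connecting one anchor to another.
    """
    length = len(path)
    if length <= SIMPLE_MAX_LEN:
        return PathCategory.SIMPLE
    if length <= AVERAGE_MAX_LEN:
        return PathCategory.AVERAGE
    return PathCategory.HARD
```

Each category could then be routed to its own handling technique, which is the part the talk itself covers.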
Overall, our results affirm the hypothesis that the extraction of
candidate phrases (anchors) and the combination of different relation
types between anchors in a multi-resolution framework are important for
tackling the key problems of paraphrasing and misalignment in
Information Extraction.
BIODATA:
Mr. Maslennikov Mstislav is a doctoral student at SOC, NUS. He
received his 5-year diploma (equivalent to an M.Sc.) from
Moscow State University, Russia. Since 2002, he has been studying in
the internship and PhD programs under the supervision of Prof. Chua
Tat-Seng and Dr. Tian Qi. His research is on the theme of improving
Information Extraction through relation-based analysis of free text.
Upcoming Talks:
16 Jul: Xiong Deyi (I2R / Linguistically Annotated BTG for Statistical
Machine Translation)
17 Jul: Douglas Oard (University of Maryland / Fourth-Generation
Content Analysis: Supporting social science research using
computational linguistics)
18 Jul: (related seminars) 3 seminars on 1) Real-Time Document Image
Retrieval with LLAH 2) Large-Scale and Real-Time Specific Object 3)
Pattern recognition with supplementary information
25 Jul: Yahoo! Research talks (planned, unconfirmed)