[ChimeText] (note special time, date and place) David Chiang (ISI) / Hiero: Finding Structure in Statistical Machine Translation
Min-Yen Kan
knmnyn at gmail.com
Mon Apr 10 21:52:24 SGT 2006
Hi all:
Tomorrow, *Tuesday*, please come for our special session of CHIME text
processing on SMT, continuing our theme from last week. Please be
there! Note the special place and time.
We will also have another session on Wednesday by Microsoft's Hang Li,
so read on to the next post as well (coming soon).
Min
TITLE : Hiero: Finding Structure in Statistical Machine Translation
TIME : April 11, 2006, 9:00am - 10:00am, Tue
VENUE : SR 2 (S16, #04-05)
SPEAKER : Dr. David Chiang
University of Southern California,
Information Sciences Institute
Chaired by Dr Ng Hwee Tou (nght at comp.nus.edu.sg)
ABSTRACT:
The introduction of data-driven methods into machine translation (MT)
in the 1990s created a whole new way of doing MT, and the recent move
from the word-based models developed at IBM to the phrase-based models
developed by Och and others has led to a breakthrough in MT
performance. The next breakthrough, the move to syntax-based models
that deal with the hierarchical, meaning-bearing structures of
sentences, is still waiting to happen. Several approaches have been
tried, but none has yet been able to outperform phrase-based models
in large-scale evaluations. Hiero is a first step towards that
breakthrough. It deals with hierarchical structures, similarly to
syntax-based models, but also draws on ideas from phrase-based
translation, including the ability to be trained from parallel
bilingual text without any syntactic annotation, manual or automatic.
In the recent NIST MT Evaluation, it outperformed several
state-of-the-art systems, both phrase-based and syntax-based, on both
Chinese-English and Arabic-English translation. I will present Hiero's
underlying model, its implementation, and experimental results,
including some recent investigations into how syntactic information
does and does not improve translation quality.
BIODATA:
Dr. David Chiang is a research scientist at ISI working on statistical
machine translation using synchronous grammars. His interests span
both the theoretical and practical aspects of this area, from
investigating the formal power of synchronous grammar formalisms to
building a large-scale synchronous CFG-based statistical machine
translation system. He has also done research in statistical parsing
with tree-adjoining grammars, formal language theory, and biological
sequence analysis. He received his PhD from the University of
Pennsylvania in 2004 under the supervision of Dr. Aravind K. Joshi.
Next sessions:
12 Apr - Hang Li /