Interpreting Time In Text, Summarizing Text With Time
Research Area: Natural Language Processing
Year: 2013
Type of Publication: PhD Thesis
  • Jun-Ping Ng
In this thesis, I study two key steps in building a timeline, a logical representation of the temporal information found in text from newswire articles: 1) intra-sentence event-timex (E-T) temporal relationship classification, and 2) article-wide event-event (E-E) temporal relationship classification. Events and time expressions (timexes) are the basic units of temporal information in text, and these two steps allow us to build an understanding of the relative ordering between them. For both classification tasks, I propose more semantically motivated features, namely typed dependency parses and discourse analyses, to achieve better classification performance. This is in contrast to much of the existing literature, which has focused on lexico-syntactic features.
Working on E-T temporal relationship classification, I also show that crowdsourcing is a cost-effective and viable avenue through which a high-quality temporal corpus can be built. Making use of the structure of a sentence, I propose a way to identify instances that are computationally and cognitively easier. Excluding these instances from a corpus does not significantly degrade subsequent classifier performance, which allows cost savings of up to 37% when building an E-T temporal corpus.
Besides putting together a state-of-the-art temporal processing system, this thesis also validates the efficacy and utility of the automatically derived timelines. Temporal information from these timelines is incorporated into a competitive baseline multi-document summarization system. I propose several features derived from timelines and show that they lead to a 4.1% improvement in summarization performance. I also introduce TimeMMR, a modification of the traditional Maximal Marginal Relevance (MMR) algorithm, and show that it is useful in the summarization of some document sets.
To further improve the performance gains derived from the use of temporal information, I propose a reliability filtering metric which gauges how accurate and useful a timeline is. By selectively making use of timelines guided by this reliability filtering metric, overall summarization performance is increased by a statistically significant 5.9%.
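For readers unfamiliar with the Maximal Marginal Relevance algorithm mentioned above, the following is a minimal sketch of the standard greedy MMR selection step only. The abstract does not describe how TimeMMR changes this procedure, so no temporal component is shown. The names `mmr_select`, `jaccard`, and the trade-off weight `lam` are illustrative assumptions and are not taken from the thesis.

```python
from typing import Callable, List


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two sentences (illustrative choice)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def mmr_select(
    candidates: List[str],
    relevance: Callable[[str], float],
    similarity: Callable[[str, str], float],
    k: int,
    lam: float = 0.5,
) -> List[str]:
    """Greedily pick up to k sentences, trading relevance against redundancy.

    Each step picks the candidate maximizing
        lam * relevance(s) - (1 - lam) * max similarity to any selected sentence.
    """
    selected: List[int] = []
    remaining = list(range(len(candidates)))

    def score(i: int) -> float:
        # Redundancy is the highest similarity to anything already chosen.
        redundancy = max(
            (similarity(candidates[i], candidates[j]) for j in selected),
            default=0.0,
        )
        return lam * relevance(candidates[i]) - (1 - lam) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return [candidates[i] for i in selected]
```

With `lam` close to 1 the selection favors relevance alone; lowering it penalizes sentences that repeat what has already been selected.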