Generic soft pattern models for definitional question answering

January 2005

Abstract

This paper explores probabilistic lexico-syntactic pattern matching, also known as soft pattern matching. While previous methods in soft pattern matching are ad hoc in computing the degree of match, we propose two formal matching models: one based on bigrams and the other on the Profile Hidden Markov Model (PHMM). Both models provide a theoretically sound method to model pattern matching as a probabilistic process that generates token sequences. We demonstrate the effectiveness of these models on definition sentence retrieval for definitional question answering. We show that both models significantly outperform state-of-the-art manually constructed patterns. A critical difference between the two models is that the PHMM technique handles language variations more effectively but requires more training data to converge. We believe that both models can be extended to other areas where lexico-syntactic pattern matching can be applied.
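As a rough illustration of what it means to treat pattern matching as a probabilistic process that generates token sequences, the sketch below trains a generic add-alpha smoothed bigram model on a few token sequences and scores new sequences by log-probability. It is not the paper's formulation: the function names (`train_bigram`, `log_prob`), the boundary markers, the smoothing scheme, and the toy tokens are all assumptions made for this example.

```python
import math
from collections import Counter

# Hypothetical boundary markers; not taken from the paper.
START, END = "<s>", "</s>"

def train_bigram(sequences):
    """Count contexts and bigrams over boundary-padded token sequences."""
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = {END}
    for seq in sequences:
        padded = [START] + list(seq) + [END]
        vocab.update(seq)
        for prev, cur in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, vocab

def log_prob(seq, model, alpha=1.0):
    """Log-probability of a token sequence under add-alpha smoothing.

    Unseen bigrams receive a small non-zero probability, so a sequence
    that only partly resembles the training data still gets a graded
    score instead of a hard mismatch.
    """
    context_counts, bigram_counts, vocab = model
    v = len(vocab) + 1  # one extra slot reserved for unseen tokens
    padded = [START] + list(seq) + [END]
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        numerator = bigram_counts[(prev, cur)] + alpha
        denominator = context_counts[prev] + alpha * v
        total += math.log(numerator / denominator)
    return total
```

A model trained on sequences such as `["<TERM>", "is", "a", "NN"]` assigns a higher score to sequences that follow the same token order than to a scrambled version of the same tokens, which is the graded behaviour that distinguishes soft matching from exact pattern matching.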

Type

Conference paper

Publication

Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval

Hang Cui

Doctoral Alumnus (Jul. ’06). Thesis: Generic Soft Patterns for Question Answering.

Min-Yen Kan

Associate Professor