Towards Generic Domain-specific Information Retrieval
Research Area: Information Retrieval
Year: 2013
Type of Publication: PhD Thesis
Author: Jin Zhao
To improve domain-specific information retrieval, we identify and examine two generic (domain-independent) but prominent problems in this area: Resource Categorization and Text-to-Construct Linking. The first is the categorization of domain-specific resources at multiple granularities. It helps a search engine better meet specific user needs by highlighting task-relevant materials and organizing its presentation of search results by more pertinent metadata criteria. The second is the resolution of domain-specific concepts to their related domain-specific constructs. It allows constructs to properly influence relevance ranking in search results without requiring users to enter them in potentially awkward construct syntax.
We observe correlations among various characteristics of domain-specific resources and capture them in a multi-layered graph. Following this graph, we carry out our research on the two problems as follows.
For Resource Categorization, we use the key information extraction problem in healthcare as a case study on the categorization of correlated nominal facets. We exploit the correlation between two categorizations at different granularities (sentence-level and word-level) by propagating information from one to the other, either sequentially or simultaneously. In addition, we use the readability measurement problem as a case study on the categorization of ordinal facets. Here we exploit the correlation between the readability of domain-specific resources and the difficulty of domain-specific concepts through iterative computation.
For Text-to-Construct Linking, we tackle the linking of math concepts to their representations in math expressions. We use supervised learning to exploit the correlation between the observable characteristics of a concept-expression pair and its relation type.
To demonstrate the applicability and usefulness of our research, we have implemented two domain-specific search systems, one in the domain of math and the other in healthcare. Both systems incorporate and extend our research findings to handle domain-specific user needs. Our evaluation shows that both the Resource Categorization and the Text-to-Construct Linking features are effective in facilitating domain-specific search.
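As an illustration of the iterative computation mentioned for readability measurement, the sketch below shows one simple way resource readability and concept difficulty could reinforce each other. It is not the thesis's algorithm: the function name `estimate_levels`, the averaging update rule, the fixed seed levels, and the toy data are all assumptions made for this example.

```python
def estimate_levels(doc_concepts, seed_levels, iterations=50):
    """Jointly estimate document levels and concept difficulties.

    doc_concepts: dict mapping a document id to the set of concepts it mentions.
    seed_levels:  dict mapping some document ids to a known level (hypothetical scale).
    """
    # Unlabeled documents start at the mean of the seed levels.
    default = sum(seed_levels.values()) / len(seed_levels)
    doc_level = {d: seed_levels.get(d, default) for d in doc_concepts}
    concept_level = {}
    for _ in range(iterations):
        # Concept difficulty: mean level of the documents that mention it.
        totals, counts = {}, {}
        for doc, concepts in doc_concepts.items():
            for c in concepts:
                totals[c] = totals.get(c, 0.0) + doc_level[doc]
                counts[c] = counts.get(c, 0) + 1
        concept_level = {c: totals[c] / counts[c] for c in totals}
        # Document level: mean difficulty of its concepts; seeds stay fixed.
        for doc, concepts in doc_concepts.items():
            if doc in seed_levels or not concepts:
                continue
            doc_level[doc] = sum(concept_level[c] for c in concepts) / len(concepts)
    return doc_level, concept_level


# Toy data (invented for illustration only).
docs = {
    "primer": {"fraction", "addition"},
    "survey": {"manifold", "tensor"},
    "notes": {"fraction", "tensor"},
}
seeds = {"primer": 1.0, "survey": 5.0}
doc_level, concept_level = estimate_levels(docs, seeds)
print(doc_level["notes"], concept_level["fraction"], concept_level["tensor"])
```

In this toy setup the unlabeled document settles between the two seeds, and concepts shared with harder documents receive higher difficulty. A real system would need a principled update rule and a convergence check rather than a fixed iteration count.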