To improve domain-specific information retrieval, we have identified and examined two generic (domain-independent) but prominent problems in this area: Resource Categorization and Text-to-Construct Linking.
The first problem refers to the categorization of domain-specific resources at
multiple granularities. This helps a search engine better meet specific user
needs by highlighting task-relevant materials and organizing its presentation of
search results by more pertinent metadata criteria.
The second problem refers to the resolution of domain-specific concepts to
their related domain-specific constructs. This allows constructs to properly influence
relevance ranking in search results, without requiring users to enter them
in potentially awkward construct syntax.
We observe correlations among various characteristics of domain-specific resources
and capture them in a multi-layered graph. Guided by this graph, we address
the two aforementioned problems as follows. For Resource
Categorization, we use the key information extraction problem in healthcare as a
case study on the categorization of correlated nominal facets. We exploit the correlation
between two categorizations at different granularities (i.e., sentence-level
and word-level) by propagating information from one to the other sequentially
or simultaneously. In addition, we use the readability measurement problem
as a case study on the categorization of ordinal facets. We exploit the correlation
between the readability of domain-specific resources and the difficulty of
domain-specific concepts through iterative computation. For Text-to-Construct
Linking, we tackle the linking of math concepts to their representations in math
expressions. We exploit the correlation between the observable characteristics of
a concept-expression pair and its relation type using supervised learning.
To demonstrate the applicability and usefulness of our research, we have implemented
two domain-specific search systems, one in the domain of math and
the other in healthcare. Both systems incorporate and extend our research findings
to handle domain-specific user needs. Our evaluation shows that both the
Resource Categorization and the Text-to-Construct Linking features are effective
in facilitating domain-specific search.