Implementing a Language Model based Information Retrieval System on the Graphics Processing Unit
Research Area: Information Retrieval
Year: 2013
Type of Publication: Technical Report
Keywords: GPU, language modeling, information retrieval, search engine
  • Sudhanshu Khemka
Honours Year Project Report
We explain how graphics processing units (GPUs) can be used to accelerate language model (LM) based information retrieval (IR). Compared to the traditional vector-space model (VSM) of retrieval, the LM-based retrieval model is well grounded in a probabilistic framework and can outperform the VSM in retrieval accuracy. We show that the LM methodology for IR is well suited to a GPU-based approach, given the high degree of parallelism present in both the LM scoring and indexing phases. In particular, we present novel GPU-based algorithms for smoothing, document scoring, and document clustering for an LM-based IR system. To speed up smoothing, we provide novel GPU-based implementations of the Good-Turing and Kneser-Ney smoothing algorithms. To speed up document scoring, we present an efficient GPU-based implementation of Ponte and Croft’s document scoring model. Finally, to speed up clustering, we implement single-link hierarchical agglomerative clustering on the GPU. Experiments on the WT2G collection demonstrate that our GPU-based algorithms can compute LM IR document rankings more than 20 times faster than a CPU-based counterpart.
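As a rough illustration of why smoothing parallelizes well, the sketch below computes basic Good-Turing adjusted counts, r* = (r + 1) N_{r+1} / N_r, where N_r is the number of distinct terms seen exactly r times. It is a plain CPU sketch in Python, not the report's GPU implementation. The function names, the toy term counts, and the fallback to the raw count when N_{r+1} is zero are all assumptions made for this example.

```python
from collections import Counter


def good_turing_adjusted_counts(term_counts):
    """Map each observed count r to r* = (r + 1) * N_{r+1} / N_r."""
    # N_r: number of distinct terms that occur exactly r times.
    freq_of_freq = Counter(term_counts.values())
    adjusted = {}
    # Each r is handled independently of the others, which is the kind of
    # data parallelism a GPU kernel could exploit (one thread per r).
    for r, n_r in freq_of_freq.items():
        n_r_plus_1 = freq_of_freq.get(r + 1, 0)
        if n_r_plus_1 == 0:
            # Assumption for this sketch: keep the raw count when N_{r+1}
            # is zero. Practical variants smooth the N_r values instead.
            adjusted[r] = float(r)
        else:
            adjusted[r] = (r + 1) * n_r_plus_1 / n_r
    return adjusted


def unseen_probability_mass(term_counts):
    """Good-Turing estimate of the total probability of unseen terms: N_1 / N."""
    total = sum(term_counts.values())
    n_1 = sum(1 for c in term_counts.values() if c == 1)
    return n_1 / total if total else 0.0


# Toy document (hypothetical data): a x3, b x2, c x2, d/e/f x1 each.
toy_counts = Counter("a a a b b c c d e f".split())
adjusted = good_turing_adjusted_counts(toy_counts)
unseen = unseen_probability_mass(toy_counts)
```

Because every adjusted count depends only on two entries of the frequency-of-frequency table, the loop body has no cross-iteration dependencies once that table is built.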