We explain how graphics processing units (GPUs) can be used to accelerate language model
(LM) based information retrieval (IR). Compared to the traditional vector-space model (VSM)
of retrieval, the LM-based retrieval model is well grounded in a probabilistic framework and can
outperform the VSM in terms of retrieval accuracy.
We show that the LM methodology for IR is well suited to a GPU-based approach, given the
high degree of parallelism present in both the LM scoring and indexing phases. In particular,
we present novel GPU-based algorithms for smoothing, document scoring, and document
clustering for an LM-based IR system. To speed up smoothing, we provide novel GPU-based
implementations of the Good-Turing and Kneser-Ney smoothing algorithms. To speed up
document scoring, we present an efficient GPU-based implementation
of Ponte and Croft’s document scoring model. Finally, to speed up clustering, we implement
single-link hierarchical agglomerative clustering on the GPU. Experiments on the WT2G
collection demonstrate that our GPU-based algorithms can compute LM IR document rankings
more than 20 times faster than a CPU-based counterpart.
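
To make the parallelism claim concrete, the following is a minimal CPU-side sketch of the Good-Turing adjusted-count estimate, c* = (c + 1) N_{c+1} / N_c, where N_c is the number of distinct terms that occur exactly c times. It is not the GPU implementation described above; the function names, the toy counts, and the fallback to the raw count when N_{c+1} is zero are assumptions made for illustration. Once the frequency-of-frequency table is built, each term's adjusted count is computed independently of every other term, which is the kind of per-element work that maps naturally onto GPU threads.

```python
from collections import Counter


def good_turing_adjusted_counts(term_counts):
    """Return {term: adjusted count} using c* = (c + 1) * N_{c+1} / N_c."""
    # N_c: number of distinct terms that occur exactly c times.
    freq_of_freq = Counter(term_counts.values())
    adjusted = {}
    # Each iteration depends only on the shared table, not on other terms.
    for term, c in term_counts.items():
        n_c = freq_of_freq[c]
        n_c1 = freq_of_freq.get(c + 1, 0)
        # Assumption: keep the raw count when N_{c+1} is zero, since the
        # plain estimate would otherwise collapse to zero.
        adjusted[term] = (c + 1) * n_c1 / n_c if n_c1 > 0 else float(c)
    return adjusted


def unseen_mass(term_counts):
    """Probability mass reserved for unseen terms: N_1 / total count."""
    total = sum(term_counts.values())
    n_1 = sum(1 for c in term_counts.values() if c == 1)
    return n_1 / total if total else 0.0


# Hypothetical term counts for a tiny collection.
counts = {"gpu": 3, "model": 2, "index": 2, "rank": 1, "score": 1, "query": 1}
adjusted = good_turing_adjusted_counts(counts)
```

On these toy counts, terms seen once or twice have their counts adjusted, while the single most frequent term keeps its raw count under the fallback assumption.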