Zhu, JH; Wang, J; Cox, I; Taylor, M; (2009) Risky Business: Modeling and Exploiting Uncertainty in Information Retrieval. In: Sanderson, M and Zhai, CX and Zobel, J and Allan, J and Aslam, JA, (eds.) PROCEEDINGS 32ND ANNUAL INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL. (pp. 99 - 106). ASSOC COMPUTING MACHINERY
Full text not available from this repository.
Most retrieval models estimate the relevance of each document to a query and rank the documents accordingly. However, such an approach ignores the uncertainty associated with the estimates of relevancy. If a high estimate of relevancy also has a high uncertainty, then the document may be very relevant or not relevant at all. Another document may have a slightly lower estimate of relevancy but the corresponding uncertainty may be much less. In such a circumstance, should the retrieval engine risk ranking the first document highest, or should it choose a more conservative (safer) strategy that gives preference to the second document? There is no definitive answer to this question, as it depends on the risk preferences of the user and the information retrieval system. In this paper we present a general framework for modeling uncertainty and introduce an asymmetric loss function with a single parameter that can model the level of risk the system is willing to accept. By adjusting the risk preference parameter, our approach can effectively adapt to users' different retrieval strategies.We apply this asymmetric loss function to a language modeling framework and a practical risk-aware document scoring function is obtained. Our experiments on several TREC collections show that our "risk-averse" approach significantly improves the Jelinek-Mercer smoothing language model, and a combination of our "risk-averse" approach and the Jelinek-Mercer smoothing method generally outperforms the Dirichlet smoothing method. Experimental results also show that the "risk-averse" approach, even without smoothing from the collection statistics, performs as well as three commonly-adopted retrieval models, namely, the Jelinek-Mercer and Dirichlet smoothing methods, and BM25 model.
|Title:||Risky Business: Modeling and Exploiting Uncertainty in Information Retrieval|
|Event:||32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval|
|Dates:||2009-07-19 - 2009-07-23|
|UCL classification:||UCL > School of BEAMS > Faculty of Engineering Science > Computer Science|
Archive Staff Only: edit this record