Re-ranking Documents Based on Query-Independent Document Specificity.
In: Andreasen, T and Yager, RR and Bulskov, H and Christiansen, H and Larsen, HL, (eds.)
(Proceedings) 8th International Conference on Flexible Query Answering Systems.
(pp. 201-214).
The use of query-independent knowledge to improve the ranking of documents in information retrieval has proven very effective in the context of web search. This query-independent knowledge is derived from an analysis of the graph structure of hypertext links between documents. However, there are many cases where explicit hypertext links are absent or sparse, e.g. corporate intranets. Previous work has sought to induce a graph link structure based on various measures of similarity between documents. After inducing these links, standard link analysis algorithms, e.g. PageRank, can then be applied. In this paper, we propose and examine an alternative approach to deriving query-independent knowledge, which is not based on link analysis. Instead, we analyze each document independently and calculate a "specificity" score, based on (i) normalized inverse document frequency, and (ii) term entropies. Two re-ranking strategies, i.e. hard cutoff and soft cutoff, are then discussed to utilize our query-independent "specificity" scores. Experiments on standard TREC test sets show that our re-ranking algorithms produce gains in mean reciprocal rank of about 4%, and 4% to 6% gains in precision at 5 and 10, respectively, when using the collection of TREC disk 4 and queries from TREC 8 ad hoc topics. Empirical tests demonstrate that the entropy-based algorithm produces stable results across (i) retrieval models, (ii) query sets, and (iii) collections.
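The abstract names the ingredients (a per-document specificity score and two re-ranking strategies) but does not give their formulas. The following Python sketch is therefore only an illustration under stated assumptions, not the paper's method: specificity is assumed to be the mean IDF of a document's distinct terms divided by the maximum possible IDF, the hard cutoff is assumed to drop documents below a threshold, and the soft cutoff is assumed to scale retrieval scores by specificity. The function names, the `threshold` and `weight` parameters, and the toy data are all hypothetical; the entropy-based variant is not shown.

```python
import math
from collections import Counter


def idf_specificity(docs):
    """Return one score in [0, 1] per document.

    Assumed definition: mean IDF of the document's distinct terms,
    divided by log(N), the IDF of a term found in exactly one document.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    max_idf = math.log(n)
    scores = []
    for doc in docs:
        terms = set(doc)
        if not terms or max_idf == 0:
            scores.append(0.0)
            continue
        total = sum(math.log(n / df[t]) for t in terms)
        scores.append(total / (len(terms) * max_idf))
    return scores


def rerank_hard(ranked, specificity, threshold):
    """Hard cutoff (assumed form): drop documents below the threshold.

    `ranked` is a list of (doc_id, retrieval_score) pairs in rank order.
    """
    return [(d, s) for d, s in ranked if specificity[d] >= threshold]


def rerank_soft(ranked, specificity, weight=0.5):
    """Soft cutoff (assumed form): scale each score by specificity, re-sort."""
    rescored = [(d, s * (1 - weight + weight * specificity[d])) for d, s in ranked]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Toy collection: each document is a list of tokens.
docs = [["the", "cat"], ["the", "dog"], ["the", "the"]]
spec = idf_specificity(docs)

# Initial ranking from some query-dependent retrieval model (made-up scores).
ranked = [(2, 3.0), (0, 2.0), (1, 1.0)]
hard = rerank_hard(ranked, spec, threshold=0.25)
soft = rerank_soft(ranked, spec, weight=1.0)
```

In this toy setup the third document contains only a term that occurs in every document, so its assumed specificity is zero: the hard cutoff removes it, and the soft cutoff with `weight=1.0` moves it to the bottom of the ranking.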
|Title:||Re-ranking Documents Based on Query-Independent Document Specificity|
|Event:||8th International Conference on Flexible Query Answering Systems|
|Location:||Roskilde Univ, Dept Commun, Business & Informat Technol, Roskilde, DENMARK|
|Dates:||26 October 2009 - 28 October 2009|
|Keywords:||Query-independent knowledge, Specificity, Normalized inverse document frequency, Entropy, Ranking, Information retrieval|
|UCL classification:||UCL > School of BEAMS > Faculty of Engineering Science; UCL > School of BEAMS > Faculty of Engineering Science > Computer Science|