UCL Discovery

Entropy-Based Static Index Pruning

Zheng, L; Cox, IJ; (2009) Entropy-Based Static Index Pruning. In: Boughanem, M and Berrut, C and Mothe, J and Soulé-Dupuy, C, (eds.) Advances in Information Retrieval, Proceedings. (pp. 713-718). Springer-Verlag Berlin

Full text not available from this repository.

Abstract

We propose a new entropy-based algorithm for static index pruning. The algorithm computes an importance score for each document in the collection based on the entropy of each term. A threshold is set according to the desired level of pruning, and all postings associated with documents that score below this threshold are removed from the index, i.e. those documents are removed from the collection. We compare this entropy-based approach with previous work by Carmel et al. [1] on both the Financial Times (FT) and Los Angeles Times (LA) collections. Experimental results reveal that the entropy-based approach has superior performance on the FT collection for both precision at 10 (P@10) and mean average precision (MAP). However, for the LA collection, Carmel's method is generally superior in terms of MAP. The variation in performance across collections suggests that a hybrid algorithm that incorporates elements of both methods might have more stable performance across collections. A simple hybrid method is tested, in which an initial 10% pruning is performed using the entropy-based method and further pruning is performed by Carmel's method. Experimental results show that the hybrid algorithm slightly improves on Carmel's method, but performs significantly worse than the entropy-based method on the FT collection.
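
The abstract does not give the scoring formula, so the following is only an illustrative sketch of document-level pruning under stated assumptions, not the authors' method. It assumes that a term's entropy contribution is -p log2 p, where p is the term's share of all tokens in the collection, and that a document's score is the mean contribution of its tokens. The function names (term_entropies, document_scores, prune_index) and the prune_fraction parameter are hypothetical.

```python
import math
from collections import Counter


def term_entropies(docs):
    """Map each term to -p * log2(p), with p its share of all collection tokens.

    Assumption: this per-term quantity stands in for "the entropy of each
    term"; the abstract does not specify the exact definition.
    """
    counts = Counter(t for tokens in docs.values() for t in tokens)
    total = sum(counts.values())
    return {t: -(c / total) * math.log2(c / total) for t, c in counts.items()}


def document_scores(docs):
    """Score each document as the mean entropy contribution of its tokens.

    Assumption: averaging over tokens is an illustrative choice only.
    """
    ent = term_entropies(docs)
    return {
        d: (sum(ent[t] for t in tokens) / len(tokens) if tokens else 0.0)
        for d, tokens in docs.items()
    }


def prune_index(docs, prune_fraction):
    """Drop the lowest-scoring fraction of documents, then build the index.

    Removing the n lowest-scoring documents is equivalent to choosing a
    score threshold that yields the desired pruning level.
    """
    scores = document_scores(docs)
    n_remove = int(len(docs) * prune_fraction)
    # Ties are broken by document id so the result is deterministic.
    ranked = sorted(scores, key=lambda d: (scores[d], d))
    removed = set(ranked[:n_remove])

    index = {}
    for d, tokens in docs.items():
        if d in removed:
            continue  # all postings of a pruned document are dropped
        for t in set(tokens):
            index.setdefault(t, []).append(d)
    return index, removed


if __name__ == "__main__":
    toy = {"d1": ["a", "a"], "d2": ["a", "b"]}
    index, removed = prune_index(toy, 0.5)
    print(removed, index)
```

Because whole documents are removed, every posting list shrinks by the same set of document ids. This differs from term-centric pruning, where each posting list is trimmed independently.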

Type: Proceedings paper
Title: Entropy-Based Static Index Pruning
Event: 31st European Conference on Information Research
Location: Toulouse, France
Dates: 2009-04-06 - 2009-04-09
ISBN-13: 978-3-642-00957-0
UCL classification: UCL > School of BEAMS > Faculty of Engineering Science > Computer Science