UCL Discovery

Term Frequency Quantization for Compressing an Inverted Index

Zheng, L; Cox, IJ; (2010) Term Frequency Quantization for Compressing an Inverted Index. In: An, A and Lingras, P and Petty, S and Huang, R, (eds.) ACTIVE MEDIA TECHNOLOGY. (pp. 277 - 287). SPRINGER-VERLAG BERLIN

Full text not available from this repository.

Abstract

In this paper, we investigate lossy compression of term frequencies in an inverted index by means of quantization. First, we examine how many bits are needed to code term frequencies with little or no degradation of retrieval performance. Both term-independent and term-specific quantizers are investigated. Next, we describe an iterative technique for learning quantization step sizes. Experiments on standard TREC test sets demonstrate that retrieval performance is almost undegraded when only 2 or 3 bits are allocated to the quantized term frequencies. This is comparable to lossless coding techniques such as unary, gamma and delta codes. However, if lossless coding is applied to the quantized term frequency values, then savings of around 26% (or 12%) can be achieved over lossless coding alone, with less than 2.5% (or no measurable) degradation in retrieval performance.
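
To make the idea of quantizing term frequencies before lossless coding concrete, here is a minimal sketch. It is not the authors' method: the abstract does not specify the quantizer, so the uniform step size, the clipping of large values into the top level, and the midpoint reconstruction are all assumptions, and the function names `quantize_tf`, `dequantize_tf`, `gamma_encode` and `gamma_code_length` are hypothetical. The Elias gamma code itself is the standard one.

```python
def quantize_tf(tf, bits, step):
    """Map a term frequency (>= 1) to one of 2**bits levels.

    Assumption: a uniform quantizer with a fixed step size; values beyond
    the last bin are clipped into the top level.
    """
    if tf < 1:
        raise ValueError("term frequency must be at least 1")
    levels = 2 ** bits
    return min((tf - 1) // step, levels - 1)


def dequantize_tf(level, step):
    """Approximate the original term frequency by the midpoint of the bin.

    Assumption: midpoint reconstruction; other choices are possible.
    """
    return 1 + level * step + (step - 1) / 2


def gamma_encode(n):
    """Elias gamma code of a positive integer, returned as a bit string."""
    if n < 1:
        raise ValueError("gamma code is defined for positive integers only")
    binary = bin(n)[2:]                      # binary digits without the '0b' prefix
    return "0" * (len(binary) - 1) + binary  # length prefix in unary zeros, then the value


def gamma_code_length(n):
    """Number of bits in the Elias gamma code of n: 2*floor(log2 n) + 1."""
    if n < 1:
        raise ValueError("gamma code is defined for positive integers only")
    return 2 * (n.bit_length() - 1) + 1


if __name__ == "__main__":
    # Hypothetical postings-list term frequencies, for illustration only.
    tfs = [1, 1, 2, 5, 9, 40]
    bits, step = 2, 2
    quantized = [quantize_tf(tf, bits, step) for tf in tfs]
    # Levels are 0-based, while gamma codes need positive integers, so code level + 1.
    raw_bits = sum(gamma_code_length(tf) for tf in tfs)
    quantized_bits = sum(gamma_code_length(q + 1) for q in quantized)
    print(quantized, raw_bits, quantized_bits)
```

Because quantization collapses large term frequencies into a few small level indices, a variable-length code such as gamma spends fewer bits on them, which is the kind of combined saving the abstract reports. The actual savings depend on the collection and on how the step sizes are learned.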

Type: Proceedings paper
Title: Term Frequency Quantization for Compressing an Inverted Index
Event: 6th International Conference on Active Media Technology
Location: York University, Toronto, Canada
Dates: 2010-08-28 - 2010-08-30
ISBN-13: 978-3-642-15469-0
Keywords: RETRIEVAL
UCL classification: UCL > School of BEAMS > Faculty of Engineering Science > Computer Science