Zhu, Z and Levene, M and Cox, IJ (2009) Query Classification Using Asymmetric Learning. In: 2009 SECOND INTERNATIONAL CONFERENCE ON THE APPLICATIONS OF DIGITAL INFORMATION AND WEB TECHNOLOGIES (ICADIWT 2009). (pp. 518-524). IEEE
Full text not available from this repository.
Understanding the meaning of queries is a key task at the heart of web search. Classifying users' queries is challenging because queries are usually short and often ambiguous. A common way to tackle short and noisy queries is to enrich them. Various enrichment strategies have been proposed, based either on pseudo-relevance feedback or on secondary sources of information. In general, algorithms based on pseudo-relevance feedback perform better. However, in that case query classification can only occur after retrieval has been performed, because the result set is needed to apply pseudo-relevance feedback. Since some applications may prefer to classify queries before, or in parallel with, retrieval, there is a need to improve the performance of query classification based on secondary sources. In this paper we present a hybrid strategy in which training is based on pseudo-relevance feedback, but testing is based on a secondary source, specifically Yahoo's "suggested keywords". These keywords are derived from co-occurrence data across queries. The classifier, which is built offline from training data, uses the top-n search results during training but not during testing. Thus, there is an asymmetry between the training and testing data. We compared classification using symmetric and asymmetric approaches on a large AOL search log. Symmetric training and testing, using queries enriched with Yahoo keywords, yielded a micro-averaged F1 score of 44%. Asymmetric training (enriching with the top-10 Google snippets) and testing (enriching with Yahoo suggested keywords) increased the F1 score to 46%. This is comparable with a symmetric approach based on feedback from the top-2 pseudo-relevant documents, in which a similar number of enrichment terms is added.
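The asymmetric setup can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the classifier used in the paper is not described in this record, so a multinomial Naive Bayes with add-one smoothing stands in for it. The helper names (`enrich`, `NaiveBayes`, `micro_f1`) and the toy queries, "snippets", "keywords" and category labels are all hypothetical and invented for illustration. The point shown is that training queries are enriched with one source (snippet-like text) while test queries are enriched with another (keyword-like text), and that the result is scored with micro-averaged F1, which pools true positives, false positives and false negatives across all queries and categories.

```python
import math
from collections import Counter, defaultdict


def enrich(query, extra_texts):
    """Hypothetical enrichment step: append the tokens of external text
    (snippet-like text when training, keyword-like text when testing)."""
    tokens = query.lower().split()
    for text in extra_texts:
        tokens.extend(text.lower().split())
    return tokens


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (a stand-in classifier,
    not necessarily the one used in the paper)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def log_score(self, tokens, label):
        # log prior + sum of smoothed log likelihoods of each token
        n_docs = sum(self.class_counts.values())
        denom = self.totals[label] + len(self.vocab)
        score = math.log(self.class_counts[label] / n_docs)
        for t in tokens:
            score += math.log((self.word_counts[label][t] + 1) / denom)
        return score

    def predict(self, tokens):
        return max(self.class_counts, key=lambda c: self.log_score(tokens, c))


def micro_f1(gold, pred):
    """Micro-averaged F1: pool TP/FP/FN over every query and category
    before computing precision and recall."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data, invented for illustration. Training queries are enriched with
# snippet-like text; test queries only with keyword-like text (the asymmetry).
train = [
    ("jaguar", ["jaguar car dealer prices", "new jaguar models"], "Autos"),
    ("python", ["python programming tutorial", "learn python code"], "Computing"),
    ("civic", ["honda civic review", "used car listings"], "Autos"),
    ("java", ["java programming language", "download java code"], "Computing"),
]
test = [
    ("accord", ["honda accord", "car price"], {"Autos"}),
    ("rust", ["rust programming", "rust code examples"], {"Computing"}),
]

clf = NaiveBayes().fit(
    [enrich(q, snippets) for q, snippets, _ in train],
    [label for _, _, label in train],
)
predictions = [{clf.predict(enrich(q, keywords))} for q, keywords, _ in test]
gold = [labels for _, _, labels in test]
print(predictions, micro_f1(gold, predictions))
```

Because the classifier is trained once offline, only the cheap keyword lookup is needed at query time, which is what allows classification before or in parallel with retrieval. A symmetric baseline would simply call `enrich` with the same kind of source in both phases.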
| Title: | Query Classification Using Asymmetric Learning |
| Event: | 2nd International Conference on the Applications of Digital Information and Web Technologies |
| Dates: | 2009-08-04 to 2009-08-06 |
| UCL classification: | UCL > School of BEAMS > Faculty of Engineering Science > Computer Science |