UCL Discovery

A probabilistic hierarchical clustering method for organising collections of text documents

Vinokourov, A; Girolami, M; (2000) A probabilistic hierarchical clustering method for organising collections of text documents. In: Sanfeliu, A and Villanueva, JJ and Vanrell, M and Alquezar, R and Jain, AK and Kittler, J, (eds.) 15TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 2, PROCEEDINGS. (pp. 182 - 185). IEEE COMPUTER SOC

Full text not available from this repository.

Abstract

In this paper, a generic probabilistic framework for the unsupervised hierarchical clustering of large-scale, sparse, high-dimensional data collections is proposed. The framework is based on a hierarchical probabilistic mixture methodology. Two classes of models emerge from the analysis, termed symmetric and asymmetric models. For text data specifically, both asymmetric and symmetric models based on the multinomial and binomial distributions are most appropriate. An Expectation Maximisation parameter estimation method is provided for all of these models. An experimental comparison of the models is obtained for two extensive online document collections.
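
As a rough illustration of the kind of Expectation Maximisation estimation the abstract mentions, the sketch below fits a flat (non-hierarchical) mixture of multinomials to word-count vectors. It is not the authors' hierarchical symmetric or asymmetric model. The function name `em_multinomial_mixture`, the smoothing constant, the iteration count and the choice to initialise each component from one evenly spaced document are all assumptions made for this example.

```python
import math


def em_multinomial_mixture(counts, k, iters=30, smooth=0.01):
    """Fit a flat k-component multinomial mixture to word-count rows with EM.

    counts: list of documents, each a list of non-negative word counts.
    Returns (mixing weights, per-component word probabilities, responsibilities).
    """
    n_docs, n_words = len(counts), len(counts[0])
    weights = [1.0 / k] * k

    # Assumed initialisation: component c starts from the add-one smoothed
    # counts of the document at index c * n_docs // k (deterministic).
    theta = []
    for c in range(k):
        row = [n + 1.0 for n in counts[c * n_docs // k]]
        total = sum(row)
        theta.append([v / total for v in row])

    resp = []
    for _ in range(iters):
        # E-step: posterior probability of each component for each document,
        # computed in log space to avoid underflow on long documents.
        resp = []
        for doc in counts:
            logp = [
                math.log(weights[c])
                + sum(n * math.log(theta[c][w]) for w, n in enumerate(doc) if n)
                for c in range(k)
            ]
            top = max(logp)
            unnorm = [math.exp(lp - top) for lp in logp]
            z = sum(unnorm)
            resp.append([u / z for u in unnorm])

        # M-step: re-estimate smoothed mixing weights and word probabilities
        # from the responsibility-weighted counts.
        for c in range(k):
            rc = sum(resp[d][c] for d in range(n_docs))
            weights[c] = (rc + smooth) / (n_docs + k * smooth)
            row = [
                smooth + sum(resp[d][c] * counts[d][w] for d in range(n_docs))
                for w in range(n_words)
            ]
            total = sum(row)
            theta[c] = [v / total for v in row]

    return weights, theta, resp


if __name__ == "__main__":
    # Toy corpus (made up for illustration): two documents use words 0-1,
    # two use words 2-3.
    toy = [[5, 4, 0, 0], [6, 3, 0, 0], [0, 0, 4, 5], [0, 1, 5, 4]]
    _, _, r = em_multinomial_mixture(toy, k=2)
    print([max(range(2), key=lambda c: row[c]) for row in r])
```

A hierarchical variant would nest such mixtures, so that components at one level share or split parameters at the next. How that is done in the paper cannot be recovered from this record, so it is left out here.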

Type: Proceedings paper
Title: A probabilistic hierarchical clustering method for organising collections of text documents
Event: 15th International Conference on Pattern Recognition (ICPR-2000)
Location: Barcelona, Spain
Dates: 2000-09-03 - 2000-09-07
ISBN: 0-7695-0751-4
Keywords: EM algorithm
UCL classification: UCL > School of BEAMS > Faculty of Maths and Physical Sciences > Statistical Science
