A probabilistic hierarchical clustering method for organising collections of text documents

Vinokourov, A; Girolami, M; (2000) A probabilistic hierarchical clustering method for organising collections of text documents. Presented at: 15th International Conference on Pattern Recognition (ICPR-2000), BARCELONA, SPAIN.

Full text not available from this repository.

Abstract

In this paper a generic probabilistic framework for the unsupervised hierarchical clustering of large-scale, sparse, high-dimensional data collections is proposed. The framework is based on a hierarchical probabilistic mixture methodology. Two classes of models emerge from the analysis, and these have been termed symmetric and asymmetric models. For text data specifically, both asymmetric and symmetric models based on the multinomial and binomial distributions are most appropriate. An Expectation Maximisation parameter estimation method is provided for all of these models. An experimental comparison of the models is carried out on two extensive online document collections.
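To illustrate the kind of Expectation Maximisation estimation the abstract refers to, the following is a minimal sketch for a flat (non-hierarchical) mixture of multinomials over term counts. It is not the authors' hierarchical method, whose details are not given in this record. The function name `em_multinomial_mixture`, its parameters, the random initialisation, and the additive smoothing constant `alpha` are all assumptions made for illustration.

```python
import math
import random


def em_multinomial_mixture(counts, n_clusters, n_iters=50, seed=0, alpha=1e-2):
    """Fit a flat mixture of multinomials to term-count rows with EM.

    counts: list of documents, each a list of non-negative term counts.
    alpha:  additive smoothing (an illustrative assumption) that keeps
            every probability strictly positive so logs stay finite.
    Returns (mixing weights, per-cluster term probabilities, responsibilities).
    """
    rng = random.Random(seed)
    n_docs, n_terms = len(counts), len(counts[0])

    # Uniform mixing weights; random, normalised term distributions.
    weights = [1.0 / n_clusters] * n_clusters
    theta = []
    for _ in range(n_clusters):
        row = [rng.random() + 0.5 for _ in range(n_terms)]
        total = sum(row)
        theta.append([v / total for v in row])

    resp = []
    for _ in range(n_iters):
        # E-step: posterior probability of each cluster for each document,
        # computed in log space to avoid underflow on long documents.
        resp = []
        for doc in counts:
            log_p = [
                math.log(weights[k])
                + sum(c * math.log(theta[k][t]) for t, c in enumerate(doc) if c)
                for k in range(n_clusters)
            ]
            peak = max(log_p)
            unnorm = [math.exp(v - peak) for v in log_p]
            norm = sum(unnorm)
            resp.append([u / norm for u in unnorm])

        # M-step: re-estimate mixing weights and smoothed term probabilities
        # from the responsibility-weighted counts.
        for k in range(n_clusters):
            mass = sum(resp[d][k] for d in range(n_docs))
            weights[k] = (mass + alpha) / (n_docs + alpha * n_clusters)
            weighted = [
                sum(resp[d][k] * counts[d][t] for d in range(n_docs))
                for t in range(n_terms)
            ]
            total = sum(weighted) + alpha * n_terms
            theta[k] = [(w + alpha) / total for w in weighted]

    return weights, theta, resp
```

Calling `em_multinomial_mixture(counts, n_clusters=2)` on a small document-term count matrix returns mixing weights, per-cluster term distributions, and soft cluster assignments. As with any EM procedure, the result depends on the initialisation, so different seeds may yield different partitions.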

Type: Conference item (UNSPECIFIED)
Title: A probabilistic hierarchical clustering method for organising collections of text documents
Event: 15th International Conference on Pattern Recognition (ICPR-2000)
Location: BARCELONA, SPAIN
Dates: 03 September 2000 - 07 September 2000
ISBN: 0-7695-0751-4
Keywords: EM ALGORITHM
UCL classification: UCL > School of BEAMS > Faculty of Maths and Physical Sciences
UCL > School of BEAMS > Faculty of Maths and Physical Sciences > Statistical Science
URI: http://discovery.ucl.ac.uk/id/eprint/1339671