A probabilistic hierarchical clustering method for organising collections of text documents.
Presented at: 15th International Conference on Pattern Recognition (ICPR-2000), Barcelona, Spain.
This paper proposes a generic probabilistic framework for the unsupervised hierarchical clustering of large-scale, sparse, high-dimensional data collections. The framework is based on a hierarchical probabilistic mixture methodology. Two classes of models emerge from the analysis, termed symmetric and asymmetric models. For text data specifically, both asymmetric and symmetric models based on the multinomial and binomial distributions are most appropriate. An Expectation Maximisation parameter estimation method is provided for all of these models. The models are compared experimentally on two extensive online document collections.
|Type:||Conference item (UNSPECIFIED)|
|Title:||A probabilistic hierarchical clustering method for organising collections of text documents|
|Event:||15th International Conference on Pattern Recognition (ICPR-2000)|
|Dates:||03 September 2000 - 07 September 2000|
|UCL classification:||UCL > School of BEAMS > Faculty of Maths and Physical Sciences; UCL > School of BEAMS > Faculty of Maths and Physical Sciences > Statistical Science|