A probabilistic hierarchical clustering method for organising collections of text documents.
In: Sanfeliu, A and Villanueva, JJ and Vanrell, M and Alquezar, R and Jain, AK and Kittler, J, (eds.)
15TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 2, PROCEEDINGS.
(pp. 182 - 185).
IEEE COMPUTER SOC
In this paper, a generic probabilistic framework for the unsupervised hierarchical clustering of large-scale, sparse, high-dimensional data collections is proposed. The framework is based on a hierarchical probabilistic mixture methodology. Two classes of models emerge from the analysis, termed symmetric and asymmetric models. For text data specifically, both asymmetric and symmetric models based on the multinomial and binomial distributions are most appropriate. An Expectation Maximisation parameter estimation method is provided for all of these models. An experimental comparison of the models is carried out on two extensive online document collections.
|Title:||A probabilistic hierarchical clustering method for organising collections of text documents|
|Event:||15th International Conference on Pattern Recognition (ICPR-2000)|
|Dates:||2000-09-03 - 2000-09-07|
|UCL classification:||UCL > School of BEAMS > Faculty of Maths and Physical Sciences > Statistical Science|