Blundell, C;
(2015)
Bayesian methods for hierarchical clustering and community discovery.
Doctoral thesis , UCL (University College London).
Abstract
Discovering clusters in data is a common goal of statistical data analysis. Two kinds of clustering, hierarchical clustering and community discovery, are considered here, as well as their composition|discovering hierarchies of communities. Hierarchical clustering discovers hierarchies of clusters of data, represented as a tree whose leaves are the data. Community discovery nds clusters of people, most commonly from the adjacency matrix of a graph of the relationships between the people. We shall leverage Bayesian statistics to construct several models and corresponding e cient learning algorithms for discovering hierarchies, communities and hierarchies of communities. This thesis has three main contributions, each being a model and a learning algorithm for tackling one of these clustering problems. First we develop an e cient model-based hierarchical clustering algorithm using greedy model selection. Unlike many other hierarchical clustering algorithms, our model is not necessarily a binary tree, but can be any tree where each internal node may have any number of children. This can lead to simpler hierarchies that we nd are just as predictive of the data, but are more interpretable as the hierarchies are less visually cluttered and the underlying model has fewer parameters than a binary tree-based model. We then adapt this hierarchical clustering model and algorithm to discovering communities in social networks based upon their adjacency matrix, where the leaves of the discovered tree correspond to people. This adaptation is not straightforward as a na ve adaptation leads to an ine cient learning algorithm. We develop a dynamic programming scheme and number of approximations that yield several fast algorithms. We then show empirically that these approximations are faster than the In nite Relational Model, producing similar or better predictions in less time for the task of predicting unobserved edges in a graph. Finally we tackle the problem of discovering communities directly from interactions among individuals, rather than from the adjacency matrix of a graph. We develop a model that uses a statistical notion of reciprocity to discover communities from time-series interaction data. We then develop an Markov Chain Monte Carlo method for inference and show empirically that this model is much better at predicting future interactions among individuals than several alternate models.
Type: | Thesis (Doctoral) |
---|---|
Title: | Bayesian methods for hierarchical clustering and community discovery |
Language: | English |
UCL classification: | UCL UCL > Provost and Vice Provost Offices UCL > Provost and Vice Provost Offices > UCL BEAMS UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science |
URI: | https://discovery.ucl.ac.uk/id/eprint/1466632 |
Archive Staff Only
View Item |