UCL Discovery

Optimisation-based approaches for data analysis

Xu, G; (2008) Optimisation-based approaches for data analysis. Doctoral thesis, UCL (University College London). Green open access.

Abstract

Recent advances in science and technology have led to the generation of huge amounts of data from various sources, including scientific experiments, social surveys and practical observations. The availability of powerful computer hardware and software makes it easier to store datasets. However, more efficient and accurate methodologies are required to analyse datasets and extract useful information from them. This work applies mathematical programming and optimisation methodologies to the analysis of different forms of datasets. The research focuses on three areas: data classification, community structure identification in complex networks, and DNA motif discovery. Firstly, a general data classification problem is investigated. A mixed integer optimisation-based approach is proposed to reveal the patterns hidden in training data samples using a hyper-box representation. An efficient solution methodology is then developed to extend the applicability of hyper-box classifiers to datasets with many training samples and complex structures. Secondly, the network community structure identification problem is addressed. The proposed mathematical model finds optimal modular structures of complex networks by maximising the network modularity metric. Communities of medium and large networks are identified through a two-stage solution algorithm developed in this thesis. Finally, the third part presents an optimisation-based framework to extract DNA motifs and consensus sequences. The problem is formulated as a mixed integer linear programming model, and an iterative solution procedure is developed to identify multiple motifs in each DNA sequence. The flexibility of the proposed motif-finding approach is then demonstrated by incorporating other biological features.
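
To illustrate the network modularity metric mentioned in the abstract, the following minimal sketch evaluates modularity for a fixed partition of a small graph. It assumes the standard Newman-Girvan definition for undirected, unweighted graphs; the abstract does not state which definition the thesis uses. The function name `modularity` and the toy graph are hypothetical. The sketch only evaluates the objective and does not reproduce the thesis's optimisation model or its two-stage solution algorithm.

```python
def modularity(edges, community):
    """Modularity Q of a partition, assuming the Newman-Girvan definition.

    edges: list of (u, v) pairs for an undirected, unweighted graph.
    community: dict mapping each node to a community label.
    """
    m = len(edges)

    # Degree of every node.
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1

    # Number of edges whose endpoints share a community.
    internal = sum(1 for u, v in edges if community[u] == community[v])

    # Total degree of each community.
    degree_sum = {}
    for node, k in degree.items():
        label = community[node]
        degree_sum[label] = degree_sum.get(label, 0) + k

    # Q = (fraction of internal edges) - (expected fraction under random wiring).
    expected = sum((d / (2 * m)) ** 2 for d in degree_sum.values())
    return internal / m - expected


# Toy example (hypothetical data): two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "a" for node in range(6)}

q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
print(round(q_two, 3), round(q_one, 3))
```

Splitting the graph into its two triangles scores higher than placing all nodes in one community, which is the kind of comparison a modularity-maximising model makes over all candidate partitions.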

Type: Thesis (Doctoral)
Title: Optimisation-based approaches for data analysis
Open access status: An open access version is available from UCL Discovery
Language: English
Additional information: Thesis digitised by ProQuest. Third party copyright material has been removed from the ethesis. Images identifying individuals have been redacted or partially redacted to protect their identity.
UCL classification: UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Chemical Engineering
URI: https://discovery.ucl.ac.uk/id/eprint/15947