UCL Discovery
UCL home » Library Services » Electronic resources » UCL Discovery

Optimisation approaches for data mining in biological systems

Yang, L; (2016) Optimisation approaches for data mining in biological systems. Doctoral thesis , UCL (University College London). Green open access

[thumbnail of Thesis Yang Revised.pdf]
Thesis Yang Revised.pdf - Accepted Version

Download (2MB) | Preview


The advances in data acquisition technologies have generated massive amounts of data that present considerable challenge for analysis. How to efficiently and automatically mine through the data and extract the maximum value by identifying the hidden patterns is an active research area, called data mining. This thesis tackles several problems in data mining, including data classification, regression analysis and community detection in complex networks, with considerable applications in various biological systems. First, the problem of data classification is investigated. An existing classifier has been adopted from literature and two novel solution procedures have been proposed, which are shown to improve the predictive accuracy of the original method and significantly reduce the computational time. Disease classification using high throughput genomic data is also addressed. To tackle the problem of analysing large number of genes against small number of samples, a new approach of incorporating extra biological knowledge and constructing higher level composite features for classification has been proposed. A novel model has been introduced to optimise the construction of composite features. Subsequently, regression analysis is considered where two piece-wise linear regression methods have been presented. The first method partitions one feature into multiple complementary intervals and ts each with a distinct linear function. The other method is a more generalised variant of the previous one and performs recursive binary partitioning that permits partitioning of multiple features. Lastly, community detection in complex networks is investigated where a new optimisation framework is introduced to identify the modular structure hidden in directed networks via optimisation of modularity. A non-linear model is firstly proposed before its linearised variant is presented. The optimisation framework consists of two major steps, including solving the non-linear model to identify a coarse initial partition and a second step of solving repeatedly the linearised models to re fine the network partition.

Type: Thesis (Doctoral)
Title: Optimisation approaches for data mining in biological systems
Event: UCL
Open access status: An open access version is available from UCL Discovery
Language: English
UCL classification: UCL
UCL > Provost and Vice Provost Offices
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
URI: https://discovery.ucl.ac.uk/id/eprint/1473809
Downloads since deposit
Download activity - last month
Download activity - last 12 months
Downloads by country - last 12 months

Archive Staff Only

View Item View Item