Support vector machines for drug discovery

Trotter, MWB; (2007) Support vector machines for drug discovery. Doctoral thesis, UCL (University College London). Green open access

Abstract

Support vector machines (SVMs) have displayed good predictive accuracy on a wide range of classification tasks and are inherently adaptable to complex problem domains. Structure-property correlation (SPC) analysis is a vital part of the contemporary drug discovery process, in which several components of the search for novel molecular compounds with therapeutic potential may be performed by computer (in silico). Inferred relationships between molecular structure and biological properties of interest are used to eliminate compounds unsuitable for further development. In order to improve process efficiency without rejecting useful compounds, predictive accuracy of such relationships must remain high despite a paucity of data from which to infer them. This thesis describes the application of SVMs to SPC analysis and investigates methods with which to enhance performance and facilitate integration of the technique into present practice. Overviews of contemporary drug discovery and the role of machine learning place the investigation into context. Computational discrimination between compounds according to their structures and properties of interest is described in detail, as is the SVM algorithm. A framework for the assessment of supervised machine learning performance on SPC data is proposed and employed to assess SVM performance alongside state-of-the-art techniques for in silico SPC analysis on data provided by GlaxoSmithKline. SVM performance is competitive and the comparison prompts adaptations of both data treatment and algorithmic application to explore the effects of data paucity, class imbalance and outlying data. Subsequent work weights the SVM kernel matrix to recognise heavily populated regions of training data and suggests the incorporation of domain-specific clustering methods to assist the standard SVM algorithm.
The notion that SVM kernel functions may incorporate existing domain-specific methods leads to kernel functions that employ existing pharmaceutical similarity measures to treat an abstract, binary representation of molecular structure that is not used widely for SPC analysis.
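
As an illustration of a kernel built from a pharmaceutical similarity measure, the sketch below computes a kernel (Gram) matrix over binary molecular fingerprints. The abstract does not name the similarity measures used in the thesis; the Tanimoto coefficient is assumed here only because it is a common choice for binary fingerprints, and the function names and toy fingerprints are hypothetical.

```python
from typing import List, Sequence


def tanimoto(a: Sequence[int], b: Sequence[int]) -> float:
    """Tanimoto similarity between two equal-length binary fingerprints.

    Assumed measure: the abstract does not say which similarity is used.
    """
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)    # bits set in both
    either = sum(1 for x, y in zip(a, b) if x or y)   # bits set in at least one
    # Convention chosen for this sketch: two all-zero fingerprints are identical.
    return 1.0 if either == 0 else both / either


def kernel_matrix(fps: Sequence[Sequence[int]]) -> List[List[float]]:
    """Pairwise Tanimoto similarities, usable as a precomputed SVM kernel."""
    return [[tanimoto(a, b) for b in fps] for a in fps]


# Hypothetical 6-bit fingerprints for three compounds (illustrative data only).
fingerprints = [
    [1, 1, 0, 1, 0, 0],
    [1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 0],
]

K = kernel_matrix(fingerprints)
for row in K:
    print(row)
```

A matrix of this kind could be handed to any SVM implementation that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`.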

Type: Thesis (Doctoral)
Title: Support vector machines for drug discovery
Identifier: PQ ETD:593209
Open access status: An open access version is available from UCL Discovery
Language: English
Additional information: Thesis digitised by ProQuest. Third party copyright material has been removed from the ethesis. Images identifying individuals have been redacted or partially redacted to protect their identity.
URI: https://discovery.ucl.ac.uk/id/eprint/1445885