UCL Discovery
UCL home » Library Services » Electronic resources » UCL Discovery

Genetic classification of populations using supervised learning

Bridges, M; Heron, EA; O'Dushlaine, C; Segurado, R; International Schizophrenia, C; Morris, D; Corvin, A; ... Pinto, C; + view all (2011) Genetic classification of populations using supervised learning. PLoS One , 6 (5) , Article e14802. 10.1371/journal.pone.0014802. Green open access

[thumbnail of journal.pone.0014802.pdf]
Preview
PDF
journal.pone.0014802.pdf

Download (696kB)

Abstract

There are many instances in genetics in which we wish to determine whether two candidate populations are distinguishable on the basis of their genetic structure. Examples include populations which are geographically separated, case-control studies and quality control (when participants in a study have been genotyped at different laboratories). This latter application is of particular importance in the era of large scale genome wide association studies, when collections of individuals genotyped at different locations are being merged to provide increased power. The traditional method for detecting structure within a population is some form of exploratory technique such as principal components analysis. Such methods, which do not utilise our prior knowledge of the membership of the candidate populations. are termed unsupervised. Supervised methods, on the other hand are able to utilise this prior knowledge when it is available.In this paper we demonstrate that in such cases modern supervised approaches are a more appropriate tool for detecting genetic differences between populations. We apply two such methods, (neural networks and support vector machines) to the classification of three populations (two from Scotland and one from Bulgaria). The sensitivity exhibited by both these methods is considerably higher than that attained by principal components analysis and in fact comfortably exceeds a recently conjectured theoretical limit on the sensitivity of unsupervised methods. In particular, our methods can distinguish between the two Scottish populations, where principal components analysis cannot. We suggest, on the basis of our results that a supervised learning approach should be the method of choice when classifying individuals into pre-defined populations, particularly in quality control for large scale genome wide association studies.

Type: Article
Title: Genetic classification of populations using supervised learning
Open access status: An open access version is available from UCL Discovery
DOI: 10.1371/journal.pone.0014802
Publisher version: http://dx.doi.org/10.1371/journal.pone.0014802
Language: English
Additional information: © 2011 Bridges et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
UCL classification: UCL
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences > Division of Psychiatry
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > UCL GOS Institute of Child Health
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > UCL GOS Institute of Child Health > Genetics and Genomic Medicine Dept
URI: https://discovery.ucl.ac.uk/id/eprint/1395479
Downloads since deposit
115Downloads
Download activity - last month
Download activity - last 12 months
Downloads by country - last 12 months

Archive Staff Only

View Item View Item