UCL Discovery

Inference of population structure using dense haplotype data.

Lawson, DJ; Hellenthal, G; Myers, S; Falush, D (2012) Inference of population structure using dense haplotype data. PLoS Genetics, 8(1), Article e1002453. doi:10.1371/journal.pgen.1002453. Green open access.


Abstract

The advent of genome-wide dense variation data provides an opportunity to investigate ancestry in unprecedented detail, but presents new statistical challenges. We propose a novel inference framework that aims to efficiently capture information on population structure provided by patterns of haplotype similarity. Each individual in a sample is considered in turn as a recipient, whose chromosomes are reconstructed using chunks of DNA donated by the other individuals. Results of this "chromosome painting" can be summarized as a "coancestry matrix," which directly reveals key information about ancestral relationships among individuals. If markers are viewed as independent, we show that this matrix almost completely captures the information used by both standard Principal Components Analysis (PCA) and model-based approaches such as STRUCTURE in a unified manner. Furthermore, when markers are in linkage disequilibrium, the matrix combines information across successive markers to increase the ability to discern fine-scale population structure using PCA. In parallel, we have developed an efficient model-based approach to identify discrete populations using this matrix, which offers advantages over PCA in terms of interpretability and over existing clustering algorithms in terms of speed, number of separable populations, and sensitivity to subtle population structure. We analyse Human Genome Diversity Panel data for 938 individuals and 641,000 markers, and we identify 226 populations reflecting differences on continental, regional, local, and family scales. We present multiple lines of evidence that, while many methods capture similar information among strongly differentiated groups, more subtle population structure in human populations is consistently present at a much finer level than currently available geographic labels and is only captured by the haplotype-based approach. 
The software used for this article, ChromoPainter and fineSTRUCTURE, is available from http://www.paintmychromosomes.com/.
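
To make the idea of a coancestry matrix concrete, the following is a minimal toy sketch for the independent-markers case only. It is not ChromoPainter's model or code: the function name `toy_coancestry`, the 0/1 allele encoding, and the rule that a recipient copies uniformly from the other haplotypes carrying the same allele are all assumptions made for illustration. Linkage between successive markers, which the abstract describes as the source of extra resolution, is ignored here.

```python
from typing import List


def toy_coancestry(haplotypes: List[List[int]]) -> List[List[float]]:
    """Toy coancestry matrix for unlinked biallelic markers (illustrative only).

    haplotypes[i][s] is the allele (0 or 1) of haplotype i at site s.
    Entry [i][j] is the expected number of sites at which recipient i
    copies donor j, assuming i chooses uniformly among the other
    haplotypes that carry the same allele at that site.
    """
    n = len(haplotypes)
    n_sites = len(haplotypes[0]) if n else 0
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):  # each haplotype takes its turn as the recipient
        for s in range(n_sites):
            donors = [
                j for j in range(n)
                if j != i and haplotypes[j][s] == haplotypes[i][s]
            ]
            if not donors:
                continue  # private allele: no other haplotype can donate it
            share = 1.0 / len(donors)
            for j in donors:
                matrix[i][j] += share
    return matrix


# Four made-up haplotypes forming two loose groups: (0, 1) and (2, 3).
haps = [
    [0, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
]
coancestry = toy_coancestry(haps)
for row in coancestry:
    print(row)
```

In this made-up example, haplotype 0 copies mostly from haplotype 1 and haplotype 2 mostly from haplotype 3, so the two groups show up as blocks in the matrix. Real analyses should use the authors' software, which models copying along linked markers.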

Type: Article
Title: Inference of population structure using dense haplotype data.
Location: US
Open access status: An open access version is available from UCL Discovery
DOI: 10.1371/journal.pgen.1002453
Publisher version: http://dx.doi.org/10.1371/journal.pgen.1002453
Language: English
Additional information: © 2012 Lawson et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. PMCID: PMC3266881
Keywords: Algorithms; Computer Simulation; Continental Population Groups; Genome, Human; Haplotypes; Human Genome Project; Humans; Linkage Disequilibrium; Models, Theoretical; Polymorphism, Single Nucleotide; Population; Principal Component Analysis; Software
UCL classification: UCL
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences > Div of Biosciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences > Div of Biosciences > Genetics, Evolution and Environment
URI: https://discovery.ucl.ac.uk/id/eprint/1368249