Detecting positive selection in protein coding genes

Anisimova, Maria (2003) Detecting positive selection in protein coding genes. Doctoral thesis (Ph.D), UCL (University College London). Green open access.

Abstract

Selective pressure at the protein level is typically measured by the nonsynonymous/synonymous rate ratio (ω = dN/dS), with ω < 1, ω = 1, and ω > 1 indicating purifying selection, neutral evolution, and positive selection, respectively. Methods that detect positive selection using this criterion are reviewed. I focus on maximum likelihood (ML) methods based on codon substitution models that account for heterogeneous selective pressure across sites. If the ML estimates indicate the presence of positive selection and the likelihood ratio test (LRT) is significant, Bayes prediction can be used to identify sites under positive selection. I examine the accuracy and power of LRTs for positive selection and of Bayes prediction of residues under positive selection. The use of the χ² distribution for significance testing makes the LRT conservative, especially for small samples of closely related lineages. Nevertheless, if a large number of lineages of sufficient divergence are analyzed, the power of the LRT can be as high as 100%. Both the accuracy and the power of Bayes prediction are low for data containing only a few similar sequences, but sampling a large number of lineages improves performance substantially. Multiple models of heterogeneous selective pressure among sites should be applied in real data analysis. ML models are phylogeny-based and do not incorporate recombination. To evaluate the effect of recombination on the LRTs and Bayes prediction, data are simulated using a coalescent model with recombination. The LRT is found to be robust to low recombination rates. However, for higher rates, the type-I error rate can be very high. Identification of sites under positive selection by the Bayes method is less affected by recombination than is the LRT. Finally, the hepatitis D antigen gene (HDAg) is tested for positive selection. Sites predicted to evolve under positive selection are found in the immunogenic domain and in the N-terminus region with reported antigenic activity. No significant evidence of recombination is found.
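
The two-step procedure described in the abstract (an LRT between nested codon models, followed by Bayes prediction of site classes) can be illustrated with a minimal Python sketch. It is not taken from the thesis. The function names, the log-likelihoods, and the class proportions are hypothetical, and the sketch assumes a nested model pair that differs by two free parameters, so that the test statistic is compared against a χ² distribution with 2 degrees of freedom.

```python
import math


def lrt_df2(lnl_null, lnl_alt):
    """Likelihood ratio test for nested models, assuming 2 degrees of freedom.

    lnl_null: maximized log-likelihood of the model without positive selection
    lnl_alt:  maximized log-likelihood of the model allowing a class with omega > 1
    Returns (test statistic, p-value).
    """
    # The statistic is twice the log-likelihood difference; clamp at zero in
    # case numerical optimization left the alternative slightly below the null.
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)
    # For a chi-square distribution with exactly 2 degrees of freedom the
    # survival function has the closed form exp(-x / 2).
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


def site_class_posteriors(proportions, site_likelihoods):
    """Posterior probability of each site class for one codon site (Bayes rule).

    proportions:      estimated proportion of sites in each omega class (the prior)
    site_likelihoods: likelihood of the data at this site under each class
    """
    joint = [p * lik for p, lik in zip(proportions, site_likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical numbers, for illustration only.
stat, p_value = lrt_df2(lnl_null=-1000.0, lnl_alt=-990.0)
print(f"2*delta(lnL) = {stat:.1f}, p = {p_value:.2e}")

# Three hypothetical classes: omega < 1, omega = 1, omega > 1.
posteriors = site_class_posteriors([0.7, 0.2, 0.1], [1e-6, 2e-6, 8e-6])
print("posterior P(omega > 1 class) =", round(posteriors[2], 3))
```

In practice the log-likelihoods and class proportions would come from fitting codon substitution models to an alignment and a phylogeny. A site is usually reported as a candidate for positive selection only when the LRT is significant and the posterior probability of the ω > 1 class exceeds a chosen cutoff.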

Type: Thesis (Doctoral)
Qualification: Ph.D
Title: Detecting positive selection in protein coding genes
Open access status: An open access version is available from UCL Discovery
Language: English
Additional information: Thesis digitised by ProQuest.
Keywords: Biological sciences; Protein coding
URI: https://discovery.ucl.ac.uk/id/eprint/10100823