Todd, Annabel Elisabeth;
(2002)
Evolution of function in protein superfamilies.
Doctoral thesis (Ph.D), UCL (University College London).
Abstract
The recent growth in protein data has revealed the functional promiscuity of many protein superfamilies. Understanding the molecular basis of the observed functional variation is essential if we are to benefit from the wealth of genome sequence data, from which gene functions must be unravelled. This thesis addresses the problem through a systematic analysis of proteins in the Protein Data Bank for which both structural data and detailed functional information are available. Thirty-one structural enzyme superfamilies have been analysed in detail. A large number of variations and peculiarities are observed, ranging from the atomic level to gross structural rearrangements. Some superfamilies have achieved their diversity by an extraordinarily wide variety of routes, including the loss or gain of catalytic metal sites, domain embellishments, internal duplications, domain rearrangements, fusion and oligomerisation. A small number, by contrast, have diversified considerably through modifications to the active site alone, without recruiting other modules or subunits. Commonly, substrate specificity varies across a superfamily while some or all aspects of the reaction chemistry are maintained. In as many as 15 families, residues that play equivalent functional roles in related proteins may be situated at different points in the protein scaffold. The implications of this work for structural genomics projects are discussed.

The applicability of automated protein function prediction by sequence database searching was also investigated. CATH was used for its classification of families, and the Enzyme Commission (EC) scheme for its classification of enzyme functions, with additional sequence data included. An initial overview of the data indicates that the majority of superfamilies display variation in enzyme function, with 25% of families containing members of different enzyme types. For both single- and multi-domain proteins, variation in EC number is rare above 40% sequence identity. Above 30% identity, the first three digits of the EC number may be predicted with an accuracy of at least 90%. Below 30%, conservation of function falls rapidly, and structural data are essential for understanding the origins of the observed functional differences.
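The identity thresholds quoted above suggest a simple way to decide how much of an EC number to transfer from a sequence-database hit. The sketch below is a hypothetical illustration, not the method used in the thesis. Several of its details are assumptions:

- The function names are invented.
- The thresholds are strict (`>`) cut-offs taken from the abstract.
- A "-" placeholder marks an unassigned EC level.
- "Members of different enzyme types" is read as "more than one distinct EC number".

```python
from typing import Dict, List, Optional


def ec_agreement_depth(ec_a: str, ec_b: str) -> int:
    """Count how many leading EC digits (0-4) two EC numbers share.

    A '-' placeholder (unassigned level) is never counted as a match.
    """
    depth = 0
    for x, y in zip(ec_a.split("."), ec_b.split(".")):
        if x != y or x == "-":
            break
        depth += 1
    return depth


def transferable_ec(hit_ec: str, percent_identity: float) -> Optional[str]:
    """Hypothetical annotation-transfer rule based on the abstract's thresholds.

    Assumption: > 40% identity -> transfer the full EC number;
    > 30% identity -> transfer only the first three digits;
    otherwise -> transfer nothing (structural data would be needed).
    """
    if percent_identity > 40.0:
        return hit_ec
    if percent_identity > 30.0:
        return ".".join(hit_ec.split(".")[:3]) + ".-"
    return None


def fraction_multifunctional(families: Dict[str, List[str]]) -> float:
    """Fraction of families whose members carry more than one distinct EC number.

    Assumption: 'different enzyme types' means distinct full EC numbers.
    """
    if not families:
        return 0.0
    varied = sum(1 for ecs in families.values() if len(set(ecs)) > 1)
    return varied / len(families)
```

For example, `ec_agreement_depth("3.1.1.1", "3.1.1.7")` returns `3`, and `transferable_ec("3.1.1.1", 35.0)` returns `"3.1.1.-"`. With a toy input such as `{"fam_a": ["3.1.1.1", "3.1.1.1"], "fam_b": ["3.1.1.1", "3.1.1.7"]}`, `fraction_multifunctional` returns `0.5`. Real family assignments would come from a classification such as CATH.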
Type: | Thesis (Doctoral) |
---|---|
Qualification: | Ph.D |
Title: | Evolution of function in protein superfamilies |
Open access status: | An open access version is available from UCL Discovery |
Language: | English |
Additional information: | Thesis digitised by ProQuest. |
Keywords: | Biological sciences; Protein superfamilies |
URI: | https://discovery.ucl.ac.uk/id/eprint/10102242 |