Dallman, T.J.;
(2008)
An automated approach to remote protein homology classification.
Doctoral thesis , University of London.
PDF
U591450.pdf Download (119MB) |
Abstract
The classification of protein structures into evolutionary superfamilies, for example in the CATH or SCOP domain structure databases, although performed with varying degrees of automation, has remained a largely subjective activity guided by expert knowledge. The huge expansion of the Protein Structure Databank (PDB), partly due to the structural genomics initiatives, has posed significant challenges to maintaining the coverage of these structural classification resources. This is because the high degree of manual assessment currently involved has affected their ability to keep pace with high throughput structure determination. This thesis presents an evaluation of different methods used in remote homologue detection which was performed to identify the most powerful approaches currently available. The design and implementation of new protocols suitable for remote homologue detection was informed by an analysis of the extent to which different homologous superfamilies in CATH evolve in sequence, structure and function and characterisation of the mechanisms by which this occurs. This analysis revealed that relatives in some highly populated CATH superfamilies have diverged considerably in their structures. In diverse relatives, significant variations are observed in the secondary structure embellishments decorating the common structural core for the superfamily. There are also differences in the packing angles between secondary structures. Information on the variability observed in CATH superfamilies is collated in an established web resource the Dictionary of Homologous Superfamilies, which has been expanded and improved in a number of ways. A new structural comparison algorithm, CATHEDRAL, is described. This was developed to cope with the structural variation observed across CATH superfamilies and to improve the automatic recognition of domain boundaries in multidomain structures. CATHEDRAL combines both secondary structure matching and accurate residue alignment in an iterative protocol for determining the location of previously observed folds in novel multi-domain structures. A rigorous benchmarking protocol is also described that assesses the performance of CATHEDRAL against other leading structural comparison methods. The optimisation and benchmarking of several other methods for detecting homology are subsequently presented. These include methods which exploit Hidden Markov Models (HMMs) to detect sequence similarity and methods that attempt to assess functional similarity. Finally an automated, machine learning approach to detecting homologous relationships between proteins is presented which combines information on sequence, structure and functional similarity. This was able to identify over 85% of the homologous relationships in the CATH classification at a 5% error rate. This thesis was gratefully supported by the Biotechnology and Biological Sciences Research Council.
Type: | Thesis (Doctoral) |
---|---|
Title: | An automated approach to remote protein homology classification. |
Identifier: | PQ ETD:591450 |
Open access status: | An open access version is available from UCL Discovery |
Language: | English |
Additional information: | Thesis digitised by ProQuest. Third party copyright material has been removed from the ethesis |
UCL classification: | UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences > Div of Biosciences > Structural and Molecular Biology |
URI: | https://discovery.ucl.ac.uk/id/eprint/1444148 |
Archive Staff Only
View Item |