Sillitoe, Ian;
(2002)
Consensus templates for protein structure recognition.
Doctoral thesis (Ph.D), UCL (University College London).
Abstract
Molecular biology has moved into the new millennium with the human genome sequenced and publicly available. The challenge now facing the bioinformatics field is to assign structural and functional information to the protein sequences generated by this and many other genomic projects. To meet this challenge, several structural genomics initiatives are currently underway with the aim of providing, where possible, a protein structure within homology-modelling distance of every known sequence. As a result, structure classification databases will need new methods to cope with this high influx of structures. This thesis presents work on the classification, analysis and recognition of protein structures using the CATH protein structure classification database. Structural similarity is measured by comparing contact maps, which record the points of contact between amino acid residues. By examining related structures, it has been possible to identify contacts that have been highly conserved during evolution. Protocols have been developed to generate accurate multiple structure alignments, and 3D templates based on the consensus contact patterns found in these alignments. Templates have been generated for all homologous superfamilies in CATH to create a library of unique, identifying 'fingerprint' patterns. These templates were applied to the recognition of models generated at an early stage of ab initio protein structure prediction. Scanning these early models against a library of templates describing conserved contacts allowed the most likely superfamily to be identified. An algorithm was also written that performs fold recognition using only a limited set of contacts, intended for application to the early stages of experimental NMR structure determination. Finally, the multiple structure alignments have been used to generate a library of hidden Markov models (HMMs).
These structure-based sequence profiles were thoroughly benchmarked using a strict dataset of remote homologues and appear to outperform other commonly used sequence methods. This work was generously supported by the Biotechnology and Biological Sciences Research Council.
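The contact-map comparison and consensus-template scanning summarised above can be illustrated with a minimal toy sketch. It is not the method implemented in the thesis. It assumes C-alpha coordinates, an 8 Å contact cutoff, a minimum sequence separation of 3, residue indices already mapped onto common alignment-column positions, a conservation threshold for "consensus" contacts, and a score defined as the fraction of template contacts recovered by a query model. All function names and parameter values are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

Coord = tuple[float, float, float]
Contact = tuple[int, int]


def contact_map(coords: list[Coord], cutoff: float = 8.0, min_sep: int = 3) -> set[Contact]:
    """Residue pairs (i, j), i < j, whose C-alpha atoms lie within `cutoff` angstroms.

    Assumed parameters: pairs closer than `min_sep` in sequence are skipped
    as trivial backbone neighbours.
    """
    contacts = set()
    for i, j in combinations(range(len(coords)), 2):
        if j - i >= min_sep and math.dist(coords[i], coords[j]) <= cutoff:
            contacts.add((i, j))
    return contacts


def consensus_template(aligned_maps: list[set[Contact]], threshold: float = 0.8) -> set[Contact]:
    """Keep contacts present in at least `threshold` of the aligned structures.

    Assumes every map is already expressed in alignment-column coordinates.
    """
    counts = Counter(pair for cmap in aligned_maps for pair in cmap)
    needed = threshold * len(aligned_maps)
    return {pair for pair, n in counts.items() if n >= needed}


def template_score(template: set[Contact], query_map: set[Contact]) -> float:
    """Fraction of the template's conserved contacts found in the query model."""
    if not template:
        return 0.0
    return len(template & query_map) / len(template)


def best_superfamily(library: dict[str, set[Contact]], query_map: set[Contact]) -> str:
    """Scan a (hypothetical) template library and return the best-scoring entry."""
    return max(library, key=lambda name: template_score(library[name], query_map))


if __name__ == "__main__":
    # A toy five-residue hairpin: residue 4 folds back next to residue 0.
    hairpin = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (3.8, 5.0, 0.0), (0.0, 5.0, 0.0)]
    query = contact_map(hairpin)
    print(sorted(query))  # [(0, 3), (0, 4), (1, 4)]

    members = [{(0, 3), (0, 4), (1, 4)}, {(0, 3), (0, 4)}, {(0, 4), (2, 5)}]
    template = consensus_template(members, threshold=0.6)
    print(sorted(template))  # [(0, 3), (0, 4)]

    library = {"superfamily_A": template, "superfamily_B": {(5, 9)}}
    print(best_superfamily(library, query))  # superfamily_A
```

Scoring by recovered template contacts keeps the comparison cheap enough to scan a whole library, and a score based only on conserved contacts is tolerant of the inaccurate regions typical of early-stage models; a realistic implementation would also need to handle the residue-to-template mapping, which this sketch takes as given.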
Type: | Thesis (Doctoral) |
---|---|
Qualification: | Ph.D |
Title: | Consensus templates for protein structure recognition |
Open access status: | An open access version is available from UCL Discovery |
Language: | English |
Additional information: | Thesis digitised by ProQuest. |
Keywords: | Biological sciences; Protein structure |
URI: | https://discovery.ucl.ac.uk/id/eprint/10102411 |