UCL Discovery

Predicting the structure and function of genomic sequences using the CATH structural database

Bray, James Edward; (2001) Predicting the structure and function of genomic sequences using the CATH structural database. Doctoral thesis (Ph.D), UCL (University College London). Green open access

Abstract

The field of bioinformatics faces the challenge of reliably annotating genomic sequences with structural and functional information. Structure classification databases are now sufficiently populated to provide a framework for meeting this challenge. This thesis focuses on the superfamily level of structural classification that groups together distantly related proteins that have evolved from a common ancestor. In order to cope with the functional diversity that occurs at the structural superfamily level, sequences have been classified into functionally related protein families that can serve as the basis for genome annotation. Knowledge of the key structural and functional features of structural superfamilies provides valuable insights for accurately transferring biological information. This thesis describes the development of two new structure-based resources that enhance the ability of the CATH structural database to annotate genomic sequences. Firstly, the CATH Dictionary of Homologous Superfamilies (DHS) presents functionally annotated structural alignments for distantly related domains. Key residues can be identified and used diagnostically for validating the results of sequence search algorithms. Secondly, the CATH Protein Family Database (CATH-PFDB) integrates sequence and structure by assigning genomic sequences to structural superfamilies. The sequences within each superfamily are further clustered into families sharing close functional similarity. Extensive benchmarking of this sequence library using pairwise and profile search algorithms showed that both approaches can be used to reliably identify distantly related genomic sequences. A protocol for analysing the quality of three-dimensional protein models derived from distantly related proteins has also been developed. Residue environment scores from the SSAP structure comparison algorithm have been used to identify well-modelled structural fragments through histogram and coverage plots. This facilitates the assessment of structure prediction and modelling algorithms that are vital for accurately transferring structural data to genomic sequences. This work was generously supported by the Biotechnology and Biological Sciences Research Council.

Type: Thesis (Doctoral)
Qualification: Ph.D
Title: Predicting the structure and function of genomic sequences using the CATH structural database
Open access status: An open access version is available from UCL Discovery
Language: English
Additional information: Thesis digitised by ProQuest.
Keywords: Biological sciences; CATH structural database
URI: https://discovery.ucl.ac.uk/id/eprint/10102945