UCL Discovery

Automated methods for the determination of homologous relationships and functional similarities between protein domains.

Redfern, O.C.; (2007) Automated methods for the determination of homologous relationships and functional similarities between protein domains. Doctoral thesis, University of London. Green open access

Abstract

CATH is a protein database of structural domains which are assigned to superfamilies through evidence of a common evolutionary ancestor. These superfamilies are further grouped by overall structural similarity into folds. This thesis explores several automated methods for recognising homologous relationships between these domains using structural data from the Protein Data Bank (PDB). The aim of this work was to aid the manual classification of domains into the database and to provide putative functional assignments for structures solved by the structural genomics initiatives. A fast and novel algorithm, CATHEDRAL, was developed to make fold assignments to regions of polypeptide chains. By combining a fast secondary-structure method (GRATH) with a slower residue-based method (SSAP), the algorithm was able to accurately assign boundaries for distant relatives undetectable by sequence methods. Sequence and structural conservation patterns were combined in a novel algorithm, FLORA, to develop structural templates specific to catalytic function. FLORA predicted the correct functional site in 80% of cases and, combined with global structure comparison, was able to assign domains to enzyme families within diverse superfamilies. Techniques in structure comparison were also applied to ab initio models of protein domains, in order to assign them to fold groups within the CATH database. A novel scoring method was developed to pre-select models that were more likely to have adopted the correct fold. A selected sample of models for each target structure was then compared against representatives from the CATH database using the MAMMOTH and SSAP algorithms. Data from these alignments were combined using a Support Vector Machine to assign the target to a fold group within CATH. This work was generously supported by the Engineering and Physical Sciences Research Council.

Type: Thesis (Doctoral)
Title: Automated methods for the determination of homologous relationships and functional similarities between protein domains.
Identifier: PQ ETD:593383
Open access status: An open access version is available from UCL Discovery
Language: English
Additional information: Thesis digitised by Proquest
UCL classification: UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences > Div of Biosciences > Structural and Molecular Biology
URI: https://discovery.ucl.ac.uk/id/eprint/1446055
