Protein Function Prediction methods exploiting the CATH Protein Domain Classification

Bonello, Joseph; (2023) Protein Function Prediction methods exploiting the CATH Protein Domain Classification. Doctoral thesis (Ph.D), UCL (University College London). Green open access

Full text: Bonello_10179491_Thesis.pdf (9MB)

Abstract

Proteins are biological building blocks that carry out crucial functions in an organism. Knowing a protein’s function is essential to understanding the role it plays in an organism and in disease. As technology advances, the number of sequences available for analysis has been increasing rapidly. While experimental characterisation remains the gold standard for assigning function to proteins, the process is notoriously slow, leading to a large discrepancy between the numbers of annotated and unannotated proteins. This work describes the development of three computational methods for predicting protein function, which help to bridge the gap between annotated and unannotated proteins. The methods leverage CATH Functional Families (FunFams), which comprise evolutionarily related domains grouped into functionally coherent sets, to score the mapping between the FunFams and the GO Terms of the proteins that map to the FunFams. dcGO4CATH uses a statistical approach to calculate the size of the overlap between the FunFams and the proteins’ GO Terms. Originally developed for SCOP, the method was adapted to work with CATH FunFams. SetCATH uses a set-based approach and applies the Jaccard, Sørensen-Dice and Overlap similarity indices for the mapping. FunPredCATH is an ensemble based on a set-theoretic approach that combines dcGO4CATH, SetCATH and FunFamer (a predictor developed by the Orengo Group). FunPredCATH capitalises on the strengths of the individual predictors to improve predictions. The methods were tested using the CAFA3 benchmark. The results show that SetCATH and FunPredCATH place in the top tier. Moreover, the methods were applied to proteins involved in the immune response to SARS-CoV-2, which showed that the methods have practical applications in biology.
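
The three similarity indices named in the abstract are standard set measures, and a minimal sketch of them may help readers unfamiliar with the terms. The example below is an illustration only: the abstract does not specify how SetCATH builds or weights its sets, so the function names, the choice of sets (proteins in a FunFam versus proteins annotated with a GO Term) and the protein identifiers are all assumptions made here, not taken from the thesis.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index: |A ∩ B| / |A ∪ B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def sorensen_dice(a: set, b: set) -> float:
    """Sørensen-Dice index: 2|A ∩ B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def overlap(a: set, b: set) -> float:
    """Overlap coefficient: |A ∩ B| / min(|A|, |B|)."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


# Hypothetical data: placeholder protein identifiers, not real accessions.
funfam_proteins = {"P1", "P2", "P3", "P4"}   # assumed: proteins mapped to one FunFam
go_term_proteins = {"P3", "P4", "P5"}        # assumed: proteins annotated with one GO Term

scores = {
    "jaccard": jaccard(funfam_proteins, go_term_proteins),
    "sorensen_dice": sorensen_dice(funfam_proteins, go_term_proteins),
    "overlap": overlap(funfam_proteins, go_term_proteins),
}
print(scores)
```

With two shared proteins, the Jaccard index divides by the five distinct proteins overall, Sørensen-Dice divides twice the shared count by the combined set sizes, and the overlap coefficient divides by the size of the smaller set, so the three scores increase in that order for the same pair of sets.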

Type: Thesis (Doctoral)
Qualification: Ph.D
Title: Protein Function Prediction methods exploiting the CATH Protein Domain Classification
Open access status: An open access version is available from UCL Discovery
Language: English
Additional information: Copyright © The Author 2023. Original content in this thesis is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) Licence (https://creativecommons.org/licenses/by-nc/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author’s request.
Keywords: Protein Function Prediction, Bioinformatics
UCL classification: UCL
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences > Div of Biosciences
URI: https://discovery.ucl.ac.uk/id/eprint/10179491