Waman, Vaishali P;
Bordin, Nicola;
Alcraft, Rachel;
Vickerstaff, Robert;
Rauer, Clemens;
Chan, Qian;
Sillitoe, Ian;
... Orengo, Christine;
(2024)
CATH 2024: CATH-AlphaFlow Doubles the Number of Structures in CATH and Reveals Nearly 200 New Folds.
Journal of Molecular Biology, Article 168551. 10.1016/j.jmb.2024.168551.
(In press).
Abstract
CATH (https://www.cathdb.info) classifies domain structures from experimental protein structures in the PDB and predicted structures in the AlphaFold Database (AFDB). To cope with the scale of the predicted data, a new NextFlow workflow (CATH-AlphaFlow) has been developed to classify high-quality domains into CATH superfamilies and identify novel fold groups and superfamilies. CATH-AlphaFlow uses a novel state-of-the-art structure-based domain boundary prediction method (ChainSaw) for identifying domains in multi-domain proteins. We applied CATH-AlphaFlow to process PDB structures not classified in CATH and AFDB structures from 21 model organisms, expanding CATH by over 100%. Domains not classified in existing CATH superfamilies or fold groups were used to seed novel folds, giving 253 new folds from PDB structures (September 2023 release) and 96 from AFDB structures of proteomes of 21 model organisms. Where possible, functional annotations were obtained using (i) predictions from publicly available methods and (ii) annotations from structural relatives in AFDB/UniProt50. We also predicted functional sites and highly conserved residues. Some folds are associated with important functions such as photosynthetic acclimation (in flowering plants), iron permease activity (in fungi) and post-natal spermatogenesis (in mice). CATH-AlphaFlow will allow us to identify many more CATH relatives in the AFDB, further characterising the protein structure landscape.
Type: | Article |
---|---|
Title: | CATH 2024: CATH-AlphaFlow Doubles the Number of Structures in CATH and Reveals Nearly 200 New Folds |
Location: | Netherlands |
Open access status: | An open access version is available from UCL Discovery |
DOI: | 10.1016/j.jmb.2024.168551 |
Publisher version: | https://doi.org/10.1016/j.jmb.2024.168551 |
Language: | English |
Additional information: | Copyright © 2024 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
Keywords: | AlphaFold2, CATH, fold, protein domain, protein structure prediction, superfamily |
UCL classification: | UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences > Div of Biosciences > Structural and Molecular Biology |
URI: | https://discovery.ucl.ac.uk/id/eprint/10190251 |