UCL Discovery

Ultrafast end-to-end protein structure prediction enables high-throughput exploration of uncharacterized proteins

Kandathil, SM; Greener, JG; Lau, AM; Jones, DT; (2022) Ultrafast end-to-end protein structure prediction enables high-throughput exploration of uncharacterized proteins. Proceedings of the National Academy of Sciences of the United States of America, 119 (4), Article e2113348119. 10.1073/pnas.2113348119. Green open access

Abstract

Deep learning-based prediction of protein structure usually begins by constructing a multiple sequence alignment (MSA) containing homologs of the target protein. The most successful approaches combine large feature sets derived from MSAs, and considerable computational effort is spent deriving these input features. We present a method that greatly reduces the amount of preprocessing required for a target MSA, while producing main chain coordinates as a direct output of a deep neural network. The network makes use of just three recurrent networks and a stack of residual convolutional layers, making the predictor very fast to run, and easy to install and use. Our approach constructs a directly learned representation of the sequences in an MSA, starting from a one-hot encoding of the sequences. When supplemented with an approximate precision matrix, the learned representation can be used to produce structural models with accuracy comparable to or greater than that of our original DMPfold method, while requiring less than a second to produce a typical model. This level of accuracy and speed allows very large-scale three-dimensional modeling of proteins on minimal hardware, and we demonstrate this by producing models for over 1.3 million uncharacterized regions of proteins extracted from the BFD sequence clusters. After constructing an initial set of approximate models, we select a confident subset of over 30,000 models for further refinement and analysis, revealing putative novel protein folds. We also provide updated models for over 5,000 Pfam families studied in the original DMPfold paper.
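
The two inputs named in the abstract, a one-hot encoding of the MSA and an approximate precision matrix, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the precision matrix is approximated, so the shrinkage-then-invert step below is a generic stand-in, and the 21-letter alphabet, the function names `one_hot_msa` and `approximate_precision`, the `shrinkage` parameter, and the toy alignment are all assumptions made for illustration.

```python
import numpy as np

# Assumed alphabet: 20 standard amino acids plus a gap symbol.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"
INDEX = {aa: i for i, aa in enumerate(ALPHABET)}


def one_hot_msa(msa):
    """Encode aligned sequences as an array of shape (n_seqs, n_cols, 21)."""
    n_seqs, n_cols = len(msa), len(msa[0])
    encoded = np.zeros((n_seqs, n_cols, len(ALPHABET)), dtype=np.float32)
    for s, seq in enumerate(msa):
        if len(seq) != n_cols:
            raise ValueError("all sequences in an MSA must have equal length")
        for c, aa in enumerate(seq):
            # Assumption: unknown characters are treated as gaps.
            encoded[s, c, INDEX.get(aa, INDEX["-"])] = 1.0
    return encoded


def approximate_precision(encoded, shrinkage=0.1):
    """Invert a shrunk covariance of the flattened one-hot features.

    Generic illustration only; the shrinkage target and weight are
    hypothetical choices, not taken from the paper.
    """
    n_seqs = encoded.shape[0]
    flat = encoded.reshape(n_seqs, -1).astype(np.float64)
    cov = np.cov(flat, rowvar=False, bias=True)
    dim = cov.shape[0]
    # Shrink towards a scaled identity so the matrix is invertible even
    # when there are far fewer sequences than features.
    mean_var = max(np.trace(cov) / dim, 1e-6)
    shrunk = (1.0 - shrinkage) * cov + shrinkage * mean_var * np.eye(dim)
    return np.linalg.inv(shrunk)


# Toy alignment of three sequences over four columns (made-up data).
msa = ["MKV-", "MRVA", "MKIA"]
x = one_hot_msa(msa)
prec = approximate_precision(x)
print(x.shape, prec.shape)
```

For an alignment with L columns, the one-hot array has shape (number of sequences, L, 21) and the precision matrix has shape (21L, 21L). In a real pipeline the alignment would be far deeper than this toy example, and the regularization would need tuning.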

Type: Article
Title: Ultrafast end-to-end protein structure prediction enables high-throughput exploration of uncharacterized proteins
Open access status: An open access version is available from UCL Discovery
DOI: 10.1073/pnas.2113348119
Publisher version: https://doi.org/10.1073/pnas.2113348119
Language: English
Additional information: Copyright © 2022 the Author(s). Published by PNAS. This article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND) (https://creativecommons.org/licenses/by-nc-nd/4.0/).
Keywords: Protein structure prediction, deep learning, metagenomics
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: https://discovery.ucl.ac.uk/id/eprint/10140396