Ultrafast end-to-end protein structure prediction enables high-throughput exploration of uncharacterised proteins

Kandathil, S; Jones, D; Greener, J; Lau, A (2022) Ultrafast end-to-end protein structure prediction enables high-throughput exploration of uncharacterised proteins. Proceedings of the National Academy of Sciences of the USA (In press).

Kandathil_DMPfold2-MainManuscriptwithSI-AcceptedVersion.pdf - Accepted Version
Access restricted to UCL open access staff until 15 June 2022.

Abstract

Deep learning-based prediction of protein structure usually begins by constructing a multiple sequence alignment (MSA) containing homologues of the target protein. The most successful approaches combine large feature sets derived from MSAs, and considerable computational effort is spent deriving these input features. We present a method that greatly reduces the amount of preprocessing required for a target MSA, while producing main chain coordinates as a direct output of a deep neural network. The network makes use of just three recurrent networks and a stack of residual convolutional layers, making the predictor very fast to run, and easy to install and use. Our approach constructs a directly learned representation of the sequences in an MSA, starting from a one-hot encoding of the sequences. When supplemented with an approximate precision matrix, the learned representation can be used to produce structural models with accuracy comparable to or greater than that of our original DMPfold method, while requiring less than a second to produce a typical model. This level of accuracy and speed allows very large-scale 3-D modelling of proteins on minimal hardware, and we demonstrate this by producing models for over 1.3 million uncharacterized regions of proteins extracted from the BFD sequence clusters. After constructing an initial set of approximate models, we select a confident subset of over 30,000 models for further refinement and analysis, revealing putative novel protein folds. We also provide updated models for over 5,000 Pfam families studied in the original DMPfold paper.
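
The abstract's starting point, a one-hot encoding of the sequences in an MSA, can be illustrated with a minimal sketch. This is not the authors' code: the 21-symbol alphabet, its ordering, the function name one_hot_msa, and the choice to treat unknown residues as gaps are all assumptions made for illustration only.

```python
# Hypothetical alphabet: 20 standard amino acids plus a gap symbol.
# The ordering is arbitrary and chosen only for this illustration.
ALPHABET = "ARNDCQEGHILKMFPSTWYV-"
INDEX = {aa: i for i, aa in enumerate(ALPHABET)}


def one_hot_msa(sequences):
    """Encode aligned sequences as a nested list of shape
    [n_sequences][alignment_length][len(ALPHABET)] containing 0s and 1s."""
    if not sequences:
        raise ValueError("MSA is empty")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences in an MSA must have the same length")

    gap = INDEX["-"]
    encoded = []
    for seq in sequences:
        rows = []
        for ch in seq.upper():
            vec = [0] * len(ALPHABET)
            # Assumption: unknown or non-standard symbols (X, B, Z, ...)
            # are mapped to the gap channel.
            vec[INDEX.get(ch, gap)] = 1
            rows.append(vec)
        encoded.append(rows)
    return encoded


# Toy alignment of three sequences over five columns.
msa = ["MKV-L", "MRVAL", "MKIXL"]
tensor = one_hot_msa(msa)
print(len(tensor), len(tensor[0]), len(tensor[0][0]))
```

Each alignment column of each sequence becomes a vector with exactly one non-zero entry, so the encoding involves no feature engineering beyond reading the alignment. The learned representation and the approximate precision matrix mentioned in the abstract are not reproduced here.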

Type: Article
Title: Ultrafast end-to-end protein structure prediction enables high-throughput exploration of uncharacterised proteins
Language: English
Additional information: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
Keywords: Protein structure prediction, deep learning, metagenomics
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: https://discovery.ucl.ac.uk/id/eprint/10140396