High precision in protein contact prediction using fully convolutional neural networks and minimal sequence features

Jones, DT; Kandathil, SM; (2018) High precision in protein contact prediction using fully convolutional neural networks and minimal sequence features. Bioinformatics, 34 (19) pp. 3308-3315. 10.1093/bioinformatics/bty341. Green open access

DeepCov_accepted.pdf - Accepted version

Abstract

Motivation: In addition to substitution frequency data from protein sequence alignments, many state-of-the-art methods for contact prediction rely on additional sources of information, or features, of protein sequences in order to predict residue–residue contacts, such as solvent accessibility, predicted secondary structure, and scores from other contact prediction methods. It is unclear how much of this information is needed to achieve state-of-the-art results. Here, we show that using deep neural network models, simple alignment statistics contain sufficient information to achieve state-of-the-art precision. Our prediction method, DeepCov, uses fully convolutional neural networks operating on amino-acid pair frequency or covariance data derived directly from sequence alignments, without using global statistical methods such as sparse inverse covariance or pseudolikelihood estimation.

Results: Comparisons against CCMpred and MetaPSICOV2 show that using pairwise covariance data calculated from raw alignments as input allows us to match or exceed the performance of both of these methods. Almost all of the achieved precision is obtained when considering relatively local windows (around 15 residues) around any member of a given residue pairing; larger window sizes have comparable performance. Assessment on a set of shallow sequence alignments (fewer than 160 effective sequences) indicates that the new method is substantially more precise than CCMpred and MetaPSICOV2 in this regime, suggesting that improved precision is attainable on smaller sequence families. Overall, the performance of DeepCov is competitive with the state of the art, and our results demonstrate that global models, which employ features from all parts of the input alignment when predicting individual contacts, are not strictly needed in order to attain precise contact predictions.

Availability and implementation: DeepCov is freely available at https://github.com/psipred/DeepCov.

Contact: d.t.jones@ucl.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.
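
The pairwise covariance input mentioned in the abstract can be illustrated with a short sketch. This is not the DeepCov implementation. It is a minimal, unweighted example of computing amino-acid pair covariances directly from an alignment, under stated assumptions. The alphabet ordering, the function names (`one_hot`, `pair_covariance`, `to_channels`), the treatment of unknown symbols as gaps, and the (L, L, 441) channel layout are all choices made for this illustration. A real pipeline may also apply sequence weighting, which is omitted here.

```python
import numpy as np

# Assumed alphabet: 20 standard amino acids plus a gap symbol (ordering is arbitrary).
ALPHABET = "ARNDCQEGHILKMFPSTWYV-"
INDEX = {aa: i for i, aa in enumerate(ALPHABET)}


def one_hot(alignment):
    """Encode N aligned sequences of length L as an (N, L, 21) one-hot array.

    Symbols outside ALPHABET are treated as gaps (an assumption of this sketch).
    """
    n, length = len(alignment), len(alignment[0])
    x = np.zeros((n, length, len(ALPHABET)))
    for s, seq in enumerate(alignment):
        for i, aa in enumerate(seq):
            x[s, i, INDEX.get(aa, INDEX["-"])] = 1.0
    return x


def pair_covariance(alignment):
    """Return cov[i, a, j, b] = f_ij(a, b) - f_i(a) * f_j(b), unweighted."""
    x = one_hot(alignment)
    n = x.shape[0]
    f_single = x.mean(axis=0)                      # f_i(a), shape (L, 21)
    f_pair = np.einsum("sia,sjb->iajb", x, x) / n  # f_ij(a, b), shape (L, 21, L, 21)
    return f_pair - np.einsum("ia,jb->iajb", f_single, f_single)


def to_channels(cov):
    """Rearrange to (L, L, 441): one 21x21 covariance block per residue pair."""
    length, k = cov.shape[0], cov.shape[1]
    return cov.transpose(0, 2, 1, 3).reshape(length, length, k * k)


if __name__ == "__main__":
    toy_alignment = ["AC", "AC", "GD", "GD"]  # hypothetical toy data
    features = to_channels(pair_covariance(toy_alignment))
    print(features.shape)
```

Each residue pair (i, j) ends up with a 21 x 21 block of covariances, which a convolutional network could consume as per-position channels of an L x L map. Only alignment counts are used, with no inverse covariance or pseudolikelihood step.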

Type: Article
Title: High precision in protein contact prediction using fully convolutional neural networks and minimal sequence features
Open access status: An open access version is available from UCL Discovery
DOI: 10.1093/bioinformatics/bty341
Publisher version: https://doi.org/10.1093/bioinformatics/bty341
Language: English
Additional information: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
UCL classification: UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: https://discovery.ucl.ac.uk/id/eprint/10047283