UCL Discovery

Automatically extracting functionally equivalent proteins from SwissProt

McMillan, LEM; Martin, ACR (2008) Automatically extracting functionally equivalent proteins from SwissProt. BMC Bioinformatics, 9, Article 418. 10.1186/1471-2105-9-418. Green open access

Abstract

Background: There is a frequent need to obtain sets of functionally equivalent homologous proteins (FEPs) from different species. While orthology usually implies functional equivalence, this is not always true; therefore datasets of orthologous proteins are not appropriate. The information relevant to extracting FEPs is contained in databanks such as UniProtKB/SwissProt, and a manual analysis of these data allows FEPs to be extracted on a one-off basis. However, there has been no resource allowing the easy, automatic extraction of groups of FEPs, for example all instances of protein C. We have developed FOSTA, an automatically generated database of FEPs annotated as having the same function in UniProtKB/SwissProt, which can be used for large-scale analysis. The method builds a candidate list of homologues and filters out functionally diverged proteins on the basis of functional annotations using a simple text-mining approach.

Results: Large-scale evaluation of our FEP extraction method is difficult as there is no gold-standard dataset against which the method can be benchmarked. However, a manual analysis of five protein families confirmed a high level of performance. A more extensive comparison with two manually verified functional equivalence datasets also demonstrated very good performance.

Conclusion: In summary, FOSTA provides an automated analysis of annotations in UniProtKB/SwissProt to enable groups of proteins already annotated as functionally equivalent to be extracted. Our results demonstrate that the vast majority of UniProtKB/SwissProt functional annotations are of high quality, and that FOSTA can interpret annotations successfully. Where FOSTA is not successful, we are able to highlight inconsistencies in UniProtKB/SwissProt annotation. Most of these would have presented equal difficulties for manual interpretation of annotations. We discuss limitations and possible future extensions to FOSTA, and recommend changes to the UniProtKB/SwissProt format which would facilitate text-mining of UniProtKB/SwissProt.
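
The two-stage idea in the abstract (build a candidate list of homologues, then filter on functional annotation text) can be illustrated with a toy sketch. This is not FOSTA's actual implementation: the `Entry` record, the `QUALIFIERS` word list, the exact-match rule, and the accession strings are all assumptions made for illustration, and the candidate list is taken as given rather than produced by a homology search.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    accession: str    # made-up identifiers, not real SwissProt accessions
    species: str
    description: str  # free-text protein description


# Assumed list of qualifier words that do not change the stated function.
QUALIFIERS = {"precursor", "fragment", "putative", "probable"}


def normalise(description: str) -> str:
    """Lower-case the text, split on non-alphanumerics, drop qualifier words."""
    words = re.findall(r"[a-z0-9]+", description.lower())
    return " ".join(w for w in words if w not in QUALIFIERS)


def filter_equivalents(query: Entry, candidates: list[Entry]) -> list[Entry]:
    """Keep candidates whose normalised description equals the query's."""
    target = normalise(query.description)
    return [c for c in candidates if normalise(c.description) == target]


# Usage: the candidates stand in for the output of a homology search.
query = Entry("X00001", "Homo sapiens", "Vitamin K-dependent protein C precursor")
candidates = [
    Entry("X00002", "Mus musculus", "Vitamin K-dependent protein C"),
    Entry("X00003", "Bos taurus", "Vitamin K-dependent protein S"),
    Entry("X00004", "Rattus norvegicus", "Vitamin K-dependent protein C (Fragment)"),
]
result = filter_equivalents(query, candidates)
print([e.accession for e in result])
```

Exact matching on a normalised description is the simplest possible filter; it would miss synonyms and alternative names, which is the kind of annotation inconsistency the abstract says makes automatic interpretation difficult.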

Type: Article
Title: Automatically extracting functionally equivalent proteins from SwissProt
Open access status: An open access version is available from UCL Discovery
DOI: 10.1186/1471-2105-9-418
Publisher version: http://dx.doi.org/10.1186/1471-2105-9-418
Language: English
Additional information: © 2008 McMillan and Martin; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Keywords: GENE DUPLICATION, COG DATABASE, ANNOTATION, GENOMICS, IDENTIFICATION, CLASSIFICATION, ORTHOLOGS, NEIGHBOR
UCL classification: UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences > Div of Biosciences > Structural and Molecular Biology
URI: https://discovery.ucl.ac.uk/id/eprint/70521