McMillan, LEM; Martin, ACR; (2008) Automatically extracting functionally equivalent proteins from SwissProt. BMC Bioinformatics, 9, Article 418. 10.1186/1471-2105-9-418.
Abstract
Background: There is a frequent need to obtain sets of functionally equivalent homologous proteins (FEPs) from different species. While it is usually the case that orthology implies functional equivalence, this is not always true; therefore datasets of orthologous proteins are not appropriate. The information relevant to extracting FEPs is contained in databanks such as UniProtKB/SwissProt, and a manual analysis of these data allows FEPs to be extracted on a one-off basis. However, there has been no resource allowing the easy, automatic extraction of groups of FEPs, for example all instances of protein C. We have developed FOSTA, an automatically generated database of FEPs annotated as having the same function in UniProtKB/SwissProt, which can be used for large-scale analysis. The method builds a candidate list of homologues and filters out functionally diverged proteins on the basis of functional annotations using a simple text mining approach.

Results: Large-scale evaluation of our FEP extraction method is difficult as there is no gold-standard dataset against which the method can be benchmarked. However, a manual analysis of five protein families confirmed a high level of performance. A more extensive comparison with two manually verified functional equivalence datasets also demonstrated very good performance.

Conclusion: In summary, FOSTA provides an automated analysis of annotations in UniProtKB/SwissProt to enable groups of proteins already annotated as functionally equivalent to be extracted. Our results demonstrate that the vast majority of UniProtKB/SwissProt functional annotations are of high quality, and that FOSTA can interpret annotations successfully. Where FOSTA is not successful, we are able to highlight inconsistencies in UniProtKB/SwissProt annotation. Most of these would have presented equal difficulties for manual interpretation of annotations. We discuss limitations and possible future extensions to FOSTA, and recommend changes to the UniProtKB/SwissProt format which would facilitate text-mining of UniProtKB/SwissProt.
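
The abstract describes the method only in outline: build a candidate list of homologues, then filter out functionally diverged proteins by comparing their functional annotations. The following Python sketch illustrates that general idea under stated assumptions and is not the FOSTA implementation. The `Entry` class, the `NOISE_WORDS` list, the exact-match rule, and the example accessions and descriptions are all hypothetical, chosen only to make the filtering step concrete.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical stand-in for a database record (not a real parser)."""
    accession: str
    description: str


# Assumed list of words that do not change the annotated function.
NOISE_WORDS = {"precursor", "fragment", "putative", "probable"}


def normalise(description: str) -> frozenset:
    """Lower-case the description, split it into alphanumeric tokens,
    and drop the assumed noise words."""
    tokens = re.findall(r"[a-z0-9]+", description.lower())
    return frozenset(t for t in tokens if t not in NOISE_WORDS)


def filter_feps(query: Entry, candidates: list) -> list:
    """Keep candidates whose normalised description equals the query's.

    Exact token-set equality is a deliberately crude matching rule,
    used here only to illustrate annotation-based filtering.
    """
    target = normalise(query.description)
    return [c for c in candidates if normalise(c.description) == target]


# Illustrative, made-up entries (accessions are placeholders).
query = Entry("EX1_HUMAN", "Vitamin K-dependent protein C precursor")
candidates = [
    Entry("EX2_MOUSE", "Vitamin K-dependent protein C"),
    Entry("EX3_BOVIN", "Vitamin K-dependent protein S precursor"),
]
feps = filter_feps(query, candidates)
print([e.accession for e in feps])
```

In this toy example the candidate annotated as protein S is dropped even though it shares most of its description with the query, which is the kind of functional divergence among homologues that the abstract says the filter is meant to catch.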
Type: | Article |
---|---|
Title: | Automatically extracting functionally equivalent proteins from SwissProt |
Open access status: | An open access version is available from UCL Discovery |
DOI: | 10.1186/1471-2105-9-418 |
Publisher version: | http://dx.doi.org/10.1186/1471-2105-9-418 |
Language: | English |
Additional information: | © 2008 McMillan and Martin; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
Keywords: | GENE DUPLICATION, COG DATABASE, ANNOTATION, GENOMICS, IDENTIFICATION, CLASSIFICATION, ORTHOLOGS, NEIGHBOR |
UCL classification: | UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences > Div of Biosciences > Structural and Molecular Biology |
URI: | https://discovery.ucl.ac.uk/id/eprint/70521 |