Automatically extracting functionally equivalent proteins from SwissProt

McMillan, LEM; Martin, ACR; (2008) Automatically extracting functionally equivalent proteins from SwissProt. BMC Bioinformatics, 9, Article 418. 10.1186/1471-2105-9-418. Green open access

Abstract

Background: There is a frequent need to obtain sets of functionally equivalent homologous proteins (FEPs) from different species. While orthology usually implies functional equivalence, this is not always true; therefore datasets of orthologous proteins are not appropriate. The information relevant to extracting FEPs is contained in databanks such as UniProtKB/SwissProt, and a manual analysis of these data allows FEPs to be extracted on a one-off basis. However, there has been no resource allowing the easy, automatic extraction of groups of FEPs, for example all instances of protein C. We have developed FOSTA, an automatically generated database of FEPs annotated as having the same function in UniProtKB/SwissProt, which can be used for large-scale analysis. The method builds a candidate list of homologues and filters out functionally diverged proteins on the basis of functional annotations using a simple text mining approach.

Results: Large-scale evaluation of our FEP extraction method is difficult as there is no gold-standard dataset against which the method can be benchmarked. However, a manual analysis of five protein families confirmed a high level of performance. A more extensive comparison with two manually verified functional equivalence datasets also demonstrated very good performance.

Conclusion: In summary, FOSTA provides an automated analysis of annotations in UniProtKB/SwissProt to enable groups of proteins already annotated as functionally equivalent to be extracted. Our results demonstrate that the vast majority of UniProtKB/SwissProt functional annotations are of high quality, and that FOSTA can interpret annotations successfully. Where FOSTA is not successful, we are able to highlight inconsistencies in UniProtKB/SwissProt annotation. Most of these would have presented equal difficulties for manual interpretation of annotations. We discuss limitations and possible future extensions to FOSTA, and recommend changes to the UniProtKB/SwissProt format which would facilitate text-mining of UniProtKB/SwissProt.
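
The filtering idea summarised in the abstract (build a candidate list of homologues, then discard those whose functional annotation has diverged) can be illustrated with a minimal Python sketch. This is not FOSTA's actual implementation: the identifiers, descriptions, the `IGNORED_WORDS` list, and the exact-match rule are all assumptions chosen for illustration, and the real method may normalise and compare annotations quite differently.

```python
import re

# Hypothetical qualifier words to ignore when comparing descriptions;
# this list is an assumption for illustration, not taken from FOSTA.
IGNORED_WORDS = {"precursor", "putative", "probable", "fragment"}


def normalise(description: str) -> str:
    """Lowercase a description, drop punctuation and ignored qualifier words."""
    words = re.findall(r"[a-z0-9]+", description.lower())
    return " ".join(w for w in words if w not in IGNORED_WORDS)


def filter_candidates(query_description, candidates):
    """Keep candidate homologues whose normalised description matches the query's.

    `candidates` maps a (made-up) identifier to its description text.
    """
    target = normalise(query_description)
    return sorted(
        ident for ident, desc in candidates.items() if normalise(desc) == target
    )


# Toy records with invented identifiers and descriptions.
candidates = {
    "PROC_SPECIES1": "Vitamin K-dependent protein C precursor",
    "PROC_SPECIES2": "Vitamin K-dependent protein C (Fragment)",
    "PROS_SPECIES1": "Vitamin K-dependent protein S precursor",
}
equivalents = filter_candidates("Vitamin K-dependent protein C", candidates)
print(equivalents)
```

In this toy example the two protein C entries are retained and the protein S entry is filtered out, because only its normalised description differs from the query's. Exact matching after normalisation is deliberately strict; a looser comparison would retain more candidates at the cost of more false matches.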

Type:	Article
Title:	Automatically extracting functionally equivalent proteins from SwissProt
Open access status:	An open access version is available from UCL Discovery
DOI:	10.1186/1471-2105-9-418
Publisher version:	http://dx.doi.org/10.1186/1471-2105-9-418
Language:	English
Additional information:	© 2008 McMillan and Martin; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Keywords:	GENE DUPLICATION, COG DATABASE, ANNOTATION, GENOMICS, IDENTIFICATION, CLASSIFICATION, ORTHOLOGS, NEIGHBOR
UCL classification:	UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences > Div of Biosciences > Structural and Molecular Biology
URI:	https://discovery.ucl.ac.uk/id/eprint/70521
