eprintid: 53740
rev_number: 36
eprint_status: archive
userid: 608
dir: disk0/00/05/37/40
datestamp: 2010-10-16 04:02:26
lastmod: 2021-10-19 21:57:29
status_changed: 2010-10-16 04:02:26
type: article
metadata_visibility: show
item_issues_count: 0
creators_name: Tress, ML
creators_name: Cozzetto, D
creators_name: Tramontano, A
creators_name: Valencia, A
title: An analysis of the Sargasso Sea resource and the consequences for database composition
ispublished: pub
divisions: UCL
divisions: B04
divisions: C05
keywords: MULTIPLE SEQUENCE ALIGNMENT, PROTEIN SEQUENCES, PSI-BLAST, FAMILIES, METAGENOMICS, PREDICTION, EVOLUTION, ACCURACY, GENOMES, REGIONS
note: © 2006 Tress et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
abstract: Background: The environmental sequencing of the Sargasso Sea has introduced a huge new resource of genomic information. Unlike the protein sequences held in the current searchable databases, the Sargasso Sea sequences originate from a single marine environment and have been sequenced from species that are not easily obtainable by laboratory cultivation. The resource also contains very many fragments of whole protein sequences, a side effect of the shotgun sequencing method. These sequences form a significant addendum to the current searchable databases but also present us with some intrinsic difficulties. While it is important to know whether it is possible to assign function to these sequences with the current methods and whether they will increase our capacity to explore sequence space, it is also interesting to know how current bioinformatics techniques will deal with the new sequences in the resource. Results: The Sargasso Sea sequences seem to introduce a bias that decreases the potential of current methods to propose structure and function for new proteins. In particular the high proportion of sequence fragments in the resource seems to result in poor quality multiple alignments. Conclusion: These observations suggest that the new sequences should be used with care, especially if the information is to be used in large scale analyses. On a positive note, the results may just spark improvements in computational and experimental methods to take into account the fragments generated by environmental sequencing techniques.
date: 2006-04-19
publisher: BIOMED CENTRAL LTD
official_url: http://dx.doi.org/10.1186/1471-2105-7-213
vfaculties: VENG
oa_status: green
language: eng
primo: open
primo_central: open_green
article_type_text: Article
verified: verified_batch
elements_source: Web of Science
elements_id: 139564
doi: 10.1186/1471-2105-7-213
language_elements: EN
lyricists_name: Cozzetto, Domenico
lyricists_id: DCOZZ96
full_text_status: public
publication: BMC BIOINFORMATICS
volume: 7
article_number: 213
issn: 1471-2105
citation: Tress, ML; Cozzetto, D; Tramontano, A; Valencia, A; (2006) An analysis of the Sargasso Sea resource and the consequences for database composition. BMC BIOINFORMATICS, 7, Article 213. 10.1186/1471-2105-7-213 <https://doi.org/10.1186/1471-2105-7-213>. Green open access
document_url: https://discovery.ucl.ac.uk/id/eprint/53740/1/1471-2105-7-213.pdf