A curated dataset of complete Enterobacteriaceae plasmids compiled from the NCBI nucleotide database

Orlek, A; Phan, H; Sheppard, AE; Doumith, M; Ellington, M; Peto, TEA; Crook, D; ... Stoesser, N (2017) A curated dataset of complete Enterobacteriaceae plasmids compiled from the NCBI nucleotide database. Data in Brief, 12, pp. 423-426. 10.1016/j.dib.2017.04.024. Green open access

Text: Walker_1-s2.0-S2352340917301567-main.pdf - Published Version (142kB)

Abstract

Thousands of plasmid sequences are now publicly available in the NCBI nucleotide database, but they are not reliably annotated to distinguish complete plasmids from plasmid fragments, such as gene or contig sequences; retrieving complete plasmids for downstream analyses is therefore challenging. Here we present a curated dataset of complete bacterial plasmids from the clinically relevant Enterobacteriaceae family. The dataset was compiled from the NCBI nucleotide database using curation steps designed to exclude incomplete plasmid sequences and chromosomal sequences misannotated as plasmids. Over 2000 complete plasmid sequences are included in the curated plasmid dataset. Protein sequences produced by translating each complete plasmid nucleotide sequence in all six reading frames are also provided. Further analysis and discussion of the dataset are presented in an accompanying research article: “Ordering the mob: insights into replicon and MOB typing…” (Orlek et al., 2017) [1]. The curated plasmid sequences are publicly available in the Figshare repository.
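The six-frame translation mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names are hypothetical, the codon assignments are those of the standard genetic code (an assumption; the record does not state which translation table was used), and the sequence is treated as linear, so codons spanning the origin of a circular plasmid are not handled.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (assumption: the record does not say which translation table was used).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate a DNA sequence codon by codon from its first base.

    Stop codons become '*'; codons containing ambiguous bases become 'X'.
    Trailing bases that do not form a full codon are ignored.
    """
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Translate a linear DNA sequence in three forward and three reverse frames.

    Keys are '+1', '+2', '+3' for the given strand and '-1', '-2', '-3'
    for its reverse complement.
    """
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames
```

For example, `six_frame_translation("ATGGCCTAA")` gives `"MA*"` for frame `+1` and `"LGH"` for frame `-1`. Stop codons are kept as `*` rather than used to split the output into open reading frames, which keeps each frame the same length as its source.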

Type: Article
Title: A curated dataset of complete Enterobacteriaceae plasmids compiled from the NCBI nucleotide database
Open access status: An open access version is available from UCL Discovery
DOI: 10.1016/j.dib.2017.04.024
Publisher version: http://dx.doi.org/10.1016/j.dib.2017.04.024
Language: English
Additional information: © 2017 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Keywords: Plasmids, Sequence data curation, Complete genomes, Enterobacteriaceae family
UCL classification: UCL
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > Inst of Clinical Trials and Methodology
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > Inst of Clinical Trials and Methodology > MRC Clinical Trials Unit at UCL
URI: https://discovery.ucl.ac.uk/id/eprint/1554651