UCL Discovery

aws-s3-integrity-check: an open-source bash tool to verify the integrity of a dataset stored on Amazon S3

García-Ruiz, Sonia; Reynolds, Regina Hertfelder; Grant-Peters, Melissa; Gustavsson, Emil Karl; Fairbrother-Browne, Aine; Chen, Zhongbo; Brenton, Jonathan William; (2023) aws-s3-integrity-check: an open-source bash tool to verify the integrity of a dataset stored on Amazon S3. GigaByte, 2023, Article gigabyte87. DOI: 10.46471/gigabyte.87. Green open access

Abstract

Amazon Simple Storage Service (Amazon S3) is a widely used platform for storing large biomedical datasets. Unintended data alterations can occur during data writing and transmission, altering the original content and generating unexpected results. However, no open-source and easy-to-use tool exists to verify end-to-end data integrity. Here, we present aws-s3-integrity-check, a user-friendly, lightweight, and reliable bash tool to verify the integrity of a dataset stored in an Amazon S3 bucket. Using this tool, we only needed ∼114 min to verify the integrity of 1,045 records ranging between 5 bytes and 10 gigabytes and occupying ∼935 gigabytes of the Amazon S3 cloud. Our aws-s3-integrity-check tool also provides file-by-file on-screen and log-file-based information about the status of each integrity check. To our knowledge, this tool is the only open-source one that allows verifying the integrity of a dataset uploaded to the Amazon S3 Storage quickly, reliably, and efficiently. The tool is freely available for download and use at https://github.com/SoniaRuiz/aws-s3-integrity-check and https://hub.docker.com/r/soniaruiz/aws-s3-integrity-check.
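The abstract does not say how the tool performs its check. As an illustration only, one common way to verify an object stored on Amazon S3 is to recompute its ETag locally and compare it with the value S3 reports. The Python sketch below rests on several assumptions: the object was uploaded without SSE-KMS or SSE-C encryption (otherwise the ETag is not an MD5 digest), multipart uploads used a fixed part size (8 MiB here, the AWS CLI default), and objects smaller than one part were uploaded in a single request. The names s3_etag and etag_matches are hypothetical and are not taken from the tool.

```python
import hashlib
import io

# Assumption: 8 MiB parts, the AWS CLI default multipart chunk size.
DEFAULT_PART_SIZE = 8 * 1024 * 1024


def s3_etag(stream, part_size=DEFAULT_PART_SIZE):
    """Compute the ETag S3 is expected to report for the bytes in `stream`.

    Single-part upload: hex MD5 of the content.
    Multipart upload: hex MD5 of the concatenated binary MD5 digests of
    each part, followed by "-<number of parts>".
    """
    part_digests = []
    while True:
        chunk = stream.read(part_size)
        if not chunk:
            break
        part_digests.append(hashlib.md5(chunk).digest())

    if not part_digests:
        # Empty object: MD5 of zero bytes.
        return hashlib.md5(b"").hexdigest()
    if len(part_digests) == 1:
        # Assumption: content that fits in one part was sent in one request.
        return part_digests[0].hex()

    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return f"{combined}-{len(part_digests)}"


def etag_matches(stream, remote_etag, part_size=DEFAULT_PART_SIZE):
    """Return True if the local content reproduces the ETag reported by S3."""
    # S3 returns the ETag wrapped in double quotes; strip them first.
    return s3_etag(stream, part_size) == remote_etag.strip('"')


if __name__ == "__main__":
    # In-memory demonstration with a tiny part size; no network access.
    data = b"a" * 10
    local = s3_etag(io.BytesIO(data), part_size=4)
    print(local)
    print(etag_matches(io.BytesIO(data), f'"{local}"', part_size=4))
```

In practice the remote ETag would come from the object's metadata (for example, a HEAD request on the object), and the part size would have to match the one used at upload time; a mismatch in part size yields a different ETag even when the data are intact.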

Type: Article
Title: aws-s3-integrity-check: an open-source bash tool to verify the integrity of a dataset stored on Amazon S3
Location: China
Open access status: An open access version is available from UCL Discovery
DOI: 10.46471/gigabyte.87
Publisher version: https://doi.org/10.46471/gigabyte.87
Language: English
Additional information: Copyright © The Author(s) 2023. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Keywords: Software and Workflows, Software Engineering, Workflows
UCL classification: UCL
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > UCL GOS Institute of Child Health
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > UCL GOS Institute of Child Health > Genetics and Genomic Medicine Dept
URI: https://discovery.ucl.ac.uk/id/eprint/10175935