UCL Discovery

An assessment of bioinformatics tools for the detection of human endogenous retroviral insertions in short-read genome sequencing data

Bowles, H; Kabiljo, R; Al Khleifat, A; Jones, A; Quinn, JP; Dobson, RJB; Swanson, CM; ... Iacoangeli, A (2023) An assessment of bioinformatics tools for the detection of human endogenous retroviral insertions in short-read genome sequencing data. Frontiers in Bioinformatics, 2, Article 1062328. 10.3389/fbinf.2022.1062328. Green open access

fbinf-02-1062328.pdf - Published Version


Abstract

There is a growing interest in the study of human endogenous retroviruses (HERVs) given the substantial body of evidence that implicates them in many human diseases. Although their genomic characterization presents numerous technical challenges, next-generation sequencing (NGS) has shown potential to detect HERV insertions and their polymorphisms in humans. Currently, a number of computational tools to detect them in short-read NGS data exist. In order to design optimal analysis pipelines, an independent evaluation of the available tools is required. We evaluated the performance of a set of such tools using a variety of experimental designs and datasets. These included 50 human short-read whole-genome sequencing samples, matching long and short-read sequencing data, and simulated short-read NGS data. Our results highlight a great performance variability of the tools across the datasets and suggest that different tools might be suitable for different study designs. However, specialized tools designed to detect exclusively human endogenous retroviruses consistently outperformed generalist tools that detect a wider range of transposable elements. We suggest that, if sufficient computing resources are available, using multiple HERV detection tools to obtain a consensus set of insertion loci may be ideal. Furthermore, given that the false positive discovery rate of the tools varied between 8% and 55% across tools and datasets, we recommend the wet lab validation of predicted insertions if DNA samples are available.

Type: Article
Title: An assessment of bioinformatics tools for the detection of human endogenous retroviral insertions in short-read genome sequencing data
Location: Switzerland
Open access status: An open access version is available from UCL Discovery
DOI: 10.3389/fbinf.2022.1062328
Publisher version: https://doi.org/10.3389/fbinf.2022.1062328
Language: English
Additional information: This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third-party material in this article are included in the Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
Keywords: Benchmarking, bioinformatics, herv-k, retrovirus, whole-genome sequencing
UCL classification: UCL
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences > UCL Queen Square Institute of Neurology
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences > UCL Queen Square Institute of Neurology > Department of Neuromuscular Diseases
URI: https://discovery.ucl.ac.uk/id/eprint/10180920
