UCL Discovery

Imputation Aided Methylation Analysis

Moghul, Muhammad Ismail; (2021) Imputation Aided Methylation Analysis. Doctoral thesis (Ph.D), UCL (University College London).

moghul_imputation_aided_methylation_analysis.pdf - Published Version
Access restricted to UCL open access staff until 1 September 2022.

Download (92MB)

Abstract

Genome-wide DNA methylation analysis is of broad interest to medical research because of its central role in human development and disease. However, generating high-quality methylomes on a large scale is particularly expensive because of technical issues inherent to treating DNA with bisulfite, which requires deeper-than-usual sequencing. In silico methodologies, such as imputation, can be used to address this limitation and improve the coverage and quality of the data produced in these experiments. Imputation is a statistical technique in which missing values are substituted with computed values. The process leverages information from reference data to calculate probable values for the missing data points. In this thesis, imputation is explored for its potential to increase the value of methylation datasets sequenced at different depths:
1. First, a new R package, Methylation Analysis ToolkiT (MATT), was developed to deal with large numbers of WGBS datasets in a computationally- and memory-efficient manner.
2. Second, the performance of DNA methylation-specific and generic imputation tools was assessed by down-sampling high-quality (100x) WGBS datasets to determine the extent to which missing data can be recovered and the accuracy of the imputed values.
3. Third, to overcome shortfalls within existing tools, a novel imputation tool was developed, termed Global IMputation of cpg MEthylation (GIMMEcpg). GIMMEcpg's default implementation is based on Model Stacking and outperforms existing tools in accuracy and speed.
4. Lastly, to demonstrate its potential, GIMMEcpg was used to impute ten shallow (17x) WGBS datasets from healthy volunteers of the Personal Genome Project UK with high accuracy.
Moreover, the extent of missing and low-quality data, as well as the reproducibility and accuracy of methylation datasets, were explored for different data types (Microarrays, Reduced Representation Bisulfite Sequencing (RRBS), Whole Genome Bisulfite Sequencing (WGBS), EM-Seq and Nanopore sequencing).
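
As a rough illustration of the general idea of imputation described in the abstract (substituting missing values with values computed from available data), the following minimal Python sketch fills in missing CpG methylation values from the nearest covered sites on either side, weighted by inverse genomic distance. This is not GIMMEcpg or MATT, and it is not the Model Stacking approach named in the thesis; the function name `impute_methylation`, the `max_distance` cutoff, and the input layout are all hypothetical choices made for the example.

```python
from bisect import bisect_left
from typing import List, Optional


def impute_methylation(
    positions: List[int],
    values: List[Optional[float]],
    max_distance: int = 1000,
) -> List[Optional[float]]:
    """Fill missing methylation values from the nearest covered CpG sites.

    positions: strictly increasing genomic coordinates on one chromosome.
    values: methylation fraction in [0, 1] per site, or None if missing.
    max_distance: hypothetical cutoff (bp); farther neighbours are ignored.
    """
    # Split out the sites that actually have a measured value.
    known_pos = [p for p, v in zip(positions, values) if v is not None]
    known_val = [v for v in values if v is not None]

    imputed = list(values)
    for i, (pos, val) in enumerate(zip(positions, values)):
        if val is not None:
            continue  # measured sites are left untouched

        # Locate the closest measured site on each side of the gap.
        k = bisect_left(known_pos, pos)
        neighbours = []
        if k > 0:
            neighbours.append((pos - known_pos[k - 1], known_val[k - 1]))
        if k < len(known_pos):
            neighbours.append((known_pos[k] - pos, known_val[k]))

        # Keep only neighbours within the distance cutoff.
        neighbours = [(d, v) for d, v in neighbours if d <= max_distance]
        if not neighbours:
            continue  # nothing close enough: the value stays missing

        # Inverse-distance weighted average of the remaining neighbours.
        weights = [1.0 / d for d, _ in neighbours]
        total = sum(w * v for w, (_, v) in zip(weights, neighbours))
        imputed[i] = total / sum(weights)
    return imputed


# Toy data (made up for illustration): two covered sites, two missing.
example = impute_methylation([100, 150, 200, 5000], [0.9, None, 0.7, None])
```

In this toy input the site at position 150 sits midway between two measured sites and receives their average, while the site at position 5000 has no measured neighbour within the cutoff and remains missing. Tools of the kind evaluated in the thesis would typically draw on richer information, such as reference datasets, than this single-sample neighbour average.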

Type: Thesis (Doctoral)
Qualification: Ph.D
Title: Imputation Aided Methylation Analysis
Event: UCL (University College London)
Language: English
Additional information: Copyright © The Author 2021. Original content in this thesis is licensed under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) Licence (https://creativecommons.org/licenses/by/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author’s request.
Keywords: DNA Methylation, Data Science, Imputation, Machine Learning
UCL classification: UCL
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Medical Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Medical Sciences > Cancer Institute
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Medical Sciences > Cancer Institute > Research Department of Cancer Bio
URI: https://discovery.ucl.ac.uk/id/eprint/10132964