UCL Discovery

Prediction of DNA i-motifs via machine learning

Yang, Bibo; Guneri, Dilek; Yu, Haopeng; Wright, Elisé P; Chen, Wenqian; Waller, Zoë AE; Ding, Yiliang; (2024) Prediction of DNA i-motifs via machine learning. Nucleic Acids Research, Article gkae092. 10.1093/nar/gkae092. (In press). Green open access

Full text: gkae092.pdf - Published Version (1MB)

Abstract

i-Motifs (iMs) are secondary structures formed in cytosine-rich DNA sequences and are involved in multiple functions in the genome. Although putative iM-forming sequences are widely distributed in the human genome, the folding status and strength of putative iMs vary dramatically. Much previous research on iMs has focused on assessing their folding properties using biophysical experiments. However, there are no dedicated computational tools for predicting the folding status and strength of iM structures. Here, we introduce a machine learning pipeline, iM-Seeker, to predict both the folding status and the structural stability of DNA iMs. The programme iM-Seeker incorporates a Balanced Random Forest classifier, trained on genome-wide iMab antibody-based CUT&Tag sequencing data, to predict the folding status, and an Extreme Gradient Boosting regressor to estimate the folding strength according to both literature biophysical data and our in-house biophysical experiments. iM-Seeker predicts DNA iM folding status with a classification accuracy of 81% and estimates the folding strength with a coefficient of determination (R²) of 0.642 on the test set. Model interpretation confirms that the nucleotide composition of the C-rich sequence significantly affects iM stability, correlating positively with cytosine and thymine content and negatively with guanine and adenine content.
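The abstract refers to nucleotide composition as a model input and reports two evaluation metrics, classification accuracy and the coefficient of determination. The following is a minimal illustrative sketch of how such quantities can be computed, using only the Python standard library. It is not the iM-Seeker implementation: the function names `nucleotide_composition`, `accuracy`, and `r_squared` are hypothetical, and the actual feature set and evaluation code used by the authors are not described in this record.

```python
from collections import Counter


def nucleotide_composition(seq):
    """Return the fraction of A, C, G and T in a DNA sequence.

    Hypothetical helper for illustration; the real iM-Seeker features
    are not specified in this record.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must be non-empty")
    counts = Counter(seq)
    return {base: counts.get(base, 0) / len(seq) for base in "ACGT"}


def accuracy(y_true, y_pred):
    """Fraction of predicted labels that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot
```

For example, `nucleotide_composition("CCCTAACCCTAACCCTAACCC")` gives the per-base fractions of a C-rich sequence, which could serve as inputs to a classifier or regressor. An R² of 0.642, as reported above, would mean the regressor accounts for roughly 64% of the variance in the measured folding strength on the test set.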

Type: Article
Title: Prediction of DNA i-motifs via machine learning
Location: England
Open access status: An open access version is available from UCL Discovery
DOI: 10.1093/nar/gkae092
Publisher version: https://doi.org/10.1093/nar/gkae092
Language: English
Additional information: Copyright © The Author(s) 2024. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
UCL classification: UCL
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences > UCL School of Pharmacy
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences > UCL School of Pharmacy > Pharma and Bio Chemistry
URI: https://discovery.ucl.ac.uk/id/eprint/10187581