UCL Discovery

Improving the prediction of transcription factor binding sites to aid the interpretation of non-coding single nucleotide variants

Jayaram, N; (2017) Improving the prediction of transcription factor binding sites to aid the interpretation of non-coding single nucleotide variants. Doctoral thesis, UCL (University College London). Green open access

Full text (PDF, 1MB): Jayaram_1556214_Thesis_final_version_ethesis.pdf

Abstract

Single nucleotide variants (SNVs) that occur in transcription factor binding sites (TFBSs) can disrupt the binding of transcription factors and alter gene expression, which can cause inherited diseases and act as driver SNVs in cancer. The identification of SNVs in TFBSs has historically been challenging given the limited number of experimentally characterised TFBSs. The recent ENCODE project has made available ChIP-Seq data that provide genome-wide sets of regions bound by transcription factors. These data have the potential to improve the identification of SNVs in TFBSs. However, because ChIP-Seq data identify a broader region of DNA within which a transcription factor binds, computational prediction is required to locate the precise TFBS. Prediction of TFBSs involves scanning a DNA sequence with a Position Weight Matrix (PWM) using a pattern-matching tool. This thesis focusses on the prediction of TFBSs by: (a) evaluating a set of locally installable pattern-matching tools and identifying the best-performing tool (FIMO); (b) using the ENCODE ChIP-Seq data to evaluate a set of de novo motif discovery tools that can handle large volumes of data and are used to derive PWMs; (c) identifying the best-performing of these tools (rGADEM); (d) using rGADEM to generate a set of PWMs from the ENCODE ChIP-Seq data; and (e) finally checking that the selection of the best pattern-matching tool is not unduly influenced by the choice of PWMs. These analyses were used to obtain a set of predicted TFBSs from the ENCODE ChIP-Seq data. The predicted TFBSs were then used to analyse somatic cancer driver and passenger SNVs that occur in TFBSs. Clear signals in conservation, and therefore in Shannon entropy values, were identified and subsequently exploited to define a threshold that can be used to prioritise somatic cancer driver SNVs for experimental validation.
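The PWM scanning step and the entropy-based conservation measure described in the abstract can be illustrated with a minimal Python sketch. It is neither FIMO's algorithm (FIMO reports and thresholds on p-values, which this sketch does not compute) nor the thesis's pipeline. It assumes a toy position frequency matrix, a uniform background, a pseudocount of 0.25 and a hypothetical score threshold of 5.0, and simply sums log2-odds scores over each window on both strands. The `shannon_entropy` helper scores one alignment column. How the thesis built its alignments and which entropy threshold it chose are not reproduced here, and all function names are illustrative.

```python
import math

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def log_odds_pwm(counts, background=None, pseudocount=0.25):
    """Turn a position frequency matrix (one {base: count} dict per motif
    position) into a PWM of log2-odds scores against a background model.
    The uniform background and pseudocount are illustrative assumptions."""
    if background is None:
        background = {b: 0.25 for b in BASES}
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + pseudocount * len(BASES)
        pwm.append({
            b: math.log2(((column.get(b, 0) + pseudocount) / total) / background[b])
            for b in BASES
        })
    return pwm


def reverse_complement(seq):
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def scan(seq, pwm, threshold):
    """Slide the PWM along both strands of seq and return (start, strand, score)
    for every window scoring at or above threshold. start is the 0-based
    forward-strand start of the window; windows with non-ACGT characters
    (e.g. N) are skipped."""
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in BASES for b in window):
            continue
        for strand, site in (("+", window), ("-", reverse_complement(window))):
            score = sum(pwm[i][b] for i, b in enumerate(site))
            if score >= threshold:
                hits.append((start, strand, round(score, 3)))
    return hits


def shannon_entropy(column):
    """Shannon entropy (bits) of one alignment column: 0.0 for a fully
    conserved position, 2.0 when A, C, G and T are equally frequent.
    Gaps and other non-ACGT characters are ignored."""
    residues = [b for b in column.upper() if b in BASES]
    if not residues:
        raise ValueError("column contains no A/C/G/T characters")
    entropy = 0.0
    for b in BASES:
        p = residues.count(b) / len(residues)
        if p > 0:
            entropy -= p * math.log2(p)
    return entropy


if __name__ == "__main__":
    # Hypothetical 3-position motif "TGA" (10 observed sites), for illustration only.
    toy_counts = [
        {"A": 0, "C": 0, "G": 0, "T": 10},
        {"A": 0, "C": 0, "G": 10, "T": 0},
        {"A": 10, "C": 0, "G": 0, "T": 0},
    ]
    pwm = log_odds_pwm(toy_counts)
    # Hypothetical score threshold; FIMO instead thresholds on p-values.
    print(scan("CCTGACCTCAGG", pwm, threshold=5.0))
    print(shannon_entropy("AAAAAAAA"), shannon_entropy("ACGTACGT"))
```

Scanning both strands matters because a transcription factor can bind its motif in either orientation. A low-entropy column indicates a conserved position, which is the kind of signal the abstract describes using to prioritise SNVs.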

Type: Thesis (Doctoral)
Title: Improving the prediction of transcription factor binding sites to aid the interpretation of non-coding single nucleotide variants
Event: UCL (University College London)
Open access status: An open access version is available from UCL Discovery
Language: English
UCL classification: UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences > Div of Biosciences > Structural and Molecular Biology
URI: http://discovery.ucl.ac.uk/id/eprint/1556214
Downloads since deposit: 71