UCL Discovery

Computational identification of regulatory features affecting splicing in the human brain

Emmett, WA (2016) Computational identification of regulatory features affecting splicing in the human brain. Doctoral thesis, UCL (University College London). Green open access

Abstract

RNA splicing has enabled a dramatic increase in species complexity. Splicing occurs in over 95% of mammalian genes, allowing the development of exceptional cellular diversity without an increase in raw gene numbers. This is highlighted by the fact that humans and nematodes have roughly the same number of genes (20,000 human genes versus 19,000 genes in Caenorhabditis elegans). Although the mechanistic process of splicing is now well understood, there remains a multitude of unexplored dynamics that have only become visible with the power of next generation sequencing (NGS). The human brain is one of the best examples of an intricate cellular structure. Neuronal cell types are incredibly diverse and specialised, regulated through various transcriptional mechanisms. Recently, long genes (150kb+) have been implicated as crucial to neuronal function and their impairment has been linked to several neurological disorders. I explore this relationship further by showing that long genes are more highly expressed in the brain than in other tissues. Long genes are also distinct in that they are deficient in H3K36me3, a histone mark largely associated with splicing and active transcription. Through analysis of brain RNA-seq data, a novel splicing mechanism known as recursive splicing was identified in long introns. Recursive splice sites (RSS) consist of an intronic 3’ splice site followed immediately by a 5’ splice site. These sites result in a zero-length exon that regulates the use of cryptic promoters, ensuring only the functional isoform is expressed. This discovery led me to question whether other non-canonical forms of splicing are common in the brain. Backsplicing is a recently discovered splicing mechanism pervasive across the tree of life. This occurs when a 3’ end of a downstream exon is spliced onto the 5’ end of an upstream exon, resulting in a circular RNA molecule (hereafter: circRNA). circRNA are enriched in neuronal genes and mediated by RNA binding factors.
I have identified and quantified the presence of circRNA within the brain, identifying a large number of highly expressed novel circRNA. From these findings I identify a subset of highly expressed backsplice junctions that occur between two proximal genes from the same family. In order to understand the function of these splicing reactions I inspected the splicing features themselves, namely the 5’ and 3’ splice sites and the branchpoint. The branchpoint remains a poorly characterised feature and until recently very few had been experimentally validated. I explore these features through the ExAC and UCLex consortia, using cumulative variant ratios to annotate invariant positions within the branchpoint and splice sites. By identifying invariant positions I could then investigate how variation impacts splicing efficiency by integrating whole exome and RNA sequence data from the GEUVADIS consortium. Findings show that exon expression is a poor indicator of splicing dysfunction, showing a three-fold lower sensitivity than direct analysis of splice junction reads. I also devise a variant effect score that captures a significant portion of change in splice site efficiency, enabling improved prediction of deleterious variants. Together, this thesis hints at the massive potential of NGS to investigate the diversity of splicing related features while identifying novel features that could be implicated in neurological dysfunction.

Type: Thesis (Doctoral)
Title: Computational identification of regulatory features affecting splicing in the human brain
Event: UCL
Open access status: An open access version is available from UCL Discovery
Language: English
UCL classification: UCL > Provost and Vice Provost Offices
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences > Div of Biosciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences > Div of Biosciences > Genetics, Evolution and Environment
URI: http://discovery.ucl.ac.uk/id/eprint/1514514