A bioinformatic study of the 5' upstream region of the human gene.
Doctoral thesis, UCL (University College London).
Understanding transcription regulation is a key issue in biology. There is much non-coding sequence upstream of the human gene, only a small proportion of which is regulatory. The aim of this project is to gain a better understanding of the organisation, structure and function of this region through large-scale sequence studies. The 10 kb 5' upstream sequence has been analysed for changes across successive positional sections along its length. There are four main categories of investigation: (1) dinucleotide composition and representation; (2) distance from randomness, measured by comparing real and randomised sequences; (3) sequence similarity, using a pattern analysis; and (4) distribution and representation of regulatory motifs.

DNA generally avoids flexible dinucleotide steps. In the upstream sequence the DNA becomes even less flexible towards the transcription start site, whereas bistable steps increase in the same direction. It is concluded that these structural changes, including the enrichment of stiff and bistable steps, are likely to be important in transcription regulatory regions.

The weak/strong (W/S) and purine/pyrimidine (R/Y) properties of the upstream sequence differ from each other and at times even oppose each other. For instance, the R/Y sequence becomes more distant from the random model towards the start site, whereas the W/S sequence becomes closer to it. Opposing sequence similarity trends are also observed across the upstream sequence for R/Y and W/S. In addition, the distribution and representation of regulatory motifs are very different depending on whether the sequence is viewed as R/Y or as W/S nucleotides. This is likely due to the different roles of these nucleotide properties within the upstream DNA and, more specifically, within regulatory regions. These results may have important implications for the process of direct and indirect readout in protein-DNA binding.
It is suggested that the R/Y sequence generally has a greater influence on indirect readout, whereas the W/S sequence has more impact on direct readout. Furthermore, it is proposed that avoidance of inappropriate (or promiscuous) binding of regulatory proteins to DNA occurs primarily via the R/Y sequence of the regulatory elements, i.e. via avoidance of indirect readout and of the docking step of protein-DNA binding. This study of DNA sequence has thus revealed changes in specific properties across the upstream region of the human gene, from which conclusions have been drawn about the role of this region in transcription regulation.
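The sequence views named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's own code or method: the function names, the section size and the choice of expected dinucleotide count (the product of the two mononucleotide frequencies) are all assumptions made for illustration. The R/Y and W/S recodings follow the standard IUPAC definitions (R = A or G, Y = C or T, W = A or T, S = C or G).

```python
from collections import Counter

# IUPAC two-letter recodings of the four bases.
RY_TABLE = str.maketrans("ACGT", "RYRY")  # purine / pyrimidine
WS_TABLE = str.maketrans("ACGT", "WSSW")  # weak / strong


def recode(seq, table):
    """Return the sequence rewritten in a two-letter alphabet (hypothetical helper)."""
    return seq.upper().translate(table)


def positional_sections(seq, size):
    """Split a sequence into consecutive sections of at most `size` bases."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def dinucleotide_representation(seq):
    """Observed/expected ratio for each dinucleotide that occurs in `seq`.

    Assumption: the expected count is the product of the two mononucleotide
    frequencies times the number of dinucleotide positions. The thesis may
    use a different null model (for example, randomised sequences).
    """
    seq = seq.upper()
    n = len(seq)
    mono = Counter(seq)
    pairs = Counter(seq[i:i + 2] for i in range(n - 1))
    ratios = {}
    for pair, observed in pairs.items():
        expected = (mono[pair[0]] / n) * (mono[pair[1]] / n) * (n - 1)
        ratios[pair] = observed / expected
    return ratios
```

Applied to each positional section of an upstream sequence, in the original alphabet or after `recode`, a ratio above 1 would indicate an over-represented step and a ratio below 1 an under-represented one, under the stated null model.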
|Open access status:||An open access version is available from UCL Discovery|
|Additional information:||Authorisation for digitisation not received|