Sideris, E;
(2007)
Gene expression data annotation, effective storage, and enrichment through data mining.
Doctoral thesis, UCL (University College London).
Abstract
This thesis describes the development of different bioinformatics resources and data-mining strategies for managing and analysing the large amounts of data produced by microarray gene expression experiments. Initially, this involved addressing the problem of effectively capturing gene expression microarray data and the accompanying meta-data annotations describing the experimental process. This is necessary for archiving, interchange and reproducibility of datasets, and for comparability between them. This was achieved by the development of meditor, a graphical computer programme which allows the description of microarray experimental information through the use of diagrams and ontology-driven forms. meditor adheres to the standards set by the Microarray Gene Expression Data Society (MGED), and is therefore able to capture all the experimental information describable within the standard in a platform-independent manner. Subsequently, in order to provide capabilities for the formal modelling of gene expression analysis concepts, the concepts involved in the external validation of gene expression clusterings were formalised and defined as an object model. This model was developed with the implementation of data interchange file formats in mind. This work complements the object model of the MGED Society and attempts to cover an area that has not been formalised in a platform-independent manner by the standard object model. Finally, a method was developed to allow the use of knowledge on protein functions and protein-protein interactions to identify coherent sets of co-regulated genes suggested by the clustering of gene expression profiles. This was achieved through the development of a gene expression clustering quality metric, which judges the tightness and separation of gene expression clusters, thus providing a quality measure on a per-clustering or per-cluster basis.
Cluster tightness and separation are assessed by harnessing the manual annotations provided by the Gene Ontology, enriched using integrated biological information available through an in-house data warehouse (BioMap). The metric was tested on a human B-cell gene expression dataset and refined on the basis of the results produced.
Type: | Thesis (Doctoral) |
---|---|
Title: | Gene expression data annotation, effective storage, and enrichment through data mining |
Identifier: | PQ ETD:593171 |
Open access status: | An open access version is available from UCL Discovery |
Language: | English |
Additional information: | Thesis digitised by ProQuest. |
URI: | https://discovery.ucl.ac.uk/id/eprint/1445847 |