UCL Discovery

Gene expression data annotation, effective storage, and enrichment through data mining

Sideris, E; (2007) Gene expression data annotation, effective storage, and enrichment through data mining. Doctoral thesis, UCL (University College London). Green open access

Abstract

This thesis describes the development of bioinformatics resources and data-mining strategies for managing and analysing the large amounts of data produced by microarray gene expression experiments. Initially, this involved addressing the problem of effectively capturing gene expression microarray data and the accompanying meta-data annotations describing the experimental process. This is necessary for archiving, interchange and reproducibility of datasets, and for comparability between them. This was achieved by the development of meditor, a graphical computer programme which allows the description of microarray experimental information through the use of diagrams and ontology-driven forms. The programme adheres to the standards set by the Microarray Gene Expression Data Society (MGED), and is therefore able to capture all the experimental information describable within the standard in a platform-independent manner. Subsequently, in order to provide capabilities for the formal modelling of gene expression analysis concepts, the concepts involved in the external validation of gene expression clusterings were formalised and defined as an object model. This model was developed with the implementation of data interchange file formats in mind. This work complements the object model of the MGED Society and attempts to cover an area that has not been formalised in a platform-independent manner by the standard object model. Finally, a method was developed to allow the use of knowledge on protein functions and protein-protein interactions to identify coherent sets of co-regulated genes suggested by the clustering of gene expression profiles. This was achieved through the development of a gene expression clustering quality metric, which judges the tightness and separation of gene expression clusters, thus providing a quality measure for a whole clustering or on a per-cluster basis. Cluster tightness and separation are assessed by harnessing the manual annotations provided by the Gene Ontology, enriched using integrated biological information available through an in-house data warehouse (BioMap). The metric was tested on a human B-cell gene expression dataset and refined on the basis of the results produced.

Type: Thesis (Doctoral)
Title: Gene expression data annotation, effective storage, and enrichment through data mining
Identifier: PQ ETD:593171
Open access status: An open access version is available from UCL Discovery
Language: English
Additional information: Thesis digitised by ProQuest.
URI: https://discovery.ucl.ac.uk/id/eprint/1445847