A framework for the management of changing biological experimentation.
Doctoral thesis, UCL (University College London).
There is no point expending time and effort developing a model if it is based on data that is out of date. Many models require large amounts of data from a variety of heterogeneous sources. This data is subject to frequent and unannounced changes. It may only be possible to know that data has fallen out of date by reconstructing the model with the new data, but this leads to further problems. How and when does the data change, and when does the model need to be rebuilt? At best, the model will need to be continually rebuilt in a desperate attempt to remain current. At worst, the model will be producing erroneous results.
The recent advent of automated and semi-automated data-processing and analysis tools in the biological sciences has brought about a rapid expansion of publicly available data. Many problems arise in the attempt to deal with this magnitude of data; some have received more attention than others. One significant problem is that data within these publicly available databases is subject to change in an unannounced and unpredictable manner. Large amounts of complex data from multiple, heterogeneous sources are obtained and integrated using a variety of tools. These tools, like the biological data, are also subject to frequent change. Reconciling these changes, coupled with the interdisciplinary nature of in silico biological experimentation, presents a significant problem.
We present the ExperimentBuilder, an application that records both the current and previous states of an experimental environment. Both the data and metadata about an experiment are recorded. The current and previous versions of each of these experimental components are maintained within the ExperimentBuilder. When any one of these components changes, the ExperimentBuilder estimates not only the impact within that specific experiment, but also traces the impact throughout the entire experimental environment.
This is achieved with the use of keyword profiles, a heuristic tool for estimating the content of an experimental component. We can compare one experimental component to another regardless of their type and content and build a network of inter-component relationships for the entire environment. Ultimately, we can present the impact of an update as a complete cost to the entire environment in order to make an informed decision about whether to recalculate our results.
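The idea of comparing components through keyword profiles and tracing an update through the resulting network can be illustrated with a minimal sketch. This record does not describe how the ExperimentBuilder actually constructs or compares profiles, so everything here is an assumption made for illustration: a profile is taken to be a bag of word counts, similarity is taken to be cosine similarity, and the names `keyword_profile`, `similarity`, `build_network`, `affected_components` and the `threshold` value are all hypothetical.

```python
import math
import re
from collections import Counter, deque


def keyword_profile(text):
    """Hypothetical profile: counts of lowercase tokens longer than two characters."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(w for w in words if len(w) > 2)


def similarity(p, q):
    """Cosine similarity between two profiles; 0.0 if either profile is empty."""
    dot = sum(p[k] * q[k] for k in p.keys() & q.keys())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    norm = norm_p * norm_q
    return dot / norm if norm else 0.0


def build_network(components, threshold=0.2):
    """Link every pair of components whose profiles are at least `threshold` similar.

    `components` maps a component name to a textual description of it
    (data, tool, or metadata); the threshold is an arbitrary assumption.
    """
    profiles = {name: keyword_profile(text) for name, text in components.items()}
    names = sorted(profiles)
    network = {name: {} for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            score = similarity(profiles[a], profiles[b])
            if score >= threshold:
                network[a][b] = score
                network[b][a] = score
    return network


def affected_components(network, changed):
    """Breadth-first walk: every component reachable from the changed one."""
    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for neighbour in network[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    seen.discard(changed)
    return seen
```

Under these assumptions, calling `build_network` on a dictionary of component descriptions and then `affected_components(network, name)` returns the set of components that an update to `name` could reach. A real cost estimate would also need to weight each link, which this sketch leaves out.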
| Title: | A framework for the management of changing biological experimentation |
| Open access status: | An open access version is available from UCL Discovery |
| UCL classification: | UCL > School of BEAMS > Faculty of Engineering Science > Computer Science |