UCL Discovery

A heuristic approach to handling missing data in biologics manufacturing databases

Mante, J; Gangadharan, N; Sewell, DJ; Turner, R; Field, R; Oliver, SG; Slater, N (2019) A heuristic approach to handling missing data in biologics manufacturing databases. Bioprocess and Biosystems Engineering, 42 (4) pp. 657-663. 10.1007/s00449-018-02059-5. Green open access

A heuristic approach to handling missing data in biologics manufacturing databases.pdf - Published Version


Abstract

The biologics sector has amassed a wealth of data over the past three decades, in line with bioprocess development and manufacturing guidelines. Precise analysis of these data is expected to reveal behavioural patterns in cell populations that can be used to predict how future culture processes might behave. The historical bioprocessing data likely comprise experiments conducted using different cell lines to produce different products, possibly years apart; this situation causes inter-batch variability, along with missing data points due to human- and instrument-associated technical oversights. These unavoidable complications necessitate the introduction of a pre-processing step prior to data mining. This study investigated the efficiency of mean imputation and multivariate regression for filling in the missing information in historical bio-manufacturing datasets, and evaluated their performance using symbolic regression models and Bayesian non-parametric models in subsequent data processing. Mean substitution was shown to be a simple and efficient imputation method for relatively smooth, non-dynamical datasets, and regression imputation was effective whilst maintaining the existing standard deviation and shape of the distribution in dynamical datasets with less than 30% missing data. The nature of the missing information, whether Missing Completely At Random, Missing At Random or Missing Not At Random, emerged as the key feature for selecting the imputation method.
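
As an illustration of the two imputation approaches named in the abstract, the following minimal Python sketch contrasts mean substitution with regression imputation. It is not the authors' implementation: the function names, the single-predictor least-squares fit (a simplification of the multivariate regression used in the study) and the `time_h` and `titre` values are all assumptions made up for the example.

```python
from statistics import fmean, pstdev


def mean_impute(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = fmean(observed)
    return [fill if v is None else v for v in values]


def regression_impute(x, y):
    """Fill None entries of y from a least-squares line fitted on complete (x, y) pairs.

    Simplification: one predictor only; the study used multivariate regression.
    """
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    mean_x = fmean(xi for xi, _ in pairs)
    mean_y = fmean(yi for _, yi in pairs)
    sxx = sum((xi - mean_x) ** 2 for xi, _ in pairs)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [intercept + slope * xi if yi is None else yi for xi, yi in zip(x, y)]


# Hypothetical time course (not data from the paper): culture time in hours
# and a measured variable with two missing readings.
time_h = [0, 24, 48, 72, 96, 120]
titre = [0.0, 1.0, None, 3.0, None, 5.0]

by_mean = mean_impute(titre)
by_regression = regression_impute(time_h, titre)

# Compare the spread of the completed series under each method.
print("mean substitution:", by_mean, "sd =", pstdev(by_mean))
print("regression:       ", by_regression, "sd =", pstdev(by_regression))
```

In this toy series the variable trends with time, so mean substitution pulls the filled values towards the centre and reduces the standard deviation, whereas the regression fill follows the trend. This is consistent with the abstract's distinction between smooth, non-dynamical datasets and dynamical ones, but it says nothing about the 30% threshold or the missingness mechanisms examined in the paper.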

Type: Article
Title: A heuristic approach to handling missing data in biologics manufacturing databases
Open access status: An open access version is available from UCL Discovery
DOI: 10.1007/s00449-018-02059-5
Publisher version: http://dx.doi.org/10.1007/s00449-018-02059-5
Language: English
Additional information: OpenAccess This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Keywords: Biologics manufacturing data, Missing data, Imputation, Parameter recurrence, Data pre-processing
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Biochemical Engineering
URI: https://discovery.ucl.ac.uk/id/eprint/10109706
