Safe model-based design of experiments using Gaussian processes

Construction of kinetic models has become an indispensable step in the development and scale-up of industrial processes. Model-based design of experiments (MBDoE) has been widely used to improve parameter precision in nonlinear dynamic systems. This procedure needs to account for both parametric and structural uncertainty, as the feasibility constraints imposed on the system may well be violated, leading to unsafe experimental conditions, when an optimally designed experiment is performed. In this work, a Gaussian process (GP) is utilized in a two-fold manner: 1) to quantify the uncertainty realization of the physical system and calculate the plant-model mismatch, and 2) to compute the optimal experimental design while accounting for the parametric uncertainty. This method provides a guarantee for the probabilistic satisfaction of the constraints in the context of model-based design of experiments. The method is assisted by the use of adaptive trust regions in order to facilitate a satisfactory local approximation. The proposed method allows the design of optimal experiments starting from limited preliminary knowledge of the parameter set, leading to a safe exploration of the parameter space. The performance of the method is demonstrated through illustrative case studies on the parameter identification of kinetic models in flow reactors.


Introduction
Biochemical, physicochemical or catalytic processes are often too complex to be described accurately by physics-based models alone, and plant-model mismatch is inevitable. It is common to develop approximate kinetic models, under some assumptions regarding the mechanism of the system, to represent the physical system as accurately as possible. A trustworthy kinetic model can be used to predict the behavior of the system outside the experimental conditions used in the model validation, and can then be used for design, optimization and control in systems engineering applications [4]. Model identification requires experimental data to evaluate the validity of a proposed model and to estimate the model parameters so that the model matches the physical system's behaviour over the selected range of operating conditions. Nevertheless, the experimental procedure may become costly, very time consuming, and even infeasible when poor designs are implemented.
The identification of an appropriate approximate model is highly dependent on limitations arising from the limited observability of the physical phenomena. Additionally, the required data may be difficult and expensive to obtain because of physical accessibility and/or limitations on the experimental budget.
Model-based design of experiments (MBDoE) has been widely utilized for the purpose of improving parameter precision in highly nonlinear dynamic systems. The main goal of MBDoE is to design control inputs and sampling schedules [14] that achieve the most informative experiment possible while satisfying constraints on feasibility and safety of operation. The optimal MBDoE problem for improving parameter estimation involves the maximization of a measure of the expected information (or the minimization of a measure of the variance of the parameters) by acting on the experiment decision variables, while ensuring feasibility during the experimental trials. However, since the optimization problem is based on the available model, both plant-model mismatch and parametric uncertainty may significantly affect the result [35].
The literature contains various approaches to ensure optimality and feasibility in the presence of parametric uncertainty [37]. Two distinctly different scenarios are considered regarding the information content and the constraint satisfaction. A robust approach for the information content was proposed in [2], where the information content needs to be optimal over the whole space of the uncertain parameters. The robust optimization problem is usually solved as a min-max optimization, in which uncertainties are typically assumed to be deterministic and bounded [37,24,56,45]. This problem is often numerically intractable when candidate models are highly nonlinear, and various methods have been proposed to accommodate this, including linearization of the inner optimization [24] and a sensitivity-based approximate robust approach assuming an ellipsoidal joint confidence region for the model parameters [28]. The robust approaches are typically conservative, as they require optimality and satisfaction of the constraints for all possible realizations. On the contrary, a stochastic approach has been popularized, where the information content is maximized in expectation and the constraints are satisfied with a given probability [2]. The stochastic (also called probabilistic) experiment design framework avoids the typical strong conservatism of worst-case MBDoE techniques, as the probabilities of occurrence of different realizations are weighted differently in the optimization. Various approaches have been proposed in this family of problems, usually inspired by the optimal control problem [52,28,16,50,26]. Probabilistic (chance) constraints are incorporated into the MBDoE problem to seek a trade-off between the information content of a designed experiment and an admissible level of risk during the experiment. Most approaches compute the expected information given the (previous) parameter identification procedure, where confidence regions are computed using approximation techniques [32] around the point estimates of the parameters.
Galvanin et al. [16] directly utilized a Monte Carlo approach to compute the constraint tightening (backoff) given an arbitrary probability distribution of the parameters. Such a method is very accurate for estimating the constraint tightening required for arbitrary constraint satisfaction; however, it is computationally costly. Telen et al. [52] proposed the use of the unscented transformation [54] to propagate the parametric uncertainty to the objective function and the constraints. GPs have also been utilized in the context of model discrimination [29], where the GP is used to approximate the different models and marginalize the parameters of the models given the computed uncertainty. Moreover, GPs have been used extensively in the area of optimal control, where prior models and black-box techniques are combined to describe the system. A GP may be coupled with approximate methods for propagating the uncertainty, such as the unscented Kalman filter [23], exact moment matching [18,12], linearization [18] and scenario-based approaches [53,7]. Recently, GPs have been used in a hybrid-modeling rationale, where they are coupled with the prior (nominal) model assumed for the system, and learning is conducted only for the GP [13,20].
The methods of robust MBDoE, e.g. [2,28], use the confidence regions computed at the parameter estimation step. Such an assumption may be sufficient to accommodate the uncertainty in the objective (the information content). However, constraint satisfaction requires a different treatment. The satisfaction of the constraints is typically guaranteed given the uncertainty of the parameters, and the required assumptions include the availability of a large number of prior experimental data points. In the common scenario of paucity of data at the beginning and the presence of model mismatch, these assumptions do not hold. As the methodology is model-based, both structural mismatch (i.e., a model structure inadequate to represent the physical system) and parametric mismatch (i.e., incorrect values of the parameters) may affect the consistency of the whole design procedure [32,36]. These assumptions have been avoided in some works using a disturbance-based estimation [17] and model-based design of experiments in the context of guaranteed parameter estimation [44]. Additionally, Quaglio et al. [38] developed a method to find the region where the approximate model is valid, avoiding the intrinsic issues of model mismatch in MBDoE methods [39].
In this work, we propose a novel MBDoE method that guarantees feasibility of operation in the presence of possible disturbances/hardware malfunctions and limited initial data. Plant-model mismatch is a critical issue in optimal control, where stability [34,33], offset-free tracking [31] and the satisfaction of constraints have been studied extensively [20]. We treat the objective function of the MBDoE by utilizing a stochastic surrogate, i.e. a GP. Additionally, we focus on the satisfaction of constraints, which has not been addressed in the literature on optimal experimental design without relying on the covariance of the parameters computed from the parameter estimation. A GP is utilized to compute the model mismatch, and theoretical guarantees hold within a trust region around the operating point, which is updated iteratively. This is inspired by the derivative-free methods [11] that were recently applied in [13]. This way, restrictive assumptions on the nature of the 'real' model are avoided.
The structure of this paper is as follows. The problem definition is outlined in section 2, and the details of the proposed method for safe optimal experimental design are presented in section 3. The framework is applied to two separate case studies in section 4, and conclusions are outlined in the last section.

Problem Definition
We assume that the process under consideration can be described by a set of differential and algebraic equations (DAEs) whose outputs can only be measured at a finite set of sampling times {1, . . . , N}:

$$f(\dot{x}, x, u, w) = 0, \qquad y = h(x, u, w) + \nu \qquad (1)$$

where x ∈ R^{n_x}, u ∈ R^{n_u} and y ∈ R^{n_y} are the vector of state variables, the vector of control (manipulated) variables and the vector of measured (output) variables, respectively. Additionally, the physical system is assumed to be subject to additive, normally distributed disturbance (w ∈ R^{n_w}) and noise (ν ∈ R^{n_ν}). The functions f : R^{n_x} × R^{n_u} × R^{n_w} → R^{n_x} and h : R^{n_x} × R^{n_u} × R^{n_w} → R^{n_y} are unknown functions that describe the observed process. For practical reasons this model will be referred to as the 'real physical system'.
In practice, the mapping relating the process inputs and outputs in (1) is typically unknown, and only an approximate model is available, parametrized by a set of parameters θ. However, in many cases [35,1] it is unlikely that a model's predictions coincide with those of the real system, due to the underlying assumptions, the possible plant-model mismatch and/or equipment malfunction (e.g. bias in a pump, a wrongly calibrated HPLC). Let the approximate model, derived from simplifying hypotheses and/or ignoring equipment imperfections, be

$$\hat{f}(\dot{\hat{x}}, \hat{x}, u, \theta) = 0, \qquad \hat{y} = \hat{h}(\hat{x}, u, \theta) \qquad (2)$$

where the hat (·̂) represents the model predictions. The expected values of the parameters θ and their corresponding distribution are estimated from the available measurements.
Optimal model-based design of experiments (MBDoE) for parameter estimation is employed to reduce the variance of the parameters θ as part of the model identification procedure. The technique is usually embodied in a sequence of three key activities (experimental design (MBDoE), experiment execution, parameter estimation) and is particularly flexible, allowing for the definition of a set of active constraints on both state and design variables during the optimization. At the experimental design step, MBDoE is usually posed as an optimization problem in which the optimal experimental conditions (control variables u) are found to minimize a measure of the predicted covariance matrix or to maximize a measure of the expected Fisher information matrix (FIM). Most importantly, the designed experiments should take into account feasibility specifications (e.g. safety constraints) during the experimental trials.

Maximum Likelihood Estimation & Confidence Regions
It is common for the uncertainty of the model parameters to be computed from the results of maximum likelihood estimation, and for the respective confidence regions to be used for the robust/stochastic optimally informative experiment or for robust/chance constraint satisfaction. The constructed confidence regions are usually based on the maximum likelihood estimate. In this subsection the most common methods to construct confidence regions are discussed, which are usually used to address the feasibility problem of chance constraints. Let n_b experiments with n_m measurements each be available. If the manipulated variables are assumed to be precisely known, then the joint probability of the prediction-observation mismatch over the available measurements for parameter values θ gives the following (negative) log-likelihood:

$$-\log L(\theta) \propto \sum_{b=1}^{n_b} \sum_{m=1}^{n_m} \big(y_b(t_m) - \hat{y}_b(t_m, \theta)\big)^\top \Sigma_y^{-1} \big(y_b(t_m) - \hat{y}_b(t_m, \theta)\big) \qquad (3)$$

where ŷ_b(t_m, θ) corresponds to the solution of (2). The maximum-likelihood estimation computes θ̂ to maximize the likelihood function L, or equivalently to minimize (3) subject to (2). Confidence regions can then be computed, under the assumptions that model (2) is correct and that the computed parameters θ̂ are close to the (unique) 'true' values of the model parameters θ*. Then, for a large number of available measurements, both the likelihood-ratio subset statistic and the Wald subset statistic follow a chi-squared distribution [48,25].
These asymptotic results can be used to obtain approximate 100(1 − α)% likelihood-based, Θ_L (4), and Wald, Θ_W (5), confidence regions, with the interpretation that a random confidence region contains the true parameters with this probability:

$$\Theta_L = \big\{\theta : 2\big(\log L(\hat{\theta}) - \log L(\theta)\big) \le \chi^2_{n_{dof}}(1-\alpha)\big\} \qquad (4)$$

$$\Theta_W = \big\{\theta : (\theta - \hat{\theta})^\top V_\theta^{-1} (\theta - \hat{\theta}) \le \chi^2_{n_{dof}}(1-\alpha)\big\} \qquad (5)$$

with V_θ being the covariance of the parameters at θ̂ and χ²_{n_dof}(1 − α) the (1 − α) quantile of the chi-squared distribution with n_dof degrees of freedom. The covariance V_θ is usually approximated by a Taylor expansion of the likelihood. In particular, (5) has been used extensively in the design of experiments [28,2], as a closed-form expression is available, in contrast to the likelihood-ratio region (4), for which no closed-form expression exists and a Monte Carlo scheme is required [46]. Nevertheless, unlike the likelihood-based confidence regions, the Wald confidence regions depend on the model reparametrization through the (approximate) covariance term V_θ, which can significantly affect the resulting confidence regions [42]. The information from the approximate covariance V_θ can also be used to construct individual confidence regions, where independence of the model parameters is assumed, resulting in hyperrectangles [16]. Additionally, in the same rationale, a Laplace approximation can be used for the posterior distribution of the parameter estimates given the available data:

$$p(\theta \mid \mathcal{D}) \approx \mathcal{N}(\hat{\theta}, V_\theta) \qquad (6)$$

where V_θ is the covariance of the parameters at θ̂, as in (5). The parametric uncertainty estimates (4)-(6) are the most commonly used strategies to approximate confidence regions or the posterior distribution of the parameters. Nevertheless, there are two additional methods that can be used but are computationally demanding: Bayesian estimation [19] and set-membership estimation [31].
In the context of stochastic MBDoE, these strategies exploit (4)-(6) to formulate a stochastic optimization problem subject to chance constraints. The parametric uncertainty estimates (4)-(6) are affected by several limitations: i) A large amount of data is required. However, this is not the case when we design experiments, since we want to keep the number of data points to a minimum.
ii) The covariance matrix V_θ is approximated by a Taylor expansion. This means that highly nonlinear models will result in unreliable covariance estimates.
iii) The expression in (4) is more accurate, but it shares the disadvantage of point (i) and has no closed-form expression. Hence, Monte Carlo simulations or other numerical solutions [46] are required.
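To make the Wald construction (5) concrete, the following minimal sketch computes the approximate covariance V_θ from a sensitivity matrix and tests membership in the Wald region. The sensitivity matrix, the noise level and θ̂ are illustrative stand-ins, not values from this paper.

```python
import numpy as np
from scipy.stats import chi2

# Minimal sketch of a Wald (eq. 5) confidence check, assuming a weighted
# least-squares estimator. S is the sensitivity matrix dy/dtheta stacked over
# all measurements (here random, as a stand-in for model sensitivities).
rng = np.random.default_rng(0)
n_meas, n_theta = 30, 3
S = rng.normal(size=(n_meas, n_theta))          # stand-in sensitivities
sigma_y = 0.05                                  # measurement noise std
V_theta = np.linalg.inv(S.T @ S / sigma_y**2)   # approximate covariance (FIM^-1)

theta_hat = np.array([1.0, 0.5, -0.2])          # hypothetical MLE
alpha = 0.05
radius = chi2.ppf(1 - alpha, df=n_theta)        # chi-squared quantile

def in_wald_region(theta):
    d = theta - theta_hat
    return d @ np.linalg.solve(V_theta, d) <= radius

print(in_wald_region(theta_hat + 0.01))
```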

General Problem Formulation
The general formulation of the problem is constructed here such that 1) the constraints are satisfied for the real physical system and 2) the new experiment is the most informative for the given approximate model. The proposed experimental design can be described by the following infinite-dimensional problem P(u), with u being the optimization variables:

$$\max_u \; \mathbb{E}_{\theta \in \Theta}\big[\psi\big(M(u, \theta) + M_0\big)\big] \qquad (7)$$

$$\text{Approximated Model:} \quad \hat{f}(\dot{\hat{x}}, \hat{x}, u, \theta) = 0 \qquad (8)$$

$$\text{Real System:} \quad f(\dot{x}, x, u, w) = 0 \qquad (9)$$

$$\text{Constraints:} \quad \mathbb{P}\big(g_i(x, u) \le 0\big) \ge 1 - \varepsilon, \quad i = 1, \dots, n_g \qquad (10)$$

where ψ(·) is a scalar function of the Fisher information matrix M (it depends on the selected design criterion, e.g. A-optimal, D-optimal), which is maximized in expectation over Θ, and M_0 is the prior Fisher information matrix of the preliminary experiments. Additionally, the equality path constraints (8) correspond to the DAE that describes the approximate model, parametrized by θ and characterized by states x̂; the chance constraints (10) are imposed on the real behaviour of the physical system, which is represented as the stochastic dynamic system (9) with states x.
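As an illustration of the objective in (7), the sketch below evaluates a D-optimal metric ψ = log det(M + M_0), with the FIM assembled from finite-difference sensitivities. The model y_model and all numbers are hypothetical placeholders, not the models of this paper.

```python
import numpy as np

# Minimal sketch of the D-optimal information metric psi = log det(M(u, theta) + M0),
# with the FIM built from finite-difference sensitivities of a hypothetical model.
def y_model(u, theta):
    # stand-in for the DAE solution (eq. 8) at the sampling times
    return np.array([theta[0] * u[0] * np.exp(-theta[1] / u[1]),
                     theta[0] * u[0] / (1.0 + theta[1] * u[1])])

def fim(u, theta, sigma_y=0.01, h=1e-6):
    y0 = y_model(u, theta)
    S = np.zeros((len(y0), len(theta)))
    for j in range(len(theta)):                  # forward-difference sensitivities
        tp = np.array(theta, dtype=float)
        tp[j] += h
        S[:, j] = (y_model(u, tp) - y0) / h
    return S.T @ S / sigma_y**2

def psi_D(u, theta, M0):
    sign, logdet = np.linalg.slogdet(fim(u, theta) + M0)
    return logdet                                # D-optimality: maximize log det

M0 = 1e-6 * np.eye(2)                            # prior information (illustrative)
print(psi_D([1.0, 300.0], [2.0, 500.0], M0))
```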

Proposed Framework
In this section the proposed framework is described. First, each stage of the proposed framework is briefly presented, and then the approximate optimal design of experiments is described in more detail.
The proposed algorithm contains a two-fold approach to deal with both the feasibility and optimality of the uncertain problem: 1) The use of GP is proposed to accommodate the model mismatch that is present, and approximate the constraints. This is combined with the use of a trust region to provide a local GP and steadily reach the optimum.
2) Ideally, the information needs to be optimized in expectation given the approximated confidence regions of the parameters. To accommodate this a GP is utilized to approximate the objective function and propagate the relevant parametric uncertainty.
The intuition behind these two items is to use a non-parametric approximation of the physical system to enforce constraint satisfaction and, at the same time, to maximize the information of the approximate parametric model in expectation. Notice that a global GP would not be sufficient, since it suffers from the same issue as the approximate confidence regions: it requires a sufficient number of data points. If there are enough data points, the GP can provide a prediction of both the mean and the variance of the approximation. With limited data points, the GP may produce unreliable predictions unless assumptions are made regarding the complexity of the function (see section 3), and that is why local GPs are utilized.
A block diagram of the overall framework is given in Fig. 1a. The fundamental steps are briefly illustrated: 1. A preliminary set of data is imported into the system.
2. Parameter estimation is performed for the approximated model.

3. The new optimal experimental design is computed (MBDoE). 4. The experiments are performed and the data-set is updated.
This closed loop is terminated when: • The statistical criteria are satisfied.
• The optimal conditions do not change in between iterations.
Notice that in our proposed method the MBDoE block is replaced with the proposed methodology. The direct approach would be to solve the optimization described in (7)-(10). However, such an approach is intractable: 1) the real system is not known, and 2) the expectation and the probabilities are in general intractable integrals.
To accommodate these issues, we propose a novel methodology. The proposed algorithm is based on a GP that approximates the model mismatch between the process and the model; additionally, trust regions are applied to restrict the feasible design space at each iteration. Trust regions are additional constraints on the design variables that limit the distance of the next design from the previous one. A GP is then employed to approximate the expectation of the Fisher-information metric and to propagate the posterior distribution of the parameters.
The third step is the focus of this work; the GP-MBDoE performs 4 additional steps before the computation of the new design.
• With the available data, a GP is computed for the plant-model mismatch (details given in Section 3.2).
• A trust region (TR) is computed/updated to restrict the new designs and allow the use of the GP locally (details given in Section 3.3).
• The uncertainty computed at the parameter estimation stage is accounted for in the objective function using an additional GP for the parametric uncertainty of the model. This GP aims to approximate the parametric model and not the real system; hence in-silico data points are employed [29] (details given in Section 3.4).
• The above are then used to perform the optimization step and find the new optimal design with the safety considerations included (details given in Section 3.5).
The termination statistics are based on a t-test on the parameter estimates: the campaign stops when the t-value of each parameter exceeds t_ref, where t_ref is a reference t-value given by a Student t-distribution with N_y − N_θ degrees of freedom.

Methodology
This section presents the proposed methodology, in which a GP is used in a two-fold manner: 1) The model mismatch between the available model and the collected data is approximated using a GP. To ensure the reliability of the GP predictions, trust regions are applied to constrain the design space of the experimental conditions.
2) The parametric uncertainty is also present in the objective of the design of experiments, and a GP is employed to compute the expected value and variance of the objective with respect to the parametric uncertainty.

Gaussian Process Fundamentals
In this section GPs are introduced, as they are one of the key components of the methodology that follows. A GP generalizes the multivariate Gaussian distribution to a distribution over an infinite-dimensional vector of functions, such that every finite sample of function values is jointly Gaussian distributed. Due to the Bayesian nature of the solution, GPs are able to consider both epistemic (limited data) and aleatoric (stochasticity of the real physical system) uncertainty. A GP is fully specified by a prior mean function µ(·) and a positive semidefinite kernel function k(·, ·), and GP regression aims to model an unknown set of functions f : R^{n_x} → R^{n_y} given some noisy observations. For the outputs f_1, . . . , f_{n_y}, this can be expressed as:

$$f_i(\cdot) \sim \mathcal{GP}\big(\mu_i(\cdot), k_i(\cdot, \cdot)\big), \quad i = 1, \dots, n_y$$

where the prior mean function µ_i provides knowledge of the mean at a test point prior to observing data, and the kernel function k_i expresses the covariance between points. Let N data points be available; the posterior distribution at a test point x* is then found by conditioning on the available N noisy data for each output i. The posterior mean m_i(x*) and variance Σ_i(x*) are

$$m_i(x^*) = \mu_i(x^*) + k_i(x^*, X)\big(K_i + \sigma_i^2 I\big)^{-1}\big(y_i - \mu_i(X)\big)$$

$$\Sigma_i(x^*) = k_i(x^*, x^*) - k_i(x^*, X)\big(K_i + \sigma_i^2 I\big)^{-1} k_i(X, x^*)$$

where K_i is the kernel (Gram) matrix evaluated on the training inputs X. Notice that Σ_i is the variance of the noise-free prediction; if the noise is taken into consideration, the variance should be Σ_i + σ_i², and σ_i² is often treated as a hyperparameter of the GP. The posterior means and covariance functions are collected as m(x*) = [m_1(x*), . . . , m_{n_y}(x*)]^⊤ and Σ(x*) = diag(Σ_1(x*), . . . , Σ_{n_y}(x*)). Common choices for the kernel k_i(·, ·) are the squared-exponential (SE) covariance function:

$$k_i(x, x') = \sigma_{f,i}^2 \exp\big(-\tfrac{1}{2}(x - x')^\top \Lambda^{-1} (x - x')\big)$$

and the Matérn class of covariance functions, with the Matérn 3/2 and Matérn 5/2 as special cases. The length-scales Λ = diag([λ_1, . . . , λ_{n_x}]) and the noise level σ_i² are hyperparameters of the GP and, to estimate them, a maximum likelihood estimation is carried out. In this work, we wish to estimate an a priori unknown function f with a GP by collecting measurements during operation.
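The following minimal GP-regression sketch implements the posterior mean and variance above with a squared-exponential kernel and a zero prior mean; for brevity the hyperparameters are fixed rather than estimated by maximum likelihood, and the data are synthetic.

```python
import numpy as np

# Minimal GP regression sketch with a squared-exponential kernel, following
# the posterior mean/variance expressions above (zero prior mean assumed).
def se_kernel(A, B, sf2=1.0, lam=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sf2 * np.exp(-0.5 * d2 / lam**2)

def gp_posterior(X, y, Xs, sn2=1e-4):
    K = se_kernel(X, X) + sn2 * np.eye(len(X))   # Gram matrix + noise
    Ks = se_kernel(Xs, X)
    Kss = se_kernel(Xs, Xs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, Ks.T)
    mean = Ks @ alpha                            # posterior mean m(x*)
    var = np.diag(Kss - v.T @ v)                 # posterior variance Sigma(x*)
    return mean, var

X = np.linspace(0, 5, 8)[:, None]                # training inputs
y = np.sin(X).ravel() + 1e-2 * np.random.default_rng(1).normal(size=8)
Xs = np.linspace(0, 5, 50)[:, None]              # test inputs
m, s2 = gp_posterior(X, y, Xs)
```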

Gaussian Process for Model Mismatch
In this work the constraints are not imposed based on the approximate parametric uncertainty, as it was shown that such an approach suffers with small data sets. For this reason the model is used as a prior for the GP that approximates the model mismatch. Since the states of the real system affect the feasible space, we propose the approximation of the constraints themselves instead of the real system as a whole. Specifically, the prior of the GP for the constraints is computed from (8) by first discretizing it, after which the constraints can be evaluated. As a result, with some abuse of notation, the symbol ĝ_i is used for the constraints associated with the available model:

$$\hat{g}_i(u) = g_i(\hat{x}(u), u) \qquad (13)$$

When the constraints g_i(u) are observed, a GP can be built using the predictions of the model ĝ_i as prior. Thus, the prior mean µ_i of the i-th constraint is µ_i = ĝ_i, and the hyperparameters of the GP can now be estimated for this hybrid model.
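A minimal sketch of this hybrid construction follows: the parametric model prediction ĝ_i acts as the GP prior mean and the GP is trained on the residuals between the observed constraint values and the model. The constraint function, kernel and numbers here are illustrative stand-ins.

```python
import numpy as np

# Sketch of the hybrid mismatch GP: prior mean = model-based constraint g_hat,
# GP correction trained on the residuals g_observed - g_hat.
kernel = lambda A, B: np.exp(-0.5 * ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1))

def g_hat(U):
    # hypothetical constraint predicted by the approximate model at designs U
    return U[:, 0] ** 2 - 1.0

def mismatch_gp(U, g_obs, U_star, sn2=1e-4):
    r = g_obs - g_hat(U)                                  # residuals w.r.t. the prior
    K = kernel(U, U) + sn2 * np.eye(len(U))
    Ks = kernel(U_star, U)
    mean = g_hat(U_star) + Ks @ np.linalg.solve(K, r)     # prior mean + GP correction
    cov = kernel(U_star, U_star) - Ks @ np.linalg.solve(K, Ks.T)
    return mean, np.diag(cov)

U = np.array([[0.5], [0.8], [1.1]])                       # past experimental designs
g_obs = g_hat(U) + 0.1                                    # observations with offset mismatch
m, v = mismatch_gp(U, g_obs, np.array([[0.9]]))
```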
Given N observations g_i = [g_i(u_1), . . . , g_i(u_N)]^⊤ ∈ R^N, the mean and variance of the posterior for (13) can now be written as:

$$m_i(u^*) = \hat{g}_i(u^*) + k_i(u^*, U)\big(K_i + \sigma_i^2 I\big)^{-1}\big(g_i - \hat{g}_i(U)\big)$$

$$\Sigma_i(u^*) = k_i(u^*, u^*) - k_i(u^*, U)\big(K_i + \sigma_i^2 I\big)^{-1} k_i(U, u^*)$$

We then wish to satisfy the constraints with a given probability. To satisfy the chance constraints, a robust reformulation is employed via the Chebyshev theorem [27], using the mean and variance of the constraints.
Theorem 3.1 (Distributionally robust chance constraint [27]). Let q ∈ R^{n_q} be a random variable and let Q be the family of distributions with mean µ_q and variance Σ_q. Then for any ε ∈ (0, 1), the distributionally robust probabilistic constraint

$$\inf_{q \sim \mathcal{Q}} \mathbb{P}\big(a^\top q \le b\big) \ge 1 - \varepsilon$$

is equivalent to the following deterministic constraint:

$$b - a^\top \mu_q \ge r \sqrt{a^\top \Sigma_q a}, \qquad r = \sqrt{\frac{1 - \varepsilon}{\varepsilon}}$$

This theorem provides a conservative estimate for the constraints, as it holds for the whole family Q.
Remark. It should be noted that this theorem was proposed for general constraints. Additionally, the parameter r is often used as a design parameter to reduce the conservatism of this formulation, at the cost of losing the respective guarantees.
This reformulation relies heavily on the accuracy of the mean and variance provided by the GP, and the initially small number of data points may significantly affect the result. Notice that even though the result in Theorem 3.1 is conservative for the whole family of distributions with mean µ_q and variance Σ_q, a wrong estimation of the mean and variance may lead to infeasible designs.
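A minimal sketch of the resulting constraint tightening: the chance constraint P(g ≤ 0) ≥ 1 − ε is replaced by the deterministic check m + r√Σ ≤ 0 on the GP posterior, with r = √((1 − ε)/ε) from Theorem 3.1.

```python
import numpy as np

# Chebyshev-based tightening of a single chance constraint P(g <= 0) >= 1 - eps:
# enforce mean + r * std <= 0, with mean/variance taken from the GP posterior.
def chebyshev_backoff(mean_g, var_g, eps=0.1):
    r = np.sqrt((1.0 - eps) / eps)
    return mean_g + r * np.sqrt(var_g)          # design is feasible if this is <= 0

# e.g. eps = 0.01 gives r = sqrt(99) ~= 9.95, matching the motivating example below
print(chebyshev_backoff(-0.5, 0.001, eps=0.01))
```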

Trust Regions
Here we introduce the assumption and the bound that are usually employed; then we show, using a simple motivating example, that these bounds may not hold globally.
It is common to obtain bounds on the unknown function using the following assumption:

Assumption 3.1 ([51,3]). The unknown function f_i has bounded norm in the reproducing kernel Hilbert space (RKHS) H_k [47] induced by the continuously differentiable kernel k_i: ||f_i||_{k_i} ≤ B_{f_i}, with B_{f_i} > 0.
Note that Assumption 3.1 does not make any probabilistic assumption on f_i, but it requires f_i to have a bounded norm in the RKHS or to be drawn from a GP prior [49]. Given Assumption 3.1, reliable confidence intervals on f_i can be found:

Lemma 3.2 ([51,3]). With high probability, for all x in the domain, |f_i(x) − m_i(x)| ≤ β_i Σ_i^{1/2}(x), where β_i is an appropriately chosen constant that depends on B_{f_i} and the noise level.

To motivate the need for local approximations and trust regions, the following example is presented.

Calculation of Trust Regions: A simple motivating example
We first provide a simple motivating example that illustrates why such an approximation is not cautious globally; then the trust regions for our problem are introduced.
Let a simple function f(x) = x sin(x) be assumed as the 'true' model, with x ∈ R, and let the collected data be corrupted by Gaussian noise ε_f with standard deviation 0.01, y = f(x) + ε_f. A GP is used to approximate the data points; specifically, a squared-exponential (SE) covariance function is used and its hyperparameters are fitted using maximum likelihood estimation (with multi-starts to avoid local optima). The mean m and variance Σ of the GP for each x are illustrated in Fig. 2. The shaded areas show m ± r√Σ for r = 1, 2, 3, 4 and 10. By Theorem 3.1 and Lemma 3.2, we would expect the 'true' model (red dashed line) to lie within these shaded areas. Specifically, the Chebyshev theorem states that for r = 9.94, the function value should lie within the shaded area with probability 99%. However, due to the shape of the 'true' model and the placement of the collected data, this guarantee is lost, as the predicted variance and mean are poorly computed. This motivates the further restriction of the proposed optimization problem: it is clear that the GP cannot be used globally with small data-sets, and it is of paramount importance to satisfy the constraints with high probability.
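The motivating example can be reproduced in a few lines; the sketch below (using scikit-learn, with illustrative seed and sample locations) fits a GP to noisy samples of f(x) = x sin(x) and measures how much of the true curve actually falls inside the m ± r√Σ band.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# Reproduction sketch of the motivating example: fit a GP to noisy samples of
# f(x) = x sin(x) and check whether the true curve stays within m +/- r*sqrt(Sigma).
rng = np.random.default_rng(2)
f = lambda x: x * np.sin(x)
X = rng.uniform(0, 10, size=(8, 1))              # few, unevenly placed data points
y = f(X).ravel() + rng.normal(0, 0.01, size=8)

gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(),
                              alpha=0.01**2, n_restarts_optimizer=10)
gp.fit(X, y)                                     # ML fit of hyperparameters, multi-start

Xs = np.linspace(0, 10, 200)[:, None]
m, s = gp.predict(Xs, return_std=True)
r = np.sqrt(0.99 / 0.01)                         # Chebyshev factor for 99%
covered = np.mean(np.abs(f(Xs).ravel() - m) <= r * s)
print(f"fraction of the true curve inside the band: {covered:.2f}")
```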

Trust region updates
The use of local models has been employed extensively in the literature of derivative-free optimization [11,10], where trust regions (TRs) are used to bound the predictions of the surrogate of the objective function around the current point. The size of the TR is updated at every iteration according to the ratio of the objective's actual reduction over the predicted reduction. In this work the same philosophy is used for the approximation of the constraints: a TR is defined for each constraint in the problem, and it is updated according to the accuracy of the predictions at the new observations. The TR is defined as

$$\mathcal{TR}_i = \big\{ u : \|u - u_k\| \le R_i \big\}$$

where u_k is the current operating point (experimental condition) and R_i the radius of the trust region. The rule for updating the TR is described in Algorithm 1.

According to Algorithm 1, the radius R_i is updated according to the accuracy ρ_i of the prediction at the new point: if the percentage error is low, i.e. ρ_i ≤ η_1, the trust region is enlarged; if the percentage error is high, i.e. ρ_i ≥ η_2, the trust region is reduced.
The values of the parameters η_1, η_2 and t_2 < 1 < t_1 can be seen as a notion of conservatism for the optimization.
If the parameters t_1 and t_2 are closer to 1, the scheme becomes more conservative, as the trust region changes very little at each iteration. The values of η_1 and η_2 are the error bounds that we require of the approximation of the physical system. If their values are selected to be large, the scheme is relaxed, as the radius will increase even if the GP is not good enough.
The predictions are then made inside the trust region, where the computed mean and variance are locally accurate.
In this way, Lemma 3.2 and Theorem 3.1 may be applied to guarantee the feasibility of the experimental design.
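A minimal sketch of the update rule of Algorithm 1, under the reading that ρ_i is a percentage prediction error; the thresholds and factors below are illustrative, not the values of Table 3.

```python
# Sketch of the trust-region update of Algorithm 1. rho_i is read as the
# percentage error of the GP prediction at the new design; eta_1 <= eta_2 are
# error thresholds and t_2 < 1 < t_1 the growth/shrink factors (illustrative values).
def update_radius(R_i, rho_i, eta_1=0.05, eta_2=0.2, t_1=1.5, t_2=0.7):
    if rho_i <= eta_1:       # accurate local GP: enlarge the trust region
        return t_1 * R_i
    if rho_i >= eta_2:       # poor local approximation: shrink the trust region
        return t_2 * R_i
    return R_i               # intermediate accuracy: keep the radius unchanged

print(update_radius(0.3, 0.01))   # -> 0.45 (enlarged)
print(update_radius(0.3, 0.50))   # -> 0.21 (shrunk)
```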

GP-based Uncertainty Propagation for the Information Metric
In this section the expected value of the information metric is discussed. A GP is used to directly approximate the metric as a function of the optimization variables and the kinetic parameters. Then, marginalization of the model parameters is used to compute the expected value and variance of the information metric with respect to the computed parametric uncertainty. The use of surrogates to propagate uncertainty is not new in MBDoE, where polynomial chaos expansions are mainly used. A similar approach was followed in [29], but for the case of model discrimination.
Consider the objective function of the experimental design at the k-th experiment:

$$J_k(u, \theta) = \psi\big(M(u, \theta) + M_{k-1}\big)$$

where M_{k−1} collects the Fisher information of the experiments performed so far. Notice that in this case the training set for the GP is generated in-silico using the available approximate model; hence a sufficiently large data-set can be generated [29]. Given the uncertainty region that has been computed (see section 2.1), the uncertainty is propagated using the expressions of Appendix A.1, where the expected value and variance of the objective are computed with respect to the uncertain parameters θ.
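The construction can be sketched as follows: a GP surrogate of J is trained on in-silico samples over (u, θ), and θ is then marginalized. Here plain Monte Carlo marginalization stands in for the closed-form expressions of Appendix A.1, and the metric J, the bounds and all numbers are hypothetical.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

# Sketch of the in-silico surrogate for the information metric: a GP is fitted
# to samples of J(u, theta) generated from the approximate model, then theta is
# marginalized by Monte Carlo over its posterior N(mu_theta, Sigma_theta).
rng = np.random.default_rng(3)
mu_theta, Sig_theta = np.array([2.0, 500.0]), np.diag([0.1, 25.0])

def J(u, theta):                                 # hypothetical information metric
    return np.log(1.0 + theta[0] * u[0]) - theta[1] * 1e-4 / u[1]

# in-silico training set over (u, theta)
U = rng.uniform([0.5, 280.0], [2.0, 350.0], size=(200, 2))
Th = rng.multivariate_normal(mu_theta, Sig_theta, size=200)
Z = np.hstack([U, Th])
j = np.array([J(u, th) for u, th in zip(U, Th)])
gp_J = GaussianProcessRegressor(normalize_y=True).fit(Z, j)

def expected_J(u, n_mc=500):
    th = rng.multivariate_normal(mu_theta, Sig_theta, size=n_mc)
    z = np.hstack([np.tile(u, (n_mc, 1)), th])
    m = gp_J.predict(z)
    return m.mean(), m.var()                     # E_theta[J], Var_theta[J]
```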

Approximate Optimal Model-Based Design of Experiments
The optimization problem that is solved to find the new design u*, given the previous design u_k and the computed mean (µ_θ*) and variance (Σ_θ*) of the parameters, is:

$$\begin{aligned} u^* \in \arg\max_u \;& \mathbb{E}_\theta\big[J_k(u, \theta)\big] + \alpha_J \sqrt{\mathbb{V}_\theta\big[J_k(u, \theta)\big]} \\ \text{s.t.}\;& m_i(u) + r\sqrt{\Sigma_i(u)} \le 0, \quad i = 1, \dots, n_g \\ & u \in \mathcal{TR}_i, \quad i = 1, \dots, n_g \end{aligned} \qquad (26)$$

where α_J > 0 is used to add a small amount of exploration to the design. This parameter plays the role of the upper-confidence-bound weight in Bayesian optimization [15]: the larger the value of α_J, the more significant the exploration. In this work, the optimization mainly focuses on exploitation, and for that reason α_J is set to 0.1. Algorithm 2 summarizes the proposed framework.
(Algorithm 2, final steps.) (ii) if g_i(u_k) > 0: R_i = t_3 R_i and u_{k+1} = u_k. 5. Perform the new experiment and update U, g_i and y_j.
The steps followed in this work, as illustrated in Fig. 1b, are the following for each iteration k. Given the preliminary (and subsequently updated) data, Parameter Estimation is performed: Steps 1 and 2 are used to obtain the best parameters θ given the available data points and to approximate the posterior distribution. As described in section 2.1, the parametric uncertainty obtained this way is not suitable for constraint satisfaction; however, it can be used in the objective of the experimental design. If the Termination criteria are satisfied (see Section 2.4), the experimental procedure stops at Step 3. Algorithm 2 is then performed in Step 4, where the new optimal design is found. This step also entails the construction of the GP for the model mismatch using the parametric model as prior (Approximate the Model Mismatch in Fig. 1b), the GP-based uncertainty propagation for the objective function (GP-based Uncertainty Propagation in Fig. 1b) and the solution of the optimization (26) for the computation of the new design (Solve the Optimization Problem in Fig. 1b). In Steps 4.i-ii the Update Trust-Region step (Fig. 1b) is performed following Algorithm 1. After that, in Step 5, the new experiment is performed; the last step contains a backtracking procedure used when an infeasible design occurs, in which the next design is returned to the previous value and the trust region of that constraint shrinks.
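Putting the pieces together, the sketch below solves a problem of the form (26) with an off-the-shelf NLP solver. The callables expected_J, mean_g and var_g are dummy stand-ins for the GP-based quantities of the previous sketches, defined on normalized design variables; all numbers are illustrative.

```python
import numpy as np
from scipy.optimize import NonlinearConstraint, minimize

# Sketch of problem (26): maximize E[J] + alpha_J * sqrt(Var[J]) subject to the
# Chebyshev-tightened constraint and the trust region around the current design u_k.
expected_J = lambda u: (u[0] + 0.5 * u[1], 0.01)   # dummy (mean, variance) of the metric
mean_g = lambda u: u[0] - 0.5                      # dummy constraint mean
var_g = lambda u: 0.01                             # dummy constraint variance

alpha_J = 0.1                                      # small exploration weight
r = np.sqrt((1 - 0.1) / 0.1)                       # Chebyshev factor for eps = 0.1
u_k, R = np.zeros(2), 0.3                          # current (normalized) design, TR radius

def neg_acquisition(u):
    m, v = expected_J(u)
    return -(m + alpha_J * np.sqrt(v))             # minimize the negative objective

def tightened_g(u):                                # must be <= 0 for feasibility
    return mean_g(u) + r * np.sqrt(var_g(u))

constraints = [NonlinearConstraint(tightened_g, -np.inf, 0.0),
               NonlinearConstraint(lambda u: np.linalg.norm(u - u_k), 0.0, R)]
result = minimize(neg_acquisition, x0=u_k, method="trust-constr",
                  constraints=constraints)
u_star = result.x                                  # the new (safe) experimental design
```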

Case Studies
The proposed methodology (GP-MBDoE) is illustrated using two case studies simulated in silico:
• Case Study 1: a flow reactor where the assumed mechanism differs from the 'real' one, with an additional constant disturbance.
• Case Study 2: a flow reactor in which a nucleophilic aromatic substitution (SNAr) takes place [21], with corrupted measurements.
The proposed method is compared against the following alternatives:
• MBDoE under parametric uncertainty (MC-MBDoE): a Monte Carlo approach [16] is used to propagate the parametric uncertainty to the constraints. Notice that other, more sample-efficient methods could be used [6], but they would require additional assumptions on the distribution of the constraints.
• Disturbance estimation-based robust MBDoE (DE-MBDoE): the structural model mismatch is modeled by a constant disturbance, a method that originates from the MPC literature [30] and has been applied to online MBDoE [17]. This method can be seen as a simplification of our approach where, instead of a GP, a constant disturbance is used. The constant disturbance is estimated from the difference between the measurements and the predictions, as sketched below.
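A minimal sketch of this baseline, with a hypothetical model-based constraint g_model and made-up numbers:

```python
import numpy as np

# Minimal sketch of the DE-MBDoE baseline: the structural mismatch is modelled
# as a constant additive disturbance d, estimated as the gap between the latest
# measurement and the model prediction, then added to future model predictions.
def g_model(u):
    # hypothetical model-based constraint value at design u
    return u[0] ** 2 - 1.0

y_measured, y_predicted = 0.12, 0.10
d_hat = y_measured - y_predicted                 # constant disturbance estimate

def g_corrected(u):
    return g_model(u) + d_hat                    # disturbance-corrected constraint

print(g_corrected(np.array([1.0])))              # -> 0.02
```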

Case Study 1: Flow Reactor: 'Wrong Mechanism'
A flow reactor is assumed, where the system can be described by the following set of DAEs:

$$\frac{dc_i}{dV} = \frac{1}{F} \sum_j a_{ij} r_j \qquad (27)$$

where c_i is the concentration (mol/L) of the i-th reagent and a_{ij} is the stoichiometric coefficient of the i-th reagent in the j-th reaction with reaction rate r_j. The length of the reactor is L = 25 cm with cross-section A = 1.2 cm², and F (mL/min) is the inlet flowrate. The temperature dependence of the reaction rates is described using the Arrhenius law:

$$k_j = A_j \exp\left(-\frac{E_{a_j}}{R T}\right) \qquad (28)$$

where A_j denotes the frequency factor and E_{a_j} the activation energy. In this setting the design variables are the inlet flowrate F and the temperature T, i.e. u = [F, T]. The formulation in (28) tends to be sloppy [9]; for that reason a reparametrization of the rate constants around a reference temperature is used instead. The mechanism that takes place in the real physical system differs from the one assumed in the approximate model (reaction scheme omitted here). The parameter values assumed for the true kinetic model are given in Table 1, and the preliminary experiments in Table 2.
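To fix ideas, the sketch below integrates a plug-flow balance of the form (27)-(28) with scipy, under a hypothetical A → B → C scheme and made-up kinetic constants; the true mechanism and the values of Table 1 are not reproduced here.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Sketch of the flow-reactor balance (27)-(28) under a hypothetical A -> B -> C
# scheme. States are the concentrations along the reactor volume V; u = [F, T].
R_gas, A_area, L_len = 8.314, 1.2, 25.0          # gas constant, cm^2, cm

def rates(c, T, A1=1e6, Ea1=5e4, A2=1e4, Ea2=4e4):
    k1 = A1 * np.exp(-Ea1 / (R_gas * T))         # Arrhenius law (28)
    k2 = A2 * np.exp(-Ea2 / (R_gas * T))
    return np.array([k1 * c[0], k2 * c[1]])      # hypothetical first-order rates

def balance(V, c, F, T):
    a = np.array([[-1, 0], [1, -1], [0, 1]])     # stoichiometry a_ij
    return (a @ rates(c, T)) / F                 # dc/dV = (1/F) * sum_j a_ij r_j

sol = solve_ivp(balance, [0.0, A_area * L_len], [1.0, 0.0, 0.0],
                args=(10.0, 330.0), dense_output=True)
c_out = sol.y[:, -1]                             # outlet concentrations
```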
The following configurations are compared:
• MBDoE under parametric uncertainty (MC-MBDoE)
• MBDoE with disturbance estimation (DE-MBDoE)
• GP-based MBDoE (GP-MBDoE)
A D-optimal design criterion is used in all the experiment design configurations. The results are presented in the following paragraphs in terms of manipulated inputs, simulated profiles and a posteriori statistics of the final parameter estimation. To evaluate all the methodologies fairly, the same method for the propagation of uncertainty in the objective function is used: the posterior distribution of the parameters is approximated with a normal distribution, which is updated at every iteration. The parameters needed for the proposed algorithm are given in Table 3; typical values for t_i and R_i can be found in [11]. Notice that the trust region is defined for the normalized design variables, which lie in [−1, 1]. All the methods are applied in the same closed-loop manner, and the results are presented next. Fig. 9 depicts the satisfaction of the first constraint g_1 for the GP-MBDoE: the constraint remains feasible over the whole sequence of experiments, even though the design approaches the constraint boundary. The blue shaded area represents the 90% confidence interval of the GP. As expected, due to the restricted design space used in every consecutive optimization, the measurements of the real physical system lie within the confidence interval. Notice that the confidence interval shrinks as more measurements are acquired. This behaviour can also be seen in Fig. 4, where the trust-region radius R_1 increases as new measurements are acquired and the GP improves.
On the other hand, neither the MC-MBDoE nor the DE-MBDoE formulation managed to remain in the feasible region of the constraint g_1. This is not a surprise, as neither of these methods properly accounts for the structural model mismatch. It is noticeable from Fig. 5 that the solution resulting from the optimization is predicted to remain feasible with relatively large confidence intervals. However, the MC-MBDoE can only account for parametric uncertainty, for which the posterior of the parameters has been approximated, and the DE-MBDoE only allows the structural model mismatch to be constant. Additionally, the solution of the MC-MBDoE is computationally intensive, as it requires Monte Carlo sampling from the confidence region of the parameters; in this case 1000 samples were drawn at each iteration (more details in [16]).
Notice that the design with GP-MBDoE defines a test that is now both safe (feasible) and informative for the approximate model, as can be seen in Table 4, where the t-values for the model parameters are larger than t_ref. The parameters are updated with each additional sample that is obtained. Fig. 12 presents the trajectory over the iterations of the estimated parameter E_a2. The behaviour of the parameters is interesting, as their estimates appear to behave similarly to those of the MC-MBDoE, which is the least conservative/restrictive method (with the most infeasible designs). Notice that, due to the model mismatch, there are no 'true' parameters for the approximate model. The 'actual' predictive capabilities of the approximate models are depicted in Fig. 7. It is clear that, even though the proposed GP-MBDoE is more restrictive in order to obtain feasible designs, it achieves predictions as good as those of the Monte Carlo-based designs, which are infeasible.

Case Study 2: Flow Reactor: Corrupted Measurements
To further validate the proposed method, a different case study is considered in this section, where the model is now assumed to be correct but the disturbance is a function of the design. The second case study considers a nucleophilic aromatic substitution (SNAr) of 2,4-difluoronitrobenzene {1} with pyrrolidine {2} in ethanol (EtOH), giving a mixture of the desired ortho-substituted product {3}, with the para-substituted {4} and bis-adduct {5} as side products. We chose this reaction as a benchmark because it produces 3 different major products that are relevant for pharmaceuticals and fine chemicals [8]. The inlet flow rates of the bulk reagents 2,4-difluoronitrobenzene {1} and pyrrolidine {2} (F_1 and F_2), of the solvent ethanol (F_3), and the temperature T are the design variables: u = [T, F_1, F_2, F_3]. This case study has been adopted from [21], and the schematic of the reaction is depicted in Fig. 8. The molar balance of the reactor, using the chemistry of Fig. 8, follows the same plug-flow structure as (27); the values for the kinetic parameters of the real physical system are adopted from [21].
In this case study the assumptions of the kinetic model are correct; however, there is an equipment malfunction (e.g. HPLC, pumps) that results in a disturbance for 2,4-difluoronitrobenzene {1}. This systematic error is assumed to be unknown and leads to an overestimation of the concentration measurements by 10%. An additional measurement noise is considered, with zero mean and 10^{-3} M standard deviation. The constraint considered is c_1 ≤ 0.1, i.e. g_1 = c_1 − 0.1 ≤ 0, with probability P(g_1 ≤ 0) ≥ 0.9.
The design space is defined by lower/upper bounds on the design variables. As in the previous case study, a Latin hypercube is used to generate an initial set of 5 experiments (Table 9). In this case study a D-optimal design criterion is also used in all the experiment design configurations, and the same algorithm parameters as in case study 1 (Table 3) are employed; the results are presented next. Fig. 9 depicts the satisfaction of the first constraint g_1 for the GP-MBDoE: the constraint remains feasible over the whole sequence of experiments, even though the design approaches the constraint boundary. The blue shaded area represents the 90% confidence interval of the GP. As expected, due to the restricted design space used in every consecutive optimization, the measurements of the real physical system lie within the confidence interval. Notice that the confidence interval of the concentration predictions shrinks as more measurements are acquired. This behaviour can also be seen in Fig. 10, where the trust-region radius R_1 initially remains constant and then increases as more measurements are acquired.
On the other hand, none of the other formulations managed to remain in the feasible region of the constraint g_1.
This is not a surprise, as none of the aforementioned methods properly accounts for the structural model mismatch.
It is noticeable from Fig. 5 that the solution resulting from the optimization is predicted to remain feasible with relatively large confidence intervals. However, the MC-MBDoE can only account for parametric uncertainty, for which the posterior of the parameters has been approximated, and the DE-MBDoE only allows the structural model mismatch to be constant. Additionally, the solution of the MC-MBDoE is computationally intensive, as it requires Monte Carlo sampling from the confidence region of the parameters; in this case 1000 samples were drawn at each iteration (more details in [16]).
Notice that the design with GP-MBDoE defines a test that is now both safe (feasible) and informative for the approximate model, as can be seen in Table 4, where the t-values of the model parameters are larger than t_ref. As in the previous case, the parameters are updated with each additional sample. The parameter E_a2 is illustrated in Fig. 12 and initially does not pass the t-test. The behaviour of the parameters for each design appears to be similar, with the MC-MBDoE and DE-MBDoE producing smoother estimates as the parametric uncertainty reduces. Nevertheless, the parameter estimates are not asymptotically stable for MC-MBDoE and GP-MBDoE; this behaviour could potentially be caused by the structural mismatch present in the models. The 'actual' predictive capabilities of the approximate models are depicted in Fig. 7, where all the measured concentrations and predictions are shown. The predictions for all the designs appear similar; the noticeable differences are at the lower and upper ends of the parity plot, where the Monte Carlo and disturbance-based designs result in measurements of high concentration.
The result here is similar to case study 1: the proposed GP-MBDoE performs better than the MC-MBDoE, while producing feasible designs. In this case study, Table 7 shows that the GP-MBDoE outperformed the standard robust MBDoE methods (MC-MBDoE and DE-MBDoE) in terms of predictive capability.
Additionally, the GP-MBDoE managed to remain feasible in all the experiments.

Conclusions
In this article, a new methodology for safe MBDoE, based on GPs and trust regions, was proposed and discussed for the presence of structural uncertainty (GP-MBDoE). The optimal design of an experiment for improving parameter estimation is a particular form of optimization problem in which both the optimality and the feasibility of the designed experiment are important issues to consider. As parametric uncertainty and structural model mismatch affect both the design and the experiment's feasibility, a novel methodology was developed that exploits the information about the parametric system and the non-parametric nature of the GP to design the necessary constraint tightening. The proposed strategy restricts the feasible space, maintaining the optimal design in the feasible (safe) region of the state variables. Two simulated (in-silico) case studies were used to assess the effectiveness of the new technique. In the first case study, the methodology was applied to a model of a chemical reaction in a flow reactor where the assumed mechanism is wrong; in both case studies the proposed method produced informative designs while remaining feasible. Future work will aim at further improving the proposed approach by using methods related to Bayesian calibration to calibrate the approximate model itself [22] via a Bayesian framework: the parameters of the kinetic model can be seen as calibration parameters and estimated through their posterior distribution, which provides a non-asymptotic way of quantifying uncertainty. Additionally, the structural mismatch can potentially be diagnosed [40], and a different structure could potentially be found via artificial neural networks [41].