Bayesian nonparametric modelling of multiple graphs with an application to ethnic metabolic differences

We propose a novel approach to the estimation of multiple Gaussian graphical models (GGMs) to analyse patterns of association among a set of metabolites, under different conditions. Our motivating application is the SABRE (Southall And Brent REvisited) study, a triethnic cohort study conducted in the United Kingdom. Through joint modelling of pattern of association corresponding to different ethnic groups, we are able to identify potential ethnic differences in metabolite levels and associations, with the aim of gaining a better understanding of different risk of cardiometabolic disorders across ethnicities. We model the relationship between a set of metabolites and a set of covariates through a sparse seemingly unrelated regressions model and we use GGMs to represent the conditional dependence structure among metabolites. We specify a dependent generalised Dirichlet process prior on the edge inclusion probabilities to borrow strength across groups and we adopt the horseshoe prior to identify important biomarkers. Inference is performed via Markov chain Monte Carlo.


INTRODUCTION
Diabetes poses an enormous individual and societal burden, with high risk of major complications and diminished quality and length of life. Hence, it is imperative to understand causal mechanisms in order to identify those at highest risk and to tailor preventive and therapeutic measures for appropriate periods during the life course. The global epidemic of type 2 diabetes disproportionately affects non-European ethnic groups. South Asians (from the Indian subcontinent) form the largest ethnic minority group in the United Kingdom with prevalence of diabetes in South Asians estimated to be 2-4 times higher than that of the general population (Sproston & Mindell, 2006). People of African-Caribbean origin in United Kingdom, although fewer, are also at greater risk of developing type 2 diabetes, with prevalence also estimated at 2-4 times that of the general UK population (Sproston & Mindell, 2006). Research to date suggests that insulin resistance and differences in body fat distribution explain some of the ethnic differences in diabetes risk, but the underlying mechanistic pathways are poorly understood, although are likely to involve a complex interplay between environmental, behavioural, metabolic, genetic and epigenetic influences. The SABRE (Southall And Brent REvisited) population-based cohort was initiated in the late 1980s in north-west London with the aim of studying ethnic differences in cardiovascular disease and diabetes. Since then, it has accumulated a wide range of phenotypic and disease outcome data. The study includes people of European, South Asian and African-Caribbean descent, aged 40-69 years at baseline. Recently, metabolic analyses have been performed on over 3000 stored blood samples from the baseline and 20-year follow-up studies. Metabolomics is the large-scale study of metabolites, within cells, biofluids, tissues or organisms. Collectively, these metabolites and their interactions within a biological system are known as the metabolome. Measurements of over 200 metabolites or ratios of metabolites, obtained through nuclear magnetic resonance spectroscopy (Soininen et al., 2015) are available for more than 3000 stored baseline serum samples. Lipoproteins are classified according to their density (very-low-density, low-density, intermediate-density and high-density lipoproteins). Each lipoprotein subclass can be further characterised by its lipid composition (i.e. triglycerides, phospholipids, cholesterol esters and free cholesterol) and its particle size. The full list of metabolites included in the analysis is reported in Table S1 in Supplementary Material. We exclude from the analysis individuals with known diabetes at the time of the first visit. This is motivated by the fact that people with known diabetes were already receiving treatment that may alter their metabolite levels. Furthermore, we include in the analysis: (a) clinical markers, such as the homeostasis model assessment as an index of insulin resistance (Matthews et al., 1985)-an important risk factor for the development of diabetes; (b) three important enzymes (alanine aminotransferase, aspartate aminotransferase and gamma-glutamyl transferase) of which the first two are clinical biomarkers indicators of liver health while latter is used as a diagnostic marker for liver disease; and (c) anthropometric variables measuring the body fat distribution such as the waist-to-hip ratio (WHR). The full list of covariates included is given in Table S2 in Supplementary Material.
In this work, we focus on the SABRE study fasting baseline metabolic and phenotypic data set, with a view to identifying and elucidating potential mechanistic pathways to insulin resistance (and hence risk of developing type 2 diabetes), and to explore ethnic differences in these pathways. The statistical analysis poses several challenges: intersubject variability, the large number of variables under investigation and the high correlation between metabolite levels. There is a wealth of proposals in the statistical literature on how to tackle these problems. We employ the seemingly unrelated regression (SUR) model introduced by Zellner (1971). The SUR model can be seen as a generalisation of the linear regression model, where multiple regression equations, each one having its own dependent variable and possibly a specific set of regression covariates, are linked together by specifying a joint distribution on the error terms, which can exhibit a correlation structure. The SUR model offers a flexible modelling tool, but at the price of a large number of parameters to be estimated. To regularise posterior inference we adopt a sparse SUR approach, assuming a local-global shrinkage prior for the regression coefficients, that is the horseshoe prior (Carvalho et al., 2010), and we model the associations among metabolites employing a Gaussian graphical model (GGM, Dempster, 1972). Zeros in the error precision matrix are obtained by imposing a set of conditional independence constraints arising from an underlying graphical model (Lauritzen, 1996). Two common choices of prior distribution for the precision matrix are the G-Wishart prior of Lenkoski and Dobra (2011) and the Bayesian graphical Lasso of Wang (2012). The G-Wishart prior explicitly treats the graph as an unknown parameter leading to a direct inference of its underlying structure. However, the convergence of the posterior distribution can be slow due to the single edge update and the intractable normalising constant that needs to be approximated. On the other hand, the Bayesian graphical Lasso is fast, thanks to the continuous priors, which enable a block Gibbs sampler that updates the precision matrix one column at time. However, this method does not explicitly provide a treatment of the underlying graphical structure. The problem of estimating sparse matrices of regression coefficients and precision matrices jointly has been tackled before in the frequentist (Cai et al., 2013;Rothman et al., 2010), as well as Bayesian (Bhadra & Mallick, 2013;Deshpande et al., 2019) framework. The former modelling approach is usually based on 1 penalisation, while the latter exploits the specification of spike-and-slab prior distributions. In both frameworks, the estimation of the precision matrix is the most demanding part, as the parameter space is restricted to the cone of positive-definite matrices, often requiring computationally intensive algorithms. Here we use the stochastic search structure learning (SSSL) algorithm of Wang (2015) to specify the precision matrix prior distribution. The SSSL uses the best aspects of the G-Wishart and Bayesian graphical Lasso priors, enabling explicit structure learning while maintaining good scalability.
We specify a generalised Dirichlet process prior (GDP, Hjort, 2000), an extension of the well-known Dirichlet process (DP, Antoniak, 1974;Ferguson, 1973Ferguson, , 1974, on the edge inclusion probabilities, allowing for clustering of the edges and through the calibration of the GDP base measure we ensure the desired degree of sparsity in the graph. The GDP is a probability model defined on the space of probability distributions. Similarly to the DP (Sethuraman, 1994), the GDP can be defined through a constructive definition of the process. If a random probability measure P is distributed according to a GDP, with concentration parameter , mean parameter and base measure P 0 , then where 1 , 2 , … are iid realisations from P 0 and k is the Dirac measure that assigns unit mass probability in correspondence of the location k . The weights k are generated according to the stick-breaking construction: where k represents the expected value of the beta random variable and k is a concentration parameter, for k = 1, 2, … Here we use the more parsimonious parametrisation given by Hjort (2000), where k = and k = , for k = 1, 2, … By construction we have 0 ≤ k ≤ 1 and ∑ ∞ k=1 k = 1. We extend the GDP prior to multiple GGMs, enabling borrowing information between graphs under different biological conditions. More in detail, we assume a SUR model for the metabolites for each ethnic group. We model the error precision matrix in each group conditionally on an ethnic-specific graph, and we propose a joint nonparametric prior for the graphs. This strategy allows us to highlight common patterns and structural differences. In this context, each graph is characterised by the same set of nodes (representing the dependent variables of the SUR model), connected by a set of group-specific edges. Thanks to the clustering property of the GDP prior, we allow edges from different graphs to share the same edge probability and consequently to inform each other.
The paper is organised as follows. Section 2 introduces the sparse SUR model, the nonparametric prior on the edge inclusion probabilities and its extension to handle multiple GGMs. Section 3 illustrates the performance of the model on simulated data sets. Section 4 presents the analysis of the SABRE data, with a discussions of the clinical relevance of the findings. Finally, Section 5 concludes the work with a discussion of the main results and future directions.

METHODS
In this section, we review the main properties of the SUR model and its generalisation to sparse SUR. We also introduce the main properties of GGMs and we present our choice of prior distribution for the graph space based on the GDP. Finally, we generalise our modelling strategy to multiple GGMs.

Sparse SUR model
Consider M response variables y l , l = 1, … , M, each observed on n subjects, that is y l = (y l1 , … , y ln ) ′ , modelled as individual linear regressions where the X l is a n × p l response-specific matrix of explanatory variables, is a p l -dimensional vector of regression coefficients and u l = (u l1 , … , u ln ) is the n-dimensional vector of error terms, distributed as a multivariate normal, N (0, I n ), where I n is the identity matrix of dimension n × n.
The error terms are assumed to be correlated across equations. We denote by Ω the cross-equation precision matrix. We can rewrite the system of equations in a compact matrix form, as by concatenating the responses in a unique column vector y 1∶M of dimension Mn. X is now a block diagonal matrix of dimension Mn × Q, where Q = ∑ M l=1 p l is the total number of parameters. is a Q-dimensional vector containing all the regression coefficients. Here ⊗ denotes the Kronecker product. Note that the precision matrix of the concatenated error vectors implies that error terms within the same equation are independent (e.g. u lj and u li for j ≠ i), but error terms corresponding to the same subject in different equations are assumed to be correlated (e.g. u lj and u rj for l ≠ r). We shall denote the generic element of the regression coefficients vector by lj , which corresponds to the regression coefficient associated to the j-th covariate in the l-th equation.

Background on graphical models
We give a brief introduction to graphical models, following Lauritzen (1996). Let G = (V, E), with V = {1, 2, … , M} the vertex set and E ⊂ {(i, j) ∈ V × V:i < j} the edge set, be an undirected graph whose vertices are associated with a M-dimensional vector of variables y = (y 1 , … , y M ) following a multivariate normal distribution, N(0, Ω). Note that in this section, for simplicity of notation, we use y to denote the vector of variable corresponding to each node in the graph. The graph G can be represented by a set of r = M(M−1)∕2 binary variables Z = (z ij ) i<j , where z ij = 1 ⟺ e ij ∈ E, with e ij denoting the edge between node i and j in the graph G, for i, j ∈ {1, … , M}. Thus, r = M(M − 1)∕2 is the total number of possible edges in the graph G. There is a direct correspondence between the elements of the precision matrix Ω and the edges in the graph G. A missing edge in E implies ij = 0 (Wermuth, 1976), which in turn corresponds to a conditional independence assumption of y i and y j given the remaining variables y −ij , where y −ij denotes the elements of the random vector y excluding the i and j coordinates. The parameter Ω is constrained to belong to the cone PD G , that is the set of positive definite matrices with entries equal to zero for all e ij ∉ E.

Prior specification
We adopt the horseshoe prior of Carvalho et al. (2010) to impose regularisation on the regression coefficients . The horseshoe prior has the desirable property of being characterised by a singularity at zero to strongly shrink small or negligible coefficients, while leaving important coefficients unaffected thanks to its heavy tails. The horseshoe prior is specified as follows with j = 1, … , p l and l = 1, … , M. C + denotes the standard half-Cauchy distribution, 2 lj is the local shrinkage parameter, specific for the coefficient lj , while 2 l represents the overall shrinkage level for equation l. The choice of a half-Cauchy distribution results in aggressive shrinkage over small or negligible coefficients and is therefore suitable for variable selection in a Bayesian context. Carvalho et al. (2010) compares the performance of the variable selection based on (3) with that of a spike and slab prior (George & McCulloch, 1993), showing that the posterior selection given by the horseshoe is consistent with that of the spike and slab. See, for instance, Piironen and Vehtari (2017) for a discussion on how to set the horseshoe prior hyperparameters. To perform posterior inference, we adopt the conjugate sampler proposed by Makalic and Schmidt (2016), which allows a fast Gibbs sampling, avoiding working directly with the half-Cauchy distribution. Makalic and Schmidt (2016) exploit the following relationship. Let and be random variables such that We model the cross-equation precision matrix Ω with the SSSL prior of Wang (2015), specified as where Exp( | ) represents the exponential density with expectation 1∕ and 1 {⋅} is the indicator function. The normalising constant C( ), with = {v 0 , v 1 , , }, ensures that p(Ω) integrates to one over the space PD G . The parameters v 0 , v 1 are set to be small and large, respectively, in order to perform variable selection on the off-diagonal elements of the precision matrix. We do not impose regularisation on , fixing its value to 1 as done in Wang (2015). The prior on is discussed later. The first product in Equation (6) involving the off-diagonal elements of Ω, is a mixture of two normal distributions. The second product multiplies M exponential densities for the diagonal elements of Ω. Now, recalling the connection between the graph G and its binary representation through the adjacency matrix Z = (z ij ) i<j , (6) can be rewritten as where v 2 z ij = v 2 1 if z ij = 1 and v 2 z ij = v 2 0 if z ij = 0, and C( ) and C(Z, v 0 , v 1 , ) are normalising constant for the respective densities. The joint distribution p(Ω, Z| ) admits (6) as a marginal distribution for Ω. In the representation in Equations (7) and (8), small values of v 0 give high probability to the event z ij = 0, so that the distribution of ij is concentrated around 0, implying that the correspondent edge will have a close-to-zero probability to be included in the graph G. For an appropriately chosen large value of v 1 , the event z ij = 1 implies that the distribution of ij is the diffuse component N(0, v 2 1 ) and so ij can be estimated to be substantially different from zero (Wang, 2015). See also Malsiner-Walli and Wagner (2018) for a comparison of spike and slab priors.
The choice of v 0 and v 1 = v 0 × h is important to ensure a good mixing of the MCMC and quick convergence to the true posterior distribution. The value of v 0 should be such that if the evidence is in support of z ij = 0 then ij is small enough to be replaced by zero. Wang (2015) discuss the choice of v 0 and h and observe that, with standardised data, the MCMC converges quickly with v 0 ≥ 0.01 and h ≤ 1000. Finally, choosing a value for is easier, as with standardised data, a choice of = 1 assigns probability to the entire region of plausible values for the inverse variances ii . There is a wealth of literature regarding the choice of the prior distribution for ij , the edge inclusion probability. See, for example Carvalho and Scott (2009) and Tan et al. (2017) for a review of some popular methods. In this paper, we adopt a nonparametric Bayesian approach to model the uncertainty about the inclusion probabilities, allowing for clustering of the edges and the possibility to impose sparsity on the graph. We specify a GDP prior on ij as follows The choice of a nonparametric prior allows for flexible modelling of the edge inclusion probabilities. Moreover, we can tune the hyperprior parameters characterising the base measure P 0 to achieve the desired level of sparsity. The parameters and control the clustering structure of the GDP (note that posterior clustering depends also on the choice of the base measure). The choice of the hyperparameters depends on the particular application. The model for e ij is then given by: The above equations defines a GDP Mixture model (GDPM, Lo, 1984;Barcella et al., 2017) for Recalling the discrete nature of the GDP. we can rewrite (10) as where the k denote the (unique) locations of the GDP prior.

Degree distribution
One of the main consequences of choosing a GDP prior is that the edges are clustered on the basis of their inclusion probability. A priori, the GDP does not constrain the number of clusters to a finite value, indeed their number can grow as new data become available. Only a posteriori, once we observe the data, the estimated number of clusters is finite, potentially equal to the number of edges. We now investigate the possible graph structures supported by a GDP prior. We follow the framework of Tan et al. (2017) and describe some properties of the degree distribution. The degree D i of a node i is the number of connections that involve node i, so D i = ∑ j≠i e ij , where e ij is the edge connecting nodes i and j. The degree D i is then bounded between 0 and M − 1, the total number of nodes minus one. The following properties hold (proofs in Supplementary Material): 1. Conditionally on ij , the probability that a node i is connected to a node j is ij . 2. The degree of a node i is distributed as a mixture of Binomial distributions, with mixing weights given by the GDP where, once again for ease of notation, we have substituted the index (ij) with k. We have 3. Marginalising over the random measure, we obtain (see Supplementary Material): The shape of the degree distribution highlights structural characteristics of the graph implied by the prior choice, which are relevant in data analysis. In particular, we focus on sparsity. In a dense graph, each node is connected to many others and, as a consequence, there are few pairwise conditional independences, while a sparse graph presents fewer connections and hence the graph can be decomposed into subgraphs defined by conditional independence structures. A careful choice of prior hyperparameters allows us to obtain the desired level of sparsity, retaining at the same time a good level of flexibility. To better understand the shape of the degree distributions implied by the GDP prior in Equation (9), we perform a sensitivity analysis for different values of and and different parametrisation of the base distribution P 0 . Figures S1 and S2 in Supplementary Material present the resulting degree distribution for different combinations of hyperparameters. It is evident that our prior choice is able to accommodate different shapes. However, simulations show that, by appropriate choice of hyperparameters, we can obtain an exponential decay in the tails of the functions, but not a power law decay.

Multiple GGMs
Often in applications we observe groups of subjects under different experimental conditions. In the SABRE study, for example, we are interested in understanding how patterns of association between metabolites vary across three different ethnicities, in particular in relation with cardiovascular diseases and diabetes. In our application, ethnicity defines three natural subsamples, each characterised by its own graph. In general, we expect different groups to share some common structure as well as group-specific connection patterns. Estimating a single graphical model would lead to an implicit assumption of homogeneity of the underlying graphs across the ethnicities, with a consequent loss of information about their heterogeneity and a consequent high risk of false positives. On the other hand, inferring each graph individually might lead to a loss of power given the reduction in sample size. There is a growing research interest in multiple graphical models. For example, Saegusa and Shojaie (2016) estimate multiple graphs specifying a global penalisation and using optimisation techniques, while in a Bayesian framework Peterson et al. (2015) propose a joint model for multiple GGMs employing a Markov random field prior, encouraging sharing of common edges. The Markov random field prior is also used by Lin et al. (2017) for multiple graphs presenting both spatial and temporal dependence. Also relevant are the works of Tan et al. (2017), which propose a multiplicative prior to capture common and group-specific structures, and of Bilgrau et al. (2020), which presents a penalisation approach to estimate multiple precision matrices, allowing for the incorporation of prior information. We propose to model multiple graphs through an extension of the GDP prior, that is the dependent generalised Dirichlet process (DGDP, Barcella et al., 2017). Due to the discrete nature of the DGDP, each edge can be clustered together with any other edge, independently of the g-th group of origin. This ensures sharing of structural information among groups, at the same time maintaining parsimony in the number of parameters to be estimated. This strategy also allows detecting group-specific connections. Suppose we observe R groups, for example defined by ethnicity in the SABRE study. Each sub-sample g is characterised by a specific sample size n g and its own graph G g , for g = 1, … , R.
Here, we assume that the vector of regression parameters is common to all groups, although this assumption can be easily relaxed. The prior distributions in Equations (7) and (8) are generalised to handle multiple precision matrices Ω g , and therefore multiple adjacency matrices Z g as follows: The hyperparameters v 2 0 and v 2 1 remain unchanged and are common to all groups. We can see that, conditional on the inclusion probabilities g,ij , g , v 2 0 and v 2 1 , Equations (11) and (12) are independent across groups. The prior in Equation (9) on g,ij can be extended in the presence of multiple groups, so that the random measures associated to each group are dependent. Dependence can be introduced in the weights of the stick-breaking representations, by allowing k to be a function of a categorical x, identifying the group. Note that dependence on other group-specific covariates (when available) can be easily introduced. The resulting process is called DGDP, which is defined as follows. Let be the random measure associated to the g-th group, for g = 1, … , R. The locations are iid draws from a common base measure P 0 , as before. The weights still admits the stick-breaking representation: Each kg has a beta distribution, Beta( g , (1 − g )), but now g is group specific. (Barcella et al., 2017) propose to introduce dependence across the { g } employing a beta regression framework and letting the g depend on a categorical covariates denoting group. Using the DGDP, the model in Equation (9) can then be extended to the multiple graphs as follows, for g = 1, … , R: where x g is a categorical design vector of dimension R which includes an intercept term and identifies the group from which the observations come from.
is a vector of regression coefficients, to which we assign a normal prior. In our application, the European ethnicity is the reference group. The DGDP process offers a convenient way to share information across different groups and ensures a greater flexibility than the GDP thanks to the richer parametrisation. Note that x g can include other group specific covariates when available. The MCMC algorithm for posterior inference from a DGDP process is based on a truncation of the infinite mixture (Ishwaran & James, 2001). A discussion on how to choose the truncation level can be found in Ishwaran and James (2001) and Barcella et al. (2017). Details of the MCMC algorithm can be found in Supplementary Material.

SIMULATION RESULTS
We perform a simulation study to investigate the efficacy of the proposed model. We simulate data from multivariate normal distributions, focussing on the estimation of the precision matrix and the multiple graphs. We present here two simulation scenarios, while further simulated examples can be found in Section 4 of Supplementary Material, together with details on the required computational times.
To assess the performance of the proposed model, we consider 20 replicas for each scenario described and we compare the resulting estimates with existing methods. In particular, we consider: (a) the ANOVA-DDP Ishwaran and James (2001) and Barcella et al. (2017); (b) a parametric version of the proposed SSSL model, in which the edge-inclusion probabilities are beta distributed and group specific; (c) the Bayesian structure learning model of Mohammadi et al. (2015), based on a birth-death MCMC in a standard conjugate model specification involving a multivariate normal likelihood and a G-Wishart prior distribution for the precision matrix, available through the R package BDgraph; (d) the graphical Lasso (Friedman et al., 2008) as implemented in the R package glasso; (e) the graphical group Lasso (Danaher et al., 2014), which imposes an additional regularisation between multiple precision matrices to enforce a similar structure, as implemented the R package JGL. We use different metrics for the comparisons: (a) the receiver operating characteristic-area under the curve (ROC-AUC), which is a normalised measure of the area under the ROC curve created by plotting the true positive rate against the false positive rate at various thresholds; (b) the mean square error (MSE), evaluated as the mean squared difference between the true precision matrices and the estimated ones; (c) the MSE restricted to the set of non-zero entries present in the simulated graph. Notice that some of the models implemented for comparison purposes do not allow for direct estimation of the multiple graph structures, namely the graphical Lasso and the graphical group Lasso. Therefore, these two methods are only compared in terms of MSE in the following analyses. Throughout all simulations, we run the Bayesian algorithms for 15000 iterations (of which 10,000 are discarded for the burn-in period) and we use the following parameter specifications:

Scenarios with 20 nodes and 4 groups
We first generate four multiple graphs following the guidelines of Peterson et al. (2015). We construct four precision matrices Ω 1 , Ω 2 , Ω 3 and Ω 4 corresponding to graphs G 1 , G 2 , G 3 and G 4 , of M = 20 nodes (for a total number of possible edges of r = M(M − 1)∕2 × 4 = 760). We first define the precision matrix Ω 1 and then we derive the others as a perturbation of the first. Ω 1 is a M × M symmetric matrix with the main diagonal elements equal to one, first off-diagonal elements i,i+1 = i+1,i = 0.5, for i = 1, … , 19 and second off-diagonal elements i,i+2 = i+2,i = 0.4, for i = 1, … , 8, while the rest of the elements are set to zero. This defines an AR structure for the element of Ω 1 . The total number of non-zero off-diagonal elements is 37. To construct Ω 2 , we remove ten edges at random from Ω 1 , setting the corresponding entries to zero. Then, we randomly add ten edges that are not present in Ω 1 , giving a value of 0.5 to the new precision coefficients. The procedure is repeated similarly for Ω 3 and Ω 4 , avoiding the replacement of edges that were previously modified. The newly created matrices are not necessarily positive definite, to this end, we compute the nearest positive-definite approximation through the R function nearPD (Higham, 2002), from the R package Matrix. The precision matrices Ω 2 , Ω 3 , Ω 4 constructed with this procedure are a perturbation of Ω 1 : as a result they exhibit some common edges and some group specific connections. The number of observations is 60, 50, 50 and 40 for group 1, 2, 3 and 4, respectively. The second simulation scenario is similar to the first, but Ω 1 is now a unit diagonal matrix and we add 60 non-zero off-diagonal elements, chosen randomly from the r possible edges, and fix the corresponding entries of Ω 1 to 0.5. Ω 2 , Ω 3 and Ω 4 are constructed removing 10 edges and adding 10 new edges, randomly selected as before. Once again the number of observations is fixed to 60, 50, 50, 40 for group 1, 2, 3 and 4, respectively.
The ROC-AUC distributions for 10 simulated data sets for the first and second scenarios are displayed in Figure 1. The distributions are concentrated between 0.7 and 1 for all groups in the scenario with the AR structures, denoting the ability of all models to recover the true graphical structure. In the second scenario, the performance is very similar across models, but none of them seems to be able to effectively estimate the edges in the graph.
The boxplots of the MSE distributions are displayed in Figures 2 (full precision matrix) and 3 (considering only the non-zero entries of the precision matrix) for all models considered. The error metric takes low values in general, with the Lasso-based algorithms achieving the lowest values of MSE in the random structure scenario. The DGDP model shows an almost identical performance when compared with the ANOVA-DDP and the parametric models in both scenarios and all groups. The BDgraph yields higher MSE values in most groups and scenarios. In general, while the comparison between the models yields similar results in Figures 2 and 3  overall the MSE values are higher when only the non-zero entries of the precision matrix are considered. Section 4.1 of Supplementary Material shows the results relative to an additional simulation based on the one just described, but characterised by M = 200 nodes and two groups with n 1 = n 2 = 100 in the first scenario, while n 1 = 100 and n 2 = 50 in the second one. The comparison with alternative models, based on the computation of ROC-AUC and MSE values for 10 replicas, shows comparable performance of the proposed model with the ANOVA-DDP and parametric models, and improvements with respect to the BDgraph and Lasso-based methods.

Scenario with 91 nodes and 3 groups
We investigate the model performance on three imbalanced groups as in the SABRE study. We construct three graphs, with M = 91 nodes each, which is equal to the number of metabolites used in the application presented in Section 4. The precision matrices of each group are set equal to the empirical precision matrix among all metabolites of each ethnic group in the SABRE study, setting to zero those elements smaller than 0.1 in absolute value. The resulting graphs have 683, 706 and 2259 non-zero edges, respectively. The new empirical precision matrices are not positive definite; therefore, we use as before the R package nearPD (Higham, 2002). We simulate observations for each group using the same sample sizes of the three groups in the SABRE study, that is 1103, 978 and 119.
In Figure 4, we show the ROC-AUC for each group. We can see that the ROC-AUC values are concentrated around 0.9 for the DGDP model on the first and second group, and are higher than 0.6 for the third group, which is clearly the most difficult to accurately estimate because of the much smaller sample size. In general, the DGDP model outperforms the alternative models considered.
In Figure 5 we report the boxplots of the MSE, where we can notice a good performance of the proposed model when compared to BDgraph and Lasso-based models. The performance in terms of MSE compared to the ANOVA-DDP and the parametric models is slightly worse for the DGDP, but comparable. Once again, the MSE values are in general higher when only the non-zero entries of the precision matrix are considered.
We also examine the posterior distribution of the partition induced by the proposed model, in comparison with the one obtained from the Anova-DDP model. In Figure 6, we present a summary of the posterior distribution of the number of clusters K ⋆ within each group. The distribution of the number of clusters is concentrated on lower values for the DGDP model, indicating that the proposed model favours coarser partitions. Considering the partition estimate obtained by minimising the Binder loss function (Binder, 1978;Lau & Green, 2007), the medians and ranges of the number of clusters across replicas for the three groups are 1 (1, 2), 1 (1, 3) and 1 (1, 2) for the DGDP. For the ANOVA-DDP, they are equal to 6 (1, 9), 6 (1, 10) and 5 (1, 10), respectively. To better highlight the features of the partitions implied by the two models, in Table 1, we report summary statistics for the following features of the partitions: the number of singleton clusters (posterior median and range) and the size of the largest cluster (posterior median and range). The results are averaged over the 20 replicas. The Table shows that the DGDP provides a more parsimonious representation of the data in terms of number of singleton clusters and size of the largest cluster. Section 4.2 of Supplementary Material shows the results for an additional simulation similar to the one described in this section, with the inclusion of a non-zero mean term. The comparison shows a general difficulty in estimating the graph structure in the smallest group, especially when the graph structure in the third group is characterised by a large number of edges, as in our case.

SABRE RESULTS
In this Section, we fit the proposed model for multiple GGMs to the SABRE metabolic data set. The data set described in Section 1 has a total of 2200 observations, stratified in three ethnicities, 1103 Europeans, 978 South Asians and 119 African-Caribbean. The number of nodes (i.e. the number of equations in the SUR model) is M = 91, a list of which can be found in Table S1 in Supplementary Material. The number of metabolites is reduced here from 200 to 91 because we focus on the absolute concentrations of the metabolites and we exclude the ratios between the concentrations. As predictors in the regression term of the mean, we include the covariates listed in Table S2, consisting of measures of body-fat distributions, liver health and other risk factors, such as smoking habits, sex and age (the total number of the covariates is p = 18 with an additional intercept). All the covariates are included in each equation, but variable selection is equation specific. We specify the following prior distributions. The scale parameters for the normal mixture in Equation (11) are chosen to ensure sparsity in the estimated graph, so that negligible and small off-diagonal coefficients of the precision matrix are set to zero. We choose v 0 = 0.01 and h = 100, while g = 1 following the recommendations of Wang (2015). The DGDP base measure P 0 is a Beta(a = 0.5, b = 0.5). The concentration parameter is assigned a Gamma( a = 0.1, a = 2) prior, while the vector of coefficients in the beta regression is given a normal distribution with mean = 0 and covariance matrix Σ = 10 × I R . We specify a horseshoe prior for regression coefficients as described in Equation (5). We run the MCMC for 12,000 iterations, comprising a burn-in period of 2000 iterations and thinning every 5 iterations. In addition to the multiple graphs, using the output of the MCMC algorithm, we also estimate the differential networks (De la Fuente, 2010;Valcárcel et al., 2011) arising from the pairwise comparison between the three ethnicities. A differential network includes all the edges that are present only in one of the two groups (i.e. present in one group and not the other and vice-versa), thus helping us to understand the main differences between two ethnicities. Here we focus mainly on the differences between Europeans and South Asians, since the African-Caribbean ethnicity has a very small sample size that heavily affects the estimation of the latent graph, as it was also observed in the simulation scenarios with analogous sample sizes of Section 3.2 and 4.2 in Supplementary Material.
In Figure 7 are shown the posterior means of the regression coefficients lj , for equation l = 1, … , M and covariate j = 1, … , p. The only covariates that do not show association with any metabolite are waist-to-hip ratio and diastolic blood pressure. It is worth noting that WHR has a strong positive correlation with some of the other measures of adiposity, such as sagittal TA B L E 1 Scenario with 91 nodes and 3 groups: summary statistics of the number of singleton clusters and size of the largest cluster (posterior median and range) under the dependent generalised Dirichlet process (DGDP) or ANOVA-DDP models. The estimates are obtained by averaging over the 20 replicas

Summary
Group DGDP ANOVA-DDP diameter, which can result in the selection of a variable over the other. The same scenario applies to the variable diastolic blood pressure, which is positively correlated with systolic blood pressure. We summarise the attributes of the three group-specific networks inferred by the proposed model (i.e., Europeans, South Asians and African-Caribbean) in Table 2, while their plots can be found in Figures S11, S12 and S13 of Supplementary Material. Table 2 shows the main features of the individual networks, namely: the number of inferred edges, the number of connected and isolated nodes; the average edge and node betweenness (i.e. the number of shortest paths passing through an edges or node); the average node degree (i.e. the number of connections departing a node) and the average clustering coefficient (also called transitivity, i.e. the probability that the adjacent nodes of a node are connected). All these measures are computed using the R package igraph. The individual networks are characterised by a high number of edges, especially in the European and South Asian groups, connecting all nodes in the networks. In particular, the first two groups are very similar in all the features reported in Table 2, while the African-Caribbean group is characterised by less connections and lower betweenness and clustering coefficient. This indicates different network organisations and metabolite interactions within the three ethnic groups, especially when comparing the European and South Asian groups with the African-Caribbean group.
In Figure 8, we show the differential network between Europeans and South Asians, where an edge between two nodes is added to the differential graph if the posterior mean of the absolute difference between the adjacency matrices in the corresponding position is higher than 0.5. This quantity is computed by averaging over the posterior MCMC chain. It is worth noting that there are no edges among the majority of lipoproteins subfractions, which implies that the presence or absence of those connections are shared by both of these ethnicities. On the other hand, the majority of the amino acids have some distinct connections, highlighting potential differences in the underlying metabolic processes. For example, the amino acid Histidine features many connections with other amino acids and a subset of lipoprotein subfractions, with these edges only present in the South Asian group. Other central nodes in the differential network are acetoacetate, acetate, pyruvate and lactate.
We also show the differential networks between Europeans or South Asians and African-Caribbean in Figures S14 and S15 in Section 5 of Supplementary Material. These differential networks have more edges compared to the one between Europeans and South Asians, due to the fact that the network for the African-Caribbean ethnic group is much sparser. Therefore, in order to highlight the few connections unique to the Africans Caribbean, we limit the inclusion of differential edges in the Europeans and South Asian groups to those with an inclusion probability higher than 0.8.
To gain a better understanding of the estimated connections and to relate the estimated graph to known metabolic pathways, we conduct a pathway over-representation analysis (ORA) using the online software MetaboAnalyst (Chong et al., 2018). We include in the analysis all metabolites that have a connection in the differential network of Figure 8. ORA evaluates statistically the fraction of metabolites in a particular pathway found among the user-specified set of metabolites, in our case, the metabolites with connections in the differential network. For each pathway, input metabolites that are part of the pathway are counted. Next, every pathway is tested for over or under-representation in the list of input metabolites using the hypergeometric test. The most represented pathways are the ones with smaller p-value levels and higher number of over-represented metabolites. Here we discuss the first four top-ranked conditions identified by this pathway analysis, pyruvate dehydrogenase deficiency (E3), pyruvate carboxylase deficiency, diabetes mellitus (MODY) non-insulin-dependent and chronic progressive external ophthalmoplegia (CPEO)∕Kearns-Sayre syndrome (KSS). Pyruvate dehydrogenase and pyruvate carboxylase Europeans only South−Asians only F I G U R E 8 Southall And Brent REvisited study: differential network between Europeans and South Asians groups. The red edges represent the connections between metabolites for the European ethnicity whose probability of inclusion in the differential network is higher than 0.5, while blue edges represent the connections between metabolites for the South Asian ethnicity whose probability of inclusion in the differential network is higher than 0.5. [Colour figure can be viewed at wileyonlinelibrary.com] deficiency are the most common disorders in pyruvate metabolism. Pyruvate dehydrogenase (PDH) is an enzyme complex made of three catalytic subunits, pyruvate dehydrogenase (E1), dihydrolipoamide acyltransferase (E2) and dihydrolipoamide dehydrogenase (E3), and two cofactors, thiamine pyrophosphate and lipoic acid. The enzyme complex converts pyruvate, after it enters the mitochondria, into acetyl-CoA, that together with oxaloacetate, are two essential substrates in the production of citrate. PDH complex deficiency therefore leads to a limited production of citrate and because citrate is the first substrate in the tricarboxylic acid cycle, the cycle is blocked and other metabolic pathways need to be stimulated to produce acetyl-CoA. However, the most common deficiency involves the E1 subunit, while mutations in E2 and E3 are less often the cause for PDH Complex Deficiency. The enzyme defect causes more pyruvate to be metabolised to lactate and leads to lactic acidosis (Bissonnette & Bissonnette, 2006). Overall, PDH complex plays a key role in regulating the supply of adenosine triphosphate during the feed-fast cycle, where cells must select fatty acid or glucose as energy source. Therefore, PDH Complex is important in regulating the glucose metabolism with PDH deficiency related to metabolic diseases, e.g. type 2 diabetes and obesity (Lee, 2014). Of particular interest is pyruvate carboxylase deficiency. Lao-On et al. (2018) explore the roles of pyruvate carboxylase in human diseases, such as diabetes. Pyruvate carboxylase (PC) is an anaplerotic enzyme which plays an essential role in various cellular metabolic pathways, including gluconeogenesis and glucose-induced insulin secretion. Pyruvate originates as the final product of the pathway pyruvate. In aerobic conditions, pyruvate enters mitochondria via the mitochondrial pyruvate carrier, where may be further metabolised in two different means. In non-gluconeogenic tissues, like muscles and brain, pyruvate is decarboxylated to form acetyl-CoA catalysed by the pyruvate dehydrogenase complex. In gluconeogenic tissues, where pyruvate carboxylase is highly abundant, most of pyruvate entering mitochondria is carboxylated by the enzyme pyruvate carboxylase to form oxaloacetate. Given the importance of oxaloacetate in various biochemical pathways, perturbation of oxaloacetate production by PC can produce serious diseases such as type 2 diabetes or neurological disorder. MODY is an autosomal dominant monogenic disorder of pancreatic beta cells that usually manifests itself before the age of 30 and accounts for 1%-3% of diabetes in this age group (Misra & Owen, 2018), although the prevalence of MODY in South Asians is low, despite their increased risk of type 2 diabetes (Ehtisham et al., 2004). Finally, CPEO is one of the most common mitochondrial disorders in adults. The main symptom is a slowly progressive extra-ocular muscle weakness. KSS and CPEO are probably the same disorder but differ in the degree of severity (Gilman, 2011). In both CPEO and KSS, hearing loss and diabetes mellitus can precede the onset of muscle involvement by years (Shoffner et al., 1990). Additionally, involvement of systems other than muscle is common in CPEO. Multi-system involvement can cause functional impairments secondary to dysfunction of (proximal) skeletal muscles, retina, cochlea, cerebrum, cerebellum and heart (Smits et al., 2011). Ocular manifestations include retinopathy, optic atrophy, and rarely, cataracts. Cardiac manifestations include cardiac conduction block and cardiomyopathy. Cerebral manifestations include epilepsy, cerebellar ataxia and dementia. The peripheral nervous system can also be affected, typically with axonal sensory neuropathy. Endocrine involvement includes diabetes mellitus, hypothyroidism, hypoparathyroidism and hypogonadism. Sensorineural hearing loss and gastrointestinal involvement are also possible (Vorgerd & Deschauer, 2011). In interpreting the pathway over-representation analysis, we note that all the highlighted conditions are associated with defective mitochondrial function and altered pyruvate metabolism. We therefore speculate that alterations in metabolic flexibility and mitochondrial aerobic metabolism (Smith et al., 2018) may be a fruitful area for further study. Consistent with this suggestion, a previous small experimental study reported that overweight South Asian men had impaired metabolic flexibility compared with matched European counterparts (Bakker et al., 2015) and we have previously observed poorer oxidative capacity in skeletal muscle independent of diabetes in South Asians compared with Europeans (Jones et al., 2020).
To evaluate the sensitivity of the model to the prior choice of v 0 , playing a major role in determining the level of sparsity of the graphs, we repeat the analysis for v 0 ∈ {0.01, 0.1}. A comparison of the estimates given v 0 = 0.01 and v 0 = 0.1 shows differences in the number of edges included in each of the three graphs and consequently in the differential networks. However, the main characteristics of the individual graphs and the differential network are maintained. For example, the differential network between Europeans and South Asians estimated with v 0 = 0.1, reported in Figure S16 in Supplementary Material, highlights similar connectivity patterns to the one estimated with v 0 = 0.01, for example the amino acid Histidine presents connections with other amino acids and lipids subfractions, with these edges being only present in the South Asian group, as before. Furthermore, varying the prior values of a , b between 0.01 and 0.5 does not lead to substantial changes in posterior inference.

CONCLUSIONS
This paper proposes the use of a GDP prior on the edge inclusion probabilities of a GGM together with the SSSL prior on the precision matrix. The model allows the specification of a desired level of sparsity in the graph and the inclusion of prior information about specific connections between pairs of nodes, when prior knowledge is available, for example, from literature or from expert opinions. We analyse the properties induced on the graph by the GDP prior in terms of the degree distribution. We demonstrate that this prior is able to capture a wide range of structures, from sparse to more dense graphs. The GDP prior allows us to cluster a posteriori the edges based on their inclusion probabilities. Using an extension of the GDP process, the DGDP, we develop a framework for inference on multiple GGMs. The DGDP offers a convenient way to share information across groups and allows for the possibility to include group specific information in the model. The SSSL prior ensures good scalability of the MCMC thanks to its efficient update scheme and good convergence rates. The SUR model is completed by specifying a global-local shrinkage prior on the coefficients in the mean regression term, allowing each equation to have its own vector of regression parameters and its variable selection. The horseshoe prior effectively shrinks small and negligible coefficients to zero, while leaving important coefficients unaffected thanks to its heavy tails, as such performing (group-specific) variable selection. We illustrate the performance of the proposed model, and compare it with an alternative nonparametric prior on the edge inclusion probabilities (the ANOVA-DDP prior) in a simulation study. The results highlight the ability of the model to recover the true underlying structure of the graphs and to correctly identify association between covariates and response. Finally, we employ the proposed sparse SUR model to analyse the SABRE metabolomics data set. Our clinical interest focuses on different patterns of metabolite associations within the three ethnicities. Our approach allows us to provide an interpretable set of unique associations patterns which can aid mechanistic understanding of between-group differences in the development of insulin resistance and diabetes and can highlight areas for further research. In doing this, we still correct for potential confounders within the SUR framework. Our findings are interesting because they can lead to formulation of new hypothesis, for example metabolic pathways associated with diabetes and cardiovascular disorders, and guide further experimentation.
Clinical Research Network (NIHR CRN). The authors are extremely grateful to all the people who took part in the study, and past and present members of the SABRE team who helped to collect the data.