A deep neural network framework for real-time on-site estimation of acceleration response spectra of seismic ground motions

Various earthquake early warning (EEW) methodologies have been proposed globally for speedily estimating information (i.e., location, magnitude, ground-shaking intensities, and/or potential consequences) about ongoing seismic events for real-time/near real-time earthquake risk management. Conventional EEW algorithms have often been based on the inferred physics of a fault rupture combined with simplified empirical models to estimate the source parameters and intensity measures of interest. Given the recent boost in computational resources, data-driven methods/models are now widely accepted as effective alternatives for EEW. This study introduces a highly accurate deep-learning-based computational framework named ROSERS (i.e., Real-time On-Site Estimation of Response Spectra) to estimate the acceleration response spectrum ( S a ( T )) of the expected on-site ground-motion waveforms using early non-damage-causing early p -waves and site characteristics. The framework is trained using a carefully selected extensive database of recorded ground motions. Due to the well-known correlation of S a ( T ) with structures’ seismic response and resulting damage/losses, rapid and accurate knowledge of expected on-site S a ( T ) values is highly beneficial to various end-users to make well-informed real-time and near-real-time decisions. The framework is thoroughly assessed and investigated through multiple statistical tests under three historical earthquake events. These analyses demonstrate that the overall framework leads to excellent prediction power and, on average, has an accuracy above 85% for hazard-consistent early-warning trigger

waveform to trigger alerts for preparing against the high-amplitude (destructive) transverse s-waves. Typically, p-waves are 1.7 times faster than s-waves in any given material (Kramer, 1995). EEW systems exploit this time lag between the arrival of the p-waves and s-waves to predict various parameters of interest for real-time decision-making purposes.
EEW systems mainly consist of two general sub-classes: (1) regional and (2) on-site. A regional EEW system is a network-based system consisting of seismic sensors installed near a seismic source zone. When an earthquake is detected, a regional EEW system uses the information from the initial seconds of the ground motions (mainly p-waves) recorded at several stations close to the source to alert target sites farther from the source. The alert is generally based upon the rapid estimation of the earthquake location and magnitude, which are then used as inputs to pre-calibrated ground-motion models (GMMs). These GMMs estimate the distribution of intensity measures (IMs) of the expected ground-motion at a given site (Bhardwaj et al., 2016;Li et al., 2013;Mu & Yuen, 2016). This type of approach leads to large variabilities in the IM estimates due to the propagation of uncertainty for estimating source parameters (location and magnitude) and the uncertainties, both epistemic and aleatory, in GMMs (Iervolino et al., 2009). Various regionals EEW methodologies attempt to directly predict ground-motion IMs at target sites bypassing the source parameter estimation (Hoshiba & Aoki, 2015;Münchmeyer et al., 2020). However, these approaches are not ideal for regions in the vicinity of a seismic source (generally known as blind zones) due to multiple reasons such as longer computation times and the requirement of a dense array of sensors, limiting the handing of maximal information in a real-time setting. Some studies, such as Panakkat and Adeli (2009) and Rafiei and Adeli (2017), have also proposed detailed methodologies to forecast an earthquake event parameter (such as location), weeks to days before its occurrence (such methods fall under a separate class of short-term earthquake forecasting frameworks, known as operational earthquake forecasting [OEF]). Panakkat and Adeli (2008) presented several advances made in OEF methods.
On the other hand, an on-site EEW system is a standalone system based on a single sensor (or a small array of sensors) located in the proximity of the target sites. Within this setting, the characteristics of early p-waves observed on-site are used to estimate the earthquake source parameters or the IMs of the expected ground motion that mainly comprise the late s-waves and surface waves at the same site. This approach is specifically beneficial for sites located within blind zones as the expected shaking levels are estimated for the same site observing the ground-motion, thereby requiring minimum time. Most of the on-site EEW approaches utilize pre-trained parametric or nonparametric models to estimate amplitude-based, energybased, frequency-based, and duration-based IMs using the initial seconds of the ground-motion waveforms in realtime (e.g., Brondi et al., 2015;Caruso et al., 2017;Colombelli & Zollo, 2015;Hsu et al., 2013;Münchmeyer et al., 2020;Wu & Kanamori, 2008). However, the development of such methods requires careful attention in terms of using IMs that provide (statistically) sufficient and efficient information for real-time decision-making. For instance, most of the existing on-site EEW algorithms (such as Caruso et al., 2017;Colombelli & Zollo, 2015;Hsu et al., 2013;Münchmeyer et al., 2020) are trained to estimate scalar quantities such as peak ground acceleration (PGA) or peak ground velocity (PGV). These only provide information about the shaking of the ground and not their potential consequences on the structures/infrastructure in the area. The use of poorly informative IMs can, in turn, lead to sub-optimal EEW decision-making (i.e., increased occurrences of false and missed EEW alarms) with the current approaches. Such insufficient EEW frameworks can have a detrimental impact on the socio-economic conditions of the affected region. False positives (cases in which a wrong prediction of high shaking is made, i.e., false alarms) can result in long downtimes and panic. On the other hand, false negatives (cases in which a wrong prediction of low shaking is made, i.e., missed alarm) have self-evident consequences in terms of economic losses and even casualties. Furthermore, false negatives severely increase a community's vulnerability since a population convinced that it is likely to be warned before high levels of shaking are likely to be less prepared. Such cases are already noticed in research studies (Meier, 2017;Minson et al., 2019;Wald, 2020), which have criticized the uses of EEW systems and demonstrated mistrust in EEW. Studies such as those by Wu et al. (2013) and Hsu et al. (2016) proposed relevant approaches that can assist EEW in reducing false alarms. In particular, Wu et al. (2013) proposed an earthquake probability-based automated decision-making framework. The framework uses basic decision theory and existing cost-benefit analysis procedure (i.e., based on conventional GMMs) to suggest a general decision criterion in EEW. Coupling such decision-making methods with an accurate IM prediction method (as proposed in this study) can lead to more robust EEW systems.
In the field of performance-based earthquake engineering, the acceleration response spectrum, S a (T), is a widely used IM that can effectively integrate the characteristics of the ground-motion waveform (such as amplitude, frequency content, etc.) with the dynamic behavior of a structural system idealized as a single-degree-of-freedom system (SDoF; Bazzurro et al., 1998;Vamvatsikos & Allin Cornell, 2002). An accurate estimation of the groundmotion S a (T) values in real time can help stakeholders better design engineering applications of EEW, including those relying on risk-informed decision support systems (Cremen & Galasso, 2021). Recently, Iaccarino et al. (2020) and Jozinović et al. (2020) proposed data-driven methods (such as mixed-effect model and convolutional neural networks, respectively) to obtain S a (T) values for few periods using initial seconds of the arriving ground motions; however, based on the results presented in their work, the frameworks' accuracy and robustness to work efficiently in real time could be further improved.
This paper contributes to extending the objectives of current EEW frameworks to be accurate, timely, and both structurally-and ground-motion-informed. In particular, this study proposes a highly precise and novel on-site EEW framework named real-time on-site estimation of response spectra (ROSERS) developed using a deep neural network (DNN) and a variational autoencoder (VAE). The framework uses site characteristics (SC) of the recording station(s) and an IM vector (representing amplitude, energy, duration, and frequency content) of the initial 3 s of the arriving ground-motion (after detection of pwaves) to estimate a vector of PGA (PGA is also referred to S a (T = 0)) and spectral ordinates (i.e., the S a (T) spectrum for 95 SDoF periods (T)) for the expected on-site complete ground-motion waveform. The framework utilizes a trained feed-forward DNN to concurrently estimate a set of two mathematical-representative latent variables using the SC and the considered IM vector as inputs. The obtained latent variables are then inputted to a trained VAE decoder to develop the 96-period S a (T) spectrum (including PGA) of the expected on-site ground-motion. The framework is evaluated through rigorous statistical tests and testbed studies for three historic earthquakes events. Hence, apart from utilizing a more structurally informed IM as a target decision variable for EEW, this study utilizes an advanced neural-network approach to optimally conduct dimensionality reduction of the S a (T) spectrum into surrogate latent variables that can be practically used in real time. Furthermore, using data-driven and deep-learning-based methods minimizes the assumptions involved in the model development and makes the framework flexible and easily re-trainable. In addition, the high dimensional interpolative and extrapolative nature of DNNs makes this study more robust than non-deep learning-based and traditional methods (Gustafson et al., 1990;Xu et al., 2021). The proposed framework is observed to perform well on the used dataset, and based on the findings of this paper, it can be highly beneficial to advance EEW research and support various end-users.
The paper starts by introducing the proposed framework and its key implementation steps. Then, the considered ground-motion database, algorithms, and models used to develop the framework are explained, and the results of various statistical tests are presented. Finally, the proposed framework is illustrated by applying it to three historical earthquake events recorded in California, used as testbeds. The illustration is conducted to test the performance of the proposed framework in hazard-consistent alarm triggering.

PROPOSED FRAMEWORK
The proposed ROSERS framework is illustrated in Figure 1. A digital recording station continuously monitors ground shaking at a given location. As the seismic sensor detects the initial p-waves of the arriving ground-motion (e.g., Akazawa, 2004;Kalkan, 2016;Sleeman & van Eck, 1999), the ROSERS framework stands by to receive 3 s of the incoming ground-motion. While any picker algorithm can be used to detect p-waves arrival, this study uses the P PHASE P ICKER algorithm by Kalkan (2016). ROSERS utilizes the first 3 s of the observed ground-motion in real time to compute a vector of seven intensity measures denoted as IM 3s . The vector IM 3s correspond to the amplitude, energy, frequency, and significant duration of the initial 3 s of the ground-motion waveform after detection of p-waves. The computed IM 3s is combined with the vector of site characteristics (denoted as SC) of the recording station, known a-priori. The obtained IM 3s and SC are then used as inputs to a pre-trained feed-forward DNN estimating two mean latent variables (ˆ1 andˆ2 ), which are then cascaded to a pre-trained VAE decoder. It should be noted here that the two latent variables are not physically derived but represent a two-dimensional statistical surrogacy of the complete ( ) spectrum, which is obtained by training a VAE. The VAE decoder transforms the estimated mean latent variables (ˆ1 andˆ2 ) into the vector of PGA and 95-period S a (T) spectrum (denoted as S a ). The obtained S a vector can be utilized by various endusers/stakeholders to informatively alert a community through risk-informed EEW decision support systems, develop shakemaps, perform structural control, and other similar applications in real-time (Cremen & Galasso, 2021). The estimation of the S a vector through ROSERS takes less than 1 s on average, thereby providing end-users with an ample amount of time for decision-making.

GROUND-MOTION DATABASE
The framework is developed and trained using the recorded ground motions available in the comprehensive database of the Next-Generation Attenuation West version F I G U R E 1 Illustration of the real-time on-site estimation of response spectra (ROSERS) framework. The illustration flows from left to right. The implementation of this ROSERS framework, on average, takes ∼0.75 s on a modern personal computing machine 2 (NGA-West2) project (Ancheta et al., 2014). The database utilized in this study consists of 13,916 ground-motion components from 277 seismic events recorded worldwide and are processed based on the recommendations of (Boore & Bommer, 2005). It is worth noting that ground motions in real time are generally not processed in the same manner, and the processing usually involves the use of simplistic techniques such as trend removals and low-pass and high-pass filtering (Caruso et al., 2017;Hsu et al., 2013). However, this assumption is not expected to significantly change the proposed framework's predictive performance: The data-driven nature of the artificial neural networks, with sufficient training, can quickly make the framework adapt to any minor deviances and noise (Borodinov et al., 2019;Jumper et al., 2021). The moment magnitude (M) and closest rupture distance (R rup ) metadata of the considered database is provided in Figure 2a. Other details of the metadata can be obtained from (Fayaz et al., 2021). As can be observed from Figure 2a, the database is skewed toward events with ≤ 5.5. Hence, in this study, the ground motions with ≤ 5.5 (majority class) are undersampled (Lemaitre et al., 2017) such that the number of ground motions with ≤ 5.5 and > 5.5 is equivalent while maintaining the distributions of and site's average shear-wave velocity of the soil up to 30 m ( 30 ). This results in 6,392 ground-motion components whose event data is presented in Figure 2b. The distributions for ground motions with ≤ 5.5 and > 5.5 are observed to be similar. The bimodal shape of the M histograms is inherited in the data as not enough seismic recordings are available for 5 ≤ ≤ 6. For the obtained ground motions, their acceleration response spectra at 96 periods (S a (T) for periods from 0 to 5 s) is computed.
To assess the ground-motion time used by the proposed framework (i.e., initial 3 s after the detection of p-waves) with respect to the ground-motion duration for the undersampled dataset, the overall duration ( ) and time to PGA after detection of p-waves is computed. To avoid the inclusion of zeros and very small acceleration values (< 10 -5 g) in the computation of (which are padded while processing of ground motions in NGA-West2), is computed by obtaining the time difference between attainment of 99.9% and 0.1% of (similar to the computation of significant duration 5−95 ; Kramer, 1995). Figure 3a,b shows the and time to PGA of the undersampled ground-motion dataset, respectively. It is observed that has a mode of about 40 s, and the time to PGA has a mode of about 30 s. This means that the predictions made within the initial 3 s of the on-site ground motions (after detection of p-waves) can still allow an ample amount of warning time for decision-making. Thus, the proposed framework can assist in taking real-time risk mitigation actions such as the start of evacuation procedures/move to safer locations within a structure, automatic emergency responses, road closures, and so forth, before the arrival of the peak acceleration and before shaking ends. This further indicates the framework's usefulness to perform well in a real-time setting.

F I G U R E 2
Ground-motion database before (a); and after (b) undersampling F I G U R E 3 (a) (s); and (b) time to peak ground acceleration (s) of the undersampled dataset

p-phase arrival-time picker
Though the ground motions in the NGA-West2 database are already processed, the ground-motion records do not necessarily start with p-waves and may contain additional zeros and early station noise in the recordings. Hence it is crucial to detect the p-waves in the ground motions to develop the proposed EEW framework. This study obtains the p-waves arrival time in the ground-motion time histories using an automated p-phase picker approach developed by Kalkan (2016). Unlike other approaches (Akazawa, 2004;Sleeman & van Eck, 1999), the proposed method for picking p-phase arrival times in single-component acceleration or broadband velocity records does not require any detection interval or threshold setting. The algorithm P PHASE P ICKER obtains the response of an SDoF oscillator with viscous damping subjected to the input ground-motion and then measures the dissipated damping energy. The rate of change (power) of the measured damping energy is then used to pick p-wave phases (Kalkan, 2016). The SDoF oscillator is developed to possess a short natural period and consequently a high resonant frequency. This is done to ensure that the natural frequency of the SDoF is higher than the frequencies mostly observed in seismic waves, which prevents the effects of resonance. Phases angles are maintained as the SDoF is provided with a high damping ratio (60% of critical) at which the frequency response reaches the Butterworth maximally flat magnitude filter. The relative input energy imparted to the oscillator by the input ground-motion is converted to elastic strain energy and then dispelled by the damping element as damping energy. The computed damping energy generates a smooth envelope over the duration of the seismic ground-motion. The envelope is close to zero at the beginning of the signal before the arrival of p-phase and builds up swiftly with the arrival of the p-wave. The considerable change in the damping energy function at the onset of the p-wave is used to track and pick the arrival of p-phase. The P PHASE P ICKER particularly detects the onset of p-phase using the histogram method (Anon, 2011;Solomon et al., 2001). The histogram method is used as the picking algorithm, as it does not require any detection interval or threshold settings. The 6,392 ground-motion components are analyzed using the P PHASE P ICKER algorithm to obtain their corresponding p-wave arrival time, and only the ground-motion time histories after p-wave arrival are considered for the computation of IM 3s and further analysis in this study.

COMPONENTS OF THE ROSERS FRAMEWORK
This section presents the details of the models used in the proposed framework and the statistical analysis results conducted to test their prediction power. The section first explains the details of the VAE model that is used to project the a vector into the two latent-variable space. Then the DNN model used to estimate the mean of the two latent variables is discussed; finally, the hierarchical mixed-effects regression of the residual is described. For each sub-section, the model is explained first, and then its prediction power is described.

Variational autoencoder
The vector computed for the 6,392 ground-motion components for the 96 periods is used as input to be reconstructed in the VAE (Kingma & Welling, 2019). The VAE is bottlenecked to have two independent normally distributed latent variables (denoted as 1 and 2 with means 1 and 2 and variances 2 1 and 2 2 , respectively) in the sampling layer. A two-dimensional latent variable space is used as it results in a good trade-off between higher reconstruction power and a lower number of latent dimensions. The neural network-based VAE is trained through cross-validation using a randomly split 80% of the dataset while the remaining 20% is used as the final test set. The training and testing of the VAE are performed using a log transformation of ( ) as the ground-motion IMs are generally observed to be lognormally distributed (Zarrin et al., 2020). VAE provides a probabilistic approach to describe a vectorial observation in their latent variable space. This is done using a neural network-based encoder (recognition model) trained in conjunction with a neural networkbased decoder (generative model) that can use the latent variable space to reconstruct the observations. This means that the encoder describes a probability distribution for each latent attribute from which values are randomly sampled to be fed into the decoder that is expected to accurately reconstruct the input. Hence, the latent space is essentially compelled to possess continuous and smooth representations. Consequently, nearby values in the latent space correspond to similar reconstructions using the decoder. It is worth mentioning that while there are other approaches to this aim, such as principal component analysis, singular value decomposition, regular autoencoders, and so forth, such methods do not necessarily provide a continuous probabilistic space of reduced dimensions (i.e., latent variables) along with such high accuracy and reconstruction power.
The underlying idea of training the VAE relies on the basic Bayes' rule given in Equation (1), where represents the true vector of input variables and represents the latent variable space. Due to its intractability, ( ) can be estimated using variational inference (Blei et al., 2017). This relies on estimating ( | ) by another distribution ( | ), which is defined such that it has a tractable distribution. By defining ( | ) such that it is very similar to ( | ), it can be used to perform approximate inference of the intractable distribution. This is done by minimizing the Kullback and Leibler (KL) divergence (Kullback & Leibler, 1951) between ( | ) and ( | ) shown in Equation (3), where the KL divergence between any two distributions and can be computed using Equation (2). The minimization process of Equation (3) can be conducted by maximizing the expression given in Equation (4) where ( | ) log ( | ) represents the reconstruction likelihood, and KL[ ( | )|| ( )] ensures that the learned posterior distribution ( | ) is similar to the prior distribution ( ), which is assumed to be a standard Gaussian distribution (∼ (0, 1)) for each latent variable. ( | ) log ( | ) is the mean-squared error (L2 loss) between and̂ (Kingma & Welling, 2019).
The proposed VAE consists of eight layers in the encoder and decoder (including the input and output layers) with a total of 850 neurons and two-dimensional latent variable space. The VAE is trained with the train set in epochs using the Adaptive Moment Estimation (Adam) (Kingma & Ba, 2014) optimizer and early stopping (Chollet, 2018) regularization. The means of the two latent variable distributions are presented in Figure 4, where the colors of the markers represent the magnitude (M) of the seismic event ( Figure 4a) and rupture distance ( ) of the station site (Figure 4b). It can be observed from Figure 4a that smaller magnitudes (M < 5.5) tend to have 1 > 0 and 2 > 0, and larger magnitudes ( > 5.5) tend to have 1 < 0 and 2 < 0. In general, 1 and 2 tend to decrease with an increase in magnitudes. Similarly, from Figure 4b, it can be observed that smaller rupture distances ( < 45 km) tend to have 1 < 0 and 2 < 0, and larger rupture distances ( > 45 km) tend to have 1 > 0 and 2 > 0. In general, 1 and 2 tend to increase with an increase in rupture distances. This shows that the latent variables have an inverse attenuation relation with the and , as they decrease with an increase in and decrease in . Furthermore, Figure 5a,b shows two examples of the true a spectrum and reconstructedˆspectrum for different levels of shaking intensities. It can be observed from the F I G U R E 5 Examples of randomly selected true versus reconstructed with (a) > 1 g; and (b) <1 g figures that the trained VAE performs well in reconstruing the spectrum using the mean latent variables without any clear bias toward any level of shaking.
The overall results of the developed VAE are presented in Figure 6a,b. The coefficient of determination R 2 between the true a and reconstructedˆa for the 96 periods for both train and test sets are presented in Figure 6a. R 2 for all periods is observed to be above 0.98, demonstrating excel-F I G U R E 6 (a) R 2 of reconstructed ( ) at various periods; and (b) true versus reconstructed ( ) lent reconstruction power of the developed VAE with minimal bias and variance. This can be further observed from Figure 6b, where the true versus reconstructed ( ) are presented for periods 0.2, 0.5, 1, 2, 5, and 0 s (PGA). For all the cases, it is observed that the true vs. reconstructed ( ) follow the identity line (1:1) very closely for all ranges of values, demonstrating no bias and high reconstruction power of the VAE. Even with just a visual comparison (since a comparable goodness-of-fit measure is not reported in the reference studies) of Figure 6b with other state-of-practice EEW frameworks (Münchmeyer et al., 2020;Caruso et al., 2017;Colombelli & Zollo, 2015;Hsu et al., 2013), the superiority of the proposed framework can be observed. Based on Figure 6a,b, it is clear that the mean latent variables 1 and 2 can adequately reconstruct the a using the VAE decoder. Hence, if one can estimate the mean latent variables 1 and 2 of the ground motions accurately, the trained VAE decoder can be used to transform the estimated latent variables into the a . It is worth mentioning that, although a VAE is primarily used as a generative model (Gorijala & Dukkipati, 2017), the idea of dimensionality reduction coupled with a VAE's requirement to possess continuous and smooth representations of latent variables makes it a perfect candidate for the real-time estimation of the a of the incoming ground motions. The latent variables for new observations can be easily interpo-F I G U R E 7 2 of linear regression versus ground-motion time windows lated and extrapolated, leading to good construction of the corresponding spectrum. Also, the utilization of the surrogacy and dimensionality reduction leads to the estimated a that is inherently cross-correlated as the prediction is made jointly for the complete ( ) spectrum at 96 periods.

Estimation of latent variables
In a real-time setting, to construct the a of the expected ground-motion, it is essential to accurately estimate the mean latent variables 1 and 2 , which can then be used in the VAE decoder. To allow ample on-site warning time during the occurring ground-motion, which can last up to 1 to 2 min in some conditions (Figure 3a; Fayaz, et al., 2020), only a few early seconds of the arriving waveform can be practically used for any predictions and early warning. Various initial time windows (ranging from 1 to 10 s) after detection of p-wave arrival were considered for computing the considered ground-motion IMs. For each time window, the mean latent variables were used as the target variables for linear regression, and the computed seven IMs and two site characteristics were used as the predictors. During this exercise, the time window of 3 s was observed to be a good trade-off between the prediction power to estimate 1 and 2 and the requirement of a short time window. Prediction power was decided in terms of the average R 2 ( 2 ) obtained for estimating 1 and 2 ; it was observed that IMs obtained from time windows longer than 3 s did not lead to a significant increase in the average R 2 . Figure 7 presents the observed 2 for the different time windows.
The final IMs used in this study include: Arias Intensity ( ; in m/s), significant duration ( 5−95 ; in s), mean period ( ; in s), PGA (in g), PGV (in m/s), PGD (in m), and cumulative absolute velocity (CAV; in m/s), which are described in Equations (5) to (11), where ( ) represents the acceleration time history of the ground-motion, represents the time instance, is the Fourier amplitude spectrum of acceleration at linearly spaced frequencies, , spanning the range 0.25 ≤ ≤ 20 Hz. Site characteristics are quantified using the site shear-wave velocity ( 30 ) and basin depth to a shear velocity of 2.5 km/s ( 2.5 ). Several other IMs such as root mean square acceleration ( ), characteristic intensity ( ), predominant period ( ), basin depth to a shear velocity of 1 km/s ( 1 ), and so forth, were used in the trials to train the framework. The final selection of the seven IMs and two site-characteristics was based on three criteria: (1) correlation with the mean latent variables; (2) minimal collinearity with other IMs; and (3) ease of real-time computation. Table 1 presents the correlations of the mean latent variables 1 and 2 against the vector of SC and IMs computed for the initial 3 s (IM 3s ) for the 6392 ground-motion components. The above defined notations of IMs are subscripted with '3s' to specify that they are computed using early three seconds of ground-motions. It can be observed from Table 1 that both SC and IM 3s vectors are highly correlated with 1 and 2 emphasizing their importance in the real-time prediction process. 1 and 2 are observed to be positively correlated with each other and negatively correlated with the IM 3s and SC. Thus, the two mean latent variables demonstrate inverse attenuation relations with the SC and IM 3s .
To estimate the 1 and 2 using the SC and IM 3s vectors, four types of regression models were utilized: (1) linear regression (Freedman, 2009); (2) support vector machines (Cortes et al., 1995;Radial Basis Function (RBF) kernel); (3) XGBoost (Chen & Guestrin, 2016; with maximum depth = 10); and (4) DNN (Chollet, 2018;Rosenblatt, 1958). For all four regressions, the predictors (SC and IM 3s ) are transformed to the log domain, while for the target variables ( 1 and 2 ) the log ( + 1) transformation is used as their values consist of both positive and negative values close to zero. The regressions are conducted using a training dataset (randomly selected as 80% of the entire dataset), while evaluations are conducted on the remaining test dataset. In all four cases, the average 2 of the predictions for both mean latent variables are observed to be greater than 0.7, with the highest value of ∼0.91 observed for DNNs. Apart from high prediction power, another advantage of using DNNs is that both the mean latent variables are estimated simultaneously, ensuring that the estimations are implicitly linked, that is, properly reflecting their internal correlation. It can be observed from Table 1 that both mean latent variables 1 and 2 are inherently correlated. Since the tested regression Methods 1-3, mainly involve the prediction of one target variable at a time (disjoint estimates), they would lead to independent predictions of 1 and 2 , which requires postprocessing to explicitly model the observed correlation between 1 and 2 .
Hence, a 15-layered DNN with two nodes and linear activation function in the output layer is trained and used as the final method to estimate 1 and 2 using SC and IM 3s . The DNN is trained using a training dataset (randomly selected as 80% of the entire dataset), while evaluations are conducted on the remaining test dataset. The training is conducted using stochastic gradient descent with dropout and early stopping regularizations (Chollet, 2018). The true versus predicted 1 and 2 from the trained DNN are presented in Figure 8a,b, respectively. The nature of predictions is observed to be very close to the identity line (1:1) for both train and test sets, thereby indicating the good prediction power of the proposed DNN to estimate the two mean latent variables. The mean pre-F I G U R E 8 True versus predicted 1 and 2 from the trained deep neural network dictions of the trained DNN led to 2 of 0.91 and 0.93 for 1 and 2 , respectively. A correlation coefficient of 0.91 was observed in DNN's predictions of 1 and 2 as compared to the true correlation coefficient of 0.88. This means the trained DNNs are highly successful in estimating 1 and 2 jointly. Due to the hierarchical nature of the data (i.e., multiple recordings from the same event and multiple recordings from different events), the residuals from the DNN-VAE framework are further used to conduct a mixed-effects regression (Demidenko, 2004) to develop inter-event ( ) and within-event ( ) covariance matrices for the 96 periods. Since various available EEW systems compute different IMs for the incoming groundmotion and possess different site-characteristics, a separate DNN (which is usually less computationally expensive) allows the users to efficiently retrain the model to estimate the two latent variables as per their requirements.

Residual structure
Due to the hierarchical structure of the ground motions arising from multiple recordings from the same event and recordings from different events, the residuals between the and predictions of DNN-VAEˆare used to com-F I G U R E 9 Correlation structure of the residuals pute 96 values of inter-event and within-event variabilities for the ith event and jth recording. This is done by fitting a mixed-effects regression (Demidenko, 2004) model to the residuals as given in Equation (12) where represents the between-event variability with variance matrix for 96 periods, represents the within-event variability with variance matrix for the 96 periods, and 0 represents any pending bias in the residuals for the 96 periods. 0 was observed to be very close to zero (failing the hypothesis test) and dropped in the overall analysis. Also, empirical Pearson correlations are computed for the residuals of the 96 periods (presented in Figure 9), which is then used to convert the inter-event and within-event variance matrices into their respective covariance matrices. In summary, the overall DNN-VAE framework is developed for mean predictions, and the residuals are used to construct betweenevent ( ) and within-event ( ) covariance matrices.

IMPLEMENTATION OF ROSERS ON EXAMPLE SEISMIC EVENTS
The developed framework is exercised on three notable seismic events recorded in California (Ancheta, et al., 2014): (1) Northridge (M = 6.69, 1994); (2) Loma Prieta (M = 6.93, 1989); and (3) Whittier (M = 5.99, 1987). These events are considered pivotal points in the history of earthquake engineering as the reconnaissance data collected after these events led to substantial changes to the principles of structural analysis and design (Popov, et al., 1998). Apart from being historically significant seismic events, the three selected events were recorded at a relatively large number of stations, making them ideal candidates for validating the proposed ROSERS framework. Based on the criteria of ≤ 90 km, NGA-  (3) 108 stations for Whittier events. In this implementation example, the ROSERS framework is employed to conduct a seismic warning trigger classification for the selected three events. This means that for each event and station, decision analysis is performed based on whether the expected level of ( ) (using the ROSERS framework) exceeds a hazard-consistent threshold ,ℎ ( ) for two natural periods, that is, 0.5 and 1.0 s. Due to the difference in the seismicity of each site, unlike other studies, this study does not utilize one strict ( ) the threshold for all sites of the event. For example, a site close to an active seismic fault, ( ) = 0.1 g may correspond to a frequent return period, and hence the structure/infrastructure of interest would not be highly susceptible to the resulting shaking. However, for a site away from active seismic faults, ( ) = 0.1 g may represent a seismic event of significant hazard level and would lead to infrastructural devastation/disruption. Hence this study utilizes a hazardconsistent approach of trigger classification rather than a strict threshold. ,ℎ ( ) is determined for each station site by independently obtaining their respective ( ) hazard curves using the US Geological Survey Unified Hazard Tool (2014; Petersen et al., 2020; developed by conducting probabilistic seismic hazard analysis [PSHA] for the target site) and computing ( ) corresponding to six hazard levels, including mean return periods of 25, 50, 75, 100, 150, and 200 years. In simple terms, ,ℎ ( ) is the expected level of ( ) for a given period and hazard level, which is obtained by considering all the known faults around the target site and performing a rigorous PSHA. Figure 10 presents the obtained ,ℎ ( ) (in units of g) against the (in units of km) of their respective station sites for the three considered seismic events and two periods. As expected, for all the station sites, ,ℎ ( ) increases as the mean return period increases and for the same hazard level and site, ,ℎ (T = 0.5 s; reaching up to ∼1.5 g) is greater than ,ℎ (T = 1.0 s; reaching up to ∼1.2 g). It should be noted here that though the ( ) for the selected events are expected to attenuate with an increase in , ,ℎ ( ) remains unaffected as it represents the hazard-targeted ( ) computed for each site representing its modeled seismicity (Field et al., 2009(Field et al., , 2014. For the three seismic events, the ,ℎ ( ) values obtained for their respective station sites (for the six hazard levels and two periods) are used as thresholds to classify whether to trigger EEW warnings. For each event and station, the true ( ) is known from the seismic recording and compared against their respective ,ℎ ( ). If the true ( ) ≥ ,ℎ ( ), then the seismic warning must be triggered (positive case) for the particular hazard level and period, while if ( ) < ,ℎ ( ) no warning should be issued (negative case). For the selected events and stations, the initial 3 s of the seismic recordings are used in the proposed ROSERS framework to obtain ( ) predictions and conduct trigger classification. The trigger classifications conducted using the ROSERS framework are compared against the true classification in two ways, using deterministic confusion matrices and probabilistic receiver operating characteristic (ROC) curves (Brown & Davis, 2006).

Trigger classification using confusion matrices
First, the mean predictions( ) are compared against ,ℎ ( ) as deterministic estimates to develop confusion matrices for each hazard level, event, and period. Figure 11a-f presents the confusion matrices generated for the Northridge event (consisting of 137 stations) for ( = 1.0 s ) for the six hazard levels, where TN, FP, FN, and TP represent true negative, false positive, false negative, and true positive cases, respectively. In all cases, the percentages of TP and TN are significantly higher than FP and FN, which shows the high classification power of the proposed ROSERS framework. As observed in Figure 10, ,ℎ ( ) increases with the increase in hazard level, which leads to a lower number of stations with true ( ) ≥ ,ℎ ( ). Hence, in Figure 11e,f, it can be observed that the percentage of TP and FN cases are lower than those in Figure 11a-d. However, in general, it can be observed that TP and TN are significantly higher than FP and FN, thereby demonstrating the high classification accuracy of ROSERS. Similar confusion matrices were developed and analyzed for all other cases. To summarize the results, the accuracies (i.e., (TP + TN)∕(TP + TN + FP + FN)) for the six hazard levels and two periods are presented in Figure 11g,h,i for the Northridge, Loma Prieta, and Whittier events, respectively. In these figures, the color shade of the markers represents the percentage of stations with true ( ) ≥ ,ℎ ( ) (i.e., FN + TP). It can be observed in Figure 11g-i that, with an increase in hazard level, the accuracy of classification increases from ∼80% to ∼99% for both periods and three seismic events. This means that the proposed framework is capable of accurately estimating whether ( ) ≥ ,ℎ ( ) and conduct trigger classification for low-rise (∼T = 0.5 s) as well as mid-rise (∼T = 1 s) structures. Furthermore, as mentioned earlier, it can be observed from all cases that, with an increase in hazard level, the color shades of the scatter plots tend to fade, indicating that a lower number of stations lead to true ( ) ≥ ,ℎ ( ). This indicates that the shaking observed at the stations for their respective earthquake events is generally lower than the expected shaking computed through PSHA of the region/sites for higher hazard levels. Due to this, a lower number of TP cases and a more significant number of TNs are noticed for the increased hazard levels for all three events. However, the framework is continuously observed to lead to higher accuracies with a more significant number of TNs and a smaller number of FPs. Consequently, the proposed framework can help in decreasing the number of false alarms, thereby reducing the nuisance and hindrance caused to a community. In general, the results indicate that the framework's accuracy for all events and periods is above ∼80% and tends to increase for shaking levels with longer return periods (higher hazard levels), leading up to ∼99% for a 200-year return period. Given that the predictions of the proposed framework are tested against hazard-consistent levels of ( ) (unlike previous studies such as Münchmeyer et al. (2020), Caruso et al. (2017), Colombelli & Zollo (2015), Hsu et al. (2013); that use a constant ( ) level for testing), the classification results are observed to be on the high end of accuracy.

Trigger classification using ROC-AUC curves
In the second set of comparisons, the mean predictionŝ ( ) are combined with their inter-event and withinevent variabilities of the residuals to construct a probabilistic distribution of the predictions for the particular period. The probabilistic distributions are used to obtain (( ) ≥ ,ℎ ( )) for all three events, six hazard levels, and two natural periods, which are compared against the true classification of ( ) ≥ ,ℎ ( ) using the ROC curves. The area under the ROC curve is termed Area Under Curve (AUC), representing the degree or measure of separability between TP and FP rates for various thresholds. The ROC-AUC curves for the six hazard levels for ( = 1.0 s) for the Northridge event are presented in Figure 12a-f; the AUC scores of the cases are shown in Figure 12g-i for the three events and two periods. Similar to the observation made in terms of prediction accuracies, AUC values are observed to be higher than ∼0.8 for all cases and tend to increase with an increase in hazard levels. The ROC curves in Figure 12a-f show how the ROSERS framework's distinguishing power between the two classes improves for ( ) with a higher hazard level (i.e., more critical) leading up to AUC = 0.95 for a 200year return period. This further highlights the accuracy of the decision-making power of the proposed framework to issue probabilistic warnings during a seismic event, especially for more dangerous shaking levels (higher return periods).

CONCLUSIONS
This study introduced a deep learning-based on-site EEW framework named. The framework utilizes a pre-trained DNN to estimate two mean latent variables (that represent statistical surrogates of the ( ) spectrum) using the SC and initial 3 s of on-site ground-motion after p-wave detection. The initial 3 s of the on-site ground-motion waveform are used to compute seven IMs representing the energy, amplitude, significant duration, and frequency content of the initial seconds (mainly p-waves) of the 3 s of the arriving ground-motion waveform. The computed IMs are combined with two site-parameters and used as inputs to a trained DNN that estimates two mean latent variables that are used in a trained decoder of VAE to estimate PGA and 95-period spectral acceleration ( ( )) response spectrum of the expected on-site ground-motion. The trained framework performs well on the used dataset and leads to high prediction power. The prediction of the complete ( ) spectrum (while inherently maintaining the cross-correlations) with only 3 s of the initial ground-motion waveform in a highly accurate manner (coupled with an average prediction time of less than 1 s) makes the proposed framework valuable for real-time EEW decision-making and near-real-time rapid response systems.
The proposed ROSERS framework was first trained and tested on the NGA-West2 database and then exercised on over 300 recording stations of three historical seismic events recorded in California. The expectation for EEW is that if ( ) > ,ℎ ( ) for the target site, then the alarm must be triggered. For the three events, the true recorded ( ) and ,ℎ ( ) is available; hence, the true labels and thresholds are known. Using the first 3 s (after detection of p-waves) of the available recordings, the ROSERS framework was implemented for all stations of the three events for two hazard levels and periods. The outputs of the ROSERS framework were analyzed using confusion matrices and ROC curves. Results of the confusion matrices show that the ROSERS framework leads to high accuracy (> 80%) in most cases of the decision classification. Furthermore, the framework leads to 0.8 <AUC < 0.95 demonstrating excellent classification power even for the infrequent earthquake events.
The framework developed in this study can help improve existing on-site EEW approaches. While the analysis presented in this paper demonstrate that the framework has good prediction power to accurately estimate the ( ) spectrum using the initial 3 s of the ground motions, future analyses should extend the reliability and confidence in the proposed framework, particularly in actual real-time settings. Currently, the framework is mainly trained using a database of processed crustal ground motions. Testing it on unprocessed datasets can be valuable for validation in a real-time setting. However, the framework's performance is not expected to significantly change within this backdrop as the data-driven nature of the framework would adapt to any discrepancies due to the addition of station noise in the considered few sec-onds of early p-waves. Furthermore, the framework can be extended to regional EEW systems by utilizing the spatially correlated prior estimates of the latent variables at stations farther from the source. In addition, similar to GMMs, the proposed framework can be trained independently for different geographic regions to obtain their localized latent spaces and DNN structures. This can also accelerate the understanding of regional seismicity from a mathematical perspective.
Finally, the developed framework can be tested by conducting a consequential analysis on the false positives and false negatives cases to make end-users and stakeholders aware of the significances of various decisions. In general, as quoted by J. R. Tolkien, "False hopes are more dangerous than fears,"-It is believed that false negatives are riskier than false positives; however, the case is not so evident in earthquakes as even false positives can lead to significant economic losses for downtime, shutdowns, and panic. Furthermore, a community may also face psychological hindrances to return to work after a false alarm. However, on the flip side, false positives can also be beneficial in some ways, such as conducting unplanned drills for seismic events, checks for preparedness, and so forth (Velazquez et al., 2020).

D ATA A N D R E S O U R C E AVA I L A B I L I T Y
The recorded ground motions used in this study can be obtained from NGAWest2 database (https://ngawest2. berkeley.edu/). The authors have also developed an executable software to utilize the proposed ROSERS framework. The software can be download from the GitHub repository (https://github.com/jfayaz/ROSERS).

A C K N O W L E D G M E N T S
This research is funded by the European Union's Horizon 2020 research and innovation program, specifically Grant Agreement Number 821046: TURNkey "Towards more Earthquake-resilient Urban Societies through a Multi-sensor-based Information System enabling Earthquake Forecasting, Early Warning and Rapid Response actions." O R C I D Jawad Fayaz https://orcid.org/0000-0003-4706-9761 Carmine Galasso https://orcid.org/0000-0001-5445-4911