Time-Series Prediction Approaches to Forecasting Deformation in Sentinel-1 InSAR Data

Abstract

Time series of displacement are now routinely available from satellite InSAR and are used for flagging anomalous ground motion, but not yet for forecasting. We test conventional time series forecasting methods such as SARIMA and supervised machine learning approaches such as long short-term memory (LSTM) networks against simple function extrapolation. We focus initially on forecasting seasonal signals and begin by characterizing the time series using sinusoid fitting, seasonal decomposition, and autocorrelation functions. We find that the three measures are broadly comparable but identify different types of seasonal characteristic. We use this to select a set of 310 points with highly seasonal characteristics and test the three chosen forecasting methods over prediction windows of 1-9 months. The lowest overall median RMSE values are obtained for SARIMA when considering short-term predictions (<1 month), whereas sinusoid extrapolation produces the lowest median RMSE values for longer predictions (>6 months). Machine learning methods (LSTM) perform less well. We then test the prediction methods on 2,000 randomly selected points with a range of seasonalities and find that simple extrapolation of a constant function performs better overall than any of the more sophisticated time series prediction methods. Comparisons between seasonality and RMSE show a small improvement in performance with increasing seasonality. This proof-of-concept study demonstrates the potential of time-series prediction for InSAR data but also highlights the limitations of applying these techniques to nonperiodic signals or individual measurement points. We anticipate future developments, especially to shorter timescales, will have a broad range of potential applications, from infrastructure stability to volcanic eruptions.


Introduction
Many tectonically stable regions suffer from significant ground motion due to the effects of former coalfields (McCay et al., 2018), landslides (Chambers et al., 2008), the shrink and swell of shallow clays (Aldiss et al., 2014; Crilly, 2001), tree growth, coastal erosion, natural sinkholes (Banks et al., 1995; Lamont-Black et al., 2002), and tunneling (e.g., Crossrail; Milillo et al., 2018). Ground motion analysis has recently focused on satellite-based InSAR, which uses the phase difference between pairs of radar satellite images to map ground deformation at mm/yr precision. In particular, the Copernicus Sentinel-1 constellation has revolutionized the coverage, frequency, and availability of InSAR data and can be used to produce high-resolution maps of ground motion across Europe every 6 days in near real-time. To this end, many companies have generated postprocessed ground motion data maps and time series based on Sentinel-1 InSAR data (e.g., cgg.com; satsense.com; tre-altamira.com). Machine learning methods have been used to automatically flag deformation or changes in deformation in these large data sets (Anantrasirichai et al., 2018, 2019a, 2019b; Gaddes et al., 2018, 2019; Valade et al., 2019). Here, we investigate the possibility that these Sentinel-1 data sets can be used to forecast future behavior.
Time series forecasting defines a prediction model to forecast future values of a univariate or multivariate time series based on previously observed values. Time series forecasting plays a significant role in many application domains such as econometrics (Lütkepohl et al., 2004), mathematical finance (Taylor, 2008), electroencephalography (Blinowska & Malinowski, 1991), astronomy (Weigend, 2018), and communications engineering (Brown, 2004). Due to the financial importance of large-scale forecasting of commodity values, time series forecasting has been led by disciplines associated with economics. Economic time series forecasting has led to standard time series prediction tools such as SARIMA (Box et al., 2015; Brockwell & Davis, 2016; Hamilton, 1994), a key forecasting tool evaluated within our work.
HILL ET AL.
More recently, Recurrent Neural Networks have been effectively used for time series prediction using methods such as long short-term memory (LSTM; Greff et al., 2017; Hochreiter & Schmidhuber, 1997) and sequence-to-sequence (Seq2Seq) methods (Cho et al., 2014; Sutskever et al., 2014). LSTM and Seq2Seq methods are easily adapted to both univariate and multivariate time series prediction (Rebane et al., 2018; Torres & Qiu, 2018). It should be noted that some authors have identified weaknesses and limitations in using machine learning methods in general, and LSTM methods in particular, for time series forecasting (e.g., Chacón et al., 2020).
For many of the processes that contribute to InSAR measurements, we expect that prior observations will not contain sufficient information to accurately predict future observations. This includes both signals of interest, such as sudden catastrophic failures, and noise terms, such as turbulent atmospheric effects. However, some components of the signal have repeating characteristics, such as multiyear trends and seasonal effects. We begin by analyzing the characteristics of the input data set to select signals with repeating characteristics with a period of 1 year (Section 3) and then focus on forecasting over time periods of 1-9 months (Sections 4 and 5). Finally, we discuss the potential applications and current limitations of time-series forecasting for Sentinel-1 InSAR data.

InSAR Data
We test our algorithms on Sentinel-1 data processed by Satsense Ltd. using an algorithm based on the RapidSAR approach (Spaans & Hooper, 2016). Atmospheric effects are the dominant source of noise in most InSAR data sets and have been reduced within the Satsense data through: (1) the removal of long-wavelength signals from each InSAR image using a Gaussian spatial filter; (2) the removal of short-wavelength atmospheric signals using an Atmospheric Phase Screen (APS) filter, which isolates the random-in-time effects using a temporal high-pass filter and then uses a low-pass spatial filter to estimate the spatially correlated but temporally random atmospheric effects; and (3) smoothing the displacements in time using a per-time-series temporal filter to reduce the effects of overall temporal noise, which may include some residual atmospheric noise not removed by the APS filter.
Sentinel-1 acquires data every 6 days over Europe, but due to operational factors, there are small infrequent gaps in the time-series, particularly in the first year when only Sentinel-1A was operating. Since the algorithms proposed here require regularly sampled data, we interpolate onto an even 6-day temporal grid as shown in Figure S1 of supporting information. Simple linear interpolation between neighbors is used to avoid unnecessary assumptions.
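As a concrete illustration, the regridding step might look like the following minimal numpy sketch; the acquisition dates and displacement values here are invented for illustration, not taken from the data set.

```python
import numpy as np

# Hypothetical acquisition dates (days since first image) with gaps at
# days 18 and 36, and the corresponding line-of-sight displacements (mm).
t_obs = np.array([0, 6, 12, 24, 30, 42, 48], dtype=float)
d_obs = np.array([0.0, -1.2, -2.0, -3.5, -4.1, -5.8, -6.3])

# Regular 6-day grid spanning the observed period.
t_grid = np.arange(t_obs[0], t_obs[-1] + 1, 6.0)

# Simple linear interpolation between neighbors fills the gaps without
# imposing any model assumptions on the missing epochs.
d_grid = np.interp(t_grid, t_obs, d_obs)
```

The gap at day 18, for example, is filled with the midpoint of its neighbors, leaving all observed epochs unchanged.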

Case Study Area
This project is part of the UK Digital Environment Program, and we use the subsidence of the West Yorkshire coal mines as a case study (Burke et al., 2015; Lake et al., 1992). Here, we choose to work on the area around Normanton, which was mined until the mid-1970s and where there is a high density of InSAR scatterers (Figure 1). The area is currently subsiding at a rate of up to 15 mm/yr, and superimposed on this are seasonal signals, particularly associated with some of the large warehouse buildings in the area.
A subset of the time series (points P1-P8) has been selected for further analysis and forecasting experiments, and these are shown in Figure 1. P1-P3 illustrate the combination of a (downward) trend and seasonality; P4-P6 have a strong seasonal signal but no long-term trend; and P7 and P8 show trends without seasonality. Points P1-P6 were selected as being the six most seasonal signals, according to the analysis in Section 3, and points P7 and P8 the least. P1-P3 and P6-P7 are car parks; P4 and P5 are house roofs; and P8 is the roof of the XPO Logistics warehouse.

Measures of Seasonality
Our hypothesis is that InSAR signals contain some periodic components, for which time series forecasting may be useful. For this application, we chose to focus on the most common natural periodic variation, that which occurs annually. We start by testing the most commonly used method for estimating and removing seasonal components of geodetic time series, namely sinusoid fitting (Colesanti et al., 2003; Watson et al., 2002); see Section 3.1.1. However, this measures the correlation with purely sinusoidal behavior and could exclude periodic signals with nonsinusoidal but repeating waveforms. A summary of methods for detecting seasonality (Hartmann et al., 1992; Hylleberg, 1995; Zubaidi et al., 2018) is given in Table S1 of the supporting information, and a detailed review of the methods most suitable for this application is given in the remainder of this section. We focus on methods that are able to generate quantitative measures of annual seasonality rather than simple detection, and that can be used to analyze a predefined period (12 months) rather than estimate the period of seasonality. Based on these criteria, we select "Seasonal and Trend decomposition using Loess" (STL; see Section 3.1.2; R. B. Cleveland et al., 1990) and the AutoCorrelation Function (ACF; see Section 3.1.3; Chen & Boccelli, 2018) for further study.
The choice of whether or not to normalize the seasonality measures is a key design decision. With normalization, the amplitude of the seasonality will be disregarded, but if there is no normalization, high amplitude stochastic signal components will often mask truly seasonal signals with small amplitude. For this reason, all three considered seasonality measures are normalized.

Sinusoid Fitting and Correlation (Sin) Method
We fit a sinusoid of fixed period (12 months) to the detrended time series using a least squares method and extract the amplitude and phase parameters. An obvious measure of seasonality is the magnitude of the fitted sinusoid; however, in this case, large-magnitude signals that are not particularly seasonal would produce a bigger seasonality index than smaller magnitude signals that are truly seasonal. Instead, we define the seasonal index for this method as the normalized correlation between the training signal and the fitted sinusoid, SIndex_Sin = ρ(W_t, sin_W), where ρ is the normalized correlation, W_t is the training signal, and sin_W is the fitted sinusoid.
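A minimal sketch of this index follows. Because a fixed-frequency sinusoid a·sin + b·cos is linear in its coefficients, a direct least-squares solve stands in here for the paper's iterative fit; the test series is synthetic.

```python
import numpy as np

def sin_seasonality_index(w, period=60):
    """Fit a fixed-period sinusoid to a (detrended) series and return the
    normalized correlation between the series and the fit (SIndex_Sin)."""
    t = np.arange(len(w))
    w = w - w.mean()
    # Design matrix for a*sin + b*cos at the fixed annual frequency.
    A = np.column_stack([np.sin(2 * np.pi * t / period),
                         np.cos(2 * np.pi * t / period)])
    (a, b), *_ = np.linalg.lstsq(A, w, rcond=None)
    fit = A @ np.array([a, b])
    return np.corrcoef(w, fit)[0, 1]

# Synthetic series: 3 years of 6-day samples, annual sinusoid plus noise.
t = np.arange(180)
rng = np.random.default_rng(0)
noisy = np.sin(2 * np.pi * t / 60) + 0.2 * rng.standard_normal(180)
```

A strongly seasonal series scores close to 1, while pure noise scores near 0, regardless of amplitude, matching the normalization argument above.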

STL Decomposition
The concept of a "seasonal decomposition" of a time series signal means that the time series can be decomposed into a sum (or a product) of three components: a trend (T), a seasonal component (S), and a residual (R).

10.1029/2020JB020176
We have used the common implementation of STL as initially described by Cleveland (R. B. Cleveland et al., 1990), assuming an additive STL model. This implementation uses Loess smoothing, which uses iterative sliding-window regression to generate smooth functions (seasonal and trend; W. S. Cleveland, 1979). First, Loess smoothing is applied to extract the seasonal component; a separate Loess smoothing is then applied to extract the trend. The remaining component is the residual.
A logical measure of seasonality can then be defined using the ratio of the variance of the residual (R) to the variance of the signal without the trend (R + S). As this ratio increases as seasonality decreases, we define the seasonality index as SIndex_STL = 1 − Var(R)/Var(R + S). SIndex_STL is mathematically well behaved and mostly varies from 0 to 1.
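A minimal numeric sketch of this index, using a crude seasonal-means decomposition in place of the Loess-based STL; the signal and noise levels are invented.

```python
import numpy as np

def stl_style_index(seasonal, resid):
    """SIndex_STL = 1 - Var(R) / Var(S + R), clipped at zero, given the
    seasonal (S) and residual (R) components of a decomposition."""
    return max(0.0, 1.0 - np.var(resid) / np.var(seasonal + resid))

# Toy decomposition: estimate S by averaging over the 60-sample annual
# cycle (a crude stand-in for Loess smoothing). No trend is added, so
# S + R reconstructs the signal exactly.
rng = np.random.default_rng(1)
t = np.arange(240)
signal = np.sin(2 * np.pi * t / 60) + 0.3 * rng.standard_normal(240)
S = np.tile(signal.reshape(-1, 60).mean(axis=0), 4)   # seasonal means
R = signal - S
```

For this strongly seasonal toy signal, the index comes out close to 1; for pure noise, Var(R) approaches Var(S + R) and the index falls toward 0.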

ACF Method
The ACF measures how self-similar a signal is by measuring the correlation of the signal with shifted versions of itself (Carla et al., 2016; Chen & Boccelli, 2018). These shifts are known as lags, and in this case, we are only interested in the lag corresponding to 12 months. As the InSAR signal is sampled every 6 days (from 2015 to 2018), the lag is set to 60. We define SIndex_ACF = ρ(60), where ρ is the normalized ACF of the detrended signal evaluated at a lag of 60 samples. SIndex_ACF is well behaved and varies from 1 (perfect correlation) to −1 (perfect anticorrelation).
To properly estimate seasonality, isolated from the influence of trend, the trend is removed before applying the ACF by fitting a second-degree polynomial to the InSAR time series and subtracting it. A second-degree polynomial was chosen to properly model DC variations over the training signal (this is not done for the STL method, where the trend is extracted independently). Confidence values can then be calculated as the rejection of the null hypothesis that the ACF value is insignificant, using standard errors under the assumption of a Gaussian source (see Figure 2a).
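The detrend-then-correlate computation can be sketched as follows; the series here is synthetic (annual sinusoid, linear subsidence, noise).

```python
import numpy as np

def acf_seasonality_index(w, lag=60):
    """SIndex_ACF: normalized autocorrelation at the annual lag, after
    removing a second-degree polynomial trend."""
    t = np.arange(len(w))
    detrended = w - np.polyval(np.polyfit(t, w, 2), t)
    x = detrended - detrended.mean()
    # Normalized ACF at the requested lag (biased estimator).
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

# Synthetic series: annual cycle + slow subsidence + noise, 6-day samples.
rng = np.random.default_rng(2)
t = np.arange(180)
seasonal_trend = (np.sin(2 * np.pi * t / 60)
                  - 0.01 * t
                  + 0.2 * rng.standard_normal(180))
```

Without the polynomial detrending step, the trend itself would inflate the lag-60 correlation and masquerade as seasonality.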

Comparison of Seasonality Measures
Figure 3 shows a comparison of the seasonality measures SIndex_Sin, SIndex_STL, and SIndex_ACF for all the datapoints in the Normanton region (with points P1-P8 labeled). The approximately linear relationship between the measures demonstrates that they are broadly comparable, and the points P1-P6 are classified as highly seasonal by all three indices, whereas P7-P8 lie with the majority of points, which are not seasonal. However, there is considerable scatter, showing that the three indices identify different types of seasonality, with especially large differences between the ACF and STL measures. We use the ACF measure for the subsequent experiments, due to the advantages given in Table S1 of the supporting information and because it appears to be the most spatially coherent in Figure 2.

Ground Motion Forecasting
The task of forecasting InSAR time series can be approached in one of three ways: (1) future displacements can be forecast for each point individually, using only information from that point (Mazzanti et al., 2011); (2) future displacements can be forecast for each point individually, using the time series itself and a selected group of related time series; or (3) groups of time series can be forecast in a multidimensional sense, for example, using Seq2Seq models (Rebane et al., 2018) and LSTM models (Torres & Qiu, 2018). For this proof of concept, we have focused on the first two approaches for simplicity.
In this paper, we evaluate the forecasting methods by dividing each signal into a "training" subrange and a contiguous "testing" subrange in order to generate objective evaluations of each forecast. That is, given an entire temporal signal of T datapoints W = {w_1, w_2, …, w_T}, we define "today" as being time step t, and our goal is to predict a new signal ŷ as similar as possible to the original subsignal W_y = {w_{t+1}, w_{t+2}, …, w_{t+N_y}}. We also define the training set as W_t = {w_1, w_2, …, w_t} ⊂ W. The value N_y is a positive integer which determines the time range to be forecast, that is, the number of future observations. We set N_y to approximately 9 months (264 days) for all experiments but evaluate the prediction over time periods of 1-9 months. The value N_x is a positive integer which determines the period of time used for training, that is, the number of past observations (in samples, where samples are every 6 days). An illustrative example of the predicted and test signals is shown on the right of Figure 4.
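The split and the sliding-window framing described above can be sketched as follows. The training length of 178 samples is an assumption chosen so that, with N_x = N_y = 44, the window count matches the 91 training samples quoted for LSTM1 later in the text.

```python
import numpy as np

def split_forecast_task(w, t, n_y):
    """Split a series W into the training past W_t = w[:t] and the future
    target W_y = w[t:t+n_y] that the forecast is scored against."""
    return w[:t], w[t:t + n_y]

def sliding_windows(w_t, n_x, n_y):
    """Frame a training series as supervised (X, Y) pairs: each sample is
    n_x past observations and the n_y observations that follow them."""
    n = len(w_t) - n_x - n_y + 1
    X = np.stack([w_t[i:i + n_x] for i in range(n)])
    Y = np.stack([w_t[i + n_x:i + n_x + n_y] for i in range(n)])
    return X, Y

w = np.arange(222.0)                         # stand-in displacement series
w_t, w_y = split_forecast_task(w, t=178, n_y=44)
X, Y = sliding_windows(w_t, n_x=44, n_y=44)  # 178 - 88 + 1 = 91 windows
```

Each row of X is one input window and the matching row of Y is the 44-sample continuation the model learns to regress onto.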
A summary of all the forecast methods compared is given in Table 1. In order to compare the SARIMA and LSTM approaches with previously used methods, we include a standard sinusoid fitting algorithm (Watson et al., 2002) and project the fit forward in time. A sinusoid and trend are fitted to the same part of each of the time series as used for training the other methods (i.e., W_t), and future values are extrapolated using the resulting parameterization.

LSTM Networks
An LSTM (Greff et al., 2017; Hochreiter & Schmidhuber, 1997) network is a Recurrent Neural Network (RNN; Rumelhart et al., 1986) architecture used in the field of deep learning for time series data. LSTMs keep track of arbitrarily long-term dependencies in the input sequences, and they can scale to much longer sequences than conventional RNN networks. They are designed to process sequences of variable length, with parameters shared across all time steps. LSTMs have the ability to add or remove information to a temporal learning "state." This is carefully regulated by structures called gates. The learning selectively keeps some parts of the past (using the temporal states) and forgets others (using "forget" gates). LSTMs are commonly used for classification applications.
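As a sketch of the gating just described, a single LSTM step can be written in a few lines of numpy; the weights here are random stand-ins, not a trained network.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, params):
    """One LSTM time step: the forget gate (f) drops parts of the cell
    state, the input gate (i) admits new candidate information (g), and
    the output gate (o) exposes part of the state as the new output."""
    W, U, b = params                   # input, recurrent, and bias weights
    z = W @ x + U @ h + b              # all four gates computed stacked
    n = len(c)
    i, f, g, o = z[:n], z[n:2*n], z[2*n:3*n], z[3*n:]
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(3)
n_in, n_hid = 1, 8                     # univariate input, small state
params = (rng.standard_normal((4 * n_hid, n_in)),
          rng.standard_normal((4 * n_hid, n_hid)),
          np.zeros(4 * n_hid))
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.standard_normal((44, 1)):   # run over one 44-sample window
    h, c = lstm_step(x_t, h, c, params)
```

The cell state c carries information forward indefinitely unless the forget gate attenuates it, which is the mechanism behind the long-term dependency tracking mentioned above.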
The use of LSTMs for time series forecasting of InSAR-derived ground motion data has both limitations and advantages. Due to the capability of LSTMs to model long- and short-term patterns, LSTMs should be able to flexibly model and predict time series data (including annual, seasonal, and trend variations) and are easily extendible to larger and multivariate data sets. Furthermore, a supervised learning system such as an LSTM has the advantage of modeling any predictable variations found in the data, not just those defined in ad hoc methods such as SARIMA or sinusoid prediction. However, the sliding window approach required by any supervised machine learning method is a significant limitation, as it reduces the amount of training data and also limits the ability of the internal LSTM state to explicitly model very long periodic signal components. The fact that LSTMs are not able to specifically analyze a single seasonality period of interest (in this case, annual seasonality) is both an advantage and a disadvantage. In the case of our data, Figure S7 shows that there is no distinct frequency peak. However, it may be useful to focus on annual seasonality due to the physical processes causing cyclic variations over annual periods. Finally, supervised learning approaches such as LSTMs are very dependent on the amount of data: there are only 91 training datapoints in the sliding window for a single signal (LSTM1).
Within this application, we use LSTMs within the supervised regression framework illustrated in Figure 4. The considered signal is split into training and test sets: the training set contains the observable data, and the test signal represents the future observations. This test signal is compared with the predicted signal obtained by each method to give the associated prediction error. Our multistep approach for LSTMs reframes the whole training data into temporal sliding windows of sizes N_x and N_y for past and future observations, respectively.
The univariate method (LSTM1) uses a single-signal feature input into the network, whereas the multisignal methods are still one-dimensional but have more than one input signal (LSTM2-3). Figure 4 shows the univariate case of forecasting ground motion using a supervised LSTM network. A sliding window forms a training data frame (for a single signal) of inputs (X) and outputs (Y) to train the network. Once trained, the testing input (X_test) is ingested into the network to generate the forecast ŷ approximating the true sequence y. This requires a final layer in the network to generate a vector of the same length as y. This is done using a fully connected dense layer without any subsequent pooling, as illustrated in Figure 5. For the LSTM experiments (resulting in Figures 6-9), N_x and N_y are set to 9 months (44 samples, equating to 264 days). This is considered to be long enough to characterize the seasonal nature of the signals, able to adapt quickly to changes, and still leave the maximum amount of training data from the sliding window. This design choice is evaluated in Section 5.1.1. We use a network of two LSTM layers fully connected to a dense layer outputting the N_y regression outputs. The first layer has a large number of nodes in order to effectively characterize the data across all models (for simplicity and communicability, a similar architecture is used for all LSTM-based methods).

Sinu

Sinusoid fitting method: A simplex gradient descent method was used to fit the amplitude and phase of a sinusoid to the data (together with the slope of a linear trend term).

LSTM1
Single signal used for prediction (based on the univariate method illustrated in Figure 4). Architecture: two LSTM layers (the first with 256 nodes and the second with 128 nodes). The final state output of the second LSTM layer is connected to a dense layer of 128 nodes and then to an output layer with N_y nodes. Dropout of level 0.5 is included between each layer; the activation function was ReLU, the loss was MSE, and the optimizer was ADAM.

LSTM2

The six most seasonal signals (seasonality measured using SIndex_ACF) concatenated and used for training in the multisignal approach illustrated in Figure 5.

LSTM3
The top 1% of the seasonal signals (seasonality measured using SIndex_ACF) concatenated and used for training as per LSTM2. The number of samples in the training sliding windows of the top 1% of seasonal signals is 10,738. Size(X): (None × 44 × 1); Size(Y): (None × 44 × 1).

LSTM4
The eight spatially closest time series signals (see Figure S1b in supporting information) are formed into different features in the multivariate learning process (with a single dimensional feature predicted for the considered time series). The number of multivariate samples in the training sliding window is 91. Size(X): (None × 44 × 8), Size(Y): (None × 44 × 1).
Seq2Seq1 Seq2Seq architecture. The encoder was a single LSTM layer (with 200 nodes) whose output was copied N_y times. This time-distributed output was then input into the decoder, a single LSTM layer (with 200 nodes). This was then input to a fully connected dense layer with a final single (but time-distributed) output node. No dropout was used. The remaining aspects of this architecture were as per LSTM1. Variation of network parameters (within reasonable limits) did not change performance; a similar set of network parameters was chosen for simplicity of definition.
Seq2Seq2 Same as LSTM2 but with Seq2Seq architecture as described for Seq2Seq1.
Seq2Seq3 Same as LSTM3 but with Seq2Seq architecture as described for Seq2Seq1.
Seq2Seq4 Same as LSTM4 but with Seq2Seq architecture as described for Seq2Seq1.

Table 1. Summary of Forecasting Methods

Large numbers of nodes and complex network architectures tend to overfit. However, we have chosen to deal with overfitting using dropout layers. Each layer has an integrated dropout function (set to a dropout factor of 0.5). The optimization was based on the ADAM method (Kingma & Ba, 2015) with Mean Square Error (MSE) as the loss function. We train our networks for 2,000 iterations (epochs) to achieve convergence. The number of samples used in each of the methods LSTM1-4 is given in Table 1.

Multisignal LSTM: LSTM2-4
We adapt the univariate approach shown in Figure 4 to include data from a set of training signals. This multisignal LSTM is illustrated in Figure 5. The system uses the same network structure as above but vertically concatenates all of the sliding window data from the set of training signals (described below). The testing data remain the same (i.e., a single spatial sample in each case). The LSTM2 system uses the six most seasonal signals for training. The LSTM3 system uses the top 1% of the seasonal signals (using the SIndex_ACF method) for training. Conversely, LSTM4 uses the eight spatially closest time series signals as features in an eight-dimensional multivariate LSTM input; a multivariate LSTM architecture is then used to generate a univariate forecast from the multivariate InSAR-derived ground motion time series data. The sizes of the associated input and output network tensors are indicated in Table 1.

Seq2Seq LSTMs: Seq2Seq1-4
Seq2Seq is an encoder-decoder deep learning architecture for making multistep predictions (Cho et al., 2014; Sutskever et al., 2014). The previous methods (LSTM1-4) generated the prediction vector using the single output of an LSTM layer together with dense, fully connected layers (with a final vector regression output). Seq2Seq methods have an independent encoder that analyzes the input time sequence and generates a characterizing set of states that are subsequently input into the decoder. We have used a single LSTM layer as the encoder, which outputs the LSTM states of the input time series data as an initial stage. These output states are then copied multiple times (with the number of copies being the required length of the prediction vector output). These copies then form a multidimensional time series input to a decoder (another single LSTM layer). The time-distributed outputs are then input into time-distributed dense layers outputting a vector forecast result ŷ. Each method LSTM1-4 has been modified to include a Seq2Seq architecture to form methods Seq2Seq1-4, respectively; that is, the other architectural forms and input/output data structures are equivalent for the two sets of methods.
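A minimal numpy sketch of the shape flow just described, with layer sizes taken from the Seq2Seq1 entry in Table 1; the state values here are zero placeholders, not a trained encoder.

```python
import numpy as np

n_y, n_state = 44, 200                 # forecast length, encoder LSTM size

# Final state of the encoder LSTM summarizing the 44-step input window
# (zeros stand in for a real encoder output).
encoder_state = np.zeros(n_state)

# The state vector is copied N_y times (a RepeatVector-style step) to
# form the time series that is fed into the decoder LSTM.
decoder_input = np.tile(encoder_state, (n_y, 1))

# The decoder LSTM then runs over these 44 copies, and a time-distributed
# dense layer maps each of its outputs to one value of the forecast ŷ.
```

The key contrast with LSTM1-4 is that the forecast length is set by the number of state copies rather than by the width of a final dense layer.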

Figure 5. MultiSignal LSTM. Every signal in the training set is split into training and test sets in the multisignal approach. First, we use the training sets to frame every signal as a supervised machine learning problem, constructed from a set of inputs X and outputs Y. Each considered time step of the sliding window for each signal becomes a sample in the feature space of input X and output Y of the network. LSTM, long short-term memory.

SARIMA
SARIMA is an analysis and prediction method for time series data (Box et al., 2015; Brockwell & Davis, 2016; Hamilton, 1994). It is used to model nonstationary data series, where the data are not statistically consistent across time, for example, where the mean and variance vary with time. It is often combined with Kalman filters that use the model to predict future values of a time series signal. SARIMA is an analysis tool primarily used to model economic data and is able to identify, model, and predict both trend and seasonality (and their variations) over time. SARIMA consists of two sets of forecasting models: trend and seasonality. Each of these two models is divided into three submodels in order to model time variations ("tendencies"): an autoregressive (AR) model, a Moving Average (MA) model, and a differencing (I) stage. The MA model is the equivalent of an estimated Finite Impulse Response (FIR) filter that weights recent inputs to combine into an estimated output. Conversely, the AR model is an estimated all-pole, or Infinite Impulse Response (IIR), filter that uses a feedback loop to estimate the output as a weighted sum of previous outputs. The input is often further locally differenced (the I stage) to model changes in offset (the third submodel).
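The FIR/IIR distinction can be made concrete with a small numpy sketch; the coefficients are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
e = rng.standard_normal(200)          # white-noise innovations

# MA(2) as an FIR filter: y[t] = e[t] + 0.5 e[t-1] + 0.25 e[t-2].
# Finite memory: each output depends only on the last three inputs.
ma = np.convolve(e, [1.0, 0.5, 0.25])[:200]

# AR(1) as an IIR filter: y[t] = 0.9 y[t-1] + e[t].
# The feedback loop gives every past input a (decaying) influence.
ar = np.zeros(200)
for t in range(200):
    ar[t] = 0.9 * ar[t - 1] + e[t] if t else e[t]
```

The MA output is a simple weighted sum of recent inputs, while the AR recursion propagates old shocks forward indefinitely, which is exactly the FIR-versus-all-pole contrast drawn above.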
The model is comprised of these three submodels (AR, MA, and I) estimated directly on the data to model trend, but also over a set lag directly related to the seasonality of the signal. A SARIMA model is then defined by the orders of these six submodels (plus the analysis seasonality lag m) as SARIMA(p, d, q)(P, D, Q)_m, where p is the order of the AR term, q is the order of the MA term, d is the number of differencing operations required to make the time series stationary, P is the order of the seasonal AR term, Q is the order of the seasonal MA term, D is the number of differencing operations required to make the seasonal time series stationary, and m is the seasonality lag.
The parameters of a SARIMA model are commonly not estimated automatically; that is, the statistics and correlation of the time series signal are analyzed by hand, and the parameters are tuned until the signal (when compensated by the found parameters) is considered to be stationary. However, recent automatic parameter estimation methods perform a minimization search over some training data to determine the best combination of SARIMA parameters (Hyndman et al., 2007). This approach estimates the stationarity of the signal under the parameters and uses the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) to compare models; the lower these values, the better the model fits the data (Hyndman et al., 2007).
Here, SARIMA parameters are fitted to the training data using Hyndman's method (Hyndman et al., 2007). A typical model for the analyzed InSAR time series below was SARIMA(3, 0, 2)(1, 1, 0)_60. The parameters are estimated using the same part of each of the time series as used for training the LSTMs (i.e., W_t), and SARIMA is then used to predict the same part of each time series as the LSTMs (i.e., W_y). Kalman filters are used with the SARIMA model in standard SARIMA implementation packages to predict future time series values; we have used such an implementation for time series forecasting.

Seasonal Signals
We test the forecasting performance of the LSTMs and SARIMA on a set of 310 highly seasonal signals selected using the SIndex_ACF metric. We benchmark the results against sinusoid extrapolation and a constant value prediction. To assess the performance of each model, we use the Root Mean Square Error, RMSE(ŷ) = √(E[(ŷ − y)²]) (Figures 6 and 7). We also consider normalized RMSE values, n1RMSE and n2RMSE; n2RMSE normalizes the RMSE by that of the constant value prediction (see Figures S3 and S5 in the supporting information).

For a one month prediction (Figure 6a), the best performing methods were SARIMA and LSTM3, which had a slightly lower median RMSE than the constant value prediction. Of the Seq2Seq methods, the lowest median RMSE values were produced by the univariate version Seq2Seq1 and by Seq2Seq4, which was trained using the eight geographically closest points. For these short time periods, the sinusoidal extrapolation method (Sinu) had a median RMSE value considerably higher than that of the constant value prediction. Conversely, for longer time periods (Figure 6b), the lowest median RMSE was obtained by sinusoid extrapolation, with a median RMSE value about 75% of the constant value prediction. Of the LSTMs, the lowest median RMSE values were associated with LSTM3 and LSTM4, while Seq2Seq1 and Seq2Seq4 had lower median RMSE values than the multisignal Seq2Seq methods (Seq2Seq2-3). Over these time periods, most of the methods had lower median RMSE values than the constant value prediction, with only LSTM2, Seq2Seq2, and Seq2Seq3 performing worse (when considering the median value of n2RMSE; Figure S3 in the supporting information).
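The error metrics above can be sketched as follows. The exact n1RMSE definition is not reproduced here; n2_rmse follows the description of n2RMSE elsewhere in the section (RMSE normalized by that of the constant value prediction).

```python
import numpy as np

def rmse(pred, truth):
    """Root mean square error between a forecast and the observed values."""
    return np.sqrt(np.mean((pred - truth) ** 2))

def n2_rmse(pred, truth, last_training_value):
    """RMSE normalized by the RMSE of the constant value prediction (the
    last training value held flat): values below 1 beat that baseline."""
    const = np.full_like(truth, last_training_value)
    return rmse(pred, truth) / rmse(const, truth)
```

Reporting n2RMSE removes the growth of absolute RMSE with prediction window, so methods can be compared across 1-9 month forecasts on a common scale.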
For all the time periods considered, the multisignal Seq2Seq models (Seq2Seq2-3) trained using a set of seasonal signals had a higher median RMSE than the univariate case (Seq2Seq1). We conclude that any improvements gained by having a larger training data set are offset by the potentially spatially unrelated data statistics and characteristics. However, Seq2Seq4, which was trained using geographically close signals, had a median RMSE value slightly better than the univariate case (Seq2Seq1), suggesting that geographically close points have more similar signals, as, for example, they may be located on the same structure.
Based on this assessment, we select LSTM3, Seq2Seq4, and SARIMA for further analysis, and some examples of the predicted and real time series are shown in Figure 8. Points P1-P6 were selected as they have the most seasonal signal characteristics as defined by SIndex_ACF. All methods capture some aspects of the signal, and the time series plots are helpful in identifying sources of misfit. For example, sinusoid extrapolation is a global fitting method, so there is often a discontinuity between the training and prediction data (e.g., P3, P6; Figure 8), which explains why the RMSE is high when short prediction periods are considered (Figure 6a). Similarly, the SARIMA results can be seen to characterize the subseasonal variations of many of the six example signals, but for P1 and P2, the trend has not been accurately estimated, and the prediction, although plausible in shape, has an inaccurate offset.
The results in Figure 6 suggest that performance varies according to prediction window, so we test the selected methods over periods of 1-9 months and compare the distributions of RMSE values (Figure 7). The lowest RMSE values are obtained for SARIMA when considering short-term predictions of <3 months, whereas sinusoid extrapolation performs best for predictions of >6 months. As expected, RMSE increases with increasing prediction window: the constant value prediction has a median RMSE of 1.4 cm for a 1-month window, increasing to 3.6 cm for a 9-month window. Normalizing the RMSE by the RMSE of the constant value prediction (n2RMSE; Figure S5 in supporting information) removes this effect and shows that SARIMA and LSTM3 outperform the constant value prediction for all windows, whereas Sinu and Seq2Seq4 only perform better when forecasting three or more months into the future.
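The window-length comparison can be reproduced with very simple baselines. The sketch below is not the paper's SARIMA implementation; it is a seasonal-naive forecaster (repeat the value observed one period earlier), which illustrates why seasonality-aware methods beat a constant value prediction on periodic signals as the window grows. The synthetic series is an assumption:

```python
import math

def rmse(y_true, y_pred):
    n = len(y_true)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(y_true, y_pred)) / n)

PERIOD = 12
omega = 2 * math.pi / PERIOD
series = [2 * math.sin(omega * t) for t in range(36)]  # 3 seasonal years
train, holdout = series[:24], series[24:]

seasonal_naive = train[-PERIOD:]          # repeat the last observed year
constant = [train[-1]] * len(holdout)     # constant value prediction

err_seasonal = rmse(holdout, seasonal_naive)  # 0 for a purely periodic signal
err_constant = rmse(holdout, constant)        # grows as the window spans the cycle
```

On a purely periodic signal the seasonal forecast is exact, while the constant prediction accumulates error over a full cycle; real InSAR signals sit between these extremes.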
HILL ET AL.
The multisignal LSTM (LSTM3) gave the best results for short-term prediction (<3 months). SARIMA also gave good results for short-term prediction but significantly worse results (compared to LSTM3) when predicting many months into the future (see Table 1 for a detailed description of the training configuration of each method). The performance of the Seq2Seq4 method was virtually identical to that of LSTM3 for a period of 9 months but had a slightly larger median error (by 0.3 cm) for 1 month.

Analysis of Retrained Models Varying N_x and N_y
An independent set of experiments was conducted to evaluate the effect of changing the lengths N_x and N_y on the RMSE performance of an individually retrained machine learning system (specifically Seq2Seq1). These experiments (Figure S7) show that varying N_x either makes no difference (N_x = 18 months compared to N_x = 9 months) or actually decreases performance (N_x = 27 months compared to N_x = 18 or N_x = 9 months). We attribute this to the sliding window approach, in which a wide range of phases of the input data is modeled across the whole temporal range of the data; when large values of N_x are chosen, the amount of training data decreases, resulting in the decrease in performance for N_x = 27. Although the results in Figures 6 and 7 were generated by fixing N_y = 9 and evaluating the performance over different prediction windows, Figure S7 shows that there is very little difference when the models are retrained for varying values of N_y.
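The trade-off described above can be made concrete: with a stride-1 sliding window, a series of length T yields T − N_x − N_y + 1 training examples, so increasing N_x directly shrinks the training set. A small sketch (the series length and window sizes are chosen to mirror the experiment):

```python
def sliding_windows(series, n_x, n_y):
    """Split a series into (input, target) pairs with a stride-1 sliding window."""
    pairs = []
    for start in range(len(series) - n_x - n_y + 1):
        x = series[start:start + n_x]
        y = series[start + n_x:start + n_x + n_y]
        pairs.append((x, y))
    return pairs

series = list(range(36))                   # 3 years of monthly samples
print(len(sliding_windows(series, 9, 9)))  # 19 training examples
print(len(sliding_windows(series, 27, 9))) # only 1 training example
```

With N_x = 27 there is essentially a single training window per point, which is consistent with the performance drop reported above.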

Randomly Selected Signals
Finally, we select 2,000 points at random from the Normanton data set with no regard to seasonality and test the methods that performed best on the seasonal signals: SARIMA and Seq2Seq4 (training details are given in Sections 4.2 and 4.1.3, respectively). We no longer consider LSTM3 since it was trained specifically on highly seasonal signals. Points P7-P8 in Figure 8 illustrate the challenges of time-series prediction for nonseasonal signals. Figure 7 shows that the relative variation in RMSE with prediction window is similar to that for seasonal signals; however, none of the methods perform better than the constant value prediction when signals are randomly selected (see also Figure S5 in supporting information). Figure 9 shows the relationship between forecast performance (RMSE) and seasonality (SIndex_ACF) for a prediction window of 9 months for the Seq2Seq4, SARIMA, and Sinu methods. For Seq2Seq4, the forecast performance appears independent of seasonality, whereas the SARIMA and Sinu methods perform better (decreased RMSE) with increased seasonality. To test the statistical significance, we calculate Pearson's correlation coefficient (and the associated p-value) for each relationship illustrated in Figure 9. The Pearson correlation coefficients were Seq2Seq4 (−0.072), SARIMA (−0.223), and Sinu (−0.255), with corresponding p-values of 0.0014, 7.6 × 10⁻²⁴, and 3.9 × 10⁻³¹, respectively.
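The correlation test can be reproduced with the standard Pearson formula; the p-value is then derived from a t-distribution with n − 2 degrees of freedom, for which we only form the t statistic here. A self-contained sketch with illustrative toy data (not the paper's measurements):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def t_statistic(r, n):
    """t statistic whose tail probability (n - 2 degrees of freedom) gives the p-value."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

seasonality = [0.1, 0.3, 0.5, 0.7, 0.9]   # toy SIndex_ACF values
rmse_values = [3.2, 2.9, 3.0, 2.4, 2.1]   # toy misfits: higher seasonality, lower RMSE
r = pearson_r(seasonality, rmse_values)   # negative, as in Figure 9
```

With thousands of points, even a weak correlation such as −0.072 can yield a very small p-value, which is why the Seq2Seq4 relationship is significant despite being nearly flat.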
These values indicate that all three relationships are negatively correlated, although the correlation for Seq2Seq4 is weaker than for the other two methods. The very small p-values for SARIMA and Sinu give high confidence in their negative correlations. A similar pattern is seen for all prediction windows (Figure S6 in supporting information).

Discussion
Previous studies have reported annual variations in InSAR data associated with processes such as tropospheric water vapor (Heleno et al., 2010), thermal contraction and expansion (Lazecky et al., 2016), groundwater (Bell et al., 2008), and freeze-thaw cycles (Daout et al., 2017). We find that our data set from the Normanton area of the United Kingdom also contains signals with periodic variations, the strongest of which are clustered on large warehouses, suggesting the dominant effect here is thermal expansion and contraction of man-made structures.
We test the ability of a range of established time series prediction methods to forecast InSAR time series and find that several perform better than a constant value prediction when signals dominated by periodic variations are considered. The lowest median RMSE values are obtained for SARIMA when considering short-term predictions (<3 months), whereas sinusoid extrapolation performs best for longer predictions (>6 months). However, for nonseasonal signals, the simple extrapolation of a constant function performs better overall than any of the more sophisticated time series prediction methods. Comparisons between seasonality and RMSE show a reduction in RMSE with increasing seasonality, although the correlations are weak (Pearson correlation coefficients between RMSE and the ACF-based seasonality index, SIndex_ACF, of −0.072 to −0.255).
Most of the comparisons above consider only the median RMSE values, but the breadth of the distributions and the scatter of outliers show that the misfit is highly variable between observation points, even when seasonality is taken into account. Thus, if a prediction for a single observation point is required, even the best performing approaches may produce a large misfit, while a poorly performing method might produce a very accurate prediction.
The machine learning methods (LSTM and Seq2Seq) performed well in some cases, with the use of multivariate or concatenated signals improving performance. However, it is notable that they performed less well overall than the simple extrapolation of a constant value (for nonseasonal signals) or a sinusoid (for seasonal signals). Interestingly, the performance of the machine learning methods improved only slightly with increasing seasonality, suggesting that they fail to capture the periodic component of the signal, perhaps because they are trained over only 9 months. Poor performance in predicting financial time series using LSTMs has also been reported (Sirignano & Cont, 2019) and is assumed to be related to the nonstationary nature of the data and the inability of LSTMs to model feedback effectively. Improvements in prediction using LSTMs may follow from large increases in training data (both the number and the length of data sequences) together with the integration of SARIMA-type feedback modeling.
In this study, we have focused on predictions for windows of less than the period of the signal (1 year), but both SARIMA and sinusoid extrapolation are able to predict for an arbitrary amount of time into the future. Figure 10 demonstrates that predictions for several years into the future show plausible time series, but unfortunately, no quantitative evaluation is possible until a longer data set of measurements is acquired.

10.1029/2020JB020176
Similarly, LSTM methods require a training window that is at least as long as the prediction window and will require longer time series before long-term predictions can be tested.
Real-time monitoring and ground motion forecasting of periodic signals from InSAR data could be used in one of two ways. The first is to predict seasonally varying ground motion signals that could otherwise obscure subtle deformation changes that may be precursors to rapid and critical collapses (Selvakumaran et al., 2018); in this case, the reduction in background noise could enable the detection of anomalous or unexpected behavior. Alternatively, the periodic motion itself is of interest to insurance companies looking to forecast claims due to ground cracking and subsidence (Crilly, 2001) or bridge motion (Lazecky et al., 2016). The broad distribution of misfit values suggests these approaches will only be useful when considering the distribution of a large number of data points, as the probability of a good prediction for any single observation point is quite small. This is a proof-of-concept study, and the methods described here can be further refined. Possible future directions include testing different neural network architectures, including convolutional LSTMs and attention-based systems; combining SARIMA and LSTMs; integrating spatial analysis using CNNs; and multivariate prediction using vector autoregression. Future developments in machine learning and artificial intelligence may improve performance, but the lack of periodic or repeating signals within the data set may always be a barrier to time series prediction.

LSTMs and Supervised Learning Methods for InSAR Ground Motion Forecasting
Given the stringent analysis of LSTM and supervised learning methods described in Section 4.1, we believe the use and analysis of such methods is justified. Although they did not give significantly better results than the other methods in our experiments, they should be considered effective models in situations where: (1) there is a larger amount of training data, (2) the signals are easier to predict, (3) there is a complex mixture of seasonality within the data, and (4) the data comprise very high dimensional multivariate signals (which would be difficult to analyze using the other methods discussed in this work). In our opinion, supervised machine learning methods such as LSTMs should not be discounted by researchers but included in the toolbox of forecasting methods mapped out in this paper.

Conclusion
In this proof-of-concept study, we have tested a range of time series prediction tools on ground motion data collected using InSAR. For randomly selected data, a simple constant value prediction outperforms both conventional time series analysis and forecasting methods such as SARIMA and supervised machine learning approaches such as LSTMs. This reflects the stochastic nature of the signals and the difficulties in using any trained system to predict far into the future. The time series prediction methods performed better on signals containing strong annual variations, and both the LSTM-based architectures and SARIMA performed better over short periods of time (less than three months) than the extrapolation of a sinusoidal function. Figure S7 (in the supporting information) shows that there is no distinct frequency peak within the data, suggesting that there is no strong temporal signal; this may go some way to explaining the low performance of the temporal models and suggests that a preprocessing step could be used to select signals that are suitable for forecasting. However, further developments in machine learning and artificial intelligence will be needed before time series predictions of InSAR data are sufficiently reliable to be used in practice.
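Such a preprocessing step could, for example, rank signals by their autocorrelation at the annual lag. The sketch below is one plausible reading of an ACF-based seasonality index; the exact SIndex_ACF definition, the lag convention, and the 0.5 threshold are assumptions for illustration:

```python
import math

def acf(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    den = sum((v - mean) ** 2 for v in series)
    return num / den

def is_seasonal(series, period=12, threshold=0.5):
    """Flag a monthly series as seasonal if its annual-lag ACF is high."""
    return acf(series, period) > threshold

omega = 2 * math.pi / 12
seasonal = [math.sin(omega * t) for t in range(48)]   # strong annual cycle
trend = [0.1 * t for t in range(48)]                  # drift, no seasonality

print(is_seasonal(seasonal))  # True
print(is_seasonal(trend))     # False
```

Applying such a filter before forecasting would route seasonal points to SARIMA or sinusoid extrapolation and leave the remainder to the constant value baseline.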

Data Availability Statement
Data supporting this research are available at https://www.satsense.com/data-portal/. The data are available to any user who signs an NDA and license agreement with Satsense. The authors are not aware of any reason why the data would not be made available. Contact Satsense via https://www.satsense.com/data-portal/ to gain access to the data.