The Informativeness of Estimation Moments

This paper introduces measures of how each moment contributes to the precision of parameter estimates in GMM settings. For example, one of the measures asks what would happen to the variance of the parameter estimates if a particular moment were dropped from the estimation. The measures are all easy to compute. We illustrate their usefulness through two simple examples as well as an application to a model of joint retirement planning of couples. We estimate the model using the UK-BHPS, and we find evidence of complementarities in leisure. Our sensitivity measures illustrate that the estimate of the complementarity is primarily informed by the distribution of differences in planned retirement dates. The estimated econometric model can be interpreted as a bivariate ordered choice model that allows for simultaneity. This makes the model potentially useful in other applications.


Introduction
Indirect inference and other nonlinear GMM estimators are used extensively in empirical research. These estimators are, however, sometimes seen as black boxes. It can be difficult to understand exactly what features of the data are informative about which parameters, and how sensitive parameter estimates are to moments included in the objective function.
In this paper, we provide simple and easy-to-compute measures that can indicate how altering the moments used in estimation affects the precision of parameter estimates. Informally, we think of these as measures of how informative each moment is about a particular parameter. More precisely, we provide measures of the effect on asymptotic standard errors from i) a marginal increase in the noise associated with a moment, ii) completely removing a (set of) moments from estimation, and iii) a marginal increase in the weight put on a moment.
The measures are derived from the asymptotic distribution of the class of GMM-type estimators considered here and are, for the most part, based on derivatives of the asymptotic covariance matrix. The measures are almost costless to calculate because most of the required quantities are already constructed when calculating asymptotic standard errors. Furthermore, the measures have straightforward interpretations if scaled in a meaningful way.
There is a growing literature investigating the sensitivity of estimators in economics. Recently, for example, Andrews, Gentzkow and Shapiro (2017) proposed a measure to inform researchers about the sensitivity of the asymptotic bias of estimators to misspecification of the moments included in the estimation function. We note that their measure is also related to the change in the asymptotic variance from a marginal change in the included moments, which inspired our proposed alternative measures. While we focus on the precision of the parameter estimates, Armstrong and Kolesár (2018) and Bonhomme and Weidner (2018) have more recently also studied local misspecification, and Christensen and Connault (2019) have studied global misspecification.
We illustrate the applicability of our measures through two simple examples and an empirical application. The two examples are a binary outcome probit model and a proportional hazards Weibull duration model with time-varying covariates. The application is a simple structural model of joint retirement planning of dual-earner households. The model is founded in utility maximization with household bargaining, but can also be interpreted as a bivariate ordered choice model that allows for simultaneity. The parameters of the model are most easily estimated by indirect inference, but the complexity of the model makes it difficult to understand the link between the data and the parameter estimates.
While a growing empirical literature has established that dual-earner households tend to retire simultaneously or in quick succession,1 the empirical evidence on joint retirement planning of couples is much scarcer, with ambiguous findings.2 We contribute to this literature by estimating a structural model of dual-earner retirement planning using indirect inference and prospective retirement planning questions in the British Household Panel Survey (BHPS). Our estimation results support the notion of leisure complementarities in retirement.
Our proposed sensitivity measures confirm the intuition that the parameter estimate measuring leisure complementarities in the model is sensitive to the distribution of the difference in the year of planned retirement between household members.
The remainder of the paper is organized as follows. In Section 2, we present the sensitivity measures and show examples of their use in Section 3. In Section 4, we apply our measures to a novel model of dual-earner retirement planning before concluding with final remarks in Section 5.

Framework and Sensitivity Measures
Indirect inference and other nonlinear GMM estimators are sometimes seen as black boxes where it can be difficult to understand exactly what features of the data are informative about which parameters. In this section, we review and introduce a number of measures that are meant to provide information about this.
To fix ideas, consider a set of moment conditions E[f(x_i, θ_0)] = 0, where x_i is the data for observation i and it is assumed that this uniquely defines θ_0. The generalized method of moments (GMM) estimator is

θ̂ = argmin_θ ( (1/n) Σ_{i=1}^n f(x_i, θ) )' W_n ( (1/n) Σ_{i=1}^n f(x_i, θ) ),

where W_n is a symmetric, positive definite weighting matrix. While some of the measures below also apply to just-identified models, we focus here on over-identified models where the number of moments is larger than the number of parameters in θ, and the weighting matrix thus plays a role.
Subject to the standard regularity conditions, the asymptotic distribution of θ̂ is

√n (θ̂ − θ_0) →_d N(0, Σ),   Σ = (G'WG)^{-1} G'W S W G (G'WG)^{-1},   (1)

where G = E[∂f(x_i, θ_0)/∂θ'], W is the probability limit of W_n, and S = Var(f(x_i, θ_0)) under random sampling. See Hansen (1982). If we use the optimal weighting matrix, W = S^{-1}, the asymptotic covariance collapses to Σ = (G'S^{-1}G)^{-1}. Intuitively, when there is little sampling variability in the moment functions, f, S will be small, and G is larger if the moment conditions are more sensitive to perturbations in the parameter. Both of these contribute to the precision of the estimates, as the proposed measures highlight.

Andrews, Gentzkow and Shapiro (2017) proposed the sensitivity measure

M_1 = −(G'WG)^{-1} G'W.

It is clear from (1) that M_1 provides the mapping from moment misspecification of the type E[f(x_i, θ_0)] = ρ ≠ 0 into parameter biases for small ρ. Alternatively, by noting that Σ = M_1 S M_1', M_1 tells us how additional noise in each of the sample moments (1/n) Σ_{i=1}^n f(x_i, θ_0) would result in additional noise in each element of θ̂. This is what motivates our alternative measures that address the sensitivity of estimation precision to each moment.
The proposed measures are intended to complement the measure of sensitivity to misspecification proposed by Andrews, Gentzkow and Shapiro (2017). Like M 1 , our measures are matrices where the (j, k)'th element provides an answer to how the precision of the j'th element of θ depends on the k'th moment.
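The objects above are straightforward to compute. The following sketch (in Python; the numerical inputs for G and S and the function names are ours, chosen purely for illustration) constructs the asymptotic covariance Σ and the measure M_1, and uses the identity Σ = M_1 S M_1' as an internal check:

```python
import numpy as np

def gmm_avar(G, S, W):
    """Asymptotic covariance (G'WG)^{-1} G'W S W G (G'WG)^{-1}."""
    A = np.linalg.inv(G.T @ W @ G)
    return A @ G.T @ W @ S @ W @ G @ A

def sensitivity_M1(G, W):
    """Andrews-Gentzkow-Shapiro sensitivity: M1 = -(G'WG)^{-1} G'W."""
    return -np.linalg.inv(G.T @ W @ G) @ G.T @ W

# Hypothetical over-identified example: 4 moments, 2 parameters.
G = np.array([[1.0, 0.2], [0.1, 1.0], [0.5, 0.5], [0.3, 0.7]])
S = np.diag([1.0, 2.0, 0.5, 1.5])
W = np.linalg.inv(S)                     # optimal weighting matrix
Sigma = gmm_avar(G, S, W)
M1 = sensitivity_M1(G, W)

# Internal check: Sigma = M1 S M1'.
assert np.allclose(Sigma, M1 @ S @ M1.T)
# With W = S^{-1}, Sigma collapses to (G'S^{-1}G)^{-1}.
assert np.allclose(Sigma, np.linalg.inv(G.T @ np.linalg.inv(S) @ G))
```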
Our first measure asks the hypothetical question: How much precision would we lose if the k'th moment were subject to a little additional noise? This measure is formally defined as

M_{2,k} = ∂/∂ε (G'(S + ε O_kk)^{-1}G)^{-1} |_{ε=0} = Σ* G'S^{-1} O_kk S^{-1} G Σ*,   Σ* = (G'S^{-1}G)^{-1},

where O_kk is a matrix with 1 in the (k, k) element and zeros elsewhere. This measure assumes that the optimal weighting matrix is used and updated. Alternatively, we could ask the same question keeping the (possibly non-optimal) weighting matrix unchanged. This measure is

M_{3,k} = ∂Σ/∂S_kk = (G'WG)^{-1} G'W O_kk W G (G'WG)^{-1} = M_1 O_kk M_1'.

The difference between M_{2,k} and M_{3,k} is that the former evaluates the potential information in each moment while the latter evaluates the information actually used in the estimation. With efficient GMM (so W = S^{-1}), M_3 equals M_2. This is also true in the just-identified case where the number of moments equals the number of parameters to be estimated.
Related to M_{2,k}, we could consider the change in the asymptotic variance from completely excluding the k'th moment,

M_{4,k} = (G'W_k G)^{-1} G'W_k S W_k G (G'W_k G)^{-1} − Σ,   W_k = (ι_k ι_k') ⊙ W.

Here ⊙ denotes element-wise multiplication and ι_k is a J × 1 vector with ones in all elements except the k'th element, which is zero. M_{4,k} leaves the weighting matrix on the remaining moments unchanged after we have excluded the k'th moment.

We note that this measure assumes that the parameter vector is identified after the k'th moment has been excluded. Specifically, (G'W_k G) needs to have full rank. Importantly, this means that the original model has to be over-identified in the sense that it has more moments than parameters. In practice, G has to be estimated, and violations of the full rank assumption will result in (Ĝ'W_k Ĝ) being close to singular. Extremely large values in the estimate of M_{4,k} therefore suggest that the model is not point-identified when the k'th moment is excluded. This can happen even if the original model was over-identified.
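The exclusion measure translates directly into code; in the following sketch (hypothetical inputs, our own function names), S is diagonal, so zeroing the k'th row and column of W = S^{-1} leaves the optimal weights on the remaining moments, and the variance loss from exclusion is non-negative:

```python
import numpy as np

def gmm_avar(G, S, W):
    """Asymptotic covariance (G'WG)^{-1} G'W S W G (G'WG)^{-1}."""
    A = np.linalg.inv(G.T @ W @ G)
    return A @ G.T @ W @ S @ W @ G @ A

def M4(G, S, W, k):
    """Change in asymptotic variance from excluding moment k,
    keeping the weights on the remaining moments unchanged."""
    J = S.shape[0]
    iota = np.ones(J); iota[k] = 0.0
    Wk = np.outer(iota, iota) * W       # element-wise product zeroes row/column k
    return gmm_avar(G, S, Wk) - gmm_avar(G, S, W)

G = np.array([[1.0, 0.2], [0.1, 1.0], [0.5, 0.5], [0.3, 0.7]])
S = np.diag([1.0, 2.0, 0.5, 1.5])
W = np.linalg.inv(S)
D = M4(G, S, W, k=3)
# With these (diagonal-S) weights, dropping a moment cannot improve precision,
# so the diagonal (variance) entries of M4 are non-negative.
assert np.all(np.diag(D) >= -1e-12)
```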
Alternatively, one could also consider measures that adjust the weighting matrix. For example, one could consider a measure that compares the precision of the optimal GMM estimator that uses all moments to that of the optimal GMM estimator that excludes the k'th moment,

M_{5,k} = (G_{-k}' S_{-k}^{-1} G_{-k})^{-1} − (G' S^{-1} G)^{-1},

where G_{-k} is the same as the matrix G except that the k'th row has been removed, and S_{-k} is S with the k'th row and column removed. This measure also assumes that the parameter vector is identified after the k'th moment has been excluded, and it implicitly assumes that the original number of moment conditions exceeds the number of parameters to be estimated. In applications of indirect inference, the moments used in estimation are often based on several auxiliary models and are thus combinations of the original moment conditions. In that case, it might be useful to construct a measure that reflects giving zero weight to all the moments that come from a specific auxiliary model. This approach would be application-specific, and we therefore do not pursue it in this paper.
Our final measure addresses the question: How would the precision of our estimates change if we slightly increased the weight put on the k'th moment? This measure is formally defined as the derivative

M_{6,k} = ∂/∂ε [ (G'W_ε G)^{-1} G'W_ε S W_ε G (G'W_ε G)^{-1} ] |_{ε=0},   W_ε = W + ε O_kk.

We do not think of M_{6,k} as a measure of moment sensitivity, but rather as a measure of how close the chosen weighting matrix is to being optimal. M_{6,k} will be 0 when W is the optimal weighting matrix. It will also be 0 in the just-identified case, where the number of moments equals the number of parameters to be estimated.
These measures are not invariant to the scale of the moments included in f(·). One approach, which we take, is to report scaled measures. Concretely, for the derivative-based measures (M_2, M_3 and M_6) we report the sensitivity of the j'th parameter to the k'th moment as an elasticity-type quantity, E_{2,(j,k)} = M_{2,(j,k)} S_kk / Σ_jj, and analogously for E_3 and E_6; for the exclusion measures (M_4 and M_5) we report the relative change in the asymptotic variance, E_{4,(j,k)} = M_{4,(j,k)} / Σ_jj, and analogously for E_5.

Examples
In this section, we illustrate the use of our proposed measures through two concrete examples.
The first example is a simple binary choice probit model and the second example is a proportional hazards duration model. The first example is chosen because it is a case where one would have a strong prior about which moments matter. The second example, on the other hand, is complicated enough that this is not obvious.
For both examples, we use both the optimal weighting matrix and a diagonal weighting matrix with the inverse of the moment variances on the diagonal. We chose the latter non-optimal weighting matrix because it is very common in empirical applications.3

3 There are many examples of this, including Eisenhauer, Heckman and Mosso (2015) and Gayle and Shephard (2019), to name two. The motivation stems from Altonji and Segal (1996), who show that the optimal weighting matrix can have quite poor finite-sample properties. They suggest equally weighted moments (i.e., W = I) as an alternative. Of course, using equal weights is not invariant to changes in units (or other rescaling), which explains the practice we have adopted.

Example 1: Method of Moments Estimation of a Probit Model
We first consider a simple probit model,

y_i = 1{β_0 + β_1 x_{1,i} + β_2 x_{2,i} + ε_i > 0},

where (x_{1,i}, x_{2,i}) has a bivariate normal distribution with means equal to 0, variances equal to 1 and correlation 0.5, and ε_i is independent of (x_{1,i}, x_{2,i}) and distributed according to a standard normal.

We consider the asymptotic distribution of a moment-based estimator of θ_0 = (β_0, β_1, β_2) that uses the six moments

E[e_i(θ)], E[e_i(θ) x_{1,i}], E[e_i(θ) x_{2,i}], E[e_i(θ) x_{1,i}^2], E[e_i(θ) x_{2,i}^2], E[e_i(θ) x_{1,i} x_{2,i}],

where e_i(θ) = y_i − Φ(β_0 + β_1 x_{1,i} + β_2 x_{2,i}). In the corresponding logit model, the first three moments correspond to the first-order conditions for maximum likelihood estimation. Although they are formally different, the logit and probit models are quite similar. We therefore expect the first three moments to be the most informative about θ_0. Moreover, we expect the first moment to be the most important for determining β_0, and the second and third for determining β_1 and β_2, respectively. Table 1 shows results using the optimal weighting matrix and Table 2 shows results using the diagonal weighting matrix with the inverse of the moment variances on the diagonal.4 We think of the latter as a practical alternative to the efficient weighting matrix.

4 We illustrate the proposed sensitivity measures through Monte Carlo simulation of the expected values using 10^7 simulated observations.
It is clear from Table 1 that the first three moments are indeed the most informative about β_0, β_1 and β_2, respectively. As mentioned, this is expected since these moments would be the first-order conditions for maximum likelihood estimation of a logit model. The elements in the last three columns of M_1 in Table 1 are much smaller than the elements in the first three. This suggests that the optimal GMM estimator is much less sensitive to misspecification of the last three moments than to misspecification of the first three moments.
The reason is that the first three moments get almost all the weight (in the corresponding logit model, they would literally get all the weight). As expected, this is less pronounced in Table 2. The values of E_2 in Tables 1 and 2 confirm that the efficient GMM estimator of θ_0 is driven by the first three moments.5 Adding noise to the last three moments has essentially no effect on the precision of the optimal GMM estimator of θ_0, whereas adding noise to the first three elements can have a big effect. The values of E_3 in Table 2 illustrate that the precision of the non-optimal GMM estimator is less sensitive to noise in the last three moments (because they get relatively less weight) and more sensitive to adding noise to the first three moments (because they get relatively more weight).

5 The values of E_2 in Tables 1 and 2 differ only because of simulation error.
Next, E_4 and E_5 suggest that leaving out, for example, the second moment would increase the asymptotic variance of both the efficient and the inefficient GMM estimator of β_1 by around 400 percent. This confirms that E[e_i(θ) x_{1,i}] is instrumental for precise estimation of β_1.
The final measure, E_6, in Table 1 is 0 by construction. Since we are using the weighting matrix that minimizes the variance of the estimator of each element of θ, the derivative of the variance with respect to the elements of the weighting matrix must be 0. E_6 in Table 2 shows that, in this case, the diagonal weighting matrix with the inverse of the moment variances on the diagonal puts too little weight on the first three moments.

Example 2: Duration Model
The probit example in Section 3.1 was chosen because it is an example where we have good prior intuition about which moments matter for what parameter. We now turn to an example where this is much less obvious.
Consider a duration, T, which follows a mixed proportional hazard model with time-varying covariates and a Weibull baseline hazard,

h(t | x, η) = η α t^{α−1} exp(x(t)'β),   (2)

where α captures duration dependence through the Weibull baseline and x(t)'β is the effect of the time-varying explanatory variables; for example, a two-dimensional set of explanatory variables could combine one time-invariant covariate with one that changes value over time. Finally, η captures unobserved heterogeneity. Except for moment assumptions, no assumptions are made on the distribution of η.

Equation (2) suggests moment conditions of the type

E[ ψ(x_i) ( log( ∫_0^{T_i} α s^{α−1} exp(x_i(s)'β) ds ) − β_0 ) ] = 0,   (3)

for functions of the covariates, ψ. Here, β_0 captures the mean of −log(η), which is assumed to be finite.
With time-invariant covariates, the integral in (3) reduces to T^α exp(x'β), so the moment conditions become E[ψ(x)(α log(T) + x'β − β_0)] = 0, which are invariant to a common rescaling of (α, β, β_0). In other words, with time-invariant covariates the moments implied by (3) do not identify (β, α), but only β/α. It turns out that it is possible to estimate α by other methods (see, for example, Honoré (1990)), but it is not possible to estimate (β, α) at the usual √n rate (see Hahn (1994)).
This makes it interesting to investigate how precision in estimation of (β, α) depends on the various moments in (3) when x does contain time-varying covariates.
We consider a data generating process with one time-invariant and one time-varying covariate, both generated from standard normal distributions. The heterogeneity term, η, follows a log-normal distribution, where the underlying normal has mean 0 and variance 1/2, and η is independent of x(·). The design is chosen for illustration and is not meant to mimic any realistic empirical example.
The sensitivity measures are given in Tables 3 and 4. In this design, the derivatives of the first two moments at the true parameter values are non-zero with respect to θ_0 and θ_1, respectively, and 0 with respect to the other parameters. This implies that G becomes singular when we exclude either of the first two moments, which explains the extreme entries for E_4 and E_5 for these moments. This is exactly what the discussion above would predict. Interestingly, the first moment is also important for α. Presumably, this is because this moment determines the estimate of the mean of the (log of the) unobserved heterogeneity. It is well known in the duration literature that unobserved heterogeneity is poorly distinguished from duration dependence. As a result, we do not consider this surprising.

Application: Joint Retirement Planning
In this section, we apply the proposed sensitivity measures to an extremely simple structural model of the joint retirement planning of dual-earner couples.

Data and Institutional Setting
We use the British Household Panel Survey (BHPS), which is a completed panel of 18 waves collected from 1991 through 2009. In waves 11 and 16 of the BHPS, each adult household member is asked, "Even if this is some time away, at what age do you expect you will retire?" We use this to measure the subjective retirement plans of each spouse. 6 Based on the age at the interview and the expected retirement age, we can calculate the expected retirement year of each household member and use that to investigate joint retirement plans.
Besides retirement plans, we use information in the BHPS on annual labor market income, the number of visits to the general practitioner (GP), subjective expectations about future health status, eligibility for an employer provided pension scheme (EPP), and whether individuals save any of their income in a private personal pension (PPP). 7 Finally, we define individuals as highly skilled if they have completed the first or second stage of tertiary education (ISCED codes 5 or 6).
We use information on households consisting of two opposite-sex household members who are either married or cohabiting, and who meet the following sample selection criteria: i) both members are between 40 and 59 years old when interviewed, ii) at least one member is not retired at the time of the interview, and iii) retirement plans are observed in the age range 50 to 70 for at least one member not retired at the time of the interview. If a household satisfies the criteria in both waves (11 and 16), we use both survey responses in the analysis. We refer to the household members as husband and wife, although we also include couples who are cohabiting but not necessarily married.
The State Pension Age (SPA). The SPA in the U.K. is the age at which individuals become eligible to receive the state pension from the government. Individuals who have reached the SPA and contributed to the scheme for sufficiently many years are eligible to receive a weekly transfer with no means testing.
In 2009, at the end of our sample period, the SPA was 65 for men; for women it was 60 for those born before 1950 and scheduled to increase gradually for later cohorts under the Pension Act 1995. This cohort variation in women's SPA allows for an effect of the Pension Act 1995 on retirement planning. The EPP includes both defined benefit and defined contribution (DB and DC) plans, and we cannot distinguish between them. Blundell, Meghir and Smith (2004) show, however, that DB plans were the most common in the U.K. in this period.

Descriptive Statistics

Table 5 reports the descriptive statistics for the variables that we use. All statistics are based on households in which both members are not retired at the time of the interview, which is around 97 percent of our sample. Husbands in the estimation sample are approximately 1.5 years older than their wives, plan to retire two years later than their wives (at age 63 on average), and the average difference in the planned retirement year is approximately 0.83 years. This difference should be viewed in light of the fact that the SPA of men is 65, while it is substantially lower for most women in our sample and as low as 60 for women born before 1950.
To illustrate simultaneous retirement planning, Figure 1 shows the distribution of the difference in the planned year of retirement between husband and wife. The left panel illustrates the unconditional distribution and the right panel conditions on the husband being at least 2 years older than his wife. The peak around zero indicates joint retirement planning, and the mass to the right of zero likely stems from men being older than women and women having a lower SPA.
When conditioning on the husband being at least 2 years older than his wife in the right panel, we see a substantial mass at 0 (same planned retirement year); we now also see a substantial mass at −2 (same planned retirement age).
Table 5 also shows that around 16 and 14 percent of men and women, respectively, are classified as highly skilled, and we see that men tend to visit the GP much less than women.
Interestingly, however, men are more likely to expect their health to worsen in the future. The labor income of husbands is around £25,000 while that of the wives is on average around £14,000. Only around 13 percent of wives and 28 percent of husbands contribute to a private pension (PPP), while around 47 percent of wives and 51 percent of husbands are eligible for some occupational retirement scheme (EPP).

A Model of Retirement Planning of Dual-Earner Households
In this section, we formulate a discrete-time version of the continuous-time bivariate duration model proposed in Honoré and de Paula (2018). Specifically, we parameterize the difference in the utility flow between being retired and working. Utility maximization then gives an estimable model for joint retirement planning of couples.
Consider first the husbands. We specify the difference in utility from being retired in period t compared to working as

u_h(t) = x_h'β_h + δ_h(t) + γ 1{C_h(t) ≥ C_w(t_w)} + ε_h,

where C_h(t) is the calendar time when the husband is aged t, t_w is the retirement age of the wife, and C_w(t_w) thus is the calendar time at which the wife plans to retire. We interpret the term γ 1{C_h(t) ≥ C_w(t_w)} as a utility externality that allows the husband to enjoy a higher utility flow from planned retirement if the wife also plans to be retired at that time. We parameterize the function δ_h(t) as a linear trend plus indicator functions for t ≥ 55, t ≥ 60 and t ≥ 65. The histograms in Figure 2 below suggest that these are empirically important. We interpret the first two as reflecting either social norms or heaping, while the third also reflects the fact that the SPA for men is 65.
Similarly, the difference in utility flow for the wife is

u_w(t) = x_w'β_w + δ_w(t) + γ 1{C_w(t) ≥ C_h(t_h)} + α 1{t ≥ SPA_w} + ε_w.

We again parameterize the function δ_w(t) as a linear trend plus indicator functions for t ≥ 55, t ≥ 60 and t ≥ 65. The term α 1{t ≥ SPA_w} reflects the idea that for women, there is variation in the SPA as discussed above. This allows one to infer the effect of the SPA separately from the dummies that reflect either heaping or institutional features (e.g., early and statutory retirement ages) at 55, 60 and 65.
To close the model, we assume that (ε h , ε w ) is jointly normal with mean zero and covariance matrix Ω, where the off-diagonal element of Ω captures possibly correlated retirement preferences within households. We also assume that retirement is an absorbing state. When the difference in utility from retirement compared to working is increasing in age, this is not a binding constraint in the sense that individuals would not want to re-enter the labor market once retired.
If a husband and a wife plan to retire at ages r_h and r_w, their discounted individual utilities are

V_h(r_h, r_w) = Σ_{t ≥ r_h} ρ^{t − age_h} u_h(t)   and   V_w(r_h, r_w) = Σ_{t ≥ r_w} ρ^{t − age_w} u_w(t),

for a husband aged age_h and a wife aged age_w, where ρ is the discount factor. Finally, the optimal retirement plan for a household, (r_h, r_w), is determined jointly by maximizing A(V_h(r_h, r_w), V_w(r_h, r_w)), where A(·, ·) is a household aggregator. For the estimation, we choose A(V_h, V_w) = V_h + λV_w as in the Nash bargaining setting from Honoré and de Paula (2018) or, more generally, the collective model framework surveyed in Browning, Chiappori and Weiss (2014).
It is clear that two scale normalizations are necessary in order to estimate the model. First, the scale of A cannot be identified and we therefore normalize the variance of ε h to be σ 2 h = 1. Secondly, the only effect of λ is to re-scale all the parameters in V w . We therefore normalize λ = 1. The model is thus in effect unitary.
Our parameterization is inspired by the ordered probit model. Consider the husbands. If γ = 0 (such that there is no utility externality) and δ_h is increasing, then utility maximization will lead to planned retirement the first time x_h'β_h + δ_h(t) + ε_h > 0. In other words, the chosen planned retirement age satisfies

r_h = t   if and only if   −δ_h(t) < x_h'β_h + ε_h ≤ −δ_h(t − 1),

which is exactly the threshold-crossing structure of the ordered probit model. In that sense, the proposed model is a generalization of the ordered probit model to a bivariate case with simultaneity between the two outcomes.
To estimate the parameters, θ, we minimize g(θ)'W g(θ), where g(θ) is a K × 1 vector of differences between statistics/moments in the data and the identical moments based on simulated data. The weighting matrix, W, is diagonal with the inverse of the variances of the moments on the diagonal.
For each couple i, we simulate synthetic retirement plans by drawing S_sim vectors of taste shocks from the joint normal distribution and, for a given value of θ, calculate the value of all combinations of retirement ages, where the individual values are calculated from the utility specifications above. We then find the simulated retirement ages that maximize household utility.
To estimate the model parameters, we use four sets of auxiliary models/moments with a total of K = 52 elements in g(θ). We describe the construction of these moments in detail in the supplemental material and only list them here: 1. OLS coefficients from individual regressions of the planned retirement age on own and spousal covariates x_{i,h} and x_{i,w} together with indicators for the wife's birth cohort 1{1950 < cohort_{w,i} ≤ 1954} and 1{1955 ≤ cohort_{w,i}}.
2. The distribution of planned retirement ages for husbands and for wives.
3. The covariance matrix of residuals from the regression in bullet 1 above for each household member.
4. The share of couples with retirement plans such that i) the wife plans to retire 1-2 years before her husband, ii) the husband plans to retire 1-2 years before his wife, or iii) the couple plan to retire in the same year.
The first set of moments are primarily included to help estimate β h , β w , and α in the utility function. The second set of moments are included primarily to help estimate the linear age trend and age dummies in δ h and δ w . The third set of moments are primarily included to estimate the covariance of the preference shocks for husband and wife, Ω. Recall that we normalize σ 2 h = 1 and the remaining parameters in Ω are thus σ 2 w and σ hw . The final set of moments are included to estimate the value of joint leisure, γ. We will use our proposed sensitivity measures below to investigate these claims in a more systematic way.

Empirical Results
We use the BHPS data discussed above to estimate the model of joint retirement planning of couples. We use the same moments as above and simulate S_sim = 2,000 draws when approximating the expected moments. Table 6 reports the estimation results. We find a positive value of coordination of around γ ≈ 0.026, around two to four times as large as the marginal utility from additional labor income of £1,000, and significant at the 5% level (p-value of 0.02).
Overall, the remaining statistically significant parameter estimates have the expected signs.
High-skilled individuals value retirement less. Less healthy people value retirement more, and having some form of pension savings increases the value of retirement. Having an employer-provided pension (EPP) especially increases the utility from retirement compared to working for husbands. Perhaps surprisingly, we find that higher-earning women value retirement more, but this could proxy for higher wealth, which could lead to a higher propensity to retire. All spousal variables seem to matter less and are not statistically significant at most common significance levels. Interestingly, we estimate a small, positive and insignificant increase in the expected retirement age of women in response to an increased SPA. This is in line with other studies finding a relatively low degree of awareness of the reform. Figure 2 shows the histograms of planned retirement ages for women and men. We see that the model does quite a good job of fitting the empirical distribution. Likewise, Figure 3 shows the empirical and predicted distribution of retirement-year differences between couples. The predicted distribution matches the empirical one well, although there are small deviations. Table 7 shows the proposed sensitivity measures together with the one proposed by Andrews, Gentzkow and Shapiro (2017). We only report the measures for the parameter of interest here: the value of joint leisure, γ. All reported measures are scaled as discussed in Section 2.
The measure proposed by Andrews, Gentzkow and Shapiro (2017) is scaled in the same manner. Clearly, the moments to which γ is most sensitive are related to simultaneous retirement. In particular, we see from E_4 and E_5 that leaving out the moment "the share planning to retire in the same year" (moment 52) when estimating the model would increase the asymptotic variance of γ by a factor of 8. This confirms the intuition that this moment is extremely informative about the value of joint leisure. The shares planning to retire within 1-2 years of each other also seem important.
The third set of moments also matters; in particular, the correlation between the OLS regression residuals is important. This is also intuitive since this moment captures a combination of correlated shocks and preferences for joint leisure.

Concluding Remarks
Structural econometric models are often estimated by matching moments that depend on the parameters and on the data in a highly nonlinear way. This can make it difficult to develop intuition for which moments of the data are informative about which parameter. In this paper, we have proposed a number of very simple sensitivity measures that are meant to shed light on this.
We have illustrated our measures in two artificial examples. The first is a simple probit model and the second a mixed proportional hazard model with time-varying covariates. The first illustrates that the proposed measures are reasonable in a setting where the answer is rather obvious ex ante. The second is chosen because it illustrates how the measures can be used to gain insights, which are not so obvious.
We also illustrated the measures in a simple structural econometric model of household retirement planning. This application is of independent interest because it highlights the importance of modelling wives' and husbands' retirement decisions jointly.
The econometric model for retirement that we develop can be interpreted as a bivariate ordered choice model with simultaneity. Specifically, if the "utility externality" parameter is 0, then the model that we estimate simplifies to a bivariate ordered probit model. This may make it tractable in other applications.
Notes to Figure 1: The figure illustrates the difference in the planned year of retirement between husband and wife. The peak around zero indicates joint retirement planning. Because the SPA of women is lower than that of men for most cohorts, it is expected that the distribution is right-tailed. The left panel illustrates the unconditional distribution and the right panel illustrates the distribution conditional on the husband being at least 2 years older than his spouse.

Online Supplemental Material

Definition of Moments used for Estimation
Individual OLS Moment Conditions. Let R_{i,j} denote the planned retirement age of member j in household i and let X_i = (1, x_{i,h}, x_{i,w}, 1{1950 < cohort_{w,i} ≤ 1954}, 1{1955 ≤ cohort_{w,i}}) denote the set of control variables. The first set of moments consists of the differences between the OLS coefficients from regressing R_{i,j} on X_i in the data and the corresponding OLS coefficients computed on the simulated data, for j = h, w.

Stacking all moments together gives g(θ) = (M_1(θ)', M_2(θ)', M_3(θ)', M_4(θ)')', and the estimator of θ is θ̂ = argmin_{θ∈Θ} g(θ)'W g(θ), where the weighting matrix, W, has the inverse of the bootstrapped variances of the moments on the diagonal and zeros everywhere else.
We solve the minimization problem by successively applying different minimization routines in MATLAB. We perform this sequence of estimators four times and report the estimates yielding the lowest criterion function. For each of the four estimation runs, we start with MATLAB's particleswarm, which is a "global" optimization routine using randomization to search through the parameter space. We use 80 particles and switch to Nelder-Mead (fminsearch in MATLAB) using the best candidate from the converged particleswarm. We use S_sim = 100 simulation draws for this estimation. After the four sequences of these two algorithms, we increase the number of simulation draws to S_sim = 2,000 and do one final Nelder-Mead minimization starting at the parameters yielding the lowest objective function over the four sequences. We then report the parameter values that solve this final minimization.
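The two-stage strategy above (a randomized global search followed by a derivative-free local refinement) can be sketched in Python. The paper uses MATLAB's particleswarm and fminsearch; here we substitute scipy's differential_evolution as the global stage, which is a different but analogous randomized search, and the criterion is a hypothetical stand-in for g(θ)'W g(θ):

```python
import numpy as np
from scipy.optimize import differential_evolution, minimize

# Hypothetical stand-in for the simulated-moments criterion g(theta)'W g(theta),
# with a known minimum of 0 at theta = (1.0, -0.5).
def criterion(theta):
    g = np.array([theta[0] - 1.0, theta[1] + 0.5, theta[0] * theta[1] + 0.5])
    W = np.diag([1.0, 2.0, 0.5])
    return g @ W @ g

# Stage 1: randomized global search over a bounded parameter space.
res_global = differential_evolution(criterion, bounds=[(-5, 5), (-5, 5)],
                                    seed=0, tol=1e-8)
# Stage 2: Nelder-Mead refinement from the best global candidate
# (the analogue of fminsearch).
res_local = minimize(criterion, res_global.x, method="Nelder-Mead",
                     options={"xatol": 1e-8, "fatol": 1e-10})
theta_hat = res_local.x
assert np.allclose(theta_hat, [1.0, -0.5], atol=1e-3)
```

For noisy simulated criteria like the one in the paper, the local stage is typically run with loose tolerances first and tightened only after the number of simulation draws is increased.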