ROBUST BAYESIAN INFERENCE FOR SET-IDENTIFIED MODELS

This paper reconciles the asymptotic disagreement between Bayesian and frequentist inference in set-identified models by adopting a multiple-prior (robust) Bayesian approach. We propose new tools for Bayesian inference in set-identified models and show that they have a well-defined posterior interpretation in finite samples and are asymptotically valid from the frequentist perspective. The main idea is to construct a prior class that removes the source of the disagreement: the need to specify an unrevisable prior for the structural parameter given the reduced-form parameter. The corresponding class of posteriors can be summarized by reporting the 'posterior lower and upper probabilities' of a given event and/or the 'set of posterior means' and the associated 'robust credible region'. We show that the set of posterior means is a consistent estimator of the true identified set and the robust credible region has the correct frequentist asymptotic coverage for the true identified set if it is convex. Otherwise, the method provides posterior inference about the convex hull of the identified set. For impulse-response analysis in set-identified Structural Vector Autoregressions, the new tools can be used to overcome or quantify the sensitivity of standard Bayesian inference to the choice of an unrevisable prior.


INTRODUCTION
IT IS WELL KNOWN THAT the asymptotic equivalence between Bayesian and frequentist inference breaks down in set-identified models. First, the sensitivity of Bayesian inference to the choice of the prior does not vanish asymptotically, unlike in the point-identified case (Poirier (1998)). Second, any prior choice can lead to 'overly informative' inference, in the sense that Bayesian interval estimates asymptotically lie inside the true identified set (Moon and Schorfheide (2012)). This paper reconciles this disagreement between Bayesian and frequentist inference by adopting a multiple-prior robust Bayesian approach.
In a set-identified structural model, the prior for the model's parameter can be decomposed into two components: the prior for the reduced-form parameter, which is revised by the data; and the prior for the structural parameter given the reduced-form parameter, which cannot be revised by the data. Our robust Bayesian approach removes the need to specify the prior for the structural parameter given the reduced-form parameter, which is the component of the prior responsible for the asymptotic disagreement between Bayesian and frequentist inference. This is accomplished by constructing a class of priors that shares a single prior for the reduced-form parameter but allows for arbitrary conditional priors for (or ambiguous beliefs about) the structural parameter given the reduced-form parameter. By applying Bayes's rule to each prior in this class, we obtain a class of posteriors and show that it can be used to perform posterior sensitivity analysis and to conduct inference about the identified set.

[Author footnote: Raffaella Giacomini: r.giacomini@ucl.ac.uk; Toru Kitagawa: t.kitagawa@ucl.ac.uk. This paper merges and extends two previously circulated (and now retired) working papers: Giacomini, R. and T. Kitagawa (2015): 'Robust Inference about Partially Identified SVARs' and Kitagawa, T. (2012): 'Estimation and Inference for Set-Identified Parameters using Posterior Lower Probabilities'. We would like to thank Matthew Read for outstanding research assistance and Alessio Volpicella for providing useful computational insights. We also thank Gary Chamberlain, Jean-Pierre Florens, Eleonora Granziera, Frank Kleibergen, Sophocles Mavroeidis, Andriy Norets, Joris Pinkse, Frank Schorfheide, three anonymous referees, and several seminar and conference participants for their valuable comments. Both authors gratefully acknowledge financial support from ERC Grants 536284 and 715940 and the ESRC Centre for Microdata Methods and Practice (CeMMAP) (Grant RES-589-28-0001).]
In practice, we propose summarizing the information in the class of posteriors by reporting the 'posterior lower and upper probabilities' of an event and/or the 'set of posterior means (or quantiles)' in the class of posteriors and the associated 'robust credible region'. These outputs can be expressed in terms of the (single) posterior of the reduced-form parameter, so they can be obtained numerically if one can draw the reduced-form parameter randomly from its posterior.
We show that, if the true identified set is convex, the set of posterior means converges asymptotically to the true identified set and the robust credible region attains the desired frequentist coverage for the true identified set asymptotically (in a pointwise sense). If the true identified set is not convex, the method provides posterior inference about the convex hull of the identified set.
The paper further proposes diagnostic tools that measure the plausibility of the identifying restrictions, the information contained in the identifying restrictions, and the information introduced by the unrevisable prior that would be required by a standard Bayesian approach.
The second part of the paper presents a detailed illustration of the method in the context of impulse-response analysis in Structural Vector Autoregressions (SVARs) that are set-identified due to under-identifying zero and/or sign restrictions (Faust (1998), Canova and Nicolo (2002), Uhlig (2005), among others). As is typical in this literature, we focus on pointwise inference about individual impulse responses. A scalar object of interest facilitates computing the set of posterior means and the robust credible region, since the posterior of an interval can be reduced to the posterior of a two-dimensional object (its upper and lower bounds). Most empirical applications of set-identified SVARs adopt standard Bayesian inference and select a non-informative, but unrevisable, prior for the rotation matrix that transforms reduced-form shocks into structural shocks. Baumeister and Hamilton (2015) cautioned against this approach and showed that it may result in spuriously informative posterior inference. Our method overcomes this drawback by removing the need to specify a single prior for the rotation matrix.
We give primitive conditions that ensure frequentist validity of our method in the context of SVARs. The conditions are mild or easy to verify, and cover a wide range of applications. In particular, the results on the types of restrictions that give rise to a convex identified set with continuous and differentiable endpoints are new to the literature and may be of separate interest regardless of whether one favors a Bayesian or a frequentist approach.
We provide an algorithm for implementing the procedure, which in practice adds an optimization step to the algorithms used in the literature, such as those of Uhlig (2005) and Arias, Rubio-Ramírez, and Waggoner (2018).
Our practical suggestion in empirical applications is to report the posterior lower (or upper) probability of an event and/or the set of posterior means and the robust credible region, as an alternative or addition to the standard Bayesian output. Reporting the outputs from both approaches, together with the diagnostic tools, can help one separate the information contained in the data and in the identifying restrictions from that introduced by choosing a particular unrevisable prior.
As a concrete example of how to interpret the robust Bayesian output in an SVAR application, the finding that the posterior lower probability of the event 'the impulse response is negative' equals, say, 60%, means that the posterior probability of a negative impulse response is at least 60%, regardless of the choice of unrevisable prior for the rotation matrix. The set of posterior means can be interpreted as an estimate of the impulseresponse identified set. The robust credible region is an interval for the impulse response such that the posterior probability assigned to it is greater than or equal to, say, 90%, regardless of the prior for the rotation matrix.
The empirical illustration applies the method to a standard monetary SVAR that imposes various combinations of equality and sign restrictions typically used in the literature. The findings show that all 90% robust credible regions contain zero, casting doubt on the informativeness of such restrictions. In particular, sign restrictions alone have little identifying power, which means that standard Bayesian inference is largely driven by the choice of the unrevisable prior for the rotation matrix. The addition of zero restrictions tightens the estimated identified set, makes standard Bayesian inference less sensitive to the choice of prior for the rotation matrix, and can lead to informative inference about the sign of the output response to a monetary policy shock.

This paper is related to several literatures in econometrics and statistics. Robust Bayesian analysis has a long history in statistics; see Berger (1994) and the references therein. In econometrics, pioneering contributions using multiple priors are Chamberlain and Leamer (1976) and Leamer (1982), who obtained bounds for the posterior mean of regression coefficients as the prior varies over a certain class. These studies do not explicitly consider set-identified models; rather, they focus on point-identified models and view the approach as a way to measure the global sensitivity of the posterior to the choice of prior (as an alternative to a full Bayesian analysis requiring the specification of a hyperprior over the priors in the class).
In econometrics, there is a large literature on estimation and inference in set-identified models from the frequentist perspective. See Canay and Shaikh (2017) for a survey of the literature and references therein. In the context of certain sign-restricted SVARs, Granziera, Moon, and Schorfheide (2018) proposed frequentist inference by inverting tests for a minimum-distance type criterion function, while Gafarov, Meier, and Montiel-Olea (2018) applied the delta-method using directional derivatives. Our approach is complementary, as it accommodates a broader class of identifying restrictions, while relying on different conditions to attain asymptotic frequentist validity.
There is also a growing literature on Bayesian inference for set-identified models. Some propose posterior inference based on a single prior irrespective of the posterior sensitivity introduced by set identification (Baumeister and Hamilton (2015), Gustafson (2015)). Our paper does not intend to provide a normative argument as to whether one should adopt a single prior or multiple priors under set identification: our main goal is to offer new tools for inference and to show that they have a well-defined posterior interpretation in finite samples and yield asymptotically valid frequentist inference. In parallel work, Norets and Tang (2014) and Kline and Tamer (2016) considered Bayesian inference about the identified set. Norets and Tang (2014) focused on dynamic discrete choice models. Kline and Tamer (2016) focused on moment inequality models and constructed confidence regions that share some relation with our robust credible regions. Their proposal does not have a formal Bayesian interpretation, and their credible sets do not minimize volume, so the coverage can be conservative. If one were to extend our framework to moment inequality models, one could view our approach as providing a formal posterior interpretation for the confidence regions in Kline and Tamer (2016), while ensuring that they have minimum volume. In addition, we show how to construct the set of posterior means as a consistent estimator of the true identified set. Wan (2013) and Chen, Christensen, and Tamer (2018) proposed using Bayesian Markov Chain Monte Carlo methods to overcome some computational challenges of the frequentist approach to inference about the identified set.
Finally, a key insight of this paper is to recognize that from the Bayesian perspective, under mild regularity conditions, an identification region can be viewed as a random closed set, and Bayesian inference on it can be carried out using elements of random set theory. This theory has proven very helpful for partial identification analysis since its introduction to econometrics (Beresteanu and Molinari (2008), Beresteanu, Molchanov, and Molinari (2012)), and a novel contribution of our paper is to bring these tools into Bayesian inference.
The remainder of the paper is organized as follows. Section 2 considers the general setting of set identification and introduces the multiple-prior robust Bayesian approach. Section 3 analyzes the asymptotic properties of the method. Section 4 illustrates the application to SVARs. Section 5 discusses the numerical implementation. Sections 4 and 5 are self-contained, so a reader interested in SVARs can focus on these sections. Section 6 contains the empirical application and Section 7 concludes. The proofs are in Appendix A. The Supplemental Material (Appendix B in Giacomini and Kitagawa (2021)) contains additional results and discussion about the validity of the assumptions in SVARs.

Notation and Definitions
This section describes the general framework of set-identified structural models. In particular, it introduces the definitions of structural parameter θ, reduced-form parameter φ, and parameter of interest η that are used throughout the paper.
Let (Y, Y) and (Θ, A) be standard Borel measurable spaces of a sample Y ∈ Y and a parameter vector θ ∈ Θ, respectively. We restrict attention to parametric models, so Θ ⊂ R^d, d < ∞. Assume that the conditional distribution of Y given θ exists and has a probability density p(y|θ) at every θ ∈ Θ with respect to a σ-finite measure on (Y, Y), where y ∈ Y indicates the sampled data.
Set identification of θ arises when multiple values of θ are observationally equivalent, so that for θ and θ′ ≠ θ, p(y|θ) = p(y|θ′) for every y ∈ Y (Rothenberg (1971)). Observational equivalence can be represented by a many-to-one function g : (Θ, A) → (Φ, B) such that g(θ) = g(θ′) if and only if p(y|θ) = p(y|θ′) for all y ∈ Y (see, e.g., Barankin (1960)). This relationship partitions the parameter space Θ into equivalence classes, on each of which the likelihood of θ is 'flat' irrespective of the observations, and φ = g(θ) maps each equivalence class to a point in a parameter space Φ. In the language of structural models in econometrics (Koopmans and Reiersol (1950)), φ = g(θ) is the reduced-form parameter that indexes the distribution of the data. The reduced-form parameter carries all the information for the structural parameter θ through the value of the likelihood function, in the sense that there exists a B-measurable function p̂(y|·) such that p(y|θ) = p̂(y|g(θ)) for every y ∈ Y and θ ∈ Θ. Let the parameter of interest η ∈ H be a subvector or a transformation of θ, η = h(θ) with h : (Θ, A) → (H, D), H ⊂ R^k, k < ∞. The identified sets of θ and η are defined as follows.
DEFINITION 1-Identified Sets of θ and η: (i) The identified set of θ is the inverse image of g(·): IS_θ(φ) = {θ ∈ Θ : g(θ) = φ}. (ii) The identified set of η = h(θ) is the projection of IS_θ(φ) via h(·): IS_η(φ) = {h(θ) : θ ∈ IS_θ(φ)}.

We define the identified set for θ in terms of the likelihood-based definition of observational equivalence of θ. As a result, IS_θ(φ) and IS_η(φ) are ensured to give sharp identification regions at every distribution of the data indexed by φ. In some structural models, including SVARs, the space Φ of reduced-form parameters on which the reduced-form likelihood is well-defined can be larger than the space of reduced-form parameters generated from the structure, g(Θ); that is, the model is observationally restrictive in the sense of Koopmans and Reiersol (1950). In this case, the model is falsifiable, and IS_θ(φ) can be empty for some φ ∈ Φ.

Multiple Priors
In this section, we discuss how set identification induces unrevisable prior knowledge and we introduce the use of multiple priors.
Let π_θ be a prior (distribution) of θ and π_φ be the corresponding prior of φ, obtained as the marginal probability measure on (Φ, B) induced by π_θ and g(·):

π_φ(B) = π_θ(g^{-1}(B)), B ∈ B. (2.1)

Since the likelihood for θ is flat on IS_θ(φ) for any Y, the conditional independence θ ⊥ Y | φ holds. The posterior of θ, π_θ|Y, is accordingly obtained as

π_θ|Y(A) = ∫_Φ π_θ|φ(A) dπ_φ|Y(φ), A ∈ A, (2.2)

where π_θ|φ denotes the conditional distribution of θ given φ, and π_φ|Y is the posterior of φ. Expression (2.2) shows that the prior of the reduced-form parameter, π_φ, can be updated by the data, whereas the conditional prior of θ given φ is never updated because the likelihood is flat on IS_θ(φ) ⊂ Θ for any realization of the sample. In this sense, one can interpret π_φ as the revisable prior knowledge and the conditional priors, {π_θ|φ(·|φ) : φ ∈ Φ}, as the unrevisable prior knowledge.
In a standard Bayesian setting, the posterior uncertainty about θ is summarized by a single probability distribution. This requires specifying a single prior for θ, which induces a conditional prior π θ|φ that is unique up to π φ -almost sure equivalence. If one could justify this choice of conditional prior, the standard Bayesian updating formula (2.2) would yield a valid posterior for θ. A challenging situation arises if a credible conditional prior is not readily available. In this case, a researcher who is aware that π θ|φ is never updated by the data might worry about the influence that a potentially arbitrary choice can have on posterior inference.
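To make the concern concrete, consider a minimal numerical sketch (purely hypothetical and not from the paper: a scalar reduced-form parameter φ with identified set [φ − 1, φ + 1]). Two admissible unrevisable conditional priors yield different posteriors for θ even though the posterior of φ is the same:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior draws of a scalar reduced-form parameter phi,
# with identified set IS_theta(phi) = [phi - 1, phi + 1].
phi = rng.normal(loc=0.5, scale=0.2, size=10_000)

# Two admissible unrevisable conditional priors pi_{theta|phi}:
theta_unif = phi - 1.0 + 2.0 * rng.uniform(size=phi.size)  # uniform on the identified set
theta_low = phi - 1.0                                      # point mass at the lower bound

# Same posterior of phi, different posteriors (and posterior means) for theta;
# the gap between them does not vanish as the posterior of phi concentrates.
print(theta_unif.mean(), theta_low.mean())
```

The gap between the two posterior means persists however precise the posterior of φ becomes, which is exactly the unrevisability problem described above.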
The robust Bayesian analysis in this paper focuses on this situation, and removes the need to specify a single conditional prior by introducing ambiguity for π θ|φ in the form of multiple priors.
In this paper, we shall not discuss how to select π φ , and treat it as given. As the influence of this prior on posterior inference disappears asymptotically, any sensitivity issues in this respect potentially only concern small samples. Another reason for not introducing multiple priors for φ is to avoid possible issues of non-convergence of the class of posteriors, as discussed in the literature on global sensitivity analysis, for example, Ruggeri and Sivaganesan (2000).
The prior class consists of all priors for θ whose marginal for φ equals the given π_φ, with the conditional prior π_θ|φ allowed to be any probability measure supported on IS_θ(φ). Applying Bayes's rule to each member of the class yields the class of posteriors Π_η|Y for η, and the posterior lower and upper probabilities of an event D ∈ D are defined as π_η|Y*(D) = inf{π_η|Y(D) : π_η|Y ∈ Π_η|Y} and π*_η|Y(D) = sup{π_η|Y(D) : π_η|Y ∈ Π_η|Y}. In order to derive an analytical expression for π_η|Y*(·), we make the following assumption.
ASSUMPTION 1: (i) The prior of φ, π_φ, is proper, absolutely continuous with respect to a σ-finite measure on (Φ, B), and π_φ(g(Θ)) = 1, that is, IS_θ(φ) and IS_η(φ) are non-empty, π_φ-a.s. (ii) The mapping between θ and φ, g : (Θ, A) → (Φ, B), is measurable, and IS_θ(φ) is closed, π_φ-a.s. (iii) The mapping between θ and η, h : (Θ, A) → (H, D), is measurable, and IS_η(φ) is closed, π_φ-a.s.

Assumption 1(i) guarantees that the identified set IS_η(φ) can be viewed as a random set defined on the probability space both a priori, (Φ, B, π_φ), and a posteriori, (Φ, B, π_φ|Y), which we exploit in the proof of Theorem 1 below. As we discuss in Section 5, the numerical implementation of our method allows an improper prior with support larger than g(Θ) and imposes the assumption by retaining only the draws that give a non-empty identified set. Assumptions 1(ii) and 1(iii) are mild conditions ensuring that IS_θ(φ) and IS_η(φ) are random closed sets satisfying a measurability requirement. The closedness of IS_θ(φ) and IS_η(φ) is implied, for example, by continuity of g(·) and h(·).
The next theorem expresses the posterior lower and upper probabilities for the parameter of interest in terms of the posterior of φ. This provides the basis for the numerical approximation of these probabilities, which only requires the ability to compute the identified set at values of φ randomly drawn from its posterior.

THEOREM 1: Under Assumption 1, for D ∈ D,

π_η|Y*(D) = π_φ|Y({φ : IS_η(φ) ⊂ D}) and π*_η|Y(D) = π_φ|Y({φ : IS_η(φ) ∩ D ≠ ∅}).

The expression for π_η|Y*(D) shows that the lower probability of D is the posterior probability, in terms of φ, that the (random) identified set IS_η(φ) is contained in D. The intuition for this result is best understood when η = θ. The decomposition in equation (2.2) suggests that, to minimize the posterior probability of {θ ∈ D}, we choose, if possible, a conditional prior π_θ|φ that assigns all the probability outside D so as to attain π_θ|φ(D) = 0. Such a choice of prior is, however, not possible for φ such that IS_θ(φ) ⊂ D, since the requirement π_θ|φ(IS_θ(φ)) = 1 binds and any choice of prior satisfies π_θ|φ(D) = 1. Symmetrically, the posterior probability of {θ ∈ D} is maximized by choosing, if possible (i.e., if IS_θ(φ) ∩ D ≠ ∅), a conditional prior that puts all the probability inside D. These constructions of the extreme conditional priors immediately lead to the expressions of the lower and upper probabilities in Theorem 1. Setting η = θ gives the posterior lower and upper probabilities for θ in terms of the containment and hitting probabilities of IS_θ(φ). In standard Bayesian inference, the posterior of θ is transformed into a posterior for η = h(θ) by integrating the posterior probability measure of θ; here, this corresponds to projecting the random sets IS_θ(φ) onto H via η = h(·). This highlights the difference between standard Bayesian analysis and robust Bayesian analysis based on the lower probability.
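The characterization in Theorem 1 can be approximated by Monte Carlo: the lower (upper) probability of D is the fraction of posterior draws of φ whose identified set is contained in (intersects) D. A sketch for a hypothetical scalar example with identified set [φ − 1, φ + 1] and event D = {η ≥ 0}:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior draws of phi; identified set IS_eta(phi) = [phi - 1, phi + 1].
phi = rng.normal(loc=0.5, scale=0.2, size=10_000)
lo, up = phi - 1.0, phi + 1.0

# Event of interest D = {eta >= 0}.  By Theorem 1:
#   lower probability = posterior prob. that IS_eta(phi) is contained in D,
#   upper probability = posterior prob. that IS_eta(phi) intersects D.
post_lower = np.mean(lo >= 0.0)   # containment probability
post_upper = np.mean(up >= 0.0)   # hitting probability
print(post_lower, post_upper)
```

In an SVAR application, the draws of the bounds would instead come from computing the impulse-response identified set at each posterior draw of the reduced-form parameter.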
Corollary A.1 in Appendix A shows that, for each D ∈ D, the set of posterior probabilities {π_η|Y(D) : π_η|Y ∈ Π_η|Y} coincides with the connected interval [π_η|Y*(D), π*_η|Y(D)], implying that any probability in this interval can be attained by some posterior in Π_η|Y.
It is well known in the robust statistics literature (e.g., Huber (1973)) that the lower probability of a set of probability measures is, in general, a monotone nonadditive measure (capacity). The posterior lower and upper probabilities in this paper coincide with the construction of the posterior lower and upper probabilities of Wasserman (1990) when it is applied to our prior class. An important distinction from Wasserman's analysis is that our posterior lower probability is guaranteed to be an ∞-order monotone capacity (a containment functional of random sets), which simplifies the investigation of its analytical properties and the practical implementation of the method.

Set of Posterior Means and Quantiles
The posterior lower and upper probabilities shown in Theorem 1 summarize the set of posterior probabilities for an arbitrary event of interest D. To summarize the information in the posterior class without specifying D, we propose to report the set of posterior means of η.
The next proposition shows that the set of posterior means of η is equivalent to the Aumann expectation of the convex hull of the identified set.
THEOREM 2: Suppose Assumption 1 holds and the random set IS_η(φ) ⊂ H, φ ∼ π_φ|Y, is L¹-integrable with respect to π_φ|Y in the sense that E_φ|Y(sup{||η|| : η ∈ IS_η(φ)}) < ∞. Let co(IS_η(φ)) be the convex hull of IS_η(φ), viewed as a closed random set on the probability space (Φ, B, π_φ|Y), and let E^A_φ|Y(·) denote the Aumann expectation of a random set with underlying probability measure π_φ|Y. Then, the set of posterior means is convex and equals the Aumann expectation of the convex hull of the identified set:

{E_η|Y(η) : π_η|Y ∈ Π_η|Y} = E^A_φ|Y[co(IS_η(φ))].

For a closed random set X : Φ ⇒ H defined on (Φ, B, π_φ|Y), a measurable selection is a map ξ : Φ → H with ξ(φ) ∈ X(φ), π_φ|Y-a.s. Letting S¹(X) = {ξ : ξ(φ) ∈ X(φ), π_φ|Y-a.s., E_φ|Y(||ξ||) < ∞} be the class of integrable measurable selections, the Aumann expectation of X is defined as E^A_φ|Y(X) = {E_φ|Y(ξ) : ξ ∈ S¹(X)} (Aumann (1965)). Note that a measurable selection of IS_θ(φ) corresponds to the degenerate conditional priors {π_θ|φ = 1_ξ(φ) : φ ∈ Φ}; since these are included in the set of conditional priors used in our analysis, the set of posteriors for θ, Π_θ|Y, contains the set Π^S_θ|Y of posteriors formed by measurable selections of IS_θ(φ), that is, Π_θ|Y ⊇ Π^S_θ|Y. As implied by Artstein's inequality, however, the upper probability of Π_θ|Y obtained in Theorem 1 with η = θ agrees with the upper probability of Π^S_θ|Y on any closed measurable A ∈ A.

The support function of the Aumann expectation satisfies s(E^A_φ|Y[X], q) = E_φ|Y[s(X, q)] (see, e.g., Theorem 1.26 in Chapter 2 of Molchanov (2005)), and a support function corresponds one-to-one to a closed convex set. Hence, the analytical characterization in Theorem 2 suggests that the set of posterior means can be computed by approximating E_φ|Y[s(IS_η(φ), ·)] using draws of IS_η(φ), φ ∼ π_φ|Y, and mapping the approximated average support function back to the closed convex set it represents, E^A_φ|Y[co(IS_η(φ))].
In case of scalar η, the set of posterior means has the particularly simple form

[E_φ|Y(ℓ(φ)), E_φ|Y(u(φ))],

where ℓ(φ) = inf{h(θ) : θ ∈ IS_θ(φ)} and u(φ) = sup{h(θ) : θ ∈ IS_θ(φ)} are the lower and upper bounds of IS_η(φ). The lower (upper) bound of this set is attained by the conditional priors {π_θ|φ : φ ∈ Φ} that allocate probability 1 to the θ attaining the lower (upper) bound of IS_η(φ). In applications where it is feasible to compute ℓ(φ) and u(φ), we can approximate E_φ|Y(ℓ(φ)) and E_φ|Y(u(φ)) by using a random sample of φ drawn from π_φ|Y.
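A sketch of this approximation for a hypothetical interval-identified scalar example (the bounds ℓ(φ) = φ − 1 and u(φ) = φ + 1 are placeholders for whatever bound computation the application requires):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior draws of phi; bounds l(phi) = phi - 1, u(phi) = phi + 1.
phi = rng.normal(loc=0.5, scale=0.2, size=10_000)

# Set of posterior means: [E(l(phi)), E(u(phi))] under the posterior of phi.
mean_set = ((phi - 1.0).mean(), (phi + 1.0).mean())
print(mean_set)
```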
In case of scalar η, the set of posterior τth quantiles of η is also simple to compute. We apply Theorem 1 with D = (−∞, t], −∞ < t < ∞, to obtain the set of posterior cumulative distribution functions (CDFs) of η at each t. Inverting the upper and lower bounds of this set at τ ∈ (0, 1) gives the set of posterior τth quantiles of η.
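Concretely, the upper CDF bound at t is the hitting probability P(ℓ(φ) ≤ t) and the lower bound is the containment probability P(u(φ) ≤ t), so inverting them at τ amounts to taking the τth quantiles of the draws of ℓ(φ) and u(φ). A sketch under the same hypothetical bounds as above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior draws of phi; bounds l(phi) = phi - 1, u(phi) = phi + 1.
phi = rng.normal(loc=0.5, scale=0.2, size=10_000)
tau = 0.5

# Inverting the upper CDF bound P(l(phi) <= t) at tau gives the lower end of
# the quantile set; inverting the lower CDF bound P(u(phi) <= t) gives the upper end.
quantile_set = (np.quantile(phi - 1.0, tau), np.quantile(phi + 1.0, tau))
print(quantile_set)
```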

Robust Credible Region
This section introduces the robust Bayesian counterpart of the highest posterior density region that is typically reported in standard Bayesian inference. For α ∈ (0, 1), consider a subset C_α ⊂ H such that the posterior lower probability π_η|Y*(C_α) is greater than or equal to α:

π_η|Y*(C_α) = π_φ|Y({φ : IS_η(φ) ⊂ C_α}) ≥ α. (2.6)

C_α is interpreted as 'a set on which the posterior credibility of η is at least α, no matter which posterior is chosen within the class'. Dropping the italicized clause yields the usual interpretation of a posterior credible region, so this definition is a natural extension to our robust Bayesian setting. We refer to C_α satisfying (2.6) as a robust credible region with credibility α. As in the standard Bayesian case, there are multiple ways to construct C_α satisfying (2.6). We propose to resolve this multiplicity by choosing the C_α with the smallest volume:

C*_α ∈ arg min{Leb(C) : C ∈ C, π_η|Y*(C) ≥ α}, (2.7)

where Leb(C) is the volume of C in terms of the Lebesgue measure and C is a family of subsets of H. We refer to C*_α as a smallest robust credible region with credibility α. The credible regions for the identified set proposed in Schorfheide (2011), Norets and Tang (2014), and Kline and Tamer (2016) satisfy (2.6), so they are robust credible regions in our definition. However, these works do not consider the volume-optimized credible region (2.7). Obtaining C*_α is challenging if η is a vector and no restriction is placed on the class C in (2.7). Proposition 1 below shows that, for scalar η, this difficulty can be overcome by constraining C to be the class of closed connected intervals. C*_α can then be computed by solving a simple optimization problem.
PROPOSITION 1-Smallest Robust Credible Region for Scalar η: Let η be scalar with IS_η(φ) = [ℓ(φ), u(φ)], and define d(η, φ) ≡ max{|η − ℓ(φ)|, |η − u(φ)|}. Let r_α(η_c) be the αth quantile of d(η_c, φ), φ ∼ π_φ|Y. Then a smallest robust credible region with credibility α, with C restricted to the class of closed connected intervals, is a closed interval centered at η*_c = arg min{r_α(η_c) : η_c ∈ H} with radius r*_α = r_α(η*_c).
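This reduces the computation to a one-dimensional search over the center. A sketch for a hypothetical interval-identified example (grid search is used here in place of a numerical optimizer; the bounds φ ∓ 1 are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior draws of phi; identified set [phi - 1, phi + 1].
phi = rng.normal(loc=0.5, scale=0.2, size=10_000)
low, up = phi - 1.0, phi + 1.0
alpha = 0.90

def r_alpha(center):
    # d(center, phi): max distance from the center to either endpoint of the
    # identified set; r_alpha(center) is its alpha-quantile under the posterior of phi.
    d = np.maximum(np.abs(center - low), np.abs(center - up))
    return np.quantile(d, alpha)

# One-dimensional grid search for the radius-minimizing center.
grid = np.linspace(low.min(), up.max(), 1001)
radii = np.array([r_alpha(c) for c in grid])
c_star = grid[np.argmin(radii)]
r_star = radii.min()
robust_cr = (c_star - r_star, c_star + r_star)
print(robust_cr)
```

By construction, the fraction of posterior draws whose identified set is contained in the resulting interval is at least α, which is the defining property (2.6).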

Plausibility of Identifying Restrictions
For observationally restrictive models (i.e., g(Θ) is a proper subset of Φ), quantifying the posterior information available for assessing the set-identifying restrictions can be of interest. To do so, we start with a prior of φ that supports the entire Φ, which we denote by π̃_φ. Trimming the support of π̃_φ to g(Θ) = {φ : IS_θ(φ) ≠ ∅} gives π_φ satisfying Assumption 1(i). We update π̃_φ to obtain the posterior of φ with extended domain, π̃_φ|Y.
Since emptiness of the identified set can refute the imposed identifying restrictions, their plausibility can be measured by the posterior probability that the identified set is non-empty, π̃_φ|Y({φ : IS_η(φ) ≠ ∅}). Note that this measure depends only on the posterior of the reduced-form parameter, so it is free from the issue of posterior sensitivity due to set identification. By reporting the posterior plausibility of the identifying restrictions and the set of posterior means conditional on {IS_η(φ) ≠ ∅}, we can separate inferential statements about the validity of the identifying restrictions from inferential statements about the parameter of interest, which is difficult to do from a frequentist perspective (see the discussion in Sims and Zha (1999)).
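A sketch of the plausibility computation (the extended posterior and the emptiness condition below are hypothetical placeholders for a model-specific non-emptiness check):

```python
import numpy as np

rng = np.random.default_rng(0)

# Draws from a hypothetical extended posterior of phi, together with a
# placeholder falsifiability check: suppose the identifying restrictions
# imply a non-empty identified set iff |phi| <= 1.
phi = rng.normal(loc=0.4, scale=0.5, size=10_000)
nonempty = np.abs(phi) <= 1.0

# Posterior plausibility of the identifying restrictions.
plausibility = nonempty.mean()
print(plausibility)

# Inference about eta would then condition on the non-empty draws,
# e.g. use phi[nonempty] when computing the set of posterior means.
```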

Informativeness of Identifying Restrictions and of Priors
The strength of identifying restrictions can be measured by comparing the set of posterior means with that of a model that does not impose these restrictions but is otherwise identical. For instance, suppose the object of interest η is a scalar. Let M_s be the set-identified model imposing the identifying restrictions and M_l be the model that relaxes them. The identifying power of the restrictions can then be measured by

Informativeness of identifying restrictions = 1 − (width of the set of posterior means of η under M_s)/(width of the set of posterior means of η under M_l). (2.8)

The amount of information in the posterior provided by the choice of the unrevisable prior π_θ|φ in a standard Bayesian analysis can be similarly measured by comparing the width of C_α satisfying (2.6) to the width of the standard Bayesian credible region obtained from the single prior:

Informativeness of the choice of prior = 1 − (width of a Bayesian credible region of η with credibility α)/(width of a robust credible region of η with credibility α). (2.9)

This measure captures by what fraction the credible region of η is tightened by choosing a particular unrevisable prior π_θ|φ.

[Footnote 10: An alternative measure is the prior-posterior odds of the non-emptiness of the identified set.]
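Both measures are simple width ratios once the underlying outputs are available. With made-up widths (illustrative numbers only, not from the paper's application):

```python
# Illustrative widths only (hypothetical numbers).
width_mean_set_restricted = 1.8   # set of posterior means under M_s
width_mean_set_relaxed = 3.0      # set of posterior means under M_l
info_restrictions = 1.0 - width_mean_set_restricted / width_mean_set_relaxed

width_bayes_cr = 1.20    # single-prior Bayesian credible region, credibility alpha
width_robust_cr = 2.66   # robust credible region, same credibility
info_prior = 1.0 - width_bayes_cr / width_robust_cr

print(round(info_restrictions, 3), round(info_prior, 3))
```

With these numbers, the restrictions shrink the set of posterior means by 40%, and the single prior tightens the credible region by roughly 55% relative to the robust one.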

ASYMPTOTIC PROPERTIES
The set of posterior means or quantiles and the robust credible region introduced in Section 2 have well-defined (robust) Bayesian interpretations in finite samples and they are useful for conducting Bayesian sensitivity analysis to the choice of an unrevisable prior. To examine whether these quantities are useful from the frequentist perspective, we now analyze their asymptotic frequentist properties. We show two main results. First, the set of posterior means can be viewed as an estimator of the identified set that converges to the true identified set asymptotically when the true identified set is convex. Otherwise, the set of posterior means converges to the convex hull of the true identified set. Second, the robust credible region has the correct asymptotic coverage for the true identified set. These results show that introducing ambiguity for non-identified parameters induces asymptotic equivalence between (robust) Bayesian and frequentist inference in set-identified models. An implication of this finding is that our robust Bayesian analysis can also appeal to frequentists.
In this section, we let φ_0 ∈ Φ denote the true value of the reduced-form parameter and Y^T = (y_1, ..., y_T) denote a sample of size T generated from P_{Y^T|φ_0}.

Consistency of the Set of Posterior Means
Assume the following conditions:

ASSUMPTION 2: (i) IS_η(φ_0) is bounded, and the identified set correspondence IS_η : Φ ⇒ H is continuous at φ = φ_0 (see, e.g., Sundaram (1996) for the definition of continuity for correspondences). (ii) The posterior of φ is consistent for φ_0, that is, π_φ|Y_T(G) → 1 as T → ∞ for every open neighborhood G of φ_0, for almost every sampling sequence. (iii) IS_η(φ) is compact-valued, π_φ|Y_T-almost surely, and there exists δ > 0 such that E_φ|Y_T[(sup{||η|| : η ∈ IS_η(φ)})^{1+δ}] < ∞ for all large enough T.
Assumption 2(i) requires that the identified set of η is a continuous correspondence at φ_0. In the case of scalar η with convex identified set IS_η(φ) = [ℓ(φ), u(φ)], this means that ℓ(φ) and u(φ) are continuous at φ_0. Since ℓ(φ) and u(φ) can be viewed as the values of the optimizations ℓ(φ) = min{h(θ) : θ ∈ IS_θ(φ)} and u(φ) = max{h(θ) : θ ∈ IS_θ(φ)}, the theorem of the maximum (e.g., Theorem 9.14 in Sundaram (1996)) shows that sufficient conditions for continuity of ℓ(φ) and u(φ) are continuity of h(·) and continuity of the correspondence IS_θ(φ). In a common special case where IS_θ(φ) is a polyhedron and [ℓ(φ), u(φ)] are the values of linear programs, continuity of the polyhedral correspondence is implied by dimension-stability of the polyhedron (e.g., Proposition 6 in Wets (1985)). For SVAR models, Appendix B.2 in Giacomini and Kitagawa (2021) shows that the continuity property is mild and easily verifiable.
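For the linear-programming case just mentioned, a minimal sketch (hypothetical polyhedron and linear h(·)): since a linear function attains its extrema over a bounded polyhedron at a vertex, ℓ(φ) and u(φ) can be read off the vertices.

```python
import numpy as np

# Hypothetical polyhedral identified set for theta in R^2 with vertices
# (0,0), (1,0), (0,1), and linear objective h(theta) = theta_1 - theta_2.
# A linear function attains its extrema over a bounded polyhedron at a
# vertex, so l(phi) and u(phi) are the min and max of h over the vertices.
vertices = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
h_vals = vertices[:, 0] - vertices[:, 1]
l_phi, u_phi = h_vals.min(), h_vals.max()
print(l_phi, u_phi)   # IS_eta(phi) = [-1.0, 1.0]
```

In practice, the bounds are usually obtained with a linear-programming solver rather than by vertex enumeration, but the logic is the same.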
Assumption 2(ii) requires that Bayesian estimation of the reduced-form parameter is a standard estimation problem in the sense that almost-sure posterior consistency holds. Assumption 2(iii) strengthens Assumption 2(i) by assuming that IS_η(φ) is π_φ|Y_T-almost surely compact-valued and that its radius has a finite (1+δ)th moment. In the scalar case, Assumption 2(iii) holds with δ = 1 if ℓ(φ) and u(φ) have finite posterior variances.
THEOREM 3: (i) Suppose Assumptions 2(i) and 2(ii) hold. Then, for any ε > 0, lim_{T→∞} π_{φ|Y^T}({φ : d_H(IS_η(φ), IS_η(φ_0)) > ε}) = 0 for P_{Y^∞|φ_0}-almost every sampling sequence, where d_H(·, ·) is the Hausdorff distance. (ii) Suppose Assumption 2 holds and the prior for φ, π_φ, is non-atomic; then the set of posterior means almost surely converges to the convex hull of the true identified set, that is, d_H(E_{φ|Y^T}[IS_η(φ)], co(IS_η(φ_0))) → 0 as T → ∞, P_{Y^∞|φ_0}-almost surely.

The first claim of Theorem 3 states that the identified set IS_η(φ), viewed as a random set induced by the posterior of φ, converges in posterior probability to the true identified set IS_η(φ_0) in the Hausdorff metric. This claim only relies on continuity of the identified set correspondence and does not rely on Assumption 2(iii) or on convexity of the identified set. The second claim of the theorem provides a justification for using (a numerical approximation of) the set of posterior means as a consistent estimator of the convex hull of the identified set. The theorem implies that the set of posterior means converges to the true identified set if this set is convex.
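To make the consistency mechanism concrete, the set of posterior means for a scalar η can be approximated by averaging the identified-set bounds over posterior draws of φ. The sketch below uses a hypothetical bound pair ℓ(φ) = φ − 1 and u(φ) = φ + 1 and a stylized posterior concentrating around φ_0 = 0 at rate 1/√T; it illustrates the convergence, not the estimator for any particular model in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical identified set IS_eta(phi) = [phi - 1, phi + 1]; true phi_0 = 0,
# so the true identified set is [-1, 1].
def l(phi):
    return phi - 1.0

def u(phi):
    return phi + 1.0

# Stylized posterior for phi concentrating around phi_0 at rate 1/sqrt(T).
for T in (10, 100, 10000):
    phi_draws = rng.normal(loc=0.0, scale=1.0 / np.sqrt(T), size=5000)
    post_mean_set = (l(phi_draws).mean(), u(phi_draws).mean())
    print(T, post_mean_set)  # approaches the true identified set [-1, 1]
```

As T grows, the posterior mass on φ collapses to φ_0 and the interval of posterior means approaches [ℓ(φ_0), u(φ_0)] = [−1, 1].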

Asymptotic Coverage Properties of the Robust Credible Region
We first state a set of conditions under which the robust credible region asymptotically attains correct frequentist coverage for the true identified set IS η (φ 0 ).

ASSUMPTION 3: (i) The identified set IS_η(φ) is π_φ-almost surely closed and bounded, and IS_η(φ_0) is closed and bounded. (ii) The robust credible region C_α belongs to the class of closed and convex sets C in R^k.
Assumption 3(i) is a weak requirement in practical applications. We allow the identified set IS_η(φ) to be non-convex, while Assumption 3(ii) constrains the robust credible region to be closed and convex. Under convexity of C_α, IS_η(φ) ⊂ C_α holds if and only if co(IS_η(φ)) ⊂ C_α holds, so that inclusion of the identified set in C_α is equivalent to dominance of their support functions, s(IS_η(φ), q) = s(co(IS_η(φ)), q) ≤ s(C_α, q) for all q ∈ S^{k−1} (see, e.g., Corollary 13.1.1 in Rockafellar (1970)). This fact enables us to characterize a set of conditions for correct asymptotic coverage of C_α in terms of the limiting probability law of the support functions, which has been studied in the literature on frequentist inference for the identified set (e.g., Beresteanu and Molinari (2008), Bontemps, Magnac, and Maurin (2012), Kaido (2016)).
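The inclusion-versus-support-function equivalence can be checked numerically for finite point sets: co(A) ⊂ C holds exactly when s(A, q) ≤ s(C, q) for every direction q. A small sketch over a grid of directions on S^1 (the grid makes the check approximate, and all sets here are hypothetical):

```python
import numpy as np

def support(points, q):
    # s(A, q) = max over a in A of <a, q> for a finite point set A.
    return (points @ q).max()

ts = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
directions = np.column_stack([np.cos(ts), np.sin(ts)])  # grid on S^1

A = np.array([[0.2, 0.0], [0.0, 0.3], [-0.2, -0.1]])  # hypothetical set A
C = np.column_stack([np.cos(ts), np.sin(ts)])         # vertices approximating the unit ball

# co(A) is contained in C iff s(A, q) <= s(C, q) for all q (checked on the grid).
contained = all(support(A, q) <= support(C, q) for q in directions)

# Adding a point outside the ball breaks the dominance in some direction.
A_out = np.vstack([A, [1.5, 0.0]])
violated = any(support(A_out, q) > support(C, q) for q in directions)
print(contained, violated)  # True True
```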
ASSUMPTION 4: Let a_T → ∞ be a deterministic sequence, and define the support function processes X_{φ|Y^T} ≡ a_T(s(IS_η(φ), ·) − s(IS_η(φ̂), ·)) and X_{Y^T|φ_0} ≡ a_T(s(IS_η(φ̂), ·) − s(IS_η(φ_0), ·)), where φ̂ is the maximum likelihood estimator of φ, the probability law of X_{φ|Y^T} is induced by π_{φ|Y^T}, T = 1, 2, …, and the probability law of X_{Y^T|φ_0} is induced by the sampling process P_{Y^T|φ_0}, T = 1, 2, …. The following conditions hold: (i) X_{φ|Y^T} ⇝ X as T → ∞ for P_{Y^∞|φ_0}-almost every sampling sequence, where ⇝ denotes weak convergence. (ii) X_{Y^T|φ_0} ⇝ Z as T → ∞, where the probability law of Z coincides with that of X. (iii) The limiting process X is continuously distributed, and Pr(X = c) = 0 for any non-random function c ∈ C(S^{k−1}, R). (iv) C_α is a robust credible region satisfying α ≤ π_{φ|Y^T}({φ : IS_η(φ) ⊂ C_α}), and C_α is bounded and lies in a neighborhood of IS_η(φ̂) shrinking at rate 1/a_T.

Assumption 4(i) states that the posterior distribution of the support function of the identified set IS_η(φ), centered at the support function of IS_η(φ̂) and scaled by a_T, converges weakly to the stochastic process X. The weak convergence of the scaled support function to a tight Gaussian process on S^{k−1} holds with a_T = √T, for instance, if the central limit theorem for random sets applies; see, for example, Molchanov (2005) and Beresteanu and Molinari (2008). Assumption 4(i) is a Bayesian analogue of the frequentist central limit theorem for support functions.
Assumption 4(ii) states that, from the viewpoint of the support function, the difference between IS_η(φ̂) and the true identified set scaled by a_T converges in distribution to the stochastic process Z, and the probability law of Z coincides with the probability law of X. Since the distribution of X is defined conditional on a sampling sequence while Z is unconditional, the agreement of the distributions of X and Z implies that the dependence of the posterior distribution of X_{φ|Y^T} on the sample Y^T vanishes as T → ∞. Beresteanu and Molinari (2008) and Kaido and Santos (2014) provided practical examples where the limiting process Z is a zero-mean tight Gaussian process in C(S^{k−1}, R).
Assumptions 4(i) and 4(ii) are delicate assumptions, and whether they hold depends on the geometry of the identified set. The working paper version of this paper discusses an example (Example C.2 in Appendix C) showing that, if the support function s(IS_η(φ), q) is not differentiable in φ at some q, this can lead to violation of Assumptions 4(i) and 4(ii). See also Kitagawa, Montiel-Olea, Payne, and Verez (2020) for properties of the asymptotic posterior for a non-differentiable function of parameters that themselves satisfy the Bernstein-von Mises property, such as φ.
Assumption 4(iii) means that the limiting process X is continuously distributed and non-degenerate in the stated sense, which holds true if X is a non-degenerate Gaussian process. In addition to the convexity requirement of Assumption 3(ii), Assumption 4(iv) requires C_α to be bounded and to lie in a neighborhood of IS_η(φ̂) shrinking at rate 1/a_T.
THEOREM 4: Suppose Assumptions 3 and 4 hold. Then the robust credible region C_α attains correct frequentist coverage for the true identified set asymptotically: lim_{T→∞} P_{Y^T|φ_0}(IS_η(φ_0) ⊂ C_α) ≥ α, with equality if C_α satisfies (2.6) with equality.
Second, Theorem 4 shows pointwise asymptotic coverage rather than asymptotic uniform coverage over a class of sampling processes indexed by φ_0. As stressed in the frequentist literature (e.g., Imbens and Manski (2004), Stoye (2009), Romano and Shaikh (2010), Andrews and Soares (2010)), it is desirable for frequentist methods to attain asymptotically valid coverage in the uniform sense. Examining this property for our procedure is, however, challenging because it would require the Bernstein-von Mises condition for the support function processes (Assumptions 4(i) and 4(ii)) to hold uniformly over a class of values of φ_0. To our knowledge, little is known about the extent to which the Bernstein-von Mises property holds in the uniform sense, even in the standard case of identifiable parameters. We thus leave this investigation for future research.
Third, the confidence region considered by Moon and Schorfheide (2012) and Norets and Tang (2014) can attain asymptotically correct coverage under a different set of assumptions (Assumptions 1 and 5(i) in this paper). Although these assumptions may be easier to check than Assumption 4, the credible region proposed by these authors is generally conservative. In contrast, Theorem 4 shows that if C_α is constructed to satisfy (2.6) with equality (e.g., it is the smallest robust credible region C*_α), the asymptotic coverage probability is exact. Theorem 5 in Kline and Tamer (2016) shows a similar conclusion to Theorem 4 under the conditions that the Bernstein-von Mises property holds for estimation of φ and that a_T(φ − φ_0) and ĉ_T(·) are asymptotically independent. Our Assumption 4(iv) implies the asymptotic independence condition of Kline and Tamer (2016) by assuming ĉ_T converges to a constant. Theorem 4, on the other hand, assumes the Bernstein-von Mises property in terms of the support functions of the identified set rather than the underlying reduced-form parameters.
Assumption 4 is rather high-level, and could be difficult to check when η is a vector. For a scalar η, we can obtain a set of sufficient conditions for Assumptions 4(i)-4(iii) that are simple to verify in empirical applications, for example, the set-identified SVARs considered in Section 4.
ASSUMPTION 5: (i) The maximum likelihood estimator φ̂ is strongly consistent for φ_0, and the posterior of φ and the sampling distribution of φ̂ are √T-asymptotically normal with an identical covariance matrix. (ii) ℓ(φ) and u(φ) are differentiable at φ_0 with nonzero derivatives.

Assumption 5(i) implies that likelihood-based estimation of φ satisfies the Bernstein-von Mises property in the P_{Y^∞|φ_0}-almost sure sense. See Borwanker, Kallianpur, and Prakasa Rao (1971) for an almost-sure version of the Bernstein-von Mises theorem and regularity conditions on the likelihood function and the prior for φ. Additionally imposing Assumption 5(ii) implies applicability of the delta method to ℓ(·) and u(·), which in turn implies Assumptions 4(i)-4(iii) for scalar η. In addition, it can be shown that the shortest robust credible region in (2.7) satisfies Assumption 4(iv). Hence, C*_α is an asymptotically valid frequentist confidence set for the true identified set, with asymptotic coverage probability exactly equal to α.
It is important to note that Assumption 5(ii) is restrictive in some aspects. First, it does not allow ℓ(φ) or u(φ) to be flat at φ_0. This is because the Bernstein-von Mises property for φ (Assumption 5(i)) does not carry over to ℓ(φ) or u(φ) through the second-order delta method if their first-order derivatives are zero at φ_0, and this leads to violation of Assumptions 4(i) and 4(ii). Second, non-differentiability of ℓ(φ) and u(φ) arises if the projection bounds for η involve max or min operations and the minimizers or maximizers are not unique at φ = φ_0. Bounds involving the max or min appear in the intersection bound analysis of Manski (1990) and the partial identification analysis via linear programming of Balke and Pearl (1997). As shown in Kitagawa et al. (2020), the Bernstein-von Mises property breaks down for non-differentiable functions of φ, and this leads to violation of Assumptions 4(i) and 4(ii). Appendix B.3 in Giacomini and Kitagawa (2021) discusses the differentiability assumptions and their plausibility in the context of SVAR models.
Hence, by Theorem 2, C*_α is an asymptotically valid frequentist confidence set for IS_η(φ_0) with exact coverage. Lemma 1 of Kline and Tamer (2016) obtains a similar result for a robust credible region different from our smallest credible region C*_α, in which the radius c_α is chosen to satisfy (2.6) with equality.

ROBUST BAYESIAN INFERENCE IN SVARS
In this section, we illustrate our method in the context of impulse-response analysis in set-identified SVARs. This section is self-contained. Consider an SVAR(p):

A_0 y_t = a + Σ_{j=1}^{p} A_j y_{t−j} + ε_t,  t = 1, …, T,

where y_t is an n × 1 vector and ε_t is an n × 1 vector white noise process, normally distributed with mean zero and variance the identity matrix I_n. The initial conditions y_1, …, y_p are given. We assume that one always imposes the sign normalization restrictions that the diagonal elements of A_0 are nonnegative. The reduced-form VAR(p) representation of the model is

y_t = b + Σ_{j=1}^{p} B_j y_{t−j} + u_t,

where b = A_0^{−1}a, B_j = A_0^{−1}A_j, u_t = A_0^{−1}ε_t, and Σ = E(u_t u_t') is the variance-covariance matrix of the reduced-form errors. We restrict the domain Φ to the set of φ's such that Σ is nonsingular and the reduced-form VAR(p) model can be inverted into a VMA(∞) model.
Let Q ∈ O(n) be an n × n orthonormal 'rotation' matrix, where O(n) denotes the set of n × n orthonormal matrices. As in Uhlig (2005) and Rubio-Ramírez, Waggoner, and Zha (2010), consider the transformation of (φ, Q) into the structural parameters given by

A_0 = Q'Σ_tr^{−1},  [a, A_1, …, A_p] = Q'Σ_tr^{−1}B,

where Σ_tr is the lower-triangular Cholesky factor of Σ with nonnegative diagonal elements and B = [b, B_1, …, B_p] collects the reduced-form coefficients. This transformation is one-to-one, as it is invertible whenever Σ is nonsingular. Since θ in Section 2 can be any one-to-one transformation of the structural parameters, in this section we follow the convention in the literature and set θ = (φ', vec(Q)')'.
The VMA(∞) representation of the model is

y_t = c + Σ_{j=0}^{∞} C_j u_{t−j} = c + Σ_{j=0}^{∞} C_j Σ_tr Q ε_{t−j},

where c is the implied intercept and C_j is the jth coefficient matrix of (I_n − Σ_{j=1}^{p} B_j L^j)^{−1}. We denote the hth-horizon impulse response by the n × n matrix IR_h = C_h Σ_tr Q, h = 0, 1, 2, …, and the long-run cumulative impulse-response matrix by CIR_∞ = Σ_{h=0}^{∞} IR_h. (4.5) The scalar parameter of interest η is a single impulse response, that is, the (i, j) element of IR_h:

η = e_i' C_h Σ_tr Q e_j = c_{ih}'(φ) q_j,

where e_i is the ith column vector of I_n, c_{ih}'(φ) is the ith row vector of C_h Σ_tr, and q_j is the jth column vector of Q. Note that the analysis developed below for the impulse responses can be extended to the structural parameters A_0 and [A_1, …, A_p], since the (i, j)th element of A_l can be obtained as e_j'(Σ_tr^{−1} B_l)' q_i, with B_0 = I_n.
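The coefficients C_j can be computed from the reduced-form lag matrices by the recursion C_0 = I_n and C_j = Σ_{k=1}^{min(j,p)} B_k C_{j−k}, which follows from expanding (I_n − Σ_{j=1}^{p} B_j L^j)^{−1}. A minimal sketch with a hypothetical bivariate VAR(1) and the Cholesky rotation Q = I:

```python
import numpy as np

def vma_coefficients(B_list, H):
    # C_j coefficients of (I_n - B_1 L - ... - B_p L^p)^{-1}, j = 0, ..., H,
    # via the recursion C_0 = I_n, C_j = sum_{k=1}^{min(j,p)} B_k C_{j-k}.
    n = B_list[0].shape[0]
    C = [np.eye(n)]
    for j in range(1, H + 1):
        C.append(sum(B_list[k - 1] @ C[j - k]
                     for k in range(1, min(j, len(B_list)) + 1)))
    return C

# Hypothetical bivariate VAR(1): reduced-form coefficient B_1 and error covariance.
B1 = np.array([[0.5, 0.1],
               [0.0, 0.4]])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])
Sigma_tr = np.linalg.cholesky(Sigma)    # lower-triangular Cholesky factor
Q = np.eye(2)                           # one candidate rotation matrix

C = vma_coefficients([B1], H=4)
IR = [C_h @ Sigma_tr @ Q for C_h in C]  # IR_h = C_h Sigma_tr Q
print(IR[0])                            # impact response; equals Sigma_tr when Q = I
```

For a VAR(1) the recursion reduces to C_j = B_1^j, so the sketch can be checked against the closed form.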

Set Identification in SVARs
Set identification in an SVAR arises when knowledge of the reduced-form parameter φ does not pin down a unique A_0. Since any A_0 = Q'Σ_tr^{−1} satisfies Σ = (A_0'A_0)^{−1}, in the absence of identifying restrictions {A_0 = Q'Σ_tr^{−1} : Q ∈ O(n)} is the identified set of A_0's, that is, the set of A_0's that are consistent with φ (Uhlig (2005, Proposition A.1)). Imposing identifying restrictions can be viewed as restricting the set of feasible Q's to lie in a subspace Q of O(n), so that the identified set of A_0 is {A_0 = Q'Σ_tr^{−1} : Q ∈ Q} and the corresponding identified set of η is {η(φ, Q) : Q ∈ Q}. In the following, we characterize the subspace Q under common types of identifying restrictions.

Under-Identifying Zero Restrictions
Examples of under-identifying zero restrictions typically used in the literature are restrictions on some off-diagonal elements of A 0 , on the lagged coefficients {A l : l = 1 p}, on contemporaneous impulse responses IR 0 = A −1 0 , and on the cumulative long-run responses CIR ∞ in (4.5).
All these restrictions can be viewed as linear constraints on the columns of Q. For example, a zero restriction on the (i, j)th element of the impact response IR_0 = A_0^{−1} = Σ_tr Q can be written as a linear constraint on the jth column of Q, (e_i'Σ_tr) q_j = 0. (4.8) We can thus represent a collection of zero restrictions in the general form

F(φ, Q) ≡ [(F_1(φ)q_1)', (F_2(φ)q_2)', …, (F_n(φ)q_n)']' = 0, (4.9)

where F_i(φ) is an f_i × n matrix. Each row in F_i(φ) corresponds to the coefficient vector of a zero restriction that constrains q_i as in (4.8), and F_i(φ) stacks all the coefficient vectors that multiply q_i into a matrix. Hence, f_i is the number of zero restrictions constraining q_i. If the zero restrictions do not constrain q_i, F_i(φ) does not exist and f_i = 0. In order to implement our method, one must first order the variables in the model.
DEFINITION 3-Ordering of Variables: Order the variables in the SVAR so that the numbers of zero restrictions f_i imposed on the columns of Q (i.e., the numbers of rows of F_i(φ) in (4.9)) satisfy f_1 ≥ f_2 ≥ · · · ≥ f_n ≥ 0. In case of ties, if the impulse response of interest is that to the jth structural shock, order the jth variable first. That is, set j = 1 when no other column vector has a larger number of restrictions than q_j. If j ≥ 2, then order the variables so that f_{j−1} > f_j. 14

Rubio-Ramírez, Waggoner, and Zha (2010) showed that, under regularity assumptions, a necessary condition for point identification is that the rank of F_i(φ) equals n − i for all i = 1, …, n. Here we consider restrictions that make the SVAR set-identified because

f_i ≤ n − i for all i = 1, …, n, (4.10)

with strict inequality for at least one i ∈ {1, …, n}. 15 The following example illustrates how to order the variables so as to satisfy Definition 3.

14 The assumption pins down a unique j, while it does not necessarily yield a unique ordering for the other variables if some of them admit the same number of constraints. However, the condition for convexity in Appendix B in Giacomini and Kitagawa (2021) is not affected by the ordering of the other variables as long as the f_i's are in decreasing order.

15 The class of under-identified models considered here does not exhaust the universe of all possible non-identified SVARs, since there exist models that do not satisfy (4.10) but for which the structural parameter is not globally identified for some values of the reduced-form parameter. For instance, in the example in Section 4.4 of Rubio-Ramírez, Waggoner, and Zha (2010), with n = 3 and f_1 = f_2 = f_3 = 1, the structural parameter is locally, but not globally, identified.

EXAMPLE 1: Consider an SVAR for (π_t, y_t, m_t, i_t)', where π_t is inflation, y_t is (detrended) real GDP, m_t is the (detrended) real money stock, and i_t is the nominal interest rate.
Consider under-identifying restrictions that set four elements of A_0^{−1} to zero. Let the objects of interest be the impulse responses to i_t (a monetary policy shock). Let [q_π, q_y, q_m, q_i] be a 4 × 4 orthonormal matrix. By (4.8), the imposed restrictions imply two restrictions on q_m and two restrictions on q_i. An ordering consistent with Definition 3 is (i_t, m_t, π_t, y_t)', and the corresponding numbers of restrictions are (f_1, f_2, f_3, f_4) = (2, 2, 0, 0) with j = 1. The restrictions in this example satisfy (4.10). If instead the objects of interest are the impulse responses to y_t (interpreted as a demand shock), order the variables as (i_t, m_t, y_t, π_t)' and let j = 3.

Sign Restrictions
Sign restrictions can be considered alone or in addition to zero restrictions. If there are zero restrictions, we maintain the ordering in Definition 3. If there are only sign restrictions, we order first the variable whose structural shock is of interest. Suppose there are sign restrictions on the responses to the jth structural shock. Sign restrictions are linear constraints on the columns of Q: S_{hj}(φ) q_j ≥ 0, 16 where S_{hj}(φ) ≡ D_{hj} C_h(B) Σ_tr, with D_{hj} an s_{hj} × n matrix that selects the sign-restricted responses from the impulse-response vector C_h(B) Σ_tr q_j. The nonzero elements of D_{hj} equal 1 or −1, depending on whether the corresponding impulse responses are restricted to be positive or negative.
Stacking S_{hj}(φ) over the restricted horizons gives the set of sign restrictions on the responses to the jth shock, S_j(φ) q_j ≥ 0 with S_j(φ) ≡ [S_{0j}(φ)', S_{1j}(φ)', …, S_{h̄j}(φ)']', where 0 ≤ h̄ ≤ ∞ is the maximal horizon in the impulse-response analysis. If there are no sign restrictions on the h̃th-horizon responses, s_{h̃j} = 0 and S_{h̃j}(φ) is not present in S_j(φ).
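Building S_j(φ) amounts to stacking the signed selections D_{hj} C_h Σ_tr over the restricted horizons and checking S_j(φ) q_j ≥ 0 for a candidate column q_j. A sketch with hypothetical bivariate inputs (C_0, C_1, Σ, the selection matrix D, and q_j are illustrative, not taken from any model in the paper):

```python
import numpy as np

# Hypothetical VMA coefficients and Cholesky factor for a bivariate model.
C = [np.eye(2), np.array([[0.5, 0.1], [0.0, 0.4]])]  # C_0, C_1
Sigma_tr = np.linalg.cholesky(np.array([[1.0, 0.3],
                                        [0.3, 1.0]]))

# D_hj selects and signs the restricted responses at each horizon: here the
# first variable's response is restricted to be >= 0 (+1 row) and the second
# variable's to be <= 0 (-1 row), at horizons h = 0, 1.
D = np.array([[1.0, 0.0],
              [0.0, -1.0]])

S_j = np.vstack([D @ C[h] @ Sigma_tr for h in (0, 1)])  # stack S_hj over horizons
q_j = np.array([0.3, -0.95])                            # a candidate column of Q
ok = bool((S_j @ q_j >= 0).all())                       # the check S_j(phi) q_j >= 0
print(ok)  # True: this q_j satisfies all four stacked sign restrictions
```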
Let I_S ⊂ {1, 2, …, n} be such that j ∈ I_S if some of the impulse responses to the jth structural shock are sign-constrained. We denote the set of all sign restrictions, S_j(φ)q_j ≥ 0 for j ∈ I_S, as S(φ, Q) ≥ 0. (4.13)

The Impulse-Response Identified Set
The identified set for the impulse response in the presence of under-identifying zero restrictions and sign restrictions is given by

IS_η(φ|F, S) = {η(φ, Q) : Q ∈ Q(φ|F, S)}, (4.14)

16 For y = (y_1, …, y_m)', y ≥ 0 means y_i ≥ 0 for all i, and y > 0 means y_i ≥ 0 for all i and y_i > 0 for some i.
where Q(φ|F, S) is the set of Q's that jointly satisfy the sign restrictions (4.13), the zero restrictions (4.9), and the sign normalizations (4.3):

Q(φ|F, S) = {Q ∈ O(n) : F(φ, Q) = 0, S(φ, Q) ≥ 0, diag(Q'Σ_tr^{−1}) ≥ 0}.

Proposition B.1 in Giacomini and Kitagawa (2021) shows that, unlike in the case with only zero restrictions, with sign restrictions the identified set of η can be empty.

Multiple Priors in SVARs
Let π̃_φ be a prior for the reduced-form parameter. We ensure that the prior for φ is consistent with Assumption 1(i) by trimming the support of π̃_φ:

π_φ(·) ∝ π̃_φ(· ∩ {φ ∈ Φ : Q(φ|F, S) ≠ ∅}),

where {φ ∈ Φ : Q(φ|F, S) ≠ ∅} is the set of reduced-form parameters that yield nonempty identified sets for the structural parameters and the impulse responses. A joint prior for θ = (φ, Q) ∈ Φ × O(n) that has φ-marginal π_φ can be expressed as π_θ = π_{Q|φ} π_φ, where π_{Q|φ} is supported only on Q(φ|F, S). Since (A_0, A_1, …, A_p) and η are functions of θ = (φ, Q), π_θ induces a unique prior for the structural parameters and the impulse responses. Conversely, a prior for (A_0, A_1, …, A_p) that incorporates the sign normalizations induces a unique prior π_θ. While the prior for φ is updated by the data, the conditional prior π_{Q|φ} is not updated.
Under point identification, the restrictions pin down a unique Q (i.e., Q(φ|F, S) is a singleton), in which case π_{Q|φ} is degenerate and puts point mass on that Q. Specifying π_φ thus suffices to induce a single posterior for the structural parameters and for the impulse responses. In contrast, in the set-identified case, specifying only π_φ cannot yield a single posterior and one would also need to specify a prior π_{Q|φ}. This is the standard Bayesian approach adopted by the vast majority of the empirical literature using set-identified SVARs (e.g., Uhlig (2005)), and its potential pitfalls have been discussed by Baumeister and Hamilton (2015). 17 The robust Bayesian procedure in this paper does not require specifying a prior π_{Q|φ}, but considers the class of all priors π_{Q|φ} supported on Q(φ|F, S),

Π_{Q|φ} = {π_{Q|φ} : π_{Q|φ}(Q(φ|F, S)) = 1, π_φ-almost surely}. (4.17)

Combining Π_{Q|φ} with the posterior for φ generates a class of posteriors for θ = (φ, Q),

Π_{θ|Y} = {π_{θ|Y} = π_{Q|φ} π_{φ|Y} : π_{Q|φ} ∈ Π_{Q|φ}}, (4.18)

and a class of posteriors for the impulse response η,

Π_{η|Y} ≡ {π_{η|Y}(·) = ∫ π_{Q|φ}({Q : η(φ, Q) ∈ ·}) dπ_{φ|Y} : π_{Q|φ} ∈ Π_{Q|φ}}. (4.19)

17 Since (φ, Q) and (A_0, A_1, …, A_p) are in one-to-one correspondence (under the sign normalizations), the difficulty of specifying a prior π_{Q|φ} can be equivalently stated as the difficulty of specifying a prior for the structural parameters that is compatible with π_φ.

Set of Posterior Means and Robust Credible Region
Applying Theorem 2 to the impulse response, we obtain the set of posterior means

[E_{φ|Y}(ℓ(φ)), E_{φ|Y}(u(φ))],

where ℓ(φ) and u(φ) are the lower and upper bounds of IS_η(φ|F, S). Section 5 discusses how to compute ℓ(φ) and u(φ).
The smallest robust credible region with credibility α for the impulse response can be computed using draws of [ℓ(φ), u(φ)], φ ∼ π_{φ|Y}, and applying Proposition 1. It is interpreted as the shortest interval estimate for the impulse response η such that the posterior probability put on the interval is greater than or equal to α uniformly over the posteriors in the class (4.19).
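Numerically, the smallest robust credible region for a scalar η can be approximated from posterior draws of [ℓ(φ), u(φ)] by searching over interval centers c for the one minimizing the α-quantile of the farthest distance from c to the identified set, in the spirit of the Proposition 1 characterization. The draws below are synthetic stand-ins for the output of the posterior sampler:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic posterior draws of the identified-set bounds [l(phi), u(phi)].
l_draws = rng.normal(-1.0, 0.1, size=5000)
u_draws = l_draws + 2.0 + rng.normal(0.2, 0.05, size=5000)

def radius(c, alpha=0.90):
    # alpha-quantile of the farthest point of [l, u] from a candidate center c:
    # the interval [c - radius(c), c + radius(c)] then contains the identified
    # set with posterior probability (at least) alpha.
    d = np.maximum(np.abs(l_draws - c), np.abs(u_draws - c))
    return np.quantile(d, alpha)

centers = np.linspace(l_draws.min(), u_draws.max(), 400)
radii = np.array([radius(c) for c in centers])
c_star = centers[radii.argmin()]                  # center minimizing the radius
robust_cr = (c_star - radii.min(), c_star + radii.min())
print(robust_cr)  # approximates the smallest robust credible region
```

The grid search over c is a simple design choice; any one-dimensional minimizer could replace it.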
To validate the frequentist interpretation of the set of posterior means, Appendix B in Giacomini and Kitagawa (2021) provides conditions for convexity, continuity, and differentiability of the identified set map IS η (φ|F S) for the impulse response. By Theorems 2 and 3(ii), convexity and continuity of IS η (φ|F S) as a function of φ allow us to interpret the set of posterior means as a consistent estimator of the true identified set. In addition, by Proposition 2, differentiability of the upper and lower bounds of the impulse-response identified set in φ with nonzero derivatives guarantees that the robust credible region is an asymptotically valid confidence set for the true impulse-response identified set.

NUMERICAL IMPLEMENTATION
We present an algorithm to numerically approximate the set of posterior means, the robust credible region, and the diagnostic tool in Section 2.6.1 for the case of SVARs. The algorithm assumes that the variables are ordered as in Definition 3 and that the zero restrictions, if any are imposed, satisfy (4.10). Matlab code implementing the procedure can be obtained from the authors' personal websites or upon request.

ALGORITHM 1: Let F(φ, Q) = 0 and S(φ, Q) ≥ 0 be the set of identifying restrictions (one or both of which may be empty), and let η = c_{ih}'(φ) q_{j*} be the impulse response of interest.
Step 1. Specify π̃_φ, the prior for the reduced-form parameter φ. 18 Estimate a Bayesian reduced-form VAR to obtain the posterior π̃_{φ|Y}.

Step 2. Draw φ from π̃_{φ|Y}. Given the draw, check whether Q(φ|F, S) is empty by running Steps 2.1-2.3 below.
Step 2.1. Let z_1 ∼ N(0, I_n) be a draw of an n-variate standard normal random variable. Let q̃_1 = M_1 z_1 be the n × 1 residual vector in the linear projection of z_1 onto the n × f_1 regressor matrix F_1(φ)'. For i = 2, 3, …, n, run the following procedure sequentially: draw z_i ∼ N(0, I_n) and compute q̃_i = M_i z_i, where M_i z_i is the residual vector in the linear projection of z_i onto the n × (f_i + i − 1) regressor matrix [F_i(φ)', q̃_1, …, q̃_{i−1}]. The vectors q̃_1, …, q̃_n are orthogonal and satisfy the zero restrictions.

18 π̃_φ need not be proper, nor satisfy the condition π̃_φ({φ : Q(φ|F, S) ≠ ∅}) = 1 (i.e., the prior may assign positive probability to regions of the reduced-form parameter space that yield an empty set of Q's satisfying the restrictions).
Step 2.2. Given q̃_1, …, q̃_n obtained in the previous step, define

Q = [sign((σ_1)'q̃_1) q̃_1/‖q̃_1‖, …, sign((σ_n)'q̃_n) q̃_n/‖q̃_n‖],

where σ_i is the ith column vector of Σ_tr^{−1} and ‖·‖ is the Euclidean norm in R^n. If (σ_i)'q̃_i is zero for some i, set sign((σ_i)'q̃_i) equal to 1 or −1 with equal probability. This step imposes the sign normalization that the diagonal elements of A_0 are nonnegative.

Step 2.3. Check whether the Q obtained in Step 2.2 satisfies the sign restrictions S(φ, Q) ≥ 0. If so, retain this Q and proceed to Step 3. Otherwise, repeat Steps 2.1 and 2.2 a maximum of L times (e.g., L = 3000) or until a Q satisfying S(φ, Q) ≥ 0 is obtained. If none of the L draws of Q satisfies S(φ, Q) ≥ 0, approximate Q(φ|F, S) as being empty and return to Step 2 to obtain a new draw of φ.
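Steps 2.1 and 2.2 can be sketched as follows, with a hypothetical 3-variable Σ and a single illustrative zero restriction on q_1; the residual projections are computed by least squares:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3

Sigma = np.array([[1.0, 0.2, 0.1],
                  [0.2, 1.0, 0.3],
                  [0.1, 0.3, 1.0]])
Sigma_tr = np.linalg.cholesky(Sigma)   # lower-triangular Cholesky factor
Sigma_tr_inv = np.linalg.inv(Sigma_tr)

# Hypothetical zero restrictions: one restriction on q_1 (f_1 = 1), none on q_2, q_3.
F = [np.array([[1.0, 0.5, -0.2]]), None, None]

def residual(z, X):
    # Residual vector of the linear projection of z onto the columns of X.
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    return z - X @ beta

# Step 2.1: sequential projections yield orthogonal q-tildes with F_i(phi) q_i = 0.
q_tilde = []
for i in range(n):
    z = rng.standard_normal(n)
    regs = ([] if F[i] is None else [F[i].T]) + [q.reshape(-1, 1) for q in q_tilde]
    q_tilde.append(residual(z, np.hstack(regs)) if regs else z)

# Step 2.2: normalize and flip signs so that diag(A_0) = diag(Q' Sigma_tr^{-1}) >= 0.
Q = np.column_stack(
    [np.sign(Sigma_tr_inv[:, i] @ q) * q / np.linalg.norm(q)
     for i, q in enumerate(q_tilde)]
)
print(np.round(Q.T @ Q, 8))  # identity: Q is orthonormal by construction
```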
REMARKS: First, the step of the algorithm drawing orthonormal Q's subject to zero and sign restrictions (Step 2) is common to our approach and the existing standard Bayesian approach of, for example, Arias, Rubio-Ramírez, and Waggoner (2018). In particular, Step 2.1 is similar to Steps 2 and 3 in Algorithm 2 of Arias, Rubio-Ramírez, and Waggoner (2018), but uses a linear projection instead of their QR decomposition and imposes different sign normalizations.

Second, Step 3 is a non-convex optimization problem and the convergence of gradient-based optimization methods in such problems is not guaranteed. To mitigate this problem, at each draw of φ one can draw multiple values of Q from Q(φ|F, S) to use as starting values in the optimization step, and then take the optimum over the solutions obtained from the different starting values.
Third, if the zero and sign restrictions constrain only a single column of Q, Steps 2.1-2.3 and 3 can be replaced by an analytical computation of the bounds of the identified set at each draw of φ, using the result of Gafarov, Meier, and Montiel-Olea (2018). While they applied the result at the maximum likelihood estimate φ̂ in a frequentist setting, we apply it at each draw from the posterior of φ.
Step 6 can also be replaced by analytically checking whether the identified set is empty at each draw of φ. The advantage of the analytical approach is that we can assess emptiness even when the identified set is narrow, and it is computationally faster. The advantage of the numerical approach is that it is applicable even when the restrictions involve multiple columns of Q (i.e., the restrictions are on multiple structural shocks).
Fourth, if there are concerns about the convergence properties of the numerical optimization step due to a large number of variables and/or constraints, and there are restrictions on multiple columns of Q (so the analytical approach cannot be applied), one could use the following algorithm.
ALGORITHM 2: In Algorithm 1, replace Step 3 with the following:

Step 3'. Iterate Steps 2.1-2.3 K times and let (Q_l : l = 1, …, K) be the draws that satisfy the sign restrictions. (If none of the draws satisfies the sign restrictions, draw a new φ and iterate Steps 2.1-2.3 again.) Let q_{j*,l}, l = 1, …, K, be the j*th column vector of Q_l, and approximate [ℓ(φ), u(φ)] by [min_l c_{ih}'(φ) q_{j*,l}, max_l c_{ih}'(φ) q_{j*,l}].

A downside of this alternative is that the approximated identified set is smaller than IS_η(φ|F, S) at every draw of φ. Nonetheless, the estimator of the identified set is still consistent as the number of draws of Q goes to infinity. Comparing the bounds obtained using Algorithms 1 and 2 may also provide a useful check on the convergence properties of the optimization in Step 3.
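Step 3' can be sketched as follows, using uniform draws from O(n) as stand-ins for the accepted draws of Step 2, a hypothetical coefficient vector c_ih, and a simple placeholder for the sign-restriction check; the min/max over draws gives an inner approximation of the bounds:

```python
import numpy as np

rng = np.random.default_rng(3)
n, K = 3, 2000

c_ih = np.array([0.8, -0.2, 0.5])  # hypothetical ith row of C_h Sigma_tr
j_star = 0                          # impulse response eta = c_ih' q_{j*}

etas = []
for _ in range(K):
    Z = rng.standard_normal((n, n))
    Q, R = np.linalg.qr(Z)
    Q = Q * np.sign(np.diag(R))     # a uniform (Haar) draw from O(n)
    q = Q[:, j_star]
    if q[0] >= 0:                   # stand-in for the sign-restriction check
        etas.append(c_ih @ q)

# Inner approximation of [l(phi), u(phi)]: min/max over finitely many accepted
# draws, which understates the true bounds of the identified set.
l_hat, u_hat = min(etas), max(etas)
print(l_hat, u_hat)
```

With the constraint q_1 ≥ 0, the true bounds here are −√0.29 ≈ −0.539 and √0.93 ≈ 0.964, so the approximation lies strictly inside them, illustrating the downside noted above.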

EMPIRICAL APPLICATION
We illustrate how our method can be used to: (1) perform robust Bayesian inference in SVARs without specifying a prior for the rotation matrix Q; (2) obtain a consistent estimator of the impulse-response identified set; and (3) if a prior for Q is available, disentangle the information introduced by this choice of prior from that solely contained in the identifying restrictions.
The model is based on the four-variable SVAR considered by Granziera, Moon, and Schorfheide (2018), which in turn is based on Aruoba and Schorfheide (2011). The observables are the federal funds rate (i_t), real GDP per capita in log differences (Δy_t), inflation as measured by the GDP deflator (π_t), and real money balances (m_t). The data are quarterly, from 1965:1 to 2006:1. The model is the SVAR above with y_t = (i_t, Δy_t, π_t, m_t)' and A_0 = (a_{kl}) a 4 × 4 matrix, and the impulse response of interest is the output response to a monetary policy shock, ∂y_{t+h}/∂ε_{i,t} = Σ_{j=0}^{h} ∂Δy_{t+j}/∂ε_{i,t}, so j* = 1. The sign normalization restrictions (nonnegative diagonal elements of A_0) and the assumption that the covariance matrix of the structural shocks is the identity matrix imply that the output response is with respect to a unit-standard-deviation positive (contractionary) monetary policy shock.
We consider different combinations of the following zero and sign restrictions: (i) a_{12} = 0: the monetary authority does not respond contemporaneously to output.
(ii) IR_0(Δy, i) = 0: the instantaneous impulse response of output to a monetary policy shock is zero. (iii) CIR_∞(Δy, i) = 0: the long-run impulse response of output to a monetary policy shock is zero. (iv) Sign restrictions: following a contractionary monetary policy shock, the responses of inflation and real money balances are nonpositive on impact and after one quarter (∂π_{t+h}/∂ε_{i,t} ≤ 0 and ∂m_{t+h}/∂ε_{i,t} ≤ 0 for h = 0, 1), and the response of the interest rate is nonnegative on impact and after one quarter (∂i_{t+h}/∂ε_{i,t} ≥ 0 for h = 0, 1). We start from a model that does not impose any identifying restrictions (Model 0). We then impose different combinations of the restrictions, summarized in Table I, which all give rise to set identification. Restrictions (i)-(iii) are zero restrictions that constrain the first column of Q, so f_1 = 1 if only one restriction out of (i)-(iii) is imposed (Models II to IV), and f_1 = 2 if two restrictions are imposed (Models V to VII). No zero restrictions are placed on the remaining columns of Q, so for all models f_2 = f_3 = f_4 = 0, and the order of the variables satisfies Definition 3.
All models impose the sign restrictions in (iv), which are those considered in Granziera, Moon, and Schorfheide (2018). This implies that Model I coincides with their model (with a different measure of output).
The prior for the reduced-form parameters, π̃_φ, is the improper Jeffreys' prior, with density proportional to |Σ|^{−(4+1)/2}. This implies a normal-inverse-Wishart posterior from which it is easy to draw. The results are based on Algorithm 1, using five starting values as discussed in the remarks in Section 5. 23 We draw φ's until we obtain 1000 realizations of the non-empty identified set, and we check for convexity of the set using Proposition B.1 in Appendix B in Giacomini and Kitagawa (2021). We also compare our approach to standard Bayesian inference based on a uniform prior for Q. We obtain draws from the single posterior for the impulse response by iterating Steps 2.1-2.3 of Algorithm 1 and retaining only the draws of Q that satisfy the sign restrictions. 24 Table II provides the posterior inference results for the output responses at h = 1 (3 months), h = 10 (2 years and 6 months), and h = 20 (5 years) in each model, for both the robust Bayesian and the standard Bayesian approaches. The table also shows the posterior lower probability that the impulse response is negative, π_{η|Y*}(η < 0), as well as the diagnostic tools from Section 2.4. Figures 1 and 2 report the set of posterior means for the impulse responses (vertical bars) and the smallest robust credible region with credibility 90% (continuous line) for the robust Bayesian approach; for the standard Bayesian approach, they report the posterior mean (dotted line) and the 90% highest posterior density region (dashed line). 25 We can draw several conclusions.
First, choosing a uniform prior for the rotation matrix affects posterior inference: in Model I, this prior choice is more informative than the identifying restrictions; in some of the models and for some horizons, standard Bayesian analysis suggests that the output response is negative, whereas the robust Bayesian lower probability of this event is very low, implying that the conclusion of standard Bayesian analysis in these cases is largely driven by the choice of unrevisable prior. See Wolf (2020) for a similar finding.
Second, all 90% robust credible regions contain zero, casting doubt on the informativeness of these common under-identifying restrictions. In particular, sign restrictions alone (Model I) have little identifying power and result in set estimates that are too wide to support informative inference about the sign of the impulse response. Adding a single zero restriction (Models II to IV) makes the identified set estimates tighter, although the identifying power varies across horizons: the restriction on the contemporaneous response (restriction (ii)) is more informative at short horizons and the long-run restriction (restriction (iii)) is more informative at long horizons. The zero restriction on A_0 (restriction (i)) is informative at both short and long horizons.
Third, imposing additional zero restrictions (Models V to VII) makes the identifying restrictions more informative than the choice of the prior and reduces the gap between the conclusions of standard and robust Bayesian analysis. The robust Bayesian analysis also becomes informative about the sign of the output response.

23 The results are visually indistinguishable when using the analytical approach discussed in the remarks in Section 5. Five initial values appear sufficient to achieve convergence of the numerical algorithm to the true optimum.

24 In Models 0 and I, this is equivalent to Uhlig (2005), as it obtains draws from the uniform prior over the space of rotation matrices satisfying the sign normalizations and sign restrictions (if any). In models with both zero and sign restrictions, this is comparable to Arias, Rubio-Ramírez, and Waggoner (2018), aside from the small differences in the algorithms discussed in Section 5 and the fact that they used a normal-inverse-Wishart prior for the reduced-form parameter. Using the same prior as Arias, Rubio-Ramírez, and Waggoner (2018) gives visually indistinguishable results.

25 These figures do not capture the dependence of the responses across different horizons.

FIGURE NOTES: See Table I for the definition of models. In each figure, the points are the standard Bayesian posterior means, the vertical bars are the set of posterior means, the dashed curves are the upper and lower bounds of the standard Bayesian highest posterior density regions with credibility 90%, and the solid curves are the upper and lower bounds of the robust credible regions with credibility 90%.

CONCLUSION
We develop a robust Bayesian inference procedure for set-identified models, providing Bayesian inference that is asymptotically equivalent to frequentist inference about the identified set. The main idea is to remove the need to specify a prior that is not revised by the data, while allowing for ambiguous beliefs (multiple priors) about the unrevisable component of the prior. We show how to compute an estimator of the identified set and the associated smallest robust credible region, which are, respectively, consistent and asymptotically valid in terms of frequentist coverage.
We conclude by summarizing the recommended uses and advantages of our method. First, by reporting the robust Bayesian output, one can learn what inferential conclusions can be supported solely by the identifying restrictions and the posterior for the reduced-form parameter. Even if a user has a credible prior for parameters for which the data are not informative, the robust Bayesian output will help communicate with other users who may have different priors. Second, by comparing the output across different sets of identifying restrictions, one can learn which restrictions are crucial in drawing a given inferential conclusion. Third, the procedure can be a useful tool for separating the information contained in the data from any prior input that is not revised by the data.
The fact that, in applications to macroeconomic policy analysis using SVARs, the set of posterior means and the robust credible region may be too wide to draw informative policy recommendations should not be considered a disadvantage of the method. Wide sets may encourage the researcher to look for additional credible identifying restrictions and/or to refine the set of priors, by inspecting how the data are collected, by considering empirical evidence from other studies, and by turning to economic theory. If additional restrictions are not available, our analysis informs the researcher about the amount of ambiguity that the policy decision will be subject to. As Manski (2013) argued, knowing what we do not know is an important premise for a policy decision without incredible certitude.

APPENDIX: PROOFS

LEMMA A.1: Under Assumption 1, IS_θ(·) and IS_η(·) are closed-valued and measurable with respect to (Φ, B), that is, IS_θ(φ) and IS_η(φ) are closed and, for A ∈ A and D ∈ H, {φ : IS_θ(φ) ∩ A ≠ ∅} ∈ B and {φ : IS_η(φ) ∩ D ≠ ∅} ∈ B.

PROOF OF LEMMA A.1: Closedness of IS_θ(φ) and IS_η(φ) is implied directly by Assumptions 1(ii) and 1(iii). To prove measurability of {φ : IS_θ(φ) ∩ A ≠ ∅}, Theorem 2.6 in Chapter 1 of Molchanov (2005) is invoked, which states that, given that Θ is a Polish space, {φ : IS_θ(φ) ∩ A ≠ ∅} ∈ B holds if and only if {φ : θ ∈ IS_θ(φ)} ∈ B holds for every θ ∈ Θ. Since IS_θ(φ) is the inverse image of the many-to-one mapping g : Θ → Φ, {φ : θ ∈ IS_θ(φ)} = {g(θ)} is a singleton for each θ ∈ Θ. Any singleton set of φ belongs to B, since Φ is a metric space. Hence, {φ : θ ∈ IS_θ(φ)} ∈ B holds.

PROOF OF LEMMA A.2: For the given subset A ∈ A,
where the equality follows by the definition of conditional probability. By the construction above, the inequality is proven. Q.E.D.
PROOF OF LEMMA A.3: The claim holds trivially when A = ∅ or A = Θ. We hence prove the claim for A different from ∅ and Θ. Fix A ∈ A, let Φ_1^A be as in the proof of Lemma A.2, and let Φ_0^A = {φ : IS_θ(φ) ∩ A = ∅} and Φ_2^A = {φ : IS_θ(φ) ∩ A ≠ ∅ and IS_θ(φ) ∩ A^c ≠ ∅}, where each of Φ_0^A, Φ_1^A, and Φ_2^A belongs to B by Lemma A.1. Note that Φ_0^A, Φ_1^A, and Φ_2^A are mutually disjoint and constitute a partition of g(Θ) ⊂ Φ. Consider a Θ-valued measurable selection ξ^A(·) defined on Φ_2^A, if non-empty, such that ξ^A(φ) ∈ (IS_θ(φ) ∩ A^c) holds for π_φ-almost every φ ∈ Φ_2^A. Such a measurable selection ξ^A(φ) can be constructed, for instance, by ξ^A(φ) = arg max_{θ∈IS_θ(φ)} d(θ, A), where d(θ, A) = inf_{θ′∈A} ‖θ − θ′‖ and A^ε = {θ : d(θ, A) ≤ ε} is the closed ε-enlargement of A (see Theorem 2.27 in Chapter 1 of Molchanov (2005) for B-measurability of such ξ^A(φ)). If Φ_2^A is empty, we do not need to construct ξ^A(·), and the construction of the conditional probability distribution π_{θ|φ}^A shown below remains valid.
PROOF OF THEOREM 1: We first show the special case of η = θ. In the expression of the posterior of θ given in equation (2.2), π_{θ|Y}(A) is minimized over the prior class by plugging in the attainable pointwise lower bound of π_{θ|φ}(A). By Lemmas A.2 and A.3, the attainable pointwise lower bound of π_{θ|φ}(A) is 1{IS_θ(φ) ⊂ A}, so that π_{θ|Y*}(A) = π_{φ|Y}({φ : IS_θ(φ) ⊂ A}). The upper probability follows by its conjugacy, π*_{θ|Y}(A) = 1 − π_{θ|Y*}(A^c). For the general case η = h(θ), the expression of the posterior lower probability follows from {φ : IS_η(φ) ⊂ D} = {φ : IS_θ(φ) ⊂ h^{−1}(D)}, so that π_{η|Y*}(D) = π_{φ|Y}({φ : IS_η(φ) ⊂ D}). The expression of the posterior upper probability follows again by the conjugacy property. Q.E.D.
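The characterization in Theorem 1 lends itself to simple Monte Carlo evaluation. The sketch below is not code from the paper: the posterior for φ and the interval-valued map IS_θ(φ) = [φ − 1, φ + 1] are purely illustrative assumptions. It computes the posterior lower probability of an event A as the posterior probability that IS_θ(φ) is contained in A, the upper probability as the probability that IS_θ(φ) hits A, and checks the conjugacy relation π*_{θ|Y}(A) = 1 − π_{θ|Y*}(A^c).

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative posterior draws of the reduced-form parameter phi (hypothetical)
phi = rng.normal(loc=0.5, scale=0.2, size=100_000)

# Hypothetical interval-valued identified set IS_theta(phi) = [phi - 1, phi + 1]
lo, hi = phi - 1.0, phi + 1.0

# Event A = [a, b]
a, b = 0.0, 3.0
lower = np.mean((lo >= a) & (hi <= b))   # containment probability: IS_theta(phi) subset of A
upper = np.mean((hi >= a) & (lo <= b))   # hitting probability: IS_theta(phi) intersects A

# Conjugacy check: lower probability of the complement A^c = (-inf, a) union (b, inf)
lower_complement = np.mean((hi < a) | (lo > b))
print(lower, upper, 1.0 - lower_complement)
```

With continuous posterior draws, 1 − lower_complement coincides with upper draw by draw (by De Morgan's law, an interval hits A exactly when it is not contained in A^c), so the conjugacy relation holds exactly in the simulation.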

The conclusion follows by setting
Q.E.D.
PROOF OF PROPOSITION 1: Let C_r(η_c) denote the closed interval centered at η_c with radius r. The event {IS_η(φ) ⊂ C_r(η_c)} happens if and only if {d(η_c, IS_η(φ)) ≤ r}. So, r_α(η_c) ≡ inf{r : π_{φ|Y}({φ : d(η_c, IS_η(φ)) ≤ r}) ≥ α} is the radius of the smallest interval centered at η_c that contains the random set IS_η(φ) with a posterior probability of at least α. Therefore, finding a minimizer of r_α(η_c) in η_c is equivalent to searching for the center of the smallest interval that contains IS_η(φ) with posterior probability α. The attained minimum of r_α(η_c) is its radius. Q.E.D.
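Proposition 1 suggests a direct computational recipe for the smallest robust credible region: approximate r_α(η_c) by the α-quantile, across posterior draws of φ, of the sup-distance from η_c to IS_η(φ), then minimize over a grid of candidate centers. The sketch below assumes a scalar η and interval-valued identified sets with illustrative endpoints ℓ(φ) = φ − 0.5 and u(φ) = φ + 0.7; all names and distributions are hypothetical, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = 0.90
phi = rng.normal(0.0, 0.3, size=5_000)   # illustrative posterior draws of phi
lo, hi = phi - 0.5, phi + 0.7            # draws of the interval IS_eta(phi)

def radius(c):
    # d(c, IS) = sup over eta in IS of |c - eta|; IS is inside C_r(c) iff d(c, IS) <= r
    d = np.maximum(np.abs(c - lo), np.abs(c - hi))
    return np.quantile(d, alpha)         # r_alpha(c): covers IS with posterior prob. alpha

centers = np.linspace(lo.min(), hi.max(), 401)
radii = np.array([radius(c) for c in centers])
c_star = centers[np.argmin(radii)]
r_star = radii.min()
robust_cr = (c_star - r_star, c_star + r_star)  # approximate smallest robust credible region

# Sanity check: the region contains the drawn identified sets with probability close to alpha
coverage = np.mean((lo >= robust_cr[0]) & (hi <= robust_cr[1]))
print(robust_cr, coverage)
```

A finer grid of centers (or a one-dimensional numerical optimizer) tightens the approximation; the quantile step implements the inner inf over r directly from the posterior draws.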
PROOF OF THEOREM 3: (i) Let ε > 0 be arbitrary. Since Assumption 2(i) implies that IS_η(·) is compact-valued in an open neighborhood of φ_0, continuity of the identified set correspondence at φ_0 is equivalent to continuity of IS_η(·) at φ_0 in terms of the Hausdorff metric (see, e.g., Proposition 5 in Chapter E of Ok (2007)). This implies that there exists an open neighborhood G of φ_0 such that d_H(IS_η(φ), IS_η(φ_0)) < ε holds for all φ ∈ G. Consider π_{φ|Y^T}({φ : d_H(IS_η(φ), IS_η(φ_0)) > ε}) = π_{φ|Y^T}({φ : d_H(IS_η(φ), IS_η(φ_0)) > ε} ∩ G^c) ≤ π_{φ|Y^T}(G^c), where the equality follows because {φ : d_H(IS_η(φ), IS_η(φ_0)) > ε} ∩ G = ∅ by the construction of G. The posterior consistency of φ yields lim_{T→∞} π_{φ|Y^T}(G^c) = 0, P_{Y^∞|φ_0}-a.s.
To show (A), note that any weakly converging sequence of stochastic processes in C(S^{k−1}, R) is tight (see, e.g., Lemma 16.2 and Theorem 16.3 in Kallenberg (2001)). Hence, Assumption 4(i) implies that, for almost every sampling sequence of Y^T, there exists a class of bounded functions F ⊂ C(S^{k−1}, R) such that F contains {ĉ_T(·)} for all large T. Furthermore, we can constrain F to a class of equicontinuous functions, because the support functions of bounded sets are Lipschitz continuous.
For (A), it suffices to show

sup_{c∈F} |P_{X_T}(X_T(·) ≤ c(·)) − P_X(X(·) ≤ c(·))| → 0 as T → ∞ (A.2)

for any weakly converging stochastic processes X_T ⇒ X, where P_{X_T} denotes the probability law of X_T. Suppose this claim does not hold. Then, there exist a subsequence T′ of T, a sequence of functions {c_{T′}(·) ∈ F}, and ε > 0 such that

|P_{X_{T′}}(X_{T′}(·) ≤ c_{T′}(·)) − P_X(X(·) ≤ c_{T′}(·))| > ε (A.3)

holds for all T′. By Assumption 4(iv) and the Arzelà–Ascoli theorem, F is relatively compact. Hence, there exists a subsequence T″ of T′ such that c_{T″} converges to c* ∈ C(S^{k−1}, R) (in the supremum metric) as T″ → ∞. By Assumption 4(iii), P_X(X(·) ≤ c_{T″}(·)) → P_X(X(·) ≤ c*(·)) as T″ → ∞. By the assumption that X_T ⇒ X and the continuous mapping theorem, X_{T″} − c_{T″} ⇒ X − c*. Hence, Assumption 4(iii) and the Portmanteau theorem imply that P_{X_{T″}}(X_{T″}(·) − c_{T″}(·) ≤ 0) → P_X(X(·) − c*(·) ≤ 0) as T″ → ∞. We have shown |P_{X_{T″}}(X_{T″}(·) ≤ c_{T″}(·)) − P_X(X(·) ≤ c_{T″}(·))| → 0 along T″, which contradicts (A.3), so the convergence (A.2) holds.
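The Lipschitz property invoked above — that the support function h_K(u) = sup_{x∈K} x′u of a bounded set K satisfies |h_K(u) − h_K(v)| ≤ (sup_{x∈K} ‖x‖)·‖u − v‖ on the unit sphere — can be checked numerically. The sketch below is a self-contained illustration (not code from the paper), using a random finite set in R^2 and random pairs of directions on S^1.

```python
import numpy as np

rng = np.random.default_rng(2)
K = rng.normal(size=(200, 2))              # a bounded (finite) set K in R^2
L = np.max(np.linalg.norm(K, axis=1))      # Lipschitz constant: sup over x in K of ||x||

def support(u):
    # h_K(u) = max over x in K of x'u: the support function of K in direction u
    return np.max(K @ u)

def unit(n):
    # n random directions on the unit circle S^1
    v = rng.normal(size=(n, 2))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

U, V = unit(500), unit(500)
gaps = np.array([abs(support(u) - support(v)) for u, v in zip(U, V)])
bounds = L * np.linalg.norm(U - V, axis=1)
print(np.all(gaps <= bounds + 1e-9))
```

A uniform bound on the sets underlying the functions in F then gives a common Lipschitz constant, which is the equicontinuity used in the tightness argument.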
The proof of Proposition 2 uses the next lemma.
LEMMA A.4: Let Lev_T and Lev be the α-level sets of J_T(·) and J(·), respectively,

Lev_T = {c ∈ R^2 : J_T(c) ≥ α}, Lev = {c ∈ R^2 : J(c) ≥ α}.

Define a distance from a point c ∈ R^2 to a set F ⊂ R^2 by d(c, F) ≡ inf_{c′∈F} ‖c − c′‖, where ‖·‖ is the Euclidean norm. Under Assumption 2, (a) d(c, Lev_T) → 0 in P_{Y^T|φ}-probability as T → ∞ for every c ∈ Lev, and (b) d(c_T, Lev) → 0 in P_{Y^T|φ}-probability as T → ∞ for every sequence {c_T : T = 1, 2, ...} of measurable selections of Lev_T.
To prove (b), suppose again that the conclusion is false, implying that there exist a subsequence T′ of T, ε, δ > 0, and a sequence of (random) measurable selections c_{T′} = (c_{ℓ,T′}, c_{u,T′}) from Lev_{T′} such that P_{Y^{T′}|φ_0}(d(c_{T′}, Lev) > ε) > δ for all T′. Since d(c_{T′}, Lev) > ε implies J(c_{ℓ,T′} + ε/2, c_{u,T′} + ε/2) < α,

P_{Y^{T′}|φ_0}(J(c_{ℓ,T′} + ε/2, c_{u,T′} + ε/2) < α) > δ (A.5)

holds along T′. Note, however, that

J(c_{ℓ,T′} + ε/2, c_{u,T′} + ε/2) > J(c_{ℓ,T′}, c_{u,T′}) ≥ J_{T′}(c_{ℓ,T′}, c_{u,T′}) − sup_{c∈R^2} |J(c) − J_{T′}(c)| ≥ α − sup_{c∈R^2} |J(c) − J_{T′}(c)| → α

in P_{Y^{T′}|φ_0}-probability, where the strict inequality follows from J(·) being strictly monotonic and the last inequality from J_{T′}(c_{T′}) ≥ α. The convergence in probability in the last line follows from the continuity of J(·) and the fact that sup_{c∈R^2} |J(c) − J_T(c)| → 0 for any sequence of distributions J_T converging weakly to a distribution with a continuous CDF (e.g., Lemma 2.11 in van der Vaart (1998)). This in turn implies P_{Y^{T′}|φ_0}(J(c_{ℓ,T′} + ε/2, c_{u,T′} + ε/2) ≥ α) → 1 as T′ → ∞, which contradicts (A.5). Q.E.D.
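The uniform convergence sup_c |J_T(c) − J(c)| → 0 used in the last step holds whenever J_T converges weakly to a continuous limit CDF (a Pólya-type argument; Lemma 2.11 in van der Vaart (1998)). The one-dimensional sketch below is illustrative only, with an empirical CDF standing in for J_T and the standard normal CDF for J; the sup gap is evaluated at the jump points of the empirical CDF, where it is attained.

```python
import numpy as np
from math import erf, sqrt

def J(c):
    # Standard normal CDF: the continuous limit distribution
    return 0.5 * (1.0 + erf(c / sqrt(2.0)))

def sup_gap(sample):
    # sup over c of |ecdf(c) - J(c)|, attained at the order statistics of the sample
    x = np.sort(sample)
    T = len(x)
    i = np.arange(1, T + 1)
    Jx = np.array([J(v) for v in x])
    return max(np.max(np.abs(i / T - Jx)), np.max(np.abs((i - 1) / T - Jx)))

rng = np.random.default_rng(3)
g_small = sup_gap(rng.normal(size=100))
g_large = sup_gap(rng.normal(size=10_000))
print(g_small, g_large)
```

By the Dvoretzky–Kiefer–Wolfowitz inequality the sup gap shrinks at rate O(T^{−1/2}), so the larger sample's gap should be roughly an order of magnitude smaller.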