Limited Information Bayesian Model Averaging for Dynamic Panels with An Application to a Trade Gravity Model


Abstract

This paper extends the Bayesian Model Averaging framework to panel data models where the lagged dependent variable as well as endogenous variables appear as regressors. We propose a Limited Information Bayesian Model Averaging (LIBMA) methodology and then test it using simulated data. Simulation results suggest that asymptotically our methodology performs well both in Bayesian model averaging and selection. In particular, LIBMA recovers the data generating process well, with high posterior inclusion probabilities for all the relevant regressors, and parameter estimates very close to their true values. These findings suggest that our methodology is well suited for inference in short dynamic panel data models with endogenous regressors in the context of model uncertainty. We illustrate the use of LIBMA in an application to the estimation of a dynamic gravity model for bilateral trade.

I. Introduction

Model uncertainty is an issue often encountered in the econometric study of socioeconomic phenomena. Typically, model uncertainty arises because the lack of clear theoretical guidance and trade-offs in the choice of regressors result in a broad number of possible and testable specifications, a phenomenon labeled as “open-endedness” of economic theories by Brock and Durlauf (2001). Since several models may be plausible given the data, inferences about parameters of interest may be fragile and may even result in contradictory conclusions. Typical attempts to deal with model uncertainty include downplaying its importance (Temple (2000)) and engaging in unsystematic searches of possible model specifications.

A growing number of researchers are turning to Bayesian methods in order to deal with the problem of model uncertainty. Bayesian model selection attempts to find the data generating process by choosing the most probable model in the universe of models, that is, the model for which the posterior model probability is the largest (see George (1999), and Chipman, George and McCulloch (2001)). The posterior model probability provides a representation of the model uncertainty conditional on the information contained in the observed data, and inference pertaining to a quantity of interest is conditional on the selected model. However, this may underestimate the uncertainty related to that quantity of interest since uncertainty across different models is not accounted for. Bayesian Model Averaging (BMA)—initially proposed by Leamer (1978)—fully incorporates model uncertainty by basing inferences about a quantity of interest on an average of the posterior distributions under each model, using posterior model probabilities as weights. Contributions to the BMA literature include those of Moulton (1991), Madigan and Raftery (1994), Kass and Raftery (1995), Raftery, Madigan and Hoeting (1997), and Fernández, Ley and Steel (2001b). The BMA framework has been applied in various areas of social sciences, including economics.1

Despite the increasing interest in BMA to address model uncertainty, most of the work so far uses static models and cross section analysis with data averaged over the time dimension, effectively ignoring dynamic relationships among variables. Moreover, to the best of our knowledge, none of the models allow for the inclusion of endogenous variables, namely regressors that are correlated with the disturbances.2 In this paper, we first propose a limited information methodology for dealing with model uncertainty in the context of a panel data model with short time periods, where the lagged dependent variable as well as endogenous variables appear as regressors. We label our methodology “Limited Information Bayesian Model Averaging” (LIBMA). Then, we evaluate the performance of LIBMA relative to both Bayesian model selection and Bayesian Model Averaging using Monte Carlo simulations. Finally, we present an application of LIBMA to the estimation of a dynamic panel data gravity model for bilateral trade, which illustrates its usefulness in practice.

Specifically, we propose a method for constructing the model likelihoods and posteriors based only on information elicited from moment conditions, with no specific distributional assumptions. The limited information likelihood we build is a “true” likelihood, derived by taking advantage of the linear structure of the model, as well as the asymptotic properties of the Generalized Method of Moments (GMM) estimator via the central limit theorem. Therefore, for the purpose of Bayesian inference, our limited information likelihood can be used, in principle, with any prior. In this paper, however, we only consider a likelihood dependent, unit information prior (see Kass and Wasserman (1995)), which yields, to a good approximation, a posterior with a simple Bayesian Information Criterion (BIC)-like form, and thus reduces the computational burden in obtaining model posteriors. In addition, we derive the marginal likelihood using standard Bayesian procedures, that is, by integrating out the parameters in the likelihood function, so that the desired large sample properties are satisfied automatically. Our approach differs from earlier work by Kim (2002), Tsangarides (2004), and Hong and Preston (2008), which approximates the model marginal likelihood by quasi likelihood functions whose specifications are justified only through their large sample properties. Finally, our approach is similar in spirit to the work of Schennach (2005) and Ragusa (2008) in the sense that the likelihood is constructed using a Bayesian procedure, but it is simpler in construction.

Section 2 introduces the concept of model uncertainty in the Bayesian context and then reviews model selection and model averaging. Section 3 develops the theoretical framework of the LIBMA methodology in the context of dynamic panels with endogenous regressors. Section 4 discusses the proposed simulation experiment and presents the results. Section 5 presents an application of LIBMA to the estimation of a dynamic gravity model for bilateral trade. The final section concludes.

II. Model Uncertainty in the Bayesian Context

For completeness, we begin by reviewing briefly the basic theory of uncertainty in the Bayesian context. Excellent discussions include Hoeting, Madigan, Raftery and Volinsky (1999), and Chipman, George and McCulloch (2001).

A. Model Selection and Hypothesis Testing

Suppose there is a universe of K possible explanatory variables x1, x2, …, xK, indexed by the set U = {1, 2, …, K}. Let Z be the design matrix of all possible explanatory variables. For a given model M ⊆ U, we consider the standard linear regression model

$$Y = Z_M \theta_M + u \qquad (1)$$

where Y is the variable of interest, θ = (θ1, …, θK)′ is a vector of parameters to be estimated, and u is the error term. The subscript M indicates that only the entries corresponding to variables whose indices belong to M are selected.

Given the universe of K possible explanatory variables, a set of $2^K$ models $\mathcal{M} = (M_1, \ldots, M_{2^K})$ are under consideration. In the spirit of Bayesian inference, priors $p(\theta|M_j)$ for the parameters of each model, and a prior $p(M_j)$ for each model in the model space $\mathcal{M}$, can be specified.

Model selection seeks to find the model Mj in $\mathcal{M} = (M_1, \ldots, M_{2^K})$ that actually generated the data. Let D = {Y, Z} denote the data set available to the researcher. The probability that Mj is the correct model, given the data D, is, by Bayes’ rule,

$$p(M_j|D) = \frac{p(D|M_j)\,p(M_j)}{\sum_{l=1}^{2^K} p(D|M_l)\,p(M_l)} \qquad (2)$$

where

$$p(D|M_j) = \int p(D|\theta_j, M_j)\,p(\theta_j|M_j)\,d\theta_j \qquad (3)$$

is the marginal probability of the data given model Mj.

Based on the posterior probabilities, the comparison of model Mj against Mi is expressed by the posterior odds ratio $\frac{p(M_j|D)}{p(M_i|D)} = \frac{p(D|M_j)}{p(D|M_i)} \cdot \frac{p(M_j)}{p(M_i)}$. Essentially, the data update the prior odds ratio $\frac{p(M_j)}{p(M_i)}$ through the Bayes factor $B_{ji} = \frac{p(D|M_j)}{p(D|M_i)}$, which measures the extent to which the data support Mj over Mi. When the posterior odds ratio is greater (less) than 1, the data favor Mj over Mi (Mi over Mj). Often, the prior odds ratio is set to 1, representing the lack of preference for either model (as in Fernández, Ley and Steel (2001b)), in which case the posterior odds ratio equals the Bayes factor Bji.
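As a concrete illustration (a minimal sketch with made-up numbers, not output from this paper; `posterior_model_probs` is an illustrative helper), the following computes posterior model probabilities as in (2) and the implied Bayes factor:

```python
import numpy as np

def posterior_model_probs(log_marglik, prior):
    # p(M_j|D) is proportional to p(D|M_j) p(M_j), normalized as in equation (2)
    log_post = log_marglik + np.log(prior)
    log_post -= log_post.max()          # stabilize before exponentiating
    post = np.exp(log_post)
    return post / post.sum()

log_marglik = np.array([-120.3, -118.9, -125.0])  # hypothetical log p(D|M_j)
prior = np.full(3, 1.0 / 3.0)                     # equal prior model probabilities
print(posterior_model_probs(log_marglik, prior))
print(np.exp(log_marglik[1] - log_marglik[0]))    # Bayes factor B_21 for M_2 vs M_1
```

Working on the log scale, as above, avoids numerical underflow when marginal likelihoods are very small.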

B. Bayesian Model Averaging

A natural strategy for model selection is to choose the most probable model Mj, namely the one with the highest posterior probability, p(Mj|D). Alternatively, especially in cases where the posterior mass of the model space M is not concentrated only on one model, Mj, it is possible to consider averaging models using the posterior model probabilities as weights. Raftery, Madigan, and Hoeting (1997) show that BMA tends to perform better than other variable selection methods in terms of predictive performance.

Using Bayesian Model Averaging, inference for a quantity of interest Γ can be constructed based on the posterior distribution

$$p(\Gamma|D) = \sum_{j=1}^{2^K} p(\Gamma|D, M_j)\,p(M_j|D) \qquad (4)$$

which follows by the law of total probability. Therefore, the full posterior distribution of Γ is a weighted average of the posterior distributions under each model $(M_1, \ldots, M_{2^K})$, where the weights are the posterior model probabilities p(Mj|D). From the linear regression model (1), BMA allows the computation of the inclusion probability for every possible explanatory variable.

$$p(Z_i|D) = \sum_{j=1}^{2^K} I(Z_i|M_j)\,p(M_j|D)$$

where

$$I(Z_i|M_j) = \begin{cases} 1 & \text{if } Z_i \in M_j \\ 0 & \text{if } Z_i \notin M_j. \end{cases}$$

Using (4) one can compute the posterior mean for parameters θl as follows

$$E(\theta_l|D) = \sum_{j=1}^{2^K} E(\theta_l|D, M_j)\,p(M_j|D).$$
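To make these quantities concrete, the following minimal sketch (with hypothetical posterior model probabilities and per-model estimates; `models`, `post`, and `theta_hat` are illustrative stand-ins) computes inclusion probabilities and model-averaged posterior means over a toy universe of K = 3 regressors:

```python
import itertools
import numpy as np

K = 3                                                  # toy universe of 3 regressors
models = list(itertools.product([0, 1], repeat=K))     # all 2^K inclusion patterns
rng = np.random.default_rng(0)
post = rng.dirichlet(np.ones(len(models)))             # stand-in for p(M_j|D)
# theta_hat[j, i] plays the role of E(theta_i | D, M_j); zero when Z_i is excluded
theta_hat = rng.normal(size=(len(models), K)) * np.array(models)

incl = np.array(models).T @ post   # p(Z_i|D): sum of p(M_j|D) over models containing Z_i
theta_bma = theta_hat.T @ post     # E(theta_i|D): weighted average across models
print(incl, theta_bma)
```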

Implementation of BMA presents a number of challenges, including the evaluation of the marginal probability in (3), the large number of possible models, and the specification of the prior model probabilities p(Mj) and parameter priors p(θ|Mj).

C. Choice of Priors

Evaluating Bayes factors required for hypothesis testing and Bayesian model selection or model averaging requires calculating the marginal likelihood

$$p(D|M_j) = \int p(D|\theta, M_j)\,p(\theta|M_j)\,d\theta.$$

Here, the dimension of the parameter vector is determined by model Mj. In many cases, the likelihood p(D|θ, Mj) is fully specified up to some nuisance parameter ζ. Therefore,

$$p(D|M_j) = \int\!\!\int p(D|\theta, \zeta, M_j)\,p(\theta, \zeta|M_j)\,d\theta\,d\zeta.$$

In this case, determining the prior p(θ, ζ|Mj) becomes an important issue.3

For Gaussian models the nuisance parameter is the variance $\sigma_u^2$ of the noise term. A common choice of prior for the pair $(\theta, \sigma_u^2)$ is the Normal-Gamma distribution, which has the benefit of rendering a closed-form posterior.4 With this prior, given $\sigma_u^2$, θ is a Normal random variable with mean θ0 and variance $\sigma_u^2 V$, while $\sigma_u^{-2}$ is a Gamma random variable with mean $\gamma/\lambda$ and variance $\gamma/\lambda^2$. Due to the sensitivity of the Bayes factors to the prior parameters {θ0, V, γ, λ}, choosing specific values for them is often avoided. As discussed in Kass and Wasserman (1995), and Fernández, Ley and Steel (2001a), one possibility is to use a diffuse prior for σu with density $p(\sigma_u) \propto \sigma_u^{-1}$. This prior has a nice scale invariance property and is equivalent to setting γ = λ = 0 in the Gamma distribution of $\sigma_u^{-2}$. For the prior distribution of θ conditioned on $\sigma_u^2$, one popular choice is Zellner’s g-prior

$$p(\theta|\sigma_u^2) \sim N\left(0,\; g^{-1}(Z'Z)^{-1}\sigma_u^2\right)$$

which can be motivated by the fact that the covariance of the OLS estimate $\hat{\theta}$ is proportional to $(Z'Z)^{-1}\sigma_u^2$.

The prior used in this paper is another choice of prior for Bayes factors when the data is an i.i.d. sequence of observations, namely, the so-called unit information prior. Suppose we have a parameter estimate $\hat{\theta}$ for model Ml. The prior is a $k_l$-dimensional multivariate Normal distribution with mean $\hat{\theta}$ and variance $I(\hat{\theta})^{-1}$. Here $I(\hat{\theta})$ is the expected Fisher information matrix at $\hat{\theta}$ for one observation. It is a $k_l \times k_l$ matrix whose (i, j) entry is defined as

$$I_{ij}(\hat{\theta}) = -E_{\hat{\theta}}\left[\frac{\partial^2 \log p(D_1|\theta, M_l)}{\partial\theta_i\,\partial\theta_j}\right].$$

We denote one observation from D by D1. Intuitively, this prior provides roughly the same amount of information that one observation would give on average.

For the specification of model priors p(Mj), several options exist. Model priors may reflect the researcher’s view about the number of regressors that should be included, with a penalty that increases proportionally with the number of regressors included in the model. Such a prior model probability structure, which reflects the researcher’s prior about the size of the model, was initially proposed by Mitchell and Beauchamp (1988) and was used by Sala-i-Martin, Doppelhofer and Miller (2004).5 We choose to specify model priors using an alternative approach used by Fernández, Ley and Steel (2001b). We assume a Uniform distribution over the model space, which implies that we have no prior preference for a specific model, so that $p(M_1) = p(M_2) = \cdots = p(M_{2^K}) = \frac{1}{2^K}$.6
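For intuition, here is a hedged sketch of a unit information prior in the simplest case of a Gaussian linear regression on synthetic data; the paper’s own prior is instead built from the GMM objective in Section 3, so this is only an analogue:

```python
import numpy as np

rng = np.random.default_rng(1)
N, k = 500, 3
Z = rng.normal(size=(N, k))
y = Z @ np.array([0.5, -0.2, 0.0]) + rng.normal(scale=0.3, size=N)

theta_hat, *_ = np.linalg.lstsq(Z, y, rcond=None)  # center the prior at the estimate
sigma2_hat = np.sum((y - Z @ theta_hat) ** 2) / (N - k)
info_total = Z.T @ Z / sigma2_hat          # expected Fisher information, all N observations
prior_cov = np.linalg.inv(info_total / N)  # variance = inverse information of ONE observation
# Unit information prior: theta ~ N(theta_hat, prior_cov)
print(theta_hat, np.diag(prior_cov))
```

Dividing the total information by N is exactly what makes the prior carry “roughly the same amount of information that one observation would give on average.”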

III. Limited Information Bayesian Model Averaging

This section provides a discussion of the LIBMA using a dynamic panel data model with endogenous and exogenous regressors and derives the limited information criterion using the moment conditions implied by the GMM framework.

A. A Dynamic Panel Data Model with Endogenous Regressors

Consider the case where a researcher is faced with model uncertainty when trying to estimate a dynamic model for panel data. Assume that the universe of potential explanatory variables, indexed by the set U, consists of the lagged dependent variable, indexed by 1, a set of m exogenous variables, indexed by X, as well as a set of q endogenous variables, indexed by W, such that {{1}, X, W} is a partition of U.

Therefore, for a given model $M_j \subseteq U$, (1) becomes

$$y_{it} = \left(y_{i,t-1} \;\; x_{it} \;\; w_{it}\right)_{M_j} \left(\alpha \;\; \theta_x \;\; \theta_w\right)'_{M_j} + u_{it}, \qquad u_{it} = \eta_i + v_{it},$$
$$|\alpha| < 1; \quad i = 1, 2, \ldots, N; \quad t = 1, 2, \ldots, T \qquad (5)$$

where yit, xit, and wit are observed variables, ηi is the unobserved individual effect, and vit is the idiosyncratic random error. The exact distributions for vit and ηi are not specified here, but assumptions about some of their moments and correlation with the regressors are made explicit below. We assume that E(vit) = 0 and that the vit’s are not serially correlated, xit is a 1 × m vector of exogenous variables, and wit is a 1 × q vector of endogenous variables. Therefore, the total number of possible explanatory variables is K = m + q + 1. The observed variables span N individuals and T periods, where T is small relative to N. The unknown parameters α, θx, and θw are to be estimated. In this model, α is a scalar, θx is a 1 × m vector, while θw is a 1 × q vector.

Given the assumptions made so far, for any model Mj and any set of exogenous variables, xit, we have

$$E(x_{it}^l v_{is}) = 0, \quad \forall i, t, s; \; \forall x_{it}^l \in x_{it}.$$

Similarly, for any endogenous variable we have

$$E(w_{it}^l v_{is}) \begin{cases} \neq 0, & s \le t \\ = 0, & \text{otherwise}, \end{cases} \quad \forall w_{it}^l \in w_{it}.$$

Note that, in principle, the correlations between endogenous variables and the idiosyncratic error may change over different individuals and/or periods.

B. Estimation and Moment Conditions

A common approach for estimating the model (5) is to use the system GMM framework (Arellano and Bond (1991), Arellano and Bover (1995), and Blundell and Bond (1998)). This implies constructing the instrument set and moment conditions for the “level equation” (5) and combining them with the moment conditions using the instruments corresponding to the “first difference” equation. The first difference equation corresponding to model (5) is given by

$$\Delta y_{it} = \left(\Delta y_{i,t-1} \;\; \Delta x_{it} \;\; \Delta w_{it}\right)_{M_j} \left(\alpha \;\; \theta_x \;\; \theta_w\right)' + \Delta v_{it}, \qquad |\alpha| < 1; \quad i = 1, 2, \ldots, N; \quad t = 2, 3, \ldots, T.$$

One assumption required for the first difference equation is that the initial value of y, yi0, is predetermined, that is, E(yi0 vis) = 0 for s = 2, 3, …, T. Since yi,t−2 is not correlated with Δvit, we can use it as an instrument. Hence, we have E(yi,t−2 Δvit) = 0 for t = 2, 3, …, T. Moreover, yi,t−3 is not correlated with Δvit. Therefore, as long as we have enough observations, that is T ≥ 3, yi,t−3 can also be used as an instrument. Assuming that we have more than two observations in the time dimension, the following moment conditions could be used for estimation

$$E(y_{i,t-s}\,\Delta v_{it}) = 0, \quad t = 2, 3, \ldots, T; \; s = 2, 3, \ldots, t; \; \text{for } T \ge 2; \; i = 1, 2, \ldots, N. \qquad (6)$$

Similarly, the exogenous variables $x_{it}^l \in x_{it}$, along with their first differences $\Delta x_{it}^l$, are not correlated with Δvit and can therefore be used as instruments. That gives us additional moment conditions

$$E(x_{it}^l\,\Delta v_{it}) = 0, \quad t = 2, 3, \ldots, T; \; l = 1, 2, \ldots, m; \; i = 1, 2, \ldots, N. \qquad (7)$$

The lagged endogenous variable $w_{i,t-2}^l$, where $w_{it}^l \in w_{it}$, is not correlated with Δvit and therefore it can be used as an instrument. We have the following possible moment conditions

$$E(w_{i,t-s}^l\,\Delta v_{it}) = 0, \quad t = 3, 4, \ldots, T; \; s = 2, \ldots, t-1; \; \text{for } T \ge 3; \; l = 1, 2, \ldots, q; \; i = 1, \ldots, N. \qquad (8)$$

Table A summarizes the moment conditions that could be used for the first difference equation.

Table A.

Moment Conditions for the First Difference Equation

[table not reproduced]

The first difference equation provides T(T − 1)/2 moment conditions for the lagged dependent variable, m(T − 1) moment conditions for the exogenous variables, and q(T − 1)(T − 2)/2 moment conditions for the endogenous variables.

Returning to the level equation (5), it is easy to see that first differences for the lagged dependent variable are not correlated with either the individual effects or the idiosyncratic error term, and hence, we can use the following moment conditions

$$E(\Delta y_{i,t-1}\,u_{it}) = 0, \quad t = 2, 3, \ldots, T. \qquad (9)$$

Similarly, for the endogenous variables, the first difference $\Delta w_{i,t-1}^l$ is not correlated with uit. Therefore, assuming that $w_{i0}^l$ is observable, and as long as T ≥ 3, we have the following additional moment conditions

$$E(\Delta w_{i,t-1}^l\,u_{it}) = 0, \quad t = 3, 4, \ldots, T; \; l = 1, 2, \ldots, q. \qquad (10)$$

Finally, based on the assumptions made so far, the first differences of the exogenous variables, $\Delta x_{it}^l$ with $x_{it}^l \in x_{it}$, are not correlated with current realizations of uit, and so another set of moment conditions can be used

$$E(\Delta x_{it}^l\,u_{it}) = 0, \quad t = 2, 3, \ldots, T; \; l = 1, 2, \ldots, m. \qquad (11)$$

Table B summarizes the moment conditions for the level equation. There are (T−1) moment conditions for the lagged dependent variable, m (T−1) moment conditions for the exogenous variables, and q (T−2) moment conditions for the endogenous variables.

Table B.

Moment Conditions for the Level Equation

[table not reproduced]
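As a quick bookkeeping check, the sketch below reproduces the moment-condition counts stated in the text for Tables A and B; the dimensions T = 4, m = 6, q = 2 match the simulations of Section 4:

```python
def moment_condition_counts(T, m, q):
    """Counts implied by (6)-(8) for the difference equation and (9)-(11) for levels."""
    diff = {"lagged dep. (6)": T * (T - 1) // 2,
            "exogenous (7)": m * (T - 1),
            "endogenous (8)": q * (T - 1) * (T - 2) // 2}
    level = {"lagged dep. (9)": T - 1,
             "exogenous (11)": m * (T - 1),
             "endogenous (10)": q * (T - 2)}
    # Note: condition (13) below later collapses the exogenous conditions from
    # both equations into m conditions in total.
    return diff, level

print(moment_condition_counts(T=4, m=6, q=2))
```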

As shown by Ahn and Schmidt (1995), an additional (T − 1) linear moment conditions are available if the vit disturbances are assumed to be homoskedastic through time and E(yi1 ui2) = 0. Specifically,

$$E(y_{it}\,u_{it} - y_{i,t-1}\,u_{i,t-1}) = 0, \quad t = 2, 3, \ldots, T; \; i = 1, \ldots, N. \qquad (12)$$

For the exogenous variables, we aggregate the moment conditions across all periods from both the first difference and the level equations. Thus, we are left with one moment condition for each of the exogenous variables

$$\sum_{t=2}^{T} E(x_{it}^l\,\Delta v_{it}) + \sum_{t=2}^{T} E(\Delta x_{it}^l\,u_{it}) = 0, \quad l = 1, \ldots, m; \; i = 1, 2, \ldots, N. \qquad (13)$$

All the above moment conditions can be succinctly written in matrix form

$$E[G_i U_i] = 0. \qquad (14)$$

Definitions of matrices Gi and Ui are presented in Appendix A.

Based on the moment conditions (14) we propose a limited information criterion that can be used in Bayesian model selection and averaging. The next section provides details on how to construct this criterion.

C. The Limited Information Criterion

As pointed out in Section 2, evaluating the Bayes factors needed for hypothesis testing and Bayesian model selection or model averaging requires calculating the marginal likelihood

$$p(D|M_j) = \int p(D|\theta, M_j)\,p(\theta|M_j)\,d\theta.$$

Since GMM is chosen to estimate the parameters of the model, the assumptions made so far do not yield a fully specified parametric likelihood p(D|θ, Mj). Therefore, we have to build the model likelihood in a fashion consistent with the Bayesian paradigm, using the information provided by the moment conditions.

The construction of non-parametric likelihood functions has received a lot of attention in the literature lately. Several approaches have been used to derive or estimate non-parametric likelihood functions, including Back and Brown (1993), Kim (2002), Schennach (2005), Hong and Preston (2008), and Ragusa (2008). We propose a method of constructing the model likelihoods and posteriors based only on the information elicited from the moment conditions (14). While our approach is related to Schennach (2005) and Ragusa (2008) in spirit, we are able to obtain the likelihood using a much simpler Bayesian procedure by taking advantage of the linear structure of the model.

Suppose we have a strictly stationary and ergodic random process $\{\xi_i\}_{i=1}^{\infty}$, which takes values in the space Ξ, and a parameter space $\Theta \subset \mathbb{R}^K$. Suppose further that there exists a function $g: \Xi \times \Theta \to \mathbb{R}^l$ which satisfies the following conditions:

  1. It is continuous on Θ;

  2. E[g(ξi, θ)] exists and is finite for every θ ∈ Θ; and

  3. E[g(ξi, θ)] is continuous on Θ.

We further assume that the moment conditions, E[g(ξi, θ)] = 0, hold for a unique unknown θ0 ∈ Θ. Let $\hat{g}_N(\theta) = N^{-1}\sum_{i=1}^{N} g(\xi_i, \theta)$ denote the sample mean of the moment conditions, and assume that $E[g(\xi_i, \theta_0)\,g(\xi_i, \theta_0)']$ and $S(\theta_0) \equiv \lim_{N \to \infty} \operatorname{Var}[N^{1/2}\hat{g}_N(\theta_0)]$ exist and are finite positive definite matrices. Then, the following standard result holds (for a proof see Hall (2005), Lemma 3.2).

Lemma 1. Under the above assumptions, $N^{1/2}\hat{g}_N(\theta_0) \stackrel{d}{\to} N(0, S(\theta_0))$.

That is, the random vector $N^{1/2}\hat{g}_N(\theta_0)$ converges in distribution to a multivariate Normal distribution.

For model (5), the moment conditions for individual i discussed in the previous section can be written in the following form

$$g(\xi_i, \theta) = G_i\left(\tilde{y}_i - \tilde{z}_i\theta\right),$$

where $\xi_i = \{\tilde{y}_i, \tilde{z}_i\}$, $\tilde{z}_i = (\tilde{y}_{i,-1} \;\; \tilde{x}_i \;\; \tilde{w}_i)$, and $\theta = (\alpha \;\; \theta_x \;\; \theta_w)'$, while Gi is the matrix defined in (14). The vectors $\tilde{y}_i$ and $\tilde{y}_{i,-1}$ for the dependent variable and the lagged dependent variable, respectively, are defined as follows

$$\tilde{y}_i = \left(y_{i1}\; y_{i2}\; \cdots\; y_{iT}\; \Delta y_{i2}\; \Delta y_{i3}\; \cdots\; \Delta y_{iT}\right)', \qquad \tilde{y}_{i,-1} = \left(y_{i0}\; y_{i1}\; \cdots\; y_{i,T-1}\; \Delta y_{i1}\; \Delta y_{i2}\; \cdots\; \Delta y_{i,T-1}\right)'.$$

The matrix x˜i for the exogenous variables is given by

$$\tilde{x}_i = \begin{pmatrix} x_{i1}^1 & x_{i1}^2 & x_{i1}^3 & \cdots & x_{i1}^m \\ \vdots & \vdots & \vdots & & \vdots \\ x_{iT}^1 & x_{iT}^2 & x_{iT}^3 & \cdots & x_{iT}^m \\ \Delta x_{i2}^1 & \Delta x_{i2}^2 & \Delta x_{i2}^3 & \cdots & \Delta x_{i2}^m \\ \vdots & \vdots & \vdots & & \vdots \\ \Delta x_{iT}^1 & \Delta x_{iT}^2 & \Delta x_{iT}^3 & \cdots & \Delta x_{iT}^m \end{pmatrix},$$

while the matrix w˜i for the endogenous variables is defined as

$$\tilde{w}_i = \begin{pmatrix} w_{i1}^1 & w_{i1}^2 & w_{i1}^3 & \cdots & w_{i1}^q \\ \vdots & \vdots & \vdots & & \vdots \\ w_{iT}^1 & w_{iT}^2 & w_{iT}^3 & \cdots & w_{iT}^q \\ \Delta w_{i2}^1 & \Delta w_{i2}^2 & \Delta w_{i2}^3 & \cdots & \Delta w_{i2}^q \\ \vdots & \vdots & \vdots & & \vdots \\ \Delta w_{iT}^1 & \Delta w_{iT}^2 & \Delta w_{iT}^3 & \cdots & \Delta w_{iT}^q \end{pmatrix}.$$

Therefore, $\hat{g}_N(\theta_0) = N^{-1}\sum_{i=1}^{N} G_i \tilde{y}_i - N^{-1}\sum_{i=1}^{N} G_i \tilde{z}_i \theta_0$.

Since $\{y_{i0}, x_{i\cdot}, z_{i\cdot}, u_{i\cdot}\}$ is assumed to be a strictly stationary and ergodic process with finite second moments, $E[\partial g(\xi_i, \theta)/\partial \theta] = -E[G_i \tilde{z}_i]$ is finite and has full rank by the choice of moment conditions. Therefore, by a standard argument (see Hansen (1982)), $E[g(\xi_i, \theta)]$ is continuous on Θ. In addition, $\omega_i \equiv g(\xi_i, \theta_0)$ is stationary and independent across individuals, with $E[\omega_i] = 0$ and $E[\omega_i \omega_i']$ finite and positive definite. This ensures that $\lim_{N \to \infty} \operatorname{Var}[N^{1/2}\hat{g}_N(\theta_0)]$ exists, is finite, and is positive definite (Hansen (1982)). Therefore, Lemma 1 can be applied to our dynamic panel data model.

By Lemma 1, the likelihood for θ can be written as

$$p\left(N^{-1}\sum_{i=1}^{N} G_i\tilde{y}_i \,\Big|\, \theta,\; N^{-1}\sum_{i=1}^{N} G_i\tilde{z}_i\right) \propto \exp\left(-\frac{1}{2}N\,\hat{g}_N(\theta)' S^{-1}(\theta)\,\hat{g}_N(\theta)\right)$$

and the model likelihood can be expressed as

$$\int_{\Theta} p\left(N^{-1}\sum_{i=1}^{N} G_i\tilde{y}_i \,\Big|\, \theta\right) p(\theta)\,d\theta \propto \int_{\Theta} \exp\left(-\frac{1}{2}N\,\hat{g}_N(\theta)' S^{-1}(\theta)\,\hat{g}_N(\theta)\right) p(\theta)\,d\theta.$$

Assuming that the prior p(θ) is second order differentiable around θ^0 and using the Laplace approximation, we obtain that the model likelihood is proportional to

$$\int_{\Theta} p\left(N^{-1}\sum_{i=1}^{N} G_i\tilde{y}_i \,\Big|\, \theta\right) p(\theta)\,d\theta \propto \exp\left(-\frac{1}{2}N\,\hat{g}_N(\hat{\theta}_0)' S^{-1}(\hat{\theta}_0)\,\hat{g}_N(\hat{\theta}_0) + \log p(\hat{\theta}_0) + \frac{K}{2}\log 2\pi - \frac{1}{2}\log\det\frac{\partial^2}{\partial\theta^2}\left(\frac{1}{2}N\,\hat{g}_N(\hat{\theta}_0)' S^{-1}(\hat{\theta}_0)\,\hat{g}_N(\hat{\theta}_0)\right)\right),$$

where $\hat{\theta}_0 \equiv \arg\min_{\theta} N\,\hat{g}_N(\theta)' S(\theta)^{-1}\hat{g}_N(\theta)$ is the GMM estimate of θ0 with weighting matrix S(θ)−1. Noting that $\partial^2(\hat{g}_N' S^{-1}\hat{g}_N)/\partial\theta^2\big|_{\theta=\hat{\theta}_0}$ is a K × K matrix of order $O_p(1)$ due to the ergodicity assumption, the model likelihood can be approximated by

$$\int_{\Theta} p\left(N^{-1}\sum_{i=1}^{N} G_i\tilde{y}_i \,\Big|\, \theta\right) p(\theta)\,d\theta \propto \exp\left(-\frac{1}{2}N\,\hat{g}_N(\hat{\theta}_0)' S^{-1}(\hat{\theta}_0)\,\hat{g}_N(\hat{\theta}_0) - \frac{K}{2}\log N\right), \qquad (15)$$

where K is the dimension of the vector θ. Alternatively, the above approximation has an error of order $O_p(N^{-1/2})$ if the unit information prior for θ is used, with $\left[\partial^2\left(\tfrac{1}{2}\hat{g}_N' S^{-1}\hat{g}_N\right)/\partial\theta^2\big|_{\theta=\hat{\theta}_0}\right]^{-1}$ as its variance-covariance matrix. That is, the prior distribution p(θ) is given by $N\!\left(\hat{\theta}_0,\; \left[\partial^2\left(\tfrac{1}{2}\hat{g}_N' S^{-1}\hat{g}_N\right)/\partial\theta^2\big|_{\theta=\hat{\theta}_0}\right]^{-1}\right)$.

The unit information prior is a type of weakly informative prior based on the observed data. The precision of $\hat{\theta}_0$, or its inverse variance, from the GMM estimator is $\partial^2\left(\tfrac{1}{2}N\,\hat{g}_N' S^{-1}\hat{g}_N\right)/\partial\theta^2\big|_{\theta=\hat{\theta}_0}$. This can be considered as the amount of information contained in the entire sample, so the information contributed by one observation can be taken as $\partial^2\left(\tfrac{1}{2}\hat{g}_N' S^{-1}\hat{g}_N\right)/\partial\theta^2\big|_{\theta=\hat{\theta}_0}$, with $\hat{\theta}_0$ used as the mean of the prior (as suggested by Kass and Wasserman (1995)). The Gaussian distribution is the least “informative” distribution (the maximum entropy distribution) with given mean and variance. So, the use of the unit information prior can be thought of as representing a person who has unbiased but weak information about the coefficients.7

For a given model Mj for which θ has kj elements different from zero and with the estimate denoted by θ^0,j, the model likelihood (15) becomes

$$\int_{\Theta} p\left(N^{-1}\sum_{i=1}^{N} G_i\tilde{y}_i \,\Big|\, \theta, M_j\right) p(\theta)\,d\theta \propto \exp\left(-\frac{1}{2}N\,\hat{g}_N(\hat{\theta}_{0,j})' S^{-1}(\hat{\theta}_{0,j})\,\hat{g}_N(\hat{\theta}_{0,j}) - \frac{k_j}{2}\log N\right). \qquad (16)$$

Then, the moment conditions (14) associated with model Mj can be written as

$$E\left[G_i\left(\tilde{y}_i - (\tilde{z}_i)_{M_j}(\theta_0)_{M_j}\right)\right] = 0,$$

where the subscript Mj indicates that only the entries corresponding to model Mj are taken. Recognizing that the estimate $\hat{\theta}_0$ differs from model to model, the sample mean of the moment conditions for model Mj can be written as $\hat{g}_N(\hat{\theta}_{0,j}) = N^{-1}\sum_{i=1}^{N} G_i\left(\tilde{y}_i - (\tilde{z}_i)_{M_j}\hat{\theta}_{0,j}\right)$. It is easy to see that $G_i$, $\tilde{y}_i$, and $\tilde{z}_i$ are the same across all models. In other words, the moment conditions and the observable data are the same across the universe of models,8 which allows valid comparisons of posterior probabilities based on the principles of Bayes factor analysis. Therefore, by using (16), we can compute the posterior odds ratio of two models M1 and M2 as

$$\frac{p\left(M_1 \,\big|\, N^{-1}\sum_{i=1}^{N} G_i\tilde{y}_i\right)}{p\left(M_2 \,\big|\, N^{-1}\sum_{i=1}^{N} G_i\tilde{y}_i\right)} = \frac{p(M_1)\,p\left(N^{-1}\sum_{i=1}^{N} G_i\tilde{y}_i \,\big|\, M_1\right)}{p(M_2)\,p\left(N^{-1}\sum_{i=1}^{N} G_i\tilde{y}_i \,\big|\, M_2\right)} = \frac{p(M_1)}{p(M_2)}\exp\left(-\frac{1}{2}N\,\hat{g}_N(\hat{\theta}_{0,1})' S^{-1}(\hat{\theta}_{0,1})\,\hat{g}_N(\hat{\theta}_{0,1}) + \frac{1}{2}N\,\hat{g}_N(\hat{\theta}_{0,2})' S^{-1}(\hat{\theta}_{0,2})\,\hat{g}_N(\hat{\theta}_{0,2}) - \frac{k_1 - k_2}{2}\log N\right), \qquad (17)$$

which has the same BIC-like form as for fully specified models. We use iterative GMM estimation with moment conditions $E[G_i(\tilde{y}_i - (\tilde{z}_i)_{M_j}\theta_{0,j})] = 0$ to approximate the Bayes factors above. A consistent estimate of the weighting matrix is used in place of $S^{-1}(\hat{\theta}_0)$ in (17). As discussed in Section 2, we assume a unit information prior for the parameters and a Uniform distribution over the model space, essentially implying that there is no preference for a specific model, so that $p(M_1) = p(M_2) = \cdots = p(M_{2^K}) = \frac{1}{2^K}$.
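Computationally, the comparison is inexpensive; the sketch below (with hypothetical minimized GMM objectives, not values from the paper) turns the BIC-like quantity in (16) into posterior model probabilities under the uniform model prior, mirroring (17):

```python
import numpy as np

def libma_log_marglik(gmm_obj, k_j, N):
    """Approximate log marginal likelihood as in (16); gmm_obj is the minimized
    objective N * g_N' S^{-1} g_N for model M_j, and k_j its parameter count."""
    return -0.5 * gmm_obj - 0.5 * k_j * np.log(N)

N = 2000
gmm_obj = np.array([12.4, 9.8, 15.1])   # hypothetical objective values, one per model
k = np.array([3, 4, 2])                 # parameters retained by each model
log_ml = libma_log_marglik(gmm_obj, k, N)
log_ml -= log_ml.max()                  # uniform prior cancels; normalize on log scale
post = np.exp(log_ml) / np.exp(log_ml).sum()
print(post)                             # posterior model probabilities
```

Ratios of any two entries of `post` reproduce the posterior odds in (17).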

IV. Monte Carlo Simulation and Results

This section describes the Monte Carlo simulations intended to assess the performance of LIBMA. We compute posterior model probabilities, inclusion probabilities for each variable in the universe considered, and parameter statistics. These statistics provide a description of how well our procedure helps the inference process both in a Bayesian model selection and a Bayesian Model Averaging framework.

A. The Data Generating Process

We consider the case where the universe of potential explanatory variables contains 9 variables, namely, 6 exogenous variables, 2 endogenous variables, and the lagged dependent variable. Throughout our simulations we keep the number of periods constant at T = 4, and vary the number of individuals, N.

For every individual i and period t, the first four exogenous variables are generated as follows

$$\begin{pmatrix} x_{it}^1 & x_{it}^2 & x_{it}^3 & x_{it}^4 \end{pmatrix} = \begin{pmatrix} 0.3 & 0.4 & 0.8 & 0.5 \end{pmatrix} + r_t, \quad \text{with } r_t \sim N(0, I_4) \text{ for } t = 0, 1, \ldots, T; \; i = 1, \ldots, N,$$

where I4 is the four-dimensional identity matrix. We allow for some correlation between the first two and the last two exogenous variables. That is, $(x_i^5 \; x_i^6)$ are correlated with $(x_i^1 \; x_i^2)$ such that for every individual i and period t, the data generating process is given by

$$\begin{pmatrix} x_{it}^5 & x_{it}^6 \end{pmatrix} = \left(\begin{pmatrix} x_{it}^1 & x_{it}^2 \end{pmatrix} - \begin{pmatrix} 0.3 & 0.4 \end{pmatrix}\right) 0.1\begin{pmatrix} 1 & 1 \\ 2 & 1 \end{pmatrix} + \begin{pmatrix} 1.5 & 1.8 \end{pmatrix} + r_t, \quad \text{with } r_t \sim N(0, I_2) \text{ for } t = 0, 1, \ldots, T; \; i = 1, \ldots, N,$$

where I2 is the two dimensional identity matrix.

Similarly, for the endogenous variables, $(w_i^1 \; w_i^2)$, we have the following data generating process

$$\begin{aligned} \begin{pmatrix} w_{it}^1 & w_{it}^2 \end{pmatrix} &= 0.71\begin{pmatrix} w_{i,t-1}^1 & w_{i,t-1}^2 \end{pmatrix} + 6.7\,v_{it}\begin{pmatrix} 1 & 1 \end{pmatrix} + r_t \quad \text{for } t = 1, 2, \ldots, T, \\ \begin{pmatrix} w_{i0}^1 & w_{i0}^2 \end{pmatrix} &= 6.7\,v_{i0}\begin{pmatrix} 1 & 1 \end{pmatrix} + r_0, \quad \text{with } v_{it} \sim N(0, \sigma_v^2) \text{ and } r_t \sim N(0, I_2) \text{ for } t = 0, 1, \ldots, T. \end{aligned}$$

As the data generating process for the endogenous variables indicates, the overall error term vit is assumed to be distributed normally. We relax the normality assumption later. For t = 0, the dependent variable is generated by

$$y_{i0} = \frac{1}{1-\alpha}\left(x_{i0}\theta_x + w_{i0}\theta_w + \eta_i + v_{i0}\right), \quad \text{with } v_{i0} \sim N(0, \sigma_v^2) \text{ and } \eta_i \sim N(0, \sigma_\eta^2),$$

where θx = (0.05 0 0 −0.05 0 0.05)′, θw = (0 0.3)′, $w_{i0} = (w_{i0}^1 \; w_{i0}^2)$, and $x_{i0} = (x_{i0}^1 \; x_{i0}^2 \; x_{i0}^3 \; x_{i0}^4 \; x_{i0}^5 \; x_{i0}^6)$.

For t = 1, 2, …, T the data generating process is given by

$$y_{it} = \alpha y_{i,t-1} + x_{it}\theta_x + w_{it}\theta_w + \eta_i + v_{it}, \quad \text{with } v_{it} \sim N(0, \sigma_v^2) \text{ and } \eta_i \sim N(0, \sigma_\eta^2).^9$$

Further, we test the robustness of our procedure with respect to the underlying distribution of the error term by relaxing the normality assumption and using discrete distributions instead. The distribution of the random variable vit is obtained as follows. First, we generate its support, Sv, by taking Nv points from a uniform sampling over the interval [−1, 1]. Then, we draw Nv i.i.d. random variables wk ~ Exponential(1). The probability mass assigned to each point sk ∈ Sv is obtained by setting $p_k = w_k / \sum_i w_i$. Finally, we adjust each point in Sv so that vit has zero mean and variance $\sigma_v^2$. It is well known that the probability distribution obtained in this fashion is equivalent to a uniform sampling from a simplex in Nv-dimensional space. The construction of the simulated model follows exactly the case of the Normal distribution, with the only difference being the use of the discrete distribution described above in every place where the Normal distribution is used for vit.
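A sketch of this construction follows (our reading of the steps above; `make_discrete_error` is an illustrative helper, not code from the paper):

```python
import numpy as np

def make_discrete_error(Nv, sigma_v2, rng):
    s = rng.uniform(-1.0, 1.0, size=Nv)      # support S_v drawn uniformly on [-1, 1]
    w = rng.exponential(1.0, size=Nv)        # w_k ~ Exponential(1)
    p = w / w.sum()                          # point masses p_k (uniform on the simplex)
    s = s - p @ s                            # shift support so the mean is zero
    s *= np.sqrt(sigma_v2 / (p @ s**2))      # rescale so the variance is sigma_v^2
    return s, p

rng = np.random.default_rng(2)
support, probs = make_discrete_error(Nv=20, sigma_v2=0.10, rng=rng)
v = rng.choice(support, size=5000, p=probs)  # i.i.d. draws used in place of Normal v_it
print(v.mean(), v.var())
```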

B. Simulation Results

This section reports the results of Monte Carlo simulations of our LIBMA methodology in order to assess its performance. We generate 1000 instances of the data generating process with the exogenous variables xit, endogenous variables wit, and parameter values (α θx θw)′ as discussed in the previous section, and present results in the form of medians, means, variances and quartiles.10 We consider sample sizes of N = 200, 500, and 2000, and two values for the coefficient of the lagged dependent variable, α = 0.95 and 0.50. In the first set of simulations, we assume that both the random error term vit and the individual effect ηi are drawn from Normal distributions, $v_{it} \sim N(0, \sigma_v^2)$ and $\eta_i \sim N(0, \sigma_\eta^2)$, respectively. We consider the cases where $\sigma_v^2 = 0.05, 0.10$, and $0.20$, while $\sigma_\eta^2 = 0.10$. As discussed earlier, we examine the robustness of our results by creating a second set of simulations where the assumption of normality for vit is relaxed.

1. Model Selection

The posterior model probability is a key indicator of performance in the Bayesian framework. Table 1 presents means, variances, and three quartiles (Q1, median, and Q3) for the posterior probability of the true model across the 1000 instances. As expected, the mean posterior probabilities of the true model increase with the sample size. For sample size N = 200, the mean of the posterior probability of the true model ranges from 0.031 to 0.218, depending on the values of the other parameters. As the sample size increases to N = 2000, the mean of the posterior probability of the true model increases, while at the same time, showing less variation across different combinations of parameters, with the values ranging from 0.633 to 0.655. In addition, median posterior model probabilities become slightly higher than the means, ranging from 0.690 to 0.705 for N = 2000, and the distribution of the posterior probabilities of the true model becomes skewed toward 1, as shown by the quartiles in Table 1 and the density plots in Figure 1.11

Equation (2) shows that the posterior model probability depends on the prior model probability. Under the assumption that all models have equal prior probability, the more variables are under consideration the smaller the prior probability for each model. Obviously, this would have an effect on the absolute value of the estimated posterior model probability. Taking this into account, we compute an additional (relative) measure, that is independent of the size of the universe. Table 2 presents the ratio of the posterior model probability of the true model to the highest posterior probability of all the other models (excluding the true model). This ratio would be above 1 if the true model has the highest posterior probability and below 1 if there exists another model with a higher posterior probability than the true model. For sample sizes N = 500 and above, this ratio is above unity for all the cases considered, suggesting that the correct model is on average favored over all the other models. For the smaller sample, N = 200, the ratio decreases from 1.591 and 1.039 to 0.422 and 0.249, respectively, as the variance of the random error term increases from 0.05 to 0.20. As expected, the average ratios increase with the sample size, reaching values above 6.5 for N = 2000.

In Table 3 we examine how often our methodology recovers the true model by reporting the percentage of instances in which the true model has the highest posterior probability. For the smallest sample size, N = 200, the recovery rate varies from 7 percent to 59 percent and decreases as the variance of the random error term increases from 0.05 to 0.20. For N = 500 we see an improvement in the selection of the true model with the success rate ranging from 51 percent to 83 percent. The variation becomes much smaller for N = 2000, with the recovery rate ranging from 91 to 94 percent.

2. Model Averaging

In several cases, researchers may not be interested in recovering the exact data generating process, but rather in understanding which of the variables under consideration are more likely to belong to the true model. One measure that we report for our experiments is the inclusion probability for each variable considered. The inclusion probability for a given variable is a measure of how much the data favor the inclusion of the variable, and is calculated as the sum of the posterior probabilities of all models that contain that particular variable. Table 4 presents the posterior inclusion probabilities for all the variables considered along with the true model (column 2, Table 4).12 Given the assumptions made about the model priors, the prior probability of inclusion for each variable is the same and equal to 0.50. From Table 4 we see that for samples N ≥ 500, the median value of the inclusion probability for all the relevant explanatory variables is greater than 0.95 in all cases considered. As the sample size increases, the posterior inclusion probabilities approach 1 for all the relevant variables. For the variables not contained in the true model, the median posterior probability of inclusion decreases with the sample size, with the upper bound being less than 0.07 for all the cases considered when N = 2000. Notably, even in cases where the recovery rate of the true model is poor (for example, 12 percent for the case in which N = 200, α = 0.95, and $\sigma_v^2 = 0.20$), the probability of inclusion is able to differentiate between the relevant and non-relevant variables.

We turn now to the parameter estimates, and examine how the estimated values compare with the true parameter values. Table 5 presents the median values of the estimated parameters, averaged over 1000 replications, compared to the parameters of the true model.13 As in the case of inclusion probabilities, our methodology is performing well in estimating the parameters, with the performance improving as the sample gets larger. In Figure 3, we present the box plots for the parameter estimates of Table 5, for the case of α = 0.95 and σv = 0.1. As the sample increases, the variance of the distribution decreases and the median converges to the true value. In addition to the fact that the estimates are very close to the true parameter values, the variance over the 1000 replications is also very small across the board with values less than 10−4 in many cases.

Overall, while model selection properties are also desired, the strength of our methodology is given by its performance in the context of Bayesian Model Averaging.

3. Robustness Checks Using non-Gaussian Errors

We relax the normality assumption for the error term vit and check the robustness of our results. Overall, as shown in Tables 6-10, the results are very similar to those presented in Tables 1-5. Tables 6 and 7 (which are analogous to Tables 1 and 2) present posterior model probabilities for the true model, and the ratio of the posterior model probability of the true model to the highest posterior probability of all other models, respectively. In Table 6, we see, again, that the mean posterior probabilities of the true model increase with the sample size, while at the same time showing less variation across different combinations of parameters. Moreover, as the sample size increases, the median posterior model probabilities become slightly higher than the means, ranging from 0.684 to 0.708 for N = 2000. In addition, the distribution of the posterior probabilities of the true model becomes skewed toward 1, as shown by the quartiles in Table 6 and the density plots in Figure 4. Results in Table 7 are similar to those in Table 2. For sample sizes N = 500 and above, the ratio of the posterior model probability of the true model to the highest posterior probability of all the other models is above unity for all the cases considered, suggesting that the correct model is on average favored over all the other models. For the smaller sample, N = 200, the ratio decreases from 1.587 and 1.205 to 0.350 and 0.254, respectively, as the variance of the random error term increases from 0.05 to 0.20. As expected, the average ratios increase with the sample size, reaching values above 6.4 for N = 2000.

Model recovery under non-Gaussian errors is still good. As shown in Table 8, results are very similar to those of Table 3. For the smallest sample size, N = 200, the recovery rate varies from 7 percent to 59 percent and it decreases as the variance of the random error term increases from 0.05 to 0.20. For N = 500 we see an improvement in the selection of the true model with the success rate ranging from 51 percent to 85 percent. The variation becomes much smaller for N = 2000 with the recovery rate ranging from 92 to 93 percent.

Tables 9 and 10 present the posterior inclusion probabilities and parameter estimates using LIBMA and compare them with the true model. From Table 9, we see that, for samples N ≥ 500, the median value of the inclusion probability for all the relevant explanatory variables is greater than 0.90 in all cases considered. As the sample size increases, the posterior inclusion probabilities approach 1 for all the relevant variables. For the variables not contained in the true model, the median posterior probability of inclusion decreases with the sample size, with the upper bound being less than 0.073 for all the cases considered when N = 2000. It is interesting to see that even in cases where the recovery rate of the true model is poor (8 percent for the case in which N = 200, α = 0.95, and $\sigma_v^2 = 0.20$), the probability of inclusion is able to differentiate between the relevant and non-relevant variables. In Table 10, estimated parameter medians and variances are very close to those reported in Table 5. As in the Gaussian case, our methodology performs well in estimating the parameters, with the performance improving as the sample gets larger. In Figure 6 of Appendix A we present the box plots for the parameter estimates of Table 10 (for the case of α = 0.95 and σv = 0.1). Again, the variance of the distribution decreases as the sample size increases and the median moves toward the true value.

V. LIBMA Application to a Dynamic Trade Gravity Model

A. Background

Following the work of Rose (2000), there has been considerable interest in investigating the determinants of bilateral trade using gravity models. Augmented versions of trade gravity models have been used to examine the effects of exchange rate regimes, exchange rate volatility, and free trade agreements (FTAs) on bilateral trade.14

While relevant studies almost unanimously find that exchange rate regimes with lower uncertainty and transaction costs— namely, conventional pegs and currency unions— are significantly more pro-trade than flexible regimes, there is considerable uncertainty about the size of this effect, given the potential reverse causality between trade and currency unions. Another point of debate is whether FTAs are trade creating (as they could create trade that would not have existed otherwise from a more efficient producer of the product) or trade diverting (as trade may be diverted away from a more efficient supplier outside the FTA towards a less efficient supplier within the FTA), with several studies finding different results depending on the set of regressors used.15 Finally, the majority of gravity models are static, ignoring potential trade persistence effects which may arise from trade inertia due to sunk costs associated with distribution and service networks, or habit formation due to consumers getting accustomed to foreign products.16

These observations suggest that the investigation of bilateral trade determinants is complicated both by the potential endogeneity of several determinants and by the considerable uncertainty about the set of explanatory variables that should be included in the gravity model specification. Model uncertainty, combined with the use of a dynamic specification with several endogenous regressors, suggests that our LIBMA is well suited for the estimation of a dynamic trade gravity model.

B. Model and Estimation

In its simple form, the static gravity trade equation can be expressed as

$$X_{ij} = Y_i Y_j \left(\frac{T_{ij}}{P_i P_j}\right)^{1-\sigma} \qquad (18)$$

where Xij represents average trade from country i to j; Yi and Yj are total domestic outputs in country i and j, respectively; Pi and Pj are the overall price indices in country i and j, respectively; Tij are iceberg trading costs; and σ is the elasticity of substitution between products (σ > 1). Traditionally, Tij in equation (18) includes transportation costs that are proxied by geographical attributes (such as bilateral distance, access to sea, and contiguity). In recent years, other factors that may affect trade costs, for example, common language, historical ties, FTAs, tariffs, and non-tariff barriers have also been included. To the extent that exchange rate policy choices influence currency conversion costs, exchange rate volatility as well as uncertainty, trading costs would also depend on the exchange rate regime in place, making its inclusion in Tij appropriate.
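Taking logs of (18) makes the link to the empirical specification below explicit; this is the standard log-linearization implied by (18), spelled out here for clarity:

$$\log X_{ij} = \log Y_i + \log Y_j + (1-\sigma)\log T_{ij} - (1-\sigma)\log(P_i P_j),$$

so that, once the components of Tij (distance, common language, FTAs, exchange rate regime variables, and so on) are proxied, they enter linearly, which is the form that (19) augments with dynamics.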

Qureshi and Tsangarides (2010) examine the trade effects of exchange rate regimes by augmenting the traditional gravity equation and including variables for currency unions, conventional pegs, and exchange rate links created with trading partners as a consequence of pegging with an anchor currency. We extend their specification to include dynamics as well as separate trade creation and diversion effects for FTAs as follows

$$\begin{aligned} \log(X_{ijt}) ={}& \beta_0 + \alpha \log(X_{ij,t-1}) + \sum_{k=1}^{K} \beta_k Z_{ijt}^k + \gamma_1 CU_{ijt} + \gamma_2 DirPeg_{ijt} + \gamma_3 IndirPeg_{ijt} \\ & + \gamma_4 Vol_{ijt}^{SR} + \gamma_5 Vol_{ijt}^{LR} + \sum_{m=1}^{M} \delta_m^{cr} FTA_{ijt}^m + \sum_{m=1}^{M} \delta_m^{dvd} FTA_{it}^m + \lambda_t + u_{ijt}, \end{aligned} \qquad (19)$$

where Z is a vector consisting of traditional time-varying and time-invariant trade determinants; CUijt is a binary variable equal to 1 if i and j share the same currency; DirPegijt is a binary variable equal to 1 if i’s exchange rate is pegged to j’s, or vice versa (but i and j are not members of the same currency union); IndirPegijt is a binary variable that takes the value of 1 if i is indirectly related to j through its peg with an anchor country; $Vol_{ijt}^{SR}$ and $Vol_{ijt}^{LR}$ refer to real exchange rate volatility defined over short-run and long-run horizons, respectively; FTAijt is a vector of FTA dummies taking the value of 1 if i and j are members of the same FTA in a given year; FTAit is a vector of free trade area dummies taking the value of 1 if i is a member of an FTA in a given year; λt are year-specific effects capturing common shocks across countries; and uijt is the error term, uijt ~ N(0, ς2).

The dataset used is taken from Qureshi and Tsangarides (2010), extended to include individual entries for the free trade agreement variables from the Regional Trade Agreements database of the World Trade Organization.17 Switching from yearly time series to panel estimation is made possible by dividing the total period into eight-year time intervals. We potentially have a total of six panels (1960-1967, 1968-1975, 1976-1983, 1984-1991, 1992-1999, and 2000-2007), but given the inclusion of the lagged dependent variable the maximum possible number of periods is 5. Using the determinants defined in (19) we identify 42 proxies and consider time effects corresponding to the span over which the data was averaged. The dataset covers 159 countries over the period 1960-2007, yielding 9,628 individual country pairs (rather than 159 × 158/2 = 12,561, due to missing observations). Our baseline estimation covers 9,628 country pairs with 22,875 observations over the period 1960-2007, with an average of 3 (out of a maximum of 5) observations per country pair.

C. Results

We estimate (19) using LIBMA and compare results obtained when the same specification is estimated with (i) OLS, which does not account for either model uncertainty or endogeneity; (ii) System-GMM (SGMM), which accounts for endogeneity but not model uncertainty; and (iii) a Bayesian Model Averaging routine for linear regression models, which accounts for model uncertainty but not endogeneity.18 In Table 11, we present sequentially the results of estimating the dynamic gravity model using OLS, SGMM, BMA, and LIBMA. Estimated means and standard errors are presented for OLS and SGMM, and statistical significance is indicated as usual, at the 1, 5, and 10 percent levels of significance. For BMA and LIBMA, we present posterior inclusion probabilities, p(Zi|D), posterior means, and standard deviations, and identify as robust those variables for which the posterior inclusion probability is above the prior (that is, p(Zi|D) ≥ 0.50).19

Once both model uncertainty and endogeneity are properly accounted for with the use of LIBMA, results from the estimation of a dynamic trade gravity model are very different compared to those obtained using OLS, SGMM, and BMA. LIBMA identifies 25 variables out of 42 as robust, including lagged trade, several exchange rate regime variables (currency union, indirect peg, and long-run exchange rate volatility) as well as some trade creating and trade diverting FTAs. In addition, unlike any of the other three estimation methods, LIBMA also finds that the sub-Saharan dummy variable— a proxy for heterogeneity in the sample— is a robust trade determinant.

We compare each estimation method in sequence with LIBMA and interpret differences in the identified sets of determinants and estimated means as resulting from not properly accounting for model uncertainty, endogeneity, or both when OLS, SGMM, and BMA are used. First, estimation with OLS finds 33 out of the 42 variables considered to be statistically significant at least at the 10 percent level of significance (with 24 of those significant at the 1 percent level). Two potentially endogenous variables— the indirect peg and long-run volatility— are incorrectly estimated by OLS (with the former not found to be statistically significant when LIBMA labels it as robust, and the latter found to be statistically significant when LIBMA does not find it robust). In addition, the statistical significance of many of the remaining variables identified by OLS— including several FTAs estimated to have both a trade creating and a trade diverting effect— disappears once model uncertainty and endogeneity are accounted for with LIBMA. Finally, the means of several variables that are both identified as robust by LIBMA and found to be statistically significant by OLS, such as several of the FTAs, the currency union, and short-run volatility, are imprecisely estimated by OLS.

Next, we turn to SGMM. With potential endogeneity accounted for, SGMM identifies, overall, fewer variables as statistically significant compared to OLS. However, incorporating endogeneity alone in SGMM does not properly identify the robust determinants: comparing SGMM with LIBMA, out of the exchange rate regime and trade variables, only one (lagged trade) is found to be statistically significant in SGMM and robust in LIBMA. Similarly, several FTA variables identified as statistically significant trade determinants by SGMM receive very low inclusion probabilities once model uncertainty is properly accounted for in LIBMA. Finally, differences between BMA and LIBMA suggest that accounting for model uncertainty alone does not properly identify the trade determinants, as only about half of the LIBMA-identified robust variables are also identified as robust by BMA. The application of BMA— which augments the OLS estimation by incorporating model uncertainty— identifies a subset of the OLS statistically significant variables as robust. In comparison with LIBMA, BMA fails to identify two potentially endogenous variables (indirect peg and short-run volatility) as robust, while several of the FTAs identified as robust by BMA are no longer robust when endogeneity is incorporated in LIBMA.

In summary, we find important differences in the identified bilateral trade determinants using LIBMA, compared with those identified using OLS, SGMM or BMA. We attribute these differences to the fact that LIBMA— unlike the other estimation methods presented— incorporates both model uncertainty and endogeneity. The results of our application underscore the importance of accounting for both endogeneity and model uncertainty in the estimation of the dynamic gravity model for trade.

VI. Conclusion

This paper proposes a limited information methodology in the context of Bayesian Model Averaging— labeled LIBMA— for panel data models where the lagged dependent variable and endogenous variables appear as regressors. The LIBMA methodology incorporates a GMM estimator for dynamic panel data models in a Bayesian Model Averaging framework to explicitly account for model uncertainty.

Our methodology contributes to the existing literature in three important ways. First, while standard BMA is a full information technique where a complete stochastic specification is assumed, LIBMA is a limited information technique based on moment restrictions rather than a complete stochastic specification. Second, LIBMA explicitly controls for endogeneity. The likelihood and exact expressions of the marginal likelihood used in the fully Bayesian analyses are replaced by the limited information construct modeled on the GMM estimation, and a limited information criterion as an approximation to the actual marginal likelihoods, respectively. Third, we use this methodology in a panel setting thus expanding its usability to a wide range of applications.

Based on simulation results, we conclude that asymptotically LIBMA performs very well and can be used to address the issue of model uncertainty in dynamic panel data models with endogenous regressors. The application of our LIBMA methodology to the estimation of a dynamic gravity model for trade illustrates its applicability to cases where model uncertainty and endogeneity are present in short panels. Future research could explore the possibility of using the LIBMA methodology for applications where the sample size is constrained by data availability.

References

  • Ahn, S.C., and P. Schmidt, 1995, Efficient estimation of models for dynamic panel data. Journal of Econometrics 68, 528.

  • Anderson, J. E., 1979, A theoretical foundation for the gravity equation. American Economic Review 69, 106116.

  • Andrews, D.W.K., and B. Lu, 2001, Consistent model and moment selection procedures for GMM estimation with application to dynamic panel data models. Journal of Econometrics 101, 12364.

    • Search Google Scholar
    • Export Citation
  • Arellano, M., and S.R. Bond, 1991, Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations. Review of Economic Studies 58, 277-297.

    • Search Google Scholar
    • Export Citation
  • Arellano, M., and O. Bover, 1995, Another look at the instrumental-variable estimation of error components models. Journal of Econometrics 68, 2952.

    • Search Google Scholar
    • Export Citation
  • Back, K., and D.P. Brown, 1993, Implied probabilities in GMM estimators. Econometrica 61, 971975.

  • Baxter, M., and M.A. Kouparitsas, 2006, What determines bilateral trade flows? NBER Working Papers 12188, National Bureau of Economic Research.

    • Search Google Scholar
    • Export Citation
  • Bergstand, J., 1985, The gravity equation in international trade: some microeconomic foundations and empirical evidence. Review of Economics and Statistics 67, 474481.

    • Search Google Scholar
    • Export Citation
  • Blundell, R., and S. Bond, 1998, Initial conditions and moment restrictions in dynamic panel data models. Journal of Econometrics 87, 114143.

    • Search Google Scholar
    • Export Citation
  • Brock, W., and S. Durlauf, 2001, Growth empirics and reality. World Bank Economic Review 15, 229272.

  • Campbell, D., 2010, History, culture and trade: a dynamic gravity approach, Working Paper University of California at Davis.

  • Chernozhukov, V., and H. Hong, 2003, An MCMC approach to classical estimation. Journal of Econometrics 115, 293346.

  • Chen, H., A. Mirestean, and C. Tsangarides, 2009, Limited Information Bayesian Model Averaging for Dynamic Panels with Short Time Periods. IMF Working Paper No. 09/74. Washington: International Monetary Fund.

    • Search Google Scholar
    • Export Citation
  • Chipman, H., E.I. George, and R.E. McCulloch, 2001, The practical implementation of Bayesian model selection (with discussion). In Model Selection IMS Lecture Notes, Vol. 38, Lahiri P. (ed), 70134.

    • Search Google Scholar
    • Export Citation
  • Cuñat A., and M. Maffezzoli, 2007, Can comparative advantage explain the growth of US trade? Economic Journal 117, 583602

  • Durlauf, S., and D. Quah, 1999, The new empirics of economic growth. In Handbook of Macroeconomics Vol. IA, Taylor J.B., Woodford M. (eds). North Holland.

    • Search Google Scholar
    • Export Citation
  • Eicher, T., C. Henn, and C. Papageorgiou, 2011, Trade creation and diversion revisited: accounting for model uncertainty and natural trading partner effects. Journal of Applied Econometrics, forthcoming.

    • Search Google Scholar
    • Export Citation
  • Eicher, T., C. Papageorgiou, and A. Raftery, 2010, Default Priors and Predictive Performance in Bayesian Model Averaging, with Application to Growth Determinants. Journal of Applied Econometrics 26, 3055.

    • Search Google Scholar
    • Export Citation
  • Fernàndez C., E. Ley, and M. Steel, 2001a, Model uncertainty in cross-country growth regressions. Journal of Applied Econometrics 16, 56376.

    • Search Google Scholar
    • Export Citation
  • Fernàndez C., E. Ley, and M.F.J. Steel, 2001b, Benchmark priors for Bayesian Model Averaging. Journal of Econometrics 100, 381427.

  • Frankel, J., and A.K. Rose, 2002, An estimate of the effect of common currencies on trade and income. Quarterly Journal of Economics 117, 43766.

    • Search Google Scholar
    • Export Citation
  • Ghosh S, and S. Yamarik, 2004, Are preferential trade agreements trade creating? An application of extreme bounds analysis. Journal of International Economics 63, 369395.

    • Search Google Scholar
    • Export Citation
  • Hall, A.R., 2005, Generalized Method of Moments. New York: Oxford University Press.

  • Hoeting, J.A., D. Madigan, A.E. Raftery, and C.T. Volinsky, 1999, Bayesian Model Averaging: a tutorial. Statistical Science 14, 382417.

    • Search Google Scholar
    • Export Citation
  • Hong, H., and B. Preston, 2008, Bayesian averaging, prediction and nonnested model selection. NBER Working Papers 14284, National Bureau of Economic Research.

    • Search Google Scholar
    • Export Citation
  • Jacobson T., and S. Karlsson, 2004, Finding good predictors for inflation: a Bayesian Model Averaging approach. Journal of Forecasting 23, 476496.

    • Search Google Scholar
    • Export Citation
  • Kass, R., and A. Raftery, 1995, Bayes factors. Journal of the American Statistical Association 90, 77395.

  • Kass, R., and L. Wasserman, 1995, A reference Bayesian test for nested hypotheses and its relationship to the Schwarz criterion. Journal of the American Statistical Association 90, 92834.

    • Search Google Scholar
    • Export Citation
  • Kim, J.Y., 2002, Limited information likelihood and Bayesian analysis. Journal of Econometrics 107, 175–193.

  • Klein, M., and J.C. Shambaugh, 2006, Fixed exchange rates and trade. Journal of International Economics 70, 359–383.

  • Koop, G., and L. Tole, 2004, Measuring the health effects of air pollution: to what extent can we really say that people are dying from bad air? Journal of Environmental Economics and Management 47, 30–54.

  • Leamer, E., 1978, Specification searches: ad hoc inference with non-experimental data. New York: Wiley.

  • Leamer, E., 1983, Let’s take the con out of econometrics. American Economic Review 73, 31–43.

  • Ley, E., and M. Steel, 2009, On the effect of prior assumptions in Bayesian Model Averaging with applications to growth regressions. Journal of Applied Econometrics 24, 651–674.

  • Madigan, D., and A.E. Raftery, 1994, Model selection and accounting for model uncertainty in graphical models using Occam’s window. Journal of the American Statistical Association 89, 1335–1346.

  • Malik, A., and J. Temple, 2009, The geography of output volatility. Journal of Development Economics 90, 163–178.

  • Mitchell, T.J., and J.J. Beauchamp, 1988, Bayesian variable selection in linear regression. Journal of the American Statistical Association 83, 1023–1032.

  • Moral-Benito, E., 2011, Determinants of economic growth: a Bayesian panel data approach. The Review of Economics and Statistics, forthcoming.

  • Morales, K.H., J.G. Ibrahim, C. Chen, and L.M. Ryan, 2006, Bayesian Model Averaging with applications to benchmark dose estimation for arsenic in drinking water. Journal of the American Statistical Association 101, 9–17.

  • Moulton, B.R., 1991, A Bayesian approach to regression selection and estimation with application to a price index for radio services. Journal of Econometrics 49, 169–193.

  • Olivero, M., and Y. Yotov, 2011, Dynamic gravity: theory and empirical implications. Canadian Journal of Economics, forthcoming.

  • Qureshi, M., and C. Tsangarides, 2010, The empirics of exchange rate regimes and trade: words vs. deeds. IMF Working Paper No. 10/48. Washington: International Monetary Fund.

  • Raftery, A.E., 1995, Bayesian model selection in social research. Sociological Methodology 25, 111–163.

  • Raftery, A.E., 1996, Approximate Bayes factors and accounting for model uncertainty in generalized linear models. Biometrika 83, 251–266.

  • Raftery, A.E., D. Madigan, and J.A. Hoeting, 1997, Bayesian Model Averaging for linear regression models. Journal of the American Statistical Association 92, 179–191.

  • Ragusa, G., 2008, Bayesian likelihoods for moment condition models. Mimeo.

  • Rose, A.K., 2000, One money one market: estimating the effect of common currencies on trade. Economic Policy 15, 7–46.

  • Sala-i-Martin, X., G. Doppelhofer, and R. Miller, 2004, Determinants of long-term growth: a Bayesian Averaging of Classical Estimates (BACE) approach. American Economic Review 94, 813–835.

  • Santos Silva, J.M.C., and S. Tenreyro, 2006, The log of gravity. Review of Economics and Statistics 88, 641–658.

  • Schennach, S.C., 2005, Bayesian exponentially tilted empirical likelihood. Biometrika 92, 31–46.

  • Temple, J., 2000, Growth regressions and what the textbooks don’t tell you. Bulletin of Economic Research 52, 181–205.

  • Tsangarides, C., 2004, A Bayesian approach to model uncertainty. IMF Working Paper No. 04/68. Washington: International Monetary Fund.

  • Wintle, B.A., M.A. McCarthy, C.T. Volinsky, and R.P. Kavanagh, 2003, The use of Bayesian Model Averaging to better represent uncertainty in ecological models. Conservation Biology 17, 1579–1590.

  • Yeung, K.Y., R.E. Bumgarner, and A.E. Raftery, 2005, Bayesian Model Averaging: development of an improved multi-class gene selection and classification tool for microarray data. Bioinformatics 21, 2394–2402.


Appendices

Appendix I: Representation of Moment Equations for Dynamic Panel Model

We group the moment conditions into matrices as follows. Let $Y_i$ be the $(T-1) \times T(T-1)/2$ matrix of lagged dependent variables used as instruments in (6) for the FD equation:

$$
Y_i = \begin{pmatrix}
y_{i0} & 0 & 0 & 0 & 0 & 0 & \cdots & 0 & \cdots & 0 \\
0 & y_{i0} & y_{i1} & 0 & 0 & 0 & \cdots & 0 & \cdots & 0 \\
0 & 0 & 0 & y_{i0} & y_{i1} & y_{i2} & \cdots & 0 & \cdots & 0 \\
\vdots & & & & & & \ddots & & & \vdots \\
0 & 0 & 0 & 0 & 0 & 0 & \cdots & y_{i0} & \cdots & y_{i,T-2}
\end{pmatrix}.
$$
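To fix ideas, the block-diagonal layout of $Y_i$ can be generated mechanically. The following is a minimal NumPy sketch, assuming $y_{i0}, \ldots, y_{iT}$ are stored in a one-dimensional array; the helper name `build_Y` is ours, not part of the paper:

```python
import numpy as np

def build_Y(y):
    """Lagged-level instruments for the FD equations.

    y holds y_{i0}, ..., y_{iT}. The row for the differenced equation
    at time t = 2, ..., T contains y_{i0}, ..., y_{i,t-2} in its own
    block of columns, with zeros elsewhere.
    """
    T = len(y) - 1
    Y = np.zeros((T - 1, T * (T - 1) // 2))
    col = 0
    for r in range(1, T):              # row r corresponds to t = r + 1
        Y[r - 1, col:col + r] = y[:r]  # instruments y_{i0}, ..., y_{i,r-1}
        col += r
    return Y
```

For example, with $T = 4$ the function returns a $3 \times 6$ matrix whose successive rows hold one, two, and three instruments, matching the display above.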

Similarly, $W_i$ denotes the $(T-1) \times q(T-2)(T-1)/2$ matrix of endogenous variables representing the instruments in (8):

$$
W_i = \begin{pmatrix}
0 & \cdots & 0 & 0 & \cdots & 0 & \cdots & 0 \\
w_{i1}^{1} & \cdots & w_{i1}^{q} & 0 & \cdots & 0 & \cdots & 0 \\
0 & \cdots & 0 & w_{i1}^{1} & \cdots & w_{i2}^{q} & \cdots & 0 \\
\vdots & & & & \ddots & & & \vdots \\
0 & \cdots & 0 & 0 & \cdots & w_{i1}^{1} & \cdots & w_{i,T-2}^{q}
\end{pmatrix}.
$$
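The same indexing logic applies to $W_i$. Below is a sketch under the assumption that the endogenous variables are stored one period per row in a $(T-1) \times q$ array; the helper name `build_W` is again ours:

```python
import numpy as np

def build_W(w):
    """Lagged endogenous-variable instruments for the FD equations.

    w is a (T-1) x q array holding w_{i1}, ..., w_{i,T-1}. The row for
    the differenced equation at t = 3, ..., T holds w_{i1}, ..., w_{i,t-2};
    the first row (t = 2) is all zeros.
    """
    q = w.shape[1]
    T = w.shape[0] + 1
    W = np.zeros((T - 1, q * (T - 2) * (T - 1) // 2))
    col = 0
    for r in range(2, T):              # row r corresponds to t = r + 1
        W[r - 1, col:col + q * (r - 1)] = w[:r - 1].ravel()
        col += q * (r - 1)
    return W
```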

For the level equation we have the $T \times (T-1)$ instrument matrix $DY_i$, consisting of first differences of the dependent variable, and the $T \times q(T-2)$ instrument matrix $DW_i$, consisting of first differences of the endogenous variables, which represent the instruments in (9) and (10), respectively:

$$
DY_i = \begin{pmatrix}
0 & 0 & \cdots & 0 \\
\Delta y_{i1} & 0 & \cdots & 0 \\
0 & \Delta y_{i2} & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & \Delta y_{i,T-1}
\end{pmatrix}, \qquad
DW_i = \begin{pmatrix}
0 & \cdots & 0 & \cdots & 0 & \cdots & 0 \\
0 & \cdots & 0 & \cdots & 0 & \cdots & 0 \\
\Delta w_{i2}^{1} & \cdots & \Delta w_{i2}^{q} & \cdots & 0 & \cdots & 0 \\
\vdots & & & \ddots & & & \vdots \\
0 & \cdots & 0 & \cdots & \Delta w_{i,T-1}^{1} & \cdots & \Delta w_{i,T-1}^{q}
\end{pmatrix}.
$$
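Both level-equation instrument matrices are diagonal layouts of first differences and can be sketched the same way; the helper name is ours, with `dy` holding $\Delta y_{i1}, \ldots, \Delta y_{i,T-1}$ and `dw` holding $\Delta w_{i2}, \ldots, \Delta w_{i,T-1}$:

```python
import numpy as np

def build_DY_DW(dy, dw):
    """Level-equation instruments from first differences.

    dy has length T-1; dw is a (T-2) x q array. DY gets one leading
    zero row, DW gets two, and the differences run down the diagonal.
    """
    T = len(dy) + 1
    q = dw.shape[1]
    DY = np.zeros((T, T - 1))
    for t in range(1, T):
        DY[t, t - 1] = dy[t - 1]
    DW = np.zeros((T, q * (T - 2)))
    for t in range(2, T):
        DW[t, (t - 2) * q:(t - 1) * q] = dw[t - 2]
    return DY, DW
```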

Further, let $X_i$ and $DX_i$ denote the following $(T-1) \times m$ and $T \times m$ matrices of exogenous and first-differenced exogenous variables from moment conditions (7) and (11), respectively:

$$
DX_i = \begin{pmatrix}
0 & 0 & 0 & \cdots & 0 \\
\Delta x_{i2}^{1} & \Delta x_{i2}^{2} & \Delta x_{i2}^{3} & \cdots & \Delta x_{i2}^{m} \\
\Delta x_{i3}^{1} & \Delta x_{i3}^{2} & \Delta x_{i3}^{3} & \cdots & \Delta x_{i3}^{m} \\
\vdots & & & & \vdots \\
\Delta x_{iT}^{1} & \Delta x_{iT}^{2} & \Delta x_{iT}^{3} & \cdots & \Delta x_{iT}^{m}
\end{pmatrix}, \qquad
X_i = \begin{pmatrix}
x_{i2}^{1} & x_{i2}^{2} & x_{i2}^{3} & \cdots & x_{i2}^{m} \\
x_{i3}^{1} & x_{i3}^{2} & x_{i3}^{3} & \cdots & x_{i3}^{m} \\
x_{i4}^{1} & x_{i4}^{2} & x_{i4}^{3} & \cdots & x_{i4}^{m} \\
\vdots & & & & \vdots \\
x_{iT}^{1} & x_{iT}^{2} & x_{iT}^{3} & \cdots & x_{iT}^{m}
\end{pmatrix}.
$$

For the instruments derived from the homoskedasticity restriction (12), let $\bar{Y}_i$ be the $T \times (T-1)$ instrument matrix

$$
\bar{Y}_i = \begin{pmatrix}
y_{i1} & 0 & 0 & \cdots & 0 \\
-y_{i2} & y_{i2} & 0 & \cdots & 0 \\
0 & -y_{i3} & y_{i3} & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & -y_{i,T-1} & y_{i,T-1} \\
0 & 0 & \cdots & 0 & -y_{iT}
\end{pmatrix},
$$

with one column per homoskedasticity restriction.
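A sketch of this bidiagonal layout, following the sign convention displayed above (the helper name `build_Ybar` is ours):

```python
import numpy as np

def build_Ybar(y):
    """Homoskedasticity instruments (bidiagonal matrix above).

    y holds y_{i1}, ..., y_{iT}. Column t pairs y_{it} against
    -y_{i,t+1}, one column per homoskedasticity restriction.
    """
    T = len(y)
    Ybar = np.zeros((T, T - 1))
    for t in range(T - 1):            # 0-based column index
        Ybar[t, t] = y[t]             # y_{i,t+1} in 1-based notation
        Ybar[t + 1, t] = -y[t + 1]    # -y_{i,t+2} in 1-based notation
    return Ybar
```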

For the instruments from the exogenous variables in (13), let $u_i$ and $Dv_i$ denote the $T \times 1$ and $(T-1) \times 1$ vectors of the error term and the first-differenced idiosyncratic random error, respectively, as defined in model (5):

$$
u_i = \begin{pmatrix} u_{i1} \\ u_{i2} \\ \vdots \\ u_{iT} \end{pmatrix}, \qquad
Dv_i = \begin{pmatrix} \Delta v_{i2} \\ \Delta v_{i3} \\ \vdots \\ \Delta v_{iT} \end{pmatrix}.
$$

Finally, we define $U_i$ and $G_i$ to summarize the moment conditions discussed so far. $U_i$ is the $(2T-1) \times 1$ vector

$$
U_i = \begin{pmatrix} u_i \\ Dv_i \end{pmatrix}.
$$

$G_i$ is the $(2T-1) \times \big(T + m - 2 + (T+1)((T-2)q + T)/2\big)$ matrix

$$
G_i = \begin{pmatrix}
DX_i & DY_i & 0 & DW_i & 0 & \bar{Y}_i \\
X_i & 0 & Y_i & 0 & W_i & 0
\end{pmatrix},
$$

where the zero blocks are conformable, so that the moment conditions can be written compactly as $E(G_i' U_i) = 0$.
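To check the bookkeeping, the blocks can be stacked and the column count verified against the stated dimension. A minimal sketch (the assembly function and variable names are ours):

```python
import numpy as np

def build_G(Xi, DXi, Yi, DYi, Wi, DWi, Ybar):
    """Stack the instrument blocks into G_i with conformable zeros.

    The first T rows pair with u_i; the last T-1 rows pair with Dv_i.
    """
    T = DXi.shape[0]                   # DX_i is T x m
    top = np.hstack([DXi, DYi, np.zeros((T, Yi.shape[1])),
                     DWi, np.zeros((T, Wi.shape[1])), Ybar])
    bot = np.hstack([Xi, np.zeros((T - 1, DYi.shape[1])), Yi,
                     np.zeros((T - 1, DWi.shape[1])), Wi,
                     np.zeros((T - 1, Ybar.shape[1]))])
    return np.vstack([top, bot])       # (2T - 1) rows

# The six block widths sum to the stated column dimension, e.g.:
T, m, q = 5, 3, 2
cols = m + 2 * (T - 1) + T * (T - 1) // 2 + q * (T - 2) * (T + 1) // 2
assert cols == T + m - 2 + (T + 1) * ((T - 2) * q + T) // 2
```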
Table 1: Posterior probability of the true model (LIBMA summary statistics for various $N$, $\alpha$, and $\sigma_\upsilon^2$)

[table image not reproduced]

Notes: 1. $\eta_i \sim N(0, \sigma_\eta^2)$ with $\sigma_\eta^2 = 0.10$. 2. $\upsilon_{it} \sim N(0, \sigma_\upsilon^2)$.

Table 2: Posterior probability ratio of the true model versus the best among the rest (LIBMA summary statistics for various $N$, $\alpha$, and $\sigma_\upsilon^2$)

[table image not reproduced]

Notes: 1. See Notes in Table 1.

Table 3: Probability of retrieving the true model (LIBMA summary statistics for various $N$, $\alpha$, and $\sigma_\upsilon^2$)

[table image not reproduced]

Notes: 1. See Notes in Table 1.

Table 4: Model recovery: medians and variances of the posterior inclusion probability for each variable (true model vs. BMA posterior inclusion probabilities for various $N$, $\alpha$, and $\sigma_\upsilon^2$)

[table image not reproduced]

Notes: See Notes in Table 1.

Table 5: Model recovery: medians and variances of estimated parameter values (true model vs. BMA coefficient estimates for various $N$, $\alpha$, and $\sigma_\upsilon^2$)

[table image not reproduced]

Notes: See Notes in Table 1.

Table 6: Posterior probability of the true model (LIBMA summary statistics for various $N$, $\alpha$, and $\sigma_\upsilon^2$)

[table image not reproduced]

Notes: 1. The error terms are constructed using discrete distributions.