## I. Introduction

The study of socioeconomic phenomena may be plagued by inconsistent empirical estimates and model uncertainty. Inconsistent empirical estimates typically arise with omitted country-specific effects that, if correlated with other regressors, lead to a misspecification of the underlying dynamic structure, or with endogenous variables that are incorrectly treated as exogenous. A panel data estimator that simultaneously addresses the issues of endogeneity and omitted variable bias is the systems Generalized Method of Moments (GMM) estimator, which builds on the GMM framework of Hansen (1982). GMM estimators hold the potential for both consistency and efficiency gains by exploiting additional moment restrictions. The systems GMM involves the estimation of two equations, one in levels and the other in differences. The difference equation, constructed by taking first differences of the levels equation, eliminates the country-specific effect. For both equations, potentially endogenous explanatory variables are instrumented with their own lagged values, which deals with the issue of endogeneity. Because the equations are estimated as a system, the procedure constrains the coefficients on the same variables to be equal across equations.^{2}

The case of model uncertainty arises because the lack of clear theoretical guidance on the choice of regressors results in a wide set of possible specifications and, often, contradictory conclusions. Remedially, the analyst has three options: (i) arbitrarily select one model as the true model generating the data; (ii) present the results based on all plausible models without selecting between different specifications; and (iii) explicitly account for model uncertainty. While preferable, option (iii) presents enormous challenges at the level of both concept and statistical theory. Option (ii), although unsystematic, is preferable to option (i), but poses substantial logistical challenges. In practice, researchers tend to focus on one “channel” and choose option (i), ignoring model uncertainty altogether and risking overconfident inferences.^{3} In theory, accounting for model uncertainty requires some version of a “robustness check,” essentially an attempt to account for all possible combinations of predictors. A conceptually attractive solution to the problem of model uncertainty is provided by Bayesian Model Averaging (BMA), although difficulties at the implementation stage sometimes render it impractical.^{4} In particular, with a large number of regressors, *k*^{*}, the procedure may be infeasible due to the large number of models to be estimated, $2^{k^*}$.

Taking into consideration the concerns over model uncertainty, this essay develops the theory of a new Limited Information Bayesian Model Averaging estimator (LIBMA). The proposed estimator incorporates a dynamic panel estimator in the context of GMM, and a Bayesian robustness check to explicitly account for model uncertainty in evaluating the results of a universe of models generated by a set of possible regressors. The LIBMA approach provides certain advantages over the existing literature by relaxing the otherwise restrictive underlying assumptions in two ways. First, while standard Bayesian Model Averaging is a full information technique where a complete stochastic specification is assumed, LIBMA is a limited information approach that relies on GMM, a limited information technique based on moment restrictions rather than a complete stochastic specification. Second, while previous literature implicitly assumes exogenous regressors, LIBMA can control for endogeneity through the use of GMM.

The remainder of the paper is organized as follows. Section II introduces some preliminary ideas about the GMM. Section III constructs the GMM estimator in the Bayesian framework and the limited information likelihood. Section IV discusses the concepts of hypothesis testing and model selection in the Bayesian framework, presents the Limited Information Bayesian Information Criterion used in the context of GMM, and completes the derivation of the LIBMA. Section V presents all the calculated quantities and summary statistics on which the robustness analysis is based. The final section concludes.

## II. Preliminaries on GMM

The GMM was developed by Hansen (1982) and White (1982) as an extension to the classical method of moments estimator. The basic idea of the GMM is to choose parameters of the model so as to match the moments of the model to those of the data as closely as possible. A weighting matrix determines the relative importance of matching each moment. Most common estimation procedures are contained in the GMM framework, including ordinary least squares, instrumental variables estimators, and in some cases, maximum likelihood estimators.

A key advantage of GMM over other estimation procedures is that there is no need to specify a likelihood function. The method of moments (and by extension, GMM) does not require the complete specification of distributions. Given that economic models do not specify joint distributions of economic variables, the method of moments (as well as other limited information inference methods) becomes very appealing in empirical studies. Of course, nothing comes for free. The cost is a loss of efficiency relative to methods such as Maximum Likelihood Estimation (MLE). MLE can be viewed as a limiting case of GMM in which the distribution of the errors is fully specified (so in a sense all of the moments are incorporated). The trouble with MLE is often that the errors may not follow a known distribution (such as the normal, which is almost the universal standard in MLE).^{5} Thus, GMM offers a compromise between the efficiency of MLE and robustness to deviations from normality (or other distributional forms).

This section follows the presentation in Kim (2000) and (2002) to introduce the GMM concepts. Let *x*_{t} be an *n* × 1 vector of stochastic processes defined on a probability space $(\Omega, \mathcal{F}, P)$. Denote by $x_T(\omega) = (x_1(\omega), \ldots, x_T(\omega))$, for $\omega \in \Omega$, a *T*–segment of a particular realization of {*x*_{t}}. Let *θ* be a *q* × 1 vector of parameters from $\Theta \subset \mathbb{R}^q$.^{6} In this paper Θ is a “grand” parameter space on which all the likelihoods, priors and posteriors under consideration are defined.

Let *h*(*x*_{t}, *θ*) be an *r* × 1 vector-valued function. The function *h*(*x*_{t}, *θ*) characterizes an econometric relation *h*(*x*_{t},θ_{0}) = *w*_{t} for some *θ*_{0} ∈ Θ, where *w*_{t} is an *r*-vector stochastic disturbance process satisfying the standard GMM conditions of Hansen (1982).

**Assumption (A1)**

{*w*_{t}, −∞ < *t* < ∞} is stationary and ergodic.

**Assumption (A2)**

(a) *E*_{P}[*w*_{t}*w*_{t}′] exists and is finite, and

(b) *E*_{P}[*w*_{t+s}|*w*_{t}, *w*_{t−1}, …] converges in mean square to zero as *s* → ∞.

Assumptions (A1) and (A2) are satisfied by a broad class of models, as shown in Hansen (1982). Using iterated expectations, Assumption (A2) implies the *r* × 1 moment conditions

$$E_P\left[h(x_t, \theta_0)\right] = 0. \tag{1}$$

**Assumption (A3)**^{7}

(a) *h*(*x*,.) is continuously differentiable in Θ for each $x \in \mathbb{R}^n$.

(b) *h*(., *θ*) and *∂h*(., *θ*)/*∂θ* are Borel measurable for each θ ∈ Θ.

Let g_{T}(X_{T}, θ) be the sample average of *h*(*x*_{t}, θ):

$$g_T(X_T, \theta) = \frac{1}{T}\sum_{t=1}^{T} h(x_t, \theta).$$

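**Definition 1** *The GMM estimator* $\hat{\theta}_{G,T}(\omega)$ *for some ω* ∈ Ω *is the value of θ that minimizes the objective function*

$$g_T(X_T, \theta)'\, W_T\, g_T(X_T, \theta), \tag{2}$$

*where* $\{W_T\}$ *is a sequence of* (*r* × *r*) *positive definite weighting matrices which are functions of the data* x_{T}.

Assuming an interior optimum, the GMM estimate $\hat{\theta}_{G,T}$ is the solution to the first-order condition

$$\left[\frac{\partial g_T(X_T, \theta)}{\partial \theta'}\right]' W_T\, g_T(X_T, \theta) = 0. \tag{3}$$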

Let *R*_{w}(*s*) = *E*_{P}[*w*_{s+1}*w*_{1}′]. Assumptions (A1) and (A2) ensure that

$$S = \sum_{s=-\infty}^{\infty} R_w(s) \tag{4}$$

is finite. The matrix *S* is sometimes interpreted as the long-run variance of *w*_{t} = *h*(*x*_{t},*θ*_{0}) and can be alternatively written as

Equations (1) and (4) are conditions on the first and second moments of *w*_{t} = *h*(*x*_{t}, *θ*_{0}) implied by the probability measure *P*. The matrix *S* is the asymptotic variance of $\sqrt{T}\, g_T(X_T, \theta_0)$.

In order to see how the weighting matrix in (2) works, consider first the situation where there are as many moment conditions as parameters (referred to as the “just-identified” case). The moments will all be perfectly matched and the objective function in (2) will have a value of zero. In the “over-identified” case where there are more moment conditions than parameters, not all of the moment restrictions will be satisfied, so the weighting matrix determines the relative importance of matching each moment.
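The role of the weighting matrix can be illustrated numerically. The following is a minimal sketch, not the estimator used in this paper: it assumes a linear model with moment function *h* = *z*_{t}(*y*_{t} − *x*_{t}′*θ*), simulated data, and a second-step weighting matrix that ignores serial correlation. All function names and parameter values are hypothetical.

```python
import numpy as np


def sample_moments(theta, y, X, Z):
    """g_T(theta): sample average of h = z_t * (y_t - x_t' theta)."""
    residuals = y - X @ theta
    return Z.T @ residuals / len(y)


def gmm_objective(theta, y, X, Z, W):
    """Quadratic form g_T' W g_T that the GMM estimator minimizes."""
    g = sample_moments(theta, y, X, Z)
    return float(g @ W @ g)


def linear_gmm(y, X, Z, W):
    """Closed-form minimizer of the objective when moments are linear in theta."""
    XZ = X.T @ Z
    return np.linalg.solve(XZ @ W @ XZ.T, XZ @ W @ (Z.T @ y))


def two_step_gmm(y, X, Z):
    """First-step estimate, then W set to the inverse of an estimate of S."""
    T = len(y)
    W1 = np.linalg.inv(Z.T @ Z / T)      # first-step weighting matrix
    theta1 = linear_gmm(y, X, Z, W1)
    u = y - X @ theta1
    Zu = Z * u[:, None]                  # rows are h(x_t, theta1)
    S_hat = Zu.T @ Zu / T                # assumes no serial correlation
    theta2 = linear_gmm(y, X, Z, np.linalg.inv(S_hat))
    return theta2, S_hat


# Simulated data (hypothetical): one endogenous regressor, three instruments.
rng = np.random.default_rng(0)
T = 2000
Z = rng.normal(size=(T, 3))
v = rng.normal(size=T)
x = Z @ np.array([1.0, 0.5, -0.5]) + v
u = 0.8 * v + rng.normal(size=T)         # error correlated with x through v
y = 2.0 * x + u
X = x[:, None]

# Over-identified case: three moments, one parameter.
theta_over, S_hat = two_step_gmm(y, X, Z)
J_over = gmm_objective(theta_over, y, X, Z, np.linalg.inv(S_hat))

# Just-identified case: one moment, one parameter.
theta_just = linear_gmm(y, X, Z[:, :1], np.eye(1))
J_just = gmm_objective(theta_just, y, X, Z[:, :1], np.eye(1))
```

In the just-identified case the single moment is matched exactly, so `J_just` is zero up to rounding error. In the over-identified case `J_over` is strictly positive and the choice of weighting matrix affects the estimate.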

Hansen (1982) points out that setting $W_T = S^{-1}$, the inverse of the asymptotic covariance matrix, is optimal in the sense that it yields the estimator with the smallest asymptotic variance. (*S* is also known as the spectral density matrix evaluated at frequency zero.) There are many approaches for constructing consistent estimators of *S* which can account for various forms of heteroskedasticity and/or serial correlation, including White (1980), the Bartlett kernel used by Newey and West (1987), the truncated kernel of Hansen (1982), and the automatic bandwidth selection from Andrews and Monahan (1992).

Let $\hat{S}_T$ be a consistent estimator of *S* based on a sample of size *T*. An optimal GMM estimator is obtained with $W_T = \hat{S}_T^{-1}$,

where *S* is approximated by

## III. GMM in the Bayesian Framework

In contrast to the classical approach, Bayesian estimation requires the specification of likelihood functions or the data generating mechanism. For this reason, one may conclude that the Bayesian method cannot be applied to the moment problem. However, recent developments in Bayesian and classical econometrics have made it possible to consider a likelihood interpretation of some non-likelihood problems.^{8} Innovative work in this area was done by Zellner (1996 and 1997), who developed a finite sample Bayesian Method of Moments (BMOM) based on the principle of maximum entropy.^{9} One of the distinguishing features of the BMOM approach is that it yields post-data densities for models’ parameters without use of an assumed likelihood function. Inoue (2001) proposes a semi-parametric Bayesian method of moments approach (which differs from the maximum entropy approach) that enables direct Bayesian inference in the method of moments framework. It turns out that the posterior distribution of strongly identified parameters is asymptotically normal even in the presence of weakly identified parameters. Finally, Kim (2000) and (2002) develops a limited information procedure in the Bayesian framework that does not require knowledge of the likelihood function. His procedure is the Bayesian counterpart of the classical GMM but has certain advantages over the classical GMM for practical applications, and it is the approach we closely follow in this essay.

### A. The Bayes Estimator and GMM

We now begin the construction of the GMM in the Bayesian framework. In the classical framework, GMM is a limited information procedure. The GMM estimate in Definition 1 is based on the moment condition (1), a set of limited information on the data generating process. The goal is to build a Bayesian counterpart of the classical GMM by constructing a Bayesian limited information procedure based on a set of moments.

Following Kim (2000) and (2002), we begin with some of the basic elements. A Bayesian framework is identified by a posterior density defined on a measurable space over the parameter space Θ. Let *π*_{T}(θ|x_{T}(ω)) be the “true” posterior of *θ* that may be unknown.^{10} Assume that the posterior *π*_{T}(·|X_{T}(·)) is jointly measurable

for any

Let *l*(θ,δ) be the loss function that penalizes for the choice of δ when θ is the real parameter value. The Bayes’ estimator is an estimator that minimizes the expected posterior loss

where

We are interested in a loss function that yields an estimator equivalent to the GMM estimator. Since the objective is to study a Bayesian counterpart of the classical GMM, it is natural to adopt a loss function with this property. Consider the following loss function that is quadratic in *g*_{T}:

where *θ*:

where

The loss functions in (10) and (11) are such that, under some conditions (discussed in Lemma 1 below), they yield an estimator that is the same as the GMM estimator. As discussed in Kim (2000), the choice of the loss functions (10) and (11) does not cause loss of generality. The main results of this essay do not change so long as the chosen loss function can be transformed into a function that is quadratic in *θ*.

From the minimization problem (8) using the loss function in (10) the first order condition is

This implies a moment condition

where the right hand side is a constant conditional on *x*_{T} and

Interpreting the GMM estimator as a Bayes’ estimator, the right hand side of (12) is equal to zero. So we have the moment condition:

**LEMMA 1:**

Assume the second order conditions hold for the minimization in the GMM estimate in (2) and in the Bayes’ estimate in (8) with the loss function described in (10). Then, under Assumption (A3), the GMM estimator

where

and ^{11}

### B. Limited Information Likelihood and GMM

In this section we follow the discussion in Section 3 of Kim (2002) to establish a semi-parametric limited information likelihood based on the moment conditions which form a set of limited information on the data generating mechanism. (This limited information likelihood is then used to derive a limited information posterior in Kim (2002).) The approach to get this limited information likelihood function is based on the principle of maximum entropy where the idea is to get a likelihood that is closest to the unknown true likelihood in an information distance.

From the moment condition (1) we have

and from (4) and (6) we have the second moment condition on *g*_{T}

where *S* is the long-run variance of *w*_{t} = *h*(*x*_{t},*θ*_{0}) described in (4). Under Assumptions (A1) and (A2) it can be shown^{12} that

Given the true probability measure *P* with the properties in the moment conditions (13) and (14), we are interested in the probability measure *Q* that implies the same moment conditions. Let *P* such that for *θ* ∈ Θ

which (as shown in Kim 2002) reduces to

Usually such a *Q* is not unique. We therefore choose the *Q* that is closest to the true measure *P* in the entropy distance or the Kullback-Leibler information distance (White 1982) or the *I*–divergence distance (Csiszar 1975). The optimization problem yields such a solution *Q*^{*}

where *dQ*/*dP* is the Radon-Nikodym derivative (or density) of *Q* with respect to *P*. So, *Q*^{*} is the solution of the constrained minimization where the constraint is given with respect to the moments implied in the measure *P*. As in Csiszar (1975), *Q*^{*} is defined to be the *I*–projection of *P* on *Q*^{*}(*θ*) with respect to *P*. Kim (2002) calls *limited information density* or the *I*–projection density (following Csiszar (1975)).

The solution of (18) *θ* ∈ Θ that satisfies the moment in (16) or (17), and therefore, *θ*. Thus, we call *Limited Information Likelihood* (LIL) or the *I*–projection likelihood.

Under the conditions on *Q*

where *κ* is a constant and *κ* = −1/2 is a desirable choice. Finally, Theorem 1 of Kim (2002) establishes that

When *S* is not known it is replaced by a consistent estimator

## IV. Model Uncertainty and BMA

Standard statistical practice ignores model uncertainty. The classical approach conditions on a single model and thus leads to underestimation of uncertainty when making inferences about quantities of interest. A complete Bayesian solution to the problem of model uncertainty is the BMA approach, which involves averaging over all possible combinations of predictors when making inferences about the quantities of interest.^{13} The Bayesian approach avoids conditioning on a single model. No model is assumed to be the “true” model; instead, all possible models are assigned prior probabilities based on the researcher’s beliefs, and inference uses the posterior model probabilities as weights. As noted by Hoeting and others (1999), this is reasonable as it allows for propagation of model uncertainty into the posterior distribution and leads to more sensible uncertainty bands.

The following sections draw from Raftery (1994), Kass and Raftery (1995), and Kim (2000) and (2002). First, we introduce Bayesian hypothesis testing and Bayes factors to test competing models. Then, we derive a limited information model selection criterion, in order to calculate the Bayes factors in the case of a limited information procedure. Finally, we incorporate the derived criterion in the context of BMA to derive the posterior distributions of the parameters of interest.

### A. Bayesian Hypothesis Testing

We begin with the general setup for a model selection problem. Let $\mathcal{M}$ be a family of candidate models for x_{T}. A model $M_k \in \mathcal{M}$ is associated with a parameter space $\Theta^k$ of dimension *q*_{k} for *k* ∈ I where I = {1,…,*I*} and characterized by a relation of the form *h*(*x*_{t},*θ*) = *w*_{t} (as described in Section II) with *w*_{t} a stochastic process satisfying Assumptions (A1) and (A2). For every

Suppose that we want to use data x_{T} to test competing hypotheses represented by two models *M*_{1} and *M*_{2} with parameter vectors *θ*_{1} and *θ*_{2}. Let *p*(*M*_{1}|X_{T}) be the posterior probability that *M*_{1} is the correct model,

$$p(M_1|X_T) = \frac{q_T(X_T|M_1)\,p(M_1)}{q_T(X_T|M_1)\,p(M_1) + q_T(X_T|M_2)\,p(M_2)}, \tag{21}$$

where (for *k* = 1, 2) *q*_{T}(X_{T}|*M*_{k}) is the marginal probability of the data given *M*_{k}, and *p*(*M*_{k}) is the prior probability of model *M*_{k}.^{14}

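In general, the term *q*_{T}(X_{T}|*M*_{k}) in (21) is obtained by integrating over the parameter space

$$q_T(X_T|M_k) = \int q_T(X_T|\theta_k, M_k)\,\Phi(\theta_k|M_k)\,d\theta_k, \tag{22}$$

where $q_T(X_T|\theta_k, M_k)$ is the likelihood of *θ*_{k} under model *M*_{k}, *q*_{T}(X_{T}|*M*_{k}) is the marginal likelihood, and *Φ*(*θ*_{k}|*M*_{k}) is the prior density associated with model *M*_{k}.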

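The posterior odds ratio for *M*_{2} against *M*_{1} (i.e. the ratio of their posterior probabilities, $p(M_2|X_T)/p(M_1|X_T)$) measures the extent to which the data support *M*_{2} over *M*_{1}. Using (21), the posterior odds ratio is

$$\frac{p(M_2|X_T)}{p(M_1|X_T)} = \frac{q_T(X_T|M_2)}{q_T(X_T|M_1)} \cdot \frac{p(M_2)}{p(M_1)}, \tag{23}$$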

where the first term on the right-hand side of (23) is the Bayes factor for *M*_{2} against *M*_{1}, denoted by *B*_{21}, and the second term is the prior odds ratio. Sometimes the prior odds ratio is set to 1, representing the lack of prior preference for either model, in which case the posterior odds ratio is equal to the Bayes factor. When the posterior odds ratio is greater (less) than 1, the data favor *M*_{2} over *M*_{1} (*M*_{1} over *M*_{2}).
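As a small numerical illustration of the relationship between the Bayes factor, the prior odds, and the posterior odds, the sketch below works with log marginal likelihoods (which is how these quantities are usually stored in practice). The function names and the input values are hypothetical and are not taken from any estimation in this paper.

```python
import math


def posterior_odds(log_ml_2, log_ml_1, prior_2=0.5, prior_1=0.5):
    """Posterior odds of M2 against M1: Bayes factor times prior odds."""
    bayes_factor_21 = math.exp(log_ml_2 - log_ml_1)
    return bayes_factor_21 * (prior_2 / prior_1)


def posterior_prob_m1(log_ml_1, log_ml_2, prior_1=0.5, prior_2=0.5):
    """Posterior probability of M1 when only M1 and M2 are entertained."""
    shift = max(log_ml_1, log_ml_2)          # guards against underflow
    a = prior_1 * math.exp(log_ml_1 - shift)
    b = prior_2 * math.exp(log_ml_2 - shift)
    return a / (a + b)


# Hypothetical log marginal likelihoods for two models.
odds_21 = posterior_odds(log_ml_2=-100.0, log_ml_1=-102.0)
p1 = posterior_prob_m1(log_ml_1=-102.0, log_ml_2=-100.0)
```

With equal prior probabilities the prior odds ratio is 1, so `odds_21` equals the Bayes factor, and `(1 - p1) / p1` recovers the same value.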

Evaluating the Bayes factor in (23) for hypothesis testing requires calculating the marginal likelihood *q*_{T}(X_{T}|*M*_{k}). This can be a high-dimensional and intractable integral. Various analytic and numerical approximations have been proposed which are reviewed in Kass and Raftery (1995). The Bayesian Information Criterion (BIC) is a simple and accurate method to estimate Bayes factors when the likelihood function is known. This is discussed first in the next section. Then we extend the discussion to the case where only the limited information likelihood is available to derive the Limited Information Bayesian Information Criterion (LIBIC).

### B. The Information Criteria: BIC and LIBIC

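Following the approach of Raftery (1994), we focus on approximating the marginal likelihood for a single model, that is, the right hand side of (22). We will avoid indexing for a specific model, so a general form of (22) can be written as

$$q_T(X_T|M) = \int q_T(X_T|\theta, M)\,\Phi(\theta|M)\,d\theta. \tag{24}$$

Let $f(\theta) = \log\left[q_T(X_T|\theta, M)\,\Phi(\theta|M)\right]$ and consider a Taylor series expansion of *f*(*θ*) about $\tilde{\theta}$, the value of *θ* that maximizes *f*(*θ*), or the posterior mode. The expansion gives

$$f(\theta) = f(\tilde{\theta}) + (\theta - \tilde{\theta})' f'(\tilde{\theta}) + \tfrac{1}{2}(\theta - \tilde{\theta})' f''(\tilde{\theta})(\theta - \tilde{\theta}) + o\left(\|\theta - \tilde{\theta}\|^2\right), \tag{25}$$

where *f′*(*θ*) is the vector of first partial derivatives of *f*(*θ*), and *f″*(*θ*) is the Hessian matrix of second partial derivatives of *f*(*θ*) whose (*i, j*) element is $\partial^2 f(\theta)/\partial\theta_i\,\partial\theta_j$. Since *f*(*θ*) is maximized at $\tilde{\theta}$, we have $f'(\tilde{\theta}) = 0$, and therefore

$$f(\theta) \approx f(\tilde{\theta}) + \tfrac{1}{2}(\theta - \tilde{\theta})' f''(\tilde{\theta})(\theta - \tilde{\theta}).$$

From the definition of *f*(*θ*) and (24) it follows that

$$q_T(X_T|M) = \int \exp[f(\theta)]\,d\theta \approx \exp[f(\tilde{\theta})] \int \exp\left[\tfrac{1}{2}(\theta - \tilde{\theta})' f''(\tilde{\theta})(\theta - \tilde{\theta})\right] d\theta. \tag{26}$$

Recognizing the integrand in (26) as proportional to a multivariate normal density gives

$$q_T(X_T|M) \approx \exp[f(\tilde{\theta})]\,(2\pi)^{d/2}\,|A|^{-1/2}, \tag{27}$$

where *d* is the number of parameters in the model and $A = -f''(\tilde{\theta})$.^{15} Taking logarithms,

$$\log q_T(X_T|M) = \log q_T(X_T|\tilde{\theta}, M) + \log\Phi(\tilde{\theta}|M) + \tfrac{d}{2}\log(2\pi) - \tfrac{1}{2}\log|A| + O(n^{-1}). \tag{28}$$

In large samples, $\tilde{\theta} \approx \hat{\theta}$, where $\hat{\theta}$ is the maximum likelihood estimator, and *A* ≈ *n***i**, where **i** is the expected Fisher information matrix for one observation. This is a (*d* × *d*) matrix whose (*i, j*) element is $-E\left[\partial^2 \log p(y_1|\theta)/\partial\theta_i\,\partial\theta_j\right]$, the expectation being taken over values of *y*_{1} with *θ* held fixed. Thus, |*A*| ≈ *n*^{d} |**i**|. With these approximations and an added $O(n^{-1/2})$ error, (28) becomes

$$\log q_T(X_T|M) = \log q_T(X_T|\hat{\theta}, M) + \log\Phi(\hat{\theta}|M) + \tfrac{d}{2}\log(2\pi) - \tfrac{d}{2}\log n - \tfrac{1}{2}\log|\mathbf{i}| + O(n^{-1/2}). \tag{29}$$

Removing the terms of order *O*(1) or less gives^{16}

$$\log q_T(X_T|M) = \log q_T(X_T|\hat{\theta}, M) - \tfrac{d}{2}\log n + O(1). \tag{30}$$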

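Equation (30) is the approximation on which the BIC is based and was first derived by Schwarz (1978). As suggested by Raftery (1994), although the *O*(1) term suggests that the error does not vanish with an infinite amount of data, the error will tend towards zero as a proportion of log *q*_{T}(X_{T}|*M*), which ensures that the error will not affect the conclusion reached given enough data. For a particular choice of prior, the error term is of much smaller magnitude. Suppose that the prior is a multivariate normal with mean $\hat{\theta}$ (the maximum likelihood estimate) and variance matrix **i**^{−1}. Under that choice of a prior, we have

$$\log\Phi(\hat{\theta}|M) = -\tfrac{d}{2}\log(2\pi) + \tfrac{1}{2}\log|\mathbf{i}|. \tag{31}$$

Substituting (31) in (29), we get an expression for log *q*_{T}(X_{T}|*M*) where the error term vanishes as *n* → ∞

$$\log q_T(X_T|M) = \log q_T(X_T|\hat{\theta}, M) - \tfrac{d}{2}\log n + O(n^{-1/2}). \tag{32}$$

Using the approximation in (32) we can derive the Bayes factor

$$\log B_{21} \approx \log q_T(X_T|\hat{\theta}_2, M_2) - \log q_T(X_T|\hat{\theta}_1, M_1) - \tfrac{1}{2}(d_2 - d_1)\log n. \tag{33}$$

As discussed in Kass and Raftery (1995), the expression in (33) is the Schwarz criterion (*S*) and, as *n* → ∞, $(S - \log B_{21})/\log B_{21} \to 0$, so that *S* can be viewed as an approximation to log *B*_{21}. Twice the Schwarz criterion is the BIC, or

$$\mathrm{BIC} = 2\log B_{21} \approx 2\left[\log q_T(X_T|\hat{\theta}_2, M_2) - \log q_T(X_T|\hat{\theta}_1, M_1)\right] - (d_2 - d_1)\log n. \tag{34}$$

Exact calculation of equation (34) requires the knowledge of the likelihood function for each of the models. If *M*_{1} is nested within *M*_{2}, (34) reduces to $2\log B_{21} \approx \chi^2_{21} - df_{21}\log n$, where $\chi^2_{21}$ is the standard likelihood ratio test statistic for testing *M*_{1} against *M*_{2} and *df*_{21} = *d*_{2} − *d*_{1} is the number of degrees of freedom.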
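The quality of the BIC approximation can be checked in a case where the Bayes factor is available in closed form. The sketch below is an illustrative toy example that is not part of the paper: it assumes *n* independent observations with known unit variance, a model *M*_{1} that fixes the mean at zero, and a model *M*_{2} with a free mean *μ* and a N(0, 1) prior on *μ*. Under these assumptions, 2 log *B*_{21} = −log(1 + *n*) + (*n*ȳ)²/(*n* + 1), while the nested BIC form uses the likelihood ratio statistic *n*ȳ² with one degree of freedom.

```python
import math


def exact_two_log_bf(n, ybar):
    """Exact 2*log(B21) for N(mu, 1) data with a N(0, 1) prior on mu under M2."""
    return -math.log(1 + n) + (n * ybar) ** 2 / (n + 1)


def bic_two_log_bf(n, ybar):
    """BIC approximation: likelihood ratio statistic minus df * log(n)."""
    lr_stat = n * ybar ** 2      # 2 * (loglik at ybar - loglik at mu = 0)
    df = 1                       # M2 has one more free parameter than M1
    return lr_stat - df * math.log(n)


def relative_error(n, ybar):
    exact = exact_two_log_bf(n, ybar)
    return abs(bic_two_log_bf(n, ybar) - exact) / abs(exact)


# Hypothetical sample mean held fixed while the sample size grows.
ybar = 0.5
abs_err_small = abs(bic_two_log_bf(100, ybar) - exact_two_log_bf(100, ybar))
abs_err_large = abs(bic_two_log_bf(10_000, ybar) - exact_two_log_bf(10_000, ybar))
rel_err_small = relative_error(100, ybar)
rel_err_large = relative_error(10_000, ybar)
```

In this example the absolute error stays roughly constant as *n* grows, while the relative error shrinks, which is the behavior described above for the *O*(1) term.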

The full-information likelihood function is not available in the context of GMM. Therefore, in order to calculate the BIC in (34) we need to rely on the LIL developed by Kim (2002) and discussed in Section III. Using the LIL in (20) we can replace the likelihood functions on the right hand side of (34) to get a Limited Information Bayesian Information Criterion (LIBIC). First, using (20) (with *S* replaced by a consistent estimator $\hat{S}$)

and substituting in (34) eliminating the

LIBIC:

which reduces to

The LIBIC expression in (37) can be calculated from the estimated output and will be used to estimate the Bayes factors required for the hypothesis testing.

### C. BMA under Limited Information

Suppose we can divide the parameter space into *K* regions (models), so we have the space of all possible models $\mathcal{M} = \{M_1, \ldots, M_K\}$. Given data *D*, the posterior distribution of a quantity of interest Δ is, based on the law of total probability:

$$p(\Delta|D) = \sum_{k=1}^{K} p(\Delta|D, M_k)\,p(M_k|D). \tag{38}$$

Thus *p*(Δ|*D*), the posterior distribution of the quantity of interest Δ, is a mixture of the posterior distributions of that quantity under each of the models, with mixing probabilities given by the posterior model probabilities. In other words, the *full* posterior distribution of Δ is a weighted average of the posterior distributions under each model (*M*_{1}, …,*M*_{K}), where the weights are the posterior model probabilities *p*(*M*_{k}|*D*). This procedure is what is typically referred to as BMA, and it is in fact the standard Bayesian solution under model uncertainty, since it follows from direct application of Bayes’ theorem.
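The averaging step can be sketched in a few lines. The example below is a generic illustration rather than the LIBMA implementation: it assumes that a log marginal likelihood (or an approximation to it) and a prior probability are already available for each model, and all names and input values are hypothetical.

```python
import numpy as np


def posterior_model_probs(log_marginal_liks, prior_probs):
    """Posterior model probabilities: prior times marginal likelihood, normalized."""
    log_w = np.asarray(log_marginal_liks, dtype=float) + np.log(
        np.asarray(prior_probs, dtype=float)
    )
    log_w -= log_w.max()                 # stabilizes the exponentials
    w = np.exp(log_w)
    return w / w.sum()


def bma_mean(model_means, post_probs):
    """Model-averaged posterior mean of a quantity of interest."""
    return float(np.dot(post_probs, model_means))


# Hypothetical inputs: two models with equal priors; the second model's
# marginal likelihood is three times as large as the first's.
probs = posterior_model_probs([0.0, np.log(3.0)], [0.5, 0.5])
delta_bar = bma_mean([1.0, 2.0], probs)
```

Here the posterior model probabilities are 0.25 and 0.75, so the averaged mean is 0.25 × 1 + 0.75 × 2 = 1.75. Working on the log scale and subtracting the maximum is a common design choice that avoids numerical underflow when marginal likelihoods are very small.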

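Denoting the data by x_{T}, (38) becomes

$$p(\Delta|X_T) = \sum_{k=1}^{K} p(\Delta|X_T, M_k)\,p(M_k|X_T), \tag{39}$$

where the posterior model probabilities for the *K* models are such that:

$$p(M_k|X_T) = \frac{q_T(X_T|M_k)\,p(M_k)}{\sum_{l=1}^{K} q_T(X_T|M_l)\,p(M_l)}.$$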

Although this fully Bayesian approach is a feasible and very attractive solution to the problem of accounting for model uncertainty, there are certain difficulties in the implementation of the BMA, making it (in some cases) a rather unpopular and less practical proposition. First, when the number of regressors *k*^{*} is very large, the number of models, $2^{k^*}$, may be too large for all of them to be estimated.^{17} Second, the BMA requires specification of the prior distributions of all relevant parameters, so for *k*^{*} possible regressors, priors must be specified for each of the $2^{k^*}$ models.^{18}

Each of the *K* models is compared in turn with a baseline model *M*_{0} (which could be the null model with no independent variables), yielding Bayes factors *B*_{10}, *B*_{20}, …, *B*_{K0}. The value of BIC for model *M*_{k}, denoted *BIC*_{k}, is the approximation to 2 log *B*_{0k} given by (34), where *B*_{0k} is the Bayes factor for model *M*_{0} against *M*_{k}.^{19}

It is possible to write (21) and (39) in terms of the BIC. To see this, rewriting (21) in terms of the Bayes factor *B*_{12}, we get

Extending from 2 models to *K* models,

The expression in (40) uses the “full information” BIC derived in (34). In the framework of our GMM analysis, and following the discussion in Section 3.2 we modify (40) to incorporate the “limited information” LIBIC defined in (37)

Equation (41) defines the LIBMA estimator, an extension of the BMA in the case of a limited information likelihood. The LIBMA incorporates a dynamic panel estimator in the context of GMM and a Bayesian robustness check to explicitly account for model uncertainty in evaluating the results of a universe of models generated by a set of possible regressors.

## V. Statistics for the Robustness Analysis

This section summarizes the computational aspects and introduces the statistics on which we will base our robustness analysis.

Suppose we have *n* independent replications of a linear regression model with an intercept *α*, and *k*^{*} possible regressors whose coefficients are grouped in a *k*^{*}–dimensional vector *β*. Denote by *Z* the corresponding *n* × *k*^{*} design matrix. We have $2^{k^*}$ possible models, indexed by *M*_{j}, with prior model probabilities *p*(*M*_{j}), where $\sum_j p(M_j) = 1$.

A model *M*_{j} with 0 ≤ *k*_{j} ≤ *k*^{*} regressors is defined by *y* = *α* + *X*_{j}*β*_{j} + *ε*, where *y* is the vector of observations, *X*_{j} denotes the *n* × *k*_{j} matrix of the regressors included, and *β*_{j} is the *k*_{j} × 1 vector of the relevant coefficients.

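From (23) the posterior odds ratio for two models *M*_{j}, *M*_{l} is

$$\frac{p(M_j|X_T)}{p(M_l|X_T)} = \frac{q_T(X_T|M_j)}{q_T(X_T|M_l)} \cdot \frac{p(M_j)}{p(M_l)},$$

where the second term on the right-hand side is the prior odds ratio. We *do not* assume an equal prior probability for each model. Instead, following Doppelhofer, Miller, and Sala-i-Martin (2000), we represent a model *M*_{j} as a length *k*^{*} binary vector in which a one indicates that a variable is included in the model and a zero indicates that it is not. In addition, following Doppelhofer, Miller, and Sala-i-Martin (2000), we do not require the choice of (arbitrary) priors for all the parameters; instead, only one hyper-parameter is specified, the expected model size, $\bar{k}$.

Assuming that each variable has an equal inclusion probability, the prior probability for model *M*_{j} is

$$p(M_j) = \left(\frac{\bar{k}}{k^*}\right)^{k_j} \left(1 - \frac{\bar{k}}{k^*}\right)^{k^* - k_j} \tag{42}$$

and the prior odds ratio is

$$\frac{p(M_j)}{p(M_l)} = \left(\frac{\bar{k}}{k^*}\right)^{k_j - k_l} \left(1 - \frac{\bar{k}}{k^*}\right)^{k_l - k_j}, \tag{43}$$

where *k*^{*} is the total number of regressors, *k*_{j} is the number of included variables in model *M*_{j}, and $\bar{k}/k^*$ is the prior inclusion probability for each variable.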
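A prior of this kind can be checked by brute force for a small number of regressors. The sketch below assumes the form used by Doppelhofer, Miller, and Sala-i-Martin (2000), in which each variable enters independently with probability equal to the expected model size divided by *k*^{*}; the function names and the values *k*^{*} = 4 and expected size 1.5 are hypothetical.

```python
from itertools import product


def prior_model_prob(model, expected_size):
    """Prior probability of a model given as a binary inclusion tuple."""
    k_star = len(model)                  # total number of candidate regressors
    k_j = sum(model)                     # number of regressors included
    xi = expected_size / k_star          # prior inclusion probability per variable
    return xi ** k_j * (1 - xi) ** (k_star - k_j)


def all_models(k_star):
    """Every model as a length-k_star tuple of zeros and ones."""
    return list(product((0, 1), repeat=k_star))


k_star, k_bar = 4, 1.5
models = all_models(k_star)
priors = [prior_model_prob(m, k_bar) for m in models]

total_prob = sum(priors)
implied_size = sum(p * sum(m) for p, m in zip(priors, models))
odds_one_vs_null = prior_model_prob((1, 0, 0, 0), k_bar) / prior_model_prob(
    (0, 0, 0, 0), k_bar
)
```

The enumeration produces 2⁴ = 16 models whose prior probabilities sum to one, and the prior expected number of included regressors equals the chosen hyper-parameter. Exhaustive enumeration like this is only practical for small *k*^{*}, since the model count doubles with each added regressor.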

If the set of possible regressions is small enough to allow exhaustive calculation, we can substitute (42) into (44) to calculate the posterior model probabilities (where the weights for different models are assigned based on the posterior probabilities of each model, essentially normalizing the weight of any model by the sum of the weights of all possible models):

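Next, we can use (44) to estimate the *posterior mean* and *posterior variance* as follows:

$$E(\beta|X_T) = \sum_{j} p(M_j|X_T)\,\hat{\beta}_j$$

and

$$Var(\beta|X_T) = \sum_{j} p(M_j|X_T)\,Var(\beta|X_T, M_j) + \sum_{j} p(M_j|X_T)\left(\hat{\beta}_j - E(\beta|X_T)\right)^2,$$

where $\hat{\beta}_j$ is the estimate of *β* under model *M*_{j} and the sums run over all models.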

Other statistics relevant to the study are the posterior mean and variance *conditional on inclusion*. First we calculate the *posterior inclusion probability*, which is the sum of the posterior probabilities of all the regressions including the specific variable (regressor). The posterior inclusion probability is a ranking measure of how much the data favor the inclusion of a variable in the regression, and is calculated as

$$\text{posterior inclusion probability} = p(\beta_i \neq 0|X_T) = \sum_{j:\,\beta_i \neq 0} p(M_j|X_T).$$

If

Finally, we compute the *sign certainty probability*. This measures the probability that the coefficient is on the same side of zero as its mean (conditional on inclusion) and is calculated as
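The summary statistics of this section can be sketched as follows. This is an illustrative implementation under stated assumptions rather than the paper's exact formulas: it assumes posterior model probabilities, coefficient estimates, and coefficient variances are already available for every model, treats the coefficient of an excluded variable as exactly zero, and approximates each within-model posterior by a normal distribution when computing the sign certainty probability. All names and input values are hypothetical.

```python
import math

import numpy as np


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bma_summary(models, post_probs, coefs, variances):
    """Model-averaged statistics for each regressor.

    models: (J, k) binary inclusion matrix; post_probs: (J,) posterior model
    probabilities; coefs, variances: (J, k) per-model estimates.
    Assumes every regressor is included in at least one model.
    """
    m = np.asarray(models, dtype=float)
    p = np.asarray(post_probs, dtype=float)
    b = np.asarray(coefs, dtype=float) * m        # zero where excluded
    v = np.asarray(variances, dtype=float) * m

    incl_prob = m.T @ p                           # posterior inclusion probability
    mean = b.T @ p                                # unconditional posterior mean
    second_moment = (v + b ** 2).T @ p
    var = second_moment - mean ** 2               # unconditional posterior variance
    cond_mean = mean / incl_prob                  # conditional on inclusion
    cond_var = second_moment / incl_prob - cond_mean ** 2

    sign_cert = np.zeros(m.shape[1])
    for i in range(m.shape[1]):
        for j in np.flatnonzero(m[:, i]):         # models that include regressor i
            z = np.sign(cond_mean[i]) * b[j, i] / math.sqrt(v[j, i])
            sign_cert[i] += p[j] / incl_prob[i] * normal_cdf(z)

    return {
        "incl_prob": incl_prob, "mean": mean, "var": var,
        "cond_mean": cond_mean, "cond_var": cond_var, "sign_cert": sign_cert,
    }


# Hypothetical example: two regressors, four models.
summary = bma_summary(
    models=[[1, 0], [1, 1], [0, 1], [0, 0]],
    post_probs=[0.4, 0.3, 0.2, 0.1],
    coefs=[[1.0, 0.0], [2.0, 0.5], [0.0, -0.5], [0.0, 0.0]],
    variances=[[0.04, 0.0], [0.09, 0.01], [0.0, 0.01], [0.0, 0.0]],
)
```

In this example the first regressor has a positive coefficient in every model that includes it, so its sign certainty is close to one. The second regressor changes sign across models, and its sign certainty is close to the share of conditional posterior weight (0.6) on the model whose sign agrees with the conditional mean.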

## VI. Conclusion

This paper develops the theoretical background of the Limited Information Bayesian Model Averaging (LIBMA) approach and the computational aspects of the robustness analysis. The proposed methodology consists of a coherent Bayesian framework that addresses the problems of model uncertainty and restrictive assumptions of certain estimation procedures. The LIBMA technique has many potential applications including investigations of competing hypotheses, and parameter estimation that is robust to model specification.

As is typical in many areas of economic research, empirical work on investigating growth (and poverty) determinants is (i) prone to inconsistent estimates due to bias from omitted country-specific effects and from the failure to account for endogenous regressors; and (ii) particularly susceptible to model uncertainty arising from the combination of a complex web of relationships and the lack of clear theoretical guidance on the choice of regressors. The first practical application of the LIBMA by Ghura, Leite, and Tsangarides (2002) is a contribution to the ongoing growth and poverty debate that provides empirical evidence on the elasticity of the income of the poor with respect to average income and on the set of macroeconomic policies that directly influence poverty rates. Further, motivated by the existing empirical evidence on poverty reduction (and more broadly on human development), which strongly supports the primacy of the role of economic growth, a second research project attempts to explain the observed differences in standards of living across countries by identifying robust patterns of cross-country growth behavior, and examines convergence using the LIBMA approach.

## References

Andrews, D., and J. C. Monahan, 1992, “An Improved Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimator,” Econometrica, Vol. 60 (July), pp. 953-66.

Brock, W., and S. Durlauf, 2001, “Growth Empirics and Reality,” World Bank Economic Review, Vol. 15 (no. 2), pp. 229-72.

Csiszar, I., 1975, “I-divergence Geometry of Probability Distributions and Minimization Problems,” Annals of Probability, Vol. 3 (no. 1), pp. 146-58.

Doppelhofer, G., R. I. Miller, and X. Sala-i-Martin, 2000, “Determinants of Long-Term Growth: A Bayesian Averaging of Classical Estimates (BACE) Approach,” Working Paper No. 7750 (Cambridge, Massachusetts: National Bureau of Economic Research).

Ghura, D., C. Leite, and C. Tsangarides, 2002, “Is Growth Enough? Macroeconomic Policy and Poverty Reduction,” IMF Working Paper No. 02/118 (Washington: International Monetary Fund).

Golan, A., G. Judge, and D. Miller, 1996, Maximum Entropy Econometrics: Robust Estimation with Limited Data (Chichester: Wiley & Sons).

Hansen, L. P., 1982, “Large Sample Properties of Generalized Method of Moments Estimators,” Econometrica, Vol. 50 (July), pp. 1029-54.

Hoeting, J. A., and others, 1999, “Bayesian Model Averaging: A Tutorial,” Statistical Science, Vol. 14 (no. 4), pp. 382-417.

Inoue, A., 2001, “A Bayesian Method of Moments in Large Samples,” Working Paper (Raleigh: North Carolina State University).

Kass, R., and A. Raftery, 1995, “Bayes Factors,” Journal of the American Statistical Association, Vol. 90 (no. 430), pp. 773-95.

Kim, J. Y., 2000, “The Generalized Method of Moments in the Bayesian Framework and a Model and Moment Selection Criterion,” Working Paper (Albany: State University of New York).

Kim, J. Y., 2001, “Bayesian Limited Information Analysis in the GMM Framework,” Working Paper (Albany: State University of New York).

Kim, J. Y., 2002, “Limited Information Likelihood and Bayesian Analysis,” Journal of Econometrics, Vol. 107 (March), pp. 175-93.

Leamer, E., 1978, Specification Searches: Ad Hoc Inference with Non-experimental Data (New York: Wiley & Sons).

Madigan, D., and A. Raftery, 1994, “Model Selection and Accounting for Model Uncertainty in Graphical Models Using Occam’s Window,” Journal of the American Statistical Association, Vol. 89, pp. 1535-46.

Newey, W., and K. West, 1987, “Hypothesis Testing with Efficient Method of Moments Estimation,” International Economic Review, Vol. 28 (no. 3), pp. 777-87.

Raftery, A. E., 1994, “Bayesian Model Selection in Social Research,” in Sociological Methodology, ed. by Peter V. Marsden (Cambridge, Massachusetts: Blackwells), pp. 111-96.

Raftery, A. E., 1996, “Approximate Bayes Factors and Accounting for Model Uncertainty in Generalized Linear Models,” Biometrika, Vol. 83, pp. 251-66.

Tierney, L., and J. B. Kadane, 1986, “Accurate Approximations for Posterior Moments and Marginal Densities,” Journal of the American Statistical Association, Vol. 81, pp. 82-86.

White, H., 1980, “A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity,” Econometrica, Vol. 48 (May), pp. 817-38.

White, H., 1982, “Maximum Likelihood Estimation of Misspecified Models,” Econometrica, Vol. 50 (January), pp. 1-25.

Zellner, A., 1996, “Bayesian Method of Moments/Instrumental Variable (BMOM/IV) Analysis of Mean and Regression Model,” in Modeling and Prediction Honoring Seymour Geisser, ed. by J. C. Lee, W. C. Johnson, and A. Zellner (New York: Springer), pp. 61-74.

Zellner, A., 1997, “The Bayesian Method of Moments (BMOM): Theory and Application,” in Advances in Econometrics, Vol. 12, ed. by T. Fomby and R. C. Hill (Greenwich, Connecticut: JAI Press), pp. 85-105.

^{}1

This paper is a revised version of the first essay of my Ph.D. dissertation. I thank Mike Bradley, Bob Phillips, and Fred Joutz for their guidance, advice and support. Any remaining errors are my responsibility. Financial support from the Economic Club of Washington is gratefully acknowledged.

^{}2

To the extent that the lagged values of the regressors are valid instruments, this GMM estimator addresses consistently and efficiently both sources of bias.

^{}4

Madigan and Raftery (1994) show that BMA provides optimal predictive ability. Hoeting and others (1999) summarize recent work using BMA. Brock and Durlauf (2001) provide an accessible explanation of criticisms levied at growth empirics and the contribution of Bayesian analysis in dealing with model uncertainty.

^{}5

In such a case, one may use quasi-Maximum Likelihood estimation which does not sacrifice consistency. However, consistency may be an issue for nonlinear models estimated with Maximum Likelihood.

^{}6

The measurable space

^{}7

Assumption (A3) is described here for completeness although it is used later for Lemma 1 as well as the derivation of the asymptotic normality of the posterior.

^{}9

As Golan, Judge, and Miller (1996) show, in the entropy approach, estimators are chosen to maximize entropy or minimize some distance metric between the true probability measure and artificial probability measures for which the moment condition in question is satisfied. Hence, this approach does not require knowledge of the functional form of the likelihood. Using the Bayesian counterpart of this approach, one can obtain finite-sample post-data moments and distribution of the parameters and conduct post-data inference (e.g. see Zellner 1996 and 1997). Many traditional estimators are special cases of entropy.

^{}10

It is the “true” posterior in the sense that it is obtained from the true likelihood of *θ*, or it is a posterior of *θ* that contains a richer set of information than that in the limited information posterior discussed in this paper.

^{}13

Madigan and Raftery (1994) note that the BMA approach provides the optimal predictive ability. Hoeting and others (1999) summarize recent work using BMA.

^{}14

Similar to (21), *p*(*M*_{1}|X_{T}) + *p*(*M*_{2}|X_{T}) = 1.

^{}15

Tierney and Kadane (1986) show that the error in (27) is *O*(*n*^{−1}) so that *nO*(*n*^{−1}) → constant as *n* → ∞.

^{}16

Note that in (29) the terms *O*(1) or less.

^{}17

For example, the summation and implicit integrations in (39) below may be difficult to compute. Proposed solutions to this problem are novel Markov Chain Monte Carlo techniques such as the MC3 sampler first used by Madigan and York (1995) or averaging over a subset of models that are supported by the data such as the Occam’s window method of Madigan and Raftery (1994).

^{}18

In the last section we will discuss how our proposed methodology addresses some of the weaknesses of the BMA, and how it compares to the approaches of Brock and Durlauf (2001) and Doppelhofer, Miller, and Sala-i-Martin (2000).