The Use of Encompassing Tests for Forecast Combinations
  • International Monetary Fund


The paper proposes an algorithm that uses forecast encompassing tests for combining forecasts. The algorithm excludes a forecast from the combination if it is encompassed by another forecast. To assess the usefulness of this approach, an extensive empirical analysis is undertaken using a U.S. macroeconomic data set. The results are encouraging as the algorithm forecasts outperform benchmark model forecasts, in a mean square error (MSE) sense, in a majority of cases.



I. Introduction

The concepts of forecast combination and forecast encompassing are closely related. Often, there are several competing forecasts of a target variable that come from different sources or models. One way to utilize all the competing forecasts is to merge them into a single, combined forecast. Forecast encompassing tests are used to determine whether one of a pair of forecasts contains all the useful information for prediction. If this is not the case and both forecasts instead contain some incremental information, there is potential to form a combined forecast that blends the useful information of the two (or more) forecasts.

This paper proposes an algorithm that uses encompassing tests to combine forecasts. The algorithm is based on a simple idea: if a forecast is encompassed by another, it is excluded from the combination. Once all the encompassed forecasts are eliminated, the combined forecast is obtained by taking an arithmetic average. To assess the usefulness of this approach, an extensive empirical analysis is undertaken using a U.S. macroeconomic data set. There are 110 target variables to be forecast, and for each of the target variables, about 70 to 150 alternative forecasts to be used in combinations. For each of the 110 cases, the combined forecast obtained with the algorithm is compared with a benchmark model. The results are encouraging as the algorithm forecasts outperform benchmark model forecasts, in an MSE sense, in a majority of cases.

The approach proposed here differs from most of the literature on forecast combinations in that it suggests reducing the number of available forecasts before combining them. Much of the forecast combination literature concentrates on how to optimally combine a given set of forecasts.2 A straightforward approach is to use linear regressions to obtain weights, as proposed by Granger and Ramanathan (1984). This could be done by regressing the target variable on the set of all available forecasts, and then using the regression coefficients as weights. Gupta and Wilton (1987) propose nonparametric methods for calculating weights. The possibility of structural change or state-dependent relationships led researchers to develop time-varying combination methods; see Diebold and Pauly (1987) and Zellner, Hong, and Min (1991). Finally, Diebold and Pauly (1990) propose Bayesian shrinkage methods. Most of these methods take the number of alternative forecasts as given, and then seek to find the best weights for combining them. Although weights of some forecasts can be (close to) zero, generally all forecasts enter the combination.

Methods that choose among alternative forecasts before combining them could enhance predictive power. Examples along this line of research include Schmittlein, Kim, and Morrison (1990) and Swanson and Zeng (2001). Schmittlein, Kim, and Morrison propose using the Akaike Information Criterion (AIC) for combining forecasts, and provide Monte Carlo evidence on the usefulness of their approach. Swanson and Zeng consider other model selection approaches in addition to AIC, and also provide a real-time empirical analysis. They find that an approach based on the Schwarz Information Criterion (SIC) may provide a useful alternative to other forecast combination methods. Note that both papers examine model selection approaches to reduce the dimension of the forecast combination, rather than using statistical testing. Interestingly, although forecast combinations and the forecast encompassing principle are closely related, the latter is rarely used for combining forecasts, but rather for evaluating them.

II. The Encompassing Principle

Let f1 and f2 be two alternative sets of forecasts of a variable. Assume that one of the two sets, say f1, performs better by some criterion, say Root Mean Square Error (RMSE). The idea behind forecast combinations is that the poorly-performing forecast may provide some marginal information that is not contained in the better forecast. In such a case, the combined forecast will perform better than either forecast alone. However, if the poorly-performing f2 contains no useful marginal information, then it is said that f1 encompasses f2.

To formally test for forecast encompassing, I will use the Harvey, Leybourne, and Newbold (HLN, 1998) test. The HLN (1998) test is based on the well-known forecast evaluation test developed by Diebold and Mariano (DM, 1995). The DM test is used to test for equal predictive ability of two competing forecasts. It considers a sample of loss differential series $d_t$, defined as $d_t = L(e_{1t}) - L(e_{2t})$, where $L(\cdot)$ is some arbitrary loss function, such as squared-error loss; $e_{it}$ is the $\tau$-step-ahead forecast error of model $i$; $i = 1, 2$; and $t = 1, \ldots, T$. Equal predictive accuracy amounts to $E(d_t) = 0$, and the test is based on the observed sample mean $\bar{d} = (1/T)\sum_{t=1}^{T} d_t$. Assuming covariance stationarity of the loss differential series, the DM test has an asymptotic standard normal distribution under the null hypothesis of equal predictive accuracy. The test statistic is as follows:

$$\mathrm{DM} = \frac{\bar{d}}{\sqrt{\hat{V}(\bar{d})}},$$

where $\hat{V}(\bar{d})$ is a consistent estimate of the asymptotic variance of $\bar{d}$. Assuming that $\tau$-step-ahead forecasts exhibit dependence up to order $\tau - 1$, it is obtained as:

$$\hat{V}(\bar{d}) = \frac{1}{T}\left[\gamma_0 + 2\sum_{k=1}^{\tau-1}\gamma_k\right],$$

where $\gamma_k$ is the $k$th autocovariance of $d_t$, estimated by $\hat{\gamma}_k = T^{-1}\sum_{t=k+1}^{T}(d_t - \bar{d})(d_{t-k} - \bar{d})$.

HLN (1997) assess the performance of the DM test using Monte Carlo simulations. The authors recommend two simple modifications to the DM test that would improve the power of the test in small samples, resulting in the modified version of DM (MDM): (i) compare the test statistic with critical values from the Student’s t-distribution with $T - 1$ degrees of freedom, instead of the standard normal, so as to reduce size distortions; and (ii) modify the test statistic as $\mathrm{MDM} = T^{-1/2}\,[T + 1 - 2\tau + T^{-1}\tau(\tau - 1)]^{1/2}\,\mathrm{DM}$.

The HLN forecast encompassing test is closely related to the DM test. It is simply obtained by modifying $d_t$ to $d_t = (e_{i,t} - e_{j,t})\,e_{i,t}$. The null hypothesis is that model i forecast encompasses model j. That is, all the relevant information of model j is contained in model i.
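
As a rough illustration of the statistics described in this section, the following Python sketch computes the MDM form of the HLN statistic and a one-sided p-value from the Student's t-distribution with T − 1 degrees of freedom. The function names (`t_upper_tail`, `hln_test`), the one-sided rejection region, the Simpson's-rule integration of the t density, and the toy error series are assumptions made for this example; they are not taken from the paper.

```python
import math

def t_upper_tail(x, df, steps=2000):
    """P(T > x) for Student's t with df degrees of freedom (Simpson's rule)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(x)
    h = a / steps
    area = pdf(0.0) + pdf(a)
    for k in range(1, steps):
        area += (4 if k % 2 else 2) * pdf(k * h)
    tail = max(0.5 - area * h / 3, 0.0)  # probability mass beyond |x| on one side
    return tail if x >= 0 else 1.0 - tail

def hln_test(e_i, e_j, horizon=1):
    """MDM statistic and one-sided p-value for H0: model i encompasses model j."""
    T = len(e_i)
    d = [(a - b) * a for a, b in zip(e_i, e_j)]  # d_t = (e_it - e_jt) * e_it
    d_bar = sum(d) / T

    def gamma(k):
        # k-th sample autocovariance of d_t
        return sum((d[t] - d_bar) * (d[t - k] - d_bar) for t in range(k, T)) / T

    var = (gamma(0) + 2 * sum(gamma(k) for k in range(1, horizon))) / T
    if var <= 0:
        raise ValueError("non-positive variance estimate")
    dm = d_bar / math.sqrt(var)
    mdm = math.sqrt((T + 1 - 2 * horizon + horizon * (horizon - 1) / T) / T) * dm
    return mdm, t_upper_tail(mdm, T - 1)

# Toy data (hypothetical): model j's error equals model i's error plus noise
# that is orthogonal to it, so model i should encompass model j but not vice versa.
e_i = [1, -1, 1, -1, 1, -1, 1, -1]
noise = [1, 1, -1, -1, 1, 1, -1, -1]
e_j = [a + b for a, b in zip(e_i, noise)]

stat_ij, p_ij = hln_test(e_i, e_j)  # H0: i encompasses j
stat_ji, p_ji = hln_test(e_j, e_i)  # H0: j encompasses i
```

In this toy case the first test gives no evidence against the null, while the second rejects it at the 5 percent level.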

III. Data and Out-of-Sample Forecasts

The data set used is from Marcellino, Stock and Watson (2006), and consists of 172 macroeconomic series from 1959:1 through 2002:4. (Most data are monthly, but converted to quarterly; the Data Appendix provides a detailed description.) Forecast combinations are produced for 110 of these series called target variables; series with a short time span are not forecast. For each variable to be forecast, other series in the data set are used as predictors. Simple models in the form of bivariate regressions are used to produce forecasts with each of these alternative predictors. Forecasts of the alternative models are then used in combinations.

The bivariate linear regressions used to obtain pseudo out-of-sample forecasts are of the form:


where $Y_t$ is the target variable to be forecast, $X_i$ is a predictor, and $e_t$ is an error term. The lags of the variables X and Y are chosen using the SIC, where 1 ≤ p ≤ 4 and 0 ≤ q ≤ 4. SIC is calculated as $\mathrm{SIC} = n \ln\left(\frac{1}{n}\sum_{t=1}^{n}\varepsilon_t^2\right) + k \ln n$, where n is the number of observations, k is the number of regressors including the constant, and $\varepsilon_t$ are the regression residuals. Regressions of this type are run for all available predictors X in the data set. In order to simulate real-time forecasting, models are re-estimated at each period.
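
The SIC formula above can be illustrated with a short Python sketch. It assumes the residuals of each candidate lag specification have already been obtained and that all candidates are compared on the same sample; the names `sic` and `pick_by_sic`, the `(p, q)` keys, and the toy residuals are hypothetical.

```python
import math

def sic(residuals, k):
    """SIC = n * ln(mean squared residual) + k * ln(n)."""
    n = len(residuals)
    return n * math.log(sum(e * e for e in residuals) / n) + k * math.log(n)

def pick_by_sic(candidates):
    """candidates maps a lag pair (p, q) to (residuals, number_of_regressors)."""
    return min(candidates, key=lambda pq: sic(*candidates[pq]))

# Toy residuals (hypothetical): the richer model fits slightly better,
# but not by enough to offset the penalty for two extra regressors.
candidates = {
    (1, 0): ([1, -1, 1, -1, 1, -1, 1, -1], 2),
    (2, 1): ([0.9, -0.9, 0.9, -0.9, 0.9, -0.9, 0.9, -0.9], 4),
}
chosen = pick_by_sic(candidates)
```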

The first estimation sample used to construct a set of forecasts is from the first quarter of 1959 to the last quarter of 1969, providing out-of-sample forecasts for the first quarter of 1970. I then expand the sample by one period, re-estimate the models, and produce forecasts for the second quarter of 1970, and so forth, ending with a final forecast for the fourth quarter of 2002. The maximum number of out-of-sample forecasts across time with this data set is 131.

To forecast price variables, all other series are used as predictors; however, real variables are not forecast using price variables. As a consequence, the number of forecasts used in the combination at a point in time depends on whether the target is a real or a price variable. When the target is a price variable, the number of forecasts available at each period is close to 171, as all available predictors are used for forecasting price series. The exact number will be less than 171 as some series do not go back to 1959, and some were discontinued. Since price series are excluded from forecasting real series, the number of forecasts in that case is about 70.

To obtain the benchmark forecast, the simple arithmetic average of all available forecasts is taken. As demonstrated in many studies, simple averaging works very well in many applications and is extremely difficult to beat consistently even with sophisticated models; hence, it is a natural benchmark.3 This benchmark forecast combination is called the AVE forecast. The other forecast combination, also a simple average, but obtained after the number of forecasts in the combination is reduced by employing the encompassing algorithm, is called the EAL forecast (after the encompassing algorithm).

IV. The Encompassing Algorithm

For each target variable, there are several alternative models, each corresponding to a different series in the data set. Each model produces a set of pseudo out-of-sample forecasts, starting from the first quarter of 1970 to the end of the sample. The algorithm uses pseudo out-of-sample forecasts as inputs. The idea is to compare all available model forecasts with each other using encompassing tests, eliminate those that are encompassed by others, and take the average of the remaining forecasts. The comparisons are done bilaterally, using the HLN (1998) forecast encompassing test. Once the pseudo out-of-sample forecasts are obtained, the following steps are taken to eliminate forecasts encompassed by others, and obtain the EAL combination.

Step 1. Start at time t and calculate the RMSE of the out-of-sample forecasts for each model using out-of-sample forecasts up to time t-1, and realized values. Rank the models according to their past performance based on RMSE.

Step 2. Pick the best model (i.e., model with the lowest RMSE), and test sequentially whether the best model forecast encompasses other models, using the HLN test. If the best model encompasses the alternative model at some significance level α, delete the alternative model from the list of models.

Step 3. Repeat Step 2, with the second best model. Note that the list of models now contains only those that are not encompassed by the best model, and the best model.

Step 4 and beyond. Continue with the third best model, and so on, until no encompassed model remains in the list.

Last step: Calculate the EAL forecast by taking the average of all remaining models’ forecasts.
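
The steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the author's implementation: the encompassing decision is passed in as a callable (in practice, one would declare that model i encompasses model j when the HLN test fails to reject at level α), each surviving model is tested only against lower-ranked survivors, and `toy_encompasses` is a hypothetical stand-in rule used only to make the example run.

```python
import math

def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def eal_select(past_errors, encompasses):
    """Return the models that survive the sequential elimination.

    past_errors: dict mapping model name -> list of past forecast errors.
    encompasses(e_i, e_j): True if model i is judged to encompass model j.
    """
    # Step 1: rank models from lowest to highest historical RMSE.
    survivors = sorted(past_errors, key=lambda m: rmse(past_errors[m]))
    pos = 0
    # Steps 2 onward: each surviving model, in rank order, eliminates the
    # lower-ranked models that it encompasses.
    while pos < len(survivors):
        leader = survivors[pos]
        kept = [m for m in survivors[pos + 1:]
                if not encompasses(past_errors[leader], past_errors[m])]
        survivors = survivors[:pos + 1] + kept
        pos += 1
    return survivors

def eal_forecast(new_forecasts, survivors):
    """Last step: simple average of the surviving models' forecasts."""
    return sum(new_forecasts[m] for m in survivors) / len(survivors)

def toy_encompasses(e_i, e_j, tol=0.1):
    # Hypothetical stand-in for the HLN test: "encompasses" if the mean of
    # d_t = (e_i - e_j) * e_i is not clearly positive.
    d_bar = sum((a - b) * a for a, b in zip(e_i, e_j)) / len(e_i)
    return d_bar <= tol

past_errors = {
    "A": [1, -1, 1, -1, 1, -1, 1, -1],
    "B": [2, 0, 0, -2, 2, 0, 0, -2],                    # A's error plus orthogonal noise
    "C": [-1.5, 1.5, -1.5, 1.5, -1.5, 1.5, -1.5, 1.5],  # errors opposite in sign to A's
}
survivors = eal_select(past_errors, toy_encompasses)
combined = eal_forecast({"A": 1.0, "B": 5.0, "C": 3.0}, survivors)
```

In this toy example, model B adds only noise to model A and is dropped, while model C carries offsetting information and stays in the combination.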

There are several issues to consider in applications. First, an initial set of out-of-sample forecasts is required in order to apply the HLN (1998) test. One option is to use all available forecasts prior to the date the forecast is being produced. An alternative would be to choose a rolling window of a fixed number of observations. I will present results for all available forecasts and the last 20 forecasts, denoted by m = all, 20, respectively. To start the applications, out-of-sample forecasts before 1980 are used as inputs, and then the sample is expanded by one quarter to simulate real-time forecasting. Thus, the comparisons of forecast combinations are based on post-1980 data. Note that some series do not go back to 1959, and there is a need to accumulate some data points before including the series in combinations. I impose the condition that there should be at least 30 observations for a variable to be included. So effectively, the window size is 30 in most cases. I then experiment with a window size of 40, which does not affect the conclusions of the paper.

The second concern is the choice of the significance level α of the HLN test. I will experiment with a range of significance levels (α = 0.01, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45) to explore whether a subrange emerges that maximizes forecasting power. Finally, there is the problem of outliers. To tackle this issue, forecasts that are more than five standard deviations away from the target variable are considered extreme and removed from combinations.

The proposed algorithm has some desirable properties. A common problem facing modelers attempting to utilize large data sets is that in many cases the number of variables in the data set exceeds the number of observations. In those cases, it is difficult or impossible to use conventional model selection techniques, such as linear regression techniques. The method proposed in this study can easily deal with this problem. A large number of alternative forecasts should not pose any major problems, except for computing time. Another advantage of the algorithm is that it does not require knowledge of the models that produce alternative forecasts; all that is needed is the set of forecasts.

V. Results

A. Does the Algorithm Work?

I compare the models by looking at the relative forecasting performance based on RMSE. The benchmark for each target variable is the AVE model. RMSEs are calculated both for the benchmark model and the algorithm forecasts. The ratio $\mathrm{RMSE}_{EAL}/\mathrm{RMSE}_{AVE}$ gives a scale-free metric, where a ratio less than one indicates that the algorithm forecast outperforms simple averaging.
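
The relative RMSE can be computed as in the following Python sketch; the function names and the toy series are hypothetical and serve only to illustrate the ratio.

```python
import math

def rmse(forecasts, actuals):
    return math.sqrt(
        sum((f - a) ** 2 for f, a in zip(forecasts, actuals)) / len(actuals)
    )

def relative_rmse(eal, ave, actuals):
    """RMSE of the EAL forecasts divided by RMSE of the AVE forecasts."""
    return rmse(eal, actuals) / rmse(ave, actuals)

# Toy series (hypothetical): AVE misses by 1.0 each period, EAL by 0.5.
actuals = [1.0, 2.0, 3.0, 4.0]
ave = [2.0, 3.0, 4.0, 5.0]
eal = [1.5, 2.5, 3.5, 4.5]
ratio = relative_rmse(eal, ave, actuals)
```

Here the ratio is below one, so the toy EAL forecast would count as outperforming the simple average.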

Table 2 reports the averages of the relative RMSEs for the 110 target variables for each significance level and two different training sample sizes. The results suggest that when m = all and the significance level is greater than 0.15, the EAL forecasts provide some modest gains compared to the AVE forecasts.

Visual illustrations of the distribution of relative RMSE are provided in Figures 1a-d. The figures present results for m = all and α = 0.15, 0.25, 0.35, and 0.45, respectively. In the figures, the vertical axis shows the relative RMSEs, while the horizontal axis contains the 110 target variables to be forecast, sorted from the lowest relative RMSE to the highest. Thus the first variable in the horizontal axis is the target variable for which the EAL forecast provides the highest RMSE gains compared to the AVE method.

Figure 1. Relative Performance of Algorithm Forecasts

Citation: IMF Working Papers 2007, 264; 10.5089/9781451868272.001.A001

At α = 0.25 and α = 0.35, percentage RMSE gains arising from employing the EAL are over 10 percent for about half of the cases, with the highest gains of over 20 percent. When α = 0.45, the gains are smaller, which is not surprising; at high significance levels, fewer variables are eliminated from the combination and the resulting combined forecasts are closer to the simple average forecast. In contrast, at the low significance levels, very few variables remain in the combination and the forecast benefits less from the advantages of combining. The next section analyzes this tradeoff and presents results on the choice of the significance level.

B. Which Significance Level to Use?

Further results for comparing the forecasting performance of the algorithm at different significance levels are provided in Table 2. I start by looking at simple rankings. First, for each target variable I rank the 10 significance levels examined according to the RMSE of the resulting forecast. For example, if the forecast with the smallest RMSE is obtained using significance level 0.25, and the second smallest using 0.30, then 0.25 and 0.30 are ranked 1 and 2, respectively. This is done for all 110 target variables; the ranks are then summed, with the smallest sum indicating the best-performing significance level.
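
The rank-sum comparison described above can be sketched in Python as follows. The function name `rank_sums` and the toy RMSE values are hypothetical, and ties are broken by input order, which is an assumption of this sketch.

```python
def rank_sums(rmse_by_target):
    """Sum, over targets, the rank of each significance level (1 = lowest RMSE).

    rmse_by_target: dict mapping target name -> {alpha: RMSE at that alpha}.
    """
    totals = {}
    for rmse_by_alpha in rmse_by_target.values():
        ordered = sorted(rmse_by_alpha, key=rmse_by_alpha.get)
        for rank, alpha in enumerate(ordered, start=1):
            totals[alpha] = totals.get(alpha, 0) + rank
    return totals

# Toy relative RMSEs (hypothetical) for three targets and three levels.
toy = {
    "target_1": {0.25: 0.95, 0.30: 0.97, 0.35: 0.99},
    "target_2": {0.25: 1.02, 0.30: 0.98, 0.35: 0.96},
    "target_3": {0.25: 0.99, 0.30: 0.97, 0.35: 0.94},
}
totals = rank_sums(toy)
best_alpha = min(totals, key=totals.get)  # smallest rank sum
```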

The results vary across the two training sample sizes. When a larger sample is used, the best results are obtained when the significance level is in the 0.25 − 0.40 range. Although the differences are marginal, 0.35 gives the best average result. When the training window size is 20, the best results are obtained at higher significance levels.

The average number of remaining variables is also presented in Table 2. As expected, the number of nonencompassed variables increases with the significance level of the HLN test. At the low significance level of 0.001, usually only one variable remains, which is the one with the best historical RMSE performance. With α = 0.10 there are about seven nonencompassed variables on average and the mean relative RMSE is equal to one, suggesting that as few as seven variables can perform as well as the larger data set. The best performance is achieved when α = 0.35 and the number of variables is 25. It is also interesting to note that with a fixed window of m = 20, a mean relative RMSE of less than or equal to one is achieved when the significance level is greater than 0.25, corresponding to six or more variables in the forecast combination.

VI. Comparisons With Other Methods

This section compares the proposed method with other well-known forecast combination methods. The following models are used to generate alternative combined forecasts:

RMSE-weighted combinations:

Define the ith forecasting model’s RMSE at time t as $\mathrm{RMSE}_{t,i} = \left[(1/n)\sum_{j=1}^{n} e_{t-j,i}^{2}\right]^{1/2}$, which is computed over a window of n observations. Then the RMSE-weighted combined forecast is:


Rank-weighted combinations:

Define $R_{t,i}$ as the rank of the ith model based on its historical RMSE performance up to time t. As suggested in Aiolfi and Timmermann (2006), the weights of the combination can be calculated as:


Combinations based on ranks have the advantage that ranks are not very sensitive to outliers so the estimated weights can be robust.
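
The two weighting schemes described above can be illustrated with a short Python sketch. The weight formulas themselves are not reproduced in this text, so the sketch assumes two common choices: weights proportional to the inverse of each model's RMSE, and weights proportional to the inverse of each model's rank. Function names and toy numbers are hypothetical.

```python
def inverse_rmse_weights(rmse_by_model):
    """Assumed scheme: weight proportional to 1 / RMSE, normalized to sum to one."""
    inv = {m: 1.0 / r for m, r in rmse_by_model.items()}
    total = sum(inv.values())
    return {m: v / total for m, v in inv.items()}

def inverse_rank_weights(rmse_by_model):
    """Assumed scheme: weight proportional to 1 / rank (rank 1 = lowest RMSE)."""
    ordered = sorted(rmse_by_model, key=rmse_by_model.get)
    inv = {m: 1.0 / rank for rank, m in enumerate(ordered, start=1)}
    total = sum(inv.values())
    return {m: v / total for m, v in inv.items()}

def combine(forecasts, weights):
    return sum(weights[m] * forecasts[m] for m in weights)

# Toy inputs (hypothetical).
rmse_by_model = {"A": 1.0, "B": 2.0, "C": 4.0}
forecasts = {"A": 1.0, "B": 2.0, "C": 3.0}

w_rmse = inverse_rmse_weights(rmse_by_model)
w_rank = inverse_rank_weights(rmse_by_model)
f_rank = combine(forecasts, w_rank)
```

Because the rank weights depend only on the ordering of the models, changing model C's RMSE from 4.0 to 400.0 would leave `w_rank` unchanged while shifting `w_rmse`, which illustrates the robustness property noted above.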

The thick-modeling approach:

Granger and Jeon (2004) advocate the thick modeling approach, where the top x percent of the best performers are kept in the forecast combination. There are no theory-based guidelines on how to choose x. The authors state that thick modeling is more of a “pragmatic folk-view than anything based on clear theory.” Researchers usually exclude an arbitrary portion of the worst performing forecasts. Despite that, the method works very well in empirical applications. In this study I have experimented with the following numbers for x: 1, 5, 10, 20, 30, …, 90. The results not only allow for comparisons with the combination strategy proposed in this study, but also provide some empirical evidence on the choice of x.
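
A minimal Python sketch of the thick-modeling rule follows. It assumes that the number of retained models is rounded up and that at least one model is always kept; these rounding choices, the function name, and the toy inputs are assumptions of the example.

```python
import math

def thick_forecast(rmse_by_model, forecasts, top_percent):
    """Average the forecasts of the top x percent of models by historical RMSE."""
    ordered = sorted(rmse_by_model, key=rmse_by_model.get)
    keep = max(1, math.ceil(len(ordered) * top_percent / 100))
    kept = ordered[:keep]
    return sum(forecasts[m] for m in kept) / keep, kept

# Toy inputs (hypothetical): ten models, m0 has the lowest RMSE.
rmse_by_model = {f"m{i}": 1.0 + i for i in range(10)}
forecasts = {f"m{i}": float(i) for i in range(10)}

f20, kept20 = thick_forecast(rmse_by_model, forecasts, 20)
f1, kept1 = thick_forecast(rmse_by_model, forecasts, 1)
```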

Combinations before and after the algorithm:

The above-mentioned combination techniques are used both before and after the algorithm is applied. Doing that can improve the performance as some of the problems associated with large data sets may be removed after EAL filtering. However, since comparisons with other models would be difficult to present for all significance levels, I will focus on the models that provide the best results in the previous section, that is, α = 0.35. This will also ensure that enough observations are included in the thick modeling approach, which cannot be meaningfully applied with only a few observations.

Factor models:

In addition to various forecast combination techniques, factor models are also included in forecast comparisons. These models, advocated by Stock and Watson in a series of papers in recent years,4 are shown to be fairly successful models for forecasting with large data sets. An illustration of factor models is as follows. Assume at time t we have a large data set, denoted Xt, and there is some common information in this data set which can be represented by a few unobserved factors Ft. The dynamic factor representation is as follows:


where $\Lambda_t$ is the factor loading matrix, $y_t$ is the stationary time series to be forecast, and $\beta_t$ is a vector of parameters that defines the relationship between the factors and the variable to be forecast.

Stock and Watson (2002) show that the factors can be conveniently constructed as the principal components of $X_t$. Once the factors are estimated, the forecasting exercise is similar to standard linear prediction. Coefficients $\beta_t$ are estimated by regressing $y_t$ onto $F_t$, and then the forecast is formed as $\hat{y}_{t+1} = \hat{\beta}_t' F_t$. One practical problem is the determination of the number of factors to use in the forecasting exercise. Stock and Watson (2002) propose using information criteria to choose the optimum number of factors. I will experiment with three different approaches: only using the first principal component, and choosing the number of factors using AIC and Bayesian Information Criterion, where the maximum number of factors considered is four.
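
The single-factor variant can be sketched in pure Python as follows. This is an illustration under several assumptions rather than the procedure used in the paper: the first principal component is obtained by power iteration on the cross-product matrix of the demeaned data, the forecasting regression has no intercept and no lags, and all names and toy data are hypothetical.

```python
import math

def first_factor(X, iters=200):
    """Scores of the first principal component of a T x N data matrix."""
    T, N = len(X), len(X[0])
    means = [sum(row[n] for row in X) / T for n in range(N)]
    Z = [[row[n] - means[n] for n in range(N)] for row in X]
    # N x N cross-product matrix Z'Z.
    S = [[sum(z[a] * z[b] for z in Z) for b in range(N)] for a in range(N)]
    v = [1.0] * N
    for _ in range(iters):  # power iteration toward the leading eigenvector
        w = [sum(S[a][b] * v[b] for b in range(N)) for a in range(N)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [sum(z[n] * v[n] for n in range(N)) for z in Z]

def factor_forecast(X, y):
    """One-step-ahead forecast: regress y_{t+1} on F_t, then apply to the last F_t."""
    F = first_factor(X)
    beta = (sum(f * y_next for f, y_next in zip(F[:-1], y[1:]))
            / sum(f * f for f in F[:-1]))
    return beta * F[-1]

# Toy data (hypothetical): three series driven by one common factor f,
# and a target that satisfies y_{t+1} = 0.5 * f_t exactly.
f = [1.0, -2.0, 3.0, -1.0, 2.0, -3.0]
loadings = [1.0, 2.0, -1.0]
X = [[lam * ft for lam in loadings] for ft in f]
y = [0.0] + [0.5 * ft for ft in f[:-1]]

forecast = factor_forecast(X, y)
```

The sign of a principal component is arbitrary, but the forecast is unaffected because the estimated coefficient changes sign along with the factor.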


First, I calculate combined forecasts for each of the 110 target variables using 35 alternative models. Altogether about 90 forecasts are produced with each model for each target variable, and the RMSE (as well as the mean absolute error, MAD) is calculated relative to the benchmark model. As before, a ratio of less than one indicates that the alternative model provides forecast gains. Finally, I calculate the average relative RMSE (MAD) for each forecasting method. That is, I provide 110 different experiments with various models. In Table 3, the models are ranked from best to worst and the average relative RMSE and MAD are reported. As can be seen from the table, in many cases the differences in RMSE and MAD between models are small, thus the comparisons are only suggestive. In some cases, however, the results are stronger, such as the poor performance of thick models with only a small percentage of the best performers, or the factor models.

For all models except for the thick modeling approach, there is evidence suggesting that post-algorithm forecasts are better than pre-algorithm forecasts. More specifically, this is true for the RMSE-weighted, rank-weighted, mean, and median combinations. The evidence on the thick modeling approach is mixed. The best overall results are achieved by pre-algorithm thick models, where 20–40 percent of the best models are retained in the combinations. The best post-algorithm results are obtained with 80 to 100 percent of forecasts.

Combinations with only a few forecasts, such as the top 1−10 percent, do not perform well on average. Moreover, such strategies deliver a high variance in performance. For example, the highest overall RMSE gains are obtained when the strategy is to choose the historically best model, giving us a reduction in RMSE close to 20 percent. At the same time, the worst performance is also obtained using the same model. These results once again warn against relying too much on best performances in the past, which are not necessarily repeated in the future.

Forecast combinations generally outperform factor forecasts. Compared to simple averaging, the average RMSE loss from using factor models is about 4−6 percent, depending on the model selection mechanism used for factor models. A comparison of the performance of model choice strategies for factor models suggests that using a single factor with lags almost always outperforms models with more than one factor, but no lags.

VII. Conclusions

The algorithm proposed in this study is based on the link between forecast combinations and the encompassing principle. The aim is to reduce the dimension of the combination problem by testing before combining. The empirical analysis presented suggests that the algorithm provides forecast gains in an MSE sense, compared to the benchmark simple averaging method. The average gains arising from employing the algorithm amount to about a 1−2 percent reduction in RMSEs. Although the gains are not substantial, they should still be considered noteworthy, given the well-known difficulty of outperforming the simple averaging method in practice.

Perhaps more important than the marginal reductions in forecast errors is the ability of the algorithm to perform at least as well as the simple averaging method. In this way, the algorithm suggests a method to represent a larger data set with a smaller one. This could prove useful in some applications as it may be easier to monitor and understand the dynamics of a smaller number of variables. For example, the method could be used to identify leading indicators of a variable from a large data set.

Appendix I. Data

The following Appendix Table lists the time series used in the empirical analysis. The series were either taken directly from the DRI-McGraw Hill Basic Economics database, in which case the original mnemonics are used, or they were produced by the author’s calculations based on data from that database, in which case the author’s calculations and original DRI/McGraw series mnemonics are summarized in the data description field. Following the series name is a transformation code, the sample period for the data series, and a short data description. The transformations are (Lev) level of the series; (D) first difference; (Ln) logarithm of the series; (DLn) first difference of the logarithm. The following abbreviations appear in data descriptions: SA: seasonally adjusted; NSA: not seasonally adjusted; SAAR: seasonally adjusted at an annual rate; AC: authors’ calculations.

Table 1. Description of Data

Table 2. Relative Performance of Algorithm Forecasts 1/

1/ The relative RMSE indicates the ratio of the RMSE of the algorithm forecasts to that of the simple averaging method. The table reports averages across 110 different experiments. The rankings compare the performance of different significance levels; the significance level that gives the best results is ranked one. The significance level with the smallest sum of ranks across the 110 experiments performs the best. The last rows in panels indicate the number of models that remain in the combination after the algorithm is applied.

Table 3. A Comparison of Forecast Combination Methods 1/

1/ The relative MAD indicates the ratio of the MAD of a combination method to that of the simple averaging method; the relative RMSE is defined analogously. Methods are ranked from the best performing to the worst.


  • Aiolfi, M., and A. Timmermann, 2006, “Persistence of Forecasting Performance and Combination Strategies,” Journal of Econometrics, 135, 31–53.

  • Clemen, R.T., 1989, “Combining Economic Forecasts: A Review and Annotated Bibliography,” International Journal of Forecasting, 5, 559–81.

  • Diebold, F.X., and R. Mariano, 1995, “Comparing Predictive Accuracy,” Journal of Business and Economic Statistics, 13, 253–263.

  • Diebold, F.X., and P. Pauly, 1987, “Structural Change and the Combination of Forecasts,” Journal of Forecasting, 6, 21–40.

  • Diebold, F.X., and P. Pauly, 1990, “The Use of Prior Information in Forecast Combinations,” International Journal of Forecasting, 6, 503–508.

  • Granger, C.W.J., and Y. Jeon, 2004, “Thick Modeling,” Economic Modelling, 21, 323–343.

  • Granger, C.W.J., and R. Ramanathan, 1984, “Improved Methods of Combining Forecasts,” Journal of Forecasting, 3, 197–204.

  • Gupta, S., and P.C. Wilton, 1987, “Combination of Forecasts: An Extension,” Management Science, 33, 356–372.

  • Harvey, D.I., S.J. Leybourne, and P. Newbold, 1997, “Testing the Equality of Prediction Mean Squared Errors,” International Journal of Forecasting, 13, 281–91.

  • Harvey, D.I., S.J. Leybourne, and P. Newbold, 1998, “Tests for Forecast Encompassing,” Journal of Business and Economic Statistics, 16, 254–59.

  • Newbold, P., and D.I. Harvey, 2002, “Forecast Combination and Encompassing,” in M.P. Clements and D.F. Hendry (eds.), A Companion to Economic Forecasting, Oxford: Blackwells.

  • Marcellino, M., J.H. Stock, and M. Watson, 2006, “A Comparison of Direct and Iterated Multistep AR Methods for Forecasting Macroeconomic Time Series,” Journal of Econometrics, 135, 499–526.

  • McCracken, M.W., and K.D. West, 2002, “Inference About Predictive Ability,” in M.P. Clements and D.F. Hendry (eds.), A Companion to Economic Forecasting, Oxford: Basil Blackwell, pp. 299–321.

  • Schmittlein, D.C., J. Kim, and D.G. Morrison, 1990, “Combining Forecasts: Operational Adjustments to Theoretically Optimal Rules,” Management Science, 39, 1044–1056.

  • Stock, J.H., and M. Watson, 2002, “Forecasting Using Principal Components From a Large Number of Predictors,” Journal of the American Statistical Association, 97, 1167–1179.

  • Stock, J.H., and M. Watson, 2006, “Forecasting with Many Predictors,” in G. Elliott, C.W.J. Granger, and A. Timmermann (eds.), Handbook of Economic Forecasting, North-Holland: Elsevier, pp. 515–554.

  • Swanson, N.R., and T. Zeng, 2001, “Choosing Among Competing Econometric Forecasts: Regression-based Forecast Combination Using Model Selection,” Journal of Forecasting, 20, 425–440.

  • Timmermann, A., 2006, “Forecast Combinations,” in G. Elliott, C.W.J. Granger, and A. Timmermann (eds.), Handbook of Economic Forecasting, North-Holland: Elsevier, pp. 135–196.

  • Zellner, A., C. Hong, and C. Min, 1991, “Forecasting Turning Points in International Output Growth Rates Using Bayesian Exponentially Weighted Autoregression, Time-varying Parameter, and Pooling Techniques,” Journal of Econometrics, 49, 275–304.

I thank Chikako Baba, Oya Celasun, Robert P. Flood, John W. Galbraith, Mark Watson, and seminar participants at the 62nd European Meeting of the Econometric Society for valuable comments and suggestions. Address for correspondence: International Monetary Fund, 700 19th St. NW, Washington, D.C., 20431, U.S.A.; Tel.: 202-623-4158.


On the forecasting performance of simple averaging, see the surveys by Clemen (1989), Newbold and Harvey (2002), and Timmermann (2006).


See Stock and Watson (2006) for a survey.

Author: Turgut Kisinbay