An Evaluation of the World Economic Outlook Forecasts
Author:
Mr. Allan Timmermann

Abstract

This paper conducts a series of statistical tests to evaluate the quality of the World Economic Outlook (WEO) forecasts for a very large cross section of countries. It assesses whether forecasts were unbiased and informationally efficient, characterizes the process whereby WEO forecasts get revised as the predicted period draws closer, and compares the precision of the WEO forecasts to private sector forecasts known as “consensus forecasts” and published by Consensus Economics on a monthly basis. The results suggest that the performance of the WEO forecasts is similar to that of the consensus forecasts. IMF Staff Papers (2007) 54, 1–33. doi:10.1057/palgrave.imfsp.9450007

The World Economic Outlook (WEO) is a significant source of forecasts of global economic activity and is a key vehicle in the International Monetary Fund’s multilateral surveillance activities. It is published twice a year, in April and September. Given the central role of the WEO forecasts, it is important that they are periodically evaluated to assess their usefulness and to look for ways to improve the forecasting process. This study is the fourth in a series of such evaluations (following Artis, 1988 and 1997; and Barrionuevo, 1993). There are some notable differences between the current study and the earlier ones:

  • First, we analyze forecasts for 178 countries in seven economic regions (Africa, Central and Eastern Europe, the Commonwealth of Independent States (CIS) countries and Mongolia, Developing Asia, the Middle East, the Western Hemisphere, and the Advanced Economies) since 1990. Earlier evaluations had focused on forecasts for only the Group of Seven (G-7) countries and regional aggregates.

  • Second, we include an extensive comparison between the accuracy of WEO forecasts and consensus forecasts. The latter is a widely used source that compiles the forecasts of economists working in the private sector. Through this comparison, we assess WEO forecasts not just against absolute benchmarks, but also against a relative benchmark of other forecasters.

  • Third, we consider the revisions to the forecasts, both over time and within each forecast round. The latter matters because each forecast round involves a long gestation lag, so it is useful to know how much accuracy is gained from frequent forecast updates.

Our analysis focuses on the current-year and next-year WEO forecasts of real gross domestic product (GDP) growth and inflation. In the case of real GDP growth, we find that the WEO forecasts display a tendency toward systematic overprediction: on average, predicted growth tends to exceed actual growth. From a statistical perspective, these biases are most significant in the next-year forecasts. This tendency to overpredict growth is also persistent over time. Moreover, the evidence suggests that forecasts of U.S. GDP growth are positively and significantly correlated with current-year forecast errors of output growth in a substantial number of advanced economies. (The forecast of German GDP growth also has predictive power over output growth forecast errors in some regions.) Our analysis also finds that, in some cases, accuracy problems appear related to the standing WEO assumption that the output gap is eliminated after 5 years. In particular, the paper notes a predominantly negative relationship between the output gap and the GDP growth forecast error, most clearly for Germany, France, and Italy.

Turning to the inflation forecasts, we find a bias toward underprediction of inflation, with these biases significant in the next-year forecasts for many African, Central and Eastern European, and Western Hemisphere countries. The underprediction bias is generally found to be weaker in the current-year forecasts. With regard to their predictability, there is evidence that the next-year inflation forecast errors are often linked to U.S. GDP forecasts.

Prior to the publication of the WEO forecasts in April and September, a first set of predictions is presented to the IMF Executive Board in February and July. These forecasts are subsequently revised before publication, and the revisions add considerable informational value. For the February/April same-year forecasts, the forecast error is reduced on average by about one-fifth for the advanced economies. The reduction is nearly 30 percent for the July/September same-year forecasts, but only 5 percent for the next-year forecasts.

The study compares the WEO projections to consensus forecasts for GDP growth and inflation over the period 1990–2003.¹ The data cover all the G-7 economies (Canada, France, Germany, Italy, Japan, the United Kingdom, and the United States), seven Latin American economies (Argentina, Brazil, Chile, Colombia, Mexico, Peru, and Venezuela), and nine Asian economies (China, Hong Kong SAR, India, Indonesia, Korea, Malaysia, Singapore, Taiwan Province of China, and Thailand). Overall, the comparison suggests that the forecast performance of the WEO is similar to that of the consensus forecast. The paper highlights, however, that the timing of the comparison with the consensus forecast matters. WEO current-year forecasts generally perform quite well against current-year consensus forecasts reported in March and perform considerably better against the February consensus forecasts. Given the relatively long gestation lag in their preparation, they tend to perform considerably worse against the consensus forecasts reported in April.

I. Description of the WEO Data Set

Data Coverage

To assess forecasting performance, we make use of the fact that four sets of short-term forecasts are available for each target year, because the WEO publishes both April and September current- and next-year forecasts. For example, four forecasts of GDP growth in the year 2000 are reported, namely, the April and September 1999 next-year forecasts and the April and September 2000 current-year forecasts. Access to different forecast vintages allows us to address issues such as whether (and by how much) the forecast error shrinks as the target date approaches. It also allows us to test another efficiency property of an optimal forecast, namely that forecast revisions should themselves be unpredictable. In some cases, we find evidence of significant biases in revisions, suggesting simple ways of improving on the forecasts.
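A minimal Python sketch of this vintage structure may help fix ideas. The helper `weo_vintages` is hypothetical and is not part of any WEO database interface. For a given target year, it lists the issue month, issue year, and horizon of the four WEO forecasts described above.

```python
from typing import List, Tuple

def weo_vintages(target_year: int) -> List[Tuple[str, int, str]]:
    """List (issue month, issue year, horizon) for the four WEO forecasts of target_year."""
    vintages = []
    # Next-year forecasts come from the previous year's issues, current-year ones from the same year's.
    for issue_year, horizon in ((target_year - 1, "next-year"), (target_year, "current-year")):
        for month in ("Apr", "Sep"):
            vintages.append((month, issue_year, horizon))
    return vintages

for vintage in weo_vintages(2000):
    print(vintage)
```

For the year 2000 this reproduces the four forecasts listed in the text, ordered from the earliest (April 1999) to the latest (September 2000).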

The WEO data set contains information on 178 countries over the period 1990–2003. These countries are grouped into seven regions: Africa (50 countries), Central and Eastern Europe (15), CIS and Mongolia (13), Developing Asia (24), Middle East (14), Western Hemisphere (33), and Advanced Economies (29). Data availability and data quality vary significantly across regions, and there can be significant differences even within each region. Data quality and the extent to which outliers affect the results also depend on the type of variables being analyzed.

Timing Conventions

Because the target variables are subject to data revisions, a choice has to be made concerning which data vintage to use to measure realized values or outcomes. To this end, we follow the practice of earlier studies of WEO forecasts, such as Artis (1997), and use the first-available data in the April WEO issue of year t + 1 to measure the outcome of the predicted variable in period t (labeled $y_t$). Next-year forecasts for period t + 1 are compared with the realized values for year t + 1 ($y_{t+1}$) reported in the September WEO issue of year t + 2. The rationale is that, for the short-term (current-year) forecasts, it is particularly important that the outcome against which forecast precision is measured be available promptly, so the first-available (April) measure is used for these forecasts. Timeliness is less of a concern for the longer-term (next-year) forecasts, for which the more precisely measured September data are used.

In the analysis, we will also make use of the fact that we have both April and September forecasts of same-year and next-year realizations. This means that we have two sets of current-year forecasts generated in April and September, $\hat{y}^{Apr}_{t,t}$ and $\hat{y}^{Sep}_{t,t}$, and two sets of next-year forecasts generated during the same months, $\hat{y}^{Apr}_{t+1,t}$ and $\hat{y}^{Sep}_{t+1,t}$. In this notation, the first subscript indicates the period being predicted and the second subscript indicates the year when the forecast was generated. The superscript indicates the month of the WEO issue in which the forecast was reported. This convention gives rise to four separate forecast errors:

$$
\begin{aligned}
e^{Apr}_{t,t} &= y_t - \hat{y}^{Apr}_{t,t}, && \text{April current-year forecast error}\\
e^{Sep}_{t,t} &= y_t - \hat{y}^{Sep}_{t,t}, && \text{September current-year forecast error}\\
e^{Apr}_{t+1,t} &= y_{t+1} - \hat{y}^{Apr}_{t+1,t}, && \text{April next-year forecast error}\\
e^{Sep}_{t+1,t} &= y_{t+1} - \hat{y}^{Sep}_{t+1,t}, && \text{September next-year forecast error.}
\end{aligned}
$$

In addition, we consider current-year and next-year forecast revisions, defined as

$$
\begin{aligned}
rev_{t,t} &= \hat{y}^{Sep}_{t,t} - \hat{y}^{Apr}_{t,t}, && \text{revision to the current-year forecast}\\
rev_{t+1,t} &= \hat{y}^{Sep}_{t+1,t} - \hat{y}^{Apr}_{t+1,t}, && \text{revision to the next-year forecast.}
\end{aligned}
$$
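The error and revision definitions above can be sketched in Python for a single country and base year t. The container `WEOForecasts` and its field names are illustrative assumptions, not the WEO data layout, and the numbers in the usage line are made up.

```python
from dataclasses import dataclass

@dataclass
class WEOForecasts:
    """Outcomes and WEO forecasts for one country and base year t (hypothetical container)."""
    y_t: float      # outcome for year t
    y_t1: float     # outcome for year t + 1
    cur_apr: float  # April current-year forecast of y_t
    cur_sep: float  # September current-year forecast of y_t
    nxt_apr: float  # April next-year forecast of y_{t+1}
    nxt_sep: float  # September next-year forecast of y_{t+1}

    def errors(self) -> dict:
        # Outcome minus forecast: a negative error is an overprediction.
        return {
            "cur_apr": self.y_t - self.cur_apr,
            "cur_sep": self.y_t - self.cur_sep,
            "nxt_apr": self.y_t1 - self.nxt_apr,
            "nxt_sep": self.y_t1 - self.nxt_sep,
        }

    def revisions(self) -> dict:
        # September forecast minus April forecast of the same target.
        return {
            "cur": self.cur_sep - self.cur_apr,
            "nxt": self.nxt_sep - self.nxt_apr,
        }

f = WEOForecasts(y_t=2.0, y_t1=2.5, cur_apr=3.0, cur_sep=2.4, nxt_apr=3.2, nxt_sep=3.0)
print(f.errors())
print(f.revisions())
```

Because the outcome cancels, each revision equals the difference between the April and September errors for the same target, for example $rev_{t,t} = e^{Apr}_{t,t} - e^{Sep}_{t,t}$. The revision can therefore be computed without knowing the outcome.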

The data are trimmed in some regions because of missing observations or extreme observations that would otherwise dominate the regional averages. For example, at least eight September current-year forecasts are available for only 41 out of 50 African countries, and only 11 of the 24 developing Asian economies had more than eight data points for this variable. Fortunately, data on April and September next-year forecasts tend to be more complete, although again there are some countries with incomplete data. Measured by data coverage, the data set is most complete for the advanced economies and least complete for CIS and Mongolia.
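The missing-data part of the trimming rule can be sketched as follows. The sketch assumes the eight-observation threshold that the statistical analysis in Section IV applies, under which countries with fewer than eight observations are excluded. The function `keep_countries` and its input format are hypothetical: a dict maps a country name to a list of values, with `None` or NaN for missing entries. The text does not specify the rule used for extreme observations, so no outlier filter is attempted here.

```python
import math
from typing import Dict, List, Optional

MIN_OBS = 8  # assumed minimum number of non-missing observations per country

def keep_countries(series: Dict[str, List[Optional[float]]],
                   min_obs: int = MIN_OBS) -> Dict[str, List[float]]:
    """Drop missing values and keep only countries with at least `min_obs` observations."""
    kept = {}
    for country, values in series.items():
        clean = [v for v in values if v is not None and not math.isnan(v)]
        if len(clean) >= min_obs:
            kept[country] = clean
    return kept

sample = {"A": [1.0] * 8, "B": [1.0] * 7 + [None], "C": [float("nan")] * 10}
print(list(keep_countries(sample)))  # ['A']
```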

II. Properties of Optimal Forecasts

To evaluate the quality of the WEO forecasts, it is necessary to establish a set of testable properties that an optimal forecast should have. In this section, we discuss the nature of such properties. In all cases, the properties are established under the assumption that the objective function is of the mean-squared error (MSE) type, so that the forecasts minimize a symmetric, quadratic loss function. Different properties hold for other loss functions. To the extent that the costs associated with over- and underpredicting variables such as GDP growth and inflation are not symmetric, it is in fact optimal to bias the forecast. Elliott, Komunjer, and Timmermann (2005) find that this has important consequences when evaluating the optimality properties of a forecast. Patton and Timmermann (2006) show how standard optimality properties that a forecast has under MSE loss are violated under asymmetric loss and a nonlinear data-generating process.
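The claim that asymmetric costs make a biased forecast optimal can be checked numerically. The sketch below is an illustration under stated assumptions and does not use the loss functions of Elliott, Komunjer, and Timmermann (2005) or Patton and Timmermann (2006). It simulates outcomes from a normal distribution and compares two forecasts. The first minimizes average squared error. The second minimizes an asymmetric lin-lin (pinball) loss, chosen here as an example, under which underpredictions cost three times as much as overpredictions.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.0, size=20_000)  # simulated outcomes (assumption)

def mse_loss(f):
    return np.mean((y - f) ** 2)

def linlin_loss(f, tau=0.75):
    # Errors e = y - f; underpredictions (e > 0) cost tau per unit, overpredictions cost 1 - tau.
    e = y - f
    return np.mean(np.where(e >= 0, tau * e, (tau - 1.0) * e))

grid = np.linspace(0.0, 4.0, 801)  # candidate constant forecasts
f_mse = grid[np.argmin([mse_loss(f) for f in grid])]
f_lin = grid[np.argmin([linlin_loss(f) for f in grid])]

print(f_mse, np.mean(y - f_mse))  # forecast near the mean, mean error near zero
print(f_lin, np.mean(y - f_lin))  # forecast near the 75th percentile, negative mean error
```

Under squared loss, the best constant forecast is the sample mean, so its mean error is essentially zero. Under the lin-lin loss it is the 75th percentile, so the forecast systematically overpredicts and has a negative mean error, even though it is optimal for that loss.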

Unbiasedness and Lack of Serial Correlation

Under MSE loss, the optimal forecast is $\hat{y}^*_{t,\tau} = \arg\min_{\hat{y}_{t,\tau}} E[(y_t - \hat{y}_{t,\tau})^2 \mid \Omega_\tau]$, that is, the conditional mean $E[y_t \mid \Omega_\tau]$, where $\Omega_\tau$ is the forecaster's information set at time $\tau$. Under broad conditions, such as the existence of expected loss and covariance stationarity of the forecast error, we have $E[e_t \mid \Omega_\tau] = 0$, which implies unbiasedness of the optimal forecast and absence of serial correlation in the forecast errors. Define the generic forecast errors for period t or t + 1 as

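$$
\begin{aligned}
e_t &= y_t - \hat{y}_{t,t}, && \text{(current-year forecast, generated in year } t\text{)}\\
e_{t+1} &= y_{t+1} - \hat{y}_{t+1,t}. && \text{(next-year forecast, generated in year } t\text{)}
\end{aligned}
$$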

One can now perform the following simple regressions:

$$e_t = \alpha + \varepsilon_t, \tag{1}$$
$$e_{t+1} = \alpha + \beta e_t + \varepsilon_{t+1}. \tag{2}$$

For an efficient forecast, we must have α = 0 (unbiasedness) in Equation (1), and α = 0, β = 0 in Equation (2), implying unbiasedness and absence of serial correlation. The first regression gives rise to a simple Student's t-test of α = 0, whereas the second leads to an F-test. A closely related check is the conventional Mincer-Zarnowitz (1969) levels regression. Regressing the forecast error on the forecast itself, $e_{t+1} = \alpha + \gamma \hat{y}_{t+1,t} + \varepsilon_{t+1}$, and adding $\hat{y}_{t+1,t}$ to both sides yields the levels regression with $\beta = 1 + \gamma$:

$$y_{t+1} = \alpha + \beta \hat{y}_{t+1,t} + \varepsilon_{t+1}. \tag{3}$$

In this regression, unbiasedness of the forecast translates into a requirement that α = 0, β = 1.

Efficiency Properties More Generally

Unbiasedness and absence of serial correlation in the forecast errors can be thought of as weak efficiency requirements. A much more general and stricter orthogonality condition holds for optimal forecasts under MSE loss. Because an optimal forecast should be the conditional expectation of the predicted variable, if the forecaster uses all available information efficiently, then no variable in the current information set should be able to predict future forecast errors. To test this, let $z_t$ be any such variable in the forecaster's information set at time t, $\Omega_t$. An implication of informational efficiency is that α = β = 0 in the regression

$$e_{t+1} = \alpha + \beta z_t + \varepsilon_{t+1}, \tag{4}$$

where $\varepsilon_{t+1}$ is a serially uncorrelated, zero-mean error term. More generally, the relationship between unbiasedness and absence of serial correlation (Equations (1) and (2)) on the one hand and informational efficiency according to Equation (4) on the other is similar to the relationship between the weak and semistrong versions of the market efficiency hypothesis. According to the weak form of the hypothesis, past values of the variable itself should not help predict future values. The semistrong form tightens this restriction by requiring that no publicly available information helps forecast future values.
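Equations (1) through (4) are ordinary least-squares regressions, so they can be sketched with NumPy alone. The function names are hypothetical, and the sketch assumes conventional homoskedastic standard errors. As Section IV discusses, such statistics can be unreliable in samples as short as the WEO's, which is why the paper turns to a bootstrap. The simulated data only illustrate usage.

```python
import numpy as np

def ols(y, X):
    """OLS coefficients and conventional (homoskedastic) covariance matrix."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)
    return coef, sigma2 * np.linalg.inv(X.T @ X)

def f_stat(coef, cov, null):
    """Wald statistic divided by the number of restrictions, for H0: coef == null."""
    d = coef - null
    return float(d @ np.linalg.solve(cov, d)) / len(null)

def bias_t_stat(e):
    """Equation (1): t-statistic for alpha = 0 in e_t = alpha + eps_t."""
    e = np.asarray(e, dtype=float)
    return e.mean() / (e.std(ddof=1) / np.sqrt(len(e)))

def efficiency_test(e_next, z):
    """Equation (4): regress e_{t+1} on a constant and z_t; F-test of alpha = beta = 0.

    Passing the lagged errors as z gives the weak-efficiency regression, Equation (2).
    """
    e_next = np.asarray(e_next, dtype=float)
    X = np.column_stack([np.ones(len(e_next)), np.asarray(z, dtype=float)])
    coef, cov = ols(e_next, X)
    return coef, f_stat(coef, cov, np.zeros(2))

def mincer_zarnowitz_test(y, yhat):
    """Equation (3): regress y on a constant and the forecast; F-test of alpha = 0, beta = 1."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(yhat, dtype=float)])
    coef, cov = ols(y, X)
    return coef, f_stat(coef, cov, np.array([0.0, 1.0]))

# Usage on simulated data (not WEO data): an unbiased, efficient forecast.
rng = np.random.default_rng(1)
yhat = rng.normal(3.0, 1.0, 500)
y = yhat + rng.normal(0.0, 0.5, 500)
e = y - yhat
print(bias_t_stat(e))
print(mincer_zarnowitz_test(y, yhat)[0])  # coefficients close to [0, 1]
print(efficiency_test(e[1:], e[:-1])[0])  # coefficients close to [0, 0]
```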

Tests on Forecast Revisions

Forecast revisions are of fundamental interest in a forecast evaluation exercise for one simple reason: If a sequence of forecasts is optimal, then the forecast revisions should themselves be unpredictable (technically a martingale difference sequence). Indeed, if this were not the case and, say, forecast revisions between February and April were themselves predictable, then the original (February) forecast would not be optimal. Suppose, for example, that it is known that on average the April forecast of next-year output growth tends to be ¼ of 1 percent higher than the February forecast. Then the February forecast should be revised upward by this amount to reflect the better information available in April of each year.

Another advantage of studying revisions is that predictable patterns in revisions, if detected, automatically tell the forecaster how to improve the original forecast, namely, by amending it by the fitted value of the forecast revision. Hence, if the February forecast of the revision in the forecast between February and April is

$$rev^{Apr}_{t+1,t} = \hat{\alpha} + \hat{\beta} z^{Feb}_t,$$

the original February forecast, $\hat{y}^{Feb}_{t+1,t}$, can be replaced by an improved forecast, $\tilde{y}^{Feb}_{t+1,t}$, as follows:

$$\tilde{y}^{Feb}_{t+1,t} = \hat{y}^{Feb}_{t+1,t} + rev^{Apr}_{t+1,t}. \tag{5}$$
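Equation (5) can be sketched in two steps. First, regress the historical February-to-April revisions on a predictor known in February. Then add the fitted revision to the current February forecast. The function `improve_feb_forecast` and the single-predictor setup are illustrative assumptions. In practice, $z^{Feb}_t$ could be any variable available in February.

```python
import numpy as np

def improve_feb_forecast(past_revisions, past_z_feb, z_feb_now, feb_forecast_now):
    """Equation (5): add the predicted Feb-to-Apr revision to the February forecast.

    past_revisions   : historical revisions (April minus February forecast)
    past_z_feb       : predictor values known in February of each past year
    z_feb_now        : the predictor's February value in the current year
    feb_forecast_now : the current February forecast
    """
    X = np.column_stack([np.ones(len(past_z_feb)), np.asarray(past_z_feb, dtype=float)])
    (a_hat, b_hat), *_ = np.linalg.lstsq(X, np.asarray(past_revisions, dtype=float), rcond=None)
    predicted_revision = a_hat + b_hat * z_feb_now
    return feb_forecast_now + predicted_revision

# A constant historical revision of 0.25 simply raises the February forecast by 0.25.
print(improve_feb_forecast([0.25] * 5, [1.0, 2.0, 3.0, 4.0, 5.0], z_feb_now=10.0, feb_forecast_now=3.0))
```

This mirrors the ¼ of 1 percent example in the text. If the revision is on average a fixed amount, the sketch adds that amount to the February forecast.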

More generally, let $\Omega^{Apr}_t$ be the forecaster's information set in April and $\Omega^{Feb}_t$ the information set in February, which is a subset of the April information set ($\Omega^{Feb}_t \subseteq \Omega^{Apr}_t$). If forecasts are formed optimally as conditional expectations, that is, $\hat{y}^{Feb}_{t+1,t} = E[y_{t+1} \mid \Omega^{Feb}_t]$ and $\hat{y}^{Apr}_{t+1,t} = E[y_{t+1} \mid \Omega^{Apr}_t]$, then by the law of iterated expectations $E[\hat{y}^{Apr}_{t+1,t} \mid \Omega^{Feb}_t] = \hat{y}^{Feb}_{t+1,t}$. Hence the revision, defined as $rev_{t+1,t} = \hat{y}^{Apr}_{t+1,t} - \hat{y}^{Feb}_{t+1,t}$, must have zero mean conditional on February information:

$$E[rev_{t+1,t} \mid \Omega^{Feb}_t] = 0. \tag{6}$$

A similar result holds for the current-year revisions, $rev_{t,t} = \hat{y}^{Apr}_{t,t} - \hat{y}^{Feb}_{t,t}$:

$$E[rev_{t,t} \mid \Omega^{Feb}_t] = 0. \tag{7}$$

Notice, however, that in general $E[rev_{t+1,t} \mid \Omega^{Apr}_t] \neq 0$ and $E[rev_{t,t} \mid \Omega^{Apr}_t] \neq 0$, provided that new information arrives between February and April of year t. It is worth emphasizing that we ignore estimation errors, which can induce serial correlation in the forecast errors even when the forecaster's model is correctly specified. This is akin to learning effects; see Timmermann (1993) for a discussion of this point in the context of predictability of financial returns.

An important implication follows from these simple results: forecast optimality can be tested without having data on the target variable y. This is important because, given the availability of different vintages of the target variable, it is not clear whether the forecasts should be compared with the first-issue, second (revised), or “final” data release. This matters considerably in practice, as witnessed by the recent literature on real-time macroeconomic data (see Croushore, 2006). By analyzing forecast revisions, we can construct a test that is not sensitive to how well the underlying data are measured.

Nonincreasing Variance of Forecast Errors as the Forecast Horizon is Decreased

A final property of an optimal forecast is declining variance of the forecast error as more information becomes available. This means that the February current-year (next-year) forecast errors should have a greater variance than the April current-year (next-year) forecast errors:

$$\operatorname{Var}(e^{Apr}_{t+1,t}) \le \operatorname{Var}(e^{Feb}_{t+1,t}), \qquad \operatorname{Var}(e^{Apr}_{t,t}) \le \operatorname{Var}(e^{Feb}_{t,t}). \tag{8}$$

Intuitively this simply reflects that more information about the outcome in the current or next year is known in April than in February of the same year. This can be formally tested through a variance ratio test or (more appropriately given the small sample size here) by considering patterns in the variance of forecast errors associated with different forecast horizons.
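A toy simulation illustrates both the zero-mean revision property in Equation (6) and the variance ordering in Equation (8). The data-generating process is an assumption chosen for transparency. The outcome is the sum of three independent shocks: the first is known in February, the second is learned by April, and the third is unknown at both dates. The February and April forecasts are therefore the corresponding conditional expectations.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000
# Assumed toy model: y = a + b + c with independent standard normal shocks.
a = rng.normal(size=n)  # known in February
b = rng.normal(size=n)  # learned between February and April
c = rng.normal(size=n)  # unknown at both forecast dates
y = a + b + c

feb = a        # E[y | February information]
apr = a + b    # E[y | April information]
rev = apr - feb

e_feb = y - feb
e_apr = y - apr

# Equation (6): revisions have mean zero and are uncorrelated with February information.
print(round(rev.mean(), 3), round(np.corrcoef(rev, feb)[0, 1], 3))
# Equation (8): the April error variance (about 1) is below the February one (about 2).
print(round(e_apr.var(), 2), round(e_feb.var(), 2))
```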

III. Empirical Results

With the data set and benchmark properties of an optimal forecast in place, we proceed to analyze the empirical evidence. Table 1 reports summary statistics for the forecast errors and forecast revisions grouped by variable and region. We show the mean, median, and standard deviation of the forecast error; the average absolute value of the coefficient of first-order serial correlation in the forecast errors; and the percentage of positive values of the forecast error. In all cases, these statistics are computed based on the cross section of countries within a particular region. For example, both the median and standard deviations are computed from the cross section of average values across countries in a given region. We next discuss the main empirical findings.
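The Table 1 statistics can be sketched as a two-step computation. The first step computes per-country summaries of the error series. The second computes cross-sectional summaries across the countries in a region. This follows our reading of the description above, in which the median and standard deviation are computed from the cross section of country averages. The function names and the use of the sample standard deviation (`ddof=1`) are assumptions.

```python
import numpy as np

def country_stats(e):
    """Per-country summaries of one forecast-error series."""
    e = np.asarray(e, dtype=float)
    ac1 = np.corrcoef(e[:-1], e[1:])[0, 1]  # first-order serial correlation
    return {"mean": e.mean(), "abs_ac1": abs(ac1), "share_pos": np.mean(e > 0)}

def region_row(errors_by_country):
    """Cross-sectional summaries over the countries of one region."""
    stats = [country_stats(e) for e in errors_by_country.values()]
    means = np.array([s["mean"] for s in stats])
    return {
        "mean": means.mean(),
        "median": np.median(means),
        "std": means.std(ddof=1),  # dispersion of country averages
        "avg_abs_ac1": np.mean([s["abs_ac1"] for s in stats]),
        "share_pos": np.mean([s["share_pos"] for s in stats]),
    }

# Made-up errors for two hypothetical countries:
row = region_row({"X": [1, -1, 1, -1, 1, -1, 1, -1], "Y": [1, 2, 3, 4, 5, 6, 7, 8]})
print(row)
```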

Table 1.

Descriptive Statistics for Forecast Errors, by Variable and Region

(Averages across countries in region)


GDP Growth

Current-Year Forecasts

For the real GDP growth rate variable, the mean of the current-year forecast error (that is, the bias averaged across time and across countries) is very close to zero for the advanced economies. Biases in April current-year forecasts are much larger (exceeding 1 percent in absolute value) and negative for Africa, Central and Eastern Europe, CIS and Mongolia, and the Middle East. As expected, this bias is reduced significantly in the September current-year forecasts. Although the April biases appear rather large, they reflect some very large outliers whose values are predominantly negative and thus represent overpredictions. Indeed, the standard deviations of the April current-year forecast errors tend to be largest for the regions where the greatest biases were found, exceeding 8 percent for CIS and Mongolia and 6 percent for the Middle East.

Such outliers in the data lead us to consider more robust statistics as well, for example, the median forecast error and the proportion of positive forecast errors (underpredictions). Provided that the underlying shocks are drawn from symmetric distributions and the forecasting model is not misspecified, one would expect the median forecast error to be close to zero and the proportion of positive forecast errors to be close to 50 percent on average. Again the data reveal systematic problems for some of the regions: only between 34 and 40 percent of the same-year forecast errors for the African region are positive, so most forecasts overpredict subsequent GDP growth (negative forecast errors). Consistent with this, the median forecast error remains large and negative (−0.81 for this region), as it does for Central and Eastern Europe and CIS and Mongolia.

Forecasts in all regions pass the test that the variance of the September forecast errors should be smaller than the variance of the April forecast errors of the same variable. Furthermore, in many regions the reduction in uncertainty between the April and September forecast appears to be quite large. For example, the average standard deviation of the current-year forecast error in the advanced economies is reduced from 1.36 percent in April to 0.81 percent in September, representing a 40 percent reduction.
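As a quick check of the arithmetic, the 40 percent figure refers to the standard deviation. Expressed in terms of variance, the same numbers imply a larger reduction:

```latex
1 - \frac{0.81}{1.36} \approx 1 - 0.596 = 0.404, \qquad
1 - \left(\frac{0.81}{1.36}\right)^2 \approx 1 - 0.355 = 0.645.
```

That is, the standard deviation falls by roughly 40 percent and the variance by roughly 65 percent.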

Next-Year Forecasts

Biases in the next-year forecast errors generally exceed those observed in the current-year forecasts. Interestingly, in every single region the mean April or September biases are negative, and this also holds for the median bias in all regions with the exception of the Middle East. This suggests that the WEO in general overpredicts next-year GDP growth. Furthermore, whereas the average bias in the current-year predictions for the advanced economies is very small, it is quite sizable in the next-year forecasts, where it takes values of −0.36 and −0.55 percent, depending on the reporting date of the forecast. Estimates of the standard deviations of the forecast errors associated with the April and September next-year forecasts are much more similar than their current-year counterparts. This suggests that far less is learned between April and September about next-year growth than about growth in the current year.

The proportion of positive next-year forecast errors is again very low for Africa (0.33) and the Western Hemisphere (0.35). The predominance of regions with proportions of positive signs below 0.5 is consistent with the tendency of the WEO forecasts to overpredict next-year GDP growth.

Serial correlation in the forecast errors also appears to be a problem in some regions. The fourth column of Table 1 reports the average absolute value of the first-order autocorrelation in the forecast errors. This average is quite high for CIS and Mongolia in particular.

Turning to the forecast revisions between the April and September WEO publications, which should have a mean of zero, there is systematic evidence of negative biases. This is consistent with the April and September forecasts both overpredicting GDP growth on average, but the April forecast being more optimistic than the September value (so the mean change is negative). Hence, on average, the September forecast is being revised downward when compared with the April value. This finding is corroborated in the median values as well as in the proportion of positive forecast revisions (which consistently lies below one-half) and is information that could easily be used to improve on the WEO growth forecasts.

Another feature worth noting in the forecast revisions is that the standard deviation of the revision is generally quite a bit larger for the current-year values than for the next-year values. Again, this reinforces the earlier observation that information arriving between April and September more strongly affects current-year than next-year forecasts.

Inflation

Very high inflation rates characterized a number of countries during the sample period, so it is not surprising that outliers tend to be very large for this variable and certainly larger than for real GDP growth. As a consequence, we focus our analysis on the relatively robust measures of forecasting performance, such as the proportion of positive forecast errors. For the current-year forecasts, this does not deviate too strongly from 50 percent in any of the regions, except for the Middle East, where only between 34 and 43 percent of the April and September current-year forecast errors are positive, and to a lesser extent for the Advanced Economies, where 43 percent of the signs are positive.

A rather different picture emerges for the next-year forecast errors. Between 60 and 70 percent of the April forecast errors are positive for Africa, Central and Eastern Europe, and CIS and Mongolia. These proportions are closer to 60 percent for the September forecasts, but remain somewhat higher than 50 percent, indicating a tendency toward underprediction of inflation in these countries. Furthermore, all forecast revisions have positive means and more than 50 percent of the forecast revisions are positive. A particularly high percentage is observed among the next-year revisions for CIS and Mongolia and Central and Eastern Europe, which generally see the average forecast revised upward. Hence there is a tendency for both the WEO’s current-year and next-year inflation forecasts to be raised between April and September. Since the September forecasts are generally more accurate than their April counterparts, this suggests that the April WEO inflation forecasts can be improved by increasing their value.

We also consider whether the standard deviation of the April forecast errors is greater than that of the September forecast errors. Although outliers make it difficult to interpret some of the values, this appears generally to be the case.

IV. Analysis of Statistical Significance

Whether the biases documented in the previous table should be of concern depends on how systematic they are. This issue can best be addressed by undertaking a more in-depth statistical analysis. Such an analysis is of course tempered by the short data sample, which potentially invalidates inference relying on asymptotic distributions and also lowers the power of a statistical analysis to detect misspecification in the forecasting models, even when this is present. Again, countries with fewer than eight observations will be excluded from the statistical analysis. Considerable caution should be exercised when interpreting the statistical inference results, because the sample size used here is very small, and finite-sample distortions of standard test statistics that correct for heteroskedasticity and autocorrelation in the regression residuals are well known (see Den Haan and Levin, 1997; Kiefer, Vogelsang, and Bunzel, 2000; and Kiefer and Vogelsang, 2002).

To deal with the problem that, in small samples, standard critical values for the simple t- and F-statistics may not provide a reliable guide to inference, we designed a bootstrap experiment. This procedure repeatedly draws values of the forecast errors $(e_1, \ldots, e_T)$ with replacement from the empirical distribution function to construct a sample whose length (T) is identical to that of the original data sample. Having constructed an artificial sample in this way, $(e^{b}_{1(b)}, \ldots, e^{b}_{T(b)})$, where b indexes the bth bootstrap replication and $1(b), \ldots, T(b)$ are integers drawn at random between 1 and T, we recalculate the test statistics of interest, for example, the t- and F-statistics associated with the efficiency regressions. We repeat this in 5,000 bootstrap replications to construct a histogram for the distribution of the test statistic. The value of the test statistic found for the actual data is then compared with this bootstrapped distribution to obtain bootstrapped p-values. We report the proportion of countries for which the actual test statistic exceeds the 95th percentile of the bootstrapped distribution (using a two-sided test for the t-statistic).
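The bootstrap can be sketched for the t-test of α = 0 in Equation (1). Two details are assumptions, since the text does not spell them out. First, the errors are recentred to their sample mean before resampling, so that the null of no bias holds in the resampling population. Second, observations are drawn independently (an i.i.d. bootstrap, which ignores any serial correlation). Rejecting when the bootstrapped p-value falls below 0.05 corresponds to comparing the observed statistic with the tails of the bootstrapped distribution.

```python
import numpy as np

def t_stat(e):
    """Student's t-statistic for H0: mean forecast error = 0 (Equation (1))."""
    return e.mean() / (e.std(ddof=1) / np.sqrt(len(e)))

def bootstrap_pvalue(e, n_boot=5000, seed=0):
    """Two-sided bootstrap p-value for the t-test of no bias.

    Assumptions: errors are recentred to impose the null, and resampling is i.i.d.
    """
    rng = np.random.default_rng(seed)
    e = np.asarray(e, dtype=float)
    T = len(e)
    t_obs = t_stat(e)
    e0 = e - e.mean()  # resampling population satisfies alpha = 0
    t_boot = np.empty(n_boot)
    with np.errstate(divide="ignore", invalid="ignore"):  # a draw may repeat a single value
        for b in range(n_boot):
            idx = rng.integers(0, T, size=T)  # T indices drawn with replacement
            t_boot[b] = t_stat(e0[idx])
    return float(np.mean(np.abs(t_boot) >= abs(t_obs)))

# Made-up errors for one country (not WEO data), strongly biased downward:
errors = [-2.0, -1.5, -2.5, -1.8, -2.2, -1.9, -2.1, -1.7, -2.3, -2.0]
print(bootstrap_pvalue(errors))  # a small p-value
```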

Using Equation (1), the first two columns of Table 2 report the proportion of included countries in the various regions for which the t-statistic associated with the mean forecast error is less than −2 or greater than 2.² The third column reports the proportion of bootstrapped p-values for α = 0 that fall below 0.05 using a two-sided test. It is instructive to compare the proportion of t-statistics that exceed 2 in absolute value against the bootstrapped p-values. In almost all cases the latter lead to far fewer rejections, indicating the small-sample size distortions that affect conventional test statistics.

Table 2.

Tests for Biasedness and Serial Correlation of Forecast Errors

(Share of countries in region with significant test statistics)

Source: Author’s calculations.

The fourth column reports the percentage of regressions for which the absolute value of the t-statistic of β in the weak efficiency regression, Equation (2), is greater than 2. The fifth column reports the percentage of cases where the F-test for the joint hypothesis α = 0, β = 0 in Equation (2) exceeds its 5 percent critical level, and the final column reports the percentage of significant values of a sign test for whether the proportion of positive forecast errors differs from one-half, again using a 5 percent critical level. The purpose of reporting so many test statistics is to get a broader picture of possible forecast inefficiencies and to account for the fact that the individual test statistics are surrounded by more than the usual uncertainty, owing to the very small samples entertained here. Caution should therefore be exercised when interpreting the results.
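The sign test in the final column can be illustrated with an exact two-sided binomial version. The text does not say whether the paper uses the exact binomial distribution or a normal approximation, so treat this as one reasonable implementation. Take a country with 14 annual observations (the span 1990–2003). Three positive errors out of 14 give a p-value of about 0.057, just above the 5 percent level. Two out of 14 give about 0.013. Tests of this kind therefore have little power in samples this short.

```python
from math import comb

def sign_test_pvalue(n_pos: int, n: int) -> float:
    """Exact two-sided binomial sign test of H0: P(forecast error > 0) = 1/2."""
    k = min(n_pos, n - n_pos)  # size of the smaller tail
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

print(round(sign_test_pvalue(3, 14), 3), round(sign_test_pvalue(2, 14), 3))  # 0.057 0.013
```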

GDP Forecasts

First consider the April current-year forecasts. For close to 40 percent of the countries in the African region, the GDP growth forecasts were systematically too large.³ The bootstrapped test statistics confirm a significant bias for a much larger proportion of African countries (close to 25 percent) than should be expected if the forecasts were genuinely unbiased. This proportion is reduced to 15 percent when bias and serial correlation are jointly tested, most likely because of the weaker power of the joint test, which requires estimation of an additional parameter. In fact, we can identify significant serial correlation for only about 6 percent of the African countries (column 4 of Table 2). Similarly, for about 15 percent of the African countries, the proportion of positive signs in the current-year forecast errors is significantly different from one-half at the 5 percent critical level (final column of Table 2).

Between 10 and 20 percent of the countries in CIS and Mongolia and the Western Hemisphere also show evidence of a significant bias in the forecasts. Serial correlation in the forecast errors appears to be most important in the Middle East, where 15 percent of the countries generate significant bootstrapped test statistics. These findings mostly carry over to the September current-year forecasts. Forecast errors continue to be biased and serially correlated for about 15 percent of the countries in Africa and there is strong evidence of serial correlation for the Middle East. In contrast, there is very little evidence that the current-year forecasts are biased or serially correlated in developing Asia or the advanced economies. Overall, the proportion of cases with a significant bias is lower in the September current-year forecasts compared with the April current-year forecasts.

Turning to the next-year forecast errors, there is evidence of a significant upward bias in the forecasts for about 35 percent of the countries in Africa and almost 25 percent of the countries in the Western Hemisphere (column 3 of Table 2). Significant biases also affect more than 20 percent of the countries among the advanced economies. Serial correlation in next-year forecast errors plagues all regions, particularly Africa. All told, the bootstrapped p-values show a pattern of biased or serially correlated next-year forecast errors in all regions.

Current-year forecast revisions are biased for Africa and the Western Hemisphere, but there is little evidence of serial correlation. Next-year forecast revisions are biased and serially correlated for more than 10 percent of the countries in the Western Hemisphere, but otherwise the evidence against (weak) efficiency tends to be relatively mild.

Inflation Forecasts

As mentioned previously, the inflation data are affected by numerous outliers, so we do not rely on standard test statistics and instead move directly to the bootstrap results. These reveal mild evidence of inefficiency in the current-year inflation forecasts. There appears to be some positive bias (underprediction of inflation) in the case of Africa and the Western Hemisphere. By far the strongest evidence against efficiency is found in the next-year forecast errors, which reveal forecasts that are systematically biased downward in most regions except the advanced economies. However, forecast errors are serially correlated in the advanced economies, so the joint null of no bias and no serial correlation is rejected for about 15 percent of all countries (more than double the level expected under the null).
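The bootstrap design is not described in this section, so the following is only a stand-in: a simple i.i.d. bootstrap test of zero bias for one country. Its appeal with outlier-prone data is that the p-value comes from the resampled errors themselves rather than from normal critical values. Because i.i.d. resampling destroys any serial correlation, a block bootstrap would be more appropriate for serially correlated errors. The function name, the number of replications, and the seed are illustrative assumptions.

```python
import random
import statistics


def bootstrap_bias_pvalue(errors, n_boot=999, seed=0):
    """Bootstrap p-value for H0: mean forecast error = 0 (no bias).

    Hypothetical helper. The errors are demeaned so that H0 holds in the
    resampling distribution, then resampled i.i.d. with replacement.
    """
    rng = random.Random(seed)
    n = len(errors)
    mean_error = statistics.fmean(errors)
    observed = abs(mean_error)
    centered = [e - mean_error for e in errors]
    exceed = 0
    for _ in range(n_boot):
        draw = rng.choices(centered, k=n)
        # Count bootstrap means at least as extreme as the observed one.
        if abs(statistics.fmean(draw)) >= observed:
            exceed += 1
    # Include the observed sample itself, giving a p-value in (0, 1].
    return (exceed + 1) / (n_boot + 1)
```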

For the next-year forecasts, with the exception of CIS and Mongolia, a greater-than-expected proportion of countries in the various regions generates a significant test statistic associated with the bias. The strongest evidence against efficiency comes from the serial correlation tests in column 5 of Table 2, which show that p-values below 5 percent were generated for between 15 percent and 40 percent of the countries in the various regions. In particular, more than 30 percent of the countries in the Western Hemisphere show evidence of significant serial correlation in the forecast errors. With few exceptions, forecast revisions reveal little systematic evidence of biases or serial correlation.
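Whether a given share of significant test statistics in a region exceeds what chance alone would produce can be gauged, under a strong simplifying assumption, with a binomial tail probability: if each country's test were independent and rejected with probability 0.05 under the null, the number of rejections would follow a Binomial(n, 0.05) distribution. Forecast errors are likely correlated across countries in the same region, so treating the tests as independent probably overstates the significance of the tally. The helper below is hypothetical.

```python
from math import comb


def prob_at_least_k_rejections(n_countries, k_rejections, size=0.05):
    """P(X >= k) for X ~ Binomial(n_countries, size).

    Hypothetical helper: treats each country's test as an independent
    draw that rejects with probability `size` when the null is true.
    """
    below = sum(
        comb(n_countries, j) * size**j * (1 - size) ** (n_countries - j)
        for j in range(k_rejections)
    )
    return max(0.0, 1.0 - below)  # guard against tiny negative rounding
```

For example, `prob_at_least_k_rejections(40, 6)` gives the probability of six or more rejections (15 percent) among 40 hypothetical countries when every null is true.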

V. Can the WEO Forecast Errors Be Predicted?

The process whereby the WEO forecasts are generated puts considerable emphasis on integrating predictions across countries, regions, and variables in order to produce a coherent and internally consistent projection of current and future economic activity. One way to analyze whether the procedures that are currently in place have their intended effect is to test for informational efficiency using a range of indicators of global economic activity. Such tests build on the moment condition $E[e_{t+1} \mid \Omega_t] = 0$, where $\Omega_t$ is the forecaster's information set at the time of the forecast, $t$, and are hence versions of the efficiency tests in Equation (4).
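Written out for a single predictor $z_t \in \Omega_t$, such as the WEO forecast of U.S. GDP growth, the moment condition suggests a regression test of the following form. This is a sketch of the general idea and not necessarily identical to the specification in Equation (4):

$$
e_{t+1} = \alpha + \beta z_t + \varepsilon_{t+1}, \qquad H_0:\ \alpha = \beta = 0 .
$$

By the law of iterated expectations, $E[e_{t+1} \mid \Omega_t] = 0$ implies $E[e_{t+1}] = 0$ and $E[e_{t+1} z_t] = 0$, so both the population intercept and the population slope are zero. A t-statistic on $\beta$ well away from zero therefore signals that $z_t$ was not fully exploited when the forecast was made.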

In our empirical application we focus on four such predictor variables. First, we consider the WEO prediction of U.S. GDP growth. This is an obvious choice given the size of the U.S. economy and the leading role it plays in shaping global economic activity. The second instrument is the WEO prediction of German output growth, again motivated by the significance of this economy to regional and global growth.⁴ Finally, we also use the WEO forecast of oil prices and a global current account discrepancy instrument as predictors. Oil prices are an obvious choice because they are an important determinant of economic growth and inflation in a number of economies. The global current account discrepancy is constructed as the sum of current accounts across all countries, scaled by global exports. Because one country's surplus is another's deficit, this sum should in principle equal zero, but it may differ from zero owing to measurement errors.
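The discrepancy instrument is simple arithmetic. A minimal sketch, assuming the current-account balances and exports are available as dictionaries keyed by country and expressed in a common currency; the function name and all figures are hypothetical.

```python
def current_account_discrepancy(current_accounts, exports):
    """World current-account discrepancy scaled by world exports.

    Both arguments map country names to values in a common currency
    (for example, billions of U.S. dollars). The world's current accounts
    should sum to zero, so any nonzero result reflects measurement error.
    """
    world_exports = sum(exports.values())
    if world_exports <= 0:
        raise ValueError("world exports must be positive")
    return sum(current_accounts.values()) / world_exports


# Hypothetical three-country world (not actual data).
ca = {"A": 50.0, "B": -30.0, "C": -10.0}
exports = {"A": 400.0, "B": 350.0, "C": 250.0}
print(current_account_discrepancy(ca, exports))  # prints 0.01 (1 percent of world exports)
```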

Table 3 shows the outcome of this exercise. Within each region and for each of the predictor variables, the table reports the proportion of t-values below −2 and above 2, respectively. A failure to fully account for the predicted U.S. GDP growth should show up as a proportion of significant t-values noticeably higher than the 5 percent expected under the null. The sign of the t-statistic is also informative. For GDP growth, a higher proportion of significantly positive than significantly negative t-statistics would reveal a failure to fully account for the spillover of U.S. GDP growth to other countries.
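The entries in Table 3 can be thought of as the output of two steps: one regression per country of the forecast error on an intercept and a single predictor, followed by a count of the resulting t-values in each tail. The sketch below assumes conventional OLS t-statistics and hypothetical function names; the paper's exact estimator may differ.

```python
import math


def slope_t_stat(errors, predictor):
    """Conventional OLS t-statistic for b in errors = a + b * predictor + u."""
    n = len(errors)
    if n != len(predictor) or n < 3:
        raise ValueError("need at least three paired observations")
    x_bar = sum(predictor) / n
    y_bar = sum(errors) / n
    sxx = sum((x - x_bar) ** 2 for x in predictor)
    if sxx == 0:
        raise ValueError("predictor has no variation")
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(predictor, errors)) / sxx
    a = y_bar - b * x_bar
    ssr = sum((y - a - b * x) ** 2 for x, y in zip(predictor, errors))
    if ssr == 0:
        return math.copysign(math.inf, b)  # perfect fit
    return b / math.sqrt(ssr / (n - 2) / sxx)


def tail_fractions(t_values, threshold=2.0):
    """Fractions of t-values below -threshold and above +threshold."""
    m = len(t_values)
    below = sum(1 for t in t_values if t < -threshold) / m
    above = sum(1 for t in t_values if t > threshold) / m
    return below, above


# Hypothetical per-country t-values for one region and one predictor.
print(tail_fractions([-2.4, 0.3, 1.1, 2.2, 3.0]))  # prints (0.2, 0.4)
```

If the t-statistics were standard normal under the null, each tail fraction would be about 2.3 percent, so the two columns together would be near the 5 percent benchmark.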

Table 3.

Predictability of Forecast Errors in Relation to Current Information Variables

(Fraction of all countries in region with t-values for additional variables above or below indicated threshold)

Source: Author’s calculations.

There are only a few cases where the WEO prediction of U.S. GDP growth appears to be correlated with the forecast errors. However, the ones that we find are of considerable interest. For the advanced economies, the U.S. GDP forecast generates a t-value above 2, and hence predicts the forecast error, for 31 percent of the countries' April current-year forecasts and 24 percent of their September current-year forecasts. The t-statistic is also significantly positive for 29 percent of the current-year forecast revisions in this region. In contrast, there is no evidence that the U.S. GDP forecast has predictive power over the next-year forecast errors. The only other instance registering a greater-than-expected proportion of significant t-values is the current-year forecasts for Central and Eastern Europe, where 33 percent of the t-values exceed 2. For many of the countries in this region, the revision to the current-year forecast that takes place between April and September is predicted by the U.S. GDP forecast.

The WEO forecast of German output growth is positively correlated with, and significant in explaining, the forecast errors for a high proportion of countries in CIS and Mongolia (particularly the next-year forecast errors), but not to nearly the same extent in other regions.

With a few interesting exceptions—namely, CIS and Mongolia, for which predicted oil prices are positively correlated with forecast errors in GDP growth, and Western Hemisphere and advanced economies, for which a negative correlation emerges—the WEO forecasts of oil prices do not appear to be overly important in explaining forecast errors in output growth.

Interestingly, the global current account discrepancy is significant for close to 40 percent and 25 percent of the advanced economies in explaining the April current-year and next-year forecast errors, respectively.

There is evidence that the next-year inflation forecast errors are linked to U.S. GDP forecasts, particularly for countries in Central and Eastern Europe, CIS and Mongolia, Developing Asia, Western Hemisphere, and the advanced economies. Once again the WEO forecast of German output growth is significant in explaining the inflation forecast error for a very large proportion of the countries in the CIS and Mongolia region.

Output Gap

The output gap, measured as the difference between actual and potential GDP, plays an important role in the WEO forecasts. Implicit in these forecasts is an assumption that the output gap is eliminated after 5 years. If this assumption is unrealistic and leads to biased forecasts, then the predicted value of the output gap itself should help predict the forecast errors. For example, if it takes longer to eliminate the output gap than assumed in the WEO, then the WEO will tend to overpredict output growth for countries with large output gaps (that is, with output well below potential), because it projects a faster return to potential than actually occurs.
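A back-of-the-envelope calculation shows how a too-fast assumed closing of the gap turns into overpredicted growth. The sketch assumes the gap is measured in percent of potential output, so that growth is approximately potential growth plus the annual change in the gap, and that the gap closes linearly; the linear path, the function name, and the numbers are illustrative assumptions, not the WEO's procedure.

```python
def implied_growth(potential_growth, output_gap, years_to_close):
    """Growth implied by closing a gap linearly over `years_to_close` years.

    Uses the approximation: growth is potential growth plus the annual
    change in the gap (gap in percent of potential output).
    Hypothetical illustration, not the WEO's procedure.
    """
    annual_change = -output_gap / years_to_close  # a negative gap rises toward zero
    return potential_growth + annual_change


# Hypothetical economy: potential growth 2 percent, output gap of -4 percent.
predicted = implied_growth(2.0, -4.0, years_to_close=5)   # about 2.8
realized = implied_growth(2.0, -4.0, years_to_close=10)   # about 2.4
print(round(predicted - realized, 2))  # prints 0.4
```

In this example, assuming a 5-year closing when the gap actually takes 10 years overpredicts growth by about 0.4 percentage point a year, and the overprediction scales with the size of the gap.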

We have data on output gaps for the 29 advanced economies. For each of these, we regress the forecast error on an intercept and on the predicted output gap, with the timing of the gap matched to that of the forecast being evaluated.
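In equation form, and using notation introduced here for illustration ($\widehat{g}$ for the predicted output gap and $h$ for the forecast horizon), the regression for each country reads

$$
e_{t+h,t} = \alpha + \beta\, \widehat{g}_{t+h,t} + \varepsilon_{t+h}, \qquad h \in \{0, 1\},
$$

with $h = 0$ for current-year and $h = 1$ for next-year forecasts; the same form applies with the forecast revision in place of the error. Table 4 reports the t-statistic on $\beta$.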

Table 4 presents the results in the form of t-statistics for current- and next-year forecast errors and forecast revisions. A pattern that stands out for the GDP forecasts is that the estimated t-values are predominantly negative. About 15 percent of the t-statistics exceed 2 in absolute value. The large negative t-statistics for Germany, France, and Italy are particularly interesting because, as we shall see subsequently, these were also economies for which the WEO output growth forecasts were systematically biased upward during the period. This finding suggests that the reduction in the output gap assumed in computing the WEO forecasts could lead to overpredictions. All three economies had large output gaps during the 1990s, as did Japan: the output gap averaged −1.63, −1.99, −2.30, and −4.16 for France, Germany, Italy, and Japan, respectively, among the largest in magnitude of the 29 countries. An assumption in the WEO forecasts that these output gaps would close faster than they actually did would raise predicted output growth and hence bias the forecasts upward.

Table 4.

Output Gaps and the Predictability of Forecast Errors in Advanced Economies

(Value of t-statistics for the coefficient of the output gap in forecast efficiency regression)

Source: Author’s calculations.

The sign of the regression coefficient of the output gap is predominantly positive in the case of the inflation forecast errors, the opposite of what was found for the GDP forecasts. Hence, the larger the output gap (that is, the greater an economy's unused capacity), the more the WEO tends to underpredict inflation. This effect can be quite large and is borderline significant for countries such as France, Germany, and Korea.

Finally, turning to the regression results for the current account, there are many instances in which the output gap has large and significant predictive power over subsequent forecast errors, although the sign of the regression coefficient varies quite a bit. Countries for which a significant degree of predictability is found include Hong Kong SAR, Japan, the Netherlands, Singapore, and Sweden.

VI. Revisions from Board to Published Forecasts

WEO forecasts are published twice a year, in April and September. Several rounds of forecast revisions precede the published version. A first set of predictions is presented to the IMF Executive Board in February and July each year, preceding the April and September WEO publications. To assess the informational value of forecast revisions that occur between the Board version and the published version, we obtained data on Board forecasts of current-year GDP growth made in February and Board forecasts of next-year GDP growth reported in July. We refer to these forecasts as $\hat{y}_{t,t}^{\mathrm{Feb}}$ and $\hat{y}_{t+1,t}^{\mathrm{July}}$, respectively. Further, let the forecast revisions from the Board to the published WEO forecasts be given by $\mathit{rev}_{t,t}^{\mathrm{pub-Board}}$ and $\mathit{rev}_{t+1,t}^{\mathrm{pub-Board}}$. If the revisions occurring between the Board and published forecasts contain useful information, we should expect that they help predict the errors in the original Board forecasts, defined as $e_{t,t}^{\mathrm{Board}} = y_t - \hat{y}_{t,t}^{\mathrm{Feb}}$ and $e_{t+1,t}^{\mathrm{Board}} = y_{t+1} - \hat{y}_{t+1,t}^{\mathrm{July}}$. We test this proposition through the regressions

$$
e_{t,t}^{\mathrm{Board}} = \alpha + \beta\, \mathit{rev}_{t,t}^{\mathrm{pub-Board}} + \varepsilon_t, \qquad
e_{t+1,t}^{\mathrm{Board}} = \alpha + \beta\, \mathit{rev}_{t+1,t}^{\mathrm{pub-Board}} + \varepsilon_{t+1}. \tag{9}
$$

If the revisions incorporated in the published WEO forecasts do not add any value to the original Board forecast, then we should expect to find β-coefficients near zero. Conversely, we would expect to find significant and positive values of β and nonzero R² values if the revisions contain valuable information. Estimation results based on Equation (9) are reported in Table 5. The current-year forecast errors for the advanced economies reveal strong evidence that the Board-to-publication revision contains valuable information: it is significantly correlated with the forecast error for about 50 percent of the countries and has the expected positive sign for between 80 and 90 percent of the countries. The large R² value of about 0.25 further supports this conclusion and suggests that the revision between the Board and published versions explains about 25 percent of the variation in the current-year February or July forecast errors.
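Equation (9) can be estimated country by country with ordinary least squares. The sketch below computes β and R² from three aligned series (outcomes, Board forecasts, and published forecasts). It assumes that the revision is the published forecast minus the Board forecast, which is how the notation reads but is not spelled out in the text; the helper name is hypothetical. In a simple regression, R² equals the squared correlation between the regressor and the dependent variable, which the code uses directly.

```python
def board_revision_regression(actual, board, published):
    """Estimate Equation (9) by OLS for one country.

    Board forecast error: e = actual - board.
    Board-to-publication revision (assumed): rev = published - board.
    Returns (beta, r_squared) from e = alpha + beta * rev + u.
    Hypothetical helper; inputs are equal-length sequences by year.
    """
    e = [y - b for y, b in zip(actual, board)]
    rev = [p - b for p, b in zip(published, board)]
    n = len(e)
    rev_bar = sum(rev) / n
    e_bar = sum(e) / n
    s_rr = sum((r - rev_bar) ** 2 for r in rev)
    s_ee = sum((x - e_bar) ** 2 for x in e)
    if s_rr == 0 or s_ee == 0:
        raise ValueError("revisions and errors must both vary")
    s_re = sum((r - rev_bar) * (x - e_bar) for r, x in zip(rev, e))
    beta = s_re / s_rr
    r_squared = s_re ** 2 / (s_rr * s_ee)  # squared correlation in a simple regression
    return beta, r_squared
```

As a sanity check, if the published forecasts equaled the outcomes, the Board error would equal the revision, giving β = 1 and R² = 1; revisions unrelated to the Board errors give β and R² near zero.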

Table 5.

Real GDP: Significance of Forecast Revisions After Executive Board Meeting

(Average across regions except for fractions)

Source: Author’s calculations.