An Evaluation of the World Economic Outlook Forecasts

Contributor Notes

Author(s) E-Mail Address: agtimmermann@ad.ucsd.edu

Abstract

The World Economic Outlook (WEO) is a key source of forecasts of global economic conditions. It is therefore important to review the performance of these forecasts against both actual outcomes and alternative forecasts. This paper conducts a series of statistical tests to evaluate the quality of the WEO forecasts for a very large cross section of countries, with particular emphasis on the recent recession and recovery. It assesses whether forecasts were unbiased and informationally efficient, and characterizes the process whereby WEO forecasts are revised as the target date approaches. Finally, the paper assesses whether forecasts can be improved by combining WEO forecasts with the Consensus forecasts. The results suggest that the performance of the WEO forecasts is similar to that of the Consensus forecasts. While WEO forecasts for many variables in many countries meet basic quality standards in some, if not all, dimensions, the paper raises a number of concerns about current forecasting performance.

I. Introduction and Summary

The World Economic Outlook (WEO) is a key source of forecasts of global economic activity and a central vehicle for the IMF’s multilateral surveillance activities. It is published twice a year, in April and September. Given the central role of the WEO forecasts, it is important that they are periodically evaluated to assess their usefulness and to look for ways to improve the forecasting process. This report is the fourth in a series of such evaluations (following Artis, 1996; Barrionuevo, 1993; and Artis, 1988).

This report analyzes the forecast performance for five key variables—real GDP growth, inflation, the current account balance, and import and export volume growth—from 1990 to 2003, the last year for which actual data were available when the report was initiated. The report incorporates state-of-the-art techniques that shed light on the accuracy of WEO forecasts from new angles, and features three main novel aspects.

  • First, it analyzes forecasts for 178 countries in seven economic regions (Africa, Central and Eastern Europe, the Commonwealth of Independent States (CIS) countries and Mongolia, developing Asia, the Middle East, the Western Hemisphere, and the advanced economies) since 1990. Earlier evaluations focused on forecasts for the same variables for only the G-7 countries and regional aggregates.

  • Second, it includes an extensive comparison between the accuracy of WEO forecasts and Consensus forecasts. The latter is a widely used source that compiles the forecasts of economists working in the private sector. Through this comparison, the report assesses WEO forecasts not just against absolute benchmarks, but also against a relative benchmark of other forecasters.

  • Third, it considers the revisions to the forecasts, both over time and within each forecast round. The latter is important because there is a long gestation lag in the preparation of the forecasts in each round, and it is important to know the gains—in terms of accuracy—of frequent forecast updates.

This summary highlights the main findings of the report.

A. How Accurate Are WEO Forecasts?

The first part of the paper examines selected aspects of WEO forecast performance. In all cases, the analysis considers the current-year and next-year forecasts prepared in April and September for each variable. (For example, the April and September 2005 WEOs have projections for 2005 (current year) and 2006 (next year).) Overall, the report finds that WEO forecasts for many variables in many countries meet basic forecasting quality standards in some, if not all, dimensions. The paper also identifies some important issues, which are discussed on a variable-by-variable basis.

  • Real GDP growth. WEO forecasts for real GDP growth display a tendency for systematic overprediction—that is, predicted growth, on average, tends to exceed actual growth (Table S1). From a statistical perspective, these biases are most significant in the next-year forecasts. The results also indicate that systematic overpredictions of real GDP growth are particularly prevalent in forecasts for countries with an IMF program. This tendency to overpredict growth is also persistent over time. The evidence suggests that WEO forecasts for some countries could be improved if more attention were paid to important international linkages. In particular, forecasts of U.S. GDP growth are positively and significantly correlated with current-year forecast errors of output growth in a substantial number of advanced economies. (The forecast of German GDP growth also has predictive power over output growth forecast errors in some regions.) The report also finds that, in some cases, accuracy problems appear related to the standing WEO assumption that the output gap is eliminated after five years. In particular, the paper notes a predominantly negative relationship between the output gap and the forecast error in GDP growth, notably for Germany, France, and Italy.

  • Inflation. The report finds a bias toward underprediction of inflation, with these biases significant in the next-year forecasts for many African, Central and Eastern European, and Western Hemisphere countries. The underprediction bias is generally found to be weaker in the current-year forecasts. With regard to their predictability, there is evidence that the next-year inflation forecast errors are often linked to U.S. GDP forecasts.

  • External current account balances. There appear to be fewer problems in the forecasts for current account balances as percentages of GDP, except for April next-year forecast errors, which, in some cases, are significantly biased or serially correlated. Moreover, general patterns in the direction of biases are not apparent.

Table S1.

Overview of Forecast Accuracy

(Averages across regions for April current-year forecasts and September next-year forecasts; fraction of economies in region where statistics are significant in parenthesis)

[Table not reproduced here.]
Source: Author’s calculations.

Mean forecast error.

Coefficient on lagged forecast error in a regression of the forecast error on a constant and its lagged realization.

In percent of GDP.

Besides the basic accuracy of WEO forecasts, the report also examined a number of other issues of interest.

  • Directional accuracy of forecasts. The results suggest that the WEO forecasts are quite successful in predicting the direction of change for current-year real GDP growth and inflation, but somewhat less so for next-year forecasts.

  • Performance of WEO forecasts during the 2001 downturn. WEO forecasts of GDP generally overpredicted growth in 2001 in all regions, which is consistent with the broad patterns among forecasters in earlier downturns. For 2002, the April and September next-year WEO forecasts prepared in 2001 overpredicted growth in six of the seven regions, although revisions in the April 2002 WEO greatly reduced the forecast errors in four regions.

  • Revisions from Board to published forecasts. WEO forecasts are published twice a year, in April and September. Prior to publication, a first set of predictions is presented to the IMF Executive Board in February and July. Subsequently, the forecasts are revised before they are published.

These revisions add considerable informational value. For the February/April same-year forecasts, the average reduction in the forecast error is around one-fifth for the advanced economies. The reduction is nearly 30 percent for the July/September same-year forecasts, but only 5 percent for the next-year forecasts.

B. Long-Run Forecasting Performance for G-7 Countries

Taking advantage of the fact that a longer dataset, starting in the early 1970s, exists for the G-7 economies, the report assesses WEO forecasts for these economies in more detail. Overall, the results suggest that forecast accuracy has deteriorated somewhat since the last evaluation (Artis, 1996). In particular, WEO forecasts systematically and significantly overpredicted economic growth for all the European G-7 economies and Japan during 1991–2003. In contrast, U.S. growth was underpredicted after 1990, although the bias was not found to be statistically significant. Inflation, meanwhile, was strongly and significantly overpredicted for Canada, France, Japan, and the United States during the 1990s and 2000s, although it was underpredicted by a significant margin for Italy.

These findings have at least two possible, not mutually exclusive, explanations. One is that output growth and inflation have been subject to structural breaks, such as a break toward higher productivity growth in the United States. Another possibility is that the underlying assumptions—such as the assumption that the output gap will be eliminated over a five-year period—have led to biases.

C. Comparison of WEO and Consensus Forecasts

The report compared the WEO projections to Consensus forecasts for GDP growth, inflation, and the current account balance over the period 1990–2003. The data cover all the G-7 economies, seven Latin American economies (Argentina, Brazil, Chile, Colombia, Mexico, Peru, and Venezuela), and nine Asian economies (China, Hong Kong SAR, India, Indonesia, Korea, Malaysia, Singapore, Taiwan Province of China, and Thailand).

Overall, the comparison suggests that the forecast performance of the WEO is similar to that of the Consensus forecast—the current-year WEO forecasts of GDP growth in the G-7 economies are generally less biased than the current-year Consensus forecasts, but the bias in the next-year forecasts is stronger in the WEO than in the Consensus across the board. The paper highlights, however, that the timing of the comparison with the Consensus forecast matters. WEO current-year forecasts generally perform quite well against current-year Consensus forecasts reported in March and perform considerably better against the February Consensus forecasts. However, given the relatively long gestation lag in the preparation, they tend to perform considerably worse against the Consensus forecasts reported in April. With the possible exception of next-year inflation forecasts, there is little systematic evidence that the WEO forecasts could be improved by modifying them to account for information embodied in the Consensus forecasts.

D. Recommendations

The report makes the following recommendations to improve the WEO forecasting process:

  • Timeliness of information is key to forecasting performance. There are systematic gains in forecasting accuracy from using the latest available information. These gains were found in both the comparison of the forecasts of the Executive Board version to those of the published version of the WEO and in the comparison between the WEO and the Consensus forecasts. Clearly, the updating process is already adding significant value, especially for the G-7 economies, but more could be done. It is therefore important that IMF country economists update their projections just before publication.

  • Continuous monitoring of forecasting performance. The empirical analysis indicated structural instability in some of the underlying variables, especially real GDP growth and inflation, which is consistent with the broad evidence of instability in macroeconomic variables provided in academic studies. Given the presence of what appears to be systematic biases in forecasting performance for output growth and inflation, particularly after 1990, the possibility of instituting real-time forecasting performance indicators should be explored.

  • Use bias-adjusted forecasts as guidance. The simplest and most obvious approach to improving forecasts is to shrink the forecast toward its bias-corrected value (see the sketch after this list). While simple to implement if the bias can be precisely estimated, this approach may also be too mechanical and suffer from its own deficiencies, including, for example, the assumption that the bias remains constant through time. Nevertheless, a comparison of unadjusted forecasts with bias-adjusted forecasts can help in enhancing our understanding of the magnitude and direction of any biases that may exist.

  • Forecasts of risk. In view of the inherent uncertainty associated with forecasts, the report strongly recommends that, in the future, the WEO incorporate recent advances in modeling and forecasting risk/uncertainty, including, for example, by presenting the full probability distribution of key variables over time.

  • Review the output gap assumption. The WEO forecasts are based on scenarios that assume that the output gap is removed within a relatively short period. Since some of the countries with the largest output gaps were also found to be countries for which the WEO forecasts systematically overpredicted output growth, this could be a concern. Hence, an analysis that explores the costs and benefits of this practice is called for. More frequent reviews of estimates of potential output growth may also be needed.
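To make the bias-adjustment recommendation concrete, the following is a minimal sketch in Python with entirely hypothetical numbers and a made-up shrinkage weight; it illustrates the idea of shrinking a forecast toward its bias-corrected value and is not the procedure used for the WEO.

```python
import numpy as np

def bias_adjusted_forecast(past_errors, new_forecast, shrinkage=0.5):
    """Shrink a point forecast toward its bias-corrected value.

    past_errors  : historical forecast errors, e = actual - forecast, for the
                   same country, variable, and horizon (hypothetical array here).
    new_forecast : the raw forecast to be adjusted.
    shrinkage    : weight on the bias correction (1.0 = full correction,
                   0.0 = leave the forecast unchanged).
    """
    bias = np.mean(past_errors)                 # estimated (constant) bias
    corrected = new_forecast + bias             # fully bias-corrected forecast
    return (1 - shrinkage) * new_forecast + shrinkage * corrected

# Hypothetical example: growth was overpredicted by 0.4 percentage points on
# average (errors are negative), so a raw forecast of 3.0 percent is pulled
# halfway toward 2.6 percent.
errors = np.array([-0.3, -0.6, -0.2, -0.5])
print(bias_adjusted_forecast(errors, 3.0))      # prints 2.8
```

Estimating the bias over a rolling window rather than the full history would relax the constant-bias assumption noted above.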

The report is organized as follows. Section II describes the principal dataset containing the WEO forecasts and outcomes. Section III introduces the statistical methods used to test the optimality properties of the WEO forecasts. This is followed by two empirical sections that cover the basic characteristics of the forecasts (Section IV) and the statistical significance of tests of forecast inefficiencies (Section V). Section VI presents evidence on the predictability of WEO forecast errors by means of a range of instruments, such as the WEO prediction of U.S. and German GDP growth, oil prices, the output gap, and the global current account discrepancy. Section VII conducts directional-accuracy tests, while Section VIII studies the information reflected in the process whereby forecasts are revised and updated from their discussion by the Executive Board to the published version of the WEO. Section IX considers the performance of the WEO forecasts during the most recent recession and recovery years, while Section X looks at the long-term performance of WEO forecasts for an extended dataset on the G-7 countries. Section XI compares the WEO forecasts to the Consensus values, while Section XII analyzes the potential gains from combining these two sets of forecasts. Section XIII looks at recommendations for modifications to the WEO forecasts and forecasting procedures, and Section XIV concludes.

II. Description of the WEO Dataset

A. Data Coverage

To assess forecasting performance, we make use of the fact that four sets of short-term forecasts are available for the same variable, since the WEO publishes both April and September current-year and next-year forecasts. For example, four forecasts of GDP growth in the year 2000 are reported, namely the April and September 1999 next-year forecasts and the April and September 2000 current-year forecasts. Access to different forecast vintages allows us to address issues such as whether (and by how much) the forecast error shrinks as the target date approaches. It also allows us to test another efficiency property embedded in an optimal forecast, namely that forecast revisions should themselves be unpredictable. In some cases we find evidence of significant biases in revisions, suggesting simple ways of improving upon the forecasts.

The WEO dataset contains information on 178 countries over the period 1990–2003. These countries are collected into seven groups or regions, namely Africa (50 countries), Central and Eastern Europe (15), CIS and Mongolia (13), Developing Asia (24), the Middle East (14), the Western Hemisphere (33), and the Advanced Economies (29). Data availability and data quality vary significantly across regions, and there can be significant differences even within each region. Data quality and the extent to which outliers affect the results also depend on the type of variable being analyzed.

There are five variables for which forecasts are available, namely GDP growth, export volume growth, import volume growth, inflation, and the current account balance (in U.S. dollars or as a percentage of GDP). Our analysis focuses on current-year and next-year forecasts, both of which can be considered short-term forecasts. Longer-term forecasts are not pursued further because the data sample is too short for a statistical analysis of long-term forecasting performance to be particularly informative.

B. Timing Conventions

Since the target variables are subject to data revisions, a choice has to be made concerning which data vintage to use to measure realized values or outcomes. To this end, we follow common practice and use the first-available data in the April WEO issue of year t + 1 to measure the outcome of the predicted variable in period t (labeled $y_t$). Next-year forecasts for period t + 1 are compared to the realized values for year t + 1 ($y_{t+1}$) reported in the September WEO issue of year t + 2.

We will also make use in the analysis of the fact that we have both April and September forecasts of same-year and next-year realizations. This means that we have two sets of current-year forecasts generated in April and September, $\hat{y}_{t,t}^{Apr}$ and $\hat{y}_{t,t}^{Sep}$, and two sets of next-year forecasts generated during the same months, labeled $\hat{y}_{t+1,t}^{Apr}$ and $\hat{y}_{t+1,t}^{Sep}$. In this notation the first subscript indicates the period being predicted while the second subscript indicates the year when the forecast was generated. The superscript indicates the month of the WEO issue where the WEO forecast was reported. This convention gives rise to four separate forecast errors:

$$e_{t,t}^{Apr} = y_t - \hat{y}_{t,t}^{Apr}, \quad e_{t,t}^{Sep} = y_t - \hat{y}_{t,t}^{Sep}, \quad e_{t+1,t}^{Apr} = y_{t+1} - \hat{y}_{t+1,t}^{Apr}, \quad e_{t+1,t}^{Sep} = y_{t+1} - \hat{y}_{t+1,t}^{Sep}.$$

In addition, we will also consider current-year and next-year forecast revisions, defined as

$$\mathrm{rev}_{t,t} = \hat{y}_{t,t}^{Sep} - \hat{y}_{t,t}^{Apr}, \qquad \mathrm{rev}_{t+1,t} = \hat{y}_{t+1,t}^{Sep} - \hat{y}_{t+1,t}^{Apr}.$$
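As an illustration of these definitions, the following minimal Python sketch computes the four forecast errors and the two revisions from a small, hypothetical panel; the column names and figures are invented for illustration and are not taken from the WEO dataset.

```python
import pandas as pd

# Hypothetical tidy layout: one row per country and target year, holding the
# four forecast vintages and the outcome measured as described in the text.
df = pd.DataFrame({
    "country":    ["DEU", "DEU"],
    "year":       [1999, 2000],
    "f_cur_apr":  [1.6, 2.8],   # April current-year forecast
    "f_cur_sep":  [1.4, 2.9],   # September current-year forecast
    "f_next_apr": [2.4, 2.0],   # April next-year forecast (made the year before)
    "f_next_sep": [2.6, 2.2],   # September next-year forecast (made the year before)
    "actual":     [2.0, 3.2],   # realized value for `year`
})

# Four forecast errors, e = actual - forecast
for col in ["f_cur_apr", "f_cur_sep", "f_next_apr", "f_next_sep"]:
    df["e_" + col[2:]] = df["actual"] - df[col]

# Current-year and next-year forecast revisions (September minus April)
df["rev_cur"] = df["f_cur_sep"] - df["f_cur_apr"]
df["rev_next"] = df["f_next_sep"] - df["f_next_apr"]
print(df)
```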

Table 1 presents basic information on data coverage within each of the regions for the five variables of interest. A maximum of 14 annual data points (1990–2003) is available (13 for next-year forecasts, which begin in 1991). The third column reports the number of observations (averaged across countries within each region) that the forecasting analysis makes use of after deleting missing observations and cases where the forecast is exactly identical to the realized value. This leads to a significant trimming of data in some regions. For example, at least eight September current-year forecasts are available for only 41 of the 50 African countries, and only 11 of the 24 Developing Asian economies had more than eight data points for this variable. Fortunately, data on April and September next-year forecasts tend to be more complete, although again there are some countries with incomplete data. Measured by data coverage, the dataset is most complete for the Advanced Economies and least complete for CIS and Mongolia. Although data coverage does not vary a great deal across variables, the current account data tend to contain somewhat fewer observations.

Table 1.

Sample Size, by Variable and Region

[Table not reproduced here.]
Source: Author’s calculations.
Table 2.

Descriptive Statistics for Forecast Errors, by Variable and Region1

(Averages across countries in region)

[Table not reproduced here.]

Except for the median, which is the median of the mean forecast errors across countries.

III. Properties of Optimal Forecasts

To evaluate the quality of the WEO forecasts, it is necessary to establish a set of testable properties that an optimal forecast should have. In this section we discuss the nature of such properties. In all cases, the properties are established under the assumption that the objective function is of the mean squared error (MSE) type, so the forecasts minimize a symmetric, quadratic loss function. Different properties hold under other loss functions. For a theoretical discussion and derivation of these properties, see Patton and Timmermann (2004).

A. Unbiasedness and Lack of Serial Correlation

Most fundamentally, an optimal forecast should be unbiased and its errors should be serially uncorrelated. Define the generic forecast errors for period t or t + 1, computed at time τ, as

$$e_t = y_t - \hat{y}_{t,\tau} \;\; (\tau \le t), \qquad e_{t+1} = y_{t+1} - \hat{y}_{t+1,\tau} \;\; (\tau \le t+1).$$

To test the basic unbiasedness and uncorrelatedness properties, one can perform simple regressions

$$e_t = \alpha + \varepsilon_t, \qquad (1)$$
$$e_{t+1} = \alpha + \beta e_t + \varepsilon_{t+1}. \qquad (2)$$

For an efficient forecast we must have α = 0 (unbiasedness) in (1), and α = 0, β = 0 in (2), implying unbiasedness and absence of serial correlation. The first regression gives rise to a simple Student's t-test of α = 0, while the second leads to an F-test. Adding the forecast, $\hat{y}_{t+1,t}$, to both sides of equation (2), this regression is easily seen to be equivalent to the conventional Mincer-Zarnowitz (1969) levels regression

$$y_{t+1} = \alpha + \beta \hat{y}_{t+1,t} + \varepsilon_{t+1}. \qquad (3)$$

In this regression unbiasedness of the forecast translates into a requirement that α = 0, β = 1.
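The tests in (1)-(3) amount to ordinary least squares regressions with simple coefficient restrictions. The sketch below runs them in Python on simulated data; the series are hypothetical stand-ins for one country's forecast errors and forecasts, not the paper's estimation code.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
e = rng.normal(-0.5, 1.0, size=13)            # hypothetical forecast errors, 1991-2003

# Eq. (1): t-test of alpha = 0 (unbiasedness)
m1 = sm.OLS(e, np.ones_like(e)).fit()
print("t-stat, p-value:", m1.tvalues[0], m1.pvalues[0])

# Eq. (2): regress e_{t+1} on a constant and e_t; F-test of alpha = beta = 0
m2 = sm.OLS(e[1:], sm.add_constant(e[:-1])).fit()
print(m2.f_test("const = 0, x1 = 0"))

# Eq. (3): Mincer-Zarnowitz levels regression; test alpha = 0, beta = 1
f = rng.normal(2.5, 1.0, size=13)             # hypothetical forecasts
y = f + e                                     # implied realizations
m3 = sm.OLS(y, sm.add_constant(f)).fit()
print(m3.f_test("const = 0, x1 = 1"))
```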

B. Efficiency Properties More Generally

Unbiasedness and absence of serial correlation in the forecast errors can be thought of as weak efficiency requirements. A much more general and stricter orthogonality condition holds for optimal forecasts under MSE loss. Since an optimal forecast should be the conditional expectation of the predicted variable of interest, if the forecaster uses all available information efficiently, then no variable in the current information set should be able to predict future forecast errors. To test this, let $z_t$ be any such variable in the forecaster's information set at time t, $\Omega_t$. An implication of informational efficiency is that α = β = 0 in the regression

$$e_{t+1} = \alpha + \beta z_t + \varepsilon_{t+1}, \qquad (4)$$

where $\varepsilon_{t+1}$ is a serially uncorrelated, zero-mean error term. The relationship between unbiasedness and absence of serial correlation (equation (2)) on the one hand and informational efficiency according to (4) on the other is similar to the relationship between the weak and semi-strong versions of the market efficiency hypothesis. Under weak-form efficiency, past values of the variable itself should not help predict future values; the semi-strong form tightens this restriction by requiring that no publicly available information should help forecast future values.
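In practice, the orthogonality condition (4) is again a regression with a zero restriction on both coefficients. A minimal Python sketch on simulated data follows; the instrument is hypothetical and simply stands in for a variable such as the WEO forecast of U.S. growth.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
T = 13
z = rng.normal(3.0, 1.0, size=T)        # hypothetical instrument known at time t
e_next = rng.normal(0.0, 1.0, size=T)   # hypothetical next-year forecast errors

# Eq. (4): informational efficiency requires alpha = beta = 0
m = sm.OLS(e_next, sm.add_constant(z)).fit(cov_type="HAC", cov_kwds={"maxlags": 1})
print(m.f_test("const = 0, x1 = 0"))
```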

C. Forecast Revisions: Efficiency Tests Without Measurement Problems

Forecast revisions are of fundamental interest in a forecast evaluation exercise for one simple reason: If a sequence of forecasts is optimal, then the forecast revisions should themselves be unpredictable (technically a martingale difference sequence).

Indeed, if this were not the case and, say, forecast revisions between April and September were themselves predictable, then the original (April) forecast would not be optimal. Suppose, for example, that it is known that on average the September forecast of next-year output growth tends to be ¼ of 1 percent higher than the April forecast. Then the April forecast should be revised upwards by this amount to reflect the better information available in September of each year.

Another advantage of studying revisions is that predictable patterns in revisions, if detected, automatically tell the forecaster how to improve the original forecast, namely by amending it by the fitted value of the forecast revision. Hence, if the April prediction of the revision between the April and September forecasts is

$$\widehat{\mathrm{rev}}_{t,t}^{Sep} = \hat{\alpha} + \hat{\beta} z_t,$$

then the original April forecast, $\hat{y}_{t,t}^{Apr}$, can be replaced by an improved forecast, $\tilde{y}_{t,t}^{Apr}$, as follows:

$$\tilde{y}_{t,t}^{Apr} = \hat{y}_{t,t}^{Apr} + \widehat{\mathrm{rev}}_{t,t}^{Sep}. \qquad (5)$$

More generally, if $\Omega_t^{Sep}$ is the forecaster's information set in September and $\Omega_t^{Apr}$ is the information set in April—which is a subset of the September information set, $\Omega_t^{Apr} \subseteq \Omega_t^{Sep}$—and if forecasts are formed optimally as conditional expectations, i.e.,

$$\hat{y}_{t+1,t}^{Apr} = E\left[y_{t+1} \mid \Omega_t^{Apr}\right], \qquad \hat{y}_{t+1,t}^{Sep} = E\left[y_{t+1} \mid \Omega_t^{Sep}\right],$$

then by the law of iterated expectations $E[\hat{y}_{t+1,t}^{Sep} \mid \Omega_t^{Apr}] = \hat{y}_{t+1,t}^{Apr}$, and so the revision, defined as $\mathrm{rev}_{t+1,t} = \hat{y}_{t+1,t}^{Sep} - \hat{y}_{t+1,t}^{Apr}$, must be zero-mean:

$$E\left[\mathrm{rev}_{t+1,t} \mid \Omega_t^{Apr}\right] = 0. \qquad (6)$$

A similar result holds for the current-year revisions, $\mathrm{rev}_{t,t} = \hat{y}_{t,t}^{Sep} - \hat{y}_{t,t}^{Apr}$:

$$E\left[\mathrm{rev}_{t,t} \mid \Omega_t^{Apr}\right] = 0. \qquad (7)$$

Notice, however, that in general $E[\mathrm{rev}_{t+1,t} \mid \Omega_t^{Sep}] \neq 0$ and $E[\mathrm{rev}_{t,t} \mid \Omega_t^{Sep}] \neq 0$, provided that any new information of use to the forecaster arrives between April and September of year t. It is worth pointing out that we ignore the effect of estimation errors, which can induce serial correlation in the forecast errors even if the forecaster knows the true model. This is akin to learning effects—see Timmermann (1993) for a discussion of this point in the context of predictability of financial returns.

An important implication follows from these simple results: forecast optimality can be tested without having data on the target variable, y. This is important since, given the availability of different vintages of the target variable, it is not clear whether the forecasts should be compared to the first-release, the revised, or the “final” data. This matters considerably in practice, as witnessed by the recent literature on “real-time” macroeconomic data (see Croushore, 2005). By analyzing forecast revisions we can effectively construct a test that is not sensitive to how well the underlying data are measured.
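The revision-based tests in (5)-(7) can likewise be run as regressions that use no data on the target variable. The following is a minimal Python sketch on simulated series (all of them hypothetical): it regresses the September-minus-April revision on a constant and an April-dated variable, tests that both coefficients are zero, and forms the improved forecast of equation (5) from the fitted revision.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
T = 13
z = rng.normal(0.0, 1.0, size=T)          # hypothetical variable in the April info set
rev = rng.normal(0.25, 0.5, size=T)       # hypothetical September-minus-April revisions

# Test E[rev | April information] = 0 via alpha = beta = 0 (eqs. (6)-(7))
m = sm.OLS(rev, sm.add_constant(z)).fit()
print(m.f_test("const = 0, x1 = 0"))

# If the revision is predictable, eq. (5) amends the April forecast by the
# fitted revision to obtain an improved forecast.
f_apr = rng.normal(3.0, 1.0, size=T)      # hypothetical April forecasts
f_improved = f_apr + m.fittedvalues
print(f_improved[:3])
```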

D. Non-Increasing Variance of Forecast Errors as Forecast Horizon Is Decreased

A final property of an optimal forecast is that the variance of the forecast error should not increase as more information becomes available. This means that the April current-year (next-year) forecast errors should have a variance at least as large as that of the September current-year (next-year) forecast errors:

$$\mathrm{Var}\left(e_{t+1,t}^{Sep}\right) \le \mathrm{Var}\left(e_{t+1,t}^{Apr}\right), \qquad \mathrm{Var}\left(e_{t,t}^{Sep}\right) \le \mathrm{Var}\left(e_{t,t}^{Apr}\right). \qquad (8)$$

Intuitively this simply reflects that more information about the outcome in the current or next year is known in September than in April of the same year. This can be formally tested through a variance ratio test or (more appropriately given the small sample size here) by considering patterns in the variance of forecast errors associated with different forecast horizons.
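A sketch of the variance comparison in (8), using simulated April and September errors; as noted in the text, with only about 13 annual observations a formal test of equal variances is indicative at best.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
e_apr = rng.normal(0.0, 1.3, size=13)   # hypothetical April current-year errors
e_sep = rng.normal(0.0, 0.8, size=13)   # hypothetical September current-year errors

ratio = np.var(e_sep, ddof=1) / np.var(e_apr, ddof=1)
print("variance ratio (Sep/Apr):", ratio)   # should not exceed 1 for optimal forecasts

# Simple F-test of equal variances against the alternative Var(Apr) > Var(Sep)
F = np.var(e_apr, ddof=1) / np.var(e_sep, ddof=1)
p = 1.0 - stats.f.cdf(F, dfn=len(e_apr) - 1, dfd=len(e_sep) - 1)
print("one-sided p-value:", p)
```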

IV. Empirical Results

With the dataset and the benchmark properties of an optimal forecast in place, we proceed to analyze the empirical evidence. Table 2 reports summary statistics for the forecast errors and forecast revisions, grouped by variable and region. We show the mean, median, and standard deviation of the forecast error, the average absolute value of the coefficient of first-order serial correlation in the forecast errors, and the percentage of positive forecast errors. In all cases these statistics are computed from the cross-section of countries within a particular region. For example, the median value is the median of the mean values across countries in a given region, while the standard deviation is computed across the mean values for countries in the region. In what follows, we discuss the main empirical findings.
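As an illustration of how the cross-country statistics are formed (statistics of country-level mean errors within each region), here is a minimal Python sketch with invented numbers; it mirrors the construction described above rather than reproducing the table.

```python
import pandas as pd

# Hypothetical country-level statistics, one row per country
df = pd.DataFrame({
    "region":  ["Africa", "Africa", "Advanced", "Advanced"],
    "country": ["AGO", "BEN", "USA", "DEU"],
    "mean_fe": [-1.2, -0.4, 0.1, -0.6],     # mean forecast error per country
    "pct_pos": [0.35, 0.45, 0.55, 0.40],    # share of positive forecast errors
})

# Regional statistics computed across the country-level means
summary = df.groupby("region")[["mean_fe", "pct_pos"]].agg(["mean", "median", "std"])
print(summary)
```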

Table 3.

Tests for Biasedness and Serial Correlation of Forecast Errors

(Share of countries in region with significant test statistics)

[Table not reproduced here.]
Source: Author’s calculations.

The bias coefficient α is defined in equation (1) in the main text.

Fraction of bootstrapped p-values for the null hypothesis α = 0 that are smaller than 0.05 in a two-sided test.

The serial correlation coefficient β is defined in equation (2).

Fraction of bootstrapped p-values for the F-test of the joint null hypothesis of α = 0 and β = 0 that are smaller than 0.05.

Fraction of significant test values (p-value of less than or equal to 0.05) for a test of the null hypothesis that the fraction of positive forecast errors equals 0.5.