I How Accurate Are the IMF’s Short-Term Forecasts? Another Examination of the World Economic Outlook
International Monetary Fund


This paper analyzes the short-term forecasts for industrial and developing countries produced by the IMF and published twice a year in the World Economic Outlook. For the industrial country group, the forecasts for output growth and inflation are satisfactory and pass most conventional tests in forecasting economic developments, although forecast accuracy has not improved over time, and predicting the turning points of the business cycle remains a weakness. For the developing countries, the task of forecasting movements in economic activity is even more difficult and the conventional measures of forecast accuracy are less satisfactory than for the industrial countries. [JEL: E17, E37, F17, F47]

The World Economic Outlook forecasts, published twice a year, are comprehensive in their coverage, both of countries and of economic variables, and only a part of the whole is examined here. The evaluation is directed at the accuracy of short-term forecasts for key economic variables for the seven major industrial (G-7) countries and for regional aggregates of developing countries. This concentration on the value of the forecasts follows the precedent of an earlier examination by the present author (Artis, 1988), which itself built on a previous analysis by Kenen and Schwarz (1986), and was subsequently updated and supplemented by Barrionuevo (1993).

The postmortem analysis of forecasts calls for two cautionary notes. First, for many commentators the principal value of the World Economic Outlook may lie in its analysis of the conjuncture, its diagnosis of the situation reached by the world economy, and its evaluation of the options available to the world’s policymakers—rather than in the fine detail of its short-run forecasts. Second, from the perspective of strengthening global economic policymaking and performance in the longer run, the IMF’s medium-term projections and scenario analyses are arguably more relevant than the short-term forecasts. However, it must remain true that the quality of the IMF’s analysis should be reflected in its forecast of the near-term evolution of the world economy and, as these forecasts are reported with considerable precision and detail, they offer the most accessible and feasible means of bringing quantitative analysis to bear on the quality of the IMF’s conjunctural analysis.

The forecasting methods employed by the IMF partly dictate the choice of evaluation methods. These forecasts are not produced in the framework of an overall econometric model, so that forecast postmortem methods applicable to model-based forecasting (see, for example, Osborn and Teal, 1979; Artis, 1982; and Wallis and others, 1984) are not appropriate. Rather than relying on a global model (with intervention by the forecaster) to produce forecasts, procedures at the IMF rely heavily on forecast information provided by country desk officers, so that optimal use can be made of available country-specific information. Overall economic consistency is provided in two stages: first, by setting common global assumptions on which country desk officers base their work and, second, by aggregating and checking for consistency in the individual country output, trade, and balance of payments projections provided by the country desk officers. Inconsistencies revealed by the aggregation result in iterations on the original country forecasts until an acceptable set of forecasts is arrived at. The global assumptions specified to the country desk officers in a World Economic Outlook forecasting round will typically include the values to be assumed for oil prices and assumptions to be made regarding key monetary and fiscal policy variables and sensitive market variables such as exchange rates. In general, policy variables are taken to be given at current values or at publicly projected values if firm commitments have been made by the governments concerned.
Thus, in principle, and like most official forecasts, those in the World Economic Outlook are formally presented as projections based on “unchanged policy” assumptions: however, it is certainly difficult for forecasters to maintain such an assumption strictly, for much of the market information “in the air” at any point in time (including such relevant indicators as interest rates, exchange rates, and business expectations) will reflect, inter alia, anticipation that the values of policy variables may be changed in the future. Such anticipations will also, of course, be reflected in forward-looking market variables. It can be argued, in fact, that by far the greater part of the effect of any genuine policy innovation will in general not be felt until some time after the horizon of the forecast. For these reasons, the general practice of treating “unchanged policy” projections as “total” or “unconditional” forecasts is followed in this study.1

The paper first describes the principal definitions of forecast and outturn used in the study and comments on the selection of variables examined. Next, it discusses the evaluation methods used and presents the main results—first for industrial countries and then for developing countries. Conclusions are given in the final section. For details on the sources for forecast and outturn data, a statistical characterization of the corresponding data distributions, and the full data listing used, see Artis (1996).

Basic Definitions and Methods of Evaluation

As in the previous studies mentioned earlier, this study employs two definitions of forecast horizon with corresponding outturn. The “paradigm” World Economic Outlook timetable provides for publication twice a year, in May and October; the forecasts themselves are finalized in April and September. Correspondingly, a “current-year forecast” is defined as the forecast for year x appearing in the May issue of the World Economic Outlook for year x. The outturn data, described as “first available estimates,” are taken from the issue of the World Economic Outlook appearing in May of year x + 1. Thus, the “current-year” forecast corresponds to a near-term forecast, made at a time when some data for the first quarter of the year in question are already on hand for most, though not all, countries; and the realization for the year as a whole is identified with the data available in the first publication of the following year. Next, a “year-ahead” forecast is also defined, which is of longer term. Thus, the “year-ahead” forecast for year x is found in the issue of the World Economic Outlook for October of year x − 1; the realization for this forecast is identified with the data published in the issue of the World Economic Outlook for October of year x + 1. These data are termed “first settled estimates.”

These definitions were first suggested by Kenen and Schwarz (1986) and were employed in the earlier study by Artis (1988). They provide for a test of the sensitivity of the forecast to its horizon. There is no clearly agreed definition of the “correct” vintage of realization data to employ. These data are continuously revised and the forecast postmortems are dependent, in detail, on the choice of data vintage made. A more common practice than that employed here is to use the latest available data, which are a mixed set of revision vintages. This reflects an understanding of the objective of forecasts, which is that they aim to “forecast the truth,” while the nearest to the revealed truth on hand at any time is the latest available set of data.2 But this may not be the way in which the forecasts are evaluated by their constituency, where a higher premium on immediate predictive accuracy may be found. It is arguable that confronting the forecaster with the latest available set of realizations obliges him to forecast the data revision process as well as to predict the immediate evolution of the data he has available. In practice, there is probably little of general significance in the results that depends on the vintage of realization data employed.3 The definitions of forecast and outturn given above apply to “paradigm” World Economic Outlook publication schedules. In practice, and especially in the period before the forecasts were made public, the intervals between reporting are sometimes erratic and the interpretation of “current-year” and “year-ahead” forecasts with their associated outturn has to be adjusted correspondingly. See Artis (1996) for the precise sources of forecast and outturn data.

The World Economic Outlook forecasts are rich in detail. It would be excessively burdensome to process all the series for which forecasts are made. This paper concentrates on projections for GDP, inflation, the balance of payments, and the growth of imports and exports. These choices coincide with those made in the previous study. The most detailed forecasts are for the industrial countries group, specifically the seven major industrial countries, and the larger part of the study is devoted to them. For developing countries, the analysis is confined to regional aggregates.

The study examines the whole World Economic Outlook forecasting record from its inception in 1971 to 1994. The length of the series now available allows this study to examine whether any significant change has occurred in the IMF’s record over time, particularly in the interval since the previous study.

The literature is replete with a large number of suggested forecasting evaluation techniques (for a survey, see Wallis, 1989). Rationality considerations suggest that a “good” (rational) forecast should produce errors that are unbiased and display an absence of serial correlation: evidence to the contrary would suggest immediately that an improving correction could be made to the forecast process. In addition, it ought not to be possible to show that the forecast errors could be explained (hence potentially reduced) by taking account of any information available at the time the forecast was made (such as, for example, information provided by alternative forecasting procedures). The first two desiderata of a rational forecast can be tested for directly by applying the appropriate econometric procedures to the series of forecast errors. To test for the efficiency of the forecast procedure in the broader sense involves the evaluator in determining what might be critical information and testing to see whether indeed forecast error can be explained by it. An immediate difficulty is that the set of possibly relevant information is huge. Evaluators have generally concentrated on an easily available subset, stressing in particular the possible relevance of the forecast values themselves, and the forecasts that could have been produced by alternative naive (or not so naive) time series forecasts. The first set of information is exploited exclusively by the “realization-forecast” regression introduced by Mincer and Zarnowitz (1969): this regression has the attractive property that it is clear what parameter restrictions would correspond to the perfect forecast. Results for this regression featured extensively in Artis (1988) and do so again in the current study.

Forecast evaluation traditionally looks to some alternative forecasting procedure to provide a benchmark against which to appraise the performance of the procedures under examination. One set of alternatives is provided by simple time series models. Traditionally, the potential contribution of alternative, naive models has been filtered through the Theil (1966) statistic, which is computed as the ratio of the root mean square error (RMSE) of the forecast in question to the RMSE of the naive alternative (in Theil’s original exposition the “no change” forecast). In practice, the naive alternative may be represented by a “not-so-naive” model, such as, for example, a BVAR (for such an application, see Artis and Zhang, 1990). This study presents Theil statistic computations both for the original naive interpretation and for a less naive alternative based on a knowledge of the trend. However, these computations simply provide point estimates without any accompanying significance level. A more recent extension of this form of testing against an alternative has been formulated so as to provide significance tests. Tests of this type, in the form of the “MSE regression,” are also introduced in the present study.

The previous study (Artis, 1988) made extensive comparisons between the forecasts produced in the World Economic Outlook and those produced by the Organization for Economic Cooperation and Development (OECD) and by individual national official forecasters. This extensive comparison of official forecasts was notable chiefly for the finding that the major forecasting errors were widely shared across the official forecasting community and for emphasizing the importance of timeliness to good forecasting.4 This time, the comparison is with private sector forecasts. The extent to which it is possible to do so is limited, however, since the Consensus Forecasts that are used are only available from the latter part of 1989. The comparison is thus confined to the part of the study that investigates the forecast record through the last cycle.

In addition to testing the quantitative forecast, it is well recognized that an added dimension of a forecast is the directional information it contains. Leitch and Tanner (1991, 1995) have shown that accurate directional information is important for business users of forecasts; for policymakers, correctly predicting the turning point in the business cycle is also of separate and significant importance to quantitative accuracy. For this reason this study also includes tests of directional accuracy and discusses some aspects of turning point forecasting in the latest business cycle.

Finally, the study also examines how general the prediction errors are across the economies of the world. Interdependence between economies might be expected to result in a synchronization of the business cycle, leading individual national forecasters to commit forecasting errors of similar sign. This indeed was a finding of the earlier study. The IMF, by reason of its position, should in principle be better placed to “internalize” international interdependence in its forecasting procedures.

Industrial Countries

Basic Facts

The summary table (Table 1) and the four figures (Figures 1–4) give an immediate impression of the quality of the IMF forecasts for output growth and inflation, both for the current-year and the year-ahead forecasts. Table 1 provides evidence on the questions of bias and persistence.

Table 1.

Test for Biasedness and Serial Correlation of Forecast Error in Industrial Countries

Notes: The test for biasedness is based on the regression expressed as et = β0 + μt, where et is the forecast error, and the significance level of the t-statistic for β0 = 0 is reported. The Ljung-Box Q-statistic is used to measure serial correlation, and the Q-statistic up to M lags may be expressed as $Q(M) = T(T+2)\sum_{j=1}^{M} \hat{\rho}_j^{2}/(T-j)$. Under a null hypothesis of no serial correlation, Q is asymptotically distributed as a χ².
Figure 1.

World Economic Outlook Forecast: Real GDP Growth in Industrial Countries—Current-Year Forecast and First Available Outturn

Figure 2.

World Economic Outlook Forecast: Real GDP Growth in Industrial Countries—Year-Ahead Forecast and First Settled Estimate

Figure 3.

World Economic Outlook Forecast: Inflation in Industrial Countries—Current-Year Forecast and First Available Outturn

Figure 4.

World Economic Outlook Forecast: Inflation in Industrial Countries—Year-Ahead Forecast and First Settled Estimate

Bias may be identified with the significance of the mean forecast error, as indicated by a simple regression of the error on a constant term (see Holden and Peel, 1990). In the table, the value of the mean forecast error, β0, is shown both for output growth and for inflation for the seven major industrial countries for each of the types of forecast distinguished. In parentheses are shown the significance levels or probability values at which the null (mean equal to zero) might be rejected. As indicated, these values are generally far in excess of the significance levels that it is customary to employ in this type of situation (0.01, 0.05, or 0.10).

Generally, then, the evidence is that these forecasts are not, on a country-by-country basis, biased. This evidence seems especially strong for the current-year forecasts of output growth (stronger than for the corresponding year-ahead forecasts, for example), and a similar account holds true for the inflation forecasts. It is worth noting, however, that the qualification “country-by-country” may be a little misleading. The fact is that all of the point estimates of bias in the GDP growth rate forecasts are positive, suggesting that there may be a widespread error of output growth optimism. Indeed, when the individual country observations are pooled, the result is a finding that there is significant positive bias in the year-ahead forecasts of just over 0.5 percent a year, but when the period is divided into two (the first subperiod terminating in 1982), it appears that this bias is overwhelmingly due to experience in the first subperiod; bias is not significant in the later period. For the current-year forecasts, the pooling did not reveal any significant bias for the period as a whole (a product of some positive bias in the first subperiod and some negative bias in the second).5

Serial correlation in the time series of the forecast errors itself is tested by the Ljung-Box Q-statistic, significance levels for the null (no serial correlation) being shown in parentheses. Test statistics are reported for up to three orders of autocorrelation. The forecasts for inflation appear to suffer from serial correlation in the errors far more than the output growth forecasts do. In the current-year forecasts for inflation, serial correlation is detected for both Japan and the United Kingdom; in the year-ahead forecasts, serial correlation affects the errors for France, Italy, and Canada. In the corresponding figure (Figure 4), it seems clear that serial correlation affects the errors for the seven major industrial countries as a whole. By contrast, the output growth forecasts are almost entirely free of serial correlation in the errors, even at the 10 percent level, with the single exception of the year-ahead forecasts for the United Kingdom, where serial correlation of the first order is detectable.
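The two diagnostics reported in Table 1 can be illustrated with a minimal sketch. It uses only the Python standard library, follows the formulas given in the notes to Table 1, and runs on a short series of hypothetical forecast errors that are not taken from the paper. The function names are illustrative, and significance levels (which would require t and χ² distribution tables) are deliberately left out.

```python
import math


def bias_t_stat(errors):
    """Mean forecast error (the estimate of beta_0) and its t-statistic.

    Regressing e_t on a constant gives the sample mean as the estimate,
    with standard error s / sqrt(T), where s is the sample standard deviation.
    """
    n = len(errors)
    mean = sum(errors) / n
    variance = sum((e - mean) ** 2 for e in errors) / (n - 1)
    standard_error = math.sqrt(variance / n)
    return mean, mean / standard_error


def ljung_box_q(errors, max_lag):
    """Ljung-Box Q(M) = T(T+2) * sum_{j=1..M} rho_j^2 / (T - j)."""
    t = len(errors)
    mean = sum(errors) / t
    dev = [e - mean for e in errors]
    denom = sum(d * d for d in dev)
    total = 0.0
    for j in range(1, max_lag + 1):
        # j-th sample autocorrelation of the forecast errors
        rho_j = sum(dev[i] * dev[i - j] for i in range(j, t)) / denom
        total += rho_j ** 2 / (t - j)
    return t * (t + 2) * total


# Hypothetical forecast errors, for illustration only.
errors = [0.4, -0.2, 0.1, 0.6, -0.5, 0.3, -0.1, 0.2]
beta0, t_stat = bias_t_stat(errors)
q3 = ljung_box_q(errors, 3)  # up to three orders of autocorrelation
print(f"mean error = {beta0:.3f}, t = {t_stat:.3f}, Q(3) = {q3:.3f}")
```

In practice the t-statistic would be compared with a t distribution with T − 1 degrees of freedom and Q(M) with a χ² distribution with M degrees of freedom to obtain significance levels of the kind shown in parentheses in Table 1.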

The overall conclusion is that on a country-by-country basis, looking at the period as a whole, there is little evidence of bias in the forecasts; when the data are pooled, where evidence of significant bias emerges, this is entirely because of earlier experience. The record in respect of an absence of serial correlation is somewhat less reassuring, especially in relation to the longer-term forecasts of inflation. The rather more favorable impression given by the current-year forecasts than by the year-ahead projection is borne out by the graphical evidence of Figures 1–4. Figures 1 and 3 for the current-year forecasts give a strong impression that these forecasts are highly accurate and that errors are soon canceled. Figures 2 and 4, for the year-ahead forecasts, indicate much greater variability in the accuracy of the projections.

Further Summary Results

Further summary statistics are given in Tables 2–6. These report the mean (average) absolute error, the mean absolute actual value (for comparison), the RMSE, and two Theil statistics. Each of these is constructed as the ratio of the RMSE of the World Economic Outlook forecast to the RMSE of a “naive” alternative. “Naive 1” is simply the original Theil “no change” forecast (here meaning “the same rate of growth (inflation and so on) as last year”); “Naive 2” is a forecast set equal to the trend. While Naive 1 corresponds to a random walk with no drift, Naive 2 is the opposite extreme of instant mean reversion. By construction, values of the Theil statistics in excess of unity indicate that the World Economic Outlook forecast is inferior to a forecast built on one of these two alternative extreme assumptions.6
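As a rough illustration of how the two Theil statistics are built, the sketch below computes the RMSE ratios against Naive 1 and Naive 2 for a hypothetical series. Two choices are assumptions of the sketch rather than details confirmed by the text: the first year is dropped so that Naive 1 always has a previous outturn to copy, and the trend for Naive 2 is taken as the average outturn over the evaluation years. The data are invented for the example.

```python
import math


def rmse(forecasts, outturns):
    """Root mean square error of a forecast series against outturns."""
    n = len(outturns)
    return math.sqrt(sum((f - r) ** 2 for f, r in zip(forecasts, outturns)) / n)


def theil_ratios(forecasts, outturns):
    """Return (ratio against Naive 1, ratio against Naive 2).

    Assumption: evaluation starts in the second year, because the
    no-change forecast needs the previous year's outturn.
    """
    evaluated_outturns = outturns[1:]
    evaluated_forecasts = forecasts[1:]
    naive1 = outturns[:-1]  # "same as last year"
    trend = sum(evaluated_outturns) / len(evaluated_outturns)
    naive2 = [trend] * len(evaluated_outturns)  # period average
    base = rmse(evaluated_forecasts, evaluated_outturns)
    return (base / rmse(naive1, evaluated_outturns),
            base / rmse(naive2, evaluated_outturns))


# Hypothetical growth rates in percent, for illustration only.
outturns = [2.0, 3.0, 1.0, 2.5, 3.5]
forecasts = [2.2, 2.6, 1.5, 2.4, 3.0]
u1, u2 = theil_ratios(forecasts, outturns)
print(f"Theil (Naive 1) = {u1:.3f}, Theil (Naive 2) = {u2:.3f}")
```

Ratios below unity, as in this invented example, mean that the forecast beats the corresponding naive alternative in RMSE terms.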

Table 2.

World Economic Outlook Forecast Accuracy: Real GDP Growth in Industrial Countries

(In percent)

Notes: The regression is expressed as Rt = β0 + β1Ft + μt, where Rt is the realization in year t (first available outturn or first settled estimate) and Ft is the forecast for year t. Figures in parentheses are the significance level of the t-statistic for β0 = 0 or β1 = 1. The significance level of the F-statistic for the test of the joint hypothesis, β0 = 0 and β1 = 1, is reported. Naive 1 means a no-change forecast and Naive 2 means a forecast that is set at the trend (average value) for the period.
Table 3.

World Economic Outlook Forecast Accuracy: Inflation in Industrial Countries

(In percent)

Note: For definitions, etc., see notes to Table 2.
Table 4.

World Economic Outlook Forecast Accuracy: Balance of Payments on Current Account in Industrial Countries

(In billions of U.S. dollars)

Note: For definitions, etc., see notes to Table 2.
Table 5.

World Economic Outlook Forecast Accuracy: Growth of Export Volumes in Industrial Countries

(In percent)

Notes: For definitions, etc., see notes to Table 2. Year-ahead data for the seven major industrial countries cover 1980–94.
Table 6.

World Economic Outlook Forecast Accuracy: Growth of Import Volumes in Industrial Countries

(In percent)

Notes: For definitions, etc., see notes to Table 2. Year-ahead data for the seven major industrial countries cover 1980–94.

For output growth forecasts, and for inflation forecasts, the statistics reported in Tables 2 and 3 support two general propositions: first, these forecasts are superior to the naive alternatives posed; second, the performance of the current-year forecasts is notably better than that of the year-ahead forecasts: RMSEs are some 50 percent bigger in the year-ahead forecasts than in the case of the current-year forecasts; and the size of the mean absolute error is also generally larger by a similar margin.

The balance of payments forecasts (Table 4) are much less satisfactory. While current-year forecasts are again generally superior to the year-ahead projections, even in the former case the Theil statistic exceeds unity in the case of Canada: in the year-ahead forecasts, those for Italy and for all industrial countries exceed unity (Naive 1) or are very close to unity (Naive 2).

Tables 5 and 6 report results for export and import growth. These are comparable with those obtained for output growth.

The summary statistics clearly support the propositions that current-year forecasts are better than the longer-term year-ahead projections and that the balance of payments forecasts are markedly weaker than those for output growth, inflation, and the growth of export and import volumes. These findings are much in line with those arrived at in the earlier study on a smaller data set, as will be amplified further below.


A test of weak efficiency is represented by the realization-forecast equation:

$$R_t = \beta_0 + \beta_1 F_t + \mu_t,$$

where Rt is the realization, Ft is the forecast, and μt is an error term.

Since Rt = Ft + et, where e is the forecast error, the estimate of β1 in the equation would significantly differ from unity if in fact Ft and et are correlated. But if they are, the forecast could be improved. It is in this sense that the realization-forecast equation can be thought of as a weak efficiency test. An efficient forecast would yield an estimate of β1 that is not significantly different from unity, and an estimate of β0 that is not significantly different from zero. Otherwise, again, there would be a simple way of improving the forecast. Since estimates of β0 and β1 are generally likely to be correlated, the appropriate test of whether these desirable restrictions (β0 = 0, β1 = 1) hold is a joint one (Wallis, 1989).7 Tables 2–6 report estimates of realization-forecast regressions for output growth, inflation, the balance of payments, and export and import growth and show the significance level of the F-test for the joint restriction.
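A minimal sketch of the realization-forecast regression and the joint test may help fix ideas. It fits Rt = β0 + β1Ft + μt by ordinary least squares and forms the standard F-statistic for the joint restriction β0 = 0, β1 = 1 by comparing the restricted residual sum of squares (where the fitted value is simply Ft) with the unrestricted one. The function name and the data are hypothetical, and the sketch stops at the F-statistic rather than reporting a significance level.

```python
def realization_forecast_test(forecasts, outturns):
    """OLS of outturn on forecast, plus the F-statistic for beta0=0, beta1=1.

    The F-statistic has 2 and n - 2 degrees of freedom under the null.
    """
    n = len(outturns)
    f_bar = sum(forecasts) / n
    r_bar = sum(outturns) / n
    sxx = sum((f - f_bar) ** 2 for f in forecasts)
    sxy = sum((f - f_bar) * (r - r_bar) for f, r in zip(forecasts, outturns))
    beta1 = sxy / sxx
    beta0 = r_bar - beta1 * f_bar
    # Unrestricted residuals use the fitted line; restricted residuals
    # impose beta0 = 0 and beta1 = 1, so they are just outturn minus forecast.
    rss_unrestricted = sum((r - beta0 - beta1 * f) ** 2
                           for f, r in zip(forecasts, outturns))
    rss_restricted = sum((r - f) ** 2 for f, r in zip(forecasts, outturns))
    f_stat = ((rss_restricted - rss_unrestricted) / 2) / (rss_unrestricted / (n - 2))
    return beta0, beta1, f_stat


# Hypothetical forecasts and outturns, for illustration only.
forecasts = [1.0, 2.0, 3.0, 4.0, 5.0]
outturns = [1.1, 1.9, 3.2, 3.8, 5.0]
beta0, beta1, f_stat = realization_forecast_test(forecasts, outturns)
print(f"beta0 = {beta0:.3f}, beta1 = {beta1:.3f}, F = {f_stat:.3f}")
```

A small F-statistic, as in this invented example, corresponds to a high significance level for the joint restriction, which is the pattern the text describes as evidence in favor of weak efficiency.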

The results are reasonably reassuring regarding the efficiency of these forecasts. Certainly, in Table 2 (output growth), the evidence in favor of efficiency is strong: with the exception of Japan, the significance levels reported exceed the customary value (0.05) by a substantial margin, although this margin is bigger for the current-year forecasts than it is for the year-ahead forecasts. The results reported for forecasts of inflation are also generally reassuring: the exception is the year-ahead forecast for the United Kingdom. Turning to the balance of payments (Table 4), there is again evidence of a much weaker performance. The year-ahead balance of payments forecasts for Italy, Canada, the seven major industrial countries as a group, and for industrial countries as a whole all fail the weak efficiency test. The forecasts for the growth of exports and imports are, however, all highly satisfactory from this point of view.

In summary, the forecasts generally perform well in relation to the test for weak efficiency. It is, however, entirely possible for a forecast to be efficient in this sense, yet to be poor in some other key respects. A forecast may satisfy the tests for bias and serial correlation in its errors and those for weak efficiency without being the minimum variance forecast and without being good enough for its purpose.

World Variables

Table 7 reports the summary statistics and estimates of the realization-forecast regressions discussed above for two key “world” variables: the growth of world trade and industrial countries’ terms of trade. Estimates of world trade in the World Economic Outlook are widely used by national forecasting agencies in their own forecasts in which world variables are “exogenous.” The evidence of Table 7 is reassuring in this respect, since the data reported strongly support the efficiency of the corresponding forecasts, and they appear to be superior by a margin to the two naive alternatives. For the terms of trade forecasts, the results are less reassuring. While superior to naive forecasts in RMSE terms, they are strikingly inefficient.

Table 7.

World Economic Outlook Forecast Accuracy: World Trade Volumes and Terms of Trade

(In percent)

Notes: For definitions, etc., see notes to Table 2. Current-year data for industrial countries’ terms of trade cover 1974–94.

MSE Regression Tests

Ashley and others (1980) suggest a procedure for examining the statistical significance of the difference between the mean square errors of pairs of forecasts. Originating in the context of a causality study, the test is directly applicable to an evaluation of alternative forecasts and has been used as such by, among others, Stekler (1991) and Kolb and Stekler (1993). Where, as in these studies and the present one, the alternative forecast is the original Theil (1966) naive random walk model, the test can be regarded as supplying significance levels in a context in which forecast comparison is otherwise carried out by simple inspection of the point value of the Theil statistic. In the present case, this supplementary examination confirms the handful of particularly weak Theil statistic performances already noted above (Tables 2–6).

The basis for the test is the “MSE regression”:

$$\delta_t = \beta_1 + \beta_2(\sigma_t - \bar{\sigma}) + \mu_t,$$

where δ is the difference (in this case) between the error of the naive forecast and the error of the World Economic Outlook forecast, σ is the sum of these errors (σ̄ its mean), and μ is an error term. The null (in this case that the World Economic Outlook cannot improve on the naive forecast) can be rejected, in the case that both β1 and β2 are nonnegative, if a joint F-test rejects β1 = β2 = 0 or, either β1 or β2 being negative (but not significantly so), if a t-test on the other coefficient shows it to be significantly different from zero. If either β1 or β2 is significantly negative, the null cannot be rejected.

These tests can be shown to be equivalent to appropriate tests on an expression that defines the difference in mean square error of each of the two forecasts (see, for example, Ashley and others, 1980).
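The equivalence mentioned here can be checked numerically. The sketch below assumes the regression takes the form δt = β1 + β2(σt − σ̄) + ut given by Ashley and others (1980), with δt the naive error minus the World Economic Outlook error and σt their sum. Because δtσt equals the difference of the squared errors, the gap between the two mean square errors equals β1σ̄ plus β2 times the variance of σ. The error series are hypothetical, the function name is illustrative, and the sketch does not handle the case of negative mean errors, for which the original test requires further care.

```python
def mse_regression(naive_errors, weo_errors):
    """OLS estimates (beta1, beta2) of delta_t on a constant and demeaned sigma_t."""
    n = len(naive_errors)
    delta = [a - b for a, b in zip(naive_errors, weo_errors)]  # difference of errors
    sigma = [a + b for a, b in zip(naive_errors, weo_errors)]  # sum of errors
    sigma_bar = sum(sigma) / n
    sigma_dev = [s - sigma_bar for s in sigma]
    # With a demeaned regressor, the intercept is the mean of delta.
    beta1 = sum(delta) / n
    beta2 = (sum(d * s for d, s in zip(delta, sigma_dev))
             / sum(s * s for s in sigma_dev))
    return beta1, beta2


# Hypothetical forecast errors, for illustration only.
naive_errors = [1.0, -0.5, 1.5, 0.8, -1.2, 2.0]
weo_errors = [0.3, -0.2, 0.5, 0.4, -0.3, 0.6]
beta1, beta2 = mse_regression(naive_errors, weo_errors)

# Check the identity: MSE(naive) - MSE(WEO) = beta1 * mean(sigma) + beta2 * var(sigma).
n = len(naive_errors)
mse_gap = (sum(a * a for a in naive_errors) - sum(b * b for b in weo_errors)) / n
sigma = [a + b for a, b in zip(naive_errors, weo_errors)]
sigma_bar = sum(sigma) / n
sigma_var = sum((s - sigma_bar) ** 2 for s in sigma) / n
print(f"beta1 = {beta1:.3f}, beta2 = {beta2:.3f}")
print(f"MSE gap = {mse_gap:.3f}, implied gap = {beta1 * sigma_bar + beta2 * sigma_var:.3f}")
```

The identity shows why the signs of β1 and β2 matter: when mean errors are positive, positive coefficients correspond to a naive forecast with the larger mean square error.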

The results of this regression test are shown in Tables 8 and 9. Nearly all the forecasts are shown as superior to the naive (in the sense that the naive does not improve on the World Economic Outlook forecast). Exceptions arise for the balance of payments forecasts (France and Canada for the current-year forecasts; France and Italy for the year-ahead forecasts). It has already been shown that the balance of payments is the most poorly forecast variable, and it is for the balance of payments that the Theil statistics (Naive 1) appeared least satisfactory. According to the MSE regression test, however, the year-ahead forecasts for inflation (Table 9) for Japan and the United Kingdom are also unsatisfactory. While Japan (Table 3) had the highest Theil statistic, that for the United Kingdom was quite low: but it may be recalled that the realization-forecast regression for the United Kingdom was adverse in this case and, more relevant, that this was one of the few cases where bias was shown to be significant.

Table 8.

MSE Regression Test: Current-Year Forecast

Notes: Figures in parentheses are two-sided significance values of the t-statistic for β1 = 0 or β2 = 0. “Reject” denotes that the null hypothesis (β1 = β2 = 0) is rejected at the 5 percent significance level and “no” means no rejection of the null at the 5 percent significance level.
Table 9.

MSE Regression Test: Year-Ahead Forecast

Note: See notes to Table 8.

World Economic Outlook Forecasts Over Time

The availability of data over a comparatively long period of time as in the full sample lends strength to the statistical verdicts it is possible to deduce from the record. However, it is interesting to see whether the forecast record has improved over time. At one level, this question may be answered by simply inspecting the error statistics and looking for reduced values: this does not allow, however, for