The medium-term predictability of exchange rate movements is examined using three models of fundamentals: purchasing power parity, the monetary model, and uncovered interest parity. While the first two approaches yield favorable in-sample results, these largely reflect finite-sample estimation biases. Adjusting for these biases, there is little evidence of predictability, consistent with the lack of systematic improvement in out-of-sample forecasting performance relative to a random walk. Uncovered interest parity fares better at long horizons, but reflects information already embodied in market prices; in this sense, it may not be useful as an indicator of exchange rate misalignment. While more elaborate models of fundamentals might have better medium-term forecasting properties, careful attention must be paid to finite-sample biases in assessing predictability.



I. Introduction

Since the pioneering study of Meese and Rogoff (1983), it has been widely recognized that exchange rate movements are difficult to predict in the short run. There is less agreement, however, on the predictability of longer-horizon changes. If exchange rates are ultimately tied down by economic fundamentals, there is a presumption that they will tend towards levels consistent with such fundamentals over time, even if short-term movements are erratic. Yet attempts to find evidence of the convergence of exchange rates to fundamentals have met with mixed success, and positive results have often proved less robust than initially believed. The question of predictability also has implications for identifying exchange rate misalignment: if exchange rates cannot be forecast, then misalignments (however defined) have no implications for future movements.

The mixed evidence on medium-term exchange rate predictability reflects, in part, econometric problems with estimating forecasting equations in general. In particular, there has been increasing recognition of a tendency toward over-fitting regression models in finite samples in the presence of autocorrelated regressors (as originally discussed by Stambaugh (1986)). Results are then biased toward finding evidence of predictability based on in-sample analysis. In this vein, recent studies of stock market returns have typically overturned earlier evidence in favor of predictability based on fundamentals such as dividend yields and earnings. In the case of exchange rate predictability, the econometric problems are particularly important given the relatively short sample period available under generalized floating — i.e., post 1973.

This paper revisits the issue of medium-term exchange rate predictability, paying close attention to finite-sample biases and differences between in- and out-of-sample performance. The analysis is confined to three relatively simple models of economic fundamentals: purchasing power parity (PPP); the “monetary model” of the exchange rate; and uncovered interest parity (UIP). These models are chosen both because they have been actively studied in the past and because their implementation is relatively straightforward and unambiguous. Many other, less parsimonious models have been developed to explain exchange rate movements.1 They are not considered here, either because they have not been framed as forecasting models, or because their eclectic structure and data demands significantly complicate the assessment of forecasting ability.

To anticipate the conclusions, we find that neither PPP nor the monetary model robustly predicts medium-term exchange rate movements: apparent in-sample success does not translate into out-of-sample forecasting performance, consistent with estimation bias. A tendency toward finding greater predictability at medium-term horizons is due to an increase in the magnitude of the bias as the horizon for the exchange rate change increases. UIP fares somewhat better at long horizons of ten years or so. However, UIP simply reflects the forward rate implied by market interest differentials, and thus does not add to existing market information on future exchange rate movements. In this sense, even its modest success does not imply an ability to assess exchange rate misalignments, defined as predictable deviations in exchange rate outturns from market expectations.

The paper is structured as follows. The next section discusses the evidence based on PPP, including the econometric issues associated with estimating mean-reverting relationships in finite samples. The third section reviews the monetary model and associated estimation issues, while the fourth assesses the performance of UIP. The last section provides concluding remarks.

II. Purchasing Power Parity

There is a huge literature on the hypothesis that exchange rates return to levels consistent with PPP (see Rogoff (1996) for a survey). To the extent that PPP holds, if not continuously then over the medium term, the current deviation in the real exchange rate from PPP should predict future exchange rate movements. While many studies have covered this ground, we revisit the hypothesis in light of recent critical analysis of the predictive power of such models. Another innovation is that we define PPP in terms of real effective exchange rates (REERs), as opposed to real bilateral rates. While the logic underlying PPP should hold for both, its strength is likely to depend on the importance of trade linkages between countries. Using effective exchange rates weighted according to these linkages should raise the ratio of signal to noise compared with using bilateral rates, increasing the efficiency of estimates and test power.

The tests below generally use CPI-based measures of REERs from the IMF’s Information Notice System (INS), as they are often employed to assess competitiveness.2 Additional tests are performed with the GDP deflator-based measures used in the staff’s multilateral exchange rate assessment exercise.3 The mean-reverting properties of these data are examined for the G-7 currencies, and also for selected currencies of smaller industrial and emerging market countries.4 Estimation was performed using monthly and annual data to compare the results at different frequencies. Medium-term predictability was tested using two approaches: either iterating forward predictions from the one-year ahead regressions, or directly estimating equations with the five-year change in the REER as the dependent variable. In addition, threshold variables were added to test for nonlinearities in the adjustment process.

To preview the findings, the coefficients on lagged deviations of REERs from means were generally significant, with implied reversion speeds similar to those found elsewhere. Implied half lives of deviations from PPP were typically 3-4 years. Less favorably, these results can be attributed to finite-sample bias in the OLS estimator of the adjustment coefficient. After accounting for this bias, no evidence of mean reversion was found, as reflected in the model’s poor out-of-sample performance. Regressions using five-year changes appeared to perform even better using in-sample data, but the out-of-sample evidence was again unfavorable.

Some evidence was found that threshold effects are important, with the speed and significance of adjustment appearing to increase sharply for deviations in exchange rates from means of greater than about ±10 percent. While this result appears to be larger than can be explained by small-sample bias, as calculated using Monte Carlo studies, there was again no evidence that threshold effects helped improve out-of-sample forecasting performance.

A. Econometric Issues

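The basic mean-reversion regression for one-period changes using monthly and annual data was:5

$$\ln(\mathit{reer}_t) - \ln(\mathit{reer}_{t-1}) = \alpha + \beta\,[\ln(\mathit{reer}_{t-1}) - \mathit{lnreerm}] + \varepsilon_t \tag{1}$$

where lnreerm is the sample mean of ln(reer).6 For five-year changes, the regression was:

$$\ln(\mathit{reer}_t) - \ln(\mathit{reer}_{t-5}) = \alpha + \beta\,[\ln(\mathit{reer}_{t-5}) - \mathit{lnreerm}] + \varepsilon_t \tag{2}$$

In the case of the five-year regressions, the standard errors of the parameters were corrected for MA(4) error terms using the Newey-West (1987) approach.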
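The MA(4) correction to the standard errors can be illustrated with a short sketch. It is not the paper's code: it assumes a Bartlett kernel with the lag truncation set equal to the MA order (4) and no degrees-of-freedom adjustment, and the function names `ols` and `newey_west_se` are hypothetical.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients and residuals for y = X b + e."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, y - X @ coef

def newey_west_se(X, resid, lags=4):
    """HAC standard errors using Bartlett weights 1 - j / (lags + 1)."""
    scores = X * resid[:, None]                # x_t * e_t, one row per observation
    long_run = scores.T @ scores               # lag-0 term
    for j in range(1, lags + 1):
        weight = 1.0 - j / (lags + 1.0)
        gamma_j = scores[j:].T @ scores[:-j]   # lag-j cross products
        long_run += weight * (gamma_j + gamma_j.T)
    bread = np.linalg.inv(X.T @ X)
    return np.sqrt(np.diag(bread @ long_run @ bread))

# Illustration on simulated data (an assumption, not the paper's data):
# five-year changes of a random walk regressed on a constant and the lagged level.
rng = np.random.default_rng(0)
x = np.cumsum(rng.standard_normal(22))
X = np.column_stack([np.ones(17), x[:-5]])
coef, resid = ols(X, x[5:] - x[:-5])
print(coef[1], newey_west_se(X, resid)[1])
```

Overlapping five-year changes share four annual innovations with their neighbours, which is why the error term is MA(4) even when the one-period shocks are serially uncorrelated.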

There are several complications associated with estimating regressions of this type. The first is that the OLS estimate of β is biased in finite samples, leading to estimated mean-reversion speeds that exceed their true value. In particular, as discussed in Stambaugh (1986, 1999), the least-squares estimator of γ in the AR(1) regression:

$$x_t = \alpha + \gamma x_{t-1} + \varepsilon_t \tag{3}$$

is biased towards zero in finite samples. The bias can be approximated by:

$$E(\hat{\gamma} - \gamma) \approx -\frac{1 + 3\gamma}{T} \tag{4}$$

where T is the sample size. To express equation (3) in the form of the mean-reversion regressions, it can be rewritten as:

$$x_t - x_{t-1} = \alpha + \beta x_{t-1} + \varepsilon_t \tag{5}$$

where β = γ − 1 and the mean of $x_t$ is subsumed in the constant term α. The expected value of the estimated slope parameter in equation (5), $\hat{\beta}$, is then given by:

$$E(\hat{\beta}) \approx \beta - \frac{1 + 3\gamma}{T} = \beta - \frac{4 + 3\beta}{T} \tag{6}$$

So $\hat{\beta}$ will show excess mean reversion in finite samples for any plausible value of β. Consider the null hypothesis that x is a random walk and thus γ = 1 and β = 0. E($\hat{\beta}$) is then approximately −4/T. In monthly estimation, a sample period of 1980-2001 gives about 250 observations, implying an expected value for the mean-reversion parameter of −0.015 and an implied half life of deviations from the mean of 3.8 years under the null hypothesis of a random walk.7 In annual estimation, with 21 observations, the expected value of $\hat{\beta}$ is −0.190 and the implied half life is 3.3 years under the null of a random walk.
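The arithmetic behind these figures can be checked with a few lines of code. The sketch below takes the −4/T approximation quoted in the text for the random-walk null and assumes that the half life is computed as ln(0.5)/ln(1 + β); the function names are illustrative only.

```python
import math

def expected_beta_random_walk(T: int) -> float:
    """Approximate E(beta_hat) under a random walk (gamma = 1, beta = 0): -4/T."""
    return -4.0 / T

def half_life(beta: float) -> float:
    """Periods needed for a deviation to halve when it shrinks by the factor (1 + beta) each period."""
    if not -1.0 < beta < 0.0:
        raise ValueError("half life is defined here only for -1 < beta < 0")
    return math.log(0.5) / math.log(1.0 + beta)

# Annual data, 21 observations.
annual_beta = expected_beta_random_walk(21)
print(round(annual_beta, 3), round(half_life(annual_beta), 1))  # -0.19 3.3

# Monthly data, using the rounded parameter value quoted in the text;
# the half life in months is converted to years.
monthly_beta = -0.015
print(round(half_life(monthly_beta) / 12, 1))  # 3.8
```

A pure random walk estimated over 21 annual observations is therefore expected to display a half life of just over three years, which is in the range commonly reported as evidence of mean reversion.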

For regressions using five-year changes in the exchange rate, the small-sample bias has not been derived analytically. We approximate it using Monte Carlo simulations, with data generated under the null hypothesis of a random-walk data-generation process (DGP):

$$x_t = x_{t-1} + \varepsilon_t \tag{7}$$

Figure 1 shows the resulting regression parameters over samples of different lengths for one-year and five-year ahead regressions, based on 1,000 replications of process (7). The parameters using one-year changes are very close to the analytical results given by formula (6). Those using five-year changes imply greater mean reversion: for a sample length of 17 years, equal to that used here for the 5-year regressions, the estimated parameter is −0.789. To take account of these small-sample biases in the estimates of β, the tables below show both raw parameters and bias-adjusted values. For the one-month and one-year ahead regressions, the bias is calculated using the analytical approximation; for the five-year change regressions, it is based on the Monte Carlo results.

Figure 1

Parameter Bias in Mean-Reversion Regressions

(true DGP is a random walk)

Citation: IMF Working Papers 2003, 021; 10.5089/9781451843934.001.A001
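A Monte Carlo experiment of this kind can be sketched as follows. This is an illustration under stated assumptions rather than the simulation behind Figure 1: innovations are standard normal, the regression includes a constant, the seed is arbitrary, and the function names are hypothetical.

```python
import numpy as np

def mean_reversion_slope(x: np.ndarray) -> float:
    """OLS slope from regressing x_t - x_{t-1} on a constant and x_{t-1}."""
    lagged, change = x[:-1], np.diff(x)
    design = np.column_stack([np.ones_like(lagged), lagged])
    coef, *_ = np.linalg.lstsq(design, change, rcond=None)
    return float(coef[1])

def simulate_bias(T: int, replications: int = 1000, seed: int = 0) -> float:
    """Average estimated slope over random-walk samples with T one-period changes."""
    rng = np.random.default_rng(seed)
    slopes = []
    for _ in range(replications):
        x = np.cumsum(rng.standard_normal(T + 1))  # random walk, true slope is zero
        slopes.append(mean_reversion_slope(x))
    return float(np.mean(slopes))

for T in (21, 50, 250):
    print(T, round(simulate_bias(T), 3), "analytical approximation:", round(-4.0 / T, 3))
```

Because the true slope is zero by construction, the average estimate is itself the bias. Subtracting the sample mean from the regressor would leave the slope unchanged, since the mean is absorbed by the constant.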

Another perspective on finite-sample bias is given by Monte Carlo tests of how the adjustment parameter varies as the horizon for the exchange rate change increases, holding the length of the sample period fixed. Generating a random walk over a 21-year sample period, again with 1,000 replications, yielded the following results for the mean-reversion regression:

[Table omitted in source.]

The size and significance of the adjustment parameter increase steadily as the horizon of the exchange rate change lengthens, giving the misleading impression that predictability rises at medium-term horizons. Similarly, the $\bar{R}^2$ increases dramatically at longer horizons. These results underscore the dangers of making inferences about predictability based on in-sample tests, especially over medium-term horizons.
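The horizon effect can be reproduced qualitatively with a small simulation. The sketch below is not the paper's experiment: it assumes 22 annual levels (giving 21 one-year changes), standard normal innovations, and a regression of the h-year change on a constant and the initial level; all names are illustrative.

```python
import numpy as np

def horizon_slope(x: np.ndarray, h: int) -> float:
    """OLS slope from regressing x_{t+h} - x_t on a constant and x_t."""
    base = x[:-h]
    change = x[h:] - x[:-h]
    design = np.column_stack([np.ones_like(base), base])
    coef, *_ = np.linalg.lstsq(design, change, rcond=None)
    return float(coef[1])

def mean_slopes_by_horizon(n_levels=22, horizons=(1, 2, 3, 4, 5),
                           replications=1000, seed=0):
    """Average slope at each horizon when the true process is a random walk."""
    rng = np.random.default_rng(seed)
    totals = {h: 0.0 for h in horizons}
    for _ in range(replications):
        x = np.cumsum(rng.standard_normal(n_levels))
        for h in horizons:
            totals[h] += horizon_slope(x, h)
    return {h: total / replications for h, total in totals.items()}

for h, slope in mean_slopes_by_horizon().items():
    print(h, round(slope, 3))
```

The average slope becomes more negative as the horizon lengthens even though the simulated series contain no mean reversion at all.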

Separate from the issue of parameter bias, there is also a problem of size bias in hypothesis tests of mean reversion. As discussed in Engel (2000), if the true DGP does not correspond to the autoregressive specification implicit in the mean-reversion regressions, there will be a tendency to over-reject the hypothesis of a random walk. Engel shows that the size bias in such tests may be huge, even in samples of 100 years, when the exchange rate is driven by a mix of stationary and nonstationary processes. There does not, however, appear to be a straightforward way of assessing this bias without prior knowledge of the true DGP; Berkowitz and Giorgianni (2001) have shown that bootstrap methods, in particular, are not robust in this context.

A third problem in estimating mean-reverting processes is raised by Taylor (2001), specifically that the use of time-averaged data, when the underlying process takes place at a higher frequency, biases downward estimated speeds of adjustment. This effect would work in the opposite direction to the small-sample bias discussed above. Taylor conducts Monte Carlo experiments to calculate the time-averaging bias on the assumption that the true process is daily but estimation is performed using either monthly or annual average data. In either case, he finds that a true daily half-life of 2 years would be biased up by about 50 percent, giving a value using time-averaged data of around 3 years. Furthermore, analytic results show that the ratio of the estimated to the true half life tends to a finite limit exceeding unity as the true half life goes to infinity. Taylor does not provide a limiting value for this ratio, but his tables suggest that it is approximately 1.5. Neither does he analyze the sensitivity of the bias to sample size: although the bias itself is inherently an asymptotic phenomenon, it is not clear how it would interact with the small-sample bias discussed above.

Conceptually, however, there is the issue of how the time-averaging bias relates to the question addressed here, namely that of predicting exchange rate movements, as opposed to estimating “true” half lives in the underlying data. Taylor’s analysis shows that time-averaging data can bias estimates of underlying half lives. Nevertheless, when time-averaged data are all that are available, the best unconditional predictor of the future exchange rate change still should be based on the biased parameter value, not the conceptual underlying value at a higher frequency. This is because the population value of the adjustment parameter changes when the data are time-averaged. So the estimated value is an unbiased measure of the population value using time-averaged data. For the purposes of exchange rate forecasting, then, the time-averaging bias is not relevant.8

There could still, however, be biases in the size of hypothesis tests that the adjustment parameter is zero when time-averaged data are used. Taylor does not address this issue, but one might expect under-rejection of the hypothesis that the data are a random walk using time-averaged data. To explore this issue, Monte Carlo experiments were run where the DGP was mean reverting with a half life of three years at a daily frequency. The data were replicated 1,000 times over sample lengths of both 21 years (corresponding to that used below in estimation) and of 250 years (assumed to be roughly asymptotic). The mean-reversion regressions were run on both the daily data and time-averaged monthly data, and the average adjustment coefficients, t statistics, and half lives were calculated. The results were as follows:

[Table omitted in source.]

The t statistics on the slope parameters in the monthly regressions are indeed lower than those in the daily regressions, confirming a size bias in testing the null hypothesis that the process is a random walk. The results also shed light on the net bias arising from small-sample and time-averaging problems. In the “short” sample of 21 years, the estimated half life of 3.2 years using monthly data is close to the true half life of 3 years, indicating that the small-sample and time-averaging biases are roughly offsetting. Using daily data, the results are affected only by the small-sample bias, giving a half-life estimate roughly one half of the true value — consistent with the approximation in formula (6). In the 250-year sample, the test size distortion diminishes slightly; the half-life estimate using daily data is close to the true value of 3 years, while that using monthly data rises to about 50 percent above the true value, consistent with Taylor’s results.

In practice, it is difficult to adjust for half-life bias arising from time averaging under a general null hypothesis. The approach taken below is to adjust the $\hat{\beta}$s by a constant equal to the asymptotic bias in the constrained GMM estimate. This gives a sense of the effect of time-averaging on the underlying half-life estimates, although many of the estimated adjustment parameters are of a perverse sign when corrected in this way. In any event, as indicated earlier, this adjustment is for presentational purposes, as the best estimate of the future exchange rate change will still reflect the unadjusted parameter.9

Yet another issue, again raised by Taylor (2001), arises from assuming a linear adjustment specification when the true process is nonlinear. A simple example involves threshold levels at which the adjustment speed of the exchange rate changes discontinuously. The adjustment parameter will then reflect a mix of the two processes that occur within and outside of these thresholds, and will be correct in neither case. This bias is addressed in the next section, which introduces nonlinearities to the error-correction specification.

This discussion suggests that attempts to assess predictability on the basis of in-sample evidence are subject to multiple difficulties. Furthermore, adjustments to account for biases in parameter estimates and test size distortions are likely to be sensitive to the specification of the null hypothesis. With these considerations in mind, considerable emphasis is placed on out-of-sample predictive performance in the following analysis.10

B. Linear Tests of Mean Reversion

Raw and adjusted estimation results for equation (1) using monthly data on CPI-based REERs are shown in Table 1, along with implied half lives of mean reversion in cases where the slope parameter is negative. The adjustments take into account both the approximate finite-sample bias from formula (6) and the ad hoc correction for using time-averaged data. No t-statistics are provided for adjusted parameters, given uncertainties about their distribution under the null hypothesis. All of the raw estimates of β for individual countries are negative, consistent with mean-reverting behavior, although only 3 of the 7 are significant at the 95 percent level. The adjusted $R^2$s are close to zero, and occasionally negative. Implied half lives range from a low of 2.2 years for the Canadian dollar to 11.2 years for the U.S. dollar. The equations were also estimated as a system using generalized method of moments (GMM), with the constraint imposed that the adjustment parameter was the same across countries (this was not rejected by the data). The common adjustment parameter of −0.015 is highly significant, implying a mean-reversion half life of 3.7 years.

Table 1.

Estimation Results for Monthly Mean-Reversion Regressions

Regression: $\ln(\mathit{reer}_t/\mathit{reer}_{t-1}) = \alpha + \beta \ln(\mathit{reer}_{t-1}/\overline{\mathit{reer}})$



  • Sample period: 1981M1-2002M2.

  • $\overline{\mathit{reer}}$ is the sample mean of reer.

  • Absolute values of t statistics in parentheses.

  • Half lives expressed in years.

  • Small-sample bias calculated from formula (6) in text.

  • Bold italics indicates a parameter is significant at the 95% level.

The second column of Table 1 shows the adjustments to the parameters and half lives needed to correct for small-sample bias. Only two of the adjusted parameters remain negative, and the adjustment half lives are almost tripled to 6 years in those cases. The constrained parameter is about zero, suggesting that the series are approximately random walks. Finally, the third column presents estimates corrected for both the small-sample and time-averaging biases. The latter adjustment is crude, however, for the reasons discussed earlier, and amounts to subtracting 0.008 from each of the parameters shown in the second column. In any case, all of the parameters except that for the U.S. become negative again, but imply long half lives; the constrained estimate is negative with an implied half life of 7 years.

The picture using monthly data, then, is that unadjusted estimates of half lives are in the range of those found in other studies. Mean-reverting behavior, though, appears to be an artifact of small-sample bias. After correcting for this bias, the data appear to approximately follow a random walk. A further correction for time-averaging bias typically restores negative adjustment parameters, but with long implied half lives. This suggests that there could be mean reversion in the underlying process if it occurs at a higher frequency, although this would still not imply “forecastability” using time-averaged data.

Table 2 shows results using annual data. Again, the unadjusted parameters are all negative and imply half lives in the range found previously. As in the case of the monthly data, the correction for finite-sample bias substantially changes the results, with only three of the seven country-specific parameters remaining negative, and the pooled parameter turning positive. The further adjustment for time averaging again generally restores negative parameters, but with long implied half-lives.

Table 2.

Estimation Results for Annual Mean-Reversion Regressions




  • Sample period: 1981-2001.

  • $\overline{\mathit{reer}}$ is the 1980-2001 mean of reer.

  • Absolute values of t statistics in parentheses.

  • Half lives expressed in years.

  • Small-sample bias calculated from formula (6) in text.

  • Bold italics indicates a parameter is significant at the 95% level.

Results using 5-year changes are shown in Table 3. The raw parameters are now centered around roughly minus one, suggesting complete mean reversion at this horizon. The $\bar{R}^2$s are generally above 0.5, indicating that over half of the variance in real exchange rate changes can be predicted by deviations from PPP over 5-year horizons. After correcting for finite-sample bias, however, the signs of several of the coefficients are reversed, and the remainder become much smaller in absolute value. Nevertheless, two remain negative and statistically significant: those for the U.S. and Canada. The pooled parameter also remains negative and significant. On this basis, there is mild evidence of greater predictability of exchange rate movements at medium-term than short-term horizons.

Table 3.

Estimation Results for Five-Year Mean-Reversion Regressions




  • Sample period: 1985-2001.

  • lnreerm is the 1980-2001 mean of ℓn(reer).

  • Absolute values of t statistics in parentheses.

  • Small-sample bias calculated from Monte Carlo results.

  • Bold italics indicates a parameter is significant at the 95% level.

Out-of-sample tests of predictability are inherently unaffected by finite-sample biases.11 We assess out-of-sample performance in terms of goodness of fit, defined as the ratio of the root-mean-squared error (RMSE) of forecasts based on rolling regressions to the root-mean-squared deviation (RMSD) of ex post changes in the data. If this ratio (the Theil statistic) exceeds one, the model is out-predicted by a random walk; if less than one, the regression model out-predicts a random walk.12 If the in-sample estimates are reflective of the out-of-sample properties of the equation, the square of this ratio has an expected value of one minus the $\bar{R}^2$ of the regression.
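The Theil statistic described here can be computed along the following lines. The sketch makes several assumptions that the text does not spell out: the estimation window expands rather than keeping a fixed length, the RMSD is the root mean square of the realized changes (the forecast error of a no-change random walk), and the mean in the error-correction term is recomputed from data known at each forecast origin. The function name `theil_statistic` is hypothetical.

```python
import numpy as np

def theil_statistic(log_reer, first_window: int, h: int = 1) -> float:
    """Ratio of the model's out-of-sample RMSE to the RMS of realized h-period changes."""
    x = np.asarray(log_reer, dtype=float)
    errors, actual_changes = [], []
    for origin in range(first_window, len(x) - h):
        history = x[: origin + 1]                 # data known at the forecast origin
        mean = history.mean()                     # mean uses only known information
        gap = history[:-h] - mean
        change = history[h:] - history[:-h]
        design = np.column_stack([np.ones_like(gap), gap])
        (alpha, beta), *_ = np.linalg.lstsq(design, change, rcond=None)
        forecast = alpha + beta * (x[origin] - mean)
        actual = x[origin + h] - x[origin]
        errors.append(actual - forecast)
        actual_changes.append(actual)
    rmse = np.sqrt(np.mean(np.square(errors)))
    rmsd = np.sqrt(np.mean(np.square(actual_changes)))
    return float(rmse / rmsd)

# Usage on a simulated random walk (illustrative data only).
rng = np.random.default_rng(0)
random_walk = np.cumsum(rng.standard_normal(120))
print(round(theil_statistic(random_walk, first_window=40), 3))
```

A value above one means the no-change forecast had the smaller error over the evaluation period.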

Table 4 shows the Theil statistics for the monthly and annual equations. The out-of-sample period for the monthly changes is 1991M1-2002M2; for the annual changes 1991-2001; and for the five-year changes 1995-2001. This provided a 10-year estimation window for the first out-of-sample prediction in each case. The means of the REERs for the error-correction terms were based on information known up to the end of the estimation period. Values for the Theil statistics of less than unity are indicated by bold italics. For one-month ahead predictions, the model outperforms a random walk only for Canada. For both the one-year and five-year ahead predictions, just three of the seven statistics are less than unity, indicating that a random walk works better in the majority of the cases. Taken as a whole, the results indicate that the mean-reversion model outperforms a random walk only about one third of the time. These out-of-sample results, then, are consistent with the picture provided by the in-sample estimates after bias correction. Only for the Canadian dollar is there consistent evidence of mean reversion.

Table 4.

Theil Statistics for Out-of-Sample Forecasts from Mean-Reversion Regressions



  • Ratio of the RMSE of the model to the RMSD of Δℓn(reer).

  • Bold italics indicate a value below unity, indicating that the model outperforms a random walk.

The mean-reversion regressions were also run using the GDP deflator-based data over the same sample period. In terms of coverage, these data combine the euro-area countries in a single aggregate, while including series on several smaller industrial countries (Australia, Denmark, New Zealand, Norway, Sweden, and Switzerland). For the large countries, the estimation results were similar to those obtained with the CPI-based data; the exception is Canada, where the GDP-deflator based series shows less evidence of mean reversion:

[Table omitted in source.]

Pooling the larger and smaller industrial countries into separate groups and estimating the common adjustment parameter within each group revealed a notably larger mean-reversion parameter for the smaller countries, with a $\hat{\beta}$ of −0.314 (and a |t| statistic of 5.31) as opposed to −0.174 (3.26) for the larger countries. Out-of-sample prediction tests for the smaller countries performed no better than for the larger countries. In all cases, the Theil statistics for year-ahead changes for the 1991-2001 period exceeded unity:

[Table omitted in source.]

C. Nonlinear Adjustment and Time Trends

The above specification assumed that the underlying mean-reversion process is log linear. Some recent studies (e.g. Sarno and others (2002), and Sollis and others (2002)) have addressed the issue of whether the speed of adjustment changes as a function of the deviation of the exchange rate from mean. In particular, it is possible that transactions costs, tariffs, and similar frictions create a zone of movement in relative prices within which there is little or no scope for international goods arbitrage. One would expect to see evidence of mean reversion only when price deviations exceed the bounds of this zone.

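Such effects can be tested for by adding terms to the standard regression that are either nonlinear functions of the deviations from mean, or that become operative when this gap reaches a certain threshold level (as in threshold autoregression, or TAR, models). One way of implementing the first approach is in the form of a Taylor-series expansion of a general nonlinear function:

$$\ln(\mathit{reer}_t) - \ln(\mathit{reer}_{t-1}) = \alpha + \beta_1\,\mathit{lngap}_{t-1} + \beta_2\,\mathit{lngap}_{t-1}^2 + \beta_3\,\mathit{lngap}_{t-1}^3 + \cdots + \varepsilon_t \tag{8}$$

where lngap is the log of the ratio of the actual to the mean exchange rate. The significance of the coefficients on the power terms can be tested using standard methods. The threshold approach can be implemented in its simplest form as:

$$\ln(\mathit{reer}_t) - \ln(\mathit{reer}_{t-1}) = \alpha + \beta\,\mathit{lngap}_{t-1} + \theta\, I(|\mathit{lngap}_{t-1}| > \tau)\,\mathit{lngap}_{t-1} + \varepsilon_t \tag{9}$$

where I(·) is an indicator equal to one when its argument holds and zero otherwise, making the assumption that the size of the threshold (τ) and the shift in the adjustment parameter (θ) are the same on both sides of the mean exchange rate.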
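A symmetric threshold regression of this kind can be estimated by OLS once the threshold is fixed. The sketch below assumes one particular form for the threshold term, namely the lagged gap multiplied by an indicator that its absolute value exceeds τ, so that the adjustment speed is β inside the band and β + θ outside it. That form is an assumption chosen to match the description in the text, and both function names and the simulated parameter values are hypothetical.

```python
import numpy as np

def simulate_threshold_process(n, beta, theta, tau, sigma, seed=0):
    """Simulate a log gap whose adjustment speed is beta inside the band and beta + theta outside."""
    rng = np.random.default_rng(seed)
    gap = np.zeros(n)
    for t in range(1, n):
        lag = gap[t - 1]
        speed = beta + (theta if abs(lag) > tau else 0.0)
        gap[t] = lag + speed * lag + sigma * rng.standard_normal()
    return gap

def fit_threshold_model(log_gap, tau):
    """OLS of the change in the gap on a constant, the lagged gap, and the lagged gap outside the band."""
    g = np.asarray(log_gap, dtype=float)
    lagged = g[:-1]
    change = np.diff(g)  # equals the change in the log REER when the mean is held fixed
    outside = (np.abs(lagged) > tau).astype(float)
    design = np.column_stack([np.ones_like(lagged), lagged, outside * lagged])
    coef, *_ = np.linalg.lstsq(design, change, rcond=None)
    alpha, beta, theta = (float(c) for c in coef)
    return alpha, beta, theta

# Usage with illustrative parameters: slow adjustment inside a 10 percent band, fast outside.
gap = simulate_threshold_process(5000, beta=-0.05, theta=-0.30, tau=0.10, sigma=0.05)
alpha_hat, beta_hat, theta_hat = fit_threshold_model(gap, tau=0.10)
print(round(beta_hat, 3), round(theta_hat, 3), round(beta_hat + theta_hat, 3))
```

Choosing τ itself requires a search over candidate values, with the fit compared across candidates.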

Preliminary tests of the two specifications indicated that the threshold model was more parsimonious and performed at least as well in terms of goodness-of-fit as the power-series approach. The appropriate size of the threshold was determined on the basis of log likelihood values from GMM estimation of the system of equations for the G-7 currencies, with common slope and threshold parameters imposed. The values of the log likelihood function (LLF) for different threshold levels (expressed in log deviation from mean) using monthly and annual data in estimation are:

[Table omitted in source.]

The highest LLF values are shown in bold italics. They indicate that a threshold of about ±8 percent fits the monthly data best, while one of ±12 percent fits the annual data. In either case, the threshold variable is statistically significant.13 For simplicity, a common threshold of ±10 percent was used in further analysis at both data frequencies. With a threshold of this size, about 40 percent of the observations at a monthly frequency lie outside the threshold, and 37 percent at an annual frequency.

The in-sample adjustment parameters obtained by estimating equation (9) using all available observations, and imposing common parameters across the G-7 currencies, are:

[Table omitted in source.]

For both one-month and one-year changes in the REERs, the estimated speeds of adjustment are significantly higher outside of the threshold (as reflected in the sum of $\hat{\beta}$ and $\hat{\theta}$) than within ($\hat{\beta}$ only). Indeed, for one-year changes, the within-threshold adjustment parameter becomes insignificant. In contrast, the results for the five-year changes suggest that adjustment is weaker outside the threshold than within.

We focus here on the one-year changes, because they are more favorable to the threshold hypothesis than the five-year changes, and more relevant to medium-term exchange rate forecasting than the one-month changes. The above results were compared with those from Monte Carlo experiments using a random-walk DGP with N(0,1) innovations. The threshold was calibrated to yield the same percentage of observations outside the threshold as in the actual data (i.e. 37 percent). Using 1,000 replications for a 21-year sample, the fitted parameters were $\hat{\beta}$ = 0.278 (0.50) and $\hat{\theta}$ = 0.318 (1.68). The parameters are similar in size, and the parameter applying to observations outside of the threshold is not statistically significant. These properties contrast with the actual estimation results, suggesting that the finding of threshold effects is not simply a finite-sample anomaly.

The out-of-sample performance of the annual threshold model was analyzed over the 1991-2001 period. In addition, the model was generalized by adding a linear time trend to see whether this improved forecasting performance. Theil statistics for the various versions of the model are shown below, with the lowest values for each currency shown in bold italics. The model with thresholds outperforms the basic linear model in only two out of the seven cases, rising to three when a linear time trend is added. The basic model augmented only by a time trend fails to yield the lowest Theil statistic for any of the seven cases. In fact, the unaugmented basic model yields the best results in four of the seven cases, suggesting that elaborations to the PPP specification are questionable from a forecasting perspective. At the same time, it should be noted that a random walk still unambiguously outperforms any variant of the PPP specification, except for Canada.

[Table omitted in source.]

D. Absolute Versus Relative PPP

The above are tests of relative PPP, as the model is specified in terms of reversion to a sample mean as opposed to an absolute measure of international price differences. The need to estimate both parameters — the sample mean and the adjustment speed — leads to the finite-sample bias discussed above. Using an exogenous measure of price differences, as opposed to the sample mean, implies estimating just the adjustment speed, potentially reducing inference problems.

While absolute PPP is not directly tested here, it is of interest to examine the theoretical impact of using absolute PPP on finite-sample biases to evaluate how promising this route would be. For this purpose, Monte Carlo studies were performed using a random-walk DGP, but with the constant term in the regression constrained to zero (i.e., the known population value). Regressions were estimated for exchange rate changes from 1 to 5 years over a 21-year sample, based on 1,000 replications. The results for regressions that both included and excluded the constant term are as follows:

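[Table: Monte Carlo slope estimates and rejection rates for regressions with and without a constant term, 1-year to 5-year changes]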

Rejection rate using a 5 percent nominal test size.

It is apparent that the slope parameter is still biased (i.e., negative) when the constant term is constrained to its true value of zero, although the bias is reduced to roughly one sixth its original size. Interestingly, however, the bias in the width of confidence intervals for hypothesis tests remains large, especially as the horizon lengthens. In particular, the null hypothesis of a random walk should not be rejected more often than the nominal test size. With a 5 percent test size, however, the null hypothesis was rejected about 12 percent of the time using 1-year changes in the dependent variable, and 55 percent of the time using 5-year changes. So the likelihood of incorrectly rejecting the null hypothesis using absolute measures of PPP may still be inappropriately large, even though the parameter value itself is much less biased. These results, then, suggest that tests of absolute PPP may reduce some problems associated with relative PPP, but are not a complete solution.
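The experiment described above can be sketched roughly as follows. This is a simplified illustration, not the paper's actual procedure: it assumes the k-year change is regressed on the current level of a simulated random walk, uses conventional OLS standard errors with a normal critical value of 1.96, and treats the 21 annual observations as the full simulated series. The function name `simulate_bias` and its arguments are hypothetical.

```python
import numpy as np

def simulate_bias(horizon, n_years=21, n_reps=1000, include_constant=True, seed=0):
    """Mean slope estimate and rejection rate under a random-walk DGP.

    Regresses e[t+horizon] - e[t] on e[t]. Under the null of a random walk
    the true slope is zero, so the mean estimate measures the bias.
    """
    rng = np.random.default_rng(seed)
    slopes = np.empty(n_reps)
    rejections = 0
    for r in range(n_reps):
        e = np.cumsum(rng.standard_normal(n_years))   # random walk, N(0,1) shocks
        x = e[:-horizon]                               # current level
        y = e[horizon:] - e[:-horizon]                 # k-year-ahead change
        if include_constant:
            X = np.column_stack([np.ones_like(x), x])
        else:
            X = x[:, None]                             # constant fixed at zero
        coef = np.linalg.lstsq(X, y, rcond=None)[0]
        resid = y - X @ coef
        sigma2 = resid @ resid / (len(y) - X.shape[1])
        cov = sigma2 * np.linalg.inv(X.T @ X)          # conventional OLS covariance
        t_stat = coef[-1] / np.sqrt(cov[-1, -1])
        slopes[r] = coef[-1]
        rejections += abs(t_stat) > 1.96               # nominal 5 percent test
    return slopes.mean(), rejections / n_reps

for k in (1, 5):
    for const in (True, False):
        bias, rate = simulate_bias(k, include_constant=const)
        print(f"horizon={k} constant={const} mean slope={bias:.3f} rejection rate={rate:.2f}")
```

Because the standard errors here ignore the serial correlation created by overlapping multi-year changes, the rejection rates from this sketch need not match those reported in the paper.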

III. Monetary Model

Another popular framework for analyzing exchange rate “fundamentals” is the monetary model. Assuming continuous PPP, uncovered interest parity (UIP), a stable money demand function, and exogeneity of money and real income with respect to the exchange rate, the equilibrium exchange rate at a point in time, $e_t^*$, can be expressed (in logarithms) as:

$$e_t^* = (m_t - m_t^f) - \gamma\,(y_t - y_t^f)$$

where $m_t - m_t^f$ is the gap between the logs of the domestic and foreign money supplies, $y_t - y_t^f$ is the gap between real incomes, and $\gamma$ is the income elasticity of money demand (assumed for simplicity to be the same across countries). Assuming further that deviations in the actual exchange rate from its equilibrium level are temporary, the monetary model implies the following error-correction specification for changes in the exchange rate:


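$$e_{t+k} - e_t = \alpha + \theta\,(e_t^* - e_t) + \varepsilon_{t+k} \tag{11}$$

where $\alpha$ is a constant, $\theta$ is the adjustment parameter, $\varepsilon_{t+k}$ is an error term, and $k$ is some horizon over which disturbances to the exchange rate are assumed to decay.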

Given the evidence against each of the individual building blocks of the monetary model, it might be surprising if a framework based on their collective performance worked well empirically.14 Indeed, many subsequent tests of the type initially conducted by Meese and Rogoff concluded that the monetary model was not useful in forecasting near-term exchange rate movements. Some researchers in the mid-1990s, however, appeared to arrive at better results by looking at the medium-term performance of the model. Examples are Mark (1995), Chen and Mark (1995), Mark and Choi (1997), and MacDonald and Taylor (1994), which found that the monetary model contained predictive power that increased over the medium term.15 Here we focus on the Mark (1995) results, as they are frequently cited. In his study, equation (11) was estimated over a variety of horizons for four currencies (the Canadian dollar, the deutsche mark, the Swiss franc, and the yen) against the U.S. dollar. The sample was based on quarterly data from 1973Q2 to 1991Q4. Bias-adjusted t statistics were calculated to assess the significance of the adjustment parameter $\theta$, bootstrapping was performed to correct for size distortions in hypothesis tests, and out-of-sample Theil statistics were computed to assess goodness of fit relative to a random walk.

Mark found that, on both in-sample and out-of-sample criteria, the predictive power of the model increased as the horizon lengthened for three of the four currencies (the exception being the Canadian dollar, where the overall model did not work well at any horizon). At the longest horizon of 16 quarters, the root-mean-squared error (RMSE) of the model was about half that of a random walk. Taken at face value, this suggests that a high proportion, some 75 percent, of the variance in longer-horizon exchange rate movements could be predicted by the monetary model.
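As a worked check on the 75 percent figure, write $U$ for the ratio of the model's RMSE to that of a random walk (the symbol $U$ is introduced here for illustration). Treating the squared RMSE ratio as a ratio of forecast-error variances, which assumes forecast errors with roughly zero mean, the share of variance accounted for by the model is approximately:

$$1 - U^2 = 1 - (0.5)^2 = 0.75$$

The same arithmetic shows how quickly the gain shrinks as $U$ approaches unity: $U = 0.9$, for example, corresponds to a variance reduction of only 19 percent.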

Several studies subsequently revealed, however, that Mark’s results were not robust, in spite of the apparently sophisticated econometrics. In particular, extending the analysis beyond Mark’s observation period revealed that the predictive performance of the model declined dramatically (Groen (1999), Kilian (1999)). Further, as shown in Faust and others (2001), the model only predicted well in a local region around the time that Mark performed his analysis, and its general historical performance deteriorated as the underlying economic data were revised. From an econometric viewpoint, the use of bootstrap methods to enhance statistical inference when the true parameter value is on the boundary of the admissible parameter space has been criticized by Andrews (2000).

Here we revisit the predictive performance of Mark’s model beyond the end of his sample period in 1991Q4, based on updated and revised data. Reestimating Mark’s equations for the 1973Q1-1991Q4 period yielded the following results at 1-quarter, 4-quarter, and 16-quarter ahead horizons (the t statistics for the 4-quarter and 16-quarter horizons have been adjusted using the Newey-West (1987) approach). As found by Mark, the results for the Canadian dollar are obviously unsatisfactory, with $\hat{\theta}$ having the wrong sign at all horizons. For the other three currencies the results are more favorable, and qualitatively similar to those reported by Mark using his original dataset. Most notably, the adjustment coefficients become larger and much more significant as the horizon extends to 16 quarters.

[Table: Reestimated monetary model, 1973Q1-1991Q4, at 1-quarter, 4-quarter, and 16-quarter horizons]
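The Newey-West adjustment mentioned above corrects standard errors for the serial correlation that overlapping multi-quarter changes induce in regression residuals. A minimal sketch of the estimator with a Bartlett kernel is given below. The function name `newey_west_se` is hypothetical, and the choice of truncation lag (for example, the horizon minus one) is an assumption rather than something stated in the text.

```python
import numpy as np

def newey_west_se(X, y, lags):
    """OLS coefficients with Newey-West (Bartlett kernel) standard errors.

    X is an (n, p) regressor matrix including any constant column,
    y is an (n,) dependent variable, lags is the truncation lag.
    """
    n = X.shape[0]
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    u = y - X @ beta
    Xu = X * u[:, None]                      # rows are x_t * u_t
    S = Xu.T @ Xu / n                        # lag-0 term
    for j in range(1, lags + 1):
        weight = 1.0 - j / (lags + 1.0)      # Bartlett weights keep S positive semi-definite
        gamma_j = Xu[j:].T @ Xu[:-j] / n     # lag-j autocovariance of x_t * u_t
        S += weight * (gamma_j + gamma_j.T)
    bread = np.linalg.inv(X.T @ X / n)
    cov = bread @ S @ bread / n
    return beta, np.sqrt(np.diag(cov))
```

With `lags=0` the estimator reduces to heteroskedasticity-robust (White) standard errors; a t statistic is then the coefficient divided by its adjusted standard error.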

Regressions with moving end points were then used to generate out-of-sample forecasts for 1992Q1-2001Q4. The resulting Theil statistics are shown below. Theil statistics are also shown for an alternative interpretation of the monetary model, where the dependent variable is defined as the future change in the monetary aggregate as opposed to the exchange rate. This modified version would be more appropriate than the original model, which assumes that monetary aggregates are exogenous, if monetary aggregates are in fact endogenous in the context of modern financial systems and monetary policy frameworks.

[Table: Out-of-sample Theil statistics for the monetary model, 1992Q1-2001Q4]
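One way to read "regressions with moving end points" is as a recursive scheme: at each forecast origin the error-correction regression is reestimated using only outcomes already observed, and the fitted coefficients are used to forecast the next k-period change. The sketch below illustrates that idea on synthetic data. It assumes an expanding estimation window and a driftless random walk as the benchmark; the names `recursive_theil`, `e_star`, and `first_forecast` are hypothetical.

```python
import numpy as np

def recursive_theil(e, e_star, k, first_forecast):
    """Out-of-sample Theil statistic from recursively reestimated regressions.

    e and e_star are arrays of the log exchange rate and its fundamental value.
    Forecasts of e[t+k] - e[t] are made at each origin t >= first_forecast,
    which must exceed k so that at least two past outcomes are available.
    """
    model_errors, random_walk_errors = [], []
    for t in range(first_forecast, len(e) - k):
        s = np.arange(0, t - k + 1)               # origins whose k-ahead outcome is known at t
        x = e_star[s] - e[s]                      # deviation from fundamentals
        y = e[s + k] - e[s]                       # realized k-period change
        X = np.column_stack([np.ones(len(s)), x])
        alpha, theta = np.linalg.lstsq(X, y, rcond=None)[0]
        predicted = alpha + theta * (e_star[t] - e[t])
        actual = e[t + k] - e[t]
        model_errors.append(actual - predicted)
        random_walk_errors.append(actual)         # random walk predicts no change
    rmse = lambda v: np.sqrt(np.mean(np.square(v)))
    return rmse(model_errors) / rmse(random_walk_errors)

# Synthetic illustration: a pure random walk with a constant fundamental value.
rng = np.random.default_rng(0)
e = np.cumsum(rng.standard_normal(120))
print(recursive_theil(e, np.zeros(120), k=4, first_forecast=40))
```

Restricting the estimation sample to origins no later than `t - k` keeps the exercise genuinely out of sample, since later k-period changes would not yet have been observed when the forecast was made.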

When the exchange rate is used as the dependent variable, the best prediction results are for the Canadian dollar, which is ironically the currency for which the in-sample estimation results were perverse. Inspection of the results revealed that the adjustment parameter gradually changed sign in the mid-1990s, becoming significantly positive by the end of the sample period. Thus, while the model for Canada outperforms a random walk over the more recent period, there would have been little basis for using it given the historical evidence. The model also appears to predict movements in the deutsche mark somewhat better than a random walk, especially at the 16-quarter horizon. For the Swiss franc and the yen, a random walk generally outperforms the monetary model. In terms of forecasting future changes in monetary aggregates, in contrast, the monetary model uniformly outperforms a random walk, except for Canada. These results contrast with evidence in favor of medium-term exchange rate predictability found by Mark, underscoring the pitfalls of relying on in-sample data and limited out-of-sample observations. In spite of sophisticated attempts to correct for biases in parameter estimates and test sizes, it turns out that in-sample evidence in favor of the monetary model was misleading.

A more recent paper by Mark and Sul (2001) acknowledges the critiques of the earlier studies and focuses instead on short-horizon results, reestimating the monetary model for OECD countries using data from 1973Q1 to 1997Q1. Using the 1982Q1-97Q1 period for out-of-sample testing, they find that the model outperforms a random walk for 13 of 18 exchange rates at a one-quarter horizon, and for 17 of 18 exchange rates at a 16-quarter horizon. Revisiting this analysis using currently published data for the same group of countries yielded less favorable results: at the one-quarter horizon, Theil statistics of less than unity were obtained for only 4 of the same 18 currencies, so that a random walk outperformed the monetary model in most cases.16 Taking a second out-of-sample period of 1997Q2-2001Q3 yielded Theil statistics of less than unity for 5 of the 8 currencies for which data were available. While slightly better than the earlier findings, these results clearly fail to provide persuasive support for the monetary model. It is not clear whether these differences in findings are the result of subsequent data revisions or differences in econometric methodology.

More recently, Rapach and Wohar (2002) have looked for evidence in favor of the monetary model using much longer historical time series than employed in earlier studies, typically annual time series starting in the late 1800s or early 1900s and running up to the 1990s. They obtain the following Theil statistics for out-of-sample, one-year-ahead forecasts, using subsets of their overall data series (typically the second half of the series):

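[Table: Theil statistics for out-of-sample, one-year-ahead forecasts of the monetary model using long historical series]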

In only 3 of the 7 cases is the Theil statistic less than unity. In other words, a random walk performs better than the monetary model for a majority of the countries. Rapach and Wohar view the results, specifically the three cases with Theil statistics of less than unity, as evidence of the success of the monetary model, at least for some countries. Alternatively, the results can be interpreted as showing that the monetary model works no better, and perhaps slightly worse, than a random walk, with Theil statistics distributed around a mean of roughly unity.

In any event, support for the monetary model appears tenuous in spite of several attempts to find a predictive relationship. This is consistent with the empirical shortcomings of the building blocks of the model, perhaps most importantly the assumption that monetary aggregates can be taken as exogenous in predictive regressions. Under modern financial systems and approaches of central banks to monetary policy, it seems much more plausible that money is endogenously determined by other variables in the system.

IV. Uncovered Interest Parity

Under the joint hypothesis of risk neutrality and efficient markets (RNEMH), the forward rate should be an unbiased predictor of the future exchange rate; it should also be a “sufficient statistic,” in that no other information should add power to the forecast. In this sense, the above frameworks implicitly either assume the absence of RNEMH, or take a reduced-form approach that combines UIP with other relationships in an underlying model.

Past tests have overwhelmingly rejected UIP/RNEMH at short horizons, but the evidence appears more favorable at longer-term horizons (Meredith and Chinn (1998), and Alexius (2001)). Here we reexamine the medium-term performance of UIP from a forecasting viewpoint. In performing out-of-sample tests, UIP has one striking advantage over the previous models, specifically that there are no parameters to estimate. Conventional estimation of UIP relationships, of course, typically includes a constant term and an unconstrained slope coefficient. But, unlike the models looked at above, there are unambiguous theoretical priors about these parameter values under certain assumptions, i.e., zero and one respectively. With these constraints imposed, there is no need to perform estimation. Thus, all available data can be treated as out-of-sample observations.

Taking this approach, we compare the performance of UIP with that of a random walk over both short- and long-term horizons. The interest-rate data are updates of the series described in Meredith and Chinn (1998); exchange rates are from the IMF’s International Financial Statistics. All data are end-period. Interest rate differentials are defined as the difference between the domestic and U.S. dollar yield on the corresponding instrument, while the exchange rate is the price of U.S. dollars in units of domestic currency.
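With the constant constrained to zero and the slope to one, the UIP forecast requires no estimation: the expected change in the log exchange rate over the horizon is simply the cumulated interest differential. The sketch below illustrates this under simplifying assumptions that are not taken from the paper: yields are annualized decimals on instruments whose maturity matches the forecast horizon, and the log approximation to compound returns is used. The function name `uip_forecast` is hypothetical.

```python
import numpy as np

def uip_forecast(log_spot, domestic_yield, us_yield, horizon_years):
    """Forecast of the future log exchange rate implied by UIP.

    The exchange rate is the price of U.S. dollars in domestic currency, so a
    positive domestic-minus-U.S. yield differential implies an expected rise
    in the rate (a depreciation of the domestic currency).
    """
    differential = np.asarray(domestic_yield, dtype=float) - np.asarray(us_yield, dtype=float)
    return np.asarray(log_spot, dtype=float) + horizon_years * differential

# Illustrative values: a 2 percentage point differential held over 10 years.
print(uip_forecast(log_spot=0.0, domestic_yield=0.05, us_yield=0.03, horizon_years=10))
```

The corresponding random-walk forecast is just `log_spot`, so the two sets of forecast errors can be compared directly over the whole sample.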

Theil statistics are shown in Table 5 for UIP forecasts at 1-year, 5-year, and 10-year horizons, although at the longer horizons the country coverage is limited by the availability of interest rate data. At the 1-year horizon, UIP outperforms a random walk for only one out of the six currencies (the Italian lira), consistent with the extensive evidence on its failure at short-term horizons. At the 5-year horizon, results are available for only three currencies: the Canadian dollar has a Theil statistic of less than unity, while those for the deutsche mark and the pound are both unity. At the 10-year horizon, in contrast, the Theil statistics are all less than unity using either constant-maturity bonds or benchmark government bond yields, with values that indicate an appreciable increase in forecast accuracy. For instance, the average Theil statistic of 0.74 at this horizon implies that the variance of the forecast error falls to $0.74^2 \approx 0.55$ of that of a random walk, a reduction of close to one half.

Table 5. Theil Statistics for UIP Forecasts

[Table: Theil statistics by currency at 1-year, 5-year, and 10-year horizons]


  • Bold italics indicate values less than unity.

These results suggest that UIP may be useful in forecasting exchange rate movements at longer horizons. The implications for assessments of exchange rate misalignment are not clear-cut, however. UIP defines a path for the exchange rate that is consistent with the implied forward rate existing in markets. Should the current exchange rate be considered misaligned in situations where its value is forecast to change, but in a way that is consistent with the forward rate in markets? Or should this be viewed as a situation where the exchange rate is in equilibrium, but one that is changing over time? The answer depends on the purpose of identifying misalignments. In particular, if the goal is to anticipate market tensions that will arise in the future from exchange rate misalignments, then one would only want to consider predictable deviations in the future rate from the forward rate. In this case, UIP would not be a useful criterion for identifying misalignment.

V. Concluding Remarks

The above results suggest that both PPP and the monetary model, in their basic forms, are of little use in forecasting medium-term movements in exchange rates. In-sample evidence in favor of these models reflects parameter bias arising from the use of finite samples for estimation. Because the bias tends to increase as the length of the horizon for exchange rate movements increases, there is a greater likelihood of finding spurious evidence of predictability at medium- than short-term horizons.

From a conceptual perspective, this does not necessarily imply that the fundamentals underlying these models, e.g., movements in international price levels, are not relevant for exchange rate determination. Rather, we conclude that they do not contain leading information that would make them useful in predicting exchange rate movements. Fundamentals could still play a role in determining exchange rates, but not have forecasting content, if other unobserved determinants of exchange rates are nonstationary. In this case, deviations in observed exchange rates from the levels determined by a subset of fundamentals will not tend to decay over time, violating the error-correction behavior underlying both PPP and the monetary model.

More fundamentally, the hypothesis of predictability based on these models sits uncomfortably with the notion that exchange rates are determined in efficient markets by profit-maximizing agents. In this case, models capable of reliably predicting exchange rate movements would yield profit-making opportunities. Their exploitation by market participants would then lead any information embodied in these models to be incorporated in the observed spot value of the exchange rate, as opposed to predictable movements in future rates.

Of course, it is well known that deviations from the risk-neutral efficient-markets hypothesis must exist, at least at shorter horizons, to explain the biasedness of forward rates as predictors of future spot rates. But the models that have been developed to explain this anomaly, for instance those based on time-varying risk premia, have empirical implications that bear little relationship to the fundamentals-based models described above. In any case, the results for UIP presented above suggest that forward rates become better predictors of exchange rate movements at long as opposed to short horizons. This is consistent with the view that markets incorporate information on current and future fundamentals in the current level of the spot exchange rate.

The conclusions drawn here about medium-term predictability of exchange rate movements are, of course, limited to the relatively simple models discussed above. There may well be augmented versions of these models, or completely different frameworks, that yield more favorable results. The above analysis, however, suggests the need for caution in making inferences about predictability on the basis of in-sample results. Furthermore, interpreting such results and associated parameter biases is likely to become more difficult as the complexity of the model increases, given the inherent scope for over-fitting relationships as models are refined. This, in turn, suggests the need to corroborate the results of in-sample analysis with out-of-sample evidence.

Medium-Term Exchange Rate Forecasting: What Can We Expect?
Author: Mr. Guy M Meredith