Are Currency Crises Predictable? A Test

Contributor Notes

Authors’ E-Mail Addresses: aberg@imf.org; cpattillo@imf.org

Abstract

This paper evaluates three models for predicting currency crises that were proposed before 1997. The idea is to answer the question: if we had been using these models in late 1996, how well armed would we have been to predict the Asian crisis? The results are mixed but somewhat encouraging. One model, and our modifications to it, provide useful forecasts, at least compared with a naive benchmark. The head-to-head comparison also sheds light on the economics of currency crises, the nature of the Asian crisis, and issues in the empirical modeling of currency crises.

I. Introduction

In recent years, a number of researchers have claimed success in systematically predicting which countries are more likely to suffer currency crises. The Asia crisis has stimulated further work in this area, with several papers already claiming to be able to “predict” the incidence of this crisis using pre-crisis data.2

It may seem unlikely that currency crises should be systematically predictable. In practice, they usually seem to come as a surprise. Since the exchange rate is an asset price, it is reasonable to doubt that sharp and predictable movements are consistent with the actions of forward-looking speculators. Early theoretical models of currency crises suggested that crises may, however, be predictable even with fully rational speculators. They emphasized an inconsistency between the maintenance of a currency peg and other economic policies. The signs of this inconsistency, such as large government deficits or declining reserves, should help predict crises. A central insight of these models is that even if the crisis and its timing are fully predictable (for example, because excessive money creation is leading to a steady loss of reserves), speculators will wait until reserves are below some critical level before they attack.3

Later analyses have extended this picture in several ways, partly inspired by the apparent absence of weakening fundamentals prior to the successful attacks on various EMS currencies in the early 1990s. These so-called “second generation” models have made the point that currency crises could represent not the result of a deteriorating underlying situation but instead a “jump” from one equilibrium, the pegged regime, to another, the devalued or floating regime. As with a bank in the absence of deposit insurance, two equilibria are possible: one with default (here devaluation) and one without. In this view, a country may be in a situation in which an attack, while not inevitable, might succeed if it were to take place. The exact timing of crises would be essentially unpredictable. Even here, though, it may be possible to identify whether a country is in a zone of vulnerability, that is, whether fundamentals are sufficiently weak that a shift in expectations could cause a crisis. In this case, the relative vulnerability of different countries might predict relative severity of crises in response to a shock such as a global downturn in confidence in emerging markets.4

It is one thing to say that currency crises may be predictable in general, however, and another that econometric models that are estimated using historical data on a panel or cross-section of countries can foretell crises with any degree of accuracy. Here the question is whether crises are sufficiently similar across countries and over time to allow generalizations from past experience, and whether adequate data on the signs of crisis are available. Each crisis episode presents unique features, and many factors that may indicate a higher probability of crisis, such as inadequate banking supervision or a vulnerable political situation, are not easily quantified.

The possible endogeneity of policy to the risk of crisis may also limit the predictability of crises. For example, authorities within a country, or their creditors, might react to signals so as to avoid crises.5 On the other hand, a focus by market participants on a particular variable could result in its precipitating a crisis where one might not otherwise have occurred.

Ultimately, the question of whether crises are predictable can only be settled in practice. The recent work claiming success in predicting crises has focussed almost exclusively on in-sample prediction, that is, on formulating and estimating a model using data on a set of crises, then judging success by the plausibility of the estimated parameters and the size of the prediction errors for this set of crises.6 The key test is not, however, the ability to fit a set of observations after the fact, but the prediction of future crises. Can the model predict the crises that are not in the sample used in its estimation? Given the relatively small number of crises in the historical data, the danger is acute that specification searches through the large number of potential predictive variables may yield spurious success in “explaining” crises within the sample. The possibility that the determinants of crises may vary importantly through time also suggests the importance of testing the models out-of-sample.

The flurry of work between the 1994 and 1997 crises and the large number of crises observed in 1997 provides an excellent opportunity to test existing state-of-the-art “early warning systems” out-of-sample. This paper evaluates three different models proposed before 1997 for predicting currency crises. The idea is to try to answer the question: if we had been using these models in late 1996, how well armed would we have been to predict the Asia crisis?

We chose the following three approaches based on their promise as early warning systems and their success within sample:

  • Kaminsky, Lizondo, and Reinhart (1997) (hereafter KLR) monitor a large set of monthly indicators that signal a crisis whenever they cross a certain threshold. This approach has the potential attraction that it produces thresholds beyond which a crisis is more likely. This accords with the common practice of establishing certain warning zones, such as current account deficits beyond 5 percent of GDP or reserves less than three months of imports. The authors claim some success in developing a set of indicators that reliably predict the likelihood of crisis. Moreover, Kaminsky (1998) and Goldstein (1998) have asserted that this method can be applied successfully to the 1997 crises.

  • Frankel and Rose (1996) (FR) develop a probit model of currency crashes in a large sample of developing countries. Their use of annual data permits them to look at variables, such as the composition of external debt, that are available only at that frequency.

  • Sachs, Tornell, and Velasco (1996) (STV) restrict their attention to a cross-section of countries in 1995, analyzing the incidence of the “tequila effect” following the Mexico crisis. They concentrate on a more structured hypothesis about the cause of this particular episode, emphasizing interactions among weak banking systems, overvalued real exchange rates, and low reserves. They claim to explain most of the cross-country pattern of currency crisis in emerging markets in 1994–1995. Their approach has also been applied to analyzing the Asia crisis.7

The paper is organized as follows. Sections 2 through 4 implement each model in turn. For each method, we:

  • Briefly describe the methodology.

  • Duplicate the original results as closely as possible, using where possible the original data. We also re-estimate over the same sample, fixing any errors in the original estimates and using currently available and hence revised data.

  • Reestimate the models using data through 1996 in order to forecast for 1997, as would a researcher who at the end of 1996 aimed to predict crises the following year. We use two samples of countries: the same as the original paper, and another common sample for purposes of comparing across the three methods.

  • Make a few plausible extensions or improvements. These changes are in some cases inspired by events in 1997, but again we estimate using data only through 1996.

  • Use the models to forecast the probability or severity of crisis for 1997. We generate a ranking of countries according to predicted probability or severity of crisis in 1997 for each model, then compare the predicted and actual rankings.

Section 5 summarizes and discusses the results. A conclusion follows in Section 6.

II. Kaminsky-Lizondo-Reinhart (1997) Signals Approach

A. Methodology

KLR propose the monitoring of several indicators that tend to exhibit unusual behavior prior to a crisis. A currency crisis is defined to occur when a weighted average of monthly percentage depreciations in the exchange rate and monthly percentage declines in reserves exceeds its mean by more than three standard deviations.8 KLR choose 15 indicators based on theoretical priors and on the availability of monthly data.9 An indicator issues a signal whenever it moves beyond a given threshold level. A “good” signal is one that is followed by a crisis within 24 months. An “optimal” set of thresholds is calculated, defined as a set that minimizes the noise-to-signal ratio: i.e., the ratio of false signals to good signals.
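The crisis definition and the 24-month signaling window can be illustrated with a minimal sketch. It is not the authors' code: the function names are hypothetical, and the weighting rule (scaling reserve losses so that both components have equal sample volatility) is an assumption, since the text above says only that a weighted average is used.

```python
from statistics import mean, pstdev


def pct_change(levels):
    """Month-over-month percentage changes of a series of levels."""
    return [100.0 * (b - a) / a for a, b in zip(levels, levels[1:])]


def crisis_months(exchange_rate, reserves, n_sd=3.0):
    """Indices (into the monthly change series) flagged as crisis months.

    exchange_rate: local currency per U.S. dollar, so a rise is a depreciation.
    reserves: international reserves in levels.
    """
    dep = pct_change(exchange_rate)             # percentage depreciation
    loss = [-r for r in pct_change(reserves)]   # reserve decline as a positive number
    # ASSUMPTION: weight reserve losses so both components are equally volatile.
    # Requires some variation in reserves (pstdev(loss) must be non-zero).
    weight = pstdev(dep) / pstdev(loss)
    # A weighted sum; the n_sd rule below is unaffected by the overall scale.
    index = [d + weight * l for d, l in zip(dep, loss)]
    cutoff = mean(index) + n_sd * pstdev(index)
    return [t for t, v in enumerate(index) if v > cutoff]


def crisis_within(crisis_idx, n_months, horizon=24):
    """1 for each month followed by a crisis in the next `horizon` months, else 0."""
    crises = set(crisis_idx)
    return [
        1 if any((t + k) in crises for k in range(1, horizon + 1)) else 0
        for t in range(n_months)
    ]


# Hypothetical data: a stable peg that is devalued by 50 percent while
# reserves fall by 40 percent in the same month.
er = [100.0] * 30 + [150.0] * 10
res = [50.0] * 30 + [30.0] * 10
crises = crisis_months(er, res)
pre_crisis = crisis_within(crises, len(er) - 1)
```

In this made-up example the only flagged month is the devaluation itself, and the 24 months before it are the ones in which a signal would count as “good.”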

Thresholds are defined relative to the percentiles of the distribution of the indicator by country. For example, the threshold for real exchange rate deviations might be the 85th percentile, so that any value of the real exchange rate deviation above this percentile would constitute a signal. The percentiles are calculated relative to each country’s empirical distribution of the variable in question. To continue the example, the threshold value of the real exchange rate deviation for each country is the 85th percentile of that country’s distribution of real exchange rate deviations. Thus, minimizing the noise-to-signal ratio for the sample of countries yields a percentile for each indicator that is uniform across countries, but the corresponding country-specific thresholds associated with that percentile will differ across countries.

Some notation may help with this last point. Let $x_{it}$ be a variable that may help predict crises, such as the 12-month growth in exports for country $i$ in period $t$. The percentile is then $p(x_{it})$, a number between zero and 100 representing where $x_{it}$ falls in the distribution of $x_i$. $I(p(x_{it}))$ is the indicator, taking a value of 1 when $p(x_{it})$ is above the threshold percentile for that indicator and zero otherwise.
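A short sketch may make the distinction between a common percentile and country-specific threshold values concrete. The helper names and the data are hypothetical. The sketch treats the upper tail as the signaling region, which is an assumption: for a variable such as export growth the relevant tail would presumably be the lower one.

```python
def percentile_rank(history, value):
    """Percent of a country's own observations that are <= value (0 to 100)."""
    return 100.0 * sum(1 for h in history if h <= value) / len(history)


def signal(history, value, threshold_pct):
    """Indicator I(p(x)): 1 if value lies above the threshold percentile, else 0."""
    return 1 if percentile_rank(history, value) > threshold_pct else 0


# Two hypothetical countries with very different histories of the same variable.
country_a = list(range(1, 101))          # values 1, 2, ..., 100
country_b = list(range(10, 1001, 10))    # values 10, 20, ..., 1000

# The same reading of 90 is extreme for country A but ordinary for country B.
sig_a = signal(country_a, 90, threshold_pct=85)
sig_b = signal(country_b, 90, threshold_pct=85)
```

The percentile threshold (85 here) is shared, while the level at which each country starts signaling differs.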

The KLR approach is bivariate, in that each indicator is analyzed, and optimal thresholds calculated, separately. Kaminsky (1998) aggregates the separate indicator series into a single crisis index by computing a weighted average of the indicators $I(p(x_{it}))$, with the weights based on the noise-to-signal ratios of each indicator. She then calculates a probability of crisis for each value of the aggregate index by observing how often within the sample a given value of the aggregate index is followed by a crisis within 24 months.

B. Implementation

1. Reproduction of KLR Results

We first attempt to reproduce the KLR results using the same 20-country, 1970–95 sample they use.10 Following KLR, we first examine the effectiveness of the approach by determining the extent to which each individual indicator is useful in predicting crises.

Table 1 presents information on the performance of individual indicators, from KLR and from our reproduction. Consider the performance of each indicator in terms of the matrix below:11

Table 1.

Performance of Indicators

Ratio of false signals (measured as a proportion of months in which false signals could have been issued [B/(B+D)]) to good signals (measured as a proportion of months in which good signals could have been issued [A/(A+C)]).

P(Crisis/Signal) is the percentage of the signals issued by the indicator that were followed by at least one crisis within the subsequent 24 months ([A/(A+B)] in terms of the matrix in the text). P(crisis) is the unconditional probability of a crisis, (A+C)/(A+B+C+D).

Deviation from deterministic trend.

Residual from regression of real M1 on real GDP, inflation, and a deterministic trend.

                        Crisis within 24 months    No crisis within 24 months
Signal was issued                 A                            B
No signal was issued              C                            D

The cell A represents the number of months in which the indicator issued a good signal, B is the number of months in which the indicator issued a bad signal or “noise,” C is the number of months in which the indicator failed to issue a signal which would have been a good signal, and D is the number of months in which the indicator did not issue a signal that would have been a bad signal.

The first column in Table 1 shows the noise-to-signal ratio estimated for each indicator. This is defined as the number of bad signals as a share of possible bad signals (B/(B+D)) divided by the number of good signals as a share of possible good signals (A/(A+C)). The threshold percentile, chosen to minimize this ratio, is shown in column 3. Column 2 shows how much higher the probability of a crisis within 24 months is when the indicator emits a signal than when it does not. When the noise-to-signal ratio is less than 1, this number is positive, implying that crises are more likely when the indicator signals than when it does not. Indicators with noise-to-signal ratios equal to or above unity are not useful in anticipating crises.
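The cell counts and the ratios built from them can be written out as a small sketch. The function names and the counts are illustrative, not taken from Table 1.

```python
def classify(signals, crisis_within_24m):
    """Count the cells A, B, C, D from two parallel lists of 0/1 values."""
    a = b = c = d = 0
    for s, y in zip(signals, crisis_within_24m):
        if s and y:
            a += 1      # signal, crisis follows: good signal
        elif s:
            b += 1      # signal, no crisis follows: noise
        elif y:
            c += 1      # no signal, crisis follows: missed
        else:
            d += 1      # no signal, no crisis follows
    return a, b, c, d


def noise_to_signal(a, b, c, d):
    """Share of possible bad signals issued over share of possible good signals issued."""
    return (b / (b + d)) / (a / (a + c))


def prob_crisis_given_signal(a, b):
    """Share of issued signals followed by a crisis within 24 months."""
    return a / (a + b)


def prob_crisis(a, b, c, d):
    """Unconditional share of months followed by a crisis within 24 months."""
    return (a + c) / (a + b + c + d)


# Hypothetical counts for one indicator over 200 country-months.
a, b, c, d = 30, 20, 20, 130
nts = noise_to_signal(a, b, c, d)
gain = prob_crisis_given_signal(a, b) - prob_crisis(a, b, c, d)
```

With these made-up counts the ratio is about 0.22 and the conditional probability of crisis exceeds the unconditional one by 35 percentage points, the pattern the text associates with a ratio below unity.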

Our results are broadly similar to those of KLR, though we are not able to match exactly the KLR results, as columns 5 through 7 of Table 1 show.12 The patterns are quite similar, though column 1 shows slightly weaker performance than reported by KLR for most of the indicators. Differences are starker for four indicators, for which KLR find a noise-to-signal ratio substantially below unity while we find a ratio above unity. Thus, although KLR find 12 informative indicators, that is those with noise-to-signal ratios below unity, we find only eight of these to be informative.13

There are a number of possible reasons for the differences in results. We have found that our implementation of the KLR definition of crisis results in a set of crisis dates that do not fully match the KLR crisis dates as reported in Kaminsky and Reinhart (1996). Specifically, we fail to match 14 out of 76 KLR crises.14 Some of this discrepancy may come from differences in the raw data.15 We have found that seemingly small differences due to revisions in IFS data can strongly influence the results, and furthermore they and we separately “cleaned” the data of errors.16

As a first step toward considering the extent to which a group of indicators is useful in predicting crises, Table 2 shows the proportion of good indicators signaling a crisis (good indicators here are those with a noise-to-signal ratio less than 1). In more than one-half of the crises, at least 60 percent of the good indicators were signaling, while this was the case in slightly more than one-third of the tranquil periods. Indicators often emit false signals of crisis, however. Indeed, 98 percent of the times that at least 60 percent of the good indicators were signaling, there was no crisis within 24 months.

Table 2.

Proportion of Indicators Signaling a Crisis

Having reproduced as nearly as we could the KLR results, we carry out three sorts of modifications. First we change the sample and try two other indicators. In the following subsection, we modify the basic methodology. Specifically, we depart from the entire “indicators” methodology that looks for discrete thresholds and calculates noise-to-signal ratios. Instead, we apply a probit regression technique to the same data and crisis definition as in KLR. In the process we test some of the basic assumptions of the KLR approach.

We modify the sample in two ways. First, we estimate only through April 1995. This reflects the information available to the analyst just before the Thai crisis of July 1997, since the evaluation of an observation requires knowing whether there will be a crisis within 24 months.17 Second, we change the sample of countries. This will allow the KLR results to be more comparable to those of the other two papers under consideration, as well as serving as an informal test of robustness of the KLR approach. We omit the five European countries from the sample and add other emerging market economies.18 Our 23-country sample is the union of the emerging market economies in the KLR set and the countries in the STV sample.19 The last four columns of Table 1 show that indicator performance over the larger sample is broadly similar to results using the KLR sample. At least for the informative indicators, the thresholds appear fairly similar. The average noise-to-signal ratio falls a little for the informative indicators in the 23-country sample (as well as for the entire set of indicators). The most important changes in the noise-to-signal ratios are that the growth of the M2 multiplier is no longer informative while the change in terms of trade is, though only marginally, with a ratio above 0.9. In what follows, we focus on the 23-country sample estimated through April 1995.

We try two more candidate indicators: the level of M2 to reserves and the ratio of the current account to GDP. KLR used the rate of growth of M2/reserves, but most discussions of crisis vulnerability have focussed on the level of this variable. KLR did not use the current account. We find that the level of M2/reserves is informative, as Table 1 shows. It has about the same noise-to-signal ratio as the rate of change, at 0.42 and 0.39 respectively. The current account/GDP is also highly informative, with a noise-to-signal ratio of 0.45.20

So far we have looked at each indicator separately. Kaminsky (1998) calculates a single composite indicator of crisis as a weighted-sum of the indicators, where each indicator is weighted by the inverse of its noise-to-signal ratio. She then calculates time-series probabilities of crisis for each country, based on the sample distribution of this composite indicator.21 Figure 1 displays these probabilities and shows some increase in the probability of crisis preceding particular crises for Korea, Indonesia, Malaysia, Philippines and Thailand, as well as for Argentina, Brazil and Mexico.22
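As a rough sketch of the weighted-sum construction, the composite can be computed from the 0/1 signals and each indicator's noise-to-signal ratio, and a probability then read off from how often similar index values were followed by a crisis in the sample. The function names, the interval used for grouping index values, and all numbers are assumptions made for illustration.

```python
def composite_index(signals, noise_to_signal):
    """Weighted sum of 0/1 signals, each weighted by 1 / noise-to-signal ratio."""
    return sum(s / nts for s, nts in zip(signals, noise_to_signal))


def crisis_probability(index_history, crisis_within_24m, lower, upper):
    """Share of sample months with lower <= index < upper followed by a crisis.

    Returns None when no sample month falls in the interval.
    """
    outcomes = [
        y for v, y in zip(index_history, crisis_within_24m) if lower <= v < upper
    ]
    return sum(outcomes) / len(outcomes) if outcomes else None


# Three hypothetical indicators; the first and third are signaling this month.
index_now = composite_index([1, 0, 1], [0.5, 0.4, 0.25])

# Hypothetical sample history of the index and of subsequent crises.
history = [0.0, 1.0, 5.0, 6.0, 7.0, 6.5]
followed_by_crisis = [0, 0, 1, 1, 0, 1]
prob = crisis_probability(history, followed_by_crisis, lower=5.0, upper=8.0)
```

Because the weights are inverse noise-to-signal ratios, a signal from a more reliable indicator moves the index by more.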

Figure 1.

KLR Weighted-Sum Crisis Probabilities for Selected Countries 1/

Citation: IMF Working Papers 1998, 154; 10.5089/9781451857207.001.A001

1/ Vertical line represents crisis dates.

As with other aspects of the KLR methodology, it is somewhat difficult to assess the success of these estimates of the probability of crisis. Figure 1 itself does not tell a clear story. The KLR approach does not lend itself to hypothesis testing; their technique gives no indication of when results are statistically significant.23

There are, nonetheless, several ways to systematically evaluate the KLR models, as shown in the first two columns of Table 3. For zero/one dependent variables, it is natural to ask what fraction of the observations are “correctly called,” where, for example, a crisis period is correctly called when the estimated probability of crisis is above a given cut-off level and a crisis ensues within 24 months. Such “goodness-of-fit” data are shown in Table 3 for two cut-offs: 50 percent and 25 percent. The in-sample probability forecasts can also be evaluated with analogs of a mean squared error measure, the quadratic probability score (QPS) and log probability score (LPS), that evaluate the accuracy of probability forecasts. In addition, the global squared bias (GSB) measures forecast calibration. The QPS ranges from zero to 2, and the LPS ranges from zero to infinity, with a score of zero corresponding to perfect accuracy for both. The GSB also ranges from zero to 2, where zero corresponds to perfect global calibration.24
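The three scores can be sketched as follows, using the conventional definitions from the probability-forecast evaluation literature. It is assumed here, not confirmed by the text, that these are the exact formulas behind Table 3. Below, `p` holds forecast probabilities and `r` the 0/1 realizations; both names are illustrative.

```python
import math
from statistics import mean


def qps(p, r):
    """Quadratic probability score: mean of 2 * (p - r)^2, between 0 and 2."""
    return sum(2.0 * (pi - ri) ** 2 for pi, ri in zip(p, r)) / len(p)


def lps(p, r):
    """Log probability score, between 0 and infinity.

    Requires forecasts strictly between 0 and 1, since log(0) is undefined.
    """
    total = sum(
        (1 - ri) * math.log(1 - pi) + ri * math.log(pi) for pi, ri in zip(p, r)
    )
    return -total / len(p)


def gsb(p, r):
    """Global squared bias: 2 * (mean forecast - mean outcome)^2, between 0 and 2."""
    return 2.0 * (mean(p) - mean(r)) ** 2


# An uninformative forecaster who always says 50 percent, with one crisis in two.
p = [0.5, 0.5]
r = [1, 0]
scores = (qps(p, r), lps(p, r), gsb(p, r))
```

The constant 50 percent forecast is well calibrated on average (GSB of zero) but inaccurate observation by observation, which is why accuracy and calibration are reported separately.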

Table 3.

Comparing Predictive Power of Alternative Composite Indicators—In Sample

A pre-crisis period is correctly called when the estimated probability of crisis is above the cut-off probability and a crisis ensues within 24 months.

A tranquil period is correctly called when the estimated probability of crisis is below the cut-off probability and no crisis ensues within 24 months.

A false alarm is an observation with an estimated probability of crisis above the cut-off (an alarm) not followed by a crisis within 24 months.

What can we conclude? The first column of Table 3 displays the scores and goodness-of-fit measures for our reproduction of the KLR weighted-sum-based probabilities, excluding our additional variables. The model correctly calls most observations at the 50 percent cut-off, almost entirely through correct prediction of tranquil periods (that is, those that are not followed by crises within 24 months). Almost all (91 percent) of the crisis months (that is, observations followed by a crisis within 24 months) are missed. Even with so few crisis observations correctly called, 44 percent of alarms (that is, observations where the predicted probability of crisis is above 50 percent) are false, in that no crisis in fact ensues within 24 months. As the second column of Table 3 shows, the addition of the current account and M2/reserves in levels only modestly improves the performance of the KLR-based probabilities.

If we are more interested in predicting crises than predicting tranquil periods and are not so worried about calling too many crises, we may want to consider an alarm to be issued when the estimated probability of crisis is above 25 percent. With this lower cut-off, 41 percent of crisis observations are correctly called by the original KLR model. Alternatively, we may ask how often an alarm is actually followed by a crisis within 24 months. With the 25 percent cut-off, the probability of a crisis within 24 months is 37 percent if there is an alarm, much higher than the unconditional probability of crisis in this sample of 16 percent. Now, however, 63 percent of alarms are false.
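The trade-off between the two cut-offs can be seen in a small sketch of the goodness-of-fit calculation. The function name, dictionary keys, and data are hypothetical; only the definitions of correctly called observations and false alarms follow the text.

```python
def goodness_of_fit(probs, crisis_within_24m, cutoff):
    """Goodness-of-fit percentages for a given alarm cut-off probability."""
    a = b = c = d = 0
    for p, y in zip(probs, crisis_within_24m):
        alarm = p > cutoff
        if alarm and y:
            a += 1      # pre-crisis month correctly called
        elif alarm:
            b += 1      # false alarm
        elif y:
            c += 1      # pre-crisis month missed
        else:
            d += 1      # tranquil month correctly called

    def pct(num, den):
        return 100.0 * num / den if den else None

    return {
        "pct_obs_correct": pct(a + d, a + b + c + d),
        "pct_precrisis_correct": pct(a, a + c),
        "pct_tranquil_correct": pct(d, b + d),
        "pct_false_alarms": pct(b, a + b),   # None if no alarm is issued
    }


# Hypothetical forecasts; the first two months are followed by a crisis.
probs = [0.6, 0.3, 0.1, 0.3, 0.05, 0.7]
outcome = [1, 1, 0, 0, 0, 0]
fit50 = goodness_of_fit(probs, outcome, cutoff=0.50)
fit25 = goodness_of_fit(probs, outcome, cutoff=0.25)
```

In this made-up sample, lowering the cut-off raises the share of pre-crisis months called from 50 to 100 percent while the share of tranquil months called falls from 75 to 50 percent.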

Still, these predictions are better than guesses. It is true that since most observations are tranquil, even an uninformative model can, by almost always calling for no crisis, predict correctly most of the time. But the model does significantly better than this uninformative benchmark. A Pesaran-Timmermann test rejects, at the 1 percent significance level, the hypothesis that the original KLR model does no better at calling crises than guesses based on the unconditional probability of crisis, using the 25 percent cut-off.25

2. A Probit-Based Alternative Model

In this section we deviate fairly substantially from the KLR methodology. Specifically, we embed the KLR approach in a multivariate probit framework in which the dependent variable takes a value of one if there is a crisis in the subsequent 24 months and zero otherwise. This has three advantages: we can test the usefulness of the threshold concept; we can aggregate predictive variables more satisfactorily into a composite index, taking account of correlations among different variables; and we can easily test for the statistical significance of individual variables and the constancy of coefficients across time and countries.26

KLR assume that the probability of crisis in the subsequent 24 months is a step function of the value of the indicator, equal to zero when the indicator variable is below the threshold and 1 at or above the threshold. They assume, for example, that when the real exchange rate continues to appreciate after it is already above the threshold, this does not increase the probability of crisis. In general, the relationship between a given indicator variable and the probability of crisis could take many more forms than a simple step function. Figure 2 presents various possible relationships between the probability of crisis (on the vertical axis) and the value of a variable P(x), measured as in KLR in percentiles (on the horizontal axis). The KLR assumption, in terms of Figure 2, is that α1 and α3 are zero while α2 is equal to 1. Other possibilities are also plausible. For example if α1 is non-zero and equal to α3 while α2 is equal to zero, then there is a linear relationship between the indicator measured in percentiles and the probability of a crisis. That is, to continue the example, increases in the degree of overvaluation increase the risk no matter how overvalued the exchange rate already is.

Figure 2.

Relationship Between Predictive Variable and Probability of Crisis

We propose to let the data resolve the question of whether a step-function is in fact a reasonable description of the relationship between indicator variables and the probability of a crisis. To this end, we run bivariate probit regressions on the pooled panel in which the dependent variable is the KLR variable that takes a value of 1 if there is a crisis in the subsequent 24 months and zero otherwise. For each indicator we estimate equations of the form:

$$\operatorname{prob}(c24 = 1) = f\big(\alpha_0 + \alpha_1\, p(x) + \alpha_2\, I + \alpha_3\, I \cdot (p(x) - T)\big) \qquad (1)$$

where c24 = 1 if there is a crisis in the next 24 months, p(x) = the percentile of the variable x, and I = 1 if the percentile is above some threshold T and zero otherwise.27 Thus, α1, α2, and α3 in equation 1 correspond to the α’s in Figure 2. We use the thresholds T calculated from the KLR algorithm, since we are interested primarily in testing their approach against a more general alternative.28
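To see how the three coefficients shape the relationship, the following sketch evaluates equation 1 for given parameter values. It assumes that f is the standard normal distribution function, as in a probit, and reads the last term as the indicator times the distance of the percentile above the threshold, I·(p(x) − T). The coefficient values are invented for illustration and are not estimates from Table 4.

```python
import math


def norm_cdf(z):
    """Standard normal distribution function (the assumed probit link f)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def crisis_prob(p, T, a0, a1, a2, a3):
    """Equation 1: slope a1 everywhere, jump a2 at T, extra slope a3 above T."""
    I = 1.0 if p > T else 0.0
    return norm_cdf(a0 + a1 * p + a2 * I + a3 * I * (p - T))


T = 85.0

# Pure step at the threshold (a1 = a3 = 0): only crossing T matters.
step_low = crisis_prob(10.0, T, a0=-1.5, a1=0.0, a2=1.0, a3=0.0)
step_mid = crisis_prob(50.0, T, a0=-1.5, a1=0.0, a2=1.0, a3=0.0)
step_high = crisis_prob(90.0, T, a0=-1.5, a1=0.0, a2=1.0, a3=0.0)

# Linear in percentiles with a jump (a3 = 0): risk rises below T as well.
jump_low = crisis_prob(10.0, T, a0=-2.0, a1=0.01, a2=0.5, a3=0.0)
jump_mid = crisis_prob(50.0, T, a0=-2.0, a1=0.01, a2=0.5, a3=0.0)
jump_high = crisis_prob(90.0, T, a0=-2.0, a1=0.01, a2=0.5, a3=0.0)
```

Under the step specification the 10th and 50th percentiles carry the same crisis probability, whereas with a non-zero slope the probability already rises with the percentile before the threshold is reached.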

Table 4 presents estimates of equation 1 for three important predictive variables: deviations of the real exchange rate from trend, the current account deficit as a share of GDP, and the growth rate of the ratio of M2/reserves. Consider first the real exchange rate. Column 1 of Table 4 indicates that α1, α2, and α3 are all significant. The data cannot reject that the relationship between RER deviations and the probability of crisis is of the general form shown in Figure 2, linear with a jump at the threshold and a higher slope thereafter. The first panel of Figure 3 gives a richer view of the relationship between overvaluation and the probability of crisis. The choppy line in this figure presents the fraction of times the observation of a given percentile for RER deviations is followed within 24 months by a crisis in the pooled data. The other line represents the estimated relationship shown in the first column of Table 4 and discussed above. The message of this figure is that while the jump at the threshold is significant, it does not capture an important part of the variation in the probability of crisis as a function of RER deviations.

Table 4.

Testing Indicators Against More General Piecewise-Linear Specifications in Bivariate Probit Models

Figure 3.

Average No. of Crises in Next 24 Months by Percentile of Variable

Turning to the current account, we again find all three coefficients statistically significant. The second panel of Figure 3 shows that the jump, while statistically significant, appears not to be economically important compared to the strong linear effect below the threshold.29

For the M2/reserves growth variable, we cannot reject that α3=0, as shown in column 3. The data reject the further restriction of α2=0, which would result in a model that is linear in percentiles. The simplification supported by the data is a linear model with a jump at the threshold, as shown in the third panel of Figure 3.

While the outcome of this analysis varies somewhat across indicators, the general lesson is that although the jump in probability of crisis at the threshold is often statistically significant, the underlying percentile variable is usually also important in explaining the variation in crisis probability.

Multivariate probits are the natural extension to the bivariate probits discussed so far. First, they easily accommodate broader functional forms, and we have seen that the bivariate probits cast some doubt on the zero/one indicator approach of KLR. A further advantage is that the estimation of a multivariate version of equation (1) is a natural way to combine the information from the various indicator variables into a single estimate of the probability of crisis. The composite indicators proposed by Kaminsky (1998), based on a weighted-sum of indicators, ignore possible correlations among different indicators, unlike the multivariate probits. Finally, the probits allow the calculation of standard errors and other measures of statistical significance.

Table 5 presents estimates of three probit models that explain whether a crisis occurs in the next 24 months (hereafter designated BP models).30 Model 1 uses the indicator form of the variables, where the indicator equals 1 above the threshold and zero otherwise. In model 2 the variables enter linearly, expressed as percentiles of the country-specific distribution of observations.31 Model 3 is the result of a simplification starting with the most general piecewise-linear specification for all the variables. From a starting point that allowed the estimation, for each variable, of the slope below the threshold, the jump at the threshold, and the slope above the threshold, we used a general-to-specific procedure to simplify to the most parsimonious representation of the data.32

Table 5.

Multivariate Probit Models

Model 1 of Table 5 shows that the probability of crisis is increased when the following variables exceed their thresholds: real exchange rate deviations, the current account, reserve growth, export growth, and both the level and growth rate of M2/Reserves.33 These variables also increase the probability of crisis when entered linearly in model 2, except for the growth rate of M2/reserves, while reserve growth itself is now significant. In the simplified piecewise-linear model 3, two variables (real exchange rate deviations and current account) enter with a significant slope below the threshold, a jump at the threshold, and a steeper slope above the threshold; two variables (reserve and export growth) enter linearly; and for two variables (M2/reserves and M2/reserves growth) only the jump at the threshold is significant.

How well do the different models perform? The results in Tables 3 and 5 allow us to draw two main conclusions. First, the probits tend to slightly outperform the KLR-based probabilities. The most direct comparison involves the indicator probit which uses as predictive variables the zero/one signals from the KLR indicators; here the only difference with KLR is the use of the probits to derive probabilities of crisis from the individual indicators. This model outperforms the KLR-based probabilities in terms of scores and goodness-of-fit. Second, the ranking among the various probit models is ambiguous. The piecewise-linear model has the best pseudo-R² and lowest scores, as is not surprising given that it is a generalization of the other two models (none of these measures gives any weight to parsimony). It does not outperform in goodness-of-fit, however. The indicator probit and the linear probit perform similarly: the linear model has better scores but generally worse goodness-of-fit.34

3. Summary In-Sample Assessment

Given the non-statistical nature of most of the KLR analysis, it is somewhat difficult to evaluate the success of this approach. KLR conclude that “the signals approach can be useful as the basis for an early warning system of currency crises” (KLR, page 23). Their grounds are largely that most of the indicators have low noise-to-signal ratios, most indicators signal ahead of most crises, and most crises are preceded by multiple signals. We find similar though somewhat weaker results in our larger sample. Our analysis of the in-sample success of the KLR-type models suggests the approach can indeed be useful and the model does significantly better than guesses based on the unconditional probability of crisis. Nonetheless, most crises are still missed and most alarms are false. In evaluating the KLR indicator approach against our modifications, we find that the probit models generally perform slightly better. The in-sample performance of the linear, indicator and piecewise-linear models is broadly similar.

As to the assessment of which variables are potentially important leading indicators, although we find fewer potentially useful indicators, ours are also classified as useful indicators by KLR (except for those we have added). These are deviations of the real exchange rate from trend, growth of exports, change in international reserves, “excess” M1 balances, growth in domestic credit as a share of GDP, the real interest rate, terms of trade growth, the level and growth of M2/reserves, and the current account.

C. Predicting 1997

1. Original KLR Model

The KLR approach has generated a variety of different ways to forecast 1997 outcomes. First, we can see which indicators were signaling prior to the 1997 crises. We have already calculated the optimal thresholds and resulting noise-to-signal ratios for the different indicators. To forecast for the post-April 1995 period, we apply these thresholds to the values of the predictive variables after this date, determining whether they are issuing signals or not.35 We have examined the performance of each individual indicator in 1996 for each of the eight Asian and Latin American countries discussed above.36 To summarize this large amount of information, no particular indicators flashed in all of the crisis countries. The only indicators to signal in more than one country were the growth rate of exports, which flashed in both Thailand and Korea, the growth of M2/reserves, which signaled in both Thailand and Malaysia, and reserve growth, which flashed in Korea, Malaysia and Thailand.
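The out-of-sample signalling step described above can be sketched as follows. This is a minimal illustration, assuming that each threshold is a percentile of the indicator's own in-sample distribution; the function names, the 90th-percentile threshold and the data are hypothetical and are not taken from the paper.

```python
import math

def percentile_threshold(in_sample, pct):
    """Value at percentile `pct` (0-100) of the in-sample data, nearest-rank method."""
    ordered = sorted(in_sample)
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[rank - 1]

def signals(values, threshold, upper_tail=True):
    """Return 1 where the indicator crosses its threshold, else 0."""
    if upper_tail:
        return [int(v > threshold) for v in values]
    return [int(v < threshold) for v in values]

# Hypothetical in-sample history (through 1995:4) of one indicator for one country.
in_sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
threshold = percentile_threshold(in_sample, 90)

# Hypothetical later observations, judged against the threshold fixed in-sample.
out_of_sample = [4, 9.5, 12, 8]
flags = signals(out_of_sample, threshold)
```

The key design point is that the threshold is computed once from the estimation sample and then held fixed, so the later observations play no role in choosing it.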

More interesting for purposes of forecasting crisis than looking at each individual indicator is combining the information from the different variables into a summary measure of crisis probabilities. The first column of Table 6 shows the performance of the Kaminsky (1998) composite measures of the probability of crisis based on the weighted-sum of indicators signaling. A natural question is whether the estimated probability of crisis is above 50 percent prior to actual crises. The goodness-of-fit rows show that only 4 percent of the time was the predicted probability of crisis above 50 percent in cases when there was a crisis within the next 24 months, during the 1995:5 to 1997:12 period. As before, we may be interested in using a lower cut-off probability to define a crisis. Table 6 shows that the Kaminsky (1998) probability estimates are above 25 percent in 25 percent of the pre-crisis observations. As we observed in-sample, most alarms are false at the 25 percent cut-off. The addition of the current account and level of M2/reserves variables improves out-of-sample performance slightly, as shown in the second column. In particular, 32 percent of the pre-crisis observations are called correctly.
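The weighted-sum composite can be sketched as below, with each signalling indicator weighted by the inverse of its noise-to-signal ratio. The indicator names and ratios are hypothetical, and the further step in Kaminsky (1998) that maps the composite into a probability of crisis is not reproduced here.

```python
def composite_index(signal_flags, noise_to_signal):
    """Sum of signals, each weighted by the inverse of its noise-to-signal ratio.

    `signal_flags` maps indicator name -> 1 (signalling), 0 (not signalling)
    or None (no data available for that month).
    """
    total = 0.0
    for name, flag in signal_flags.items():
        if flag is None:  # skip indicators without data
            continue
        total += flag / noise_to_signal[name]
    return total

# Hypothetical noise-to-signal ratios (lower means a more reliable indicator).
noise_to_signal = {"real_exchange_rate": 0.25, "exports": 0.5, "reserves": 0.5}

# Hypothetical signals for one country in one month.
flags = {"real_exchange_rate": 1, "exports": 0, "reserves": 1}
composite = composite_index(flags, noise_to_signal)  # 1/0.25 + 1/0.5
```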

Table 6.

Comparing Predictive Power of Alternative Composite Indicators—Out-of-Sample


A pre-crisis period is correctly called when the estimated probability of crisis is above the cut-off probability and a crisis ensues within 24 months.

A tranquil period is correctly called when the estimated probability of crisis is below the cut-off probability and no crisis ensues within 24 months.

A false alarm is an observation with an estimated probability of crisis above the cut-off (an alarm) not followed by a crisis within 24 months.
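The three definitions above translate directly into counts. The sketch below is illustrative only: the function name and the data are hypothetical, and it assumes the sample contains at least one pre-crisis observation, one tranquil observation and one alarm.

```python
def goodness_of_fit(probabilities, crisis_within_24m, cutoff):
    """Shares of correctly called observations and of false alarms."""
    alarms = [p > cutoff for p in probabilities]
    pairs = list(zip(alarms, crisis_within_24m))
    hits = sum(1 for a, c in pairs if a and c)              # alarm, crisis follows
    false_alarms = sum(1 for a, c in pairs if a and not c)  # alarm, no crisis
    quiet_correct = sum(1 for a, c in pairs if not a and not c)
    pre_crisis = sum(1 for c in crisis_within_24m if c)
    tranquil = len(crisis_within_24m) - pre_crisis
    return {
        "pre_crisis_correct": hits / pre_crisis,
        "tranquil_correct": quiet_correct / tranquil,
        # False alarms as a share of all alarms; one minus this share is the
        # probability of crisis conditional on an alarm.
        "false_alarm_share": false_alarms / (hits + false_alarms),
    }

# Hypothetical estimated probabilities and outcomes (1 = crisis within 24 months).
probabilities = [0.30, 0.10, 0.60, 0.20, 0.40, 0.05, 0.35, 0.15]
crisis_within_24m = [1, 1, 1, 0, 0, 0, 0, 0]
fit = goodness_of_fit(probabilities, crisis_within_24m, cutoff=0.25)
```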

This may sound like poor performance. It is worth noting, though, that these forecasts are significantly better than random guesses, both economically and statistically. The forecasts from the augmented KLR model in column (2), for example, suggest that the probability of a crisis within 24 months conditional on an alarm (using the 25 percent cut-off) is 40 percent, which is somewhat higher than the unconditional probability of 27 percent. And a Pesaran-Timmermann test rejects the hypothesis that the forecasts are no better than guesses based on the unconditional probability of crisis in the sample at the 1 percent level of significance.
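For readers unfamiliar with the Pesaran-Timmermann test, the sketch below computes the standard statistic for binary forecasts, which is asymptotically standard normal if alarms and outcomes are independent. The data are hypothetical, and whether the paper applies exactly this form of the test is an assumption.

```python
import math

def pesaran_timmermann(alarms, outcomes):
    """Pesaran-Timmermann statistic and one-sided p-value for 0/1 series."""
    n = len(alarms)
    p_hat = sum(1 for a, y in zip(alarms, outcomes) if a == y) / n  # share correct
    p_y = sum(outcomes) / n   # share of observations followed by a crisis
    p_x = sum(alarms) / n     # share of observations with an alarm
    # Expected share of correct calls if alarms and outcomes were independent.
    p_star = p_y * p_x + (1 - p_y) * (1 - p_x)
    var_p = p_star * (1 - p_star) / n
    var_p_star = (
        (2 * p_y - 1) ** 2 * p_x * (1 - p_x) / n
        + (2 * p_x - 1) ** 2 * p_y * (1 - p_y) / n
        + 4 * p_y * p_x * (1 - p_y) * (1 - p_x) / n ** 2
    )
    stat = (p_hat - p_star) / math.sqrt(var_p - var_p_star)
    p_value = 1 - 0.5 * (1 + math.erf(stat / math.sqrt(2)))
    return stat, p_value

# Hypothetical sample: 6 hits, 2 false alarms, 2 missed crises, 10 quiet calls.
alarms = [1] * 6 + [1] * 2 + [0] * 2 + [0] * 10
outcomes = [1] * 6 + [0] * 2 + [1] * 2 + [0] * 10
stat, p_value = pesaran_timmermann(alarms, outcomes)
```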

So far we have examined the ability of the models to predict the approximate timing of crises for each country.37 We can also evaluate the cross-sectional success of the models’ predictions in identifying which countries are vulnerable in a period of global financial turmoil such as 1997. The question here is whether the models assign higher predicted probabilities of crisis to those countries that had the biggest crises. Forecasting performance can be evaluated in this manner by comparing rankings of countries based on the predicted and actual crisis indices. Table 7 shows countries’ actual crisis index and predicted probability of crisis in 1997 for the various different forecasting methods.38 The table also shows the Spearman correlation between the actual and predicted rankings and its associated p-value, as well as the R2 from a bivariate regression of the actual rankings on the predictions.39
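The ranking comparison can be illustrated with a small standard-library sketch. The country values are hypothetical, the p-value is omitted, and the R2 is computed as the squared correlation between the two series, which is what a bivariate regression with an intercept delivers.

```python
import math

def ranks(values):
    """Ranks starting at 1, with ties given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))

# Hypothetical crisis index values and predicted probabilities for five countries.
actual_index = [5.0, 1.0, 3.0, 2.0, 4.0]
predicted_probability = [0.40, 0.10, 0.20, 0.30, 0.35]

rho = spearman(actual_index, predicted_probability)
r_squared = pearson(actual_index, predicted_probability) ** 2
```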

Table 7.

Correlation of Actual and Predicted Rankings based on KLR Approach


The KLR crisis index (a weighted average of percentage changes in the exchange rate and reserves) is standardized by subtracting the mean and dividing by the standard deviation. Values above three are defined as a crisis and are shown in bold.

Based on the average of noise-to-signal weighted probabilities during 1996:1-12, using out-of-sample estimates.

Augmented with the inclusion of the current account and M2/reserves in levels.

All probit model probabilities are average predicted probabilities for 1996:1-12, where the model was estimated on data through 1995:4.

Spearman Rank Correlation of the fitted values and the actual crisis index and its p-value. The R2 is from a regression of fitted values on actual values.
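The standardized crisis index described in the first note to Table 7 can be sketched as follows. The note says only that the index is a weighted average of percentage changes in the exchange rate and reserves; the specific weighting used here (reserve changes enter negatively, scaled so that both components have equal volatility) is an assumption, and the monthly data are hypothetical.

```python
import statistics

def crisis_flags(exchange_rate_changes, reserve_changes, cutoff=3.0):
    """Flag months whose standardized crisis index exceeds `cutoff`."""
    # Assumed weighting: scale reserve changes to the volatility of
    # exchange rate changes, and count reserve losses as pressure.
    weight = statistics.pstdev(exchange_rate_changes) / statistics.pstdev(reserve_changes)
    index = [e - weight * r for e, r in zip(exchange_rate_changes, reserve_changes)]
    mean = statistics.mean(index)
    sd = statistics.pstdev(index)
    standardized = [(v - mean) / sd for v in index]
    return [z > cutoff for z in standardized]

# Hypothetical monthly percentage changes: 23 quiet months, then a sharp
# depreciation combined with a large loss of reserves.
exchange_rate_changes = [0.5, -0.5] * 11 + [0.5, 30.0]
reserve_changes = [1.0, -1.0] * 11 + [1.0, -40.0]
flags = crisis_flags(exchange_rate_changes, reserve_changes)
```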

The KLR-based forecasts are clearly somewhat successful at ranking countries by severity of crisis. The actual rankings of countries in 1997 by their crisis index are significantly correlated with forecasts from the weighted-sum of indicators-based probabilities. With the original KLR variables, 28 percent of the variance is explained. The addition of the current account and the level of M2/reserves brings the R2 up to 36 percent.

To get a richer sense of how useful this general approach would have been, we now examine more closely the predictions of the KLR-based model for four Asian crisis countries (where crisis is identified according to the KLR definition): Korea, Indonesia, Malaysia, and Thailand, and one Asian and three Latin American non-crisis countries: Philippines, Argentina, Brazil and Mexico. Figure 4a presents the KLR composite measure of estimated probability of crisis, with vertical lines at crisis dates.40

Figure 4a.

KLR Weighted-Sum Crisis Probabilities for Selected Countries 1/

Citation: IMF Working Papers 1998, 154; 10.5089/9781451857207.001.A001

1/ Vertical line represents crisis dates.

The weighted-sum based probability measure does not paint a clear picture of substantially greater risks in crisis than in non-crisis countries. Two non-crisis countries, Brazil and the Philippines, consistently present risks of crisis above 30 percent during 1996. One crisis country, Korea, also presents risks above 30 percent, while Malaysia is generally above 20 percent. Estimated crisis risks remain below 17 percent in 1996 for two crisis countries (Indonesia and Thailand) and two non-crisis countries (Argentina and Mexico).

In sum, the KLR approach shows some promise. In particular, the fitted probabilities from the weighted-sum of indicators are significant predictors of crisis probability in 1997. This suggests the model may be useful in identifying which countries are vulnerable in a period following a global financial shock. Still, the overall explanatory power is fairly low, as demonstrated by the low R2 statistic in the regression of the actual on the predicted crisis rankings. Both the overall goodness-of-fit for the out-of-sample predictions and the analysis of the eight cases illustrate the low predictive power of the weighted-sum based probabilities in predicting the timing of crisis. We have already seen that within sample, our probit-based alternatives to the KLR model perform slightly better. We now turn to an examination of the out-of-sample performance of the BP probit model.

2. BP Probit-Based Alternative

To test the various probit models out-of-sample, we use data through 1995:4 to estimate the regression coefficients, as in Table 5, then extend the explanatory variables to generate predictions for the period 1995:5–1997:12.41 The estimated probabilities can be evaluated using the probability scores and goodness-of-fit measures discussed above.
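The estimate-then-forecast procedure can be illustrated with a deliberately small probit. This sketch assumes a single regressor and fits the model by hand-rolled Fisher scoring; the paper's models use several indicators, and all data and names here are hypothetical.

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def fit_probit(x, y, iterations=25):
    """Fit P(y = 1) = Phi(a + b * x) by Fisher scoring; returns (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        s_a = s_b = h_aa = h_ab = h_bb = 0.0
        for xi, yi in zip(x, y):
            z = a + b * xi
            cdf = min(max(norm_cdf(z), 1e-10), 1 - 1e-10)
            pdf = norm_pdf(z)
            score = pdf * (yi - cdf) / (cdf * (1 - cdf))
            weight = pdf * pdf / (cdf * (1 - cdf))
            s_a += score
            s_b += score * xi
            h_aa += weight
            h_ab += weight * xi
            h_bb += weight * xi * xi
        # Solve the 2x2 system (information matrix) * step = score.
        det = h_aa * h_bb - h_ab * h_ab
        a += (h_bb * s_a - h_ab * s_b) / det
        b += (h_aa * s_b - h_ab * s_a) / det
    return a, b

# Hypothetical estimation sample: indicator percentile (as a fraction) and
# whether a crisis followed within 24 months.
x_in = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
y_in = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1]
a, b = fit_probit(x_in, y_in)

# Out-of-sample step: coefficients stay fixed, only the regressor is extended.
x_out = [0.2, 0.9]
forecasts = [norm_cdf(a + b * xi) for xi in x_out]
```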

Table 6 shows that on all the scoring measures, the probits perform better than the probabilities based on the weighted-sum of indicators signaling.42 The linear model has the best scores, though the piecewise-linear model is close behind. None of the models correctly calls many crisis observations at the 50 percent cutoff, where a correct call is an observation that results in an estimated probability of crisis higher than the cutoff with an actual crisis ensuing within 24 months. Using the looser standard whereby a probability of crisis above 25 percent is considered an alarm, the linear and piecewise-linear probits perform well, much better than the weighted-sum based probabilities. The linear probit generates a probability of crisis above 25 percent in 80 percent of the periods that precede a crisis.43 Reflecting their greater prediction success, the probit models have a lower share of false alarms (crisis calls not followed by a crisis as a share of total crisis calls), as low as 49 percent for the linear model. Putting it slightly differently, for this model the probability of crisis within 24 months conditional on an alarm (using the 25 percent cutoff) is 51 percent, much higher than the unconditional probability of 22 percent.44

The linear model performs much better out-of-sample than the more general piecewise-linear model that includes a role for discrete jumps in the risk of crisis at the KLR thresholds. This suggests that the threshold and indicator concepts add little to the explanatory power of the simple linear model in predicting crisis timing, at least for 1997. The worse out-of-sample performance of the indicator and piecewise-linear models (despite similar or better in-sample performance) is consistent with the greater risk of data-mining in the indicator and piecewise-linear approaches.
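One way to see why the piecewise-linear model nests the other two is to write out the regressors each specification would use for a single indicator. The exact parameterization below (a percentile term, a jump dummy at the threshold, and a change in slope above it) is an assumption for illustration; the function names and values are hypothetical.

```python
def linear_terms(percentile, threshold):
    """Linear probit: the indicator enters as its percentile only."""
    return [percentile]

def indicator_terms(percentile, threshold):
    """Indicator probit: only the zero/one signal enters."""
    return [1.0 if percentile > threshold else 0.0]

def piecewise_linear_terms(percentile, threshold):
    """Piecewise-linear probit: percentile, jump at threshold, extra slope above it."""
    above = 1.0 if percentile > threshold else 0.0
    return [percentile, above, above * (percentile - threshold)]

# Hypothetical indicator at the 95th and 50th percentiles, threshold at the 90th.
stressed = piecewise_linear_terms(95, 90)
calm = piecewise_linear_terms(50, 90)
```

Setting the coefficients on the last two terms to zero recovers the linear model, and keeping only the jump term recovers the indicator model, which is why the more general model fits at least as well in-sample but has more scope for overfitting.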

As with the KLR models, we can also evaluate the performance of the probit models in predicting the cross-country incidence of crisis in 1997. Table 7 shows that country rankings based on all the probit forecasts are significantly correlated with actual crisis rankings in 1997. Forecasts based on the indicator probit rank countries more accurately than the weighted-sum of indicators-based forecasts, with an R2 close to one half. This superior performance is consistent with previous results that the KLR weighted-sum-of-indicators forecasts are outperformed by the analogous probit model. Somewhat anomalously, the other two probit models perform worse than the indicator probit. In particular, the ranking based on the linear model that had the best goodness-of-fit has the lowest, though still significant, correlation with the actual ranking.45

We can flesh out these results by examining the performance of the linear probit in predicting crisis for our sub-sample of four crisis and four non-crisis countries in 1997 (Tables 8a and 8b and Figure 4b).46 The linear probit presents a fairly clear picture of the prospects of crisis for most of these countries. Consider first the crisis countries. In Thailand estimated probabilities of crisis were above 40 percent for several months in 1996, and in Malaysia the probabilities were above 30 percent. The probabilities are also reasonably high for Indonesia, ranging from 25 to 28 percent, while the model is somewhat less successful for Korea, where the estimated probability of crisis was between 20 and 33 percent. Turning to the non-crisis countries, in the Philippines probabilities ranged from 20 to 23 percent. None of the Latin American countries yielded crisis probabilities above 30 percent in 1996, and only Brazil was above 20 percent for any length of time.

Figure 4b.

Crisis Probabilities based on Linear Probit Model for Selected Countries 1/


1/ Vertical line represents crisis dates.

Table 8a.

Summary Measures for Selected Countries: Asian Countries


Number of good indicators (with noise-to-signal ratio less than unity) that are signaling, with the number for which data are available in parentheses. There are ten good indicators.

Predicted probabilities based on weighted sum of the good indicators, where each indicator is weighted by the inverse of its adjusted noise-to-signal ratio, with original KLR variables.

Predicted probabilities based on weighted sum of the good indicators, where each indicator is weighted by the inverse of its adjusted noise-to-signal ratio, with original KLR variables, augmented with the inclusion of the current account and M2/reserves in levels.

Predicted probabilities of crisis from a probit regression of impending crisis on the indicator variables measured linearly in percentiles.

Table 8b.

Summary Measures for Selected Countries: Latin American Countries


Number of good indicators (with noise-to-signal ratio less than unity) that are signaling, with the number for which data are available in parentheses. There are ten good indicators.

Predicted probabilities based on weighted sum of the good indicators, where each indicator is weighted by the inverse of its adjusted noise-to-signal ratio, with original KLR variables.

Predicted probabilities based on weighted sum of the good indicators, where each indicator is weighted by the inverse of its adjusted noise-to-signal ratio, with original KLR variables, augmented with the inclusion of the current account and M2/reserves in levels.

Predicted probabilities of crisis from a probit regression of impending crisis on the indicator variables measured linearly in percentiles.

We have examined model performance in predicting, out-of-sample, crisis timing and cross-sectional severity of crisis during 1997. Several conclusions emerge. First, all the models examined perform significantly better than chance would imply, both at predicting whether or not a crisis will occur as measured by goodness-of-fit and at predicting the cross-country severity of crisis. Second, we can compare the BP probit-based alternatives to the KLR probabilities based on the weighted-sum of indicators signaling. The KLR forecasts perform better than some of the probits on a few of the measures, so this comparison is not unambiguous. Overall, though, the probits seem to work better. Moreover, in contrasting the BP probit methodology with the KLR probabilities, the most direct comparison involves the indicator probit, as it also uses indicator predictive variables. Here in particular the probit generally outperforms. Third, among the probits, the linear specification performs best in terms of the probability scores, goodness-of-fit and the eight cases examined more closely.

III. Frankel and Rose (1996) Probit Model Using Multi-Country Sample

A. Methodology

FR estimate the probability of a currency crash using annual data for more than 100 developing countries from 1971–1992, a much broader sample of countries than the other two papers. The use of annual data may restrict the applicability of the approach as an early warning system, but it permits the analysis of variables such as the composition of external debt for which higher frequency data are rarely available. FR test the hypothesis that certain characteristics of capital inflows are positively associated with the occurrence of currency crashes: low shares of FDI; low shares of concessional debt or debt from multilateral development banks; and high shares of public sector, variable rate, short-term and commercial bank debt.47

FR define a currency crash as a nominal exchange rate depreciation of at least 25 percent that also exceeds the previous year’s change in the exchange rate by at least 10 percent. Thus, the type of currency crisis considered does not include speculative attacks successfully warded off by the authorities through reserve sales or interest rate increases. FR argue that it is more difficult to identify successful defenses, since reserve movements are noisy measures of exchange market intervention and interest rates were controlled for long periods in most of the countries in the sample.
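The FR crash definition amounts to a two-part test on annual depreciation rates. In the sketch below, the function name and sample values are hypothetical, and the second condition is read as a difference of at least 10 percentage points, which is an interpretation of the wording above.

```python
def is_currency_crash(depreciation, previous_depreciation):
    """Apply the two-part crash criterion to annual depreciation rates (percent)."""
    large_enough = depreciation >= 25.0
    # Assumed reading: depreciation must exceed last year's by 10 percentage points.
    accelerating = depreciation - previous_depreciation >= 10.0
    return large_enough and accelerating

# Hypothetical cases.
sudden_crash = is_currency_crash(30.0, 5.0)         # large and accelerating
chronic_inflation = is_currency_crash(30.0, 25.0)   # large but not accelerating
mild_depreciation = is_currency_crash(20.0, 0.0)    # accelerating but too small
```

The second condition keeps countries with persistently high depreciation from being counted as crashing every year.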

B. Implementation

Table 9 (column 1) presents our reproduction of the FR benchmark probit regression.48 The coefficients reflect the effect of one-unit changes in regressors on the probability of a currency crash (expressed in percentage points) evaluated at the mean of the data.49 Significant results are starred. FR conclude from this and a variety of similar regressions that the probability of a crisis increases when output growth is low, domestic credit growth is high, foreign interest rates are high, and FDI as a proportion of total debt is low. They also found support for the prediction that crashes tend to occur when reserves are low and the real exchange rate is overvalued.50
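The way such coefficients are reported can be sketched as follows: for a continuous regressor, the effect of a one-unit change on the probit probability, evaluated at the mean of the data, is the normal density at the mean index times the coefficient. The coefficients and means below are hypothetical and are not the Table 9 estimates.

```python
import math

def marginal_effects_at_mean(intercept, coefficients, means):
    """Probit marginal effects at the sample mean, in percentage points."""
    index = intercept + sum(b * m for b, m in zip(coefficients, means))
    density = math.exp(-0.5 * index ** 2) / math.sqrt(2.0 * math.pi)
    return [100.0 * density * b for b in coefficients]

# Hypothetical probit coefficients and regressor means.
effects = marginal_effects_at_mean(
    intercept=0.0,
    coefficients=[1.0, -2.0],
    means=[0.0, 0.0],
)
```

Because the density term is common to all regressors, the reported effects keep the signs and relative sizes of the underlying probit coefficients.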

Table 9.

Frankel and Rose: Probit Estimates of Probability of a Currency Crash, 1970-92


*, **, and *** denote significance at the 10, 5 and 1 percent levels respectively.

Defined as the deviation from the average real exchange rate over the period.

A crisis is correctly called when the estimated probability of crisis is above 50 percent if a crisis ensues within 24 months. A tranquil period is correctly called when the estimated probability of crisis is below 50 percent and there is no crisis within 24 months.

A crisis is correctly called when the estimated probability of crisis is above 25 percent if a crisis ensues within 24 months. A tranquil period is correctly called when the estimated probability of crisis is below 25 percent and there is no crisis within 24 months.