Alessi, L. and Detken, C. (2011). Quasi real time early warning indicators for costly asset price boom/bust cycles: A role for global liquidity. European Journal of Political Economy, 27 (3), 520–533.
Bandiera, L., Cuaresma, J. C. and Vincelette, G. A. (2010). Unpleasant surprises: Determinants and risks of sovereign default. In C. A. Primo Braga and G. A. Vincelette (eds.), Sovereign Debt and the Financial Crisis: Will This Time Be Different?, Washington, DC: The World Bank.
Berg, A. and Pattillo, C. (1999). Predicting currency crises: The indicators approach and an alternative. Journal of International Money and Finance, 18 (4), 561–586.
Buffie, E. F., Portillo, R., Zanna, L.-F., Pattillo, C. A. and Berg, A. (2012). Public Investment, Growth, and Debt Sustainability: Putting Together the Pieces. IMF Working Papers 12/144, International Monetary Fund.
Catão, L. and Milesi-Ferretti, G.-M. (2013). External Liabilities and Crises. IMF Working Papers 13/113, International Monetary Fund.
Drehmann, M. and Juselius, M. (2013). Evaluating early warning indicators of banking crises: Satisfying policy requirements. BIS Working Papers 421, Bank for International Settlements.
Ghosh, A. R., Kim, J. I., Mendoza, E. G., Ostry, J. D. and Qureshi, M. S. (2013). Fiscal fatigue, fiscal space and debt sustainability in advanced economies. Economic Journal, 123 (566), F4–F30.
Gilovich, T., Griffin, D. and Kahneman, D. (eds.) (2002). Heuristics and biases: The psychology of intuitive judgment. Cambridge: Cambridge University Press.
IMF (2012). The IMF-FSB Early Warning Exercise: Design and Methodological Toolkit. Occasional Paper 274, International Monetary Fund, Washington DC.
IMF and World Bank (2004). Debt Sustainability in Low-Income Countries - Proposal for an Operational Framework and Policy Implications. Tech. rep., IMF and The World Bank, Washington, DC.
IMF and World Bank (2010). Staff Guidance Note on the Application of the Joint Bank-Fund Debt Sustainability Framework for Low-Income Countries. Tech. rep., IMF and The World Bank.
IMF and World Bank (2012). Revisiting the Debt Sustainability Framework for Low-Income Countries. Tech. rep., IMF and The World Bank, Washington, DC.
IMF and World Bank (2013). Staff Guidance Note on the Application of the Joint Bank-Fund Debt Sustainability Framework for Low-Income Countries. Tech. rep., IMF and The World Bank, Washington, DC.
Kaminsky, G., Lizondo, S. and Reinhart, C. M. (1997). Leading indicators of currency crises. Policy Research Working Paper Series 1852, The World Bank.
Lo Duca, M. and Peltonen, T. A. (2013). Assessing systemic risks and predicting systemic events. Journal of Banking & Finance, 37 (7), 2183–2195.
Manasse, P. and Roubini, N. (2009). “Rules of thumb” for sovereign debt crises. Journal of International Economics, 78 (2), 192–205.
Mauro, P., Romeu, R., Binder, A. and Zaman, A. (2013). A Modern History of Fiscal Prudence and Profligacy. IMF Working Papers 13/5, International Monetary Fund.
Pradelli, J. (2012). On external debt sustainability: Default probabilities and debt thresholds to monitor risk of distress. The World Bank.
Schularick, M. and Taylor, A. M. (2012). Credit booms gone bust: Monetary policy, leverage cycles, and financial crises, 1870-2008. American Economic Review, 102 (2), 1029–61.
Stock, J. H. and Watson, M. W. (2004). Combination forecasts of output growth in a seven-country data set. Journal of Forecasting, 23 (6), 405–430.
We would like to thank Saul Lizondo for initiating this project, Chris Geiregat, Andrea Fracasso, Andrew Jewell, Aart Kraay, Calvin McDonald, Sean Nolan, Juan Pradelli, Sam Ouliaris, and many IMF colleagues for useful comments and suggestions. This working paper is part of a research project on macroeconomic policy in low-income countries supported by the U.K.'s Department for International Development. This working paper should not be reported as representing the views of the IMF or of DFID. The views expressed in this Working Paper are those of the authors and do not necessarily represent those of the IMF, IMF policy, or of DFID. Working Papers describe research in progress by the authors and are published to elicit comments and to further debate.
Perhaps this reflects its roots in the HIPC initiative, where it was necessary to arrive at a common level of debt burden to which all countries would be reduced, a requirement that suggested a relatively simple and mechanical rule.
As explained in IMF and World Bank (2013), “Although the indicative thresholds play a fundamental role in the determination of the risk rating, they should not be interpreted mechanistically. The assessment of risk needs to strike a balance between paying due attention to debt levels rising toward or above thresholds and using judgment. Thus, a marginal or temporary breach of a threshold may not necessarily imply a significant vulnerability. Conversely, a near breach should not be dismissed without careful consideration.”
Of the 60 recent DSAs analyzed, the mechanical and actual ratings corresponded in 42 cases; in 11 of the 18 remaining cases, the deviation was grounded in the fact that breaches were small and temporary, with one debt variable within 3 percent of the threshold. Thus in about 90 percent of cases (53 of 60), the thresholds and the WCA seem to have played the determinative role.
Of course, most applications of probit early-warning models use the PTA without belaboring the point; we find the label useful in this paper to clarify the distinction from the DTA.
All these are discussed at length in IMF and World Bank (2012), which itself draws heavily on IMF and World Bank (2004) and the influential paper by Kraay and Nehru (2006). The latest DSF guidance note provides a comprehensive guide to the use of the DSF, explicitly intended for readers without an extensive prior knowledge of the framework (IMF and World Bank, 2013).
For an updated and comprehensive review of the determinants of debt crises, focused on developing countries, see Pradelli (2012).
See Buffie et al. (2012) for an alternative country-specific and scenario-based approach to assessing debt sustainability.
The regression tree approach used in Manasse and Roubini (2009) relies on a different technique, which aims to find the combinations of indicator variables that best sort observations into high-risk and low-risk pools. This may hold promise as an alternative to the LIC DSF, but it is a completely different technique and hence does not lend itself to the agenda of this paper, which is to examine the current approach and suggest modifications.
“The unique nature of crises inherently limits the ability of formal statistical tools to extract information that may be useful for identifying the next crisis. ‘Preparing to fight the last war’ is an obvious pitfall. The [Fund's] EWE thus complements empirical analysis with more heuristic methods, including wide-ranging consultations, as well as judgment informed by economic expertise. Both approaches are complementary: quantitative methods provide a systematic basis for the identification and analysis of vulnerabilities and a useful cross-check on judgment; qualitative analysis helps identify new sources of vulnerabilities and assess consonance among the conclusions stemming from empirical work” (IMF, 2012, p. 15).
A debt distress episode is defined as a period lasting three or more years in which at least one of the following signals of distress is observed: (i) the accumulation of arrears on public and publicly guaranteed (PPG) external debt in excess of five percent of the outstanding PPG external debt stock; (ii) a rescheduling of obligations due to Paris Club creditors; or (iii) the disbursement by the IMF of GRA resources exceeding 50 percent of IMF quota.
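The definition lends itself to a mechanical coding of episodes. What follows is a minimal sketch, not the coding used in the DSF: it assumes annual country data and reads “a period lasting three or more years” as a run of consecutive years in each of which at least one signal is present. The record type `YearObs` and its field names are hypothetical, and all amounts for a given year are assumed to be in a common currency.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class YearObs:
    """One country-year of (hypothetical) distress-signal inputs."""
    year: int
    ppg_arrears: float        # arrears on PPG external debt
    ppg_stock: float          # outstanding PPG external debt stock
    paris_club_resched: bool  # Paris Club rescheduling in this year
    gra_disbursed: float      # IMF GRA disbursements
    imf_quota: float          # the country's IMF quota


def distress_signal(obs: YearObs) -> bool:
    """True if at least one of signals (i)-(iii) is observed in this year."""
    return (
        obs.ppg_arrears > 0.05 * obs.ppg_stock         # (i) arrears above 5% of PPG stock
        or obs.paris_club_resched                      # (ii) Paris Club rescheduling
        or obs.gra_disbursed > 0.50 * obs.imf_quota    # (iii) GRA use above 50% of quota
    )


def distress_episodes(series: List[YearObs], min_years: int = 3) -> List[Tuple[int, int]]:
    """(first, last) years of runs of consecutive signalling years lasting >= min_years."""
    episodes: List[Tuple[int, int]] = []
    run: List[int] = []
    for obs in sorted(series, key=lambda o: o.year):
        signal = distress_signal(obs)
        if signal and run and obs.year == run[-1] + 1:
            run.append(obs.year)  # extend the current run of signalling years
            continue
        if len(run) >= min_years:
            episodes.append((run[0], run[-1]))  # close a run that is long enough
        run = [obs.year] if signal else []      # start a new run or reset
    if len(run) >= min_years:
        episodes.append((run[0], run[-1]))
    return episodes
```

A year with arrears of exactly 5 percent of the stock does not signal, since the definition requires arrears “in excess of” that share; a gap in the annual data breaks a run.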
All explanatory variables are lagged one period to attenuate endogeneity issues. Previous estimates also included a dummy for African countries and the (log of) GDP per capita and adopted slightly different definitions of the distress indicator and debt ratios. For additional details, see IMF and World Bank (2012, Annex 1 and Table A1), and the 2010 and 2013 DSF guidance notes (IMF and World Bank, 2010, 2013). The list of all the LICs in the debt sustainability analysis and several related documents are available at: http://www.imf.org/dsa.
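Lagging in a country panel must not carry a value across countries or across gaps in the data. A minimal sketch, assuming the panel is stored as a list of records with hypothetical keys `country` and `year`; it is an illustration, not the authors' code.

```python
from typing import Dict, Hashable, List, Optional, Tuple


def lag_one_period(rows: List[dict], var: str) -> Dict[Tuple[Hashable, int], Optional[float]]:
    """Map each (country, year) to the value of `var` in (country, year - 1).

    Returns None when the previous year is missing, so a lag never crosses
    a country boundary or a gap in the panel.
    """
    level = {(r["country"], r["year"]): r[var] for r in rows}
    return {(c, y): level.get((c, y - 1)) for (c, y) in level}
```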
This loss function is the “preferred method” of finding optimal cutoffs in IMF and World Bank (2012), with
This description hides a fair amount of complexity that need not concern us here. In the 2012 revision of the DSF, the IMF and the World Bank (IMF and World Bank, 2012, p. 19) derive thresholds using three different concepts of probability of debt distress: (1) the unconditional probability of debt distress; (2) the probability of debt distress corresponding to the median value of the relevant debt burden indicator immediately prior to an outbreak of debt distress; and (3) the probability of debt distress that minimizes the number of missed crises and false alarms. This last option is the preferred one (see IMF and World Bank, 2012, p. 42): this probability “simultaneously minimizes the number of missed crises and false alarms produced by the model, thus ensuring that the thresholds are neither too permissive nor unduly conservative.” Our results do not depend on whether approach (1), (2), or (3) is used. However, we follow the third approach because it makes it much easier to demonstrate the internal inconsistency of the WCA approach, as we show below. A further detail is that the weights in the DSF are not, strictly speaking, equal. Rather, the DSF calibrates the thresholds using the average of the probabilities that minimize Type 1 and Type 2 errors under the different weights, with the relative weight of Type 2 errors being “gradually increased” from one to almost three times the weight of Type 1 errors. The points we make in this paper are also robust to this detail.
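Option (3) amounts to a grid search over cutoff probabilities. A minimal sketch, assuming the loss is α times the share of crises missed (Type 1) plus (1 − α) times the share of tranquil observations flagged (Type 2), with distress called when the predicted probability reaches the cutoff; equal weights correspond to α = 0.5, and a relative weight w on Type 2 errors corresponds to α = 1/(1 + w). The function names are illustrative, and the sketch does not reproduce the DSF's averaging over weights.

```python
import math
from typing import List, Sequence, Tuple


def error_rates(probs: Sequence[float], crises: Sequence[bool], cutoff: float) -> Tuple[float, float]:
    """(missed-crisis rate, false-alarm rate) when distress is called for prob >= cutoff.

    Assumes the sample contains both crisis and tranquil observations.
    """
    n_crisis = sum(1 for c in crises if c)
    n_calm = len(crises) - n_crisis
    missed = sum(1 for p, c in zip(probs, crises) if c and p < cutoff)
    false_alarms = sum(1 for p, c in zip(probs, crises) if not c and p >= cutoff)
    return missed / n_crisis, false_alarms / n_calm


def optimal_cutoff(probs: Sequence[float], crises: Sequence[bool], alpha: float = 0.5) -> float:
    """Cutoff minimizing alpha * missed-crisis rate + (1 - alpha) * false-alarm rate."""
    # Calls only change at observed probabilities, so searching them (plus 0 and +inf)
    # is exhaustive.
    candidates: List[float] = [0.0] + sorted(set(probs)) + [math.inf]

    def loss(c: float) -> float:
        missed, false_alarm = error_rates(probs, crises, c)
        return alpha * missed + (1.0 - alpha) * false_alarm

    return min(candidates, key=loss)  # min() keeps the first, i.e. lowest, of tied cutoffs
```

With α = 1 any cutoff at or below the lowest crisis probability has zero loss, and the tie-breaking returns 0, matching the convention that full weight on missed crises implies a cutoff of 0.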
The official values of the cutoffs, as described in IMF and World Bank (2012), actually follow method  in footnote 14. These differ slightly from those in IMF and World Bank (2004) and from those shown in Table 3. However, all the resulting probability cutoffs and debt thresholds are about the same, as IMF and World Bank (2012) emphasizes. The latest DSF guidance note revises the threshold of debt service over revenue and incorporates remittances in the denominators of the debt ratios for countries that receive large remittance inflows (IMF and World Bank, 2013, Tables 2 and 5); however, in this paper we continue to refer to the data and methodology outlined in IMF and World Bank (2012), because that document presents the methodology behind the DSF in more detail.
To improve the flexibility of the DSF, the assessment of the risk of debt distress may also involve customized scenarios, if these capture an important vulnerability of the country that is overlooked by the standardized stress tests (IMF and World Bank, 2013). And, as discussed in the introduction, there is a role for judgment in the application of the thresholds and the WCA.
It is infeasible to test the full procedure of 20-year baseline forecasts and stress tests outlined above.
The AUROC is asymptotically normally distributed. For recent uses of the AUROC in a similar context, see Schularick and Taylor (2012), Catão and Milesi-Ferretti (2013), and Drehmann and Juselius (2013). There is one tricky distinction between the text, which describes a standard ROC, and what we do here. Remember that in the DSF and the variants we examine, the threshold is chosen to minimize the loss function. Following this approach, we draw the ROCs in this paper not by calculating the goodness-of-fit for each possible probability cutoff value from 0 to 1, but rather by calculating the optimal cutoff probability as α (the weight on missed crises in the loss function) varies from 1 to 0, and then calculating the associated goodness-of-fit. (Note that α = 1 implies full weight on missed crises and hence an optimal cutoff of 0.) In effect, the resulting “alpha” ROC is a convex version of the standard ROC. The two curves coincide for all those cutoff probabilities that would be chosen for some value of α. For the WCA, only the “alpha” ROC can be calculated: it is not possible to map from a given cutoff to a point on the ROC for the WCA, because each of the five debt indicators that make up the WCA is associated with a different probability cutoff. We therefore always use these “alpha” ROCs here, so that we are comparing apples to apples.
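The construction of the “alpha” ROC can be sketched directly. A minimal sketch, assuming the same loss as above (α times the share of crises missed plus 1 − α times the share of tranquil observations flagged), cutoffs searched over the observed probabilities, and a grid of 101 values of α; all function names are illustrative. A rank-based standard AUROC is included for comparison.

```python
import math
from typing import List, Sequence, Tuple


def _rates(probs: Sequence[float], crises: Sequence[bool], cutoff: float) -> Tuple[float, float]:
    """(true-positive rate, false-positive rate) when distress is called for prob >= cutoff."""
    pos = [p for p, c in zip(probs, crises) if c]
    neg = [p for p, c in zip(probs, crises) if not c]
    return sum(p >= cutoff for p in pos) / len(pos), sum(p >= cutoff for p in neg) / len(neg)


def _loss(probs, crises, cutoff: float, alpha: float) -> float:
    """alpha * (share of crises missed) + (1 - alpha) * (share of tranquil periods flagged)."""
    tpr, fpr = _rates(probs, crises, cutoff)
    return alpha * (1.0 - tpr) + (1.0 - alpha) * fpr


def alpha_roc(probs, crises, n_steps: int = 101) -> List[Tuple[float, float]]:
    """(FPR, TPR) points of the loss-minimizing cutoffs as alpha falls from 1 to 0."""
    candidates = [0.0] + sorted(set(probs)) + [math.inf]
    points = {(0.0, 0.0), (1.0, 1.0)}  # end points shared by every ROC
    for i in range(n_steps):
        alpha = 1.0 - i / (n_steps - 1)
        best = min(candidates, key=lambda c: _loss(probs, crises, c, alpha))
        tpr, fpr = _rates(probs, crises, best)
        points.add((fpr, tpr))
    return sorted(points)


def auroc_from_points(points: List[Tuple[float, float]]) -> float:
    """Trapezoidal area under (FPR, TPR) points sorted by FPR."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def standard_auroc(probs, crises) -> float:
    """Standard AUROC: chance a crisis outscores a tranquil observation (ties count 1/2)."""
    pos = [p for p, c in zip(probs, crises) if c]
    neg = [p for p, c in zip(probs, crises) if not c]
    return sum((a > b) + 0.5 * (a == b) for a in pos for b in neg) / (len(pos) * len(neg))
```

On six toy observations (tranquil probabilities 0.1, 0.2, 0.3 and 0.5; crisis probabilities 0.4 and 0.8), the alpha AUROC is 0.9375 against a standard AUROC of 0.875: the standard curve's point at cutoff 0.5 lies below the convex envelope and is never chosen for any α.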
In particular, we would choose the
This loss of information takes place even if the DTA cut-off discrepancy noted above is fixed in the calculation of
The application of the WCA, and of the DSF more broadly, is not mechanical in practice. However, judgment in the use of the debt burden indicators is generally confined to “limited and temporary” breaches of the thresholds. Typically, a stable and significant breach of the threshold by one debt burden measure will imply a high risk rating. We return to this issue below.
The DTA cut-off discrepancy we saw in section 3.1 looks similar to this difference, but there is an important distinction: that discrepancy did not represent a bias but rather an essentially random optimization error.
IMF and World Bank (2012) discusses the possibility of a probability approach, which is included in the last revision of the DSF (IMF and World Bank, 2013) as an alternative methodology to use for assessing the risk of debt distress in borderline cases (when the largest breach, or near breach, of a threshold falls within a 10-percent band around the threshold).
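The WCA call and the borderline test can be sketched in a few lines. A minimal sketch, assuming a breach means an indicator strictly above its threshold and reading the “10-percent band” as a ratio of indicator to threshold between 0.9 and 1.1 for the largest breach or near breach; indicator names and threshold values are hypothetical.

```python
from typing import Dict


def wca_signal(indicators: Dict[str, float], thresholds: Dict[str, float]) -> bool:
    """Worst-case approach: call distress if any indicator exceeds its own threshold."""
    return any(indicators[k] > thresholds[k] for k in thresholds)


def largest_breach_ratio(indicators: Dict[str, float], thresholds: Dict[str, float]) -> float:
    """Largest ratio of an indicator to its threshold (above 1 means a breach)."""
    return max(indicators[k] / thresholds[k] for k in thresholds)


def is_borderline(indicators: Dict[str, float], thresholds: Dict[str, float], band: float = 0.10) -> bool:
    """True if the largest breach, or near breach, lies within +/- band of its threshold."""
    ratio = largest_breach_ratio(indicators, thresholds)
    return 1.0 - band <= ratio <= 1.0 + band
```

For example, with hypothetical thresholds of 40 and 20 for two indicators, values of 42 and 10 trigger the WCA (42 exceeds 40) and are also borderline (42/40 = 1.05), which is where the probability approach would come into play.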
Alternatively, some argue that the simple thresholds for individual debt burden measures are easier to understand than probabilities and thus a better tool to guide discussions on how to avoid the risk of debt distress. However, the probabilities underlie the debt thresholds, so any “understanding” of the debt thresholds that does not encompass an understanding of the underlying probabilities may be too shallow to form the basis for an ideal policy dialog.
Similar considerations hold when using the other four debt indicators, but figures are not shown for brevity.
It is not a coincidence that the points chosen on the loss function for the PTA and the DTA line up exactly, given that the cutoff point chosen for the DTA (which minimizes the loss associated with using the probabilities, not the thresholds, to call crises) is none other than the one used in the PTA.
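For a single debt burden indicator, with the other covariates held fixed, a probability cutoff p* translates into a debt threshold by inverting the probit link, so calling crises on the debt ratio or on the probability gives the same answer. A minimal sketch, assuming a probit index that is linear in the debt indicator and in some other covariates; the coefficients and covariate values used to check it are hypothetical.

```python
from statistics import NormalDist
from typing import Sequence

PHI = NormalDist()  # standard normal; its cdf is the probit link


def distress_probability(debt: float, others: Sequence[float], beta0: float,
                         beta_debt: float, beta_others: Sequence[float]) -> float:
    """Probit probability of debt distress for one debt indicator plus other covariates."""
    z = beta0 + beta_debt * debt + sum(b * x for b, x in zip(beta_others, others))
    return PHI.cdf(z)


def pta_signal(prob: float, p_star: float) -> bool:
    """Probability threshold approach: call distress when the probability reaches p_star."""
    return prob >= p_star


def debt_threshold(p_star: float, others: Sequence[float], beta0: float,
                   beta_debt: float, beta_others: Sequence[float]) -> float:
    """Debt threshold approach: debt level at which the probit probability equals p_star.

    Assumes beta_debt > 0, so that the probability rises with debt.
    """
    z_star = PHI.inv_cdf(p_star)
    rest = beta0 + sum(b * x for b, x in zip(beta_others, others))
    return (z_star - rest) / beta_debt
```

Because the threshold depends on the other covariates, a single debt threshold per indicator holds only for a given setting of those covariates.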
For this calculation we draw 1,000 bootstrap samples from our data and calculate the AUROC of the WCA and the CIEW for each sample. In only 3 of the 1,000 samples is the AUROC of the WCA higher than that of the CIEW. Note that, as discussed in section 3.1, the WCA is impaired partly by the fact that the cutoffs chosen for each individual debt measure are not optimal. Even using the optimal cutoffs discussed in section 3.1, the AUROC of the WCA is lower than that of the CIEW with a p-value of 0.075.
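The bootstrap comparison can be sketched as follows. A minimal sketch, assuming individual observations are resampled with replacement (the resampling unit is our assumption) and that resamples lacking either crises or tranquil observations are redrawn; the returned share plays the role of the one-sided p-value. Function names are illustrative.

```python
import random
from typing import Sequence


def auroc(scores: Sequence[float], crises: Sequence[bool]) -> float:
    """Chance that a crisis observation outscores a tranquil one (ties count 1/2)."""
    pos = [s for s, c in zip(scores, crises) if c]
    neg = [s for s, c in zip(scores, crises) if not c]
    return sum((a > b) + 0.5 * (a == b) for a in pos for b in neg) / (len(pos) * len(neg))


def bootstrap_share_a_beats_b(score_a: Sequence[float], score_b: Sequence[float],
                              crises: Sequence[bool], n_boot: int = 1000, seed: int = 0) -> float:
    """Share of bootstrap resamples in which model A's AUROC exceeds model B's."""
    if all(crises) or not any(crises):
        raise ValueError("need both crisis and tranquil observations")
    rng = random.Random(seed)
    n = len(crises)
    beats = done = 0
    while done < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample observations with replacement
        y = [crises[i] for i in idx]
        if all(y) or not any(y):
            continue  # AUROC undefined without both classes; redraw
        done += 1
        if auroc([score_a[i] for i in idx], y) > auroc([score_b[i] for i in idx], y):
            beats += 1
    return beats / n_boot
```

Passing the WCA scores as model A and the CIEW scores as model B gives the share of resamples in which the WCA comes out ahead.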
The relatively good performance of the CIEW reassuringly echoes the results from macroeconomic forecasts in Stock and Watson (2004), who describe the “forecast combination puzzle” as the finding that simple combination forecasts, in particular, combinations that look a lot like unweighted averages, tend to do well in empirical applications. It may reflect simple debt-variable-specific measurement error—likely especially prevalent in LICs—which could make extreme values of any one debt variable particularly suspect and thus the WCA a particularly inaccurate aggregation method.
As with the other indices, we first standardize the debt measures by dividing them by their standard deviations, so that the coefficient values can be directly compared.
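Standardization and an equally weighted composite can be sketched as below. A minimal sketch, assuming the composite is a simple average of the debt measures after each is divided by its sample standard deviation (without demeaning); whether the CIEW uses a mean or a sum, or a sample or population standard deviation, is our assumption, and such choices only rescale the probit coefficient on the index.

```python
from statistics import stdev
from typing import Dict, List


def standardize(measures: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Divide each debt measure by its sample standard deviation (no demeaning)."""
    out: Dict[str, List[float]] = {}
    for name, vals in measures.items():
        sd = stdev(vals)
        out[name] = [x / sd for x in vals]
    return out


def equal_weight_index(measures: Dict[str, List[float]]) -> List[float]:
    """Equally weighted average of the standardized measures, observation by observation."""
    columns = list(standardize(measures).values())
    return [sum(col[i] for col in columns) / len(columns) for i in range(len(columns[0]))]
```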
To calculate statistical significance, we use the same bootstrap technique mentioned in footnote 27, except that here we calculate the BIC for each bootstrap sample, to adjust for the fact that the CIMP estimates many more parameters than the CIEW. In only 18 out of 1,000 of these samples is the BIC of the CIMP lower than that of the CIEW, for a p-value of 0.018.
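For reference, the Bayesian information criterion has the standard form below, where k is the number of estimated parameters, n the number of observations, and L̂ the maximized likelihood; lower values are preferred, and the k ln n term is what penalizes the CIMP's extra parameters. The bootstrap p-value is then the share of the B resamples in which the CIMP attains the lower BIC. That the computation follows exactly this textbook form is our assumption.

```latex
\mathrm{BIC} = k \ln n - 2 \ln \hat{L},
\qquad
\hat{p} = \frac{1}{B} \sum_{b=1}^{B}
  \mathbf{1}\left[\mathrm{BIC}^{(b)}_{\mathrm{CIMP}} < \mathrm{BIC}^{(b)}_{\mathrm{CIEW}}\right]
```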
We also examined the use of univariate measures of predictive accuracy as weights for the five variables, such that each
These statistical calculations follow the method outlined in footnote 31.
There are many alternative ways of arriving at a similar model, and we have tried many of them. For example, one possible way to tackle the collinearity problem is to simplify the five-variable model by eliminating statistically insignificant variables until those that remain are significant (i.e. a stepwise ‘general-to-specific’ approach). This has superficial merit in our application, in that such a procedure that starts by dropping the least significant of the five debt variables identifies a single specification with two surviving variables, DsRev and DExp. However, this conclusion is path-dependent. Dropping a slightly less insignificant variable in the first step will tend to yield a different pair of variables in the final specification. Bootstrapping this process results in many possible two-variable end-points to the procedure, none of which is a clear winner at standard p-values. The most successful, the model with DsRev and DExp, wins in only 37 percent of the bootstrap samples. We pick the two-variable model with DsRev and DExp and name it as we do because it has (nearly) the lowest BIC and it is also the end-point of the step-wise procedure just described.
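The general-to-specific procedure can be sketched as a loop that re-estimates the model and drops the least significant variable until every survivor is significant. A minimal sketch: `fit_pvalues` is a hypothetical callback that estimates the probit on the given variables and returns their p-values, and the 5 percent level is an assumption. Because the p-values are recomputed after each drop, the first variable removed shapes the rest of the path, which is the path dependence described above; bootstrapping the procedure amounts to running it on each resample with that resample's own `fit_pvalues`.

```python
from typing import Callable, Dict, List, Sequence

PValueFit = Callable[[List[str]], Dict[str, float]]


def general_to_specific(variables: Sequence[str], fit_pvalues: PValueFit,
                        level: float = 0.05) -> List[str]:
    """Repeatedly drop the least significant variable until all survivors have p <= level."""
    current = list(variables)
    while current:
        pvals = fit_pvalues(current)  # re-estimate: p-values change as variables are dropped
        worst = max(current, key=lambda v: pvals[v])
        if pvals[worst] <= level:
            break  # every remaining variable is significant
        current.remove(worst)
    return current
```

With a stand-in `fit_pvalues` that returns fixed p-values (placeholders X1, X2, X3 for the other three debt measures), the loop removes the three insignificant placeholders in turn and stops at DsRev and DExp.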
Unlike the CIMP, the coefficient values are tightly estimated and are statistically significant in the traditional sense (not shown). Of course this is only true conditional on the assumption that this particular two-variable model is the correct one, an assumption we cannot defend statistically.
We again follow the same statistical approach as described in footnote 31.
The weight in DsRev comes from: 0.55 = 0.47 + 0.2 × βCIEW, where
To test the significance of the difference in AUROCs, we bootstrap the AUROC for the CIEW and note that the in-sample value of the AUROC for the CIEWP is well within the resulting distribution.
Our expectation was that these different models might perform similarly on average but yield very different results observation by observation. This turns out not to be the case: the predictions of the different aggregators overlap heavily, observation by observation. For instance, CISW and CIEW predict 161 and 167 debt distress episodes, respectively, with an overlap of 141 events.
Of course, more major changes to the framework might yield more interesting results. For example, if the regressions distinguished between solvency and liquidity crises, then the different debt measures might play more distinct roles. This is beyond the scope of this paper.
Arguably, the bias created by overly optimistic growth projections is already flagged by the use of a “historical values” stress test in the DSF itself.
“If subjective confidence is not to be trusted, how can we evaluate the probable validity of an intuitive judgment? […] The answer comes from the two basic conditions for acquiring a skill: an environment that is sufficiently regular to be predictable, and an opportunity to learn these regularities through prolonged practice. When both these conditions are satisfied, intuitions are likely to be skilled” (Kahneman, 2011).