Debt Is Not Free
Authors: Ms. Marialuz Moreno Badia, Mr. Paulo A Medas, Pranav Gupta, and Yuan Xiang (International Monetary Fund, https://isni.org/isni/0000000404811396)

Abstract

With public debt soaring across the world, a growing concern is whether current debt levels are a harbinger of fiscal crises that would restrict policy space in a downturn. The empirical evidence to date is, however, inconclusive, and the true cost of debt may be overstated if interest rates remain low. To shed light on this debate, this paper re-examines the importance of public debt as a leading indicator of fiscal crises, using machine learning techniques to account for complex interactions previously ignored in the literature. We find that public debt is the most important predictor of crises and that its relationship with crisis risk is strongly non-linear. Moreover, beyond certain debt levels, the likelihood of crises increases sharply regardless of the interest-growth differential. Our analysis also reveals that the interactions of public debt with inflation and external imbalances can be as important as debt levels. These results, while not necessarily implying causality, show that governments should be wary of high public debt even when borrowing costs seem low.

“Economic ruin admits of varied interpretations, but most of them apply at present to the greater part of Europe, and most of the ruin is to be ascribed to the piling up of debt”

J. S. Nicholson, “Adam Smith on Public Debts,” The Economic Journal, 1920

I. Introduction

Should governments worry about high public debt? The answer from standard debt sustainability frameworks is yes: not only does real economic growth dip in the immediate aftermath of a fiscal crisis, but the loss of output is often permanent (Medas et al. 2018; Asonuma et al. 2019). However, despite Reinhart and Rogoff’s (2009 and 2011a) seminal work on the perils of excessive debt, the empirical literature on the relationship between public debt and fiscal crises remains surprisingly inconclusive. Meanwhile, the case for more public debt is being reinforced by weak economic activity across the globe, large investment needs, and increasing concerns that monetary policy may be reaching its limits, particularly in advanced economies. And yet, the risk of fiscal crises still casts a long shadow. Therefore, as many countries remain riddled with mounting debt, one of the most pressing questions facing policymakers is whether current high debt levels are a bellwether of future crises with large economic costs.

The argument that “public debt may have no fiscal cost” (Blanchard 2019) is also gaining traction as many countries face historically low interest rates and the global stock of negative-yielding debt hovered around $12 trillion at the end of 2019. The underlying rationale is that if interest rates are lower than the economic growth rate (that is, if the interest-growth differential is negative), there is no need to maintain a primary surplus, as it would be feasible to issue debt without later increasing taxes.
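The arithmetic behind this argument can be illustrated with the standard debt-ratio recursion (see Escolano 2010), $d_t = \frac{1+r}{1+g} d_{t-1} - p_t$, where $d_t$ is the debt-to-GDP ratio and $p_t$ the primary balance (surplus positive), ignoring stock-flow adjustments. The minimal Python sketch below assumes constant r, g and primary balance, with hypothetical values: when r < g the ratio converges to $-p(1+g)/(g-r)$ even under a permanent primary deficit, whereas with r > g the same deficit makes the ratio grow without bound.

```python
def simulate_debt_ratio(d0, r, g, primary_balance, years):
    """Project the debt-to-GDP ratio under constant r, g and primary balance.

    Uses d_t = (1 + r) / (1 + g) * d_{t-1} - p_t and ignores stock-flow
    adjustments. Ratios are in percent of GDP; r and g are decimals.
    """
    path = [d0]
    for _ in range(years):
        path.append((1 + r) / (1 + g) * path[-1] - primary_balance)
    return path


def steady_state_debt(r, g, primary_balance):
    """Debt ratio the recursion converges to (only meaningful when r < g)."""
    return -primary_balance * (1 + g) / (g - r)


# Hypothetical example: a permanent primary deficit of 1 percent of GDP.
low_r = simulate_debt_ratio(d0=100.0, r=0.02, g=0.04, primary_balance=-1.0, years=500)
high_r = simulate_debt_ratio(d0=100.0, r=0.04, g=0.02, primary_balance=-1.0, years=50)
print(round(low_r[-1], 1), round(steady_state_debt(0.02, 0.04, -1.0), 1))
print(round(high_r[-1], 1))
```

This is only accounting arithmetic: it says nothing about the probability of a crisis at a given debt level, which is the question the rest of the paper addresses empirically.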

To help shed more light on this debate, we use machine learning models to identify robust predictors of fiscal crises and ask whether public debt is a reliable leading indicator, by itself or in interaction with other variables. The lack of strong evidence on the importance of public debt stems from various methodological challenges. First, not only are crises rare events, but limitations in debt data also make robust modelling difficult.2 More importantly, it is very hard to distill complex nonlinearities and interactions with the classic econometric techniques typically employed in the early warning literature. Specifically, debt dynamics depend on many factors (e.g., interest rates, growth, deficits, shocks), some of which interact with each other: for example, a government may respond to a period of a low interest-growth differential by increasing the deficit, which may in turn lead to higher debt and increase the risk of a crisis. Therefore, with few settled facts, the debate on whether public debt matters for predicting fiscal crises remains unresolved. Our paper takes a step toward filling that gap.

We bring evidence to bear on the issue by studying fiscal crises since 1980 in a broader sample (188 countries) than previously used in the literature. One of the main novelties of our empirical strategy is an agnostic, two-step approach to the selection of predictors. As a starting point, we consider a wide range of predictors, including covariates that the literature commonly associates with the onset of crises. We also do not take a view on which transformations of each variable (levels, lags, changes) are relevant but instead consider many permutations, yielding a total of 748 indicators. By leveraging machine learning models, we can fit complex and flexible functional forms to our data without overfitting. Our objective, however, goes beyond obtaining a well-performing prediction model, as we also want to identify the variables that enable good prediction. Thus, in a second step we reduce the large set of predictors to those that contain more information than noise, using what the machine learning literature refers to as “feature selection”. We then use a battery of statistical measures to look inside the black box, uncovering the relative importance of variables and their interactions.

Our results show public debt in its various forms is the most important group of predictors. This finding is not an exclusive feature of the period immediately predating the global financial crisis but applies more broadly. The narrative, however, is far from simple as some forms of debt are more important than others—in particular, public external debt—and there is strong evidence of non-linearities across all income groups. Remarkably, the interest-growth differential has low predictive value. What is more, beyond certain debt levels the likelihood of fiscal crises increases significantly irrespective of whether the interest-growth differential is highly positive or negative. Event studies give some insights as to why this may be the case: it is only at the onset of the crisis that the interest-growth differential spikes, making it immaterial for signaling purposes. Our empirical analysis also reveals that it is not solely public debt that matters as its interactions with other variables are as important. Hence, the probability of a crisis rises steeply at high public debt levels but also at relatively moderate ones when accompanied by high inflation or large current account deficits. Notably, these results hold across all income groups.

Our work is related to the extensive early warning literature on sovereign debt crises (see, for example, Detragiache and Spilimbergo 2001; Manasse, Roubini, and Schimmelpfennig 2003; and Chakrabarti and Zeaiter 2014). Relative to that literature, our contribution is twofold. First, we systematically analyze the predictive importance of public debt, drawing on a more comprehensive dataset on debt and its characteristics (such as the creditor structure) than previous studies and an unparalleled number of macro indicators. To the best of our knowledge, this is also the first study to examine explicitly the predictive power of the interest-growth differential. The use of a much broader sample enables us to explore the sensitivity of our results along the time and country-group dimensions. As an additional robustness check, we assess the performance of alternative algorithms not only in terms of out-of-sample predictive accuracy but also in terms of the stability of the variable selection with respect to sampling variation. Our second contribution is to leverage machine learning to analyze complex non-linearities and interactions previously ignored in the literature. Our results suggest that this omission may go a long way toward explaining why past studies found no conclusive evidence on the importance of public debt. Our paper is the first attempt to shed light on the nature of the complex dynamics at play, showcasing the potential of machine learning in macroeconomics, a field where the use of these techniques is still in its infancy.

The rest of the paper is organized as follows. Section II provides a survey of the literature on the determinants of fiscal crises covering the last five decades. We not only describe past approaches to crisis prediction but also discuss the changing nature of the predictors identified as important since the 1970s. Section III discusses the definition of fiscal crises and the main characteristics of the data with an emphasis on debt. In Section IV, we describe the main features of our empirical strategy. Section V presents the results on selection of variables and predictor importance. We then conclude by recapping our main findings and their implications for the current policy debate.

II. Does Debt Matter? Lessons from the Literature

The literature on fiscal crises and their determinants has evolved significantly over time. This has been in part a reflection of the changing nature of sovereign debt defaults and other forms of fiscal distress. As new data and econometric techniques became available, economists tried to uncover the empirical regularities surrounding fiscal crises, with a special emphasis on the second financial era (i.e., the period after World War II).

Initially, research on sovereign crises focused mainly on developing countries. Broadly speaking, the 1950s–1960s were a period when greater indebtedness was seen as a way to help promote economic growth among less developed nations. By the 1970s, borrowing started to be associated not only with development needs but also with periods of external current account imbalances. Nonetheless, debt was generally seen in a positive light up to that point (Solberg 1988). By the early 1980s, however, the number of fiscal crises surged, and so did the work on understanding the drivers of debt distress. We next present an overview of the literature over the last 50 years and of the challenges that remain, especially in identifying the key determinants of crises (Figures 1 and 2). We survey 42 papers chosen from a pool of 63 references based on their empirical relevance and on whether they clearly identify key predictors of crises (for details, see Appendix 1). We organize the discussion into three periods, according to when the papers in question were published.

Figure 1. Predictors of Fiscal Crises in the Literature

Citation: IMF Working Papers 2020, 001; 10.5089/9781513523767.001.A001

Note: The charts are based on a literature review of 42 empirical papers (for more details, see Appendix 1). Variables plotted are those that are statistically significant in at least a third of the papers published during the period of reference.

Figure 2. Most Common Predictors in the Literature

(Share of surveyed papers, 1970–2018)

Note: The chart is based on a literature review of 42 empirical papers (for more details, see Appendix 1).

1970s–1990s: capacity to repay

The literature during this period was mainly focused on assessing the capacity of the sovereign to manage its debt service and avoid defaults on external debt (Feder et al. 1981; Taffler and Abassi 1984; Hajivassiliou 1987). The selection of predictors was often dictated by the financing mix of the countries covered (many of them developing countries). Not surprisingly, these papers found external variables related to the capacity of a country to repay its obligations (external debt service, size of external debt3, foreign exchange reserves to imports) to be among the most important predictors of crises, although economic growth was also identified as a key indicator (Figure 1, panel 1).

The definition of crisis in these studies was in general limited to debt rescheduling or arrears on external debt. The empirical strategy was often based on a logit model with a small number of predictors or, in a few cases, a linear regression model. In some instances, the studies were trying to identify determinants (sometimes, institutional factors) of the capacity to repay and did not test whether the models were useful in predicting future crises (Rivoli and Brewer 1997; Lee 1991; Berg and Sachs 1988). There was also little attention paid to studying the role of public debt in helping predict crises as the focus was on external imbalances.

2000–10: the importance of external debt

At the turn of the century, the definition of crisis began to expand to include not only debt defaults but also access to IMF programs above a certain quota. The concern was that the previous definition was too restrictive, as countries in distress might have been able to avoid defaults by obtaining official credit from the IMF. The logit model remained one of the most popular tools to predict crises and identify drivers (Ciarlone and Trebeschi 2005; Detragiache and Spilimbergo 2001). However, a few papers tried different approaches, including Manasse and Roubini (2009), with a classification and regression tree, and Fioramanti (2008), with neural networks.

Nonetheless, fiscal crises continued to be seen from the perspective of external vulnerabilities, that is, whether the sovereign would default on external creditors. As such, there was significant attention to external debt, with a large share of papers identifying it as a predictor of crises (Figure 1, panel 2). Other common predictors included real GDP growth, debt service and the maturity of debt, the exchange rate, and default history. In this context, there was still limited focus on public debt, especially domestic debt and arrears.

2011 onwards: a growing debate on the role of public debt

In the aftermath of the global financial crisis, several papers start to use a more comprehensive definition of fiscal crises. The attention is no longer on external defaults alone, but papers also acknowledge that fiscal crises may reflect other types of distress and affect both external and domestic creditors. Hence, the crisis definitions now include debt defaults (mainly external), IMF programs, implicit debt defaults (high inflation, domestic arrears), and loss of market access (Medas et al. 2018; Bruns and Poghosyan 2018; and Sumner and Berti 2017). A broader set of methodologies is also used, although the logit model remains a popular approach. Several papers also pay closer attention to the robustness of results, especially the out-of-sample predictive power, but this remains a weakness in the literature of early warning systems more generally (Berg, Borensztein, and Pattillo 2005; and Cerovic et al. 2018).

Although there is an effort to examine a broader set of potential predictors, the empirical research remains constrained by the use of traditional econometric techniques. In many cases, preference has been given to parsimonious approaches relying on a limited set of indicators, partly reflecting the priors of the researcher but also difficulties in addressing overfitting and data constraints. Among the most common predictors are the level of GDP or economic growth and external variables (current account, exchange rate, and, to a lesser degree, external debt and the degree of openness). Public debt and fiscal-related variables are also identified, but less frequently (Figure 1, panel 3).

While a growing number of papers examine the role of public debt, the evidence so far is mixed: Savona and Vezzoli (2015) and Bruns and Poghosyan (2018) do not find evidence that public debt matters for predicting crises, while Cerovic et al. (2018) and Sumner and Berti (2017) find some evidence that it does, although the result is not robust across specifications; changes in public debt are also a significant predictor of debt crises in Reinhart and Rogoff (2011a), although the result does not hold for the post-World War II period. In addition, no studies have explicitly analyzed the interest-growth differential as a predictor of fiscal crises. The lack of conclusive evidence on the importance of public debt in predicting fiscal crises has likely fueled the recent policy debate on whether governments should worry about public debt levels, especially when interest rates are low.

By and large, the literature has also been unable to explore complex dynamics. Despite some recent research using machine learning techniques for predicting sovereign debt crises (Savona and Vezzoli 2015), these studies consider only a small set of indicators and stop short of analyzing non-linearities or interactions among predictors.

III. Data

A. Measuring Fiscal Crises

There is no common definition of fiscal crises in the literature, but most studies focus on sovereign debt crises triggered by external defaults (see, for example, Detragiache and Spilimbergo 2001; Chakrabarti and Zeaiter 2014). In some instances, however, heightened budgetary distress may be associated with domestic arrears or inflation (Reinhart and Rogoff 2011b), or a default may be avoided thanks to official creditor assistance (Manasse, Roubini, and Schimmelpfennig 2003). To capture these different facets, we follow Medas et al. (2018) and identify a fiscal crisis in any given year if any of the following four criteria is met (for details, see Appendix 2):

  1. Credit events. A crisis is triggered when the debt service is not paid on the due date or the creditor incurs any other type of losses including through debt restructuring.

  2. Exceptionally large official financing. Episodes where the country receives large financial support from the IMF or the European Union.

  3. Implicit domestic public debt default. Two criteria are considered: (1) periods of high inflation (usually associated with monetary financing of the budget); or (2) accumulation of domestic arrears.

  4. Loss of market confidence. Episodes associated with extreme market pressures as proxied by: (1) loss of market access, capturing sovereign defaults or bond issuance coming to a halt; or (2) very large borrowing costs or sovereign yield spikes.

Based on this definition, we identify 418 crisis episodes for a sample of 188 countries over the period 1980–2016, making ours one of the most comprehensive studies of fiscal crises to date (for details on the sample, see Appendix 3). On average, countries have experienced two fiscal crises since 1980, and more than three quarters of countries have had at least one crisis (Table 1). Low-income countries (LICs) are the group with the highest frequency of crises (about two-thirds are in fiscal distress at any point in time), followed by emerging market economies (EMs), at 40 percent on average. By contrast, fiscal crises are rare events among advanced economies (AEs): less than 15 percent of them are in fiscal distress in any given year.

Table 1. Fiscal Crisis Episodes (1980–2016)

Sources: Bloomberg; Datastream; Eurostat; Gelos, Sahay, and Sandleris (2004); Guscina, Sheheryar, and Papaioannou (2017); IMF, International Financial Statistics; OECD; Reuters; and authors’ calculations.

Crisis starts can be associated with more than one criterion; therefore, the breakdown does not need to add up to 100 percent. A year is considered a fiscal crisis year when at least one of the four criteria is met. To distinguish separate crisis events, we require at least two years without a fiscal crisis between them.
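The two rules in this note (a crisis year is any year meeting at least one criterion; separate episodes need at least two crisis-free years between them) can be sketched in Python as follows. The function names and the 0/1 annual input series are hypothetical, and the sketch leaves out the criterion-specific thresholds detailed in Appendix 2.

```python
def crisis_year_flags(credit_event, official_financing, implicit_default, market_loss):
    """Flag a year as a fiscal crisis year if at least one criterion is met."""
    return [
        int(any(criteria))
        for criteria in zip(credit_event, official_financing, implicit_default, market_loss)
    ]


def episode_starts(flags, min_gap=2):
    """Return indices of years that start a new crisis episode.

    A crisis year opens a new episode only if at least `min_gap`
    crisis-free years separate it from the previous crisis year;
    otherwise it is treated as part of the ongoing episode.
    """
    starts = []
    last_crisis_year = None
    for t, flag in enumerate(flags):
        if not flag:
            continue
        if last_crisis_year is None or t - last_crisis_year - 1 >= min_gap:
            starts.append(t)
        last_crisis_year = t
    return starts


flags = crisis_year_flags(
    credit_event=[1, 0, 0, 0, 0, 0, 0],
    official_financing=[0, 1, 0, 0, 0, 0, 0],
    implicit_default=[0, 0, 0, 1, 0, 0, 0],
    market_loss=[0, 0, 0, 0, 0, 0, 1],
)
print(flags)                  # [1, 1, 0, 1, 0, 0, 1]
print(episode_starts(flags))  # [0, 6]
```

In this example the crisis year at index 3 is only one quiet year away from the previous one, so it is folded into the first episode; the crisis at index 6 follows two quiet years and counts as a new episode.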

A cursory look at the data suggests that the 1990s was the decade with the highest concentration of crises: at the peak, about half of the countries (EMs for the most part) were in fiscal distress (Figure 3). To a lesser extent, there was also some bunching in the early 1980s (reflecting the collapse of commodity prices and a surge in global interest rates) and in 2010 (following the onset of the global financial crisis), pointing to the potential importance of global factors as precursors of fiscal crises. Overall, credit events are the most frequent type of crisis, accounting for close to two-thirds of episodes. Nonetheless, AEs are outliers relative to other income groups, with most episodes associated with loss of market confidence and/or exceptionally large official financing.

Figure 3. Countries with Fiscal Crises, 1980–2016

(Number)

Sources: Bloomberg; Datastream; Eurostat; Gelos, Sahay, and Sandleris (2004); Guscina, Sheheryar, and Papaioannou (2017); IMF, International Financial Statistics; Laeven and Valencia (2018); OECD; Reuters; and authors’ calculations.
1/ Two crises are identified as overlapping if they start within two years of each other. Financial crises are banking crises episodes as defined in Laeven and Valencia (2018).

Although fiscal crises are usually not accompanied by other types of distress, in about a third of cases there is overlap with currency crises (Figure 4). Consistent with the empirical evidence in Reinhart (2002), we find that most of these cases relate to EMs and LICs, underscoring the importance of external financing among these countries, an issue we explore later in this paper. The synchronicity with financial crises is, however, relatively low suggesting that, although banking crises may precede or coincide with sovereign debt crises through a contingent liability channel (Reinhart and Rogoff 2011a), the root cause of a fiscal crisis may often lie elsewhere. Triple crises are even rarer, accounting for only 3 percent of events.

Figure 4. Overlap with Other Crises, 1980–2016

(Number of crises episodes)

Sources: Bloomberg; Datastream; Eurostat; Gelos, Sahay, and Sandleris (2004); Guscina, Sheheryar, and Papaioannou (2017); IMF, International Financial Statistics; Laeven and Valencia (2018); OECD; Reuters; and authors’ calculations.
1/ Two crises are identified as overlapping if they start within two years of each other. Financial crises are banking crises episodes as defined in Laeven and Valencia (2018).

B. Predictors

As discussed in Section II, there is no consensus over the relative importance of alternative predictors of fiscal crises, partly reflecting the diversity of methodologies and samples used in the literature. In addition, priors prevailing at the time of previous studies (as well as data availability) may have biased the selection of indicators to be tested, narrowing the set to a few variables of interest to the researcher in question and leaving out important interactions (Chakrabarti and Zeaiter 2014). To address these shortcomings, we canvass the empirical and theoretical literature to identify potential predictors of crises. Our aim is to cast as wide a net as possible, the only constraint being data availability.4 As a result, our dataset covers an unusually rich array of economic indicators and institutional country characteristics. Furthermore, the analysis uses several permutations of each variable (such as levels, lags, and differences at various horizons) and cross-sectional averages, which allow us to capture dependencies arising from global factors or spillover effects. Overall, this yields 748 indicators encompassing, among others, different measures of debt, economic activity, level of development, prices, fiscal aggregates, external indicators, global factors, demographics, and institutions. Appendix 4 gives a detailed description of the variables and sources.
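To illustrate how one indicator can be expanded into several candidate predictors, the pandas sketch below adds lags, differences, and a cross-country average by year to a long country-year panel. The column names (`country`, `year`), the function name, and the chosen horizons are hypothetical; the paper's actual variable list is the one described in Appendix 4.

```python
import pandas as pd


def expand_indicator(panel, col, lags=(1, 2), diffs=(1, 3)):
    """Add lagged, differenced and cross-sectional-average versions of `col`.

    `panel` is a long DataFrame with hypothetical columns 'country' and
    'year'. Lags and differences are taken within each country and assume
    consecutive years with no gaps; the cross-sectional mean is taken over
    all countries observed in the same year (a proxy for global factors).
    """
    out = panel.sort_values(["country", "year"]).copy()
    by_country = out.groupby("country")[col]
    for k in lags:
        out[f"{col}_lag{k}"] = by_country.shift(k)
    for h in diffs:
        out[f"{col}_diff{h}"] = by_country.diff(h)
    out[f"{col}_xs_mean"] = out.groupby("year")[col].transform("mean")
    return out


panel = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "year": [2000, 2001, 2002] * 2,
    "debt": [50.0, 60.0, 80.0, 30.0, 30.0, 45.0],
})
print(expand_indicator(panel, "debt", lags=(1,), diffs=(1,)))
```

Applying such a function over every raw series and several horizons is what quickly multiplies a few dozen variables into hundreds of candidate indicators.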

An important contribution of this paper is to assemble a more comprehensive range of debt metrics than previous studies, allowing us to make a more robust assessment of their relevance as predictors of crises. We do not restrict ourselves to public debt but also include various indicators of private indebtedness as well as the total stock of debt in the economy. Accounting for these different forms of debt is important, as the line between them may become blurry in times of crisis (Mbaye, Moreno Badia, and Chae 2018a), and what may matter is not any single measure but their interactions. To construct consistent time series of debt, we leverage the Global Debt Database, an unmatched account of private, public, and total debt for 190 countries going as far back as 1950 (for details, see Mbaye, Moreno Badia, and Chae 2018b). We scale debt not just by GDP but also by other indicators that can proxy for available liquidity, such as reserves or revenues, complementing the information provided by debt service ratios. We also capture some of the characteristics of public debt that have been identified as important in the sovereign debt crises literature. Significant effort was placed on building metrics of external public debt, comparing alternative sources of data to ensure the consistency of the series. Figure 5 shows the debt data coverage in our sample. A salient feature is that there are large differences in public debt profiles across country groups, not just in terms of levels but also in composition, with external debt being the main component among LICs but not in other income groups (Figure 6). In the remainder of the paper, we exploit this heterogeneity to identify which debt characteristics may have higher discriminating value.

Figure 5. Debt Statistics: Country Coverage, 1980–2016

(Number of countries)

Sources: IMF, Global Debt Database; IMF, World Economic Outlook; World Bank, World Development Indicators; U.S. Bureau of Economic Analysis; Haver; Arslanalp and Tsuda (2014); and authors’ calculations.

Figure 6. Public and Public External Debt, 1980–2016

(Weighted average, percent of GDP)

Sources: IMF, Global Debt Database; IMF, World Economic Outlook; World Bank, World Development Indicators; U.S. Bureau of Economic Analysis; Haver; Arslanalp and Tsuda (2014); and authors’ calculations.
Note: Public external debt refers to public and publicly guaranteed debt.

Given the current policy debate, we also pay special attention to the construction of the interest-growth differential variable (henceforth, “r-g”). Our starting point is the recursive equation behind changes in the debt-to-GDP ratio—see Escolano (2010) for a detailed discussion—and thus, r-g is defined as:

$$\left(\frac{r-g}{1+g}\right)$$

where r is the effective interest rate and g the GDP growth rate. We calculate the effective rates using consistent time series for the stock of public debt, which has been a challenge in the literature.5 The trade-off is that, to the extent that the interest bill refers to a broader perimeter of government than the stock of debt, the interest-growth differential may be overestimated. This shortcoming also applies to other studies that look at long histories. However, LICs (the group for which debt stocks usually refer to the narrower perimeter of government) typically report the interest bill for the same level of government, so this is less of a problem. Consistent with other studies (see Mauro et al. 2015; Barrett 2018; Escolano, Shabunina, and Woo 2017), our data suggest that, on average, the interest-growth differential has been close to zero or negative since the 1980s across all income groups (Figure 7). However, there is wide dispersion within each group, and positive interest-growth differentials are not an anomaly.
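A minimal sketch of the calculation for one country-year follows. It assumes (a common convention, not spelled out in this section) that the effective interest rate is the current interest bill divided by the previous year's debt stock and that g is nominal GDP growth; the function name and inputs are hypothetical.

```python
def interest_growth_differential(interest_bill, debt_prev, gdp, gdp_prev):
    """Return (r - g) / (1 + g) for one country-year.

    Assumptions (illustrative only): r is the effective nominal interest
    rate, measured as this year's interest bill over last year's debt stock;
    g is nominal GDP growth. All inputs are in the same currency units.
    """
    r = interest_bill / debt_prev
    g = gdp / gdp_prev - 1.0
    return (r - g) / (1.0 + g)


# Hypothetical numbers: interest of 5 on debt of 100, GDP rising from 200 to 220.
print(round(interest_growth_differential(5.0, 100.0, 220.0, 200.0), 4))  # -0.0455
```

Dividing by 1 + g, rather than using the raw gap r - g, keeps the differential consistent with the exact debt-ratio recursion rather than its first-order approximation.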

Figure 7. Interest-Growth Differential, 1980–2016

(Percent)

Sources: IMF, Global Debt Database; IMF, World Economic Outlook; and authors’ calculations.

Finally, we also compile data for various fiscal indicators that have often been overlooked in the literature. We include not only the fiscal deficit but also revenues and expenditures (including primary expenditure). To capture potential valuation effects and contingent liabilities that may lead to fiscal distress, we calculate stock-flow adjustments ($SFA_t$) as follows:

$$SFA_t = d_t - \left(\frac{1+r}{1+g}\right) d_{t-1} + p_t$$

where $d_t$ and $d_{t-1}$ are the public debt-to-GDP ratios in periods t and t-1, respectively, and $p_t$ is the primary balance (in percent of GDP) in period t.
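In words, the stock-flow adjustment is the part of the change in the debt ratio that is not explained by the interest-growth differential and the primary balance. A short sketch, with debt and the primary balance in percent of GDP (surplus positive), r and g as decimals, and a hypothetical function name:

```python
def stock_flow_adjustment(d_t, d_prev, r, g, primary_balance):
    """SFA_t = d_t - (1 + r) / (1 + g) * d_{t-1} + p_t.

    d_t, d_prev and primary_balance (surplus positive) are ratios to GDP;
    r is the effective interest rate and g the GDP growth rate.
    """
    return d_t - (1 + r) / (1 + g) * d_prev + primary_balance


# Debt rises from 60 to 62 percent of GDP with r = g and a primary deficit
# of 1 percent of GDP: half of the increase is a stock-flow adjustment.
print(stock_flow_adjustment(62.0, 60.0, 0.03, 0.03, -1.0))  # 1.0
```

If the debt ratio evolves exactly as the recursion predicts, the function returns zero; a large nonzero value flags, for example, valuation effects or contingent liabilities taken onto the government's balance sheet.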

IV. Methodology

Our main objective is to identify a stable and robust set of predictors of fiscal crises from a large number of variables. As some indicators may be useful in predicting crises only when interacting with other variables or in a non-linear way, it is important that our estimation strategy captures complex dynamics. Our model of choice is a random forest (RF; Breiman 2001), for two reasons. First, it can deal with complexity and delivers significant improvements in predicting fiscal crises relative to the standard econometric approaches typically used in the early warning literature and to other machine learning algorithms (see Appendix 5 for an empirical comparison of the out-of-sample performance of the RF against other approaches). Second, in a large-scale empirical evaluation of 179 classification algorithms tested on 121 real-world datasets, Fernandez-Delgado et al. (2014) find that, on average, RF is the best performer.

RF aggregates many decision trees, each grown on a random sample of variables and country-years. Decision trees are very flexible models, but the bigger a tree grows, the less likely it is to generalize well to out-of-sample data, a problem usually referred to as overfitting. By averaging the predictions of many trees, RF cancels out the noisy components of individual trees, improving its ability to predict on new data. The advantage is that RF can potentially incorporate a very large number of predictors without running into overfitting problems. The downside is that it becomes more difficult to distinguish relevant from irrelevant variables and to understand how each indicator affects the probability of a crisis (Degenhardt, Seifert and Szymczak 2019). Our empirical approach is therefore to reduce our initially large set of variables to those that contain more information than noise. In the remainder of this section, we describe the selection procedures used to identify a stable set of predictors and the statistical techniques used to analyze the importance of variables and their interactions. Appendix 5 gives more technical details on the methodology.
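To make these mechanics concrete, the sketch below fits a single decision tree and a random forest with scikit-learn on synthetic data (not the paper's dataset). Each forest tree is grown on a bootstrap sample of observations; scikit-learn, like Breiman's original algorithm, draws the random subset of variables at each split rather than once per tree. The out-of-bag (OOB) observations, those left out of a tree's bootstrap sample, give an error estimate without a separate test set. All hyperparameters are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n, m = 600, 20
X = rng.normal(size=(n, m))
# Synthetic crisis indicator driven by a non-linear interaction of two variables.
y = ((X[:, 0] > 0.5) & (X[:, 1] > 0.0)).astype(int)
flip = rng.random(n) < 0.05  # 5 percent label noise
y[flip] = 1 - y[flip]

X_train, X_test, y_train, y_test = X[:400], X[400:], y[:400], y[400:]

# A single fully grown tree tends to fit the training noise.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# The forest averages many trees grown on bootstrap samples, each split
# considering a random subset of sqrt(m) variables.
forest = RandomForestClassifier(
    n_estimators=500, max_features="sqrt", oob_score=True, random_state=0
).fit(X_train, y_train)

print("tree test accuracy:  ", tree.score(X_test, y_test))
print("forest test accuracy:", forest.score(X_test, y_test))
print("forest OOB accuracy: ", forest.oob_score_)
crisis_prob = forest.predict_proba(X_test)[:, 1]  # predicted probabilities in [0, 1]
```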

A. Variable selection

Variable selection is a crucial issue in many applied classification and regression problems (e.g., Hastie, Tibshirani, and Friedman 2001). In our benchmark RF, we start with a large set of variables (748) to assign a probability of a crisis in the next two years. More specifically, let f be a predictive model:

$$\hat{y} = f(X),$$

where X is a matrix with n (annual) observations and m variables and $\hat{y} \in [0,1]$ is the predicted probability of a fiscal crisis over the next two years. The outcome y takes the value 1 if there is a crisis in the next two years and 0 otherwise.

To reduce the number of variables to those that are most relevant, we use what the machine learning literature refers to as “feature selection” algorithms. These techniques have been a workhorse of genomics research (for a review, see Saeys, Inza and Larranaga 2007; Ma and Huang 2008; Hilario and Kalousis 2008; Duval and Hao 2010; and Degenhardt, Seifert and Szymczak 2017). The basic principle of these algorithms is to rank the relative importance of features (each algorithm using different criteria to determine the ranking) and to eliminate those features that are unimportant according to some predefined metric. We focus on the four RF-based algorithms that have been most widely used in the literature:

  • P-values computed with permutation importance (PIMP). Altmann et al. (2010) developed a method for selecting relevant predictors based on repeated permutations of the outcome vector (i.e., the crisis indicator), leaving correlation patterns between predictors unchanged. For each permutation of the outcome, the relevance of all predictor variables is assessed. This yields a vector of importance measures for every variable, called the “null importances”. The PIMP algorithm fits a probability distribution (such as normal, lognormal, or gamma) to the population of null importances. The parameters of these distributions are estimated by maximum likelihood, and P-values are calculated as the probability of observing an importance score larger than the original importance score under the estimated distribution. Only significant predictors (with respect to the PIMP scores) are kept.

  • Recursive Feature Elimination (RFE). RFE aims to find a minimal set of variables that leads to a good prediction model (Diaz Uriarte and Andres 2006). It starts with an RF built on all variables. A specific proportion of the least important variables is then removed, and a new RF is generated using the remaining variables. These steps are applied recursively until the out-of-bag (oob) predictive error is larger than the initial/previous oob error. At each step, prediction performance is estimated on the out-of-bag samples that were not used for model building. The set of variables that leads to the RF with the smallest oob error, or to an error within a small range of the minimum, is selected.

  • Boruta. This algorithm was developed to identify all relevant variables within a classification framework (Kursa and Rudnicki 2010). It compares the importance of the real predictor variables with that of so-called shadow variables, which are randomly permuted copies of the real predictors. For each real variable, a statistical test is performed comparing its importance with the maximum importance of all the shadow variables. Variables with significantly larger (smaller) importance values are declared important (unimportant). All unimportant variables and shadow variables are removed, and the previous steps are repeated until all variables are classified or a pre-specified number of runs has been performed.

  • VSURF. Developed by Genuer et al. (2015), this algorithm returns two subsets of variables. The first is a subset of important variables, including some redundancy, which can be relevant for interpretation; the second is a smaller subset corresponding to a model that tries to avoid redundancy and focuses more closely on the prediction objective. The two-stage strategy is based on a preliminary ranking of the explanatory variables using the RF permutation-based importance score and then proceeds with a stepwise forward strategy for introducing variables.
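To make the PIMP step concrete, the sketch below collects null importances by permuting the outcome vector and then computes each variable's P-value under a normal distribution fitted by maximum likelihood (one of the distributions mentioned above). The argument importance_fn is a hypothetical placeholder for whatever importance measure the fitted RF provides; all names are illustrative, and this is not the reference implementation of Altmann et al. (2010).

```python
import random
from statistics import NormalDist, fmean, pstdev

def null_importances(importance_fn, X, y, n_perm=100, seed=0):
    """Permute the outcome vector n_perm times, leaving the columns of X (and
    their correlations) untouched, and collect each variable's importance."""
    rng = random.Random(seed)
    nulls = [[] for _ in range(len(X[0]))]
    for _ in range(n_perm):
        y_perm = list(y)
        rng.shuffle(y_perm)  # breaks the link between predictors and crises
        for j, imp in enumerate(importance_fn(X, y_perm)):
            nulls[j].append(imp)
    return nulls

def pimp_pvalue(observed, nulls_j):
    """P(importance > observed) under a normal distribution fitted by maximum likelihood."""
    mu = fmean(nulls_j)
    sigma = pstdev(nulls_j)  # MLE of the standard deviation (divides by n)
    if sigma == 0:
        return 0.0 if observed > mu else 1.0
    return 1.0 - NormalDist(mu, sigma).cdf(observed)

def pimp_select(observed, nulls, alpha=0.05):
    """Indices of variables whose PIMP P-value is below alpha."""
    return [j for j, (obs, nj) in enumerate(zip(observed, nulls))
            if pimp_pvalue(obs, nj) < alpha]
```

Fitting a lognormal or gamma distribution instead would only change pimp_pvalue; the permutation loop stays the same.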

In choosing among these algorithms, we consider two criteria:

  • High predictive power. Ideally, we want the empirical power of the smaller variable set to be at least as good as that of the full set. To make that evaluation, we compare the out-of-sample predictive performance of four RFs, each estimated using the features selected by one algorithm, against the RF estimated with the full set of variables. We check the statistical significance of the difference between the performance of each algorithm and the full RF model by calculating t-tests based on standard errors adjusted for two-way clustering (see Cameron, Gelbach, and Miller 2011). The main performance measure for these comparisons is the area under the receiver operating characteristic curve (AUROC), although other measures such as the log likelihood and mean squared error (MSE) are also reported. Using the AUROC, one of the most common metrics in the early warning literature, allows us to benchmark our results against other studies. Intuitively, the AUROC assesses the accuracy of binary models against the alternative of a coin toss. A perfectly accurate model would display an AUROC of 1, while one with no predictive power over a coin toss would show a value of 0.5.

  • Stability of feature selection. As noted by Degenhardt, Seifert and Szymczak (2019), it is important to verify the stability of the variables selected by each algorithm, as these can vary with small changes in the data, in which case results may not be reliable. To assess stability, we construct two separate samples by randomly dropping 5 percent of observations and compare the overlap of the features selected by each algorithm in each sample. We use the Pearson correlation coefficient to measure the overlap, as it allows comparisons between two sets of arbitrary cardinality (see Nogueira and Brown 2016). The Pearson coefficient takes values between -1 and 1, with 1 meaning perfect overlap between the two sets.
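Both criteria are straightforward to compute. The sketch below, assuming plain Python lists, computes the AUROC as the probability that a randomly chosen crisis observation receives a higher predicted probability than a randomly chosen non-crisis one (ties count one half), and the stability index as the Pearson correlation between the 0/1 selection indicators of two feature sets drawn from the same m candidates, in the spirit of Nogueira and Brown (2016). Function names are illustrative.

```python
import math

def auroc(y, scores):
    """Probability that a random crisis observation (y=1) scores higher than a
    random non-crisis observation (y=0); ties count one half."""
    pos = [s for s, yi in zip(scores, y) if yi == 1]
    neg = [s for s, yi in zip(scores, y) if yi == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def pearson_stability(selected_a, selected_b, m):
    """Pearson correlation between the 0/1 indicators of two selected feature
    sets drawn from the same m candidate variables."""
    za = [1.0 if j in selected_a else 0.0 for j in range(m)]
    zb = [1.0 if j in selected_b else 0.0 for j in range(m)]
    mean_a, mean_b = sum(za) / m, sum(zb) / m
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(za, zb))
    var_a = sum((a - mean_a) ** 2 for a in za)
    var_b = sum((b - mean_b) ** 2 for b in zb)
    if var_a == 0 or var_b == 0:
        raise ValueError("undefined when a set is empty or contains all m variables")
    return cov / math.sqrt(var_a * var_b)
```

Because the indicators are compared over all m candidates, sets of different sizes can be compared directly, which is the property that motivates the Pearson measure.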

B. Assessing variable importance

There are several methods in the literature to rank explanatory variables by their relative importance. In this paper, we use two approaches:

  • Out-of-bag permuted predictor importance. The relative importance of a variable is estimated by measuring the increase in prediction error after permuting that feature. A feature is “important” if shuffling its values increases the model error, because in this case the model relied on that feature for the prediction. At the opposite end, a feature is “unimportant” if shuffling its values leaves the model error unchanged, because in that case the model ignored that feature. The model errors (with and without shuffling) are calculated on the oob sample.6

  • Shapley values. Variables are ranked by their contribution to the probability of a crisis using Shapley values (Strumbelj and Kononenko 2010; Lundberg and Lee 2017). As in cooperative game theory, Shapley values in the machine learning context measure each variable’s contribution (payoff) to an individual prediction’s deviation from the mean prediction. They are constructed as the weighted average of each variable’s marginal contribution to the forecast across every possible combination of the other variables. To assess the discriminating value of a particular variable, we also calculate the differences in Shapley values between crisis and non-crisis events. Note that Shapley values do not give the change in the predicted value after removing the feature from the trained model, but rather the contribution of a feature value to the difference between the actual prediction and the mean prediction of the sample.
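For a model with only a few predictors, Shapley values can be computed exactly by enumerating every coalition of the other variables. The sketch below assumes the common "interventional" convention, in which variables outside a coalition take their values from a background sample, so the values add up to the prediction minus the mean prediction over that sample. With hundreds of predictors, exact enumeration is infeasible and approximation methods are needed. The function name and toy model are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, background):
    """Exact Shapley values of the prediction f(x); features outside a coalition
    take their values from each row of `background` (interventional convention)."""
    m = len(x)

    def value(coalition):
        # Average prediction when features in `coalition` are set to x's values.
        total = 0.0
        for b in background:
            z = [x[k] if k in coalition else b[k] for k in range(m)]
            total += f(z)
        return total / len(background)

    phi = [0.0] * m
    for j in range(m):
        others = [k for k in range(m) if k != j]
        for size in range(m):
            # Shapley weight for coalitions of this size that exclude j.
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                s = set(subset)
                phi[j] += weight * (value(s | {j}) - value(s))
    return phi

# Toy additive model: each value equals coefficient * (x_j - background mean of x_j).
print(shapley_values(lambda z: 2 * z[0] + 3 * z[1], [1, 3], [[0, 0], [2, 2]]))
```

The crisis versus non-crisis comparison described above would then average these values within each group of observations and take the difference.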

C. Studying interactions and nonlinearities

Partial dependence plots

As outlined above, machine learning techniques allow us to capture non-linearities and heterogeneous interactions between various predictors. To analyze these complex relationships, we rely on partial dependence plots (Greenwell 2017; Friedman and Popescu 2008; Friedman 2001). A partial dependence plot (PDP) shows the marginal effect of one or several features on the predicted outcome and can identify whether the relationship between a predictor and the outcome is linear, monotonic, or more complex. PDPs can be either a line plot (univariate) or a surface plot (bivariate). A univariate PDP shows the relationship between a feature and the predicted outcome (in our case, the probability of a crisis), whereas a bivariate PDP visualizes the predicted outcome for a pair of features by marginalizing over the other variables (see Appendix 5 for details). Intuitively, the partial dependence function at a particular feature value (for example, public external debt of 40 percent of GDP) represents the average prediction (in our case, the probability of a fiscal crisis) if we force all data points to assume that feature value.

H-statistics

We also use a measure of the relative strength of two-way interactions, that is, the extent to which two features interact with each other in a given model, following Friedman and Popescu (2008). This measure (termed the H-statistic) builds on PDPs and is based on the variance of the difference between the bivariate PDP and the sum of the two individual PDPs. It captures the fraction of the variance of PD_jk that is not captured by PD_j + PD_k over the data distribution. This makes it possible to rank all pairs of variables by the relative strength of their interactions.

Mathematically, the H-statistic for the interaction between features j and k is:

$$H_{jk}^2 = \frac{\sum_{i=1}^{n}\left[PD_{jk}\left(x_j^{(i)}, x_k^{(i)}\right) - PD_j\left(x_j^{(i)}\right) - PD_k\left(x_k^{(i)}\right)\right]^2}{\sum_{i=1}^{n} PD_{jk}^2\left(x_j^{(i)}, x_k^{(i)}\right)}$$

Here all partial dependence functions are centered to have mean zero, and the H-statistic measures the share of the variance of PD_jk that is explained by the interaction. The statistic is 0 if there is no interaction at all and 1 if none of the variance of PD_jk is explained by the sum of the individual partial dependence functions. An interaction statistic of 1 between two features means that each single PD function is constant and the effect on the prediction comes only through the interaction.
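Because the H-statistic is built from partial dependence functions, both tools can be sketched together: partial dependence is computed by forcing the chosen features to given values for every observation and averaging the predictions, and H² is the ratio of sums of squares of the mean-centered functions evaluated at the observed data points. This assumes a fitted model exposed as a Python function f mapping a list of predictor values to a crisis probability (a hypothetical interface) and a small dataset, since the cost grows with the square of the number of observations. The two toy models illustrate the limiting cases of 0 and 1.

```python
def partial_dependence(f, X, fixed):
    """Average prediction when the features in `fixed` (dict: index -> value)
    are forced to those values for every observation; others stay as observed."""
    total = 0.0
    for row in X:
        z = list(row)
        for idx, value in fixed.items():
            z[idx] = value
        total += f(z)
    return total / len(X)

def centered(values):
    """Subtract the mean so each partial dependence function has mean zero."""
    mu = sum(values) / len(values)
    return [v - mu for v in values]

def h2_statistic(f, X, j, k):
    """Friedman-Popescu H^2 for the (j, k) interaction, with the partial
    dependence functions evaluated at the observed data points."""
    pd_jk = centered([partial_dependence(f, X, {j: r[j], k: r[k]}) for r in X])
    pd_j = centered([partial_dependence(f, X, {j: r[j]}) for r in X])
    pd_k = centered([partial_dependence(f, X, {k: r[k]}) for r in X])
    num = sum((a - b - c) ** 2 for a, b, c in zip(pd_jk, pd_j, pd_k))
    den = sum(a ** 2 for a in pd_jk)
    return num / den if den > 0 else 0.0

# Additive model: no interaction, so H^2 is (numerically) zero.
print(h2_statistic(lambda z: z[0] + 2 * z[1] + z[2], [[0, 1, 2], [1, 3, 0], [2, 0, 1], [3, 2, 5]], 0, 1))
# Pure interaction with symmetric features: single PD functions are flat, H^2 = 1.
print(h2_statistic(lambda z: z[0] * z[1], [[-1, -1], [-1, 1], [1, -1], [1, 1]], 0, 1))
```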

V. Results

A. Variable selection

There are wide-ranging differences in the variable sets selected by each algorithm, from fewer than 10 variables in the VSURF to more than 300 in Boruta (Figure 8). This underscores the different objectives of the algorithms and makes it difficult to determine a priori which one is best. Therefore, we start by comparing their out-of-sample predictive performance against the RF estimated using the full set of predictors (Table 2). Although we pool all countries for estimation purposes, results are disaggregated by income group for comparison with previous studies. We find that the full model performs better for AEs and EMs than for LICs but, overall, its predictive power is higher than in previous studies. By way of comparison, the AUC in the full model is 0.81 for AEs and EMEs and 0.71 for LICs, while Cerovic et al. (2018) report a maximum AUC of 0.69 and 0.68, respectively (see also Appendix 5 for the performance of alternative estimation methods).7 Among the feature selection algorithms, Boruta is always at least as good as the full model for both income groups, irrespective of the performance metric considered. Moreover, it is statistically significantly better than the full model across the board when assessed by the log likelihood, and for AEs and EMEs when assessed by the MSE. At the other extreme, the VSURF has the worst performance throughout, while the PIMP is usually worse than either Boruta or the RFE. The RFE, one of the most popular methods for variable selection, falls somewhere in between, underperforming Boruta on some metrics.

Table 2.

Out-of-Sample Performance of Alternative Feature Selection Algorithms

Note: Bootstrapped standard deviations (based on 100 random resamples of the test sample with replacement) in parentheses. Stars indicate the degree of confidence that a model outperforms the Random Forest estimated with the full set of 748 variables: * = 90%, ** = 95%, and *** = 99%. Models predict the probability of a crisis starting in year t+1 or t+2. Out-of-sample performance obtained from 15 rolling regressions (see Appendix 5).

Figure 8.

Feature Selection Algorithms

(Number of variables selected)

Citation: IMF Working Papers 2020, 001; 10.5089/9781513523767.001.A001

Note: The chart shows the number of variables selected by each feature selection algorithm and the full RF model estimated over the full sample.

In choosing among alternative feature selection algorithms, we also want to ensure stability in the choice of predictors, as results can be highly sensitive to small perturbations in the sample (Nogueira and Brown 2016). Given the computational cost, we restrict our comparison of stability to the best-performing algorithms: Boruta, RFE, and PIMP. The Pearson index is 0.92 for Boruta, indicating that, despite changes in the sample, there is a high overlap of the selected variables between replicates (Figure 9). In contrast, the Pearson index is 0.63 for RFE and 0.81 for PIMP, suggesting lower stability in the selection of variables, consistent with the findings of other studies for these algorithms (see Degenhardt, Seifert, and Szymczak 2019).

Figure 9.

Robustness in Variable Selection

(Pearson correlation coefficient)


Note: The Pearson index provides a measure of the stability of the chosen feature set to variations in the training data. Alternative samples are drawn by randomly dropping 5 percent of observations from the original dataset. The Pearson correlation coefficient gives an indication of the overlap of features across samples (it takes values between -1 and 1, with 1 indicating full overlap).

B. Variable importance

Given its predictive power and stability, we choose Boruta as the benchmark algorithm to study the relative importance of predictors. Despite reducing the initial set by more than half, the number of variables selected by Boruta is still very large (336 indicators, including permutations). Hence, we expect the predictive power of any individual indicator to be small. Moreover, it may be difficult to fully distinguish the impact of closely related (and correlated) variables. Therefore, we group them into 23 categories to make the results easier to interpret (see Appendix 4 for the groupings). Based on the out-of-bag permuted predictor importance, we find that:

  • Public debt is the most important group of predictors, followed closely by public debt service (Figure 10). This should not be surprising, as fiscal crises by and large involve some degree of debt distress (e.g. inability to repay or borrowing difficulties). But as discussed in Section II, the previous literature has found only weak evidence that public debt matters. We conjecture that, by including a much broader set of debt measures and characteristics (for example, whether creditors are foreign) and by accounting explicitly for non-linearities and interactions among variables, we are able to capture the complex dynamics at play in the run-up to a crisis. We explore some of these below.

    Figure 10.

    Variable Importance by Group of Predictors


    Notes: Variable importance is calculated using an in-built out of bag permuted predictor importance function in R based on the RF estimated with the variables selected by Boruta.

  • Institutional slow-moving variables also rank among the most important predictors. These include the level of development (GDP per capita), demographics, and to a lesser degree the quality of institutions. Given the very different frequency of crises between AEs, EMs and LICs, it is likely that these variables are helping discriminate countries more prone to crises rather than the exact timing. In essence, this may reflect the lower vulnerability of more developed countries as stronger institutional frameworks may better prepare them to manage the exposure to shocks and avoid crises.

  • In line with the literature, external variables (in particular, external capital flows and, to a lesser extent, external debt, the current account, and the exchange rate) are also important. As discussed in Section III, fiscal crises overlap with currency crises in a third of cases and, thus, we can expect some association between fiscal crises and periods when external borrowing conditions change or external investors become concerned about the sovereign’s ability to meet its debt obligations.

  • Fiscal flow variables (deficits, revenues, spending) also appear relevant, but considerably less so than debt or some of the external variables. This may help explain why the past literature had difficulty finding fiscal variables to be robust predictors.

  • At the other end, the interest-growth differential and global conditions are among the least relevant variables.

As an alternative measure of the discriminating power of predictors between crisis and non-crisis observations, we look at Shapley differences. Overall, we find that the ranking of variables is broadly similar across income groups, with some differences (Figure 11). Public debt and public debt service are the most important groups of predictors for EMs and LICs but somewhat less so for AEs (although public debt remains among the top 3 categories). On the other hand, private debt appears as a more important predictor than public debt for AEs, possibly suggesting that what started as a debt crisis in the private sector may end up on the balance sheet of the government (e.g. directly via bank bailouts or indirectly through the ensuing recession). We also look at Shapley differences for the pre- and post-2000 periods. Overall, public debt and public debt service are the top categories in both periods, although in reverse order (i.e. public debt has the highest ranking in the post-2000 period), suggesting that public debt may have been an important red flag not only in the run-up to the global financial crisis but also in earlier episodes.

Figure 11.

Contribution to Probability of a Crisis

(Shapley Values)


Notes: These charts display the mean Shapley value difference (crisis versus non-crisis observations) by income groups.

C. An analysis of selected predictors

In what follows, we undertake a more in-depth analysis of key leading indicators. Given the importance of public debt, we zero in on this group of predictors and, specifically, on public external debt which is the individual debt measure with the highest predictive value. We are also interested in the interactions with other indicators to help inform the ongoing policy debate on the risks associated with debt in the face of low interest rates and inflation as well as the potential role of external and financial imbalances. Throughout this analysis, we will make extensive use of univariate and bivariate PDPs. We present results disaggregated by income groups (AEs, EMs, and LICs) as the nature of crises may differ widely depending on the level of development.

Public debt

Figure 12 (panels 1 and 2) shows the univariate PDP for public external debt, depicting how the average predicted probability of entering a crisis varies when public external debt changes. Overall, there is a positive relationship between the two that is non-linear in nature and holds across all income groups although with some differences in profiles.8 For AEs, the probability of a crisis increases substantially once debt is around 70 percent of GDP. For EMs, the estimated probability is relatively flat for debt levels below 30 percent of GDP but rises steeply above those levels. For LICs, predicted probabilities are much higher from the start and the steepening of the curve takes place at lower debt levels than in other income groups.

Figure 12.

Partial Dependence Plots 1/ and Event Studies 2/


1/ Charts (1)-(3) display PDPs based on the Boruta random forest. Solid lines show the PDP curve, which represents the average prediction across all levels of public external debt (charts 1 and 2) and the interest-growth differential (chart 3). Dotted lines show probability thresholds based on minimizing the sum of type I and type II errors (missed crises and false alarms).

2/ Chart (4) displays an event study based on the framework developed by Gourinchas and Obstfeld (2012), where t=0 is the start of the fiscal crisis. We estimate the equation $y_{i,t} = \alpha_i + \sum_{j=-5}^{5} \beta_{t+j} D_{i,t+j} + \varepsilon_{i,t}$, where y is the interest-growth differential and $D_{i,t+j}$ is a dummy equal to 1 when the country is j periods away from the start of a crisis in period t and zero otherwise. Each data point should be interpreted as the interest-growth differential at time t+j relative to the “non-crisis” times benchmark.

To gauge when a probability is high enough to be of concern, we calculate the probability thresholds at which the model identifies a crisis. Following the literature (see, for example, Berg, Borensztein, and Pattillo 2005), we choose the threshold that minimizes the sum of type I and type II errors (missed crises and false alarms). The threshold for AEs is 8.5 percent, which results in capturing 80 percent of the crises while keeping false alarms at 20 percent. For EMs, the probability threshold is 22 percent and for LICs 28 percent. Overall, public external debt is one of the few variables for which, at high enough values (70 percent of GDP for AEs and EMs, and 80 percent for LICs), the estimated probability breaches the crisis threshold regardless of other factors. The somewhat higher debt level for LICs, which may seem counterintuitive given the findings of Reinhart, Rogoff, and Savastano (2003) on debt intolerance, is likely to reflect several factors. First, minimizing the sum of errors results in a higher debt level for LICs, likely to avoid too many false alarms. Second, it may partly reflect the higher share of concessional borrowing among LICs, which makes the debt burden (for a given level of debt) lower than in cases with commercial debt.
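The threshold rule can be sketched as a search over candidate cut-offs. This sketch assumes that the two errors are measured as rates (the share of crises missed and the share of non-crisis observations flagged), which is consistent with how the AE results are reported above; the function name and the toy data are illustrative.

```python
def optimal_threshold(y, probs):
    """Cut-off minimizing (share of crises missed) + (share of non-crises flagged).
    Returns (threshold, missed_rate, false_alarm_rate)."""
    n_crisis = sum(y)
    n_calm = len(y) - n_crisis
    best = None
    for t in sorted(set(probs)):
        # Type I error: crisis observations whose probability falls below the cut-off.
        missed = sum(1 for yi, p in zip(y, probs) if yi == 1 and p < t) / n_crisis
        # Type II error: non-crisis observations at or above the cut-off (false alarms).
        false_alarm = sum(1 for yi, p in zip(y, probs) if yi == 0 and p >= t) / n_calm
        if best is None or missed + false_alarm < best[0]:
            best = (missed + false_alarm, t, missed, false_alarm)
    return best[1:]

# Toy data: three tranquil years and two pre-crisis years.
print(optimal_threshold([0, 0, 0, 1, 1], [0.05, 0.1, 0.3, 0.25, 0.6]))
```

Weighting the two error rates equally is what makes the chosen cut-off sensitive to the crisis frequency in each income group, which is one way to read the differences in thresholds across AEs, EMs and LICs.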

Taken together, these results tell a consistent story on the high relevance of public debt as a leading indicator. But its importance is also related to interaction effects. By calculating H-statistic scores for pairwise interactions between all variables, we can gauge to what extent each feature interacts cumulatively with all other features in the model. Estimated H-statistic scores show that public external debt has the strongest cumulative interactions with all other variables (Figure 13). The H-statistic also ranks other debt and debt service indicators among the top variables, consistent with the analysis of variable importance. Looking at pairwise interactions, we find that the features that interact most with public external debt are external assets, the inflation rate, GDP relative to the US, and public external amortization (Figure 14). This suggests that the joint effect of these variables and public external debt on the estimated probability of a crisis differs from the sum of their individual contributions. Therefore, the probability of a crisis may be high even for moderate levels of public external debt if other factors are present. We look at some of these interactions below.

Figure 13.

Overall Interaction Strength


Notes: The chart shows the interaction strength (H-statistic) for each feature with all other features for the Boruta RF. Public external debt has the highest relative interaction effect with all other features.
Figure 14.

Top-10 Interactions with Public External Debt


Notes: The chart shows the 2-way interaction strengths (H-statistic) between public external debt and each other feature.

The interest-growth differential

Academics, investors, and policymakers alike are still getting to grips with negative interest rates and what they mean for the affordability of public debt (Blanchard 2019, Garin et al. 2019, and Mehrotra 2017). More specifically, the question is not about interest rates per se but about how high they are relative to economic growth (the interest-growth differential).9 Given our findings so far, a natural question is whether a low interest-growth differential should assuage concerns about high debt levels. At first glance, the variable importance analysis suggests that the interest-growth differential carries very limited information. This is confirmed by the PDPs: even for large variations in the interest-growth differential, the estimated probability of a crisis barely changes, remains relatively low, and never breaches the crisis threshold (Figure 12, panel 3). To see what insights can be gleaned from the data to explain this apparent puzzle, we follow Gourinchas and Obstfeld (2012) in analyzing its dynamics in the run-up to a crisis. Overall, our event study shows that interest-growth differentials can remain low for long stretches of time and only shoot up at the onset of the crisis, making them of little use as a leading indicator (Figure 12, panel 4).10

Bivariate PDPs also show that a low interest-growth differential does not dampen the risks of high debt (Figure 15). Cells highlighted in red depict combinations of public external debt and the interest-growth differential for which the estimated probability of a crisis is above the threshold calculated for the individual income group. In both AEs and EMs, we find that if public external debt is sufficiently high, the estimated probability breaches the crisis threshold irrespective of the interest-growth differential. For these countries, the probability of a crisis at a given level of debt does not change with the interest-growth differential. On the other hand, we observe some interactions for LICs. In particular, both highly negative and highly positive interest-growth differentials can imply a higher probability of crisis for the same level of debt. A possible reason is that both may signal imbalances, although of a different nature (e.g. a low interest-growth differential could be due to overheating). Another possible explanation is that governments may respond to periods of low interest-growth differentials by increasing deficits and accumulating debt, negating the potential benefits of low borrowing costs for reducing risks. Therefore, our findings should not be interpreted as suggesting that the interest-growth differential does not matter for debt sustainability, but rather that it is just one of the factors determining debt dynamics.
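The red cells in these bivariate charts can be reproduced in spirit by evaluating the partial dependence function on a grid of two predictors and flagging the cells above the income-group threshold. The sketch below assumes a hypothetical model function f and uses a toy model in which the crisis probability depends only on debt; in that case a whole row of the grid is flagged once debt is high enough, regardless of the second variable, which mirrors the AE and EM pattern only qualitatively.

```python
def bivariate_pdp(f, X, j, k, grid_j, grid_k):
    """Grid of average predictions with feature j forced to each value in grid_j
    and feature k forced to each value in grid_k (rows follow grid_j)."""
    table = []
    for vj in grid_j:
        row_out = []
        for vk in grid_k:
            total = 0.0
            for row in X:
                z = list(row)
                z[j], z[k] = vj, vk  # force both features for every observation
                total += f(z)
            row_out.append(total / len(X))
        table.append(row_out)
    return table

def flag_above(table, threshold):
    """Mark grid cells whose estimated crisis probability exceeds the threshold."""
    return [[p > threshold for p in row] for row in table]

# Toy model (hypothetical): probability rises with debt (feature 0), ignores r-g (feature 1).
f = lambda z: min(1.0, 0.005 * z[0])
X = [[50.0, 0.0], [80.0, 2.0]]
tab = bivariate_pdp(f, X, 0, 1, [20, 60, 120], [-5, 0, 5])
print(flag_above(tab, 0.5))
```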

Figure 15.

Bivariate Partial Dependence Plots: Public External Debt and r-g


Notes: Charts display bivariate partial dependence plots for different country groupings. Cells highlighted in red depict combinations of public external debt and the interest-growth differential for which the estimated probability of a crisis is above the probability thresholds calculated for that income group based on minimizing the sum of type I and type II errors (missed crises and false alarms). The darker the blue color, the lower the probability of a crisis.

Inflation

For a long time, high inflation was associated with crises as countries resorted to the printing press to monetize fiscal deficits. But on the heels of the global financial crisis, the problem is just the opposite: inflation is too low. So what does this mean for the risk of fiscal distress? The analysis based on univariate PDPs suggests that there is a strong relationship between inflation and the estimated probability of crises, consistent with the findings in Reinhart and Rogoff (2011a). As with debt, we also find evidence of strong non-linearities: for AEs and EMs, the probability of crises increases significantly when inflation is above 20 percent (Figure 16).11 Although the literature has long established that countries with hyperinflation tend to suffer from debt distress, ours is the first paper to present evidence of this non-linear relationship.

Figure 16.

Inflation: Univariate Partial Dependence Plots

(Percent)


Note: The charts display PDPs based on the Boruta random forest. Estimated probabilities are plotted on the vertical axis and inflation on the horizontal axis. Solid lines show the PDP curve, which represents the average prediction across all levels of inflation.

Our results also suggest that both an increase and decline in inflation can be associated with a higher probability of crises, particularly for EMs, underscoring the risk of deflation and the snowball effect it can have on debt dynamics (Crafts 2016). Notably for LICs, the levels of public external debt for which the estimated probabilities breach the crisis thresholds decrease with inflation (Figure 17). This means that even for relatively low levels of debt, the probability of a crisis in LICs surges when inflation is high. This could reflect the fact that in some countries the ability to manage even relatively moderate levels of debt is limited and, in such cases, governments may resort to monetizing deficits.
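The snowball effect can be illustrated with the standard debt-accumulation identity, in which the debt-to-GDP ratio evolves as d_t = d_{t-1}(1 + i)/(1 + γ) - pb, where i is the nominal effective interest rate, γ is nominal GDP growth, and pb is the primary balance in percent of GDP. This is a textbook identity rather than part of the paper's model, and the parameter values below are purely illustrative: holding the nominal interest rate and real growth fixed, lower inflation reduces nominal growth and so pushes the debt ratio up.

```python
def debt_path(d0, i, g_real, inflation, primary_balance, years):
    """Debt-to-GDP path from d_t = d_{t-1} * (1 + i) / (1 + gamma) - pb, where
    gamma = (1 + g_real) * (1 + inflation) - 1 is nominal GDP growth."""
    gamma = (1 + g_real) * (1 + inflation) - 1
    path = [d0]
    for _ in range(years):
        path.append(path[-1] * (1 + i) / (1 + gamma) - primary_balance)
    return path

# Illustrative only: same nominal interest rate and real growth, different inflation.
with_inflation = debt_path(60.0, i=0.03, g_real=0.01, inflation=0.02, primary_balance=0.0, years=10)
with_deflation = debt_path(60.0, i=0.03, g_real=0.01, inflation=-0.01, primary_balance=0.0, years=10)
print(round(with_inflation[-1], 2), round(with_deflation[-1], 2))
```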

Figure 17.

Inflation and Public External Debt: Bivariate Partial Dependence Plots


Notes: Charts display bivariate PDPs for different country groupings. Cells highlighted in red depict combinations of public external debt and inflation for which the estimated probability of a crisis is above the probability thresholds calculated for that income group based on minimizing the sum of type I and type II errors (missed crises and false alarms). The darker the blue color, the lower the probability of a crisis.

External and financial imbalances

As discussed in Section III, there is some overlap between fiscal and currency crises. The analysis of variable importance also confirms the relevance of external factors. To further explore the importance of external imbalances as a driver of crises, we look at the PDP of the current account balance. As with other indicators, we find what is by now a recurrent non-linear pattern: once external deficits reach 3–7 percent of GDP, the probability of a crisis increases substantially for AEs and EMs (Figure 18). Surprisingly, current account deficits seem to be less relevant for predicting fiscal crises in LICs. There is also some evidence of interactions between public external debt and the current account, particularly for AEs (Figure 19). Even for relatively moderate levels of debt, the probability of crises rises steeply when current account deficits are high. The opposite, however, is not true: current account surpluses do not appear to shield countries from crises if debt levels are high.

Figure 18.

Current Account: Univariate Partial Dependence Plots

(Percent of GDP)


Note: The charts display PDPs based on the Boruta random forest. Estimated probabilities are plotted on the vertical axis and the current account on the horizontal axis. Solid lines show the PDP curve, which represents the average prediction across all levels of the current account balance.
Figure 19.

Current Account and Public External Debt: Bivariate Partial Dependence Plots

(Percent of GDP)


Notes: Charts display bivariate PDPs for different country groupings. Cells highlighted in red depict combinations of public external debt and the current account balance for which the estimated probability of a crisis is above the probability thresholds calculated for that income group based on minimizing the sum of type I and type II errors (missed crises and false alarms). The darker the blue color, the lower the probability of a crisis.

We also find some evidence that fiscal crises are associated with high private sector leverage, although results are mixed across country groups. To capture financial imbalances, we focus on one of the most popular metrics in the literature, the credit gap (i.e., private debt as a share of GDP relative to its 10-year average). Our results suggest that the probability of a crisis increases significantly in AEs and EMs when the gap is above 40 percent (Figure 20). We also see interactions with public external debt for EMs, with the estimated probability breaching the crisis thresholds at lower levels of debt if the credit gap is large (Figure 21). At the other end, private debt dynamics are much less relevant for LICs, likely reflecting limited financial deepening and, therefore, relatively low risks compared to AEs or EMs.
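
As a rough illustration of the credit gap definition, the sketch below subtracts the trailing 10-year average of the private debt-to-GDP ratio from its current value. Whether the average excludes the current year, and whether the gap is a difference rather than a ratio, are assumptions here; `credit_gap` is a hypothetical helper.

```python
from typing import List, Optional, Sequence


def credit_gap(private_debt_to_gdp: Sequence[float], window: int = 10) -> List[Optional[float]]:
    """Private debt-to-GDP ratio minus its average over the previous `window` years.

    Returns None for early years that lack a full trailing window.
    """
    gaps: List[Optional[float]] = []
    for t, ratio in enumerate(private_debt_to_gdp):
        if t < window:
            gaps.append(None)  # not enough history yet
            continue
        trailing = private_debt_to_gdp[t - window:t]  # years t-window .. t-1
        gaps.append(ratio - sum(trailing) / window)
    return gaps


# Hypothetical series: a ratio flat at 50 percent of GDP for a decade, then a jump to 95.
series = [50.0] * 10 + [95.0]
print(credit_gap(series)[-1])  # 45.0
```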

Figure 20.

Credit Gap: Univariate Partial Dependence Plots

(Percent of GDP)

Note: The charts display PDPs based on the Boruta random forest. Estimated probabilities are plotted on the vertical axis and the credit gap on the horizontal axis. Solid lines show the PDP curve, which represents the average prediction across all levels of public external debt.
Figure 21.

Credit Gap and Public External Debt: Bivariate Partial Dependence Plots

(Percent of GDP)

Notes: Charts display bivariate PDPs for different country groupings. Cells highlighted in red depict combinations of public external debt and the credit gap for which the estimated probability of a crisis is above the probability threshold calculated for that income group by minimizing the sum of type I and type II errors (missed crises and false alarms). The darker the blue, the lower the probability of a crisis.
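
A bivariate PDP extends the univariate calculation to a grid of two predictors, and the red cells in Figures 19 and 21 correspond to grid points whose averaged probability exceeds the group threshold. The sketch below assumes a generic `predict` callable; the toy model (columns: public external debt, credit gap, and one other predictor), its coefficients, and the 0.1 threshold are hypothetical, chosen only so that a large credit gap pushes the probability over the threshold at a lower debt level.

```python
from typing import Callable, List, Sequence

Row = List[float]


def bivariate_pdp(predict: Callable[[Sequence[Row]], List[float]], X: Sequence[Row],
                  f1: int, f2: int, grid1: Sequence[float], grid2: Sequence[float]) -> List[List[float]]:
    """Average predicted probability with predictors f1 and f2 fixed at each grid pair."""
    surface = []
    for v1 in grid1:
        cells = []
        for v2 in grid2:
            modified = [list(row) for row in X]
            for row in modified:
                row[f1], row[f2] = v1, v2
            preds = predict(modified)
            cells.append(sum(preds) / len(preds))
        surface.append(cells)
    return surface


def flag_cells(surface: List[List[float]], threshold: float) -> List[List[bool]]:
    """True where the averaged probability is above the group's crisis threshold."""
    return [[p > threshold for p in cells] for cells in surface]


# Hypothetical toy model; columns: public external debt, credit gap, other predictor.
def toy_predict(rows: Sequence[Row]) -> List[float]:
    return [min(1.0, 0.002 * debt + 0.004 * gap + 0.01 * other) for debt, gap, other in rows]


X = [[30.0, 0.0, 1.0], [50.0, 10.0, 3.0]]
surface = bivariate_pdp(toy_predict, X, f1=0, f2=1, grid1=[20.0, 60.0], grid2=[0.0, 50.0])
print(flag_cells(surface, threshold=0.1))  # [[False, True], [True, True]]
```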

VI. Conclusion

This paper contributes to the debate on the costs of public debt by revisiting its importance in predicting fiscal crises. In a world of ultra-low interest rates, it is tempting to believe that debt may carry no costs. For those who subscribe to that view, the natural conclusion is that now may be the time to rely more heavily on debt to attend to worthy causes, such as fixing crumbling infrastructure, all while propping up a frail economy. Skeptics point to history, noting that those who ignore high debt do so at their peril, as excessive debt may force disruptive fiscal adjustments or eventually lead to costly crises.

We use machine learning models to confront these dueling views with evidence. Our results show that public debt, in its various forms, is the most important predictor of fiscal crises and that it matters always and everywhere. But public debt is not the only game in town, as its interactions with other predictors also make a difference. Surprisingly, however, the interest-growth differential has little signaling value: it does not really matter whether it is highly positive or negative, and beyond certain debt levels the likelihood of a crisis surges regardless.

It is important to acknowledge that the machine learning techniques used in this paper do not allow us to establish causality. This is an area where computational science is still trying to make inroads (see Athey 2018). What we can confidently say is that there is a strong correlation between public debt and crises and that this relationship is very robust. Therefore, at the current juncture, complacency about high debt levels would be ill-advised even if interest-growth differentials were to remain low. The underlying reason is that the dynamics of crises are highly nonlinear: by the time the interest-growth differential starts flashing red, a crisis may well be underway, catching policymakers off guard.

These findings do not mean that bringing debt down is always the right policy prescription. There are clearly cases where the use of debt for countercyclical purposes, to increase public investment, or to address other structural needs is desirable. However, the evidence presented in this paper points to the risks, suggesting that public debt might not be free after all.

References

  • Abbas, S. A., N. Belhocine, A. ElGanainy, and M. Horton. 2011. “A Historical Public Debt Database,” IMF Economic Review, Vol. 59, Issue 4, pp. 717–42.

  • Altmann, A., L. Tolosi, O. Sander, and T. Lengauer. 2010. “Permutation Importance: A Corrected Feature Importance Measure,” Bioinformatics, Vol. 26 (10), pp. 1340–1347.

  • Apley, D. W. 2016. “Visualizing the Effects of Predictor Variables in Black Box Supervised Learning Models,” arXiv preprint arXiv:1612.08468.

  • Arslanalp, S., and T. Tsuda. 2014. “Tracking Global Demand for Advanced Economy Sovereign Debt,” IMF Economic Review, Vol. 62, Iss. 3, pp. 430–64.

  • Asonuma, T., M. Chamon, A. Erce, and A. Sasahara. 2019. “Costs of Sovereign Defaults: Restructuring Strategies, Bank Distress and the Capital Inflow-Credit Channel,” IMF Working Paper No. 19/69 (International Monetary Fund).

  • Athey, S. 2018. “The Impact of Machine Learning on Economics,” in The Economics of Artificial Intelligence: An Agenda, edited by A. Agrawal, J. Gans, and A. Goldfarb, University of Chicago Press.

  • Baldacci, E., I. Petrova, N. Belhocine, G. Dobrescu, and S. Mazraani. 2011. “Assessing Fiscal Stress,” IMF Working Paper No. 11/100 (International Monetary Fund).

  • Barrett, P. 2018. “Interest-Growth Differentials and Debt Limits in Advanced Economies,” IMF Working Paper No. 19/82 (International Monetary Fund).

  • Beirne, J., and M. Fratzscher. 2013. “The Pricing of Sovereign Risk and Contagion during the European Sovereign Debt Crisis,” Journal of International Money and Finance, Vol. 34, pp. 60–82.

  • Berg, A., and J. Sachs. 1988. “The Debt Crisis Structural Explanations of Country Performance,” Journal of Development Economics, 29(3), pp. 271–306.

  • Berg, A., E. Borensztein, and C. Pattillo. 2005. “Assessing Early Warning Systems: How Have They Worked in Practice?” IMF Staff Papers, 52(3), pp. 462–502.

  • Berg, A., and C. Pattillo. 1999. “Predicting Currency Crises: The Indicators Approach and An Alternative,” Journal of International Money and Finance, 18, pp. 561–86.

  • Berti, K., M. Salto, and M. Lequien. 2013. “An Early-detection Index of Fiscal Stress for EU Countries,” European Economy. Economic Papers. 475. Brussels.

  • Breiman, L. 2001. “Random Forests,” Machine Learning, 45(1), pp. 5–32. http://oz.berkeley.edu/users/breiman/randomforest2001.pdf.

  • Blanchard, O. 2019. “Public Debt and Low Interest Rates,” American Economic Review 109(4), pp. 1197–1229.

  • Bluwstein, K., M. Buckman, A. Joseph, M. Kang, S. Kapadia, and O. Şimşek. 2019. “Credit Growth, the Yield Curve, and Financial Crisis Prediction: Evidence from a Machine Learning Approach,” unpublished manuscript.

  • Bocola, L., G. Bornstein, and A. Dovis. 2019. “Quantitative Sovereign Default Models and the European Debt Crisis,” Journal of International Economics, Vol. 118, pp. 20–30.

  • Bruns, M., and T. Poghosyan. 2018. “Leading Indicators of Fiscal Distress: Evidence from the Extreme Bound Analysis,” Applied Economics, 50(13), pp. 1454–78.

  • Bussiere, M., and M. Fratzscher. 2006. “Towards a New Early Warning System of Financial Crises,” Journal of International Money and Finance, 25(6), pp. 953–973.

  • Callier, P. 1985. “Further Results on Countries’ Debt-servicing Performance: The Relevance of Structural Factors,” Review of World Economics, 121(1), pp. 105–115.

  • Cameron, A.C., J. B. Gelbach, and D. L. Miller. 2011. “Robust Inference with Multiway Clustering,” Journal of Business & Economic Statistics Vol. 29, No. 2, pp. 238–49.

  • Catão, L. and B. Sutton. 2002. “Sovereign Defaults: The Role of Volatility,” IMF Working Paper No. 02/149 (Washington: International Monetary Fund).

  • Cerovic, S., K. Gerling, A. Hodge, and P. Medas. 2018. “Predicting Fiscal Crises,” IMF Working Paper No. 18/181 (Washington: International Monetary Fund).

  • Chakrabarti, A., and H. Zeaiter. 2014. “The Determinants of Sovereign Default: A Sensitivity Analysis,” International Review of Economics and Finance, 33, pp. 300–18.

  • Checherita-Westphal, C., A. Klemm, and P. Viefers. 2015. “Governments’ Payment Discipline: The Macroeconomic Impact of Public Payment Delays and Arrears,” IMF Working Paper No. 15/13 (Washington: International Monetary Fund).

  • Ciarlone, A., and G. Trebeschi. 2005. “Designing an Early Warning System for Debt Crises,” Emerging Markets Review 6, pp. 376–95.

  • Citron, J.T., and G. Nickelsburg. 1987. “Country Risk and Political Instability,” Journal of Development Economics, 25(2), pp. 385–92.

  • Cline, W. 1984. Interest and Debt: Systematic Risk and Policy Response. Washington: Institute of International Finance.

  • Cortes, C., and V. Vapnik. 1995. “Support-vector Networks,” Machine Learning, Vol. 20, No. 3, pp. 273–97.

  • Cruces, J.J., and C. Trebesch. 2013. “Sovereign Defaults: The Price of Haircuts,” American Economic Journal: Macroeconomics, 5(3), pp. 85–117.

  • Crafts, N. 2016. “Reducing High Public Debt Ratios: Lessons from UK Experience,” Fiscal Studies, Vol. 37, Iss. 2, pp. 201–23.

  • Cruz, C., P. Keefer and C. Scartascini. 2018. "Database of Political Institutions 2017 (DPI2017)." Inter-American Development Bank. Numbers for Development.

  • Dawood, M., N. Horsewood, and F. Strobel. 2017. “Predicting Sovereign Debt Crises: An Early Warning System Approach,” Journal of Financial Stability, 28, pp. 16–28.

  • De Cos, Pablo Hernández, G. Koester, E. Moral-Benito, and C. Nickel. 2014. “Signaling Fiscal Stress in the Euro Area: A Country-Specific Early Warning System,” ECB Working Paper No. 1712 (Frankfurt: European Central Bank).

  • Degenhardt, F., S. Seifert, and S. Szymczak. 2019. “Evaluation of Variable Selection Methods for Random Forests and Omics Data Sets,” Briefings in Bioinformatics 20 (2), pp. 492–503.

  • Detragiache, E. and A. Spilimbergo. 2001. “Crises and Liquidity: Evidence and Interpretation,” IMF Working Paper No. 01/02 (Washington: International Monetary Fund).

  • Díaz Uriarte, R. and S. Álvarez de Andres. 2006. “Gene Selection and Classification of Microarray Data using Random Forest,” BMC Bioinformatics, 7:3.

  • Duval, B., and J. Hao. 2010. “Advances in Metaheuristics for Gene Selection and Classification of Microarray Data,” Briefings in Bioinformatics 11 (1), pp. 127–41.

  • Escolano, J. 2010. A Practical Guide to Public Debt Dynamics, Fiscal Sustainability, and Cyclical Adjustment of Budgetary Aggregates. Technical Notes and Manuals 10/02. Washington, DC: International Monetary Fund.

  • Escolano, J., A. Shabunina, and J. Woo. 2017. “The Puzzle of Persistently Negative Interest-Rate–Growth Differentials: Financial Repression or Income Catch-Up?” Fiscal Studies, 38(2), pp. 179–217.

  • Feder, G., and R.E. Just. 1977. “A Study of Debt Servicing Capacity Applying Logit Analysis,” Journal of Development Economics, 4(1), pp. 25–38.

  • Feder, G., R. Just, and K. Ross. 1981. “Projecting Debt Servicing Capacity of Developing Countries,” Journal of Financial and Quantitative Analysis, 16(5), pp. 651–69.

  • Fernandez-Delgado, M., E. Cernadas, S. Barro, and D. Amorim. 2014. “Do We Need Hundreds of Classifiers to Solve Real World Classification Problems?” The Journal of Machine Learning Research, Vol. 15, No. 1, pp. 3133–3181.

  • Fioramanti, M. 2008. “Predicting Sovereign Debt Crises using Artificial Neural Networks: A Comparative Approach,” Journal of Financial Stability, Vol. 4, pp. 149–64.

  • Fischer, S., R. Sahay, and C. Vegh. 2002. “Modern Hyper- and High Inflations,” Journal of Economic Literature, 40, pp. 837–80.

  • Frank, C. R., and W. R. Cline. 1971. “Measurement of Debt Servicing Capacity: An Application of Discriminant Analysis,” Journal of International Economics, 1, pp. 327–44.

  • Frankel, J., and G. Saravelos. 2012. “Can Leading Indicators Assess Country Vulnerability? Evidence from the 2008–09 Global Financial Crisis,” Journal of International Economics 87, pp. 216–31.

  • Friedman, J. H. 2001. “Greedy Function Approximation: A Gradient Boosting Machine,” Annals of Statistics Vol. 29, No. 5, pp. 1189–1232.

  • Friedman, J. H., and B. E. Popescu. 2008. “Predictive Learning Via Rule Ensembles,” The Annals of Applied Statistics, Vol. 2, No. 3, pp. 916–54.

  • Fuertes, A., and E. Kalotychou. 2006. “Early Warning Systems for Sovereign Debt Crises: The Role of Heterogeneity,” Computational Statistics & Data Analysis 51, pp. 1420–41.

  • Fuertes, A., and E. Kalotychou. 2007. “Optimal Design of Early Warning Systems for Sovereign Debt Crises,” International Journal of Forecasting, 23(1), pp. 85–100.

  • Garín, J., R. Lester, E. Sims, and J. Wolff. 2019. “Without Looking Closer, It May Seem Cheap: Low Interest Rates and Government Borrowing,” Economics Letters 180, pp. 28–32.

  • Gelos, R., R. Sahay, and G. Sandleris. 2004. “Sovereign Borrowing by Developing Countries: What Determines Market Access?” IMF Working Paper No. 04/211 (Washington: International Monetary Fund).

  • Genuer, R., J. Poggi, and C. Tuleau-Malot. 2015. “VSURF: An R Package for Variable Selection Using Random Forests,” The R Journal, R Foundation for Statistical Computing, 7 (2), pp. 19–33. hal-01251924v1

  • Georgievska, A., L. Georgievska, A. Stojanovic, and N. Todorovic. 2008. “Sovereign Rescheduling Probabilities in Emerging Markets: A Comparison with Credit Rating Agencies’ Ratings,” Journal of Applied Statistics, 35(9), pp. 1031–51.

  • Ghulam, Y., and J. Derber. 2018. “Determinants of Sovereign Defaults,” The Quarterly Review of Economics and Finance 69, pp. 43–55.

  • Gourinchas, P., and M. Obstfeld. 2012. “Stories of the Twentieth Century for the Twenty-First,” American Economic Journal: Macroeconomics, Vol. 4(1), pp. 226–65.

  • Greenwell, B. M. 2017. “pdp: An R Package for Constructing Partial Dependence Plots,” The R Journal 9 (1), pp. 421–36. https://journal.r-project.org/archive/2017/RJ-2017-016/index.html.

  • Guscina, A., M. Sheheryar, and M. Papaioannou. 2017. “Assessing Loss of Market Access: Conceptual and Operational Issues,” IMF Working Paper No. 17/246 (Washington: International Monetary Fund).

  • Hajivassiliou, V.A. 1987. “The External Debt Repayments Problems of LDC's: An Econometric Model Based on Panel Data,” Journal of Econometrics, 36(1-2), pp. 205–30.

  • Hajivassiliou, V.A. 1994. “A Simulation Estimation Analysis of the External Debt Crises of Developing Countries,” Journal of Applied Econometrics, 9, pp. 109–131.

  • Hastie, T., R. Tibshirani, and J. Friedman. 2001. The Elements of Statistical Learning. Data Mining, Inference, and Prediction. Springer-Verlag, New York.

  • Hilario, M., and A. Kalousis. 2008. “Approaches to Dimensionality Reduction in Proteomic Biomarker Studies,” Briefings in Bioinformatics 9 (2), pp. 102–118.

  • Hilscher, J., and Y. Nosbusch. 2010. “Determinants of Sovereign Risk: Macroeconomic Fundamentals and the Pricing of Sovereign Debt,” Review of Finance 14, Iss. 2, pp. 235–62.

  • IMF. 2015. The Fund’s Lending Framework and Sovereign Debt—Further Considerations. Board Paper (Washington: International Monetary Fund).

  • IMF. 2017. Review of the Debt Sustainability Framework in Low-income Countries: Proposed Reforms.

  • Jordà, Ò., M. Schularick, and A. M. Taylor. 2016. “Sovereigns versus Banks: Credit, Crises, and Consequences,” Journal of the European Economic Association 14 (1), pp. 45–79.

  • Kohlscheen, E. 2010. “Sovereign Risk: Constitutions Rule,” Oxford Economic Papers, New Series, Vol. 62, No. 1, pp. 62–85.

  • Kose, A., S. Kurlat, F. Ohnsorge, and N. Sugawara. 2017. “A Cross-Country Database of Fiscal Space,” World Bank Development Prospects Group, Policy Research Working Paper No. 8157.

  • Kraay, A., and V. Nehru. 2006. “When Is External Debt Sustainable?” The World Bank Economic Review 20 (3), pp. 341–65.

  • Kursa, M., and W. Rudnicki. 2010. “Feature Selection with the Boruta Package,” Journal of Statistical Software, 36(11), pp. 1–13.

  • Laeven, L. and F. Valencia. 2018. “Systemic Banking Crises Revisited,” IMF Working Paper No. 18/206 (Washington: International Monetary Fund).

  • Lane, P. R., and G.M. Milesi-Ferretti. 2007. “The External Wealth of Nations Mark II,” Journal of International Economics, Vol. 73, pp. 223–50.

  • Lane, P. R., and G.M. Milesi-Ferretti. 2017. “International Financial Integration in the Aftermath of the Global Financial Crisis,” IMF Working Paper 17/115, International Monetary Fund, Washington, DC.

  • Lee, S.H. 1991. “Ability and Willingness to Service Debt as Explanation for Commercial and Official Rescheduling Cases,” Journal of Banking & Finance, 15(1), pp. 5–27.

  • Lloyd-Ellis, H., G.W. McKenzie, and S.H. Thomas. 1989. “Using Country Balance Sheet Data to Predict Debt Rescheduling,” Economics Letters, 31(2), pp. 173–177.

  • Lundberg, S. M., and S. Lee. 2017. “A Unified Approach to Interpreting Model Predictions,” Advances in Neural Information Processing Systems, pp. 4765–74.

  • Ma, S., and J. Huang. 2008. “Penalized Feature Selection and Classification in Bioinformatics,” Briefings in Bioinformatics 9 (5), pp. 392–403.

  • Maltritz, D., and A. Molchanov. 2014. “Country Credit Risk Determinants with Model Uncertainty,” International Review of Economics & Finance, 29, pp. 224–34.

  • Manasse, P., N. Roubini, and A. Schimmelpfennig. 2003. “Predicting Sovereign Debt Crises,” IMF Working Paper No. 03/221 (Washington: International Monetary Fund).

  • Manasse, P., and N. Roubini. 2009. “Rules of Thumb for Sovereign Debt Crises,” Journal of International Economics, 78(2), pp. 192–205.

  • Marashaden, O. 1997. “A Logit Model to Predict Debt Rescheduling by Less Developed Countries,” Asian Economies 26, pp. 25–34.

  • Mauro, P., R. Romeu, A. Binder, and A. Zaman. 2015. “A Modern History of Fiscal Prudence and Profligacy,” Journal of Monetary Economics, Vol. 76, pp. 55–70.

  • Mauro, P., and J. Zhou. 2019. “r-g<0: Can We Sleep More Soundly?” mimeo.

  • Mbaye, S., M. Moreno-Badia, and K. Chae. 2018a. “Bailing Out the People: When Private Debt Becomes Public,” IMF Working Paper No. 18/141 (Washington: International Monetary Fund).

  • Mbaye, S., M. Moreno-Badia, and K. Chae. 2018b. “The Global Debt Database: Methodology and Sources,” IMF Working Paper No. 18/111 (Washington: International Monetary Fund).

  • McFadden, D., R. Eckaus, G. Feder, V. Hajivassiliou, and S. O’Connell. 1985. “Is There Life After Debt? An Econometric Analysis of the Creditworthiness of Developing Countries,” in International Debt and the Developing Countries, pp. 179–209.

  • Medas, P., T. Poghosyan, Y. Xu, J. Farah-Yacoub, and K. Gerling. 2018. “Fiscal Crises,” Journal of International Money and Finance, Vol. 88, pp. 191–207.

  • Messmacher, M. and M. Kruger. 2004. Sovereign Debt Defaults and Financing Needs (No. 4–53). International Monetary Fund.

  • Mehrotra, N. 2017. “Debt Sustainability in a Low Interest Rate World,” Hutchins Center Working Paper No. 32.

  • Nogueira, S., and G. Brown. 2016. “Measuring the Stability of Feature Selection.” In: Frasconi, P., N. Landwehr, G. Manco, and J. Vreeken (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2016. Lecture Notes in Computer Science, vol 9852. Springer.

  • Pamies Sumner, S. and K. Berti. 2017. “A Complementary Tool to Monitor Fiscal Stress in European Economies,” EC Discussion Paper, 49 (June).

  • Peter, M. 2002. "Estimating Default Probabilities of Emerging Market Sovereigns: A New Look at A Not-So-New Literature," HEI Working Paper No: 06/2002, Geneva: Graduate Institute for International Studies.

  • Reinhart, C. M. 2002. “Default, Currency Crises, and Sovereign Credit Ratings,” World Bank Economic Review, Oxford University Press, Vol. 16(2), pp. 151–70.

  • Reinhart, C. M., V. R. Reinhart, and K. Rogoff. 2012. “Public Debt Overhangs: Advanced-Economy Episodes since 1800,” Journal of Economic Perspectives, Vol. 26, No. 3 (Summer), pp. 69–86.

  • Reinhart, C. M., and K. Rogoff. 2009. This Time is Different: Eight Centuries of Financial Folly (Princeton, NJ: Princeton University Press).

  • Reinhart, C. M., and K. Rogoff. 2011a. “From Financial Crash to Debt Crisis,” American Economic Review 101(5), pp. 1676–1706.

  • Reinhart, C. M., and K. Rogoff. 2011b. “The Forgotten History of Domestic Debt,” Economic Journal 121 (552), pp. 319–350.

  • Reinhart, C. M., K. Rogoff, and M. Savastano. 2003. “Debt Intolerance,” Brookings Papers on Economic Activity, 34, 2003-1, pp. 1–74.

  • Rivoli, P., and T.L. Brewer. 1997. “Political Instability and Country Risk,” Global Finance Journal, 8(2), pp. 309–21.

  • Rodriguez, A., and P. N. Rodriguez. 2006. “Understanding and Predicting Sovereign Debt Rescheduling: A Comparison of the Areas Under Receiver Operating Characteristic Curves,” Journal of Forecasting, 25, pp. 459–79.

  • Saeys, Y., I. Inza, and P. Larranaga. 2007. “A Review of Feature Selection Techniques in Bioinformatics,” Bioinformatics 23 (19), pp. 2507–17.

  • Sargen, N. 1977. “Economic Indicators and Country Risk Appraisal,” Economic Review, Federal Reserve Bank of San Francisco, Fall issue, pp. 19–35.

  • Savona, R., and M. Vezzoli. 2015. “Fitting and Forecasting Sovereign Defaults using Multiple Risk Signals,” Oxford Bulletin of Economics and Statistics, 77 (1), pp. 66–91.

  • Savona, R., M. Vezzoli, and E. Ciavolino. 2015. “A Data-Driven Explanation of Country Risk: Emerging Markets vs. Eurozone Debt Crises,” SYRTO Working Paper n.17.

  • Snider, L.W. 1990. “The Political Performance of Third World Governments and the Debt Crisis,” American Political Science Review, 84(4), pp. 1263–80.

  • Solberg, R.L. 1988. Sovereign Rescheduling: Risk and Portfolio Management. Routledge.

  • Strumbelj, E., and I. Kononenko. 2010. “An Efficient Explanation of Individual Classifications Using Game Theory,” Journal of Machine Learning Research, Vol. 11, pp. 1–18.

  • Sturzenegger, F. and J. Zettelmeyer. 2006. Debt Defaults and Lessons from a Decade of Crises, Table 1, Chapter 1 (Cambridge: MIT Press).

  • Sumner, S.P. and K. Berti. 2017. A Complementary Tool to Monitor Fiscal Stress in European Economies (No. 049). Directorate General Economic and Financial Affairs (DG ECFIN), European Commission.

  • Sy, A.N.R. 2004. “Rating the Rating Agencies: Anticipating Currency Crises or Debt Crises?” Journal of Banking and Finance 28(11), pp. 2845–67.

  • Taffler, R.J., and B. Abassi. 1984. “Country Risk: A Model for Predicting Debt Servicing Problems in Developing Countries,” Journal of the Royal Statistical Society: Series A (General), 147(4), pp. 541–61.

  • Tibshirani, R. 1996. “Regression Shrinkage and Selection via the Lasso,” Journal of the Royal Statistical Society. Series B, Vol. 58, No. 1, pp. 267–88.

  • Wolpert, D. H., and W. G. Macready. 1997. “No Free Lunch Theorems for Optimization,” IEEE Transactions on Evolutionary Computation, Vol. 1, No. 1, pp. 67–82.

Appendix 1. Literature Review

Notes: AEs: Advanced Economies; EMEs: Emerging Market Economies; LICs: Low-income Countries.

Appendix 2. Fiscal Crisis: Definitions and Data Sources

Appendix 3. Sample of Countries

Appendix 4. Data: Definition, Sources, and Predictor Groupings

Note: L = lag; L2 = second lag; fd_L = lag of first difference; fd_L2 = second lag of first difference; 5yr_L = lag of the trailing 5-year difference; 10yr_L = lag of the trailing 10-year difference; d3_L = lag of the trailing 3-year difference; pc3_L = lag of the percentage change over the trailing 3 years; mean_L = lag of the trailing 10-year moving average; wavg = cross-sectional weighted average for all the permutations.
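
One way to read the suffix conventions above is sketched below for a single annual series, where the features for year t use only information up to t−1. The exact definitions (for example, that `mean_L` averages years t−10 to t−1 and that `pc3_L` is expressed in percent) are our interpretation, `build_features` is a hypothetical helper, and `wavg` is omitted because its weights are not described here.

```python
from typing import Dict, List, Optional, Sequence

Feature = List[Optional[float]]


def _value(x: Sequence[float], i: int) -> Optional[float]:
    """x[i], or None if year i is outside the sample."""
    return x[i] if 0 <= i < len(x) else None


def _change(x: Sequence[float], end: int, span: int) -> Optional[float]:
    """x[end] - x[end - span], or None if either year is missing."""
    a, b = _value(x, end), _value(x, end - span)
    return None if a is None or b is None else a - b


def _pct_change(x: Sequence[float], end: int, span: int) -> Optional[float]:
    """Percentage change from x[end - span] to x[end]."""
    a, b = _value(x, end), _value(x, end - span)
    return None if a is None or b is None or b == 0 else 100.0 * (a / b - 1.0)


def _mean(x: Sequence[float], end: int, span: int) -> Optional[float]:
    """Average of x[end - span + 1 .. end], or None without a full window."""
    if end - span + 1 < 0 or end >= len(x):
        return None
    return sum(x[end - span + 1 : end + 1]) / span


def build_features(x: Sequence[float]) -> Dict[str, Feature]:
    """Feature values for each year t, using only data up to t-1."""
    ts = range(len(x))
    return {
        "L": [_value(x, t - 1) for t in ts],
        "L2": [_value(x, t - 2) for t in ts],
        "fd_L": [_change(x, t - 1, 1) for t in ts],
        "fd_L2": [_change(x, t - 2, 1) for t in ts],
        "d3_L": [_change(x, t - 1, 3) for t in ts],
        "5yr_L": [_change(x, t - 1, 5) for t in ts],
        "10yr_L": [_change(x, t - 1, 10) for t in ts],
        "pc3_L": [_pct_change(x, t - 1, 3) for t in ts],
        "mean_L": [_mean(x, t - 1, 10) for t in ts],
    }


x = [float(v) for v in range(12)]  # hypothetical series: 0, 1, ..., 11
features = build_features(x)
print({name: values[-1] for name, values in features.items()})
```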

Public debt includes total debt liabilities of the government with domestic and foreign creditors. In compiling public debt series for each country, we look at the different perimeters of government (non-financial public sector, general government, and central government) for which the Global Debt Database reports data, choosing the debt category with the longest time series. In many cases, particularly among LICs, this results in a narrow definition of debt (central government) but ensures the consistency of the series over time. In contrast, previous studies have often used a hybrid approach to compile debt statistics, switching debt concepts depending on availability, which may have yielded longer but inconsistent time series.
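
The selection rule for the public debt series can be sketched as follows. The sketch assumes that "longest" means the largest number of non-missing annual observations and, as a purely hypothetical tie-breaker, prefers the broader perimeter; neither detail is specified in the text, and `longest_perimeter` is an illustrative name.

```python
from typing import Dict, Optional

# Hypothetical tie-breaking order, from broadest to narrowest coverage.
PERIMETERS = ("non-financial public sector", "general government", "central government")


def longest_perimeter(series: Dict[str, Dict[int, Optional[float]]]) -> str:
    """Return the perimeter whose debt series has the most non-missing years."""

    def n_obs(name: str) -> int:
        return sum(value is not None for value in series[name].values())

    available = [p for p in PERIMETERS if p in series]
    # max() keeps the first maximum it meets, so ties go to the broader perimeter.
    return max(available, key=n_obs)


data = {
    "general government": {2000 + i: 40.0 + i for i in range(15)},
    "central government": {1980 + i: 30.0 + i for i in range(35)},
}
print(longest_perimeter(data))  # central government
```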

Public external debt is defined in terms of the residency of the holder. It includes general government debt and debt guaranteed by the government and, as such, may have a wider sectoral coverage than our measure of total public debt. We attempted to construct alternative measures based on currency denomination, but time series availability was limited for most of the sample, so that metric was excluded from the analysis. Similar constraints applied to other debt characteristics, such as the type of holder (e.g., banks, official sector) or the maturity of debt.

External debt includes total debt liabilities of a country (both for the government and private sector) with foreign creditors.

Appendix 5. Methodological Details

Empirical model

Following the literature on crises, we choose a prediction window of two years (see, for example, Berg and Pattillo 1999). Since we are interested in the transition from a non-crisis to a crisis state, we follow Bussiere and Fratzscher (2006) and only consider observations in which a country is not in a crisis in year t, dropping all crisis years after the start of a crisis episode.
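
The sample construction for one country can be sketched as follows. The sketch assumes that the two-year window covers crisis starts in years t+1 and t+2 and that years whose window runs past the end of the data are dropped; these timing conventions and the helper name `build_sample` are assumptions.

```python
from typing import List, Sequence, Tuple


def build_sample(in_crisis: Sequence[bool], horizon: int = 2) -> List[Tuple[int, int]]:
    """Return (t, label) pairs for one country's annual series.

    label = 1 if a crisis episode starts in t+1, ..., t+horizon (assumed window).
    Years in which the country is already in crisis are dropped, so only
    transitions from a non-crisis state are modelled.
    """
    n = len(in_crisis)
    # A crisis starts in year t if the country is in crisis at t but not at t-1.
    starts = [c and (t == 0 or not in_crisis[t - 1]) for t, c in enumerate(in_crisis)]
    sample = []
    for t in range(n):
        if in_crisis[t]:
            continue  # drop crisis years (including ongoing episodes)
        if t + horizon >= n:
            continue  # assumption: drop years whose full window is not yet observed
        sample.append((t, int(any(starts[t + 1 : t + 1 + horizon]))))
    return sample


# Hypothetical country: a crisis episode in years 3-4 of an 8-year sample.
in_crisis = [False, False, False, True, True, False, False, False]
print(build_sample(in_crisis))  # [(0, 0), (1, 1), (2, 1), (5, 0)]
```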

To estimate the probability of a crisis, we rely on an RF, an ensemble learning method based on decision trees. Each decision tree is an interpretable model that successively splits the data into subsets by testing a single predictor at each node. Starting at the root node, observations are recursively divided based on variable cutoffs until they reach terminal nodes, called leaves. Trees are constructed through two random perturbation mechanisms: (1) each tree is trained on a bootstrap sample; and (2) the optimal variable at each split is selected from a random subset of mtry of the m predictors (i.e., mtry < m). The prediction for each leaf is the mean outcome of the observations in that leaf, and trees are fit to minimize the mean squared error. The overall prediction of the RF is the average prediction across all trees.
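
To make the mechanics concrete, the following is a compact, unoptimized random forest in plain Python: bootstrap samples, a random subset of `mtry` predictors at each split, splits chosen to minimize squared error on a 0/1 crisis outcome, leaf means as predictions, and an average across trees. It is an illustrative sketch, not the implementation used in the paper; all names and the toy data are hypothetical.

```python
import random
from typing import List, Optional, Sequence


class Node:
    """A tree node: internal nodes hold a split, leaves hold a mean outcome."""

    def __init__(self, value: float, feature: Optional[int] = None, cutoff: float = 0.0,
                 left: "Optional[Node]" = None, right: "Optional[Node]" = None) -> None:
        self.value = value      # mean outcome of the observations reaching this node
        self.feature = feature  # None marks a leaf
        self.cutoff = cutoff
        self.left = left
        self.right = right


def _sse(ys: Sequence[float]) -> float:
    """Sum of squared errors around the mean (the MSE criterion times n)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def _grow(X, y, idx, mtry, rng) -> Node:
    """Grow a tree exhaustively: keep splitting until a node is pure or unsplittable."""
    ys = [y[i] for i in idx]
    mean = sum(ys) / len(ys)
    if len(set(ys)) == 1:
        return Node(mean)
    best = None  # (sse, feature, cutoff, left indices, right indices)
    for f in rng.sample(range(len(X[0])), mtry):  # random subset of mtry predictors
        values = sorted({X[i][f] for i in idx})
        for lo, hi in zip(values, values[1:]):
            cut = (lo + hi) / 2
            left = [i for i in idx if X[i][f] <= cut]
            right = [i for i in idx if X[i][f] > cut]
            sse = _sse([y[i] for i in left]) + _sse([y[i] for i in right])
            if best is None or sse < best[0]:
                best = (sse, f, cut, left, right)
    if best is None:  # the sampled predictors are constant in this node
        return Node(mean)
    _, f, cut, left, right = best
    return Node(mean, f, cut, _grow(X, y, left, mtry, rng), _grow(X, y, right, mtry, rng))


def fit_forest(X, y, n_trees: int, mtry: int, seed: int = 0) -> List[Node]:
    rng = random.Random(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample (with replacement)
        trees.append(_grow(X, y, boot, mtry, rng))
    return trees


def predict_forest(trees: List[Node], X) -> List[float]:
    """Average the leaf means across trees: an estimated crisis probability."""
    out = []
    for x in X:
        total = 0.0
        for root in trees:
            node = root
            while node.feature is not None:
                node = node.left if x[node.feature] <= node.cutoff else node.right
            total += node.value
        out.append(total / len(trees))
    return out


X = [[0.0, 1.0], [1.0, 0.0], [2.0, 3.0], [3.0, 2.0],
     [10.0, 12.0], [11.0, 10.0], [12.0, 13.0], [13.0, 11.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # hypothetical crisis labels
forest = fit_forest(X, y, n_trees=200, mtry=1, seed=42)
print(predict_forest(forest, [[1.5, 1.5], [11.5, 11.5]]))
```

Because each leaf mean lies between 0 and 1, the forest average can be read directly as an estimated crisis probability.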

We follow the standard practice in the literature and pool all countries to make use of the largest possible training samples and capture a wide variety of crises (see, for example, Fuertes and Kalotychou 2006). The tuning parameter mtry is chosen from a grid of candidate values through cross-validation.12 No other restrictions are placed on the tree growing process, so that each tree is grown exhaustively. The number of trees is set to 2000.

Sample splitting

Models were evaluated based on out-of-sample predictive performance. For that purpose, we split the sample into two non-overlapping subsamples: training (for model estimation) and test (for evaluation). To avoid possible information spillovers from the test to the training sample, we use a rolling cutoff year (beginning in 2000) to separate the two. That is, we start by estimating a model with data for 1980–2000 and then roll forward both the estimation and the testing periods, adding one year at a time in each iteration. We therefore end up estimating 15 models, each based on a larger training sample than the previous one, with the hyperparameter retuned in each round.
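
The expanding-window design can be expressed as a list of train/test year ranges. The sketch assumes that each test period is the year(s) immediately following the cutoff (`test_len` is a hypothetical parameter, set to one year by default), since the length of the test window is not spelled out above; `expanding_splits` is an illustrative name.

```python
from typing import List, Tuple


def expanding_splits(first_year: int = 1980, first_cutoff: int = 2000,
                     n_models: int = 15, test_len: int = 1) -> List[Tuple[List[int], List[int]]]:
    """Train on first_year..cutoff, test on the following test_len year(s); roll the cutoff forward."""
    splits = []
    for k in range(n_models):
        cutoff = first_cutoff + k
        train_years = list(range(first_year, cutoff + 1))
        test_years = list(range(cutoff + 1, cutoff + 1 + test_len))
        splits.append((train_years, test_years))
    return splits


for train, test in expanding_splits()[:2]:
    print(train[0], train[-1], test)
# 1980 2000 [2001]
# 1980 2001 [2002]
```

With a two-year prediction window, labels for the last training years depend on crises dated in the following years, so a stricter variant could also drop those training observations.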

Hyperparameter tuning

For each of the 15 training samples, we use k-fold cross-validation to choose the optimal mtry, where k is the number of years in that training sample. Because each fold is predicted by a model that was not trained on it, k-fold cross-validation simulates out-of-sample prediction. We choose the mtry that minimizes the out-of-sample log-likelihood loss. The tuning length parameter is set to 10 (i.e., we conduct a blind search over 10 candidate values of mtry). Our procedure is as follows:

  1. First, the training sample is partitioned into k equal-sized subsamples. A model is then fit to k-1 subsamples and used to predict outcomes for the remaining one. This is repeated for each of the k subsamples, so that there is an “out-of-sample” prediction for each observation. The mtry that performs best in terms of the log-likelihood loss function is chosen.

  2. Using the selected tuning parameter values, we fit the model to the entire training set.

  3. With this fitted model, we produce the predicted probability of a crisis for the corresponding test sample.
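Step 1 above can be sketched as follows, with one fold per training year so that k equals the number of years. `fit` and `predict` are hypothetical callables, for instance wrappers around an RF routine. `candidates` would hold the mtry values searched, which is 10 in the setup described above. Clipping probabilities away from 0 and 1 is an assumption made so that the loss stays finite. Steps 2 and 3 then amount to calling `fit` on the full training sample with the selected value and `predict` on the test sample.

```python
import math
from typing import Dict, Sequence


def log_loss(y: Sequence[int], p: Sequence[float], eps: float = 1e-12) -> float:
    """Average negative log-likelihood; probabilities are clipped to [eps, 1 - eps]."""
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1 - eps)
        total += -math.log(pi) if yi == 1 else -math.log(1 - pi)
    return total / len(y)


def choose_mtry(X, y, years, candidates, fit, predict):
    """Pick mtry by k-fold CV in which each fold is one training year (k = number of years)."""
    scores: Dict[int, float] = {}
    for mtry in candidates:
        oof = [0.0] * len(y)  # out-of-fold prediction for every observation
        for held_out in sorted(set(years)):
            tr = [i for i, t in enumerate(years) if t != held_out]
            te = [i for i, t in enumerate(years) if t == held_out]
            model = fit([X[i] for i in tr], [y[i] for i in tr], mtry)
            for i in te:
                oof[i] = predict(model, X[i])
        scores[mtry] = log_loss(y, oof)
    best = min(scores, key=scores.get)  # lowest out-of-sample loss
    return best, scores
```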

Evaluation measures

In machine-learning classification models, a standard measure of model accuracy is the area under the receiver operating characteristic curve (AUC). One of its advantages is that it does not require us to specify a probability threshold above which we predict that a fiscal crisis will occur. However, the AUC has some limitations, especially in the presence of class-imbalanced data (as is the case here, since crisis episodes are relatively low-frequency events). Thus, we also report two alternative measures commonly used in the literature, the mean squared error and the log-likelihood (see, for example, Fuertes and Kalotychou 2006; Dawood et al. 2017). Specifically:

  • The mean squared error (MSE), calculated as:

    $$\frac{1}{N}\left[\sum_{i \in I_{\text{crisis}}} (1 - p_i)^2 + \sum_{i \in I_{\text{non-crisis}}} p_i^2\right],$$ and
  • The log-likelihood, calculated as:

    $$\frac{1}{N}\left[\sum_{i \in I_{\text{crisis}}} \log(p_i) + \sum_{i \in I_{\text{non-crisis}}} \log(1 - p_i)\right]$$

where p_i denotes the predicted probability of a crisis for observation i, I_crisis and I_non-crisis denote the sets of crisis starts and non-crisis observations in the test sample, and N is the number of observations in the test sample. These two metrics differ in how they penalize false classifications. The MSE is bounded and reaches at most 1, even if the model entirely misses all crises in the test sample. The log-likelihood, by contrast, is unbounded below and falls to -∞ as soon as a crisis is assigned a zero probability. All evaluation measures are computed as the average of each metric across the 15 test samples.
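Below is a sketch of the three evaluation measures. It assumes `y` is a list of 0/1 crisis-start labels and `p` holds the predicted probabilities for the same test observations. The function names are hypothetical. The AUC uses the pairwise (Mann-Whitney) formulation, which needs at least one crisis and one non-crisis observation.

```python
import math
from typing import Sequence


def mse(y: Sequence[int], p: Sequence[float]) -> float:
    """(1/N)[sum over crises of (1 - p)^2 + sum over non-crises of p^2]."""
    return sum((1 - pi) ** 2 if yi == 1 else pi ** 2 for yi, pi in zip(y, p)) / len(y)


def log_likelihood(y: Sequence[int], p: Sequence[float]) -> float:
    """(1/N)[sum over crises of log p + sum over non-crises of log(1 - p)]."""
    total = 0.0
    for yi, pi in zip(y, p):
        q = pi if yi == 1 else 1 - pi          # probability assigned to the realized outcome
        total += math.log(q) if q > 0 else -math.inf
    return total / len(y)


def auc(y: Sequence[int], p: Sequence[float]) -> float:
    """Share of (crisis, non-crisis) pairs ranked correctly; ties count one half."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))
```

Note that the AUC needs no probability threshold. The other two measures use the probabilities directly.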

Comparison of Out-of-sample Performance across Prediction Models

The theoretical literature suggests that different models work well for different prediction problems, a result known in the machine learning literature as “the no free lunch theorem” (Wolpert and Macready 1997). Nonetheless, a large-scale empirical study found that RF is the best performer across a broad set of prediction tasks (Fernandez-Delgado et al. 2014), and the early warning literature has recently found similar results (Bluwstein et al. 2019). To assess relative performance in our particular setting (i.e., forecasting fiscal crises), we evaluate the RF (using the original 748 variables) against two other popular machine learning algorithms: LASSO (Tibshirani 1996) and the Support Vector Machine (SVM; Cortes and Vapnik 1995).

  • LASSO is a shrinkage and selection method for linear regression that minimizes the usual sum of squared errors subject to a bound on the sum of the absolute values of the coefficients. In our classification setting, it is for all practical purposes a logistic regression model (like the standard one used in the early warning literature). The difference is that it penalizes large coefficients and shrinks all but the most important ones to zero. Thus, it is less susceptible to overfitting, although it is not well suited to identifying complex interactions.

  • SVM is a popular machine learning classification algorithm for small and medium-sized datasets. After applying a non-linear transformation to the features, the algorithm estimates a separating hyperplane (in our case, discriminating between crisis and non-crisis events).
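For reference, the standard textbook objectives behind the two benchmark models can be written as follows. The bound t, penalty λ, cost C, and feature map φ are generic symbols. The exact specification used for each benchmark, such as the kernel, is not stated here.

```latex
% LASSO (Tibshirani 1996), constrained form for linear regression:
\hat{\beta} = \arg\min_{\beta_0,\beta} \sum_{i=1}^{N} \left(y_i - \beta_0 - x_i^{\top}\beta\right)^2
\quad \text{subject to} \quad \sum_{j=1}^{m} \lvert \beta_j \rvert \le t

% L1-penalized logistic version for a binary crisis indicator y_i \in \{0,1\}:
\hat{\beta} = \arg\max_{\beta_0,\beta} \sum_{i=1}^{N} \left[ y_i \log p_i + (1 - y_i)\log(1 - p_i) \right]
  - \lambda \sum_{j=1}^{m} \lvert \beta_j \rvert,
\qquad p_i = \frac{1}{1 + \exp\left(-\beta_0 - x_i^{\top}\beta\right)}

% Soft-margin SVM (Cortes and Vapnik 1995), labels y_i \in \{-1,+1\}, feature map \phi:
\min_{w,b,\xi} \ \tfrac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{N} \xi_i
\quad \text{subject to} \quad y_i\left(w^{\top}\phi(x_i) + b\right) \ge 1 - \xi_i,\ \ \xi_i \ge 0
```

The absolute-value penalty is what drives some LASSO coefficients exactly to zero. The feature map φ, usually applied implicitly through a kernel, is the non-linear transformation mentioned in the SVM bullet.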

While there is a large variety of additional algorithms we might have tested, we focus on these because they are well established in the machine learning literature and commonly used for forecasting in economics. On the basis of the three metrics we use to evaluate out-of-sample performance, we find that the RF always does as well as or better than the other two algorithms (Appendix Table 5.1). In terms of the AUROC, one of the most commonly used metrics, RF is clearly superior to both LASSO and the SVM for advanced and emerging market economies. For the other metrics, RF is broadly better in all income groups, although the differences are not statistically significant.

Appendix Table 5.1.

Out-of-Sample Performance

Note: Bootstrapped standard deviations (based on 100 random resamples of the test sample with replacement) in parentheses. Stars indicate the degree of confidence that a model outperforms the Random Forest estimated with the full set of 748 variables: * = 90%, ** = 95%, and *** = 99%. Models predict the probability of a crisis start occurring in year t+1 or t+2. Out-of-sample performance is obtained from 15 rolling regressions.
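The bootstrapped standard deviations in the note can be sketched as below. The note does not say whether resampling was done separately for each of the 15 test samples or on the pooled test observations. This sketch simply resamples whatever test observations it receives. `bootstrap_sd` and `metric` are hypothetical names. An AUC metric would need a guard, since the AUC is undefined on a resample that contains no crises.

```python
import random
from statistics import stdev
from typing import Callable, List, Sequence

Metric = Callable[[List[int], List[float]], float]


def bootstrap_sd(y: Sequence[int], p: Sequence[float], metric: Metric,
                 n_resamples: int = 100, seed: int = 0) -> float:
    """Standard deviation of `metric` across resamples of the test set drawn with replacement."""
    rng = random.Random(seed)
    n = len(y)
    draws = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]   # resample test observations
        draws.append(metric([y[i] for i in idx], [p[i] for i in idx]))
    return stdev(draws)
```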

Partial Dependence Plots

To understand how PDPs are calculated, consider a predictor set X = {x1, x2, x3, …, xn}. We construct a subset X_S that contains either {x1} or {x1, x2}, depending on whether we want to generate univariate or bivariate PDPs. Bivariate PDPs are generally used to study interactions between two variables. Let X_C be the complement of X_S in X. The PDP of the predicted response with respect to X_S is defined as the expectation of the predicted response over X_C:

$$f_S(X_S) = E_C\left[f(X_S, X_C)\right] = \int f(X_S, X_C)\, p_C(X_C)\, dX_C$$

where p_C(X_C) is the marginal probability density of X_C. PDPs work by marginalizing the model output over the distribution of the features in set C, which shows the relationship between the variables of interest and the outcome. By marginalizing over the other features and assuming that each observation is equally likely, we obtain the following estimator of the partial dependence from the observed predictor data:

$$f_S(X_S) \approx \frac{1}{N}\sum_{i=1}^{N} f(X_S, X_{iC})$$

where N is the number of observations and X_i = (X_iS, X_iC) is the ith observation, so that f_S(X_S) is the partial dependence function plotted for X_S. If two variables, say X_j and X_k, do not interact with each other, then their bivariate partial dependence function can be decomposed into the sum of the individual (centered) PDPs:

$$PD_{jk}(X_j, X_k) = PD_j(X_j) + PD_k(X_k)$$

If X_j and X_k interact, this decomposition does not hold, and the bivariate PDP cannot be expressed as the sum of univariate PDPs.
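The sketch below applies the estimator above by fixing the X_S features and averaging a fitted prediction function over the observed rows. It also includes a simple check of the additive decomposition. `f` stands for any fitted model's prediction function and is hypothetical here. Centering each PDP over the evaluation grid is an assumption. The gap returned by `interaction_gap` is an illustrative diagnostic, not necessarily the interaction-strength measure used in the paper.

```python
from typing import Callable, List, Sequence

Row = List[float]


def partial_dependence(f: Callable[[Row], float], X: Sequence[Row],
                       features: Sequence[int], values: Sequence[float]) -> float:
    """(1/N) sum_i f(X_S = values, X_iC): fix the S features, average over observed rows."""
    total = 0.0
    for row in X:
        z = list(row)
        for j, v in zip(features, values):
            z[j] = v
        total += f(z)
    return total / len(X)


def interaction_gap(f: Callable[[Row], float], X: Sequence[Row], j: int, k: int,
                    grid_j: Sequence[float], grid_k: Sequence[float]) -> float:
    """Largest deviation of the centered bivariate PDP from PD_j + PD_k on a grid."""
    pd_j = {a: partial_dependence(f, X, [j], [a]) for a in grid_j}
    pd_k = {b: partial_dependence(f, X, [k], [b]) for b in grid_k}
    pd_jk = {(a, b): partial_dependence(f, X, [j, k], [a, b])
             for a in grid_j for b in grid_k}
    # Center each function over the grid so that constant offsets cancel out.
    cj = sum(pd_j.values()) / len(pd_j)
    ck = sum(pd_k.values()) / len(pd_k)
    cjk = sum(pd_jk.values()) / len(pd_jk)
    return max(abs((pd_jk[a, b] - cjk) - (pd_j[a] - cj) - (pd_k[b] - ck))
               for a in grid_j for b in grid_k)
```

For an additive model such as 2·x1 + x2², the gap is zero up to rounding. For a product x1·x2, it is strictly positive.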
1

We are indebted to James Feigenbaum, Catherine Pattillo, Shuyi Liu, Nobuo Yoshida, Kazusa Yoshimura, participants in the 8th annual workshop on Fiscal Policies and Institutions organized by the IMF in Brussels, and colleagues at the IMF for useful comments. Juliana Gamboa Arbelaez, Ade Adeyemi, and Kadir Tanyeri provided excellent research assistance and support with coding.

2

In the aftermath of the global financial crisis, several scholars compiled new panel datasets on debt covering many decades (if not centuries) of data, e.g., Reinhart and Rogoff (2009) and Abbas et al. (2011) on public debt, and Jordà, Schularick, and Taylor (2016) on private debt. Unfortunately, these datasets tend to include only a few countries or use a narrow and changing definition of debt, limiting the scope of research.

3

It is important to note that the papers over this period are often not explicit about the definition of external debt and whether it includes the external liabilities of the private and public sectors.

4

A variable is included if at least 70 percent of its data are available. To take advantage of as much information as possible, we impute missing values for a given variable with the training-sample median of its non-missing values.
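The following sketches this screening and imputation step. It assumes each variable is stored as a list with `None` for missing values. It also assumes the 70 percent coverage rule is applied to the training sample, since the footnote does not say which sample. Names are hypothetical.

```python
from statistics import median
from typing import Dict, List, Optional

Column = List[Optional[float]]


def screen_and_impute(train: Dict[str, Column], test: Dict[str, Column],
                      min_coverage: float = 0.7):
    """Keep variables with enough non-missing training data; fill gaps in both
    samples with the training-sample median (so no test information is used)."""
    kept_train, kept_test = {}, {}
    for name, col in train.items():
        observed = [v for v in col if v is not None]
        if len(observed) < min_coverage * len(col):
            continue  # too sparse: drop the variable
        med = median(observed)
        kept_train[name] = [med if v is None else v for v in col]
        kept_test[name] = [med if v is None else v for v in test[name]]
    return kept_train, kept_test
```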

5

The effective interest rate is calculated as the ratio of the interest bill in period t to the stock of public debt (the average of debt stocks in t and t-1). This interest rate is different from the measure used to define fiscal crises, which is based on marginal yields and spreads (see Appendix 2). For countries that issue foreign-currency denominated debt, it may be important to account for the depreciation-adjusted interest-growth differentials (Escolano, Shabunina and Woo 2017). However, as in other studies, limited data availability on the currency composition of debt prevents us from making this adjustment. Nonetheless, among the set of predictors we also include various measures of exchange rate depreciation.
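Here is a small worked example of the effective interest rate defined above. This sketch treats g as nominal GDP growth when forming r - g, which is an assumption because the growth definition is not given here. The numbers are purely illustrative.

```python
def effective_interest_rate(interest_t: float, debt_t: float, debt_prev: float) -> float:
    """Interest bill in t divided by the average of the debt stocks in t and t-1."""
    return interest_t / ((debt_t + debt_prev) / 2)


def interest_growth_differential(r: float, g: float) -> float:
    """r - g, assuming both rates are measured in the same terms (e.g., nominal)."""
    return r - g


# An interest bill of 4 on debt of 90 (t-1) and 110 (t) gives r = 4 / 100 = 4 percent;
# with 5 percent nominal growth, r - g = -1 percentage point.
r = effective_interest_rate(4.0, 110.0, 90.0)
```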

6

In the RF, each tree is constructed using a different bootstrap sample from the original data. About one-third of the cases are left out of the bootstrap sample and not used in the construction of the kth tree. These form the so-called out-of-bag (OOB) sample.
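The “about one-third” figure follows from sampling with replacement. An observation is missed by all n draws with probability (1 - 1/n)^n, which tends to e^(-1) ≈ 0.368 as n grows. The short check below uses hypothetical function names.

```python
import math
import random


def expected_oob_share(n: int) -> float:
    """Probability that an observation is never drawn in n draws with replacement."""
    return (1 - 1 / n) ** n


def oob_limit() -> float:
    """Large-sample limit of the out-of-bag share, e^(-1)."""
    return math.exp(-1)


def simulated_oob_share(n: int, seed: int = 0) -> float:
    """Share of observations left out of one simulated bootstrap sample."""
    rng = random.Random(seed)
    drawn = {rng.randrange(n) for _ in range(n)}
    return 1 - len(drawn) / n
```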

7

Cerovic et al. (2018) is a natural benchmark as it is one of the few studies that has a large sample of countries and examines low-income countries in detail covering the period 1970–2015.

8

One caveat of the PDP is that it relies on the assumption of independence among features; the results may therefore be biased if features are highly correlated. Accumulated Local Effects (ALE) address this problem by averaging differences in predictions over narrow intervals of the feature of interest, using only the observations that fall within each interval, instead of averaging predictions over the full marginal distribution of the other features (Apley 2016). As a robustness check, we use the ALE approach and confirm that our findings on the non-linearities of public external debt still hold.
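The following sketches first-order ALE for a single feature using the standard construction. For the observations in each bin, the model's prediction is differenced between the bin's upper and lower edges. These local effects are then averaged, accumulated, and centered. `f` is a hypothetical prediction function, and the bin edges (e.g., quantiles of the feature) are left to the caller. This is not the implementation used in the paper.

```python
from typing import Callable, List, Sequence

Row = List[float]


def ale_1d(f: Callable[[Row], float], X: Sequence[Row], j: int,
           edges: Sequence[float]) -> List[float]:
    """First-order ALE of feature j at the bin edges, centered to mean zero.

    `edges` must be increasing and span the observed range of feature j.
    """
    K = len(edges) - 1
    diffs, counts = [0.0] * K, [0] * K
    for row in X:
        # Bin b covers (edges[b], edges[b+1]]; the lowest edge belongs to bin 0.
        k = next(b for b in range(K) if row[j] <= edges[b + 1])
        hi, lo = list(row), list(row)
        hi[j], lo[j] = edges[k + 1], edges[k]
        diffs[k] += f(hi) - f(lo)  # local effect, other features at observed values
        counts[k] += 1
    ale = [0.0]
    for k in range(K):
        ale.append(ale[-1] + (diffs[k] / counts[k] if counts[k] else 0.0))
    # Center with the count-weighted average of each bin's mid-value.
    n = sum(counts)
    c = sum(counts[k] * (ale[k] + ale[k + 1]) / 2 for k in range(K)) / n
    return [a - c for a in ale]
```

Only observations inside a bin are moved across that bin. As a result, the model is never evaluated far from the observed joint distribution, which is the source of bias in PDPs with correlated features.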

9

A country with high economic growth (and tax buoyancy) will be in a better position to manage debt than a country with a lower growth rate for the same interest rate. Nonetheless, a negative interest-growth differential does not ensure that debt will not increase, as this will also depend on the primary deficit.
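This point can be illustrated with the standard debt-accumulation identity d_t = d_{t-1}(1 + r)/(1 + g) - pb_t. Here d is debt in percent of GDP and pb is the primary balance (positive for a surplus). The sketch ignores exchange-rate valuation effects and other stock-flow adjustments. The numbers are illustrative only.

```python
def next_debt_ratio(d_prev: float, r: float, g: float, primary_balance: float) -> float:
    """Debt-to-GDP next year: d_t = d_{t-1} (1 + r) / (1 + g) - pb_t (pb > 0 is a surplus)."""
    return d_prev * (1 + r) / (1 + g) - primary_balance


def debt_stabilizing_primary_balance(d_prev: float, r: float, g: float) -> float:
    """Primary balance that leaves the debt ratio unchanged: (r - g) / (1 + g) * d."""
    return (r - g) / (1 + g) * d_prev


# With debt at 100 percent of GDP, r = 2 percent and g = 3 percent (so r - g < 0),
# a primary deficit of 2 percent of GDP still raises debt to about 101 percent.
d1 = next_debt_ratio(100.0, 0.02, 0.03, -2.0)
```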

10

As showcased during the European sovereign debt crisis, these dynamics may partly reflect the spike in interest rates at the start of the distress episode (Beirne and Fratzscher 2013; Bocola, Bornstein, and Dovis 2019). Mauro and Zhou (2019) also use an event study to argue that sovereign defaults may not necessarily be preceded by high (positive) interest-growth differentials.

11

Note that the threshold for inflation in our crisis definition is higher at 35 percent for AEs and 100 percent for EMs and LICs.

12

The same procedure for hyperparameter tuning is followed for the variable selection algorithms.

  • Figure 1.

    Predictors of Fiscal Crises in the Literature

  • Figure 2.

    Most Common Predictors in the Literature

    (Share of surveyed papers, 1970–2018)

  • Figure 3.

    Countries with Fiscal Crises, 1980–2016

    (Number)

  • Figure 4.

    Overlap with Other Crises, 1980–2016

    (Number of crises episodes)

  • Figure 5.

    Debt Statistics: Country Coverage, 1980–2016

    (Number of countries)

  • Figure 6.

    Public and Public External Debt, 1980–2016

    (Weighted average, percent of GDP)

  • Figure 7.

    Interest-Growth Differential, 1980–2016

    (Percent)

  • Figure 8.

    Feature Selection Algorithms

    (Number of variables selected)

  • Figure 9.

    Robustness in Variable Selection

    (Pearson correlation coefficient)

  • Figure 10.

    Variable Importance by Group of Predictors

  • Figure 11.

    Contribution to Probability of a Crisis

    (Shapley Values)

  • Figure 12.

    Partial Dependence Plots and Event Studies

  • Figure 13.

    Overall Interaction Strength

  • Figure 14.

    Top-10 Interactions with Public External Debt

  • Figure 15.

    Bivariate Partial Dependence Plots: Public External Debt and r-g

  • Figure 16.

    Inflation: Univariate Partial Dependence Plots

    (Percent)

  • Figure 17.

    Inflation and Public External Debt: Bivariate Partial Dependence Plots

  • Figure 18.

    Current Account: Univariate Partial Dependence Plots

    (Percent of GDP)

  • Figure 19.

    Current Account and Public External Debt: Bivariate Partial Dependence Plots

    (Percent of GDP)

  • Figure 20.

    Credit Gap: Univariate Partial Dependence Plots

    (Percent of GDP)

  • Figure 21.

    Credit Gap and Public External Debt: Bivariate Partial Dependence Plots

    (Percent of GDP)