Surrogate Data Models: Interpreting Large-scale Machine Learning Crisis Prediction Models
Authors: Jorge A. Chan-Lau, Ruofei Hu, Maksym Ivanyna (International Monetary Fund), Ritong Qu, and Cheng Zhong


Abstract

Machine learning models are becoming increasingly important in the prediction of economic crises. These models, however, use datasets comprising a large number of predictors (features), which impairs their interpretability and their ability to provide adequate guidance in the design of crisis prevention and mitigation policies. This paper introduces surrogate data models as dimensionality reduction tools in large-scale crisis prediction models. The appropriateness of this approach is assessed by applying it to large-scale crisis prediction models developed at the IMF. The results are consistent with economic intuition and validate the use of surrogates as interpretability tools.

1. Introduction

Economic crises are costly. Following the sharp contraction in economic activity, lower investment impairs countries’ long-term growth and productivity outlook and leads to permanent output losses. For instance, Barnichon et al. (2018) estimated that ten years after the 2008 global recession, potential output in the United States had converged to a level about 12 percentage points below that implied by its pre-crisis trend. Romer and Romer (2017) found that, five years after a financial crisis, gross domestic product in OECD countries was about 9 percentage points lower. Empirical estimates presented by Cerra et al. (2021) for several countries suggest that, following a financial crisis, output remains permanently depressed a decade later.

To avoid, or at least reduce, the costs associated with economic crises, central banks and policy-making institutions have devoted major resources to developing early warning systems and crisis prediction models. First, these systems aim to identify the economic and financial imbalances that make economies vulnerable to economic and financial distress and, ultimately, to an economic crisis. Second, by identifying the economic and financial drivers ahead of a crisis, the systems could help policymakers prevent the crisis from materializing through well-targeted policies.

Crisis prediction models have benefited from the rapid adoption of methods first developed in the field of machine learning. Several of these models have been implemented in central banks and multilateral financial institutions and have helped sharpen policy advice. Most models include many features (explanatory variables) and use methods suited to capturing the non-linear effects prevalent in the run-up to a crisis episode. Gains in predictive power, however, have come at the expense of reduced model interpretability, which lessens the models’ usefulness for guiding policy decisions. Without an understanding of the main crisis drivers and their interactions, it is difficult to trust the model predictions, evaluate the effectiveness of policy measures for reducing the likelihood of an economic crisis, and obtain insights conducive to model improvement.

This paper proposes using surrogate data models (SDMs) to reduce the dimensionality of machine learning (ML) crisis prediction models and enhance their interpretability. By restricting the set of features to those that economic analysts typically monitor and forecast, SDMs can translate model results into the economic domain familiar to senior policy makers. In addition, these models facilitate, to a large extent, “what-if” scenario analysis. Such claims must be judged against real-world applications. Hence, we provide a concrete illustration of the SDM approach applied to a suite of ML models recently developed at the IMF to predict sectoral economic crises (IMF, 2021).

The remainder of this paper provides a short, selective survey of recent ML crisis prediction models, followed by a discussion of the use of surrogate models to enhance the interpretability of ML models, including the conceptual foundation of the surrogate data model. As a concrete example, we then apply the methodology to several ML crisis prediction models developed by the IMF and show its usefulness for scenario analysis based on the economic projections the IMF released publicly in April 2022. Finally, we draw lessons from the surrogate data model case study.

2. A short survey of ML crisis prediction models

Early crisis prediction studies made heavy use of probit and/or logit models (Eichengreen et al., 1995; Frankel and Rose, 1996) and non-parametric signal extraction models (Kaminsky et al., 1998). Recent applied work on crisis prediction and early warning systems has moved beyond these traditional approaches by incorporating machine learning methods. These methods, which tend to emphasize predictive ability rather than causal inference, can handle a large number of features (explanatory variables) and can capture nonlinear effects better than generalized linear models such as logistic and multinomial regression. A non-exhaustive list of recent work is reviewed below.

Holopainen and Sarlin (2017) compared conventional statistical methods and machine learning methods in early warning systems for banking crises in 15 European countries. They found that machine learning methods, such as k-nearest neighbors, neural networks, and ensemble learning models, outperform logistic regression in out-of-sample forecasting exercises.

Bluwstein et al. (2021) compared the performance of different early warning models for financial crisis prediction for a sample of 17 advanced economies over the period 1870–2016. The models included 16 features (explanatory variables) aimed at capturing the domestic and global economic and credit cycles. In addition to logistic regression, they implemented a variety of machine learning models, including decision trees, random forests, extremely randomized trees, support vector machines, and artificial neural networks. Except for decision trees, all machine learning models outperformed the logistic regression. The limited number of features allows the application of Shapley regressions (Joseph, 2020), which identify credit growth and the slope of the yield curve as the main predictors of financial crises.

Fouliard et al. (2021) showed it was possible to predict systemic financial stress episodes in European Union countries and the United States three years ahead by using a set of different machine learning models. Their approach incorporates information from economic data sequentially as soon as the data become available, a sequential process known in the ML literature as online learning. The models used 244 features observed on a quarterly frequency, of which about half are available for online estimation.

Hellwig (2021) showed that traditional econometric models were unable to outperform simple heuristic “rules of thumb” in predicting fiscal crises in advanced economies, emerging market economies, and low-income/developing countries. By contrast, machine learning techniques such as elastic net, random forests, and gradient-boosted trees delivered superior performance when the number of predictors was large. The models are based on an extensive set of predictors comprising economic, financial, demographic, and institutional variables, as well as various transformations of the raw variables, including lags, temporal changes, and averages.

IMF (2021) described a set of ML prediction models, each tailored to predict crises affecting a different sector of the economy. Examples include financial, fiscal, external sector (balance of payments), and real sector crises. Compared to the other studies reviewed here, the dataset covered more countries (all 190 IMF member countries), and each sectoral crisis model includes a substantial number of features, including several data transformations. The crisis event definitions reflected the IMF’s policy needs.1 Given the wide country coverage, data imputation techniques were used to address missing data. Synthetic oversampling methods were used to deal with the imbalance between the few crisis observations and the many non-crisis observations. Horse-race comparisons showed that ML models outperform conventional approaches except for external sector crises, for which the signal extraction approach remains the most appropriate modeling technique. Feature contributions (importance) to crisis prediction were assessed using the SHAP method (see the next section on feature importance).

Hacibedel and Qu (2022) studied systemic non-financial corporate sector distress, identified by elevated probabilities of default prevalent across firms in an economy. They constructed an ensemble of ML models by optimally pooling gradient-boosted trees, linear discriminant analysis, and logit Lasso to predict the onset of distress one year ahead. The model uses a total of 40 predictors covering domestic and international macroeconomic variables, firm balance sheet variables, and model-based probabilities of default (PD) from the Credit Research Initiative of NUS (2019). The paper shows that the ensemble model performs better and more robustly than its individual constituent models across different time blocks and country groups.

Notwithstanding the results above showing that ML models outperform traditional crisis prediction models, some caution is warranted, as results may depend on the crisis definition and data sample used. Beutel et al. (2019) found that machine learning models do not necessarily outperform standard econometric techniques in all instances. Using a 45-year sample of advanced economies (1971–2016) and crisis events collected from different banking crisis studies, they found that machine learning models, while often delivering a good in-sample fit, failed to match the performance of a logit model in recursive out-of-sample predictions of systemic banking crises, including the 2007–08 global financial crisis. These results suggest that the performance of ML crisis prediction models depends heavily on the crisis definition used, the degree of imbalance between crisis and non-crisis observations, and the availability of features in the dataset. They also indicate that a trial-and-error approach tailored to the task at hand is appropriate.

3. Enhancing ML crisis prediction model interpretability using surrogate data models

Surrogate models and feature importance

The large number of features used by most crisis prediction models hampers their interpretability. Without a clear understanding of each feature’s importance for the crisis likelihood, or of the input-output relationship, it is difficult to reconcile the lessons derived from theoretical crisis models, empirical work, and past policy decisions with the output of high-dimensional complex models. Surrogate models address this lack of intuition. Widely used in engineering design optimization and emulation, surrogate models are simple approximation models that mimic the behavior of complex systems and models at a lower computational cost and provide a clearer understanding of the systems’ dynamics (Forrester et al., 2008).

In machine learning, surrogate models are used to enhance model interpretability, including measuring the contribution of different features to the model output (Burkart and Huber, 2021). If the original model is akin to a black box due to its complexity and highly non-linear nature, its output is easier to understand when a simpler interpretable model, such as a linear regression or a tree-based model, approximates it well enough. In this case, the simpler model is used to explain the black box results. Surrogate models are either model-agnostic, if they can be applied regardless of what the black box model is, or model-specific, as is the case for several surrogate approaches developed specifically for decision trees.

Surrogate models can be either global or local, depending on the scope of their prediction. A global surrogate model attempts to describe the average behavior of an ML model. In contrast, a local surrogate model only attempts to explain the output for an individual observation, that is, for a single set of feature values. Given a dataset X, the construction of a global surrogate model follows these steps (Kamath and Liu, 2021; Molnar, 2022): (a) obtain the output Ŷ = F(X) generated by the ML model F; (b) select a different, simpler, and interpretable model G; (c) train G on (X, Ŷ); and (d) use G to explain the model output. The more closely G approximates the output of F, the better the explanation derived from G. Examples of global surrogate models include model-agnostic approaches (Ribeiro et al., 2016a; Hall et al., 2017; Frosst and Hinton, 2017; Yang et al., 2018) and decision-tree-specific approaches (Andrzejak et al., 2013; Bastani et al., 2017; Hara and Hayashi, 2018).
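The four global-surrogate steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any cited paper: NumPy is assumed to be available, `black_box` is a hypothetical nonlinear function standing in for F, and G is an ordinary least squares regression.

```python
import numpy as np

rng = np.random.default_rng(0)


def black_box(X):
    """Hypothetical stand-in for a complex model F: a nonlinear crisis score in (0, 1)."""
    z = 1.5 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 0] * X[:, 2]
    return 1.0 / (1.0 + np.exp(-z))


# (a) Obtain the black-box output Y_hat = F(X) on the dataset X
X = rng.normal(size=(2_000, 3))
y_hat = black_box(X)

# (b)-(c) Select a simple, interpretable G (ordinary least squares) and train it on (X, Y_hat)
design = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(design, y_hat, rcond=None)
g_pred = design @ coef

# Fidelity: R-squared of G against F's output (not against observed crisis outcomes)
fidelity = 1.0 - np.sum((y_hat - g_pred) ** 2) / np.sum((y_hat - y_hat.mean()) ** 2)

# (d) Explain F through G's coefficients
for name, b in zip(["intercept", "x1", "x2", "x3"], coef):
    print(f"{name:>9}: {b:+.3f}")
print(f"fidelity R^2: {fidelity:.2f}")
```

The fidelity R-squared measures how well G reproduces F, which is what matters for a surrogate; a low value would signal that G’s coefficients are a poor guide to F’s behavior.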

In the case of a local surrogate model, the steps are (Ribeiro et al., 2016b): (a) select the instance (observation) of interest; (b) generate synthetic instances in the neighborhood of the selected instance using random perturbations; (c) weight the new samples by their proximity to the selected instance; (d) train an interpretable local model on the weighted synthetic instances; and (e) explain the output of the model using the local model. Notable examples of these approaches include model explanation systems (Turner, 2016), local interpretable model-agnostic explanations (LIME; Ribeiro et al., 2016b), Shapley additive explanations (SHAP; Lundberg and Lee, 2017), and their variants.
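A minimal LIME-style sketch of steps (a) to (e) follows, assuming NumPy. The function `black_box` is hypothetical, and the Gaussian perturbations, exponential proximity kernel, and weighted least squares fit are illustrative choices rather than the exact design of LIME or any of the cited systems.

```python
import numpy as np

rng = np.random.default_rng(1)


def black_box(X):
    """Hypothetical nonlinear model F used only for illustration."""
    return np.tanh(X[:, 0] ** 2 - X[:, 1])


def local_surrogate(f, x0, n_samples=5_000, scale=0.1, kernel_width=0.25):
    """LIME-style local linear explanation of f around the instance x0."""
    # (b) Synthetic instances: Gaussian perturbations around x0
    Z = x0 + scale * rng.normal(size=(n_samples, x0.size))
    y = f(Z)
    # (c) Proximity weights: exponential kernel on the distance to x0
    dist = np.linalg.norm(Z - x0, axis=1)
    w = np.exp(-(dist ** 2) / kernel_width ** 2)
    # (d) Weighted least squares fit of an interpretable linear model
    A = np.column_stack([np.ones(n_samples), Z - x0])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef  # [local intercept, one local slope per feature]


# (a) Instance of interest
x0 = np.array([1.0, 0.5])
coef = local_surrogate(black_box, x0)
# (e) The slopes explain how F responds to each feature near x0
print("local intercept and slopes:", np.round(coef, 3))
```

Near x0 = (1, 0.5), the fitted slopes should be close to the gradient of this hypothetical black box, roughly (1.57, -0.79). Widening `scale` or `kernel_width` makes the explanation less local and lets curvature bias the slopes.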

SHAP, a linear local surrogate model, is commonly used to calculate feature importance in crisis prediction models (IMF, 2021, among others). SHAP values build on Shapley values, which treat the model features as players in a cooperative game (Shapley, 1953).2 Conceptually, the Shapley value of a feature is its average marginal contribution to the prediction, where the average is taken over all possible coalitions (combinations) of features. Specifically, given the prediction function f, n features, and a specific feature j, we can form coalitions S that exclude feature j from the remaining n−1 features, each containing |S| features, where |S| ranges from 0 to n−1. The Shapley value of feature j is:

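$$\phi_j(f) = \frac{1}{n} \sum_{S \subseteq \{1,\dots,n\} \setminus \{j\}} \binom{n-1}{|S|}^{-1} \left[ f\left(S \cup \{x_j\}\right) - f(S) \right].$$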

Once the Shapley values are computed, the SHAP explanation model g, an approximation to the prediction function f, is specified as:

$$g(z') = \phi_0 + \sum_{j=1}^{n} \phi_j z'_j,$$

where $z' = (z'_1, \dots, z'_n) \in \{0,1\}^n$ is the coalition vector (the simplified features of the model), in which 0 or 1 denotes whether the corresponding feature value is absent or present, respectively, and $\phi_0$ is the base value, the expected model output when no feature is present. The SHAP explanation can be further refined to give greater weight to small and large coalitions (KernelSHAP, Lundberg and Lee, 2017) and adapted to decision trees (TreeSHAP, Lundberg and Lee, 2018).
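To make the Shapley formula concrete, the sketch below computes exact Shapley values by enumerating every coalition, then checks that the explanation model g with all features present (z' = 1) reproduces the prediction. It is a toy illustration: the model `f` is hypothetical, and the value of a coalition is defined by setting absent features to a baseline, which is only one of several conventions (KernelSHAP, for instance, averages over a background dataset). Enumeration requires on the order of 2^n model evaluations, which is why practical tools approximate it.

```python
from itertools import combinations
from math import comb


def shapley_values(f, x, baseline):
    """Exact Shapley values by enumerating all coalitions (feasible only for small n).

    Assumption: a feature absent from coalition S is set to its baseline value.
    """
    n = len(x)

    def v(S):
        # Value of coalition S: model output with features outside S at baseline
        return f([x[k] if k in S else baseline[k] for k in range(n)])

    phi = []
    for j in range(n):
        others = [k for k in range(n) if k != j]
        total = 0.0
        for size in range(n):  # |S| = 0, 1, ..., n-1
            for S in combinations(others, size):
                S = set(S)
                total += (v(S | {j}) - v(S)) / comb(n - 1, size)
        phi.append(total / n)
    return phi, v(set())  # (phi_1..phi_n, phi_0)


def f(z):
    # Hypothetical model with an interaction between features 0 and 1
    return 2.0 * z[0] + z[1] + 3.0 * z[0] * z[1] - z[2]


x, base = [1.0, 2.0, 0.5], [0.0, 0.0, 0.0]
phi, phi0 = shapley_values(f, x, base)
g_all = phi0 + sum(phi)          # g(z') with z' = (1, 1, 1)
print(phi, phi0, g_all, f(x))    # [5.0, 5.0, -0.5] 0.0 9.5 9.5
```

The interaction term 3·z0·z1 contributes 6 to the prediction and is split equally between features 0 and 1, which is the symmetry property that makes Shapley values a natural attribution rule.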

Surrogate Data Models

The interpretability of surrogate models rests on the premise that the simpler model might be enough to understand the evolution of the economy ahead of a crisis event and to evaluate the contribution of each variable to the event prediction. Gains in interpretability could be small, however, if there are too many features, especially features derived from data transformations that lack a straightforward economic interpretation. One simple approach to this issue is to group the features into aggregate categories, e.g., economic activity, real growth, and external imbalances. This approach, however, might not be sufficient for providing policy guidance. For instance, the feature contributing the most in an aggregate category could be a complex data transformation whose response to a proposed policy measure is difficult to assess.

Rather than following standard practice in surrogate model implementation, we propose using surrogate data models. SDMs approximate the black box model using a low-dimensional set of features, some of which may not be in the black box model’s feature set. Because the SDM feature set is low-dimensional, the results are easier to interpret even if the SDM itself is not an easy-to-interpret model. In this case, the interpretability of the SDM is provided by a surrogate model, i.e., SHAP. Figure 1 shows the differences between the standard surrogate and the SDM approaches.
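The key difference from a standard global surrogate is that G is trained on a different, smaller feature set than F. The sketch below illustrates this under hypothetical assumptions: three standardized variables in `Z` stand in for WEO-type series (for example growth, debt, and inflation), the black box uses 50 engineered and unrelated features, and the SDM is a quadratic regression on `Z` alone. The paper itself uses random forests for the SDMs, so this only illustrates the idea; NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3_000

# Hypothetical low-dimensional surrogate data (standardized), e.g., growth, debt, inflation
Z = rng.normal(size=(n, 3))

# Hypothetical high-dimensional black-box feature set built from Z, plus unrelated series
X = np.column_stack([
    Z,                         # columns 0-2: raw variables
    Z ** 2,                    # columns 3-5: squares
    np.abs(Z),                 # columns 6-8: absolute values
    Z[:, 0] * Z[:, 1],         # column 9: growth x debt interaction
    rng.normal(size=(n, 40)),  # columns 10-49: other features
])

# Stand-in black box F(X): crisis score driven by a few engineered features
w = np.zeros(X.shape[1])
w[[0, 1, 4, 9]] = [-1.0, 0.8, 0.3, 0.5]  # growth, debt, debt^2, growth x debt
score = 1.0 / (1.0 + np.exp(-(X @ w)))


def quad_design(Z):
    """Intercept, linear terms, squares and pairwise interactions of the surrogate data."""
    k = Z.shape[1]
    cols = [np.ones(len(Z))] + [Z[:, i] for i in range(k)]
    cols += [Z[:, i] * Z[:, j] for i in range(k) for j in range(i, k)]
    return np.column_stack(cols)


# Fit the SDM on Z only, then measure fidelity to F on held-out observations
fit_rows, eval_rows = slice(0, 2_000), slice(2_000, None)
coef, *_ = np.linalg.lstsq(quad_design(Z[fit_rows]), score[fit_rows], rcond=None)
pred = quad_design(Z[eval_rows]) @ coef
target = score[eval_rows]
r2 = 1.0 - np.sum((target - pred) ** 2) / np.sum((target - target.mean()) ** 2)
print(f"SDM uses {Z.shape[1]} inputs vs. {X.shape[1]} for F; held-out fidelity R^2 = {r2:.2f}")
```

Because the SDM’s inputs are raw, projectable variables, a scenario for `Z` maps directly into a scenario for the approximated risk score, which is the practical advantage the text emphasizes.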

Figure 1.

Surrogate model and surrogate data model approaches

Citation: IMF Working Papers 2023, 041; 10.5089/9798400234828.001.A001

Source: the authors.

Although ML crisis prediction models tend to use a wide range of features, the economic and finance literature justifies using fewer. Conceptually, economic and financial variables should be strongly interdependent, and this dependence should be reflected in their joint distribution. It reflects the reaction of economic agents and market participants to shocks and/or new information under the restrictions imposed by budget constraints and equilibrium conditions (see Piazzesi, 2007, and references therein).

The emphasis that econometric methods place on dimension reduction, and their success in economic and financial applications, support this view. Dynamic factor models are widely used to capture the common dynamics of a large number of time series driven by a relatively small number of latent factors (Stock and Watson, 2016). High-dimensional vector autoregression (VAR) models can be simplified by reducing them to a low-dimensional VAR and an idiosyncratic component that can embed a dynamic pattern (Cubadda and Hecq, 2022). Principal component analysis (PCA) can be used to extract latent factors from the features and then estimate low-dimensional models using the factors as regressors (Fan et al., 2018) or in supervised principal components regression (Bair et al., 2006).
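The factor-based dimension reduction described above can be illustrated with a short principal components regression. This is a sketch on simulated data, assuming NumPy: the number of series, factors, loadings, and noise level are all hypothetical, and PCA is computed from the singular value decomposition of the standardized data.

```python
import numpy as np

rng = np.random.default_rng(3)
T, N, K = 500, 30, 2

# Simulated data: N observed series driven by K latent factors plus idiosyncratic noise
factors_true = rng.normal(size=(T, K))
loadings = rng.normal(size=(K, N))
X = factors_true @ loadings + 0.3 * rng.normal(size=(T, N))

# PCA through the SVD of the standardized data
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
explained = s ** 2 / np.sum(s ** 2)  # variance share of each principal component
pcs = Xs @ Vt[:K].T                  # first K principal component scores

# Principal components regression: target driven by the latent factors, regressed on K PCs
y = factors_true @ np.array([1.0, -0.5]) + 0.1 * rng.normal(size=T)
A = np.column_stack([np.ones(T), pcs])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
r2 = 1.0 - np.sum((y - A @ beta) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"{N} series -> {K} PCs: variance share {explained[:K].sum():.2f}, PCR R^2 {r2:.2f}")
```

Two components summarize most of the variation in 30 series, and a regression on those two components explains the target almost as well as the unobserved factors would.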

Manifold learning techniques also show that high-dimensional economic models can be reduced to a low-dimensional space, a property that has been exploited to develop early warning models for financial markets (Huang, Kou, and Peng, 2017). For the features used by the IMF ML models described above, manifold decompositions suggest that the information useful for crisis prediction lies in a simpler space suitable for clustering analysis (Chan-Lau and Wang, 2020).

4. An application to the IMF ML crisis models

As the literature review explains, empirical results depend heavily on the dataset and crisis definitions used. We test whether the SDM approach could facilitate the crisis prediction task in the context of the IMF’s economic and financial surveillance work. Our analysis assumes that the IMF ML sectoral crisis prediction models, estimated using annual data, are the true models underlying crisis dynamics in the real, fiscal, and external sectors (IMF, 2021).3

The output of the IMF ML models is a crisis risk index that can be interpreted, to a large extent but not strictly, as a crisis probability. Because the IMF models include many features, it is difficult for users to assess which variables are the main drivers of the crisis index. Even though SHAP can be applied, it is challenging to assign contributions to individual features belonging to the same aggregate category (Kumar et al., 2017). Users also find it difficult to perform scenario analysis, since this requires modeling the joint distribution of the model features, many of which are transformations of raw variables.

This section addresses these issues by constructing sectoral SDMs for predicting the crisis risk indices in the external, fiscal, and real sectors. It first describes the feature selection process and lists the features selected for the SDMs. Next, it explains how the models were estimated and presents the results.

Feature selection

The first step in building the SDMs is feature selection. Ideally, the selected set should include variables for which economic theory and intuition suggest a non-negligible association between their dynamics and a crisis realization. In addition, it would be useful for end users to be able to project the values of the variables under different macro-financial scenarios. Finally, to enhance interpretability, the number of selected variables should be small relative to the number used in the IMF ML models.

Finding a potential set of features is straightforward, as the IMF regularly publishes and updates both historical data and 5-year projections for several macroeconomic and financial variables in its semiannual World Economic Outlook. These variables are natural candidates for the SDM features, as they are regularly updated and monitored by country economists. Moreover, several of them are typically among the drivers of economic and financial crises identified in the academic literature. Table 1 lists the variables selected as features (predictors) in the sectoral SDMs; their number ranges from 12 to 20, significantly fewer than the number used to construct the VE risk indices produced by the IMF ML models.

Table 1.

Variables used in surrogate models

Source: the authors.

Surrogate data models: estimation

We experimented with several model options before settling on a combination of global (all IMF member countries, a total of 196 countries) and income-based country-group random forest models (RFM) as proxies for each risk index. Each of the RFMs was estimated using the selected features in Table 1. The data sample covered the period of 1980 to 2021 for the real and fiscal risk index models and 1989 to 2021 for the external risk index model. The rationale for combining the country-group models and a global model is to account for parameter heterogeneity across advanced economies (AE), emerging markets (EMs), and low-income countries (LICs).4

The global model is estimated with data from all economies, and each income-based model with data from the corresponding country group. Figure 2 illustrates the model combination approach. The SDM risk indices are optimal linear combinations of the global and country-group models, with combination weights estimated to minimize the mean squared error (MSE) of out-of-sample forecasts. This approach, first advanced by Geweke and Amisano (2017), has been shown to outperform the forecasts of the individual components. It balances the bias arising from ignoring parameter heterogeneity in the global model against the high variance arising from the smaller sample sizes of the country-group models.
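The weight estimation can be sketched as a non-negative least squares problem on validation-set forecasts. This is a simplified illustration, not the authors’ code: it assumes NumPy, two candidate forecasts (global and country-group), no intercept, and no requirement that the weights sum to one. It solves the problem exactly by enumerating which weights are nonzero, which is practical only for a handful of models; the data are hypothetical.

```python
from itertools import combinations

import numpy as np


def combination_weights(forecasts, y):
    """Non-negative weights w minimizing mean((y - forecasts @ w) ** 2).

    forecasts: (T, K) validation-set predictions of K models; y: (T,) target index.
    Solved exactly by enumerating supports, which is only practical for small K.
    """
    K = forecasts.shape[1]
    best_w, best_mse = np.zeros(K), np.mean(y ** 2)  # empty support: all weights zero
    for size in range(1, K + 1):
        for support in combinations(range(K), size):
            cols = list(support)
            w_s, *_ = np.linalg.lstsq(forecasts[:, cols], y, rcond=None)
            if np.any(w_s < 0):
                continue  # violates the non-negativity constraint
            w = np.zeros(K)
            w[cols] = w_s
            mse = np.mean((y - forecasts @ w) ** 2)
            if mse < best_mse:
                best_w, best_mse = w, mse
    return best_w, best_mse


# Hypothetical validation-set data for one country group
rng = np.random.default_rng(4)
target = rng.uniform(0.0, 1.0, size=200)                               # stand-in for the VE index
global_fc = 0.5 + 0.6 * (target - 0.5) + 0.05 * rng.normal(size=200)  # pooled: biased, precise
group_fc = target + 0.15 * rng.normal(size=200)                        # country group: unbiased, noisy
w, mse = combination_weights(np.column_stack([global_fc, group_fc]), target)
print("weights (global, country group):", np.round(w, 3), "MSE:", round(mse, 5))
```

When one forecast is biased but precise and the other unbiased but noisy, both typically receive positive weight, which is the bias-variance trade-off the combination is meant to exploit.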

Figure 2.

Linear combination of country-group and global models.


Source: the authors.

The hyperparameters of each individual RFM used in the combination model are chosen to minimize the MSE loss on validation sets constructed using the gap-block cross-validation approach of Hacibedel and Qu (2022). This approach, adapted from Burman et al. (1994), breaks the dependence between the training and validation sets by restricting each validation set to a block of data observed over consecutive time periods and leaving time gaps between the training and validation sets. The training set does not need to be an unbroken sequence of observations, as shown in Figure 3.

Figure 3.

Gap-block cross validation


Source: the authors

The red and light-green areas in the figure correspond to the validation and training sets, respectively. We choose a gap size of 1 year and 15-fold cross-validation, corresponding to a validation set length of around 2 years. As a robustness check, we vary the gap size and validation set length between 1 and 3 years and find the cross-validation R-squared to be stable. The cross-validation exercise yields tree depths below 15, with the number of predictors in each tree varying from 10 percent to 70 percent of the total number of predictors (from 2 to at most 14). The number of trees in the forest is set to 100; adding more trees does not improve model performance.
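A minimal splitter in the spirit of the gap-block scheme can be sketched as follows, assuming NumPy. It is an illustration under stated assumptions, not the implementation of Hacibedel and Qu (2022): the function name `gap_block_splits` is hypothetical, the distinct periods are cut into contiguous blocks of near-equal size with `np.array_split`, and observations within `gap` periods on either side of the validation block are dropped from training.

```python
import numpy as np


def gap_block_splits(periods_of_obs, n_folds=15, gap=1):
    """Yield (train_idx, val_idx) index arrays for a gap-block cross-validation sketch.

    periods_of_obs: time period (e.g., year) of each observation; panels are allowed.
    Each validation set is one block of consecutive periods; observations within
    `gap` periods of the block are left out, so training may be non-contiguous.
    """
    periods = np.unique(periods_of_obs)
    for block in np.array_split(periods, n_folds):
        lo, hi = block.min(), block.max()
        val = (periods_of_obs >= lo) & (periods_of_obs <= hi)
        buffer = (periods_of_obs >= lo - gap) & (periods_of_obs <= hi + gap)
        yield np.flatnonzero(~buffer), np.flatnonzero(val)


# Hypothetical panel: 5 countries observed annually over 1980-2021
years = np.repeat(np.arange(1980, 2022), 5)
folds = list(gap_block_splits(years, n_folds=15, gap=1))
train_idx, val_idx = folds[3]
print("validation years:", np.unique(years[val_idx]))
print("years left out of training:", np.setdiff1d(np.arange(1980, 2022), years[train_idx]))
```

In use, each fold would fit a random forest on `train_idx` and score its MSE on `val_idx`; averaging the scores across folds gives the criterion for choosing hyperparameters such as tree depth.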

After determining the hyperparameters, we estimate the optimal combination weights of the global and income-based country-group models based on model predictions on the validation sets. We impose non-negativity constraints on the weights to reduce estimation error. Table 2 shows the estimated weights of the income-based country-group models. Except for the AE and EM models in the real sector, all estimated weights are positive and below 1, indicating diversification gains from combining models.

Table 2.

Optimal weights of income-based country-group models

Source: authors’ calculations.

Surrogate data models: results

The performance of the combination model differs across sectors when evaluated using the R-squared and the rank correlation between the SDM indices and the VE indices. Table 3 shows the out-of-sample performance in 2022. The SDM performs especially well in the fiscal sector, with an out-of-sample rank correlation of around 0.9. Performance is weaker in the real and external sectors, with rank correlations of 0.41 and 0.56, respectively. The weaker performance is likely driven by the absence of financial market variables in the SDMs. Hence, the surrogate indices for these two sectors should be interpreted with more caution.
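The two fit statistics can be computed as below. This is a sketch under assumptions, using NumPy: the R-squared is taken as 1 - SSE/SST of the SDM index against the VE index (the text does not state which definition is used), the rank correlation is Spearman’s with average ranks for ties, and the five index values are made up for illustration.

```python
import numpy as np


def r_squared(target, pred):
    """R-squared as 1 - SSE/SST of pred against target (an assumed definition)."""
    target, pred = np.asarray(target, dtype=float), np.asarray(pred, dtype=float)
    return 1.0 - np.sum((target - pred) ** 2) / np.sum((target - target.mean()) ** 2)


def average_ranks(a):
    """Ranks 1..n, with tied values sharing the average of their ranks."""
    a = np.asarray(a, dtype=float)
    ranks = np.empty(len(a))
    ranks[np.argsort(a, kind="mergesort")] = np.arange(1, len(a) + 1)
    for value in np.unique(a):
        tied = a == value
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return np.corrcoef(average_ranks(x), average_ranks(y))[0, 1]


# Hypothetical VE and SDM index values for five countries in one year
ve = [0.10, 0.35, 0.20, 0.80, 0.55]
sdm = [0.15, 0.30, 0.25, 0.60, 0.50]
r2, rho = r_squared(ve, sdm), spearman(ve, sdm)
print(f"R^2 = {r2:.3f}, rank correlation = {rho:.3f}")
```

In this hypothetical example the two indices rank the countries identically (rank correlation of 1) while the R-squared is only about 0.84, a reminder that the rank correlation rewards getting the ordering of risks right even when levels are off.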

Table 3.

Out-of-sample performance

Source: authors’ calculations.

To further examine the goodness of fit, Figure 4 plots the SDM-generated average risk index for different income-based country groups against the average VE risk indices over the in-sample period. The figure also shows the projected SDM crisis risk under two scenarios: (1) the October 2022 IMF 5-year projections for the model features, as reported in IMF (2022) (blue line); and (2) an adverse scenario of higher inflation, an oil and gas supply shock, a further slowdown of the Chinese economy, and a further tightening of financial conditions, superimposed on the IMF 5-year projections for 2022–2026 (red line). Table 4 presents the scenario characteristics averaged across country income groups.

Figure 4.

VE indices, SDM indices, projections, and adverse scenario


Source: authors’ calculations.
Table 4.

Five-year scenarios: IMF (2022) baseline and adverse scenario

Note: growth rates in percent; changes in ratios in percent of the variable in the denominator; all other variables in percent. Source: IMF (2022) and the authors (adverse scenario).

In contrast to econometric or statistical models, standard confidence intervals cannot be estimated for the SDM scenario analysis. However, one-standard-deviation upper and lower bounds can be constructed to measure the uncertainty of the SDM projections. The upper bound of a country-specific SDM projection is obtained by multiplying the projected value by the exponential of the standard deviation of the log residuals in the validation set. Similarly, the lower bound is obtained by dividing the projected value by the same factor. Once the country-specific bounds are obtained, the country-group averages are calculated. Figure 4 depicts the area between both bounds in light blue.
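The band construction can be sketched in a few lines, assuming NumPy. The definition of the log residuals as log(VE index) minus log(SDM index) on the validation set, the use of the sample standard deviation (`ddof=1`), and all the numbers are assumptions for illustration.

```python
import numpy as np


def sd_bands(projection, log_residuals):
    """One-standard-deviation multiplicative bands around a country's SDM projection."""
    factor = np.exp(np.std(log_residuals, ddof=1))  # exp(sd of validation log residuals)
    projection = np.asarray(projection, dtype=float)
    return projection / factor, projection * factor  # (lower, upper)


# Hypothetical projections (three years) and validation-set log residuals for two countries
countries = {
    "A": (np.array([0.20, 0.25, 0.30]), np.array([-0.10, 0.05, 0.00, 0.12, -0.07])),
    "B": (np.array([0.40, 0.35, 0.30]), np.array([0.20, -0.10, 0.05, -0.15])),
}
bands = {c: sd_bands(p, r) for c, (p, r) in countries.items()}

# Country-group averages are computed only after the country-specific bounds
group_lower = np.mean([lo for lo, _ in bands.values()], axis=0)
group_upper = np.mean([up for _, up in bands.values()], axis=0)
print("group lower:", np.round(group_lower, 3))
print("group upper:", np.round(group_upper, 3))
```

Because the bounds are multiplicative, they are symmetric on the log scale but asymmetric in levels, and they always stay positive, which suits an index that behaves like a probability.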

Overall, the SDMs track the general trends of the crisis risk indices well for all country groups and sectors. However, in a few specific periods the SDM predictions fall short of the risk index values. One example is the years preceding the 2008 global recession (2006–2008), when the SDM underpredicts the risk indices across all sectors in the advanced economies group, as well as in the fiscal sector of the emerging markets group. The other is 2020, when the Covid-19 pandemic started. While the SDMs perform relatively well for advanced and emerging economies, they cannot capture the large surge in the risk indices in the real and external sectors for the low-income countries group.

The results discussed above suggest that a dataset including only a few features might suffice to capture crisis dynamics. However, some crisis episodes might be more severe than the SDM predicts, indicating that the selected features cannot capture all the information conveyed by the full set of variables in the IMF ML models. Nonetheless, the ability of the SDMs to track crisis dynamics well with few variables enhances model interpretability.

With fewer predictors, it is easier to disentangle their contributions to changes in the crisis risk indices using any of the standard explainability methods in the machine learning literature. Enhanced explainability provides economists and policy makers with more information to guide decisions and reduce crisis risk. Below, we use the SHAP method described in Section 3 to assess the feature contributions to the risk indices in different sectors.

Figure 5 plots the distribution of Shapley values of the SDMs.5 Each dot represents the Shapley value of a predictor for one observation, measured along the horizontal axis, and its color represents the level of the corresponding predictor. The dots are jittered to reflect the distribution of Shapley values. Hence, a distribution of Shapley values with red dots on the right and blue dots on the left suggests that higher predictor values have a positive impact on the outcome. Predictors are ranked in descending order by the sum of absolute SHAP values. Consistent with our priors, Figure 5 shows that lower GDP growth and tighter financial conditions, as proxied by increases in USD short-term deposit rates, are the most important drivers of real sector crises (Figure 5a),6 while higher debt levels and weaker government revenue predict fiscal crises. Currency depreciation and a worsening current account balance also signal future external sector crises. Overall, the relationships between feature values and their SHAP values make sense for all features and sectors. For some features, a mixture of blue and red dots along the axis suggests significant non-linearities, again in line with economic rationale. For example, increases in commodity prices benefit commodity exporters but hurt commodity importers, and vice versa for decreases.
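The ranking and the color pattern in a beeswarm plot can be summarized numerically, as in the sketch below. This is an illustrative aid rather than the paper’s procedure: it assumes NumPy, uses hypothetical feature names and SHAP values, ranks features by the sum of absolute SHAP values, and reports the correlation between each feature’s value and its SHAP value as a rough stand-in for “red dots right, blue dots left”.

```python
import numpy as np


def summarize_shap(shap_values, feature_values, names):
    """Rank features by the sum of |SHAP| and report corr(feature value, SHAP value).

    A positive correlation corresponds to 'red dots right, blue dots left' in a
    beeswarm plot; a correlation near zero can hide non-monotone effects.
    """
    importance = np.abs(shap_values).sum(axis=0)
    rows = []
    for j in np.argsort(-importance):  # descending importance
        r = np.corrcoef(feature_values[:, j], shap_values[:, j])[0, 1]
        rows.append((names[j], importance[j], r))
    return rows


# Hypothetical standardized features and SHAP values (not the paper's results)
rng = np.random.default_rng(5)
X = rng.normal(size=(2_000, 3))
shap = np.column_stack([
    -0.08 * X[:, 0],                # monotone decreasing effect
    0.05 * X[:, 1],                 # monotone increasing effect
    0.01 * (X[:, 2] ** 2 - 1.0),    # non-monotone effect (mixed colors on both sides)
])
rows = summarize_shap(shap, X, ["gdp_growth", "debt_to_gdp", "commodity_prices"])
for name, imp, r in rows:
    print(f"{name:<17} sum|SHAP| = {imp:7.2f}   corr(value, SHAP) = {r:+.2f}")
```

The third hypothetical feature shows why the plot matters: its correlation is near zero even though its effect is substantial at both extremes, the kind of non-linearity the text attributes to commodity prices.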

Figure 5.

SDM indices: distribution of Shapley values

Citation: IMF Working Papers 2023, 041; 10.5089/9798400234828.001.A001

Source: authors’ calculations.

Whether the SDMs are consistent with economic theory is further evaluated by calculating the cross-country average contribution of each variable within each country income group. Figure 6 shows the variables that contribute most to explaining the crisis risk indices in each sector in each year, including the 5-year scenario. The time series of each variable’s average SHAP value is demeaned, so the stacked bars equal the SDM indices plus a constant. Figure 6 highlights the drivers, on average, of each crisis risk peak. For example, the 2022 peaks in LICs’ external sector risk are mainly driven by recent increases in inflation and by high government deficits since 2020. From 2022 onwards, the elevated projected risks in the fiscal sector of low-income countries are mainly driven by a persistent tightening of global financial conditions arising from higher US long-term yields.
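A minimal sketch of this decomposition, assuming Shapley values are stored in a hypothetical array indexed by country, year, and feature, and that each index value equals a base value plus the sum of its Shapley values (the additivity property of SHAP). Averaging over the countries in an income group preserves additivity, and demeaning each feature's average over time subtracts a time-invariant amount, so the stacked bars track the group-average index up to a constant. The function name and toy data are illustrative.

```python
import numpy as np


def shap_decomposition(shap_values, groups, group):
    """Group-average Shapley contributions, demeaned over time.

    shap_values: array (n_countries, n_years, n_features); groups: income-group
    label per country. Returns an (n_years, n_features) array of stacked-bar heights.
    """
    shap_values = np.asarray(shap_values, dtype=float)
    mask = np.asarray(groups) == group
    if not mask.any():
        raise ValueError(f"no countries in group {group!r}")
    avg = shap_values[mask].mean(axis=0)          # cross-country average per year and feature
    return avg - avg.mean(axis=0, keepdims=True)  # demean each feature's time series


# Toy check: if index = base + sum of Shapley values, the bars sum to the
# group-average index minus an amount that does not vary over time.
rng = np.random.default_rng(1)
toy_shap = rng.normal(size=(6, 10, 4))
groups = np.array(["LIC", "LIC", "EM", "AE", "LIC", "EM"])
index = 0.3 + toy_shap.sum(axis=2)
bars = shap_decomposition(toy_shap, groups, "LIC")
gap = bars.sum(axis=1) - index[groups == "LIC"].mean(axis=0)
print(np.ptp(gap))  # ~0: the gap is the same constant in every year
```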

Figure 6.

SDM indices: SHAP decomposition

Source: Authors’ calculations

As another example, we compare the regional averages of the fiscal sector SDM indices between the Spring and Fall 2022 World Economic Outlook vintages. At the time, the global economy faced several headwinds, including tightening financial conditions, accelerating inflation, and a food and energy crisis due to the Russia-Ukraine war. To examine the sources of vulnerabilities across regions, we pick out the top three contributors to increases in the regional fiscal SDM indices in terms of changes in Shapley values. Figure 7 shows each region’s top three contributors from left to right, and the color of each cell indicates the magnitude of the increase in Shapley values. Fiscal vulnerabilities stem from different sources across regions. The increase in fiscal sector risk in Western Hemisphere (WHD) countries is mainly driven by inflation, while fiscal sector risk in African (AFR) countries is mainly driven by tightening global monetary conditions, as proxied by US interest rates.
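The selection of top contributors can be sketched as follows, assuming each region's average Shapley value per predictor is available for the two WEO vintages; the inputs, region labels, and the helper `top_increases` are hypothetical. Only predictors whose Shapley value increased are kept, since the comparison targets contributors to increases in risk.

```python
import numpy as np


def top_increases(shap_old, shap_new, feature_names, k=3):
    """Top-k predictors by increase in average Shapley value between two vintages."""
    delta = np.asarray(shap_new, dtype=float) - np.asarray(shap_old, dtype=float)
    order = np.argsort(-delta, kind="stable")  # largest increase first
    return [(feature_names[i], float(delta[i])) for i in order if delta[i] > 0][:k]


# Toy regional averages for two vintages (hypothetical predictors f1..f4).
names = ["f1", "f2", "f3", "f4"]
regions = {
    "region A": ([0.0, 0.0, 0.0, 0.0], [0.3, 0.1, 0.2, -0.1]),
    "region B": ([0.2, 0.1, 0.0, 0.0], [0.1, 0.5, 0.0, 0.05]),
}
for region, (spring, fall) in regions.items():
    print(region, top_increases(spring, fall, names))
```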

Figure 7.

Major driver of SDM external sector indices across regions

Source: Authors’ calculations

5. Conclusions

Machine learning tools are especially useful for supporting economic and financial surveillance, particularly crisis prediction. Enhancing the ability of ML models to guide crisis prevention and mitigation policies requires improving model interpretability and establishing causal relationships. This is challenging because most models used in a policy-making setting rely on datasets comprising many explanatory variables. Simpler models (i.e., surrogate models) that still rely on the original high-dimensional dataset do not provide substantial interpretability gains.

Motivated by the empirical fact that a few latent variables suffice to explain the joint dynamics of a high-dimensional dataset of economic and financial variables, we proposed using surrogate data models (SDMs) as dimensionality reduction and interpretability-enhancing tools suitable for crisis prediction, prevention, and mitigation. To test this approach, we used SDMs to explain the output of a suite of models developed at the IMF for forecasting crises in the external, fiscal, and real sectors, using as predictors only a subset of the variables that country economists typically monitor on a recurrent basis.

The SDMs captured the crisis dynamics generated by the ML models well. In a few specific episodes, especially during 2007–08, they failed to match the crisis severity (as measured by the ML crisis index), likely because the surrogate dataset lacks some of the information contained in the full dataset. Nevertheless, an analysis of the Shapley-based contributions of the different predictors to crisis severity singled out main drivers consistent with economic intuition and anecdotal crisis narratives.

The results suggest that SDMs could be suitable interpretability tools, but, as in many other applications of machine learning, their use and implementation should be explored on a case-by-case basis.

References

  • Andrzejak, A., Langner, F., Zabala, S. 2013, April. Interpretable models from distributed data via merging of decision trees. In 2013 IEEE Symposium on Computational Intelligence and Data Mining (CIDM) (pp. 1–9). IEEE.

  • Bair, E., Hastie, T., Paul, D., Tibshirani, R. 2006. Prediction by supervised principal components. J. Amer. Statist. Assoc. 101: 119–131.

  • Barnichon, S., Matthes, C., Ziegenbein, A. 2020. Are the effects of financial market disruptions big or small? Working Paper, Federal Reserve Bank of San Francisco.

  • Barnichon, S., Matthes, C., Ziegenbein, A. 2018. The financial crisis at 10: will we ever recover? FRBSF Economic Letter 2018–19.

  • Bastani, O., Kim, C., Bastani, H. 2017. Interpreting blackbox models via model extraction. arXiv preprint arXiv:1705.08504.

  • Beutel, J., List, S., von Schweinitz, G. 2019. Does machine learning help us predict banking crises? J. of Financial Stability 45, 100693.

  • Bluwstein, K., Buckmann, M., Joseph, A., Kapadia, S., Simsek, O. 2021. Credit growth, the yield curve, and financial crisis prediction: evidence from a machine learning approach. Working paper No. 2614 (European Central Bank).

  • Buckmann, M., Joseph, A. 2022. An interpretable machine learning workflow with an application to economic forecasting. Staff Working Paper No. 984. Bank of England.

  • Burkart, N., Huber, M. 2021. A survey on the explainability of supervised machine learning. J. of Artificial Intelligence Research 70: 245–317.

  • Burman, P., Chow, E., Nolan, D. 1994. A cross-validatory method for dependent data. Biometrika 81 (2): 351–358.

  • Cerra, V., Hakamada, M., Lama, R. 2021. Financial crises, investment slumps, and slow recoveries. IMF WP No. 2021/170.

  • Chan-Lau, J. A., Wang, R. 2020. UnFEAR: unsupervised feature extraction clustering with an application to crisis regimes classification. IMF WP 2020/262. International Monetary Fund.

  • Credit Research Initiative of the National University of Singapore. 2019. Probability of default (PD) white paper.

  • Cubadda, G., Hecq, A. 2022. Dimension reduction for high-dimensional vector autoregressive models. Oxford Bull. of Economics and Statistics, Available at https://doi.org/10.1111/obes.12506

  • Eichengreen, B., Rose, A., Wyplosz, C. 1995. Exchange market mayhem: the antecedents and aftermath of speculative attacks. Economic Policy 10 (21): 249–312.

  • Fan, J., Lv, J., Qi, L. 2011. Sparse high dimensional models in economics. Annual Review of Economics 3: 291–317.

  • Fan, J., Sun, Q., Zhou, W.-X., Zhu, Z. 2018. Principal component analysis for big data.

  • Forrester, A., Sobester, A., Keane, A. 2008. Engineering design via surrogate modelling: a practical guide. Wiley.

  • Fouliard, J., Howell, M., Rey, H. 2021. Answering the Queen: machine learning and financial crises. BIS Working paper No. 296. Bank for International Settlements.

  • Frankel, J., Rose, A. 1996. Currency crashes in emerging markets: an empirical treatment. J. of International Economics 87 (2): 216–231.

  • Frosst, N., Hinton, G. 2017. Distilling a neural network into a soft decision tree. arXiv preprint arXiv:1711.09784.

  • Geweke, J., Amisano, G. 2011. Optimal prediction pools. Journal of Econometrics 164 (1): 130–141.

  • Gu, S., Kelly, B., Xiu, D. 2021. Autoencoder asset pricing models. J. of Econometrics.

  • Hacibedel, B., Qu, R. 2022. Understanding and predicting systemic corporate distress: a machine-learning approach. IMF working paper.

  • Hall, P., Gill, N., Kurka, M., Phan, W. 2017. Machine learning interpretability with H2O Driverless AI. H2O.ai.

  • Hara, S., Hayashi, K. 2018. Making tree ensembles interpretable: A Bayesian model selection approach. In International conference on artificial intelligence and statistics (pp. 77–85). PMLR.

  • Hellwig, K.-P. 2021. Predicting fiscal crises: a machine learning approach. IMF Working Paper WP/21/150. International Monetary Fund.

  • Holopainen, M., Sarlin, P. 2017. Toward robust early-warning models: a horse race, ensembles and model uncertainty. Quantitative Finance 17 (12): 1933–1963.

  • Huang, Y., Kou, G., Peng, Y. 2017. Nonlinear manifold learning for early warnings in financial markets. European Journal of Operational Research 258: 692–702.

  • International Monetary Fund. 2021. How to assess country risk: the vulnerability exercise approach using machine learning. Technical Notes and Manuals.

  • International Monetary Fund. 2022. World Economic Outlook: Countering the Cost-of-Living crisis. Washington, DC. August.

  • Joseph, A. 2020. Parametric inference with universal function approximators. arXiv preprint arXiv:1903.04209.

  • Kamath, U., Liu, J. 2021. Explainable artificial intelligence: an introduction to interpretable machine learning. Springer.

  • Kaminsky, G., Lizondo, S., Reinhart, C. 1998. Leading indicators of currency crises. IMF Staff Papers, 45 (1).

  • Kumar, I. E., Venkatasubramanian, S., Scheidegger, C., Friedler, S. 2020. Problems with Shapley-value-based explanations as feature importance measures. International Conference on Machine Learning: 5491–5500.

  • Lakkaraju, H., Kamar, E., Caruana, R., Leskovec, J. 2019. Faithful and customizable explanations of black box models. Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society: 131–138.

  • Lundberg, S., Lee, S.-I. 2017. A unified approach to interpreting model predictions. NIPS’17: Proceedings of the 31st International Conference on Neural Information Processing Systems: 4768–4777.

  • Molnar, C. 2022. Interpretable machine learning: a guide for making black box models explainable. Available at https://christophm.github.io/interpretable-ml-book/

  • Moulin, H. 1995. Cooperative microeconomics: a game-theoretic introduction. Princeton University Press.

  • Myerson, R. 1991. Game theory. Harvard University Press.

  • Piazzesi, M. 2007. Estimating Rational Expectations Models. prepared for the New Palgrave.

  • Ribeiro, M. T., Singh, S., Guestrin, C. 2016a. Model-agnostic interpretability of machine learning. arXiv preprint arXiv:1606.05386.

  • Ribeiro, M. T., Singh, S., Guestrin, C. 2016b. “Why should I trust you?” Explaining the predictions of any classifier. Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining: 1135–1144.

  • Ribeiro, M. T., Singh, S., Guestrin, C. 2018, April. Anchors: High-precision model-agnostic explanations. In Proceedings of the AAAI conference on artificial intelligence, 32 (1).

  • Romer, C., Romer, D. 2017. New evidence on the aftermath of financial crises in advanced countries. American Economic Review 107 (10): 3072–3118.

  • Shapley, L.S. 1953. A value for n-person games. Contributions to the Theory of Games II: 307–317. Princeton University Press.

  • Stock, J. H., Watson, M.W. 2016. Dynamic factor models, factor-augmented vector autoregressions, and structural vector autoregressions in macroeconomics. In Taylor, J. B., Uhlig, H., editors. Handbook of Macroeconomics, Vol. 2: 415–525.

  • Štrumbelj, E., Kononenko, I. 2014. Explaining prediction models and individual model predictions with feature contributions. Knowledge and Information Systems 41: 647–665.

  • Turner, R. 2016. A model explanation system. IEEE 26th international workshop on machine learning for signal processing (MLSP): 1–6.

  • Yang, C., Rangarajan, A., Ranka, S. 2018. Global model interpretation via recursive partitioning. In 2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Systems (HPCC/SmartCity/DSS) (pp. 1563–1570). IEEE.


Annex I. IMF ML models, crisis event definitions

External sector model

Sudden stops in capital flows

A sudden stop occurs when net private capital inflows as a percentage of GDP are at least 2 percentage points lower than in both the previous year and two years before, or when the country is approved for large IMF financial support.
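A minimal sketch of the sudden stop rule for one country's annual series. It reads the definition as requiring the inflow ratio to be at least 2 percentage points below both the previous year's value and the value two years earlier, with approval of large IMF support as an alternative trigger; the function and argument names are hypothetical, and the two series are assumed to be aligned by year.

```python
import numpy as np


def sudden_stop(net_inflows_pct_gdp, large_imf_support, threshold_pp=2.0):
    """Flag sudden stop years for one country (series ordered by year)."""
    flows = np.asarray(net_inflows_pct_gdp, dtype=float)
    flag = np.asarray(large_imf_support, dtype=bool).copy()  # IMF criterion
    if flows.size != flag.size:
        raise ValueError("inputs must cover the same years")
    # Flow criterion needs t-1 and t-2, so it can only fire from the third year on.
    below_prev = flows[2:] <= flows[1:-1] - threshold_pp
    below_prev2 = flows[2:] <= flows[:-2] - threshold_pp
    flag[2:] |= below_prev & below_prev2
    return flag


print(sudden_stop([5.0, 5.0, 2.0, 2.0, 6.0], [False, False, False, False, True]))
```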

Exchange market pressure events

An EMP index is constructed by combining the degree of exchange rate depreciation and the loss of international reserves. The index is defined as a weighted average of the annual percentage depreciation of the nominal exchange rate and the annual decline in reserves as a percentage of the previous year’s GDP. EMP events are defined as occurring when the index lies in the lower 15th percentile of the whole panel, or when the country is approved for large IMF support.
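The EMP criterion can be sketched as below. The text does not report the weights or the sign convention, so both are assumptions here: the 50/50 weight is purely illustrative, and the index is negated so that greater pressure (more depreciation, larger reserve losses) maps to lower values, matching the lower-tail cutoff. The percentile is pooled over all country-years in the panel; the function name is hypothetical.

```python
import numpy as np


def emp_events(depreciation_pct, reserve_decline_pct_gdp, large_imf_support,
               weight_fx=0.5, pct=15.0):
    """EMP index and event flags over a panel of country-years (any array shape).

    Assumptions: weight_fx is illustrative, and the index is negated so that more
    depreciation or larger reserve losses give a lower value.
    """
    dep = np.asarray(depreciation_pct, dtype=float)
    res = np.asarray(reserve_decline_pct_gdp, dtype=float)
    emp = -(weight_fx * dep + (1.0 - weight_fx) * res)
    cutoff = np.percentile(emp, pct)  # pooled over the whole panel
    events = (emp <= cutoff) | np.asarray(large_imf_support, dtype=bool)
    return emp, events


# Toy panel of 20 country-years with rising depreciation and no reserve losses.
emp, events = emp_events(np.arange(20.0), np.zeros(20), np.zeros(20, dtype=bool))
print(np.flatnonzero(events))  # the three largest depreciations
```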

Fiscal sector model

Credit event

The definition covers a default, restructuring, or rescheduling of a substantial amount of public debt relative to GDP (50 percent or more), or a substantial increase in the defaulted nominal amount (10 percent per annum).

Exceptionally large official financing

Defined as a high-access IMF financial arrangement with a fiscal adjustment objective, with access exceeding 100 percent of the country’s IMF quota, or the country’s inclusion in a European Union support program.

Implicit domestic public default

Defined as a high inflation rate exceeding certain thresholds (35 percent in advanced economies, 100 percent in other economies) or a steep increase in domestic arrears, measured as a first difference in the ratio of other accounts payable to GDP exceeding 1 percentage point.

Loss of market confidence

Defined as a high price of market access: sovereign spreads or credit default swap spreads exceeding 1,000 bps, annual changes in these spreads exceeding certain thresholds (300 bps in advanced economies and 650 bps in emerging market economies), or a loss of market access.
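Three of the fiscal criteria translate directly into threshold checks, sketched below under assumptions about the data layout: the record type and its field names are hypothetical, inflation and arrears are in percent and percentage points of GDP, and spreads are in basis points. The credit event is not coded because the description does not fully pin down its measurement. Since spread-change thresholds are given only for advanced and emerging economies, other groups are checked only against the spread level and market-access criteria.

```python
from dataclasses import dataclass


@dataclass
class FiscalObs:
    """One country-year; field names are hypothetical."""
    group: str                          # "AE", "EM", or another income group label
    inflation_pct: float                # annual inflation, percent
    d_payables_pp_gdp: float            # first difference of other accounts payable / GDP, pp
    spread_bps: float                   # sovereign or CDS spread, bps
    d_spread_bps: float                 # annual change in the spread, bps
    lost_market_access: bool
    imf_fiscal_access_pct_quota: float  # IMF arrangement with fiscal objective, % of quota (0 if none)
    eu_program: bool


# Annual spread-change thresholds from the annex; other groups are not covered.
SPREAD_CHANGE_BPS = {"AE": 300.0, "EM": 650.0}


def implicit_domestic_default(o: FiscalObs) -> bool:
    inflation_cap = 35.0 if o.group == "AE" else 100.0
    return o.inflation_pct > inflation_cap or o.d_payables_pp_gdp > 1.0


def loss_of_market_confidence(o: FiscalObs) -> bool:
    change_cap = SPREAD_CHANGE_BPS.get(o.group)
    big_change = change_cap is not None and o.d_spread_bps > change_cap
    return o.spread_bps > 1000.0 or big_change or o.lost_market_access


def exceptionally_large_official_financing(o: FiscalObs) -> bool:
    return o.imf_fiscal_access_pct_quota > 100.0 or o.eu_program


obs = FiscalObs("EM", 40.0, 0.5, 200.0, 700.0, False, 0.0, False)
print(loss_of_market_confidence(obs))  # True: spread rose by more than 650 bps
```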

Real sector model

Crises are defined based on four different GDP series and four different thresholds. The four series are i) a country’s annual growth rate, ii) its cumulative growth rate over the past three years, iii) its growth performance relative to the most recent five-year average, and iv) its average GDP level relative to the previous three-year average. For each of these, the focus is on GDP per working-age person.

Values of these series are flagged as signaling a crisis if they fall below the 10th percentile of observations in one of the following groups: i) all countries in the sample; ii) all countries in the same income group according to the 2020 WEO classification, that is, advanced economies (AEs), emerging markets (EMs), or low-income and developing economies (LICs); iii) countries in the same income group according to the 1980 WEO classification, with an additional category for countries with a population below one million; and iv) countries in the same tercile of the total sample for year-on-year growth volatility.

These four series and four thresholds yield sixteen crisis indicators. Each indicator assesses whether the point-in-time value of one of the series is below one of the thresholds. A country is recorded as experiencing a real sector crisis in a given year whenever nine or more of the sixteen indicators signal a crisis, a definition that appears consistent with historical real-sector crisis episodes.
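The sixteen-indicator rule can be sketched as follows, assuming one row per country-year with the four GDP-per-working-age-person series as columns, and one label array per comparison group. Percentiles are computed by pooling all country-years within each group, which is an assumption about how the thresholds are taken; the function name and toy data are hypothetical.

```python
import numpy as np


def real_sector_crisis(series, groupings, q=10.0, min_signals=9):
    """Flag real sector crises from the sixteen-indicator rule.

    series: array (n_obs, 4), one row per country-year, columns = the four GDP
            per working-age person series.
    groupings: list of label arrays (length n_obs), one per comparison group
               (whole sample, 2020 income group, 1980 income group, volatility tercile).
    """
    series = np.asarray(series, dtype=float)
    signals = np.zeros(series.shape[0], dtype=int)
    for labels in groupings:
        labels = np.asarray(labels)
        for g in np.unique(labels):
            in_g = labels == g
            # q-th percentile of each series within this comparison group
            cutoff = np.percentile(series[in_g], q, axis=0)
            signals[in_g] += (series[in_g] < cutoff).sum(axis=1)
    return signals >= min_signals


n = 100
toy = np.column_stack([np.arange(n, dtype=float)] * 4)  # same ordering in all series
whole_sample = np.zeros(n, dtype=int)
flags = real_sector_crisis(toy, [whole_sample] * 4)
print(flags.sum())  # 10: the bottom decile fires all 16 indicators
```

With only two comparison groups, at most eight indicators can fire, so no observation reaches the nine-signal threshold; the threshold thus requires agreement across most series and groupings.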

Annex II. IMF ML models, model features

1

See annexes.

2

Moulin (1995) and Myerson (1991) offer comprehensive treatments of Shapley values in cooperative games that are accessible to economists.

3

The annexes describe the crisis definitions and list the explanatory variables in detail.

4

AE: 40 countries; EM: 97 countries; and LICs: 59 countries.

5

Calculations were performed using the SHAP Python package, available at https://github.com/slundberg/shap. See the documentation there for a detailed explanation of how to interpret Figures 5 and 6.

6

Income per capita relative to the U.S. also shows up as one of the key contributors. This likely reflects the fact that richer countries tend to grow more slowly, so even a moderate growth shock can push them into a recession (e.g., defined as a period of negative GDP growth).
