Completing the Market: Generating Shadow CDS Spreads by Machine Learning
  • 1 https://isni.org/isni/0000000404811396, International Monetary Fund
  • 2 https://isni.org/isni/0000000404811396, International Monetary Fund

Abstract

We compare the predictive performance of a series of machine learning and traditional methods for monthly CDS spreads, using firms’ accounting-based, market-based and macroeconomic variables over the period 2006 to 2016. We find that ensemble machine learning methods (Bagging, Gradient Boosting and Random Forest) strongly outperform other estimators, and Bagging particularly stands out in terms of accuracy. Traditional credit risk models using OLS techniques have the lowest out-of-sample prediction accuracy. The results suggest that the non-linear machine learning methods, especially the ensemble methods, add considerable value to existing credit risk prediction accuracy and enable CDS shadow pricing for companies missing those securities.

I. Introduction

The Credit Default Swap (CDS) market has attracted considerable attention since its inception in the early 1990s. It underwent a period of rapid growth and usage in the run-up to the 2008 Global Financial Crisis (GFC). Since then, the CDS market has experienced a cooling period as well as structural changes, but it still represents the third largest over-the-counter (OTC) derivatives market, with a gross market value of about US$8 trillion (BIS, 2019¹).²

By providing insurance against default, CDSs enable lenders to hedge the default risk of borrowers, and the CDS spread depends directly on information about the creditworthiness of the entity named on the derivative security. Since the 2008 financial crisis, CDS spreads have become the most closely monitored early warning signals for credit risk changes. The risk-neutral implied default probabilities estimated from CDS spreads are used to price credit securities, assess credit quality by rating firms, monitor systemic risk, and stress test financial systems by regulators (Chan-Lau 2006; Huang et al. 2009).

Compared with other credit risk measures such as bankruptcies, ratings and bond yields, or general risk measures such as stock volatility, CDS spreads have several advantages. First, CDS spreads are a continuous alternative to the discrete credit assessments of rating agencies, and they also incorporate market perceptions of default risk (Das et al., 2009). Unlike rare credit events, the CDS market offers timely cross-sectional and time-series credit information, gauged by the market instead of a credit rating agency. Second, CDS spreads outperform ratings in capturing firm-specific default probability and also contain information on systematic risk (Hilscher and Wilson, 2017). Third, CDS spreads contain credit information not included in stock prices or bond yields when important credit events occur, leading price discovery in the stock and bond markets (Lee et al., 2019). Finally, CDS spreads are less affected by liquidity and tax effects than bond spreads (Elton et al., 2001), and are less sensitive to momentum than stock prices.

However, not all firms have CDSs written on them. Generating “shadow” CDS spreads for the firms without CDS can thus provide a useful credit risk measure, adding valuable insights for market participants. Das, Hanouna and Sarin (2009) find that both accounting-based and market-based information have explanatory power for CDS spreads. If the underlying structure between (economic/firm) fundamentals and CDS spreads is homogeneous across similar firms, one can apply the structure estimated on firms with CDS to the firms without CDS and generate “shadow” spreads. In this paper, we use the fundamentals to nowcast CDS spreads cross-sectionally, test the validity of the nowcasts, and generate the “shadow” spreads.

There has also been little research on forecasting CDS spreads to date. Two exceptions are Gündüz and Uhrig-Homburg (2011) and Son et al. (2016), who both predict firms’ CDS spreads using historical spreads. No study has used economic and firm fundamentals to forecast CDS spreads, while many researchers have used them to predict other credit risk measures, such as bankruptcies (e.g. Altman, 1968; Ohlson, 1980; Altman, 2000; Hillegeist et al., 2004; Duffie et al., 2005; Agarwal and Taffler, 2008; and Duan et al., 2012), rating changes (Nickell et al., 2000; Duffie and Singleton, 2003; Jorion et al., 2009; Jones et al., 2015), bond yields (Huang et al., 2005; Collin-Dufresne et al., 2001; Longstaff and Rajan, 2006) and stock volatility (Christiansen et al., 2012; Mittnik et al., 2015). As discussed above, CDS spreads have multiple advantages over other risk measures. Hence, in this paper, we forecast future CDS spreads longitudinally using economic and firm fundamentals.

Our cross-sectional nowcast and longitudinal forecast also incorporate the recent developments and applications of data-driven machine learning methods (MLs). In terms of credit risk, most studies using machine learning methods focus on bankruptcy and credit rating. Empirical evidence from these discrete measures suggests that recent classifiers such as gradient boosting and random forest clearly excel compared to traditional LDA or probit/logit (Jones et al., 2015; Flavio et al., 2017). But there has not been equal scrutiny of the continuous measure of CDS spreads. What enables machine learning methods to outperform traditional approaches has not been investigated sufficiently. In this study, we “horserace” the predictive performance of traditional methods and a series of recent ML techniques with regard to their nowcasting and forecasting capabilities, and investigate the source of the performance differences.

This paper aims at answering three specific questions:

  • (1) Can we generalize the relationship between the fundamentals and CDS spreads cross-sectionally to other companies to construct “shadow” CDS spreads for those without actual CDS?

  • (2) Can we generalize the relationship over time to forecast CDS spreads in the future?

  • (3) What is the relative explanatory power of fundamental variables in predicting CDS, under traditional and Machine Learning approaches?

To answer these questions, we conduct cross-sectional nowcasting and one-month-ahead longitudinal forecasting of CDS spreads. Our sample comprises monthly CDS spread data of 69 firms, with accounting-based, market-based, and macroeconomic series as input variables. We test a wide range of machine learning estimation techniques and use traditional credit risk model regressions as benchmark tools.

Our results indicate that machine learning methods can considerably enhance the prediction accuracy of CDS spreads both cross-sectionally and over time when compared to traditional econometric models quantifying credit risk relationships. Ensemble methods including Bagging, Random Forest, and Gradient Boosting consistently outperform basic interpretable methods, such as Ridge, LASSO, and linear regression, in prediction accuracy and stability. The precision of linear regression fluctuates widely across randomly chosen estimation and test sets and leads to the weakest average out-of-sample prediction power.

We further assess the importance of regressors using the LIME (Local Interpretable Model-Agnostic Explanations) method, to provide more thorough insight, from the perspective of the input variables, into why ensemble MLs are more accurate in predicting CDS spreads. We find that linear regressions assign exceptionally high weights to interest rates and spreads, including treasury yields, term spreads, and long-term bond yields. In contrast, ensemble ML methods rely mostly on the firm and economic fundamentals. The results pinpoint the most critical variables that predict CDS spreads and suggest that ensemble ML methods can identify authentic credit information for predicting CDS spreads.

The high cross-sectional and longitudinal precision of ensemble ML techniques suggests that the nonlinear relationship between the firm and economic variables and CDS spreads can be applied to other firms and also to the future. The corresponding generalizable relationship allows us to construct valid “shadow” CDS spreads for those companies without actual CDS, but with the firm and economic variables. We show that the constructed “shadow” CDS spreads can capture the main changing direction of the spreads, but are much less volatile. We are also able to predict future CDS spreads for those firms with CDS.

The remaining sections of the paper are organized as follows: section 2 discusses the relevant literature, section 3 introduces the sample and input variables, section 4 provides the discussion on methodology and empirical contexts, section 5 presents the results and provides the “shadow” CDS spreads that we have constructed for those firms that do not have “real” CDS, and section 6 provides LIME analysis on understanding why the nonlinear ensemble methods outperform the linear benchmark. In section 7, we design a specific case study using non-crisis periods as training sets and crisis periods as test sets, and then we conclude.

II. Literature Review

The importance of using both accounting-based and market-based variables in the modeling of credit risk has been intensively discussed in the credit risk literature. The pioneering works of Altman (1968) and Ohlson (1980) used firm-specific financial ratios and other accounting variables to develop scores for predicting a firm’s default probability (Altman’s Z-score and Ohlson’s O-score).

The most widely recognized credit risk models in the field are based on market-based variables. Specifically, Merton (1974) developed a distance to default (DTD) measure based on market information, assuming that the fundamental value of a firm follows a certain stochastic process and computing the default probability from the level and volatility of the market value of its assets.

As shown in Jarrow & Turnbull (1995) and Duffie & Singleton (1999), reduced-form models or intensity-based models assume that default follows a process with stochastic intensity, and one can extract the default intensity from market securities. In such models, the conditional probability of failure of a firm depends purely on the distance to default, a variable calculated from market equity data and accounting data for liabilities. Empirically, the performance of these models is regarded as superior to that of Altman’s Z-score and Ohlson’s O-score (Hillegeist et al., 2004).

Although structural models and reduced-form models have received great recognition both in the industry and academia – for example, structural models have been adopted by firms such as Moody’s KMV and CreditMetrics – the overemphasis of these models on distance to default raises concern. Duffie and Lando (2001) show that if markets are not fully efficient, DTD might cause filtering problems and other variables could provide additional information.

Hillegeist et al. (2004) find that DTD outperforms accounting information in predicting default. Hillegeist et al. (2004) and Duffie et al. (2007) conclude that accounting-based and macroeconomic variables are relevant as well in predicting corporate failure. Specifically, Das et al. (2009) find that models using accounting-based data and models using market-based information perform similarly well in explaining CDS spreads. Bai and Wu (2016) combine DTD with multiple firm fundamentals and find that the fundamentals explain CDS spreads with an average R-squared of 77%.

In the literature on corporate default prediction, the firm’s failure intensity depends on the covariates, including firm-specific financial variables and macroeconomic variables. The prediction of the forward intensity for the next period is conditional on the covariates observed in the current period. Duffie & Wang (2004) and Duan et al. (2012) incorporate accounting-based, market-based and macroeconomic variables to predict corporate default. This paper is in accordance with Duffie & Wang (2004) and Duan et al. (2012).

The application of machine learning methods in credit risk analysis and financial time series prediction has been pursued as separate strands of research in prior studies. For the credit market, most of the relevant work focuses on credit rating analysis. Huang et al. (2004) suggest that rating analyses using artificial intelligence techniques choose input variables following the conclusions of traditional credit risk analysis. Jones et al. (2015) compare a range of classifiers, from traditional techniques to fully non-linear classifiers including neural networks, support vector machines and more recent statistical learning techniques such as generalized boosting, AdaBoost and random forest, to predict rating changes, using financial, market and macroeconomic variables as inputs. They find that the new classifiers perform better than all other classifiers on both cross-sectional and longitudinal test samples.

Relatedly, research predicting financial time series has concentrated on the stock market and achieved relatively accurate prediction results. Relevant studies have used technical input variables and fundamental variables to predict stock returns (Chan et al., 1993; Cavalcante, 2016) or volatility (Christiansen et al., 2012; Mittnik et al., 2015). In contrast, studies predicting CDS spreads have only used historical spreads. Gündüz and Uhrig-Homburg (2011) analyze the ability of CDS spreads to predict future CDS spreads using both traditional credit risk models and support vector machine regression. Son et al. (2016) expand Gündüz and Uhrig-Homburg’s work by introducing more modeling methods with additional maturities.

In this paper, we conduct the prediction of CDS spreads using fundamental variables and compare the results with traditional benchmark models, and we fill the gap between two strands of the literature, the credit risk literature and the machine learning literature.

III. Pricing CDS Spreads

In this section, we motivate our nowcast and forecast with a forward default intensity model. We model the pricing of the CDS spreads following Das et al. (2009) and Duan et al. (2012). We model the default of a firm as an intensity process, $\lambda_t$, so that the probability of surviving from starting time $t=0$ to default time $t=\tau$ is $s_\tau=\exp\left(-\int_0^\tau \lambda_t\,dt\right)$. In the model, the forward intensity $\lambda_t$ depends on the firm and economic variables observed at time $t$ ($X_t$) or beforehand ($X_{t-i}$, $i>0$), and is of exponential affine form,

$$\lambda_t=\exp\left[B_{t-i}'X_{t-i}\right],\quad i\ge 0,$$

where $B_{t-i}=[\beta_{0(t-i)},\dots,\beta_{k(t-i)}]'$ is a vector of coefficients, and $X_{t-i}=[1,X_{1(t-i)},\dots,X_{k(t-i)}]'$ is a vector of economic variables including accounting-based and market-based firm-level variables and macroeconomic variables. We assume that, conditional on the given vector of economic variables $X_{t-i}$, the forward default intensity is constant, $E(\lambda_t\mid X_{t-i})=\lambda$.

A CDS enables market participants to shift the default risk on the firm from an insurance buyer to an insurance seller. The buyer pays a premium to guarantee future potential protection. Hence the premium and the protection legs jointly determine the CDS spread. The premium leg represents the expected present value of the premium payments from the insurance buyer to the seller, while the protection leg represents the expected present value of the default loss payment from the seller to the buyer. A fairly priced CDS equates the premium leg and the protection leg.

The premium leg is

$$E\left[\int_0^T D_t\,s_t\,CS\,dt\right], \tag{1}$$

and the protection leg is

$$E\left[\int_0^T D_t\,s_t\,\lambda_t\,(1-\phi)\,dt\right], \tag{2}$$

where $T$ is the maturity of the CDS and $CS$ is the CDS spread. $D_t=\exp\left(-\int_0^t r_s\,ds\right)$ is the discount factor at time $t$, where $r_t$ is the interest rate at time $t$. $s_t=\exp\left(-\int_0^t \lambda_s\,ds\right)$ is the probability that the firm survives until time $t$, $\lambda_t$ is the intensity with which the firm defaults at $t$, and $\phi$ is the constant recovery rate.

Assume that the maturity $T$ can be equally divided into $n$ intervals, where $\Delta t$ is the length of the interval between time $t-1$ and $t$. The intervals are denoted by $j=1,2,\dots,n$. Note that conditional on the given vector of economic variables $X_{t-i}$, the forward default intensity is a constant. Hence

$$\lambda=\lambda_j=\exp\left[B_{t-i}'X_{t-i}\right],\quad i\ge 0,\ j=1,2,\dots,n.$$

Equating the premium leg and the protection leg under conditionally constant intensity leads to the fairly priced CDS spread³:

$$CS=\frac{(1-\phi)\left(1-e^{-\lambda\Delta t}\right)}{\Delta t}. \tag{3}$$
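As a numerical illustration of equation (3), the sketch below checks that the spread it implies equates a discretized premium leg and protection leg. The payment-timing convention, the flat interest rate `r`, and all function names are assumptions made for this example only; they are not taken from the paper's derivation.

```python
import math


def fair_spread(lam, recovery, dt):
    """Fair spread from equation (3): (1 - phi) * (1 - exp(-lam * dt)) / dt."""
    default_prob = -math.expm1(-lam * dt)  # 1 - exp(-lam * dt), computed stably
    return (1.0 - recovery) * default_prob / dt


def discretized_legs(lam, recovery, dt, n, r, spread):
    """Premium and protection legs on an n-interval grid.

    Assumed convention (illustrative only): the premium for interval j is paid
    at the end of the interval if the firm survived to its start, and a default
    during interval j is also settled at the end of the interval.
    """
    premium = 0.0
    protection = 0.0
    for j in range(1, n + 1):
        discount = math.exp(-r * j * dt)                  # D_t with a flat rate r
        survive_to_start = math.exp(-lam * (j - 1) * dt)  # s_t at interval start
        default_in_interval = survive_to_start * -math.expm1(-lam * dt)
        premium += discount * survive_to_start * spread * dt
        protection += discount * default_in_interval * (1.0 - recovery)
    return premium, protection


# Hypothetical inputs: 2% intensity, 40% recovery, monthly grid, 5-year maturity.
cs = fair_spread(lam=0.02, recovery=0.4, dt=1 / 12)
prem, prot = discretized_legs(lam=0.02, recovery=0.4, dt=1 / 12, n=60, r=0.03, spread=cs)
print(f"spread = {cs * 1e4:.2f} bps, premium - protection = {prem - prot:.2e}")
```

Under this convention the two legs match term by term, so the equality does not depend on the interest rate or on the maturity, which is why neither appears in equation (3).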

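Taking logarithms and using the fact that $\lambda=\exp(B_{t-i}'X_{t-i})$ leads to an approximately linear relationship between $\log CS$ and the firm and economic variables. For small $\lambda\Delta t$, $1-e^{-\lambda\Delta t}\approx\lambda\Delta t$, so

$$\log CS=\log\left(\frac{1-\phi}{\Delta t}\right)+\log\left(1-e^{-\lambda\Delta t}\right)\approx\log\left(\frac{1-\phi}{\Delta t}\right)+\log(\lambda\Delta t)=\log\left(\frac{1-\phi}{\Delta t}\right)+B_{t-i}'X_{t-i}+\log\Delta t.$$

Namely,

$$\log CS\approx\text{constant}+B_{t-i}'X_{t-i},\quad i\ge 0.$$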
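The size of the approximation error in the log-linear form can be checked numerically. The sketch below compares the exact log spread with its linearized counterpart for a few hypothetical intensities on a monthly grid; the parameter values and function names are illustrative assumptions.

```python
import math


def log_spread_exact(lam, recovery, dt):
    """log CS = log((1 - phi) / dt) + log(1 - exp(-lam * dt))."""
    return math.log((1.0 - recovery) / dt) + math.log(-math.expm1(-lam * dt))


def log_spread_linear(lam, recovery):
    """Linearized form: log((1 - phi) / dt) + log(lam * dt) = log(1 - phi) + log(lam)."""
    return math.log(1.0 - recovery) + math.log(lam)


# Hypothetical annual intensities; dt = 1/12 is a monthly interval.
for lam in (0.005, 0.02, 0.10):
    gap = log_spread_exact(lam, 0.4, 1 / 12) - log_spread_linear(lam, 0.4)
    print(f"lambda = {lam:.3f}: exact - linear = {gap:.6f}")
```

The gap equals $\log\left[(1-e^{-x})/x\right]$ with $x=\lambda\Delta t$, which is roughly $-x/2$ for small $x$, so the linear form is accurate whenever the default intensity over one interval is small.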

In comparison, for Machine Learning methods, we assume a flexible relationship between economic variables and $\log CS$,

$$\log CS\approx f(X_{t-i}),\quad i\ge 0, \tag{4}$$

where the functional form is determined by Machine Learning methods. We conduct nowcasting ($i=0$) when generating shadow CDS spreads cross-sectionally and forecasting ($i>0$) in longitudinal analysis.

IV. Data, Empirical Context, and Methods

4.1 Sampling

We utilize the CDS contracts data obtained from MARKIT. Our sample is based on the CDS constituents in the CDX North American Investment Grade Index, which includes the CDSs of the 125 most liquid North American entities with investment-grade credit ratings. We focus on the most liquid CDSs because they have the most fairly priced and informative spreads in the North American CDS market. We collect the 5-year CDS spreads of the constituents at the end of each month over the period 2006 to 2016⁴. After merging the sample with the WRDS Monthly Finance Ratio database, Compustat, CRSP daily stock file database, and IBES analyst database, 69 entities remain in our sample with 6811 corresponding monthly CDS spreads.

4.2 Input variables

We collect firm-level accounting-based and market-based variables, analyst forecasts, financial markets, and macro-economic variables, details of which are presented in Table 1.

Accounting-based variables

We use the monthly financial indicators from the WRDS Industry Financial Ratio database (WIFR). WIFR is developed by WRDS based on the Compustat, CRSP, and IBES databases, covering a wide range of most commonly used financial ratios. The ratios measure various aspects of firms’ fundamental performance, including capitalization, efficiency, financial solvency, liquidity, profitability, and valuation. The WIFR carries forward the most recent quarterly or annual data and lags all variables by two months to guarantee that the data is available at the specific month. After removing variables with more than 10% empty values, 57 financial ratios remain in our sample and are described in Table 1. Following the previous credit risk literature (Hensher et al., 2007; Jorion et al., 2009; Ashbaugh et al., 2006; Jones, 2015), we expect that these variables measure the overall performance of a firm and have predictive power over CDS spreads.
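The 10% missing-value screen described above can be sketched as follows. The column names and the toy table are hypothetical, and the representation of a dataset as a dictionary of equally long lists (with `None` for an empty value) is an assumption made to keep the example self-contained.

```python
def drop_sparse_columns(table, max_missing_share=0.10):
    """Keep only columns whose share of missing (None) values is at most the threshold.

    `table` maps a column name to a list of observations.
    """
    kept = {}
    for name, values in table.items():
        if not values:
            continue  # an empty column carries no information
        missing = sum(value is None for value in values)
        if missing / len(values) <= max_missing_share:
            kept[name] = values
    return kept


# Hypothetical financial ratios for ten firm-months.
ratios = {
    "roa": [0.10, 0.20, None, 0.30, 0.10, 0.20, 0.40, 0.10, 0.20, 0.30],
    "sparse_ratio": [None, None, 0.5, 0.4, 0.3, 0.2, 0.1, 0.6, 0.7, 0.8],
}
print(sorted(drop_sparse_columns(ratios)))
```

A column with exactly 10% missing values is kept here, since the text removes only variables with more than 10% empty values.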

Market-based variables

A. Equity market variables

We include several equity market variables, including the stock return, realized volatility, the change of realized volatility, and the trading volume, to measure a firm’s performance on the stock market. We also include variables measuring general stock market performance, including the S&P 500 return, VIX (CBOE Volatility Index), Fama-French four factors, and Pastor-Stambaugh liquidity factors. The equity market reflects the market perception of general firm performance besides credit risk. Griffin and Lemmon (2002) find that firms’ credit risk is cross-sectionally priced on the stock market. Tang and Yan (2010) and Lee et al. (2019) find evidence that changes in stock return and volatility are correlated with CDS spreads. Consistent with this literature, we expect that the equity market variables have some predictive power over CDS spreads.

B. Analysts’ recommendations and estimates

We also follow Jones et al. (2015) to include equity analysts’ recommendations and estimates as input variables. The recommendations and estimates are based on analysts’ thorough investigation of a firm, and hence should reflect the firm’s financial performance and credit quality.

C. Interest rates, spreads and risk factors

This category captures the interest rate dimension following Welch and Goyal (2008), namely, the T-Bill rate, relative T-Bill rate, long-term bond return, term spread, and default spread. The TED spread, which measures the illiquidity of the bond market, is also included. Duffee (1998), Collin-Dufresne et al. (2001) and Bharath and Shumway (2008) find that changes in interest rates negatively affect the changes in default risk. Moreover, since the underlying reference obligations of CDSs are senior unsecured bonds issued by corporations, the spreads on the bond market could influence the pricing of CDS spreads. Hence, we expect the interest rates and spreads to have predictive power over CDS spreads.

D. Distance to default (DTD)

We further use the market-based credit measure distance to default (DTD) to measure the probability of default⁵. DTD is the most frequently used market-based credit risk measure developed by Merton (1974). Bharath and Shumway (2008) find that DTD has predictive power on financial distress and default. Das et al. (2009) find evidence that DTD and financial ratios perform comparably in explaining CDS spreads. Thus, we also expect DTD to have predictive power over CDS spreads.

Macroeconomic variables

We use a range of monthly updated macroeconomic indicators, including the inflation rate, industrial production, housing starts, M1 growth, orders, return CRB spot, consumer confidence, and others to measure the overall economic condition. The macroeconomic variables are commonly used in default and rating change prediction (Duan et al., 2012; Jones et al., 2015). Bonfim (2009) finds evidence that macroeconomic variables explain default probabilities. Thus, we expect macroeconomic variables to play a role in CDS spread forecasting.

Other variables

We further include the firm size proxy, industry dummies, credit rating, and CDS recovery as input variables. The firm size and industry dummies are commonly used as controls in credit risk research (Moody’s, 2004; Bonfim, 2009). The rating is the long-term credit rating assigned to the entity by S&P, Moody’s, or Fitch. Recovery rates are pre-populated based on the recovery rate set. We use the credit rating and CDS recovery rates reported by MARKIT.

4.3 Machine Learning Methods

In addition to the widely used linear regression methods, there are a series of parametric and nonparametric machine learning approaches, which are well established in the literature. In this paper, we compare the theory-motivated linear regression with two parametric machine learning methods (Ridge and LASSO) and six nonparametric learning methods (Support Vector Regression, Neural Network, Regression Tree, Bagging, Random Forest and Gradient Boosting). Among the nonparametric learning methods, Support Vector Regression, Neural Network, and Regression Tree are single methods, while Gradient Boosting, Bagging, and Random Forest are ensemble methods. We briefly introduce the methods in Appendix A.

4.4 Empirical Context

We focus on the out-of-sample predictive power of the accounting-based and market-based variables on CDS spreads using linear regression and machine learning methods, motivated by the reduced-form forward intensity model. To fairly compare these methods, all of the models are estimated using the same set of input variables within the same dataset. To test the out-of-sample predictive performance, we divide the original dataset into an in-sample training set and an out-of-sample test set. The methods are estimated on the in-sample set to determine their respective parameters and evaluated on the out-of-sample set. We follow Espinoza et al. (2012) to evaluate the predictive performance using root-mean-square error (RMSE), a frequently used measure that captures the difference between the predicted and observed values. A smaller RMSE indicates better predictive performance. We split our CDS sample both cross-sectionally and longitudinally.
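For reference, the RMSE between observed and predicted values can be computed as in the minimal sketch below; the function name and the toy inputs are illustrative.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between two equally long sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical log spreads: observed versus predicted.
print(rmse([-5.0, -4.8, -5.2], [-5.1, -4.9, -5.0]))
```

Because the target here is the log spread, an RMSE of 0.4 corresponds to a typical multiplicative error of about exp(0.4), roughly 1.5 times, in the spread itself.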

In the cross-sectional case, we conduct nowcasting and evaluate whether our approach can provide precise CDS spread predictions cross-sectionally and hence potentially generate effective shadow CDS spreads for the firms without CDS. We follow the 80/20 sample division arrangement to randomly allocate 80% of the firms into the in-sample set and the remaining 20% into the out-of-sample set. The division is replicated ten times to avoid biased allocation. For each replicate, we generate 1000 bootstrapped RMSEs and then calculate the average out-of-sample RMSEs across the ten replicates to measure the predictive power of the methods.
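The cross-sectional design can be sketched as a split at the firm level followed by a bootstrap of the test-set errors. The paper does not spell out how the bootstrap draws are formed, so resampling test observations with replacement is an assumption here, as are the function names and the rounding rule for the training share.

```python
import math
import random


def split_firms(firm_ids, train_share=0.8, seed=0):
    """Randomly allocate whole firms to the training or the test set."""
    rng = random.Random(seed)
    firms = sorted(set(firm_ids))
    rng.shuffle(firms)
    n_train = math.ceil(train_share * len(firms))  # rounding up is an assumption
    return set(firms[:n_train]), set(firms[n_train:])


def bootstrap_rmse(actual, predicted, n_boot=1000, seed=0):
    """RMSEs computed on test observations resampled with replacement."""
    rng = random.Random(seed)
    n = len(actual)
    draws = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        mse = sum((actual[i] - predicted[i]) ** 2 for i in idx) / n
        draws.append(math.sqrt(mse))
    return draws


# Ten replications of the firm-level split for 69 hypothetical firm identifiers.
for replicate in range(10):
    train_firms, test_firms = split_firms(range(69), seed=replicate)
    assert not train_firms & test_firms  # no firm appears on both sides
print(len(train_firms), len(test_firms))
```

Splitting by firm, instead of by observation, keeps every month of a test firm out of the training data, which is what the shadow-spread use case requires.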

In the longitudinal case, we generate one-month-ahead forecasts and test the intertemporal predictive ability of the model-motivated linear regression and the machine learning methods on CDS spreads⁶. We separate the in-sample training set from the out-of-sample test set with a boundary year rolling from 2011 to 2016. To mimic the actual data available at the end of each boundary year, we include the observations before the year in the training set and those within the year in the test set, and discard the remaining observations. Such a longitudinal arrangement can provide the intertemporal validation that is missing in the cross-sectional case, in which the test set is drawn from the same sample period as the training set (see Jones and Hensher, 2004). The rolling windows generated by rolling boundaries can also avoid biased allocation and hence provide an adequate test of a model’s intertemporal predictive ability. Similar to the cross-sectional case, we produce 1000 bootstrapped RMSEs for each window and calculate the average out-of-sample RMSEs across all rolling windows to measure the methods’ predictive performance.
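The rolling-boundary arrangement can be sketched as a generator that, for each boundary year, places earlier observations in the training set, observations within the year in the test set, and drops later ones. The record layout `(year, payload)` and the function name are assumptions for illustration.

```python
def rolling_year_splits(records, first_year=2011, last_year=2016):
    """Yield (boundary_year, train, test) for each rolling boundary year.

    `records` is a list of (year, payload) tuples. Observations after the
    boundary year are discarded for that window.
    """
    for boundary in range(first_year, last_year + 1):
        train = [rec for rec in records if rec[0] < boundary]
        test = [rec for rec in records if rec[0] == boundary]
        yield boundary, train, test


# Hypothetical monthly records for 2006 to 2016.
monthly = [(year, month) for year in range(2006, 2017) for month in range(1, 13)]
for boundary, train, test in rolling_year_splits(monthly):
    print(boundary, len(train), len(test))
```

The training window expands by one year with each boundary, so later windows are estimated on more data than earlier ones.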

Finally, the hyperparameters of a model might strongly influence its predictive performance, as well as the degree to which the model overfits the data. Overfitting means that the model fits well in the in-sample training set but performs poorly in the out-of-sample test set. We use a standard 10-fold cross-validation method with RMSE as the loss function to tune the hyperparameters of the models and to mitigate the overfitting problem.⁷
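As a simplified illustration of hyperparameter tuning by 10-fold cross-validation with an RMSE loss, the sketch below selects the penalty of a one-variable ridge regression without intercept. The model, the penalty grid, and all names are stand-ins chosen to keep the example self-contained; they are not the models or grids used in the paper.

```python
import math
import random


def fit_ridge_1d(xs, ys, alpha):
    """Closed-form ridge slope for one regressor and no intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)


def kfold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1 and deal the indices into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[fold::k] for fold in range(k)]


def cv_rmse(xs, ys, alpha, k=10, seed=0):
    """Average held-out RMSE over k folds for a given penalty alpha."""
    scores = []
    for held_out in kfold_indices(len(xs), k, seed):
        held = set(held_out)
        train_x = [xs[i] for i in range(len(xs)) if i not in held]
        train_y = [ys[i] for i in range(len(ys)) if i not in held]
        beta = fit_ridge_1d(train_x, train_y, alpha)
        mse = sum((ys[i] - beta * xs[i]) ** 2 for i in held_out) / len(held_out)
        scores.append(math.sqrt(mse))
    return sum(scores) / len(scores)


def select_alpha(xs, ys, grid, k=10, seed=0):
    """Return the penalty in `grid` with the lowest cross-validated RMSE."""
    return min(grid, key=lambda alpha: cv_rmse(xs, ys, alpha, k, seed))


# Hypothetical noisy data around a slope of 2.
rng = random.Random(42)
xs = [i / 10 for i in range(1, 61)]
ys = [2.0 * x + rng.gauss(0.0, 0.5) for x in xs]
print(select_alpha(xs, ys, grid=[0.0, 0.1, 1.0, 10.0, 100.0]))
```

The same loop applies to any estimator with a fit and a predict step: only `fit_ridge_1d` and the prediction line would change.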

V. Results

This section describes the empirical performance of model-motivated linear regressions and alternative machine learning methods. We first provide the descriptive details of the sample. Among all the 6811 log CDS spreads in our sample, the average log spread is -5.026, namely 65 bps in original spreads. Figure 1 demonstrates that our sample covers a wide range of spreads. The log spreads range from -7.67 to -2.43, which is 4.6 bps to 880.3 bps for spreads. Our sample is representative since 94.8% of the spreads for all CDX indices constituents fall into our sample range, including 99.7% of the investment-grade CDX.NA.IG constituents, 93.9% of the CDX.NA.XO⁸ constituents, and 87.9% of the high yield CDX.NA.HY constituents.
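The conversion between log spreads and basis points used above can be reproduced as follows, assuming natural logarithms of spreads quoted in decimal units (an assumption that is consistent with -5.026 mapping to roughly 65 bps). The function name is illustrative.

```python
import math


def log_spread_to_bps(log_spread):
    """Convert a natural-log spread (decimal units) to basis points."""
    return math.exp(log_spread) * 1e4


for value in (-5.026, -7.67, -2.43):
    print(f"log spread {value:+.3f} -> {log_spread_to_bps(value):.1f} bps")
```

Small differences from the figures quoted in the text presumably reflect rounding of the reported log values. Note also that the exponential of an average log spread is a geometric mean, which is generally lower than the arithmetic mean of the spreads.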

Figure 1: The Distribution of the Log of Five-Year CDS Spreads

Citation: IMF Working Papers 2019, 292; 10.5089/9781513524085.001.A001

Figure 2 displays the box plots for the bootstrapped RMSE of each model across cross-sectional and longitudinal test samples. The box plots provide insights into the predictive stability of each method over different data subsamples. For the cross-sectional case, the box plots show bootstrapped RMSEs calculated from the ten randomly selected test samples; for the longitudinal case, the bootstrapped RMSEs for all rolling boundary years are displayed. The extreme RMSEs are shown as outliers in the box plots. We consider the methods with more outlier RMSEs and more substantial variance as less stable.

Figure 2: The Bootstrapped RMSEs in Cross-sectional and Longitudinal Test Samples¹⁰


Table 2 summarizes the average overall RMSE and ranking of each method across all test samples, as well as the average RMSE and ranking of each method over the cross-sectional and longitudinal test samples separately. Table 3 reports the variances and ranking of the RMSE across all test samples and separately for the cross-sectional and longitudinal test samples.

Table 2: The Average RMSEs in Cross-sectional and Longitudinal Test Samples

Table 3: The RMSE Variances in Cross-sectional and Longitudinal Test Samples

The overall results displayed in Table 2 indicate that ensemble machine learning models, including Random Forest, Bagging, and Gradient Boosting, outperform all other methods, both in cross-sectional and longitudinal samples. Gradient Boosting and Bagging have overall average RMSEs of 0.397 and 0.413, with the former performing slightly better than the latter. The overall RMSE of Random Forest is around 0.454, and the Regression Tree follows with 0.554. Support vector regression and LASSO regression are noticeably less accurate, with RMSEs above 0.7. Ridge regression performs worse still, with an RMSE of 1.818. Among all the methods, theory-motivated linear regression is the weakest over the whole sample, with a very large RMSE of 3.433. The neural network is slightly better, with an RMSE of 3.167.

An interesting result presented in Figure 2 is that inflexible methods, including Linear and Ridge regression, can forecast about as well as ensemble machine learning methods in some cases but quite poorly in others. These methods have more outlier RMSEs and a wider range of RMSEs.

Table 3 confirms that Linear regression is the most unstable method with an overall variance of 22.51, followed by Ridge regression (16.787) and the neural network (3.450). In comparison, ensemble methods, including Gradient Boosting, Bagging, and Random Forest, provide forecasts with very low variance (0.006, 0.007 and 0.010), indicating that their predictive performance is remarkably stable across different subsamples. Though support vector regression does not provide very accurate predictions (RMSE=0.701, rank 5), it has the lowest RMSE variance among all the methods.⁹

5.1 Cross-sectional sample

In the cross-sectional sample, among the 69 firms, we randomly select 56 firms as the training set and 13 firms as the test set, repeat the selection ten times, and calculate the average RMSEs. Table 2 summarizes the overall average performance of the methods across the cross-sectional test samples, and the performance ranking is consistent with the ranking for average RMSE across both cross-sectional and longitudinal samples. Results from Table 2 indicate that the ensemble machine learning methods, including Random Forest, Bagging, and Gradient Boosting, provide the most accurate nowcasting predictions, with Gradient Boosting outperforming all other methods with an average RMSE of 0.401.

Linear methods, namely OLS, Ridge, and LASSO, perform worse than the ensemble machine learning methods. Both Ridge and LASSO outperform OLS. The predictive accuracy of OLS is substantially worse than that of all the other methods used.

Ensemble methods combine a range of weak estimators to produce a strong one. The weak estimators are estimated on multiple subsamples extracted from the original dataset, and the final prediction is a weighted average of the predictions generated by all the weak estimators (see the differences among Random Forest, Bagging, and Gradient Boosting in Appendix A). Hence ensemble methods are much more stable across different training and test set pairs.
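To make the idea of averaging weak estimators concrete, the following is a minimal sketch of bootstrap aggregation (Bagging) with equal weights. It is not the implementation used in the paper: the weak learner here is a hypothetical one-split regression stump on a single feature, and the toy data, function names, and number of estimators are assumptions chosen for illustration.

```python
import random

def fit_stump(xs, ys):
    """Fit a one-split regression stump on a single feature by least squares."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # exclude the max so both sides are non-empty
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # all x values identical: fall back to the mean
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def fit_bagging(xs, ys, n_estimators=25, seed=0):
    """Fit stumps on bootstrap resamples and average their predictions."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(stump(x) for stump in stumps) / len(stumps)

# Toy step-function data (illustrative only).
xs = [float(i) for i in range(10)]
ys = [1.0 if x < 5 else 3.0 for x in xs]
predict = fit_bagging(xs, ys)
```

Each stump sees a slightly different resample, so individual fits vary, but their average changes little from one training set to another, which is the stability property noted above.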

Figure 3 shows the shadow spreads generated by out-of-sample nowcasting, using the Omnicom Group as an illustrative example. We compare the nowcasting results with the original data series (green dotted line). The Gradient Boosting method (orange dashed line) achieves the best accuracy, with the lowest RMSE (0.401), and the Linear Regression Model serves as our comparison benchmark. The Gradient Boosting method fits the actual data points much better than the benchmark model.

Figure 3:

Nowcasting Shadow Spreads, an Illustrative Example on Omnicom Group

Citation: IMF Working Papers 2019, 292; 10.5089/9781513524085.001.A001

While our Gradient Boosting “shadow” CDS spreads achieve relatively high accuracy, they cannot fully describe the movement of actual spreads. Tang and Yan (2017) find that, besides fundamental factors, supply-demand imbalances and liquidity in the CDS market also move CDS spreads. Since our “shadow” CDS spreads are generated from a wide range of fundamentals and equity and bond market factors, they capture only the fundamental and “external” part of actual CDS spreads.

The solid lines indicate the average CDS spreads of all firms, and the shaded area around each solid line describes the interval of individual spreads.

Our shadow spreads generated by Gradient Boosting can also play an essential role for firms with existing CDS, by filling in the months in which spreads are missing. As derivatives, CDS do not necessarily have continuous spreads for every month. Figure 4 summarizes the actual spreads in the months where they exist for all firms, and the corresponding shadow spreads in the missing months, using the actual spreads and input variables as the training set. Our shadow CDS spreads generated by Gradient Boosting capture the main direction of movement of actual spreads while being less volatile, consistent with their fundamentals-based nature.

Figure 4:

Aggregate Shadow Spreads for firms

5.2 Longitudinal sample

Joy and Tollefson (1975) note that a test set created from the same period as the training/estimation set does not provide intertemporal validation, and thus cannot provide an adequate test of a model’s predictive ability (see also Jones and Hensher, 2004).

To address this concern, in this subsection we use longitudinal samples on rolling windows, with the test year rolling from 2011 to 2016. While the cross-sectional sample separates different firms into training and test sets, the longitudinal sample has all firms in both the training and test sets, and the division is by time. Taking the year 2011 as an example, we allocate the observations for all firms before 2011 to the training set and the observations in 2011 to the test set, and drop the data points after 2011. The same procedure is applied for each year from 2011 to 2016.
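The rolling-window division by time can be sketched as below. This is a minimal illustration under stated assumptions: the record layout (a dict with `firm` and `month` keys), the function name, and the two-firm toy panel are hypothetical and not taken from the paper.

```python
from datetime import date

def rolling_year_splits(observations, first_test_year=2011, last_test_year=2016):
    """Yield (test_year, train, test): train is strictly before the test year,
    test is the test year itself, and later observations are dropped."""
    for test_year in range(first_test_year, last_test_year + 1):
        train = [o for o in observations if o["month"].year < test_year]
        test = [o for o in observations if o["month"].year == test_year]
        yield test_year, train, test

# Toy panel: two hypothetical firms with one observation per year, 2006 to 2016.
panel = [
    {"firm": firm, "month": date(year, 1, 1)}
    for firm in ("firm_a", "firm_b")
    for year in range(2006, 2017)
]
splits = {
    year: (len(train), len(test))
    for year, train, test in rolling_year_splits(panel)
}
```

Every firm appears in both sets of each split, so the test measures forecasting over time rather than generalization to unseen firms.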

Table 4 reports the performance of the methods in the longitudinal test samples. Compared with Machine Learning models, especially Random Forest, Bagging, and Gradient Boosting, linear models still perform relatively worse but deliver much better results than in the cross-sectional samples. On average, the Bagging and Gradient Boosting methods achieve the best accuracy, with the lowest RMSEs (0.391 for Bagging and 0.393 for Gradient Boosting on average; see Table 2).

Table 4:

RMSEs in Longitudinal Test Samples

To summarize, Table 4 shows that unlike the cross-sectional case, Ridge and Lasso regression perform comparably well to other Machine Learning methods on average for longitudinal samples. However, linear regression still suffers from extreme RMSE outliers.

VI. Channels

After testing the cross-sectional and longitudinal samples, we apply the LIME module to different methods. LIME is short for “Local Interpretable Model-Agnostic Explanations.” The LIME module has two major advantages: (1) it is able to detect and improve untrustworthy models; and (2) it allows insights into different models. In this section, we use LIME to provide a locally faithful explanation for linear and non-linear algorithms.11

We first select the top 50 most important observations. To obtain a representative explanation of the overall dataset, we select 50 “true” observations using the submodular pick method (Ribeiro, Singh, Guestrin, 2016).12 The advantage of the submodular pick is that it explains the model globally by combining local explanations; that is, it selects the observations that give the most different input variable importance, in order to capture the heterogeneity in the raw data set.

Among these 50 most important observations, for each selected observation y_i, we first generate perturbed input variables values X_i_perturbed(n*k) around y_i. For variables with numerical values, we perturb them by sampling from a standard normal distribution and implementing the inverse operation of mean-centering and scaling, according to the means and standard deviations in the training data. For variables with categorical values, we perturb them by sampling according to the training distribution and construct a binary variable that is 1 when the value is the same as the instance being explained. Then we calculate the prediction of the trained algorithm using the perturbed variables values, y_i_predict(n*1).

With the newly created local dataset (X_i_perturbed(n*k), y_i_predict(n*1)), we use weighted Ridge regression to find the top 10 most important variables. The weight for each perturbed observation is the kernel distance of the observation to the true observation around which we build the local dataset. The top 10 variables are selected using the highest weights, namely, selecting the top 10 input variables that have the highest product of absolute coefficient and the variable value of the original data point.
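The perturb, predict, weight, and fit steps described above can be sketched for numerical variables as follows. This is a simplified illustration rather than the LIME package or the authors' code: the function names, the exponential kernel with width 0.75·√k, the Ridge penalty of 1.0, the small linear solver, and the toy black-box model are all assumptions made for the example, and categorical variables are omitted.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def explain_locally(predict, instance, means, stds, n_top=10, n_samples=500,
                    ridge=1.0, kernel_width=None, seed=0):
    """Fit a kernel-weighted Ridge surrogate around `instance` and return the
    n_top (variable index, coefficient) pairs ranked by |coefficient * value|."""
    rng = random.Random(seed)
    k = len(instance)
    width = kernel_width or 0.75 * math.sqrt(k)  # assumed default bandwidth
    z0 = [(instance[j] - means[j]) / stds[j] for j in range(k)]  # standardized instance
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]        # draw from a standard normal
        x = [z[j] * stds[j] + means[j] for j in range(k)]  # undo centering and scaling
        dist_sq = sum((z[j] - z0[j]) ** 2 for j in range(k))
        rows.append([1.0] + z)                             # intercept plus scaled variables
        targets.append(predict(x))                         # prediction of the trained model
        weights.append(math.exp(-dist_sq / width ** 2))    # closer samples weigh more
    # Weighted Ridge normal equations; the intercept is left unpenalized.
    p = k + 1
    lhs = [[sum(w * r[a] * r[b] for r, w in zip(rows, weights)) for b in range(p)]
           for a in range(p)]
    for j in range(1, p):
        lhs[j][j] += ridge
    rhs = [sum(w * r[a] * t for r, w, t in zip(rows, weights, targets)) for a in range(p)]
    beta = solve(lhs, rhs)
    ranked = sorted(range(k), key=lambda j: abs(beta[j + 1] * z0[j]), reverse=True)
    return [(j, beta[j + 1]) for j in ranked[:n_top]]

# Toy black box: depends strongly on variable 0, less on variable 1, not on variable 2.
black_box = lambda x: 3.0 * x[0] - 1.0 * x[1] + 0.0 * x[2]
top = explain_locally(black_box, instance=[3.0, 7.0, 1.5],
                      means=[2.0, 5.0, 1.0], stds=[1.0, 2.0, 0.5], n_top=2)
```

The coefficients are expressed per standard deviation of each variable, because the surrogate is fitted on the standardized perturbations.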

We conduct the above local interpretations for each observation and aggregate all the interpretations. Each estimated local weighted Ridge provides the top 10 most important input variables and their coefficients. After aggregating our explanations, we build up two measures:

  • 1. Importance probability: it measures how often a variable appears among the top 10 most important variables across the 50 selected observations. For example, for linear regression, “Long-term Debt Spreads” is among the top 10 most important variables for all 50 observations, so its probability is 50/50 = 100%. “Distance to default” is less important in the context of linear regression: only 5 of the 50 observations rank it among the top 10, so its importance probability is 5/50 = 10%.

  • 2. Coefficient: the coefficient for a variable is its average coefficient across the local weighted Ridge regressions of the 50 observations.
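The two measures above can be computed from the local explanations as in the following sketch. The function name and input layout are hypothetical, the coefficient values in the toy data are made up purely for illustration, and averaging each coefficient only over the observations in which the variable enters the top 10 is an assumption, since the text does not specify how absent variables are treated.

```python
from collections import defaultdict

def aggregate_explanations(explanations):
    """Summarize local explanations.

    explanations: one dict per explained observation, mapping each of its
    top-ranked variable names to its local Ridge coefficient.
    Returns {variable: (importance_probability, mean_coefficient)}.
    """
    coefficients = defaultdict(list)
    for local in explanations:
        for variable, coefficient in local.items():
            coefficients[variable].append(coefficient)
    n = len(explanations)
    return {
        variable: (len(values) / n, sum(values) / len(values))
        for variable, values in coefficients.items()
    }

# Toy input mirroring the example in the text: one variable appears in all 50
# local explanations, the other in only 5 of them (coefficients are invented).
explanations = []
for i in range(50):
    local = {"Long-term Debt Spreads": 0.5}
    if i < 5:
        local["Distance to default"] = -0.25
    explanations.append(local)
summary = aggregate_explanations(explanations)
```

With this input, the importance probabilities reproduce the 50/50 and 5/50 cases worked through in the first bullet.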

In the following, we mainly discuss the LIME results of the Linear Regression model (our benchmark model) and the Gradient Boosting model (our best-performing model) in the context of cross-sectional nowcasting.

For the linear regression model, we find that three variables have the most significant impact on prediction results13: (1) the three-month T-Bill rate; (2) the long-term bond return; and (3) the term spread. All three variables belong to the financial market information subset, which is in accordance with the theoretical model that we use in section 3. The variable importance probability matrix is reported in Table 5. We calculate the average importance probability across the 10 randomly selected training/test sets of firms in our cross-sectional sample.

Table 5:

Variables Importance Probability – Linear Regression Model

(Cross-sectional Nowcasting)

In Table 6 we report the variable importance probability matrix of the Gradient Boosting Model. Under the non-linear Gradient Boosting model, we find that the most important variables are macroeconomic variables and firm-specific balance sheet variables, namely (1) Unemployment Rate, (2) Credit Rating Category, (3) Size proxy, (4) Distance to Default, and (5) Inflation Rate. The main difference from the LIME results for the linear model is that balance sheet information (firm-dependent information) becomes more important in the non-linear model, which we believe is the key reason why the non-linear model outperforms the linear models in our setup.

Table 6:

Variables Importance Probability – Gradient Boosting Model

(Cross-sectional Nowcasting)

It is interesting to see that the “Unemployment Rate” and “Inflation Rate” appear among the top 5 variables, indicating the important role of the current state of the business cycle or, more broadly, the macroeconomic environment. The “Credit Rating Category” and “Distance to Default” are direct measures of the health of a specific firm, and it is not surprising that these two variables play an important role in our nowcasting and forecasting exercises.

It is, however, surprising that the Linear Regression Model does not seem to capture the explanatory power of “Credit Rating Category” and “Distance to Default”, separately or jointly; neither variable even enters the top 10 ranking list. These results seem to suggest that the nonlinear relationship between these two variables and CDS spreads could be the major reason why non-linear models (Gradient Boosting or Bagging) generate higher prediction accuracy than the linear benchmarks (including the Ridge and Lasso models).14

VII. Crisis vs. Non-Crisis Period

Scrutinizing Table 4, we see that when the most recent data (2015 or 2016) are used as the test set, there is not much of a difference between OLS and ML algorithms; the relatively large RMSE difference appears when 2011 or 2012 is used as the rolling window boundary. Since 2011 in particular is still very close to the global financial crisis period and was a time of considerable adjustment in both economic processes and financial systems, it seems useful to assess whether there is a considerable difference in the forecasting/nowcasting abilities of traditional linear models versus Machine Learning models when non-crisis and crisis times are used as training and test sets, respectively.

Building on our previous longitudinal sample with rolling windows, we conduct a specific case study by separating the observations of all firms into two groups: crisis and non-crisis. We use the observations within non-crisis periods (2006, 2013, 2014, 2015, 2016) as the training set and those within crisis periods (2008, 2009, 2010, 2011) as the test set. Since 2007 and 2012 are transition years between crisis and non-crisis periods, we exclude them from both the training and test sets to fully separate the two periods. Although this division breaks the chronological ordering of the data, our main goal in designing this test is to evaluate the linear versus ML algorithms when a structural break is present. Given the non-linearity introduced by the crisis as a structural break, we expect the ML algorithms, especially the ensemble methods, to perform relatively well in terms of forecasting.
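The crisis versus non-crisis division amounts to a split by calendar year, which can be sketched as below. The year sets come from the text; the function name, the record layout, and the single-firm toy data are assumptions for illustration.

```python
NON_CRISIS_YEARS = {2006, 2013, 2014, 2015, 2016}  # training set
CRISIS_YEARS = {2008, 2009, 2010, 2011}            # test set
# 2007 and 2012 are transition years and fall in neither set.

def crisis_split(observations):
    """Return (train, test): non-crisis years for training, crisis years for testing."""
    train = [o for o in observations if o["year"] in NON_CRISIS_YEARS]
    test = [o for o in observations if o["year"] in CRISIS_YEARS]
    return train, test

# Toy data: one observation per year for a single hypothetical firm.
observations = [{"firm": "firm_a", "year": year} for year in range(2006, 2017)]
train, test = crisis_split(observations)
dropped = [o["year"] for o in observations if o not in train and o not in test]
```

Unlike the rolling-window split, the training set here contains years both before and after the test years, so the exercise measures robustness to a structural break rather than forecasting in calendar order.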

Figure 5 displays box plots of the bootstrapped RMSEs of our test sample, which give a clear picture of the RMSEs across different methods. Table 7 summarizes the overall RMSEs and the corresponding ranking. Not surprisingly, the ensemble machine learning models, including Random Forest, Bagging, and Gradient Boosting, outperform all other methods.

Figure 5:

The RMSEs in Non-Crisis Training vs Crisis Test Sample

Table 7:

The Average RMSEs in Non-Crisis Training vs Crisis Test Sample

Table 8 summarizes the RMSEs of the two top-ranked methods (Bagging and Gradient Boosting) across all the test samples. As expected, the RMSE of the non-crisis training vs crisis test sample is much higher than those of the longitudinal test samples and of the longitudinal and cross-sectional test samples combined. It is not surprising that the non-linearity introduced by the crisis as a structural break drives the spike in RMSEs.

Table 8:

The Average RMSEs across All Test Samples (Bagging/Gradient Boosting)

We also apply the LIME module to the Bagging method (the first-ranked method for the non-crisis vs crisis test sample). According to the variable importance probabilities generated by LIME, the top ten most important variables are: (1) unemployment rate, (2) credit rating, (3) M1 growth rate (YOY), (4) size proxy measured by total assets over average total assets of the sampling period, (5) after-tax interest coverage, (6) distance to default measure, (7) term spread, (8) multiple of enterprise value to EBITDA, (9) interest coverage ratio, and (10) dividend yield. The results of applying the LIME module to the Gradient Boosting model are similar in terms of the selection of the top ten most important variables.

Interestingly, and consistent with our previous findings in section 5, the LIME results of the linear model are quite different. Leaving the large RMSE and variance aside, the top ten most important variables are: (1) capitalization ratio, (2) three-month T-Bill rate, (3) long-term bond return, (4) term spread, (5) common equity/invested capital, (6) monthly capacity utilization, (7) long-term debt/invested capital, (8) monthly industrial production growth, (9) unemployment rate, and (10) total debt to capital ratio. The inability of the linear model to properly capture the importance of “Credit Rating Category” and “Distance to Default” is again evident in this specific case. Hence, it is evident that linear models are less reliable in properly capturing

In conclusion, Figure 6 shows the CDS spread forecasts during the crisis period. The Linear regression forecasts show an extraordinarily large increase during the crisis compared with the true spreads. In comparison, Gradient Boosting provides smooth predictions that are less volatile but capture the direction of the true spreads.

Figure 6:

CDS Spreads Forecasting during Crisis Period

VIII. Conclusions

In this paper, we analyze the predictability of CDS spreads cross-sectionally and longitudinally using accounting-based, market-based, and macroeconomic variables. We first compare the nowcasting and one-step-ahead predictive power of the traditional credit risk model and various machine learning models, and find that machine learning models improve the prediction accuracy of CDS spreads both cross-sectionally and over time. Among all the machine learning models, ensemble methods, including Bagging, Random Forest, and Gradient Boosting, consistently outperform other interpretable methods. The high cross-sectional and longitudinal precision of ensemble ML methods suggests that the nonlinear relationship between economic variables and CDS spreads can be used for constructing “shadow” CDS spreads for those companies without actual CDS.

Using LIME (“Local Interpretable Model-Agnostic Explanations”), we calculate the importance of right-hand-side variables, which offers insight into why ensemble methods are more accurate in predicting the variable of interest. The application of LIME is particularly useful for shedding light on why non-linear Machine Learning techniques outperform traditional estimation procedures in nowcasting and forecasting CDS spreads during crisis periods. In times of higher volatility and potential structural breaks, prediction accuracy seems particularly driven by non-linear firm-specific credit risk and broader economic conditions, which are not properly captured by traditional estimation procedures such as OLS.

To summarize, our results present three valuable contributions to the literature: (1) Machine learning techniques are able to add considerable value in the prediction of CDS spreads. (2) We are able to map the relationship between available market and firm-specific information and CDS spreads to other companies, thus constructing “shadow” CDS spreads for those companies without actual CDS. (3) By using LIME, we are able to unpack some of the “black box” around Machine Learning techniques, and obtain insights into the explanatory power of different variables in predicting the CDS spreads.

Completing the Market: Generating Shadow CDS Spreads by Machine Learning
Author: Nan Hu, Jian Li, and Alexis Meyer-Cirkel