Into the Great Unknown: Stress Testing with Weak Data

Contributor Notes

Author’s E-Mail Addresses: long@imf.org; rmaino@imf.org; nduma@imf.org

Abstract

Stress testing has become the risk management tool du jour in the wake of the global financial crisis. In countries where the information reported by financial institutions is considered to be of sufficiently good quality, and supervisory and regulatory standards are high, stress tests can be of significant value. In contrast, the proliferation of stress testing in underdeveloped financial systems with weak oversight regimes is fraught with uncertainties, as it is unclear what the results actually represent and how they could be usefully applied. This paper examines the problems associated with stress tests that use weak data. We offer a potentially more useful alternative, the "breaking point" method, which requires close coordination with on-site supervision and must be complemented by other supervisory tools and qualitative information. Excel spreadsheet templates of the stress tests presented in this paper are provided.

I. Introduction

The global financial crisis has placed the topic of stress testing firmly in the spotlight. Issues such as the methodologies and assumptions applied in stress tests, and the availability (and quality) of data used in those tests have come under close scrutiny, amid heated debate about transparency and the desirability of making stress test results public. The ongoing discussion is clearly very relevant for countries with more advanced financial systems and oversight practices, where well-designed stress tests could add significant value to risk management and contingency planning by both the authorities and individual financial institutions. Robust stress tests are also useful for the financial surveillance work done by international financial institutions, including the International Monetary Fund (IMF).

The use of stress tests as an off-site supervision tool has also gained momentum in lower-income countries with typically underdeveloped financial systems. However, in many of these countries, supervisory capacity is low, human resources are limited, the regulatory framework remains largely inadequate, and the track record on implementation and enforcement tends to be weak. As a result, the data required for stress testing are usually of poor quality, i.e., insufficient, incomplete or inaccurate. In such instances, the desire to keep up with international developments by running stress tests on the respective financial systems is potentially fraught with problems. Indeed, it could do more harm than good if the flawed findings cause undue consternation or lead to inappropriate decisions and actions.

This paper briefly discusses the problems associated with stress tests that are usually applied to underdeveloped banking systems and then proposes a more useful alternative. We focus our analysis on the stress testing of credit risk in the banking system and its impact on solvency, which tend to be the key concerns in many of these countries. We demonstrate how a modified version of Čihák’s (2007) credit risk stress test could be used to complement other supervisory actions, including on-site examinations by supervisors. Consequently, we suggest that any stress testing performed in underdeveloped banking systems would have to be closely coordinated with on-site supervision and complemented by other supervisory tools and information to be useful in any way, since stand-alone stress test results would likely be meaningless.

This paper is structured as follows. The data are briefly described in Section II. Section III discusses the problems associated with the simple stress test model that is commonly used for countries with more basic financial systems, where data quality may be questionable. Section IV proposes an alternative method for stress testing, which is less dependent on data quality and the highly subjective assumptions of stress testers, to complement on-site supervision. Our concluding thoughts on the topic are presented in Section V.

II. The Data

We use hypothetical numerical examples to illustrate the issues raised in this paper. The data are presented as follows:

  • An assumed set of capitalization and credit data for the banking system of “Country X,” which comprises 5 individual banks, is used as the baseline (Table 1).

  • The local definitions of loan classifications and their corresponding provisioning requirements are also assumed (Table 2).

  • The required capital adequacy ratio (CAR) for banks in Country X is assumed to be 12 percent, below which banks would be required to recapitalize.

Table 1.

Baseline: Selected Bank Balance Sheet Items for Country X, as at End-2009

(In millions of domestic currency units unless stated otherwise)

Source: Authors’ calculations.

It is assumed that NPLs are fully provisioned for initially.

Table 2.

Loan Classifications and Provisioning Requirements for Country X


Several key assumptions are also made with regard to the calculation of the CAR:

  • Risk-weighted assets (RWA) are assumed to remain the same post-shock, which would translate to a more conservative result.

  • Profits are assumed to be zero for the period of the shock, so the full impact is reflected in capital.

  • Where loans by classification are available, they are assumed to have been fully provisioned for prior to the shock; where only less granular information is available, loans may be under-provisioned.

  • In all cases, provisions are topped up post-shock to ensure that loans are again fully provisioned for.
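
Taken together, these assumptions imply a simple post-shock calculation: the provisioning shortfall is charged directly against capital while RWA is held fixed. The Python sketch below is illustrative only; the function name and the figures are hypothetical and are not taken from Table 1.

```python
def post_shock_car(capital, rwa, provisions_held, provisions_required):
    """Post-shock CAR in percent, under the assumptions listed above."""
    # Provisions are topped up to the required level. Profits are assumed
    # to be zero, so the whole top-up is charged against capital.
    top_up = max(provisions_required - provisions_held, 0.0)
    # RWA is held at its pre-shock level (the conservative assumption).
    return 100.0 * (capital - top_up) / rwa


# Hypothetical bank: capital 150, RWA 1,000, provisions held 40,
# provisions required after the shock 70.
print(post_shock_car(150.0, 1000.0, 40.0, 70.0))
```

Holding RWA constant keeps the denominator unchanged, so the fall in the CAR is driven entirely by the provisioning top-up.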

III. Weaknesses in the “Ad Hoc Shock” Method

Ideally, reliable macroeconomic and financial data would be available for modeling the impact of external shocks on banks’ balance sheets in stress tests. Specifically, econometric models would be used to quantify the historical relationship between shocks to selected macroeconomic variables and non-performing loans (NPLs). A variety of macro-scenarios would be applied and their effect on NPLs, and consequently, loan-loss provisions and capitalization would be estimated.

In the absence of such data, stress testers have to subjectively make assumptions about the size of shocks to banks’ loan portfolios and possibly take other “short cuts” in designing the top-down stress tests. The possible scenarios are:

  1. Shocks to aggregate NPLs vs. to loans by classification. The shocks are applied directly to the aggregate NPLs of the banking system as a whole when more granular data on loan classifications are not available. Provisions may be estimated using one of two methods, depending on data availability:

    1. Assume provisions of 100 percent of NPLs when calculating the impact of the shocks on capital adequacy, in the absence of more granular information on provisioning requirements; or

    2. Assume provisions using the average of performing and non-performing provisioning rates, where information on provisioning requirements is available.

  2. Shocks to the banking system vs. to individual banks. When individual bank data are either not provided or incomplete, “back of the envelope” stress tests are performed on the system as a whole with some form of aggregated data.

  3. Shocks of ever larger magnitudes to NPLs. Credit shocks of increasingly larger magnitudes are applied to estimate the impact on capitalization. Such shocks are usually multiples of existing NPLs or involve increasingly larger proportions of performing loans becoming NPLs.

For the purposes of this paper, ad hoc shocks, as defined in Table 3, are applied to each scenario and the results are analyzed.

Table 3.

Stress Test Assumptions: Ad Hoc Shocks to Asset Quality


A. Analysis

The first question that should be asked prior to performing any stress test is whether the reported data are reliable, so as to determine the possible size of the shocks. If the information is of sufficiently good quality, then shocks over the short term to banks’ balance sheets should realistically be constrained by the amount of loans in each classification. By definition, past due loans typically migrate from one classification down to another over time, which means that the impact of any shock to credit quality, no matter how severe, should be limited to the outstanding amount in any one loan classification. This constraint on the maximum amount of loans that can migrate down classifications is sometimes overlooked in stress tests.

The maximum amount by which a loan category can increase in the short term following any shock should be equivalent to the balance in the category above it. As we show in Table 4, the worst possible shock to both performing and non-performing loans (rows 6, 7, 9 and 10) should result in the total amount in each category moving down by one step (rows 25, 27, 28 and 29). With the exception of doubtful loans, other categories of loans do not become loss loans, which require 100 percent provisioning, straightaway. The resulting impact on capitalization from the required increase in provisions appears relatively modest (row 41), at between 1.3 and 2.8 percentage points. Indeed, none of the banks in the example would be required to recapitalize following the shock. Therefore, to the extent that shocks applied to asset quality in each loan classification exceed the maximum possible amount as described above, the stress test must be assuming either actual shocks plus some under-reporting of NPLs, or a stress test horizon over the medium to long term.
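
The one-step migration constraint can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the provisioning rates are those quoted in the discussion of Scenario 1b (0.01 and 0.03 for performing loans; 0.2, 0.5 and 1.0 for NPLs), while the category labels, their ordering and the balances are hypothetical and do not reproduce Table 4.

```python
# Provisioning rates ordered from best to worst classification.
# The category names are assumed labels for illustration only.
CLASSES = ["standard", "watch", "substandard", "doubtful", "loss"]
RATES = [0.01, 0.03, 0.20, 0.50, 1.00]


def migrate_one_step(balances):
    """Move each category's full balance down one classification.

    Existing loss loans cannot migrate further, so they stay in the
    last category alongside the former doubtful loans.
    """
    shifted = [0.0] + list(balances[:-1])
    shifted[-1] += balances[-1]
    return shifted


def required_provisions(balances):
    """Sum of balance times provisioning rate across classifications."""
    return sum(rate * bal for rate, bal in zip(RATES, balances))


before = [800.0, 100.0, 50.0, 30.0, 20.0]  # hypothetical balances
after = migrate_one_step(before)
extra = required_provisions(after) - required_provisions(before)
print(dict(zip(CLASSES, after)), round(extra, 2))
```

Any shock that implies a larger increase in a category than `migrate_one_step` allows is implicitly assuming under-reported NPLs or a longer horizon.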

Table 4.

Ad Hoc Shock Stress Test: Maximum Possible Migration Down Classifications

(In millions of domestic currency units unless stated otherwise)

Source: Authors’ calculations.

It is assumed that NPLs are fully provisioned for initially.

It is assumed that NPLs increase proportionately across all categories.

It is assumed that RWA remains the same.

Another caveat is that loan books may be very different across banks and are thus affected differently when a shock occurs. For instance, a bank with a loan book that consists predominantly of speculative commercial property is likely to be harder hit than one which has focused its lending on residential mortgages. Thus, the application of shocks of uniform magnitudes across banks may be highly unrealistic and likely uninformative. However, granular data are typically unavailable in countries where data reporting and collection are weak and incomplete. In this case, the design of the stress tests may have to take into account more qualitative information, such as anecdotal evidence about the composition of individual banks’ loan books.

Scenario 1a: Shock to aggregate NPLs using a 100 percent provisioning rate vs. to loans by classification (Table 5)

Where the stress tester may not have more granular and accurate data on classified loans to work with, the tendency is to shock aggregate NPLs. The situation could occur if, say, banks do not adhere to reporting requirements for loan classifications, or if supervisors do not make the data available to third party stress testers. We demonstrate the inaccuracies in the results that could arise from shocking aggregate NPLs:

  • In the absence of more detailed information on classified loans, simplistic assumptions are sometimes made. For instance, the stress tester may assume provisions at 100 percent of the additional NPLs (i.e., that all are loss loans) in order to calculate the additional provisions required (column 1, row 14) and ultimately, the impact on capital. In our example, such an assumption, combined with a shock amounting to a 400 percent increase in NPLs, would result in the system-wide CAR falling by 7 percentage points (column 1, row 17).

  • There is significant downward bias in the estimated CAR when the aggregate NPL amount is used, compared to the alternative scenario where more detailed data on classified loans are available (column 2, rows 16, 18–21). In the latter situation, calculations of graduated provisions would be possible (column 2, rows 32, 34–37), resulting in a more moderate decline in CAR of 4.7 percentage points (column 2, row 41) following a 400 percent increase in NPLs. In other words, the estimated impact would be around two-thirds of that from using aggregate NPLs.
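
The difference between the two provisioning assumptions can be illustrated with a short Python sketch. It is a simplified illustration, not a reproduction of Table 5: the balances and category labels are hypothetical, and the graduated case assumes that new NPLs are drawn from performing loans in proportion to each performing class.

```python
NPL_RATES = {"substandard": 0.20, "doubtful": 0.50, "loss": 1.00}
PERF_RATES = {"standard": 0.01, "watch": 0.03}  # labels are assumed


def extra_provisions_aggregate(npls, shock):
    """All additional NPLs are treated as loss loans (100 percent rate)."""
    return 1.00 * shock * sum(npls.values())


def extra_provisions_graduated(performing, npls, shock):
    """Additional NPLs are provisioned at the rate of their own class."""
    added = {k: shock * v for k, v in npls.items()}
    added_total = sum(added.values())
    npl_charge = sum(NPL_RATES[k] * v for k, v in added.items())
    # New NPLs leave the performing book proportionately, releasing the
    # (small) provisions that were held against them.
    share_leaving = added_total / sum(performing.values())
    release = sum(PERF_RATES[k] * v * share_leaving
                  for k, v in performing.items())
    return npl_charge - release


performing = {"standard": 800.0, "watch": 150.0}   # hypothetical
npls = {"substandard": 25.0, "doubtful": 15.0, "loss": 10.0}
shock = 4.0  # a 400 percent increase in NPLs

aggregate = extra_provisions_aggregate(npls, shock)
graduated = extra_provisions_graduated(performing, npls, shock)
print(aggregate, round(graduated, 2))
```

In this sketch the blanket 100 percent assumption more than doubles the provisioning charge relative to the graduated calculation; the size of the gap depends on how NPLs are distributed across classifications.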

Table 5.

Scenario 1a: Ad Hoc Shock to Aggregate NPLs Using a 100 Percent Provisioning Rate vs. to Loans by Classification

(In millions of domestic currency units unless stated otherwise)

Source: Authors’ calculations.

It is assumed that NPLs are under-provisioned for initially in the case of aggregate NPLs, since 100 percent provisioning is typically assumed; NPLs are fully provisioned for initially in the case where NPLs data by classification are available.

Where data on classified loans are available, it is assumed that NPLs increase by 400 percent across classifications; the balance of performing loans is distributed proportionately across classifications.

It is assumed that RWA remains the same.

Scenario 1b: Shock to aggregate NPLs using an average provisioning rate vs. to loans by classification (Table 6)

The above calculations may be refined by using the average provisioning rates for performing and non-performing loans to determine provisions. In this example, we would use the average rate of 0.02 (arithmetic average of 0.01 and 0.03) to determine the required provisions for performing loans (column 1, row 10) and the average rate of 0.567 (arithmetic average of 0.2, 0.5 and 1.0) for calculating the required provisions for NPLs (column 1, rows 11 and 17):

  • The result is almost identical to that from using more detailed classifications, with CAR falling by 4.8 percentage points (column 1, row 21) compared to 4.7 percentage points (column 2, row 41).

  • More generally, however, the similarity of the impact between the two methods would depend on the distribution across NPL classifications.
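
The dependence on the distribution of NPLs can be made explicit. In the notation below, which is introduced here for illustration and does not appear in the original tables, $p_i$ is the provisioning rate for NPL classification $i$ and $w_i$ is the share of NPLs in that classification:

```latex
\bar{p}_{\text{perf}} = \frac{0.01 + 0.03}{2} = 0.02,
\qquad
\bar{p}_{\text{NPL}} = \frac{0.2 + 0.5 + 1.0}{3} \approx 0.567

p^{\text{eff}}_{\text{NPL}} = \sum_{i} w_i \, p_i,
\qquad
\sum_{i} w_i = 1
```

The simple average $\bar{p}_{\text{NPL}}$ coincides with the effective rate $p^{\text{eff}}_{\text{NPL}}$ when NPLs are spread evenly across the three classifications ($w_i = 1/3$). A portfolio concentrated in loss loans would have a higher effective rate, and one concentrated in the least impaired NPL classification a lower one, so the two methods would then diverge.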

Table 6.

Scenario 1b: Ad Hoc Shock to Aggregate NPLs Using an Average Provisioning Rate vs. to Loans by Classification

(In millions of domestic currency units unless stated otherwise)

Source: Authors’ calculations.

It is assumed that NPLs are under-provisioned for initially in the case of aggregate NPLs since an average of the provisioning rates is assumed; NPLs are assumed to be fully provisioned for initially in the case where NPLs data by classification are available.

Where data on classified loans are available, it is assumed that NPLs increase by 400 percent across classifications; the balance of performing loans is distributed proportionately across classifications.

It is assumed that RWA remains the same.

Scenario 2: Shock to the banking system vs. to individual banks (Table 7)

Another stumbling block in stress testing may be the lack of availability of information on individual banks. Sometimes, supervisors may not be inclined to share the bank-by-bank information with third party stress testers, usually for confidentiality reasons. In such instances, the stress tester would use data for the aggregate banking system, in which case the stress test findings would need to be interpreted with caution:

  • The information derived from shocks to the aggregate system could mask problems among individual banks. In our example, a shock representing a 400 percent increase in NPLs across the board would result in the system’s CAR declining by 4.7 percentage points to 10.4 percent (column 1, rows 41 and 40, respectively), i.e., 1.6 percentage points below the required minimum of 12 percent.

  • Closer examination of the impact on individual banks shows significantly varied outcomes. The CAR of Bank 5 declines by a massive 9.3 percentage points to 4 percent (column 6, rows 41 and 40, respectively), while the capitalization of Bank 2 falls by 3.4 percentage points to 11.2 percent (column 3, rows 41 and 40, respectively), not far below the required minimum of 12 percent. Thus, focusing on the aggregate outcome alone could obscure the possibility that a particular institution may be very vulnerable, with potentially systemic consequences.
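
How an aggregate result can mask a weak institution is easy to show numerically. The Python sketch below uses three hypothetical banks (not the five banks of Country X) and assumes the post-shock provisioning charge for each is already known.

```python
MIN_CAR = 12.0  # required CAR in percent, as assumed for Country X

# Hypothetical banks: (capital, RWA, additional provisions after shock)
banks = {
    "Bank A": (70.0, 400.0, 10.0),
    "Bank B": (45.0, 300.0, 6.0),
    "Bank C": (30.0, 200.0, 14.0),
}


def car(capital, rwa, extra_provisions):
    """Post-shock CAR in percent, with RWA unchanged."""
    return 100.0 * (capital - extra_provisions) / rwa


# Bank-by-bank results.
bank_cars = {name: car(*values) for name, values in banks.items()}

# System result: sum each column across banks, then apply the same formula.
system_car = car(*(sum(column) for column in zip(*banks.values())))

failing = [name for name, value in bank_cars.items() if value < MIN_CAR]
print(round(system_car, 2), bank_cars, failing)
```

In this illustration the system as a whole remains above the 12 percent minimum even though one bank falls well below it, which is the kind of outcome an aggregate-only test cannot reveal.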

Table 7.

Scenario 2: Ad Hoc Shock to the Banking System vs. to Individual Banks

(In millions of domestic currency units unless stated otherwise)

Source: Authors’ calculations.

It is assumed that NPLs are fully provisioned for initially.

It is assumed that NPLs increase proportionately across all categories.

It is assumed that RWA remains the same.

Scenario 3: Shocks of increasingly larger magnitudes to NPLs (Table 8)

In the absence of good quality and sufficient historical data to model the relationship between macroeconomic developments and credit risk, the size of shocks applied in stress tests often lacks foundation or justification. While historical experience could be used as a guide, many nascent banking sectors may not have experienced a complete business cycle; shocks also tend to be different from one crisis to the next. Thus, stress testers would typically apply increasingly larger shocks to estimate their impact on capital, and then conclude that the banks or banking system may be vulnerable. Such stress tests seem to overlook the obvious algebraic relationship between NPLs and CARs, i.e., the larger the increase in NPLs, the greater the decrease in CARs.

Table 8.

Scenario 3: Ad Hoc Shocks to NPLs of Increasingly Larger Magnitudes

(In millions of domestic currency units unless stated otherwise)

Source: Authors’ calculations.

It is assumed that loans are fully provisioned for initially.

It is assumed that RWA remains the same.

Using a set of increasingly larger shocks to NPLs (row 11), we demonstrate their impact on banking system and individual bank CARs. As expected, the impact on individual banks’ CARs increases as the increase in NPLs rises from 100 to 400 percent and as the share of performing loans becoming NPLs increases from 10 to 40 percent (columns 1–6, row 14). Put another way, the CARs deteriorate as a matter of course when the shocks increase in magnitude, all to the point of falling below the required capitalization levels and, in some cases, significantly so.
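
The mechanical nature of this result follows directly from the assumptions of zero profits and unchanged RWA. As a simplified illustration, with symbols introduced here rather than taken from the tables, let $C$ be pre-shock capital, $\Delta N$ the increase in NPLs, and $p$ the net provisioning rate applied to the new NPLs:

```latex
\mathrm{CAR}_{\text{post}}(\Delta N) = \frac{C - p \, \Delta N}{\mathrm{RWA}},
\qquad
\frac{\partial \, \mathrm{CAR}_{\text{post}}}{\partial \, \Delta N}
  = -\frac{p}{\mathrm{RWA}} < 0
```

Under these assumptions the post-shock CAR falls linearly in the size of the shock, so a sufficiently large $\Delta N$ will always push it below the required minimum regardless of the bank's starting position.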

The key weakness of the increasingly larger ad hoc shocks approach lies in the relevance of the results. It would be impossible to infer that the banks and banking system are significantly vulnerable in cases where the shocks translate to significant under-capitalization, since there would be little empirical evidence to support the plausible occurrence of tail shocks of such large magnitudes. As a result, it could be very difficult for the stress tester to make constructive recommendations on actions to be taken in response to the findings.

B. Summary of Findings

Clearly, simple stress tests using ad hoc and extreme shocks are flawed, which raises the question of how useful they may be for risk management and contingency planning purposes. Specific caveats are as follows:

  • Any assumption of 100 percent provisioning following a shock would significantly overstate the amount of additional provisions required and thus underestimate the resulting capitalization. NPLs are classified according to how late debt service is, and different provisioning rates apply across the different loan classifications. When a shock occurs, loans typically move from one classification down to the next over the short term, with a graduated rise in the provisioning rate, rather than becoming loss loans (with a 100 percent provisioning requirement) straightaway.

  • The soundness of individual banks’ balance sheets varies considerably across a particular financial system, and uniform shocks to aggregate banking system data would likely yield less than useful results. In many financial systems, banks range from those that are well-capitalized and well-managed, with conservative business models and sound risk management systems, to those that are weak, risk-seeking and profligate. Thus, the application of a particular shock to the aggregate system runs the risk that supervisors may base their contingency planning decisions on potentially meaningless information from the stress tests.

  • Applied shocks, if sufficiently large, would break any bank or banking system in the world, which suggests that such results may not be instructive. Simple algebra shows that the larger the ad hoc shock to NPLs, the greater its flow-through impact on capitalization, to the point where the banks appear severely undercapitalized. However, it is unclear what should be inferred from such stress tests, and how the findings could usefully be applied, given that the quality of the raw data usually precludes any ability to quantify or justify such shocks.

IV. A Proposed Compromise: The “Breaking Point” Method

We therefore propose a possibly more useful method for assessing banks’ CARs, which could be more informative for supervisors in situations where the reporting of NPLs by banks is unreliable. The “breaking point” method is essentially a “stressing until it breaks” exercise, which is also known as reverse stress testing. This particular method of analysis is intuitively appealing in that it: (i) does not depend heavily on the quality of reported data; and (ii) does not require any assumption with regard to the size of the overall NPL shock(s). It estimates the amounts of classified loans that would reduce a bank’s CAR to the “breaking point” (12 percent in our example), below which recapitalization would be necessary. We contemplate two situations: (i) only aggregate NPL data are available; and (ii) granular data on loan classifications and provisioning are available.

It should be emphasized that even the breaking point method cannot be used as a stand-alone stress test. Rather, the findings from applying this particular method of stress testing would need to be complemented by information gathered from bank examinations performed by on-site supervisors. Thus, in situations where the reported data are of poor quality, the off- and on-site teams would have to collaborate even more closely, and any stress test result can only provide guidance to the latter on which bank(s) may be undercapitalized, given their findings during on-site inspections.

Scenario A: Shock to aggregate NPLs (Table 9)

Application of the breaking point method is very straightforward when only aggregate data for the banking system are available. We estimate algebraically the aggregate NPL ratio for the banking system that would bring the CAR down to 12 percent. Naturally, the calculation of provisions that should be held in the banking system following the shock would play an important role in determining post-shock capital. As in Scenario 1b earlier, we use the average rates for performing and non-performing loans to calculate the new provision amount required. The results show that:

  • The breaking point NPL ratio for the banking system as a whole is around 17.5 percent (row 13), compared to the current 5 percent (row 7).

  • NPLs would have to increase by almost 250 percent from current levels (row 17).
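
The algebra behind the aggregate breaking point can be sketched in Python. This is a minimal illustration under stated assumptions: provisions are computed with the average rates from Scenario 1b, RWA is unchanged, profits are zero, and all balance sheet figures are hypothetical, so the output does not reproduce the 17.5 percent reported in Table 9. The function names are introduced here for illustration.

```python
def required_provisions(npl_ratio, loans, perf_rate, npl_rate):
    """Provisions required when `npl_ratio` of total loans is non-performing."""
    return loans * (perf_rate * (1.0 - npl_ratio) + npl_rate * npl_ratio)


def car_at(npl_ratio, capital, rwa, loans, provisions_held,
           perf_rate, npl_rate):
    """CAR (as a fraction) once provisions are adjusted to the required level."""
    top_up = (required_provisions(npl_ratio, loans, perf_rate, npl_rate)
              - provisions_held)
    return (capital - top_up) / rwa


def breaking_point_npl_ratio(capital, rwa, loans, provisions_held,
                             perf_rate, npl_rate, min_car=0.12):
    """Solve car_at(n) == min_car for n; the relation is linear in n."""
    # Largest provisioning requirement that still leaves CAR at min_car.
    affordable = capital + provisions_held - min_car * rwa
    return (affordable - perf_rate * loans) / ((npl_rate - perf_rate) * loans)


# Hypothetical aggregate banking system.
system = dict(capital=150.0, rwa=1000.0, loans=1000.0, provisions_held=40.0,
              perf_rate=0.02, npl_rate=(0.2 + 0.5 + 1.0) / 3)
current_npl_ratio = 0.05

n_star = breaking_point_npl_ratio(**system)
increase_pct = 100.0 * (n_star / current_npl_ratio - 1.0)
print(round(100.0 * n_star, 2), round(increase_pct, 1))
```

Because the breaking point is solved for directly, no assumption about the size of the shock is needed; the result can instead be compared with what on-site examiners find about the true level of NPLs.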

Table 9.

Scenario A: Breaking Point Shock to Aggregate NPLs

(In millions of domestic currency units unless stated otherwise)