Credibility and Crisis Stress Testing
International Monetary Fund

Credibility is the bedrock of any crisis stress test. The use of stress tests to manage systemic risk was introduced by the U.S. authorities in 2009 in the form of the Supervisory Capital Assessment Program. Since then, supervisory authorities in other jurisdictions have also conducted similar exercises. In some of those cases, the design and implementation of certain elements of the framework have been criticized for their lack of credibility. This paper proposes a set of guidelines for constructing an effective crisis stress test. It combines financial markets impact studies of previous exercises with relevant case study information gleaned from those experiences to identify the key elements and to formulate their appropriate design. Pertinent concepts, issues and nuances particular to crisis stress testing are also discussed. The findings may be useful for country authorities seeking to include stress tests in their crisis management arsenal, as well as for the design of crisis programs.


“Investors don’t like uncertainty. When there’s uncertainty, they always think there’s another shoe to fall.”

Kenneth Lay, then-CEO of Enron Corp.

August 20, 2001

I. Introduction

Stress tests have become the “new normal” in financial crisis management. They are increasingly being used by country authorities as an instrument for regaining the public’s trust in the banking system during the current global financial crisis. This new tool, known as a “crisis stress test,” is essentially a supervisory exercise accompanied by detailed public disclosure to remove widespread uncertainty about banks’ balance sheets and the authorities’ plans for those banks. Put another way, the crisis stress test is a microprudential exercise with macroprudential objectives (Figure 1). In this regard, transparency, and hence the quality of disclosure, is critical.

Figure 1.

Solvency Stress Testing Applications

Citation: IMF Working Papers 2013, 178; 10.5089/9781484395615.001.A001

Source: Jobst and others (2013).
Notes:
1. IMF staff typically define top-down stress tests as those that are either conducted using the data of individual banks and then aggregated, or on an aggregated portfolio; bottom-up stress tests are defined as those conducted by individual institutions using their own internal risk models and data.
2. Fund staff had previously conducted a rudimentary stress test of the Hungarian banking system during the crisis program discussions in late 2008 as an input into determining the size of the program; the results were subsequently published (IMF, 2008).

The concept of crisis stress testing was introduced by the U.S. authorities in early 2009 in the form of the Supervisory Capital Assessment Program (SCAP). The solvency stress testing exercise took place during the darkest days of the sub-prime loans meltdown, following a sharp loss of confidence in U.S. banks and an unprecedented decimation of their market value. The announcement of the SCAP was initially met with trepidation and skepticism by markets, but official clarifications surrounding the event about the aim of the exercise, the availability of a financial backstop and the subsequent publication of the methodology and results appeared to reassure markets (see Peristiani and others, 2010). The findings revealed that the capital needs of the largest U.S. banks at the time would be manageable even if a more adverse scenario were to materialize (see Tarullo, 2010). Investor sentiment rebounded and stabilized, and the assessed banks were able to add more than $200 billion in common equity in the following 12 months. The U.S. supervisors have since followed up on the SCAP with publicized supervisory stress tests in the form of the Comprehensive Capital Analysis and Review (CCAR) and also under the Dodd-Frank Wall Street Reform and Consumer Protection Act (DFA) framework.

A crisis stress test conducted by supervisors should not be confused with a supervisory stress test undertaken during a crisis. Both types of stress tests may be used for similar purposes, i.e., to: (i) determine a needed capital buffer over current solvency levels; (ii) differentiate the soundness of banks in the system as part of triage analysis; and/or (iii) quantify potential fiscal costs, depending on the magnitude of the projected shortfalls and the urgency of any required recapitalization. However, supervisory stress tests in crisis situations would not have the same degree of transparency and, indeed, may have to be kept strictly confidential to avoid potentially unleashing an unmanageable backlash if the design of key elements, which we discuss in much of this paper, is inadequate.

Clearly, crisis stress tests must be credible to be successful. The governance of the tests (i.e., the stress tester and the overseer) must be perceived to be independent, with the requisite technical expertise. The stress tests themselves must be sufficiently stringent yet plausible. They should be simultaneous, consistent and comparable cross-firm assessments to enable a broader analysis of risks and an evaluation of estimates for individual institutions (Tarullo, 2010). From a macroprudential perspective, they should allow for a better understanding of inter-relationships across institutions. Moreover, the manner in which the results will be addressed or backstopped and used must be clarified early on. The results themselves must be sufficiently granular such that there is clear differentiation among institutions in the first instance, to guide subsequent actions. Supervisory authorities in Europe have also used crisis stress tests for systemic risk management but with varying degrees of effectiveness to date, compared to the U.S. stress test. This suggests that the design of such exercises matters significantly.

Crisis stress tests should be seen as one element of an overall strategy to rebuild public confidence in a banking system. Ideally, such a strategy should include (i) containment; (ii) diagnostics (asset quality review (AQR), data integrity and verification (DIV), stress test); and (iii) restructuring or exit. Within the diagnostics component, the stress test is a forward-looking tool for determining a capital buffer against further deterioration in the real economy, while any preceding AQR and DIV work should ensure that the data used in the stress test are “clean,” which is critical for credibility.

The assessment by the Turkish authorities of their banking sector following the 2001 crisis is an example of a public diagnostic exercise. Although it did not include a forward-looking stress test component, its overall objective and design included the necessary attributes for a credible outcome. The financial status of all domestic banks was assessed using improved accounting standards and a three-stage audit procedure. The capital adequacy of each bank was determined and banks that were undercapitalized were required to take capital action. A financial backstop through the State Recapitalization Scheme was made available to banks that were deemed solvent, but which were unable to raise the necessary capital. The objective, method and implementation details of the exercise were published (see Banking Regulation and Supervision Agency (BRSA), 2002a), as were the findings and progress on actions taken (BRSA, 2002b).

This paper focuses on the design of crisis stress tests, leaving the comprehensive study of other aspects of a diagnostic to future research. Work on developing a comprehensive framework for an effective crisis stress test has been limited to date. Hirtle and others (2009) draw lessons from the SCAP in analyzing the complementarities between macroprudential and microprudential supervision. Schuermann (2012) explores in some detail the design of stress scenarios and their application in terms of modeling losses, revenues and balance sheets (key elements in macro stress testing) in the U.S., EU and Republic of Ireland (“Ireland”) exercises. He also examines the disclosure strategies across the various exercises. Other empirical and policy-related literature in this area has largely focused on the effectiveness of the SCAP (Bernanke, 2010; Matsakh and others, 2010; Peristiani and others, 2010; Tarullo, 2010), with some coverage of the European exercises (Onado and Resti, 2011).

The specific objective of this paper is to formulate guidelines for designing a crisis solvency stress test, based on lessons learned from previous experiences. Although a crisis stress testing exercise may cover either solvency or liquidity risk or both, we focus on the former in this paper. In this regard, our study complements the work done by Hirtle and others (2009) and Schuermann (2012). We employ various methodologies in our analysis.

Our conclusions point to an immutable fact, which is that crisis stress tests are not for the half-hearted. Ideally, the stress test should take place sufficiently early to address any crisis of confidence in the banking system and have a clearly specified objective. Moreover, lessons learned from past experiences show that country authorities must be fully committed if they are to undertake such an exercise, lest it backfire. The authorities must be prepared to conduct a thorough, honest and transparent examination of their banking system and resolve to take appropriate follow-up action(s) on the results with the necessary resources to back them, if the exercise is to serve its purpose. Supporting activities such as AQRs and possibly follow-up stress tests are necessary to ensure the credibility of crisis stress tests. However, political economy considerations could also play an important role in the design of crisis stress tests, given the potential implications of the results for public confidence and the fiscal purse. We suggest that our findings may be useful for authorities seeking to undertake stress tests for systemic risk management and for the design of financial crisis programs.

Our paper is organized as follows. Section II details the relevant case studies of stress testing exercises conducted in the United States and Europe, as well as the market data used in the initial impact study. Section III discusses the metrics used for defining the effectiveness of those crisis stress tests and presents the empirical analysis. Section IV draws on those findings and the qualitative information gleaned from the case studies to identify the key stress test elements and to formulate their design. A sidebar comparing the differences between bank restructuring costs and loss estimates from crisis stress tests is presented in Section V. Section VI concludes.

II. The Data

Our analysis draws on four case studies covering seven crisis stress tests. The tests were conducted in the United States, the European Union, Ireland and Spain between 2009 and 2012 (Table 1). The details of the individual exercises are sourced from the respective authorities, namely, the Board of Governors of the Federal Reserve System (Fed), the European Banking Authority (EBA) and its predecessor, the Committee of European Banking Supervisors (CEBS), the Central Bank of Ireland (CBI) and Banco de España (BdE).

Table 1.

Case Studies: Crisis Stress Tests

Sources: Fed; CBI; EBA; and BdE.

The CEBS had noted that its stress tests contrasted with the crisis stress test nature of the SCAP. The EU authority had stated that the objective of its exercise was to “provide policy information for the assessment by individual Member States of the resilience of the EU banking sector as a whole and of the banks participating in the exercise,” compared to the SCAP, which was linked to “determining the individual capital needs of banks” (CEBS, 2010a); however, the CEBS’ overt efforts at transparency to reassure markets—including through the announcement of aggregate results in the 2009 exercise—have been consistent with the macroprudential application of crisis stress tests.

We first identify successful crisis exercises by analyzing the performance of market indicators, consistent with existing studies. Previous research had examined the behavior of stock prices of individual U.S. banks post-SCAP (Matsakh and others, 2010; Peristiani and others, 2010) as well as the sovereign CDS spreads (Peristiani and others, 2010; Schuermann, 2012) to determine the effectiveness of the respective crisis stress tests. Here:

  • We study the financials stock price indices for each jurisdiction as proxies for the market’s assessment of the soundness of the respective banking systems. Stock prices represent a bellwether indicator for market confidence in that shareholders are the “first loss” investors and the evidence shows that they respond very quickly to incorporate all relevant publicly available information in their pricing (Fama, 1970).

  • We also consider the behavior of sovereign CDS spreads around the stress testing exercises and related events. Sovereign CDS spreads provide an indication of the perceived creditworthiness of a country, which is considered closely linked to the health of its banking sector given the potential implications for the public purse if government support is required. In several banking systems, the high holdings of sovereign debt have focused market concerns on the bank-sovereign feedback loop.

All market data used in this study are sourced from Bloomberg (Table 2). It should also be noted that caveats apply to the use of financial markets indicators to define the effectiveness of the stress tests in that they may also be influenced by other concurrent events which we do not isolate in this study.

Table 2.

Market Data: Financials Stock Price Index and CDS Spreads

Source: Bloomberg.

III. Identifying the Successful Crisis Stress Tests

We apply a simple event study-type methodology for determining the effectiveness of a crisis stress test. Given that our analytical framework is not strictly that of a formal event study, we shall refer to our assessment as an “impact study.” We classify a stress test as successful if it has been able to stabilize or improve investor sentiment towards the banking system for at least six months after the results are announced, providing sufficient time for follow-up action(s) to be taken. In other words, the stress test is considered to have achieved its objective if it has been able to establish a “floor” for the market during this period such that:

  • The financials stock index does not fall below the threshold set at the time the test results are announced.

  • The volatility of daily returns (in this case, calculated as the standard deviation over 60 days to satisfy the Central Limit Theorem) continues to decline as uncertainty recedes.

  • The sovereign CDS spread narrows as the market perceives the potential fiscal burden from the banking system to have lessened.
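The three criteria above can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' procedure: the function names are hypothetical, six months is approximated as 126 trading days, and "continues to decline" and "narrows" are simplified to an end-of-horizon reading that is lower than the reading on the announcement date.

```python
from statistics import stdev


def trailing_volatility(prices, t, window=60):
    """Standard deviation of simple daily returns over the `window` days ending at index t."""
    if t < window:
        raise ValueError("not enough observations before index t")
    segment = prices[t - window : t + 1]
    returns = [segment[i] / segment[i - 1] - 1.0 for i in range(1, len(segment))]
    return stdev(returns)


def assess_crisis_stress_test(stock_index, cds_spread, announce_idx,
                              horizon=126, window=60):
    """Check the three 'floor' criteria over `horizon` trading days after the results.

    Assumptions (not from the paper): horizon=126 approximates six months, and
    the volatility and CDS criteria compare end-of-horizon with announcement-day values.
    """
    end = announce_idx + horizon
    if announce_idx < window or end >= len(stock_index) or end >= len(cds_spread):
        raise ValueError("not enough observations around the announcement date")

    # Criterion 1: the index never falls below its level on the announcement date.
    floor = stock_index[announce_idx]
    floor_held = min(stock_index[announce_idx : end + 1]) >= floor

    # Criterion 2: 60-day return volatility is lower at the end of the horizon.
    volatility_declined = (
        trailing_volatility(stock_index, end, window)
        < trailing_volatility(stock_index, announce_idx, window)
    )

    # Criterion 3: the sovereign CDS spread is narrower at the end of the horizon.
    cds_narrowed = cds_spread[end] < cds_spread[announce_idx]

    return {
        "floor_held": floor_held,
        "volatility_declined": volatility_declined,
        "cds_narrowed": cds_narrowed,
        "successful": floor_held and volatility_declined and cds_narrowed,
    }
```

The inputs would be daily series of the financials stock index and the sovereign CDS spread, aligned on the same dates. As noted in Section II, concurrent events are not isolated, so a result from a check of this kind is indicative at best.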

In this context, the empirical evidence from the U.S. SCAP shows that the exercise had been successful in achieving its aim (Figure 2):

  • The release of the SCAP results effectively halted and then reversed the 2-year slide in investor confidence towards the country’s banks. At the same time, market volatility—which had peaked just prior to the exercise—declined sharply. Since then, the S&P 500 Financial Sector Index has largely remained above the level established by the SCAP results, although it flirted with that floor during the more volatile period in 2012 Q3.

  • U.S. firms have substantially increased their capital since the SCAP. The weighted Tier 1 (T1) Common Equity ratio of the 18 bank holding companies that were in the SCAP sample has more than doubled from an average 5.6 percent at the end of 2008 to 11.3 percent in 2012 Q4, reflecting an increase in T1 Common Equity from $393 billion to $792 billion during the same period.

  • U.S. CDS spreads narrowed in tandem with the improvement in the financials index during the SCAP period. However, they subsequently dissociated from developments in the banking sector in September 2011 as markets turned their attention to the fiscal deficit after the Congressional Budget Office (CBO) announced that the U.S. budget deficit had reached its widest as a percentage of GDP since 1945.
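The capital figures cited above allow a rough consistency check. The sketch below assumes that the weighted T1 Common Equity ratio can be approximated as aggregate T1 Common Equity divided by aggregate risk-weighted assets (RWA), which is our simplification and not a statement from the source; the helper name is hypothetical.

```python
def implied_rwa(capital_bn, ratio_pct):
    """Risk-weighted assets (in $ billion) implied by a capital level and a capital ratio.

    Assumes ratio = capital / RWA at the aggregate level (a simplification).
    """
    return capital_bn / (ratio_pct / 100.0)


# Figures quoted in the text for the SCAP sample.
rwa_2008 = implied_rwa(393.0, 5.6)     # end-2008
rwa_2012 = implied_rwa(792.0, 11.3)    # 2012 Q4

capital_growth = 792.0 / 393.0 - 1.0   # proportional increase in T1 Common Equity
rwa_change = rwa_2012 / rwa_2008 - 1.0 # proportional change in implied RWA

print(f"Implied RWA, end-2008: ${rwa_2008:,.0f} billion")
print(f"Implied RWA, 2012 Q4:  ${rwa_2012:,.0f} billion")
print(f"Capital growth: {capital_growth:.1%}; implied RWA change: {rwa_change:.1%}")
```

Under this assumption, implied RWA is about $7.0 trillion in both periods, which suggests that the doubling of the ratio reflects the roughly twofold increase in capital and not a contraction in RWA. Because the published ratios are rounded to one decimal place, small differences in the implied RWA figures should not be over-interpreted.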

Figure 2.

United States: The Sentiment after the SCAP

(Indexed to 100 on February 20, 2007)

Sources: Bloomberg; Fed; various financial media; and authors’ calculations.

The turnaround in confidence in U.S. banks from the SCAP buoyed sentiment towards banks elsewhere, at least temporarily. EU banks’ stock prices benefitted from the rebound and volatility fell; however, they were unable to sustain the gains over the medium term, with some countries having to conduct separate tests subsequently:

  • The stress tests of EU banking systems were less convincing. Stock prices weakened and sovereign CDS spreads widened following each exercise (Figure 3). Indeed, the stock index to this day has never risen above the level recorded at the time of the CEBS 2009 stress test.

    Figure 3.

    European Union: The Ebb from the EBA

    (Indexed to 100 on April 20, 2007)

    Sources: Bloomberg; EBA; various financial media; and authors’ calculations.

    • The CEBS 2010 exercise suffered the ignominy of having Ireland request a bailout from the European Commission (EC), the European Central Bank (ECB) and the IMF (“the Troika”) after the stress tests indicated that EU banks would remain sufficiently capitalized and resilient under adverse scenarios (CEBS, 2009a; CEBS, 2010b).

    • Similarly, systemic banks Dexia (Belgium) and Bankia (Spain) passed the EBA 2011 stress test (EBA, 2011a) only to require significant restructuring within a few months. These events were accompanied by sharp jumps in stock market volatility.

    • The EU Capital Exercise was subsequently announced in October 2011 in response to a rapidly evolving crisis. The disclosure of its results in December 2011, followed by the introduction of the ECB’s Long-term Refinancing Operation (LTRO) facility for financing eurozone banks later that month, halted the deterioration in confidence towards EU sovereigns as evidenced by their narrowing CDS spreads. In the Capital Exercise, the EBA reviewed banks’ actual capital positions as at end-June 2011 and their sovereign exposures in light of the worsening of the sovereign debt crisis in Europe, and requested that they set aside additional capital buffers by June 2012 based on September 2011 sovereign exposure figures and capital positions (EBA, 2011b). The announcement of the Outright Monetary Transactions (OMT) by the ECB in August 2012 further improved market confidence in the region as a whole.

  • In Ireland, the Prudential Capital Assessment and Review (PCAR) 2011 exercise contributed to stabilizing market sentiment. The publication of the IMF’s Third Review in September 2011—six months after the release of the PCAR results—indicating that the program’s structural benchmarks had largely been met and that the outcomes of the PCAR were being incorporated into banks’ recapitalization and restructuring plans (IMF, 2011), gave credence to the exercise. The financials stock price index troughed, its volatility continued to decline and the sovereign CDS spreads tightened (Figure 4).

    Figure 4.

    Ireland: The Pain before the PCAR

    (Indexed to 100 on February 21, 2007)

    Sources: Bloomberg; EBA; CBI; various financial media; and authors’ calculations.

  • In Spain, the third-party bottom-up (BU) stress test and corresponding revelation of a comprehensive strategy to identify and deal with problem banks put a floor on market sentiment and turned it around. The announcement of the IMF FSAP and third-party top-down (TD) stress test results coincided with increased volatility in stock price returns, but also signaled that the authorities were closer to taking concerted action to restructure the banking sector (IMF, 2012b; Roland Berger, 2012; Oliver Wyman, 2012a). The subsequent publication of the Memorandum of Understanding (MoU) with the Eurogroup in July 2012, which incorporated comprehensive diagnostics of banks’ balance sheets and the details of a financial backstop, reassured investors. Stock price volatility declined sharply and sovereign CDS spreads narrowed (Figure 5).

Figure 5.

Spain: The “Floor” under the FSAP

(Indexed to 100 on February 14, 2007)

Sources: Bloomberg; EBA; BdE; various financial media; and authors’ calculations.

A summary of the effectiveness of the respective crisis stress tests is presented in Table 3.

Table 3.

Crisis (and Follow-up) Stress Tests: Effectiveness Scorecard

Source: Authors.

Included for completeness only—not intended as a crisis stress test; surveillance stress testing exercise was conducted in a crisis environment.

Driven by fiscal deficit and sequester.

IV. Designing an Effective Crisis Stress Test

Crisis stress tests require additional considerations which may not be required of supervisory stress tests during normal times. In particular, the design of certain elements may necessarily be different from what is typically done in the latter (e.g., the timing of the test, its governance, the transparency requirements, the objective, action plan and financial backstop). Other aspects have to be constructed to withstand intense public scrutiny (e.g., the scope and scenario design). While no one particular element can alone ensure the success of a crisis stress test, each one plays a crucial part in the credibility of the exercise as a whole.

There are also additional activities that provide integral support for or complement crisis (solvency) stress tests. These include AQRs, separate liquidity stress tests and/or follow-up solvency stress tests, some or all of which may be crucial for the credibility of the original exercise itself. Our overall findings are summarized in Table 4 in a design “scorecard” comparing the features of various elements across crisis stress tests, with the associated details presented in Appendix I.

Table 4.

Crisis Stress Tests: Design Scorecard

Sources: Table 3; Appendix I; and authors.

Included for completeness only—not intended as a crisis stress test.

Medium-term sustainability of market confidence remains to be seen.

Delay may impose significant additional costs in order to be effective.

Forecast losses provided by third party.

Not necessary if top-down is conducted on individual banks.

Delay may result in wider coverage of banks to allay increased doubts.

Large cross-border banks; domestic systemically important banks making up at least 60 percent of national banking assets.

Stress test did not include haircuts to sovereign debt holdings in the banking book.

Takes into account the ECB’s LTRO support facility.

Crisis program with the Troika.

Not critical only if independent cross-checks/validation conducted.

Not critical only if independent cross-checks/validation conducted.

Lower-intensity, quantitative substitute for AQR.

Assumptions must be sufficiently stringent and must be disclosed.

Timing will take into account the AQR in the context of the Single Supervisory Mechanism (SSM) and for Spain and Ireland, the EBA 2014 exercise.

The EBA conducted a confidential thematic review of liquidity funding risks.

A. Key Elements


The timing of a crisis stress test is crucial. Steps to reduce uncertainty through information provision should be taken as soon as possible during a crisis. Borio and others (2010) posit that early recognition and intervention would avoid hidden deterioration in conditions that could magnify the costs of the eventual resolution. Pritsker (2010) argues that while central bank actions such as broadening the range of acceptable collateral, loan guarantees and government-sponsored capital injections may increase bank lending during a crisis, they also increase the central bank’s exposure to credit and market risk. Such efforts would be less costly and more effective under conditions of less uncertainty, i.e., it would be easier to convince potential lenders of a bank’s solvency if they have better information about the scope of the problem early on.

Experience confirms that delays by country authorities in taking resolute action have eventually resulted in significant additional costs. First, there is the destruction of the banks’ asset values, which could take a long time to recover, if at all. Second, the reputational risk to supervisory authorities also grows when a crisis is allowed to fester and deepen. Third, any loss in market, depositor and creditor confidence could potentially place a significant burden on the fiscal purse and, consequently, on the creditworthiness of the sovereign if government support becomes necessary. Combined, these factors could give rise to greater demands when the authorities finally decide to take action, notably:

  • The damage to the credibility of the authorities may be too deep-seated to overcome following a lengthy crisis. As a consequence, they may have to contract third-party stress testers and seek independent overseers to enhance the credibility of the exercise.

  • Heightened uncertainty about banks’ asset quality and concerns over increasing lender forbearance could mean a more complex, resource-intensive and protracted exercise. The stress test may have to cover a much broader sample of banks than would otherwise be necessary and possibly require additional steps, such as an AQR comprising audits and third-party expert valuations of banks’ portfolios and a DIV exercise.

  • Markets are likely to impose higher standards on institutions that are already under extreme pressure if they have lost all trust in the quality of assets (e.g., through expectations of higher loan loss projections and larger capital buffers).

That said, the decision as to what constitutes an “optimal” moment for introducing a crisis stress test is not clear-cut and remains largely an issue of judgment and, possibly, serendipity. As an extension of the principle espoused in IMF (2012a) that market views should be taken into account in designing stress tests, indicators such as stock prices, their corresponding price-to-book (PB) ratios, as well as sovereign CDS spreads could potentially be used as triggers in deciding on the timing of a crisis stress test (Table 5). However, the evidence to date is inconclusive:

  • The United States was first off the rank with the SCAP following two years of decline from the February 2007 historical peak of the S&P 500 Financial Sector Index. The EU CEBS 2009 stress test was also introduced almost two years after the STOXX Europe 600 Banks Price Index peaked but has been less effective by comparison. The Ireland and Spain crisis stress tests took place 4 and 5½ years, respectively, after the apex of their financials stock prices. Assuming that the decisions to stress test were made around the end of the year prior to each crisis stress test, the U.S. and European indices would have dropped by between 65 and 75 percent by that stage. The long-term (5-, 10- and 15-year) average index levels also do not provide any clear guide to the decision-making process by the authorities as they do not appear to have been used as trigger points. Ireland and Spain conducted their stress tests following their engagement with the Troika for financial support. By that stage, Ireland’s banks had lost almost all their market value, while the equity values of Spanish banks were down by more than 60 percent.

  • The PB ratio, which is typically used to assess bank valuations, may yield some hints on the timing of the crisis stress tests. These ratios were richest in the late-1990s to early-2000s period for the sample jurisdictions, reaching 3.5 times for the U.S. financials and exceeding 4 times in Ireland and Spain. Long-term averages ranged from 1.8 to 2.2 times. The decision to conduct the SCAP would have been made when the PB ratio fell to unity, which could perhaps be considered a “line in the sand” for future reference. The other jurisdictions waited until their respective PB ratios had declined to well below unity, while Ireland’s PCAR would have been contemplated around the time when banks’ average PB ratio had dropped to below 0.3 times.

  • Sovereign CDS spreads are an indicator of the market’s current perception of sovereign risk. Given the systemic importance of the banking sector for economic activity, market concerns that the government may have to bail out institutions that are too big to fail, and the resulting burden on the fiscal balance, are likely to be reflected in the CDS spreads. In Europe, the sovereign-bank feedback loop from banks’ large holdings of sovereign debt increased the likelihood of losses. Here, any rule-of-thumb that may have been used is less clear—spreads had ballooned to unprecedented levels across the board by the time any decision would have been taken on running the tests.
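The stock price and PB indicators discussed above can be computed as in the following sketch. It is illustrative only: the function names and the `drawdown_trigger` parameter are hypothetical, the PB trigger of 1.0 follows the “line in the sand” suggested in the text, and the 65 percent drawdown level is simply the lower end of the range observed in the case studies. The evidence in this section is inconclusive as to whether any such trigger was actually used.

```python
def drawdown_from_peak(index_levels):
    """Percentage decline of the latest index level from its historical peak."""
    peak = max(index_levels)
    return 100.0 * (1.0 - index_levels[-1] / peak)


def price_to_book(market_cap, book_equity):
    """Price-to-book ratio: market capitalization over book value of equity."""
    if book_equity <= 0:
        raise ValueError("book equity must be positive")
    return market_cap / book_equity


def timing_flags(index_levels, market_cap, book_equity,
                 pb_trigger=1.0, drawdown_trigger=65.0):
    """Flag whether market indicators have reached illustrative trigger levels.

    Both trigger values are assumptions for illustration, not supervisory rules.
    """
    pb = price_to_book(market_cap, book_equity)
    drawdown = drawdown_from_peak(index_levels)
    return {
        "pb_ratio": pb,
        "drawdown_pct": drawdown,
        "pb_at_or_below_trigger": pb <= pb_trigger,
        "drawdown_at_or_above_trigger": drawdown >= drawdown_trigger,
    }
```

Sovereign CDS spreads are left out of the sketch because, as noted above, no rule of thumb is apparent for them. Any flag of this kind would at most prompt supervisory judgment; it would not replace it.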

Table 5.

Crisis Stress Test Jurisdictions: Financial Markets Statistics

Sources: Bloomberg; and authors’ calculations.

Ideally, a crisis stress test should be conducted before the crisis of confidence in the banking system becomes entrenched. However, the successful exercises to date reveal little in terms of whether they had been appropriately timed given that counterfactuals are difficult to prove:

  • By all measures, the “intervention” by the U.S. authorities did halt and turn around the sharp slide in market confidence. That said, the rebound from the 80 percent loss in banks’ market value has been sluggish compared to the overall market, which has recovered all its losses from the crisis (Figure 6). The question then is whether the rise in the financials index would have been quicker and stronger had the supervisors stepped in earlier. Although bank stocks may have arguably been overvalued prior to the crisis, their PB ratio is currently well below the 15-year average.

    Figure 6.

    United States: S&P 500 Stock Market and Financial Sector Indices

    (Indexed to 100 on February 20, 2007)

    Sources: Bloomberg; and authors’ calculations.

  • The eventual outcomes from the Ireland and Spain stress tests have also been positive but these achievements were almost pyrrhic. The supervisors were perceived to have lost significant credibility with markets by that stage (e.g., The Irish Times, 2010; Garicano, 2012). External consultants had to be employed to conduct comprehensive AQRs and in the case of Spain, to run the stress tests in order to reassure investors (third-party consultants provided forecast losses for the Ireland stress test). Moreover, the fiscal cost of supporting their respective banking systems had become so onerous that both countries had to eventually request external financial aid.

Irrespective of the timing of a crisis stress test, recognition of the problem alone is insufficient. It should be linked to restructuring if a bank’s profitability is to eventually be restored. In other words, the decision to conduct a crisis stress test should also take into account the potential implications for the public purse, i.e., it must be tied to the capacity of the authorities to adequately backstop and address the findings. The evidence suggests that while the timing of crisis stress tests may be important, it is insufficient in the absence of other key elements, as elaborated throughout the rest of this section.


There is no hard and fast rule as to who should oversee and/or conduct the crisis stress test. The overriding requirement is that the protagonists are considered credible. In some cases, issues such as expertise, sufficiency of resources and/or political economy considerations play equally important roles in determining who they should be:

  • In the United States, the oversight and execution of the SCAP relied on collaboration across supervisory agencies—the Fed, the FDIC and the OCC; supervisors of individual banks were consulted but not involved in the actual stress test analyses.

  • The EU-wide stress tests were conducted by national supervisory authorities, overseen and coordinated by the EBA (which did not have direct interaction with the banks prior to or during the exercise) in cooperation with the EC and the ECB/European Systemic Risk Board (ESRB). However, the EBA had argued that it needed more legal powers over the exercise to ensure the reliability of the input data, and hence the results (see Brunsden, 2012).

  • In contrast, the authorities in Ireland and Spain appointed third-party contractors in their efforts to strengthen perceptions of independence and objectivity in the process. The reputation of the supervisors had been dented after their banks passed the CEBS/EBA stress tests only to require significant restructuring not long afterwards. In the case of Spain, the authorities, the Troika, the EBA and counterparts from two other European central banks were involved in the oversight of the stress testing exercises.


There is some flexibility to the stress testing approach taken in a crisis exercise. Ideally, a bottom-up (BU) test, cross-validated by a top-down (TD) exercise, would be the superior approach (IMF, 2012a; Jobst and others, 2013), but this may not be possible in a crisis situation where the timeframe is compressed. Both BU and TD approaches have been used effectively in crisis stress tests. However, if only a TD stress test can be undertaken, it should be conducted on a bank-by-bank rather than aggregated basis, which is consistent with the need for transparency at the disclosure stage, as we discuss below. Additionally, the stress tests should be supported by inputs from AQRs (and preferably DIVs), which we cover later in this paper:

  • The U.S. SCAP consisted of a BU and TD mix, with what we would deem a lower-intensity, quantitative substitute for an AQR. The supervisors applied independent quantitative methods using firm-specific data to support their assessments of banks’ submissions (Fed, 2009a).

  • The EU CEBS 2010 and the EBA 2011 exercises comprised BU tests by cross-border banking groups and, for less complex institutions, simplified stress tests run by national supervisors using reference parameters provided by the ECB. The CEBS 2010 stress test included a peer review of the results and a challenge process, as well as extensive cross-checks by the CEBS (CEBS, 2010a); the process evolved for the EBA 2011 exercise to incorporate consistency checks by the EBA, a multilateral review and TD analysis by the EBA and the ESRB with ECB assistance (EBA, 2011c).

  • Ireland’s PCAR 2011 was a BU exercise supported by an AQR. Banks were required to model the impact of certain assumptions on their balance sheets and profit and loss accounts (revenues and losses) based on a third party’s assessment of forecast losses (CBI, 2011). The stress test was perceived to be particularly credible in that it explicitly compared the loan loss estimates of the CBI with those of the third party as a cross-check of the results, which were subsequently published by the CBI.

  • In Spain, two sets of crisis stress tests were conducted in 2012 and the results were published by the third-party consultants who conducted the exercises:

    • The first exercise was a TD stress test. Two consultants separately considered the historical performance and asset mix for each institution at aggregate levels to generate forward-looking projections. The consultants applied their own models, expert experiences and benchmarks (Roland Berger, 2012; Oliver Wyman, 2012a).

    • The second stress test was conducted by one consultant, using detailed data from banks and inputs from a comprehensive AQR exercise. Specifically, the test drew on information derived from external reviews by independent auditors and real estate appraisers and from BdE central databases, to estimate individual banks’ capital needs under a baseline and an adverse scenario (Oliver Wyman, 2012b). Structural analyses of individual banks’ financial statements and business plans were undertaken. Given that the banks only ran their own models on the baseline scenario to generate net revenues, it was essentially another TD exercise, albeit at a much more granular level, but is widely referred to as a BU exercise. (For differentiation purposes, we refer to the first as the TD test and the second as the BU test.)

The coverage of banks should capture at least the systemically important institutions, given the macroprudential nature of the stress test (see IMF, 2012a; Jobst and others, 2013). In this respect, guidance has been provided by the FSB on what constitutes global and domestic systemically important banks (BCBS, 2011 and 2012). However, the sample may have to be expanded depending on the environment at the time of implementation. While some banks are of obvious systemic importance and their selection is indisputable, the difficulty has been in identifying those that are systemic at the margins, e.g., some of the smaller institutions which may have the potential to become or have become systemic under certain conditions (IMF/BIS/FSB, 2009). In cases where there has been a total loss of confidence in the entire banking system and uncertainty about the soundness of individual banks is very high, the coverage may have to include even the smaller, non-systemic banks to forestall a “witch hunt” for failed and failing banks. Coverage has differed across the various crisis stress tests to date (including whether the tests were run on consolidated or domestic business data), but each exercise has captured at least 60 percent of domestic banking system assets:

  • The U.S. SCAP included the 19 largest bank holding companies (BHCs), each with total assets greater than $100 billion. They represented two-thirds of banking system assets.

  • The EU CEBS 2009 stress test captured 22 large cross-border banks with 60 percent of EU banking assets; the number of banks increased to 91 in subsequent exercises, covering 21 countries and at least 50 percent of each banking sector, for an additional 5 percentage points coverage of EU banking assets. However, the flexibility for country authorities to choose which banks to include in the stress tests was perceived to have reduced the legitimacy of the exercises (Ahmed and others, 2011).

  • Ireland’s PCAR 2011 stress tested four financial institutions which accounted for 80 percent of banking system assets.

  • In Spain, the TD and BU stress tests covered banks accounting for around 90 percent of total system assets. Initial concerns had been with some medium-sized and smaller banks rather than the largest, most systemic banks. However, the slow deterioration in sentiment over a prolonged period and constant revelations of new problems eventually affected perceptions of the entire banking system. In the end, the inclusion of both the largest banks and the smaller problem ones in both stress tests became necessary in order to differentiate the strong institutions from the weak ones.
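The coverage shares cited above rest on a simple calculation, which can be sketched as follows. This is a minimal illustration and not the selection method used by any of the authorities: it ranks banks by total assets and adds them to the sample until a target share of system assets is reached. The bank names, asset figures and the `select_sample` helper are hypothetical.

```python
def select_sample(bank_assets, target_share):
    """Pick the largest banks until their combined share of system assets reaches target_share."""
    total = sum(bank_assets.values())
    ranked = sorted(bank_assets.items(), key=lambda kv: kv[1], reverse=True)
    sample, covered = [], 0.0
    for name, assets in ranked:
        if covered / total >= target_share:
            break  # target coverage already reached
        sample.append(name)
        covered += assets
    return sample, covered / total

# Hypothetical banking system (total assets in billions); not data from any actual exercise.
banks = {"A": 400, "B": 250, "C": 150, "D": 100, "E": 60, "F": 40}
sample, share = select_sample(banks, target_share=0.60)
print(sample, round(share, 2))
```

A purely size-based rule of this kind would miss banks that are systemic at the margins, which is why the sample may have to be widened by judgment when confidence in the whole system has been lost.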

Scenario design

The selection of adverse macroeconomic scenarios in crisis stress tests represents a delicate balance between the need to be credible yet constructive. As a principle, stress scenarios should capture extreme but plausible shocks, i.e., the tail risks for the financial system (BCBS, 2009; IMF, 2012a). However, while this principle should always be applied in stress tests for surveillance purposes to support discussions on supervisory actions and crisis preparedness (Jobst and others, 2013) and in regular supervisory stress tests, it needs to be more nuanced in a crisis situation.

In a crisis stress test, the adverse scenario should reflect the uncertainty around the baseline. Crises are typically already tail risk events in themselves. In some cases, they may even be labeled “black swan” events at the outset, as some have argued is the case of the current global financial crisis (e.g., Helmore, 2008; Skidmore, 2008), although the prolonged accumulation of economic and financial imbalances may be obvious in hindsight. In such an environment, banks may already be under severe stress. In other words, the point of the cycle at which the shock is applied matters. Consequently, any implementation of further “tail of the tail” shock scenarios that would hypothetically obliterate an entire banking system would obviate any constructive planning of needed follow-up action(s) by the authorities and the banks themselves. Borio and others (2012) argue that it is easier to identify relevant scenarios for stress testing purposes after a crisis has erupted as the system “does not need to be shaken so hard to reveal weaknesses.” Rather, a key consideration in the scenario design at that stage is that the crisis stress test should be able to differentiate across institutions, as a first step towards determining whether capital injection, some other form of balance sheet restructuring or resolution is required.

The evidence from the crisis case studies suggests that the magnitudes of the macroeconomic shock scenarios per se are not an overriding element for success. The CEBS/EBA stress tests have been derided for the apparent mildness of their adverse growth stress scenarios, among other things, contributing in part to their lack of acceptance (e.g., Ahmed and others, 2011; Campbell, 2011; Jenkins, 2011; Steinhauser, 2011; IMF, 2013a). However, a closer examination of the other crisis stress tests suggests that this argument may be flawed:

  • The CEBS 2009 and 2010 and the EBA 2011 exercises applied cumulative growth shocks averaging 1.9, 1.3 and 1.5 standard deviations from their respective baseline growth scenarios (Box 1), with attendant shocks to other macroeconomic variables. However, the test results did not gain wide acceptance.

  • What is not commonly known is that the adverse growth scenario used in the SCAP was even less stressful than any of the CEBS/EBA shocks. It was equivalent to a cumulative one standard deviation from the baseline over the two-year risk horizon, determined well before the contraction had bottomed out (Figure 7). Indeed, the SCAP stress scenario was criticized by some at the time the results were announced for likely being closer to the actual baseline itself (e.g., Fox, 2009). Yet, the SCAP was effective in regaining market confidence.

    Figure 7.

    United States: Baseline and Adverse Growth Scenarios for Crisis and Supervisory Stress Tests

    (In percent, quarter-on-quarter annualized)

    Sources: Fed; and authors’ estimates of annualized quarterly growth profiles for both SCAP 2009 scenarios and the CCAR 2011 baseline scenario.

  • Similarly, the growth shocks applied to both the Spain TD and BU stress tests were equivalent to one standard deviation from the projected baseline, while Ireland’s PCAR 2011 used the EBA scenarios.

The selection of macroeconomic parameters in the scenario design does not appear to significantly influence the credibility of a crisis stress test either. The SCAP was parsimonious, with three parameters (real GDP growth, the unemployment rate and house prices), while the Ireland and Spain stress tests employed more than a dozen (Table 6).

Table 6.

Crisis Stress Tests: Macro-financial Parameters Scorecard

Sources: CBI; EBA; Fed; IMF; Oliver Wyman; and Roland Berger.
Note: Even though some variables (e.g., commodities, CDS, securitized assets) were not provided as part of the general macro scenarios, they were used in the determination of key market risk drivers.

Included for completeness only—not intended as a crisis stress test; surveillance stress testing exercise was conducted in a crisis environment.

Information not disclosed.

Unlike the growth scenarios, and outside of some coverage of the real estate and employment variables, the projections for most of the other variables were generally less scrutinized.

Consistent with best practice, comprehensive coverage of material risk factors in crisis stress tests appears to be much more relevant for the reliability of the results (BCBS, 2009; Fed/FDIC/OCC, 2012; IMF, 2012a). The global financial crisis has brought to the fore risks which had previously been on the periphery or which had not been considered, such as exposures to sovereign and other previously low-default assets, their accounting in the banking or trading book, funding costs and cross-border exposures, among others (Jobst and others, 2013). The U.S. and European stress tests appropriately focused on banks’ domestic loan books (Table 7), which were the main concern of investors, but the exclusion of some important risk factors affected the credibility of some exercises:

  • The EU stress tests have been vociferously criticized for their inadequate capture of important risk factors, owing in part to political economy constraints (see Wilson, 2011; Wishart, 2011). The failure to properly stress banks’ sovereign exposures was considered particularly egregious in light of the debt crisis and concerns about the bank-sovereign feedback loop (e.g., Ahmed and others, 2011; Das, 2011; Steinhauser, 2011). Specifically, the haircuts imposed on banks’ sovereign portfolios in the trading book during the EBA 2011 exercise were seen to have been too lenient as they only applied a market value adjustment rather than possible defaults, while the omission of any stress test of the banking book—where the majority of banks’ sovereign exposures resided—meant that the main risk factor at the time had not been adequately captured.

  • In Spain, concerns about lender forbearance and possible misclassifications in banks’ loan books were addressed in the BU exercise. Auditors and real estate appraisers were appointed to verify the quality of the input data. The issue of sovereign risk was omitted but was considered less of an issue owing to the availability of the LTRO facility from the ECB by the time of the stress tests. The liquidity support allayed market concerns about banks’ funding costs and possible deep haircuts to their sovereign debt portfolio as the pressure for banks to liquidate their holdings in the hold-to-maturity (HtM) banking book and realize the losses abated.

Table 7.

Crisis Stress Tests: Risk Factors Scorecard

Sources: CBI; EBA; Fed; IMF; Oliver Wyman; and Roland Berger.

Included for completeness only—not intended as a crisis stress test; surveillance stress testing exercise was conducted in a crisis environment.

Information not disclosed.

The EBA conducted a confidential thematic review of liquidity funding risks.

Designing Crisis Stress Test Growth Scenarios

The CEBS stress tests popularized the notion of calibrating growth shocks in terms of the number of standard deviations from the baseline. This metric allows for a more standardized comparison across stress tests at a point in time and over time. For instance, the application of a large growth shock scenario to an economy that typically experiences large and volatile growth rates may be a less significant event than to one which consistently posts more moderate and stable growth. The CEBS method for determining adverse growth scenarios consists of the following steps:

The rule of thumb in the IMF’s Financial Sector Assessment Program (FSAP) scenario stress tests has generally been to apply two standard deviation shocks to growth (IMF, 2012a; Jobst and others, 2013), but calibrations have been necessary in crisis situations. For example, the Spain FSAP stress test imposed a “severe adverse” scenario of one standard deviation from the baseline GDP growth trend over a two-year risk horizon (IMF, 2012b). The shock came on top of a downward adjustment to the World Economic Outlook baseline forecast to incorporate downside risks to growth from the crisis plus a projected fiscal adjustment. In this scenario, most of the shock to the baseline growth (about two-thirds) was assumed to occur in the first year, attributable to a sharp decline in output, further declines in house prices and rising unemployment.

Viewed from another angle, the 2-year cumulative GDP shock for Spain under the severe adverse scenario was considered extreme by historical standards, as the actual outcome proved. The GDP drop in the first year of the risk horizon approximated the largest decline in economic activity since 1945 but represented a plausible “tail of the tail” risk under the circumstances (Box Figure 1). The third-party crisis stress tests subsequently increased the second-year stress and extended both scenarios to a third year. As it turned out, the growth in 2012—the first year of the risk horizon—approximated the projected baseline.

Box Figure 1.

Spain: 30-year Average Annual Growth Rate and Stress Test Scenarios

(In percent)

Sources: Oliver Wyman (2012a); World Economic Outlook; and authors’ estimates.

A corroborating method to gauge the extremity of a proposed shock scenario is to determine its deviation from the long-term historical average, in standard deviation terms. In the Spain example, the adverse shock scenario extended beyond 3 standard deviations of the mean annual growth rate of the past 30 years; it exceeded even the sharp contraction experienced in 2009 and was designed to be more protracted.
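The two yardsticks discussed in this box can be sketched numerically. The snippet below is illustrative only and is not the CEBS calibration procedure: it expresses a cumulative growth shock as a multiple of the standard deviation of historical annual growth, and separately measures how far a single adverse-year growth rate lies below the long-run mean. All growth figures and function names are hypothetical, and scaling by the sample standard deviation of annual growth is an assumption.

```python
from statistics import mean, stdev

def shock_in_sd(baseline, adverse, history):
    """Cumulative gap between baseline and adverse growth paths, in historical standard deviations."""
    cumulative_gap = sum(b - a for b, a in zip(baseline, adverse))
    return cumulative_gap / stdev(history)

def distance_from_mean_in_sd(growth_rate, history):
    """Number of standard deviations by which a growth rate falls below the historical mean."""
    return (mean(history) - growth_rate) / stdev(history)

# Hypothetical annual real GDP growth rates, in percent (not actual country data).
history = [3.0, 2.0, 4.0, 1.0, 3.0, 2.0, -1.0, 2.0]
baseline = [0.5, 1.0]   # two-year risk horizon
adverse = [-1.0, 0.0]

cumulative_shock = shock_in_sd(baseline, adverse, history)
first_year_distance = distance_from_mean_in_sd(adverse[0], history)
print(round(cumulative_shock, 2), round(first_year_distance, 2))
```

Because both measures divide by historical volatility, the same percentage-point shock registers as milder for an economy with volatile growth than for one with stable growth, which is the point made at the start of this box.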

Another aspect of crisis stress testing is the standardization of assumptions and not just the assumptions themselves. Crisis stress tests tend to be more constrained in the assumptions that are employed, in order to facilitate comparisons. That said, absolute standardization is not necessary for credibility. To date, all crisis stress tests have imposed consistent macro scenario(s) on all banks within a particular jurisdiction, but behavioral assumptions (i.e., assumptions with regard to factors that management control) have been allowed to vary, typically with cross-checks by another party to ensure their reasonableness. Ultimately, what has been more important is the publication of information relating to those assumptions so that market participants are able to replicate the results to their own satisfaction (see discussion below on communication):

  • In the SCAP, the U.S. supervisors provided assumptions for the macroeconomic scenarios. Banks were asked to adapt the assumptions to reflect their specific business activities when projecting their potential losses and resources for absorbing those losses; supervisors then reviewed and assessed the firms’ submissions and the quantitative methods that were used to project those losses and resources, as well as the key assumptions (Fed, 2009a and 2009b). To facilitate horizontal comparisons across firms, supervisors applied their own independent quantitative methods to firm-specific data.

  • The EU CEBS/EBA stress tests applied macroeconomic and sovereign shock scenarios and parameters developed by the ECB. Very detailed and prescriptive guidance on assumptions and methodologies was provided for the EBA 2011 exercise (EBA, 2011c). Banks’ calculations were reviewed and challenged by the respective national supervisors, then analyzed by the EBA, which conducted in-depth consistency checks and challenges with national supervisors.

  • Ireland’s PCAR 2011 incorporated many of the parameters used for the EBA 2011 stress test. A private consultancy firm was contracted by the CBI to provide oversight and to challenge the work of the third-party stress tester and to ensure consistency across institutions and portfolios (CBI, 2011).

  • The Spanish stress tests used the growth scenarios and guidelines provided by a Steering Committee comprising the authorities, the Troika and counterparts from two European central banks. The process and methodology for the BU exercise were closely monitored and agreed upon with an Expert Coordination Committee from the Troika, the EBA and the authorities (Oliver Wyman, 2012b).

Crisis stress testing has also placed the spotlight on the modeling of revenues and losses. While stress testing for losses has typically involved mapping macro-factors onto the various risk factors that drive the impairment parameters, the crisis has underscored the importance of adequately modeling losses for different categories of credit risk (e.g., various types of real estate, corporate sector, credit cards), geographic heterogeneity and a rapidly evolving macro-financial environment for which there has been no precedent. Separately, stress testing revenues, especially for stressed conditions, is largely seen to be a “black box” (Schuermann, 2012). Given the importance of projected pre-provision profits in determining banks’ loss absorption capacity in stress scenarios, the credibility of these estimates is key to the overall perception of any stress testing exercise.

Capital standards

The capital standards applied to crisis stress tests play a crucial role in their legitimacy, but evidence from the case studies suggests that some variability is acceptable. Countries would typically apply their existing capital frameworks. In this context, the differences in regulatory frameworks and thus difficulty in comparing stress test results across jurisdictions do not appear to be an overriding concern for markets, as long as the definition of capital is made clear in each case. Bernanke (2010) notes the importance of focusing not just on the levels of capital but also on the composition of capital (which is also consistent with Basel III) in a crisis stress test:

  • The U.S. authorities applied their existing capital framework. Banks were required to meet the T1 capital hurdle of 6 percent post-stress and the higher quality T1 Common Equity ratio of 4 percent post-stress. Basel I risk weights were used to calculate risk-weighted assets (RWA), providing transparency in this aspect of the stress test. Nonetheless, the authorities acknowledged in designing the SCAP that “no single measure of capital adequacy is universally accepted or would guarantee a return of market confidence” (Bernanke, 2009).

  • The EU, Ireland and Spain stress tests applied the existing Capital Requirements Directive (CRD) at the time (i.e., CRD II) to the calculation of capital. The Basel II risk weights, which are more opaque, were used to calculate RWA. That said, the capital definition deviated from that of the regulatory directive.

    • The CEBS 2009 and 2010 stress tests applied a T1 hurdle rate of 6 percent. The EBA 2011 stress test evolved in line with Basel III developments: it implemented a commonly agreed definition of common equity capital (“EBA Core Tier 1 (CT1)”) and applied a post-stress hurdle rate of 5 percent, noting that a higher threshold than the legal minimum was “necessary in assessing the resilience of banks in adverse circumstances if credibility and confidence in the banking sector is to be restored” (EBA, 2011d).

    • Ireland imposed a hurdle rate of 10.5 percent for the baseline scenario and 6 percent EBA CT1 under stress (up from the 4 percent required minimum), plus an additional protective buffer.

    • The Spain TD and BU stress tests applied an EBA CT1 hurdle rate of 9 percent under the baseline scenario and 6 percent for the adverse scenario.
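To make the role of dual hurdle rates concrete, the sketch below computes a bank's capital shortfall as the gap between the capital needed to meet a hurdle and the capital projected at the end of the risk horizon. It is an illustration rather than any authority's methodology. The default hurdles mirror the 9 percent baseline and 6 percent adverse rates cited for Spain; the bank figures, the `capital_need` helper and the choice to treat the larger of the two gaps as binding are assumptions.

```python
def shortfall(capital, rwa, hurdle):
    """Capital needed to lift capital / rwa up to the hurdle; zero if the hurdle is already met."""
    return max(0.0, hurdle * rwa - capital)

def capital_need(base_capital, base_rwa, adv_capital, adv_rwa,
                 base_hurdle=0.09, adv_hurdle=0.06):
    """Larger of the baseline and adverse shortfalls (assumed here to be the binding need)."""
    return max(shortfall(base_capital, base_rwa, base_hurdle),
               shortfall(adv_capital, adv_rwa, adv_hurdle))

# Hypothetical bank: projected CT1 capital and RWA at the end of the horizon.
need = capital_need(base_capital=8.5, base_rwa=100.0,
                    adv_capital=4.0, adv_rwa=100.0)
print(round(need, 2))
```

In this hypothetical case the adverse scenario is binding even though its hurdle is lower, because the projected loss of capital under stress outweighs the relief from the lower threshold.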

In a crisis situation, tensions may arise between microprudential and macroprudential objectives in determining the adequacy of capital buffers (IMF, 2013b). While concerns such as lender forbearance and loan misclassification should be taken into account, especially in instances where AQRs are not undertaken, requiring banks to hold very high post-stress test capital ratios (microprudential) to meet—sometimes unreasonable—market expectations (see Box 2) could lead to excessive deleveraging, forestall the issuance of new credit to the economy and exacerbate the economic downturn (macroprudential). The result could be a vicious circle of further deterioration in the asset quality of banks and consequently, further destruction of capital.

Instead, banks should build strong prudential buffers during good times so that they are in a position to reduce them during bad times in a manner that respects microprudential objectives. The design of the capital standards for the Ireland and Spain crisis stress tests applied this philosophy—banks were expected to maintain a high level of CT1 capital adequacy (which incorporated a buffer) under a central (baseline) case scenario, but were assumed to be able to draw on the buffer in the event that an extreme stress scenario were to materialize. During bad times, encouraging increases in capital levels rather than ratios could align both microprudential and macroprudential objectives.
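The distinction between capital levels and capital ratios can be shown with a small calculation. The sketch below is a stylized illustration with hypothetical figures and function names: a bank below its target ratio can close the gap either by raising capital with RWA unchanged or by shrinking RWA with capital unchanged.

```python
def capital_to_raise(capital, rwa, target_ratio):
    """Extra capital needed to reach the target ratio with RWA held constant."""
    return max(0.0, target_ratio * rwa - capital)

def rwa_to_shed(capital, rwa, target_ratio):
    """RWA reduction needed to reach the target ratio with capital held constant."""
    return max(0.0, rwa - capital / target_ratio)

# Hypothetical bank: capital of 7 against RWA of 100, with a target ratio of 10 percent.
capital, rwa, target = 7.0, 100.0, 0.10
extra_capital = capital_to_raise(capital, rwa, target)  # route 1: add capital
rwa_cut = rwa_to_shed(capital, rwa, target)             # route 2: deleverage
print(round(extra_capital, 2), round(rwa_cut, 2))
```

In this example the same ratio is reached by raising 3 units of capital or by shedding 30 units of RWA. The second route corresponds to the deleveraging that the preceding discussion warns against, which is why a requirement framed in capital levels can serve both objectives.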


Objective, Action Plan and Financial Backstop

Crisis stress tests provide the financial foundation for authorities to take necessary action(s) to restore stability to the banking sector. The ultimate overarching objective should be to ensure that the financial system returns to health and that the recovery is durable. Thus, any crisis stress test should be designed to meet a well-specified policy goal, accompanied by a comprehensive strategy to address the findings:

  • The former should not risk prejudging the final result—the underlying conditions of banks need to be determined first.

  • The latter must be in place at the time of the commencement of the exercise to avoid any uncertainty. It should comprise a clear action plan, and credible financial backstops against possible adverse findings must be at hand (see Schuermann, 2012). For instance, the revelation of a potentially large gap in bank capitalization with no market access would require other ready sources of funding to fill that capital need.

  • In some cases, the restoration of solvency may require a detailed roadmap for significant balance sheet and cost restructuring. Merely raising capital would be ineffective if cleaning up balance sheets is necessary for their repair (see Borio and others, 2012). Importantly, any restructuring should be carried out swiftly and, as much as possible, in ways that do not worsen sovereign debt burdens (Claessens and others, 2011).

  • In other cases, the resolution of non-viable banks may be necessary to ensure the future health of the system. Thus, having an adequate resolution framework in place to take the requisite action is also key to any successful outcome arising from crisis stress tests.

The Potential Impact of Capital Hurdle Rates for Crisis Stress Tests

In a crisis stress test, where the results may require follow-up capital action, the setting of capital hurdle rates is of significant import. Combined with the magnitude(s) of the applied shock(s), hurdle rates play a potentially crucial role in estimating any required recapitalization and consequently, in any decision to restructure or exit banks from the system. These could have far-reaching implications for capital raising and possibly the fiscal budget.

The recapitalization of banks based on stress test outcomes could affect their lending capacity if the hurdle rates are set too high. As a simple example (Box Figure 2), let us assume that a bank (i) has a pre-shock CT1 ratio of 9 percent; (ii) has constant RWA; and (iii) must meet a required CT1 capital adequacy hurdle rate of 11 percent post-stress, which includes a buffer. Next, consider two stress test scenarios, a baseline and an adverse 1/:

  • Baseline (central case)

    • (a) Assume that under the baseline scenario, the bank’s CT1 ratio is reduced by 2 percentage points to 7 percent.

    • (b) The bank is expected to take capital action that would return the CT1 ratio up to 11 percent, i.e., an increase of 4 percentage points.

    • (c) In other words, the bank would have to “top up” its existing 9 percent CT1 ratio with another 4 percentage points up to 13 percent, in anticipation of the baseline scenario materializing.

    • (d) This means that the bank would have to hold a total capital adequacy ratio of more than 16 percent, once additional requirements to make up T1 and total capital are included, and even before taking into account possible items such as D-SIB or G-SIB surcharges.

    • (e) If the central case growth forecast is accurate and the bank’s CT1 ratio is indeed reduced by 2 percentage points, the bank would have a CT1 ratio of the targeted 11 percent.

  • Adverse

    • (i) Assume that under a severe adverse scenario, the tail shock sharply increases the bank’s projected losses and reduces its CT1 ratio by 6 percentage points to 3 percent.

    • (ii) The bank is expected to take capital action that would return the CT1 ratio back up to 11 percent, i.e., an increase of 8 percentage points.

    • (iii) In other words, the bank would essentially have to have a CT1 ratio of 17 percent (i.e., the existing 9 percent plus another 8 percentage points).

    • (iv) This means that the bank would have to hold a total capital adequacy ratio of more than 20 percent, once additional requirements to make up T1 and total capital are included, and even before taking into account possible items such as D-SIB or G-SIB surcharges.

    • (v) If the baseline scenario were to materialize, the bank would be carrying a CT1 ratio of 15 percent (i.e., 17 percent less the 2 percentage points impact).
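The arithmetic of the two scenarios can be reproduced in a few lines. The sketch below follows the assumptions of the example (a pre-shock CT1 ratio of 9 percent, constant RWA and an 11 percent post-stress hurdle) and works in percentage points; the function and variable names are illustrative and not part of the original example.

```python
def required_pre_shock_ratio(current, hurdle, shock):
    """CT1 ratio the bank must hold up front so that it still meets the hurdle after the shock."""
    post_stress = current - shock
    top_up = max(0, hurdle - post_stress)
    return current + top_up

CURRENT, HURDLE = 9, 11                 # percent, as in the example
BASELINE_SHOCK, ADVERSE_SHOCK = 2, 6    # percentage points

base_target = required_pre_shock_ratio(CURRENT, HURDLE, BASELINE_SHOCK)
adverse_target = required_pre_shock_ratio(CURRENT, HURDLE, ADVERSE_SHOCK)
# CT1 ratio if the bank capitalizes for the adverse case but the baseline materializes
realized_if_baseline = adverse_target - BASELINE_SHOCK
print(base_target, adverse_target, realized_if_baseline)
```

The results match steps (c), (iii) and (v): 13, 17 and 15 percent. The gap between the 15 percent realized ratio and the 11 percent hurdle is capital held against a tail event that did not occur.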

Box Figure 2.

Hypothetical Recapitalization Estimations

(In percentage points)

Source: Authors.

Private sector stress tests of the Spanish banking sector in 2011–12 applied similarly stringent assumptions. Their huge estimates of the recapitalization needs of the banks were premised on projected losses of up to half, CT1 thresholds of up to 11 percent and a capital hole of up to €120 billion (Box Table 1). As it turned out, the baseline growth scenario for 2012 eventually became the actual outcome (Box 1).

Box Table 1.

Spain: Market Estimates of Bank Recapitalization Needs with Associated Hurdle Rates

Source: IMF (2012b).
1/ In both scenarios, the absolute amount needed to “replenish” the capital may be lower if RWA decreases in line with the loan losses.

In these areas, the design and execution of crisis stress tests have varied across jurisdictions:

  • The U.S. authorities are generally perceived to have stood “wholeheartedly” behind their stress test results (Onado and Resti, 2011). The SCAP was designed and implemented to meet a clearly-defined policy objective with the necessary financial support.

    • The authorities explicitly noted that the aim of the SCAP was to try and change macroeconomic outcomes by ensuring that the largest banks had sufficient capital buffers so that they would remain well-capitalized and be able to continue providing credit and intermediation services even in an economic environment that was more challenging than anticipated at the time (Fed, 2009c; Tarullo, 2010).

    • At the start of the exercise, the authorities announced that banks needing to augment their capital post-stress test would be given one month to design a detailed plan and six months to raise the requisite extra capital, and that they would be bridged by the Treasury’s firm commitment to provide contingent common equity under the Capital Assistance Program (CAP) of the Troubled Asset Relief Program (TARP) in the meantime.

    • Clarifications (or “forward guidance”) by the authorities that the SCAP would not be used as a pretext for government takeovers of the largest banks, if nationalization was not necessary, provided support for their stock prices; indeed, the stock prices of SCAP banks outperformed the non-SCAP ones during the stress test period, possibly because it was unclear how the latter would traverse the financial crisis.

  • The contrast between the U.S. and EU crisis stress tests has been stark.

    • The clarity of the EU objectives improved only over time. The stated aim of the CEBS 2009 stress test was vague, with the authority initially noting that the exercise was being held in the context of supervisors’ regular risk assessment of the financial sector (CEBS, 2009b). In contrast, the objectives of the CEBS 2010 and the EBA 2011 exercises were explicit—to provide policy information about the overall resilience of the EU banking system for the assessment of banks’ resilience to adverse economic developments and to inform policymakers about the ability of banks to absorb those shocks (CEBS, 2010; EBA, 2011d).

    • Moreover, little guidance was provided on possible action plans and the availability of resources to back them. Follow-up measures to the CEBS 2010 stress test were left to individual national authorities to pursue. The tests failed to reassure the markets, especially when some banking systems subsequently came under severe pressure. The EBA 2011 exercise subsequently required banks showing capital shortfalls to present their plans to restore their capital positions and to implement remedial measures within 6 months. However, the European governments could not reach any agreement prior to any of the CEBS or the EBA stress tests and could not provide any collective financial backstop for the results.

  • In Ireland, the PCAR 2011 was undertaken following the government’s request for financial support from the Troika (see Department of Finance—Government of Ireland and CBI, 2010). The government had requested an IMF arrangement under the Extended Fund Facility for a period of 36 months in the amount of €22.5 billion, in addition to €45 billion from the European Financial Stabilization Mechanism (EFSM)/European Financial Stability Facility (EFSF) including bilateral loans and own resources, in November 2010. The stress test formed part of the agreed reforms of the domestic banking sector under the Financial Measures Program, the banking element of this package.

  • Nowhere was the difference between having a clear objective and action plan in place and not having them more obvious than in the case of Spain. Markets remained unconvinced following the release of the results from the TD stress test in June 2012. The exercise had been undertaken to obtain an “overall figure” for the recapitalization needs of the Spanish banking system as a precursor to a more granular evaluation of individual bank portfolios as part of its request for EU assistance (Ministry of Economy and Competitiveness and BdE, 2012). However, it was not accompanied by any specific details on a financial backstop or follow-up action to address the problems in the banking sector. Sentiment only firmed upon the actual signing of the MoU with the Eurogroup in July, under which financial assistance to the banking sector would be provided through the European Stability Mechanism (ESM). The aim was to use the subsequent BU stress test to identify institutions that needed to be restructured and to require concerted reforms of the banking sector as key conditions for financial support.

Disclosure of technical details

Transparency is an indispensable requirement of any crisis stress test. Peristiani and others (2010) posit that investor uncertainty about the condition of banks during a crisis stems from several sources. These include concerns as to how banks account for losses and their true capital adequacy going forward; the capital standard that regulators would apply to a bank; and how the government would deal with insolvent banks, i.e., whether it would nationalize the banks and wipe out the value to investors or whether it would inject capital and mitigate investors’ losses. Goldstein and Sapra (2012) argue that a key part of supervisory disclosure on stress tests is to hold supervisors accountable by making clear ahead of time (i) what is needed for firms to meet the test requirements; (ii) what firms that do not meet the requirements would be expected to do; and (iii) what steps supervisors would take with those firms.

The public nature of crisis stress tests is premised on the desire to improve transparency into the health of individual banks and that of the banking system as a whole. The severity of the global financial crisis has been attributable in part to bank opacity: excessive risks taken by banks were not adequately disclosed and markets could not distinguish the healthy banks from the weak ones during the crisis (Peristiani and others, 2010; Goldstein and Sapra, 2012). Hence, detailed, quality disclosure of bank-specific information from any crisis stress test is crucial as it will allow investors and counterparties to understand the risk drivers for each institution, improve market discipline and reduce the risk premia charged (Pritsker, 2010). The actual substance of the information should enable the market to do its own assessment of the scenarios, assumptions and the resulting outcomes at the bank level (Bernanke, 2010; Schuermann, 2012). The double-edged sword is that the disclosure of stress test results, if not properly designed, may actually create panic by introducing more noise (Goldstein and Sapra, 2012).

The public disclosures surrounding the SCAP are considered to be one of the main reasons for its success. By assessing the overall needs of the U.S. financial system and the specific needs of individual banks, the exercise provided valuable information to market participants (Hirtle and others, 2009). Peristiani and others (2012) investigate the information value of the SCAP and find that the supervisors’ comprehensive assessments of each bank’s estimated losses and capital needs under the adverse scenario produced information about the banks that private sector analysts did not already know. The up-front transparency with regard to the availability of the financial backstop provided the necessary reassurance against the findings.

The EU exercise further demonstrated the importance of disclosing information relevant for addressing market concerns. Although bank-by-bank results were published in the CEBS 2010 and the EBA 2011 exercises, investors were skeptical about the failure to adequately stress banks’ sovereign exposures in the banking book, as discussed above, and consequently remained unconvinced about their health. In particular, the disclosures of the EBA 2011 stress test results were richly documented and included details on the sovereign bond portfolios of individual banks as well as their capital composition. However, the EU authorities lacked the mandate to require any follow-up action in these areas and were thus unable to allay market concerns without being able to provide clarity on this part of the exercise. It was not until the EU Capital Exercise 2011, when banks were required to disclose the requisite sovereign capital buffers against their exposures and to submit their recapitalization plans to reach 9 percent CT1 capital (EBA, 2011e), that market sentiment began to bottom out (Figure 3 and IMF, 2013a).

More generally, the effective crisis stress tests to date have published detailed information on certain aspects of the exercise (Fed, 2009b; CBI, 2011; Oliver Wyman, 2012b). Specifically, each has disclosed: (i) the stress test design and methodology and their implementation; (ii) macroeconomic, absorption capacity and loan loss assumptions; and (iii) individual bank results showing projected losses for the portfolio categories considered most important by markets for a particular banking system (e.g., by loan type for the United States, Ireland and Spain), capital components and projected capital shortfalls (Table 8). Details of the stress test models have typically not been published, but markets have seemed satisfied by the detailed cross-checks and reviews conducted by the authorities or third parties.

Table 8.

Crisis Stress Tests: Disclosure Scorecard

Sources: CBI; EBA; Fed; IMF; Oliver Wyman; and Roland Berger.

Included for completeness only—not intended as a crisis stress test; surveillance stress testing exercise was conducted in a crisis environment.

Some models published.

Very limited information disclosed.

Combined amount provided.

Only loss rates provided.

B. Other Important Considerations

Asset quality review

Reliable inputs are critical for the credibility of any crisis stress test. Thus, an AQR of banks’ portfolios should be undertaken ahead of the stress test, although the nature and extent of the AQR may differ depending on market perceptions of the reliability of the reported information and the conduct of the stress test. It should also ideally include a DIV exercise to ensure the completeness and accuracy of data and the veracity of related information technology and risk monitoring systems at banks.

In a crisis environment, an AQR would be a first step towards a comprehensive and detailed assessment of possible stress test buffers. It would help ensure that the input data are “clean” and thus facilitate more realistic loan loss estimates. An AQR typically comprises two important but different types of costs:

  • (i) the actual cost of running the exercise, which may require a third party contractor, plus possibly input from auditors and asset valuation companies; and

  • (ii) the cost of cleaning up the books first if significant inaccuracies in reporting (i.e., incorrect loan and/or non-performing loan classifications) and/or lender forbearance are discovered, through loss recognition of unviable loans via additional provisioning (which flows through profits to capital); this step would be taken ahead of the stress test which would then provide an estimate of potential additional capital needs under hypothetical adverse scenarios (Appendix II).

In contrast to Ireland and Spain, where variants of more comprehensive AQRs were conducted, the U.S. stress test applied a lower-intensity substitute. Supervisors addressed the market’s concerns about the quality of banks’ loan portfolios by using a more quantitative methodology:

  • Banks were instructed to estimate cash flow losses using a set of indicative loss rate ranges provided by the supervisors for specific loan categories.

  • The estimates were adjusted by granular, bank-specific information on factors such as past performance, portfolio composition, origination vintage, borrower characteristics, geographic distribution, international operations and business mix to benchmark indicative loan loss parameters (Fed, 2009a).

  • Reviews of the SCAP submissions by banks were subsequently conducted by experts in accounting and asset pricing and incorporated inputs from on-site supervisors.
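
The loss-estimation approach described in the bullets above can be sketched as follows. This is a minimal, hypothetical illustration rather than the supervisors' actual procedure: the loan categories, exposures, loss-rate ranges and the single multiplicative `adjustment` factor (standing in for bank-specific information such as vintage or geography) are all invented for the example.

```python
def loss_range(exposures, loss_rates, adjustment=None):
    """Return (low, high) projected losses summed across loan categories.

    exposures:  category -> exposure amount
    loss_rates: category -> (low_rate, high_rate), the indicative range
    adjustment: category -> multiplier standing in for bank-specific
                factors (hypothetical simplification); defaults to 1.0
    """
    adjustment = adjustment or {}
    low = high = 0.0
    for category, exposure in exposures.items():
        rate_low, rate_high = loss_rates[category]
        scale = adjustment.get(category, 1.0)
        low += exposure * rate_low * scale
        high += exposure * rate_high * scale
    return low, high


# Invented portfolio (in billions) and invented indicative loss-rate ranges.
exposures = {"mortgages": 200.0, "commercial": 100.0, "credit_cards": 50.0}
rates = {
    "mortgages": (0.05, 0.08),
    "commercial": (0.04, 0.06),
    "credit_cards": (0.15, 0.20),
}
# A weaker-than-average mortgage book is scaled up by 20 percent.
low, high = loss_range(exposures, rates, adjustment={"mortgages": 1.2})
print(f"Projected losses: {low:.1f} to {high:.1f}")
```

In practice the bank-specific adjustment would rest on granular data and supervisory judgment rather than a single multiplier; the sketch only shows how indicative ranges translate into a range of projected losses.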

Markets were reassured, and banks were able to recapitalize and strengthen their balance sheets. The virtuous circle that was the goal of the SCAP took hold (see Hirtle and others, 2009): the largest banks could confidently continue to lend with the knowledge that the SCAP buffer would be adequate under adverse conditions, thus supporting economic recovery and, consequently, the banks’ own asset quality. A possible reason for the market’s acceptance of the substitute to the AQR could be the credibility of the authorities’ review procedures and possibly perceptions of more reliable data quality in the first place.

One of the main shortcomings of the EU-wide stress tests has been the lack of any prior validation of banks’ asset quality. The EBA has recommended that national supervisors conduct AQRs on major EU banks ahead of the 2014 EU stress testing exercise, with the objective of reviewing asset classifications and valuations to help dispel concerns over deteriorating asset quality (EBA, 2013). The exercise will be coordinated with the planned balance sheet assessment of the Single Supervisory Mechanism (SSM) to be conducted by the ECB, in terms of its methodology and timing. The SSM exercise will consist of a comprehensive review of the banks that will fall under direct supervision of the ECB (see Constâncio, 2013).

Follow-up stress test(s)

Follow-up stress tests have been useful in consolidating the gains from crisis stress tests. In some cases, they have evolved since their introduction during the crisis, most recently with greater stringency and improved disclosure in some areas (Table 9). In the United States, follow-up supervisory stress tests to the SCAP have been conducted every year since. Separately, two other EU-wide crisis stress tests have been conducted since 2009 in efforts to regain market confidence in the region’s banking system.

Table 9.

European Union and the United States: Evolution of Publicized Stress Testing Exercises

Sources: EBA; Fed; World Economic Outlook; and authors’ calculations.
Notes:

Industry consensus growth forecasts applied for the SCAP exercise are the average of Consensus Forecasts, Blue Chip Economic Indicators and Survey of Professional Forecasters.

No specific numbers are provided for the CCAR 2011 and 2012 baseline growth forecasts; identical industry sources as SCAP are assumed.

All U.S. domiciled banking organizations are required to compute risk-based capital requirements using the regulatory capital definition (general-risk based capital rules, Basel I); none had entered a transitional floor period for RWAs as at 2011.

The U.S. T1 leverage minimum is 3 percent for banks with a composite Bank Holding Company Rating System (BOPEC) rating of “1” and for those that have implemented the Board’s risk-based capital measure for market risk; the minimum is 4 percent for all other banks.

CCAR 2012 also applies Basel III framework calculations, fully phased.

The DFA and CCAR are closely related, but with some important differences. The projections of pre-tax net income from the DFA exercise are used as direct inputs to the CCAR. The primary difference between the two is the capital action assumptions: the Fed uses a standardized set of capital action assumptions for the DFA; in contrast, BHCs’ planned capital actions are incorporated in the CCAR to project post-stress capital ratios.

The U.S. supervisory stress tests have taken disclosure to another level and may yet set a new benchmark for market expectations. The supervisors implemented the CCAR in 2011 but chose to keep the results confidential, although an overview of the exercise and the stress scenario were published (Fed, 2011a). The exercise coincided with a weakening in the financials stock index (Figure 2). Following the low-profile CCAR 2011 exercise, detailed information on the CCAR 2012 and CCAR 2013, as well as on the DFA 2013 stress tests, was published (Fed, 2011b, 2012a, 2012b, 2013a and 2013b). The disclosures included the frameworks, assumptions, methodologies and bank-by-bank results of capital ratios, projected losses by portfolio and type of loan and impact on the profit and loss account, and took into account banks’ proposed capital actions. The stock prices of financial firms appeared to benefit from the renewed transparency, having trended upwards since late 2011 while volatility has continued to moderate (Figure 2). From 2013 onwards, DFA stress tests are implemented alongside the CCAR (each with different capital action assumptions). The former requires annual and mid-cycle supervisory stress tests for systemically important financial institutions and the publication of those results.

The stress scenarios for the U.S. supervisory stress tests have been appropriately more onerous than those applied in the SCAP. The CCAR 2011 projected an adverse growth shock of 1.4 standard deviations from the projected baseline, while both the 2012 and 2013 stress tests assumed adverse growth scenarios of 2.5 standard deviations (Figure 7). In the DFA mid-cycle exercise, each bank will develop its own baseline, adverse, and severely adverse scenarios to best reflect its individual operations and risks; the banks are then required to publish the results of their respective severely adverse scenarios to help “promote market discipline and facilitate an understanding of the financial conditions and risks” (Fed, 2013c). The supervisors use the stress test results to require banks to calibrate their proposed capital actions to ensure that they strengthen their capital positions.
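
One way to express scenario severity in standard deviations is sketched below. It is a hedged illustration only: it assumes that the shock is measured as the gap between baseline and adverse growth, scaled by the sample standard deviation of a historical growth series. The function name and the data are hypothetical, and the exact horizon and sample underlying the figures cited above are not specified here.

```python
import statistics


def shock_in_std_devs(baseline, adverse, history):
    """Gap between baseline and adverse growth, in units of the sample
    standard deviation of historical growth (assumed severity metric)."""
    sd = statistics.stdev(history)
    return (baseline - adverse) / sd


# Invented data: historical annual growth rates (percent) and two scenarios.
history = [1.0, 3.0, 5.0]
severity = shock_in_std_devs(baseline=2.0, adverse=-3.0, history=history)
print(f"Adverse scenario severity: {severity:.1f} standard deviations")
```

A metric of this kind allows scenarios from different exercises, or across jurisdictions with different growth volatility, to be compared on a common scale.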

While transparency of stress tests is critical in a crisis, its costs may be more subtle during normal times and may require trade-offs. On the one hand, it would reduce opacity and instill market discipline. It could also be pre-emptive in terms of reducing uncertainty surrounding any public stress tests in future crises if markets get used to expecting that any adverse finding will entail appropriate follow-up action(s). On the other hand, as Schuermann (2013) observes, it could encourage banks to try and recreate the supervisory models rather than trace out their own risk profiles; or, as noted by Goldstein and Sapra (2012), encourage banks to hold loan portfolios that generate good performance to pass the test, but which may not be beneficial for them in the longer term; lead to over-reaction by markets ex post; or deter speculators from trading on their own views and market information, thus hampering the usefulness of that information for regulatory purposes.

Follow-up stress tests are also being planned by European supervisors. Given that Europe’s banking systems are not yet out of the woods, these tests provide opportunities for the authorities to improve upon previous exercises and to solidify previous gains made in regaining market confidence. As noted above, another round of EBA stress tests of EU banking systems is scheduled for 2014 (EBA, 2013), the fourth since its introduction in 2009. Ireland has postponed the next round of the PCAR to 2014 H1, to take place before the EBA 2014 stress test. In Spain, the BdE has indicated that it will, going forward, include stress tests “internally and on a regular basis, in its supervisory arsenal” (BdE, 2013); next steps in this area will take into account the EBA’s recommended AQR exercise and the balance sheet assessment in the context of the SSM, as well as the EBA 2014 stress test.

Liquidity Stress Test

Liquidity stress testing has become an important risk management tool following the manifestation of unprecedented liquidity shocks to the global banking system during the crisis. However, the earlier crisis stress tests had eschewed liquidity risk and focused on solvency risk instead. Schuermann (2012) observes that the “dynamism” of liquidity positions, which are subject to rapid change, means that any snapshot at a particular point in time is unlikely to be informative by the time of disclosure. The positive outcomes of some of the solvency stress tests suggest that markets did not necessarily require supporting liquidity tests:

  • Liquidity risk has not been specifically assessed as part of the EBA stress testing exercises. However, a confidential thematic review of liquidity funding risks was initiated in 2011 Q1 to assess banks’ vulnerability in relation to liquidity risk. The EBA 2011 solvency stress test analyzed the evolution of the cost of funding connected to the specific financial structure of the banks in question, in particular, the impact of increases in interest rates on assets and liabilities, including that of sovereign stress on banks’ funding costs.

  • Likewise, liquidity stress tests were not conducted in Spain’s case. However, the funding costs in the solvency stress tests were assumed to increase with the proposed scenarios for the solvency stress tests.

  • Ireland’s Prudential Liquidity Assessment and Review (PLAR) 2011 has been the only crisis liquidity stress test implemented to date. It covered the four PCAR banks. The exercise set bank specific funding targets consistent with Basel III and other international measures of stable, high quality funding (CBI, 2011). The PCAR 2011 specified its constraints and parameters for funding costs and access to funds in line with the PLAR.

That said, supporting crisis liquidity stress tests might well have enhanced the solvency exercises, especially in Europe. Funding conditions for banks in the region remain impaired and the evidence suggests that market sentiment towards the banking sector and sovereigns only improved following the introduction of the LTRO and OMT facilities (Figure 3).

V. Comparing Crisis Stress Test Results with Restructuring Costs

There has been much confusion over the divergences between the capital shortfall of a bank estimated by a crisis stress test and its eventual recapitalization needs from any actual restructuring. In reality, the two exercises should not be expected to yield the same capital number given that they operate under vastly different assumptions. Rather, restructuring costs should be higher for the following reasons:

  • Foremost is that macroprudential stress tests should assume the Going Concern Principle. In other words, banks are assumed to operate as going concerns indefinitely and do not have to realize lifetime losses on their asset portfolios. This means that the banks are assumed to have the ability to hold loans to maturity, and stress test valuations are focused on projected cash flow credit losses related to borrowers’ failure to meet their obligations rather than on their liquidation values (see Bernanke, 2009). Since the results of crisis stress tests are used to help identify banks that may need to be restructured, standardization of scenarios and some key assumptions are necessary during this phase.

  • Any required restructuring after the initial crisis stress test would be a more bespoke exercise. At that stage, a thorough assessment of the identified banks’ books prior to any recapitalization would be necessary. It would typically entail the recognition of valuation losses (e.g., foreclosed real estate holdings or tax credits) or additional loan losses, which would also include some projections of future losses under stress, to determine an adequate capital buffer. Moreover, the banks’ non-core assets may have to be realized towards the cost of the restructuring effort, which could include selling off investment portfolios in their respective banking books at significant haircuts.

The Spain case represents a good example of how stress test numbers could differ significantly from estimated restructuring costs (see Lister and Goodman, 2012). The Fund for Orderly Bank Restructuring (Fondo de Reestructuración Ordenada Bancaria or FROB) recently noted three items on savings banks’ balance sheets that were not captured in the crisis stress tests (see Alba, 2013), namely: (i) compensation due for breach of insurance contracts; (ii) the fall in dividends from equity holdings; and (iii) differences between deferred tax credits and realized tax credits. The FROB’s explanation was that the nationalized institutions tend to be subject to more strenuous stress tests on these risk factors as they are required by the competition authorities to sell off their industry equity stakes and must mark them to market, while others have the option of keeping them on the books and are not required to recognize similar losses.

VI. Concluding Remarks

Stress tests have become an important instrument in supervisors’ crisis management toolkit during the current global financial crisis. They are based on the concept of using a microprudential exercise for addressing macroprudential risks through improved transparency and disclosure. Introduced by the U.S. authorities through the very high profile SCAP in 2009, crisis stress tests have since been used by other jurisdictions with varying outcomes. The impact and case study analyses employed in this paper suggest that the design of particular elements of a stress test is critical if it is to be used for systemic crisis management. Moreover, an appreciation of certain concepts and nuances is necessary if the tool is to be applied constructively and the results are to be properly interpreted.

Stress testing remains an art rather than a science, where expert judgment is indispensable. However, the use of stress tests for systemic crisis management has added other dimensions to this art form, notably:

  • The public nature of crisis stress tests means that they must be designed to withstand intense scrutiny. Therefore, certain elements of the design, such as the timing of the test, its governance, the objective of the exercise, the proposed action plan to address the findings and the nature of disclosure may necessarily have to be executed differently from what would be typical in normal supervisory stress tests.

  • Other elements of crisis stress tests must be sufficiently rigorous so that the results are convincing. These include the scope of coverage and the scenario design, although the latter need not necessarily be complex.

  • Crisis stress tests also require the support of other activities to enhance their credibility. Specifically, AQRs are vital for the reliability of the inputs, while follow-up stress tests to update markets on developments are important for consolidating previous gains. Separate liquidity stress tests to complement the solvency ones are also increasingly being employed, although not all are published.

  • Political economy could play a key role in determining the effectiveness of crisis stress tests. Given the potential economic and reputational implications of the findings, the design of these tests could be influenced by political economy considerations.

  • Finally, it would be remiss to discount the importance of luck in any crisis stress test. Its successful implementation may well require a dose of good fortune, e.g., in areas such as the actual health of banks, the timing of the exercise, market conditions and public receptiveness to the disclosures (see Dudley, 2011).

Ultimately, the lessons learned from our study suggest that country authorities must be fully committed if they are to undertake a crisis stress test. They must have a clear objective and take action once valuations have fallen to certain levels. At that stage, and before any crisis of confidence becomes firmly entrenched, they must be prepared to transparently conduct a thorough examination of their banking system, take necessary follow-up action(s) based on the findings and have the requisite resources to back them, if the exercise is to serve its purpose of improving sentiment towards the banking system. Otherwise, the effort would likely backfire and exacerbate the loss in market confidence, with potentially devastating consequences for the real economy. Many of the design features we have identified are also relevant to confidential supervisory stress tests under crisis conditions, except perhaps for the transparency considerations.

Crisis stress tests may also be heralding a new era in transparency. Prior to the global financial crisis, supervisory stress tests were conducted in utmost confidentiality. The very public U.S. crisis stress test was succeeded by supervisory stress tests that have since provided similar levels of disclosure. Other jurisdictions are not yet out of the woods and some appropriately continue to maintain or have even improved on the transparency of their follow-up crisis stress tests. It remains to be seen if they will follow suit with their supervisory stress tests in the future when the situation improves. Opinion is divided as to the desirability of unfettered transparency of stress tests during normal times, but the bar for disclosure has been set high and markets may yet demand similar standards when the crisis recedes.

Appendix I. Case Studies of Crisis Solvency Stress Tests: United States, European Union, Ireland and Spain

Appendix Table 1.

Crisis Stress Tests: Features of Design

Sources: BdE; CBI; EBA; Fed; IMF; and authors.

Included for completeness only—not intended as a crisis stress test; surveillance stress testing exercise was conducted in a crisis environment.