Chapter 3 Quality of FSAP Processes and Diagnostic Tools

International Monetary Fund. Independent Evaluation Office
Published Date: May 2006

We discuss here the efficiency of FSAP processes and views on the technical quality of the FSAP teams before going on to discuss various components of the FSAP output—the macroprudential analysis, the standards and codes assessments, and how effectively the various diagnostic elements are integrated into a comprehensive overall assessment with clear and well-prioritized recommendations.

Efficiency of FSAP Inputs and Processes

Our in-depth reviews of 25 cases as well as interviews and surveys of officials and IMF and World Bank staff suggest the following main messages with regard to organizational aspects of the FSAP:

  • Country authorities generally rated the technical quality of the FSAP teams highly, particularly the expertise of specialists. Both our in-depth interviews with officials and the authorities’ survey results (see Figure 3.1) suggest a high degree of satisfaction with FSAP teams’ technical skills. A large proportion of officials we interviewed said that they viewed the opportunity to interact with the FSAP technical experts as a major value added from the exercise; indeed, many would have liked to have had more structured arrangements to follow up on specific issues with the experts concerned.

  • However, insufficient time for the FSAP team to prepare and familiarize themselves with country-specific circumstances was a widespread complaint, noted by many authorities and, to a lesser extent, by the teams themselves. In a number of cases, greater consultation with the authorities at an early stage of the process (i.e., the TOR stage) would have provided guidance on the most relevant expertise.1 Interviews also suggest some shortcomings in the integration of the FSAP technical expertise with area departments' country-specific knowledge.

  • The burden of the FSAP on the authorities is inevitably very high, but could be eased somewhat by better planning. While we have not been able to obtain any specific estimates of the costs the FSAP imposes on the authorities, the resource inputs required have strained the capacity of even well-trained and well-funded supervisory systems, especially when extensive translation of documents into English was required.2 A large proportion of survey respondents were of the view that the time and data requirements of the exercise were excessive (Figure 3.2). The in-depth examination of the 25 country cases indicated that a large part of these costs were intrinsic to the exercise, and many officials recognized that the extremely intensive data gathering had eventually yielded benefits in terms of better data for macroprudential analysis or greater transparency about a country’s financial system and regulatory approaches.3 Nevertheless, the burden could be significantly eased by (a) better planning and consultation at an early stage to take account of country circumstances, leading to greater selectivity in information requests; (b) greater lead time on questionnaires and data requests; (c) greater personnel continuity from previous financial sector work and in any follow-up work; and (d) possibly preparing some of the ROSCs in advance of the main FSAP mission.

  • The choice of FSAP team leaders is critical. The team leader and deputy play a crucial role in identifying priorities for the assessment and integrating the results of the various diagnostic instruments into an overall assessment with clear, well-prioritized recommendations. This is an enormously challenging task that requires considerable technical expertise and policy judgment. Our interviews with some IMF and World Bank senior staff indicate a concern that, as the FSAP becomes more "routine," less-experienced team leaders have begun to be chosen, with a potentially adverse impact on quality (e.g., a tendency to follow a "template" approach without a deep understanding of the situation in each country).

  • From the IMF's perspective, preparing separate documents for the IMF and World Bank Boards (the FSSA and the FSA, respectively) appears to have helped minimize delays and the burden of tailoring the FSAP results to different institutional needs. However, the FSSA is better anchored in IMF processes (Article IV) than the FSA is in those of the Bank (see IEG report), and FSSAs are produced with a significantly shorter time lag than FSAs.4 A number of IMF staff emphasized that anchoring the FSSA in the Article IV surveillance process had imposed a clear timetable that helped to avoid excessively drawn-out discussions on details, both with the authorities and within the FSAP team.

Figure 3.1. Country Authorities' Assessment of FSAP Team's Technical Skills (In percent)

Source: Q6 of the survey of country authorities.

Figure 3.2. Country Authorities' Views on the FSAP Process (In percent)

Source: Q7.1 and Q7.3 of the survey of country authorities.

Macroprudential Risk Analysis

A message from our interviews, reinforced by the survey results, is that in many, but not all, countries the FSAP has contributed significantly to assessing financial sector vulnerabilities—by helping to change the culture toward one that emphasizes system-wide risk assessments and, in many cases, upgrading methodologies. Within this overall positive experience, however, there are significant differences across countries, and several shortcomings need to be addressed.

The two main diagnostic tools used in FSAPs for analyzing macroprudential risks of financial systems are, first, stress testing how different measures of financial strength (e.g., capital adequacy and profitability) would respond to a variety of shocks and, second, analyzing trends in various financial soundness indicators (FSIs). The principal conclusions are as follows (see Annex 5 for more details):

  • The use of methodologies for stress testing at the level of the overall financial sector is still in its infancy. The degree of sophistication of approaches used varies substantially across FSAPs, depending in large part on data availability, cooperation with the authorities, time available for the analysis, and the judgment of the FSAP team (see Box 3.1 for some “good practice” characteristics encountered in the country reviews). But even with relatively “sophisticated” approaches, the results obtained can depend critically on how various shocks are calibrated and feedback effects modeled. In practice, data and other limitations constrained the use of stress testing to fairly basic approaches. For example, in almost half of the 25 cases examined in depth, the principal methodology for analyzing credit risk of the banking sector was based on a simple static exercise that assumed (relatively arbitrary) increases in levels of banks’ nonperforming loans together with assumptions on different provisioning levels. Even rudimentary tests can add value, especially when undertaken in conjunction with other analysis, but the limitations of such approaches need to be clearly flagged.

  • The reporting of results in most FSAPs tends to downplay these limitations and often reports the bottom line results from stress testing as if from a “black box” exercise. This often results in overly simplistic messages about the strength of the financial sector. Greater “health warnings” about the interpretation of results are needed, especially when the quality of the data is weak.

  • There is a considerable gap between the “good practice” approaches to modeling shocks and those used in many other cases. A number of recent FSAPs (including many for European countries, but also Chile) have generated a consistent series of shocks to specific macro variables derived from a macro model. In other cases, a series of shocks to particular variables have been aggregated but without being derived from a clearly defined, consistent macroeconomic model.5 Stress tests are supposed to analyze “exceptional but plausible” scenarios of shocks, but in only about half of the cases we examined was there an attempt to provide a clear rationale for the size and composition of the shocks chosen.

  • Some assessments have avoided analyzing the consequences of politically sensitive shocks (e.g., public debt defaults). While there is an understandable tendency not to rock the boat by focusing on such major potential adverse events, the result could be reassuring statements in the FSAP that the financial system is robust to a variety of milder shocks, leaving each observer to read between the lines with regard to the larger shocks. This could send misleading signals. One alternative would be to specify that certain types of shock be considered in certain situations (for example, analyzing the consequences for bank balance sheets of specific downgrades in sovereign public debt) without creating uniform sets of shocks that preclude adaptation to particular country circumstances.6

  • There is still insufficient attention in many FSAPs to global and regional linkages, including for countries with substantial international capital market links. The evaluation's ratings on the incorporation of global and regional risks show that consideration of these risks in the 25 country cases fell short of good practice in a significant proportion of cases (28 percent of cases were assessed as having some problems in this respect; see Table 3.1). Moreover, the evaluation's average internal rating in this respect was even weaker for those countries judged to be of global or regional systemic importance: 2.2 on the four-point scale, on which higher scores indicate more problems, compared with 1.84 for all 25 cases. The evaluation generally found little analysis of cross-border linkages capable of producing spillover effects, even in some countries with global systemic importance (e.g., Japan and Russia). Also, the Ireland and Singapore FSAPs focus their analysis on the vulnerabilities of local banks or of foreign banks with domestic operations. They refer to the linkages of domestic banks with the international financial centers but do not analyze the risks of these centers in any detail, on the assumption that the centers do not belong to any country in particular.7

  • FSIs have generally not yet been used in a meaningful manner in most assessments (reflecting problems with data and interpretation of appropriate benchmarks for signaling vulnerability as well as inadequate time series). Although most FSAP reports include tables on FSIs, in only half of the 25 cases examined in depth did the reports provide some interpretation in terms of the risk implications of the figures. Since sector-wide averages may mask concerns with specific groups, for analyzing potential vulnerabilities aggregate indicators frequently need to be complemented by indicators for key peer groups within the banking sector (i.e., state banks, foreign banks, local private banks). More generally, interviews with area department staff indicated that many felt they lacked the necessary training and experience to interpret FSIs and integrate the analysis into ongoing surveillance work, even when the data were available on a consistent basis.

  • The quality of the data on the financial system is often not emphasized sufficiently. In some countries, more caution is needed before using available statistical data at face value, either for stress testing or other analysis.
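The simple static credit-risk test described above can be illustrated with a minimal Python sketch. So can the point that sector-wide averages may mask weak peer groups. The sketch rests on four assumptions:

- Nonperforming loans rise by a fixed share of gross loans.
- All NPLs must then be provisioned at a fixed coverage ratio.
- Any extra provisions come straight out of capital.
- Risk-weighted assets do not change.

The bank names, peer groups, and balance-sheet figures are hypothetical and are not drawn from any FSAP.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Bank:
    name: str
    group: str         # peer group, e.g. "state", "foreign", "private"
    capital: float     # regulatory capital
    rwa: float         # risk-weighted assets (held fixed in this sketch)
    loans: float       # gross loans
    npl: float         # nonperforming loans before the shock
    provisions: float  # loan-loss provisions already held


def post_shock_capital(bank: Bank, npl_increase: float, coverage: float) -> float:
    """Capital left after NPLs rise by `npl_increase` (a share of gross loans)
    and all NPLs must be provisioned at `coverage`. Extra provisions are
    charged directly against capital; income and tax effects are ignored."""
    shocked_npl = min(bank.npl + npl_increase * bank.loans, bank.loans)
    extra_provisions = max(0.0, coverage * shocked_npl - bank.provisions)
    return bank.capital - extra_provisions


def car_by_group(banks, npl_increase, coverage):
    """Post-shock capital adequacy ratio (capital / RWA) for each peer
    group and for the system as a whole."""
    capital = defaultdict(float)
    rwa = defaultdict(float)
    for bank in banks:
        cap = post_shock_capital(bank, npl_increase, coverage)
        for key in (bank.group, "system"):
            capital[key] += cap
            rwa[key] += bank.rwa
    return {key: capital[key] / rwa[key] for key in capital}


# Hypothetical three-bank system, shocked with a 5-point rise in the NPL
# ratio and 50 percent provisioning coverage of all NPLs.
banks = [
    Bank("A", "foreign", capital=120, rwa=800, loans=700, npl=20, provisions=15),
    Bank("B", "private", capital=90, rwa=700, loans=600, npl=30, provisions=20),
    Bank("C", "state", capital=30, rwa=500, loans=450, npl=60, provisions=25),
]
results = car_by_group(banks, npl_increase=0.05, coverage=0.5)
for group, ratio in sorted(results.items()):
    print(f"{group:8s} {ratio:6.1%}")
```

With these made-up numbers, the system-wide ratio stays a little above 10 percent, while the state-bank group falls below 3 percent. Reporting only the aggregate would hide that. The outcome also hinges entirely on the assumed NPL increase and coverage ratio, which is why such results need clear "health warnings."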

Table 3.1. Results of the IEO Assessments of FSAP Content

(Mean score on a scale of 1–4; percentage of ratings indicating some problems, i.e., ratings of 3 or 4)

Criteria                                                            Mean (1–4) % rated 3–4
Extent of incorporation of regional and global risks into analysis        1.84          28
Balance of development and stability issues                               1.88          16
Integration of standards and codes in overall assessment                  1.84          20
Coverage of overall financial sector                                      2.00          20
Clarity and candor of findings                                            1.88          12
Importance and consequence well explained                                 1.94          20
Clarity of recommendations                                                1.82           8
Usability of recommendations (e.g., specificity)                          1.96          16
Prioritization of recommendations                                         2.46          44
Degree of alignment of FSAP and FSSA                                      1.42           0
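The two columns of Table 3.1 can be related with a small Python sketch. It assumes each case receives a single integer rating per criterion, from 1 (best) to 4. The source does not describe how individual ratings were aggregated, and the 25 ratings below are hypothetical.

```python
def summarize(ratings):
    """Return (mean score, percentage of ratings of 3 or 4) for one criterion,
    where ratings are integers from 1 (no problems) to 4 (serious problems)."""
    if not ratings or any(r not in (1, 2, 3, 4) for r in ratings):
        raise ValueError("ratings must be a non-empty list of integers 1-4")
    mean = sum(ratings) / len(ratings)
    problems = sum(1 for r in ratings if r >= 3)
    return round(mean, 2), round(100 * problems / len(ratings))


# Hypothetical ratings for one criterion across 25 country cases.
ratings = [1] * 8 + [2] * 10 + [3] * 5 + [4] * 2
print(summarize(ratings))  # (2.04, 28)
```

Under this assumption, each case shifts the percentage column by 4 points. A criterion can therefore average below 2 while still flagging problems in a sizable share of cases, as with the 1.84 mean and 28 percent for regional and global risks.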

Box 3.1. Good Practices on Stress Testing

Stress testing is a method for quantifying the impact of future extreme but plausible shocks on a financial system. The degree of sophistication of approaches used varies substantially across FSAPs, depending in large part on data availability, sophistication of the financial system, cooperation from the authorities, time available for the analysis, and the judgment of the FSAP team. We summarize here a number of "good practice" approaches to different aspects of such tests, drawn from the 25-country sample (see Annex 5 for further details).1

Data quality. The quality of the data, and its implications for any results, should be described candidly; many FSAPs are weak in this respect (Cameroon is a "good practice" exception). In some cases the available data are of poor quality and the vulnerabilities are fairly obvious. Not conducting stress tests should always be considered as an alternative in such cases, as otherwise there is a high risk of spuriously concrete results that mask an unknown situation (e.g., the Costa Rica FSAP appropriately did not undertake any formal stress tests).

Scenarios and events. Most stress tests have included single-factor sensitivity analysis. The most recent vintages (e.g., Jordan, New Zealand, and many European countries) have also used scenarios that involve simultaneous movements in various macro risk factors. This is a positive trend, as such scenarios can help to better analyze the vulnerabilities of the financial system.

Calibration of shocks. The challenge is to reach a common understanding of what can be considered exceptional but plausible shocks. Where feasible, calibration could use models to characterize the relationships among macro risk factors in the context of different scenarios and/or in cases in which single variables are shocked (using statistical or historical approaches). For example, some recent FSAPs (Germany and Chile) have derived a consistent set of shocks to macro variables from a macro model.

Methodologies. While it is often necessary to tailor an FSAP stress test to data availability and the sophistication of the financial system, it would be useful to form "country peer groups" based on criteria related to the complexity and sophistication of a financial system. Standardizing a core set of data, methodologies, and sensitivity analyses within each peer group could lead to the development of common benchmarks for cross-country comparisons, thus facilitating vulnerability analysis. For example, for the group of industrial countries, stress testing should aim to move toward a common set of good-practice methodologies.

Interpretation of the results. More attention needs to be given to the interpretation of stress test results, not only in light of the methodological caveats but also in terms of the relative importance of different shocks (e.g., avoid overemphasizing market risks when credit risks are more relevant from a vulnerability perspective). This is an area where many FSAPs are weak, but Korea and Cameroon are “good practice” examples.

1 Mention of a country’s FSAP with regard to one aspect does not necessarily mean it was “good practice” in other respects.
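The historical approach to calibrating a single-variable shock, noted under "Calibration of shocks" in Box 3.1, can be sketched in Python. The idea is to take the largest adverse change observed over a chosen horizon, or a high percentile of such changes, as a candidate size for an "exceptional but plausible" shock. The following are illustrative assumptions, not taken from any FSAP:

- the function name;
- the choice of the 95th percentile;
- the interest-rate series.

```python
import statistics


def historical_shock(series, horizon=1, percentile=95):
    """Candidate shock sizes from a historical series in which an increase is
    adverse (e.g. an interest rate or an NPL ratio). Returns the worst observed
    change over `horizon` periods and the `percentile`-th percentile (1-99) of
    those changes."""
    changes = [series[i + horizon] - series[i] for i in range(len(series) - horizon)]
    if len(changes) < 2:
        raise ValueError("need at least two observed changes")
    worst = max(changes)
    # quantiles(n=100) returns 99 cut points; element p-1 is the p-th percentile.
    cut_points = statistics.quantiles(changes, n=100, method="inclusive")
    return worst, cut_points[percentile - 1]


# Hypothetical quarterly policy-rate series, in percent.
rates = [4.0, 4.5, 5.5, 5.0, 7.5, 6.0, 5.5, 5.0]
worst, p95 = historical_shock(rates)
print(f"worst quarterly rise: {worst:.2f} pp; 95th percentile: {p95:.2f} pp")
```

The two candidate sizes differ even in this short series, which is one reason an FSAP should state the rationale for the size of each shock. A consistent multi-variable scenario would instead need a macro model, as in the Germany and Chile examples.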

Standards and Codes Assessments

The evaluation reviewed only assessments of standards and codes prepared as part of FSAP exercises.8 Drawing on the 25 in-depth case reviews, interviews with country officials, and IMF and World Bank staff, as well as the survey results, the principal conclusions are as follows:

  • There is no evidence that the streamlining of the number of standards assessed in detail has created problems for the ability of FSAPs to make overall judgments on financial sector vulnerabilities, but the rationale for which standards to assess is not discussed sufficiently. A "scoping exercise" is first conducted to identify the set of standards to be assessed in each FSAP, but in the cases we reviewed there was often limited discussion in the TOR or the subsequent FSAP reports of why particular choices were made (e.g., why insurance or securities standards were covered in some low-income countries where the sector was very small in relation to GDP, or why payments system standards were or were not covered). However, the evaluation did not identify any cases where omission of a detailed standards assessment had contributed to significant shortcomings in the overall assessment of potential vulnerabilities. The review of all post-2003 FSAP cases confirmed these conclusions.

  • While the assessments generally distinguish between de jure standards and de facto implementation, the significance of institutional weaknesses is often not emphasized sufficiently. In most of the 25-country sample, assessors did take account of differences between de jure laws and regulations and de facto implementation. Indeed, the assessment methodologies to some degree require making this distinction: they require interpretation of compliance and evidence of enforcement or nonenforcement of various principles.9 However, while many FSAP reports do discuss problems of regulatory forbearance or weak enforcement capabilities, it is sometimes difficult, even reading between the lines, to judge the potential macroeconomic significance of such shortcomings (e.g., Dominican Republic, Sri Lanka, Tunisia). One "good practice" example where enforcement issues are explicitly linked to the vulnerability analysis is the Ghana FSAP.

  • Integration of the various standards assessments into an overall FSAP assessment does seem to have added value, but the degree of integration varied from case to case. The review of the 25 country cases suggests that incorporation of the standards assessments into a broader discussion of financial sector vulnerability and development issues did add value in many cases, especially for the banking sector standards.10 In some cases, however, a “head count” approach to listing performance vis-à-vis various principles was not accompanied by a sufficiently integrated discussion of the potential impact of various identified shortcomings (e.g., in the Egypt, Philippines, and Romania FSAPs).

  • A number of officials noted that an excessive focus on the "number" of principles for which a country was fully or largely compliant could give a misleading signal about the potential downside consequences of remaining gaps.11 Interviews with staff and authorities indicate that there were often greater disagreements on the ratings than on the underlying qualitative assessment. In its recent review of the Standards and Codes Initiative, the IMF Executive Board endorsed a number of changes to the presentation of ROSC findings. These include a principle-by-principle summary of the observance of each standard and an executive summary providing a clear assessment of the overall degree of observance of the standard, while avoiding a rating or "pass or fail" report.12 While it is too early to judge the effect of such changes, this evaluation reinforces the view that the overall qualitative assessment and the identification of key remaining gaps are the most critical elements. The exercise should not be reduced to ratings alone, even if that is the aspect market participants say they value most (see below).

  • The governance structure for assessing standards is a little vague, but present arrangements for providing feedback work satisfactorily in practice. The issue is—“who assesses the assessors?” In principle, this is the responsibility of the IMF and World Bank Boards for those standards assessed under the FSAP. In practice, members of the Board are not in a position (e.g., they are not provided with the necessary information) to make such judgments. The various standard-setting bodies, and their secretariats, do have the appropriate background but do not have governance responsibility for assessing whether the assessment exercises are proceeding satisfactorily; indeed, they do not even see those FSSAs that countries do not agree to publish. Nevertheless, discussions with the various standard-setting groups suggest that, in practice, there are sufficient informal and formal channels (including the Financial Sector Forum and IMF–World Bank staff participation in various technical committees) for adequate feedback to be provided on how assessments are being conducted. Our interviews with the various secretariats suggest a high degree of satisfaction with the results (see below).
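The concern that a head count of compliant ratings can give a misleading signal can be shown with a minimal Python sketch. The sketch reports the remaining gaps alongside the headline share. The grade codes follow the compliance grades used in Basel Core Principles assessments:

- compliant (C);
- largely compliant (LC);
- materially noncompliant (MNC);
- noncompliant (NC);
- not applicable (NA).

The principle labels and grades are hypothetical.

```python
def summarize_assessment(grades):
    """Return the share of applicable principles rated compliant ("C") or
    largely compliant ("LC"), plus the sorted list of principles rated
    materially noncompliant ("MNC") or noncompliant ("NC").
    Principles graded "NA" (not applicable) are excluded from the count."""
    applicable = {p: g for p, g in grades.items() if g != "NA"}
    if not applicable:
        raise ValueError("no applicable principles")
    compliant = sum(1 for g in applicable.values() if g in ("C", "LC"))
    gaps = sorted(p for p, g in applicable.items() if g in ("MNC", "NC"))
    return compliant / len(applicable), gaps


# Hypothetical assessment of ten principles, P01 to P10.
grades = {
    "P01": "C", "P02": "C", "P03": "LC", "P04": "C", "P05": "LC",
    "P06": "NC", "P07": "C", "P08": "LC", "P09": "MNC", "P10": "C",
}
share, gaps = summarize_assessment(grades)
print(f"{share:.0%} compliant or largely compliant; gaps: {', '.join(gaps)}")
```

The headline 80 percent says nothing about whether P06 or P09 covers an area whose weakness matters most for vulnerability. That judgment has to come from the qualitative assessment the evaluation emphasizes.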

1 For example, in several Eastern European countries looking toward membership of the European Union, cross-country experience on how a number of regulatory issues were dealt with in various EU members was of particular interest. However, some of the technical experts on the FSAP teams, while highly qualified, did not have the appropriate background to provide such information.

2 For example, in the case of Japan senior officials noted that at some stage in the process about 10 percent of the staff of the Financial Services Agency were involved in the FSAP exercise, at a time when there was strong need to attend to the financial sector's difficulties.

3 For example, in Korea, senior officials expressed the view that the FSAP had very high data requirements (e.g., on cross-sectoral data for stress-testing analysis of the corporate sector), but these subsequently proved very useful for their continued assessment of the financial sector and exposures to the corporate sector.

4 The average time lag between the date of the first FSAP mission and circulation of the FSSA is just under 40 weeks; for FSAs, the average time lag is about 58 weeks; no FSAs for FSAP Updates had been completed as of end-August 2005.

5 In the case of Japan, problems associated with weak domestic "ownership" of the exercise also limited its usefulness. Agreement could not be reached on the nature of the tests to be undertaken and the authorities eventually asked that the data they had provided not be used. The FSAP team undertook at a late stage a number of tests using more limited data available from published balance sheets of banks.

6 For example, one approach could be to include in the group of shocks "tested" the impact on interest risk premiums from a downgrade in public debt by a prespecified number of grades whenever the initial credit rating is below a particular threshold. This suggestion would be consistent with the Basel II capital standards methodology, which proposes a system of risk weightings (at least for foreign-currency-denominated debt) based on the sovereign's credit rating.

7 One "good practice" exception where cross-border links were explicitly incorporated is the New Zealand FSAP; in light of the substantial role of Australian-owned banks in New Zealand, the mission discussed the performance of these banks with Australian regulatory authorities.

8 See Annex 11, Table A11.3 for a full list of the various standards and codes. Those most often covered in FSAPs are the Basel Committee's Core Principles for Effective Banking Supervision (BCP), International Association of Insurance Supervisors (IAIS), International Organization of Securities Commissions (IOSCO), and Committee on Payments and Settlement Systems (CPSS), which were the main focus of the evaluation; the banking supervision standards were assessed in all FSAPs. For a discussion of other standards and codes, see "The Standards and Codes Initiative—Is it Effective? And How Can it Be Improved?" (SM/05/252, July 1, 2005). As noted earlier, assessment of the AML/CFT standard was not part of the terms of reference of this evaluation.

9 See, for example, the Basel Core Principles methodology (available at

10 This conclusion is supported by the survey of IMF surveillance mission chiefs conducted as part of the internal review of the Standards and Codes Initiative. A high proportion of mission chiefs, especially for emerging market and developing countries, were of the view that the standards assessments had added value to the usefulness of the FSAP for surveillance. See "The Standards and Codes Initiative," op. cit., Background Paper (SM/05/252, Supplement 1), Table 15. See also the assessment of the overall FSAP content in the next section below.

11 To give only one example, senior Egyptian officials noted that their efforts to strengthen further the supervisory framework initially encountered an internal reaction from many supervisory staff that the high proportion of "compliant" ratings under the BCP assessment meant that there was no need for substantial additional efforts.

12 See "The Acting Chair's Summing Up: The Standards and Codes Initiative—Is It Effective? And How Can It be Improved?" (Buff/05/125, July 25, 2005).
