News-based Sentiment Indicators
  • International Monetary Fund (ISNI: https://isni.org/isni/0000000404811396)

Contributor Notes

Authors’ E-Mail Addresses: chuang@imf.org, dulybina@imf.org, aroitman@imf.org

Abstract

We construct sentiment indices for 20 countries from 1980 to 2019. Relying on computational text analysis, we capture specific language like “fear”, “risk”, “hedging”, “opinion”, and “crisis”, as well as “positive” and “negative” sentiments, in news articles from the Financial Times. We assess the performance of our sentiment indices as “news-based” early warning indicators (EWIs) for financial crises. We find that sentiment indices spike and/or trend up ahead of financial crises.

I. Introduction

The (still growing) literature on financial crises has documented their frequency, costs, potential drivers, and even developed different theories to explain why and how financial crises can unravel. But anticipating financial crises remains a challenging task.

There is a large literature on early warning indicators (EWIs) for crises, described in Chamon and Crowe (2012). Kaminsky, Lizondo, and Reinhart (1998) proposed an early warning system focusing on the evolution of several indicators that tend to exhibit unusual behavior in the periods preceding a crisis. When an indicator exceeds a certain threshold value, this is interpreted as a warning “signal” that a crisis may occur within the following 24 months. EWIs typically emphasize that crises take root in unsustainable macro-financial imbalances, but developments in news sentiment ahead of crises are less well understood. Yet these developments could be informative in anticipating crises.

There have been efforts to develop and improve EWIs relying mostly on conventional economic data.2 The main goal has been to establish thresholds for relevant macro-financial variables, above which, crisis probabilities start to increase. To complement existing EWIs and crisis prediction models we develop a set of new “text-based” EWIs that capture sentiment in financial news. Our approach shares elements from the literature on “text-based” uncertainty measures but with two main differences. First, we rely on computational linguistics to construct our indices, and second, our main interest is in using sentiment indicators as EWIs.

Sentiment measurement is not new. Traditional approaches to quantifying sentiment have relied primarily on surveys. More recently, there has been increasing interest in extracting information from different types of text corpora (e.g., financial news, central bank statements) using machine learning and computational techniques. In this paper, we apply computational linguistic methods to a large “text” dataset to build indicators that capture sentiment in financial news.

Linguistically determined word clusters capture sentiment. Unlike survey-based sentiment measures, and unlike several “uncertainty” measures recently developed in the literature, we use “semantic clustering” techniques rather than “lexical” or “bag-of-words” approaches (which typically rely on predetermined, and sometimes narrow, sets of words). Relying on word vector representation techniques, semantic clustering enables us to identify the appropriate set of words (i.e., the semantic cluster) that captures a specific sentiment.

We evaluate the performance of our sentiment indices as EWIs, an innovation compared to previous studies, which focus on contemporaneous correlations between sentiment and specific events (e.g., elections, wars, or geopolitical tensions), business cycles, or asset prices.

The rest of the paper is organized as follows. Section II describes the database and methodology used to construct 10 sentiment indices for 20 countries. Section III presents an assessment of our sentiment indices as EWIs, and section IV provides robustness checks. Conclusions are presented in section V.

II. Data and Methodology

This paper assesses whether changes in sentiment precede banking crises, as defined in Kaminsky and Reinhart (K&R, 1999). K&R identify several crisis episodes for 20 countries between 1970 and 1998. We focus on those episodes to assess whether sentiment from financial news spikes or trends up ahead of crises. Table 1 reproduces Table 2 from K&R and lists all crisis episodes for all countries in their sample.

Table 1.

Sample of Crises Episodes

article image
Note: Episodes in which the beginning of a banking crisis is followed by a balance-of-payments crisis within 48 months are classified as twin crises. Sources: American Banker, various issues; Gerald Caprio, Jr. and Daniela Klingebiel (1996); New York Times, various issues; Sundararajan et al. (1991); Wall Street Journal, various issues.
Table 2.

Seed and Related Terms for Fear Language

article image
Note: Our database goes back to 1980, when no digital version of the FT existed, so older articles are scanned versions (or some other electronic copy) of the original hard-copy articles. Some words appear as “partial” or “broken” words because our algorithm detects words that are not clear in the digital copy of an original article but resemble the words that are supposed to be found. For example, “con_cerns” or “concems” are very likely to be the true word “concern”.

A. Data

The database used in this paper contains over 3 million news articles from the Financial Times (FT), published daily from 1980 to 2019. All articles are in English and cover most countries in the world. These articles cover business, finance, and economics topics and are hence an appropriate source of financial news for constructing sentiment indicators. On average, the database provides about 6859 news articles per month, and about 48 articles per month for each of the 20 countries in our sample. This gives us enough material to build a monthly index for each country.

B. Methodology: Word Vector Representation and Semantic Clustering

We create “word vector representations” to represent words through vectors relying on a Vector Space Semantics methodology.3 This is done by mapping specific vocabulary items in high-dimensional space based on context probabilities (i.e., identifying words that tend to co-occur with a target word or term, and how often). Box 1 provides an intuitive summary of our methodology. We follow Mikolov et al. (2013a) and use a vector space model to triangulate the top “n” most similar terms to a set of seed terms for a semantic concept of interest (e.g. ‘fear’).4 Seed terms are specific words that could be associated with a given sentiment. For example, we use the word “danger” as one of the seeds to characterize “risk” sentiment.

Box 1. Word Vector Representations, Semantic Clustering, and Sentiment

Creating “word vector representations” is like constructing a “map” of words. This map is represented in an n-dimensional space and constructed using each word’s context. The context of a word refers to its surrounding word(s), and the position of a word in the map depends on the probability of a context given that word.

The exact “coordinates” of a word in the map are “learned” from its context. Machine learning techniques use the context (the 2 words preceding and the 2 words following every single word in the corpus) to assign a unique location to each word in the map. Predicting the probability that other words are contextually close to a given word makes it possible to find specific coordinates for that word in the space.
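To make the notion of context concrete, the following minimal sketch extracts the (target, context) pairs that a window of 2 words on each side would produce. The function name `context_pairs` and the example sentence are illustrative assumptions, not part of our implementation; the actual training of the vectors is carried out by the word-vector library.

```python
def context_pairs(tokens, half_window=2):
    """Return (target, context) pairs using up to `half_window` words on each side."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - half_window)
        hi = min(len(tokens), i + half_window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs


# Hypothetical sentence, used only for illustration.
sentence = "investors fear a sharp rise in sovereign risk".split()
pairs = context_pairs(sentence)

# Context words observed around the target word "fear".
fear_context = [context for target, context in pairs if target == "fear"]
print(fear_context)
```

Words near the start or end of a sentence simply have fewer context words, which is why "fear" (the second token) has only one word on its left.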

Semantic clusters are (linguistically determined) groups of words associated with a specific word. Semantic clustering is used in computational linguistics to characterize or identify a given concept or issue of interest. Semantic clusters are groups of words that typically belong and are used together. For example, if one is interested in the concept of risk, instead of focusing only on the word “risk”, one can focus on the semantic cluster associated with the word “risk”.

Semantic clustering makes it possible to identify sentiment. We are interested in identifying sentiment in financial news. Instead of building an index based on the frequency of one specific word associated with a sentiment (e.g., fear), we use semantic clustering to better characterize a sentiment. For example, a set of words associated with the “fear” sentiment may describe it accurately without necessarily mentioning the word “fear”.

Typical vector dimensionality used in implementations is between 100 and 300. In our implementation, the window size used to compute term co-occurrence is 5 and the vector dimensionality is 200. The dictionary used in the implementation consists of words with a frequency of at least 50 in the FT corpus. Vector representations of words were computed using the “gensim” package in Python. The corpus used to train the vectors is a selection of text published in the FT between 1980 and 2018.

Vector space representations have been shown to efficiently summarize the semantic relationships between words in a corpus, and they make it possible to measure semantic relatedness between any two given words. Word-vector models can computationally determine “semantic clusters” containing words that belong together. 5

Semantic similarity is determined by measuring the cosine of the angle between two word vectors. For example, given the word “risk”, closely related words can be identified by computing the distance (operationalized via cosine similarity) between the vector representing “risk” and the vector representations of all other words within the corpus.
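As a rough illustration of this step, the sketch below computes cosine similarity and ranks the words closest to a seed. The three-dimensional vectors in `toy_vectors` are made-up values chosen for readability (the vectors trained on the FT corpus have 200 dimensions), and the helper names are assumptions introduced here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(seed, vectors, top_n=3):
    """Rank all other words by cosine similarity to the seed word."""
    scores = [
        (word, cosine_similarity(vectors[seed], vec))
        for word, vec in vectors.items()
        if word != seed
    ]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_n]


# Hypothetical toy vectors, for illustration only.
toy_vectors = {
    "risk": [0.9, 0.1, 0.0],
    "danger": [0.8, 0.2, 0.1],
    "threat": [0.7, 0.3, 0.0],
    "growth": [0.0, 0.2, 0.9],
}

cluster = [word for word, _ in most_similar("risk", toy_vectors, top_n=2)]
print(cluster)
```

In this toy example, "danger" and "threat" point in nearly the same direction as "risk" and therefore enter its cluster, while "growth" does not.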

For our purposes, a semantic cluster captures a specific sentiment, semantically related to some concept (e.g., risk), reflecting how that concept is used within the FT corpus.6 For example, the concept of “fear” and its associated cluster are different in the FT compared to the respective cluster in a corpus of movie reviews.

C. Abstract Term Clusters for Five Different Sentiments

To identify what types of sentiment could trigger an “early warning signal (EWS)” ahead of crises we focus on semantically related groups of words/terms rather than individual words. In this way, our findings relate to semantic concepts (e.g., “risk”) rather than to specific lexical items (e.g., recession). Box 2 outlines the steps to construct our indices.

Term clusters that are less specific and more abstract help identify sentiment indicators that are potentially more robust to time, type of crises, and country differences. We identify five types of language to capture the following sentiments: “fear”, “risk”, “hedging”, “opinion”, and “crisis”.

Intuitively, one could expect to find these types of language prior to swings in high frequency indicators like sovereign spreads or exchange rates.

‘Fear’ Sentiment: To construct a “fear” sentiment index we consider terminology and language associated with the following “seed” terms: “fear”, “concern”, “afraid” and “anxious”. Table 2 presents the top 15 terms/words most closely semantically related to each seed term in the “fear” sentiment cluster.

‘Risk’ Sentiment: For the semantic class of words dealing with “risk” sentiment, the seed terms are: “warn”, “risk”, “threat”, “financial&hazard”, “financial&contagious”, “impact”, “financial&infect”, “terror” and “danger”.7

‘Hedging’ Sentiment: To capture “hedging” sentiment, the semantic class of words examined consists of those associated with the concepts of hedging and uncertainty. The seed terms for the hedging class are: “may”, “possibly”, “uncertain”, “maybe”, “can”, “perhaps”, “doubt” and “unsure”.

‘Opinion’ Sentiment: Intuitively, one could expect “opinion” type of language and articles about a specific country to appear more frequently in financial news ahead of a crisis. The semantic class of words examined for the opinion sentiment index consists of those often used to express an opinion or belief. The seed terms for this class are: “say”, “feel”, “predict”, “tell”, “believe”, “think”, “suggest”, “decide”, “propose”, “advise”, “hint”, “clue”, “speak” and “announce”.

‘Crisis’ Sentiment: To capture “crisis” sentiment, the class of terminology examined is the vocabulary generally used to denote or describe crisis events themselves. The seed terms for this class are: “financial crisis” and “depression”.

D. Negative and Positive Sentiment

We build semantic clusters to capture positive and negative sentiment in the FT. Using the lists of words identified in Correa et al. (2018) as seed words, we build our own semantic clusters based on the FT word vector representation.8

E. Aggregated Sentiment Indices

We also consider aggregated indices, combining specific sentiment indices into more comprehensive indices. A broader set of words may be better at detecting “anomalies” and triggering EWS at the cost of relatively lower interpretability. We combine the “fear”, “risk”, “hedging”, “opinion”, and “crisis” sentiment indices into one broader index (labeled: “All Sentiment (w/o pos. and neg.)”), by using the semantic clusters comprised in those 5 indices.

Similarly, adding the negative sentiment index to those five (“fear”, “risk”, “hedging”, “opinion” and “crisis”) indices, and labeling it “Negative Sentiment +”, we capture all sentiments considered in this paper, except the positive sentiment.

Adding the positive sentiment index to the “Negative Sentiment +” and labeling it “All Sentiment” we obtain the largest set of words encompassing all individual sentiments. This could potentially increase the likelihood of capturing any type of “anomaly” before financial crises and triggering an EWS at the right time. The underlying specific indices may help shed light on possible sentiment drivers of this (more) general index.
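The nesting of the three aggregated indices can be sketched as set unions of the underlying clusters. The cluster contents below are hypothetical placeholders, and treating a word that belongs to several clusters as a single entry of the combined cluster is an assumption of this sketch rather than something stated above.

```python
# Hypothetical, heavily truncated clusters used only to show the nesting.
clusters = {
    "fear": {"fear", "concern", "worry"},
    "risk": {"risk", "danger", "threat"},
    "hedging": {"may", "perhaps", "uncertain"},
    "opinion": {"say", "believe", "suggest"},
    "crisis": {"crisis", "depression", "turmoil"},
    "negative": {"weak", "decline", "worry"},
    "positive": {"strong", "improve", "gain"},
}

SPECIFIC = ["fear", "risk", "hedging", "opinion", "crisis"]


def combine(names, clusters):
    """Union of the word sets of the named clusters (duplicates kept once)."""
    return set().union(*(clusters[name] for name in names))


all_without_pos_neg = combine(SPECIFIC, clusters)            # "All Sentiment (w/o pos. and neg.)"
negative_plus = combine(SPECIFIC + ["negative"], clusters)    # "Negative Sentiment +"
all_sentiment = combine(SPECIFIC + ["negative", "positive"], clusters)  # "All Sentiment"

print(len(all_without_pos_neg), len(negative_plus), len(all_sentiment))
```

Each broader index contains every word of the narrower one, which is why the broader indices are more likely to register an anomaly but are harder to interpret.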

Box 2. Constructing Sentiment Indices Based on Semantic Clustering

There are two main steps to build a sentiment index relying on semantic clustering. First, build semantic clusters, and second, measure their frequency.

The steps followed to construct semantic clusters are the following: First, we build word vector representations using a large corpus of news articles (3,114,080 articles) from the FT between 1980 and 2019. This may be thought of as a semantic map of all the words in the corpus, to be used to quantify the degree of semantic similarity between any two words. Second, we select several semantic concepts that we think may be more prevalent than usual in financial news ahead of crises.

“Seed” words that characterize specific sentiment are: “fear”, “risk”, “hedging”, “opinion”, “crisis”. In addition, following Correa et al. (2017), we use, as seeds, a predetermined set of negative and positive words to capture negative and positive sentiment in the FT. Those seeds enable us to obtain a larger set of relevant and semantically similar words which are specific to our dataset (Appendix 1).*

We construct country-specific indices by identifying articles associated with a specific country. If the name of that country appears in either the title, abstract, or first paragraph of an article, then we define the article to be about that country. For each country, the monthly frequency of each sentiment is extracted and normalized by the total number of words present in all articles for that month for that country. The resulting relative frequency for each word cluster is then treated as a proxy for the prevalence of the sentiment it represents over time. Specifically:

$$\text{Index}_{ij} = \frac{\#\ \text{of specific words}_{ij}}{\#\ \text{of words}_{ij}} \times 1000$$

where “# of specific words_ij” refers to the total number of words within a semantic cluster (e.g., the “fear” cluster) in a given month (i), for a given country (j), and “# of words_ij” refers to the total number of words for a given country (j) within all articles in a given month (i).
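A minimal sketch of this calculation is given below. The article fields (`month`, `title`, `abstract`, `first_paragraph`, `text`), the simple regular-expression tokenizer, and the substring match on the country name are all simplifying assumptions made for illustration; they are not a description of our production code.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def is_about(article, country):
    """An article is about a country if its name appears in the title,
    abstract, or first paragraph (naive substring match)."""
    fields = (article["title"], article["abstract"], article["first_paragraph"])
    return any(country.lower() in field.lower() for field in fields)


def sentiment_index(articles, country, cluster):
    """Monthly cluster-word count per 1000 words, for one country."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for article in articles:
        if not is_about(article, country):
            continue
        tokens = tokenize(article["text"])
        totals[article["month"]] += len(tokens)
        hits[article["month"]] += sum(1 for token in tokens if token in cluster)
    return {month: 1000 * hits[month] / totals[month] for month in totals if totals[month]}


# Hypothetical articles, for illustration only.
articles = [
    {"month": "1994-11", "title": "Mexico peso under pressure", "abstract": "",
     "first_paragraph": "Investors fear a devaluation.",
     "text": "Investors fear a devaluation and worry about reserves"},
    {"month": "1994-11", "title": "Markets", "abstract": "Concern in Mexico",
     "first_paragraph": "", "text": "Officials voiced concern"},
    {"month": "1994-11", "title": "Tokyo stocks rise", "abstract": "",
     "first_paragraph": "", "text": "Shares gained on strong earnings"},
]

index = sentiment_index(articles, "Mexico", {"fear", "worry", "concern"})
print(index)
```

Here the two articles about Mexico contain 11 words in total, 3 of which belong to the cluster, so the index for that month is 1000 × 3/11; the third article is ignored because it does not mention the country.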

* In some cases, when we detected that the algorithm had associated non-economic words to a seed (or as part of the set of words associated to a specific seed) we used a “sentiment vector”. This is a vector that represents the concept of “fear”, “hedging”, etc, and is constructed adding the vectors corresponding to the appropriate (economic and financial) seed words.
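The construction of such a “sentiment vector” can be illustrated as follows. The two-dimensional vectors and the candidate words are invented for the example, and ranking candidates by cosine similarity to the summed vector is our reading of how the vector would be used, stated here as an assumption.

```python
import math


def add_vectors(vectors):
    """Element-wise sum of equally sized vectors."""
    return [sum(components) for components in zip(*vectors)]


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Hypothetical 2-dimensional vectors; the real vectors have 200 dimensions.
seed_vectors = {"fear": [0.9, 0.1], "concern": [0.7, 0.3], "afraid": [0.8, 0.0]}
candidates = {"worry": [0.8, 0.2], "spider": [0.1, 0.9]}

# The sentiment vector is the sum of the seed vectors.
sentiment_vector = add_vectors(list(seed_vectors.values()))

# Candidates closest to the combined vector are kept; unrelated words rank last.
ranked = sorted(
    candidates,
    key=lambda word: cosine_similarity(sentiment_vector, candidates[word]),
    reverse=True,
)
print(ranked)
```

Summing several economic and financial seeds pulls the combined vector toward the shared meaning of the seeds, so a non-economic neighbor of any single seed is less likely to rank highly.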

III. Evaluating “Text-Based” EWIs

We assess whether our sentiment indices trigger EWS ahead of the financial crises identified in K&R. For each sentiment index, an EWS is triggered each time there is a spike, defined as follows: for every point in time (month), we calculate a backward-looking average over 24 months and assess whether the index is more than 2 standard deviations above that average.9 We find that, for each country in our sample, at least one of our indicators would have successfully anticipated most crises within a window of 24 months.10
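The spike rule can be sketched as a rolling comparison. Two details are assumptions of this sketch because the text does not pin them down: the 24-month window is taken to exclude the current month, and the sample standard deviation is used.

```python
from statistics import mean, stdev


def spikes(series, window=24, n_std=2.0):
    """Return the positions t at which series[t] exceeds the mean of the
    previous `window` observations by more than `n_std` standard deviations."""
    signals = []
    for t in range(window, len(series)):
        history = series[t - window:t]  # the 24 months before month t
        mu = mean(history)
        sigma = stdev(history)
        if sigma > 0 and series[t] > mu + n_std * sigma:
            signals.append(t)
    return signals


# Hypothetical index: two years of calm values, one jump, then back to normal.
series = [10.0, 11.0] * 12 + [20.0, 10.0]
print(spikes(series))
```

The first 24 months cannot produce a signal because there is not yet enough history, which is also why crises at the very start of a sample cannot be assessed with this rule.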

A. Evaluation Metrics

To assess whether our sentiment indicators can be used as EWI ahead of financial crises, we rely on three measures that are typically used in pattern recognition, information retrieval, and binary classification: Precision, recall, and F-score. Precision is the percentage of instances predicted to be positive that were actually positive. In the context of crises, low precision means that an algorithm is classifying non-crisis episodes as crises.

$$\text{Precision} = \frac{TP}{TP + FP}$$

Where TP means “true positive” and FP means “false positive”. Recall is the percentage of positive instances that were predicted to be positive. Low recall means there are crises that are not being detected.

$$\text{Recall} = \frac{TP}{TP + FN}$$

Where FN means “false negative”. In statistical terms, absence of type I and type II errors corresponds respectively to maximum precision (no false positives) and maximum recall (no false negatives). Precision could be interpreted as a measure of exactness or quality, whereas recall is a measure of completeness or quantity. There is a tradeoff between recall and precision: the higher the recall, the more instances get classified as “crisis”, and the less accurate those classifications become (lower precision).

The F score is a measure of a test’s accuracy. It considers both the precision (P) and the recall (R) of the test to compute the score: P is the number of correct positive results divided by the number of all positive results returned by the classifier, and R is the number of correct positive results divided by the number of all relevant samples (all samples that should have been identified as positive). The F1 score is the harmonic mean of precision and recall; it reaches its best value at 1 (perfect precision and recall) and its worst at 0. We use the F2 score, which weighs recall higher than precision (by placing more emphasis on false negatives).

$$F_2 = \frac{(1+\beta^2)\, P \cdot R}{\beta^2 P + R}, \qquad \beta = 2$$
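As a quick numerical check, the standard F-beta formula with β = 2 can be evaluated at the average precision (23 percent) and recall (90 percent) reported in Section III.C. The function name `f_beta` is introduced here for illustration.

```python
def f_beta(precision, recall, beta=2.0):
    """Standard F-beta score: (1 + beta^2) * P * R / (beta^2 * P + R)."""
    if precision == 0 and recall == 0:
        return 0.0
    beta_sq = beta ** 2
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Average precision and recall across countries, as reported in Section III.C.
f2 = f_beta(0.23, 0.90)
print(round(f2, 2))

# With beta = 1 the score reduces to the harmonic mean of precision and recall.
f1 = f_beta(0.23, 0.90, beta=1.0)
print(round(f1, 2))
```

With β = 2 the score is pulled toward recall: 5 × 0.23 × 0.90 / (4 × 0.23 + 0.90) ≈ 0.57, well above the F1 value for the same inputs.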

True positives are those spikes (as defined above) which precede the start of a crisis. False positives are spikes which are not followed by a crisis start within a 24-month window. False negatives are those crises identified by K&R which were not preceded by a spike within a 24-month window. Signals that were triggered between crisis start and crisis peak (as defined in K&R) were counted neither as true positives nor as false positives.11
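These counting rules can be sketched as follows, with months represented as integer positions. How several spikes ahead of the same crisis are counted is not spelled out above; this sketch counts each spike as one true positive, which is an assumption, as are the function name and the example dates.

```python
def evaluate(signals, crises, window=24):
    """Count true positives, false positives and false negatives.

    signals: month positions at which an EWS was triggered.
    crises:  (start, peak) month positions of each crisis.
    """
    tp = fp = 0
    for s in signals:
        # Signals between crisis start and peak are ignored altogether.
        if any(start <= s <= peak for start, peak in crises):
            continue
        # A signal is a true positive if a crisis starts within `window` months.
        if any(0 < start - s <= window for start, _ in crises):
            tp += 1
        else:
            fp += 1
    # A crisis is a false negative if no signal preceded it within the window.
    fn = sum(
        1 for start, _ in crises
        if not any(0 < start - s <= window for s in signals)
    )
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return tp, fp, fn, precision, recall


# Hypothetical example: two crises, five signals.
crises = [(60, 70), (150, 160)]
signals = [10, 50, 55, 65, 100]
tp, fp, fn, precision, recall = evaluate(signals, crises)
print(tp, fp, fn, precision, recall)
```

In the example, the signals at months 50 and 55 precede the first crisis and count as true positives, the signal at month 65 falls between start and peak and is ignored, the signals at months 10 and 100 are false positives, and the second crisis is a false negative.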

When computing prediction metrics, we do not consider the full sample (1980–2018), because there are no crises in the K&R sample after 1999. For example, the financial crisis in Argentina in 2001 is not taken into account when computing the evaluation metrics. However, it is important to note that most of our sentiment indicators would have successfully triggered an EWS ahead of time (Appendix 3).

B. Benchmark index

To have a benchmark for our evaluation metrics we compare our sentiment indices to a random index constructed as follows: for each month, we randomly generate 20 seed words for each country, we build a time series from 1980 to 2019, and run the same evaluation process presented above. We repeat this process 100 times and use the average F2 score to compare to our indices’ F2 score (Figure 2).
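The random benchmark can be sketched along the following lines. Several elements are assumptions made for illustration: the monthly word counts are passed in as `Counter` objects, the randomly drawn seed words are used directly (without expanding them into clusters), and `score_fn` stands in for the full evaluation process described above.

```python
import random
from collections import Counter


def random_index(monthly_counts, vocabulary, n_seeds=20, rng=None):
    """Build one random index: each month, draw `n_seeds` random words and
    compute their frequency per 1000 words in that month."""
    rng = rng or random.Random()
    series = []
    for counts in monthly_counts:
        seeds = rng.sample(vocabulary, n_seeds)
        total = sum(counts.values())
        hits = sum(counts.get(word, 0) for word in seeds)
        series.append(1000 * hits / total if total else 0.0)
    return series


def benchmark_score(monthly_counts, vocabulary, score_fn, n_runs=100, seed=0):
    """Average `score_fn` over `n_runs` independently drawn random indices."""
    rng = random.Random(seed)
    scores = [
        score_fn(random_index(monthly_counts, vocabulary, rng=rng))
        for _ in range(n_runs)
    ]
    return sum(scores) / len(scores)


# Hypothetical corpus in which every word appears once per month.
vocabulary = [f"w{i}" for i in range(50)]
monthly_counts = [Counter({word: 1 for word in vocabulary}) for _ in range(36)]

series = random_index(monthly_counts, vocabulary, rng=random.Random(1))
average = benchmark_score(monthly_counts, vocabulary, score_fn=max)
print(series[0], average)
```

In practice `score_fn` would detect spikes in the random series, compare them with the crisis dates, and return the F2 score, so that `benchmark_score` yields the average F2 score of the benchmark.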

Figure 1.

Sentiment Indices Performance Compared to a Random Benchmark (24m window)

Citation: IMF Working Papers 2019, 273; 10.5089/9781513518374.001.A001

Figure 2.

Crises (Start to Peak) and Crisis Sentiment Signals

Note: Blue horizontal bars represent crisis windows (from start to peak) as identified in K&R (Table 1) for each country. Black dashes represent EWS.

The overall average F2-score for the random benchmark is 0.39. All our sentiment indices perform better, and the broader the sentiment, the better the performance relative to the benchmark.

C. Prediction Performance

We developed 10 sentiment indices for 20 countries between 1980 and 2019. Our main results can be summarized in 7 points below. We find that:

1- On average (across countries) general sentiment indices (e.g., positive sentiment) perform better than specific sentiment indices (e.g., fear sentiment).

Table 3 presents a summary of the results for each sentiment index. At an aggregate level (as opposed to country specific) “Positive sentiment” as well as “All sentiment” perform best in terms of forecasting power (highest F2 score), followed by the negative and opinion sentiments.

On average (across all countries), EWS in all sentiments precede 90 percent of the identified crises (recall), and 23 percent of the EWS identified are true positives (precision), which yields an F2 score of 57 percent. Among the five specific indices, “crisis”, “opinion”, and “risk” sentiments display the highest recall, followed by “fear” and “hedging” sentiments (Table 3).

Table 3.

Summary Evaluation of “Text-Based” EWIs

article image

On average, aggregated results (for all countries in the sample) for the more general indices display better forecasting performance. Wider clusters improve the F2 score, though not necessarily the recall. This suggests that a richer and broader set of words can improve accuracy (precision) in triggering an EWS at the right time.

2- For a given country, some sentiment indices (e.g., hedging sentiment for Turkey) perform better than others (e.g., crisis sentiment for Turkey).

3- Some sentiment indices perform better for some countries (e.g., crisis sentiment for Mexico compared to crisis sentiment for Malaysia).

Tables 4 and 5 present the F2-score for all our indices for all countries in the sample.12 Among the 5 specific sentiment indicators (“fear”, “risk”, “hedging”, “opinion”, and “crisis”), forecasting power for Brazil, Mexico, Sweden, and Turkey appears, on average, better (high F2 score) than for other countries. For Brazil, “fear” and “hedging” sentiments appear to be the best performers, whereas for Mexico and Sweden, “crisis” sentiment appears to perform better. In the case of Turkey, “hedging” clearly performs better than all other sentiment indices. For Argentina and Bolivia, “opinion” language appears to perform well; and so does “crisis” language in the case of Finland.

Table 4.

F2 Scores for Specific Sentiment

article image
Table 5.

F2 Scores for Aggregated Sentiment

article image

4- At the individual country level, general sentiment indices (e.g., negative sentiment) do not always perform better than specific sentiment indices (e.g., crisis sentiment).

The five more general sentiment indicators (positive sentiment, negative sentiment, “All sentiment w/o pos and neg”, “Negative sentiment +” and “All sentiment”) appear to have better forecasting power for Brazil, Mexico, and Turkey (high F2 score) than for other countries. For Brazil, positive sentiment appears to be the best performer (possibly capturing excess optimism, asset/financial bubbles, and/or unsustainable imbalances ahead of crises). For Mexico and Turkey, even though all indices appear particularly strong, “Negative sentiment +” and “All sentiment” appear to perform better than the rest. Slightly less strong, but still high, is “All sentiment” for Indonesia and “Negative sentiment” for Sweden.

Interestingly, for Mexico, Finland, and Denmark, “Crisis sentiment” appears as the best performer across our 10 indices. For Norway, “Risk sentiment” appears as the best performer.

Chile, the Philippines, Spain, and Uruguay experienced crises at the beginning of the 1980s. Given that (i) our dataset starts in January 1980, and (ii) our methodology relies on a 24-month rolling z-score, our dataset does not allow us to assess whether EWS preceded the start of those crises.13 However, for Spain, the Philippines, and Uruguay, our “crisis” sentiment index would have triggered EWS ahead of the respective peaks of these crises (Figure 1, and Appendices 2 and 3). Similarly, for the second crisis in the Philippines (July 1997), sentiment indices trigger a signal right after the start of the crisis, but ahead of the peak (Figure 1, and Appendices 2 and 3).

5- Our crisis sentiment index, for most countries, triggered EWS at the right time.

Figure 1 shows that the “crisis” sentiment index would have triggered EWS ahead of some large crises. Specifically, it would have captured the 1995 and 1999 crises in Brazil, the 1987–89 crisis in Denmark, the 1991–92 crisis in Finland, the 1983–84 crisis in Israel, the 1985 and 1998–99 crises in Malaysia, the 1982–84 crisis in Mexico, the 1988–91 crisis in Norway, the 1983 crisis in Peru, the 1992 crisis in Sweden, the 1984 crisis in Thailand, and the 1991 crisis in Turkey.

EWS triggered after the start of a crisis would still have been effective in flagging the crisis ahead of its peak. For example, for the Philippines, even if there was no EWS ahead of the start of the 1997 crisis, our “crisis” sentiment index would have triggered seven (almost consecutive) EWS during Q3-Q4 1997, and in Q1 1998.

6- Considering a battery of different sentiments increases chances of detecting crises.

It is useful to consider a set of sentiment indicators (not just one). Sentiment ahead of crises may very well vary across time, countries, types of macro imbalances, and types of crises. Considering a battery of different sentiment indices, or aggregate indices (comprising more than one sentiment) increases the chances of detecting spikes in sentiment, at the right time.

Appendix 3 shows time series for all our sentiment indices, for all countries, between 1980 and 1999. We find that most of our indices trigger EWS ahead of severe financial crises. For example, for Turkey 1991, the “hedging” sentiment index triggers 5 strong EWS ahead of the start of the crisis, the “fear” sentiment index 4, and the “crisis” sentiment index 2. So, depending on the crisis, the country, and the sentiment index, some indicators could be more successful than others, both in terms of number of signals and timing.

The number of EWS that are triggered ahead of a crisis can vary across indices for a given country. For example, in the case of Thailand, many of our indices triggered EWS ahead of the peak of the crisis in the late 1990s. But for the Brazil 1994–96 crisis, the “risk” sentiment index would not have triggered an EWS, whereas the “crisis” sentiment would have successfully flagged the crisis. For the 1999 crisis in Brazil, both indices would have flagged several EWS ahead of time. In the case of Israel, the “crisis” and “positive” sentiment indices would have been more accurate in triggering an EWS “at the right time” compared to the “risk” and “negative” sentiments which also triggered EWS but perhaps too early (Appendices 2 and 3).

7- Our sentiment indices performed well in recent crisis (or near crisis) episodes.

Many of our sentiment indices triggered EWS ahead of recent crisis (or near crisis) episodes. For example, “crisis” and “negative” sentiment indices in Argentina would have triggered EWS in an 18-month window prior to the 2018 currency crisis (Box 3). In Brazil, “negative” (and other) sentiment indices triggered EWS ahead of the Lava Jato corruption scandal in 2014, and ahead of President Rousseff’s impeachment in 2016. For Turkey, our “negative” sentiment index triggered EWS ahead of the coup attempt in 2016 and reached an all-time high (also triggering an EWS) in early 2018, ahead of the market pressures that hit the country later that year (Appendices 2 and 3). In Spain, our “crisis” index would have triggered six EWS ahead of the Global Financial Crisis (GFC).

Box 3. Negative Sentiment in Argentina in 2018 and 2019

Our “negative sentiment” index proved to be an effective EWS of past crises in Argentina. Sharp movements in indices—crossing a threshold of 2 standard deviations above the previous 24 months average—are a good early warning indicator of severe crises. For Argentina, they would have successfully predicted 2 financial crises in the 1980s, the severe recession in 1998, the 2001 crisis, and the currency crisis in 2018.

The index performed well as a leading indicator of financial stress in 2018 (chart). It increased sharply in April, ahead of the International Monetary Fund (IMF) Stand-By Arrangement (SBA) request in May, while the sovereign spread was less informative, remaining broadly stable up to the crisis. It then increased by almost 30 percent between June and August, ahead of the request for an early release of funds within the IMF SBA, and ahead of a 30 percent depreciation in September.

Argentina, Negative Sentiment and Sovereign Spread

(Jan 2018-May 2019)

After dropping temporarily in Q4 2018, the index increased again between January and April 2019: negative sentiment was 11 percent higher in April 2019 than at its peak before the program request (April 2018). The jump likely reflects an increase in inflation and inflation expectations, and a depreciation of the peso.

Although not the focus of this paper, it is worth noting that sentiment varies over time and across countries. Appendix 3 shows that the level of most sentiment indices is higher after the GFC. This could be due to several factors, including a somewhat permanent shift in FT language, possibly reflecting persistent concerns about crises and vulnerabilities after the GFC. Secular stagnation concerns might also explain the permanent increase in the use of negative language after 2010 (Appendix 3).

IV. Robustness Checks

A. Economic term clusters

Following Baker et al. 2015, we selected economic words/terms (e.g., fiscal deficit, recession, etc.) that could potentially be associated with pre-crisis sentiment, to assess whether their frequency increases ahead of financial crises, possibly indicating increases in risks and vulnerabilities ahead of a crisis.

The terms identified, with some exceptions, do not appear to display spikes ahead of crises, suggesting that the focus should be on word/term clusters (as opposed to narrow economic words/terms). In addition, lexical terms associated with pre-crisis sentiment will likely vary depending on the country affected, the type of imbalances, the specific transmission channels, etc., which would in turn be related to country specific underlying economic vulnerabilities, and the type of crisis.

Broader sentiments such as the ones we identify in this paper perform better than predetermined and isolated economic words/terms (Tables 3 and 6).

Table 6.

Results for Selected Economic Term Clusters

B. Different Time Windows

Our results are sensitive to different time windows after an EWS is triggered. This is true for all our indices and the random benchmark. Given an EWS in any given month, the longer the window considered after that EWS is triggered, the more likely it is for that window to include a crisis.
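The mechanical effect of the window length can be illustrated with a minimal sketch. The scoring conventions below are assumptions made for illustration, not necessarily the exact rules used for the results in this paper: a signal counts as a true positive if a crisis starts within `window` months after it, and a crisis counts as anticipated if at least one signal falls within the `window` months before it. The signal and crisis dates are hypothetical.

```python
def evaluate_signals(signal_months, crisis_months, window):
    """Score early-warning signals against crisis dates for one window length.

    Assumed conventions (illustrative only):
    - a signal is a true positive if a crisis starts 1..window months after it;
    - a crisis is anticipated if any signal falls 1..window months before it.
    Returns (precision, recall).
    """
    true_signals = [
        s for s in signal_months
        if any(0 < c - s <= window for c in crisis_months)
    ]
    anticipated = [
        c for c in crisis_months
        if any(0 < c - s <= window for s in signal_months)
    ]
    precision = len(true_signals) / len(signal_months) if signal_months else 0.0
    recall = len(anticipated) / len(crisis_months) if crisis_months else 0.0
    return precision, recall


# Hypothetical data: months are integer offsets from the start of the sample.
signals = [10, 40, 90, 130]
crises = [30, 100, 150]

results = {w: evaluate_signals(signals, crises, w) for w in (6, 12, 24)}
for w, (p, r) in results.items():
    print(f"window={w:2d} months  precision={p:.2f}  recall={r:.2f}")
```

With the same signals and crisis dates, both measures can only stay constant or rise as the window lengthens, which is why the comparison with a random benchmark has to be made window by window.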

Figure 3.

Sentiment Indices Performance Compared to a Random Benchmark


The same is true at the country level, despite some variation. Longer windows improve the F2 score (Table 7).

Table 7.

F2 Scores for Selected Seed Terms for All Countries

C. Different Weights within Clusters

Given that each seed word in each of our indices is associated with a cluster of words, we consider weighting each of those words by its relative proximity to the corresponding seed, to assess whether this would affect forecasting power. Table 8 shows that, for a selected set of seeds, using weights does not materially change the results.
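A minimal sketch of the difference between the two schemes follows. It assumes that an index value is the share of tokens in a month's text that belong to the cluster, and that proximity is measured by cosine similarity to the seed. The cluster, the similarity values, and the normalization (weights rescaled to average one) are hypothetical choices made for illustration.

```python
from collections import Counter


def cluster_index(tokens, cluster_weights):
    """Weighted share of tokens that belong to a word cluster."""
    counts = Counter(tokens)
    weighted_hits = sum(counts[w] * wt for w, wt in cluster_weights.items())
    return weighted_hits / len(tokens) if tokens else 0.0


# Hypothetical cluster for the seed "fear"; values are assumed cosine
# similarities between each word and the seed.
similarity = {"fear": 1.0, "worry": 0.8, "panic": 0.6}

# Unweighted scheme: every cluster word counts equally.
unweighted = {w: 1.0 for w in similarity}

# Weighted scheme: weights proportional to similarity, rescaled to average 1
# so that the two indices are on a comparable scale (an assumption).
scale = len(similarity) / sum(similarity.values())
weighted = {w: s * scale for w, s in similarity.items()}

tokens = (
    "markets fear contagion as panic spreads and panic selling "
    "makes investors worry"
).split()

index_unweighted = cluster_index(tokens, unweighted)
index_weighted = cluster_index(tokens, weighted)
print(index_unweighted, index_weighted)
```

In this toy example the weighted index is slightly lower because the word that appears most often ("panic") is the one furthest from the seed. When word frequencies within a cluster move together over time, the two indices will be close, which would be consistent with the finding that weights do not materially change the results.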

Table 8.

Results for Selected Seed Words for Alternative Time Windows

D. Topic Modeling

As an additional robustness check, we use topic modeling as a “purely machine-driven” (without human judgement) approach to select relevant topics, prior to selecting seed words. The purpose of this exercise is to explore an alternative approach to building sentiment indices, and assess whether it performs better than semantic clustering.

These “topic-based” indices are constructed as follows. First, we let the machine select a number of topics at each point in time (i.e., month). Second, we focus on the set of (30 closest) words associated with each of those topics. Third, we use each of those words as a “seed”. Fourth, we select 10 words associated with each of those seeds.14 We build an index value for the associated word clusters in the same way as for the semantic modeling approach. Intuitively, this is a mechanical (no judgement involved) way to select a set of words that are associated with a specific topic (as opposed to a specific sentiment or semantic terminology). In general, the results (mainly the F2 score) for selected topics are slightly worse than those for the sentiment indices (Tables 3 and 9).

Table 9.

Summary Results for Selected Topics

Table 9 shows that, on average, significant increases in the frequency of selected topics preceded between 49 percent (F2) and 79 percent (Recall) of crises, and about 20 percent of the signals identified were true positives. In terms of individual topics, “Regulatory” and “Financial Analysis” display the highest F2 scores, followed by “Investment”. All topics have relatively high recall and much lower precision, in line with the sentiment indices, but overall performance appears slightly worse than that of the sentiment indices.15
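To see how high recall and low precision combine into an F2 score, the sketch below applies the standard F-beta formula, F = (1 + beta^2) * P * R / (beta^2 * P + R), with beta = 2. It assumes that the F2 score used here follows this standard definition; the inputs are the rounded averages quoted above.

```python
def f_beta(precision, recall, beta=2.0):
    """Standard F-beta score; beta > 1 weights recall more than precision."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Rounded averages quoted in the text: precision about 0.20, recall 0.79.
f2 = f_beta(precision=0.20, recall=0.79)
f1 = f_beta(precision=0.20, recall=0.79, beta=1.0)
print(f"F2 = {f2:.3f}, F1 = {f1:.3f}")
```

Applied to the rounded averages, the formula gives an F2 of roughly 0.50, close to the reported average of 49 percent. The two need not coincide exactly, since an average of topic-level F2 scores is not the same as the F2 score of averaged precision and recall. The much lower F1 value shows how strongly the choice of beta = 2 rewards recall.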

V. Limitations and Extensions

A. Limitations

Time varying language. Sentiment, and association between words and topics, as well as frequency of words and topics may be time varying, and hence indices that rely on a time invariant set of words could miss relevant shifts in sentiments over time. Despite this potential shortcoming, the performance of our indices is reasonably good.

B. Extensions

News Sources. Country-specific news sources could eventually improve accuracy and forecasting power (increase our precision metric) as the composition of (words within) the sentiment clusters could potentially change.

Crises Dates. Using alternative sources for different crises dates could expand the “training” sample beyond 1999. This would add more (crisis) events, though outcomes may or may not improve. The set of identified words for each of our indices seems to have a reasonable forecasting performance for the types of crisis identified in K&R (for the countries in our sample) and even ahead of recent crises, or extreme economic stress events. Different crisis definitions may affect the composition of the word clusters relevant to “predict” such types of crises, and these clusters may differ from the ones we developed in this paper.

Other Events. It could be interesting to assess how (much) sentiment shifts ahead of other country-specific and/or global events and shocks. The events one could consider are not necessarily crisis events, but could also include large depreciations and recessions following sizeable terms of trade shocks (e.g., Russia 2014–15).

Country coverage. Country coverage could be expanded but relevant events to “test” forecasting power might need to be rethought for countries which do not suffer recurrent systemic crises. This is especially the case for advanced economies, which do not experience frequent episodes of severe crises.

VI. Conclusion

We construct 10 new sentiment indices for 20 countries, from January 1980 to May 2019, using Financial Times’ news articles. We find that our indices contain useful information and show that sentiment spikes and/or trends up ahead of financial crises.

Predicting crises is inherently challenging because these are rare events, and data coverage is often scarce. We believe that this new dataset can help expand the set of available EWIs and can be useful to researchers and policy makers for multiple applications.

First, the fact that our sentiment indices tend to spike ahead of financial crises (e.g., Brazil 1999) or periods of severe economic stress (e.g., Turkey 2018) suggests that our “news-based” indicators could potentially improve performance of traditional forecasting models.

Second, our sentiment dataset could be used to examine potential similarities and/or differences in cross-country sentiment, and subsequent economic outcomes, when there are common global shocks.

Third, the relatively broad country and time coverage makes it possible to tackle issues so far not explored because of data limitations, such as the potential role that sentiment could play in affecting exchange rate markets and swings in capital flows, or whether sentiment in financial news precedes movements in high(er) frequency financial variables like sovereign spreads (e.g., Argentina in early 2018), CDS, or interest rates.

References

  • Aldasoro, Iñaki, Borio, Claudio E.V., and Drehmann, Mathias, 2018. Early Warning Indicators of Banking Crises: Expanding the Family. BIS Quarterly Review, March.

  • Baker, Scott R., Bloom, Nicholas, and Davis, Steven J., 2016. Measuring Economic Policy Uncertainty. Quarterly Journal of Economics, vol. 131(4), pages 1593–1636.

  • Chamon, M., and Crowe, C., 2012. “Predictive” Indicators of Crises, in Handbook in Financial Globalization: The Evidence and Impact of Financial Globalization, ed. by G. Caprio (London: Elsevier), pp. 499–505.

  • Correa, Ricardo, Garud, Keshav, Londono, Juan M., and Mislang, Nathan, 2017. Sentiment in Central Banks’ Financial Stability Reports. International Finance Discussion Papers 1203.

  • Kaminsky, Graciela L., Lizondo, S., and Reinhart, Carmen M., 1998. Leading Indicators of Currency Crises. Washington, DC: IMF Staff Papers, Spring.

  • Kaminsky, Graciela L., and Reinhart, Carmen M., 1999. The Twin Crises: The Causes of Banking and Balance-of-Payments Problems. American Economic Review, 89 (3): 473–500.

  • Mikolov, Tomas, Chen, Kai, Corrado, Greg, and Dean, Jeffrey, 2013. Efficient Estimation of Word Representations in Vector Space. ICLR Workshop.

  • Pennington, Jeffrey, Socher, Richard, and Manning, Christopher D., 2014. GloVe: Global Vectors for Word Representation. Proceedings of the Empirical Methods in Natural Language Processing (EMNLP 2014) 12.

Appendix 1. Semantic Clusters

This appendix summarizes the mechanical steps to construct semantic clusters.

We built semantic clusters as follows:

First, we select one or several “seed terms” for each concept. These seed terms are meant to be those most archetypically related to the semantic concepts of interest.

Second, we measure the (cosine) similarity between the word embeddings representing the seed terms and all other word embeddings trained on the corpus.

Third, we select the 15 word embeddings with the highest (cosine) similarity to the seed terms (i.e., the 15 words closest to the seed terms on our “semantic map” of the corpus) and add those to the seed terms to form the specific set of words that functions as a proxy for the semantic concepts of interest.
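The three steps above can be illustrated with a minimal sketch. It assumes a single seed term and a dictionary mapping each word to its embedding vector. The toy two-dimensional vectors and the function names are hypothetical; in practice the embeddings would be trained on the corpus (for example with the methods of Mikolov et al. or Pennington et al.) and have many more dimensions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_cluster(seed, embeddings, top_k=15):
    """Return the seed plus the top_k words closest to it by cosine similarity."""
    seed_vec = embeddings[seed]
    scored = sorted(
        ((cosine(seed_vec, vec), word)
         for word, vec in embeddings.items() if word != seed),
        reverse=True,
    )
    return [seed] + [word for _, word in scored[:top_k]]


# Hypothetical toy embeddings (illustrative values only).
toy_embeddings = {
    "risk": [1.0, 0.0],
    "danger": [0.9, 0.1],
    "threat": [0.8, 0.3],
    "growth": [0.0, 1.0],
    "profit": [0.1, 0.9],
}

cluster = build_cluster("risk", toy_embeddings, top_k=2)
print(cluster)
```

With several seed terms for one concept, one option would be to rank words by their similarity to the average of the seed vectors. That choice is an assumption and is not specified in the steps above.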

Table A1.

Seed and Related Terms for Risk Language

Table A2.

Seed and Related Terms for Hedging Language

Table A3.

Seed and Related Terms for Opinion Language

Table A4.

Seed and Related Terms for Crisis Language

Table A5.

Seed and Related Terms for Positive Sentiment

Table A6.

Seed and Related Terms for Negative Sentiment

Appendix 2. EWS

Appendix 3. Time Series

Appendix 4. All Metrics for All Countries and All Indices

Table A1.

Results for Specific Sentiment
