Forecasting Social Unrest: A Machine Learning Approach
  • International Monetary Fund (https://isni.org/isni/0000000404811396)

Abstract

We produce a social unrest risk index for 125 countries covering a period of 1996 to 2020. The risk of social unrest is based on the probability of unrest in the following year derived from a machine learning model drawing on over 340 indicators covering a wide range of macro-financial, socioeconomic, development and political variables. The prediction model correctly forecasts unrest in the following year approximately two-thirds of the time. Shapley values indicate that the key drivers of the predictions include high levels of unrest, food price inflation and mobile phone penetration, which accord with previous findings in the literature.

1 Introduction

Gideon Rachman of the Financial Times dubbed 2019 “the year of the street protest.”1 From Hong Kong SAR to Bolivia to Lebanon, large-scale protests broke out, disrupting daily life and economic activity. While widespread, these protests were highly varied in their alleged causes: in Chile, an increase in metro fare prices; in India, a proposed citizenship law; in Malta, claims of corruption. More broadly, the incidence of social unrest has been rising over the past decade (see figure 1a). While this upswing began with the well-known protests in Tunisia in early 2011, which spread to other Arab states, it has extended to include a wide group of countries and regions throughout the decade.2 These include events relating to anti-austerity and Occupy movement protests in Europe in 2012, anti-government and election events in 2013 in Europe and Latin America, and election and pro-democracy protests in 2014 (Hong Kong SAR, India, Thailand). The last half of the decade saw record levels of protest events including the “Yellow Vests” protests in France, judicial reform in Poland, Catalan pro-independence in Spain and, most recently, protests against police violence in the USA.

The spectre of social unrest raises a number of policy relevant risks to financial, economic and political stability. Barrett et al. (2021) provide high frequency evidence that unrest events reduce equity valuations while Acemoglu et al. (2017) find, for the case of Egypt during the Arab Spring, equity valuation losses for firms connected to the administration currently in power – indicating that protests can play an important role in curtailing rent seeking behaviour. The impact on the real economy of unrest events can be seen in the 0.5 percentage point decline in per capita GDP around unrest events in our sample (see figure 1b). Of course, this unconditional relationship may express mere correlation or simultaneity in the link between unrest and growth. In a companion piece, Hlatshwayo and Redl (2020), we use variation of unrest in neighboring states to instrument for unrest shocks, showing that these events are associated with a decline in GDP of around 1 percent over a three-year period.3 Hadzi-Vaskov et al. (2021) find that GDP declines by 0.2 percentage points following an unrest shock, using the same instrumental variable as Hlatshwayo and Redl (2020). Unrest also raises the likelihood of broader political changes. Aidt and Franck (2015) provide empirical evidence for Acemoglu and Robinson (2000)’s theory linking social unrest to the threat of revolution (and the extension of the franchise (i.e., voting rights) by elites to avoid revolution). They show that U.K. constituencies where the swing riots of 1830–31 took place were associated with a significantly larger vote share for the party that supported voting reform. Similarly, Aidt and Leon (2016) demonstrate that drought-induced riots are associated with incumbents making democratic concessions in Sub-Saharan Africa. The role of in-person protest remains relevant even in advanced democracies: Madestam et al. (2013) show that exogenous changes in attendance at “Tea Party” movement protests were associated with more Republican party votes in the 2010 midterm elections. These examples highlight a potential upside of social unrest; it may support beneficial reforms over the medium term.

These substantive macro-implications raise the question of whether we can predict unrest and the variables that drive it with a degree of reliability that is useful for policy. This paper develops a forecasting model to predict unrest events one year ahead, where unrest events are measured using the data set of Barrett et al. (2020), based on newspaper mentions of unrest. This allows us to determine what predictors are associated with future unrest and to produce an index of the risk of unrest based on the probability our model assigns to these events. A number of potential drivers of unrest have been identified in the literature: increases in food prices (Bellemare (2015)), inequality (Acemoglu and Robinson (2000)), competition between elites (Turchin and Korotayev (2020)) and population growth coupled with competition for resources in less developed nations (Acemoglu et al. (2019)). Social media penetration has been shown to aid coordination of protest activities (Enikolopov et al. (2020)) where that role can be amplified by economic downturns (Manacorda and Tesei (2020)) and is associated with greater regional spillovers (Arezki et al. (2020)). However, it is likely that the drivers of unrest are disparate and interact with socioeconomic conditions in complex and non-linear ways that are difficult to enumerate.

We employ a flexible machine learning approach to gauge the importance of a large set of predictors and capture non-linearities. Our preferred model has a balanced accuracy level of 66% and, in that sense, is correct in predicting unrest approximately two-thirds of the time. We find a relatively modest role for predictors identified in the literature, with the most important predictor of unrest in the year ahead being the level of unrest in the current year. However, our results do accord with the above literature in highlighting the role of inflation (especially food inflation), unrest in neighboring states, and digital media usage (such as mobile phones) out of the large set of potential predictors considered. Our paper is the first to provide evidence of the forecasting power of such a wide range of socioeconomic, environmental, political and macroeconomic data (over 340 indicators) and to provide evidence for a very broad set of countries (125). The closest paper to ours is that of Cadena et al. (2015), who study the ability of Twitter data to forecast newspaper-identified unrest events in Brazil, Mexico, and Venezuela but do not consider additional predictors or a broader set of countries.

The structure of the paper is as follows: section 2 outlines the data employed; section 3 describes the empirical specifications used in the forecasting horse-race; section 4 discusses the drivers of the forecasting results; section 5 discusses some country examples and distributional results; and section 6 concludes.

2 Data

2.1 Unrest events of Barrett et al. (2020)

Unrest is measured using the newspaper-based Reported Social Unrest Index (RSUI) of Barrett et al. (2020) (hereafter BANL). Their source is the Dow Jones Factiva news aggregator, which covers a wide range of English language newspapers and wire services in the USA, UK and Canada. The RSUI is a monthly count of articles that include the country’s name and in which the words “protest”, “riot” or “revolution” appear within 10 words of “unrest”, scaled by the total number of articles in a given period.4 BANL demonstrate a close accord between their index and two major alternative sources for unrest, the Cross-National Time-Series Data (CNTSD) database of Banks and Wilson (2020) and the Armed Conflict Location and Event Database (ACLED).5
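The counting rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not BANL's search algorithm: the function names, the prefix matching of terms (so that "protests" or "riots" also count) and the plain substring check for the country name are all hypothetical choices made here.

```python
import re

# Stems for the terms named in the text; prefix matching is an assumption
# of this sketch, not a documented feature of the RSUI.
UNREST_STEMS = ("protest", "riot", "revolution")


def mentions_unrest(article, country, window=10):
    """True if the article names the country and a listed term
    appears within `window` words of the word 'unrest'."""
    text = article.lower()
    if country.lower() not in text:
        return False
    words = re.findall(r"[a-z]+", text)
    anchors = [i for i, w in enumerate(words) if w == "unrest"]
    terms = [i for i, w in enumerate(words) if w.startswith(UNREST_STEMS)]
    return any(abs(a - t) <= window for a in anchors for t in terms)


def rsui(articles, country, total_articles):
    """Flagged articles for `country`, scaled by the period's article total."""
    flagged = sum(mentions_unrest(a, country) for a in articles)
    return flagged / total_articles
```

Here `total_articles` is passed in separately because the text scales by the total number of articles in the period, which need not equal the number of articles scanned for a given country.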

Rather than forecasting the continuous RSUI index at a monthly frequency, which would pose a significant challenge in separating signals from noise as well as a narrower set of high frequency predictors with which to do it, we make predictions on whether an unrest event takes place in the following year. BANL produce a carefully vetted unrest event database from their RSUI index. Unrest events are defined as a local monthly peak, which is a high reading for that country’s RSUI and where more than 10% of articles are on the topic of unrest.6 Events are hand-checked, resulting in 15% of events (99 of 679) failing screening; however, 69% of these mis-identified events are corrected with systemic changes to the search terms for the RSUI, leading to only 4.6% of events mis-identified in the final search algorithm – with the largest share of those errors relating to current flags that are in fact about past unrest. This is likely to be much less of a problem at annual frequency (which we use), as discussion in an article of events that in fact took place earlier in the year should still result in an unrest flag for that year. BANL demonstrate that their approach precisely captures well known unrest episodes such as the Arab Spring of 2011, the color revolutions in formerly communist countries in the 2000s, the sequence of unrest and coups taking place in Thailand between 2006–2014 as well as the waves of unrest in Venezuela following the elections in April 2013 of Nicolas Maduro following the death of Hugo Chavez in March of that year.

Not all unrest events are alike: unrest relating to a coup d’etat may have very different economic and social implications than one relating to climate change or globalization. To add granularity we manually classify the unrest events based on short written descriptions of the events provided by Barrett et al. (2020).7 We use seven non-exclusive categories of events based on key words. If an event description contains these key words we flag it as belonging to the relevant category – see table 1. Three categories relate broadly to the political environment: Government, Democratic-reforms, and Elections. The first captures protest directly relating to presidents, political opposition, resignations, political coalitions and impeachments. The second is broader and relates to protests around democratic reforms or rights; corruption, political reforms and the free press; and issues relating to equity topics such as gender. The third directly relates to elections, which is broken into a separate category due to the large number of events of this type. A closely-related category is protests related to basic needs capturing fiscal austerity, energy subsidies, calls for improved education and access to health care, as well as general strikes. Global issues intends to capture protests around themes that have mobilized people across many countries such as environmental, anti-war, anti-globalization and anti-immigration unrest.8 Further categories relate to coup d’etats and assassinations; religion; and protests that involve violence. We are able to classify around two-thirds of events in this way; the remaining events are labeled as unknown. Figure (1a) presents the unrest events based on this classification. The most common type of unrest is related to government followed by election and democratic reform-related unrest. Global issues unrest is rare on average but has increased in frequency in the past decade.
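A keyword-based, non-exclusive classification of this kind might look as follows. The sketch is illustrative only: the keyword lists are hypothetical stand-ins loosely based on the category descriptions above, not the lists in table 1, and only a few of the categories are shown.

```python
# Hypothetical keyword lists for illustration; the lists actually used
# are those reported in table 1.
CATEGORY_KEYWORDS = {
    "Government": ["president", "opposition", "resignation", "impeachment"],
    "Elections": ["election"],
    "Basic needs": ["austerity", "subsidy", "strike"],
    "Coups and assassinations": ["coup", "assassination"],
}


def classify_event(description):
    """Return every category whose keywords appear in the description.

    Categories are non-exclusive, so an event can receive several labels;
    events matching no keyword are labeled 'Unknown'.
    """
    text = description.lower()
    labels = [
        category
        for category, keywords in CATEGORY_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    ]
    return labels or ["Unknown"]
```

For example, a description mentioning both a president's resignation and a disputed election would be flagged under both Government and Elections.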

Table 1: Classifying types of unrest

2.2 Predictors

A wide range of socioeconomic, environmental, political and macroeconomic data are used as inputs to forecast unrest events. Altogether, we consider over 340 indicators and a complete list is provided in the appendix; here we outline the data sources and broad types of variables covered.

Our starting point is the fiscal crisis prediction model database of Hellwig (2020), which draws primarily on the IMF’s World Economic Outlook (WEO) and International Financial Statistics (IFS) and World Bank World Development Indicators (WDI) datasets. This includes a number of categorical variables splitting the sample of countries by levels of development (Advanced, Emerging and Low Income Economies), region, continent, membership in a monetary union, commodity exporting status, fuel importer classification, whether an election has been held in a given year, or if a country is part of the Heavily Indebted Poor Country (HIPC) initiative. Macroeconomic variables cover GDP level and growth, exchange rates, terms of trade, foreign reserve levels, fiscal balance, aid receipts, remittances, and population. Transformations of these variables include growth rates, first differences, lags, and volatility measures (e.g., of exchange rates by using a higher frequency version of the variable and capturing its fluctuations in a given year).

We augment this with data from the International Country Risk Guide and the Cross-National Time-Series Data Archive (CNTSD) to cover political characteristics of countries. Recent literature highlights how low-quality institutions (Moseley (2015)) can catalyze unrest while certain political regimes help mitigate protest participation (Ackermann (2017)). Given the difficulty of capturing institutional quality with just one measure, we include a battery of indicators (e.g., legislative effectiveness, executive effectiveness, government cohesion, perceptions of legitimacy and accountability, and bureaucratic quality). Regime type is measured using Polity IV scores, differences in executive selection processes, the degree of parliamentary responsibility, the nature of the head of state, and how active military groups are in political processes. These datasets also contain indicators that proxy for sources of tension (e.g., religious frictions, ethnic conflicts, war, and terrorism); polarization (e.g., the extent of political party fractionalization and the occurrence of government purges); and instability (e.g., assassinations, political violence, and mass strikes), all of which have the potential to go hand-in-hand with broader societal unrest. Finally, we include measures that capture the size of prison populations, which may serve as a proxy for political discord (e.g., where there are large numbers of political prisoners), an excessively strict legal system that could foment eventual uprisings, or inadequate investments in education and community development (i.e., where insufficient economic opportunities drive up both crime and unrest). With respect to the latter, indicators that measure educational attainment, poverty, quality of healthcare, and the investment environment are also included.

These data sets also allow us to proxy for informational frictions and coordination costs. Access to news media, social media, and associated technologies (e.g., televisions and mobile phones) can improve the flow and aggregation of information and ease logistical hurdles for mobilizing large populations (Manacorda and Tesei (2020)). To capture these features, we include measures of internet and mobile phone penetration as well as televisions per capita and newspaper circulation.

From the World Bank World Development Indicators (WDI) we collect data on five broad categories. First, poverty, measured via the poverty gap at the $5.5 per day level, prevalence of food insecurity and multidimensional poverty head count and intensity indices. Second, access to basic services such as electricity, sanitation and health care. Third, inequality via Gini indices, share of income held by highest 10%, financial access of the poorest 40%, as well as measures of gender inequality relating to schooling and literacy. Fourth, measures of population urbanization and density, since densely populated urban environments are more likely to support the spread of unrest. We also include the share of international migrants in the population to proxy for xenophobic tensions. Finally, unemployment for the population as well as the youth unemployment rate by gender are included.

We include natural disaster events from the EM-DAT database from the Centre for Research on the Epidemiology of Disasters at the Université catholique de Louvain.9 Disasters, if perceived to be poorly prepared for or responded to, may lead to anti-government sentiment. We include data on the number of people affected by drought, earthquake, epidemic, extreme temperatures, floods, industrial accidents, and storms.

Elevated levels of policy and economic uncertainty have been shown to drag on growth and investment and to raise unemployment (see, for example, Bloom (2014)). Besides the direct economic effects of uncertainty, the inability to plan may inhibit households and businesses and lead to tensions with the government. We include a text-based measure of uncertainty based on the Economist Intelligence Unit’s quarterly country reports produced by Ahir et al. (2018) – this data set covers more than 140 countries from 1996 onwards.

Sharp increases in the cost of living are likely to lead to hardship and discontent, especially for the case of necessities such as food (see Bellemare (2015)). To capture the role of different categories of prices we use the annual maximum of the year-on-year monthly inflation figures for a variety of CPI categories from the IMF CPI database.10 Similarly, announced policy changes, such as privatization of state-owned enterprises or relaxation of labor market regulations, may elicit protest from affected groups. To account for this channel, we include the data of Duval et al. (2018), who build narrative indicators of structural reforms covering product and labor markets.11
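To make the inflation transformation concrete, the following Python sketch computes, for each year, the maximum of the monthly year-on-year inflation rates of a single CPI category. The function name and the input layout (a dictionary keyed by year and month) are assumptions made for illustration, not a description of how the IMF CPI database is organized.

```python
def annual_max_yoy_inflation(cpi):
    """Map each year to its highest monthly year-on-year inflation rate.

    `cpi` maps (year, month) -> index level for one CPI category.
    Months without an observation twelve months earlier are skipped.
    """
    result = {}
    for (year, month), level in cpi.items():
        previous = cpi.get((year - 1, month))
        if previous is None or previous <= 0:
            continue
        yoy = 100.0 * (level / previous - 1.0)  # percent change on a year ago
        result[year] = max(yoy, result.get(year, float("-inf")))
    return result
```

Taking the annual maximum rather than the annual average keeps short-lived price spikes visible at the annual frequency used for the forecasts.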

3 Model

3.1 Machine Learning Models Considered

We consider a variety of forecasting models to predict unrest events, falling into three groups. All models are described formally in the appendix; here we provide intuition for the models considered. The first group, familiar to economists, are linear regression models that use a logistic function to transform the linear model’s output to ensure that the range of the function lies between 0 and 1. However, we include a regularization term that allows many predictors to be considered without suffering from over-fitting. The Ridge model adds a quadratic penalty term (L2 norm) in the parameters of the linear regression to the standard likelihood function while Lasso adds an absolute value penalty (L1 norm). The Ridge model assumes that the true model is dense, where all predictors matter, whereas Lasso assumes a sparse model, where only a few predictors are important.
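As a reminder of how the two penalties differ, the regularized logistic estimators can be written as below. This is a sketch in assumed notation, where $\ell(\beta)$ is the logistic log-likelihood, $n$ the number of predictors and $\lambda \geq 0$ a penalty weight chosen as a hyperparameter; the formal definitions used for the results are those in the appendix.

```latex
\[
\hat{\beta}^{\text{Ridge}} = \arg\min_{\beta} \Big\{ -\ell(\beta) + \lambda \sum_{k=1}^{n} \beta_k^{2} \Big\},
\qquad
\hat{\beta}^{\text{Lasso}} = \arg\min_{\beta} \Big\{ -\ell(\beta) + \lambda \sum_{k=1}^{n} |\beta_k| \Big\}
\]
```

The quadratic penalty shrinks all coefficients towards zero without eliminating any, whereas the absolute value penalty can set some coefficients exactly to zero, which is the sense in which Lasso presumes a sparse model.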

Other models are adept at capturing non-linearities between the target and the predictors. This group includes a neural net and a support vector machine (SVM) – both are models that have performed well on non-traditional data such as text and image recognition tasks in other applications (for instance, see Gentzkow et al. (2019), Redmon and Farhadi (2017)). SVMs classify events using a non-linear decision boundary that aims to maximize the gap between different groups of observations.12 Neural nets are closely related to standard logistic regressions except that neural networks use the output of one logistic model as an input to other logistic models, layering many on top of one another. This layering results in the ability to capture highly non-linear functions in a computationally efficient way.

The final group of models are tree based. These methods start with a decision tree – which partitions the space of predictor variables so that being in a given region has predictive power for the target variable.13 While intuitive, such models are prone to over-fitting and so a variety of techniques have been introduced to combine a large number of trees in a robust way. AdaBoost and Gradient Boost are boosting algorithms, which build strong prediction models from a group of weak, typically very simple, prediction models. Weak prediction models are attractive since they are not prone to over-fitting; however, their individual performance is poor and so they are combined to yield more accurate predictions. Freund and Schapire (1997) introduced the AdaBoost algorithm14 which sequentially evaluates simple models (in this case shallow tree models) placing higher weight on models that perform well while also putting higher weight on the mis-classified data. The final prediction is a weighted average of all the simple models. Gradient boosting algorithms sequentially improve simple models by also modeling the slope of the loss function found in the previous step (intuitively, this is like building a simple model to predict a target and a model to predict the errors of that model before combining them). For a popular approach to gradient boosting, see Chen and Guestrin (2016). An alternative to boosting is bagging or bootstrapping, where many models are fitted on bootstrapped samples of the training data and their results are averaged.15 Bagging reduces over-fitting in flexible models such as decision trees, reducing the variance of (out-of-sample) predictions. The major innovation in this area is that of Breiman (2001), who averages the performance of many de-correlated trees – where each tree is grown using only a random subsample of the available predictors (see Algorithm 1). We find that tree-based models perform best in our context, with Random Forest providing the best overall model, although the Random Forest approach is comparable to the boosting methods (see figure 2a and the discussion below).

3.2 Model Evaluation and Performance

We evaluate the models using an expanding window split of the training and testing set starting from an initial training set of annual data from 1990 to 1995 to make a prediction for the test set of 1996. We then roll the sample forward, generating predictions up to the year 2019. Hyperparameters are chosen using a coarse grid using the average model performance over the full rolling window sample, i.e. up to 2019, see appendix I. We do not pursue K-fold cross validation, as is common in the machine learning literature, due to time dependence in the data which is typical in economic datasets. Additionally, BANL demonstrate significant persistence in the RSUI measure, which is the basis for the events we aim to predict. Shao (1993) demonstrates asymptotic inconsistency for K-fold cross-validation in the presence of time dependence. Creative solutions to using cross-validation in a time series context exist, such as Racine (2000) who uses a gap between the testing and training set to break serial dependence. However, this approach would consume too many degrees of freedom in our relatively short annual data set.16
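The expanding window scheme can be sketched as a small generator of train and test splits. The function name and signature below are hypothetical, and the sketch only produces the year partitions; fitting the model and aligning predictors in year t with unrest in year t+1 are left out.

```python
def expanding_window_splits(years, first_test_year, last_test_year):
    """Yield (train_years, test_year) pairs with a growing training window.

    Every training window starts at the earliest available year and ends
    the year before the test year, so no future data enters the fit.
    """
    available = sorted(set(years))
    for test_year in range(first_test_year, last_test_year + 1):
        train_years = [y for y in available if y < test_year]
        if train_years and test_year in available:
            yield train_years, test_year
```

With annual data starting in 1990 and a first test year of 1996, the first split trains on 1990 to 1995, and each later split adds one more year to the training window.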

Our preferred test statistic is the area under the Receiver Operating Characteristic (ROC) curve (which we denote by AUC). The ROC plots the true positive rate (events correctly classified as unrest divided by total unrest events) against the false positive rate (events incorrectly classified as unrest divided by all events with no unrest associated with them) as the threshold for classifying an event as unrest (e.g., the estimated model probability is above 50%) is varied. Our benchmark is a random guess (coin flip), which is reflected by the 45 degree line of the ROC plot (see figure 2b). The AUC is defined following the empirical AUC, denoted $\hat{\theta}$, of DeLong et al. (1988)17:

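$$\hat{\theta} = \frac{1}{NM} \sum_{j=1}^{N} \sum_{i=1}^{M} H(X_i, Y_j)$$

$$\text{where} \quad H(X, Y) = \begin{cases} 1 & Y < X \\ 0.5 & Y = X \\ 0 & Y > X \end{cases}$$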

where X denotes the model probability of unrest (risk score) for events where unrest took place and Y the model probability of unrest for an event where unrest did not take place, with M and N the respective numbers of such events. The intuition behind this definition is that we would want the model to assign a lower probability to events where unrest did not happen than to those where it did, and so for these cases its score rises by $1/(MN)$; in cases where these are equal, the model receives a lower score of $1/(2MN)$; and in the erroneous case ($Y > X$), the model receives no addition to its AUC. As noted by DeLong et al. (1988), this definition of the AUC is equivalent to the Mann-Whitney U test which estimates the probability, $\hat{\theta}$, that a randomly selected event from the group where no unrest took place will have a risk score less than or equal to a randomly selected event from the group where unrest did occur. Additionally, the AUC can, in our case with only two classes, be interpreted as a measure of balanced accuracy or the arithmetic average of the true positive rate and the true negative rate.18 DeLong et al. (1988) provide a test statistic for comparing two models via $\hat{\theta}$, which turns out to be a simple z-score computed as $\frac{\hat{\theta}_A - \hat{\theta}_B}{\sqrt{\mathrm{Var}[\hat{\theta}_A - \hat{\theta}_B]}}$. Below we use this to test each model against the AUC equivalent of a random guess, 0.5; thus, we use the z-score that corresponds to $\frac{\hat{\theta}_i - 0.5}{\sigma_i}$, where $i$ corresponds to the model type and $\sigma_i$ is the standard deviation of $\hat{\theta}_i$ over the test set.
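The empirical AUC and the z-score against a random guess are simple enough to compute directly. The Python sketch below follows the pairwise definition given above; the function names are hypothetical, and the use of the sample (rather than population) standard deviation across test years is an assumption.

```python
import statistics


def empirical_auc(unrest_scores, no_unrest_scores):
    """Average of H(X, Y) over all pairs of an unrest score X and a
    no-unrest score Y: 1 if Y < X, 0.5 if Y == X, 0 if Y > X."""
    if not unrest_scores or not no_unrest_scores:
        raise ValueError("both groups must contain at least one score")
    total = 0.0
    for x in unrest_scores:
        for y in no_unrest_scores:
            if y < x:
                total += 1.0
            elif y == x:
                total += 0.5
    return total / (len(unrest_scores) * len(no_unrest_scores))


def z_score_vs_random(yearly_aucs):
    """z-score of the mean AUC against 0.5, scaled by the standard
    deviation of the AUC across test years (sample version assumed)."""
    mean_auc = statistics.mean(yearly_aucs)
    return (mean_auc - 0.5) / statistics.stdev(yearly_aucs)
```

As a small worked case, unrest scores of 0.8 and 0.4 against no-unrest scores of 0.4 and 0.1 give three correctly ordered pairs and one tie, so the AUC is (3 + 0.5)/4 = 0.875.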

The model results are compared in figure 2a which shows the mean $\hat{\theta}$ along with the standard deviation over the test set (the 23 out-of-sample years, 1996–2019). Details on hyper parameter tuning are found in the appendix. All models are adept at incorporating our large set of predictors, as noted above; however, the linear models perform relatively poorly with an AUC of 0.55 for both the Lasso and Ridge logistic models. The neural net model performs comparably but remains close to a random guess (i.e., an AUC of 0.5) at 0.54. While neural nets have shown remarkable performance in contexts with very large datasets, it is somewhat unsurprising that they perform less well in our context, which is too small to take advantage of the flexibility of this class of models. The SVM model performs significantly better than the preceding models. The SVM from our hyper parameter grid-search uses a radial basis function kernel19 to capture non-linearities; however, the favored parameterization acts to push the model towards a more linear representation, reducing over-fitting – this balancing act works relatively well compared to the case of the neural net. The tree based models outperform our other models and all have (one-sided) DeLong test statistics that are significantly different from 0.5. Random Forest possesses the highest mean $\hat{\theta}$ and is our preferred forecasting model, used for the remaining results.

Figure 2
** significant at 5%, * at 10% for DeLong test. Horizontal bars show mean AUC over the test set with the error bars representing standard errors.

The broad picture of our preferred model’s performance is demonstrated by the ROC curve (see figure 2b). This indicates that while our model performs significantly better than a coin-flip, the trade-off involved requires accepting a relatively high false positive rate (40–60%) to achieve true positive rates in the 60–80% region. To make this more concrete we calculate an optimal threshold at which to judge whether unrest has occurred. We pick the threshold to minimize the average of missed unrest events (1 − true positive rate) and false alarms, for each out-of-sample prediction. The average probability of an unrest event in a country in a year is 16.1%, and the average threshold selected from this procedure is slightly higher at 16.8%. The choice of a threshold allows us to compute some intuitive model performance statistics. Balanced accuracy is 65.9%, which along with the result for the mean AUC above indicates that this model is correct approximately two thirds of the time. Recall measures the share of relevant events selected and is defined as $\frac{TP}{TP+FN} = 70.9\%$; thus, few crises are missed. However, this is at a cost of relatively low precision, defined as $\frac{TP}{TP+FP} = 30.8\%$, indicating that a significant number of false alarms must be tolerated in order to reach that level of recall. This may seem odd given the way the threshold is chosen but it is simply an expression of the trade-off illustrated in the ROC curve: a relatively high false positive rate is required to catch a large number of unrest events.
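The threshold choice and the resulting statistics can be illustrated with a short Python sketch. It assumes that "false alarms" refers to the false positive rate, that candidate thresholds are the observed model probabilities, and that both outcomes occur in the sample; the function names are hypothetical.

```python
def confusion(probs, outcomes, threshold):
    """Counts (TP, FP, FN, TN) when flagging unrest if prob >= threshold."""
    pairs = list(zip(probs, outcomes))
    tp = sum(1 for p, y in pairs if p >= threshold and y == 1)
    fp = sum(1 for p, y in pairs if p >= threshold and y == 0)
    fn = sum(1 for p, y in pairs if p < threshold and y == 1)
    tn = sum(1 for p, y in pairs if p < threshold and y == 0)
    return tp, fp, fn, tn


def pick_threshold(probs, outcomes):
    """Threshold minimizing the average of the miss rate (1 - TPR)
    and the false alarm rate (FPR); ties go to the lowest threshold."""
    best_threshold, best_loss = None, float("inf")
    for t in sorted(set(probs)):
        tp, fp, fn, tn = confusion(probs, outcomes, t)
        loss = 0.5 * (fn / (tp + fn) + fp / (fp + tn))
        if loss < best_loss:
            best_threshold, best_loss = t, loss
    return best_threshold


def classification_summary(probs, outcomes, threshold):
    """Recall, precision and balanced accuracy at a given threshold."""
    tp, fp, fn, tn = confusion(probs, outcomes, threshold)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    balanced_accuracy = 0.5 * (recall + tn / (tn + fp))
    return {"recall": recall, "precision": precision,
            "balanced_accuracy": balanced_accuracy}
```

Because unrest is the rarer outcome, a threshold chosen this way can deliver high recall alongside low precision: even a moderate false positive rate applied to the many no-unrest observations produces a large number of false alarms relative to true positives.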

The performance of the model varies over the sample period, with highest performance in the early 2000s and late 2010s (see figure 3a). It is noteworthy that the worst performance takes place around 2009–10 when the model is surprised by the structural break induced by the Arab Spring; however, it recovers in the following years. We also calculate the model’s performance by type of unrest event; the model performs similarly on most categories with the exception of violent unrest and coups (see figure 3b). Hlatshwayo and Redl (2020) show that coups and violent unrest tend to be associated with larger negative macroeconomic consequences and, based on our classifications of unrest types, violent unrest is less prevalent than other forms (e.g., anti-government and election-related unrest). This work suggests that the larger economic effects from violent forms of unrest may be associated with their rare, unanticipated, and unpredictable nature.

Figure 3
** significant at 5%, * at 10% for DeLong test. Horizontal bars show mean AUC over the test set with the error bars representing standard errors.

4 Feature Importance

To understand the importance of the different features in driving the model predictions, we use Shapley values. The description of Shapley values follows Joseph (2019). To understand this approach to feature importance in non-linear models, it is useful to begin with the linear case. The contribution of a feature, or a single observation of that feature, to a prediction in a linear model is given by the regression coefficients multiplied by the observation value(s). Consider the linear model; the model decomposition Φ for the feature x, with observation, i, is:

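$$\Phi\left[\hat{f}(x_i)\right] = \phi_0(\hat{f}) + \sum_{k=1}^{n} \phi_k(x_i; \hat{f}) = \hat{\beta}_0 + \sum_{k=1}^{n} x_{ik} \hat{\beta}_k \qquad (1)$$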

where $\hat{\beta}_0$ is the estimate of the unconditional expected value of $f(x)$, and $\hat{\beta}_k$ the slope coefficient corresponding to the kth feature. Shapley values provide a generalization of this formula for non-linear models drawing on ideas from cooperative game theory. Each observation is treated like a player in a cooperative game with other observations where the result of the game is a model prediction. The difficulty is that pay-offs to observations working together cannot be assessed by simply using a single observation. This is because non-linear models usually rely on interaction effects between many observations. Adding an observation that is highly correlated with the observations already in a coalition will lead to little additional pay-off but uncorrelated observations that contain useful signals will. The way to gauge the value of these coalitions is formalized below:

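$$\hat{f}(x_i) = \phi_0^S + \sum_{k=1}^{n} \phi_k^S(x_i; \hat{f}) \equiv \Phi^S(x_i) \qquad (2)$$

$$\phi_k^S(x_i; \hat{f}) = \sum_{x' \subseteq (x_1, x_2, \ldots, x_p) \setminus (x_k)} \frac{|x'|!\,(n - |x'| - 1)!}{n!} \left( \hat{f}(x_i \mid x' \cup (x_k)) - \hat{f}(x_i \mid x') \right) \qquad (3)$$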

where $x' \subseteq (x_1, x_2, \ldots, x_p) \setminus (x_k)$ is the set of all variable coalitions when we exclude the kth variable (whose value we wish to measure), $|x'|$ is the number of variables included in the model evaluation for each set of coalitions, $\frac{|x'|!\,(n - |x'| - 1)!}{n!}$ is the combination weighting factor for each coalition and $\hat{f}(x_i \mid x' \cup (x_k)) - \hat{f}(x_i \mid x')$ is the pay-off for including $x_k$ in the coalition $x'$. This approach to measuring feature importance in the machine learning literature is due to Strumbelj and Kononenko (2010).
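For a handful of variables, the coalition sum in (3) can be evaluated exactly by brute force, which helps to see what the weights do. The Python sketch below is illustrative only and is not the implementation used for the results: the function name is hypothetical, and evaluating the model on a coalition by setting excluded variables to a fixed baseline value is an assumption standing in for the conditional prediction in (3).

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of `predict` at `x`, relative to `baseline`.

    Variables outside a coalition are replaced by their baseline value
    (an assumption of this sketch). The cost grows as 2**n, so this is
    only practical for a small number of variables n.
    """
    n = len(x)

    def value(coalition):
        point = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return predict(point)

    phis = []
    for k in range(n):
        others = [j for j in range(n) if j != k]
        phi = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1, excluding k
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                phi += weight * (value(coalition | {k}) - value(coalition))
        phis.append(phi)
    return phis
```

For a linear model with a zero baseline, the function returns the coefficient times the observation value for each variable, matching the linear decomposition in (1); for a model with interactions, such as the product of two variables, the joint contribution is split equally between them.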

Figure 4:

We calculate Shapley values using the implementation of Lundberg and Lee (2017). A choice needs to be made as to whether the model is evaluated on the data on which it is trained or on the out of sample data on which its forecasts are evaluated (the test data). Here we apply the decomposition (3) in the same way we evaluated the model’s forecast performance: fitting the model on an initial sample of 1990–1995 and calculating the Shapley values corresponding to the predictions for 1996, then rolling the sample forward, one year at a time, to 2019. An alternative approach would be to use the model fitted on the full sample of data, i.e., up to 2019. We choose the test sample approach since highly flexible machine learning models such as Random Forest can have arbitrarily good in-sample fit, which is unlikely to be representative of out-of-sample predictions. This approach is also appealing in that it provides a sense of how the importance of different features has changed over time as the model is re-estimated, as it would be done in practice.

Due to the large number of features, we aggregated them into categories to gauge the importance of different categories of variables; the mapping is available in the appendix.20 For example, the Reported Social Unrest Index of Barrett et al. (2020) and all its lags are pooled into one category of “Reported Social Unrest Index” (RSUI), which captures the influence of past levels of unrest on future unrest; the features covering internet, television, newspaper, telephone and mobile phone usage are grouped under “Digital and Media”. Figure (4a) presents the average of the absolute values of the Shapley values over all test sets by category.21 The values represent the contribution to predictions over and above the average prediction ($\phi_0^S$), which is 16 percent for all years and countries. The RSUI is by far the most important category of features, adding on average around 6.5 percentage points to the average prediction. The remaining categories yield smaller contributions. The next most important group is that relating to inflation, where food and oil inflation are broken out separately given their importance. The influence of food inflation is in line with the findings of Bellemare (2015). Oil price inflation is primarily influential in its interaction with different levels of the RSUI, where lower values of oil inflation dampen the effect of high historical levels of unrest. We also find a role for contagion in driving unrest, where unrest in neighboring states can help predict unrest, in line with Arezki et al. (2020) and Barrett et al. (2020). The importance of Digital and Media is in line with the findings of Enikolopov et al. (2020) and Manacorda and Tesei (2020).
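One way to compute such a category-level summary is sketched below. The feature names, the category mapping and the Shapley values are hypothetical, and the order of operations (summing within a category for each prediction, then averaging absolute values across predictions) is an assumption, since the text does not spell it out.

```python
from collections import defaultdict


def category_importance(shap_rows, feature_names, category_of):
    """Mean absolute Shapley contribution per category.

    shap_rows: one list of Shapley values per prediction, ordered as feature_names.
    """
    totals = defaultdict(float)
    for row in shap_rows:
        by_category = defaultdict(float)
        for name, phi in zip(feature_names, row):
            # Assumption: contributions are summed within a category first.
            by_category[category_of[name]] += phi
        for category, contribution in by_category.items():
            totals[category] += abs(contribution)
    return {c: total / len(shap_rows) for c, total in totals.items()}


# Hypothetical feature names, mapping and Shapley values (in probability units).
feature_names = ["rsui", "rsui_lag1", "food_cpi"]
category_of = {"rsui": "RSUI", "rsui_lag1": "RSUI", "food_cpi": "Inflation"}
shap_rows = [
    [0.04, 0.02, -0.01],
    [-0.03, -0.01, 0.02],
]

importance = category_importance(shap_rows, feature_names, category_of)
print(importance)
```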

Figure (4b) illustrates that the influence of features can change abruptly over time, with a large role for inflation (especially food price inflation) during the sharp increase in unrest following the Arab Spring of 2011. The model also indicates that contagion from neighbors became more important around that time. Inflation has remained important in the 2010s; however, the role of food prices has declined while that of oil prices has increased, a pattern broadly in line with their respective price changes over this period.

5 A Social Unrest Risk Index

The model produces a probability of unrest for each country for each year. This probability can be treated like a risk index – an indicator of the likelihood of unrest in the following year. The average risk for the sample is 16 percent, but this masks a pattern of generally lower risk in the first half of the sample, hovering around 12 percent, followed by a significant rise after the Arab Spring, with risk closer to 20 percent in the 2010s as levels of the RSUI remained elevated (Figure 5a). This shift towards a higher risk of unrest can be seen in the distribution of unrest risk over all countries in the sample. Figure (5b) shows kernel density plots of the distribution of unrest risk over time. In the first half of the sample, the majority of countries had a low risk of unrest, with a long right tail of higher risk countries. By 2010, risks had risen in the tail and a bi-modal profile emerged. After 2015, the distribution remains bi-modal but the mode moved to the right of the mean and we see a much larger incidence of high risk countries than in the 1990s. In summary, unrest moved from a low probability tail risk in the 1990s (of around 10 percent on average) to a relatively likely outcome by the end of the 2010s (at 25 percent).

Figure 5: Risk of Unrest

Citation: IMF Working Papers 2021, 263; 10.5089/9781557758873.001.A001

(a) Rolling out-of-sample probability of unrest averaged across countries for each year, (b) Density plot for the probability of unrest across countries.

Figure (6) illustrates the risk index for selected country cases and contrasts this with unrest events. Unrest events flagged in the figure correspond to one year before they occur; this is done because the model produces a risk index based on the probability that unrest will occur in the following year. Changes in the risk index are therefore aligned with the dates when unrest takes place. The event timing discussed here corresponds to the figure. Both the United Kingdom and the United States share a similar pattern, with a low risk prior to the last decade and risks rising throughout the decade. The United Kingdom experienced a large number of unrest events: the global events relate to the Occupy movement and Brexit; democratic events relate to the Scottish independence referendum; and basic needs unrest relates to anti-austerity protests. The risk index did not provide warning of the global events relating to the Occupy protests but did remain elevated from 2012, foreshadowing the frequent protests taking place thereafter. The labeled events in the United States relate to the election of President Donald Trump and protests against police violence (under Democratic unrest). Notably, the risk index rose prior to these events in 2015 and remained high thereafter. Egypt and Thailand offer cases where protest led to a change of government. In the case of Egypt, we have the Tahrir Square protests and the subsequent resignation of President Hosni Mubarak. The model did not pick up the risk of this event, as the model does poorly in catching the structural break that takes place around the time of the Arab Spring in general. Part of the explanation may be that the model underappreciated the importance of food prices prior to the Arab Spring but subsequently places a high weight on this feature. The model performs better around the events leading up to the coup d’état in Thailand that took place in 2013, with a high and rising risk index (with the exclusion of 2012).

6 Conclusion

Social unrest raises financial, economic and political risks in the near term, with potentially positive effects over the medium term. The increasing frequency of such events over the past decades suggests that an early warning system for these events may be useful for policy makers. While a few indicators have been identified in the literature as important in driving unrest, our work illustrates that the drivers of unrest are disparate and interact with a wide range of socioeconomic conditions in complex and non-linear ways. In particular, we combine the measurement of unrest by Barrett et al. (2020) with a large data set of socioeconomic, environmental, political and macroeconomic data to forecast unrest events. We compare the performance of a set of popular forecasting models, finding that the linear models typically used by economists perform poorly, as do highly complex models popular in Artificial Intelligence (due to our relatively small data set). Tree based models appear to offer the appropriate balance of flexibility and simplicity. Our preferred Random Forest model produces a balanced accuracy level of 66%22 and an AUC of 65%, which is significantly better than chance, based on the test of DeLong et al. (1988).

We explore the drivers of the model’s predictions using Shapley values. These accord with prior literature in emphasizing food price inflation, media usage and contagion from unrest in neighboring states. However, this paper also puts in context a wider variety of drivers such as GDP growth, development indicators such as access to basic services, urbanization, remittances, and uncertainty. We find that there is a substantial auto-regressive nature to unrest where unrest levels today (and in the past) are the most important predictor for future unrest.

The predictions of the model provide an unrest risk index that shows a rising risk of unrest since the early 1990s when the risk of unrest was around 10 percent to a level closer to 25 percent today. Future work could focus on using these results to build higher frequency predictions. Our results suggest that incorporating social media data and high frequency data on inflation may be fruitful.

References

  • Acemoglu, D., Fergusson, L., and Johnson, S. (2019). Population and Conflict. The Review of Economic Studies, 87(4):1565–1604.

  • Acemoglu, D., Hassan, T. A., and Tahoun, A. (2017). The Power of the Street: Evidence from Egypt's Arab Spring. The Review of Financial Studies, 31(1):1–42.

  • Acemoglu, D. and Robinson, J. A. (2000). Why Did the West Extend the Franchise? Democracy, Inequality, and Growth in Historical Perspective. The Quarterly Journal of Economics, 115(4):1167–1199.

  • Ackermann, K. (2017). Individual differences and political contexts – the role of personality traits and direct democracy in explaining political protest. Swiss Political Science Review, 23:21–49.

  • Ahir, H., Bloom, N., and Furceri, D. (2018). World uncertainty index, mimeo.

  • Aidt, T. S. and Franck, R. (2015). Democratization under the threat of revolution: Evidence from the great reform act of 1832. Econometrica, 83(2):505–547.

  • Aidt, T. S. and Leon, G. (2016). The democratic window of opportunity: Evidence from riots in sub-saharan africa. Journal of Conflict Resolution, 60(4):694–717.

  • Arezki, R., Adesse Dama, A., Djankov, S., and Nguyen, H. (2020). Contagious protests. World Bank Policy Research Working Paper 9321.

  • Banks, A. and Wilson, K. (2020). Cross-national time-series data archive.

  • Barrett, P., Appendino, M., Nguyen, K., and de Leon Miranda, J. (2020). Measuring social unrest using media reports. IMF Working Paper No. 20/129.

  • Barrett, P., Bondar, M., Chen, S., Chivakul, M., and Igan, D. (2021). Social unrest and financial markets, forthcoming IMF Working Paper.

  • Bellemare, M. F. (2015). Rising food prices, food price volatility, and social unrest. American Journal of Agricultural Economics, 97(1):1–21.

  • Bloom, N. (2014). Fluctuations in Uncertainty. Journal of Economic Perspectives, 28(2):153–176.

  • Breiman, L. (2001). Random forests. Machine learning.

  • Breiman, L., Friedman, J., Olshen, R., and Stone, C. (1984). Classification And Regression Trees. CRC Press.

  • Cadena, J., Korkmaz, G., Kuhlman, C. J., Marathe, A., Ramakrishnan, N., and Vullikanti, A. (2015). Forecasting social unrest using activity cascades. PLoS ONE, 10(6):1–27.

  • Chen, T. and Guestrin, C. (2016). Xgboost. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.

  • DeLong, E. R., DeLong, D. M., and Clarke-Pearson, D. L. (1988). Comparing the areas under two or more correlated receiver operating characteristic curves: A nonparametric approach. Biometrics, 44(3):837–845.

  • Duval, R. A., Furceri, D., Hu, B., Jalles, J. T., and Nguyen, H. (2018). A Narrative Database of Major Labor and Product Market Reforms in Advanced Economies. IMF Working Papers 2018/019, International Monetary Fund.

  • Enikolopov, R., Makarin, A., and Petrova, M. (2020). Social media and protest participation: Evidence from russia. Econometrica, 88(4):1479–1514.

  • Freund, Y. and Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139.

  • Gentzkow, M., Kelly, B., and Taddy, M. (2019). Text as data. Journal of Economic Literature, 57(3):535–74.

  • Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. MIT Press.

  • Hadzi-Vaskov, M., Pienknagura, S., and Ricci, L. A. (2021). The macroeconomic impact of social unrest. IMF Working Paper WP/21/135.

  • Hastie, T., Tibshirani, R., and Friedman, J. (2001). The Elements of Statistical Learning. Springer Series in Statistics. Springer New York Inc., New York, NY, USA.

  • Hellwig, K.-P. (2020). Predicting fiscal crises: A machine learning approach, mimeo.

  • Hlatshwayo, S. and Redl, C. (2020). The macroeconomic relevance of social unrest, forthcoming.

  • Joseph, A. (2019). Parametric inference with universal function approximators. Bank of England working papers 784, Bank of England.

  • Kaminsky, G. L. and Reinhart, C. M. (1999). The twin crises: The causes of banking and balance-of-payments problems. American Economic Review, 89(3):473–500.

  • Kingma, D. P. and Ba, J. (2017). Adam: A method for stochastic optimization.

  • Lundberg, S. M. and Lee, S.-I. (2017). A unified approach to interpreting model predictions. In Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R., editors, Advances in Neural Information Processing Systems 30, pages 4765–4774. Curran Associates, Inc.

  • Madestam, A., Shoag, D., Veuger, S., and Yanagizawa-Drott, D. (2013). Do Political Protests Matter? Evidence from the Tea Party Movement. The Quarterly Journal of Economics, 128(4):1633–1685.

  • Manacorda, M. and Tesei, A. (2020). Liberation technology: Mobile phones and political mobilization in africa. Econometrica, 88(2):533–567.

  • Moseley, M. W. (2015). Contentious engagement: Understanding protest participation in latin american democracies. Journal of Politics in Latin America, 7(3):3–48.

  • Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830.

  • Racine, J. (2000). Consistent cross-validatory model-selection for dependent data: hv-block cross-validation. Journal of Econometrics, 99(1):39–61.

  • Redmon, J. and Farhadi, A. (2017). Yolo9000: Better, faster, stronger. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 6517–6525.

  • Shao, J. (1993). Linear model selection by cross-validation. Journal of the American Statistical Association, 88(422):486–494.

  • Strumbelj, E. and Kononenko, I. (2010). An efficient explanation of individual classifications using game theory. Journal of Machine Learning Research, 11.

  • Turchin, P. and Korotayev, A. (2020). The 2010 structural-demographic forecast for the 2010 to 2020 decade: A retrospective assessment. PLoS ONE 15(8).


7 Appendix I: Machine learning models

A range of ML tools were compared for prediction performance. All models were compared using the expanding window approach described in the text, based on the AUC. We use the Python library of Pedregosa et al. (2011), scikit-learn.org, to implement these algorithms. This section draws heavily on the latter source, as well as Hastie et al. (2001).
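As an illustration of the comparison metric, the AUC can be computed directly from its rank interpretation: the share of (unrest, no unrest) pairs in which the unrest observation receives the higher predicted probability. The sketch below also computes balanced accuracy; the labels, scores and the 0.5 classification threshold are hypothetical, and this is not the scikit-learn implementation used for the reported results.

```python
def auc(y_true, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def balanced_accuracy(y_true, y_pred):
    """Average of the true positive rate and the true negative rate."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


# Hypothetical out-of-sample labels (1 = unrest) and predicted probabilities.
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

auc_value = auc(y_true, scores)
bal_acc = balanced_accuracy(y_true, y_pred)
print(auc_value, bal_acc)
```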

7.1 Linear models: regularized logistic regression

The regularized logistic regression solves the following problem:

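$$\min_{\beta, \alpha} \left( \frac{1 - \rho}{2} \|\beta\|_2^2 + \rho \|\beta\|_1 + C \sum_{i=1}^{n} \log\left( \exp\left( -y_i (X_i^T \beta + \alpha) \right) + 1 \right) \right)$$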

This is a standard logistic regression where a penalty is applied to the size of the slope parameters, $\beta$. If $\rho = 1$, this is the Lasso estimator for a logistic regression. If $\rho = 0$, this is a Ridge logistic regression. Note that we assumed that $C = 1$. Given this formulation, no hyper parameters require tuning and we simply ran the expanding window test for the two extreme values of $\rho$.
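The role of the mixing parameter can be seen by evaluating the penalized objective directly. The sketch below assumes the target is coded as -1 or 1 and uses hypothetical data and coefficients; it evaluates the objective rather than minimizing it.

```python
import math


def elastic_net_logistic_objective(beta, alpha, X, y, rho, C=1.0):
    """Penalized logistic objective with labels y_i in {-1, +1}."""
    l2_penalty = 0.5 * (1.0 - rho) * sum(b * b for b in beta)
    l1_penalty = rho * sum(abs(b) for b in beta)
    log_loss = 0.0
    for x_i, y_i in zip(X, y):
        margin = y_i * (sum(b * v for b, v in zip(beta, x_i)) + alpha)
        log_loss += math.log(math.exp(-margin) + 1.0)
    return l2_penalty + l1_penalty + C * log_loss


# Hypothetical toy data: three observations, two features.
X = [[0.5, -1.0], [1.5, 0.3], [-0.7, 0.8]]
y = [1, 1, -1]
beta = [1.0, -2.0]

lasso_value = elastic_net_logistic_objective(beta, 0.0, X, y, rho=1.0)  # L1 only
ridge_value = elastic_net_logistic_objective(beta, 0.0, X, y, rho=0.0)  # L2 only
null_value = elastic_net_logistic_objective([0.0, 0.0], 0.0, X, y, rho=1.0)
print(lasso_value, ridge_value, null_value)
```

With the same coefficients, the two extreme values of the mixing parameter differ only in the penalty term, since the log-loss component is unchanged.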

7.2 Neural Network

Neural networks take linear combinations of input variables, combine these using a nonlinear activation function (similar to a logistic regression), and then use the output as a derived input for prediction. That process can be repeated many times into layers of derived inputs. Here we outline the basic principles and direct the reader to Hastie et al. (2001) for an introduction or to Goodfellow et al. (2016) for a discussion of deep neural networks. We seek to minimize the loss function $L(y, \hat{y}, W) = -y \ln \hat{y} - (1 - y) \ln(1 - \hat{y}) + \alpha \|W\|_2^2$, where $W$ is a matrix of weights or parameters, $y$ is an observation of the target variable, $\hat{y}$ is a prediction for the target and $\alpha$ is a hyper parameter which was set to 0.0001. This loss function is minimized given the equations for $i = 1, \ldots, m$ layers of the network:

$$Z^{[i]} = W^{[i]} X + b^{[i]}, \qquad A^{[i]} = g(Z^{[i]})$$

$$Z^{[i+1]} = W^{[i+1]} X + b^{[i+1]}, \qquad A^{[i+1]} = g(Z^{[i+1]})$$

The weights matrix, $W$, is $n_i \times m$, where $n_i$ is the number of nodes used for layer $i$. Note that the square-bracket superscript indicates the position in the network rather than an exponent. The weights are solved using the gradient descent algorithm of Kingma and Ba (2017). An $m = 3$ layer network was assumed, with a grid search over the following choices for the number of nodes in each layer:

  • layer 1: 5, 10, 20, 30

  • layer 2: 10, 8, 5, 1

  • layer 3: 10, 5, 1

This leads to 48 models whose forecasts are tested. The optimal parameters were 5, 10 and 10 for the three respective layers.
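The size of this search can be checked by enumerating the grid. The sketch below only builds the candidate architectures; the variable names are illustrative. Each tuple has the shape accepted by the `hidden_layer_sizes` argument of scikit-learn's `MLPClassifier`, although the text does not state which class was used.

```python
from itertools import product

layer1 = [5, 10, 20, 30]
layer2 = [10, 8, 5, 1]
layer3 = [10, 5, 1]

# Every combination of node counts across the three hidden layers.
architectures = list(product(layer1, layer2, layer3))

print(len(architectures))
print((5, 10, 10) in architectures)
```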

7.3 Support vector machine

A support vector machine aims to use a linear model to separate points in the space of features that are associated with different classes (in our case unrest or no unrest). Specifically, the goal is to find a set of slope coefficients, $w$, and an intercept, $b$, such that the prediction made by $\mathrm{sign}(w^T \phi(x) + b)$ gives the correct classification for the target $y \in \{1, -1\}$, where unrest is 1 and no unrest is -1. We (mean-variance) normalise the set of predictors prior to applying the algorithm so that all have comparable scale. Since classification problems are typically not perfectly separable, some samples are allowed to be a distance $\zeta_i$ from the boundary. The formal statement is:

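$$\min_{w, b, \zeta} \; \frac{1}{2} w^T w + C \sum_{i=1}^{n} \zeta_i \quad \text{s.t.} \quad y_i (w^T \phi(x_i) + b) \geq 1 - \zeta_i, \quad \zeta_i \geq 0, \quad i = 1, \ldots, n$$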

where $K(x, x') = \phi(x)^T \phi(x') = \exp(-\gamma \|x - x'\|^2)$. $C$ controls the strength of the penalty on the boundary, with a low $C$ leading to a smoother decision boundary. $\gamma$ affects how much a single observation can affect the shape of the boundary; a higher $\gamma$ reduces the influence of a single observation relative to a cluster of observations. We estimate the model on a grid for $C$ = (0.1, 1, 10, 100, 1000, 10000) and for $\gamma$ = (0.001, 0.01, 0.1, 1, 10, 100), with the optimal parameters $(C, \gamma) = (1, 0.001)$.
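A minimal sketch of the radial basis function kernel and of the size of the hyper parameter grid follows. The function name and the two example points are hypothetical, and the inputs are assumed to be already normalised.

```python
import math
from itertools import product


def rbf_kernel(x, x_prime, gamma):
    """K(x, x') = exp(-gamma * ||x - x'||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return math.exp(-gamma * sq_dist)


C_grid = [0.1, 1, 10, 100, 1000, 10000]
gamma_grid = [0.001, 0.01, 0.1, 1, 10, 100]
param_grid = list(product(C_grid, gamma_grid))

# Two hypothetical points at squared distance 25.
a, b = [0.0, 0.0], [3.0, 4.0]
k_low_gamma = rbf_kernel(a, b, gamma=0.001)
k_high_gamma = rbf_kernel(a, b, gamma=1.0)
print(len(param_grid), k_low_gamma, k_high_gamma)
```

At the selected value of 0.001 the kernel decays slowly with distance, so distant observations still register as similar; at larger values the similarity of the same two points falls towards zero.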

7.4 Tree based models

The tree based ensemble method of Random Forests was described in detail in the text. Here we note that we searched over a grid for the maximum depth of the trees, covering (1, 3, 5), and for the maximum number of features drawn for the bootstrapped sample Z, covering (50, 100, 200) features. The optimal parameters were a max tree depth of 3 and max features of 200.

7.4.1 AdaBoost

AdaBoost is a competing ensemble method which performed well in our tests. AdaBoost takes a collection of weak learners (here short decision trees, or stumps) and re-weights them to produce a stronger ensemble learner. The re-weighting scheme puts higher weight on better performing models and on unexplained observations. The hyper parameters for this algorithm are the number of simple or short trees to consider and the learning rate, which effectively amplifies the role of the weights used to aggregate the simple models. Below is a description of how the algorithm works:

article image

We considered a number of estimators ranging from 50 to 300 in increments of 25 and a learning rate of 0.1 to 0.6 in increments of 0.1. We found the optimal performance when the number of estimators is 50 and the learning rate is 0.1.
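For readers who prefer code, the re-weighting logic can be sketched with decision stumps on a toy one-dimensional data set. This is a simplified illustration of discrete AdaBoost in the spirit of Freund and Schapire (1997), with labels coded as -1 and 1; the function names and data are hypothetical, and it is not the scikit-learn implementation used for the results in this paper.

```python
import math


def fit_stump(X, y, w):
    """Best weighted decision stump, returned as (error, feature, threshold, sign)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for sign in (1, -1):
                pred = [sign if row[j] > t else -sign for row in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    return best


def stump_predict(stump, row):
    _, j, t, sign = stump
    return sign if row[j] > t else -sign


def adaboost_fit(X, y, n_estimators, learning_rate):
    n = len(X)
    w = [1.0 / n] * n  # start with equal observation weights
    ensemble = []
    for _ in range(n_estimators):
        stump = fit_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1.0 - 1e-10)
        # Better-performing stumps receive a larger aggregation weight.
        alpha = learning_rate * 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, stump))
        # Misclassified observations are up-weighted for the next round.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, yi, row in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score > 0 else -1


# Hypothetical data that no single stump can classify perfectly.
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [1, 1, -1, -1, 1, 1]

single = adaboost_fit(X, y, n_estimators=1, learning_rate=1.0)
boosted = adaboost_fit(X, y, n_estimators=3, learning_rate=1.0)
single_hits = sum(adaboost_predict(single, r) == t for r, t in zip(X, y))
boosted_hits = sum(adaboost_predict(boosted, r) == t for r, t in zip(X, y))
print(single_hits, boosted_hits)
```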

7.4.2 Gradient Boosted Trees

The final method we consider, which provided comparable performance to the Random Forest classifier, is Gradient Boosted Trees. Gradient boosting, similar to AdaBoost, builds strong learners from weak ones:

$$\hat{y}_i = F_M(x_i) = \sum_{m=1}^{M} \nu h_m(x_i)$$

where $M$ is the number of estimators, or simple/short decision trees. The recursive formulation is $F_m(x) = F_{m-1}(x) + \nu h_m(x)$, where we can think of $h_m(x)$ as a new tree added to the existing model and $\nu$ as the learning rate, with the goal of minimizing the loss of the overall model:

$$\min_{h_m} \sum_{i=1}^{n} L(y_i, F_{m-1}(x_i) + \nu h_m(x_i))$$

The problem is initialized with $F_0$, which is a constant. In the case of a square loss function this would just be the mean of $y$. The loss function can be approximated by a first-order expansion:

$$L(y_i, F_{m-1}(x_i) + \nu h_m(x_i)) \approx L(y_i, F_{m-1}(x_i)) + \nu h_m(x_i) \left[ \frac{\partial L(y_i, F(x_i))}{\partial F(x_i)} \right]_{F = F_{m-1}}$$

The quantity in square brackets is the derivative of $L$ with respect to its second parameter, evaluated at $F_{m-1}(x_i)$, which we denote by $g_i$. Note that the first term is a constant with respect to $h_m$. So we have:

$$h_m = \arg\min_{h} L_m \approx \arg\min_{h} \sum_{i=1}^{n} \nu h(x_i) g_i$$

This expression is minimized when $h(x_i)$ is chosen to be proportional to $-\nu g_i$. The above is applicable to the case of a continuous target variable; for the case of classification the only alteration required is to map $F(x)$ into a probability, which can be done, for example, with a sigmoid function. The algorithm can be combined with sampling of the training observations at each iteration, called subsample in the toolkit of Pedregosa et al. (2011).

We considered 4 hyper parameters. First, the number of estimators with a grid of (30, 50, 100, 200). Second, the learning rate with a grid of (0.1, 0.2, 0.5). Third, the subsample with a grid of (0.8, 1). Finally, a choice of 1 or 3 for the max depth of the trees. The optimal combination is (30, 0.1, 0.8, 1), respectively.
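The recursion can be illustrated for the continuous-target case with a squared loss, where the negative gradient is simply the residual. The sketch below uses depth-one regression trees on a single hypothetical feature, with 30 estimators and a learning rate of 0.1; subsampling and the mapping to a probability needed for classification are omitted, and the function names are illustrative.

```python
def fit_regression_stump(x, residuals):
    """Least-squares stump on one feature: (threshold, left_mean, right_mean)."""
    best = None
    for t in sorted(set(x))[:-1]:  # keep both sides of the split non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        sse = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]


def gradient_boost(x, y, n_estimators, learning_rate):
    f0 = sum(y) / len(y)  # F_0: the mean of y under squared loss
    F = [f0] * len(y)
    stumps = []
    for _ in range(n_estimators):
        # Negative gradient of 0.5 * (y - F)^2 with respect to F is y - F.
        residuals = [yi - Fi for yi, Fi in zip(y, F)]
        t, lm, rm = fit_regression_stump(x, residuals)
        stumps.append((t, lm, rm))
        # F_m = F_{m-1} + nu * h_m
        F = [Fi + learning_rate * (lm if xi <= t else rm)
             for xi, Fi in zip(x, F)]
    return f0, stumps, F


def mse(y, F):
    return sum((yi - Fi) ** 2 for yi, Fi in zip(y, F)) / len(y)


# Hypothetical data with a step in the target.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.0, 1.2, 0.9, 3.0, 3.1, 2.9]

f0, stumps, F = gradient_boost(x, y, n_estimators=30, learning_rate=0.1)
initial_mse = mse(y, [f0] * len(y))
final_mse = mse(y, F)
print(initial_mse, final_mse)
```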

8 Appendix II: Input Data and Aggregation scheme

article image