In Search of Information: Use of Google Trends' Data to Narrow Information Gaps for Low-income Developing Countries
Futoshi Narita and Rujun Yin, International Monetary Fund (ISNI: https://isni.org/isni/0000000404811396)

Authors’ e-mail addresses: fnarita@imf.org (Futoshi Narita); ryin@imf.org (Rujun Yin)

Abstract

Timely data availability is a long-standing challenge in policy-making and analysis for low-income developing countries. This paper explores the use of Google Trends’ data to narrow such information gaps and finds that online search frequencies about a country significantly correlate with macroeconomic variables (e.g., real GDP, inflation, capital flows), conditional on other covariates. The correlation with real GDP is stronger than that of nighttime lights, whereas the opposite is found for emerging market economies. The search frequencies also improve out-of-sample forecasting performance albeit slightly, demonstrating their potential to facilitate timely assessments of economic conditions in low-income developing countries.

I. Introduction

Timely data availability in low-income developing countries (LIDCs) is a long-standing challenge for researchers and policy makers. LIDCs have more missing data and longer lags in data release than more developed economies. For example, as of July 2018, official FDI data for 2017 were available for fewer than half of LIDCs, compared with 90 percent of advanced economies.1 A survey of IMF staff indicates more severe deficiencies in data quality and availability for low-income countries (Independent Evaluation Office, 2016; Figure 2). The lack of reliable and timely information hampers real-time assessment of economic conditions and restricts the ability to set sound policies.

Nontraditional data sources—so-called big data—have proven to be useful in providing operationally valuable information in LIDCs.2 Satellite imagery data, such as nighttime lights, are used to measure economic growth and poverty in countries and sub-regions where data are scarce (Henderson, Storeygard, and Weil, 2012; Jean and others, 2016; Engstrom, Hersh, and Newhouse, 2017). In Kenya, researchers analyze mobile phone call records to help combat malaria more effectively (Wesolowski and others, 2012). A sensor technology generates usage statistics to improve performance of water pumps in Kenya and Ethiopia (Thomas and others, 2018, Table B.1).

This paper explores the potential of Google’s search volume index (SVI), a measure of the frequency of online search query submissions, to help narrow information gaps in LIDCs. Google’s SVI is likely to contain useful information about individuals’ interests and attention, given growing access to the Internet (especially through mobile devices in developing countries) and Google’s global user share of over 90 percent (StatCounter, 2018). People may search for information online to make economic decisions or to follow economic developments. The SVI could capture this information-seeking behavior, which is potentially useful for economic analysis (see Appendix I, Section D, for a discussion that formalizes this idea). Information search could be more relevant for cross-border activities (travel, trade, foreign investment) that may face larger information barriers than local activities, and thus it could be particularly useful for analyses of LIDCs, where such external economic activities play a key role (IMF, 2015a).

To the best of our knowledge, this is the first study to apply Google’s SVIs to a macroeconomic analysis on a comprehensive set of developing countries. The existing literature focuses on the use of Google’s SVI for more developed countries than LIDCs (Table 1). Following Choi and Varian (2012; a working paper version was released in 2009), many researchers started to use Google’s SVI to forecast or nowcast socioeconomic indicators.3 The official statistical authorities and central banks have also adopted Google’s SVIs and other big data for policy-making, data compilation, and economic research, but the efforts are still largely concentrated in advanced or frontier emerging economies (IMF, 2018d, Box 3). Our main analysis covers about 50 LIDCs (less than the total of 59 due to lack of macroeconomic data, while SVIs are available for all countries) and we also extend the analysis to about 80 other emerging and developing economies.

Table 1.

Use of Google’s SVI in forecasting/nowcasting economic variables

Source: Authors’ survey. Note: This list may not be exhaustive, and any omissions are purely incidental. See also Buono and others (2017) for a broader survey on the use of nontraditional data in macroeconomic nowcasting. AEs: advanced economies; EMEs: emerging market economies.

We find that Google’s SVI can provide useful information to enhance real-time monitoring of economic conditions in LIDCs. We construct a panel data set of the SVI for each country by setting the country name as a search topic. For greater granularity, we also collect SVIs by category. For example, for Uganda, the SVI under the finance category increases if someone submits a query such as “Uganda exchange rate,” other things being equal. We choose five categories (finance; business and industrial; law and government; health; travel) and find that some of these SVIs are significant in-sample in simple regression models of contemporaneous forecasting (i.e., nowcasting) that predict macroeconomic variables, conditional on lagged covariates. The use of these SVIs also improves out-of-sample performance, albeit slightly, as measured by the mean squared forecasting error computed from recursive forecasting regressions.

Using SVIs under several categories together seems to help disentangle the positive and negative effects of changes in individuals’ attention to a country. The fact that SVIs may signal confounded offsetting effects has been an issue in applications of the SVI (e.g., see Vozlyublennaia, 2014; page 18). In normal times, people may pay attention to a country because they are involved in some activity there, such as searching for accommodation. In this way, SVIs help identify positive effects on the country’s economy. However, people may also pay attention because of natural disasters, conflicts, epidemics, scandals, etc. These events tend to be associated with negative effects on the economy. Combining SVIs under different categories may help separate these offsetting effects, albeit not perfectly (Scott and Varian, 2015; Acevedo, 2016). We generally find that the business-and-industrial and travel categories tend to be associated with positive effects, whereas the finance, law-and-government, and health categories tend to indicate negative effects.

The SVIs show stronger correlation with real GDP than nighttime lights do for LIDCs, while the opposite is found for emerging market economies (EMEs). The significance of SVIs in the regressions for real GDP stands in stark contrast with the results for nighttime lights extracted from satellite imagery (Henderson, Storeygard, and Weil, 2012), which lose significance once lagged covariates are included in the regressors. This is striking, because nighttime lights are well accepted as a proxy for economic activity in the development literature. For EMEs, however, nighttime lights significantly correlate with real GDP while SVIs are not as significant as in the case of LIDCs. This contrasting finding may indicate some structural differences between LIDCs and EMEs.

In addition to these new empirical findings, this paper also contributes to the literature by providing a foundation for interpreting the SVI. The paper formalizes the underlying conditions where Google’s SVI could be associated with people’s attention to the entities represented by a query (Appendix I, Section D). These conditions clarify what can be captured by the SVI and what kind of biases the SVI is subject to, filling the gap in the literature and providing a solid basis for the empirical research using SVIs in general.

The rest of the paper is structured as follows. Section II explains how we compile the data from the Google Trends service, while leaving technical details to Appendix I. Section III presents the main empirical results, including the comparison with nighttime lights in Section III.C. Section IV discusses several extensions, such as the results for EMEs in Section IV.D. Section V concludes with policy implications. Appendix II presents supplementary tables.

II. Search Volume Index for a Country

The Google Trends service enables us to retrieve an SVI, a normalized measure of search frequency, for a keyword or a topic. The SVI represents the number of search query submissions to the Google search engine on a keyword or a topic, relative to the total number of query submissions on all kinds of keywords. The SVI is further rescaled to a range of 0 to 100 so that the resulting time series of an SVI shows 100 at its maximum. We can specify the locations where the queries were submitted and the categories under which the searches were made. A search topic, rather than just a word, can be specified to resolve ambiguity due to homographs (e.g., the word “Turkey” can mean a country or a bird; Stephens-Davidowitz and Varian, 2015) by using Google’s Knowledge Graph service. See Appendix I for more details.
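The normalization described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Google's actual procedure: the raw counts are hypothetical inputs (Google Trends does not publish them), the function name `search_volume_index` is ours, and the sampling and integer rounding applied by the service are ignored.

```python
def search_volume_index(topic_counts, total_counts):
    """Rescale the topic's share of all queries so that the series peaks at 100.

    topic_counts: hypothetical number of queries on the topic in each period.
    total_counts: hypothetical number of all queries in each period.
    """
    if len(topic_counts) != len(total_counts):
        raise ValueError("both series must cover the same periods")
    # Step 1: search frequency relative to all query submissions.
    shares = [t / n for t, n in zip(topic_counts, total_counts)]
    peak = max(shares)
    if peak == 0:
        return [0.0] * len(shares)
    # Step 2: rescale so that the maximum of the series equals 100.
    return [100.0 * s / peak for s in shares]


# Hypothetical counts: the topic's share peaks in the third period.
topic = [120, 300, 450, 200]
total = [1_000_000, 1_500_000, 1_500_000, 2_000_000]
svi = search_volume_index(topic, total)
```

Because each series is rescaled by its own peak, levels are not comparable across topics or countries without further adjustment, which is why the paper adjusts SVIs for cross-country comparability (Appendix I, Section C).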

We use a country name as a search topic to obtain an SVI that proxies individuals’ attention to an LIDC. The SVI based on a country name increases when search queries about the country grow relative to all other search queries submitted to the Google servers. We argue that this SVI could reflect the number of people all over the world who become interested in something about the country (see Appendix I, Section D, for the conditions under which this claim would hold) and that we may be able to extract useful information about the country from the SVI. We use Google’s Knowledge Graph service to resolve ambiguity of country names, including language issues (e.g., “Côte d’Ivoire” and “Ivory Coast”) and adjust SVIs to make them comparable across countries (Appendix I, Section C). The SVIs constructed in this way exhibit some positive correlation with country income levels.

To separate positive and negative sentiments, we retrieve SVIs by category. A common issue with the SVI is the difficulty in labeling search terms with positive or negative sentiment and identifying how they are linked to economic indicators. Among the 25 major categories, we choose five categories—finance; business and industrial; law and government; health; and travel—to capture searches related to economic activities (finance; business and industrial; travel) and at the same time to control for searches related to negative incidents that may adversely affect the economy (law and government; health). It is an empirical question how successful this strategy would be. Note that SVIs under more granular subcategories (as shown in Appendix Table 2) tend to return zeros due to lower search frequencies than Google’s reporting threshold.

Some cases illustrate underlying relationships between SVIs and economic activities. For example, the SVI for Myanmar under the travel category seems to capture the increasing trend of tourist arrivals to Myanmar since 2011 (Figure 1). From the beginning of 2011, Myanmar underwent a series of political reforms (IMF, 2015b). The following sections investigate whether this conjecture could be generalized, based on regression analyses.

Figure 1.

SVI under the travel category and tourist arrivals in Myanmar

Citation: IMF Working Papers 2018, 286; 10.5089/9781484390177.001.A001

Sources: Google Trends, World Development Indicators (World Bank, 2018), and the authors’ calculations.

III. Can Google’s SVIs Improve Forecasting Performance for LIDCs?

A. Forecasting model

To examine the potential of Google’s SVIs, we consider a simple forecasting model using SVIs. We construct a panel data set of SVIs (the yearly averages of monthly data) from 2004 to 2017 for 59 LIDCs, combined with macroeconomic data taken from several databases (see Appendix Table 3 for variable definitions and data sources; Appendix Table 4 for summary statistics; and Appendix Table 5 for pairwise correlation coefficients for selected variables). We postulate a simple linear regression as follows:

$$Y_{it} = \rho Y_{i,t-1} + \beta\,\mathrm{SVI}_{it} + \gamma X_{i,t-1} + \alpha_i + D_t + \epsilon_{it},$$

where $Y_{it}$ denotes a variable to predict (real GDP growth, real exports, travel arrivals, inflation, exchange rates, private capital inflows, FDI inflows); $\mathrm{SVI}_{it}$ denotes a vector of SVIs under the selected five categories; $X_{it}$ denotes a vector of other control variables; $\alpha_i$ and $D_t$ are country fixed effects and time dummies, respectively; and $\epsilon_{it}$ denotes the residuals. See Appendix Table 3 for how each variable is constructed and transformed (e.g., in natural logarithm or in percent change).
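The specification above can be sketched as an ordinary least squares fit with country and year dummies. This is a minimal sketch on synthetic data, not the authors' estimation code: the function name `fit_nowcast_model`, the data-generating values, and the use of NumPy's least-squares solver are all assumptions made for illustration, and no standard errors are computed.

```python
import numpy as np

def fit_nowcast_model(y, svi, x, country, year):
    """OLS of y_it on y_{i,t-1}, SVI_it, X_{i,t-1}, country dummies and year dummies."""
    y = np.asarray(y, dtype=float)
    svi = np.asarray(svi, dtype=float).reshape(len(y), -1)
    x = np.asarray(x, dtype=float).reshape(len(y), -1)
    country, year = np.asarray(country), np.asarray(year)
    # Map (country, year) to its row so that one-year lags can be looked up.
    position = {(c, t): k for k, (c, t) in enumerate(zip(country.tolist(), year.tolist()))}
    pairs = [(k, position[(c, t - 1)])
             for (c, t), k in position.items() if (c, t - 1) in position]
    cur = np.array([p[0] for p in pairs])   # rows observed in year t
    lag = np.array([p[1] for p in pairs])   # matching rows in year t-1
    core = np.column_stack([y[lag], svi[cur], x[lag]])
    ids, yrs = np.unique(country[cur]), np.unique(year[cur])
    c_dum = (country[cur][:, None] == ids[None, :]).astype(float)
    # Drop the first year dummy to avoid collinearity with the country dummies.
    t_dum = (year[cur][:, None] == yrs[None, 1:]).astype(float)
    design = np.column_stack([core, c_dum, t_dum])
    coef, *_ = np.linalg.lstsq(design, y[cur], rcond=None)
    k_svi, k_x = svi.shape[1], x.shape[1]
    return {"rho": float(coef[0]),
            "beta": coef[1:1 + k_svi],
            "gamma": coef[1 + k_svi:1 + k_svi + k_x]}

# Synthetic panel (hypothetical values): 30 countries, 14 years, two SVIs, one control.
rng = np.random.default_rng(0)
n_c, n_t = 30, 14
country = np.repeat(np.arange(n_c), n_t)
year = np.tile(np.arange(2004, 2004 + n_t), n_c)
svi = rng.normal(size=(n_c * n_t, 2))
x = rng.normal(size=(n_c * n_t, 1))
alpha = rng.normal(size=n_c)
true_beta = np.array([0.3, -0.2])
y = np.zeros(n_c * n_t)
for k in range(n_c * n_t):
    first = (k % n_t == 0)                  # first year of each country has no lag
    y_lag = 0.0 if first else y[k - 1]
    x_lag = 0.0 if first else x[k - 1, 0]
    y[k] = (0.5 * y_lag + svi[k] @ true_beta + 0.1 * x_lag
            + alpha[country[k]] + 0.01 * rng.normal())

fit = fit_nowcast_model(y, svi, x, country, year)
```

The synthetic noise is deliberately small so that the known coefficients are recovered closely; with realistic noise and a short panel, the estimate of the lagged-dependent-variable coefficient would be subject to the Nickell bias discussed in the text.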

This specification is motivated by real-time assessment of the economy when only lagged data are available. We include the control variables $X_{it}$ with a one-year lag, whereas the SVIs are contemporaneous, because our purpose is to explore the benefits of timely observation of SVIs in real-time monitoring of the economy. For example, we consider assessing real GDP growth for 2016 as of January 2017, when actual real GDP for 2016 was not yet available although SVIs for 2016 were. The control variables $X_{it}$ are chosen based on the empirical literature on the variables to forecast (e.g., for economic growth regression, Barro, 2015; for the determinants of capital flows, Araujo and others, 2017; Hashimoto and Wacker, 2016; Choi and Hashimoto, 2018), although many of the control variables used in the literature are not included due to lack of observations for many LIDCs. For example, including the gross enrollment ratio in secondary education reduces the sample size by one-third, but the estimation results do not change significantly.

The purpose of the exercise is to find useful correlation between SVIs and economic variables, rather than to establish causality. Our simple model specification suffers from endogeneity due to any causality running from $Y_{it}$ to $\mathrm{SVI}_{it}$ and from the so-called Nickell bias due to the inclusion of country fixed effects and the lagged dependent variable (e.g., see Barro, 2015). We do not address these issues because our purpose is to predict $Y_{it}$ by modeling the expected value of $Y_{it}$ conditional on all the information available, instead of estimating structural causation between variables of interest (see Kleinberg and others, 2015, for a useful distinction between prediction and causation). Also, high correlation across SVIs by category, ranging from 0.77 to 0.92 (Appendix Table 5), would not be a matter of concern in predicting $Y_{it}$. However, such high correlation would pose a challenge in separating the category SVIs into those that capture positive sentiments and those that capture negative sentiments.

B. In-sample regression results

We find that some of the SVIs are significant in the simple forecasting model, contributing to a better fit of the model. We confirm that these findings are robust to the sampling conducted in constructing SVIs (see Appendix I for details) by repeating the same exercise for five separate vintages of the SVIs constructed during April-June 2018. For ease of exposition, we refer to the SVI under a category in a concise way; for example, the SVI under the business-and-industrial category is referred to as the business-industrial SVI, and so on. Specific findings are as follows:

  • Economic activities (Table 2). The business-industrial SVI exhibits a significant positive correlation with real GDP, indicating that a 10 percent increase in business-related attention would be associated with a 0.7 percent increase in real GDP. The law-government SVI and the health SVI, on the other hand, show significant negative correlations, implying that these SVIs may capture slowdowns in economic activities due to public concerns about legal, political, or health issues. These SVIs show a broadly similar pattern of correlation with real exports and tourist arrivals, with larger magnitudes, in line with the conjecture that attention from outside the country is the source of the observed correlations. The travel SVI is positively correlated with tourist arrivals. We have also tried tourism receipts, but the correlation is not as robust as for tourist arrivals, possibly because the SVI is more closely associated with the number of people interested in visiting the country than with how much they spend there.

  • Prices (Table 3). There is strong positive correlation between inflation and the finance SVI—a 10 percent increase in finance-related attention would be associated with an increase in inflation by 0.3 percentage points. The results for the nominal exchange rate imply that the finance SVI may reflect currency depreciation pressures and that its pass-through to inflation may explain the results for inflation. Correlation between the finance SVI and the real effective exchange rate (REER) is not significant, possibly due to relatively high pass-through in LIDCs. The law-government SVI seems to be correlated with REER appreciation, which we admit is not so intuitive because the law-government SVI is negatively associated with economic activities (as is shown in Table 2). The travel SVI is significantly associated with lower prices, which would be due to people’s travel interests to a destination with cheaper goods and services.

  • Capital flows (Table 4). We find positive associations between gross capital inflows and the business-industrial SVI. Motivated by Araujo and others (2017), we separately examine FDI and non-FDI flows and find somewhat stronger correlation for non-FDI flows. The finance SVI shows no significant association, possibly because the SVI may be more associated with individuals’ behaviors (e.g., checking the exchange rate) and personal investment in these countries is not yet significant. The behaviors of institutional investors may be better captured by the business-industrial SVI. The travel SVI is negatively correlated with capital flows, which may reflect lower financing needs due to higher travel service receipts.
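The magnitudes quoted in the findings above can be translated into implied regression coefficients with a little arithmetic. The sketch below assumes, for illustration only, that the SVIs enter the regressions in natural logarithms, that real GDP is in logs, and that inflation is measured in percentage points; the function names are ours and the resulting numbers are back-of-the-envelope values, not coefficients taken from the tables.

```python
import math

def implied_elasticity(pct_change_x, pct_change_y):
    """Log-log coefficient implied by 'x up by a percent goes with y up by b percent'."""
    return math.log1p(pct_change_y / 100) / math.log1p(pct_change_x / 100)

def implied_semi_elasticity(pct_change_x, pp_change_y):
    """Coefficient on ln(x) when y is measured in percentage points."""
    return pp_change_y / math.log1p(pct_change_x / 100)

# A 10 percent rise in the business-industrial SVI with a 0.7 percent rise in real GDP.
gdp_coefficient = implied_elasticity(10, 0.7)              # roughly 0.07
# A 10 percent rise in the finance SVI with a 0.3 percentage point rise in inflation.
inflation_coefficient = implied_semi_elasticity(10, 0.3)   # roughly 3.1
```

Under these assumptions, a coefficient of about 0.07 on the log business-industrial SVI corresponds to the stated real GDP association, which helps in reading the orders of magnitude in the tables.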

Table 2.

Economic activities and the search volume index (SVI) in LIDCs

Sources: Chinn and Ito (2006), Google Trends, International Financial Statistics (IMF, 2018b), World Development Indicators (World Bank, 2018), World Economic Outlook (IMF, 2018e), and the authors’ estimation. Note: Sample period: 2004–2016. Cluster-robust standard errors are reported in parentheses. Superscripts *, **, and *** indicate statistical significance at the 10 percent, 5 percent, and 1 percent level, respectively. See Appendix Table 1 for country groupings and Appendix Table 3 for variable definitions (most variables are in natural logarithm or percent change) and data sources. LIDCs: low-income developing countries; REER: real effective exchange rate; SVI: search volume index.

Table 3.

Price developments and the search volume index (SVI) in LIDCs

Sources: Chinn and Ito (2006), Google Trends, International Financial Statistics (IMF, 2018b), World Development Indicators (World Bank, 2018), World Economic Outlook (IMF, 2018e), and the authors’ estimation. Note: Sample period: 2004–2016. Cluster-robust standard errors are reported in parentheses. Superscripts *, **, and *** indicate statistical significance at the 10 percent, 5 percent, and 1 percent level, respectively. See Appendix Table 1 for country groupings and Appendix Table 3 for variable definitions (most variables are in natural logarithm or percent change) and data sources. LIDCs: low-income developing countries; REER: real effective exchange rate; SVI: search volume index.

The findings are broadly robust to model uncertainty (Table 5). We employ the Bayesian model averaging (BMA) methodology to examine the robustness of our findings to specification uncertainty (Leamer, 1978). The estimation is implemented using the Stata command bma (De Luca and Magnus, 2011). The results show that our findings are mostly robust to specification uncertainty, although the correlations with inflation and capital flows are not as strong as they appear in Tables 3 and 4.
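To give a sense of what a posterior inclusion probability measures, the following sketch averages over all subsets of regressors using BIC-based model weights. This is a common textbook approximation and is not the procedure implemented by the Stata command bma, whose priors differ; the function name `bma_bic`, the variable names, and the synthetic data are hypothetical.

```python
from itertools import combinations
import numpy as np

def bma_bic(y, X, names):
    """Posterior inclusion probabilities from BIC weights over all regressor subsets."""
    y, X = np.asarray(y, dtype=float), np.asarray(X, dtype=float)
    n = len(y)
    models = []
    for size in range(len(names) + 1):
        for subset in combinations(range(len(names)), size):
            design = np.column_stack([np.ones(n), X[:, list(subset)]])
            coef, *_ = np.linalg.lstsq(design, y, rcond=None)
            rss = float(np.sum((y - design @ coef) ** 2))
            bic = n * np.log(rss / n) + design.shape[1] * np.log(n)
            models.append((subset, bic))
    bics = np.array([b for _, b in models])
    # Model weights proportional to exp(-BIC/2), assuming equal prior model probabilities.
    weights = np.exp(-0.5 * (bics - bics.min()))
    weights /= weights.sum()
    pip = {name: 0.0 for name in names}
    for (subset, _), w in zip(models, weights):
        for j in subset:
            pip[names[j]] += float(w)
    return pip

# Synthetic example (hypothetical): two relevant regressors and one irrelevant one.
rng = np.random.default_rng(3)
names = ["svi_business", "svi_health", "irrelevant"]
X = rng.normal(size=(200, 3))
y = 1.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=200)
pip = bma_bic(y, X, names)
```

A regressor whose inclusion probability exceeds 0.5 appears in models that together carry more than half of the posterior weight, which is the threshold used for bolding coefficients in the BMA tables.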

Table 4.

Capital flows and the search volume index (SVI) in LIDCs

Sources: Chinn and Ito (2006), Google Trends, Financial Flows Analytics (IMF, 2018a), International Financial Statistics (IMF, 2018b), World Development Indicators (World Bank, 2018), World Economic Outlook (IMF, 2018e), and the authors’ estimation. Note: Sample period: 2004–2016. Cluster-robust standard errors are reported in parentheses. Superscripts *, **, and *** indicate statistical significance at the 10 percent, 5 percent, and 1 percent level, respectively. See Appendix Table 1 for country groupings and Appendix Table 3 for variable definitions (most variables are in natural logarithm or percent change) and data sources. LIDCs: low-income developing countries; REER: real effective exchange rate; SVI: search volume index.

Table 5.

Bayesian model averaging results for LIDCs

Sources: Chinn and Ito (2006), Google Trends, Financial Flows Analytics (IMF, 2018a), International Financial Statistics (IMF, 2018b), World Development Indicators (World Bank, 2018), World Economic Outlook (IMF, 2018e), and the authors’ estimation. Note: Sample period: 2004–2016. Posterior inclusion probabilities (PIPs) are reported in brackets. The coefficients are bolded if the PIP exceeds 0.5, corresponding to what is known as the median probability model (Barbieri and Berger, 2004). The estimation is implemented using the Stata command bma (De Luca and Magnus, 2011). See Appendix Table 1 for country groupings and Appendix Table 3 for variable definitions (most variables are in natural logarithm or percent change) and data sources. LIDCs: low-income developing countries; REER: real effective exchange rate; SVI: search volume index.

C. Comparison with nighttime lights

Nighttime lights (NLs) extracted from processed satellite imagery can also serve as a nontraditional source of information for real-time economic monitoring, like SVIs. Since the seminal application by Henderson, Storeygard, and Weil (2012), NLs have gained popularity as a proxy for the degree of economic activity (for a recent survey on the economic applications of satellite data, see Donaldson and Storeygard, 2016). While Henderson, Storeygard, and Weil (2012) compile annual data based on the Defense Meteorological Satellite Program Operational Linescan System (DMSP OLS) data, a newer data set based on the Visible Infrared Imaging Radiometer Suite (VIIRS) Day/Night Band (DNB) has been available monthly since April 2012 (until October 2018 as of November 18, 2018), although its annual data set, which involves additional data cleaning, is available only for 2015 and 2016 at the time of writing.4 We use the annual data compiled by the R package Rnightlights, developed by Njuguna (2018), while cross-checking them with the data compiled by Henderson, Storeygard, and Weil (2012). The correlation between the two NL data sets is almost one (Appendix Table 5).

We benchmark SVIs with NLs and find that SVIs may contain stronger signals on economic activity than NLs in LIDCs, while we find the opposite for EMEs. The significance of SVIs broadly remains while NLs are not statistically significant for LIDCs (Table 6, columns 1–4). For EMEs, however, the opposite is found—NLs are significant while SVIs are not (Table 6, columns 5–6). Further investigation indicates that the significance of NLs is lost for LIDCs when regressors include the lag of covariates (Appendix Table 6), whereas it is not lost for EMEs (Appendix Table 7). The contrasting results imply that there are some interesting structural differences between LIDCs and EMEs. For example, SVIs may better capture external factors, which may be relatively more important in LIDCs, whereas NLs may better reflect the level of domestic economic activity, which may play a larger role in EMEs than in LIDCs. The comparison between LIDCs and EMEs is also discussed in Section IV.D.

Table 6.

Search volume index (SVI) and nighttime lights (NLs)

Sources: Chinn and Ito (2006); Earth Observation Group; GADM (2018); Google Trends; Financial Flows Analytics (IMF, 2018a); Henderson, Storeygard, and Weil (2012); International Financial Statistics (IMF, 2018b); National Geophysical Data Center (with U.S. Air Force Weather Agency); World Development Indicators (World Bank, 2018); World Economic Outlook (IMF, 2018e); and the authors’ estimation. Note: For ordinary least squares (OLS), cluster-robust standard errors are reported in parentheses. Superscripts *, **, and *** indicate statistical significance at the 10 percent, 5 percent, and 1 percent level, respectively. For Bayesian model averaging (BMA), posterior inclusion probabilities (PIPs) are reported in brackets. The coefficients are bolded if the PIP exceeds 0.5, corresponding to what is known as the median probability model (Barbieri and Berger, 2004). The estimation is implemented using the Stata command bma (De Luca and Magnus, 2011). The “NLs from HSW (2012)” line shows the coefficients on NL data (variable lndn) compiled by Henderson, Storeygard, and Weil (2012), available for 1992–2008. The “NLs from Rnightlights” line shows the coefficients on NL data compiled by the R package Rnightlights developed by Njuguna (2018), available for 1992–2013 based on DMSP OLS data (also used by Henderson, Storeygard, and Weil, 2012) and for 2015–2016 based on the Visible Infrared Imaging Radiometer Suite (VIIRS) Day/Night Band (DNB) data. The DMSP OLS data are based on the processed images provided by National Geophysical Data Center, while images are collected by U.S. Air Force Weather Agency. The VIIRS DNB data are produced by the Earth Observation Group, NOAA/NCEI. See Appendix Table 1 for country groupings and Appendix Table 3 for variable definitions (most variables are in natural logarithm or percent change) and data sources. Among EMEs, the NL data exclude countries identified as outliers by Henderson, Storeygard, and Weil (2012, footnote 16, p. 1011; Bahrain, Equatorial Guinea, Serbia, Montenegro). For the data compiled by Rnightlights, several large economies are also excluded due to their heavy computational burden (Brazil, Chile, China, Indonesia, India, Mexico, Peru, Russia). DMSP OLS: Defense Meteorological Satellite Program Operational Linescan System; EMEs: emerging market economies; LIDCs: low-income developing countries; NCEI: National Centers for Environmental Information; NOAA: National Oceanic and Atmospheric Administration; REER: real effective exchange rate; SVI: search volume index.

D. Out-of-sample nowcasting

We also examine out-of-sample performance of short-term forecasting (nowcasting). We conduct recursive forecasting using 2012 as the starting year and calculate the mean squared error (MSE) of prediction for 2013–2016.5 Namely, we predict the value of the variable of interest for 2013 by feeding observations available in 2013 (i.e., SVIs for 2013 and other variables for 2012) using the model estimated by the observations up to 2012. We then repeat this to predict values for 2014, 2015, and 2016, incrementally using more data to estimate the model.
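The recursive scheme described above can be sketched as an expanding-window loop. This is a minimal illustration on synthetic data rather than the authors' code: the function name `recursive_mse` is ours, the regressor matrix is assumed to already contain the lagged dependent variable, the contemporaneous SVIs, and the lagged controls, and the treatment of time effects (averaging the estimated year effects into the country effects and assuming a zero time effect for the prediction year) is our reading of the note to Table 7.

```python
import numpy as np

def recursive_mse(y, X, country, year, first_target, last_target):
    """Expanding-window forecasts: fit on years before the target, predict the target year."""
    y, X = np.asarray(y, dtype=float), np.asarray(X, dtype=float)
    country, year = np.asarray(country), np.asarray(year)
    k = X.shape[1]
    squared_errors = []
    for target in range(first_target, last_target + 1):
        train, test = year < target, year == target
        ids, yrs = np.unique(country[train]), np.unique(year[train])
        c_dum = (country[train][:, None] == ids).astype(float)
        t_dum = (year[train][:, None] == yrs[1:]).astype(float)  # first year is the base
        design = np.column_stack([X[train], c_dum, t_dum])
        coef, *_ = np.linalg.lstsq(design, y[train], rcond=None)
        slope, alpha, delta = coef[:k], coef[k:k + len(ids)], coef[k + len(ids):]
        # Fold the average year effect into the country effects, so that the
        # remaining time effect for the unseen target year can be set to zero.
        mean_delta = delta.sum() / len(yrs)
        fixed_effect = dict(zip(ids, alpha + mean_delta))
        for row in np.flatnonzero(test):
            if country[row] in fixed_effect:
                prediction = X[row] @ slope + fixed_effect[country[row]]
                squared_errors.append((y[row] - prediction) ** 2)
    return float(np.mean(squared_errors))

# Synthetic panel (hypothetical): 20 countries observed over 2004-2016.
rng = np.random.default_rng(1)
country = np.repeat(np.arange(20), 13)
year = np.tile(np.arange(2004, 2017), 20)
X = rng.normal(size=(260, 3))
alpha = rng.normal(size=20)
signal = X @ np.array([0.5, 0.2, -0.1]) + alpha[country]
y = signal + 0.1 * rng.normal(size=260)
mse = recursive_mse(y, X, country, year, 2013, 2016)
```

Each target year is predicted only from data that would have been available at the time, so the resulting MSE is a genuine out-of-sample measure.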

We compare the best predicting models selected from the pool of variables with and without SVIs. As including irrelevant variables to a model may increase the MSE, we conduct an exhaustive search from the pool of SVIs and control variables to identify the set of variables with which the linear regression model minimizes the MSE, combined with country fixed effects and time dummies. We then do this again only for control variables, without SVIs, and compare the MSEs between the two best predicting models.6
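The exhaustive search described above can be sketched as a loop over all subsets of candidate variables. For brevity, this sketch replaces the recursive two-way fixed-effects forecasts with a single train/test split and a plain intercept, which is a simplification of ours; the function names, variable names, and data are hypothetical.

```python
from itertools import combinations
import numpy as np

def holdout_mse(y, X, train, test):
    """Fit OLS with an intercept on the training rows and score on the test rows."""
    design = np.column_stack([np.ones(train.sum()), X[train]])
    coef, *_ = np.linalg.lstsq(design, y[train], rcond=None)
    prediction = np.column_stack([np.ones(test.sum()), X[test]]) @ coef
    return float(np.mean((y[test] - prediction) ** 2))

def best_subset(y, X, names, candidates, train, test):
    """Return (minimum MSE, variables) over every subset of the candidate pool."""
    best = (np.inf, ())
    for size in range(len(candidates) + 1):
        for subset in combinations(candidates, size):
            columns = [names.index(v) for v in subset]
            mse = holdout_mse(y, X[:, columns], train, test)
            if mse < best[0]:
                best = (mse, subset)
    return best

# Synthetic data (hypothetical): the outcome depends on lag_y and svi_travel only.
rng = np.random.default_rng(2)
n = 200
names = ["lag_y", "control", "svi_finance", "svi_travel"]
X = rng.normal(size=(n, 4))
y = 0.8 * X[:, 0] + 1.0 * X[:, 3] + 0.5 * rng.normal(size=n)
train = np.arange(n) < 150
test = ~train

mse_without, vars_without = best_subset(y, X, names, ["lag_y", "control"], train, test)
mse_with, vars_with = best_subset(y, X, names, names, train, test)
```

Because the controls-only pool is a subset of the full pool, the minimum MSE found with SVIs in the pool can never exceed the minimum found without them; what matters is the size of the improvement.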

This way, we find that adding SVIs to the pool of variables improves performance in nowcasting economic indicators. For LIDCs, the MSE of the best model is lower for every economic indicator when SVIs are included in the selection pool (Table 7, Panel A). However, the differences in MSEs between the best models with and without SVIs are generally neither large nor statistically significant. Note that most of our comparisons are between nested models, and standard statistical inference based on the Diebold-Mariano test (Diebold and Mariano, 1995) across nested models may not be valid, especially in the presence of autocorrelation or cross-panel dependency (e.g., see Diebold, 2015, for a review of the literature). The SVIs included in the best model are generally in line with the in-sample analysis, but not always the same. For example, for real GDP, the law-government SVI, which is significant in the in-sample results, is always selected in the top 10 models in terms of the MSE, whereas the business-industrial SVI is not selected and the finance SVI is selected instead (Appendix Table 8). Further investigation to reconcile in-sample and out-of-sample results would be of interest, as is actively discussed in the literature (e.g., Inoue and Kilian, 2005; Diebold, 2015, and associated comment papers).
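For reference, the basic form of the Diebold-Mariano comparison under squared-error loss can be sketched as follows. This is the simplest textbook version and not the paper's implementation: it assumes serially uncorrelated loss differentials and ignores clustering and the nested-model problem, which are exactly the caveats raised above. The function name and the data are hypothetical.

```python
import math
import numpy as np

def diebold_mariano(errors_a, errors_b):
    """Basic Diebold-Mariano statistic for equal squared-error loss of two forecasts."""
    # Loss differential: negative values favor forecast A.
    d = np.asarray(errors_a, dtype=float) ** 2 - np.asarray(errors_b, dtype=float) ** 2
    n = d.size
    statistic = d.mean() / math.sqrt(d.var(ddof=1) / n)
    # Two-sided p-value from the standard normal approximation.
    p_value = math.erfc(abs(statistic) / math.sqrt(2))
    return statistic, p_value

# Hypothetical forecast errors: model A is clearly more accurate than model B.
rng = np.random.default_rng(4)
errors_a = rng.normal(scale=1.0, size=200)
errors_b = rng.normal(scale=2.0, size=200)
statistic, p_value = diebold_mariano(errors_a, errors_b)
```

With autocorrelated or cross-sectionally dependent errors, the variance in the denominator would need a robust estimator, and for nested models the normal approximation itself may fail.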

Table 7.

Out-of-sample performance of nowcasting

Sources: Chinn and Ito (2006), Google Trends, Financial Flows Analytics (IMF, 2018a), International Financial Statistics (IMF, 2018b), World Development Indicators (World Bank, 2018), World Economic Outlook (IMF, 2018e), and the authors’ estimation. Note: We conduct recursive forecasting using a panel data set from 2004 to 2016. We set 2012 as the starting year and calculate the mean squared error (MSE) of prediction for 2013–2016. We predict the value of the variable of interest for 2013, by feeding observations available in 2013 (i.e., SVIs for 2013 and other controls for 2012) using the model estimated by the observations up to 2012. We then repeat this to predict values for 2014, 2015, and 2016, incrementally using more data to estimate the model. We include country fixed effects and time dummies, from which we back out the averaged constant term so that country fixed effects and time effects are redefined as deviations from the constant term, and thus, ex ante time effects for prediction years can be assumed to be zero. Panel A shows the results for LIDCs and Panel B shows the results for EMEs. The “Control variables + SVIs” lines show the minimum MSEs identified by an exhaustive search from the pool of all variables to be included in the model. The “Controls only” lines show the minimum MSEs identified by an exhaustive search from the pool of control variables, excluding the SVIs. See Appendix Tables 8 and 9 for the best model specifications chosen in this procedure. To overcome a computational challenge stemming from the exhaustive search across variables to include, we follow the algorithm proposed by Somaini and Wolak (2016) to speed up the calculation to estimate regressions with two-way fixed effects. The “Difference (in percent)” lines show the differences of the above two lines in percent of the second line.
Superscripts *, **, and *** indicate statistical significance at the 10 percent, 5 percent, and 1 percent level, respectively, based on a Diebold-Mariano test (Diebold and Mariano, 1995) using cluster-robust standard errors, although it should be noted that most of these model comparisons are between nested models and conducting statistical inference across nested models is not trivial, especially when forecasting errors could exhibit autocorrelation or cross-panel dependency (e.g., see Diebold, 2015, for a review of the literature). The nominal exchange rate is the local currency per U.S. dollar, transformed to annual percent changes, period average. See Appendix Table 1 for country groupings and Appendix Table 3 for variable definitions (most of variables are in natural logarithm or percent change) and data sources. For inflation and nominal exchange rate, we divide them by 100 to be comparable to other logged variables for this table. EMEs: emerging market economies; LIDCs: low-income developing countries; SVI: search volume index.
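The recursive procedure described in the note to Table 7 can be illustrated with a minimal sketch. It is not the authors' implementation: it uses pooled OLS with a single intercept instead of two-way fixed effects, it does not use the Somaini and Wolak (2016) algorithm, and the function names `recursive_mse` and `best_subset` are hypothetical. It also assumes that each row of `X` is already aligned with the information available in the prediction year (contemporaneous SVIs, controls lagged by one year).

```python
import itertools
import numpy as np

def recursive_mse(y, X, years, start=2012, end=2016):
    """Expanding-window MSE: fit on years <= t, predict year t + 1."""
    errors = []
    for t in range(start, end):
        train, test = years <= t, years == t + 1
        # Pooled OLS with an intercept (a simplification: no fixed effects).
        A = np.column_stack([np.ones(train.sum()), X[train]])
        beta, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        B = np.column_stack([np.ones(test.sum()), X[test]])
        errors.append((y[test] - B @ beta) ** 2)
    return float(np.mean(np.concatenate(errors)))

def best_subset(y, X, years, candidates):
    """Exhaustive search over non-empty subsets of candidate columns."""
    best = (np.inf, ())
    for k in range(1, len(candidates) + 1):
        for cols in itertools.combinations(candidates, k):
            mse = recursive_mse(y, X[:, list(cols)], years)
            best = min(best, (mse, cols))
    return best  # (minimum MSE, selected column indices)
```

Under these assumptions, a "Controls only" figure would correspond to calling `best_subset` on the control columns alone, and a "Control variables + SVIs" figure to calling it on the control and SVI columns together.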

IV. Extensions

A. Jumps in the SVIs

We observe jumps (or positive outliers) in SVIs occasionally. These acute increases in the SVIs are associated with critical events, including natural disasters, major policy changes, and key developments in the business environment. We identify 178 jumps in the SVI for the “all” category (i.e., with no category specified) out of 804 observations in our sample for LIDCs, using a methodology in the finance literature (Lee and Mykland, 2008). The difference between the squared percent change and the consecutive absolute percent change (called bi-power variations) indicates a huge change in the SVI within a period (see Appendix I, Section E for details). The reason for not using each SVI by category for the jump detection is to focus on very acute increases in individuals’ attention that are significant enough to stand out in the SVI with no category specified, even though their causes would be category-specific.
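As a rough illustration of this kind of jump detection, the sketch below scales each percent change by a local volatility estimate built from products of consecutive absolute changes (bi-power variation), in the spirit of Lee and Mykland (2008). It is a simplified sketch rather than the procedure in Appendix I, Section E: the function name `detect_jumps`, the window length, and the fixed threshold are assumptions chosen for illustration, the scaling constants and extreme-value critical values of the original test are omitted, and the SVI values are assumed to be strictly positive.

```python
import numpy as np

def detect_jumps(svi, window=5, threshold=3.0):
    """Flag positive jumps in a single SVI series (illustrative only)."""
    svi = np.asarray(svi, dtype=float)
    r = svi[1:] / svi[:-1] - 1.0            # r[i] is the change into period i + 1
    flags = np.zeros(len(svi), dtype=bool)
    for i in range(window - 1, len(r)):
        past = np.abs(r[i - window + 1 : i])     # past changes, excluding r[i]
        bipower = np.mean(past[1:] * past[:-1])  # mean product of consecutive |changes|
        # Only large positive changes relative to local volatility are flagged.
        if bipower > 0 and r[i] / np.sqrt(bipower) > threshold:
            flags[i + 1] = True
    return flags
```

Because the volatility estimate uses products of neighboring changes, a single extreme observation has limited influence on it, which is what lets an isolated spike stand out against its own recent history.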

Excluding the periods when a jump occurred seems to sharpen estimation results. Because each jump may carry very different implications, we exclude the periods with jumps from the sample and re-estimate our models. The results show more statistical significance in many cases, while there is no significant change for inflation and the significance weakens for real exports and FDI inflows (Appendix Table 10). This implies that jumps in SVIs could indicate the periods when their relationships with economic variables become unstable or strongly nonlinear, and that excluding such periods either strengthens the true linear relationships or weakens the spurious significance.

B. Lagged effects

There could be time lags before people’s attention materializes as actual economic actions. Searches for background information would happen before travel or investment takes place. In this regard, SVIs could instead serve as a leading indicator.

In our specifications, lagged SVIs do not show significant correlation as clearly as contemporaneous SVIs do (Appendix Table 11). This is probably because our models are at the annual frequency and the one-year lag could be too long. An exception is the case of private capital flows where lagged SVIs work better. For real GDP, lagged SVIs seem to complement contemporaneous SVIs. More meaningful leading signals could possibly be found in the SVIs at a higher frequency such as monthly, although limited availability of other indicators at a higher frequency would pose a challenge for such an analysis.
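For readers who want to replicate the lag construction on an annual country panel, the following sketch adds a one-year lag of an SVI within each country. The column names `country`, `year`, and `svi` and the function name `add_lagged_svi` are hypothetical. Matching on calendar year, rather than shifting rows, is a design choice made here so that a missing year yields a missing lag instead of a value from two years earlier.

```python
import pandas as pd

def add_lagged_svi(panel, col="svi", lag=1):
    """Return a copy of the panel with a within-country lag of `col`."""
    shifted = panel[["country", "year", col]].copy()
    shifted["year"] += lag  # value observed in year t is the lag for year t + lag
    shifted = shifted.rename(columns={col: f"{col}_lag{lag}"})
    # Left join keeps every original row; unmatched rows get NaN for the lag.
    return panel.merge(shifted, on=["country", "year"], how="left")
```

The same function could be applied to monthly data by replacing `year` with a period index, although that is outside the annual setting used in this paper.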

C. Searches made domestically

We further examine SVIs based on searches made domestically. We construct an additional data set of SVIs by changing the location from “worldwide” to each country of interest (e.g., searches about Bangladesh made in Bangladesh). We refer to these SVIs as domestic SVIs. The domestic SVIs would capture individuals’ attention to a country from within that country. The domestic SVIs are more likely to be subject to the issue of low responses and the reporting cut-off, but they would potentially capture certain activities (especially those that happened locally) better than the worldwide SVIs.

Including domestic SVIs does not generally change the regression results, implying that the major source of information from worldwide SVIs is attention from foreign locations. Results are unchanged in most cases, except for capital flows, which now show weaker correlation (Appendix Table 12). The domestic business-industrial SVI is negatively associated with inflation, which may reflect the importance of inflation for local businesses.

D. Does it work for EMEs too?

We also investigate whether Google’s SVIs would be useful for macroeconomic analyses in EMEs. Our work can naturally extend to EMEs, many of which share common characteristics with LIDCs (see Appendix Table 1 for the list of the EMEs and Appendix Table 13 for summary statistics for EMEs).

For EMEs, the correlations between SVIs and macroeconomic variables are not as robust as those for LIDCs (Appendix Table 14). As discussed in Section III.C, the weaker correlations might imply a relatively weaker influence of external factors on EMEs than on LIDCs (due to larger domestic markets in EMEs), because SVIs may better capture external factors related to online searches from abroad. Another reason could be that investors’ online searches for information about EMEs may not be significant signals among the other key factors at work in the more mature and complex financial markets of EMEs compared with those of LIDCs. Similarly, adding SVIs does not improve the nowcasting accuracy for EMEs as much as it does for LIDCs (Table 7, Panel B; Appendix Table 9).

V. Conclusion

This paper presents an effort to use advanced technology to address the recurrent issue of lack of information in policy-making and analysis for developing economies. While progress has been made in timely provision of official data, nontraditional data obtained through recent technology have enormous potential to fill information gaps in developing economies. We investigate how much information we could obtain from Internet search frequencies to strengthen the capacity to monitor and assess current economic developments.

Our findings help us better utilize new sources of information such as Google Trends’ data in economic analyses. Useful information contained in Google’s SVI is demonstrated by the improved in-sample and out-of-sample performance of a simple forecasting model, conditional on lagged macroeconomic variables. The contrasting results between LIDCs and EMEs regarding the comparison of SVIs and another new source of information, nighttime lights, not only demonstrate the stronger case for the use of SVIs for LIDCs but also suggest the need to further investigate any structural differences between these country groups. The estimated regression models indicate whether positive or negative effects are to be expected for each SVI and provide quantitative implications from the changes in SVIs. The results also indicate that jumps or outliers in SVIs may need to be treated separately because the estimated linear relationships are likely to break down on these occasions. Monitoring SVIs can complement the use of judgement required in making forecasts, particularly for low-income countries where statistical models are generally less reliable than in advanced economies due to data availability (Independent Evaluation Office, 2014, paragraph 34, p. 13).

There are several lessons learned about the use of Google Trends’ data in economic analyses. First, monitoring SVIs under several categories is recommended to separate positive and negative signals. Second, interpreting jumps in SVIs warrants caution as they may indicate a departure from the normal relationships. What causes a jump can be identified by typing the country name and the period when a jump occurred into an online search engine. Lastly, a more granular analysis using specific search terms would be attractive but challenging. This is not only because such an analysis would depend heavily on the choice of terms (e.g., see a discussion by Smith, 2016, cited by Harchaoui and Janssen, 2018), but also because using more than one search term often leads to very low frequencies that sometimes fall below the cut-off threshold, resulting in a zero response. For this reason, we use Google’s Knowledge Graph service to identify a topic rather than a term and keep our topic as broad as a country, while achieving granularity by using SVIs under various categories. These practical solutions, however, rely on nontransparent methodologies and could undermine the credibility of the analyses.

There is still more to be explored to fully realize the potential benefits of using Google’s SVIs. Our results at the annual frequency make the case for more practical analyses on the use of Google’s SVIs in constructing high frequency indicators of economic activities, as SVIs are available monthly (or even weekly for the past 5 years, via the web service). In practice, nowcasting models may need to be tailored to the country of application for more accuracy. Taking care of jumps in SVIs would be more important in such analyses, as these jumps can be noise or may serve as forewarning for a surge or a decline in the economy. Lastly, more flexible methodologies to analyze data, such as machine learning techniques discussed by Varian (2014) and Mullainathan and Spiess (2017), could help extract more useful information from the SVIs.

The use of SVIs to cross-check the validity of official statistics would be interesting, but we need to be cautious. As is the case for nighttime lights (Henderson, Storeygard, and Weil, 2012), the SVIs may possibly be used to cross-check the validity of official statistics, particularly in the context of a large share of the informal economy in LIDCs and other developing economies. If official statistics (e.g., real GDP) appeared much lower than the levels implied by observed SVIs, then it might indicate that a sizable portion of economic activities is not captured by official statistics. This is the same logic as that behind the sociological literature on measuring issue salience (e.g., Stephens-Davidowitz, 2017). We need to be cautious, however, because a deviation between SVIs and official statistics would not necessarily be proof of inaccuracy in the official statistics. Other possible explanations include noise in the SVIs themselves, unannounced changes in the measurement of SVIs, and structural breaks in the relationships between SVIs and economic activities. Reis, Ferreira, and Perduca (2014, section 6) list the challenges in using Google’s SVIs in compiling official statistics, including transparency, auditability, consistency in measurement over time, and continuity of the Google Trends service in the future.

Further research is also needed for a more systematic use of Google’s SVIs in policy decision making. Although the frequency of online search per se should be as objective as transaction data (unlike qualitative indicators based on subjective judgements), it is still influenced by uncertainties stemming from the natural language processing algorithm used to compile category-specific SVIs (whose details are not disclosed to the public) and from Google’s Knowledge Graph service, which may not perfectly distinguish topics very close to each other (e.g., the Republic of the Congo versus the Democratic Republic of the Congo). The objectivity of Google’s SVIs can be examined by comparing them with survey data (Vosen and Schmidt, 2011). In addition, while we provide certain conditions in Appendix I, Section D under which the SVI could represent people’s attention without bias, the SVI may send a biased signal if these conditions do not hold. Lastly, as Campbell’s law (Campbell, 1979) suggests, a predominant use of Google’s SVIs in policy decision making could provide undesirable incentives to manipulate frequencies of particular search terms (manually or automatically using Internet bots), distorting the useful relationships between SVIs and macroeconomic data. Addressing these concerns and caveats is left for future research.
