Abstract

The widespread availability of internet search data is a new source of high-frequency information that can potentially improve the precision of macroeconomic forecasting, especially in areas with data constraints. This paper investigates whether travel-related online search queries enhance accuracy in the forecasting of tourist arrivals to The Bahamas from the U.S. The results indicate that the forecast model incorporating internet search data provides additional information about tourist flows over a univariate approach using the traditional autoregressive integrated moving average (ARIMA) model and multivariate models with macroeconomic indicators. The Google Trends-augmented model improves predictability of tourist arrivals by about 30 percent compared to the benchmark ARIMA model and more than 20 percent compared to the model extended only with income and relative prices.

I. Introduction

Tourism is the main engine of economic growth across the Caribbean, creating millions of jobs and generating billions of dollars in exports. In the case of The Bahamas, the economy is extremely dependent on fast-growing tourism, which contributes, directly and indirectly, about 48 percent of GDP and 56 percent of employment. According to the World Travel and Tourism Council, the share of tourism is projected to reach 60 percent of GDP and 70 percent of the workforce by 2030. Tourist arrivals to The Bahamas have increased by 9.8 percent over the past decade, reaching over 6.5 million visitors (including cruise passengers) in 2018, with about 80 percent coming from the United States (U.S.). While the tourism sector makes a significant contribution to the economy, it is also a major source of volatility due to greater exposure to external factors. Therefore, improving the prediction of tourist arrivals is important for forecasting overall economic growth as well as for effective planning and budgeting by the government and the private sector.

This study conducts a comprehensive investigation of the predictive ability of Google Trends data for tourist arrivals to The Bahamas. With the spread of the Internet throughout the world, the data collected by search engines like Google allows researchers to measure the intended behavior of consumers at the individual level and take that into account in forecasting at the macroeconomic level. Furthermore, the availability of internet search data provides new high-frequency information that can potentially improve forecast accuracy. Accordingly, this paper develops an econometric model of tourist arrivals to The Bahamas during the period 2004–2018 using online search data and assesses its forecasting accuracy against the standard autoregressive integrated moving average (ARIMA) model. This approach follows seminal papers on the use of online search queries as an alternative source of high-frequency information to improve forecasting (Choi and Varian, 2009; Della Penna and Huang, 2009; Choi and Varian, 2012; Scott and Varian, 2015). While the jury is still out on the gain that economists can get from using the Google Trends data, there is a growing body of research showing that “big data” can be extremely useful when information is fragmented or missing altogether, as in the case of many developing and low-income countries (Carrière-Swallow and Labbé, 2013; Narita and Yin, 2018). Therefore, this paper’s contribution to the literature is to expand the traditional time series model in forecasting tourism demand by incorporating internet search data and assessing the predictive power of alternative models.

The results indicate that internet search activity-augmented forecast models perform significantly better than the traditional time-series models. A high degree of persistence is found in tourist arrivals to The Bahamas from the U.S., as indicated by the statistically significant autoregressive lags. Extending the model with exogenous variables reveals that both the income and exchange rate elasticities of tourist flows have the expected positive and negative signs, respectively, but the coefficients are statistically insignificant at the conventional levels. In other words, while the number of visitors from the U.S. to The Bahamas increases with the average level of personal income in the U.S. and decreases with higher relative prices in The Bahamas, these effects are not statistically significant. Augmenting the forecast model with the Google Trends data, however, improves the overall fit of the model with a positive and statistically significant coefficient. This means that online search queries in the U.S. about traveling to The Bahamas provide additional information that helps outperform the traditional time-series models in predicting tourist arrivals to The Bahamas from the U.S. Furthermore, the quality of alternative forecast models is checked through three different measures of predictive accuracy: the mean absolute error (MAE), the root mean squared error (RMSE), and the Theil Inequality Coefficient (U-Theil). The results confirm that the model augmented with the Google Trends data improves predictability of tourist arrivals by about 30 percent compared to the benchmark ARIMA model and more than 20 percent compared to the model extended only with personal income and relative prices.

The remainder of this paper is organized as follows. Section II provides a summary of the related literature. Section III describes the compilation of data used in the analysis. Section IV presents the empirical methodology and specifications employed to predict tourist flows and discusses the results. Section V offers concluding remarks.

II. Related Literature

This paper connects two different strands of the literature. The first thread relates to the determinants of international tourism flows, and the second is on utilizing Google search data to improve a variety of economic and financial forecasts. Tourism demand for a destination is influenced by a variety of factors, including cultural, economic and social. Uysal (1998), Turner and Wit (2001) and Goh and Law (2003) argue that both quantitative and qualitative factors, such as price, income, cultural and historical heritage, and advertising, influence tourism activity. Zhang, Song, and Huang (2009) and Culiuc (2014) show that developed countries, with a larger share of global tourism flows, tend to have higher elasticity with respect to income and real exchange rates, while tourist flows to small island countries are less sensitive to macroeconomic developments. Focusing on tourism flows to the Caribbean, Wolfe and Romeu (2011) and Laframboise and others (2014) find that tourist arrivals and expenditures are sensitive to income in source markets and price factors, although not in high-end destinations like The Bahamas. Acevedo and others (2016), on the other hand, concentrate on the role of airlift supply on tourism and show that the number of flights from the U.S. is a principal factor influencing tourist flows to the Caribbean.

There is a growing literature on various applications of online search data in a wide spectrum of areas. Providing an overview of the literature using the Google Trends data, Jun, Yoo, and Choi (2018) show that research using data on internet search activity has increased significantly over the past decade, covering a wide range of areas from the spread of influenza to house and motor vehicle sales. Internet search data, such as the Google Trends database, provides valuable knowledge about the information seeking behavior of consumers and businesses on a time-series basis for a given search term’s relative popularity in a geographic region (Shim and others, 2001). Choi and Varian (2009; 2012) and Askitas and Zimmermann (2009) illustrated the usefulness of internet search queries compiled by Google in forecasting unemployment trends and housing market developments. The internet search process, as captured by Google Trends, reveals information regarding the intention of consumers and investors at the macro level. For example, Vosen and Schmidt (2011) show that internet search data helps better predict private consumption. This finding is supported at a disaggregated level as well for the forecast accuracy of motor vehicle purchases (Carrière-Swallow and Labbé, 2013) and movie admissions (Hand and Guy, 2015). Similarly, Chen and others (2015) predict economic recessions with internet searches related to “recession” and “layoff” in the U.S., while Suhoy (2009) finds that the Google Trends data helps better predict the 2008 recession in Israel. Da, Engelberg, and Gao (2011) show that online search data is a robust indicator of investor appetite and thereby stock market movements, while Fantazzini and Fomichev (2014) forecast the price of crude oil using Google Trends data along with macroeconomic variables.

Alternative sources of information including Google Trends are found to be useful in forecasting international tourist flows. The ARIMA model is the most widely used time-series forecasting approach in the tourism sector, even though studies show that it does not always perform well compared to alternative techniques (Goh and Law, 2002; Lorde and Moore, 2008; Song and Li, 2008; Jackman and Greenidge, 2010; Hadavandi and others, 2011; Claveria, Monte, and Torra, 2016). With the greater use of the Internet across the world, online search data has become a valuable source of information on tourism trends. For example, Pan, Wu, and Song (2012) and Siliverstovs and Wochner (2018) show that including information about aggregated search trends improves the accuracy of forecasting demand for hotel rooms in California and Switzerland, respectively. Jackman and Naitram (2015) find that tourist arrivals to Barbados from Canada and the United Kingdom (U.K.) could be better predicted by using internet search queries from these countries, while Bangwayo-Skeete and Skeete (2015) conclude that Google Trends data help forecast tourism flows from Canada, the U.S. and the U.K. to five Caribbean destinations. Similarly, Rivera (2016) and Yang and others (2015) find a significant reduction in forecasting errors with internet search queries in the case of tourism demand in Puerto Rico and China, respectively.

III. Data Overview

This study utilizes a monthly dataset covering tourist arrivals to The Bahamas and related online search queries over the period from January 2004 to December 2018. Monthly data on tourist arrivals to The Bahamas from the U.S. are sourced from the Ministry of Tourism and cover stop-over tourists arriving by air, who amounted to 1.63 million (or about 25 percent of the total) in 2018. Data on real personal income in the U.S. are obtained from Haver Analytics, and the real effective exchange rate (REER) index is drawn from the IMF’s World Economic Outlook database.

Data on internet search activity are extracted from Google Trends, which provides country-specific and high-frequency indicators. Google is the largest internet search engine in the world, accounting for over 90 percent of search activity. The Google Trends data, available since January 2004, aggregate individual search queries on Google according to terms, time, category and location based on the Internet Protocol (IP) address from which the search is conducted. Google uses a sampling procedure that introduces measurement error into the series. Data downloads for the same search query on different days are based on different samples and, consequently, will lead to marginally different series. To minimize this measurement error, following the approach used by Carrière-Swallow and Labbé (2013), the Google Trends data are collected each day over a 30-day period, and the average of each observation is calculated over this period. These monthly series are normalized by the total number of Google search queries in a given country and range between a minimum value of 0 and a maximum value of 100. Stephens-Davidowitz and Varian (2014) provide further details on the construction of the Google Trends data, which are available at www.google.com/trends. Out of a selection of online search keywords related to travel to The Bahamas originating from the U.S., it is concluded that “The Bahamas travel” is the most relevant online search term for the empirical analysis of tourist arrivals to The Bahamas.
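The averaging of repeated downloads described above can be sketched as follows. This is a minimal illustration, assuming each daily download is a list of monthly index values covering the same months; the function name `average_trends_samples` and the sample values are hypothetical and do not come from the paper's data.

```python
from statistics import mean


def average_trends_samples(samples):
    """Average repeated downloads of the same Google Trends query.

    `samples` is a list of downloads; each download is a list of monthly
    index values (0-100) covering the same months in the same order.
    """
    if not samples:
        raise ValueError("at least one download is required")
    n_months = len(samples[0])
    if any(len(s) != n_months for s in samples):
        raise ValueError("all downloads must cover the same months")
    # zip(*samples) groups the values for each month across downloads.
    return [mean(month_values) for month_values in zip(*samples)]


# Hypothetical values: three downloads of a four-month series.
downloads = [
    [40, 55, 100, 70],
    [42, 53, 100, 68],
    [41, 57, 100, 72],
]
averaged = average_trends_samples(downloads)
print(averaged)
```

In practice there would be 30 downloads of 180 monthly observations each, and the averaged series would then be passed on to seasonal adjustment.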

Figure 1.

Internet Searches and Tourist Arrivals

Citation: IMF Working Papers 2020, 022; 10.5089/9781513526348.001.A001

Source: Google Trends; Ministry of Tourism.

Descriptive statistics for the variables are presented in Table 1. The monthly data on tourist arrivals to The Bahamas and internet search activity related to traveling to The Bahamas exhibit a significant degree of seasonality over the course of a year. Therefore, all series used in the empirical analysis are seasonally adjusted by applying the X-13ARIMA-SEATS seasonal adjustment procedure developed by the U.S. Census Bureau (2013). To avoid spurious results, it is essential to analyze the time-series properties of the data by conducting unit root tests. The stationarity of the time series is checked by applying the Augmented Dickey-Fuller (ADF) and Phillips-Perron (PP) tests, and the results, presented in Appendix Table 1, indicate that the variables used in the analysis are stationary after logarithmic transformation.
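The logic of the unit root tests mentioned above can be illustrated with the simplest Dickey-Fuller regression, in which the first difference of a series is regressed on a constant and its own lagged level. The sketch below is a simplification and not the paper's procedure: it omits the lagged differences that make the test "augmented", it does not implement the Phillips-Perron correction, and it uses a simulated series rather than the actual data. The function name `dickey_fuller_t` is hypothetical.

```python
import math
import random


def dickey_fuller_t(series):
    """t-statistic on rho in the regression: diff(y)_t = a + rho * y_{t-1} + e_t."""
    x = series[:-1]                                        # lagged levels y_{t-1}
    d = [b - a for a, b in zip(series[:-1], series[1:])]   # first differences
    n = len(d)
    x_bar = sum(x) / n
    d_bar = sum(d) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (di - d_bar) for xi, di in zip(x, d))
    rho = sxy / sxx
    a = d_bar - rho * x_bar
    ssr = sum((di - a - rho * xi) ** 2 for xi, di in zip(x, d))
    se_rho = math.sqrt(ssr / (n - 2) / sxx)                # OLS standard error of rho
    return rho / se_rho


# Simulated stationary AR(1) series with 180 points, matching the number
# of months in a January 2004 to December 2018 sample.
random.seed(0)
y = [0.0]
for _ in range(179):
    y.append(0.5 * y[-1] + random.gauss(0.0, 1.0))

t_stat = dickey_fuller_t(y)
# The asymptotic 5 percent Dickey-Fuller critical value for the
# constant-only case is about -2.86; a statistic below it rejects a unit root.
print(t_stat, t_stat < -2.86)
```

The statistic must be compared with Dickey-Fuller critical values rather than the usual t-distribution, because under the null hypothesis of a unit root the estimator has a non-standard distribution.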

Table 1.

Summary Statistics

Source: Google Trends; Haver Analytics; IMF; Ministry of Tourism; author’s calculations.

IV. Empirical Methodology and Results

This paper compares the forecasting accuracy of an online search activity-augmented model against the traditional time-series models. The ARIMA approach in time-series analysis, developed by Box and Jenkins (1976), identifies an adequate representation of the stochastic process from which the sample is derived. As a benchmark, a univariate ARIMA model is estimated to forecast tourist flows using only past tourist arrivals in the following form:

$$\log(A_{t,c}) = \alpha + \sum_{i=1}^{12} \beta_i \log(A_{t-i,c}) + \epsilon_t \tag{1}$$

in which $A_{t,c}$ denotes the seasonally adjusted number of tourist arrivals to The Bahamas at time $t$ from country $c$ (the U.S.) and $\epsilon_t$ is the error term. The ARIMA model combines two regression processes: (1) an autoregressive (AR) process that assumes that the dependent variable is a function of its own past values; and (2) a moving average (MA) process that allows the inclusion of persistent random shocks. A range of ARMA(p, q) models of order up to p = 4 and q = 4 is estimated to capture the best possible time-series characterization, and the ARMA(1,1) is chosen as the benchmark specification according to the Akaike information criterion (AIC). In the next step, the model is extended to incorporate exogenous variables:

$$\log(A_{t,c}) = \alpha + \sum_{i=1}^{12} \beta_i \log(A_{t-i,c}) + \sum_{i=1}^{12} \delta_i \log(X_{t-i,c}) + \epsilon_t \tag{2}$$

where $X_{t,c}$ includes monthly data on personal income in the U.S. and The Bahamas’ REER. This model helps ensure that internet search volumes do not merely repeat information already reflected in macroeconomic factors. Finally, the Google Trends data are added to estimate the following internet search activity-augmented model:

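$$\log(A_{t,c}) = \alpha + \sum_{i=1}^{12} \beta_i \log(A_{t-i,c}) + \sum_{i=1}^{12} \delta_i \log(X_{t-i,c}) + \sum_{i=1}^{12} \gamma_i \log(G_{t-i,c}) + \epsilon_t \tag{3}$$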

in which $G_{t,c}$ represents the Google Trends data related to online search activity concerning travel to The Bahamas conducted at time $t$ in the U.S. The correlation coefficients between tourist arrivals and explanatory variables at different lag periods indicate that the best fit is contemporaneous for personal income and the REER and at a lag order of 2 months for the Google Trends data.
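The lag choice described above, based on correlation coefficients at different lag periods, can be sketched as a simple search over candidate lags. This is an illustration under stated assumptions: the helper names `pearson` and `best_lag` are hypothetical, the series are invented so that arrivals echo searches exactly two months later, and the paper does not report the exact procedure it used.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def best_lag(arrivals, predictor, max_lag):
    """Return (lag, correlation) maximizing corr(arrivals_t, predictor_{t-lag})."""
    scores = {}
    for lag in range(max_lag + 1):
        # Pair arrivals at time t with the predictor observed `lag` months earlier.
        a = arrivals[lag:]
        g = predictor[:len(predictor) - lag]
        scores[lag] = pearson(a, g)
    lag = max(scores, key=scores.get)
    return lag, scores[lag]


# Hypothetical series in which arrivals follow searches with a two-month delay.
searches = [50, 62, 48, 71, 55, 80, 60, 45, 66, 74, 52, 69]
arrivals = [0, 0] + [10 * s for s in searches[:-2]]
lag, corr = best_lag(arrivals, searches, max_lag=4)
print(lag, round(corr, 3))
```

A lead of search activity over arrivals is intuitive, since travelers typically look for information some time before the trip takes place.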

The analysis indicates that the online search activity-augmented model outperforms the alternatives in forecasting tourist arrivals to The Bahamas. The empirical results, presented in Table 2, show that the autoregressive lags are statistically significant, indicating a high degree of persistence in tourist arrivals to The Bahamas from the U.S. When personal income is added as an exogenous variable to the standard model, the estimated coefficient has the expected positive sign, but it is found to be statistically insignificant at the conventional levels. This is broadly consistent with previous studies showing that high-end tourism destinations like The Bahamas do not tend to be significantly sensitive to income in source markets. Similarly, the REER has the expected negative coefficient, but no statistical significance. This implies that the negative effect of higher relative prices in The Bahamas on tourist arrivals from the U.S. is not statistically significant, which may reflect the fact that the REER remained in a narrow range during the sample period and that most visitors from the U.S. may have relatively high standards of living. Finally, the results show that augmenting the model with the Google Trends data improves the overall fit of the model with a positive and statistically significant coefficient. The in-sample estimation performance means that online search queries in the U.S. about traveling to The Bahamas provide better information on the number of visitors to The Bahamas from the U.S.

Table 2.

Tourist Arrivals to The Bahamas—Estimation Results

Note: Standard errors are reported in brackets. A constant is included in each regression, but not shown in the table. *, **, and *** denote significance at the 10%, 5%, and 1% levels, respectively.

The quality of alternative forecast models is evaluated through three different measures of out-of-sample predictive accuracy. To evaluate forecast performance of these alternative time-series models, the MAE, the RMSE, and the U-Theil—the most commonly used metrics in the literature—are employed as defined by the following equations:

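$$\text{MAE} = \frac{1}{n} \sum_{t=1}^{n} \left| \hat{A}_{t,c} - A_{t,c} \right| \tag{5}$$
$$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} \left( \hat{A}_{t,c} - A_{t,c} \right)^2} \tag{6}$$
$$\text{U-Theil} = \frac{\sqrt{\frac{1}{n} \sum_{t=1}^{n} \left( \hat{A}_{t,c} - A_{t,c} \right)^2}}{\sqrt{\frac{1}{n} \sum_{t=1}^{n} \left( \hat{A}_{t,c} \right)^2} + \sqrt{\frac{1}{n} \sum_{t=1}^{n} \left( A_{t,c} \right)^2}} \tag{7}$$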

in which $\hat{A}_{t,c}$ and $A_{t,c}$ are the predicted and actual numbers of tourist arrivals to The Bahamas at time $t$ from the U.S., respectively, and $n$ is the number of observations in the sample. The model with the lowest MAE, RMSE, and U-Theil values is considered to have the best forecast accuracy.
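The three accuracy measures defined above are straightforward to compute. The sketch below follows the usual form of the Theil inequality coefficient, which is bounded between 0 (perfect forecast) and 1; the function names and the sample numbers are hypothetical and are not taken from the paper's tables.

```python
import math


def mae(pred, actual):
    """Mean absolute error."""
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(actual)


def rmse(pred, actual):
    """Root mean squared error."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))


def u_theil(pred, actual):
    """Theil inequality coefficient: RMSE scaled by the root mean squares
    of the predicted and actual series."""
    n = len(actual)
    rms_pred = math.sqrt(sum(p ** 2 for p in pred) / n)
    rms_actual = math.sqrt(sum(a ** 2 for a in actual) / n)
    return rmse(pred, actual) / (rms_pred + rms_actual)


# Hypothetical monthly arrivals (in thousands) and forecasts.
actual = [100.0, 120.0, 140.0, 130.0]
pred = [110.0, 115.0, 150.0, 125.0]
print(mae(pred, actual), rmse(pred, actual), u_theil(pred, actual))
```

Because the MAE and RMSE are expressed in the units of the series while the U-Theil is scale-free, reporting all three guards against a ranking that depends on a single metric.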

The forecast model augmented with the Google Trends data improves predictive accuracy by about 30 percent compared to the benchmark ARIMA model. All models are estimated on data covering the period from January 2004 to December 2017 and then tested in forecasting the final 12 months of data (covering 2018). The results, presented in Table 3, indicate that the inclusion of exogenous variables (personal income and the REER) leads to a significant decline in the MAE, the RMSE and the U-Theil, which means greater precision in forecasting. With the extended model including internet search queries, there is even greater improvement in forecast accuracy compared to models without the Google Trends data. The model augmented with online search data (Model 4) improves forecasting performance by about 30 percent compared to the benchmark ARIMA model (Model 1) and more than 20 percent compared to the model extended only with macroeconomic variables (Model 3).
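The out-of-sample design and the percentage comparisons described above can be sketched as follows. This is an illustration only: the helper names `split_holdout` and `pct_improvement` are hypothetical, and the error values are invented to show the arithmetic rather than taken from Table 3.

```python
def split_holdout(series, horizon=12):
    """Split a series into an estimation sample and the final `horizon` points."""
    if horizon <= 0 or horizon >= len(series):
        raise ValueError("horizon must be positive and shorter than the series")
    return series[:-horizon], series[-horizon:]


def pct_improvement(benchmark_error, model_error):
    """Percent reduction in a forecast error metric relative to a benchmark."""
    return 100.0 * (benchmark_error - model_error) / benchmark_error


# 180 monthly observations, January 2004 to December 2018 (placeholder values).
months = list(range(180))
train, test = split_holdout(months, horizon=12)
print(len(train), len(test))

# Hypothetical error values, not the paper's results.
print(pct_improvement(benchmark_error=10.0, model_error=7.0))
```

Holding back the final 12 months ensures that the accuracy measures reflect genuine forecasting performance rather than in-sample fit.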

Table 3.

Tourist Arrivals to The Bahamas—Evaluation of Forecast Models

Note: Each model is estimated on data covering the period January 2004–December 2017 and then tested in forecasting the final 12 months of data (covering 2018). The model with the lowest MAE, RMSE, and U-Theil values is considered to have the best forecast accuracy and is shown in bold.

These findings are in line with previous studies, but an additional robustness check is performed by estimating a vector autoregressive (VAR) model. The lag length for the VAR model is selected by the AIC as p = 4. The baseline model (1) refers to a univariate specification based only on historical tourist arrivals and thereby provides a benchmark to evaluate the forecast performance of multivariate models incorporating exogenous variables (personal income, the REER, and Google Trends data). The VAR models’ overall goodness-of-fit and forecasting accuracy are compared using the adjusted $R^2$ and the RMSE and Theil measures, respectively. These results, reported in Table 4, indicate that the inclusion of personal income and the REER leads to a small increase in the adjusted $R^2$ and a slight decline in the RMSE and Theil metrics, which implies a limited improvement in forecasting accuracy compared to the baseline model. The Google Trends-augmented model, however, shows greater forecast superiority, boosting the adjusted $R^2$ by more than 16 percent and lowering the RMSE and Theil measures by 10 percent.
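The AIC-based lag selection mentioned above, which is also used to choose the benchmark specification earlier in this section, can be illustrated with a simplified univariate case. The sketch below is a stand-in and not the paper's procedure: it fits pure AR(p) models by ordinary least squares instead of a VAR or an ARMA(p, q) estimated by maximum likelihood, it uses simulated data, and the helper names `solve` and `ar_aic` are hypothetical. The AIC is computed in its Gaussian form up to an additive constant.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ar_aic(y, p, max_p):
    """Fit AR(p) with a constant by OLS and return n*log(SSR/n) + 2k.

    The first max_p observations are dropped for every p so that all
    candidate orders are compared on the same estimation sample.
    """
    rows = [[1.0] + [y[t - i] for i in range(1, p + 1)] for t in range(max_p, len(y))]
    target = y[max_p:]
    k = p + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, target)) for i in range(k)]
    beta = solve(xtx, xty)
    ssr = sum((v - sum(b * x for b, x in zip(beta, r))) ** 2 for r, v in zip(rows, target))
    n = len(target)
    return n * math.log(ssr / n) + 2 * k


# Simulated AR(2) process, so that one lag is clearly too few.
random.seed(1)
y = [0.0, 0.0]
for _ in range(300):
    y.append(0.6 * y[-1] - 0.3 * y[-2] + random.gauss(0.0, 1.0))

max_p = 4
aic_by_order = {p: ar_aic(y, p, max_p) for p in range(1, max_p + 1)}
best_p = min(aic_by_order, key=aic_by_order.get)
print(aic_by_order, best_p)
```

The criterion trades off fit against the number of estimated parameters, so a higher order is preferred only when the reduction in the residual sum of squares outweighs the penalty term.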

Table 4.

Tourist Arrivals to The Bahamas—Evaluation of VAR Models

Note: The model with the highest adjusted $R^2$ and the lowest RMSE and Theil values is considered to have the best forecast accuracy and is shown in bold.

V. Conclusion

This paper investigates the use of online search data to forecast tourist arrivals to The Bahamas from the U.S. during the period 2004–2018. As tourism is the main engine of economic growth in The Bahamas, accurate forecasting of tourist arrivals is critical for informed decision-making by policymakers and businesses. The main objective of this paper is therefore to evaluate alternative forecast models and the usefulness of the Google Trends data in predicting monthly tourist arrivals to The Bahamas from the U.S., which accounts for about 80 percent of the total number of visitors. The results presented in this paper indicate that the forecast model incorporating online search data outperforms the standard autoregressive models and those including other exogenous variables (such as personal income and the REER) in forecasting tourist flows. Furthermore, checking the quality of alternative forecast models through three different measures of predictive accuracy, the model augmented with the Google Trends data is found to bring a significant improvement in forecast performance. These findings are broadly consistent with previous studies and indicate that internet search data can help in real-time surveillance and provide more reliable forecasts of tourism activity. Hence, policymakers and private firms operating in the tourism industry would benefit from internet search data-augmented forecast models in better planning and investment.

Where Should We Go? Internet Searches and Tourist Arrivals
Author: Mr. Serhan Cevik