Identifying Optimal Indicators and Lag Terms for Nowcasting Models
Author:
Jing Xie


Abstract

Many central banks and government agencies use nowcasting techniques to obtain policy-relevant information about the business cycle. Existing nowcasting methods, however, have two critical shortcomings for this purpose. First, in contrast to machine-learning models, they provide little if any guidance on selecting the best explanatory variables (both high- and low-frequency indicators) from the typically larger set of variables available to the nowcaster. Second, the orders of the autoregressive and moving average terms used in the baseline nowcasting regression are often set arbitrarily. This paper proposes a simple procedure that simultaneously selects the optimal indicators and ARIMA(p,q) terms for the baseline nowcasting regression. The proposed AS-ARIMAX (Adjusted Stepwise Autoregressive Moving Average with exogenous variables) approach significantly reduces the out-of-sample root mean square error of real GDP nowcasts for six countries: India, Argentina, Australia, South Africa, the United Kingdom, and the United States.

Section 1. Introduction

When major economic shocks such as the COVID-19 pandemic or the global financial crisis occur, governments typically use counter-cyclical policies to soften the negative shock to real gross domestic product (GDP). Such evidence-driven counter-cyclical policy requires timely information on the state of the economy relative to trend. Unfortunately, the required data are often unavailable because of (a) the so-called “ragged-edge” problem arising from publication lags or, more generally, missing data points, especially in the case of real GDP (Wallis, 1986); and (b) the mixed, incompatible frequencies at which key economic indicators are published (Armesto, Engemann, & Owyang, 2010).

Many central banks and government agencies use nowcasting techniques (e.g., bridge equations, Mixed-Data Sampling (MIDAS), and dynamic factor models) to address these issues. Examples include the European Central Bank (Bańbura et al., 2013), the Central Bank of Malta (Ellul & Ruisi, 2022), and the Federal Reserve Bank of Atlanta (Higgins, 2014). Nowcasting, the art of forecasting “the here and now,” enables real-time forecasts of lower-frequency variables (such as real GDP and inflation) using more timely indicators observed at the same or higher frequencies. Standard nowcasting models typically involve two steps: (a) forecasting the high-frequency indicators in the preferred baseline nowcasting regression to fill the ragged edge; and (b) converting the high-frequency indicators to the target frequency of the baseline regression. The way this conversion is carried out distinguishes one nowcasting procedure from another (see Section 5).

A critical shortcoming of existing nowcasting methods, however, is that they provide little guidance on selecting the right-hand-side variables (typically exogenous) to include in the baseline regression. Moreover, the appropriate orders of the autoregressive (AR) and moving average (MA) terms in the baseline regression are rarely discussed and are often set arbitrarily. Indeed, to the best of our knowledge, few nowcasting exercises use an ARIMA model with exogenous variables (i.e., an ARIMAX model). Interestingly, medical researchers have for some time successfully used ARIMAX models to nowcast influenza outbreaks with Google Flu Trends as an exogenous variable, reporting significant reductions in mean absolute error (MAE) relative to a standard baseline model that uses only previous flu levels as explanatory variables (Preis & Moat, 2014).

This paper investigates the effectiveness of ARIMAX models for nowcasting key economic variables such as real GDP. We propose a simple procedure for selecting, from a larger set of economic variables, indicators that are economically meaningful (in the sense that their estimated coefficients are consistent with economic priors), statistically significant, and effective in improving the accuracy of the nowcast.

Using India’s real GDP as an example, we show that a simple variable selection procedure that allows for ARIMA(p,q) terms in addition to optimally selected explanatory variables significantly improves the nowcasting performance of the Bridge and U-MIDAS estimators relative to benchmark models formulated without the proposed procedure.

The remainder of the paper is organized as follows. Section 2 reviews the automatic ARIMA estimation procedure available in EViews. Section 3 proposes an “Adjusted Stepwise ARIMAX Variable Selection Procedure” (henceforth AS-ARIMAX) to identify “optimal” ARIMA orders and exogenous variables for nowcasting. The approach is implemented using EViews’ automatic ARIMA selection procedure and customized code. Sections 4 and 5 define the three benchmarks and two nowcasting models used in the empirical study. Sections 6 and 7 apply the AS-ARIMAX method to India’s real GDP, yielding significant forecasting gains relative to the benchmark models. Section 8 applies the AS-ARIMAX method to nowcast the real GDP of five additional countries, further demonstrating the effectiveness and applicability of the approach. Section 9 concludes.

Section 2. Automatic ARIMA Selection Procedure

Although EViews provides comprehensive tools for determining the orders of an ARIMA model using traditional (non-automated) Box-Jenkins methods, that procedure can be time-consuming and carries a significant risk of misidentification because it is difficult to match the data’s correlogram to a specific ARIMA model. To improve efficiency and model identification, EViews also offers an automatic ARIMA selection procedure that determines the appropriate ARIMA specification. The procedure involves the following steps (EViews User’s Guide I, pp. 538–540):

Step 1. Selecting appropriate transformations of the dependent variable

EViews runs the following two regressions to determine the appropriate transformation method:

$$(\Delta y_t)^2 = \alpha_1 + \beta_1 y_t \qquad (1)$$
$$(\Delta \log y_t)^2 = \alpha_2 + \beta_2 \log y_t \qquad (2)$$

Each of these regressions is a simple test for heteroskedasticity: a smaller absolute t-statistic on β indicates that the size of the squared changes depends less on the level of the series, that is, more homoskedasticity. EViews uses a log transformation if the absolute t-statistic on β2 is smaller than that on β1. The natural log transformation is suitable for series that grow at exponential rates, which typically suffer from heteroskedasticity because the size of their changes grows with their level. Because the log transform linearizes such growth, an absolute t-statistic on β2 that is smaller than that on β1 suggests that regression (2) exhibits relatively more homoskedasticity, so the log transformation is more appropriate.
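A minimal sketch of this decision rule in Python with NumPy follows. It assumes conventional OLS standard errors and uses hypothetical function names (`slope_abs_t`, `prefer_log`); it illustrates the comparison between regressions (1) and (2), not EViews' internal code, and it requires a strictly positive series.

```python
import numpy as np


def slope_abs_t(dep, reg):
    """OLS of dep on [1, reg]; return the absolute t-statistic of the slope."""
    X = np.column_stack([np.ones_like(reg), reg])
    beta, *_ = np.linalg.lstsq(X, dep, rcond=None)
    resid = dep - X @ beta
    sigma2 = resid @ resid / (len(dep) - X.shape[1])  # classical error variance
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return abs(beta[1] / np.sqrt(cov[1, 1]))


def prefer_log(y):
    """Compare regressions (1) and (2); True means the log transform is chosen."""
    y = np.asarray(y, dtype=float)
    t_level = slope_abs_t(np.diff(y) ** 2, y[1:])                # regression (1)
    t_log = slope_abs_t(np.diff(np.log(y)) ** 2, np.log(y[1:]))  # regression (2)
    return t_log < t_level


# Illustrative series with exponential growth and multiplicative noise
rng = np.random.default_rng(0)
t = np.arange(200)
y_exp = 100 * np.exp(0.02 * t + 0.01 * rng.standard_normal(200))
print(prefer_log(y_exp))
```

For an exponentially growing series, the squared level changes scale with the level itself, so regression (1) picks up a strong slope while regression (2) does not.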

Step 2. Selecting the level of differencing of the dependent variable

After deciding on the transformation, one must choose the level of differencing for the dependent variable. EViews uses successive KPSS unit root tests, whose null hypothesis is stationarity, to determine the order of differencing. Following Hyndman and Khandakar (2008), the KPSS test is first run on the undifferenced data. If the test rejects stationarity, it is rerun on the differenced data, and this continues until the null hypothesis of stationarity can no longer be rejected.
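The successive-testing loop can be sketched as follows. This is an illustrative implementation of the level-stationarity KPSS statistic, assuming a Bartlett kernel with bandwidth floor(4(T/100)^(1/4)), the 5% asymptotic critical value 0.463, and at most two differences; EViews' own bandwidth choice and defaults may differ, and the function names are hypothetical.

```python
import numpy as np

# 5% asymptotic critical value of the KPSS level-stationarity test
KPSS_CRIT_5PCT = 0.463


def kpss_level_stat(x):
    """KPSS statistic (constant only) with a Bartlett-kernel long-run variance."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    e = x - x.mean()                             # residuals from a regression on a constant
    s = np.cumsum(e)                             # partial sums of the residuals
    lags = int(np.floor(4 * (T / 100) ** 0.25))  # assumed bandwidth rule
    lrv = e @ e / T
    for k in range(1, lags + 1):
        lrv += 2 * (1 - k / (lags + 1)) * (e[k:] @ e[:-k]) / T  # Bartlett weights
    if lrv <= 0:                                 # constant series: nothing left to test
        return 0.0
    return (s @ s) / (T ** 2 * lrv)


def choose_differencing(y, max_d=2):
    """Difference until KPSS no longer rejects stationarity at 5%, up to max_d times."""
    y = np.asarray(y, dtype=float)
    for d in range(max_d + 1):
        if kpss_level_stat(y) < KPSS_CRIT_5PCT:
            return d
        y = np.diff(y)
    return max_d


print(choose_differencing(np.tile([0.0, 1.0, 0.0, -1.0], 125)))  # 0: already stationary
print(choose_differencing(np.arange(500.0)))                     # 1: a linear trend needs one difference
```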

Step 3. Selecting the exogenous regressors

EViews allows users to specify exogenous regressors to include in the ARIMA selection process. By default, a constant term is included. We will define our proposed way of inputting the exogenous regressors in Section 3.

Step 4. Selecting the order of the ARIMA terms

Conditional on the user-specified exogenous variables, EViews uses standard model selection criteria to determine the ARIMAX model that best fits the data. It offers the standard information criteria (Akaike (AIC), Schwarz (SIC or BIC), and Hannan-Quinn (HQ)), along with the mean square error (MSE), as model selection criteria. The formulas for these two types of criteria are given below.

Information Criteria: each of these three criteria is based on the estimated log-likelihood of the fitted model, the number of parameters, and the number of observations. The model with the smallest information criterion is preferred.

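$$\text{Akaike Info Criterion (AIC)} = -2\left(\frac{l}{T}\right) + \frac{2k}{T}$$
$$\text{Schwarz Criterion (SC)} = -2\left(\frac{l}{T}\right) + \frac{k\log(T)}{T}$$
$$\text{Hannan-Quinn Criterion (HQ)} = -2\left(\frac{l}{T}\right) + \frac{2k\log(\log(T))}{T}$$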

where l is the value of the log-likelihood function and k is the number of parameters estimated using T observations.
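These formulas translate directly into code. In the sketch below, `information_criteria` is a hypothetical helper, and the log-likelihoods in the usage lines are illustrative values chosen only to show how the smaller criterion value identifies the preferred model.

```python
import math


def information_criteria(loglik, k, T):
    """AIC, Schwarz (SC) and Hannan-Quinn (HQ) in the per-observation form above."""
    fit = -2 * loglik / T
    return {
        "AIC": fit + 2 * k / T,
        "SC": fit + k * math.log(T) / T,
        "HQ": fit + 2 * k * math.log(math.log(T)) / T,
    }


# Illustrative (not estimated) log-likelihoods and parameter counts, T = 80
models = {"model_A": (-95.0, 3), "model_B": (-90.0, 7)}
aic = {name: information_criteria(ll, k, 80)["AIC"] for name, (ll, k) in models.items()}
best = min(aic, key=aic.get)  # model_B: 2.425 < 2.45
```

Because SC and HQ penalize each extra parameter by log(T) and 2·log(log(T)) rather than 2, they tend to favour more parsimonious ARIMA orders than AIC in longer samples.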

Mean Square Error (MSE) Evaluation: this is also called in-sample forecast evaluation. Each model is estimated on a sub-sample (e.g., the first 80–90% of the data) and forecast over the remaining observations (e.g., the last 10–20%). The MSE is then calculated as

$$\text{MSE} = \frac{1}{h}\sum_{t=T-h+1}^{T}\left(y_t - \hat{y}_t\right)^2$$

where h is the number of periods in the forecast sub-sample, $y_t$ is the actual value, $\hat{y}_t$ is the forecast at time t, and T is the number of observations in the sample. The model with the smallest MSE is selected.
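A minimal sketch of MSE-based selection follows, assuming pure AR(p) candidates estimated by OLS and static one-step-ahead forecasts over the hold-out window; EViews' exact forecasting scheme may differ. The function names and the simulated AR(2) series are illustrative only.

```python
import numpy as np


def ar_design(y, p):
    """Regressors [1, y_{t-1}, ..., y_{t-p}] and targets y_t for t = p, ..., n-1."""
    n = len(y)
    X = np.column_stack([np.ones(n - p)] + [y[p - i:n - i] for i in range(1, p + 1)])
    return X, y[p:]


def holdout_mse(y, p, train_frac=0.85):
    """Estimate AR(p) by OLS on the first train_frac of the sample; MSE on the rest."""
    y = np.asarray(y, dtype=float)
    X, target = ar_design(y, p)
    n_train = int(train_frac * len(y)) - p   # rows whose target is in the estimation window
    beta, *_ = np.linalg.lstsq(X[:n_train], target[:n_train], rcond=None)
    forecast = X[n_train:] @ beta            # static one-step-ahead forecasts (assumed)
    return float(np.mean((target[n_train:] - forecast) ** 2))


def select_ar_order(y, max_p=4, train_frac=0.85):
    """Return the AR order with the smallest hold-out MSE, plus all scores."""
    scores = {p: holdout_mse(y, p, train_frac) for p in range(1, max_p + 1)}
    return min(scores, key=scores.get), scores


# Simulated AR(2) series for illustration (first 100 draws discarded as burn-in)
rng = np.random.default_rng(2)
y = np.zeros(500)
for t in range(2, 500):
    y[t] = 1.6 * y[t - 1] - 0.8 * y[t - 2] + rng.standard_normal()
best_p, mse_by_p = select_ar_order(y[100:])
```

Every candidate order is scored on the same hold-out observations, so the comparison is not distorted by the different number of initial lags each model consumes.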

The EViews automatic ARIMA selection procedure is conditional on the exogenous variables being pre-specified by the user. That is, the procedure determines only the autoregressive and moving average orders without any allowance for automatic exogenous variable selection. In the next section, we introduce an Adjusted Stepwise-ARIMAX (AS-ARIMAX) procedure that offers customized stepwise selection procedures for an arbitrary set of exogenous variables.

Section 3. Adjusted Stepwise ARIMAX Variable Selection Procedure and a Simple Example

Stepwise model selection procedures, which add or remove variables from a regression based on the statistical significance of each candidate variable, have been widely used to find the preferred baseline forecasting or nowcasting model. The process starts with either backward elimination from the most general model or forward inclusion from the smallest possible model. With forward selection, candidate variables are added to the model sequentially based on their significance level; the procedure then checks whether all included variables are statistically significant and removes those that are not. With backward selection, all candidate variables enter the model initially, and individual variables are deleted if they are insignificant. Note that the procedure will re-introduce a “dropped” variable if it is subsequently found to be statistically significant (Chowdhury & Turin, 2020).

Despite the popularity of stepwise model selection procedures in recent decades, criticisms persist. Smith (2018) argues that the fundamental problem with stepwise regression is that it may bypass explanatory variables that have causal effects on the dependent variable while including nuisance (spurious) variables that are coincidentally statistically significant. Such an outcome typically results in a good in-sample fit but poor out-of-sample forecasts.

To tackle such issues, we propose a modified stepwise procedure that shifts the focus from statistical significance to the overall forecasting improvement attributable to a specific exogenous variable (indicator). Beginning with no exogenous variables in the model except the constant term, we test each variable separately and add it to the baseline model if its estimated coefficient is consistent with economic priors and it improves the model’s forecasting performance.

Specifically, we decide whether a variable (Xt) is a suitable candidate based on three criteria:

  • Condition 1: Including Xt lowers the Akaike Information Criterion (AIC) relative to the model without Xt.

  • Condition 2: The coefficient sign of Xt matches economic priors.

  • Condition 3: Xt is statistically significant at the 5% level.

The adjusted stepwise ARIMAX (AS-ARIMAX) variable selection procedure involves four steps (see Appendix 1 for detailed procedure charts):

  • Step One: we add the first candidate indicator X1 as an exogenous regressor to the automatic ARIMA procedure for the target variable (Model 1-A). We then repeat the procedure without X1 (Model 1-B). If Model 1-A satisfies conditions 1–2, we keep X1; otherwise, it is discarded.

  • Step Two: if X1 is retained in Step One, we add the second indicator, X2, to the baseline model as an exogenous regressor, repeat the automatic ARIMA procedure for the target variable (Model 2-A), and then repeat the procedure without X2 (Model 2-B). If Model 2-A meets conditions 1–2, we retain both X1 and X2.

  • If X1 is discarded in Step One, we repeat Step Two with X2 as the only exogenous regressor. We keep X2 if it meets conditions 1–2 and discard it otherwise. We repeat Steps One and Two for the remaining candidate variables.

  • Step Three: we add the variables selected in Steps One and Two to the automatic ARIMA model selection procedure. We then evaluate condition 3 for each selected variable and retain it only if it is statistically significant at the 5% level. We also recheck condition 2 to ensure that the coefficient sign of each variable still matches economic priors.

  • Step Four: after ensuring that all exogenous variables meet the three conditions, we manually check the significance of the selected ARIMA terms to ensure that they are meaningful in the model. We remove, one at a time, the ARIMA term with the highest p-value (i.e., the smallest absolute t-statistic) until all ARIMA terms are statistically significant at the 15% level and all regressors are statistically significant at the 5% level with intuitive coefficient signs.
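The logic of Steps One to Three can be sketched as follows, under strong simplifying assumptions: the automatic ARIMA stage is replaced by an OLS regression on a constant and a fixed number of AR lags (no MA terms), the AIC uses the per-observation formula from Section 2, and p-values use a normal approximation. All names (`as_arimax_screen`, `design`, `ols_fit`) and the simulated data are hypothetical, and Step Four's pruning of ARIMA terms is not included.

```python
import math
import numpy as np


def ols_fit(y, X):
    """OLS with Gaussian log-likelihood: coefficients, p-values (normal approx.), AIC."""
    T, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    loglik = -0.5 * T * (1 + math.log(2 * math.pi) + math.log(resid @ resid / T))
    se = np.sqrt(np.diag(resid @ resid / (T - k) * np.linalg.inv(X.T @ X)))
    pvals = np.array([math.erfc(abs(b / s) / math.sqrt(2)) for b, s in zip(beta, se)])
    return beta, pvals, -2 * loglik / T + 2 * k / T


def design(y, candidates, names, p):
    """Stand-in for the ARIMA stage: constant, AR lags 1..p, then the named regressors."""
    n = len(y)
    cols = [np.ones(n - p)] + [y[p - i:n - i] for i in range(1, p + 1)]
    cols += [candidates[name][p:] for name in names]
    return y[p:], np.column_stack(cols)


def as_arimax_screen(y, candidates, expected_sign, p=1, alpha=0.05):
    y = np.asarray(y, dtype=float)
    kept = []
    for name in candidates:                                     # Steps One and Two
        target, X_without = design(y, candidates, kept, p)
        _, X_with = design(y, candidates, kept + [name], p)
        _, _, aic_without = ols_fit(target, X_without)          # Model B
        beta_with, _, aic_with = ols_fit(target, X_with)        # Model A
        if aic_with < aic_without and np.sign(beta_with[-1]) == expected_sign[name]:
            kept.append(name)                                   # conditions 1 and 2 hold
    # Step Three: joint refit; keep regressors that satisfy conditions 2 and 3
    target, X = design(y, candidates, kept, p)
    beta, pvals, _ = ols_fit(target, X)
    first = 1 + p                                               # first exogenous column
    return [name for j, name in enumerate(kept)
            if pvals[first + j] < alpha and np.sign(beta[first + j]) == expected_sign[name]]


# Simulated data: x1 and x2 matter (with signs + and -), x3 is pure noise
rng = np.random.default_rng(3)
n = 200
cands = {name: rng.standard_normal(n) for name in ["x1", "x2", "x3"]}
noise = 0.5 * rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + cands["x1"][t] - 0.8 * cands["x2"][t] + noise[t]
selected = as_arimax_screen(y, cands, {"x1": 1, "x2": -1, "x3": 1})
```

Because each candidate is judged against the model built from the variables already kept, the order in which candidates are screened can affect the result, as in any stepwise scheme.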

Lastly, standard regression diagnostics are performed on the residuals of the preferred model, to guard against variable omission and non-normal error distributions.

To illustrate the proposed procedure, assume that we need to select variables to nowcast India’s real GDP (Yt) from the ten pre-selected exogenous indicators shown in Table 1. The pre-selected data include macroeconomic variables commonly used in nowcasting models, covering the external, real, and monetary sides of the economy. They were obtained from official Indian government agencies (e.g., the Reserve Bank of India, the Ministry of Commerce and Industry, and the Ministry of Statistics and Programme Implementation). All ten indicators are published monthly, and their latest observations extend beyond the most recent official real GDP release.

Table 1

Pre-Selected Data to Nowcast India’s Real GDP


We start the exogenous selection procedure with the first indicator DLOG(CREDIT_CARD) and repeat the procedure for the remaining variables. Table 2 presents the detailed report for each indicator and how each variable meets the first two acceptance conditions. After Step 2 is completed, five indicators meet both conditions, namely: DLOG(CREDIT_CARD), DLOG(IP), DLOG(ELEC_GENR), D(T_BILL), and DLOG(ECMA), allowing us to move forward to Step 3.

Table 2

Automatic ARIMA Stepwise Variable Selection – Steps 1 and 2 Results


Table 2 shows the detailed statistics for each variable selection criterion. A gray shaded cell means that the indicator passed the indicated criterion. For example, DLOG(CREDIT_CARD) decreases the AIC value and has a coefficient sign consistent with economic priors. The last column tallies the number of conditions met by each variable. We retain only those variables that satisfy both conditions.

With the five variables selected from Steps 1 and 2, we can proceed with Step 3, which brings the selected variables to the automatic ARIMA procedure to assess Condition 3 (statistical significance) and reassess Condition 2 (coefficient sign consistent with economic priors).

Table 3 presents the result of Step 3: only DLOG(ELEC_GENR) needs to be removed because it is statistically insignificant. The other four variables continue to satisfy both conditions 2 and 3.

Table 3

Automatic ARIMA Stepwise Variable Selection – Step 3 Results


These variables, along with the automatically selected ARIMA terms, are then used to formulate the baseline nowcasting model (Table 4).

Table 4

Automatic ARIMA Stepwise Variable Selection – Selected Baseline Model


As shown in Table 4, an ARIMA(4,1) model is selected for India’s real GDP based on the Akaike Information Criterion (AIC). Note that the exogenous regressors in the baseline model fulfill the coefficient-sign and significance requirements.

Some of the selected AR terms, however, are not statistically significant in the baseline model. Step 4 therefore removes the insignificant AR terms one by one, starting with the term with the highest p-value. After removing the AR(3), AR(1), and AR(2) terms, we obtain an adjusted baseline model whose right-hand-side variables satisfy all the selection criteria.
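Step 4's backward pruning of AR terms can be sketched as below, assuming a pure AR specification estimated by OLS on a fixed sample and normal-approximation p-values; MA terms, which require maximum likelihood, are left out. The 15% threshold follows Step Four, while the function names and the simulated series are hypothetical.

```python
import math
import numpy as np


def ar_subset_pvalues(y, lags, start):
    """OLS of y_t on a constant and y_{t-lag} (lag in lags) for t >= start; two-sided p-values."""
    n = len(y)
    X = np.column_stack([np.ones(n - start)] + [y[start - lag:n - lag] for lag in lags])
    target = y[start:]
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    cov = resid @ resid / (len(target) - X.shape[1]) * np.linalg.inv(X.T @ X)
    z = beta / np.sqrt(np.diag(cov))
    return {lag: math.erfc(abs(z[j + 1]) / math.sqrt(2)) for j, lag in enumerate(lags)}


def prune_ar_terms(y, lags, threshold=0.15):
    """Drop the AR lag with the highest p-value until all p-values are <= threshold."""
    y = np.asarray(y, dtype=float)
    lags = sorted(lags)
    start = max(lags)              # keep the estimation sample fixed while pruning
    while lags:
        pvals = ar_subset_pvalues(y, lags, start)
        worst = max(pvals, key=pvals.get)
        if pvals[worst] <= threshold:
            break
        lags.remove(worst)
    return lags


# Simulated series driven only by its fourth lag
rng = np.random.default_rng(4)
y = np.zeros(600)
for t in range(4, 600):
    y[t] = 0.6 * y[t - 4] + rng.standard_normal()
kept_lags = prune_ar_terms(y[100:], [1, 2, 3, 4])
```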

Table 5

Automatic ARIMA Stepwise Variable Selection – Adjusted Baseline Model


The final step is to run residual-based diagnostic tests to guard against misspecification such as omitted variables; these tests show no evidence of serial correlation or heteroskedasticity. We demonstrate how to improve this model using a more comprehensive list of candidate variables in Section 6.

Section 4. Benchmark Models

We use three alternative benchmark models to assess the effectiveness of the proposed AS-ARIMAX approach for India:

  • 1) A univariate first-order autoregressive model, AR(1)

  • 2) The professional forecasters survey from the Reserve Bank of India

  • 3) The combinatorial variable selection

We show below that our Bridge and Unrestricted Mixed-frequency Data Sampling (U-MIDAS) estimations of real GDP, both formulated using the AS-ARIMAX approach, outperform the three benchmark models, delivering much lower root mean square errors (RMSE).

We now explain the three benchmark models in detail.

Benchmark 1: Univariate autoregression model

The first-order autoregressive model, AR(1), is frequently used as a benchmark for assessing the relative performance of nowcasting models. For example, Giannone et al. (2013) used an AR(1) as the benchmark in their study on nowcasting China’s real GDP, and Bok et al. (2017) used a naïve benchmark in their report on nowcasting with big data for the United States.

The AR (1) model is (Bragoli & Fosten, 2017):

$$y_t^Q = \rho\, y_{t-1}^Q + \epsilon_t^Q$$

where $y_t^Q$ is the quarter-on-quarter growth rate of quarterly real GDP, $y_{t-1}^Q$ is its value in the previous period, $\epsilon_t^Q$ is a zero-mean idiosyncratic term, and $\rho$ is the autoregressive parameter, which satisfies $|\rho| < 1$.
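As a small sketch, the benchmark's parameter can be estimated by OLS without an intercept, matching the equation as written, and the nowcast is the estimated ρ times the latest observed growth rate. The helper names and the toy series are illustrative, not data from the paper.

```python
import numpy as np


def fit_ar1_no_constant(y):
    """OLS estimate of rho in y_t = rho * y_{t-1} + eps_t (no intercept)."""
    y = np.asarray(y, dtype=float)
    return float(y[:-1] @ y[1:] / (y[:-1] @ y[:-1]))


def ar1_nowcast(y):
    """One-step-ahead benchmark nowcast: rho_hat times the latest observed value."""
    return fit_ar1_no_constant(y) * float(y[-1])


growth = [8.0, 4.0, 2.0, 1.0]          # toy q/q growth rates, not real data
print(fit_ar1_no_constant(growth))     # 0.5
print(ar1_nowcast(growth))             # 0.5
```

In the pseudo-real-time exercise, ρ would be re-estimated each quarter on the data available at that time.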

Benchmark 2. Reserve Bank of India Professional Forecasters

The Reserve Bank of India (RBI) conducts and publishes a survey of 30 professional forecasters on the annual growth rate of Indian real GDP by industry. We use the mean of these professional forecasts. Since the survey focuses on the annual growth rate, we convert the forecasts to quarter-on-quarter rates to be consistent with the target variable. The RBI forecast series is seasonally adjusted using the X-13 procedure, again for consistency with the target variable.
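The conversion formula is not spelled out here. One common approach, shown purely as an assumption, is to take the constant quarterly rate that compounds to the annual rate, $(1+g)^{1/4}-1$; the function name is hypothetical.

```python
def annual_to_quarterly_growth(annual_rate):
    """Constant quarterly rate that compounds to the given annual rate (decimals)."""
    return (1.0 + annual_rate) ** 0.25 - 1.0


# A hypothetical 7% annual forecast corresponds to roughly 1.71% per quarter
q = annual_to_quarterly_growth(0.07)
```

This assumes steady growth within the year; other conversions (for example, interpolating the implied quarterly level path) would give somewhat different quarterly rates.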

Benchmark 3. EViews Regression Variable Selection: Combinatorial

EViews offers six variable selection methods (Uni-directional, Stepwise, Swap-wise, Combinatorial, Auto-Search/GETS, and Lasso). Among these, the combinatorial method provides the most thorough evaluation because it evaluates all possible combinations of the added variables and selects the combination with the largest R-squared (EViews User’s Guide II, p. 89).
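A brute-force sketch of combinatorial selection is given below. It assumes, as a simplification, that the user fixes the number of added regressors (`size`), since without that restriction the largest model always has the highest R-squared; the data are simulated and the function names are illustrative.

```python
from itertools import combinations

import numpy as np


def r_squared(y, X):
    """R-squared of an OLS regression of y on X (X includes a constant)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def combinatorial_select(y, candidates, size):
    """Evaluate every subset of `size` candidates; keep the one with the largest R-squared."""
    n = len(y)
    best_names, best_r2 = None, -np.inf
    for names in combinations(candidates, size):
        X = np.column_stack([np.ones(n)] + [candidates[c] for c in names])
        r2 = r_squared(y, X)
        if r2 > best_r2:
            best_names, best_r2 = names, r2
    return best_names, best_r2


rng = np.random.default_rng(5)
a, b, c = rng.standard_normal((3, 100))
y = 2 * a - b + 0.1 * rng.standard_normal(100)
names, r2 = combinatorial_select(y, {"a": a, "b": b, "c": c}, size=2)
```

The number of regressions grows combinatorially with the candidate list, which is why exhaustive search becomes expensive with eighty candidates.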

Section 5. Nowcasting Methodology

The main idea is to first apply the AS-ARIMAX selection procedure to select the indicators that meet the three conditions described above (lower AIC, an intuitive coefficient sign, and statistical significance). We then use the selected indicators in standard nowcasting models to assess their predictive performance. We use two nowcasting models: Bridge and the Unrestricted Mixed-Frequency Data Sampling (U-MIDAS) model.

Nowcasting Model 1: Bridge Model

The first model is the Bridge model, which relies on linear regressions that link (“bridge”) high-frequency explanatory variables with the low-frequency target variable. To nowcast quarterly GDP, the high-frequency (e.g., monthly) indicators are converted to the lower target frequency (e.g., quarterly) using the sum or average of the observations in each quarter. The Bridge model is then estimated by ordinary least squares (OLS). If the high-frequency indicators have publication lags, an auxiliary regression is used to forecast them so that each low-frequency period has a complete set of high-frequency values. Note that the inclusion of right-hand-side variables in the Bridge model is not based on causal relations (as it would be in a more structural model) but on a prior assessment that they contain timely information on the direction of the dependent variable (e.g., real GDP). Because of their simplicity and transparency, bridge equations have guided policy decisions at numerous institutions (e.g., the Federal Reserve Bank of San Francisco (Ingenito & Trehan, 1996), the euro area (Baffigi, Golinelli, & Parigi, 2004), and Norges Bank (Foroni & Marcellino, 2013)).

The ARIMAX Bridge model can be represented as

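$$y_{t_q} = \alpha + \sum_{i=1}^{p}\varphi_i\, y_{t_q-i} + \sum_{i=1}^{q}\theta_i\, \varepsilon_{t_q-i} + \sum_{i=1}^{j}\beta_i\, x_{i,t_q} + u_{t_q}$$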

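where $\beta_i$ is the coefficient on exogenous regressor $x_i$, $t_q = 1, \dots, T$ indexes quarters, $x_{i,t_q}$ is the $i$-th high-frequency indicator converted to quarterly frequency, $j$ is the number of indicators, and $u_{t_q}$ is an i.i.d. error term. Moreover, $\sum_{i=1}^{p}\varphi_i y_{t_q-i}$ is the autoregressive term of order $p$ (i.e., AR(p)) and $\sum_{i=1}^{q}\theta_i \varepsilon_{t_q-i}$ is the moving average term of order $q$ (i.e., MA(q)).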
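A minimal Bridge workflow can be sketched as follows, assuming a single stock-type indicator averaged within each quarter, an AR(1) auxiliary model to fill the missing month, and one AR lag with no MA terms in the bridge equation. The helper names and the noise-free toy data are hypothetical.

```python
import numpy as np


def fill_ragged_edge(x_monthly, n_missing):
    """Extend a monthly indicator with n_missing AR(1) forecasts (assumed auxiliary model)."""
    x = np.asarray(x_monthly, dtype=float)
    mu = x.mean()
    d = x - mu
    rho = (d[:-1] @ d[1:]) / (d[:-1] @ d[:-1])
    out = list(x)
    for _ in range(n_missing):
        out.append(mu + rho * (out[-1] - mu))
    return np.array(out)


def bridge_nowcast(y_quarterly, x_monthly):
    """Fit y_t = a + phi*y_{t-1} + b*xbar_t by OLS and nowcast the quarter after y ends.

    x_monthly must cover exactly one more quarter than y_quarterly.
    """
    y = np.asarray(y_quarterly, dtype=float)
    xq = np.asarray(x_monthly, dtype=float).reshape(-1, 3).mean(axis=1)  # quarterly means
    n = len(y)
    if len(xq) != n + 1:
        raise ValueError("x_monthly must contain exactly one more quarter than y")
    X = np.column_stack([np.ones(n - 1), y[:-1], xq[1:n]])
    beta, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    return float(beta @ np.array([1.0, y[-1], xq[n]]))


# 20 quarters of target data; indicator observed up to month 2 of quarter 21
rng = np.random.default_rng(6)
x_m = rng.standard_normal(62)
x_full = fill_ragged_edge(x_m, n_missing=1)     # forecast the missing third month
xq = x_full.reshape(-1, 3).mean(axis=1)
y = np.zeros(21)
for t in range(1, 21):
    y[t] = 0.5 + 0.3 * y[t - 1] + 2.0 * xq[t]   # noise-free toy relationship
print(bridge_nowcast(y[:20], x_full), y[20])    # agree up to rounding
```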

Nowcasting Model 2: U-MIDAS

The second nowcasting model is the unrestricted mixed-frequency data sampling model (U-MIDAS). The Mixed-Frequency Data Sampling (MIDAS) model is a tightly parameterized, reduced-form regression in which variables are sampled at different frequencies (Ghysels, Sinko, & Valkanov, 2007). To avoid parameter proliferation, MIDAS uses distributed lag polynomials that depend on a small number of parameters. The MIDAS approach is suitable when the frequency mismatch is large (e.g., when using daily indicators to nowcast a quarterly variable). By contrast, U-MIDAS, which does not use functional distributed lags, is used when the frequency mismatch is small. Foroni, Marcellino, and Schumacher (2012) studied the performance of U-MIDAS and found that it generally outperforms MIDAS when mixing quarterly and monthly data (i.e., a small frequency mismatch).

In this paper, we apply the AS-ARIMAX procedure to the U-MIDAS model. We convert the higher-frequency indicators to quarterly frequency using skip-sampling. In its simplest form, the model can be expressed as follows:

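$$y_{t_q} = \alpha + \sum_{i=1}^{p}\varphi_i\, y_{t_q-i} + \sum_{i=1}^{q}\theta_i\, \varepsilon_{t_q-i} + \sum_{i=1}^{j}\beta_{i,0}\, x_{i,t_m} + \sum_{i=1}^{j}\beta_{i,1}\, x_{i,t_m-1} + \sum_{i=1}^{j}\beta_{i,2}\, x_{i,t_m-2} + u_{t_q}$$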

where $t_q = 1, \dots, T$ indexes quarters and $t_m$ is the last month of quarter $t_q$; $x_{i,t_m}$ is the first skip-sampled quarterly series of high-frequency indicator $i$, and $x_{i,t_m-1}$ and $x_{i,t_m-2}$ are the second and third skip-sampled series; $\beta_{i,0}$, $\beta_{i,1}$, and $\beta_{i,2}$ are the corresponding unrestricted coefficients; and $j$ is the number of high-frequency indicators.
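Skip-sampling and the unrestricted regression can be sketched as below, assuming a single monthly indicator, one AR lag, and no MA terms; the helper names and simulated data are illustrative. Each month of the quarter receives its own coefficient, which is what distinguishes U-MIDAS from simple within-quarter averaging.

```python
import numpy as np


def skip_sample(x_monthly):
    """Quarterly matrix with columns x_{t_m}, x_{t_m-1}, x_{t_m-2} (last month first)."""
    x = np.asarray(x_monthly, dtype=float)
    if len(x) % 3:
        raise ValueError("need complete quarters")
    return x.reshape(-1, 3)[:, ::-1]


def umidas_fit(y_quarterly, x_monthly):
    """OLS of y_t on [1, y_{t-1}, x_{t_m}, x_{t_m-1}, x_{t_m-2}] with free month coefficients."""
    y = np.asarray(y_quarterly, dtype=float)
    M = skip_sample(x_monthly)[: len(y)]
    X = np.column_stack([np.ones(len(y) - 1), y[:-1], M[1:]])
    beta, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    return beta  # [alpha, phi_1, beta_0, beta_1, beta_2]


print(skip_sample(np.arange(1, 7)))  # rows [3, 2, 1] and [6, 5, 4]

# Noise-free toy data: the fit recovers [0.2, 0.4, 1.0, 0.5, -0.3]
rng = np.random.default_rng(7)
x_m = rng.standard_normal(90)        # 30 quarters of monthly data
M = skip_sample(x_m)
y = np.zeros(30)
for t in range(1, 30):
    y[t] = 0.2 + 0.4 * y[t - 1] + M[t] @ np.array([1.0, 0.5, -0.3])
coef = umidas_fit(y, x_m)
```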

Section 6. Nowcasting Indian Real GDP

We now demonstrate the effectiveness of the AS-ARIMAX approach for nowcasting the real GDP of India. We follow Bragoli and Fosten (2017) and show that the AS-ARIMAX approach outperforms the three benchmark models proposed in Section 4.

Target variable: Real GDP

Indian real Gross Domestic Product (GDP) data are published by the Central Statistics Organization (CSO), India. We use real Gross Domestic Product at basic prices as our target variable, to be consistent with the target variable of the professional forecasts published by the Reserve Bank of India.

Input variables

We start from the list of indicators proposed by Bragoli and Fosten (B&F, 2017) and apply two criteria to pre-filter it:

  • 1) Frequency: the selected indicators must have the same frequency as the target variable or a higher one. In this study, our target variable (real GDP) is available quarterly, so we choose indicators with at least quarterly frequency.

  • 2) Availability: the selected indicators must have published data after the latest publication of the target variable. In this study, the target variable is available until 2022Q1. Therefore, we use indicators that have published data after 2022Q1 (i.e., 2022M03).

We removed two indicators upfront, India crude oil production and steel production, because these series failed the availability requirement. In addition, India’s “Industrial Performance Assessment indicator” is unavailable from our data sources. As a replacement, we used an index of the performance of India’s eight core industries: coal, crude oil, natural gas, petroleum refinery products, fertilizers, steel, cement, and electricity. Beyond the headline indicators used by B&F, we wish to gauge the predictive ability of the sub-components of industrial production, stock indexes, and foreign trade. We therefore added the sub-components of industrial production, the eight core industries index, exports, foreign investment flows, and the NSE stock index. After these adjustments, the candidate list contains eighty exogenous variables (Appendix 2).

Data Transformation

To ensure the stationarity of the variables in the OLS regression, we apply the Augmented Dickey-Fuller unit root test to each variable. If the test rejects the null hypothesis of a unit root, we treat the series as stationary; otherwise, we apply an appropriate transformation (first difference or log difference) to achieve stationarity.

Seasonal Adjustment

We gather the seasonal adjustment status of each variable from the data source and use the X-13 seasonal adjustment procedure (United States Census Bureau, 2022) as required to seasonally adjust the series. Detailed seasonal adjustment status for each series can be found in Appendix 2.

Frequency Conversion

All the selected indicators have monthly frequency. Because the target variable is quarterly, the high-frequency indicators must be converted from monthly to quarterly frequency. For the Bridge model, we aggregate by summing or averaging the monthly data, depending on the nature of each indicator: we classify each variable as a “flow” or a “stock/index” variable and use the sum of the observations for flow variables and the average for stock/index variables. The stock or flow classification of each indicator can be found in Appendix 2.
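A minimal sketch of this stock/flow aggregation rule follows, assuming the monthly series starts in the first month of a quarter and contains only complete quarters; `to_quarterly` is a hypothetical helper.

```python
import numpy as np


def to_quarterly(x_monthly, kind):
    """Aggregate complete quarters: sum for flows, average for stocks/indexes."""
    x = np.asarray(x_monthly, dtype=float)
    if len(x) % 3:
        raise ValueError("x_monthly must contain whole quarters (a multiple of 3 months)")
    blocks = x.reshape(-1, 3)
    if kind == "flow":
        return blocks.sum(axis=1)
    if kind in ("stock", "index"):
        return blocks.mean(axis=1)
    raise ValueError(f"unknown kind: {kind!r}")


monthly = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(to_quarterly(monthly, "flow").tolist())    # [6.0, 15.0]
print(to_quarterly(monthly, "index").tolist())   # [2.0, 5.0]
```

Summing a flow such as monthly exports gives the quarterly total, whereas averaging an index such as industrial production keeps it on its original scale.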

Model evaluation

To evaluate model performance, we use a realistic forecast evaluation methodology with a “pseudo-real-time” historical series construction that reflects the operational procedures typically used in a central bank forecasting unit (Bok, Caratelli, Giannone, Sbordone, & Tambalotti, 2017). We emulate a nowcasting protocol in which the baseline model is re-estimated regularly using all the information available at each point in time.

To keep the procedure as straightforward as possible, we assume that the monthly indicators are published with a one-month lag, so each regressor must be forecast one period ahead to complete the data for the nowcasting exercise. Suppose we are at the end of 2018Q2 and wish to nowcast real GDP (rgdp), which is available only through 2018Q1, using monthly indicators available through 2018M05. We use all the available quarterly data (through 2018Q1) to estimate the baseline model.

We then forecast each monthly indicator one month ahead, to 2018M06, using an auxiliary model, so that we have complete quarterly data for 2018Q2. After converting the forecast series to quarterly frequency, we nowcast real GDP for 2018Q2. We record the nowcast for 2018Q2, repeat the same steps for 2018Q3, and continue until we reach the target evaluation end date. We then evaluate the forecast accuracy of the different models using the RMSE and the Theil U2 statistic, as these contain the information most applicable to model selection.
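The evaluation loop and the two accuracy measures can be sketched as follows. This assumes the relative-change form of Theil's U2 (which is undefined if an actual value is zero) and omits the indicator-forecasting step; `recursive_nowcasts` and the no-change model in the usage lines are hypothetical.

```python
import numpy as np


def rmse(actual, forecast):
    a, f = np.asarray(actual, dtype=float), np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.mean((a - f) ** 2)))


def theil_u2(actual, forecast):
    """U2 in relative-change form; 1 means no better than a no-change forecast.

    forecast[t] is the forecast of actual[t]; forecast[0] is not used.
    """
    a, f = np.asarray(actual, dtype=float), np.asarray(forecast, dtype=float)
    num = np.sum(((f[1:] - a[1:]) / a[:-1]) ** 2)
    den = np.sum(((a[1:] - a[:-1]) / a[:-1]) ** 2)
    return float(np.sqrt(num / den))


def recursive_nowcasts(y, first_target, fit_and_nowcast):
    """Pseudo-real-time loop: for each t, use only y[:t] to nowcast y[t]."""
    return np.array([fit_and_nowcast(y[:t]) for t in range(first_target, len(y))])


# Toy data and a hypothetical no-change model
y = np.array([1.0, 2.0, 4.0, 3.0, 5.0])
nowcasts = recursive_nowcasts(y, 2, lambda history: history[-1])
print(nowcasts.tolist())           # [2.0, 4.0, 3.0]
print(rmse(y[2:], nowcasts))       # sqrt(3), about 1.732
```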

Section 7. Empirical Results

Starting with the eighty indicators described in Appendix 2, the AS-ARIMAX indicator selection procedure shortlisted five variables for the baseline model with AR(1), MA(1) and MA(3) terms (see Table 6). The selected indicators are:

  • HVI, defined as India’s Eight Core Industry Infrastructure Index (SA, Apr.11-Mar.12=100)

  • IP_LEATHER, defined as India’s Industrial Production in Leather and Related Products (SA, Apr.11-Mar.12=100)

  • IP_CAPITAL, defined as India’s Industrial Production in Capital Goods (SA, Apr.11-Mar.12=100)

  • IP_TEXTILES, defined as India’s Industrial Production in Textiles (SA, Apr.11-Mar.12=100)

  • NFGEB, defined as India’s Central Government: Expenditure (SA, 10 Mil. Rupees)

Table 6

Selected Baseline Model


Given manufacturing’s importance to the Indian economy (more than 23% of India’s 2021 real GDP), it is not surprising that industrial production (IP) sub-components account for three of the five selected indicators. The selected IP indicators cover capital goods as well as the textile and leather industries. In addition, the selection of the “Eight Core Industry Infrastructure Index” highlights the importance of core industries (such as refinery products, electricity, and steel) to India’s economic development. The last indicator, central government expenditure, is a sound indicator of fiscal policy and is likely to affect economic activity more generally.

The procedure ensures that the selected indicators are statistically significant at the 5% level and that their estimated coefficients have the expected signs.

Table 7 presents the correlogram and the Ljung-Box Q-statistics for up to 12 lags. Since all the p-values of the Q-statistics are above 0.05, we conclude that there is no evidence of serial correlation in the residuals of the baseline model.
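For reference, the Ljung-Box Q-statistic reported in a correlogram can be computed as below. This is a sketch: 21.026 is the 5% critical value of a chi-square with 12 degrees of freedom, and for residuals of an ARMA model the degrees of freedom are typically reduced by the number of estimated ARMA terms. Function names are illustrative.

```python
import numpy as np

CHI2_95_DF12 = 21.026  # 5% critical value, chi-square with 12 degrees of freedom


def ljung_box_q(resid, max_lag=12):
    """Cumulative Ljung-Box Q-statistics for lags 1..max_lag."""
    e = np.asarray(resid, dtype=float)
    e = e - e.mean()
    T = len(e)
    denom = e @ e
    q, out = 0.0, []
    for k in range(1, max_lag + 1):
        rho_k = (e[k:] @ e[:-k]) / denom   # sample autocorrelation at lag k
        q += rho_k ** 2 / (T - k)
        out.append(T * (T + 2) * q)
    return out


# Strongly autocorrelated toy residuals: alternating signs
q_stats = ljung_box_q(np.tile([1.0, -1.0], 50))
print(q_stats[-1] > CHI2_95_DF12)          # True: serial correlation detected
```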

Table 7

Testing for Serial Correlation: Q Statistics


The p-value of the Jarque-Bera test exceeds the significance level, so we cannot reject the null hypothesis that the residuals are normally distributed. The histogram of the fitted residuals also shows a clear bell shape.
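A sketch of the Jarque-Bera statistic follows, using moments about the sample mean and the 5% critical value of a chi-square with 2 degrees of freedom; the toy residuals and function name are illustrative only.

```python
import numpy as np

CHI2_95_DF2 = 5.991  # 5% critical value, chi-square with 2 degrees of freedom


def jarque_bera(resid):
    """JB = T/6 * (S^2 + (K - 3)^2 / 4), with moments about the sample mean."""
    e = np.asarray(resid, dtype=float)
    d = e - e.mean()
    m2, m3, m4 = (np.mean(d ** p) for p in (2, 3, 4))
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return len(e) / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)


jb = jarque_bera(np.tile([-1.0, 1.0], 6))  # symmetric but flat: S = 0, K = 1
print(jb)                                  # 2.0, below 5.991
```

The toy case also shows the test's low power in very small samples: a two-point distribution is clearly non-normal, yet with only 12 observations JB stays below the critical value.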

Table 8

Testing for Normality: Jarque-Bera Test

The Breusch-Pagan-Godfrey heteroskedasticity test shown in Table 9 finds no evidence of heteroskedasticity, given that the p-values of the F-statistics are all above 0.05.
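The Breusch-Pagan-Godfrey test regresses the squared residuals on the model's regressors and asks whether those regressors have any joint explanatory power. The sketch below implements the textbook version in plain Python: it returns the F-statistic of the auxiliary regression, (R²/p) / ((1 − R²)/(n − p − 1)), and the Obs*R-squared statistic n·R², which is asymptotically chi-square with p degrees of freedom. It is an illustration under these assumptions, not a reproduction of the EViews output in Table 9; the helper names and the example numbers are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_r_squared(y, regressors):
    """R-squared of an OLS regression of y on a constant plus the regressor columns."""
    n = len(y)
    X = [[1.0] + [col[t] for col in regressors] for t in range(n)]
    k = len(X[0])
    xtx = [[sum(X[t][i] * X[t][j] for t in range(n)) for j in range(k)] for i in range(k)]
    xty = [sum(X[t][i] * y[t] for t in range(n)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    mean_y = sum(y) / n
    ssr = sum((yt - ft) ** 2 for yt, ft in zip(y, fitted))
    sst = sum((yt - mean_y) ** 2 for yt in y)
    return 1.0 - ssr / sst


def breusch_pagan_godfrey(resid, regressors):
    """Auxiliary regression of squared residuals on the regressors: (F-stat, Obs*R-squared)."""
    n, p = len(resid), len(regressors)
    r2 = ols_r_squared([e * e for e in resid], regressors)
    f_stat = (r2 / p) / ((1.0 - r2) / (n - p - 1))
    return f_stat, n * r2


# Made-up residuals and one regressor column.
print(breusch_pagan_godfrey([1.0, -2.0, 1.0, 2.0], [[1.0, 2.0, 3.0, 4.0]]))  # approximately (0.5, 0.8)
```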

Table 9

Testing for Heteroskedasticity: Breusch-Pagan-Godfrey test

Having confirmed the validity of the selected baseline model, the next step is to use it to generate nowcasts of Indian real GDP. To compare the Bridge and U-MIDAS estimations with our three benchmark models, we use a one-period-ahead out-of-sample evaluation running from 2018Q1 to 2022Q1. We also use a simpler in-sample evaluation, in which we estimate the model on data through 2022Q1 and then produce in-sample forecasts for 2018Q1 to 2022Q1.
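The difference between the two evaluation schemes can be sketched with the simplest benchmark, an AR(1) model estimated by OLS. In the out-of-sample scheme the model is re-estimated each period using only data available before the target quarter; in the in-sample scheme it is estimated once on the full sample and its fitted values are compared with the actuals. This is a minimal illustration, assuming an expanding estimation window (the text specifies only a one-period-ahead out-of-sample evaluation); the Bridge and U-MIDAS models themselves are not reproduced, and the function names are hypothetical.

```python
def fit_ar1(y):
    """OLS estimates (c, phi) for y_t = c + phi * y_{t-1} + e_t (needs at least 3 points)."""
    x, z = y[:-1], y[1:]
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    phi = sxz / sxx
    return mz - phi * mx, phi


def out_of_sample_forecasts(y, first_eval):
    """One-step-ahead forecasts of y[first_eval:], re-estimating on y[:t] each period."""
    preds = []
    for t in range(first_eval, len(y)):
        c, phi = fit_ar1(y[:t])  # only observations before period t are used
        preds.append(c + phi * y[t - 1])
    return preds


def in_sample_forecasts(y, first_eval):
    """Fitted values for y[first_eval:] from a single estimation on the full sample."""
    c, phi = fit_ar1(y)
    return [c + phi * y[t - 1] for t in range(first_eval, len(y))]


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(out_of_sample_forecasts(y, 3))  # [4.0, 5.0, 6.0]
```

For a trending toy series the two schemes coincide; with real data the out-of-sample errors are typically larger, which is why it is the more demanding (“realistic”) test.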

Table 10 shows the RMSE values for the benchmark and nowcasting models over the same evaluation period. The results indicate that the two nowcasting models, built with the AS-ARIMAX variable selection procedure, outperform all three benchmark models. Of the two nowcasting models, U-MIDAS performs better under both evaluations, although in the out-of-sample evaluation Bridge and U-MIDAS perform almost identically (the difference is 0.001).

Table 10

Forecast Evaluation Comparison: RMSE

Note: Accuracy gain in RMSE = −100% × (RMSE of U-MIDAS − RMSE of Combinatorial) / RMSE of Combinatorial
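The RMSE and the accuracy-gain formula in the note to Table 10 can be written out as follows. This is a minimal sketch; the function names and the RMSE values in the example are hypothetical, not the figures in Table 10.

```python
import math


def rmse(forecasts, actuals):
    """Root mean square error of a set of forecasts."""
    errors = [f - a for f, a in zip(forecasts, actuals)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def accuracy_gain(rmse_model, rmse_benchmark):
    """Percent reduction in RMSE relative to the benchmark (positive = model is better)."""
    return -100.0 * (rmse_model - rmse_benchmark) / rmse_benchmark


# Hypothetical values: a model RMSE of 0.74 against a benchmark RMSE of 1.00
# corresponds to a gain of about 26 percent.
print(round(accuracy_gain(0.74, 1.00), 1))  # 26.0
```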

Relative to the best-performing benchmark model (the combinatorial approach), the U-MIDAS model built with the AS-ARIMAX procedure reduces the out-of-sample RMSE by more than twenty-six percent. Relative to the worst-performing benchmark model (the RBI forecast), the AS-ARIMAX nowcasting model reduces the out-of-sample RMSE by more than seventy percent.

The results for the Theil coefficient are largely consistent with those for RMSE: the nowcasting models using the AS-ARIMAX procedure outperform all three benchmarks. The combinatorial approach is still the best-performing benchmark model according to the Theil U2 statistic. Meanwhile, U-MIDAS remains the best-performing model among the five models in both the in-sample and out-of-sample evaluations. Compared with the combinatorial model, the U-MIDAS model reduces the out-of-sample Theil U2 by more than sixty percent.
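Definitions of Theil's U2 vary across sources. A minimal sketch, assuming the common relative-change form that compares the model's errors with those of a naive no-change forecast: U2 = sqrt( Σ ((ŷ_{t+1} − y_{t+1}) / y_t)² / Σ ((y_{t+1} − y_t) / y_t)² ). Values below 1 mean the model beats the naive forecast. Because every term is scaled by y_t, the statistic can be unstable when the target is a growth rate close to zero, so this illustrates the idea rather than the exact formula behind Table 11; the function name is hypothetical.

```python
import math


def theil_u2(forecasts, actuals):
    """Theil U2 in relative-change form.

    forecasts[i] is the forecast of actuals[i]. The first observation has no
    predecessor, so the sums run over i = 1 .. n-1.
    """
    num = sum(((forecasts[i] - actuals[i]) / actuals[i - 1]) ** 2
              for i in range(1, len(actuals)))
    den = sum(((actuals[i] - actuals[i - 1]) / actuals[i - 1]) ** 2
              for i in range(1, len(actuals)))
    return math.sqrt(num / den)


actuals = [1.0, 2.0, 4.0]
naive = [0.0, 1.0, 2.0]  # forecasts[i] = actuals[i-1] for i >= 1 (first entry unused)
print(theil_u2(naive, actuals))  # 1.0: a no-change forecast scores exactly 1
```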

Table 11

Forecast Evaluation Comparison: Theil U2

The results from both statistics provide strong evidence of substantial accuracy gains from using the AS-ARIMAX variable selection procedure to determine the high-frequency (HF) regressors in the nowcasting model.

We also illustrate these gains by computing the forecast error, defined as 100 × (forecasted value − actual value), and by plotting forecasted against actual real GDP.

Figure 1 plots the realistic (out-of-sample) forecast evaluation of the three benchmark models. The forecast errors of all three benchmark models increased sharply when the COVID-19 pandemic hit India and a strict nationwide lockdown was imposed in 2020Q2. Note that the AR(1) and Reserve Bank of India (RBI) forecasts capture the impact of COVID-19 only with a lag: both models predict negative growth in 2020Q3, when the economy was actually recovering as the national lockdown eased. The AR(1) model did not capture the impact of the second COVID-19 wave in 2021Q2, whereas the RBI forecast captured that impact quite precisely. Among the benchmarks, the combinatorial approach (COMB) generates the forecasts most closely aligned with actual outcomes. However, it fails to reflect the intensity of the negative shock. For example, actual quarter-on-quarter growth in 2020Q2 was -21.3%, while the COMB approach forecast only -9.6%.
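Applying the forecast-error definition given earlier to the 2020Q2 figures quoted above makes the sign convention concrete: a positive error means the forecast was too high (here, not negative enough). This sketch assumes growth rates enter the formula as decimals (-0.213 rather than -21.3), so the error is expressed in percentage points; the function name is hypothetical.

```python
def forecast_error(forecast, actual):
    """Forecast error as defined in the text: 100 * (forecast - actual), inputs as decimals."""
    return 100.0 * (forecast - actual)


# 2020Q2 quarter-on-quarter growth: actual -21.3%, COMB forecast -9.6%.
err = forecast_error(-0.096, -0.213)
print(round(err, 1))  # 11.7 percentage points of over-prediction
```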

Figure 1

Realistic Forecast Evaluation: Three Benchmark Models

Citation: IMF Working Papers 2023, 045; 10.5089/9798400235177.001.A001

Figure 2 shows the forecasting performance of the two nowcasting models (Bridge and U-MIDAS). Unlike all three benchmark models, the Bridge and U-MIDAS models capture the negative shocks from both COVID-19 waves in both direction and intensity. The forecast gaps of the Bridge and U-MIDAS models are much smaller than that of the COMB model (the best-performing benchmark).

Figure 2

Realistic Forecast Evaluation: Bridge and U-MIDAS Models

Section 8. Other Country Examples

We also implemented the proposed approach in five additional countries to further demonstrate the efficacy of the AS-ARIMAX procedure. The countries selected (Argentina, Australia, South Africa, the United Kingdom, and the United States) are diverse in terms of geographic location and income level.

For each country, we pre-selected 30 indicators covering the external environment, surveys, consumption, financial conditions, trade, labor markets, and production, all readily available and released after the latest published GDP figure. We then apply the AS-ARIMAX procedure to determine the optimal baseline model. The resulting nowcasting models are compared with a univariate autoregressive AR(1) model and with EViews’ combinatorial variable selection approach.

Table 12 reports the realistic out-of-sample forecast evaluation results (RMSE) for the benchmark and nowcasting models. The gray shaded area indicates the best model based on RMSE. The results are largely consistent with those of the India example: the Bridge and U-MIDAS models outperform both benchmark models. Compared with the combinatorial model, the U-MIDAS model reduces the out-of-sample RMSE by sixty-seven percent on average across the five countries (see Annex III for a detailed report on each country).

Table 12

Forecast Evaluation Comparison for Other Countries: RMSE

*Accuracy gain of the best nowcasting model relative to the best benchmark model

Section 9. Conclusion

This paper focuses on how to choose the best nowcasting model given a set of exogenous indicators to select from. The AS-ARIMAX model selection procedure proposed in this paper ensures the inclusion of significant ARIMA terms in the model, while also assessing three critical conditions for the explanatory variables that are likely to improve forecasting/nowcasting performance. We show that, compared with more traditional approaches, the AS-ARIMAX approach yields reliable nowcasts in both direction and intensity during the COVID-19 crisis period. Using Indian real GDP data, we show that the AS-ARIMAX selection procedure reduces the RMSE by at least twenty-six percent compared with the three competing benchmark methods. We also verify its effectiveness for five other countries, showing that the AS-ARIMAX selection procedure reduces the RMSE of the baseline model by an average of sixty-seven percent compared with two benchmark forecasting models. Given these forecasting gains, the effectiveness of the AS-ARIMAX approach for other macroeconomic variables (e.g., the inflation rate) will be assessed in future work.

Annex I. AS-ARIMAX Procedure Charts

Annex II. Candidate Indicators and Three Main Attributes

Note: In the “Frequency Conversion” column, the stock vs. flow nature of each variable determines how monthly variables are converted to quarterly frequency: “Sum observations” is applied to flow variables and “Average observations” to stock/index variables. In the “Seasonal Adjustment” column, the seasonality of each variable determines whether the X-13 seasonal adjustment procedure is applied; “NSA” means the series is not seasonally adjusted and needs to be adjusted. In the “Coefficient Sign” column, the expected sign of the coefficient is entered based on an economic prior about the relationship between the regressor and the target variable: “-1” means a negative coefficient is expected, and “1” means a positive coefficient is expected.
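The frequency-conversion rule in the note above (sum for flow variables, average for stock/index variables) can be sketched as follows. This is a minimal illustration that assumes the monthly series starts in the first month of a quarter and covers complete quarters; the function name and the `kind` labels are hypothetical, and in EViews the same conversions correspond to the “Sum observations” and “Average observations” options.

```python
def to_quarterly(monthly, kind):
    """Convert a monthly series to quarterly frequency.

    kind="flow"             -> sum of the three months (e.g., expenditure, sales)
    kind="stock" or "index" -> average of the three months (e.g., production indexes)
    """
    if len(monthly) % 3:
        raise ValueError("the series must cover complete quarters")
    quarters = [monthly[i:i + 3] for i in range(0, len(monthly), 3)]
    if kind == "flow":
        return [sum(q) for q in quarters]
    if kind in ("stock", "index"):
        return [sum(q) / 3 for q in quarters]
    raise ValueError(f"unknown kind: {kind}")


monthly = [1, 2, 3, 4, 5, 6]
print(to_quarterly(monthly, "flow"))   # [6, 15]
print(to_quarterly(monthly, "index"))  # [2.0, 5.0]
```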

Annex III. Other Country Examples

1. Argentina:

1) Data

2) Selected Baseline Model

Note: GVI is Argentina: Economic Activity Indicator (SA, 2004=100).

3) Realistic Forecast Evaluation (out-of-sample): Nowcasting Model vs. Benchmarks

2. Australia:

1) Data

2) Selected Baseline Model

Note: AUSHPT is Australia: Tourist Arrival (NSA, Persons); NVKCO is Australia: Retail Turnover: Clothing, Footwear & Personal Accessory (SA, Mil. A$); CVRT is Australia: New Motor Vehicle Sales: Passenger Vehicles (SA, Units)

3) Realistic Forecast Evaluation (out-of-sample): Nowcasting Model vs. Benchmarks

3. South Africa:

1) Data

2) Selected Baseline Model

Note: H199SRO is South Africa: Retail Sales: All Other Retailers (SA, Mil. Rand); VLC is South Africa: Business Cycles: Coincident Indicator (SA, 2015=100); TRS is South Africa: Retail Sales: Current Prices (SA, Mil. Rand); CE is South Africa: Electricity Consumed (SA, Gigawatt Hours).

3) Realistic Forecast Evaluation (out-of-sample): Nowcasting Model vs. Benchmarks

4. United Kingdom:

1) Data

2) Selected Baseline Model

Note: SDM is UK: Industrial Production: Manufacturing (SA, 2019=100); HCVRT is U.K.: New Passenger Car Registrations (SA, Units); STRS is Great Britain: Retail Sales Volume Index (SA, 2019=100); NXUSV is U.K.: Exchange Rate (Avg, US$/Pound).

3) Realistic Forecast Evaluation (out-of-sample): Nowcasting Model vs. Benchmarks

5. United States:

1) Data

2) Selected Baseline Model

Note: LANAGRA is All Employees: Total Nonfarm (SA, Thous); TSTH is Real Manufacturing & Trade Sales: All Industries (SA, Mil.Chn.2012$); NRST is Retail Sales & Food Services (SA, Mil.$); CEXP is University of Michigan: Consumer Expectations (NSA, Q1–66=100)

3) Realistic Forecast Evaluation (out-of-sample): Nowcasting Model vs. Benchmarks

Bibliography

  • Armesto, M. T., Engemann, K. M., & Owyang, M. T. (2010). Forecasting with Mixed Frequencies. Federal Reserve Bank of St. Louis Review, 521–536.

  • Baffigi, A., Golinelli, R., & Parigi, G. (2004). Bridge Models to Forecast the Euro Area GDP. International Journal of Forecasting, 20(3), 447–460.

  • Bańbura, M., Giannone, D., Modugno, M., & Reichlin, L. (2013). Nowcasting and the Real-Time Data Flow. European Central Bank Working Paper, No 1564.

  • Bok, B., Caratelli, D., Giannone, D., Sbordone, A., & Tambalotti, A. (2017). Macroeconomic Nowcasting and Forecasting with Big Data. New York City: Federal Reserve Bank of New York.

  • Bragoli, D., & Fosten, J. (2017). Nowcasting Indian GDP. Oxford Bulletin of Economics and Statistics, 260–282.

  • Breitung, J., & Hafner, C. (2016). A Simple Model for Now-casting Volatility Series. LIDAM Discussion Papers ISBA 2016035 Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).

  • Granger, C. W. J., & Newbold, P. (1974). Spurious Regressions in Econometrics. Journal of Econometrics, 2(2), 111–120.

  • Camacho, M., & Perez-Quiros, G. (2010). Introducing the Euro-sting: Short-term Indicator of Euro Area Growth. Journal of Applied Econometrics, 25(4).

  • Chipman, J. S. (2014). Gauss-Markov Theorem. International Encyclopedia of Statistical Science, pp. 577–582.

  • Chowdhury, M. Z., & Turin, T. C. (2020). Variable Selection Strategies and its Importance in Clinical Prediction Modelling. Family Medicine and Community Health, e000262. doi:10.1136/.

  • Cimadomo, J., Giannone, D., et al. (2020). Nowcasting with Large Bayesian Vector Autoregressions. Working Paper Series No. 2453. European Central Bank.

  • Foroni, C., Marcellino, M., & Schumacher, C. (2011). U-MIDAS: MIDAS Regressions with Unrestricted Lag Polynomials. Discussion Paper Series 1: Economic Studies, 2011,35, Deutsche Bundesbank.

  • Duarte, P., & Sussmuth, B. (2014). Robust Implementation of a Parsimonious Dynamic Factor Model to Nowcast GDP. CESifo Working Paper, No. 4574.

  • Ellul, R., & Ruisi, G. (2022). Nowcasting the Maltese Economy with a Dynamic Factor Model. CBM Working Papers WP/02/2022, Central Bank of Malta.

  • Foroni, C., & Marcellino, M. (2013). A Survey of Econometric Methods for Mixed Frequency Data. Norges Bank Research Working Paper.

  • Foroni, C., Marcellino, M., & Schumacher, C. (2012). U-MIDAS: MIDAS Regressions with Unrestricted Lag Polynomials. CEPR Discussion Papers, 8828.

  • Ghysels, E., Sinko, A., & Valkanov, R. (2007). MIDAS Regressions: Further Results and New Directions. Econometric Reviews, 26(1), 53–90.

  • Giannone, D., Agrippino, S. M., & Modugno, M. (2013). Nowcasting China Real GDP. CIRANO.

  • Giannone, D., Reichlin, L., & Small, D. (2005). Nowcasting GDP and Inflation: The Real-Time Informational Content of Macroeconomic Data Releases. Washington, DC: Federal Reserve Board.

  • Higgins, P. (2014). GDPNow: A Model for GDP “Nowcasting”. Federal Reserve Bank of Atlanta Working Paper Series, 2014-7.

  • Hopp, D. (2022). Benchmarking Econometric and Machine Learning Methodologies in Nowcasting. UNCTAD Research Paper No. 83.

  • Hyndman, R. J., & Khandakar, Y. (2008). Automatic Time Series Forecasting: The Forecast Package for R. Journal of Statistical Software, 27(3), 1–22. https://doi.org/10.18637/jss.v027.i03.

  • IHS Markit – EViews. (2020). EViews 12 User’s Guide I: Chapter 11. Series. Seal Beach, CA.

  • IHS Markit – EViews. (2022). EViews 12 User’s Guide II: Chapter 22. Regression Variable Selection.

  • Ingenito, R., & Trehan, B. (1996). Using Monthly Data to Predict Quarterly Output. Federal Reserve Bank of San Francisco Economic Review, 3, 3–11.

  • Lewis, D. J., Mertens, K., Stock, J. H., & Trivedi, M. (2020). Measuring Real Activity Using a Weekly Economic Index. New York City: Federal Reserve Bank of New York.

  • Preis, T., & Moat, H. S. (2014). Adaptive Nowcasting of Influenza Outbreaks using Google Searches. R Soc Open Sci, doi: 10.1098/rsos.140095.

  • Smith, G. (2018). Step Away from Stepwise. Journal of Big Data, 5, 32.

  • Stock, J. H., & Watson, M. W. (2002). Macroeconomic Forecasting Using Diffusion Indexes. Journal of Business and Economic Statistics, 20(2), 147–162.

  • Stock, J. H., & Watson, M. W. (2002a). Forecasting Using Principal Components from a Large Number of Predictors. Journal of the American Statistical Association, 97, 1167–1179.

  • United States Census Bureau. (2022, July 11). Retrieved from X-13ARIMA-SEATS Seasonal Adjustment Program: https://www.census.gov/data/software/x13as.html

  • Wallis, K. F. (1986). Forecasting with an Econometric Model: The ‘Ragged Edge’ Problem. Journal of Forecasting, 1–13.

1

The author has developed the EViews code to implement the AS-ARIMAX procedure and would be happy to make it available to interested parties. Please contact Jing Xie (jxie2@imf.org) for such requests.
