Do Commodity Futures Help Forecast Spot Prices?


We assess the spot price forecasting performance of 10 commodity futures at various horizons up to two years and test whether this performance is affected by market conditions. We reject efficient markets based on in-sample tests but, out-of-sample, we find that the forecast from the futures market is hard to beat. We find that the forecasting performance of futures does not depend on the slope of the futures curve, in contrast to the predictions of well-known models of commodity markets. We also find futures' forecasting performance to be invariant to whether prices are in an upswing or downswing, casting doubt on assertions that uninformed investors participating during bull markets impede the price discovery process.



1. Introduction

Why write another paper on the ability of futures to forecast commodity spot prices? There are three main reasons. First, futures prices did a poor job as forecasters during the recent commodity price cycle and it is natural to ask whether we can do better (Figure 1). Second, the issue has yet to be decisively settled. Enhancing the measurement of futures prices, updating the sample period, and broadening the coverage may bring us closer to a definitive answer. Third, forecasting commodity prices is an important, yet often costly, exercise. In particular, for policymakers in countries for which commodity prices can significantly affect the terms of trade, inflation, and poverty levels, obtaining the best possible forecast for commodity prices should be an important priority. Fitting structural or reduced-form models or applying informed judgment to forecast the prices of a wide range of commodities, all of which have different market structures and fundamentals, can be costly and may add no more value than extrapolating from the current price.

Figure 1.

Futures Price Curves and Spot Price Developments, 2005-10

Citation: IMF Working Papers 2011, 254; 10.5089/9781463923891.001.A001

Source: Bloomberg.

This paper provides four contributions to the literature on futures as forecasters of commodity spot prices. First, we provide a careful measure of futures prices that exactly matches the horizon of the subsequent change in spot prices and addresses problems posed by illiquid long-dated contracts. Second, we compare the ability of futures prices (and other candidate models) to provide useful forecasts of spot prices at various horizons stretching out two years. For policymakers, the relevant forecast horizon for commodity prices is typically longer than the standard 3 to 12 months tested in much of the literature. Third, we update and broaden the assessment of futures prices as forecasters, including the years since 2002 when commodity market liquidity has greatly increased. Until now, there have been few studies of forecasting by futures markets for a broad range of commodities. Finally, we assess the forecasting ability of futures prices during different market conditions, defined by the futures curve shape (backwardation and contango) and spot price trends (bull and bear markets). This innovation allows us to provide a stricter test of commodity market efficiency and assess the contention that financial investors—attracted by rising commodity prices—have impeded the price discovery process in futures markets.

The plan of this paper is as follows. Section 2 outlines the models to be estimated and the intuition behind the specifications. Section 3 describes the data. Section 4 outlines the main empirical results and discusses their significance. Section 5 provides some brief concluding remarks.

2. Model and Empirical Test Specification

2.1. Model Specification

This section describes the empirical specifications used in the analysis. The methodology is well-known and we include it here to enhance the clarity of the estimations. We start with a futures pricing model in which the price of a futures contract for a commodity is equal to the discounted value of the expected spot price:


where F(t,T) denotes the futures price in period t with delivery in period T, E^f_t[S(T)] is the expectation in the futures market in period t of the spot price in period T, and ρ is the continuously compounded rational expectations risk premium. (In a market without limits to arbitrage, expectations in spot and futures markets will be the same, but we retain this notation for a useful derivation below.) Equation (1) states that a position that requires no upfront investment should deliver an expected return equal to the risk premium of the investor. Taking logs of (1) obtains a linear relationship between the futures price and expected spot price. From this point on, we will consider a forecast length of k = T − t so that:


In (2), f is the log futures price, ρ is the (assumed constant) risk premium scaled to the length of the forecast horizon k, E^f_t s_{t+k} is the period t expectation in the futures market of the log spot price k periods ahead, and ½ var(s) is the Jensen’s inequality correction. Henceforth, we will ignore the Jensen’s inequality term, although it can equivalently be included in the constant term ρk in the case of homoscedastic variance, an assumption that we maintain in this paper.

Subtracting the current log spot price s_t from both sides of (2) gives an expression which states that the current spread between the futures price and spot price (the basis) is equal to the expected change in the spot price for the period until delivery less the risk premium.


Based on (3), one approach to testing for market efficiency is to estimate the following regression.


The parameter α can be interpreted as the constant component of the risk premium. If the basis provides an unbiased forecast of the future spot price then α = 0, β = 1 and ε_{t+k} has a conditional mean of zero. This regression is typically estimated in the market efficiency literature, including for commodities (e.g., Chinn and Coibion, 2010 and Reeve and Vigfusson, 2011). However, this approach glosses over some important details. In particular, what are the implications of assuming rational expectations? How should we test for rational expectations?
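Collecting the definitions given in this subsection, one way to write equations (1) to (4) is set out below. This is a sketch that is consistent with the verbal descriptions above rather than a reproduction of the original typesetting; in particular, the exponential discounting form in (1) and the placement of the Jensen term in (2) are assumptions.

```latex
\begin{align}
F(t,T) &= e^{-\rho (T-t)}\, E^{f}_{t}\left[S(T)\right] \tag{1} \\
f_{t,t+k} &= E^{f}_{t} s_{t+k} - \rho k + \tfrac{1}{2}\operatorname{var}(s) \tag{2} \\
f_{t,t+k} - s_{t} &= \left(E^{f}_{t} s_{t+k} - s_{t}\right) - \rho k \tag{3} \\
s_{t+k} - s_{t} &= \alpha + \beta\left(f_{t,t+k} - s_{t}\right) + \varepsilon_{t+k} \tag{4}
\end{align}
```

Under this reading, (3) follows from (2) by dropping the Jensen term and subtracting s_t from both sides, and the unbiasedness restrictions α = 0 and β = 1 in (4) correspond to a zero risk premium term ρk.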

Our focus in this paper is on the usefulness of futures as forecasters of spot prices; we are less interested in testing in-sample prediction properties such as efficiency and unbiasedness. Nonetheless, as noted by Clements and Hendry (1998), given that these properties are often considered to be minimum requirements for optimal or rational forecasts (assuming a typical symmetric loss function), this is a good place to begin our assessment of futures prices as forecasters.

It is worth noting at the outset, however, that in-sample tests of efficiency and bias can tell us whether certain models may be useful for forecasting, but they do not evaluate their forecast accuracy. As Granger and Newbold (1977) argue, the distributional properties of forecasts and actual values are almost always different; as a result, a direct comparison of the two is of limited use. More useful forecast evaluation techniques are based on out-of-sample forecast errors. This is the subject of Section 4.

2.2. In-sample rationality tests

We discuss and test three notions of rationality in this paper. First and weakest of these notions is that the futures market does not make persistent in-sample prediction errors and that, over the “long run”, expected errors are zero. In other words, while the prediction errors may exhibit some persistence and serial correlation, they eventually converge to zero. A stronger assumption is that the in-sample futures market prediction error v is an independent serially-uncorrelated white noise process:


Substituting (6) into (2) obtains a linear relationship between the futures price in period t with delivery k periods ahead and the realized spot price in period t+k.


(7) states that the current futures price is a linear function of the realized spot price and that, if f and s are I(1) processes, then the two series must be cointegrated with a cointegrating vector (1,-β), a restriction that can be tested, where the coefficient β generalizes the specification.

The second notion of rationality is weak-form efficiency. This means that the current futures price incorporates all the information useful for in-sample prediction. To assess efficiency, (7) can be rearranged to bring the realized spot price change to the left-hand side. The result is similar to (4), but differs from it in an important way:


where ϕ0 = ρk/β, ϕ1 = 1/β and the residual v is the prediction error from (7). Note that (4) implies one important restriction on (8), namely that the cointegrating vector is (1,-1) so that the coefficient on the lagged spot price level is zero and can therefore be excluded from the relationship. A general representation of (8) that is useful if f and s are I(1) processes is an error-correction specification. This represents the realized spot price change as a process that allows for the presence of lagged price changes as shown below:


In this model, cointegration requires that δ ≠ 0 to ensure that the realized spot price change responds to deviations in the long-run equilibrium. As shown by Beck (1994), weak-form short-run market efficiency requires the following conditions:


This assertion can be checked by rewriting (10) as:


If we reject the null hypothesis (10), then the coefficients on the lagged levels of the spot price and the futures price (i.e., the futures price at period t-k) would help predict the spot price, violating the efficiency condition.

The final and strictest notion of rationality is that the futures market is unbiased. This incorporates weak-form efficiency but also requires that there is no risk premium. In other words, unbiasedness imposes a stricter set of restrictions that are similar to the efficiency condition, with the difference that the coefficient on the futures price in the cointegrating equation is now equal to unity:


We will follow Beck (1994) and test both sets of restrictions, for efficiency (10) and unbiasedness (12) using the error-correction specification (9), but in contrast to her analysis, we use overlapping weekly observations, a wider set of commodities, and longer horizons.

For those commodities for which spot and futures prices are I(1), we assessed the first notion of rationality by conducting cointegration tests. We applied the Engle-Granger procedure to (6) in which the realized spot price is the dependent variable:


The lag length for each test equation of the residuals was determined by the Bayesian information criterion. Given our use of overlapping observations, we control for residual correlation by using Newey-West HAC standard errors with a bandwidth equal to the futures contract horizon (i.e., days to maturity) minus one.

The lag specification K and M for each commodity at each forecast horizon k in (9) is decided on the basis of the Bayesian information criterion. Again, given our use of overlapping observations, we control for residual correlation by using Newey-West HAC standard errors. We assessed efficiency and unbiasedness by applying the Wald test and using the robust estimate of the covariance matrix.

To test for in-sample efficiency, for commodities for which realized spot and futures prices were both I(0) variables, we used an equation in log levels rather than an ECM:


Lag length and robust standard error procedures were the same as described for the ECM approach. In this case, the restrictions implied by efficiency are that ϕj = γl = 0 for j ≥ 1 and l ≥ 0, with the additional restriction of unbiasedness μ = 1.

2.3. Out-of-sample forecasting ability test specifications

In-sample tests tell us something about the “rationality” of futures markets, but they do not indicate their usefulness as forecasters of spot prices. In this section, we describe tests of out-of-sample performance of futures prices as forecasters against a random walk benchmark.

The appropriate metric to assess forecasting ability will depend upon the forecaster’s loss function. For policymaking purposes, this function can and often should be asymmetric. A commodity exporting country would likely be much more concerned about lower-than-expected prices than an upside surprise. This could be because a downside surprise would lead to a shortfall in commodity-related government revenues and a need to increase public borrowing or reduce spending. In contrast, revenues from an upside surprise could be saved. Similar, but opposite, considerations would hold for commodity importers. In practice, it is difficult to specify a loss function and conventional practice has been to assume a symmetric loss function. The optimal forecaster in this case will be that which minimizes the mean squared forecast error (MSFE) and this is the approach we use here.

We compare the forecasting power of futures prices and other candidate models to the random walk without drift benchmark. The use of the random walk “no change” benchmark of forecast accuracy has a long history, stretching at least as far back as Theil (1966). What should our prior be for the relative performance of futures contracts against spot markets? If we assume that current spot and futures prices incorporate the same information that is relevant for forecasting future spot prices, then futures prices should be at least as accurate as spot prices, on average. To see this, consider the cost-of-carry relationship for a financial asset, ignoring coupon or dividend payments, in a market without frictions. This states that the log price of a futures contract at time t which specifies delivery at t+k, denoted by f_{t,t+k}, is equal to the current log spot price s_t plus the continuously compounded constant interest rate r for the period to t+k:


This relationship tends not to hold for commodities for two reasons. Spot and futures prices must take into account the costs of holding physical inventory, e.g., warehousing and insurance, which increases the cost of carry. Also, market participants may hold physical inventory of a commodity for its value as a consumption good, rather than as a financial asset. The benefit that accrues to the inventory holder is often referred to as the marginal convenience yield. We incorporate the physical storage costs, denoted by m, as a constant proportion of the spot price. The inclusion of the convenience yield for the marginal unit of inventory, denoted by ψ, leads to more profound changes. In particular, as the level of current and expected future inventories (denoted by N) falls, the probability of experiencing a physical “stock out” increases, and ψ should rise, at an increasing rate as inventories fall towards their zero bound. (This nonlinearity is a key feature of important theoretical models such as Williams and Wright, 1991.) Incorporating these two features of commodity markets, storage costs and marginal convenience yields, into (15) obtains the arbitrage condition:
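As a sketch of the two relationships described in words above, and assuming that r, m and ψ are all measured over the whole forecast horizon (so that no separate factor k appears, in line with the quantity (r+m−ψ)² used later in this section), the cost-of-carry relationship (15) and the arbitrage condition (16) can be written as follows. The time subscript on ψ is an assumption, used here only to signal that the convenience yield varies with inventories N.

```latex
\begin{align}
f_{t,t+k} &= s_{t} + r \tag{15} \\
f_{t,t+k} &= s_{t} + r + m - \psi_{t} \tag{16}
\end{align}
```

Read this way, the futures curve is upward sloping (contango) when r + m exceeds ψ_t and downward sloping (backwardation) when the marginal convenience yield dominates.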


Just as we can write the asset pricing equation (2) for futures markets, we can do the same for spot markets, assuming that the risk premium in both markets is the same:


In (17), investors holding a position in the spot market must be compensated for the physical cost of carry and the risk premium, less the marginal convenience yield. The forecast “errors” made by the spot price (denoted ε_s) and futures price (denoted ε_f), respectively, are:


Using (2), (17) and (18), we can then write the difference in the squared forecast errors (denoted by dt) made by spot prices and futures prices as:


In most commodity pricing models with arbitrage, expectations in both spot and futures markets are the same and linked by the market for storage (e.g., Pindyck, 2001). In other words, market participants with different forecasts would trade inventories until the difference between the spot and futures prices is explained by (16). This means the final two terms on the right hand side of (19) cancel out and that the squared forecast error of the futures price must always be less than or equal to the squared prediction error of the spot price; that is:


The actual value of (19) will clearly depend upon the quantity (r+m−ψ)². In turn, this should reflect the importance of current market conditions (rather than expectations for the future) in determining current spot prices. When current conditions exert a large influence on spot markets (for example, a period of temporary physical scarcity and low N), ψ is high, the market is backwardated, and the futures price (which is less influenced by current market conditions) should provide, on average, a better forecast than the spot price. (We say on average because any particular observation may be affected by shocks that cause both spot and futures prices to rise or fall.) Conversely, if spot markets are driven by interest rate arbitrage with the forward-looking futures market (typically when inventories are plentiful), ψ must be low, and the forecasts of both spot and futures prices should be relatively close (assuming r and m are not too large). These assertions, particularly that ψ can be relatively large in backwardated markets, are confirmed by the data (Roache and Erbil, 2010).

It follows that a stronger test of (20) is one in which the loss function d is conditioned on the slope of the curve (and implicitly the value of ψ). We estimate a regression in which the loss function d_t is the dependent variable and a constant and dummy variable are independent variables (with the dummy taking a value of 1 when the futures curve is backwardated and 0 otherwise, at the time the “forecast” is made). Given the quantity (r+m−ψ)² is larger, on average, in backwardation, (20) predicts that the coefficient on the dummy variable is negative. In other words, the forecast ability of futures relative to a random walk benchmark should be better in backwardation. We also use our dataset to test the related assertion of Reeve and Vigfusson (2011) that what matters is the difference between the spot and futures price with a dummy variable that takes a value of 1 when the spot price is more than 5 percent higher than the futures price. By definition, ψ is very high during these periods and the spot market is largely driven by current conditions.
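The dummy-variable regression described above can be illustrated with a minimal sketch. It assumes that d_t is defined as the squared futures forecast error minus the squared random walk forecast error, so that negative values favor futures, and that prices are in logs. The function names and the sample numbers are hypothetical, and the sketch returns point estimates only; inference would still require HAC standard errors because of the overlapping observations.

```python
import math

def backwardation_dummy(log_spot, log_futures):
    """1 when the futures price is below the spot price (backwardation), else 0."""
    return [1 if f < s else 0 for s, f in zip(log_spot, log_futures)]

def steep_backwardation_dummy(log_spot, log_futures, threshold=0.05):
    """1 when the spot price is more than `threshold` (5 percent) above the futures price."""
    cutoff = math.log(1.0 + threshold)  # S > 1.05 F is the same as s - f > log(1.05)
    return [1 if s - f > cutoff else 0 for s, f in zip(log_spot, log_futures)]

def dummy_regression(d, dummy):
    """OLS of d on a constant and a 0/1 dummy; returns (intercept, slope).

    The slope equals mean(d | dummy = 1) - mean(d | dummy = 0).
    The dummy must take both values, otherwise the slope is undefined.
    """
    n = len(d)
    mean_x = sum(dummy) / n
    mean_y = sum(d) / n
    sxx = sum((x - mean_x) ** 2 for x in dummy)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dummy, d))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical loss differentials and curve states, for illustration only.
d_example = [-0.02, -0.05, 0.01, -0.04, 0.00, -0.06]
dummy_example = [0, 1, 0, 1, 0, 1]
intercept, slope = dummy_regression(d_example, dummy_example)
print("constant:", intercept, "backwardation coefficient:", slope)
```

A negative coefficient on the dummy in this setup would correspond to futures forecasting relatively better when the curve is backwardated, which is the sign that (20) predicts.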

Our other candidate models include univariate ARIMA(1,1,1) and ARMA(1,1) with and without drift, depending upon the order of integration of the spot price. We also consider an exponential smoother, a futures price with risk premium model, a basis model, an error correction model, and a model in levels that includes lagged spot and futures prices (Table 1). The parameterized models allow us to test the forecasting performance of models that are optimized for in-sample fit.

Table 1.

Candidate and Benchmark Forecasting Models


Each model produces one-step-ahead forecasts; hence the notation t+1 for the realized spot and error terms in Table 1. This means that the lags used in the time series models match the length of the forecast horizon. For example, for the 91 day spot price ARIMA(1,1,1) forecasting model, the lag of the change in the spot price would be for the previous 91 day period, and so on for 182, 364, and 728 days. The exceptions to this approach are the “weekly” ARIMA (labeled as W-ARIMA) and the Holt-Winters exponential smoothing models, which produce k-step-ahead forecasts, with t-j lags representing the variable from j weeks earlier.
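To make the idea of a k-step-ahead forecast from an exponential smoother concrete, the sketch below implements a non-seasonal smoother with a level and a trend (Holt's linear method). Whether the specification in Table 1 includes a seasonal component is not stated here, so the non-seasonal form is an assumption, and the smoothing parameters alpha and beta are hypothetical defaults rather than estimated values.

```python
def holt_forecast(series, k, alpha=0.5, beta=0.1):
    """k-step-ahead forecast from Holt's linear (level plus trend) smoother.

    series : observations in time order (at least two values)
    k      : number of steps ahead to forecast
    alpha  : smoothing weight for the level (hypothetical default)
    beta   : smoothing weight for the trend (hypothetical default)
    """
    level = series[0]
    trend = series[1] - series[0]  # simple initialization from the first change
    for y in series[1:]:
        prev_level = level
        # Update the level toward the new observation.
        level = alpha * y + (1.0 - alpha) * (level + trend)
        # Update the trend toward the latest change in the level.
        trend = beta * (level - prev_level) + (1.0 - beta) * trend
    # Extrapolate the last level along the last trend.
    return level + k * trend

# Hypothetical weekly log prices; forecast 13 weeks (about 91 days) ahead.
weekly_log_prices = [4.00, 4.02, 4.01, 4.05, 4.07, 4.06, 4.10, 4.12]
print(holt_forecast(weekly_log_prices, k=13))
```

Because the forecast extrapolates the most recent smoothed trend, a smoother of this kind can pick up trend persistence that a no-change random walk forecast ignores by construction.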

We test the null hypothesis that the squared forecast errors from the candidate models and the random walk benchmark are equal using the Diebold-Mariano (DM) procedure. When our candidate model is the futures price, a left-tailed rejection of the null would provide evidence in support of (20). This test statistic is:


Here ω̂ is a consistent estimate of the long-run variance of T^{1/2} d̄, where d̄ is the sample mean of the loss differential d_t, that takes account of the serial correlation introduced by overlapping observations. Diebold and Mariano (1995) show that under the null that both predictors are equally accurate, this statistic is asymptotically normally distributed with DM ~ N(0,1). For models that nest the random walk and estimate additional parameters, we also report the results of hypothesis tests based on the adjusted MSFE of Clarke and West (2005). This test takes account of the noise introduced when, under the null hypothesis, parameters that are zero in the population affect the forecasts of models estimated from finite samples.
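The test can be sketched in a few lines. The version below computes the statistic as the mean loss differential divided by the square root of ω̂/T, and estimates ω̂ with a Newey-West (Bartlett kernel) estimator. The choice of kernel is an assumption made for illustration, as is measuring the bandwidth in observations (for example, the number of weekly observations spanned by the forecast horizon minus one). Function names and data are hypothetical.

```python
import math

def long_run_variance(d, bandwidth):
    """Newey-West (Bartlett kernel) estimate of the long-run variance of d."""
    n = len(d)
    mean = sum(d) / n
    dev = [x - mean for x in d]
    lrv = sum(x * x for x in dev) / n  # lag-0 autocovariance
    for lag in range(1, bandwidth + 1):
        gamma = sum(dev[t] * dev[t - lag] for t in range(lag, n)) / n
        weight = 1.0 - lag / (bandwidth + 1)  # Bartlett weights decline linearly
        lrv += 2.0 * weight * gamma
    return lrv

def diebold_mariano(err_candidate, err_benchmark, bandwidth):
    """Return (DM statistic, left-tail p-value) under squared-error loss.

    A negative statistic means the candidate has the smaller mean squared error.
    The statistic is undefined if the loss differential has zero variance.
    """
    d = [ec ** 2 - eb ** 2 for ec, eb in zip(err_candidate, err_benchmark)]
    n = len(d)
    d_bar = sum(d) / n
    omega = long_run_variance(d, bandwidth)
    stat = d_bar / math.sqrt(omega / n)
    p_left = 0.5 * (1.0 + math.erf(stat / math.sqrt(2.0)))  # standard normal CDF
    return stat, p_left

# Hypothetical forecast errors: futures (candidate) versus random walk (benchmark).
futures_err = [0.10, -0.20, 0.15, -0.05, 0.10, -0.10, 0.20, -0.15]
random_walk_err = [0.30, -0.40, 0.35, -0.20, 0.30, -0.35, 0.40, -0.30]
print(diebold_mariano(futures_err, random_walk_err, bandwidth=1))
```

With the futures price as the candidate, a small left-tail p-value is the left-tailed rejection referred to above. The asymptotic normal approximation can be poor in short samples with long overlaps, so this sketch should not be read as a substitute for the small-sample care such a setting requires.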

Our final test is not guided by any firm predictions from theory, although it is related to the “noise trader” perspective of commodity markets that has become increasingly prevalent (Vansteenkiste, 2011). The essence of this story is that financialization—defined as the influx of financial investors into commodity markets—has encouraged the participation of investors in futures markets that either have less information than traditional market participants or ignore fundamentals altogether. There is likely to be less of an effect on spot market activity because these investors often do not have the capacity to hold physical positions (which would involve storage and insurance). The participation of these types of investors is sometimes believed to increase during periods when prices are rising because they trade on price momentum; in other words, the more that prices increase, the greater their expectations of further price gains. In this scenario, we should expect that the forecasting performance of futures markets relative to spot markets should deteriorate during bull markets (i.e., when prices are rising) compared to bear markets (when prices are falling).

We test this hypothesis by estimating a regression in which the variable d is the dependent variable and a constant and a dummy representing bull markets are the independent variables. Bull and bear markets are identified using the Bry-Boschan algorithm that has been used previously to identify turning points in commodity price cycles (Cashin, McDermott, and Scott, 1999). The null hypothesis is that the relative forecasting ability of futures prices is the same in both bull and bear markets, or that the coefficient on the dummy variable is zero.

3. Data

3.1. Overview of data

Our sample of spot and futures prices begins in January 1990, ends in June 2011, and is sampled at a weekly frequency. Although futures prices are available for some commodities before 1990, this strikes a balance between a sufficiently long sample period and breadth of coverage. Prices for futures contracts are taken from Bloomberg. We use the set of the first 24 contracts, ordered by days to delivery. For those commodities with equally-spaced contracts with delivery dates for each month of the year, this implies that the futures curve stretches out for two years. In some cases, the number of traded contracts is less than 24, but the futures curve may stretch out further than two years; this is typically the case for agricultural commodities for which there is not a delivery date each month. In addition, there were insufficient data to undertake analysis at the two year horizon for aluminum and gasoline. Table 3 provides a detailed overview of the futures contracts used in the empirical analysis.

Table 3.

Commodity Price Data: Specifications


3.2. Estimating fitted futures prices

One contribution of this paper is to provide a careful measure of the futures and spot price of each commodity, ensuring to the fullest extent possible that the horizon of the futures price matches that for the corresponding spot price change. Irregularly spaced futures contracts, even for those with delivery for each month of the year, make it difficult to ensure an exact match for high frequency observations. For example, there is no commodity for which there always exists a futures contract with a delivery date exactly 28 days forward that can be sampled at weekly intervals.

We overcome this constraint by estimating a smoothed futures curve for each week with a third order polynomial regression. This will provide a continuous curve from which we will always be able to extract an implied futures price at some exactly specified future date. In other words, at the end of each trading week, we estimate the following cross-sectional regression:


where F is the vector of futures prices for a particular commodity at the close of Friday trading with increasing maturity, γ0 is a constant, and t represents the number of days until delivery for each contract. The constant and the coefficients γ1, γ2, and γ3, are the parameters to be estimated for each commodity each week. To obtain the implied futures price for a notional contract with delivery exactly t1 days forward, we use the estimated coefficients from this equation and substitute in t1 for t. (In our case, we use t1 = 0, 91, 182, 364, and 728.) One result from this approach is that the implied spot price (t=0) equals γ0. This is useful when contract specification differences make it problematic to compare futures to actual spot prices. Another useful result is that it provides an implied price for illiquid long-dated contracts based on the liquid part of the curve rather than non-tradable price quotes.
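The weekly cross-sectional regression can be sketched as follows, assuming the cubic form F = γ0 + γ1 t + γ2 t² + γ3 t³ plus an error term, as implied by the description of a third order polynomial above. The sketch solves the least-squares normal equations directly using only the standard library, and rescales days to years purely for numerical stability (a design choice of this example, not something stated in the text). The contract days and prices are hypothetical.

```python
def fit_cubic_curve(days, prices, scale=364.0):
    """Least-squares fit of price = g0 + g1*t + g2*t^2 + g3*t^3, with t = days / scale."""
    x = [d / scale for d in days]
    n = 4
    # Normal equations (X'X) g = X'y for the design matrix [1, t, t^2, t^3].
    a = [[sum(xi ** (i + j) for xi in x) for j in range(n)] for i in range(n)]
    b = [sum(p * xi ** i for xi, p in zip(x, prices)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    g = [0.0] * n
    for i in reversed(range(n)):
        g[i] = (b[i] - sum(a[i][j] * g[j] for j in range(i + 1, n))) / a[i][i]
    return g

def implied_price(g, days_forward, scale=364.0):
    """Evaluate the fitted curve at an exact horizon; days_forward = 0 gives g[0]."""
    t = days_forward / scale
    return g[0] + g[1] * t + g[2] * t ** 2 + g[3] * t ** 3

# Hypothetical Friday closes for 12 contracts, ordered by days to delivery.
days_to_delivery = [20, 50, 81, 111, 142, 172, 203, 233, 264, 294, 325, 355]
futures_prices = [71.2, 72.0, 72.9, 73.5, 74.0, 74.6, 75.0, 75.3, 75.7, 75.9, 76.2, 76.3]
coefficients = fit_cubic_curve(days_to_delivery, futures_prices)
for horizon in (0, 91, 182, 364):
    print(horizon, implied_price(coefficients, horizon))
```

Evaluating the fitted curve at zero days returns the constant term, which is the implied spot price described above. Horizons beyond the last quoted contract are extrapolations, so the example only evaluates the curve inside the range of its hypothetical data.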

To assess the average fit of (22) we report average R-squared statistics for each commodity (Table 4). These results indicate that for most commodities, a third-order polynomial provides an acceptable fit. The average R-squared for all commodities stands at a respectable 0.8. In some cases, the fit is below 0.8 and this reflects one of two causes. First, for some commodities seasonal factors mean that the curve may at times have more than one local maximum or exhibit a number of inflection points. In almost all of these cases, (22) still does a relatively good job (e.g., corn and wheat). Second, quoted futures prices at the back end of the 2-year curve may be quite illiquid and not fully representative of market conditions. This is true, for example, for gold for which (22) should prove to be a very good fit (given the interest-arbitrage determined curve). Our least-squares method by definition tends to put more weight on the liquid and smooth part of the curve where arbitrage considerations (and actual market conditions) are more important.

Table 4.

Estimated Commodity Futures Price Curves: Measures of Fit

(R-squared statistics) 1/


1/ Average R-squared for all equations per commodity.

4. Empirical Results and Discussion

4.1. In-sample rationality

We first examine the weakest notion of rationality: whether futures markets make persistent in-sample prediction errors. As a preliminary step before testing for cointegration, we found that most spot and futures prices were nonstationary. These results, shown in Table 5, are based on augmented Dickey-Fuller unit root tests with the lag length selected by the Bayesian information criterion. Given our use of overlapping observations, we control for residual correlation by using Newey-West HAC standard errors with a bandwidth equal to the futures contract horizon (i.e., days to maturity) minus one.

Table 5.

Commodity Spot and Futures Prices: Unit Root Tests, Jan-1990 to Jun-2011

(t-statistics for the null hypothesis that the variable is a unit root process)

Source: Authors’ estimates.
1/ Bold figures indicate that the null hypothesis of a unit root was rejected at the 5 percent level using Dickey-Fuller critical values and t-statistics based on robust standard errors.

For those commodities with I(1) spot and futures prices, we were able to reject the null hypothesis that the coefficient ϕ = 0 from equation (13) in all cases (Table 6). In other words, the commodity futures price and the realized spot price are cointegrated and futures markets do not make persistent forecasting errors.

Table 6.

Hypothesis Tests of Cointegration Between Realized Spot and Futures Prices 1/

(Engle-Granger test statistics)

Source: Authors’ calculations.

Based on Engle-Granger tests of the residuals from equation # with the number of lags selected by Bayesian information criteria. MacKinnon (1991) critical values.

Wald test p-values for the null of efficiency (10) and the joint null of efficiency and unbiasedness (12) are shown in Table 7. The efficient market hypothesis was rejected for most commodities and at most horizons. In particular, at the 91 and 182 day horizons, efficiency is not rejected at the 5 percent significance level for fewer than half of the commodities. We can also rule out efficiency for all commodities at the one and two year horizons except for cotton.

Table 7.

Hypothesis Tests for the Efficiency and Unbiasedness of Futures Markets

(null hypothesis that market is efficient and unbiased, p-values)

Source: Authors’ calculations.

Why was rationality, defined by efficiency and unbiasedness, so easy to reject? In most cases, lagged values of spot and futures prices provide significant in-sample explanatory power for realized spot prices. This finding was particularly true for the longer horizons of one and two years. Interpreting the coefficients on lags (which varied widely in sign and size across commodities) is difficult, largely due to the high degree of multicollinearity between the lags of realized spot prices and futures prices. One clear result, however, is that the Bayesian information criterion typically selected quite long lag lengths for the test equations, often of five or six periods, especially for horizons beyond 3 months. In other words, the evolution of spot prices seems to depend not only on lagged spot and futures prices, but on lags that stretch back some way into the past. The signs on these coefficients varied by commodity, which rules out a simple interpretation such as long-run mean reversion.

4.2. Out-of-sample forecasting

Root mean square errors (RMSEs) for each of the candidate models and the random walk benchmark are presented in Table 8. One striking result is the large size of these errors for all of the approaches considered, confirming the extent to which commodity prices are volatile and difficult to forecast. For example, the RMSEs for crude oil, copper, and corn futures (three of the most liquid contracts in the energy, metals, and food groups, respectively) range from 15 to 19 percent at the three month horizon. Absolute forecasting performance, in most cases, deteriorates with the length of the horizon; for example, for the same three contracts the RMSEs ranged from 31 to 45 percent at the two year horizon.
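For readers who want to reproduce this kind of comparison, the sketch below shows how RMSEs for the futures forecast and the no-change random walk forecast can be computed from log prices. It assumes that `log_futures[t]` holds the log of the fitted futures price observed at t for delivery k periods later, and that errors are measured in logs (so that multiplying by 100 gives approximate percentages). The names and numbers are hypothetical.

```python
import math

def forecast_errors(log_spot, log_futures, k):
    """Errors of forecasts made at t for the log spot price at t + k.

    Returns (random_walk_errors, futures_errors), each of length len(log_spot) - k.
    """
    n = len(log_spot) - k
    random_walk = [log_spot[t + k] - log_spot[t] for t in range(n)]
    futures = [log_spot[t + k] - log_futures[t] for t in range(n)]
    return random_walk, futures

def rmse(errors):
    """Root mean square error of a list of forecast errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical log prices at the forecast frequency (k = 1 period ahead).
log_spot = [4.00, 4.10, 4.05, 4.20, 4.15, 4.30]
log_futures = [4.04, 4.09, 4.10, 4.18, 4.20, 4.28]
rw_err, fut_err = forecast_errors(log_spot, log_futures, k=1)
print("random walk RMSE (percent):", 100 * rmse(rw_err))
print("futures RMSE (percent):", 100 * rmse(fut_err))
```

The two error series produced here are the same inputs that a Diebold-Mariano comparison of futures against the random walk would use.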

Table 8.

Root Mean Square Errors, 1990-2011



The second notable result from the RMSEs is that the futures price and random walk models appear to outperform all of the other models, particularly for horizons from 91 to 364 days. The futures price and random walk RMSEs averaged across all 10 commodities are 16.1 percent and 16.4 percent, respectively, at the 91 day horizon. In contrast, the RMSEs for the time series models are about 9 percentage points higher and for the models producing k-step ahead forecasts (including the exponential smoother and the weekly ARIMA) about 2 percentage points higher. These gaps only narrow significantly at the 2 year horizon. The one exception to this is the good performance of the Holt-Winters exponential smoother at the longer horizons, suggesting some trend persistence that is not picked up by other models.

In Figure 2, we show extended results for crude oil. These include a larger range of models (specifically ARIMA and ARMA without drift) and extend the forecast horizon to 3 years (or 1092 days). The deterioration in the forecasting performance of futures as the forecast horizon lengthens is particularly clear from this analysis.

Figure 2.

Root Mean Square Errors, 1990-2011 (percent)

Citation: IMF Working Papers 2011, 254; 10.5089/9781463923891.001.A001

Hypothesis tests of out-of-sample forecast accuracy are presented in Table 9. The null hypothesis in these tests is that there is no difference in forecasting accuracy between the candidate model and the benchmark. On the basis of DM tests, it was possible to reject the null and conclude that the random walk is a better forecaster than most of the naïve reduced form models for the majority of commodities in our sample. Taking into account parameter uncertainty and on the basis of Clarke and West adjusted mean squared errors, however, the tests were less conclusive. In most cases, it was not possible to reject the null, and ARIMA specifications also appear to outperform the random walk, especially at longer horizons for some agricultural commodities.
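The two test statistics can be sketched as follows. This is an illustrative implementation under stated assumptions, not the procedure used to produce Table 9: it assumes squared-error loss, a Bartlett-kernel (Newey-West) long-run variance, and asymptotic normal critical values. The function names and the `lags` parameter are hypothetical; with overlapping h-step forecasts a common choice is `lags = h - 1`.

```python
import math
import numpy as np

def long_run_variance(x, lags):
    """Newey-West long-run variance of x using Bartlett weights."""
    x = np.asarray(x, dtype=float)
    u = x - x.mean()
    lrv = np.dot(u, u) / x.size
    for k in range(1, lags + 1):
        weight = 1.0 - k / (lags + 1.0)
        lrv += 2.0 * weight * np.dot(u[k:], u[:-k]) / x.size
    return lrv

def normal_upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def dm_statistic(e_model, e_bench, lags=0):
    """Equal-accuracy test on squared errors; negative values favor the model."""
    d = np.asarray(e_model, dtype=float) ** 2 - np.asarray(e_bench, dtype=float) ** 2
    stat = d.mean() / math.sqrt(long_run_variance(d, lags) / d.size)
    return stat, 2.0 * normal_upper_tail(abs(stat))  # two-sided p-value

def adjusted_mse_statistic(y, f_bench, f_model, lags=0):
    """Adjusted mean squared error comparison for a benchmark nested in the model.

    The adjustment adds back the squared gap between the two forecasts, which
    offsets the noise from estimating the larger model's extra parameters.
    Positive values favor the larger model; the p-value is one-sided.
    """
    y, f_bench, f_model = (np.asarray(a, dtype=float) for a in (y, f_bench, f_model))
    adj = (y - f_bench) ** 2 - ((y - f_model) ** 2 - (f_bench - f_model) ** 2)
    stat = adj.mean() / math.sqrt(long_run_variance(adj, lags) / adj.size)
    return stat, normal_upper_tail(stat)

# Hypothetical forecast errors (not data from the paper).
stat, p_value = dm_statistic([1.0, -1.0, 2.0, -2.0], [2.0, -2.0, 3.0, -3.0])
print(f"DM statistic: {stat:.2f}, p-value: {p_value:.4f}")
```

The unadjusted statistic penalizes a larger model for the sampling noise in its estimated parameters, which is why the two tests can disagree when the benchmark is nested in the candidate model.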

Table 9.

Relative Forecast Evaluation: Diebold-Mariano and Clarke-West Test Statistics


Clearer results were obtained for futures prices. In almost all cases, the futures price outperformed the random walk, although statistical significance (at the 90 percent and 95 percent levels) of this result was less common. Futures prices did better at shorter horizons and became less accurate relative to a random walk at the two-year horizon. This suggests that lower liquidity at the back end of the curve may be impeding the price discovery process and reducing the usefulness of futures as forecasters, particularly relative to the Holt-Winters smoother. Time series models, on the other hand, did much worse than the random walk.

To illustrate some of these results in more detail, we focus on crude oil. Futures prices forecast better than a random walk at all horizons up to 2 years, although this result is statistically significant (at the 10 percent level) only at the 91 day horizon. Notwithstanding high liquidity in crude oil markets, the back end of the futures curve underperforms just as it does with many other commodities. One striking result is that, based on the DM test, we can reject the null at the 2 year and 3 year horizons and conclude that the exponential smoother is a better forecaster than the random walk at these longer horizons. It is beyond the scope of our paper to understand the characteristics of commodity price dynamics and the smoother that produces this result, but medium-term reversion to persistent trends in prices seems to be playing some role.

Table 10 shows the results from a conditional test of (20) that the futures price should be a better forecaster during periods when the market is backwardated (as discussed in section 2.2). The figures are the coefficients on a dummy variable which takes a value of 1 when the market is backwardated in a regression in which the dependent variable is the loss differential d. (19) predicts that this coefficient will be negative. In most cases, we cannot reject the null hypothesis that both prices are equally good forecasters in both contango and backwardation. In some cases, notably crude oil, we find that futures prices are worse forecasters in backwardated markets. These results have important implications for forecasters. In particular, they caution against disregarding spot prices when making forecasts in a backwardated (and tight) market.
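The test regression can be written compactly as follows. This is a sketch under two assumptions that are not restated in this section: that the loss is squared forecast error, and that d is the futures loss minus the random walk loss, which is the sign convention under which a negative coefficient means futures do relatively better in backwardation. The symbols S, F, D, α, β, and ε are illustrative notation and may differ from that of section 2.2.

```latex
d_{t} = \left(S_{t+h} - F_{t,h}\right)^{2} - \left(S_{t+h} - S_{t}\right)^{2},
\qquad
d_{t} = \alpha + \beta D_{t} + \varepsilon_{t},
\qquad
D_{t} =
\begin{cases}
1 & \text{if } S_{t} > F_{t,h} \quad \text{(backwardation)} \\
0 & \text{otherwise (contango)}
\end{cases}
```

Here S_t is the spot price on the forecast date, F_{t,h} is the futures price for delivery h periods ahead, and S_{t+h} is the realized spot price. Under this convention α is the average loss differential in contango and α + β is the average in backwardation, so the coefficient reported in Table 10 corresponds to β.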

Table 10.

Relative Forecasts of Futures Prices During Contango and Backwardation 1/

(coefficient on dummy variable = 1 in backwardation)


Robust standard errors in parentheses. Figures in bold significant at the 95 percent level.

We also tested the relative performance of futures when the market is in significant backwardation, which we define here as a spot price more than 5 percent higher than the corresponding futures price. (We use 2 percent in the case of corn due to insufficient observations at the higher threshold. We also had to drop aluminum and gold from this analysis as there are very few occurrences of such extreme backwardation in these markets.) The results in Table 10 show that futures are somewhat better forecasters relative to a random walk in extreme backwardation, but this is not a consistent result across commodities or forecast horizons, in contrast to the findings of Reeve and Vigfusson (2011).
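A minimal sketch of this conditional test is given below. It builds the significant-backwardation dummy from the 5 percent threshold described above and regresses a loss differential on a constant and the dummy. The White (HC0) standard errors are an assumption; this section states only that robust standard errors were used, and with overlapping forecast horizons a HAC estimator may be more appropriate. The function names and all data values are hypothetical.

```python
import numpy as np

def significant_backwardation(spot, futures, threshold=0.05):
    """Return 1.0 where spot exceeds futures by more than `threshold`, else 0.0."""
    spot = np.asarray(spot, dtype=float)
    futures = np.asarray(futures, dtype=float)
    return (spot > futures * (1.0 + threshold)).astype(float)

def dummy_regression(d, dummy):
    """OLS of d on a constant and a dummy, with White (HC0) standard errors."""
    d = np.asarray(d, dtype=float)
    X = np.column_stack([np.ones_like(d), np.asarray(dummy, dtype=float)])
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ d
    resid = d - X @ beta
    meat = X.T @ (X * resid[:, None] ** 2)   # sum of e_i^2 * x_i x_i'
    cov = xtx_inv @ meat @ xtx_inv           # sandwich covariance estimator
    return beta, np.sqrt(np.diag(cov))

# Hypothetical prices and loss differentials (not data from the paper).
spot = np.array([100.0, 112.0, 95.0, 130.0, 101.0, 120.0])
futures = np.array([101.0, 104.0, 97.0, 118.0, 100.0, 110.0])
d = np.array([0.4, -1.2, 0.1, -0.8, 0.3, -1.0])

dummy = significant_backwardation(spot, futures)
beta, robust_se = dummy_regression(d, dummy)
print(f"dummy coefficient: {beta[1]:.3f} (robust s.e. {robust_se[1]:.3f})")
```

Passing `threshold=0.02` reproduces the lower cutoff used for corn. The dummy coefficient equals the difference between the mean loss differential in significantly backwardated periods and the mean in all other periods.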

What can explain the results in Table 10? One possible reason is that commodity markets are segmented and that spot market participants make better forecasts than their futures market counterparts (and offset the “error” caused by the cost of carry as in (19)). Segmentation is likely to be true, but it does not explain why market participants cannot arbitrage both markets; for example, spot market participants should sell spot and buy futures if they expect prices to rise in a backwardated market. Our inability to reject the null, particularly when conditioning on curve shape, is a puzzle when considered against conventional commodity pricing models.

Table 11 shows the results from the test of the null hypothesis that the forecasting performance of futures prices relative to a random walk does not depend on whether spot prices are in a bull or bear market (i.e., there is no difference in the loss function d in either case). Proponents of the view that financial speculation during bull markets leads to futures prices being driven away from fundamental levels should anticipate that futures prices do a worse job of forecasting during these periods. In Table 11, this would mean that the coefficients are positive (see section 2.2 for details).

Table 11.

Relative Forecasts of Futures Prices During Bull and Bear Markets 1/

(coefficient on dummy variable=1 in bull market)


Robust standard errors in parentheses. Figures in bold significant at the 95 percent level.

We find very little evidence to support this view. In most cases, the relative forecasting ability of futures prices does not depend on the phase of the market (the exceptions are the 1-year crude oil and gasoline contracts and near-dated wheat contracts). One explanation for this result is that speculation affects both spot and futures markets and this is likely to be partially true. But at the same time, during a bull market, this would require a large rise in inventories as spot market participants anticipate persistent increases in prices. The evidence of much of the last decade—in which inventories declined as prices rose for many commodities—is inconsistent with this story. We conclude that the results in Table 11 undermine the view that index investing (which takes positions across a basket of commodities) distorts the price discovery process in futures markets.

5. Conclusion

We arrive at four main conclusions regarding the forecasting performance of commodity futures prices in this paper. First, futures price-based forecasts are hard to beat. Futures prices perform at least as well as a random walk for most commodities and at most horizons and, in some cases, do significantly better. But the second result is that the failure of futures prices to clearly (and statistically significantly) outperform this benchmark in almost all cases is a puzzle. The spot price reflects the cost of carry and is more influenced by current physical market conditions (and less by expectations of the future) than is the futures price. In the absence of constraints on arbitrage, this should mean that futures prices outperform the random walk, on average. Third, many other naïve time series models, including some that maximize in-sample fit, tend to do much worse than a random walk. Parameter instability renders many time-series models useless, at best. Fourth, the relative forecasting ability of futures prices deteriorates as the forecast horizon lengthens, which likely reflects lower liquidity at the back end of futures curves.

We also assessed the forecasting performance of futures prices relative to a random walk during different market conditions. Theory predicts that futures prices should do much better than the random walk when the market is in backwardation because the influence of current market conditions on spot prices is particularly strong during these periods. However, we do not find a significant difference in forecasting ability between periods of backwardation and contango. This result holds even when spot prices are significantly above futures prices, in strong backwardation. What can explain this result? Over small sample periods it is possible that permanent shocks that increase prices could lead to better “forecasts” by spot prices. But over long periods and assuming a symmetric distribution of shocks, this cannot be the explanation. Segmented markets could also explain this result, with backwardated markets reflecting different and better information in the spot market about future spot prices than futures markets. But this would require strict and unrealistic limits to arbitrage; what would prevent spot market participants from simply buying cheaper futures contracts? We believe this apparent puzzle would benefit from further research.

We also do not find a significant difference in the forecasting ability of futures markets during bull and bear markets, defined as when spot prices are trending higher or lower. This new result suggests that the recent period of financialization has not distorted the futures price discovery process.


  • Beck, Stacie E., 1994, “Cointegration and Market Efficiency in Commodities Futures Markets,” Applied Economics, Vol. 26, pp. 249-257.

  • Cashin, Paul Anthony, C. John McDermott and Alasdair M. Scott, 1999, “Booms and Slumps in World Commodity Prices,” IMF Working Paper, No. 99/155.

  • Clements, Michael and David Hendry, 1998, Forecasting Economic Time Series, Cambridge University Press, Cambridge.

  • Chinn, Menzie D., and Olivier Coibion, 2010, “The Predictive Content of Commodity Futures,” NBER Working Paper, No. 15830.

  • Clarke, Todd E., and Kenneth D. West, 2007, “Approximately Normal Tests for Equal Predictive Accuracy in Nested Models,” Journal of Econometrics, Vol. 138, Issue 1, pp. 291-311.

  • Diebold, Francis X., and Roberto S. Mariano, 1995, “Comparing Predictive Accuracy,” Journal of Business & Economic Statistics, Vol. 13, No. 3, pp. 253-263.

  • Granger, C.W.J., and Paul Newbold, 1977, Forecasting Economic Time Series, Academic Press, New York.

  • Pindyck, Robert S., 2001, “The Dynamics of Commodity Spot and Futures Markets: A Primer,” Energy Journal, Vol. 22, No. 3, pp. 1-29.

  • Reeve, Trevor A., and Robert J. Vigfusson, 2011, “Evaluating the Forecasting Performance of Commodity Futures Prices,” International Finance Discussion Papers, No. 1025.

  • Roache, Shaun and Neşe Erbil, 2010, “How Commodity Price Curves and Inventories React to a Short-Run Scarcity Shock,” IMF Working Paper, No. 10/222.

  • Theil, H., 1966, Applied Economic Forecasting, North-Holland, Amsterdam.

  • Vansteenkiste, Isabel, 2011, “What is Driving Oil Futures Prices? Fundamentals Versus Speculation,” ECB Working Paper, No. 1371.

  • Williams, J.C., and B.D. Wright, 1991, Storage and Commodity Markets, Cambridge University Press.


We would like to thank Thomas Helbling, Paul Cashin, Menzie Chinn, and colleagues in the Research Department for helpful comments and advice. The usual disclaimer applies.

Do Commodity Futures Help Forecast Spot Prices?
Author: Mr. David A Reichsfeld and Mr. Shaun K. Roache