11 ERRORS AND BIAS
- International Monetary Fund
- Published Date: August 2004
11.1 This chapter discusses the general types of potential error to which all price indices are subject. The literature on consumer price indices (CPIs) discusses these errors from two perspectives, and this chapter presents the two perspectives in turn. First, the chapter describes the sources of sampling and non-sampling error that arise in estimating a population CPI from a sample of observed prices. Second, the chapter reviews the arguments made in numerous recent studies that attribute bias to CPIs as a result of insufficiently accurate treatment of quality change, consumer substitution and other factors. It should be emphasized that many of the underlying issues discussed here are dealt with in much greater detail elsewhere in the manual.
Types of error
11.2 One of the main objectives of a sample survey is to compute estimates of population characteristics. Such estimates will never be exactly equal to the population characteristics. There will always be some error. Table 11.1 gives a taxonomy of the different types of error. See also Balk and Kersten (1986) and Dalen (1995) for overviews of the various sources of stochastic and non-stochastic errors experienced in calculating a CPI. Two broad categories can be distinguished: sampling errors and non-sampling errors.
11.3 Sampling errors are due to the fact that an estimated CPI is based on samples and not on a complete enumeration of the populations involved. Sampling errors vanish if observations cover the complete population. As mentioned in previous chapters, statistical offices usually adopt a fixed weight price index as the object of estimation. A fixed weight index can be seen as a weighted average of partial indices of commodity groups, with weights being expenditure shares. The estimation procedures that most statistical offices apply to a CPI involve different kinds of samples. The most important kinds are:
–for each commodity group, a sample of commodities to calculate the partial price index of the commodity group;
–for each commodity, a sample of outlets to calculate the elementary price index of the commodity from individual price observations;
–a sample of households needed for the estimation of the average expenditure shares of the commodity groups. (Some countries use data from national accounts instead of a household expenditure survey to obtain the expenditure shares.)
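The fixed weight structure described in paragraph 11.3 can be sketched in a few lines of Python. The commodity groups, expenditure shares and partial indices below are hypothetical values chosen for illustration only; they are not taken from any actual CPI.

```python
# Hypothetical expenditure shares (must sum to 1) and partial price
# indices (base period = 100) for three illustrative commodity groups.
expenditure_shares = {"food": 0.25, "housing": 0.50, "transport": 0.25}
partial_indices = {"food": 104.0, "housing": 102.0, "transport": 110.0}


def fixed_weight_index(shares, indices):
    """Weighted arithmetic average of partial indices, weighted by expenditure shares."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("expenditure shares must sum to 1")
    return sum(shares[group] * indices[group] for group in shares)


cpi = fixed_weight_index(expenditure_shares, partial_indices)
print(cpi)  # 104.5
```

In practice each of the three inputs is itself an estimate: the partial indices come from samples of commodities and outlets, and the shares from a sample of households, so each contributes its own sampling error to the result.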
11.4 The sampling error can be split into a selection error and an estimation error. A selection error occurs when the actual selection probabilities deviate from the selection probabilities as specified in the sample design. The estimation error denotes the effect caused by using a sample based on a random selection procedure. Every new selection of a sample will result in different elements, and thus in a possibly different value of the estimator.
11.5 Non-sampling errors may occur even when the whole population is observed. They can be subdivided into observation errors and non-observation errors. Observation errors are the errors made during the process of obtaining and recording the basic observations or responses.
11.6 Overcoverage means that some elements are included in the survey which do not belong to the target population. For outlets, statistical offices usually have inadequate sampling frames. In some countries, for instance, a business register is used as the sampling frame for outlets. In such a register, outlets are classified according to major activity. The register thus usually exhibits extensive overcoverage, because it contains numerous outlets which are out of scope from the CPI perspective (e.g. firms that sell to businesses rather than to households). In addition, there is usually no detailed information on all the commodities sold by an outlet, so it is possible that a sampled outlet may turn out not to sell a particular commodity at all.
11.7 Response errors in a household expenditure survey or price survey occur when the respondent does not understand the question, or does not want to give the right answer, or when the interviewer or price collector makes an error in recording the answer. In household expenditure surveys, for example, households appear to systematically underreport expenditures on commodity groups such as tobacco and alcoholic beverages. In most countries, the main price collection method is by persons who regularly visit outlets. They may return with prices of unwanted commodities.
11.8 The price data are processed in different stages, such as coding, entry, transfer and editing (control and correction). At each step mistakes, so-called processing errors, may occur. For example, at the outlets the price collectors write down the prices on paper forms. After the collectors have returned home, a computer is used as the input and transmission medium for the price information. It is clear that this way of processing prices is susceptible to errors.
11.9 Non-observation errors are made when the intended measurements cannot be carried out. Undercoverage occurs when elements in the target population do not appear in the sampling frame. The sampling frame of outlets can have undercoverage, which means that some outlets where relevant commodities are purchased cannot be contacted. Some statistical offices appear to exclude mail order firms and non-food market stalls from their outlet sampling frame.
11.10 Another non-observation error is non-response. Non-response errors may arise from the failure to obtain the required information in a timely manner from all the units selected in the sample. A distinction can be drawn between total and partial (or item) non-response. Total non-response occurs when selected outlets cannot be contacted or refuse to participate in the price survey. Another instance of total non-response occurs when mail questionnaires and collection forms are returned by the respondent and the price collector, respectively, after the deadline for processing has passed. Mail questionnaires and collection forms that are only partially filled in are examples of partial non-response. If the price changes of the non-responding outlets differ from those of the responding outlets, the results of the price survey will be biased.
11.11 Total and partial non-response may also be encountered in a household expenditure survey. Total non-response occurs when households drawn in the sample refuse to cooperate. Partial non-response occurs, for instance, when certain households refuse to give information about their expenditure on certain commodity groups.
Measuring error and bias
Estimation of variance
11.12 The variance estimator depends on both the chosen estimator of a CPI and the sampling design. Boon (1998) gives an overview of the sampling methods that are applied in the compilation of CPIs by various European statistical institutes. It appeared that only four of them use some sort of probability techniques for outlet selection, and only one uses probability sampling for item selection. In the absence of probability techniques, so-called judgemental and cut-off selection methods are applied.
11.13 In view of the complexity of the (partially connected) sample designs in compiling a CPI, an integrated approach to variance estimation appears to be problematic. That is, it appears to be difficult to present a single formula for measuring the variance of a CPI, which captures all sources of sampling error. It is, however, feasible to develop partial (or conditional) measures, in which only the effect of a single source of variability is quantified. For instance, Balk and Kersten (1986) calculated the variance of a CPI resulting from the sampling variability of the household expenditure survey, conditional on the assumption that the partial price indices are known with certainty. Ideally, all the conditional sampling errors should be put together in a unifying framework in order to assess the relative importance of the various sources of error. Under rather restrictive assumptions, Balk (1989a) derived an integrated framework for the overall sampling error of a CPI.
11.14 There are various procedures for trying to estimate the sampling variance of a CPI. Design-based variance estimators (that is, variances of Horvitz-Thompson estimators) can be used, in combination with Taylor linearization procedures, for sampling errors arising from a probability sampling design. For instance, assuming a cross-classified sampling design, in which samples of commodities and outlets are drawn independently from a two-dimensional population, with probabilities proportional to size (PPS) in both dimensions, a design-based variance formula can be derived. In this way Dalen and Ohlsson (1995) found that the sampling error for a 12-month change of the all-commodity Swedish CPI was of the order of 0.1-0.2 per cent.
11.15 The main problem with non-probability sampling is that there is no theoretically acceptable way of knowing whether the dispersion in the sample data accurately reflects the dispersion in the population. It is then necessary to fall back on approximation techniques for variance estimation. One such technique is quasi-randomization (see Sarndal, Swensson and Wretman (1992, p. 574)), in which assumptions are made about the probabilities of sampling commodities and outlets. The problem with this method is that it is difficult to find a probability model that adequately approximates the method actually used for outlet and item selection. Another possibility is to use a replication method, such as the method of random groups, balanced half-samples, jackknife, or bootstrap. This is a completely non-parametric class of methods to estimate sampling distributions and standard errors. Each replication method works by drawing a large number of sub-samples from the given sample. From each sub-sample the parameter of interest can be estimated. Under rather weak conditions, it can be shown that the distribution of the resulting estimates approximates the sampling distribution of the original estimator. For more details on the replication methods see Sarndal, Swensson and Wretman (1992, pp. 418–445).
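As an illustration of the replication idea in paragraph 11.15, the following minimal sketch applies the delete-one jackknife to an elementary index computed as a geometric mean of price relatives. The price relatives are hypothetical, and the sketch treats the observations as if they were a simple random sample, which is an assumption that a judgemental or cut-off sample does not strictly satisfy.

```python
import math


def jevons(price_relatives):
    """Geometric mean of price relatives (current price / previous price)."""
    logs = [math.log(r) for r in price_relatives]
    return math.exp(sum(logs) / len(logs))


def jackknife_se(price_relatives, estimator):
    """Delete-one jackknife standard error of an estimator."""
    n = len(price_relatives)
    if n < 2:
        raise ValueError("at least two observations are needed")
    # Re-estimate the index n times, each time leaving out one observation.
    replicates = [
        estimator(price_relatives[:i] + price_relatives[i + 1:]) for i in range(n)
    ]
    mean_replicate = sum(replicates) / n
    variance = (n - 1) / n * sum((r - mean_replicate) ** 2 for r in replicates)
    return math.sqrt(variance)


# Hypothetical price relatives observed in eight outlets for one commodity.
relatives = [1.02, 0.98, 1.05, 1.10, 1.00, 0.95, 1.03, 1.07]
point_estimate = jevons(relatives)
standard_error = jackknife_se(relatives, jevons)
print(round(point_estimate, 4), round(standard_error, 4))
```

The same function accepts any other estimator that maps a list of price relatives to an index value, so the choice of the geometric mean here is illustrative only.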
Qualitative descriptions of non-sampling errors
11.16 It is still more difficult to obtain quantitative measures of the non-sampling errors. Thus the use of qualitative indications is the only possibility. For instance, the coverage of the sampling frames as a proxy of the target populations can be addressed (including gaps, duplications and definitional problems). The percentage of the target outlet samples from which responses or usable price data were obtained (i.e. the response rates) can be provided. Any known difference in the prices of responding outlets and non-responding outlets can be described, as can an indication of the method of imputation or estimation used to compensate for non-response. Several categories of non-sampling errors provide the bulk of the bias issues discussed below.
Procedures to minimize errors
11.17 The estimation error can be controlled by means of the sampling design. For example, by increasing the sample size, or by taking selection probabilities proportional to some well-chosen auxiliary variable, the error in the estimated CPI can be reduced. The choice of an adequate sampling design for the CPI is an extremely complex matter. The target population is the set of all goods and services that are acquired, used or paid for by households from outlets in a particular time period. A proper probability sampling procedure selects a sample by a random mechanism in which each good or service in the population has a known probability of selection. In combination with a Horvitz-Thompson estimator, such a probability sampling design will produce an index that is (approximately) unbiased and precise.
11.18 The following three probability sampling designs are used extensively in survey practice: simple random (SI) sampling, probability proportional to size (PPS) sampling, and stratified sampling with SI or PPS sampling per stratum. The advantage of SI sampling is its simplicity; it gives each population element the same probability of being included in the sample. PPS sampling has the advantage that the more important elements have a larger chance of being sampled than the less important ones. For instance, at Statistics Sweden the outlets are selected with probabilities proportional to some proxy for size, namely their number of employees. Unequal probability designs can lead to a substantial variance reduction in comparison with equal probability designs. In stratified sampling, the population is divided into non-overlapping sub-populations called strata. For instance, at the United Kingdom Office for National Statistics the population of outlets is split by outlet type (multiple, independent or specialist) to form different strata. In each stratum a sample is selected according to a certain design. One of the reasons why stratified sampling is so popular is that most of the potential gain in precision of PPS sampling can be captured through stratified selection with SI sampling within well-constructed strata. Stratified sampling is in several aspects simpler than PPS sampling.
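A PPS design combined with a Horvitz-Thompson estimator, as described in paragraphs 11.17 and 11.18, might be sketched as follows. The selection scheme shown is systematic PPS sampling, which is one of several ways to obtain inclusion probabilities proportional to size; the outlet sizes and turnover figures are hypothetical, and the function names are introduced here for illustration only.

```python
import random


def pps_systematic_sample(sizes, n, rng):
    """Select n distinct units with inclusion probability n * size / total."""
    total = sum(sizes)
    step = total / n
    if max(sizes) > step:
        raise ValueError("a unit would have an inclusion probability above 1")
    start = rng.random() * step  # random start in [0, step)
    points = [start + k * step for k in range(n)]
    selected, cumulative, next_point = [], 0.0, 0
    for unit, size in enumerate(sizes):
        cumulative += size
        # A unit is selected when a sampling point falls in its size interval.
        while next_point < n and points[next_point] < cumulative:
            selected.append(unit)
            next_point += 1
    return selected


def horvitz_thompson_total(values, sizes, sample, n):
    """Estimate a population total by weighting each sampled value by 1 / inclusion probability."""
    total_size = sum(sizes)
    return sum(values[i] / (n * sizes[i] / total_size) for i in sample)


# Hypothetical outlet population: number of employees is the size proxy.
employees = [5, 12, 3, 40, 8, 20, 2, 10]
turnover = [16, 35, 10, 118, 25, 61, 5, 30]  # hypothetical variable of interest

rng = random.Random(2004)  # fixed seed so that the run is reproducible
sample = pps_systematic_sample(employees, n=2, rng=rng)
estimate = horvitz_thompson_total(turnover, employees, sample, n=2)
print(sample, round(estimate, 1), sum(turnover))
```

The closer the variable of interest is to being proportional to the size measure, the smaller the variance of the estimate; if the two are exactly proportional, every possible sample returns the true total.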
11.19 Because appropriate sampling frames are lacking, samples are frequently obtained by non-probability methods. Judgemental (or expert choice) sampling is one form of non-random selection. In this case an expert selects certain “typical” elements where data are to be collected. With skill on the part of the expert a fairly good sample might result, but there is no way to be sure. A more sophisticated non-probability method is quota sampling. In quota sampling the population is firstly divided into certain strata. For each stratum, the number (quota) of elements to be included in the sample is fixed. Next the interviewer in the field simply fills the quotas, which means in the case of outlet sampling that the selection of the outlets is ultimately based on the judgement of the price collectors. Another non-probability method is cut-off sampling, which means that a part of the target population is deliberately excluded from the sample selection process. In particular, this procedure is used when the distribution of the value of some auxiliary variable is highly skewed. For instance, a large part of the population may consist of small outlets whose contribution to total sales is modest. A decision may then be taken to exclude from the sampling frame the outlets with the lowest sales. Because the selection is non-random, non-probability methods usually lead to more or less biased estimates. Empirical results of research undertaken by Statistics Netherlands nevertheless show that non-probability selection methods do not necessarily perform worse, in terms of the mean square error, than probability sampling techniques (De Haan, Opperdoes and Schut, 1997).
11.20 Provided that the sampling design is given, the sampling variance of an estimated (all-commodities) CPI can in general be lowered by:
–enlarging the samples of households, commodities and outlets;
–the application of suitable stratifications to the various populations (e.g. grouping commodities with respect to similarity of price changes).
11.21 It is important to allocate optimally the available resources both between and within the different CPI samples, since badly allocated samples may lead to unnecessarily high sampling errors. The Swedish variance estimation results, presented in Dalen and Ohlsson (1995), show that the error resulting from commodity sampling is relatively high compared with the error resulting from outlet sampling. In this case, it is worthwhile increasing the sample size of commodities and reducing the sample size of outlets.
11.22 A systematic analysis of sampling errors offers possibilities for improving precision or reducing cost. The problem of optimum sample allocation is usually formulated as the determination of the sizes of the samples of commodities and outlets, and their distribution over the strata that minimizes the sampling error of an all-commodities CPI, subject to the available budget.
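The allocation problem in paragraph 11.22 can be illustrated with a deliberately simplified model. The sketch below assumes that the sampling variance takes the form A/m + B/n, where m and n are the numbers of sampled commodities and outlets, and that costs are linear in m and n. Both assumptions, and all numerical values, are hypothetical; in particular, the model ignores any interaction between commodity and outlet sampling and any distribution over strata.

```python
import math


def optimal_allocation(var_commodity, var_outlet, cost_commodity, cost_outlet, budget):
    """Minimize var_commodity/m + var_outlet/n subject to
    cost_commodity*m + cost_outlet*n = budget (closed-form Lagrange solution)."""
    denominator = (math.sqrt(var_commodity * cost_commodity)
                   + math.sqrt(var_outlet * cost_outlet))
    m = budget * math.sqrt(var_commodity / cost_commodity) / denominator
    n = budget * math.sqrt(var_outlet / cost_outlet) / denominator
    return m, n


def variance(m, n, var_commodity, var_outlet):
    return var_commodity / m + var_outlet / n


# Hypothetical inputs: commodity sampling contributes four times as much
# variance per unit as outlet sampling; both cost the same per unit.
m_opt, n_opt = optimal_allocation(4.0, 1.0, 1.0, 1.0, budget=30.0)
print(m_opt, n_opt)                        # about 20 commodities and 10 outlets
print(variance(m_opt, n_opt, 4.0, 1.0))    # about 0.30
print(variance(15.0, 15.0, 4.0, 1.0))      # equal split: about 0.33
```

The solution returns real numbers, so in any application the sample sizes would still need to be rounded and checked against practical constraints.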
11.23 As already mentioned, a business register is usually not an adequate sampling frame for outlets, because it provides extensive overcoverage. It is recommended to set up an appropriate sampling frame by enumeration of the main outlets within each sampled municipality. Such enumeration yields a list of all outlets in a municipality together with the commodity groups that belong to their assortments. A less expensive way to organize an outlet sampling frame is to ask the price collectors, who may be assumed to know the local situation well, to make a list of outlets where purchases are made by households.
11.24 The populations of commodities (and varieties) and outlets are continually changing through time. The composition of most commodity groups is not constant over time, because commodities disappear from the market and new ones appear. The passage of time also plays a disturbing role with respect to the outlet population: outlets close, temporarily or permanently; new outlets emerge; the importance of some outlets diminishes or increases. The samples of commodities (and varieties) and outlets should be reviewed and updated periodically to maintain their representativity with respect to the current buying habits of the households.
11.25 Response errors caused by the underreporting of certain categories of household expenditure can be adjusted by using producer-based estimates from the national accounts (see Linder (1996) for an example). Measurement errors by price collectors can be reduced by providing them with hand-held computers for data entry. In this way the validation of observed prices can be executed at the point of price collection (i.e. in the outlet), by means of an automatic comparison of the currently observed price quote with the previously observed one (by setting a limit on the percentage price change) and with the price quotes obtained from other outlets (by setting suitable upper and lower limits). Details are provided by Haworth, Fenwick and Beaven (1997).
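The two validation checks described in paragraph 11.25 could be implemented along the following lines. This is a minimal sketch: the 20 per cent limit on price change, the band of 50 per cent around the median of other outlets, and the flag names are all hypothetical choices made for illustration, not values used by any statistical office.

```python
import statistics


def validate_quote(current, previous, other_outlet_prices,
                   max_change_pct=20.0, band=0.5):
    """Return a list of warning flags for a newly collected price quote."""
    flags = []
    # Check 1: percentage change against the previously observed price.
    if previous is not None and previous > 0:
        change_pct = (current / previous - 1.0) * 100.0
        if abs(change_pct) > max_change_pct:
            flags.append("change_vs_previous")
    # Check 2: upper and lower limits derived from quotes in other outlets.
    if other_outlet_prices:
        median = statistics.median(other_outlet_prices)
        lower, upper = median * (1.0 - band), median * (1.0 + band)
        if not lower <= current <= upper:
            flags.append("outside_cross_outlet_limits")
    return flags


others = [0.95, 1.00, 1.10]
print(validate_quote(1.05, 1.00, others))  # []
print(validate_quote(1.30, 1.00, others))  # ['change_vs_previous']
print(validate_quote(3.00, 1.00, others))  # both flags raised
```

A flagged quote is not necessarily wrong; the point of running the check in the outlet is that the collector can confirm or correct the price while still able to see the item.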
11.26 It is useful to appoint data collection supervisors to conduct quality assurance checks on the data collectors. It is also a good idea to organize regular meetings where price collectors and statisticians from the head office can share their experiences. In this way, the statisticians will keep in touch with the conditions in the field, and may take the opportunity to provide more information about frequently made price collection errors and new representative goods.
11.27 It is important to check the collected price data for processing errors and, where possible, to correct these errors. This activity is called data editing. When editing is carried out on individual observations, it is called micro-editing. When the resources to spend on data editing must be minimized, while at the same time maintaining a high level of data quality, selective editing and macro-editing are possibilities. Selective editing is a form of traditional micro-editing, in which the number of edits is kept to a minimum. Only those edits which have an impact on the survey results are carried out. Macro-editing offers a top-down approach. The edits are carried out on aggregated data (for instance, the price index numbers of a commodity group) instead of individual records (for example, price observations). Micro-editing of individual records is then carried out only if macro-edits raise suspicion. In particular, attention should be paid to outliers among the observations.
11.28 Non-response usually introduces selection bias. There are three methods for the treatment of missing price observations. First, the corresponding price can be excluded from the data set of previous prices, so that the set of previous prices is “matched” with the set of current prices. Second, this matching can be achieved by using an imputed (or artificial) price for the missing one. The imputed price can be calculated by either carrying forward the previous price observation or by extrapolating the previous price observation using the change of other price observations for the same commodity. Third, there is the possibility to reweight the sample. The objective of reweighting is to inflate the weight given to the prices of the responding outlets. This compensates for those prices that are lost by non-response.
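The three treatments in paragraph 11.28 can be compared on a small example. The sketch below uses hypothetical prices for one commodity in three outlets, with the current price missing in outlet C, and assumes a geometric mean elementary index; the outlet weights used for the reweighting step are likewise hypothetical.

```python
import math

previous = {"A": 2.00, "B": 4.00, "C": 5.00}
current = {"A": 2.20, "B": 4.40, "C": None}  # outlet C did not respond


def jevons(prev, curr):
    """Geometric mean of price relatives over the outlets in prev."""
    ratios = [curr[k] / prev[k] for k in prev]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Method 1: drop the missing outlet so previous and current sets are matched.
responding = [k for k in previous if current[k] is not None]
index_matched = jevons({k: previous[k] for k in responding},
                       {k: current[k] for k in responding})

# Method 2a: impute by carrying forward the previous price.
carried = {k: current[k] if current[k] is not None else previous[k]
           for k in previous}
index_carry_forward = jevons(previous, carried)

# Method 2b: impute by extrapolating with the change of the other observations.
extrapolated = {k: current[k] if current[k] is not None
                else previous[k] * index_matched for k in previous}
index_extrapolated = jevons(previous, extrapolated)

# Method 3: reweight, inflating the weights of the responding outlets.
weights = {"A": 0.5, "B": 0.3, "C": 0.2}
responding_weight = sum(weights[k] for k in responding)
reweighted = {k: weights[k] / responding_weight for k in responding}

print(round(index_matched, 4), round(index_carry_forward, 4),
      round(index_extrapolated, 4), reweighted)
```

In this example carrying the price forward pulls the index towards no change, whereas extrapolation with a geometric mean reproduces the matched-sample result exactly.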
11.29 In a household expenditure survey, missing data are usually imputed with the help of information on the same household from a previous observation period or other households from the same observation period. To reduce bias in the average expenditure pattern arising from selective non-response, a household expenditure survey sample of households is generally post-stratified by a number of household characteristics, such as income, composition and size.
Types of bias
11.30 This section reviews several categories of error, either in pricing or in index construction, that potentially can lead to bias in the overall CPI. The emphasis here is on the categorization of errors, along with some consideration of their likely size, rather than on methods to reduce or eliminate the errors. The question might arise of why such a discussion is necessary, since such issues as quality change, and the appropriate methods for handling them in the CPI, are dealt with at both a conceptual and operational level in other chapters.
11.31 The reason this chapter addresses the topic of CPI bias per se is the great surge in interest in price measurement problems during the mid-1990s. Especially in the United States, the view became widespread that the CPI was subject to systematic upward biases because of the failure to deal adequately with consumer substitution, product quality improvements, and the introduction of new items and services. Moreover, it was recognized, first, that the existence of such upward bias would have fundamental implications for the measurement of recent trends in output and productivity, and second, that the elimination of upward bias could substantially improve the government budget situation through reduced government expenditures and increased tax revenues (see, for example, Eldridge (1999) and Duggan and Gillingham (1999)). These discoveries led to a series of papers and reports on CPI measurement problems, often accompanied by point estimates of aggregate bias.
11.32 Prominent examples of these quantitative studies of bias are those by the Advisory Commission to Study the CPI (United States Senate, 1996), Congressional Budget Office (1994), Crawford (1998), Cunningham (1996), Dalen (1999a), Diewert (1996c), Lebow, Roberts and Stockton (1994), Lebow and Rudd (2003), Shapiro and Wilcox (1997b), Shiratsuka (1999), White (1999), and Wynne and Sigalla (1994). Responses and estimates by statistical agencies include those provided by Abraham et al. (1998), US Bureau of Labor Statistics (1998), Ducharme (1997), Edwards (1997), Fenwick (1997), Lequiller (1997), Moulton (1996b), and Moulton and Moses (1997). Among the many other discussions of the CPI bias issue are those reported by Baker (1998), Boskin et al. (1998), Deaton (1998), Diewert (1998a), Krueger and Siskind (1998), Nordhaus (1998), Obst (2000), OECD (1997), Pollak (1998), Popkin (1997), and Triplett (1997).
11.33 Two points are worth making at the outset with respect to measuring bias in CPIs. First, the issue has usually been addressed in the context of the cost of living index (COLI). That is, the CPI bias has been defined as the difference between the rate of increase in the CPI and the rate of increase in a true COLI. Many authors on bias have taken as given that the COLI should be the CPI's measurement objective. Somewhat different conclusions might be reached if the index objective were taken to be a pure price index. Notably, the gains in consumer welfare from a widening array of new goods, or the ability of consumers to substitute away from items with increasing relative prices, might be deemed irrelevant and an index that ignored those factors might not be judged biased on that account.
11.34 The second point is that CPI bias is not amenable to estimation with the same level of rigour as that used in CPI variance estimation. Since the COLI or other ideal target index is unobserved, analysts have been forced to rely in part on conjectures and on generalizations from fragmentary empirical evidence in order to quantify the extent of bias. The notable exceptions are with respect to substitution bias, when traditional Laspeyres indices and indices using superlative formulae can be computed using the same underlying price and expenditure data, and the differences construed as a measure of the upward bias from use of the Laspeyres formula.
11.35 Several different taxonomies of bias have appeared in the literature mentioned above. It is sufficient, however, to employ four categories roughly corresponding to those set forth in the best-known study, namely the Final report of the Advisory Commission to Study the CPI (the Boskin Commission), established by the United States Senate Finance Committee in 1995. These categories are: upper-level substitution bias; elementary aggregate bias; quality change and new goods bias; and new outlet bias.
11.36 These categories can be further broken down into two subgroups according to whether they refer to errors in individual price measurements or errors in computing index series. Quality change bias and new goods bias arise because of failures to measure adequately the value to consumers of individual goods and services that appear in (or disappear from) the marketplace. It should be recognized that discussions of “new goods” problems apply equally to all products, whether goods or services. At a conceptual level, it can be difficult to distinguish these two biases from each other. Operationally, however, quality change bias pertains to the procedures for comparing new products or models with the older products they replace in the CPI samples. In general, new goods bias can be thought of as applying to wholly new types of products, or products that would not enter samples routinely through forced replacement. New outlet bias, sometimes referred to as outlet substitution bias, is similar to new goods bias but is focused on the appearance of new types of stores or marketing methods that offer goods at lower prices or higher quality.
11.37 The other categories of bias refer to the procedures for constructing index values from component series. As noted throughout this manual, CPI construction can be thought of as taking place in two steps, or at two levels. At the lower level, individual price quotations are combined; at the upper level, these basic indices are aggregated together. Corresponding to these two levels are two forms of potential bias. Elementary aggregate bias involves the averaging formulae used to combine price quotations into basic indices. Upper-level substitution bias applies to the formulae used to combine those elementary aggregates into higher-level indices. These components of potential bias, and the means used to measure them, are discussed in more detail below.
Components of bias
Upper-level substitution bias
11.38 Upper-level substitution bias is perhaps the most widely accepted source of CPI bias, and the kind with which economists are most familiar from textbook expositions of price index theory and practice. Simply stated, it arises when CPIs employ the Laspeyres formula (see Chapter 17), which is well known to provide an upper bound on a cost of living index under certain assumptions about consumer behaviour. As noted in paragraph 11.34 above, quantitative measures of upper-level substitution bias can be generated by comparing Laspeyres price indices to Fisher ideal, Tornqvist or other superlative indices. Under certain assumptions about, for example, constant preferences, these will stand as relatively precise bias estimates.
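The comparison described in paragraph 11.38 can be illustrated with a two-good example. The prices and quantities below are hypothetical and were constructed so that consumers keep equal expenditure shares on both goods (Cobb-Douglas behaviour) and total expenditure constant; under that assumption the Fisher index coincides with the cost of living index.

```python
import math

# Hypothetical data: the price of good 1 doubles and consumers halve
# the quantity bought, keeping expenditure shares equal.
p0, q0 = [1.0, 1.0], [10.0, 10.0]
p1, q1 = [2.0, 1.0], [5.0, 10.0]


def value(prices, quantities):
    return sum(p * q for p, q in zip(prices, quantities))


laspeyres = value(p1, q0) / value(p0, q0)   # base period quantities
paasche = value(p1, q1) / value(p0, q1)     # current period quantities
fisher = math.sqrt(laspeyres * paasche)     # superlative index

substitution_bias = laspeyres - fisher
print(laspeyres, round(paasche, 4), round(fisher, 4), round(substitution_bias, 4))
# 1.5 1.3333 1.4142 0.0858
```

The price change in this example is far larger than typical annual relative price movements, so the resulting bias is much greater than the magnitudes reported in empirical studies; the example only shows the direction and mechanics of the calculation.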
11.39 Genereux (1983) and Aizcorbe and Jackman (1993) provide such index comparisons and estimates of upper-level substitution bias using actual CPI index series for Canada and the United States, respectively. Other early studies by Braithwait (1980) and Manser and McDonald (1988) estimate the substitution bias in United States national account indices. In lieu of superlative indices, the Braithwait study uses estimated exact cost of living indices based on demand system estimation. A similar estimate for the Netherlands is provided by Balk (1990). In these studies, the existence of an upward bias from the Laspeyres formula is demonstrated consistently. The biases in the annual index changes in individual years are relatively small, averaging 0.1 to 0.3 percentage points, and depend empirically on such factors as the distance from the Laspeyres base period, the level of index detail at which the alternative formulae are applied, and whether the superlative index is of the fixed base or chained variety.
11.40 The major differences between Laspeyres and superlative indices derive from the variation in relative prices over the period being compared, and from the shift in quantities consumed towards those index categories that have fallen in relative price. This leads to several conclusions:
–If index movements are characterized by continuing, uniform drift in relative prices over time, with accompanying drifts in consumption, the size of the annual Laspeyres bias will tend to increase with the distance from the base period. (Greenlees (1997) notes, however, that there is little evidence for this phenomenon in the United States; see also Szulc (1983).)
–Under the same circumstances, reducing the expenditure weight chaining interval will work to reduce the upper-level substitution bias in the Laspeyres CPI. The more frequent chaining will increase the weight given to indices that are falling in relative price, thereby reducing the rate of CPI growth. Conversely, if there is “bouncing” in relative index movements, frequent chaining can lead to an upward “chain drift” in a Laspeyres index.
–Upper-level substitution bias will tend to be larger during periods of higher inflation, if these periods also have greater relative price variation. Little empirical evidence exists on this point, however.
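The chain drift mentioned in paragraph 11.40 can be demonstrated with a stylized example in which one price doubles and then returns to its starting level. The data are hypothetical and assume that consumers keep equal expenditure shares throughout, so that quantities move inversely with prices.

```python
# Hypothetical prices and quantities for two goods over three periods.
# The price of good 1 "bounces": it doubles in period 1 and returns in period 2.
prices = [[1.0, 1.0], [2.0, 1.0], [1.0, 1.0]]
quantities = [[10.0, 10.0], [5.0, 10.0], [10.0, 10.0]]


def laspeyres(p_base, p_current, q_base):
    """Laspeyres index between two periods using base period quantities."""
    numerator = sum(p * q for p, q in zip(p_current, q_base))
    denominator = sum(p * q for p, q in zip(p_base, q_base))
    return numerator / denominator


# Direct comparison of period 2 with period 0: prices are identical.
direct = laspeyres(prices[0], prices[2], quantities[0])

# Chained index: multiply the period-to-period links, updating weights each time.
link_01 = laspeyres(prices[0], prices[1], quantities[0])
link_12 = laspeyres(prices[1], prices[2], quantities[1])
chained = link_01 * link_12

print(direct, link_01, link_12, chained)  # 1.0 1.5 0.75 1.125
```

Although every price is back at its period 0 level, the chained index stands 12.5 per cent above it, which is the upward drift that frequent chaining can produce when relative prices bounce.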
11.41 The concept of upper-level substitution bias has been derived and discussed in the context of cost of living index theory, but an equivalent bias may be defined from the perspective of the pure price index. If the Fisher ideal or other superlative index is judged preferable on the basis of its symmetric treatment of base period and current period expenditure patterns, then the difference between that index and a Laspeyres could be interpreted as a measure of representativity bias. A similar argument could be applied with respect to lower-level substitution bias within elementary index cells.
11.42 Recently, Lebow and Rudd (2003) have defined and estimated another category of bias related to upper-level aggregation. They concluded that the consumer expenditure survey weights used in the United States CPI were subject to error because of, for example, under-reporting of alcohol and tobacco expenditures. This will lead to a weighting bias if the errors in relative weight are correlated with component index changes. (Sources for, and problems in, expenditure weight estimation are discussed in detail in Chapter 4.)
Elementary aggregate bias
11.43 Elementary aggregate bias can be divided into two components: formula bias and lower-level substitution bias. An elementary index in the CPI is biased if its expectation differs from its measurement objective. The term formula bias (or functional form bias) is used here to denote a situation in which the elementary index formula has an upward bias relative to the pure price index. When the measurement objective is a cost of living index, the elementary index formula suffers from lower-level substitution bias (or within-stratum substitution bias) if it does not reflect consumer substitution among the items contained in that index cell. Thus, given any elementary index formula, the two forms of bias can be distinguished according to the objective of the elementary index.
11.44 Chapters 9 and 20 of this manual discuss the characteristics of alternative elementary index formulae. A key result is that the Carli formula for the arithmetic average of ratios has an upward bias relative to the trend in average item prices. Consequently, Eurostat has prohibited use of this formula in computations for the Harmonized Indices of Consumer Prices (HICPs). The weighted formulae used in basic indices of the United States CPI had some characteristics of the Carli formula prior to procedural and computational changes made in 1995 and 1996. The problems and the methods chosen to address them are discussed, for example, by Reinsdorf and Moulton (1997) and Moulton (1996b).
11.45 The ratio of arithmetic averages (Dutot) and geometric mean (Jevons) formulae eliminate formula bias as defined here, and both are permitted by Eurostat. Their expectations differ, however, when item prices do not change at a uniform rate. The differences provide one way of evaluating the potential importance of lower-level substitution bias. The geometric mean formula is exact for a cost of living index if consumers follow the Cobb-Douglas behavioural model, whereas the formula based on the ratio of arithmetic averages corresponds to zero-substitution behaviour. Thus, if the goal is to approximate a cost of living index, the geometric mean formula is likely to be judged preferable.
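The behaviour of the three unweighted formulae can be sketched in Python as follows. The example uses two hypothetical items whose prices swap and then return to their starting values, so that average prices show no trend. The data and function names are assumptions made for illustration, and the chained comparison is only one simple way of exhibiting the upward tendency of the Carli formula.

```python
from math import prod


def carli(p0, p1):
    """Arithmetic average of price relatives."""
    return sum(b / a for a, b in zip(p0, p1)) / len(p0)


def dutot(p0, p1):
    """Ratio of arithmetic average prices."""
    return sum(p1) / sum(p0)


def jevons(p0, p1):
    """Geometric mean of price relatives."""
    return prod(b / a for a, b in zip(p0, p1)) ** (1 / len(p0))


# Hypothetical prices for two items over three periods.
# Prices "bounce": period 2 prices equal period 0 prices.
period0 = [1.0, 2.0]
period1 = [2.0, 1.0]
period2 = [1.0, 2.0]


def chained(formula):
    """Multiply the period-to-period links for periods 0 to 1 and 1 to 2."""
    return formula(period0, period1) * formula(period1, period2)


chained_carli = chained(carli)
chained_dutot = chained(dutot)
chained_jevons = chained(jevons)

print(f"Chained Carli  {chained_carli:.4f}")
print(f"Chained Dutot  {chained_dutot:.4f}")
print(f"Chained Jevons {chained_jevons:.4f}")
```

In this constructed case the chained Dutot and Jevons indices return to 1, consistent with prices having returned to their initial levels, whereas the chained Carli index records an increase. Real price data will not bounce so neatly, and the magnitude shown here should not be read as typical.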
11.46 In the future, scanner data may make it possible to record item-level consumption data at a daily, weekly or monthly frequency and to use those data in superlative index calculations. Currently, however, it is impossible to employ superlative formulae to compute elementary CPI indices. Some assumption, such as the Cobb-Douglas, must be made in order to approximate a cost of living index. Note that the substitution that the index ideally should reflect involves consumer choice among all the items in the cell: different products, products in different outlets, different package sizes of the same product, or the same product offered for sale at different times of the period to which the index applies (see Dalton, Greenlees and Stewart (1998)). Thus, the appropriate degree of assumed substitution behaviour should depend, in principle, on the dimensions of variety within the item category.
11.47 The method used by the statistical agency for sampling items within a category will determine the effectiveness of formula choice in dealing with lower-level substitution bias. For example, if only a single representative item is chosen to represent the category, the index formula will fail to reflect the consumer response to any relative price change in the universe of items. More generally, the geometric mean formula index suffers from an upward bias in small samples, so lower-level substitution bias may be underestimated in empirical comparisons of the geometric mean to other index formulae. White (1999) discusses the relationship between sampling error and bias estimates. See also McClelland and Reinsdorf (1999) on the small sample bias in the geometric mean.
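The small sample bias in the geometric mean can be illustrated by exact enumeration. The Python sketch below assumes a hypothetical universe of four price relatives and simple random sampling with replacement, with every item equally likely to be drawn. Neither assumption is taken from the studies cited above. The sketch compares the expected value of the sample geometric mean with the geometric mean of the whole universe.

```python
from itertools import product
from math import prod


def geometric_mean(relatives):
    """Unweighted geometric mean of a sequence of price relatives."""
    return prod(relatives) ** (1 / len(relatives))


def expected_sample_geometric_mean(universe, n):
    """Average the sample geometric mean over every equally likely
    sample of size n drawn with replacement from the universe."""
    samples = list(product(universe, repeat=n))
    return sum(geometric_mean(s) for s in samples) / len(samples)


# Hypothetical universe of item price relatives within one stratum.
universe = [0.8, 1.0, 1.25, 1.6]

# Target: the geometric mean over the full universe.
target = geometric_mean(universe)

expectations = {
    n: expected_sample_geometric_mean(universe, n) for n in (1, 2, 4)
}

print(f"Universe geometric mean: {target:.4f}")
for n, value in expectations.items():
    print(f"Sample size {n}: expected index {value:.4f}, bias {value - target:+.4f}")
```

Under these assumptions the expected sample index lies above the universe geometric mean for every sample size and approaches it as the sample grows. With a sample of one item, the expectation equals the arithmetic mean of the relatives, which is the extreme case of the single representative item mentioned above.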
11.48 The impact of formula choice can be estimated with some degree of precision over a given historical period. Any corresponding bias, however, can be estimated only by assuming that the geometric mean or other functional form successfully approximates the index’s measurement objective.
11.49 As implied by the above discussion, the importance of elementary aggregate bias will vary by country, depending on the particular index formulae used, the degree of heterogeneity within index strata, and the sampling methods employed. Also, as with upper-level substitution bias, elementary aggregate bias will vary with the overall level of inflation in the economy if absolute and relative price changes are correlated.
11.50 The performance of any formula for elementary aggregate calculation will also be affected by the methods used by the statistical agency to handle special situations, such as seasonal goods and other products that are temporarily unavailable. Armknecht and Maitland-Smith (1999) discuss how the failure to impute missing prices can lead to bias in the modified Laspeyres and other index formulae.
Quality change and new products bias
11.51 Discussion of potential CPI biases arising from inadequate quality adjustment has a long history. For example, the Stigler Committee report on United States price statistics (Price Statistics Review Committee, 1961) indicated that “if a poll were taken of professional economists and statisticians, in all probability they would designate (and by a wide majority) the failure of the price indices to take full account of quality changes as the most important defect of these indices”. In most studies of bias, unmeasured or mismeasured quality change is also the largest contributor to the total estimated bias. Just as quality adjustment is widely recognized as an extremely difficult process, however, it is correspondingly difficult to measure any quality change bias.
11.52 Unlike substitution bias, which can be estimated by comparison of alternative formulae, quality change bias must be analysed on a product-by-product basis. Products and their associated index components will experience widely varying rates of quality change over time. Moreover, the methods used for quality adjustment will also vary. Whereas the linking method may dominate in terms of frequency of use, important index components may employ production cost, hedonic adjustment, or the other methods described in Chapters 7 and 21.
11.53 A crucial point to recognize is that the direction of overall quality change does not imply the direction of any quality change bias. Non-experts sometimes assume that the CPI does little or no quality adjustment, and that it therefore must overestimate price change in view of the many demonstrable improvements over time in the quality of goods and services. Rather, for any component index, the issue is whether the direct or indirect method chosen for quality adjustment overestimates or underestimates the relative quality of replacement items in the CPI sample. The resulting bias can be either positive or negative.
11.54 Empirical evidence on quality change bias has been based largely on extrapolation from individual studies of particular products. These individual studies may involve, for example, comparisons of hedonic regression indices to the corresponding CPI series or estimates of the value of some product improvement that is ignored in CPI calculations. Although the majority of such studies have suggested upward rather than downward bias, the reliance on fragmentary evidence has led to criticism by observers who point to evidence of quality declines that have not been subjected to systematic analysis.
11.55 Especially for services, overall quality trends can also be a matter of subjective valuation. New technology has led to unambiguous improvements in the quality of many consumer durables and other goods. By contrast, in service sectors such as mail delivery, public transport and medical care, it can be difficult to evaluate changes in quality. Airline travel, for example, has become safer and faster but perhaps less comfortable and reliable in recent decades, and the lack of cross-sectional variation in these characteristics makes the use of hedonic quality adjustment problematic.
11.56 New product bias, like elementary aggregate bias, can be divided conceptually into two components. The first concerns the failure to bring new products into the CPI sample with sufficient speed. This can lead to upward bias if those new products later experience large price reductions that are not reflected in the index. The second component is the welfare gain that consumers experience when a new product appears. This may not be viewed as a bias, however, when the cost of living index is not accepted as the CPI’s measurement objective.
11.57 As discussed in Chapter 8, “new goods” can be: products that replace predecessor items, for example CDs replacing vinyl records and tapes; product varieties that widen the range of consumer choice, such as imported beers and ethnic restaurants; or products that represent wholly new categories of consumption, such as microwave ovens or mobile telephones.
11.58 Like quality change bias, new product bias has been estimated primarily by generalization from individual product evidence. A frequent approach has been to measure the price change for a product or category during a period prior to its entry into the CPI sample. Studies by Hausman (1997, 1999) of breakfast cereals and cellular telephones provided quantitative measures of the consumer surplus gain from the new products, but this complex econometric approach has not been applied widely. Some of the Boskin Commission’s estimates of new product bias, notably those for food, were necessarily based on conjecture.
11.59 Also, like quality change bias, new product bias could be negative if the range of products decreases, if valuable consumer goods disappear from the market, or if the index fails to capture phases of rapid price increase for items. Most observers, however, seem to agree that the direction of the bias is upward and that the uncertainty concerns its magnitude.
New outlet bias
11.60 Conceptually, new outlet bias is identical to new product bias. It arises because of the failure to reflect either price changes in new outlets not yet sampled, or the welfare gain to consumers when the new outlets appear. The explanation for its existence as a separate bias category is twofold. The first reason is historical: new outlet bias was identified by Reinsdorf (1993) as a potentially major explanation for anomalous movements in the United States CPI. Second, the methods used to sample and compare outlets differ from those used with products, and the problems in controlling new outlet bias are somewhat different.
11.61 A failure to maintain a current outlet sample can introduce bias because the new outlets are distinctive in their pricing or service policy. Reinsdorf (1993), for example, focused on the growth of discount stores. It should be noted, however, that the problem could also be geographical in nature; it is important to employ outlet sampling frames that reflect new as well as traditional shopping locations.
11.62 One way that new products enter the CPI sample is through forced replacement, when exiting or less successful products disappear from shelves. Outlet disappearance is less frequent, and agency procedures may not provide for automatic replacement. Moreover, when a new outlet enters the sample there are no standard procedures for comparing data at the new and old outlets. Thus, the index will not incorporate any effects of, for example, lower price or inferior service quality at the new outlet.
11.63 Reinsdorf (1993) estimated the degree of new outlet bias by comparing average prices at outlets entering and disappearing from United States CPI samples. There has been little or no empirical work, however, on the measurement or consumer valuation of outlet quality. As a consequence, there is little evidence on which to evaluate the accuracy of new outlet bias estimates.
Summary of bias estimates
11.64 The 1996 Boskin Commission report gave a range of estimates for the total upward United States CPI bias of 0.8 to 1.6 percentage points, with the point estimate being 1.1 percentage points. This total reflects the straightforward summation of the component bias estimates. As reported by the United States General Accounting Office (2000), however, changes in CPI methods subsequent to 1996 led the Boskin Commission members to reduce their estimates of total bias. Lacking evidence to the contrary, additivity of biases has been assumed in most such studies. Shapiro and Wilcox (1997b) provide probability distributions and correlations of their component bias estimates, yielding an overall confidence interval for the total bias. Most detailed studies of bias also conclude that the CPI bias is in an upward direction, although there have been numerous criticisms of that conclusion.
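The role of additivity and of correlations among the components can be stated compactly. The notation below is introduced here for illustration and is not drawn from any of the studies cited: b_k is the k-th component bias, regarded as an uncertain quantity with mean \mu_k and standard deviation \sigma_k, and \rho_{kl} is the correlation between components k and l.

```latex
B = \sum_{k=1}^{K} b_k, \qquad
\operatorname{E}[B] = \sum_{k=1}^{K} \mu_k, \qquad
\operatorname{Var}(B) = \sum_{k=1}^{K} \sigma_k^{2}
  + 2 \sum_{k<l} \rho_{kl}\, \sigma_k \sigma_l .
```

Under additivity the point estimate of total bias is the sum of the component point estimates whatever the correlations, but the width of any interval around that total is not. Positive correlations among components widen the interval, and negative correlations narrow it. Converting the variance into a confidence interval requires a further assumption about the shape of the distribution of B.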
11.65 It is apparent that statistical agencies cannot compute or publish CPI bias estimates on a regular basis. Many of the same obstacles that prevent the elimination of bias also stand in the way of estimating bias. These include the lack of complete data on product-level consumer preferences and spending behaviour, and the inability to observe and value all differences in quality among items in the marketplace. Without such information it is impossible to calculate a true cost of living index, and similarly impossible to measure the divergence between its rate of growth and the growth rate of the CPI.
11.66 Statistical agencies have been reluctant to provide their own estimates of CPI bias. In some cases, they have accepted the existence of substitution bias, recognizing that the use of a Laspeyres formula implies that the CPI usually will overstate price change relative to a cost of living index. Statistical agencies have, however, been reluctant to draw even qualitative conclusions from the fragmentary and speculative evidence on quality change, new products and new outlet bias.
11.67 In order to ensure public confidence in a CPI, a detailed and up-to-date description of the methods and data sources should be published. The document should include, among other things, the objectives and scope of the index, details of the weights, and last but not least, a discussion of the accuracy of the index. A description of the sources and magnitude of the sampling and non-sampling errors (coverage, non-response rates, etc.) in a CPI provides users with valuable information on the limitations that might apply to their uses of the index. One example of a handbook of CPI methods is that published by the United States Bureau of Labor Statistics (1997), which devotes a section to the varieties and sources of possible error in the index.