# 21. Quality Change and Hedonics

- International Monetary Fund
- Published Date: September 2004

**21.1** Chapters 15 to 20 cover theoretical issues relating to the choice of index number formulas and are based on a simplifying assumption: that the aggregation was over the same matched items *i* = 1, ..., *n* in the two periods being compared. This meets the needs of the discussion of alternative index number formulas, since a measure of price change between two periods requires the quality of each item to remain the same. The practical compilation of PPIs involves defining the *price basis* (quality specification and terms of sale) of a sample of items in an initial period and monitoring the prices of this matched sample over time, so that only “pure” price changes are measured, not price changes tainted by changes in quality. In practice, this matching becomes imperfect. The quality of what is produced *does* change, and, furthermore, new goods (and services) appear on the market that the matched sampling ignores. The relative price changes of these new goods may differ from those of the existing ones, leading to bias in the index if they are excluded. In this chapter, a theoretical framework is outlined that extends the definition of items to include their quality characteristics. The focus of the chapter is on the *economic* theory of the market for quality characteristics and its practical manifestation in hedonic regression, outlined in Chapter 7, Section E.4. This provides a *background* for the more practical issues relating to quality adjustments in Chapter 7 and item substitution in Chapter 8.

## A. New and Disappearing Items and Quality Change

**21.2** The assumption in the previous chapters was that the same set of items was being compared in each period.^{1} Such a set can be considered as a sample from all the matched items available in periods 0 and *t*—the *intersection universe*, which includes only matched items. Yet, for many commodity markets, old items disappear and new items appear. Constraining the sample to be drawn from this intersection universe is unrealistic. Establishments may produce an item in period 0, but it may not be sold in subsequent periods *t*.^{2} New items may be introduced after period 0 that cannot be compared with a corresponding item in period 0. These items may be variants of the old existing ones or provide totally new services that cannot be directly compared with anything that previously existed. This universe of all items in periods 0 and *t* is the dynamic *double universe*.

**21.3** There is a third universe from which prices might be sampled: a *replacement universe*. The prices reported by establishments are those for an agreed price basis: a detailed description of the item being sold and the terms of the transaction. The price basis for items in period 0 is first determined, and then their prices are monitored in subsequent periods. If the item is discontinued and there are no longer prices to record for a particular *price basis*, prices of a comparable replacement item may be used to continue the series of prices. This universe is a *replacement universe* that starts with the base-period universe, but it also includes one-to-one replacements when an item from the sample in the base period is missing in the current period.

**21.4** When a comparable replacement is unavailable, a noncomparable one may be selected. In this case, an explicit adjustment has to be made to the price of either the old or the replacement item for the quality difference. Since the replacement is of a different quality than the old item, it is likely to have a different price basis. Alternatively, assumptions may be made so that the price change of the old item (had it continued to exist) follows those of other items, keeping to the matched universe. In this second case, an implicit adjustment is being made for quality changes, so that the difference in price changes for the group and the old item (had it continued to exist) is equivalent to their quality differences.^{3} What is stressed here is that the problem of missing items is the problem of adjusting prices for quality differences.
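The implicit adjustment described here can be illustrated with a small numerical sketch. The item names, the prices, and the use of a geometric mean of matched price relatives are illustrative assumptions, not a method prescribed by the text. The missing old item's period *t* price is imputed from the price movement of the continuing items, and the gap between a replacement's observed price and that imputed price is what the procedure implicitly treats as a quality difference.

```python
import math

# Hypothetical prices in period 0 and period t.
# None marks the old item "C", which is no longer produced in period t.
price_0 = {"A": 100.0, "B": 200.0, "C": 150.0}
price_t = {"A": 105.0, "B": 210.0, "C": None}

# Price relatives of the items that can still be matched.
matched = [k for k in price_0 if price_t[k] is not None]
log_relatives = [math.log(price_t[k] / price_0[k]) for k in matched]

# Geometric mean of the matched price relatives (an assumed aggregator).
group_relative = math.exp(sum(log_relatives) / len(log_relatives))

# Assume the old item's price would have moved like the matched group.
imputed_old_t = price_0["C"] * group_relative

# Hypothetical observed price of a noncomparable replacement in period t.
replacement_price_t = 180.0

# The part of the replacement's price not explained by the imputed price
# movement is implicitly attributed to quality.
implied_quality_ratio = replacement_price_t / imputed_old_t

print(group_relative, imputed_old_t, implied_quality_ratio)
```

With these made-up figures the matched items rise by 5 percent, so the old item is imputed at 157.5, and the remaining gap to the replacement price of 180 is treated as quality. Whether that attribution is reasonable depends entirely on the assumption that the old item's price would have followed the group.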

**21.5** Three practical problems emerge. First is the problem of explicit quality adjustment between a replacement and old item. The item is no longer produced, a replacement is found that is not strictly comparable in quality, the differences in quality are identified, and a price has to be put on these differences if the series of prices for the new replacement item is to be used to continue the old series.

**21.6** Second, in markets where the turnover of items is high, the sample space selected from the matched universe is going to become increasingly unrepresentative of the dynamic universe, as argued in detail in Chapter 8. Even the replacement universe may be inappropriate, as it will be made of series carrying with them quality adjustments in each period whose overall accuracy, given the rapidly changing technology, may be tenuous. In such cases, it may be that prices are no longer collected from a matched sample but from a sample of the main items available in each period, even though they are of a different quality. A comparison between the average prices of such items would be biased if, say, the quality of the items was improving. The need for, and details of, mechanisms to remove the effects of such changes from the average price comparisons were discussed in some detail in Chapter 7, Section G.

**21.7** Finally, there is the problem of new and disappearing goods and services—when the new item is not a variant of the old but provides a completely new service. It is not possible to use it as a replacement for an old item by adjusting a price for the quality differential because what it provides is, by definition, something new.

**21.8** There are a number of approaches to quality adjustment, and these are considered in Chapter 7. One of the approaches is to make explicit adjustments to prices for the quality difference between the old and replacement item using the coefficients from hedonic regression equations. *Hedonic regressions* are regressions of the prices of individual models of a product on their characteristics; for example, the prices of television sets on screen size, stereo sound, and text retrieval. The coefficients on such variables provide estimates of the monetary values of different quantifiable characteristics of the product. They can be used to adjust the price of a noncomparable replacement item for quality differences compared with the old item; for example, the replacement television set may have text-retrieval facilities that the previous version did not. Yet, it is important that a clear understanding exists of the meaning of such estimated coefficients if they are to be used for quality adjustment, especially given that their use is being promoted.^{4} To understand what these estimated parameters mean, it is first necessary to conceive of products as aggregates of their characteristics because, unlike items, characteristics have no separate prices attached to them. The price of the item is the price of a “tied” bundle of characteristics. One must also consider what determines the prices of these characteristics. Economic theory points toward examining demand and supply factors (Sections B.2 and B.3) and the interaction of the two to determine an equilibrium price (Section B.4). Having developed the analytical framework for such prices, it is then necessary to see what interpretation the economic theoretic framework allows us to put on these calculated coefficients (Section B.5). It will be seen that unless there is uniformity of buyers’ tastes or suppliers’ technologies, an identification problem prevents an unambiguous supply or demand interpretation. Borrowing a framework by Diewert (2002d), a demand-side interpretation that assumes firms are competitive price takers is provided, which, under this user-value approach, shows the assumptions required to generate such meaningful coefficients (Section B.6). Yet, all the aforementioned analysis assumes competitive behavior, an assumption relaxed in Section B.7.
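As a minimal sketch of the explicit adjustment described above, the following fits a linear hedonic regression to hypothetical television data and uses the estimated coefficient to adjust the price of a noncomparable replacement. The data, the linear functional form, and the helper `ols` are assumptions made purely for illustration; the prices are constructed without noise so that the coefficients are recovered almost exactly.

```python
def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * v for r, v in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [x - factor * p for x, p in zip(a[row], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Hypothetical models: (screen size in inches, text retrieval 0/1, price),
# generated from price = 100 + 10 * size + 50 * text.
models = [(14, 0, 240.0), (20, 0, 300.0), (20, 1, 350.0),
          (25, 1, 400.0), (28, 0, 380.0), (32, 1, 470.0)]
X = [[1.0, size, text] for size, text, _ in models]
prices = [price for _, _, price in models]
intercept, b_size, b_text = ols(X, prices)

old_price_0 = 300.0          # old model in period 0: 20 inch, no text retrieval
replacement_price_t = 365.0  # replacement in period t: 20 inch, text retrieval

# Remove the estimated value of the extra characteristics from the
# replacement's price so that it is comparable with the old item.
adjusted_price_t = (replacement_price_t
                    - b_text * (1 - 0)
                    - b_size * (20 - 20))
quality_adjusted_relative = adjusted_price_t / old_price_0

print(b_size, b_text, adjusted_price_t, quality_adjusted_relative)
```

Here the coefficient on text retrieval is estimated at about 50, so the replacement price of 365 is adjusted to about 315, giving a quality-adjusted price change of roughly 5 percent rather than the unadjusted 21.7 percent. How such a coefficient should be interpreted is the question the rest of this section takes up.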

**21.9** Chapter 7, Section G, recommends two main approaches for handling industries with rapid turnover of items. If the sample in period 0 is soon outdated, the matched universe and even the replacement universe become increasingly unrepresentative of the double universe, and repeated sampling from the double universe is required. In this case, either chained indices are advised in Chapter 7, Section G.3, or one of a number of *hedonic indices*, described in Chapter 7, Section G.2. Such indices differ from the use of hedonic regression for adjusting prices for quality differences for a missing item. These indices use hedonic regressions, say, by including a dummy variable for time on the right-hand side of the equation to estimate the quality-adjusted price change, as outlined below in Section C and in Chapter 7, Section G.2. They build on the theory outlined in Chapter 17 and Chapter 8, Section B. The economic theory of output price indices outlined in Chapter 21 is developed to include those tied bundles of a good that can be defined in terms of their characteristics as an item in the revenue function. *Theoretical output price indices* are defined that include changes in the prices of characteristics. Yet, as with the output price indices for goods considered in Chapter 17, there are many formulations that hedonic indices can take, and analogous issues and formulas arise here when discussing alternative approaches in Sections C.3–C.6.
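The time-dummy idea can be sketched as follows. This is an illustrative example under stated assumptions: the data are hypothetical, the semilogarithmic form with a single characteristic is chosen only for simplicity, and `ols` and `geo_mean` are helper functions written for this sketch. Models sold in period *t* are larger on average, so a comparison of average prices mixes quality change with price change, while the coefficient on the time dummy isolates the price change holding the characteristic constant.

```python
import math


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * v for r, v in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [x - factor * p for x, p in zip(a[row], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def geo_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical observations: (screen size, period dummy D), with prices
# generated from ln p = 5 + 0.03 * size + 0.02 * D.
obs = [(14, 0), (20, 0), (28, 0), (20, 1), (28, 1), (32, 1)]
prices = [math.exp(5 + 0.03 * s + 0.02 * d) for s, d in obs]

# Pooled regression of log price on a constant, size, and the time dummy.
X = [[1.0, s, d] for s, d in obs]
log_p = [math.log(p) for p in prices]
b0, b_size, b_time = ols(X, log_p)

# Quality-adjusted price change between period 0 and period t.
hedonic_index = math.exp(b_time)

# Unadjusted comparison of average (geometric mean) prices.
p0 = [p for (s, d), p in zip(obs, prices) if d == 0]
pt = [p for (s, d), p in zip(obs, prices) if d == 1]
unadjusted = geo_mean(pt) / geo_mean(p0)

print(hedonic_index, unadjusted)
```

In this constructed case the time-dummy index shows a price rise of about 2 percent, whereas the ratio of average prices shows about 22 percent, because the period *t* sample contains larger sets.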

**21.10** The estimation of hedonic regressions and the testing of their statistical properties are facilitated by the availability of user-friendly, yet powerful, statistical and econometric software. There are many standard issues in the estimation of regression equations, which can be examined by the diagnostic tests available in such software, as discussed in Kennedy (2003) and Maddala (1988). However, there are issues on functional form, the use of weighted least-squares estimators, and specifications that are quite specific to the estimation of hedonic equations. While many of these are taken up in Chapter 7, where an illustration is provided, Appendix 21.1 considers some of the theoretical issues. See also Gordon (1990), Griliches (1990), and Triplett (1990).

**21.11** Finally, in Section D, economic theory will be used to advise on the problem of new and disappearing goods and services. This problem arises where differences between existing goods and services and the new goods and services are substantive and cannot be meaningfully compared with an old item, even with a quality adjustment. The economic theory of reservation prices will be considered and some issues about its practical implementation expressed.

## B. Hedonic Prices and Implicit Markets

### B.1 Items as tied bundles of characteristics

**21.12** A *hedonic regression* is a regression equation that relates the prices of items, *p*, to the quantities of characteristics, given by the vector $z = (z_1, z_2, \ldots, z_n)$, that is,

$$p(z) = p(z_1, z_2, \ldots, z_n), \tag{21.1}$$

where the items are defined in terms of varying amounts of their characteristics. In practice, what will be observed for each item or variant of the commodity is its price, a set of its characteristics, and possibly the quantity and, thus, value sold. Empirical work in this area has been concerned with two issues: estimating how the price of an item changes as a result of unit changes in each characteristic—that is, the estimated coefficients of equation (21.1)—and estimating the demand and supply functions for each characteristic. The depiction of an item as a basket of characteristics, each characteristic having its own implicit (shadow) price, requires in turn the specification of a market for such characteristics, since prices result from the workings of markets. Houthakker (1952), Becker (1965), Lancaster (1966), and Muth (1966) have identified the demand for items in terms of their characteristics. The sale of an item is the sale of a tied bundle of characteristics to consumers, whose economic behavior in choosing between items is depicted as one of choosing between bundles of characteristics.^{5} However, Rosen (1974) further developed the analysis by providing a structural market framework in terms of both producers and consumers. There are two sides: demand and supply. How much of each characteristic is supplied and consumed is determined by the interaction of the demand for characteristics by consumers and the supply of characteristics by producers. These are considered in turn.

### B.2 Consumer or demand side

**21.13** Figure 21.1, adapted from Triplett (1987, p. 634), presents a simplified version of the characteristic space between two characteristics. The hedonic surfaces $p_1$ and $p_2$ in that figure trace out all the combinations of the two characteristics $z_1$ and $z_2$ that can be purchased at prices $p_1$ and $p_2$. An indifference curve $q_j^*$ maps the combinations of $z_1$ and $z_2$ between which the consumer is indifferent; that is, the consumer will derive the same utility from any point on the curve. The tangency of $q_j^*$ with $p_1$ at *A* is the solution to the utility maximization problem for a given budget (price $p_1$) and tastes (reflected in $q_j^*$).

Figure 21.1. Consumption and Production Decisions for Combinations of Characteristics

**21.14** The slope of the hedonic surface is the marginal cost of acquiring the combination of characteristics, and the slope of the utility function is the marginal utility gained from their purchase. The tangency at *A* is the utility-maximizing combination of characteristics to be purchased at that price. If consumers purchased any other combination of characteristics in the space of Figure 21.1, it would either cost them more to do so or lead to a lower level of utility. Position *A’*, for example, has more of both $z_1$ and $z_2$, and the consumer receives a higher level of utility being on $q_j$, but the consumer also has to have a higher budget and pays $p_2$ for being there. Note that the hedonic surface depicted here is nonlinear, so that relative characteristic prices are not fixed. The consumer with tastes $q_k^*$ chooses characteristic set *B* at $p_1$. Thus, the data observed in the market depend on the set of tastes. Triplett (2002) has argued that if tastes were all the same, then only one model of a personal computer would be purchased. But in the real world more than one model does exist, reflecting heterogeneous tastes and income levels. Rosen (1974) shows that of all the characteristic combinations and prices at which they may be offered, the hedonic surface traces out an envelope^{6} of tangencies including $q_j^*$ and $q_k^*$ on $p_1$ in Figure 21.1. This envelope is simply a description of the locus of the points chosen. Since rational consumers who optimize are assumed, these are the points that will be observed in the market and are thus used to estimate the hedonic regression. Note further that points *A* and *B* alone will not allow the regression to determine the price of $z_1$ relative to $z_2$, since the observed data will be two combinations of outputs at the same price. However, the locus of points on an expansion path *A A’* would allow this to be determined. There may be expansion paths for consumers with different tastes, such as *B*, and this may give rise to conflicting valuations, so that the overall parameter estimates determined by the regression from transactions observed in the market are an amalgam of such data. And this would just be a reflection of the reality of economic life. What arises from this exposition is the fact that the form of the hedonic function is determined in part by the distribution of buyers and their tastes in the market.

**21.15** The exposition is now formalized to include parameters for tastes and a numeraire commodity^{7} against which combinations of other aggregates are selected following Rosen (1974). The hedonic function $p(z)$ describes variation in the market price of the items in terms of their characteristics. The consumer purchase decision is assumed to be based on utility maximization behavior, the utility function being given by $U(z, x; \alpha)$, where $x$ is a numeraire commodity, the maximization of utility being subject to a budget constraint given by income $y$ measured as $y = x + p(z)$ (the amount spent on the numeraire commodity and the hedonic commodities), and α is a vector of the features of the individual consumer that describe their tastes. Consumers maximize their utility by selecting a combination of quantities of $x$ and characteristics $z$ subject to a budget constraint. The market is assumed to be competitive, and consumers are described as price takers; they purchase only the one item, so their purchase decision does not influence the market price. The price they pay for a combination of characteristics, vector $z$, is given by $p(z)$. Since they are optimizing consumers, the combination chosen is such that

$$\frac{\partial U / \partial z_i}{\partial U / \partial x} = \frac{\partial p(z)}{\partial z_i} \equiv p_i(z), \tag{21.2}$$

where $\partial p(z) / \partial z_i$ is the first derivative of the hedonic function in equation (21.1) with respect to each $z$ characteristic. The coefficients of the hedonic function are equal to their shadow price $p_i$, which measures the utility derived from that characteristic relative to the numeraire good for given budgets and tastes.
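A short derivation may help show where this tangency condition comes from. It is a sketch that assumes an interior optimum and differentiable utility and hedonic functions, assumptions not spelled out in the text. Substituting the budget constraint $x = y - p(z)$ into the utility function and differentiating with respect to each characteristic gives:

```latex
\begin{aligned}
&\max_{z}\; U\bigl(z,\; y - p(z);\; \alpha\bigr) \\[4pt]
&\frac{\partial U}{\partial z_i}
  - \frac{\partial U}{\partial x}\,\frac{\partial p(z)}{\partial z_i} = 0,
  \qquad i = 1, \ldots, n \\[4pt]
&\Longrightarrow\quad
  \frac{\partial U / \partial z_i}{\partial U / \partial x}
  = \frac{\partial p(z)}{\partial z_i}
\end{aligned}
```

The left-hand side is the consumer's marginal rate of substitution between characteristic $z_i$ and the numeraire; the right-hand side is the slope of the hedonic function, that is, the implicit price of the characteristic.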

**21.16** A *value function* θ can be defined as the value of expenditure a consumer with tastes α is willing to pay for alternative values of $z$ at a given utility $u$ and income $y$, represented by $\theta(z; u, y, \alpha)$. It defines a family of indifference curves relating the $z_i$ to foregone $x$, money. For individual characteristics $z_i$, the derivative $\theta_{z_i}$ is the marginal rate of substitution between $z_i$ and money, or the implicit marginal valuation the consumer with tastes α puts on $z_i$ at a given utility level and income. It is an indication of the reservation demand price^{8} for additional units of $z_i$.^{9} The price in the market is $p(z)$, and utility is maximized when $\theta(z; u, y, \alpha) = p(z)$; that is, the purchase takes place where the surface of the indifference curve θ is tangent to the hedonic price surface. If different buyers have different value functions (tastes), some will buy more of a characteristic than others for a given price function, as illustrated in Figure 21.1.
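The value function can be made concrete with a brief sketch in the spirit of Rosen (1974). The implicit definition below is taken from that framework rather than from the text above, and differentiability is assumed. The bid θ is the amount that, when paid for bundle $z$ out of income $y$, leaves the consumer at utility level $u$:

```latex
\begin{aligned}
&U\bigl(z,\; y - \theta;\; \alpha\bigr) = u
  \quad\text{defines}\quad \theta(z;\, u,\, y,\, \alpha) \\[4pt]
&\frac{\partial U}{\partial z_i}
  - \frac{\partial U}{\partial x}\,\frac{\partial \theta}{\partial z_i} = 0
  \quad\Longrightarrow\quad
  \theta_{z_i} \equiv \frac{\partial \theta}{\partial z_i}
  = \frac{\partial U / \partial z_i}{\partial U / \partial x}
\end{aligned}
```

Under these assumptions the slope of the value function in each characteristic equals the marginal rate of substitution between that characteristic and money, which is why tangency between θ and $p(z)$ reproduces the utility-maximizing choice.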

**21.17** The joint distribution function of tastes and income sets out a family of value functions, each of which, when tangential to the price function, depicts a purchase and simultaneously defines the price function whose envelope is the market hedonic price function. The points of purchase traced out by the hedonic function thus depend on the budget of the individual and the tastes of the individual consumer purchasing an individual set of characteristics. If demand functions are to be traced out, the joint probability distribution of consumers with particular budgets and tastes occurring in the market needs to be specified, that is, *F(y, α)*. This function, along with equation (21.1), allows the demand equations to be represented for each characteristic.

### B.3 Producer or supply side

**21.18** Figure 21.1 also shows the production side. In Chapter 17, Section B.1, a revenue-maximizing producer was considered whose revenue maximization problem was given by equation (17.1):^{10}

$$R(p, v) \equiv \max_{q} \left\{ \sum_{n=1}^{N} p_n q_n : q \in S(v) \right\}, \tag{21.3}$$

where $R(p, v)$ is the maximum value of output, $\sum_{n=1}^{N} p_n q_n$, given that the vector of inputs $v$ is available for use, using the period $t$ technology. Figure 17.1 illustrated in goods-space how the producer would choose between different combinations of outputs, $q_1$ and $q_2$. In Figure 21.1, the characteristics-space problem is analogous to the goods-space one with producers choosing here between combinations of $z_1$ and $z_2$ to produce for a particular level of technology and inputs $S(v)$. For a particular producer with level of inputs and technology $S_G^*$ facing a price surface $p_1$, the optimal production combination is at *A*. However, a different producer with technology and inputs $S_H^*$ facing a price surface $p_1$ would produce at *B*. At these points, the marginal cost of $z_1$ with respect to $z_2$ is equal to its marginal price from the hedonic surface as depicted by the tangency of the point. Production under these circumstances at any other combination would not be optimal. The envelope of tangencies such as $S_G^*$ and $S_H^*$ traces out the production decisions that would be observed in the market from optimizing, price-taking producers and are used as data for estimating the hedonic regressions. The hedonic function can be seen to be determined, in part, by the distribution of technologies of producers, including their output scale.

**21.19** Rosen (1974) formalizes the producer side, whereby price-taking producers are assumed to have cost functions described by $C(Q, z; \tau)$,^{11} where $Q = Q(z)$ is the output scale, that is, the number of units produced by an establishment offering specifications of an item with characteristics $z$. They have to decide which items to produce, that is, which package of $z$. To do this, a cost minimization problem is solved that requires τ, equivalent to $S(v)$ above, a vector of the technology of each producer that describes the output combinations each producer can produce with given input costs using its factors of production and the factor prices. It is the variation in τ across producers that distinguishes producer *A*’s decision about which combination of $z$ to produce from that of producer *B* in Figure 21.1. Producers are optimizers who seek to maximize profits given by

$$Q\,p(z) - C(Q, z; \tau) \tag{21.4}$$

by selecting $Q$ and $z$ optimally. The supplying market is assumed to be competitive, and producers are price takers, so the producers cannot influence price by their production decision. Their decision about how much to produce of each $z$ is determined by the price of $z$, assuming that the producer can vary $Q$ and $z$ in the short run.^{12} Differentiating equation (21.4) with respect to each $z_i$, setting the result equal to zero, and dividing by $Q$, the first-order profit-maximizing conditions are given by

$$\frac{\partial p(z)}{\partial z_i} \equiv p_i(z) = \frac{C_{z_i}(Q, z; \tau)}{Q}, \tag{21.5}$$

where $p = p(z_1, z_2, \ldots, z_n)$ from equation (21.1).
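The producer's first-order conditions can be sketched step by step. The sketch assumes that profit is revenue $Q\,p(z)$ less cost $C(Q, z; \tau)$, that the optimum is interior, and that both functions are differentiable; subscripts on $C$ denote partial derivatives.

```latex
\begin{aligned}
&\max_{Q,\,z}\;\; Q\,p(z) - C(Q, z;\, \tau) \\[4pt]
&\text{with respect to } z_i:\quad
  Q\,\frac{\partial p(z)}{\partial z_i} - C_{z_i}(Q, z;\, \tau) = 0
  \;\Longrightarrow\;
  \frac{\partial p(z)}{\partial z_i} = \frac{C_{z_i}(Q, z;\, \tau)}{Q} \\[4pt]
&\text{with respect to } Q:\quad
  p(z) - C_{Q}(Q, z;\, \tau) = 0
  \;\Longrightarrow\;
  p(z) = C_{Q}(Q, z;\, \tau)
\end{aligned}
```

The first condition says that the implicit price of a characteristic equals its marginal cost per unit sold; the second says that output is expanded until the unit price of the chosen bundle equals the marginal cost of producing one more unit.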

**21.20** The *marginal unit revenue* from producing characteristic $z_i$ is given by its shadow price in the price function and its marginal cost of production. In the producer case, the probability distribution of the technologies of firms, $G(\tau)$, is necessary if the overall quantity supplied of items with given characteristic sets is to be revealed. Since it is a profit maximization problem to select the optimal combination of characteristics to produce, marginal revenue from the additional attributes must equal their marginal cost of production per unit sold. Quantities are produced up to the point where unit revenues $p(z)$ equal marginal production costs, evaluated at the optimum bundle of characteristics supplied.

**21.21** While for consumers a *value function* was considered, producers require an *offer function* $\varphi(z; \pi, \tau)$. The offer price is the price the seller is willing to accept for various designs at constant profit level π, when quantities produced are optimally chosen, while $p(z)$ is the maximum price obtainable from those models in the market. Producer equilibrium is characterized by a tangency between a profit characteristics indifference surface and the market characteristics price surface, where $p_i(z) = \varphi_{z_i}(z; \pi, \tau)$ and $p(z) = \varphi(z; \pi, \tau)$. Since there is a distribution of technologies $G(\tau)$, the producer equilibrium is characterized by a family of offer functions that envelop the market hedonic price function. The varying τ will depend on different factor prices for items produced in different countries, multiproduct firms with economies of scale, and differences in the technology, whether the quality of capital, labor, or intermediate inputs and their organization. Different values of τ will define a family of production surfaces.

### B.4 Equilibrium

**21.22** The theoretical framework first defined each item as a point on a plane of several dimensions made up by the $z_1, z_2, \ldots, z_n$ quality characteristics; each item was a combination of values $z_1, z_2, \ldots, z_n$. If only two characteristics defined the item, then each point in the positive space of Figure 21.1 would define an item. The characteristics were not bought individually but as bundles of characteristics tied together to make up an item. It was assumed that the markets were differentiated so that there was a wide range of choices to be made.^{13} The market was also assumed to be perfectly competitive with consumers and producers as price takers undertaking optimizing behavior to decide which items (tied sets of characteristics) to buy and sell. Competitive markets in characteristics and optimizing behavior are assumed so that the quantity demanded of characteristics $z$ must equal the quantity supplied. It has been shown that consumers’ and producers’ choices or “locations” on the plane will be dictated by consumer tastes and producer technology. Tauchen and Witte (2001, p. 4) show that the hedonic price function will differ across markets in accordance with the means and variances (and in some cases also higher moments) of the distributions of household and firm characteristics.

**21.23** Rosen (1974, p. 44) notes that a buyer and seller are perfectly matched when their respective value and offer functions are tangential. The common gradient at that point is given by the gradient of the market-clearing implicit price function *p(z)*. The consumption and production decisions were seen in the value and offer functions to be jointly determined, for given *p(z)*, by *F(y, α)* and *G(τ)*. In competitive markets there is a simultaneity in the determination of the hedonic equation, since the distributions *F(y, α)* and *G(τ)* help determine the quantities demanded and supplied and also the slope of the function. Although the decisions made by consumers and producers are as price takers, the prices taken are those from the hedonic function. There is a sense in which the hedonic function and its shadow prices emerge from the operations of the market. The product markets implicitly reveal the hedonic function. Since consumers and producers are optimizers in competitive markets, the hedonic function, in principle, gives the minimum price of any bundle of characteristics. Given all of this, Rosen (1974, p. 44) asked: what do hedonic prices mean?

### B.5 What do hedonic prices mean?

**21.24** It would be convenient if, for PPI construction, the estimated coefficients from hedonic regressions were estimates of the marginal production cost or producer value of a characteristic or, for CPI construction, they were estimates of the marginal utility from a characteristic or user value. But theory tells us that this is not the case and that the interpretation is not clear.

**21.25** There was an erroneous perception in the 1960s that the coefficients from hedonic methods represented user values as opposed to resource costs. Rosen (1974), as has been shown, found that hedonic coefficients generally reflect both user values and resource costs; both supply and demand situations. The ratios of these coefficients may reflect consumers’ marginal rates of substitution or producers’ marginal rates of substitution (transformation) for characteristics. There is what is referred to in econometrics as an “identification” problem in which the observed prices and quantities are jointly determined by supply and demand considerations, and their underlying effects cannot be separated. The data collected on prices jointly arise from variations in demand by different consumers with different tastes and preferences, and from variations in supply by producers with different technologies.

**21.26** First, it is necessary to come to terms with this simultaneity problem. Hedonic regressions are an increasingly important analytical tool, one implicitly promoted by the attention given to them in this *Manual* but also promoted in separate manuals by organizations such as the OECD (see Triplett, 2002) and Eurostat (2001), and widely used by the U.S. Bureau of Labor Statistics (Kokoski, Waehrer, and Rozaklis, 2001, and Moulton, 2001b). So how do economists writing on the subject shrug their intellectual shoulders in light of these findings?

**21.27** Rosen (1974, p. 43) refers to the hedonic function as “…a joint envelope of a family of value functions and another family of offer functions. An envelope function by itself reveals nothing about the underlying members that generate it; and they in turn constitute the generating structure of the observations.”

**21.28** Griliches (1988, p. 120) notes the following:

My own view is that what the hedonic approach tries to do is to estimate aspects of the budget constraint facing consumers, allowing thereby the estimation of “missing” prices when quality changes. It is not in the business of estimating utility functions *per se*, though it can also be useful for these purposes....what is being estimated is the actual locus of intersection of the demand curves of different consumers with varying tastes and the supply curves of different producers with possible varying technologies of production. One is unlikely, therefore to be able to recover the underlying utility and cost functions from such data alone, except in very special circumstances.

**21.29** Triplett (1987) states, “It is well established—but still not widely understood—that the form of *h(·)* [the hedonic function] cannot be derived from the form of *Q(·)* and *t(·)* [utility and production functions], nor does *h(·)* represent a ‘reduced form’ of supply and demand functions derived from *Q(·)* and *t(·)*.”

**21.30** Diewert (2003, p. 320), with his focus on the consumer side, says,

Thus, I am following Muellbauer’s (1974, p. 977) example where he says that his “approach is unashamedly one-sided; only the demand side is treated…Its subject matter is therefore rather different from that of the recent paper by Sherwin Rosen. The supply side and simultaneity problems which may arise are ignored.”

Diewert (2003) has also considered the theoretical PPI indices with a focus on the producer side. He bases the optimizing problem the establishments face when deciding on which combinations of characteristics to produce, however, on the consumer’s valuations, giving them precedence. There are many industries in which firms are effective price takers, and the prices taken are dictated by the consumer side rather than by cost and technological considerations. In Section B.6 this framework is outlined, which allows a more straightforward development of the theory of hedonic index numbers for PPIs.

**21.31** Second, the theoretical framework allows the conditions to be considered under which the hedonic coefficients are determined by only demand-side or supply-side factors, that is, the circumstances under which clear explanations would be valid. The problem is that because the coefficients of a hedonic function are the outcome of the interaction of consumer- and producer-optimizing conditions, it is not possible to interpret the function only in terms of, say, producer marginal costs or consumer marginal values. However, suppose the *production technology* τ was the same for each producing establishment. Buyers differ but sellers are identical. Then, instead of a confusing family of offer functions, there is a unique offer function with the hedonic function describing the prices of characteristics the firm will supply with the given ruling technology to the current mixture of tastes. The offer function becomes *p(z)*, since there is no distribution of τ to confuse it. There are different tastes on the consumer side, and so what appears in the market is the result of firms trying to satisfy consumer preferences all for a constant technology and profit level; the structure of supply is revealed by the hedonic price function. In Figure 21.1 only the expansion path traced out by, say, $S_H^*$, akin to *A A’*, would be revealed. Now, suppose sellers differ, but *buyers’ tastes* α are identical. Here the family of *value functions* collapses to be revealed as the hedonic function *p(z)*, which identifies the structure of demand, such as *A A’* in Figure 21.1.^{14} Section B.6 uses Diewert’s (2003) approach in following a representative consumer, rather than consumers with different tastes, so that the demand side alone can be identified. Triplett (1987, p. 632) notes that of these possibilities, uniformity of technologies is the most likely, especially when access to technology is unrestricted in the long run, while uniformity of tastes is unlikely. There may be, of course, segmented markets where tastes are more uniform to which specific sets of items are tailored and for which hedonic equations can be estimated for individual segments.^{15} In some industries there may be a prior expectation of uniformity of tastes as against uniformity of technologies, and interpretation of coefficients will accordingly follow. In many cases, however, the interpretation may be more problematic.

**21.32** Third, issues relating to the estimation of the underlying supply and demand functions for characteristics have implications for the estimation of hedonic functions. In Appendix 21.1, identification and estimation issues will be considered in this light. Finally, the subsequent concern with new products in Section D of this chapter refers to demand functions. However, attention is now turned to hedonic *indices*. In the next section, these are noted to have a quite different application than that for the quality adjustment of noncomparable replacement items.

### B.6 An alternative hedonic theoretical formulation

**21.33** This section is based on a formulation by Diewert (2002d). It assumes competitive price-taking behavior on the part of firms. In this approach, the user’s valuations of the various models that could be produced flow to producers via the hedonic function in the same way that output prices are taken as given in the usual theory of the output price index. It is necessary to set up the establishment’s revenue maximization problem assuming that it produces a single output, but in each period the establishment has a choice of which type of model it could produce. Let the model be identified by a $K$-dimensional vector of characteristics, $z \equiv [z_1, \ldots, z_K]$. Before tackling the establishment’s revenue maximization problem, it is necessary to characterize the set of output prices that the establishment faces in period $t$ as a function of the characteristics of the model that the establishment might produce. It is assumed that in period $t$, the demanders of the output of the establishment have a cardinal utility function, $f^t(z)$, that enables each demander to determine that the value of a model with the vector of characteristics $z^1 \equiv [z_1^1, \ldots, z_K^1]$ compared with a model with characteristics vector $z^2 \equiv [z_1^2, \ldots, z_K^2]$ is $f^t(z^1)/f^t(z^2)$. Thus, in period $t$, demanders are willing to pay the amount of money $\Pi^t(z)$ for a model with the vector of characteristics $z$, where

$$\Pi^t(z) \equiv \rho^t f^t(z), \qquad t = 0, 1. \tag{21.6}$$

The scalar $\rho^t$ is inserted into the willingness-to-pay function because, under certain restrictions, $\rho^t$ can be interpreted as a period $t$ price for the entire family of hedonic models that might be produced in period $t$. These restrictions are

$$f^0 = f^1; \tag{21.7}$$

that is, the *model relative utility functions* $f^t$ are identical for the two periods under consideration. We will make use of the specific assumption in equation (21.7) later.

**21.34** In what follows, it is assumed that econometric estimates for the period 0 and 1 *hedonic model price functions*, $\Pi^0$ and $\Pi^1$, are available, although we will also consider the case where only an estimate for $\Pi^0$ is available.^{16} Now, consider an establishment that produces a single model in each period in the marketplace that is characterized by the hedonic model price functions, $\Pi^t(z)$, for periods $t = 0, 1$. Suppose that in period $t$, the establishment has the *production function* $F^t$, where

$$q = F^t(z, v) \tag{21.8}$$

is the number of models, each with vector of characteristics $z$, that can be produced if the vector of inputs $v$ is available for use by the establishment in period $t$. As is usual in the economic approach to index numbers, we assume a competitive model, where each establishment takes output prices as fixed parameters beyond its control. In this case, there is an entire schedule of model prices that the establishment takes as given instead of just a single price in each period. Thus, it is assumed that if the establishment decides to produce a model with the vector of characteristics $z$, then it can sell any number of units of this model in period $t$ at the price $\Pi^t(z) \equiv \rho^t f^t(z)$. Note that the establishment is allowed to choose which model type to produce in each period.

**21.35** Now, define the establishment’s *revenue function*, $R$, assuming the establishment is facing the period $s$ hedonic price function $\Pi^s \equiv \rho^s f^s$, is using the vector of inputs $v$, and has access to the period $t$ production function $F^t$:

$$
\begin{aligned}
R(\rho^s f^s, F^t, Z^t, v) &\equiv \max_{q,\,z} \{\rho^s f^s(z)\, q : q = F^t(z, v);\ z \in Z^t\} \\
&= \max_{z} \{\rho^s f^s(z)\, F^t(z, v) : z \in Z^t\},
\end{aligned} \tag{21.9}
$$

where $Z^t$ is a *technologically feasible set of model characteristics* that can be produced in period $t$. The second line follows from the line above by substituting the production-function constraint into the objective function.

**21.36** The actual period $t$ revenue maximization problem that the establishment faces is defined by the revenue function in equation (21.9), except that we replace the period $s$ hedonic price function $\rho^s f^s$ by the period $t$ hedonic price function $\rho^t f^t$, and the generic input quantity vector $v$ is replaced by the observed period $t$ input quantity vector used by the establishment, $v^t$. Further assume that the establishment produces $q^t$ units of a single model with characteristics vector $z^t$ and that $[q^t, z^t]$ solves the period $t$ revenue maximization problem; that is, $[q^t, z^t]$ is a solution to^{17}

$$
\begin{aligned}
R(\rho^t f^t, F^t, Z^t, v^t) &\equiv \max_{q,\,z} \{\rho^t f^t(z)\, q : q = F^t(z, v^t);\ z \in Z^t\} \\
&= \rho^t f^t(z^t)\, q^t, \qquad t = 0, 1,
\end{aligned} \tag{21.10}
$$

where the period $t$ establishment output $q^t$ is equal to

$$q^t = F^t(z^t, v^t), \qquad t = 0, 1. \tag{21.11}$$

Now, a family of *Konüs-type hedonic output price indices* $P$ between periods 0 and 1 can be defined as follows:

$$P(\rho^0 f^0, \rho^1 f^1, F^t, Z^t, v) \equiv \frac{R(\rho^1 f^1, F^t, Z^t, v)}{R(\rho^0 f^0, F^t, Z^t, v)}. \tag{21.12}$$

**21.37** Thus, a particular member of the above family of indices is equal to the establishment’s revenue ratio, where the revenue in the numerator of equation (21.12) uses the hedonic model price function for period 1, and the revenue in the denominator of equation (21.12) uses the hedonic model price function for period 0. For both revenues, however, the technology of period $t$ is used (that is, $F^t$ and $Z^t$ are used in both revenue maximization problems), and the same input quantity vector $v$ is used. This is the usual definition for an economic output price index, except that instead of a single price facing the producer in each period, we have a whole family of model prices facing the establishment in each period. Note that the only variables that are different in the numerator and denominator of equation (21.12) are the two hedonic model price functions facing the establishment in periods 0 and 1.

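**21.38** The right-hand side of equation (21.12) looks a bit complex. However, if the assumption in equation (21.7) holds (that is, the period 0 and 1 hedonic model price functions are identical except for the multiplicative scalars $\rho^0$ and $\rho^1$), then equation (21.12) reduces to the very simple ratio, $\rho^1 / \rho^0$. To see this, use equations (21.12) and (21.9) as follows:

$$
\begin{aligned}
P(\rho^0 f^0, \rho^1 f^1, F^t, Z^t, v) &= \frac{\max_z \{\rho^1 f^1(z)\, F^t(z, v) : z \in Z^t\}}{\max_z \{\rho^0 f^0(z)\, F^t(z, v) : z \in Z^t\}} \\
&= \frac{\max_z \{\rho^1 f^0(z)\, F^t(z, v) : z \in Z^t\}}{\max_z \{\rho^0 f^0(z)\, F^t(z, v) : z \in Z^t\}} && \text{using equation (21.7)} \\
&= \frac{\rho^1 \max_z \{f^0(z)\, F^t(z, v) : z \in Z^t\}}{\rho^0 \max_z \{f^0(z)\, F^t(z, v) : z \in Z^t\}} && \text{assuming } \rho^0 \text{ and } \rho^1 \text{ are positive and canceling terms} \\
&= \rho^1 / \rho^0.
\end{aligned} \tag{21.13}
$$

This is a very useful result, since many hedonic regression models have been successfully estimated using equation (21.7). Under this assumption, *all* the theoretical hedonic establishment output price indices reduce to the observable ratio, $\rho^1 / \rho^0$.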
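The reduction to $\rho^1/\rho^0$ can be checked numerically. The sketch below is only illustrative: the functional forms for `f` and `F`, the feasible set, and the values of `rho0`, `rho1`, and `v` are all assumptions made for the example, and the maximization is done by brute force over a grid rather than analytically.

```python
import math

# All functional forms and numbers below are hypothetical illustrations.
def f(z):
    """Common model valuation function, f^0 = f^1 (assumed form)."""
    return 1.0 + 0.5 * math.log(1.0 + z)

def F(z, v):
    """Assumed production function: richer models mean fewer units per input."""
    return v / (1.0 + z ** 2)

Z_GRID = [i / 1000 for i in range(3001)]  # assumed feasible set Z^t = [0, 3]

def revenue(rho, valuation, v):
    """R(rho f, F, Z, v): maximum over z of rho * f(z) * F(z, v), as in (21.9)."""
    return max(rho * valuation(z) * F(z, v) for z in Z_GRID)

def konus_index(rho0, f0, rho1, f1, v):
    """Ratio of maximized revenues under the two hedonic schedules, as in (21.12)."""
    return revenue(rho1, f1, v) / revenue(rho0, f0, v)

rho0, rho1 = 100.0, 104.0
for v in (5.0, 50.0):
    # With f^0 = f^1 the index equals rho1 / rho0 whatever the reference inputs.
    print(v, konus_index(rho0, f, rho1, f, v))
```

Changing the reference input vector `v` rescales numerator and denominator alike, which is why every member of the family gives the same answer under assumption (21.7).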

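**21.39** We return to the general case where the assumption in equation (21.7) is not made. As usual, it is always of interest to specialize equation (21.12) to the special cases where the conditioning variables that are held constant in the numerator and denominator of equation (21.12), $F^t$, $Z^t$, and $v$, are equal to the period 0 and 1 values for these variables, namely, $F^0$, $Z^0$, and $v^0$, and $F^1$, $Z^1$, and $v^1$. Thus, define the *Laspeyres-type hedonic output price index* between periods 0 and 1 for our establishment as follows:

$$
\begin{aligned}
P(\rho^0 f^0, \rho^1 f^1, F^0, Z^0, v^0) &\equiv \frac{R(\rho^1 f^1, F^0, Z^0, v^0)}{R(\rho^0 f^0, F^0, Z^0, v^0)} \\
&= \frac{R(\rho^1 f^1, F^0, Z^0, v^0)}{\rho^0 f^0(z^0)\, q^0} && \text{using equation (21.10) for } t = 0 \\
&= \frac{\max_z \{\rho^1 f^1(z)\, F^0(z, v^0) : z \in Z^0\}}{\rho^0 f^0(z^0)\, q^0} && \text{using equation (21.9)} \\
&\geq \frac{\rho^1 f^1(z^0)\, F^0(z^0, v^0)}{\rho^0 f^0(z^0)\, q^0} && \text{since } z^0 \text{ is feasible for the maximization problem} \\
&= \frac{\rho^1 f^1(z^0)\, q^0}{\rho^0 f^0(z^0)\, q^0} && \text{using equation (21.11) for } t = 0 \\
&= \frac{\rho^1 f^1(z^0)}{\rho^0 f^0(z^0)} \equiv P_{HL},
\end{aligned} \tag{21.14}
$$

where the *observable hedonic Laspeyres output price index* $P_{HL}$ is defined as

$$P_{HL} \equiv \frac{\rho^1 f^1(z^0)}{\rho^0 f^0(z^0)}. \tag{21.15}$$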

Thus, the inequality in equation (21.14) says that the unobservable theoretical Laspeyres-type hedonic output price index $P(\rho^0 f^0, \rho^1 f^1, F^0, Z^0, v^0)$ is bounded from below by the observable (assuming that we have estimates for $\rho^0$, $\rho^1$, $f^0$, and $f^1$) hedonic Laspeyres output price index $P_{HL}$. The inequality in equation (21.14) is the hedonic counterpart to a standard Laspeyres-type inequality for a theoretical output price index.

**21.40** It is of modest interest to rewrite $P_{HL}$ in terms of the observable model prices for the establishment in periods 0 and 1. Denote these prices by $P^0$ and $P^1$, respectively. Using equation (21.6),

$$P^0 = \rho^0 f^0(z^0); \qquad P^1 = \rho^1 f^1(z^1). \tag{21.16}$$

Now, rewriting equation (21.15) as follows:

$$
\begin{aligned}
P_{HL} &\equiv \frac{\rho^1 f^1(z^0)}{\rho^0 f^0(z^0)} \\
&= \frac{P^1 f^1(z^0)}{f^1(z^1)\, P^0} && \text{using equation (21.16)} \\
&= \frac{P^1 / f^1(z^1)}{P^0 / f^1(z^0)}.
\end{aligned} \tag{21.17}
$$

The prices $P^1 / f^1(z^1)$ and $P^0 / f^1(z^0)$ can be interpreted as *quality-adjusted model prices* for the establishment in periods 1 and 0, respectively, using the hedonic regression pertaining to period 1 to do the quality adjustment.

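**21.41** In the theoretical hedonic output price index $P(\rho^0 f^0, \rho^1 f^1, F^0, Z^0, v^0)$ defined by equation (21.14) above, we conditioned on $F^0$ (the base-period production function), $Z^0$ (the base-period set of models that were technologically feasible in period 0), and $v^0$ (the establishment’s base-period input vector). We now define a companion period 1 theoretical hedonic output price index that conditions on the period 1 variables, $F^1$, $Z^1$, $v^1$. Thus, define the *Paasche-type hedonic output price index* between periods 0 and 1 for an establishment as follows:^{18}

$$
\begin{aligned}
P(\rho^0 f^0, \rho^1 f^1, F^1, Z^1, v^1) &\equiv \frac{R(\rho^1 f^1, F^1, Z^1, v^1)}{R(\rho^0 f^0, F^1, Z^1, v^1)} \\
&= \frac{\rho^1 f^1(z^1)\, q^1}{R(\rho^0 f^0, F^1, Z^1, v^1)} && \text{using equation (21.10) for } t = 1 \\
&= \frac{\rho^1 f^1(z^1)\, q^1}{\max_z \{\rho^0 f^0(z)\, F^1(z, v^1) : z \in Z^1\}} && \text{using equation (21.9)} \\
&\leq \frac{\rho^1 f^1(z^1)\, q^1}{\rho^0 f^0(z^1)\, F^1(z^1, v^1)} && \text{since } z^1 \text{ is feasible for the maximization problem} \\
&= \frac{\rho^1 f^1(z^1)\, q^1}{\rho^0 f^0(z^1)\, q^1} && \text{using equation (21.11) for } t = 1 \\
&= \frac{\rho^1 f^1(z^1)}{\rho^0 f^0(z^1)} \equiv P_{HP},
\end{aligned} \tag{21.18}
$$

where the *observable hedonic Paasche output price index* $P_{HP}$ is defined as

$$P_{HP} \equiv \frac{\rho^1 f^1(z^1)}{\rho^0 f^0(z^1)}. \tag{21.19}$$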

Thus, the inequality in equation (21.18) says that the unobservable theoretical Paasche-type hedonic output price index $P(\rho^0 f^0, \rho^1 f^1, F^1, Z^1, v^1)$ is bounded from above by the observable (assuming that we have estimates for $\rho^0$, $\rho^1$, $f^0$, and $f^1$) hedonic Paasche output price index $P_{HP}$. The inequality in equation (21.18) is the hedonic counterpart to a standard Paasche-type inequality for a theoretical output price index.

**21.42** Again, it is of interest to rewrite $P_{HP}$ in terms of the observable model prices for the establishment in periods 0 and 1. Rewrite equation (21.19) as follows:

$$
\begin{aligned}
P_{HP} &\equiv \frac{\rho^1 f^1(z^1)}{\rho^0 f^0(z^1)} \\
&= \frac{P^1 f^0(z^0)}{P^0 f^0(z^1)} && \text{using equation (21.16)} \\
&= \frac{P^1 / f^0(z^1)}{P^0 / f^0(z^0)}.
\end{aligned} \tag{21.20}
$$

The prices $P^1 / f^0(z^1)$ and $P^0 / f^0(z^0)$ can be interpreted as *quality-adjusted model prices* for the establishment in periods 1 and 0, respectively, using the hedonic regression pertaining to period 0 to do the quality adjustment.
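The two bounding inequalities can be illustrated with a small numerical experiment. Everything in the sketch below is assumed for the purpose of illustration: the valuation functions `f0` and `f1`, the technologies `F0` and `F1`, the feasible set (taken to be the same in both periods), and the scalars. The revenue-maximizing models are found by grid search, so the "theoretical" indices are only as exact as the grid.

```python
import math

Z_GRID = [i / 1000 for i in range(3001)]  # assumed feasible set, identical in both periods

def f0(z): return 1.0 + 0.5 * math.log(1.0 + z)       # assumed period 0 valuation
def f1(z): return 1.0 + 0.8 * math.log(1.0 + z)       # assumed period 1 valuation
def F0(z, v): return v / (1.0 + z ** 2)               # assumed period 0 technology
def F1(z, v): return 1.2 * v / (1.0 + 0.8 * z ** 2)   # assumed period 1 technology

def solve(rho, f, F, v):
    """Return (maximum revenue, revenue-maximizing z) for the problem in (21.9)."""
    return max((rho * f(z) * F(z, v), z) for z in Z_GRID)

rho0, rho1, v0, v1 = 100.0, 104.0, 10.0, 11.0
R00, z0 = solve(rho0, f0, F0, v0)  # period 0 outcome actually observed
R11, z1 = solve(rho1, f1, F1, v1)  # period 1 outcome actually observed

# Theoretical indices: each ratio contains one counterfactual (unobserved) revenue.
P_theory_L = solve(rho1, f1, F0, v0)[0] / R00
P_theory_P = R11 / solve(rho0, f0, F1, v1)[0]

# Observable bounds, as in (21.15) and (21.19).
P_HL = rho1 * f1(z0) / (rho0 * f0(z0))
P_HP = rho1 * f1(z1) / (rho0 * f0(z1))

print("Laspeyres: theory", P_theory_L, ">= observable", P_HL)
print("Paasche:   theory", P_theory_P, "<= observable", P_HP)
```

The inequalities hold for any choice of functions here, because the counterfactual maximum can never be smaller than the revenue earned at the model that was actually produced.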

**21.43** It is possible to adapt a technique originally credited to Konüs (1924) and obtain a theoretical hedonic output price index that lies between the observable Laspeyres and Paasche bounding indices, $P_{HL}$ and $P_{HP}$, defined above. Recall the definition of the revenue function, $R(\rho^s f^s, F^t, Z^t, v)$, defined by equation (21.9) above. Instead of using either $F^0, Z^0, v^0$ or $F^1, Z^1, v^1$ as reference production functions, feasible characteristics sets, and input vectors for the establishment in equation (21.12), use a *convex combination* or *weighted average* of these variables in our definition of a theoretical hedonic output price index. Thus, for each scalar λ between 0 and 1, define the theoretical hedonic output price index between periods 0 and 1, $P(\lambda)$, as follows:

$$
\begin{aligned}
P(\lambda) &\equiv \frac{R(\rho^1 f^1, (1-\lambda)F^0 + \lambda F^1, (1-\lambda)Z^0 + \lambda Z^1, (1-\lambda)v^0 + \lambda v^1)}{R(\rho^0 f^0, (1-\lambda)F^0 + \lambda F^1, (1-\lambda)Z^0 + \lambda Z^1, (1-\lambda)v^0 + \lambda v^1)} \\
&= \frac{\max_z \{\rho^1 f^1(z)\,[(1-\lambda)F^0(z, (1-\lambda)v^0 + \lambda v^1) + \lambda F^1(z, (1-\lambda)v^0 + \lambda v^1)] : z \in (1-\lambda)Z^0 + \lambda Z^1\}}{\max_z \{\rho^0 f^0(z)\,[(1-\lambda)F^0(z, (1-\lambda)v^0 + \lambda v^1) + \lambda F^1(z, (1-\lambda)v^0 + \lambda v^1)] : z \in (1-\lambda)Z^0 + \lambda Z^1\}}.
\end{aligned} \tag{21.21}
$$

When λ = 0, $P(\lambda)$ simplifies to $P(\rho^0 f^0, \rho^1 f^1, F^0, Z^0, v^0)$, the Laspeyres-type hedonic output price index defined by equation (21.14) above. Thus, using the inequality in equation (21.14), we have

$$P(0) \geq P_{HL}, \tag{21.22}$$

where $P_{HL}$ is equal to $\rho^1 f^1(z^0) / \rho^0 f^0(z^0)$, the observable Laspeyres hedonic output price index defined by equation (21.15) above. When λ = 1, $P(\lambda)$ simplifies to $P(\rho^0 f^0, \rho^1 f^1, F^1, Z^1, v^1)$, the Paasche-type hedonic output price index defined by equation (21.18) above. Thus, using the inequality in equation (21.18), we have

$$P(1) \leq P_{HP}, \tag{21.23}$$

where $P_{HP}$ is equal to $\rho^1 f^1(z^1) / \rho^0 f^0(z^1)$, the observable Paasche hedonic output price index defined by equation (21.20) above.

**21.44** If $P(\lambda)$ is a continuous function of λ between 0 and 1, then we can adapt the proof of Diewert (1983a, pp. 1060–61), which in turn is based on a technique of proof by Konüs (1924), and show that there exists a λ* such that 0 ≤ λ* ≤ 1, and either

$$P_{HL} \leq P(\lambda^*) \leq P_{HP} \quad \text{or} \quad P_{HP} \leq P(\lambda^*) \leq P_{HL}; \tag{21.24}$$

that is, there exists a theoretical hedonic output price index between periods 0 and 1 using a technology that is intermediate to the technology of the establishment between periods 0 and 1, $P(\lambda^*)$, that lies *between* the observable^{19} Laspeyres and Paasche hedonic output price indices, $P_{HL}$ and $P_{HP}$. However, to obtain this result, we need conditions on the hedonic model price functions, $\rho^0 f^0(z)$ and $\rho^1 f^1(z)$, on the production functions, $F^0(z, v)$ and $F^1(z, v)$, and on the feasible characteristics sets, $Z^0$ and $Z^1$, that will ensure that the maximum functions in the numerator and denominator in the last equality of equation (21.21) are continuous in λ. Sufficient conditions to guarantee continuity are^{20}

- The production functions $F^0(z, v)$ and $F^1(z, v)$ are positive and jointly continuous in $z, v$,
- The hedonic model price functions $f^0(z)$ and $f^1(z)$ are positive and continuous in $z$,
- $\rho^0$ and $\rho^1$ are positive, and
- The sets of feasible characteristics $Z^0$ and $Z^1$ are convex, closed, and bounded.

**21.45** A theoretical output price index has been defined that is bounded by two observable indices. It is natural to take a symmetric mean of the bounds to obtain a best single number that will approximate the theoretical index. Thus, let $m(a, b)$ be a symmetric homogeneous mean of the two positive numbers $a$ and $b$. We want to find a best $m(P_{HL}, P_{HP})$. If we want the resulting index, $m(P_{HL}, P_{HP})$, to satisfy the time reversal test, then we can adapt the argument of Diewert (1997, p. 138) and show that the resulting $m(a, b)$ must be the geometric mean, $a^{1/2} b^{1/2}$. Thus, a good candidate to best approximate a theoretical hedonic output price index is the following observable *Fisher hedonic output price index*:

$$
\begin{aligned}
P_{HF} &\equiv [P_{HL}\, P_{HP}]^{1/2} \\
&= \left[\frac{\rho^1 f^1(z^0)}{\rho^0 f^0(z^0)}\right]^{1/2} \left[\frac{\rho^1 f^1(z^1)}{\rho^0 f^0(z^1)}\right]^{1/2} && \text{using equations (21.15) and (21.19)} \\
&= \frac{\rho^1}{\rho^0} \left[\frac{f^1(z^0)}{f^0(z^0)}\right]^{1/2} \left[\frac{f^1(z^1)}{f^0(z^1)}\right]^{1/2}.
\end{aligned} \tag{21.25}
$$

Note that $P_{HF}$ reduces to $\rho^1 / \rho^0$ if $f^0 = f^1$; that is, if the hedonic model price functions are identical for each of the two periods under consideration, except for the proportional factors, $\rho^1$ and $\rho^0$.

**21.46** Instead of using equations (21.15) and (21.19) in the first line of equation (21.25), equations (21.17) and (21.20) can be used. The resulting formula for the Fisher hedonic output price index is

$$P_{HF} = \left[\frac{P^1 / f^1(z^1)}{P^0 / f^1(z^0)}\right]^{1/2} \left[\frac{P^1 / f^0(z^1)}{P^0 / f^0(z^0)}\right]^{1/2}. \tag{21.26}$$

Equation (21.26) is preferred. It is the geometric mean of two sets of quality-adjusted model price ratios, using the hedonic regression in each of the two periods to do one of the quality adjustments.
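A short sketch can confirm that the two ways of writing the Fisher hedonic index agree. The valuation functions, the scalars `rho0` and `rho1`, and the characteristics `z0` and `z1` of the models produced are hypothetical values chosen for illustration; in practice $f^0$ and $f^1$ would come from estimated hedonic regressions.

```python
import math

def f0(z): return 1.0 + 0.5 * math.log(1.0 + z)  # assumed period 0 hedonic valuation
def f1(z): return 1.0 + 0.8 * math.log(1.0 + z)  # assumed period 1 hedonic valuation

rho0, rho1 = 100.0, 104.0  # assumed period price levels
z0, z1 = 0.6, 0.9          # assumed characteristics of the models actually produced

# Observed model prices, as in (21.16).
P0 = rho0 * f0(z0)
P1 = rho1 * f1(z1)

# First form: built from rho and f, as in (21.15), (21.19), and (21.25).
P_HL = rho1 * f1(z0) / (rho0 * f0(z0))
P_HP = rho1 * f1(z1) / (rho0 * f0(z1))
fisher_from_rho = math.sqrt(P_HL * P_HP)

# Second form: built from quality-adjusted observed prices, as in (21.26).
laspeyres_adjusted = (P1 / f1(z1)) / (P0 / f1(z0))  # period 1 regression, (21.17)
paasche_adjusted = (P1 / f0(z1)) / (P0 / f0(z0))    # period 0 regression, (21.20)
fisher_from_prices = math.sqrt(laspeyres_adjusted * paasche_adjusted)

print(fisher_from_rho, fisher_from_prices)
```

The second form needs only the observed prices and the two fitted valuation functions, which is the practical appeal of writing the index in terms of quality-adjusted prices.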

**21.47** The above theory, for the quality adjustment of establishment output prices, is not perfect. It has two weak parts:

- Using a convex combination of the two reference period technologies may not appeal to everyone, and
- Our technique for converting the bounds to a single number is only one method out of many.

**21.48** The initial Laspeyres-type bounds and Paasche-type bounds formalize the bounds outlined in Section C.5 below and referred to in Section C.2. The quality adjustments in equation (21.13) and (21.14) will be seen from this approach to be made using the user’s model valuation functions, $f^0(z)$ and $f^1(z)$. Producers’ costs or production functions enter into the quality adjustment only to determine $z^0$ and $z^1$; that is, only to determine which models the establishment will produce. Hence, establishments that have different technologies, primary inputs, or face different input prices will in general choose to produce different models in the same period. The choice problem has been modeled here only facing a single establishment, although the generalization should be straightforward.

### B.7 Markups and imperfect competition

**21.49** In Section B.5 it was shown there was some ambiguity in the interpretation of hedonic coefficients. A user-value or resource-cost interpretation was possible if there was uniformity in buyers’ tastes or suppliers’ technologies, respectively. In Section B.6 an assumption of price-taking behavior on the part of firms was introduced and a formal setting given to a user-value interpretation, albeit involving some restrictive assumptions. Yet the approaches in Sections B.5 and B.6 both assume perfectly competitive behavior, and the discussion extends now to the effects of markups in imperfect competition. Feenstra (1995) notes that in imperfect competition, when pricing is above marginal cost, the hedonic function should include a term for the price-cost markup.

**21.50** Pakes (2001) has developed the argument focusing on the study of new products as the result of prior investments in product development and marketing. A competitive marginal cost-pricing assumption would require that either (i) products with identical characteristics are developed from such investments, so that the law of one price for these identical products will eliminate any margin, or (ii) all products lose their investment (markup) in the new products. Neither of these is reasonable. Indeed, varying markups are a feature of differentiated products (see Feenstra and Levinsohn, 1995, for example). Pakes (2001) argued that markups should change over time. When new products are introduced, the improvements, and associated markups, are directed to characteristics where markups have previously been high. The markups on existing products with these characteristics will fall, and hedonic coefficients will thus change over time. Pakes (2001) also argued that there may be an ambiguity as to the signs of the coefficients—that there is no economic reason to expect a positive relationship between price and a desirable characteristic. Such a conclusion would be at odds with a resource-cost or user-value approach. If the characteristics being compared are *vertical*—that is, they are characteristics of which everyone would like more—then we can expect the sign to be positive. However, Pakes (2001) has argued that the sign on *horizontal* characteristics—that is, for which the ordering of the desirable amounts of characteristics is not the same for all consumers—can be negative. The entry of new products aimed at some segments of the market may drive down the markup on products with more desirable attributes. For example, some consumers may have a preference for television sets with smaller screen sizes and be willing to pay a premium price. 
Indeed, the required technology for the production of these sets may have required increased investment and, thus, increased expected markups. It may be that the quality of the picture on these sets is such that it drives down the price of large-sized sets, resulting in an inverse relationship between price and screen size, where the latter is taken as one variable over the full range of screen sizes. Prior (to the modeling) information on the two markets would allow the regression equation to be appropriately specified, with dummy slope and intercepts for the ranges of screen sizes with new and old technologies.
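The screen-size example can be made concrete with invented numbers. In the sketch below the two groups of television sets and all prices are hypothetical. Fitting one line over the full range of sizes gives a negative coefficient on size, while fitting each technology range separately (which yields the same coefficients as a single regression with a dummy intercept and a dummy slope for the range) gives a positive coefficient in both.

```python
from statistics import mean

def fit_line(obs):
    """OLS fit of price = a + b * size; obs is a list of (size, price) pairs."""
    zbar = mean(z for z, _ in obs)
    pbar = mean(p for _, p in obs)
    slope = (sum((z - zbar) * (p - pbar) for z, p in obs)
             / sum((z - zbar) ** 2 for z, _ in obs))
    return pbar - slope * zbar, slope

# Hypothetical data: (screen size in inches, price).
new_tech_small = [(20, 900.0), (24, 950.0), (28, 1000.0)]  # premium small sets
old_tech_large = [(40, 500.0), (46, 560.0), (52, 620.0)]   # older large sets

_, pooled_slope = fit_line(new_tech_small + old_tech_large)  # size as one variable
_, small_slope = fit_line(new_tech_small)                    # within the new-technology range
_, large_slope = fit_line(old_tech_large)                    # within the old-technology range

print(pooled_slope, small_slope, large_slope)
```

The negative pooled coefficient says nothing about how buyers value an extra inch within either market; it reflects the price gap between the two technologies.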

**21.51** Pakes (2001) takes the view that no meaning can be attributed to estimated coefficients and that predicted values should be used for price comparisons of models of different quality attributes, rather than the individual coefficients. There are many good reasons for this, as discussed in Chapter 7, Section E.4.3 and Section G.2.2, and Appendix 21.1. Yet, it must be stressed that for vertical characteristics the coefficients may be quite meaningful, and even for horizontal characteristics or new characteristics, embodied with the latest research and development, some sense can be made by recourse to the above considerations. But again, theory does not support any easy answer to the interpretation of the coefficients from hedonic regressions. Their grace is that they emanate from market data, from the often complex interaction of demand and supply and strategic pricing decisions. That theory warns us not to give simplistic interpretations to such coefficients, and allows an understanding of the factors underlying them, is a strength of theory. Yet hedonic regression coefficients remain and are generally regarded (Schultze and Mackie, 2002) as the most promising objective basis for estimating the marginal value of quality dimensions of products, even though a purist interpretation is beyond their capability.^{21}

## C. Hedonic Indices

### C.1 The need for such indices

**21.52** In Section A it was noted that hedonic functions are required for two purposes with regard to a quality adjustment. The first is when an item is no longer produced and the replacement item, whose price is used to continue the series, is of a different quality from the original price basis. The differences in quality can be established in terms of different values of a subset of the *z* price-determining variables. The coefficients from the hedonic regressions, as estimates of the monetary value of additional units of each quality component *z*, can then be used to adjust the price of the old item so that it is comparable with the price of the new,^{22} so that, again, like is compared with like. This process could be described as “patching,” in that an adjustment is needed to the price of the old (or new replacement) series for the quality differences, to enable the new series to be patched onto the old. A second use of hedonic functions referred to in Section A is for estimating *hedonic indices*. These are suitable when the pace and scale of replacements of items is substantial and an extensive use of patching might (i) lead to extensive errors if there were some error or bias in the quality adjustment process and (ii) lead to sampling from a biased replacement universe as outlined in Section A. Hedonic indices use data in each period from a sample of items that should include those with a substantial share of sales revenue, sampling in each period from the double universe. There is no need to establish a price basis and for respondents to keep quoting prices from that basis. What is required are samples of items to be redrawn in each month along with information on their prices, characteristics $z_i$, and, possibly, quantities or values. The identification of multiple characteristics in the hedonic regressions controls for quality differences, as opposed to the matching of price quotes on the same price basis by the respondents. 
A number of procedures for estimating hedonic indices are briefly considered below.
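The "patching" use of hedonic coefficients amounts to a small piece of arithmetic. The sketch below assumes a linear hedonic function; the coefficient values, the characteristic names, and the two items are hypothetical and stand in for estimates and price quotes that a compiler would actually have.

```python
# Hypothetical linear hedonic coefficients: currency units per unit of characteristic.
beta = {"ram_gb": 12.0, "storage_gb": 0.15}

old_item = {"price": 900.0, "ram_gb": 8, "storage_gb": 256}    # no longer produced
new_item = {"price": 1010.0, "ram_gb": 16, "storage_gb": 512}  # noncomparable replacement

# Estimated monetary value of the quality difference between the two items.
quality_gap = sum(b * (new_item[k] - old_item[k]) for k, b in beta.items())

# Bring the old price up to the quality of the replacement, then compare like with like.
old_price_adjusted = old_item["price"] + quality_gap
pure_price_relative = new_item["price"] / old_price_adjusted
unadjusted_relative = new_item["price"] / old_item["price"]

print(quality_gap, pure_price_relative, unadjusted_relative)
```

With these invented numbers the unadjusted comparison shows a price rise, while the quality-adjusted comparison shows a fall, which is the kind of difference the adjustment is meant to expose.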

### C.2 Theoretical characteristics of price indices

**21.53** In Chapter 17 theoretical output price indices were defined and practical index number formulas considered as estimates of these indices. Theoretical output index numbers are defined here not just on the goods produced, but also on their characteristics. $R(p, S(v))$ was defined in Chapter 17 as the maximum value of output that the establishment can produce, given that it faces the vector of output prices $p$ and given that the vector of inputs $v$ (using technology $S$) is available for use. The establishment’s *output price index* $P$ between any two periods, say, period 0 and period 1, was defined as

$$P(p^0, p^1, v) \equiv \frac{R(p^1, S(v))}{R(p^0, S(v))}, \tag{21.27}$$

where $p^0$ and $p^1$ are the vectors of output prices that the establishment faces in periods 0 and 1, respectively, and $S(v)$ is a reference vector of technology using $v$ intermediate and primary inputs.^{23} For theoretical indices in characteristic space, the revenue functions are *also* defined over goods made up of bundles of characteristics represented by the hedonic function^{24}

$$P(p^0, p^1, p(z^0), p(z^1), v) \equiv \frac{R(p^1, p(z^1), S(v))}{R(p^0, p(z^0), S(v))}. \tag{21.28}$$

**21.54** The output price index defined by equation (21.28) is a ratio of hypothetical revenues that the establishment could realize, with a given technology and vector of inputs $v$ to work with. Equation (21.28) incorporates substitution effects: if the prices of some characteristics increase more than others, then the revenue-maximizing establishment can switch its output mix of characteristics in favor of such characteristics. The numerator in equation (21.28) is the maximum revenue that the establishment could attain if it faced the output prices and implicit hedonic shadow prices of period 1, $p^1$ and $p(z^1)$, while the denominator in equation (21.28) is the maximum revenue that the establishment could attain if it faced the output and characteristics prices of period 0, $p^0$ and $p(z^0)$. Note that all the variables in the numerator and denominator functions are exactly the same, except that the output price and characteristics price vectors differ. This is a defining characteristic of an output price index: the technology and inputs are held constant. As with the economic indices in Chapter 15, there is an entire *family* of indices depending on which reference technology and reference input vector $v$ is chosen. In Section C.5 some explicit formulations will be considered, including a base-period 0 reference technology and inputs and a current-period 1 reference technology and inputs analogous to the derivation of Laspeyres and Paasche in Chapter 17, Section B.1. Before considering such hedonic indices in Section C.5, two simpler formulations are first considered in Sections C.3 and C.4: hedonic regressions using dummy variables on time and period-on-period hedonic indices. They are simpler and widely used because they require no information on quantities or weights. Yet, their interpretation from economic theory is therefore more limited. However, as will be shown, weighted formulations are possible using a WLS estimator, although they are first considered in their unweighted form.

### C.3 Hedonic regressions and dummy variables on time

**21.55** Let there be $K$ characteristics of a product, and let model or item $i$ of the product in period $t$ have the vector of characteristics $z_i^t \equiv [z_{i1}^t, \ldots, z_{iK}^t]$ for $i = 1, \ldots, N$ and $t = 1, \ldots, T$. Denote the price of model $i$ in period $t$ by $p_i^t$. A hedonic regression of the price of model $i$ in period $t$ on its characteristics set $z_i^t$ is given by

$$\ln p_i^t = \gamma_0 + \sum_{t=2}^{T} \gamma_t D_t + \sum_{k=1}^{K} \beta_k z_{ik}^t + \varepsilon_i^t, \tag{21.29}$$

where $D_t$ are dummy variables for the time periods, $D_2$ being 1 in period $t = 2$, zero otherwise; $D_3$ being 1 in period $t = 3$, zero otherwise, and so on. The coefficients $\gamma_t$ are estimates of quality-adjusted price changes, having controlled for the effects of variation in quality (via the characteristics terms $\sum_{k=1}^{K} \beta_k z_{ik}^t$).

**21.56** The above approach uses the dummy variables on time to compare prices in period 1 with prices in each subsequent period. In doing so, the β parameters are constrained to be constant over the period $t = 1, \ldots, T$. Such an approach is fine retrospectively, but in real time the index may be estimated as a fixed-base or chained-base formulation. The *fixed-base* formulation would estimate the index for periods 1 and 2, $I_{1,2}$, using equation (21.29) for $t = 1, 2$; the index for period 3, $I_{1,3}$, would use equation (21.29) for $t = 1, 3$; for period 4, $I_{1,4}$, using equation (21.29) for $t = 1, 4$; and so forth. In each case the index constrains the parameters to be the same over the current and base period. A fixed-base, bilateral comparison using equation (21.29) makes use of the constrained parameter estimates over the two periods of the price comparison. A *chained* formulation would estimate $I_{1,4}$, for example, as the product of a series of links: $I_{1,4} = I_{1,2} \times I_{2,3} \times I_{3,4}$.^{25} Each successive binary comparison or link is combined by successive multiplication. The index for each link is estimated using equation (21.29). Because the periods of time being compared are close, the constraining of parameters required by chained time dummy hedonic indices is generally considered to be less severe than that required of their fixed-base counterparts.
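The fixed-base and chained time dummy calculations can be sketched in a few lines. The example assumes a semilogarithmic hedonic function with a single characteristic, and it uses synthetic, noise-free data generated with a constant characteristic coefficient; the helper names `ols`, `time_dummy_link`, and `make_period` are invented for this illustration. The models differ from period to period, so no matching is involved.

```python
import math

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        A[c] = [a / A[c][c] for a in A[c]]
        for r in range(k):
            if r != c:
                factor = A[r][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

def time_dummy_link(base, current):
    """Index between two periods: exp of the time dummy coefficient in a pooled
    regression of ln(price) on a constant, the dummy, and the characteristic."""
    X = [[1.0, 0.0, z] for z, _ in base] + [[1.0, 1.0, z] for z, _ in current]
    y = [math.log(p) for _, p in base + current]
    return math.exp(ols(X, y)[1])

def make_period(zs, shift):
    """Synthetic (z, price) pairs: ln p = 2 + shift + 0.3 z, with no noise."""
    return [(z, math.exp(2.0 + shift + 0.3 * z)) for z in zs]

periods = {1: make_period([1, 2, 3, 4], 0.00),
           2: make_period([2, 3, 5], 0.05),
           3: make_period([3, 4, 6, 7], 0.08)}

fixed_base_13 = time_dummy_link(periods[1], periods[3])
chained_13 = (time_dummy_link(periods[1], periods[2])
              * time_dummy_link(periods[2], periods[3]))
print(fixed_base_13, chained_13)
```

Because the synthetic data satisfy the constant-coefficient restriction exactly, the two formulations coincide here; with real data, where coefficients drift and prices are noisy, they would generally differ.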

**21.57** There is no explicit weighting in these formulations, and this is a serious disadvantage. In practice, cutoff sampling might be employed to include only the most important items. If sales data are available, a WLS (weighted by sales quantities—see Appendix 21.1) estimator instead of an OLS estimator should be used.^{26}

### C.4 Period-on-period hedonic indices

**21.58** An alternative approach to comparing period 1 and $t$ is to estimate a hedonic regression for period $t$ and insert the values of the characteristics of each model existing in period 1 into the period $t$ regression to predict, for each item, its price, $\hat{p}_i^t(z_i^1)$, $i = 1, \ldots, N$. These prices (or an average) can be compared with (the average of) the actual prices of the $i = 1, \ldots, N$ models in period 1. The averages may be arithmetic, as in a Dutot index, or geometric, as in a Jevons index. The arithmetic formulation is defined as follows:

$$\frac{\sum_{i=1}^{N} (1/N)\, \hat{p}_i^t(z_i^1)}{\sum_{i=1}^{N} (1/N)\, p_i^1}. \tag{21.30a}$$

**21.59** Alternatively, the characteristics of models existing in period $t$ can be inserted into a regression for period 1. Predicted prices of period $t$ items generated at period 1 shadow prices (or an average) can be compared with (the average of) the actual prices in period $t$:

$$\frac{\sum_{i=1}^{N} (1/N)\, p_i^t}{\sum_{i=1}^{N} (1/N)\, \hat{p}_i^1(z_i^t)}. \tag{21.30b}$$

**21.60** For a fixed-base bilateral comparison using either equation (21.30a) or (21.30b), the hedonic equation need be estimated only for one period. The denominator in equation (21.30a) is the average observed price in period 1, which should be equal to the average price a hedonic regression based on period 1 data will predict using period 1 characteristics. The numerator, however, requires an estimated hedonic regression to predict period 1 characteristics at period *t* hedonic prices. Similarly, in equation (21.30b), a hedonic regression is required only for the denominator. For reasons analogous to those explained in Chapters 15, 16, and 17, a symmetric average of these indices should have some theoretical support.
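A minimal sketch of the two period-on-period comparisons and of a symmetric average of them follows. It assumes a linear hedonic function in one characteristic, fitted separately in each period, with arithmetic (Dutot-style) averages; the data are invented and the choice of the geometric mean as the symmetric average is an assumption made for the example.

```python
import math
from statistics import mean

def fit_line(obs):
    """OLS fit of price = a + b * z for one period; obs is a list of (z, price)."""
    zbar = mean(z for z, _ in obs)
    pbar = mean(p for _, p in obs)
    slope = (sum((z - zbar) * (p - pbar) for z, p in obs)
             / sum((z - zbar) ** 2 for z, _ in obs))
    return pbar - slope * zbar, slope

def predict(coef, z):
    intercept, slope = coef
    return intercept + slope * z

# Hypothetical (z, price) observations; the models differ between the periods.
period_1 = [(1.0, 110.0), (2.0, 121.0), (3.0, 129.0), (4.0, 140.0)]
period_t = [(2.0, 126.0), (3.0, 137.0), (5.0, 158.0)]

h1, ht = fit_line(period_1), fit_line(period_t)

# Period 1 models repriced with the period t regression, over actual period 1 prices.
index_a = mean(predict(ht, z) for z, _ in period_1) / mean(p for _, p in period_1)
# Actual period t prices, over period t models repriced with the period 1 regression.
index_b = mean(p for _, p in period_t) / mean(predict(h1, z) for z, _ in period_t)
# One possible symmetric average of the two comparisons.
index_sym = math.sqrt(index_a * index_b)

# With an intercept, average predicted and average observed prices coincide in-sample.
in_sample_gap = mean(predict(h1, z) for z, _ in period_1) - mean(p for _, p in period_1)
print(index_a, index_b, index_sym, in_sample_gap)
```

The last line illustrates the point made about the denominator: an OLS regression with an intercept reproduces the average observed price of its own period, so only the cross-period prediction requires an estimated equation.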

**21.61** Note that all the indices described in Sections C.3 and C.4 use all the data available in each period. If there is a new item, for example, in period 4, it is included in the data set and its quality differences controlled for by the regression. Similarly, if old items drop out, they are still included in the indices in the periods in which they exist. This is part of the natural estimation procedure, unlike using matched data and hedonic adjustments on noncomparable replacements when items are no longer produced.

**21.62** As with the dummy variable approach, there is no need for matched data. Yet there is also no explicit weighting in these formulations, and this is a serious disadvantage. Were data on quantities or values available, it is immediately apparent that such weights could be attached to the individual $i = 1, \ldots, N$ prices or their estimates. This is considered in the next section.

### C.5 Superlative and exact hedonic indices

**21.63** In Chapter 17, Laspeyres and Paasche bounds were defined on a theoretical basis, as were superlative indices, which treat both periods symmetrically. These superlative formulas, in particular the Fisher index, were also seen in Chapter 16 to have desirable axiomatic properties. Furthermore, the Fisher index was supported from economic theory as a symmetric average of the Laspeyres and Paasche bounds and was found to be the most suitable such average of the two on axiomatic grounds. The Törnqvist index seemed to be best from the stochastic viewpoint and also did not require strong assumptions for its derivation from the economic approach as a superlative index. The Laspeyres and Paasche indices were found to correspond to (be *exact* for) underlying (Leontief) aggregator functions with no substitution possibilities, while superlative indices were exact for flexible functional forms including the quadratic and translog forms for the Fisher and Törnqvist indices, respectively. If data on prices, characteristics, *and quantities* are available, analogous approaches and findings arise for hedonic indices (see Fixler and Zieschang, 1992a, and Feenstra, 1995). Exact bounds on such an index were defined by Feenstra (1995). Consider the theoretical index in equation (21.28), but now defined only over items in terms of their characteristics. The prices are still of items, but they are wholly defined through *p(z)*. For a linear hedonic equation, an arithmetic aggregation yields a Laspeyres lower bound (as quantities supplied are *increased* with increasing relative prices), which is given by

$$\frac{R(p_t, S_{t-1}, v_{t-1})}{R(p_{t-1}, S_{t-1}, v_{t-1})} \ge \sum_{i=1}^{N} s_{it-1}\left(\frac{\hat{p}_{it}}{p_{it-1}}\right) \tag{21.31a}$$

where *R(.)* denotes the revenue at a set of output prices, input quantities, *v*, and technology, *S*, following the fixed-input output price index model. The price comparison is evaluated at a fixed level of period *t* – 1 technology and inputs. *s_{it–1}* are the shares in total value of output of product *i* in period *t* – 1, and

$$\hat{p}_{it} \equiv p_{it} - \sum_{k=1}^{K}\beta_{kt}\left(z_{ikt} - z_{ikt-1}\right) \tag{21.31b}$$

are prices in period *t* adjusted for the sum of the changes in each quality characteristic weighted by their coefficients derived from a linear hedonic regression. As noted in Appendix 21.1, β*_{kt}* may be estimated using a WLS estimator where the weights are the sales quantities. The summation is over the same *i* in both periods, since replacements are included when items are missing and equation (21.31b) adjusts their prices for quality differences.

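**21.64** A Paasche upper bound is estimated as

$$\frac{R(p_t, S_t, v_t)}{R(p_{t-1}, S_t, v_t)} \le \left[\sum_{i=1}^{N} s_{it}\left(\frac{\hat{p}_{it-1}}{p_{it}}\right)\right]^{-1} \tag{21.32a}$$

where *s_{it}* are the shares in total value of output of product *i* in period *t* and

$$\hat{p}_{it-1} \equiv p_{it-1} + \sum_{k=1}^{K}\beta_{kt-1}\left(z_{ikt} - z_{ikt-1}\right) \tag{21.32b}$$

which are prices in period *t* – 1 adjusted for the sum of the changes in each quality characteristic weighted by its respective coefficients derived from a linear hedonic regression.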

**21.65** Following from the inequalities in Chapter 17 where the Laspeyres *P_{L}* and Paasche *P_{P}* form bounds on their true, economic theoretic indexes,

**21.66** The superlative and exact hedonic index (SEHI) approach thus first applies the coefficients from hedonic regressions to changes in the characteristics to adjust observed prices for quality changes (equations 21.31b and 21.32b). Second, it incorporates a weighting system using data on the value of output of each model and its characteristics, rather than treating each model as equally important (equations 21.31a and 21.32a). Finally, it has a direct correspondence to a formulation defined from economic theory.
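The two steps just described (quality-adjust the prices, then weight by value shares) can be illustrated with a short sketch. It assumes a linear hedonic function whose coefficients are already estimated, a share-weighted arithmetic mean for the Laspeyres-type bound, a share-weighted harmonic mean for the Paasche-type bound, and a geometric mean of the two as the symmetric average. The function name `sehi_bounds` and all data are hypothetical.

```python
import numpy as np

def sehi_bounds(p0, p1, z0, z1, x0, x1, beta0, beta1):
    """Laspeyres- and Paasche-type hedonic bounds and their geometric mean.

    p0, p1: prices (N,); z0, z1: characteristics (N, K);
    x0, x1: quantities (N,); beta0, beta1: linear hedonic
    coefficients (K,) for the base and current periods.
    """
    # Current prices with the value of the quality change removed.
    p1_hat = p1 - (z1 - z0) @ beta1
    # Base prices with the value of the quality change added.
    p0_hat = p0 + (z1 - z0) @ beta0
    s0 = p0 * x0 / np.sum(p0 * x0)   # base-period value shares
    s1 = p1 * x1 / np.sum(p1 * x1)   # current-period value shares
    laspeyres = np.sum(s0 * p1_hat / p0)
    paasche = 1.0 / np.sum(s1 * p0_hat / p1)
    return laspeyres, paasche, np.sqrt(laspeyres * paasche)

# Hypothetical data: two models, one characteristic; model 2 is upgraded.
p0 = np.array([100.0, 200.0])
p1 = np.array([110.0, 230.0])
z0 = np.array([[1.0], [2.0]])
z1 = np.array([[1.0], [2.5]])
x0 = np.array([10.0, 5.0])
x1 = np.array([9.0, 6.0])
beta0 = np.array([40.0])
beta1 = np.array([44.0])

lower, upper, sehi = sehi_bounds(p0, p1, z0, z1, x0, x1, beta0, beta1)
```

When the characteristics do not change between periods, the adjustments vanish and the two bounds reduce to the ordinary Laspeyres and Paasche price indices.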

**21.67** Semilogarithmic hedonic regressions would supply a set of β coefficients suitable for use with these base-period and current-period geometric bounds:

**21.68** In equation (21.34a) the two bounds on their respective theoretical indices have been shown to be brought together. The calculation of such indices is no small task. For examples of its application, see Silver and Heravi (2002; 2003) and Chapter 7, Section G.2, for comparisons over time, and Kokoski, Moulton, and Zieschang (1999) for price comparisons across areas of a country.

**21.69** The above has illustrated how weighted index number formulas might be constructed using data on prices, quantities, and characteristics for an item when the data are not matched. But what of unweighted indices, which was the concern of the initial section of this chapter? What correspondence do the unweighted hedonic indices outlined in Sections C.3 and C.4 above have to the unweighted index number formulas outlined at the start of this chapter?

### C.6 Unweighted hedonic indices and unweighted index number formulas

**21.70** Triplett (2002) argues and Diewert (2003) shows formally that an unweighted geometric mean Jevons index for matched data gives the same result as a logarithmic hedonic index run on the same data. There is simply no point in estimating hedonic indices using *matched* data. Those involved in the matching have worked to ensure that no quality adjustment is necessary. An index from a dummy variable hedonic regression such as equation (21.29), but in log-log form, for matched models can be shown (Aizcorbe, Corrado, and Doms, 2001) to equal

where *m* is the matched sample and *Z_{t}* and *Z_{t–1}* are in principle the quality adjustments to the dummy variables for time in equation (21.29), that is,

*m = M_{t} = M_{t–1}* is the same model in each period. Consider the introduction of a new model *n* in period *t* with no counterpart in *t* – 1 and the demise of an old model *o* so it has no counterpart in *t*. So in period *t*, *M_{t}* is composed of the period *t* matched items *m* and the new items *n*, and in period *t* – 1, *M_{t–1}* is composed of the period *t* – 1 matched items *m* and the old items *o*. Silver and Heravi (2002) have shown the dummy variable hedonic comparison to now be

**21.71** Consider the second expression in equation (21.36). First there is the change for *m* matched observations. This is the change in mean prices of matched models *m* in periods *t* and *t* – 1 adjusted for quality. Note that the weight in period *t* for this matched component is the proportion of matched observations to all observations in period *t*. And, similarly, for period *t* – 1, the matched weight depends on how many unmatched old observations are in the sample. In the last line of equation (21.36), the change is between the unmatched new and the unmatched old mean (quality-adjusted) prices in periods *t* and *t* – 1. Thus, matched methods can be seen to ignore the last line in equation (21.36) and will thus differ from the hedonic dummy variable approach. The hedonic dummy variable approach in its inclusion of unmatched old and new observations can be seen from equation (21.36) possibly to differ from a geometric mean of matched price changes. The extent of any difference depends, in this unweighted formulation, on the proportions of old and new items leaving and entering the sample and on the price changes of old and new items relative to those of matched items. If the market for commodities is one in which old quality-adjusted prices are unusually low while new quality-adjusted prices are unusually high, then the matched index will understate price changes (see Silver and Heravi, 2002, and Berndt, Ling, and Kyle, 2003, for examples). Different market behavior will lead to different forms of bias. There is a second way in which the results will differ. Index number formulas provide weights for the price changes. The Carli index, for example, weights each observation equally, while the Dutot index weights each observation according to its relative price in the base period. The Jevons index, with no assumptions as to economic behavior, weights each observation equally.
Silver (2002) has argued, however, that the weight given to each observation in an ordinary least-squares regression also depends on the characteristics of the observations, some observations with unusual characteristics having more leverage. In this way, the results from the two approaches may differ even more.
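The equivalence between a time dummy hedonic index and the Jevons index on matched data can be checked numerically. The sketch below assumes a log-log hedonic regression with a single time dummy, estimated by OLS on a hypothetical matched sample in which each model keeps the same characteristics in both periods.

```python
import numpy as np

# Hypothetical matched sample: the same four models, with unchanged
# (strictly positive) characteristics, priced in periods t-1 and t.
z = np.array([[1.0, 16.0], [2.0, 32.0], [3.0, 16.0], [4.0, 64.0]])
p_prev = np.array([100.0, 180.0, 240.0, 330.0])
p_curr = np.array([104.0, 171.0, 252.0, 350.0])

# Pooled regression: ln p = b0 + delta * D_t + ln(z) @ b + u,
# where D_t = 1 for period t observations and 0 otherwise.
n = len(p_prev)
X = np.column_stack([
    np.ones(2 * n),
    np.concatenate([np.zeros(n), np.ones(n)]),   # time dummy
    np.log(np.vstack([z, z])),                   # same characteristics twice
])
y = np.log(np.concatenate([p_prev, p_curr]))
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Index from the time dummy coefficient.
hedonic_index = np.exp(coef[1])
# Jevons index: geometric mean of the matched price relatives.
jevons_index = np.exp(np.mean(np.log(p_curr / p_prev)))
```

The two numbers coincide because, with identical characteristics in both periods, the time dummy is uncorrelated with the characteristics, so its coefficient is simply the difference in mean log prices. Dropping one model from one period (an unmatched observation) breaks this equality, which is the situation equation (21.36) addresses.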

## D. New Goods and Services

**21.72** This section briefly highlights issues relating to the incorporation of new goods into the index. Practical issues were outlined in Chapter 8, Section D.3. The term *new goods* will be used here to refer to those that provide a substantial and substantive change in what is provided, as opposed to more of a currently available set of service flows, such as a new model of an automobile that has a bigger engine. In this latter instance, there is a continuation of a service and production flow, and this may be linked to the service flow and production technology of the existing model. The practical concern with the definition of new goods as against quality changes is that the former cannot be easily linked to existing items as a continuation of an existing resource base and service flow, because of the very nature of their “newness.” There are alternative definitions; Oi (1997) directs the problem of defining new goods to that of defining a monopoly. If there is no close substitute, the good is new. A monopoly supplier may be able to supply an item with new combinations of the hedonic *z* characteristics because of a new technology and have a monopoly power in doing so, but in practice the new good can be linked via the hedonic characteristics set to the existing ones. In this practical sense, such goods are not considered new for the purposes of the *Manual*.

**21.73** Merkel (2000, p. 6) takes a similar practical line in devising a classification scheme that will meet the practical needs of PPI compilation. He considers *evolutionary* and *revolutionary* goods. The former are defined as

…extensions of existing goods. From a production inputs standpoint, evolutionary goods are similar to pre-existing goods. They are typically produced on the same production line and/or use largely the same production inputs and processes as pre-existing goods. Consequently, in theory at least, it should be possible to quality adjust for any differences between a pre-existing good and an evolutionary good.

**21.74** In contrast, revolutionary goods are goods that are substantially different from pre-existing goods. They are generally produced on entirely new production lines or with substantially new production inputs and processes than those used to produce preexisting goods. These differences make it virtually impossible, both from a theoretical and practical standpoint, to quality adjust between a revolutionary good and any preexisting good.

**21.75** The main concern regarding the incorporation of new goods into the PPI is the decision on the need and timing for their inclusion. Waiting for a new good to be established or waiting for the rebasing of an index before incorporating new products may lead to errors in the measurement of price changes if the unusual price movements at critical stages in the product life cycles are ignored. There are practical approaches to the early adoption of both evolutionary and revolutionary goods. These are outlined in Chapter 8, Section D.3. For evolutionary goods, such strategies include the rebasing of the index, resampling of items, and introduction of new goods as directed *sample substitutions* (Merkel, 2000). Also of use are the hedonic quality adjustments and indices outlined in Chapter 7, Section E.4, and in Section C above, which facilitate the incorporation of such evolutionary goods, since they possess a similar characteristics set to existing ones but deliver different quantities of these characteristics. The modified short-run or chained framework outlined in Chapter 7, Sections G–H may also be more appropriate for product areas with high turnover of items. These approaches can incorporate the price change of new goods into the index as soon as prices are available for two successive periods, although issues relating to the proper weighting of such changes may remain.

**21.76** However, for revolutionary goods, substitution may not be appropriate. First, they may not be able to be defined within the existing classification systems. Second, they may be primarily produced by a new establishment, which will require extending the sample to such establishments. Third, there will be no previous items to match against and make a quality adjustment to prices, since by definition, they are substantially different from preexisting goods. And, finally, there is no weight to attach to the new establishment or item(s). *Sample augmentation* is appropriate for revolutionary goods, as opposed to sample substitution for evolutionary goods. It is necessary to bring the new revolutionary goods into the sample in addition to what exists. This may involve extending the classification, the sample of establishments, and item list within new or existing establishments (Merkel, 2000).

## Appendix 21.1: Some Econometric Issues

**21.77** Hedonic regression estimates were seen in Chapter 7 to have potential use for the quality adjustment of prices. There are a number of issues that arise from the specification and estimation of hedonic regressions, the use of diagnostic statistics, and courses of action when the standard OLS assumptions are seen to break down. Many of these issues are standard econometric ones and not the subject of this *Manual*. This is not to say they are unimportant. The use of hedonic regressions will require some econometric or statistical expertise, but suitable texts are generally available. See Berndt (1991), particularly the chapter on hedonic regressions, as well as Maddala (1988) and Kennedy (2003), among many others. Modern statistical and econometric software has adequate diagnostic tests for detecting when OLS assumptions break down. There remain, however, some specific issues that merit attention, although it must be stressed that these points are over and above, and should not be taken to diminish, the important standard econometric issues found in econometric texts.

## Identification and appropriate estimators

**21.78** Wooldridge (1996, pp. 400–01) has shown on standard econometric grounds that the estimation of supply and demand functions by OLS is biased *and this bias carries over to the estimation of the hedonic function*. It is first useful to consider estimation issues in the supply and demand functions. These functions are rarely estimated in practice. The more common approach is to estimate offer functions, with the marginal price offered by the firm dependent on chosen attributes (product characteristics) and firm characteristics, and to estimate *bid* or value functions, with the marginal prices paid by a consumer dependent on chosen attributes and consumer characteristics.^{27} As noted earlier, the observed prices and quantities are the result of the interaction of structural demand and supply equations and the distributions of producer technologies and consumer tastes, and they cannot reveal the parameters of these offer and value functions. Rosen (1974, pp. 50–51) suggested a procedure for determining these parameters. Since these estimates are conditioned on tastes (α) and technologies (τ), the estimation procedure needs to include empirical measures or “proxy variables” of α and τ. For the tastes α of consumers, the empirical counterparts may be sociodemographic and economic variables, which may include age, income, education, and geographical region. For technologies τ, variables may include technologies and factor prices. First, the hedonic equation is estimated without these variables in the normal manner using the best-fitting functional form. This is to represent the price function consumers and producers face when making their decisions. Then, an implicit marginal price function is computed for each characteristic as $\partial p(z)/\partial z_i = \hat{p}_i(z)$, where $\hat{p}(z)$ is the estimated hedonic equation. In conventional demand and supply studies for *products*, the prices are observed in the market. For *characteristics* they are unobserved, and this first stage must be to estimate the parameters from the hedonic regression.
The actual values of each *z_{i}* bought and sold are then inserted into each implicit marginal price function to yield a numerical value for each characteristic. These marginal values, denoted $\hat{p}_i(z)$ for characteristic *i*, are used in the second stage^{28} of estimation as endogenous variables for the estimation of the demand side:

$$\hat{p}_i(z) = F(z_1, \ldots, z_K, \alpha^*)$$

where α* are the proxy variables for tastes, and the supply side:

$$\hat{p}_i(z) = G(z_1, \ldots, z_K, \tau^*)$$

where τ* are the proxy variables for technologies. The variables τ* drop out when there is no variation in technologies and

**21.79** Epple (1987) has argued that Rosen’s modeling strategy is likely to give rise to inappropriate estimation procedures of the demand and supply parameters. The hedonic approach to estimating the demand for characteristics has a difficulty arising from the fact that marginal prices are likely to be endogenous: they depend on the amount of each characteristic consumed and must be estimated from the hedonic function rather than observed directly. There are two resulting problems. First, there is an identification problem (see Epple, 1987) because both the marginal price of a characteristic and the inverse bid depend on the levels of characteristics consumed. Second, if important characteristics are unmeasured and they are correlated with measured characteristics, the coefficients on measured characteristics will be biased. This applies to all econometric models, but it is particularly relevant to hedonic models; on this point, see Wooldridge (1996, pp. 400–01) in particular. The equilibrium conditions for characteristic prices imply functional relationships among the characteristics of demanders, suppliers, and products. This in turn reduces the likelihood that important excluded variables will be uncorrelated with the included variables of the model (see also Bartik, 1988, on this point). The bias arises because buyers are differentiated by characteristics (*y*, α) and sellers by technologies τ. The type of item buyers will purchase is related to (*y*, α) and the type sellers provide to τ. On the plane of combinations of *z* transacted, the equilibrium ones chosen may be systematically related; the characteristics of buyers are related to those of sellers. Epple (1987) uses the example of stereo equipment: the higher income of some buyers leads to purchases of high-quality equipment, and the technical competence of sellers leads them to provide it. The consumer and producer characteristics may be correlated.

**21.80** Wooldridge (1996, pp. 400–01) suggests that individual consumer and firm characteristics such as income, education, and input prices should be used as instruments in estimating hedonic functions. In addition, variables other than a good’s characteristics should be included as instruments if they are price determining, such as geographical location (say, proximity to ports, good road systems, climate, and so on). Communities of economic agents are assumed, within which consumers consume and producers produce for each other at prices that vary across communities for identical goods. Variables on the characteristics of the communities will not in themselves enter the demand and supply equation but are price determining for observed prices recorded across communities. Tauchen and Witte (2001) provide a systematic investigation of the conditions under which consumer and producer and community characteristics will affect the hedonic parameter estimates for a single-regression equation estimated across all communities. A key concern is whether the hedonic price function error term represents factors that are unobserved by both the economic agents and the researcher, or by the researcher only. In the latter case, the error term may be correlated with the product attributes, and instrumental variable estimation is required. If the error term is *not* correlated with the product characteristics (preferences are quasi-linear), then a properly specified hedonic regression, including community-specific characteristics or appropriate slope dummies, can be estimated using OLS. In other cases, depending on the correlation between consumer and producer characteristics, the assumptions about the error term, and the method of incorporating community characteristics into the regression, instrumental variables, including consumer, producer, or community dummies or characteristics, may need to be used.

## Functional form

**21.81** Triplett (1987; 2002) argues that neither classical utility theory nor production theory can specify the functional form of the hedonic function.^{29} This point dates back to Rosen (1974, p. 54) who describes the observations as being “…a joint-envelope function and cannot by themselves identify the structure of consumer preferences and producer technologies that generate them.” A priori judgments about what the form should look like may be based on ideas about how consumers and production technologies respond to price changes. These judgments are difficult to make when the observations are jointly determined by demand and supply factors but not impossible in rare instances. However, it is complicated when pricing is with a markup, the extent of which may vary over the life cycle of a product. Some tied combinations of characteristics will have higher markups than others. New-item introductions are likely to be attracted to these areas of characteristic space, and this will have the effect of increasing supply and thus lowering the markup and price (Cockburn and Anis, 1998; Feenstra, 1995, p. 647; and Triplett, 1987, p. 38). This again must be taken into account in any a priori reasoning—not an easy or straightforward matter.

**21.82** It may be that in some cases the hedonic function’s functional form will be very straightforward. For example, prices on websites for product options are often additive. The underlying cost and utility structures are unlikely to jointly generate such linear functions, but the producer or consumer is also paying for the convenience of selling in this way and is willing to bear losses or make gains if the cost or utility at higher values of *z* is priced lower or is worth more than the price set. But, in general, the data should convey what the functional form should look like, and imposing artificial structures simply leads to specification bias. For examples of econometric testing of hedonic functional form, see Cassel and Mendelsohn (1985); Cropper, Deck, and McConnell (1988); Rasmussen and Zuehlke (1990); Bode and van Dalen (2001); and Curry, Morgan, and Silver (2001).

**21.83** The three forms prevalent in the literature are linear, semilogarithmic, and double-logarithmic (log-log). A number of studies have used econometric tests, in the absence of a clear theoretical statement, to choose among them. There have been a large number of hedonic studies, and, as illustrated in Curry, Morgan, and Silver (2001), in many of these the quite simple forms do well, at least in terms of the $\bar{R}^2$ presented, and the parameters accord with a priori reasoning, usually on the consumer side. Of the three popular forms, some are favored in testing. For example, Murray and Sarantis (1999) favored the semilogarithmic form, while in others, for example Hoffmann (1998), the three functional forms were found to scarcely differ in terms of their explanatory power. That the parameters from these simple forms accord with a priori reasoning, usually from the consumer side, is promising, but researchers should be aware that such matters are not assured. Of the three forms, the semilogarithmic form has much to commend it. The interpretation of its coefficients is quite straightforward: the coefficients represent proportionate changes in prices arising from a unit change in the value of the characteristic.^{30} This is a useful formulation, since quality adjustments are usually undertaken by making multiplicative instead of additive adjustments (see Chapter 7, Section C.3). The semilogarithmic form, unlike the log-log model, can also incorporate dummy variables for characteristics that are either present, *z_{i}* = 1, or not, *z_{i}* = 0.^{31}

**21.84** More complicated forms are possible. Simple forms have the virtue of parsimony and allow more efficient estimates to be made for a given sample. However, parsimony is not something to be achieved at the cost of misspecification bias. First, if the hedonic function is estimated across multiple independent markets, then interaction terms are required (see Mendelsohn, 1984, for fishing sites). Excluding them is tantamount to omitting variables and inappropriately constraining the estimated coefficients of the regression. Tauchen and Witte (2001) have outlined the particular biases that can arise from such omitted variables in hedonic studies. Second, it may be argued that the functional form should correspond to the aggregator for the index—linear for a Laspeyres index, logarithmic for a geometric Laspeyres index, translog for a Törnqvist index, and quadratic for a Fisher index (Chapter 17). However, as Triplett (2002) notes, the purpose of estimating hedonic regressions is to adjust prices for quality differences, and imposing a functional form on the data that is inconsistent with the data might create an error in the quality adjustment procedure. Yet, as Diewert (2002f) notes, flexible functional forms encompass these simple forms. The log-log form is a special case of the translog form as in equation (17.11), and the semi-log form is a special case of the semi-log quadratic form as in equation (17.16). If there are a priori reasons to expect interaction terms for specific characteristics, as illustrated in the example in Chapter 7, Section E.4, then these more general forms allow this, and the theory of hedonic functions neither dictates the form of the hedonic form nor restricts it.

## Changing tastes and technologies

**21.85** The estimates of the coefficients may change over time. Some of this will be attributed to sampling error, especially if multicollinearity is present, as discussed below. But, in other cases, it may be a genuine reflection of changes in tastes and technologies. If a subset of the estimated coefficients from a hedonic regression is to be used to quality-adjust a noncomparable replacement price, then the use of estimated out-of-date coefficients from some previous period to adjust the prices of the new replacement model would be inappropriate. There would be a need to update the indices as regularly as the changes demanded.^{32} For estimating hedonic indices, the matter is more complicated. The coefficients in a simple time dummy model as in Section C.3 now have different estimates of the parameters in each period. Silver (1999), using a simple example, shows how the estimate of quality-adjusted price change from such a dummy variable model requires a reference basket of characteristics. This is apparent for the hedonic imputation indices where separate indices using base- and current-period characteristics are estimated. A symmetric average of such indices is considered appropriate. A hedonic index based on a time dummy variable implicitly constrains the estimated coefficients from the base and current periods to be the same. Diewert (2003) formalizes the problem of choosing the reference characteristics when comparing prices over time when the parameters of the hedonic function may themselves be changing over time. He finds the results of hedonic indices to *not* be invariant to the choice of reference-period characteristic vector set *z*.
The use of a sales- (quantity-) weighted average vector of characteristics proposed by Silver (1999) is considered, but Diewert notes that over long time periods this may become unrepresentative.^{33} Of course, if the dummy variable approach is used in a chained formulation as outlined in Section C.3, the weighted averages of characteristics remain reasonably up to date, though chaining has its own pros and cons (see Chapter 15). A fixed-base alternative noted by Diewert (2003) is to use a Laspeyres-type comparison with the base-period parameter set, and a Paasche-type current-period index with the current-period parameter set, and take the geometric mean of the two indices for reasons similar to those given in Chapter 17, Section B.3. The resulting Fisher-type index is similar to that given in equation (21.32a) proposed by Feenstra (1995).^{34} A feature of the time dummy approach is that it implicitly takes a symmetric average of the coefficients by constraining them to be the same. But what if, as is more likely the case, only base-period hedonic regression coefficients are available? Since hedonic indices based on a symmetric average of the coefficients are desirable, the spread or difference between estimates based on either a current- or a reference-period characteristics set is an indication of potential bias, and estimates of such spread may be undertaken retrospectively. If the spread is large, estimates based on the use of a single period’s characteristics set, say, the current period, should be treated with caution. More regular updating of the hedonic regressions is likely to reduce spread because the periods being compared will be closer and the characteristics of the items in the periods compared more similar.

## Weighting

**21.86** OLS estimators implicitly treat each item as being of equal importance, although some items will have quite substantial sales, while for others, sales will be minimal. It is axiomatic that an item with sales of more than 5,000 in a month should not be given the same influence in the regression estimator as one with a few transactions. Commodities with very low sales may be at the end of their life cycles or be custom made. Either way, their (quality-adjusted) prices and price changes may be unusual.^{35} Such observations with unusual prices should not be allowed to unduly influence the index.^{36} The estimation of hedonic regression equations by a WLS estimator is preferable. This estimator minimizes the sum of *weighted* squared deviations between the actual prices and the predicted prices from the regression equation, as opposed to OLS estimation, which uses an equal weight for each observation. There is a question as to whether to use quantity (volume) or expenditure weights. The use of quantity weights can be supported by considering the nature of their equivalent “price.” Such prices are the average (usually the same) price over a number of transactions. The underlying sampling unit is the individual transaction, so there is a sense that the data may be replicated as being composed of, say, 12 individual observations using an OLS estimator, as opposed to a single observation with a weight of 12 using a WLS estimator. Both would yield the same result. Inefficient estimates arise if the variance of the errors, *V(u_{i})*, is not constant, that is, if they are heteroskedastic. WLS is equivalent to assuming that the error variances are related to the weights in a multiplicative manner, say, *V(u_{i})* = *σ*^{2}*w_{i}*^{2}.^{37} A priori notions as to whether a hedonic regression model predicts better or worse at different levels of quantities or expenditures may help in identifying which weights are appropriate; however, statistical tests or plots of heteroskedasticity may be more useful.
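The replication argument for quantity weights can be verified directly. The sketch below uses hypothetical data and hypothetical helper names (`ols`, `wls`), and compares a WLS fit with quantity weights against an OLS fit on a data set in which each observation is repeated once per unit sold.

```python
import numpy as np

def ols(X, y):
    """Ordinary least-squares coefficients."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def wls(X, y, w):
    """Minimise sum_i w_i * (y_i - x_i @ b)**2 by scaling rows by sqrt(w_i)."""
    r = np.sqrt(w)
    return ols(X * r[:, None], y * r)

# Hypothetical data: intercept plus one characteristic, log prices,
# and the number of units sold of each model.
X = np.column_stack([np.ones(4), [1.0, 2.0, 3.0, 4.0]])
log_p = np.log([100.0, 150.0, 260.0, 300.0])
units = np.array([12, 1, 3, 40])

# One observation per model, weighted by units sold.
coef_wls = wls(X, log_p, units)

# Each model repeated once per unit sold, then unweighted OLS.
coef_replicated = ols(np.repeat(X, units, axis=0), np.repeat(log_p, units))
```

The coefficient vectors agree to numerical precision. The standard errors reported by the two fits would differ, since the replicated data set appears to contain more observations.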

**21.87** The sole use of statistical criteria for deciding on which weighting system to use has rightfully come under some criticism. Diewert (2002c) and Silver (2002) have argued that what matters is whether the estimates are representative of the target index in mind. Conventional target index numbers such as Laspeyres, Paasche, Fisher, and Törnqvist weight price changes by expenditure shares, and the latter two formulas have received support from the axiomatic, stochastic, fixed-base, and economic theoretic approaches, as shown in Chapters 15–18. Thus, value weights are preferred to quantity weights: “The problem with quantity weighting is this: it will tend to give too little weight to cheap models that have low amounts of useful characteristics” (Diewert, 2002c, p. 8). He goes on to argue that for a WLS estimator of hedonic time dummy variable indices, expenditure *share* weights should be used, as opposed to the *value* of expenditure, to avoid inflation increasing period 1 value weights, resulting in possible heteroskedastic residuals. Furthermore, for a semilogarithmic hedonic function when models are present in both periods, the average expenditure shares in periods 0 and 1 for item *m*, $\tfrac{1}{2}(s_{m0} + s_{m1})$, should be used as weights in the WLS estimator. If only matched models exist in the data, then such an estimator may be equivalent to the Törnqvist index. If an observation *m* is available only in one of the periods, its weight should be $s_{m0}$ or $s_{m1}$ accordingly, and the WLS estimator provides a *generalization* of the Törnqvist index.
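The link between the share-weighted time dummy regression and the Törnqvist index can be illustrated with a small numerical sketch. It rests on a simplifying assumption that is not in the text: each matched model's unchanged characteristics are represented by a model-specific dummy variable. The prices, quantities, and variable names are hypothetical.

```python
import numpy as np

# Hypothetical matched models: prices and quantities in periods 0 and 1.
p0 = np.array([10.0, 20.0, 40.0])
p1 = np.array([11.0, 19.0, 46.0])
q0 = np.array([50.0, 30.0, 10.0])
q1 = np.array([40.0, 35.0, 12.0])

s0 = p0 * q0 / np.sum(p0 * q0)   # period 0 expenditure shares
s1 = p1 * q1 / np.sum(p1 * q1)   # period 1 expenditure shares
w_avg = 0.5 * (s0 + s1)          # average share for each matched model

# Törnqvist index computed directly from the matched price relatives
tornqvist = np.exp(np.sum(w_avg * np.log(p1 / p0)))

# Semilog time dummy regression: ln p = model effect + delta * (period 1 dummy).
# Assumption: one dummy per model stands in for its fixed characteristics.
M = len(p0)
model_dummies = np.vstack([np.eye(M), np.eye(M)])
time_dummy = np.concatenate([np.zeros(M), np.ones(M)])
X = np.column_stack([model_dummies, time_dummy])
y = np.log(np.concatenate([p0, p1]))
w = np.concatenate([w_avg, w_avg])   # same average share in both periods

sw = np.sqrt(w)
coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
hedonic_index = np.exp(coef[-1])     # antilog of the time dummy coefficient

print(tornqvist, hedonic_index)
```

Under this assumption the two figures agree. With a general set of characteristics in place of model dummies, the equivalence need not hold exactly, which is consistent with the "may be equivalent" wording above.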

**21.88** Silver (2002) has shown that a WLS estimator using value weights will not necessarily give each observation a weight equal to its relative value. The estimator will give more weight to those observations with high leverage effects and residuals. Observations with values of characteristics with large deviations from their means—say, very old or new models—have relatively high leverage. New and old models are likely to be priced at quite different prices than those predicted from the hedonic regression, even after taking into account their different characteristics. Such prices result, for example, from a pricing strategy designed to skim segments of the market willing to pay a premium for a new model, or from a strategy to charge relatively low prices for an old model to dump it to make way for a new one. In such cases the influence these models have on deriving the estimated coefficients will be over and above that attributable to their value weights. Silver (2002) suggests that leverage effects should be calculated for each observation, and those with high leverage and low weights should be deleted, and the regression re-run. Thus, while quantity or value weights are preferable to no weights (that is, OLS), value weights are more appropriate than quantity ones, and, even so, account should be taken of observations with undue influence.
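A leverage calculation of the kind suggested here might look as follows. This is a sketch only: the data are hypothetical, the function name `wls_leverage` is introduced for illustration, and the two cut-offs used to flag observations are common rules of thumb chosen for this example, not values taken from Silver (2002).

```python
import numpy as np

def wls_leverage(X, w):
    """Diagonal of the hat matrix of a weighted least squares fit."""
    Xw = X * np.sqrt(w)[:, None]
    # H = Xw (Xw'Xw)^{-1} Xw'; only the diagonal is needed.
    XtX_inv = np.linalg.inv(Xw.T @ Xw)
    return np.einsum("ij,jk,ik->i", Xw, XtX_inv, Xw)

# Hypothetical data: intercept plus one characteristic (model age),
# with expenditure share weights; the last model is very old and sells little.
age = np.array([1.0, 2.0, 2.0, 3.0, 3.0, 12.0])
share = np.array([0.25, 0.25, 0.20, 0.15, 0.13, 0.02])
X = np.column_stack([np.ones_like(age), age])

h = wls_leverage(X, share)
n, k = X.shape

# Illustrative cut-offs (assumptions, not from the text):
# leverage above 2k/n and weight below half the mean weight.
flag = (h > 2 * k / n) & (share < 0.5 * share.mean())
print(np.round(h, 3), flag)
```

In this example only the old, low-selling model is flagged: its age lies far from the weighted mean, so it pulls on the fitted line far more than its 2 percent expenditure share would suggest.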

**21.89** Diewert (2002f) has also considered the issue of weighting with respect to the time dummy hedonic indices outlined in Section C.6. The use of WLS by value involves weights being applied to observations in both periods. However, if, for example, there is high inflation, then the sales values for a model in the current period will generally be larger than those of the corresponding model in the base period, and the assumption of homoskedastic residuals is unlikely to be met. Diewert (2002f) suggests the use of expenditure *shares* in each period, as opposed to values, as weights for WLS for time dummy hedonic indices. He also suggests that an average of expenditure shares in the periods being compared be used for matched models.

**21.90** Data on sales are not always available for weights, but the major selling items can generally be identified. In such cases, it is important to restrict the number of observations of items with relatively low sales, the extent of the restriction depending on the number of observations and the skewness of the sales distribution. In some cases, items with few sales provide the variability necessary for efficient estimates of the regression equation. In other cases, their low sales may be due to factors that make them unrepresentative of the hedonic surface, their residuals being unusually high. An example is low-selling models about to be dumped to make way for new models. Unweighted regressions may thus suffer from a sampling problem—even if the prices are perfectly quality adjusted, the index can be biased because it is unduly influenced by low-selling items with unrepresentative price-characteristic relationships. In the absence of weights, regression diagnostics have a role to play in helping to determine whether the undue variance in some observations belongs to such unusual low-selling items.^{38}

## Multicollinearity

**21.91** There are a priori reasons to expect for some commodities that the variation in the values of one characteristic will not be independent of one or a linear combination of other *z* characteristics. As a result, parameter estimates will be unbiased, yet imprecise. To illustrate this, a plot of the confidence interval for one parameter estimate against another collinear one is often described as elliptical, since the combinations of possible values they may take can easily drift from, say, high values of β_{1} and low values of β_{2} to high values of β_{2} and low values of β_{1}. Since the sample size for the estimates is effectively reduced, relatively small additions to and deletions from the sample may affect the parameter estimates more than would be expected. These are standard statistical issues, and the reader is referred to Maddala (1988) and Kennedy (2003). In a hedonic regression, multicollinearity might be expected because some characteristics may be technologically tied to others. Producers including one characteristic may need to include others for it all to work, while for the consumer side, purchasers buying, for example, an up-market brand may expect a certain bundle of features to come with it. Triplett (2002) argues strongly for the researcher to be aware of the features of the product and consumer market. There are standard, though not completely reliable, indicators of multicollinearity (such as variance inflation factors), but an exploration of its nature is greatly aided by an understanding of the market along with exploration of the effects of including and excluding individual variables on the signs and coefficients and on other diagnostic test statistics (see Maddala, 1988).^{39}
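Variance inflation factors, mentioned above as one standard indicator, can be computed directly from auxiliary regressions. The sketch below assumes three hypothetical characteristics (the names `speed`, `memory`, and `weight` are invented for illustration), two of which are constructed to move closely together.

```python
import numpy as np

def variance_inflation_factors(Z):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    characteristic j on an intercept and all the other characteristics."""
    n, k = Z.shape
    vifs = np.empty(k)
    for j in range(k):
        others = np.column_stack([np.ones(n), np.delete(Z, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, Z[:, j], rcond=None)
        resid = Z[:, j] - others @ coef
        tss = np.sum((Z[:, j] - Z[:, j].mean()) ** 2)
        r2 = 1.0 - resid @ resid / tss
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs

# Hypothetical characteristics: speed and memory move together; weight does not.
speed = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
memory = np.array([1.1, 2.1, 2.8, 4.2, 4.9, 5.9])
weight = np.array([3.0, 1.0, 2.0, 2.0, 1.0, 3.0])
Z = np.column_stack([speed, memory, weight])

vifs = variance_inflation_factors(Z)
print(np.round(vifs, 1))
```

The two tied characteristics show very large factors, while the unrelated one stays at about 1. As the text cautions, such indicators are not completely reliable and are best read alongside knowledge of how the characteristics are bundled in the market.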

**21.92** If a subset of the estimated coefficients from a hedonic regression are to be used to quality-adjust a noncomparable replacement price, and if there is multicollinearity *between* variables in this subset *and* other independent variables, then the estimates of the coefficients to be used for the adjustment will be imprecise. The multicollinearity effectively reduces the sample size, and some of the effects of the variables in the subset may be wrongly ascribed to the other independent variables. The extent of this error will be determined by the strength of the multiple-correlation coefficient between all such “independent” variables (the multicollinearity), the standard error or “fit” of the regression, the dispersion of the independent variable concerned, and the sample size. These all affect the precision of the estimates, since they are components in the standard error of the *t*-statistics. Even if multicollinearity is expected to be quite high, large sample sizes and a well-fitting model may reduce the standard errors on the *t*-statistics to acceptable levels. If multicollinearity is expected to be severe, the predicted value for an item’s price may be computed using the whole regression and an adjustment made using the predicted value, as explained in Chapter 7, Section E.4, since there is a sense in which it would not matter whether the variation was wrongly attributed to either β_{1} or β_{2}. If dummy variable hedonic *indices* are being calculated (Section B.3 above), the time trend will be collinear with an included variable if a new feature appears in a new month for the vast majority of the items, so that the data are not rich enough to allow the separate effects of the coefficient on the time dummy to be precisely identified. The extent of the imprecision of the coefficient on the time dummy will be determined by the aforementioned factors. A similar argument holds for omitted variable bias.
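The factors listed above can be read off the standard textbook expression for the sampling variance of an OLS coefficient, given here as a reminder under the classical regression assumptions. The notation is introduced for this illustration only: $\hat\beta_j$ is the estimated coefficient on characteristic $z_j$, $\sigma^2$ is the error variance, $n$ is the number of observations, and $R_j^2$ is the squared multiple-correlation coefficient from regressing $z_j$ on the other explanatory variables.

$$\operatorname{Var}(\hat\beta_j) = \frac{\sigma^2}{(1 - R_j^2)\sum_{i=1}^{n}(z_{ij} - \bar z_j)^2}$$

A high $R_j^2$ inflates the variance, while a well-fitting regression (small $\sigma^2$), widely dispersed values of $z_j$, and a large sample all reduce it. This is why a large sample and a good fit can offset fairly strong multicollinearity.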

## Omitted-variable bias

**21.93** The exclusion of tastes and technology and community characteristics has already been discussed. The concern here is with product characteristics. Consider again the use of a subset of the estimated coefficients from a hedonic regression to quality-adjust a noncomparable replacement price. It is well established that multicollinearity of omitted variables with included variables leads to bias in the estimates of the coefficients of included ones. If omitted variables are *independent* of the included variables, then the estimates of the coefficients on the included variables are unbiased. This is acceptable in this instance; the only caveat is that it may be that the quality adjustment for the replacement item also requires an adjustment for these omitted variables, and this, as noted by Triplett (2002), has to be undertaken using a separate method and data. But what if the omitted variable is multicollinear with a subset of included ones, and these included ones are to be used to quality adjust a noncomparable item? In this case, the coefficient on the subset of the included variables may be wrongly picking up some of the omitted variables’ effects. The coefficients will be used to quality-adjust prices for items that differ only with regard to this subset of included variables, and the price comparison will be biased if the characteristics of both included and omitted variables have different price changes. For hedonic *indices* using a dummy time trend, the estimates of quality-adjusted price changes will suffer from a similar bias if omitted variables excluded from the regression are multicollinear with the time change. What are picked up as quality-adjusted price changes over time may, in part, be changes due to the prices of these excluded variables. This requires that the prices on the omitted characteristics follow a different trend. 
Such effects are most likely when there are gradual improvements in the quality of items, such as the reliability and safety of consumer durables,^{40} which are difficult to measure, at least for the sample of items in real time. The quality-adjusted price changes will thus overstate price changes in such instances.
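The way an omitted characteristic contaminates the coefficient on a correlated included one can be seen in a small deterministic sketch. All numbers are hypothetical, the "true" coefficients are assumed for illustration, and the error term is left out so that the bias is isolated exactly.

```python
import numpy as np

# Hypothetical characteristics. z2 (say, reliability) will be omitted.
z1 = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
e = np.array([1.0, -2.0, 2.0, -2.0, 1.0])   # orthogonal to z1 and to a constant
d = 0.5                                      # slope of z2 on z1
z2 = d * z1 + e

b0, b1, b2 = 3.0, 0.20, 0.10                 # assumed "true" hedonic coefficients
log_p = b0 + b1 * z1 + b2 * z2               # no error term, to isolate the bias

def ols(X, y):
    """Ordinary least squares coefficients."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

const = np.ones_like(z1)
full = ols(np.column_stack([const, z1, z2]), log_p)   # z2 included
short = ols(np.column_stack([const, z1]), log_p)      # z2 omitted

print(full[1], short[1])
```

With `z2` included, the coefficient on `z1` recovers `b1`. With `z2` omitted, it equals `b1 + b2 * d`, so part of the omitted characteristic's effect is attributed to `z1`. Setting `d = 0` makes the omitted variable independent of `z1` and removes the bias, as the paragraph notes.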

The terminology is credited to Dalén (1998); see also Appendix 8.1.

Its absence may be temporary, being a seasonal item, and specific issues and methods for such temporarily unavailable items are considered in Chapter 9. The concern here is with items that disappear permanently.

Such methods and their assumptions are outlined in detail in Chapter 15.

Boskin and others (1996; 1998) and Schultze and Mackie (2002).

The range of items is assumed to be continuous in terms of the combinations of characteristics that define it. A noncontinuous case can be depicted where the price functions are piecewise linear, and an optimal set of characteristics is obtained by combining the purchases of different items (Lancaster, 1971; Gorman, 1980).

An envelope is more formally defined by letting *f(x, y, k)* = 0 be an implicit function of *x* and *y*. The form of the function is assumed to depend on *k*, the tastes in this case. A different curve corresponds to each value of *k* in the *xy* plane. The envelope of this family of curves is itself a curve with the property that it is tangent to each member of the family. The equation of the envelope is obtained by taking the partial derivative of *f(x, y, k)* with respect to *k* and eliminating *k* from the two equations *f(x, y, k)* = 0 and *f_{k}(x, y, k)* = 0. (See Osgood, 1925.)

The numeraire commodity represents all other goods and services consumed—it represents the “normal” nonhedonic commodities. The price of *x* is set equal to unity; *p(z)* and income are measured in these units.

This is the hypothetical price that makes the demand for the characteristic equal to zero; that is, it is the price that, when inserted into the demand function, sets demand to zero.

The utility function is assumed strictly concave so that θ is concave in *z*, and the value function is increasing in *z_{i}* at a decreasing rate.

The time superscripts are not relevant in this context.

The cost function is assumed to be convex with no indivisibilities. The marginal cost of producing one more item of a given combination of characteristics is assumed to be positive and increasing, and, similarly, the marginal cost of increasing production of each component characteristic is positive and nondecreasing.

Rosen (1974) considered two other supply characterizations: the short run in which only *Q* is variable, and a long run in which plants can be added and retired. The determination of equilibrium supply and demand is not straightforward. A function *p(z)* is required such that market demand for all *z* will equate to market supply and clear the market. But demand and supply depend on the whole *p(z)*, since any adjustment to prices to equate demand and supply for one combination of items will induce substitutions and changes for others. Rosen (1974, pp. 44–48) discusses this in some detail.

So that choices among combinations of *z* are continuous, assume further that *z* possesses continuous second-order derivatives.

Correspondingly, if the supply curves were perfectly inelastic, so that a change in price would not affect the supply of any of the differentiated products, then the variation in prices underlying the data and feeding the hedonic estimates would be determined by demand factors. The coefficients would provide estimates of user values. Similarly, if the supplying market were perfectly competitive, the estimates would be of resource costs. None of the price differences between differentiated items would be due to, say, novel configurations of characteristics, and no temporary monopoly profit would be achieved as a reward for this, or as a result of the exercise of market power. See Berndt (1983).

Berry, Levinsohn, and Pakes (1995) provide a detailed and interesting example for automobiles in which makes are used as market segments, while Tauchen and Witte (2001) provide a systematic theoretical study of estimation issues for supply, demand, and hedonic functions where consumers and producers and their transactions are indexed across communities.

We will need some identifying restrictions to identify the parameters of $f^0$ and $f^1$ along with $\rho^0$ and $\rho^1$. One common model sets $\rho^0 = 1$ and $f^0 = f^1$. A more general model sets $\rho^0 = 1$ and $f^0(z^*) = f^1(z^*)$ for a reference characteristics vector, $z^* \equiv [z_1^*, \ldots, z_K^*]$.

If the establishment is competitively optimizing with respect to its choice of inputs as well, then the period $t$ input vector $v^t$, along with $q^t$ and $z^t$, is a solution to the following period $t$ profit maximization problem for the establishment:

[equation missing in the source]

Here $w^t$ is a vector of input prices that the establishment faces in period $t$, and $w^t \cdot v$ denotes the inner product of the vectors $w^t$ and $v$. It is possible to rework our analysis presented below, conditioning on an input price vector rather than on an input quantity vector.

Assume that all $\rho^t$, $f^t(z)$, and $F^t(z, v^t)$ are positive for $t = 0, 1$.

We need estimates of the hedonic model price functions for both periods to implement these “observable” indices.

The result follows using Debreu’s (1952, pp. 889–90; 1959, p. 19) Maximum Theorem.

Diewert (2002f) goes further in suggesting positive sign restrictions are imposed on the coefficients in the econometric estimation.

Mechanisms for such adjustments are varied, as outlined in Chapter 7, Section E.4.3, and Triplett (2002). They include using the coefficients from the salient set of characteristics or using the predicted values from the regression as a whole and, in either case, making the adjustment to the old for comparison with the new, or to the new for comparison with the old, or some effective average of the two.

This concept of the output price index (or a closely related variant) was defined by F.M. Fisher and Shell (1972, pp. 56–58), Samuelson and Swamy (1974, pp. 588–92), Archibald (1977, pp. 60–61), Diewert (1980, pp. 460–61; 1983a, p. 1055), and Balk (1998b, pp. 83–89). Readers who are familiar with the theory of the true cost-of-living index will note that the output price index defined by equation (21.2) is analogous to the true cost-of-living index, which is a ratio of cost functions, say, *C(u, p^{1})/C(u, p^{0})*, where *u* is a reference utility level: *R* replaces *C*, and the reference utility level *u* is replaced by the vector of reference variables *S(v)*. For references to the theory of the true cost-of-living index, see Konüs (1924), Pollak (1983a), or ILO and others (2004), which is the CPI counterpart to this *Manual*.

Triplett (1987) and Diewert (2002d), following Pollak (1975), consider a two-stage budgeting process whereby that portion of utility concerned with items defined as characteristics has its theoretical index defined in terms of a cost-minimizing selection of characteristics, conditioned on an optimum output level for composite and hedonic commodities. These quantities are then fed back into the second-stage overall revenue maximization.

Chapter 15, Section F contains a detailed account of chained indices.

Ioannidis and Silver (1999) and Bode and van Dalen (2001) compared the results from these different estimators, finding notable differences, but not in all cases (see also Silver and Heravi, 2002).

These are equivalent to inverse demand (supply) functions, with the prices dependent on the quantities demanded (supplied) and the individual consumer (producer) characteristics.

This two-stage approach is common in the literature, though Wooldridge (1996) discusses the joint estimation of the hedonic and demand and supply side functions as a system.

Arguea, Hsiao, and Taylor (1994) propose a linear form on the basis of arbitrage for characteristics, held to be likely in competitive markets, although Triplett (2002) argues that this is unlikely to be a realistic scenario in most commodity markets.

It is noted that the antilogs of the OLS-estimated coefficients are not unbiased: the estimation of semilogarithmic functions as transformed linear regressions requires an adjustment to provide minimum-variance unbiased estimates of parameters of the conditional mean. A standard adjustment is to add one-half of the coefficient’s squared standard error to the estimated coefficient (Goldberger, 1968, and Teekens and Koerts, 1972).

Diewert (2002f) argues against the linear form on the grounds that, while the hedonic model is linear, the estimation required is of a nonlinear *regression* model, and the semi-log and log-log models are linear *regression* models. He also notes that semi-log form has the disadvantage against the log-log of not being able to impose constraints of constant returns to scale. Diewert (2002d) also argues for the use of nonparametric functional forms and the estimation of linear generalized dummy variable hedonic regression models. This has been taken up in Curry, Morgan, and Silver (2001), who use neural networks that are shown to work well, although the variable set required for their estimation has to be relatively small.

In Chapter 15, Section C.3.2, the issue of adjusting the base- versus the current-period’s price is discussed, since there are different data demands.

Other averages may be proposed—for example, the needs of an index representative of the “typical” establishment would be better met by a trimmed mean or median.

Diewert (2002c) also suggests matching items where possible and using hedonic regressions to impute the prices of the missing old and new ones. Different forms of weighting systems, including superlative ones, can be applied to this set of price data in each period for both matched and unmatched data.

Such observations have higher variances of their error terms, leading to imprecise parameter estimates. This would argue for the use of WLS estimators with quantity sold as the weight. This is one of the standard treatments for heteroskedastic errors (see Berndt, 1991).

See Berndt, Ling, and Kyle (2003), Cockburn and Anis (1998), and Silver and Heravi (2002) for examples. Silver and Heravi (2002) show old items have above-average leverage effects and below-average residuals. Not only are they different, but they exert undue influence for their size (number of observations).

Estimating an equation for which each variable is divided by the square root of the weight using OLS is an equivalent procedure.

A less formal procedure is to take the standardized residuals from the regression and plot them against model characteristics that may denote low sales, such as certain brands (makes) or vintage (if not directly incorporated) or some technical feature that makes it unlikely that the item is being bought in quantity. Higher variances may be apparent from the scatter plot. If certain features are expected to have, on average, low sales, but seem to have high variances, leverages, and residuals (see Silver and Heravi, 2002), a case exists for at least downplaying their influence. Bode and van Dalen (2001) use formal statistical criteria to decide between different weighting systems and compare the results of OLS and WLS, finding, as with Ioannidis and Silver (1999), that different results can arise.

Triplett (2002) stresses the point that *R^{2}* alone is insufficient for this purpose.

There are some commodity areas, such as airline comfort, that have been argued to have overall patterns of decreasing quality.