5. Sampling issues in price collection
- International Monetary Fund
- Published Date: September 2004
5.1 In an ideal world, it would always be possible to use statistically sound sampling techniques to produce price indices with a high degree of accuracy and within given resource constraints. Reality, however, is usually very far away from this ideal. It is almost always impossible to achieve efficient samples because (i) accurate estimates of population variances, required for allocation of sample units to strata, are rarely available; (ii) sampling frames are always deficient to some extent, missing some key information, such as births of new establishments, or desired stratification variables; and (iii) response rates are unpredictable and may prove to be deficient, which affects the accuracy of the price index levels and measured price changes.
5.2 The aim of the sampling statistician is, therefore, to make the best use of what is available and to apply the principles of sampling theory in a commonsense and practical way. Arguably the most important steps in sampling are to establish and understand fully what the survey is trying to estimate, the limitations of the sampling frame, and the environment in which the survey will be conducted, that is, likely response rates, data quality, and levels of resources.
5.3 There is a direct relationship among the uses of the PPI, the scope of the PPI survey coverage, and the requirements for sampling frames. Two of the major uses of the PPI are as a general indicator for inflation and a deflator in the national accounts. The broader the coverage of the PPI in terms of economic activities, the more useful it is in inflation analysis and compiling constant price GDP measures. But broad coverage requires the ability to develop sampling frames for a wide range of economic activities, including both goods-producing and service-producing activities. These sampling frames also must be kept up to date by recording both the births and deaths of enterprises in each sector.
5.4 Once coverage and uses have been established, a sample design can be drawn up, with decisions made about stratification, sample size, and allocation. Random sampling techniques may be employed in countries where large amounts of data are available and reasonable estimates of variance can be made. In many country situations, only limited details of sampling parameters are available, and the statistician may have to fall back on procedures that use expert knowledge at many stages in the selection process. To the extent possible, acceptable, practicable sampling procedures should be used. Judgmental approaches should be used only as a last resort.
5.5 As with most panel samples collected through time, price surveys suffer from problems associated with a changing population. Any sample of establishments and products will become increasingly unrepresentative over time, and it is likely to be depleted as establishments cease the sale or the production of selected products or cease operations altogether. Some form of panel rotation or supplementation for the samples is advised to minimize any bias caused by sample attrition, non-coverage of new products, new establishments, and new production technologies.
B. Common Problems in Price Survey Sampling
5.6 There may be many reasons why price surveys are thought to be unrepresentative and thus liable to lead to inaccurate results. All national price surveys suffer from problems to some extent. The following are some examples:
Samples are selected purposively rather than using probability sampling methods, increasing the chances of bias. For example, establishments may be selected for their convenient geographical location or because they are known to be good respondents;
Without probability selection methods, estimates of statistical accuracy cannot be made (although without some initial estimate of variance, a randomly selected sample cannot be optimized either, that is, designed for the lowest variance under given cost constraints; this difficult problem is dealt with later);
The sample size for an industry or commodity may have become outdated if the industry or commodity has grown or contracted since the base period (period when sample was selected);
New products may not be identified or included in the survey. This problem may be relieved to some extent by rotating the sample of establishments;
The sampling frame may be out of date or may not include certain groups of the target population. For example, a common problem in the PPI is that information on small producers is unreliable because this group often is volatile and difficult for administrative authorities to track, resulting in the weight for small producers being wrong (typically they are underrepresented); and
Surveys may be voluntary, increasing the chance of nonresponse bias that results when those who do not respond have different price experiences than those who do respond.
C. Starting Position
5.7 Before starting to design a price survey, it is vital to understand the reasons for the survey and its uses. This will determine the format of the outputs required and help decide what data should be collected for the inputs. It is essential to assess and understand the environment in which the survey will be conducted—for example, what response rates might be expected and how good the data quality might be. Obviously, some of the most important decisions to be made concern the level of available resources. So all of the following parameters will affect the sample design and the future success of the survey.
5.8 It is vital to establish the objectives of the survey, by consulting survey users and answering questions such as the following:
Will the price indices be used for deflation of output, and/or as a measure of inflation?
If output deflation is the goal, then reliable, detailed industry and product indices will have a high priority in the PPI, and detailed item indices will be required in the CPI. If, on the other hand, inflation indicators are required, then more emphasis will be placed on aggregate indices, and a range of indicators may be required using different prices and weights, for example, input, output, wholesale, and retail price indices.
What will the geographical coverage be? National or regional?
The geographical coverage is usually national for the PPI, but in a few countries with regional differences in price movements, regional indices may be important. In addition, a number of countries compile regional GDP estimates. There may be a need for regional PPI estimates for use as deflators, particularly if there are regional differences in price movements.
Do we want a monthly or quarterly time series?
Typically, the PPI is collected monthly as an inflation indicator, but in many countries the PPI may be quarterly because of cost considerations and because its primary use is as a deflator for national accounts usually produced on a quarterly basis.
Which prices are we trying to estimate? Basic prices, producer prices, wholesale prices, or purchasers' prices?
The pricing concept will vary depending on the type of index produced. For the output PPI, the pricing concept is the basic price, that is, the per-unit revenue received by the producer from production. For an input PPI, the pricing concept is the purchaser’s price, that is, the per-unit cost paid by the producer for material and energy inputs to the production process.
Assuming that a choice has to be made (for cost reasons), are industry PPIs of a higher priority than product PPIs, or vice versa?
If industry PPIs are of a higher priority, then a two-stage sampling scheme is used to derive reliable industry and product estimates; whereas if product PPIs have priority, reliable product samples should be compiled and then aggregated to yield industry PPIs that may be somewhat less reliable.
Will separate indices be compiled for export and domestic market prices?
The PPI should cover all production of domestic producers, including products for domestic use and those for exports. Often countries collect information only on products for domestic use, although the PPI could be used to produce export price indices also.
Which industries and products should be covered? At what level of detail?
In the PPI, the industrial sector (mining and manufacturing) and public utilities are the primary sectors typically covered. Services, however, are becoming much more important in terms of economic importance and growth, and should be covered in the PPI through future expansions.
5.9 The data to be collected must be identified and understood:
What is the type of price to be collected, and can we collect actual transaction prices rather than list prices?
It can be difficult to define and collect prices for many goods and services. Often the quoted list or book price does not represent the price received by the establishment. Ideally, we want to collect actual prices received for a representative sample of establishment transactions. For goods, this can be achieved quite regularly. This is also the case with most services. However, for some services (for example, banking and insurance services), the service and price of financial intermediation are not clear-cut, and the actual price may have to be derived from transaction information. (Additional information on prices for these services is provided in Chapter 10.) In addition, if the main use of the indices is to deflate output, then the prices collected should be actual transaction prices.
Will we collect basic prices (excluding taxes on products, including subsidies, and excluding transport costs separately invoiced)?
According to the 1993 SNA, output of goods and services ideally should be valued at basic prices, and so should PPIs, if they are to be used as deflators. If output PPIs used prices other than basic prices, their use as deflators may give spurious results.
At what time should prices be recorded?
In line with the valuation of output in the 1993 SNA, accrual accounting rules should be followed as far as possible, so that in the PPI, sales prices are recorded at the time of shipping or delivery. Although country practices often differ—for example, prices may be recorded at the time of purchase or order—the preferred timing is at the time of shipping or delivery. Prices could be an average of several observations during the month or the price on a particular day of the month; both approaches are used and are acceptable.
How should a price (transaction) be described?
The price-determining characteristics of each product or variety should be identified so that transaction specifications can be sufficiently detailed. For example, the price per liter of paint will depend on the number of cans to be shipped, type and quality of paint, terms of payment (net 30 days), type of customer, and any special discounts that may apply.
Are there likely to be periods of seasonal nonavailability? If so, how will these missing prices be dealt with?
Seasonal nonavailability has a direct impact on the quality of the index because the sample size will be predictably lower during these periods. This should be taken into consideration in the design of sample strata, so that several similar products included within the strata have year-round availability. Also, sample sizes for these strata should be increased because of the higher variability in price movements among seasonal products.
5.10 A decision should be made about the level of accuracy required:
Ideally, a maximum acceptable sampling error should be identified for each published index.
Sampling error can be assessed, however, only if probability sampling techniques have been used. This often means starting with some estimates of variance for the component index to determine initial sample sizes. Then, once samples have been collected and variances calculated, the sample can be optimized based on the new variance information. However, the calculation of variances and sampling errors is very difficult to accomplish (Leaver, Johnstone, and Archer, 1991; Leaver and Swanson, 1992; Cope and Freeman, 1998; and Morris and Birch, 2001).1
In practice, there is a trade-off between cost and accuracy.
A high level of accuracy that would be desirable requires larger sample sizes that may not be affordable. In such cases, costs often determine the sample sizes, and the level of accuracy may suffer somewhat.
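As a rough illustration of how an initial variance estimate translates into a sample size, the following Python sketch applies the textbook formula for simple random sampling under a normal approximation. The function name, the 95 percent multiplier of 1.96, and all numbers are illustrative assumptions, not values taken from any actual PPI survey.

```python
import math

def required_sample_size(std_dev, max_error, z=1.96, population=None):
    """Number of price quotes needed for a margin of error of max_error.

    Assumes simple random sampling and a normal approximation. std_dev is
    an initial guess at the standard deviation of price relatives.
    """
    n0 = (z * std_dev / max_error) ** 2
    if population is not None:
        # Finite population correction, relevant for small strata.
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)

# Hypothetical component index: standard deviation 0.05, acceptable error 0.01.
n_initial = required_sample_size(std_dev=0.05, max_error=0.01)
n_small_stratum = required_sample_size(std_dev=0.05, max_error=0.01, population=200)
```

Under these assumptions, halving the acceptable error roughly quadruples the number of quotes required, which is the trade-off between cost and accuracy in its simplest form.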
5.11 Once the coverage is decided, the population to be sampled should be identified and the sampling frame reviewed to determine whether the existing frame needs to be supplemented.
Does the frame contain all of the units in the target population? Does it cover all of the industries that are in the scope and all of the establishments in the targeted industries? Will separate frames have to be developed for each industry, group, or division?
Most business registers have a cutoff (threshold) below a certain size (number of employees or value of sales) and probably some industries that are less well covered, for example, construction and retail trade. Also, there is a need to identify establishments separately from parent enterprises.
How are units defined in the frame? There are probably borderline units where it is uncertain if they belong in the population.
A separate sample frame will need to be developed for PPI industries or products in order to facilitate the selection of the sample of establishments for those industries and products. For example, ancillary or auxiliary units of an enterprise may be out of scope, or certain products that are secondary to the industry should be included in the frame for another industry.
Are units mutually exclusive?
There could be double counting, which occurs when an establishment could be included both in its own right and as part of its parent enterprise.
Is there information available to allow stratification?
We need certain data elements that will serve as stratification variables (for example, industrial classification, production or sales, number of employees, and location of establishment) in order to select the sample.
Is there information available to allow weighting for probability proportionate to size (PPS) selection?
We will need measures of size, such as output, total sales, and value of shipments. If such measures of value are not available, employment may have to be used as a proxy.
5.12 The level of available resources should be decided:
This will be a constraint on sample sizes.
It is generally more expensive to increase the number of establishments sampled than to increase the number of prices collected from each establishment. Simply increasing the latter, however, may add little to accuracy when intraestablishment (within an establishment) variance is low compared with interestablishment (between establishments) variance.
And this may dictate the methods of measurement.
For example, resources may determine whether personal visits can be used in addition to telephone collection or postal or electronic questionnaires.
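The point made in paragraph 5.12 about intraestablishment and interestablishment variance can be illustrated with the usual approximation for a two-stage sample. The sketch below is a simplification: it assumes the same number of quotes from every establishment and ignores finite population corrections, and the variance figures are hypothetical.

```python
def two_stage_variance(between_var, within_var, n_establishments, quotes_each):
    """Approximate variance of a mean price change from a two-stage sample.

    between_var: variance between establishments (interestablishment)
    within_var:  variance within an establishment (intraestablishment)
    """
    return (between_var / n_establishments
            + within_var / (n_establishments * quotes_each))

# Hypothetical case in which most of the variation lies between establishments.
base = two_stage_variance(4.0, 1.0, n_establishments=50, quotes_each=2)
more_quotes = two_stage_variance(4.0, 1.0, n_establishments=50, quotes_each=4)
more_establishments = two_stage_variance(4.0, 1.0, n_establishments=100, quotes_each=2)
```

In this example, doubling the quotes per establishment lowers the variance only slightly, whereas doubling the number of establishments halves it. Whether the second option is affordable is a separate question of cost.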
5.13 Legislative issues may affect the sample design.
Will the survey be voluntary or statutory?
This will affect response rates, which, in turn, have implications for accuracy and sample sizes. Statutory surveys will have higher response rates, although they may result in lower data quality.
Are there rules concerning confidentiality?
This may impose a lower limit on sample sizes—for example, a minimum of four units per stratum may be required.
D. Sample Design
5.14 Given information about what the PPI survey is intended to achieve, the format of the inputs and outputs, desired level of accuracy, and available resources, the process of designing the sample can begin.2 Again, decisions need to be made, but the main objective of the design process is clear: to maximize efficiency, that is, to minimize sampling and nonsampling errors and to minimize costs.
5.15 Decisions will need to be made about:
Sampling techniques (probability vs. nonprobability),
Sample structures and stratification,
Sample allocation between strata, and
Methods for reducing nonsampling errors.
D.1 Sampling techniques
D.1.1 Probability vs. nonprobability sampling
5.16 The statistician, confronted with any measurement problem, must initially consider the possibility of installing a rigorous probability sample. In the context of PPIs, probability sampling means the selection of a sample panel of producers and products (transactions) from a universe of industrial activity in which each producer and product has a known chance of selection.
5.17 Nonprobability sampling is known as judgmental or purposive sampling, or expert choice, and samples are chosen by experts to be representative. In practice, however, different experts would rarely agree on what is representative, and the samples are subject to biases of unknown size. Judgmental sampling may be justified when sample sizes are small, but concern about their biases increases with sample size.
5.18 Using a probability sample comes with two well-known advantages. First, it ensures that the items to be priced are selected in an impartial and objective fashion. In the absence of probability sampling, a danger exists that only items that are easy to price will be selected, resulting in biased estimates (indices). In particular, there is likely to be poor coverage of technologically advanced items, like machine tools, electronic equipment, aircraft, or home electronics in the PPI. These are difficult to price because of rapid changes in specifications. There is also a tendency to place too much emphasis on simpler products, like food items, cement, textiles, or steel bars, for which a comparable series of price quotations can easily be provided.
5.19 The second advantage is that a probability sample permits the measurement of the quality of the survey results through estimates of the variance or sampling error. The quality of results in this context relates to the chance of a difference between the results obtained from the sampled observations and the result that would have been obtained in a complete enumeration of all reporting units in the universe. The use of a probability sample, of course, does not permit the measurement of errors arising from nonresponse, inaccurate reports, obsolete weights, unrepresentativeness of the commodities priced, or any other nonsampling source.
5.20 Probability sampling conceivably could be used at all stages of the selection process. For example, a random sample of products could be selected from a comprehensive list of all goods produced by all mining and manufacturing firms. For each selected commodity, a random sample of producers could be picked using a comprehensive list of producers; for each selected producer, a random sample of specific brands could then be chosen for regular price reporting from a complete list of each producer's output. A less rigorous approach might involve random choice of producers or retailers, followed by a purposive selection of individual products or items; alternatively, the producers or retailers might be selected on a nonprobability basis using cutoff sampling (described next), while a random sample is picked from all items made by the selected producers. This mixture of nonrandom with random selection procedures and cutoff sampling procedures narrows the interpretation that may be placed on estimated sampling errors but still will retain the advantage that a certain amount of objectivity is imparted to the selection process.
5.21 Optimal sample design requires, for all units in the population, information that will allow effective stratification and increased efficiency due to selection by PPS. Different variants of probability sampling can be used by statistical agencies:
Simple random sampling—every possible unit has an equal chance of being drawn.
Systematic sampling—every kth unit is selected, after a random start. This sampling is affected by any ordering or pattern in the sampling frame. Ordering leads to a form of implicit stratification, and a pattern in the frame can lead to biased samples.
PPS—each unit has a probability of selection in proportion to its size (or some other indicator of importance, but size is commonly used). Once these probabilities of selection are assigned, either simple random or systematic sampling techniques can be used.
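One common way of combining PPS with systematic selection can be sketched as follows. This is a minimal illustration, not a description of any agency's procedure; the function name and the sales figures are hypothetical.

```python
import random

def pps_systematic(sizes, n, rng):
    """Return indices of n selections made with probability proportional to size.

    Units are laid end to end on a line whose length is the total size; a
    random start and a fixed interval then pick n points on that line.
    A unit at least as large as the interval is selected with certainty,
    and a unit larger than the interval may be hit more than once.
    """
    total = sum(sizes)
    interval = total / n
    start = rng.random() * interval          # random start in [0, interval)
    points = [start + k * interval for k in range(n)]

    selected = []
    cumulative = 0.0
    next_point = 0
    for index, size in enumerate(sizes):
        cumulative += size
        while next_point < n and points[next_point] < cumulative:
            selected.append(index)
            next_point += 1
    # Guard against floating-point rounding at the very end of the line.
    while next_point < n:
        selected.append(len(sizes) - 1)
        next_point += 1
    return selected

# Hypothetical sales values for six establishments.
sales = [500, 300, 100, 50, 30, 20]
sample = pps_systematic(sales, n=2, rng=random.Random(2004))
```

Here the first establishment accounts for half of total sales, which equals the sampling interval, so it is always selected; the second selection falls among the remaining establishments with chances proportional to their sales. The order of the list matters, as noted above for systematic sampling in general.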
5.22 Despite the attractions of probability sampling methods, there will be situations where it is neither necessary nor desirable. Price indices are an area of statistics where the risks in not having a probability sample are relatively low. The potential diversity of the change in prices charged by various producers of a given commodity over many time periods is relatively low. Compare this to the potential diversity for sales or capital expenditures of firms making the same product over the same period of time. The largest firm may become the smallest, and vice versa. Some may even abandon production of the commodity, and new firms may enter. In summary, the measurement of price changes appears to require less rigor with respect to probability sampling than do other areas of statistical measurement. The additional costs that may be involved in probability sampling can be allocated to other areas in the survey, such as price data collection or improvements to source data on weights.
5.23 That said, without probability sampling, statistical agencies will not be able to produce meaningful measures of sampling error to guide users in distinguishing between real changes in prices and those due to statistical noise. They also will experience difficulty in statistical decision making to improve the sample design and allocate resources more efficiently. Good measures of sampling error provide statistical offices with data for reallocating the sample to areas with high variance to reduce statistical error.
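Reallocating a sample toward high-variance areas is often done with Neyman allocation, in which each stratum's share of the sample is proportional to its population size times its standard deviation. The text does not prescribe this method; the sketch below is one standard option, and the strata and figures are hypothetical.

```python
def neyman_allocation(strata, total_sample):
    """Allocate total_sample across strata in proportion to N_h * S_h.

    strata maps a stratum name to (population size N_h, standard deviation S_h).
    Rounding means the allocations may not sum exactly to total_sample.
    """
    products = {name: n_h * s_h for name, (n_h, s_h) in strata.items()}
    total = sum(products.values())
    return {name: round(total_sample * p / total) for name, p in products.items()}

# Hypothetical strata: number of establishments and std. dev. of price change.
strata = {
    "food": (400, 0.02),
    "machinery": (100, 0.08),
    "chemicals": (250, 0.04),
}
allocation = neyman_allocation(strata, total_sample=130)
```

In this example the machinery stratum has a quarter as many establishments as food but four times the variability, so the two receive the same number of sample units.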
5.24 In several countries, the range of domestically produced mining and manufacturing goods is so limited and the number of firms producing them so small that there is no point in making a selection; the survey should try to cover all products and all producers.
5.25 In other cases, there may be no practical way of determining the universe in advance. A basic requirement for probability sampling is to define the universe (or population) and to identify all units in the universe. The universe list must be kept up to date with all units classified by an industry code such as the ISIC or NACE, which in practice is a costly and difficult business.
5.26 The cost of installing and administering a probability sample may be judged too high. There clearly are high costs involved in the design, selection process, control, and administration of a probability sample for collecting price observations.
5.27 Estimates of variability in price movements also are needed. This information is rarely available for all units in the population, certainly not at a detailed product or item level. One way of dealing with this is to use a two-phase sample, where certain information is collected from a sample of units, and then these units are resampled using this information. In the U.K. PPI for example, detailed prod
5.29 Thus, for most countries a strict probability approach will not be possible, or the costs will greatly outweigh the advantages, so a combination of probability and purposive sampling techniques is employed.
D.1.2 Cutoff sampling
5.30 Cutoff sampling is a strategy frequently used by countries to select samples. In this approach, a predetermined threshold is established with all units at or above the threshold included in the sample (selected with certainty) and units below the threshold level not included (zero probability of selection). Cutoff sampling generally results in a high degree of coverage among a small number of prospective units. This occurs because the distribution of the selection variable (for example, production or sales) is concentrated in a small number of large establishments.3
5.31 The problem with such an approach is that the smaller establishments may have different price movements from the larger units and, thus, introduce an element of bias into the price index. The bias would be the difference between the average price change for the noncovered units and the price change for the overall population. If the importance of units excluded is very small or the bias is very small, the effect on the overall error may be very small. Usually the total error is measured by the root mean square error, RMSE,
$$\mathrm{RMSE} = \sqrt{\text{sampling variance} + \text{bias}^2},$$
and the sample with the lower total error is deemed more efficient. Thus, the approach that produces the lowest total error or RMSE will be preferred. It is possible that a cutoff sample could be more efficient if the bias component of the excluded units is small. For example, if the noncovered units have substantial variation with regard to price change but small bias (that is, the average price change is not much different), the RMSE could be smaller using the cutoff sample, and the survey costs could be much lower.
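The comparison in paragraph 5.31 can be made concrete with a small calculation. The sketch below uses the standard definition of RMSE as the square root of sampling variance plus squared bias, and expresses the effect on the index as the excluded share times the difference in price change between covered and noncovered units. All figures are hypothetical, with price changes in percent.

```python
import math

def index_bias(excluded_share, covered_change, excluded_change):
    """Bias of an index that is based only on the covered (large) units.

    population change = (1 - share) * covered + share * excluded,
    so covered - population = share * (covered - excluded).
    """
    return excluded_share * (covered_change - excluded_change)

def rmse(variance, bias):
    """Root mean square error: sampling variance plus squared bias."""
    return math.sqrt(variance + bias ** 2)

# Cutoff sample: some bias, but lower variance because volatile small units are left out.
bias_cutoff = index_bias(excluded_share=0.10, covered_change=2.0, excluded_change=3.0)
rmse_cutoff = rmse(variance=0.04, bias=bias_cutoff)

# Probability sample of the same cost: unbiased, but with higher variance.
rmse_probability = rmse(variance=0.09, bias=0.0)
```

With these assumed values the cutoff sample has the lower RMSE. The conclusion reverses if the excluded units are more important or their price movements differ more, which is why the bias has to be measured rather than assumed.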
5.32 Cutoff sampling has a great deal of practicality for selecting the industries and products in a multistage sampling scheme. For example, in selecting the industries in the manufacturing sector that will be included as sample strata, a threshold can be established that only industries that represent 1 percent or more of output will be chosen. Another aspect of sampling where the cutoff approach can be used is in the selection of the representative products within an establishment. If, for example, the selected establishment is assigned four price observations, then the four products with the most sales can be selected.
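Both uses of the cutoff approach described in paragraph 5.32 amount to simple selection rules. The following sketch shows them with hypothetical industries, products, and values; the function names are illustrative.

```python
def industries_above_cutoff(output_by_industry, min_share=0.01):
    """Keep industries whose share of total output is at least min_share."""
    total = sum(output_by_industry.values())
    return [name for name, value in output_by_industry.items()
            if value / total >= min_share]

def largest_products(sales_by_product, n_quotes=4):
    """Pick the n_quotes products with the highest sales in an establishment."""
    ranked = sorted(sales_by_product, key=sales_by_product.get, reverse=True)
    return ranked[:n_quotes]

# Hypothetical output by industry (the 1 percent threshold follows the text).
output = {"basic metals": 620, "food products": 350, "toys": 5, "textiles": 25}
strata = industries_above_cutoff(output)

# Hypothetical sales by product for one establishment assigned four observations.
sales = {"bars": 90, "sheet": 70, "wire": 40, "pipe": 30, "castings": 10, "scrap": 5}
priced = largest_products(sales)
```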
5.33 Cutoff sampling is not the same as probability sampling. Sampling errors for cutoff samples will not be accurate because the sample is not necessarily representative of the index population. Statistical offices will need to make special efforts to measure bias among smaller firms in order to calculate the RMSE to get a meaningful measure of error.
D.1.3 Multitiered stratification
5.34 Alternatively, it may be useful to use stratified samples in which various classes of establishments are sampled separately. Often it is helpful to identify three or four strata based on their size, such as large, medium-sized, and small establishments, with each stratum having a different sampling rate. For example, large establishments (based on turnover or employment) may be sampled with certainty (that is, all selected in the sample), medium-sized establishments may be sampled at a rate of 25 percent (one out of every four), and small establishments may be sampled at a rate of 2 percent (1 out of every 50).
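The sampling rates in this example can be applied as in the sketch below, which also derives the weight each sampled unit carries (the inverse of its sampling rate). The frame is hypothetical, and simple random sampling within each stratum is an assumption.

```python
import random

# Sampling rates from the example: certainty, one in four, one in fifty.
RATES = {"large": 1.0, "medium": 0.25, "small": 0.02}

def stratified_sample(frame, rates, rng):
    """Draw a simple random sample within each size class.

    frame is a list of (unit_id, size_class) pairs.
    """
    sample, weights = {}, {}
    for size_class, rate in rates.items():
        units = [unit for unit, cls in frame if cls == size_class]
        n = round(len(units) * rate)
        sample[size_class] = rng.sample(units, n)
        # Each sampled unit represents N_h / n_h units of its stratum.
        weights[size_class] = len(units) / n if n else None
    return sample, weights

# Hypothetical frame: 10 large, 40 medium-sized, and 500 small establishments.
frame = ([(f"L{i}", "large") for i in range(10)]
         + [(f"M{i}", "medium") for i in range(40)]
         + [(f"S{i}", "small") for i in range(500)])
sample, weights = stratified_sample(frame, RATES, random.Random(5))
```

With this frame each stratum contributes 10 establishments, but a sampled small establishment stands for 50 units while a large one stands only for itself.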
D.2 Sampling frames
5.35 Whether selecting a sample using probability or nonprobability techniques, we need to define the universe (population) from which we wish to sample, that is, construct a sampling frame. In most countries it is possible to define the population using various lists of enterprises (business registers), compiled for administrative purposes. For the PPI, these business registers probably will be less than ideal for use as sampling frames, however, and will require some manipulation before being used. On the other hand, it is likely that the business registers also will form the sampling frame for any official censuses or surveys of production, in which case some of this manipulation will have been done. The results of the censuses and surveys also will have been used to update and improve the business register.
5.36 The ideal sampling frame would
Be a complete list of all eligible units (producing and exporting) within the geographic and industry or product coverage required.
5.37 Registers typically are compiled as the by-product of an administrative system such as tax collection or social security schemes. Alternatively, lists can be compiled using records such as bank accounts. Such lists generally contain, at a minimum, information about geographical location and size (turnover or number of employees) but may not indicate the principal activity of an enterprise or identify it as an exporter. Supplementary lists may be needed where certain areas of coverage are known to be inadequate. For example, in the United Kingdom a Builders Address File is maintained separately from the main business register since construction is recognized as a particular problem. In the United States, population census housing lists are supplemented by new construction information taken from building permit records. Also, information on the location of shops and value of expenditures for the CPI can be collected as part of the Household Budget Survey (HBS) or as a separate Point of Purchase Survey.
Be updated instantly with all births and deaths of units and changes in addresses, fax numbers, etc.
5.38 Maintaining an up-to-date register is resource intensive. It generally is the case that information about the bigger units is more up to date than data on smaller units. This is a particular problem during periods of changing economic structure when some industries or residential areas are expanding, and new units may be starting up in large numbers. If units are not removed from the sampling frame when they no longer exist, they may be selected as part of the sample. This needs to be borne in mind when determining sample sizes. Also, a common error with systematic sampling is to substitute the next unit in the list when a dead unit is sampled, but this should be avoided since the probability of selection of that next unit is enhanced. The sampling interval should be repeated as usual and dead units simply dropped.
Hold certain fields for each unit, allowing sorting of the list and stratification as required.
5.39 For example, industry classification at the ISIC four-digit level and information about value of output would be maintained for PPI purposes (ideally of each product, at the six-digit CPA level, produced by each unit). This information would be updated annually.
5.40 Lists maintained primarily for tax collection purposes are likely to hold information on the values on which taxes are levied, for example, value added, profits, sales. Lists maintained for social security reasons will have information about numbers of employees, wage bills, etc. In countries where production surveys or censuses are performed for national accounts purposes, information on output and intermediate consumption can be held in the business register, too. In the United Kingdom, detailed information on the value of output of products (at the nine-digit level) is collected from a sample of enterprises each year in compliance with EU legislation (PRODCOM), and this information is stored in the register (for sampled enterprises only).
Identify each unit uniquely at the correct institutional level.
5.41 In practice some units may be listed more than once, and others may be grouped under one listing. Ideally, a structure would identify enterprises and their corresponding establishment structure with separate classification and other stratification information for each establishment. If such information is not immediately available from the business register, additional steps or surveys may be needed to collect this information as part of the process of sample frame refinement.
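The advice in paragraph 5.38 on dead units in a systematic sample can be sketched as follows. The frame, the interval, and the set of dead units are hypothetical; in practice a dead unit is usually discovered only when contact is attempted.

```python
import random

def systematic_sample(frame, interval, rng):
    """Select every interval-th unit after a random start."""
    start = rng.randrange(interval)   # random start within the first interval
    return frame[start::interval]

def drop_dead_units(selected, dead):
    # Dead units are dropped, never replaced by the next unit on the list:
    # substitution would raise that neighbor's chance of selection.
    return [unit for unit in selected if unit not in dead]

frame = [f"unit{i:03d}" for i in range(100)]
dead = {"unit013", "unit047", "unit090"}   # units that no longer exist

selected = systematic_sample(frame, interval=10, rng=random.Random(1))
usable = drop_dead_units(selected, dead)
```

Because the usable sample can only shrink, the expected share of dead units in the frame should be allowed for when the initial sample size is set.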
D.3 Sample structure
5.42 The sample structure is likely to depend both on whether industry or area statistics in our price surveys are considered a higher priority than products or population subgroups, or vice versa, and on what information is held in the sampling frame.
5.43 Consider the PPI structure using the following example:
We require PPIs for industries (four-digit ISIC) and PPIs for products (six-digit CPA);
Our product classification system is mapped onto our industrial classification system so that each product falls under a single industry;
There are establishments producing a range of products falling under more than one industry heading.
5.44 The first step in this process may involve selecting the industries and products that will be represented in the PPI. In most countries some industries and products are extremely small in terms of output or sales—for example, industries or products that comprise less than 0.02 percent of total output and sales in a sector such as manufacturing. (If this is not the case, then all industries and products could be included for estimation.) It would be possible to use a cutoff approach where those industries and products below the threshold level (in this example, 0.02 percent of sales) are excluded from the sample of industries or products, but their weight is allocated to another closely related stratum or distributed across a number of other strata. A sampling frame is then built for each industry and product.
5.45 The statistical office should review the industries that fall below the cutoff point and determine if any traditionally important industries or products should be included. Also, newly emerging industries that are expected to grow in importance might be included because they will eventually exceed the threshold. Finally, for the industries not selected, the statistical office should determine if there are logical combinations of industries that can be made to reach the threshold level. For example, ISIC industries 3118 (sugar factories and refineries) and 3119 (manufacture of cocoa, chocolate, and sugar confectionery) may both fall below the threshold level, but combined they would exceed it. Thus a combined industry (3118,9, manufacture of sugar, cocoa, and chocolate) could be derived.4
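The cutoff and combination steps in paragraphs 5.44 and 5.45 can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the output shares and the function name `apply_cutoff` are hypothetical, and the excluded weight is spread proportionally over the retained strata, which is only one of the options mentioned in paragraph 5.44.

```python
THRESHOLD = 0.02  # cutoff, in percent of total sector output


def apply_cutoff(shares, threshold=THRESHOLD):
    """Keep strata at or above the threshold and spread the weight of the
    dropped strata over the kept ones in proportion to their shares."""
    kept = {k: v for k, v in shares.items() if v >= threshold}
    dropped = {k: v for k, v in shares.items() if v < threshold}
    kept_total = sum(kept.values())
    dropped_total = sum(dropped.values())
    reweighted = {k: v + dropped_total * v / kept_total for k, v in kept.items()}
    return reweighted, dropped


# Hypothetical output shares (percent of manufacturing output) by ISIC code.
shares = {"3111": 60.0, "3112": 39.96, "3118": 0.015, "3119": 0.015, "3121": 0.01}

# Combine two related industries that are each below the threshold.
combined = dict(shares)
combined["3118,9"] = combined.pop("3118") + combined.pop("3119")

weights, excluded = apply_cutoff(combined)
print(sorted(weights), sorted(excluded))
```

With these assumed shares, the combined industry 3118,9 clears the threshold, while 3121 is excluded and its weight is absorbed by the retained industries.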
5.46 To construct industry PPIs, we would classify each establishment by a four-digit ISIC heading based on its principal activity, draw a sample of establishments within each heading, select products and transactions to be priced from each establishment in the sample, and then weight them accordingly to give industry PPIs.
5.47 To construct product PPIs, we would need output or sales information for each establishment for each six-digit product that it produces, enabling us to form a list of all producers for each six-digit product. From each list we would sample transactions and weight them accordingly to give product PPIs.
5.48 Obviously, running both lists and both samples as described above, in parallel, would be inefficient and burdensome on enterprises, and it would require a large amount of product information at the outset. In practice a compromise usually is made. In some countries, the United Kingdom for example, where detailed product information is available (at least for a subpopulation) and users place importance on product PPIs, establishments are listed under product headings and sampled to give product PPIs, which are then weighted together to give industry PPIs. This approach does not allow for the fact that establishments’ behavior does not follow the strict mapping of products onto industries (third bullet in paragraph 5.43); that is, some establishments classified in one industry (A) will produce products (as a result of secondary activities) that are mapped into a different industry (B). Prices for these secondary products should be included in the industry PPI where the establishment is classified (A), despite the fact that the product heading appears elsewhere (B).
5.49 A compromise is to employ a two-stage sampling scheme.5 That is, the frame is stratified first by the four-digit industry, then stratified by size within each industry. Next, samples are selected for each stratum and product samples are drawn from those establishments selected. Each transaction selected must then be classified under a product heading, and product PPIs can be compiled using all prices for each product, regardless of the industry in which the establishments are classified. With two-stage sampling of this sort, some accuracy of the product PPIs will be sacrificed. This is the structure employed in the United States.
D.3.1 Clustering of price-forming units
5.50 It may be useful and more efficient to cluster the basic units in the frame into price-forming units.6 A price-forming unit is an entity whose price levels and movements are more or less identical (perfectly correlated). For example, several establishments owned by a single enterprise may constitute a profit-maximizing center and operate under the same price-setting regime. These establishments would constitute a cluster or price-forming unit. If a two-stage sample structure is used with industries as the principal strata, then establishments will be classified by industry and then clustered within industries.
5.51 It is a well-known principle of sampling that stratification into segments for which the dispersion of price changes is lower (more homogeneous) than the overall dispersion tends to increase the efficiency of the sample by reducing variance.
5.52 For example, in the two-stage sample described above, the list of price-forming units is first stratified by industry classification, for example, the four-digit ISIC. Each industry stratum then can be further stratified by variables appropriate for that industry. The ideal variable for stratification is the value to be measured in the survey (that is, price change), but in practice we use proxy variables that we assume to be correlated with price change. For example, the size of the production unit may cause differences in production technologies and, thus, different responses to changes in demand or input costs.
5.53 In the U.S. PPI, the sample design ensures that all units (that is, products or producers) above a certain size are included. The remaining units are sampled with probability of selection proportionate to size. The alternate approach of setting broad strata, such as those with value of sales of 1 million to 5 million, 5 million to 10 million, etc., will result in units within each stratum having an equal chance of selection and, when selected, an equal weight. In a PPS sample design, a unit with five million in sales will have roughly a five times greater chance of selection than a unit with one million in sales. Further, the unit falling into the sample on a PPS selection would have a weight inverse to its size, an additional improvement over broad stratum sampling.
5.54 Ideally, stratification should be optimized to minimize sampling errors. For example, the number of strata (L) can be optimized based on a relationship such as
D.5 Sample allocation
5.55 Given that there is always an upper limit on the amount of data that can be collected because of resource constraints, decisions must be made about how to allocate the data collection between the strata—that is, we must decide how many establishments to sample in each stratum and how many prices to collect from each. It is generally more expensive to increase the number of establishments sampled as opposed to increasing the number of prices collected from each establishment, although simply increasing the latter may add little to accuracy when intraestablishment variance is low. So, it is generally the case that the number of establishments to be sampled is the constraint, rather than the total number of prices collected.
5.56 Ideally, the sample allocation would be optimized so that accuracy is maximized within the cost constraint, according to some equation linking sample size with accuracy. For example, the simplest form of optimal allocation is to make the sampling fraction ($f_h$) in a stratum ($h$) proportional to the standard deviation $S_h$ in the stratum, and inversely proportional to the square root of the cost ($c_h$) of including a unit from that stratum in the sample, that is,
$$f_h \propto \frac{S_h}{\sqrt{c_h}}.$$
Thus more heterogeneous and cheaper strata are sampled at higher rates. Often, costs do not differ between strata, so the optimum allocation reduces to $f_h \propto S_h$, the so-called Neyman allocation.
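The allocation rule in paragraph 5.56 can be illustrated with a short sketch. Because the sampling fraction in stratum h is the sample size divided by the number of units in the stratum, a fraction proportional to the standard deviation over the square root of cost implies a sample size proportional to (units in stratum) × (standard deviation) / √(cost). The stratum names, sizes, standard deviations, and costs below are hypothetical, and the function returns unrounded sizes without capping them at the stratum size.

```python
import math


def optimal_allocation(strata, n_total):
    """Allocate n_total sample units across strata.

    strata maps a stratum name to (N_h, S_h, c_h): number of units,
    standard deviation, and cost per sampled unit.
    """
    scores = {h: N * S / math.sqrt(c) for h, (N, S, c) in strata.items()}
    total = sum(scores.values())
    return {h: n_total * score / total for h, score in scores.items()}


# Equal costs and equal stratum sizes: the rule reduces to Neyman allocation,
# so the sample is split in proportion to the standard deviations (2 : 6).
equal_cost = {"small firms": (100, 2.0, 1.0), "large firms": (100, 6.0, 1.0)}
neyman = optimal_allocation(equal_cost, n_total=40)

# Making the second stratum four times as costly halves its relative score.
unequal_cost = {"small firms": (100, 2.0, 1.0), "large firms": (100, 6.0, 4.0)}
cost_adjusted = optimal_allocation(unequal_cost, n_total=40)

print(neyman, cost_adjusted)
```

In practice the standard deviations are rarely known in advance, so the inputs would themselves be estimates or proxies.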
5.57 If probability sampling techniques have been used, it is possible, in theory, to estimate variances at each level. Take the following alternative sample structures as examples:
(i) Only industry PPIs are needed, so the frame is stratified by the four-digit ISIC and then by size, and two-stage PPS sampling is employed to select establishments within each heading and then transactions from each establishment.
5.58 The variance of each industry PPI will depend on the variance among (inter) establishments in that industry and the variance within (intra) the establishments in the sample. Since the second stage of the sampling does not stratify each establishment’s frame of transactions by product, the intra-establishment variance is likely to be relatively large, particularly if the industry produces a wide range of products. In this case, an optimizing model will allocate the total number of establishments to be sampled across industries and size classes according to interestablishment variance in each stratum. The model is likely to suggest collecting a large number of prices from each establishment, particularly from those showing large internal variance.
(ii) Only product PPIs are needed, so the frame is stratified by six-digit product codes, and two-stage PPS sampling is employed to select establishments within each code and then transactions from each establishment.
5.59 Again, the variance of each PPI will depend on the variance among (inter) establishments producing a product, and the variance within (intra) each establishment in the sample. The intra-establishment variance might be because of differences in variety or terms of transaction, but it likely will be relatively small compared with the interestablishment variance. So, an optimization model will allocate the sample of establishments in proportion to the variance within strata but will suggest collecting a fairly low number of prices for each product from each establishment.
(iii) Industry and product PPIs are needed, so the frame is stratified by the four-digit ISIC and then by size, and two-stage PPS sampling is employed to select establishments within each heading and then transactions from each establishment. Transactions within each establishment are stratified by product code.
5.60 Calculation of the variances of the industry and product PPIs is complex, and thus the optimization algorithm also is complex. There are variances among establishments in each industry, and within each product stratum in each establishment in the sample.
5.61 The above examples assume that probability sampling techniques are used and that variances therefore can be estimated. In sample surveying, however, we usually assume very limited information about the frequency distribution followed by sample measurements. This means that in practice, optimization often is done using a variety of pieces of information, applied to more or less formal optimization models. Information that may be available includes the following:
The total sample size that resources allow;
The number of units in each industry frame;
The economics of each industry, that is, the value of output, company and product composition, product dispersion, price-setting mechanisms, etc.;
Which PPIs need to be published: it may be necessary to allocate larger sample sizes to some strata (industries or products) than simple empirical methods would indicate in order for PPIs to be published at a detailed level without fear of breaching confidentiality guidelines.
5.62 The aim often is simply to produce industry indices with comparable accuracy and to publish a reasonable amount of product detail. As for the number of prices collected from each establishment, it may be necessary to use a general rule, such as the average number of prices should be around 4 or 5 with no single establishment providing more than 15 or 20.
E. An Example of Sample Selection and Recruitment of Establishments
5.63 For sample selection to proceed, all of the earlier steps of sample design must have been completed. Decisions have been made on the sampling techniques to use at each stage of the sampling process. Assume for simplicity that the manufacturing sector has been chosen as the first area to be included in the PPI. (Subsequently, mining, agriculture, public utilities, transport, etc. may be added.) For this purpose, information on establishments such as industry, output, sales, name, and location is available from a recent Census of Manufacturing or a Census of Establishments. Industries at the four-digit ISIC level have been selected using a cutoff sampling strategy. All industries with output (sales) greater than 0.02 percent of total manufacturing output have been chosen. (The cutoff value—0.02 percent—is determined by the amount of economic activity considered significant within the country. If the number of industries is too large given the resources available, a higher cutoff threshold may need to be used.)
5.64 In addition, quite a few industries have production concentrated among a few large enterprises, while others have less-concentrated production. It would be helpful to stratify the industries by size of firm. In those industries where production is highly concentrated among a few large enterprises (for example, three firms represent 90 percent of production), the large enterprises are selected. In those industries where production is more dispersed, the largest firms could be selected with certainty (that is, chosen with a probability of 1.0), while a sample of smaller firms could be selected using random sampling techniques (for example, PPS sampling as described below). In general, the number of sampling units for the smaller firms should increase as the concentration ratio (percentage of industry output by large firms) becomes smaller. For example, for industries where the concentration ratio is 70 percent, a sample of four units among the smaller establishments might be adequate, but if the concentration ratio is less than 50 percent, the number of units might be twice that size. Using such a process also requires that appropriate weights be assigned to each selected unit. For the certainty units, the weight would be the firm's output (sales), while for other units it would be the sampling interval (see example below).
5.65 At this point the frame is stratified, allocations of sampling units have been made, and the sampling technique has been decided upon. Usually, three phases are left to sample selection:
(i) Select establishments;
(ii) Recruit establishments; and
(iii) Select transactions.
E.1 Selection of establishments
5.66 The sampling frame of establishments has been stratified by four-digit industry and size for probability sampling (purposive sampling could be used instead, and some of the issues involved in this are discussed under Selecting products and transactions in the establishment). In this situation, either systematic or PPS sampling could be used, or a combination of the two. A common application of PPS is to assign a probability of 100 percent to units in the largest strata (as discussed above), and then select randomly from each of the other strata, with probability of selection proportionate to size.
5.67 A combination of systematic sampling and PPS is used in the United States, where a stratum frame would be ordered by size and cumulative totals calculated. For example, assume that we know the average cost per establishment for collecting price information, and that the costs will not vary significantly by industry. Based on this information, we determine that the number of establishments in the sample would be 400 (total data collection costs divided by average cost per establishment). If the industry for which we are drawing the sample represents 1.0 percent of the total sector output, then we would allocate four establishments to the industry (400 × 0.01), and we can proceed to draw the sample from the frame. Assume the information below in Table 5.1 is available from the sampling frame.
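The sampling interval is calculated as the total value of production in the frame divided by the number of establishments to be selected: 580/4 = 145.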
5.68 All establishments with production values greater than the sampling interval (145) have 100 percent probability of selection and are known as “certainty units” (Establishment E). These selected units are removed from the frame, we recalculate the cumulative size, and a new sampling interval is calculated using the reduced frame and the remaining number of sample units to be allocated (as shown in Table 5.2).
5.69 If there are new certainty units in the reduced frame, these are removed (there are none in this case) and the process is repeated until a sampling interval is calculated for which there are no certainty units. This sampling interval is used for systematic sampling. The remaining frame is sorted (largest to smallest, as shown in Table 5.3), a random number between 0 and 1 is generated, and the sampling interval is multiplied by this random number to give the starting point for the sampling pattern.
Starting point: 0.34128 × 127 = 43
Sampling pattern: 43, 170 (43 + 127), and 297 (170 + 127)
Thus, Establishments C, D, and F are selected, giving a total sample of C, D, E, and F.
5.70 The weights assigned to each establishment would be as follows. Establishment E will have a weight of 200. It was selected with certainty, and it will maintain the same weight because it is representing itself in the sample. Establishments C, D, and F will each have a weight of 127 (the sampling interval) because they are representing all the other establishments not selected in the sample. Thus, the total of their weights must equal, apart from rounding, the total of all the noncertainty establishments, which is 380 in this example. Additional detail on the source of weights and methods for proportional allocation of weights within establishments to products is presented in Chapter 4, Sections D and E.
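The selection procedure in paragraphs 5.67 to 5.70 can be sketched in code. The sketch below is an illustration under stated assumptions, not the procedure of any particular statistical office. The individual production values are hypothetical: they were chosen only to agree with the figures quoted in the text (Establishment E at 200, a noncertainty total of 380, intervals of 145 and 127). The function name `pps_systematic` is likewise an assumption, and the interval and starting point are rounded to whole numbers as in the example.

```python
def pps_systematic(frame, n, random_number):
    """Select n units: certainty units first, then systematic PPS.

    frame maps unit -> size; n is assumed to be at least 1.
    Returns (certainty, sampled, interval).
    """
    remaining = dict(frame)
    certainty = []
    # Remove certainty units until none exceeds the recalculated interval.
    while len(certainty) < n:
        interval = round(sum(remaining.values()) / (n - len(certainty)))
        too_big = [u for u, size in remaining.items() if size > interval]
        if not too_big:
            break
        for u in too_big:
            certainty.append(u)
            del remaining[u]
    # Systematic selection over the remaining frame, largest to smallest.
    k = n - len(certainty)
    start = round(random_number * interval)
    pattern = [start + i * interval for i in range(k)]
    sampled, cumulative, idx = [], 0, 0
    for unit, size in sorted(remaining.items(), key=lambda kv: kv[1], reverse=True):
        cumulative += size
        while idx < k and pattern[idx] <= cumulative:
            sampled.append(unit)
            idx += 1
    return certainty, sampled, interval


# Hypothetical frame consistent with the totals quoted in the text.
frame = {"A": 50, "B": 61, "C": 100, "D": 80, "E": 200, "F": 60, "G": 29}
certainty, sampled, interval = pps_systematic(frame, n=4, random_number=0.34128)

# Certainty units keep their own size as weight; the rest get the interval.
weights = {u: frame[u] for u in certainty}
weights.update({u: interval for u in sampled})
print(certainty, sampled, weights)
```

With these assumed values the sketch reproduces the selection of E with certainty and of C, D, and F systematically; a different frame or random number would of course give a different sample.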
5.71 An alternative approach used in some countries is to use cutoff samples so that a certain level of output or sales is achieved. For example, there may be a desire to have the sample represent 70 percent of the output in each industry in the sample. In such a case, a cutoff sample is used. Establishments in the industry sampling frame are ranked in order of the output (largest to smallest). The percentage of output that each establishment represents to the total for the industry is calculated. The cumulative percentage then is derived. A cutoff of 70 percent is established, so that all establishments below this threshold in the cumulative rankings are dropped and the sample will consist of those remaining. This approach guarantees that the sample consists of large establishments.
5.72 In the previous example, if the cutoff procedure were used, Establishments E, C, D, and B would have been selected because their cumulative percentage of output is 76.
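The cutoff approach in paragraphs 5.71 and 5.72 can be sketched in the same way. The production values are again hypothetical, chosen only so that Establishments E, C, D, and B together account for about 76 percent of output as stated in the text; `cutoff_sample` is an illustrative name.

```python
def cutoff_sample(frame, target_pct=70.0):
    """Rank units by size and keep adding them until the cumulative
    share of output reaches the target percentage."""
    total = sum(frame.values())
    selected, cumulative = [], 0.0
    for unit, size in sorted(frame.items(), key=lambda kv: kv[1], reverse=True):
        if cumulative >= target_pct:
            break
        selected.append(unit)
        cumulative += 100.0 * size / total
    return selected, cumulative


# Hypothetical frame (value of production by establishment).
frame = {"A": 50, "B": 61, "C": 100, "D": 80, "E": 200, "F": 60, "G": 29}
selected, coverage = cutoff_sample(frame, target_pct=70.0)
print(selected, round(coverage))
```

Unlike the PPS design, every selected unit here is a large establishment, so smaller producers have no chance of selection.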
E.2 Recruiting establishments
5.73 Recruiting an establishment means securing the cooperation of its staff (particularly if the survey is voluntary), so that data will be of a high quality. It is highly recommended that each establishment receive a personal visit during which the purpose and function of the price survey are explained, and the sample of transactions or varieties to be priced is selected. Supplementary data for weighting transactions also can be collected during the visit. All these tasks can be more effectively carried out via personal visits rather than via telephone calls or mailed questionnaires.
E.3 Selecting products and transactions in the establishment
E.3.1 Probability and cutoff sampling procedures
5.74 The probability approach also can be used for selecting products and transactions by soliciting information from establishment records. Once in the establishment, however, the respondent may be reluctant to provide detailed records for selecting products and transactions. One alternative would be to ask the respondent to list the products produced and provide an estimate of the percentage each product represents of total sales. This information can be used to select the sample by ranking the products from highest to lowest and then making the selection using the same techniques discussed above.
5.75 Another alternative, if the respondent is unwilling to provide product percentages, is to ask him or her to rank the products in order of importance. Using the ranking information, estimated percentages can be established. Consider the information in Table 5.4 that is provided by a respondent in an establishment with eight products. The respondent was able to rank the products in order of importance. Each product can then be assigned its importance based on the reverse order of its ranking: Product G is assigned 5, Product H is assigned 4, etc. Next, an estimated percentage of sales is calculated using each importance as a percentage of the total of the assigned importances. Assume that the sample design indicates that three products are wanted for this establishment. These percentages can then be used to select a sample of products through the probability sampling procedures described above or through cutoff sampling procedures.
5.76 If probability procedures are used, the sampling interval is first calculated:
Sampling interval = 100/3 = 33.
A random number is selected to determine the starting point and the sampling pattern:
Starting point = 0.45814(33) = 15
Sampling pattern = 15, 48 (15 + 33), and 81 (48 + 33)
The selected sample will be Products G, H, and J. (Note that we do not select Product I because it is below the third interval in the sampling pattern.)
5.77 If the cutoff procedure is used, the first three products (G, H, and I) will be selected. With the cutoff procedure the three most important products are selected.
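The ranking-based selection in paragraphs 5.75 to 5.77 can be sketched as follows. The sketch assumes that the respondent ranked five products, labeled G through K here, since the text assigns Product G an importance of 5 and Product H an importance of 4; the label K and the function names are assumptions. The interval and starting point are rounded to whole numbers as in the example.

```python
def percentages_from_ranking(ranked_products):
    """Assign importance in reverse rank order and convert to percentages."""
    n = len(ranked_products)
    importance = {p: n - i for i, p in enumerate(ranked_products)}
    total = sum(importance.values())
    return {p: 100.0 * v / total for p, v in importance.items()}


def systematic_selection(percentages, k, random_number):
    """Select k products along a systematic pattern over cumulative percentages."""
    interval = round(100 / k)
    start = round(random_number * interval)
    pattern = [start + i * interval for i in range(k)]
    selected, cumulative, idx = [], 0.0, 0
    for product, pct in percentages.items():  # dict keeps the ranking order
        cumulative += pct
        while idx < k and pattern[idx] <= cumulative:
            selected.append(product)
            idx += 1
    return pattern, selected


ranked = ["G", "H", "I", "J", "K"]  # most to least important (assumed labels)
percentages = percentages_from_ranking(ranked)

pattern, probability_sample = systematic_selection(percentages, k=3, random_number=0.45814)
cutoff_sample = ranked[:3]  # cutoff procedure: the three most important products
print(pattern, probability_sample, cutoff_sample)
```

Under these assumptions the cumulative percentages are roughly 33, 60, 80, 93, and 100, so the pattern points 15, 48, and 81 fall on Products G, H, and J, while Product I (cumulative 80) is passed over.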
5.78 In addition, representative transactions for continuous pricing will need to be identified. The respondent should be asked to supply information on various transactions that apply to the selected products. Again, the data can be in the form of actual values from company records, estimated percentages, or rankings. If two transactions per product are required, then the same procedures as those just described would be followed to select the two transactions.
5.79 In the above examples, if the respondent could not provide any information or if he or she says that they are all equally important, then equal probability would be assumed. In such a case, each product or transaction would be assigned the same importance (that is, 100 divided by the number of products), and the selection procedure would continue as explained above.
E.3.2 Purposive sampling
5.80 Since the selection will be based largely on the judgment of the members of establishment staff present at the recruitment meeting (respondents), it is important that these people are knowledgeable and hold senior positions, probably from the marketing, sales, or accounting departments.
5.81 The first step is to stratify by products produced by the establishment selected for the industry sample. As a general guide, it is reasonable to have between 3 and 10 product strata (depending on the size of the establishment) that are deemed representative of the establishment’s output. It should be possible to obtain a sales figure or estimate for each stratum, or at least to order the strata by size. In the establishment, if exports make up more than 20 percent of total sales, and export prices are thought to move differently than domestic market prices, then, ideally, the product strata should be further stratified between exports and domestic market. Separate prices should be collected for exports and domestic products, as necessary.
5.82 Then for each stratum, one or two specific transactions should be chosen, bearing in mind the general rule that the average number of prices from establishments should be around 4 or 5, with no single establishment providing more than 15 or 20 (strata may have to be combined if the number is too large). The aim is to choose transactions and terms of sales that account for a significant proportion of sales, are broadly representative of other production, and are expected to be available for sale or stay in production at future price collections.
5.83 Weights for each transaction selected could be determined by proportional allocation of the establishment weight to each product and transaction selected. This procedure is discussed in Chapter 4, Section E.
E.4 Recording product specifications
5.84 After transactions have been selected, the price-determining characteristics must be carefully discussed and recorded on the collection form. (See Chapter 6 for more details on recording product specifications.) Examples of such characteristics are as follows:
Type of product;
Brand name or model number; and
Main price-determining characteristics—size, weight, power, etc.
Transaction specifications for the PPI:
Type of buyer—exporter, wholesaler, retailer, manufacturer, government;
Type of contract—single or multiple deliveries, orders, one-year, agreed volume;
Unit of measure—per unit, meter, ton;
Size of shipment—number of units;
Delivery basis—free on board, sale with or without delivery to customer;
Type of price—average, list, free on board, net of discount; and
Type of discount—seasonal, volume, cash, competitive, trade.
F. Sample Maintenance and Rotation
5.85 Price surveys are panel surveys in that data are collected from the same establishments on more than one occasion. The general problems with such surveys are that the panel becomes depleted as establishments stop producing, the panel becomes increasingly unrepresentative as time passes and the universe changes, and some establishments may resent the burden of responding and leave the panel or provide poor-quality data. All these problems cause bias.
5.86 A widely used method to alleviate some of these problems is to limit the length of time that establishments stay on the panel by using some form of panel rotation.7 Rotation has two main benefits: (i) it ensures that most producers participate in the survey for a limited time and, therefore, the burden is shared among enterprises, and (ii) it helps to alleviate the problems caused by a sample being out of date, that is, sample depletion and not being representative of current trends. Recruiting new establishments helps to ensure that new products are represented in the price surveys.
F.1 Approaches to sample rotation
5.87 Obviously, sample rotation has a cost since new panel members need to be recruited. There are several options regarding how rotation might be done. First, a rotation rate should be fixed. For example, if the whole panel is to be rotated every five years, then the annual rate is 20 percent. This could be implemented by dividing the industry headings into five groups and dealing with one group each year. Or 20 percent of all respondents, across all industries, could be dropped each year and replacements recruited. An establishment’s rotation cycle could be related to its size, so that larger establishments stay in the sample for more than five years, and small establishments stay in for fewer than five years.
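One of the options in paragraph 5.87, dividing the industry headings into five groups and resampling one group each year, can be sketched as below. The industry codes, the starting year, and the function names are hypothetical; a real scheme might instead group industries by size, sector, or workload.

```python
def assign_rotation_groups(industry_codes, cycle_years=5):
    """Spread industries evenly over cycle_years groups (0, 1, ...)."""
    return {code: i % cycle_years for i, code in enumerate(sorted(industry_codes))}


def industries_due(groups, year, first_year, cycle_years=5):
    """Industries whose group comes up for resampling in the given year."""
    due_group = (year - first_year) % cycle_years
    return [code for code, group in groups.items() if group == due_group]


# Hypothetical list of 100 four-digit industry codes.
industries = [str(3000 + i) for i in range(100)]
groups = assign_rotation_groups(industries)

due_2003 = industries_due(groups, year=2003, first_year=2003)
due_2008 = industries_due(groups, year=2008, first_year=2003)
print(len(due_2003), due_2003 == due_2008)
```

With a five-year cycle, 20 of the 100 industries are resampled each year, and the same group returns five years later.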
5.88 If sample rotation is done by industry group, product group, or geographic location, this provides a good opportunity to review the sample design and reallocate and select new establishments as necessary. Rotation and sample revision fit best within a system of annual chain linking in which the product structure and weights can be updated each year.8
F.2 Procedures for introducing a new sample of establishments
5.89 The procedures used to introduce a new sample of establishments are similar to the overlap procedure used for linking replacement price observations or introducing a new product structure in a weight update. Assume the rotation strategy calls for replacing 20 percent of all industries. If the PPI sample consists of 100 four-digit industries, then each year the statistical office will replace the samples in 20 industries. For each of the targeted industries, a sampling frame is needed to select a new sample of establishments. The staff must then recruit the establishments, as discussed in Section E.2.
5.90 The new industry sample will have new weights for the selected establishments, products, and transactions. The new sample and weights will be used directly to replace the old sample. During the same month, the data collection staff will have to collect price observations for both the new and the old sample. The old sample prices are used to calculate the index in the usual way, and the new sample will provide new base-period prices to calculate the index for the next period using the new weights. For example, the old sample for a particular industry may consist of five establishments and 20 price observations, while the new sample may have eight establishments and 32 price observations. Both samples are collected during the overlap month, that is, 13 establishments with 52 price observations (assuming no establishment in the old sample is also in the new). The 20 observations from the old sample are used for the current-period index calculation. The 32 price observations for the new sample provide basic data for setting new base prices in the new sample.
5.91 The index formula used will influence the relationship between the price reference period for the weights and the reference period for the base prices. If the statistical office compiles a Lowe or Laspeyres index, it will use the first set of prices collected in the new sample to set the base prices for the index. The base price reference period and the weight reference period need to align if the Laspeyres price index is used. If the weight reference period for the establishment and product weights are, for example, annual revenue for 2000 and the prices collected for the new sample are for June 2003, then the new prices will have to be estimated backward to the annual average for 2000. This is accomplished by applying the price change for the industry between June 2003 and the annual average for 2000 to the June 2003 price observations. For example, if the prices in the industry rose by 10 percent between the annual average index for 2000 and the June 2003 index, then each price observation would be deflated by the factor 1.10.9 This calculation adjusts the new price observations for the average price change in the industry between the weight reference period and the current period.
5.92 Consider a similar example for the Lowe index. Again, assume that the weight reference period is for 2000 and that the base price reference period is December 2001. In this case, the statistical office will need to update the weights for price changes between the 2000 annual average and December 2001. The price index for the industry is used to calculate the price change between 2000 and December 2001, and this price change is applied to all the weights. Next, the June 2003 prices will need to be adjusted backward to December 2001. The industry price index is used to measure the price change between December 2001 and June 2003. This price relative then is used to deflate the June 2003 price observations to obtain December 2001 base prices.10
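The adjustments in paragraphs 5.91 and 5.92 amount to simple ratio calculations, sketched below. Apart from the 10 percent price rise quoted in paragraph 5.91, every number is hypothetical (the price of 55.0, the weight of 1,000, and the December 2001 index of 104.0), as are the function names.

```python
def backcast_price(current_price, index_current, index_base_period):
    """Estimate a base-period price by removing the industry's price change
    between the base period and the current period."""
    return current_price * index_base_period / index_current


def price_update_weight(weight, index_weight_period, index_price_period):
    """Carry a weight forward from the weight reference period to the
    price reference period using the industry's price change."""
    return weight * index_price_period / index_weight_period


# Industry index (2000 = 100); the 104.0 value is hypothetical.
INDEX_2000 = 100.0
INDEX_DEC_2001 = 104.0
INDEX_JUN_2003 = 110.0  # 10 percent above the 2000 average

june_2003_price = 55.0  # hypothetical price observation from the new sample

# Laspeyres: base prices refer to the weight reference period (2000).
laspeyres_base_price = backcast_price(june_2003_price, INDEX_JUN_2003, INDEX_2000)

# Lowe: weights are price-updated to December 2001, and base prices
# are backcast to December 2001.
lowe_weight = price_update_weight(1000.0, INDEX_2000, INDEX_DEC_2001)
lowe_base_price = backcast_price(june_2003_price, INDEX_JUN_2003, INDEX_DEC_2001)
print(laspeyres_base_price, lowe_weight, lowe_base_price)
```

The Laspeyres case divides the June 2003 price by 1.10, as in the text; the Lowe case divides by the smaller relative 110/104 because the base price period is later.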
5.93 If the statistical office is using a Young index, the process is much simpler because the new weights are used directly in the computation of the index using the new prices without any adjustments. (See Chapter 15, Sections D.2 and D.3 for a discussion of the Lowe and Young indices.)
5.94 These procedures ensure that the new prices and weights are consistent with the index number formula within each four-digit industry selected for sample rotation. For higher-level indices, the weight reference period may not be the same as for the industries going through sample rotation. In practice, the aggregation weights used to combine industry and product indices often have a different price reference period than for the sample rotation groups. For example, the industry and product group weights used to produce higher-level indices (three-digit, two-digit, etc.) may have a reference date of 2000 because they come from an establishment census conducted in 2000. The index reference period might also be 2000 = 100, because of a statistical agency policy to re-reference index numbers once every five years. On the other hand, the weights from the industry sampling frame used to draw the rotated sample may be for 2001, because the weights for the rotated industries are taken from an annual industry survey (perhaps with a special supplement for industries scheduled for sample rotation). The price index reference period could be December 2002 because the price information is readily available from sample respondents.
5.95 Thus, there can be a difference between the base price reference period for the new sample at the lowest level (the elementary aggregate), here December 2002, and the index reference period for higher-level indices, here the annual average for 2000. In such cases, the price change from the lower-level indices will be used to move the higher-level indices forward to the current period. For example, in industry 3411 (manufacture of pulp, paper, and paperboard) the index level in December 2002 was 108.0, and in September 2003 it was 110.2, with an index reference period of 2000 = 100. The sample of 10 establishments and 40 price observations for this industry was rotated in January 2003 using base prices from December 2002. The elementary indices for the products in this industry have a price reference date of December 2002. To estimate the industry index, the statistical office will have to use the price change from the new sample and link it to the level of the higher-level index. This can be done in two ways, depending on whether the statistical office uses a direct or chained price index formula (see Chapter 9, Section B.3). Assume a direct index is used where the current price for October 2003 is compared with the base price in December 2002, resulting in a price index of 102.96 (December 2002 = 100). The long-term price relative (1.0296), times the industry 3411 price index for December 2002 (108.0), gives the October 2003 index level of 111.2. Alternatively, if the monthly chained index form is used, where the October prices are compared with the September prices, then the lower-level index is linked to the September 2003 higher-level index. Assume the one-month price relative was 1.0091 in October 2003. The September 2003 industry 3411 index (110.2, where 2000 = 100) is multiplied by this price relative to derive the October 2003 industry index of 111.2. The results of the two formulas should be the same. The advantage of using the monthly chained index form is that it facilitates making quality adjustments, as discussed in Chapter 7, Section C.3.3.
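The linking arithmetic for industry 3411 can be checked with a short sketch. The index levels and price relatives are those quoted in paragraph 5.95; the function names are illustrative only.

```python
def link_direct(index_at_base_month, long_term_relative):
    """Direct form: higher-level index at the base price month times the long-term relative."""
    return index_at_base_month * long_term_relative

def link_chained(index_previous_month, monthly_relative):
    """Chained form: previous month's higher-level index times the one-month relative."""
    return index_previous_month * monthly_relative

# Figures quoted in the text (index reference period 2000 = 100).
direct = link_direct(108.0, 1.0296)     # December 2002 level, Dec 2002 -> Oct 2003 relative
chained = link_chained(110.2, 1.0091)   # September 2003 level, Sep 2003 -> Oct 2003 relative

print(round(direct, 1), round(chained, 1))   # prints: 111.2 111.2
```

The unrounded products differ slightly (about 111.197 versus 111.203) because the published relatives are themselves rounded; both round to 111.2.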
G. Summary of Sampling Strategies for the PPI
5.96 The approach to a sampling strategy in the PPI requires a number of steps to gain enough information and design a survey that will produce reasonable estimates of price change within the level of resources provided. The following points provide a logical sequence to the sampling issues presented in this chapter.
(i) Determine the survey objectives, uses, coverage, and resources before determining the data to be collected, the periodicity of collection, and the type of sampling that will be employed.
5.97 It is important to decide at the beginning of the process if price changes for both industry and products will be needed and the degree of accuracy required. It will also be important to decide whether monthly or quarterly indices will be produced. These, in turn, will determine the level of resources allocated to the program. Alternatively, if there is a fixed level of resources available, it is possible to work with cost controls to determine affordable sample sizes and collection frequency at the expense of accuracy.
(ii) Identify sources to use to develop a sampling frame for selecting the establishments and products for covered sectors and industries.
5.98 The availability of an up-to-date business register with appropriate selection parameters (for example, industrial codes and measures of size) could serve as a source for developing sampling frames for selected industries. Many of the sources of weight data discussed in Chapter 4 also could be used to develop a sampling frame. These include industrial censuses, surveys, and administrative records.
(iii) Use probability sampling techniques to the extent possible.
5.99 While probability sampling throughout the selection process is a desirable goal, it may not be entirely affordable. An alternative is to use cutoff sampling at certain stages in the process, such as selecting the industries within a sector or the products within major groups. Sampling frames for each industry or product then can be established to conduct sampling using PPS techniques.
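One way to combine the two approaches is sketched below: a cutoff rule at the industry stage, followed by systematic PPS selection of establishments within a retained industry. This is a minimal sketch, not a prescribed procedure; the frame data, threshold, and function names are all hypothetical.

```python
import random

def cutoff_select(units, threshold):
    """Cutoff sampling: keep every (name, size) unit at or above the threshold."""
    return [(name, size) for name, size in units if size >= threshold]

def pps_systematic(units, n, rng):
    """Systematic PPS selection of n units from (name, size) pairs.

    A unit larger than the sampling interval can be hit more than once;
    such units would normally be treated as certainty selections.
    """
    total = sum(size for _, size in units)
    interval = total / n
    start = rng.random() * interval
    targets = [start + k * interval for k in range(n)]
    sample, cumulative, i = [], 0.0, 0
    for name, size in units:
        cumulative += size
        while i < n and targets[i] < cumulative:
            sample.append(name)
            i += 1
    while i < n:  # guard against floating-point edge cases at the upper end
        sample.append(units[-1][0])
        i += 1
    return sample

# Hypothetical frame: industries by output, then establishments by sales.
industries = [("3411", 900.0), ("3412", 450.0), ("3413", 40.0)]
kept = cutoff_select(industries, threshold=100.0)

establishments_3411 = [("E1", 300.0), ("E2", 250.0), ("E3", 200.0), ("E4", 150.0)]
chosen = pps_systematic(establishments_3411, n=2, rng=random.Random(0))
print(kept, chosen)
```

A cumulative-size systematic draw is used here because it is easy to audit by hand; other PPS schemes would serve equally well.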
(iv) To make the sample more efficient, use multiple levels of stratification within the sample design.
5.100 In most cases, three strata will be identified within the sample: industry, product, and establishment. However, the sample could be more efficient and representative if additional strata are used, such as establishment size (large, medium, and small), region or location (if price trends differ by location within the country), and export versus domestic market production (if price trends differ between these markets). Additional strata will be helpful to the design wherever there might exist differing price trends or price variability within the chosen strata.
(v) The price sample should be based on actual transactions with the characteristics of those transactions fully described.
5.101 Often there is a tendency to use average prices or unit values (sales value ÷ quantity sold) as the price reported in the PPI. These are not true transaction prices, in that they represent the average of a number of transactions for which there could be differences in quality or pricing characteristics. Therefore, it is important to select a sample of individual transactions with a detailed description of all of the characteristics that determine the price. These transaction prices and their characteristics then will be observed through time.
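A small numerical sketch shows how a unit value can move even when no transaction price changes. The two product variants, their prices, and the quantities are hypothetical.

```python
def unit_value(transactions):
    """Unit value = total sales value / total quantity, for (price, quantity) pairs."""
    total_value = sum(price * qty for price, qty in transactions)
    total_qty = sum(qty for _, qty in transactions)
    return total_value / total_qty

# Two variants of one product; transaction prices are identical in both periods.
period_1 = [(10.0, 80), (20.0, 20)]   # (price, quantity): standard, premium
period_2 = [(10.0, 50), (20.0, 50)]   # same prices, sales mix shifts to premium

uv_1 = unit_value(period_1)
uv_2 = unit_value(period_2)
print(uv_1, uv_2, uv_2 / uv_1)   # prints: 12.0 15.0 1.25
```

The unit value rises by 25 percent purely because of the change in sales mix, whereas a price relative computed for either specified transaction would show no change.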
(vi) Initial recruitment of establishments should be completed by personal visits.
5.102 Initial sample recruitment should be conducted through personal interviews with establishment managers in order to accurately select representative products and transactions. The purpose of the survey must be explained, along with the need for the continuous reporting of price data for the selected transactions.
(vii) Samples of establishments and products must be maintained so the reliability of the PPI remains intact. A program of sample maintenance is needed for this purpose, and sample rotation also may be desirable.
5.103 Products produced by establishments will frequently change in response to market conditions. Also, establishments will cease operations and new ones will begin production. The PPI sample size must be maintained in order for PPI estimates of price change to be accurate. Therefore, it is necessary to have a program targeted toward keeping the sample intact and the products representative of current production, in terms of both the goods being produced and the establishments producing them.
The United States has estimates of variance for its CPI, and the United Kingdom has estimates of variance for its PPI. In both cases, the sample design was set up first without information on variances. The resulting variances are greater than if they had been known in advance. Once these first variances have been calculated, they can then be used to improve the efficiency of the sample design by reallocation of sample strata and the number of price observations in each.
There are many textbooks that can be consulted on the theory and application of sampling. One text used quite often is Cochran (1977), available worldwide.
See de Haan, Opperdoes, and Schut (1999) for an analysis of cutoff sampling in the CPI.
Another alternative would be for the statistical office to develop a sample at the three-digit level, combining all the lower-level four-digit industries into one group.
A distinction is made between two-stage sampling, where a sample of establishments is selected and then a sample of transactions is selected from each, and two-phase sampling, where a sample of establishments is selected to provide detailed output data, and this sample then is used as a new sampling frame. This new frame can be sorted and stratified much more effectively than the original frame as a result of the information collected in phase one.
This is not an application of the sampling technique called cluster sampling, where units are arranged into clusters, a number of the clusters are selected, and then all units in these clusters are sampled. In cluster sampling the clusters should be internally heterogeneous in the survey variables since those selected should be representative of those not selected. Here, the term clustering is being used to describe a method for increasing sample efficiency by grouping together homogeneous units. Strictly speaking, these clusters should be referred to as strata.
In many countries, the rotation is limited to the smaller respondents, for whom it is felt that responding to surveys imposes a significant burden. This need not be the general case, and the use of full-panel sample rotation is encouraged.
Annual weight update is not a requirement for sample rotations; it simply makes the process a bit easier because weights already are being updated at most levels of the index. When there is no system for annual weight updates, sample rotation does require a two-tier system of weights: fixed weights at higher levels of aggregation for aggregating to higher-level indices, and separate weights for low-level indices that are updated periodically.
The statistical office could also do these calculations using information from the product indices. This would involve deriving more deflation factors for the base prices, one for each product in the industry. Then each observation would be deflated by the price change in its product index, rather than by the industry index.
If product indices are used, then the calculations must be made using the changes in product indices. Again, this will involve calculation of more price changes—one for each product.