The Size Distribution of Manufacturing Plants and Development

Contributor Notes

Author’s E-Mail Address: skothari@imf.org

Abstract

The typical size distribution of manufacturing plants in developing countries has a thick left tail compared to developed countries. The same holds across Indian states, with richer states having a much smaller share of their manufacturing employment in small plants. In this paper, I explore the hypothesis that this income-size relation arises because low-income countries and states have high demand for low-quality products, which can be produced efficiently in small plants. I provide evidence consistent with this hypothesis from both the consumer and the producer side. In particular, I show empirically that richer households buy higher-price goods, while larger plants produce higher-price products (and use higher-price inputs). I develop a model that matches these cross-sectional facts. The model features non-homothetic preferences with respect to quality on the consumer side. On the producer side, high-quality production has higher marginal costs and requires higher fixed costs. These two features imply that high-quality producers are larger on average and charge higher prices. The model can explain about forty percent of the cross-state variation in the left tail of the size distribution of manufacturing plants in India.

1. Introduction

The typical size distribution of manufacturing establishments in developing countries has a thick left tail compared to developed countries. Figure 1 plots the share of total workers in establishments of different size categories for India and the US for 2005-06. While about 60 percent of workers in India are employed in establishments with fewer than five workers, the corresponding share for the US is less than 2 percent.1

Figure 1:

Share of Employment by Size Category: India vs. US

Citation: IMF Working Papers 2014, 236; 10.5089/9781498334396.001.A001

Notes: The graph plots the share of total employment in establishments of different size categories for India and the US. The data for India combines two sources, the Annual Survey of Industries (ASI) and the Survey of Unorganized Manufacturing (SUM) for 2005-06. The data for the US is taken from the County Business Patterns Database for 2006.

This size-income relation also holds across Indian states. Figure 2 plots the share of employment in establishments of size five or less in 2005-06 for different Indian states against the per-capita Net Domestic Product (NDP) of the state relative to the poorest state (Bihar).2 The richest Indian states have about four times the per-capita NDP of the poorest states. While the poorest states have almost 90 percent of their manufacturing workforce employed in establishments of size five or less, the richer states have only about 40 percent of their workforce working in small establishments.3

Figure 2:

Size Distribution of Manufacturing Establishments: Across Indian States

Notes: The graph plots the share of employment in plants of size five or less in a state against per-capita NDP of the state relative to the poorest state. The data for the states combines two sources, the Annual Survey of Industries (ASI) and the Survey of Unorganized Manufacturing (SUM). Only the 15 largest states are included to keep the graph readable.

What explains this negative correlation between income levels and the share of employment in small establishments? Starting with the work of De Soto (1989), the previous literature has focused on size-dependent policies (the regulatory burden faced by large firms, small-scale reservation policies, etc.) as an explanation for the size-income relation. These policies create distortions that can lead to misallocation of resources, lower income levels, and smaller establishment sizes.

This paper explores an alternative (though potentially complementary) explanation for this size-income relation, driven by preferences and technology rather than distortions. The hypothesis is that poor households have high demand for low-quality products, which can be produced efficiently in small establishments because they require only small fixed investments (for example, no research and development expenditure and no large investment in fixed capital). Richer households, on the other hand, tend to demand higher-quality goods, whose production requires a larger scale because of the larger fixed investments involved. This relation between income levels and demand for quality implies that in poor countries or states demand is skewed towards goods that require a small scale of production, which in turn causes the size distribution to be dominated by small plants. As a region develops and income levels rise, demand shifts towards high-quality products, and production shifts towards higher-quality goods as well. This shift in production reduces the share of employment in small plants, and can thus generate the negative relation between that share and income levels seen in the data.

I provide empirical evidence in support of this hypothesis using Indian data from consumer and producer surveys:

  1. Using data from Consumer Expenditure Surveys, I show that in the cross-section, richer households tend to pay a higher unit price for the same good, which is consistent with the hypothesis that richer households buy higher-quality products.

  2. On the producer side, I show that larger plants tend to charge a higher unit price for the same good than smaller plants, which is consistent with larger plants producing higher-quality products. To show this, I combine data from the Annual Survey of Industries (ASI), which covers plants employing ten or more workers (twenty or more workers if not using power), and the Survey of Unorganized Manufacturing (SUM), which covers plants employing fewer than ten workers. The positive relation between prices and plant size holds not just within the formal sector (ASI plants), but also when pooling together the formal and informal plants.

  3. Using ASI and SUM data, I show that larger plants use higher price material inputs, consistent with them using higher quality inputs. Using data from Household Surveys, I also find that larger plants hire more skilled workers.

I develop a general equilibrium model that matches these cross-sectional facts. Households choose from a finite number of quality levels. The choice over quality levels is modeled as a discrete-choice problem, with households choosing to consume one of the quality levels available in the economy. Their preferences exhibit non-homotheticity with respect to quality: richer households are more likely to choose higher quality levels. The non-homotheticity arises because the utility function features complementarity between quality and the quantity consumed (the marginal increase in utility from a given increase in quantity is larger for higher-quality goods), and richer households can consume a larger quantity of whichever quality level they choose.

On the producer side, production of high quality goods uses skilled labor more intensively. Also, starting a higher quality plant requires higher fixed costs, which combined with a free entry condition implies that producers of high quality goods will be larger on average (in order to recover their larger fixed costs).

The model parameters are chosen to match the micro-facts documented on the consumer and producer side. The quality-size relation on the producer side is matched to the relation between prices and plant size from the producer surveys, while the degree of non-homotheticity is chosen to match the price-income relation seen in the consumer surveys.

I then ask: how much of the cross-state variation in the size distribution seen in Figure 2 can the model explain? In particular, I conduct counterfactual exercises in which I simulate changes in per-capita income levels in the model (by varying productivity and the skill level of the population) and examine the effect on the size distribution. As income levels increase in the model, demand shifts to high-quality goods due to the non-homotheticity of preferences. This shift in demand leads to a shift on the production side, with a fall in the number of low-quality producers and an increase in the number of high-quality producers. As high-quality producers are larger on average than low-quality producers, the size distribution also shifts towards larger plants. I find that the share of employment in plants of size five or less goes down by 19.3 percentage points (about 43 percent of the difference seen across Indian states) when income in the model varies by the same extent as it does across Indian states. I also document that the share of employment in plants of size five or less fell by about 20 percentage points in India between 1989 and 2009, and show that the model can explain about 65 percent of this change. While most of the results presented in the paper focus on the share of employment in plants that employ five or fewer people, Section 5.1 also explores the implications of the model for the entire size distribution.

The model and the counterfactual exercises make the implicit assumption that each state can be treated as a closed economy in which local demand is met by local production. How would the possibility of inter-state trade affect the hypothesis presented in the paper? A potential confounding effect of inter-state trade could come through the location choice of large plants. For example, if the richer states are better suited for operating large plants (due to the availability of skilled labor, less stringent labor laws, etc.), then larger plants might choose to locate in these states (and ship their goods to the poor states), and this might drive the negative relation between income and size seen in Figure 2. If inter-state trade were an important force, we would expect the more tradable industries within manufacturing to show a stronger negative relation between size and income levels across states. To test this, I construct two measures of tradability at the 3-digit level of industrial classification. I find that the size-income relation across states is not stronger for tradables than for non-tradables (for one of the measures, the non-tradables actually have a stronger negative relation than the tradables), indicating that inter-state trade is unlikely to be an important force behind the relation seen in Figure 2. I discuss the issue of inter-state trade in more detail in Section 6.

This paper is related to several strands of literature. A large literature has studied why the size distribution differs markedly across countries. The role of distortionary policies and the regulatory environment in determining the size distribution of plants (and the extent of informality) has been studied in Little, Mazumdar, and Page Jr (1987), De Soto (1989), Loayza (1996), Djankov and others (2002), Loayza, Oviedo, and Serven (2005), Loayza, Serven, and Sugawara (2009), and Garicano, LeLarge, and Van Reenen (2013), among others. While size-dependent policies are potentially an important determinant of the size distribution, these policies are unlikely to explain all the differences in size distribution seen between developing and developed countries. Tybout (2000) notes that all developing countries tend to have a large share of their population in small plants, irrespective of whether they have policies that discriminate against large plants. This suggests that these policies cannot be the only factor driving plant size. Gollin (1995) and Hsieh and Klenow (2012) conduct quantitative exercises in which they find that size-dependent policies leave a large part of the differences in size across countries unexplained. Hsieh and Olken (2014) document that the “missing middle” in the size distribution in developing countries does not actually exist, and that regulatory obstacles which become binding at particular thresholds do not seem to lead to discontinuities in the size distribution in developing countries.4 This paper suggests that a large part of the differences in size distribution that we see across countries and states is a natural consequence of low income levels in developing countries, and is not necessarily caused by policies that discriminate against large productive plants in favor of small unproductive plants.

The hypothesis considered in the paper is closer to the dual-sector view of the informal sector in La Porta and Shleifer (2008), according to which the informal sector does not compete directly with the formal sector. Also related is the view in Banerjee and Duflo (2011) that the informal economy employs poor individuals and uses a different production technology characterized by small fixed costs. I focus on the heterogeneity of quality levels produced by plants of different sizes and on how the demand for low quality falls with development.5

Some of the empirical results documented here have been studied in different contexts (or for different countries) in other papers. Deaton and Dupriez (2011) and Dikhanov (2010) document that richer Indian households buy higher-price goods. However, these papers focus on spatial differences in prices within India rather than on the price-income relation itself and its implications for the size distribution. Bils and Klenow (2001) show that richer households in the US also buy higher-priced durable products. Kugler and Verhoogen (2012) use Colombian data to show that larger plants produce higher-price goods and use higher-price inputs. They also interpret these price differences as quality differences and develop a model in which more productive firms choose to produce higher-quality goods at a higher unit cost. I document similar facts for India. Unlike Kugler and Verhoogen (2012), I combine data from the formal and informal sectors to show that the price-size relation also holds when very small plants are included in the sample (the Colombian data only covers plants of size ten or more).6 On the modeling front, I focus on non-homothetic preferences and their effect on the size distribution, which is not explored in Kugler and Verhoogen (2012). Faber (2012) documents consumer- and producer-side facts similar to those in this paper using Mexican data, but focuses on the effect of trade liberalization on income inequality.

A number of papers, especially related to international trade, have developed models of non-homothetic preferences with respect to quality. These include Flam and Helpman (1987), Mitra and Trindade (2005), Dalgin, Mitra, and Trindade (2008), and Choi, Hummels, and Xiang (2009). The model I develop is most closely related to the model in Fajgelbaum, Grossman, and Helpman (2011). Their model features non-homothetic preferences with respect to quality where the non-homotheticity arises due to complementarity between the homogeneous good and quality. The non-homotheticity with respect to quality in my model arises due to complementarity between the quantity of the good consumed and quality.

The rest of the paper is structured as follows: Section 2 documents that richer households buy higher price goods and that larger plants produce higher price goods and use higher price inputs. Section 3 presents the model and Section 4 discusses the calibration. Section 5 presents the results for the counterfactual exercises and explores the sensitivity of the results to some key parameters. Section 6 considers the role of inter-state trade in explaining the cross-state relation seen in Figure 2 and Section 7 concludes.

2. Empirical Results

In this section, I provide empirical evidence consistent with my hypothesis that richer households consume higher-quality products, which are produced by larger plants. In particular, I show the following facts:

  1. Richer households buy higher price goods

  2. Larger plants produce higher price goods

  3. Larger plants use higher price material inputs and hire more skilled labor

The facts are documented using four Indian surveys. I give a brief description of each survey along with the main results in the sections that follow.

2.1. Richer Households Buy Higher Price Goods

This section shows that richer households buy higher-price goods, which is consistent with them consuming higher-quality products. I use data from the Consumer Expenditure Survey of 2004-05 conducted by the National Sample Survey Office (NSS) of India. About 125,000 households from all Indian states and union territories were interviewed for the survey. The survey asks households to report the value of consumption for 339 different goods. Households report quantities and rupee values separately for 209 goods, which can be used to compute prices for these goods. More details about the survey can be found in Appendix A.3.

I run regressions of the form

$$\ln(P_{h,g}) = \alpha_{g,\text{state},\text{rural}} + \beta \ln(c_h) + \varepsilon_{h,g},$$

where $P_{h,g}$ is the price paid by household $h$ for good $g$, $c_h$ is per-capita expenditure of the household excluding durables, and $\alpha_{g,\text{state},\text{rural}}$ represents fixed effects for each product, state, and urban-rural cell. $c_h$ is a proxy for the income level of the household, adjusting for household size.7 $\alpha_{g,\text{state},\text{rural}}$ controls for the fact that different goods have different average price levels and that these price levels can vary across rural and urban areas and across states. For example, real estate prices might differ across rural and urban areas or across states with different levels of per-capita income, and this can drive differences in the cost of living and in all prices. The fixed effects ensure that the price-income relation is not identified from differences in average price levels across states of different income levels or across rural and urban areas. Intuitively, the coefficient β is the elasticity of price with respect to per-capita consumption and is identified from variation in prices paid for the same good by households of different income levels within a state and urban-rural sector.
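A minimal sketch of how a regression with good × state × urban-rural fixed effects can be estimated: demean log prices and log expenditure within each cell (the within transformation) and run OLS on the demeaned data. The function name `within_beta` and the simulated data are hypothetical and only illustrate the technique; this is not the code or data behind Table 1, and standard errors are not computed.

```python
import numpy as np

def within_beta(log_p, log_c, cell):
    """Slope of log price on log per-capita expenditure, absorbing cell fixed effects.

    log_p, log_c: 1-D float arrays, one entry per (household, good) observation.
    cell: 1-D array of good x state x urban-rural cell labels for each observation.
    """
    # Map arbitrary cell labels to consecutive integers 0..K-1.
    _, idx = np.unique(cell, return_inverse=True)
    counts = np.bincount(idx)
    # Within transformation: subtract each cell's mean from both variables.
    p_dm = log_p - (np.bincount(idx, weights=log_p) / counts)[idx]
    c_dm = log_c - (np.bincount(idx, weights=log_c) / counts)[idx]
    # With the fixed effects absorbed, beta is a no-intercept OLS slope.
    return float(c_dm @ p_dm / (c_dm @ c_dm))

# Hypothetical simulated data: 200 cells of 50 observations, true beta = 0.112.
rng = np.random.default_rng(0)
cell = np.repeat(np.arange(200), 50)
alpha = rng.normal(0.0, 1.0, 200)[cell]                 # cell price level
log_c = 0.5 * alpha + rng.normal(0.0, 0.8, cell.size)   # richer households in pricier cells
log_p = alpha + 0.112 * log_c + rng.normal(0.0, 0.3, cell.size)

beta_fe = within_beta(log_p, log_c, cell)
# Pooled OLS slope that ignores the fixed effects, for comparison.
x = log_c - log_c.mean()
beta_pooled = float(x @ (log_p - log_p.mean()) / (x @ x))
print(round(beta_fe, 3), round(beta_pooled, 3))
```

In the simulated data richer households are concentrated in pricier cells, so the pooled slope overstates β, while the within estimate lands close to the true 0.112.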

Column 1 of Table 1 reports the estimate of β, the elasticity of price with respect to per-capita consumption, based on 188 goods.8 The point estimate for β is 0.112, which implies that the average price paid by the 95th percentile household in terms of per-capita expenditure is 24.9 percent more than the price paid by the 5th percentile household (the 95th percentile household’s per-capita expenditure is about seven times that of the 5th percentile household). Column 2 shows that winsorizing the 1 percent tails of per-capita expenditure and prices (for a good within a state and urban-rural cell) does not change the results substantially.
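Because the specification is log-linear, the ratio of prices paid by two households equals the ratio of their per-capita expenditures raised to the power β. A worked check of the figures above (values rounded):

$$\frac{P_{95}}{P_{5}} = \left(\frac{c_{95}}{c_{5}}\right)^{\beta}, \qquad 7^{0.112} = e^{0.112 \ln 7} \approx e^{0.218} \approx 1.24.$$

An expenditure ratio of exactly seven thus implies a price gap of about 24 percent; the reported 24.9 percent corresponds to an expenditure ratio of $1.249^{1/0.112} \approx 7.3$, consistent with the ratio being “about seven”.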

Table 1:

Household Regressions: Richer Households Buy Higher Price Goods

Notes: The data is from the Consumer Expenditure Survey of 2004-05. Column 1 reports results for the regression of log of price paid by households for different goods on log of per-capita expenditure of the households. Column 2 winsorizes 1 percent tails of per-capita expenditure and goods prices. Column 3 excludes the expenditure on the good itself from the independent variable. Regressions include fixed effects for the interaction of each good, state, and rural-urban cell. The price ratios implied by the coefficient estimates for different percentiles of per-capita expenditure are reported in the rows called “Price Ratio”. Standard errors are clustered at the household level. ***p<0.01.

A possible concern with the results in columns 1 and 2 of Table 1 is that the independent variable is itself a function of the dependent variable, as per-capita expenditure sums the expenditure of the household across all goods, i.e., $c_h = \frac{\sum_g P_{h,g} Q_{h,g}}{\text{household size}}$, where $Q_{h,g}$ is the quantity of good $g$ consumed by household $h$. This can give rise to a mechanical correlation and can also cause a bias if the variables are measured with error. To account for this, column 3 repeats the regression from column 2 with the independent variable replaced by $\ln\left(c_h - \frac{P_{h,g} Q_{h,g}}{\text{household size}}\right)$, i.e., the expenditure on good $g$ is subtracted from per-capita expenditure. The results in column 3 of Table 1 are very similar to those in columns 1 and 2.
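The leave-one-out regressor described above can be built from long-format data with one row per household and good. This is a sketch assuming that `values` covers all non-durable items entering per-capita expenditure (not only the goods with reported quantities); the function and variable names are hypothetical.

```python
import numpy as np

def leave_one_out_log_pce(values, household, hh_size):
    """Log per-capita expenditure excluding the good on each row.

    values: expenditure P*Q on each (household, good) row.
    household: integer household id (0..H-1) for each row.
    hh_size: number of members of each household, indexed by household id.
    """
    # Total expenditure of each household, summed over all the goods it reports.
    total = np.bincount(household, weights=values)
    # c_h: per-capita expenditure of the household on each row.
    c_h = total[household] / hh_size[household]
    # Subtract this row's own per-capita spending on the good before taking logs.
    return np.log(c_h - values / hh_size[household])

# Hypothetical example: household 0 (two members) buys two goods,
# household 1 (one member) buys three.
values = np.array([40.0, 60.0, 10.0, 20.0, 70.0])
household = np.array([0, 0, 1, 1, 1])
hh_size = np.array([2, 1])
print(np.exp(leave_one_out_log_pce(values, household, hh_size)))
```

For household 0, per-capita expenditure is 50, so removing the first good's per-capita spending of 20 leaves 30.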

Figure 3 plots the non-parametric equivalent of the regression in column 3 of Table 1. It estimates a kernel-smoothed local linear regression of residualized log prices (after removing the interaction of good, state, and urban-rural fixed effects) on residualized log per-capita expenditure.9 As the figure shows, a constant elasticity of price with respect to per-capita expenditure fits the data very well.

Figure 3:

Non-parametric Estimate: Richer Households Buy Higher Price Goods

Notes: The data is from the Consumer Expenditure Survey of 2004-05. The graph plots the kernel-smoothed local linear regression of residualized log prices on residualized log per-capita expenditures (removing the interaction of good, state, and urban-rural fixed effects). As in column 3 of Table 1, the good's own value of consumption is subtracted from per-capita expenditure. 1 percent tails of residualized log per-capita expenditure are excluded. An Epanechnikov kernel with a bandwidth of 0.13 is used. The grey region is the 95 percent confidence interval for the non-parametric estimate.
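The non-parametric estimate in Figure 3 is a local linear regression with an Epanechnikov kernel. A bare-bones version is sketched below, assuming the inputs have already been residualized on the fixed effects; it evaluates the fit on a grid and omits the confidence band. The bandwidth of 0.13 comes from the figure notes, while the simulated data and function names are hypothetical. Kernel normalizations differ across software (some rescale the support to ±√5), so the same bandwidth number need not imply the same amount of smoothing.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 * (1 - u^2) on |u| <= 1, zero outside."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def local_linear(x, y, grid, bandwidth):
    """Local linear regression estimate of E[y | x] at each point of grid."""
    fitted = np.empty(len(grid))
    for j, x0 in enumerate(grid):
        w = epanechnikov((x - x0) / bandwidth)
        # Weighted least squares of y on [1, x - x0]; the intercept is the fit at x0.
        X = np.column_stack([np.ones_like(x), x - x0])
        XtW = X.T * w
        coef = np.linalg.solve(XtW @ X, XtW @ y)
        fitted[j] = coef[0]
    return fitted

# Hypothetical residualized data with a constant elasticity of 0.112.
rng = np.random.default_rng(1)
x = rng.normal(0.0, 0.6, 5000)               # residualized log expenditure
y = 0.112 * x + rng.normal(0.0, 0.3, 5000)   # residualized log price
grid = np.linspace(-1.0, 1.0, 9)
fit = local_linear(x, y, grid, bandwidth=0.13)
print(np.round(fit, 3))
```

Local linear fitting reproduces an exactly linear relation without bias, which is why a straight-line pattern in the estimate is informative about log-linearity.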

The results in Table 1 show that richer households buy goods at a higher unit price which is consistent with the hypothesis that they buy higher quality goods. However, as documented by Aguiar and Hurst (2007), households might be paying different prices for the same good because households with higher opportunity cost of time tend to shop around less for lower prices. If richer households have a higher opportunity cost of time, then the findings in Table 1 might be a result of less time spent shopping by richer households and not because of purchase of higher quality goods.10

The 2003 Consumer Expenditure Survey asked each individual in the household about the main activity they were engaged in (whether they were employed, studying, attending to domestic duties, retired, etc.).11 I use this to construct a proxy variable that takes value 1 if the household has at least one member between the ages of 15 and 70 who is only attending to domestic duties or is retired, and 0 otherwise.12 I interpret households with a non-worker present as having a low opportunity cost of time and include this variable as a control in the regressions. Column 1 of Table 2 repeats the regression from column 1 of Table 1, but with the 2003 data instead of the 2004-05 data. Column 2 of Table 2 adds the “non-worker present” measure as an additional control. Although the coefficient on the “non-worker present” variable is positive, the key point is that the coefficient on per-capita expenditure does not change substantially. Column 3 also includes the interaction of the “non-worker present” variable with per-capita expenditure, and this does not change the results substantially either. Columns 4, 5, and 6 repeat the regressions from columns 1, 2, and 3 respectively, but restrict the sample to households with one or two members only. This controls for the fact that larger households are more likely to have non-working adults. Again, the coefficient on per-capita expenditure does not change substantially when the “non-worker present” variable is included as a control.
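The “non-worker present” proxy can be coded as a simple rule over a household roster. In this sketch the activity labels are hypothetical stand-ins for the survey's activity codes, and the age window is assumed to include both 15 and 70.

```python
from dataclasses import dataclass

@dataclass
class Member:
    age: int
    activity: str  # hypothetical labels: "employed", "student", "domestic", "retired", ...

# Hypothetical labels treated as "not working"; the survey's actual codes differ.
NON_WORKER_ACTIVITIES = {"domestic", "retired"}

def non_worker_present(members: list[Member]) -> int:
    """1 if some member aged 15-70 (inclusive, assumed) only does domestic duties or is retired."""
    return int(any(15 <= m.age <= 70 and m.activity in NON_WORKER_ACTIVITIES
                   for m in members))

households = [
    [Member(40, "employed"), Member(38, "domestic")],
    [Member(75, "retired"), Member(45, "employed")],  # retiree outside the age window
    [Member(30, "employed")],
]
print([non_worker_present(h) for h in households])
# Sample restriction used in columns 4-6: households with one or two members.
small = [h for h in households if len(h) <= 2]
```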

Table 2:

Household Regressions: Controlling for Opportunity Cost of Time

Notes: The data is from the Consumer Expenditure Survey of 2003. Column 1 reports results for the regression of log of price paid by households for different goods on log of per-capita expenditure (replicating column 1 of Table 1). Column 2 includes a control for opportunity cost of time, namely a variable which takes value 1 if there is at least one non-working adult in the household. Column 3 also includes the interaction of this variable with per-capita expenditure. Columns 4, 5, and 6 repeat the specifications in columns 1, 2, and 3 but restrict the sample to households of size 1 and 2 only. Regressions include fixed effects for the interaction of each good, state, and rural-urban cell. Standard errors are clustered at the household level. ***p<0.01, **p<0.05.

The results in this section indicate that richer households tend to buy higher price goods, which is consistent with the hypothesis that they are consuming higher quality products.

2.2. Larger Plants Produce Higher Price Goods

This section shows that larger plants produce higher-price goods, which is consistent with the hypothesis that high-quality goods are produced in large plants. To show this, I combine data from the Annual Survey of Industries (ASI) of 2005-06 and the Survey of Unorganized Manufacturing (SUM) of 2005-06. The ASI covers all manufacturing plants registered under the Factories Act, 1948. This includes plants employing ten or more workers and using electricity, and plants employing twenty or more workers without electricity. The SUM, on the other hand, covers the smaller manufacturing plants not covered by the ASI. Together, the two surveys should provide a representative sample of the manufacturing sector as a whole.13
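As a rough illustration of how the two surveys partition plants, the coverage rule can be written as a function of employment and power use. This is a simplification that assumes registration under the Factories Act follows mechanically from the employment thresholds; the function name is hypothetical.

```python
def survey_for_plant(workers: int, uses_power: bool) -> str:
    """Which survey covers a plant, under the employment thresholds described above.

    Simplification (assumption): coverage depends only on these thresholds,
    ignoring actual registration status.
    """
    threshold = 10 if uses_power else 20
    return "ASI" if workers >= threshold else "SUM"

for workers, power in [(12, True), (12, False), (20, False), (9, True)]:
    print(workers, power, survey_for_plant(workers, power))
```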

Both surveys ask manufacturing establishments detailed questions about the products they produce and the inputs they use. Each establishment reports the quantity of each product it produces (at the 5-digit level of a product classification with about 5,500 possible products) and its value (before taxes and distribution expenses), which can be used to compute prices. In the ASI, each product's quantity is supposed to be reported in a standardized unit (kilograms, numbers, etc.). In the SUM, different plants can report the same product's price in different units. I concord units across the two surveys so that prices of the same product are not compared across different units.14
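One way to think about the unit concordance is to convert each reported quantity into the product's standard unit before computing a unit price. The conversion table and function below are hypothetical; the concordance actually used is described in footnote 14 and the appendix. This sketch simply returns `None` when a unit cannot be converted, whereas the paper treats products with unit problems as separate product categories.

```python
# Hypothetical conversion factors from a reported unit to a standard unit.
TO_STANDARD = {
    ("kg", "kg"): 1.0,
    ("quintal", "kg"): 100.0,
    ("tonne", "kg"): 1000.0,
    ("number", "number"): 1.0,
    ("dozen", "number"): 12.0,
}

def standardized_unit_price(value, quantity, unit, standard_unit):
    """Rupees per standard unit, or None if the reported unit cannot be converted."""
    factor = TO_STANDARD.get((unit, standard_unit))
    if factor is None or quantity <= 0:
        return None  # avoid comparing prices across incompatible units
    return value / (quantity * factor)

print(standardized_unit_price(5000.0, 2.0, "quintal", "kg"))  # rupees per kg
```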

I run regressions of the form

$$\ln(P_{f,g}) = \alpha_g + \alpha_{\text{state},\text{rural}} + \gamma \ln(L_f) + \varepsilon_{f,g},$$

where $P_{f,g}$ is the price charged by plant $f$ for product $g$, $L_f$ is the number of workers employed by plant $f$, $\alpha_g$ is a product fixed effect, and $\alpha_{\text{state},\text{rural}}$ is a state times urban-rural fixed effect. Intuitively, the coefficient γ is the elasticity of the price of output produced with respect to plant size and it is identified out of variation in prices charged by plants of different sizes producing the same product (reported in the same units) and allowing for differences in average price levels across states and urban and rural areas.
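Because the product and state × urban-rural fixed effects enter additively rather than as one interacted cell, they cannot be removed with a single round of demeaning. The sketch below instead puts both sets of dummies directly into an OLS regression solved with `numpy.linalg.lstsq`. It uses simulated data with a hypothetical true γ of 0.096, and the function name is illustrative; with thousands of products, a sparse solver or iterative demeaning would be preferable to dense dummies.

```python
import numpy as np

def fe_slope(log_p, log_l, product, region):
    """OLS estimate of gamma with additive product and region (state x urban-rural) dummies.

    product, region: integer codes starting at 0. The first region dummy is dropped
    because the product dummies already span the constant.
    """
    n = log_p.size
    d_prod = np.zeros((n, product.max() + 1))
    d_prod[np.arange(n), product] = 1.0
    d_reg = np.zeros((n, region.max() + 1))
    d_reg[np.arange(n), region] = 1.0
    X = np.column_stack([log_l, d_prod, d_reg[:, 1:]])
    coef, *_ = np.linalg.lstsq(X, log_p, rcond=None)
    return float(coef[0])  # coefficient on log employment

# Hypothetical simulated plants: 100 products, 10 regions, true gamma = 0.096.
rng = np.random.default_rng(2)
n = 3000
product = rng.integers(0, 100, n)
region = rng.integers(0, 10, n)
alpha_g = rng.normal(0.0, 1.0, 100)
alpha_r = rng.normal(0.0, 0.3, 10)
# Larger plants are more common in high-price products, so omitting alpha_g would bias gamma.
log_l = rng.normal(1.5, 1.2, n) + 0.8 * alpha_g[product]
log_p = alpha_g[product] + alpha_r[region] + 0.096 * log_l + rng.normal(0.0, 0.4, n)

gamma_hat = fe_slope(log_p, log_l, product, region)
print(round(gamma_hat, 3))
```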

Column 1 of Table 3 reports results when the sample is restricted to the ASI only. The estimate for the elasticity of price with respect to size, γ, is 0.096 and is statistically significant at the 1 percent level. The point estimate implies that a plant which employs 500 people on average charges a price which is 55.6 percent more than a plant employing 5 workers.15
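The percentage gaps quoted in this section follow from the log-linear specification: the price ratio between two plants is the ratio of their employment raised to the power γ. A small helper (name hypothetical) reproduces the figure above:

```python
def implied_price_ratio(size_large: float, size_small: float, elasticity: float) -> float:
    """Price ratio implied by a constant elasticity: (size_large / size_small) ** elasticity."""
    return (size_large / size_small) ** elasticity

ratio = implied_price_ratio(500, 5, 0.096)
print(f"{100 * (ratio - 1):.1f} percent")  # prints "55.6 percent"
```

The same function with an elasticity of 0.077 gives the 42.6 percent input-price gap reported for the ASI sample in Section 2.3.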

Table 3:

Plant Regressions: Larger Plants Produce Higher Price Goods

Notes: The data is from the ASI and SUM for 2005-06. All columns report results for regressions of log price charged by plants for their products on log of number of employees hired by the plant. Column 1 restricts the sample to the ASI, column 2 restricts the sample to the SUM, while column 3 combines the two. 1 percent tails of prices (within a product) and plant size are winsorized. Regressions include product fixed effects and state times urban-rural fixed effects. Standard errors are clustered at the product level. The number of product fixed effects exceeds the number of clusters because of the units problem discussed in the Appendix: misreported units are treated as a different product category for fixed effects but not for clustering. The price ratios for different-sized plants implied by the coefficient estimates are reported in the rows called “Price Ratio”. ***p<0.01.
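Table 3 clusters standard errors at the product level. One common way to compute such standard errors is the CR1 sandwich estimator, sketched below with numpy. The paper does not state which small-sample correction it uses, so the correction factor here is an assumption, as are the function name and the simulated data.

```python
import numpy as np

def cluster_robust_se(X, y, clusters):
    """OLS coefficients and CR1 cluster-robust standard errors.

    X: (n, k) full-rank regressor matrix (constant or fixed-effect dummies included as needed).
    clusters: length-n array of cluster ids (e.g. product codes).
    """
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    bread = np.linalg.inv(X.T @ X)
    # "Meat": sum over clusters of (X_c' u_c)(X_c' u_c)'.
    _, idx = np.unique(clusters, return_inverse=True)
    scores = np.zeros((idx.max() + 1, k))
    np.add.at(scores, idx, X * resid[:, None])
    meat = scores.T @ scores
    g = scores.shape[0]
    # Small-sample correction commonly labelled CR1 (assumed here).
    correction = g / (g - 1) * (n - 1) / (n - k)
    vcov = correction * bread @ meat @ bread
    return beta, np.sqrt(np.diag(vcov))

# Hypothetical example: 40 products with 25 plants each and a product-level shock.
rng = np.random.default_rng(3)
clusters = np.repeat(np.arange(40), 25)
x = rng.normal(size=clusters.size)
X = np.column_stack([np.ones_like(x), x])
y = 0.1 * x + rng.normal(size=40)[clusters] + rng.normal(size=clusters.size)
beta, se = cluster_robust_se(X, y, clusters)
print(np.round(beta, 3), np.round(se, 3))
```

With every observation in its own cluster, the formula collapses to the familiar HC1 heteroskedasticity-robust estimator.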

Column 2 reports results when the sample is restricted to the SUM only. The point estimate for γ (the elasticity of price with respect to size) is still positive but smaller. This is not surprising, as the variation in employment within the SUM is small: 95 percent of the plants employ 16 or fewer workers.

Column 3 reports results when the two surveys are combined. The estimate for the elasticity of price with respect to size implies that a plant which employs 500 people on average charges a price which is 62.9 percent more than a plant employing 5 workers.
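Inverting the same log-linear mapping gives the elasticity implied by the combined-sample figure. Since it uses the rounded 62.9 percent gap, this is only an approximation of the coefficient reported in column 3 of Table 3:

$$\gamma \approx \frac{\ln(1.629)}{\ln(500/5)} \approx \frac{0.488}{4.605} \approx 0.106.$$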

Figure 4 plots the non-parametric equivalent of the regression in column 3 of Table 3. In particular, it estimates a kernel-smoothed local linear regression of residualized log prices (after removing product fixed effects and state times urban-rural fixed effects) on residualized log plant size.16 Again, the non-parametric estimates suggest that the price-size relation across plants is close to log-linear.

Figure 4:

Non-parametric Estimate: Larger Plants Produce Higher Price Goods

Notes: The data is from the ASI and the SUM of 2005-06. The graph plots the kernel-smoothed local linear regression of residualized log prices charged by a plant for its products on residualized log employment of that plant (removing product fixed effects and the interaction of state and urban-rural fixed effects). Products which have the units problem discussed in footnote 14 and in Appendix C are split into two product categories. 1 percent tails of residualized log employment are excluded. An Epanechnikov kernel with a bandwidth of 0.502 is used. The grey region is the 95 percent confidence interval for the non-parametric estimate.

The fact that larger plants produce goods which they sell at a higher price is consistent with the hypothesis that larger plants produce higher quality products.

2.3. Larger Plants Use Higher Price Inputs

This section looks at the relation between the size of a plant and the inputs it uses. First I show that larger plants pay a higher price for the same material input as compared to smaller plants. This is consistent with the idea that larger plants produce higher quality products which require higher quality inputs. I then show that larger plants hire more educated workers as compared to small plants.

As in the last section, the ASI and SUM are used to show that larger plants use higher price material inputs. Each establishment reports the material inputs it uses (for a 5-digit product classification, which has about 5,500 possible products) and the price it pays for the input. The units between the surveys are again concorded.17

I run a regression of the form

$$\ln(P_{f,i}) = \alpha_i + \alpha_{state,rural} + \gamma \ln(L_f) + \varepsilon_{f,i},$$

where $P_{f,i}$ is the price paid by plant $f$ for input $i$, $L_f$ is the number of workers employed by plant $f$, $\alpha_i$ is a product fixed effect, and $\alpha_{state,rural}$ is a state times urban-rural fixed effect. Intuitively, the coefficient $\gamma$ is the elasticity of the price paid for inputs with respect to plant size, and it is identified from variation in prices paid by plants of different sizes for the same inputs (reported in the same units), controlling for differences in average prices across states and urban-rural sectors.
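To make the identification argument concrete, here is a hedged sketch that simulates hypothetical plant-input data with a known elasticity and recovers $\gamma$ by least squares with product dummies and state-by-sector dummies. The simulated data, the helper `fe_elasticity`, and every parameter value other than the 0.077 target are assumptions for illustration; the sketch also skips the product-level clustering of standard errors used in Table 4.

```python
import numpy as np

def fe_elasticity(log_price, log_size, product, cell):
    """OLS slope on log size, controlling for product and state-by-sector dummies."""
    products = np.unique(product)
    cells = np.unique(cell)
    d_product = (product[:, None] == products[None, :]).astype(float)
    # Drop the first cell dummy: the product dummies already absorb the constant.
    d_cell = (cell[:, None] == cells[None, 1:]).astype(float)
    X = np.column_stack([log_size, d_product, d_cell])
    beta, *_ = np.linalg.lstsq(X, log_price, rcond=None)
    return beta[0]

# Hypothetical simulated data with a known elasticity of 0.077.
rng = np.random.default_rng(0)
n, n_products, n_cells = 5000, 50, 10
product = rng.integers(0, n_products, n)
cell = rng.integers(0, n_cells, n)          # stands in for state x urban-rural cells
log_size = rng.uniform(0.0, np.log(500), n)
alpha_product = rng.normal(0.0, 1.0, n_products)
alpha_cell = rng.normal(0.0, 0.2, n_cells)
true_gamma = 0.077
log_price = (alpha_product[product] + alpha_cell[cell]
             + true_gamma * log_size + rng.normal(0.0, 0.3, n))
gamma_hat = fe_elasticity(log_price, log_size, product, cell)
```

Because the dummies absorb product-level and cell-level price differences, `gamma_hat` is driven only by within-product, within-cell variation in plant size, which mirrors the identification described above.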

Column 1 of Table 4 reports results when the sample is restricted to the ASI only. The estimate for the elasticity of input prices with respect to plant size, γ, is 0.077 and is statistically significant at the 1 percent level. The point estimate implies that a plant which employs 500 people on average pays prices for inputs which are 42.6 percent more than a plant employing 5 workers. Column 2 reports results when the sample is restricted to the SUM only. The coefficient γ is positive but smaller.

Table 4:

Plant Regressions: Larger Plants Use Higher Price Inputs

Notes: The data is from the ASI and SUM for 2005-06. All columns report results for regressions of the log of the price paid by establishments for material inputs on the log of the number of employees hired by the establishment. Column 1 restricts the sample to the ASI only. Column 2 restricts the sample to the SUM only, while column 3 combines the ASI and the SUM. The 1 percent tails of prices (within a product) and plant size are winsorized. All regressions include product fixed effects and state times urban-rural fixed effects. Standard errors are clustered at the product level. The number of product fixed effects exceeds the number of clusters because of the units problem discussed in the Appendix: misreported units are treated as a different input category for fixed effects but not for clustering. The price ratios implied by the coefficient estimates for different-sized plants are reported in the rows labeled “Price Ratio”. ***p<0.01, *p<0.1.

Column 3 reports results when the two surveys are combined. When combining the two surveys, the estimate for the elasticity of input prices with respect to size implies that a plant which employs 500 people on average pays a price for inputs which is 25.9 percent more than a plant employing 5 workers.
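The price ratios quoted above follow from the log-log specification: a plant 100 times larger pays $100^{\gamma}$ times as much for the same input. A small sketch (function names hypothetical) checks the ASI-only figure and backs out the elasticity consistent with the combined-sample 25.9 percent; that implied value of about 0.05 is a derived number, not one read from Table 4.

```python
import math

def price_ratio(gamma, big=500, small=5):
    """Input-price ratio between two plant sizes implied by elasticity gamma."""
    return (big / small) ** gamma

def implied_gamma(ratio, big=500, small=5):
    """Invert price_ratio: the elasticity consistent with a reported price ratio."""
    return math.log(ratio) / math.log(big / small)

print(round(price_ratio(0.077), 3))    # 1.426, i.e. 42.6 percent higher (ASI only)
print(round(implied_gamma(1.259), 3))  # 0.05, consistent with 25.9 percent (combined)
```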

Not only do larger plants use higher price inputs, but they also employ more skilled labor. To show this I use the Employment-Unemployment Survey of 2004-05 conducted by the National Sample Survey Office (NSS) of India. Note that plants in the ASI and SUM do not report the education level of their workers, so these surveys cannot be used to look at the relation between plant size and the education levels of workers.

The Employment-Unemployment Survey records demographic information (including education levels) for about 600,000 individuals. It also asks individuals to report the size category of the establishment in which they work, where the size category can take five values: fewer than 6 workers, 6 to 9, 10 to 19, 20 or more, and unknown size. Table 5 reports the skill composition of workers for the different size categories. Among workers in establishments with fewer than 6 workers, 43 percent have never attended school while only 3 percent have graduated from high school. By contrast, among workers in establishments with 20 or more workers, only 23 percent have never attended school while 22 percent have graduated from high school. A larger share of workers in big establishments thus have high levels of education.

Table 5:

Larger Plants Hire More Educated Workers

Notes: The data is from the Employment-Unemployment Survey of 2004-05. The rows of the table represent the size category of the establishment in which an individual works while the columns represent the education level. Each number represents the share of individuals in the given size category who have attained the level of education given by the column.

3. Model

This section develops a general equilibrium model which matches the facts described in Section 2. In particular, I model consumers' choice between different quality levels, with richer households more likely to buy high quality goods. On the production side, I assume that producing higher quality requires larger fixed costs, which, along with free entry, implies that high quality producers are larger on average.

3.1. Households

There is a mass $L$ of households in the economy, indexed by the subscript $j$. A share $h$ of the households are skilled and earn wage $w_S$ (which is determined endogenously in equilibrium), while a share $1 - h$ are unskilled and earn wage $w_U$. The unskilled wage $w_U$ is the numeraire and is normalized to one.18

There are $N$ quality levels. $Q = \{q_1, q_2, \ldots, q_N\}$ denotes the set of qualities available in the economy. The quality indexes $q_n$ are arranged in ascending order of quality, with $q_n > q_m$ for all $n > m$. Therefore $q_1$ is the quality index of the lowest quality level and $q_N$ is the quality index of the highest quality level.

The utility derived by household j from consuming quality level qn is given by

$$u_{j,q_n}(c_{j,q_n}, \varepsilon_{j,q_n}) = a_{q_n} + q_n \log(c_{j,q_n}) + \varepsilon_{j,q_n} \quad \forall q_n \in Q, \qquad (1)$$

where $a_{q_n}$ is a constant in the utility function which can vary by quality level, $c_{j,q_n}$ is the quantity of quality level $q_n$ consumed by household $j$, and $\varepsilon_{j,q_n}$ is a random utility component which represents household $j$'s idiosyncratic valuation of quality level $q_n$. The fact that higher quality levels have higher indexes $q_n$ implies that for any given level of quantity consumed, households get more utility from consuming higher quality goods.

The random utility component $\varepsilon_{j,q_n}$ is assumed to be independently and identically distributed with a Gumbel (Type I extreme value) distribution with density

$$f(\varepsilon_{j,q_n}) = e^{-\varepsilon_{j,q_n}} \, e^{-e^{-\varepsilon_{j,q_n}}}.$$

As shown by McFadden (1974) (see also Chapter 3 of Train (2009)), assuming a Gumbel distribution for the random utility component implies simple closed form expressions for demands.

I assume that a household can choose to consume only one quality level and spends its entire income on the quality level that it chooses. This implies that the indirect utility function of household j if it chooses to consume quality level qn is given by

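$$v_{j,q_n}(w_j, P_{q_n}, \varepsilon_{j,q_n}) = a_{q_n} + q_n \log\left(\frac{w_j}{P_{q_n}}\right) + \varepsilon_{j,q_n} \quad \forall q_n \in Q, \qquad (2)$$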

where $P_{q_n}$ is the price of quality level $q_n$ and $w_j$ is the wage of household $j$. Equation (2) is simply equation (1) with $c_{j,q_n} = w_j / P_{q_n}$, reflecting the assumption that each household can only choose to consume one quality level.

Each household j receives draws of the random utility component εj,qn for each quality level qn and given these draws, chooses to consume the quality level which gives it the highest utility level. Therefore, household j chooses to consume quality level qn if and only if

$$v_{j,q_n}(w_j, P_{q_n}, \varepsilon_{j,q_n}) > v_{j,q_m}(w_j, P_{q_m}, \varepsilon_{j,q_m}) \quad \forall m \neq n.$$

Let ρ(qn|w) be the share of households with wage w who choose to consume quality level qn. Given the assumption that εj,qn is independently and identically distributed with a Gumbel distribution, this share takes the simple logit form

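$$\rho(q_n \mid w) = \frac{e^{a_{q_n} + q_n \log(w/P_{q_n})}}{\sum_{i=1}^{N} e^{a_{q_i} + q_i \log(w/P_{q_i})}} = \frac{e^{a_{q_n}} \left(w/P_{q_n}\right)^{q_n}}{\sum_{i=1}^{N} e^{a_{q_i}} \left(w/P_{q_i}\right)^{q_i}} \quad \forall q_n \in Q. \qquad (3)$$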
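The logit form in equation (3) can be checked by simulation. In this sketch, which assumes hypothetical values for $a_{q_n}$, $q_n$, and $P_{q_n}$ (not the paper's calibration), each simulated household receives standard Gumbel shocks (NumPy's `Generator.gumbel` with its default location 0 and scale 1, matching the density above) and picks the quality with the highest indirect utility. The resulting choice frequencies should be close to the closed-form shares.

```python
import numpy as np

def logit_shares(w, a, q, P):
    """Equation (3): share of households with wage w choosing each quality level."""
    v = a + q * np.log(w / P)          # deterministic part of indirect utility, equation (2)
    e = np.exp(v - v.max())            # subtracting the max leaves the shares unchanged
    return e / e.sum()

def simulated_shares(w, a, q, P, n_households, rng):
    """Give each household i.i.d. standard Gumbel shocks and let it pick its best quality."""
    v = a + q * np.log(w / P)
    eps = rng.gumbel(loc=0.0, scale=1.0, size=(n_households, len(q)))
    choice = np.argmax(v + eps, axis=1)
    return np.bincount(choice, minlength=len(q)) / n_households

# Hypothetical three-quality example.
a = np.array([0.0, 0.3, 0.5])
q = np.array([1.0, 1.5, 2.0])
P = np.array([1.0, 1.3, 1.7])
rng = np.random.default_rng(42)
exact = logit_shares(1.6, a, q, P)
simulated = simulated_shares(1.6, a, q, P, n_households=200_000, rng=rng)
```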

Analyzing how $\rho(q_n \mid w)$ changes as the wage changes helps explain how this preference structure leads to non-homotheticity with respect to quality choice. Define $\gamma_{\rho(q_n),w}$ to be the elasticity of $\rho(q_n \mid w)$ with respect to the wage $w$. Taking logs and differentiating equation (3) with respect to $\log(w)$ yields

$$\gamma_{\rho(q_n),w} = \frac{\partial \log[\rho(q_n \mid w)]}{\partial \log(w)} = q_n - \sum_{i=1}^{N} q_i \, \rho(q_i \mid w).$$

The elasticity of $\rho(q_n \mid w)$ with respect to the wage is simply the quality index $q_n$ minus a weighted average of all the quality indexes, where the weights are the shares of households with wage $w$ who buy each quality level. A positive elasticity ($q_n > \sum_{i=1}^{N} q_i \rho(q_i \mid w)$) implies that as wages increase, a larger share of households buy quality $q_n$. Because lower quality goods have a lower quality index ($q_n > q_m$ for all $n > m$), the lowest quality level always has a negative elasticity, i.e. the share of households who buy the lowest quality level always goes down as wages increase. Likewise, the highest quality level always has a positive elasticity, so the share of households who consume the highest quality always goes up as wages increase.
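A quick numerical check of the elasticity formula: the sketch below compares $q_n - \sum_i q_i \rho(q_i \mid w)$ with a central finite-difference derivative of $\log \rho(q_n \mid w)$ in $\log w$. All parameter values are hypothetical; the check also confirms the sign pattern for the lowest and highest quality levels.

```python
import numpy as np

def logit_shares(w, a, q, P):
    """Equation (3): share of households with wage w choosing each quality level."""
    v = a + q * np.log(w / P)
    e = np.exp(v - v.max())
    return e / e.sum()

def share_elasticities(w, a, q, P):
    """Analytical elasticity: q_n minus the share-weighted average quality index."""
    rho = logit_shares(w, a, q, P)
    return q - np.dot(q, rho)

def numerical_elasticities(w, a, q, P, step=1e-6):
    """Central finite difference of log shares with respect to log wage."""
    up = np.log(logit_shares(w * np.exp(step), a, q, P))
    down = np.log(logit_shares(w * np.exp(-step), a, q, P))
    return (up - down) / (2 * step)

# Hypothetical four-quality example.
a = np.array([0.0, 0.2, 0.1, -0.3])
q = np.array([1.0, 1.4, 1.8, 2.2])
P = np.array([1.0, 1.2, 1.5, 1.9])
analytical = share_elasticities(1.0, a, q, P)
numerical = numerical_elasticities(1.0, a, q, P)
```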

Therefore, the non-homotheticity with respect to quality operates on the extensive margin. As a household becomes richer, it is more likely to choose the higher quality goods. There is a positively sloped “quality Engel curve” where households with higher levels of wages will, on average, spend a larger share of their expenditure on higher quality goods. This arises because the utility function in equation (1) features complementarity between quantity consumed and quality. As wages increase, the household can consume more quantity of whichever quality level that it chooses. Complementarity between quantity and quality implies that the marginal increase in utility from a given increase in wage is larger for higher quality goods which leads to more households choosing higher quality levels as wages increase (given the draw of εj,qn).

The steepness of the quality Engel curve is determined by the differences in the quality indexes across quality levels. One way of parameterizing the quality indexes would be to set the index for the lowest quality level to be one and assume that each higher quality level has an index which is a constant Δ larger than the previous quality index i.e. q1 = 1 and qn = qn−1 + Δ. In this case, the size of the constant Δ determines the extent of non-homotheticity with a larger Δ implying that demand shifts to higher quality faster as wages increase.

Consider the following simple example, which illustrates the relation between the size of Δ and the extent of non-homotheticity. Assume that there are only two quality levels (N = 2), with prices Pq1 = 1 and Pq2 = 1.5 and quality indexes q1 = 1 and q2 = 1 + Δ.19 Figure 5 plots the share of households who choose the high quality level q2 as a function of wages for different values of Δ.20 For each value of Δ, the constant in the utility function aq2 is chosen such that 30 percent of the households with wage equal to one choose the high quality q2.21

Figure 5:

Quality Engel Curve


Notes: The figure plots the share of households who purchase the high quality product at different wage levels. There are only 2 quality levels (N = 2), with prices Pq1 = 1 and Pq2 = 1.5. The quality index for the low quality is set to one, i.e. q1 = 1. The three lines correspond to three different values of Δ, where q2 = 1 + Δ. aq2, the constant for the high quality, is chosen such that 30 percent of households with wage equal to one choose the high quality.

For the case with Δ = 0, there is no change in the share of households who buy the high quality as wage increases. This is expected as Δ = 0 is in effect the case in which there is no quality distinction between the goods. For positive values of Δ, there is an increase in the share of households who buy the high quality good as wages increase, and this increase is larger for higher values of Δ.
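The two-quality example can be reproduced with a short sketch. It assumes $a_{q_1} = 0$ as a normalization (only the difference $a_{q_2} - a_{q_1}$ matters for choice shares) and uses hypothetical values of Δ, since the values plotted in Figure 5 are given in a footnote not reproduced here. Under this setup the odds of choosing the high quality simplify to $(3/7) w^{\Delta}$, so the share is flat at 0.3 when Δ = 0 and rises with the wage when Δ > 0.

```python
import numpy as np

def high_quality_share(w, delta, p1=1.0, p2=1.5, target=0.3):
    """Share choosing q2 = 1 + delta, with a_{q2} set so that the share at w = 1 is target.

    Normalizes a_{q1} = 0 (an assumption; only a_{q2} - a_{q1} matters).
    """
    q1, q2 = 1.0, 1.0 + delta
    # Solve e^{a2} (1/p2)^{q2} / (1/p1)^{q1} = target / (1 - target) for a2.
    a2 = np.log(target / (1 - target)) + q2 * np.log(p2) - q1 * np.log(p1)
    v1 = q1 * np.log(w / p1)
    v2 = a2 + q2 * np.log(w / p2)
    return 1.0 / (1.0 + np.exp(v1 - v2))

wages = np.array([0.5, 1.0, 2.0, 4.0])
for delta in (0.0, 0.5, 1.0):          # hypothetical values of Delta
    print(delta, np.round(high_quality_share(wages, delta), 3))
```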

Given prices and the wages of skilled and unskilled workers, the total demand for quality level qn is given by

$$C_{q_n} = \underbrace{Lh\,\rho(q_n \mid w_S)\,\frac{w_S}{P_{q_n}}}_{\text{demand from skilled households}} + \underbrace{L(1-h)\,\rho(q_n \mid w_U)\,\frac{w_U}{P_{q_n}}}_{\text{demand from unskilled households}} \quad \forall q_n \in Q. \qquad (4)$$

The first term is the demand for quality $q_n$ from skilled households, which is the product of the number of skilled households ($Lh$), the share of skilled households who choose quality $q_n$ ($\rho(q_n \mid w_S)$), and the quantity consumed by each skilled household who consumes quality $q_n$ ($w_S / P_{q_n}$). Similarly, the second term is the demand for quality $q_n$ from unskilled households.
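Equation (4) translates directly into code. The sketch below uses hypothetical utility constants, quality indexes, and prices; $h = 0.24$ and $w_S = 1.6$ echo the calibration in Section 4. It also checks a useful implication: because each household spends its whole income on its chosen quality, total spending $\sum_n P_{q_n} C_{q_n}$ equals aggregate income $L[h w_S + (1-h) w_U]$.

```python
import numpy as np

def logit_shares(w, a, q, P):
    """Equation (3): share of households with wage w choosing each quality level."""
    v = a + q * np.log(w / P)
    e = np.exp(v - v.max())
    return e / e.sum()

def aggregate_demand(L, h, wS, wU, a, q, P):
    """Equation (4): total quantity demanded of each quality level."""
    skilled = L * h * logit_shares(wS, a, q, P) * wS / P
    unskilled = L * (1 - h) * logit_shares(wU, a, q, P) * wU / P
    return skilled + unskilled

L, h, wS, wU = 1.0, 0.24, 1.6, 1.0       # L is a normalization; h and wS echo the calibration
a = np.array([0.0, -0.5, -1.0])          # hypothetical utility constants
q = np.array([1.0, 1.5, 2.0])            # hypothetical quality indexes
P = np.array([1.0, 1.3, 1.7])            # hypothetical prices
C = aggregate_demand(L, h, wS, wU, a, q, P)
```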

In summary, the consumers choose between different quality levels and complementarity between quality and quantity implies that richer households are more likely to consume higher quality. This non-homotheticity with respect to quality will help match the patterns seen in Table 1 (richer households buy higher price goods).

3.2. Final Goods Producers

There are N competitive final goods producers, one for each quality level. In addition to the vertical differentiation across quality levels, there is horizontal differentiation in products within a quality level. The final goods producer of quality qn combines intermediate varieties (horizontal differentiation) of quality qn to produce the composite final good of that quality. Each final goods producer has a constant elasticity of substitution (CES) production function given by

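$$Y^s_{q_n} = \left(\frac{1}{M_{q_n}}\right)^{\frac{1}{\sigma-1}} \left(\sum_{i=1}^{M_{q_n}} x_{i,q_n}^{\frac{\sigma-1}{\sigma}}\right)^{\frac{\sigma}{\sigma-1}} \quad \forall q_n \in Q,$$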

where $i$ indexes varieties, $M_{q_n}$ is the number of varieties (or plants) of quality $q_n$ present in the economy, which is determined by free entry, $x_{i,q_n}$ is the quantity of variety $i$ of quality $q_n$ used by the final goods producer of quality $q_n$,22 and $\sigma$ is the elasticity of substitution between different varieties of the same quality.

The multiplicative factor $(1/M_{q_n})^{\frac{1}{\sigma-1}}$ in the production function scales out the love of variety from the CES production function. This ensures that the price differences between quality levels do not reflect differences in the number of varieties available. I maintain this assumption of no love of variety in the baseline specification for two reasons. First, assuming no love of variety is the conservative choice, as changes in the size distribution in the counterfactual exercises are smaller in this case than in the case with love of variety. Second, allowing for love of variety makes the changes in the size distribution in the counterfactual sensitive to the average level of the quality indexes $q_n$, which is a difficult parameter to calibrate as it represents the own-price elasticity of each quality level with respect to the unobserved CES price index of that quality.23 Therefore, while the baseline results presented in Section 5.1 maintain the assumption of no love of variety, Section 5.3 provides results when allowing for love of variety and further discusses the sensitivity of the results to the average level of the quality indexes $q_n$.
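The effect of the scaling factor is easy to see numerically. In the sketch below, a fixed total quantity of intermediates is split equally across $M$ varieties; with the $(1/M)^{1/(\sigma-1)}$ factor, output equals that total for every $M$, so adding varieties does not raise output. σ = 5 follows the calibration; the quantities are hypothetical.

```python
import numpy as np

def ces_output(x, sigma):
    """Y = M^(-1/(sigma-1)) * (sum_i x_i^((sigma-1)/sigma))^(sigma/(sigma-1))."""
    M = len(x)
    r = (sigma - 1) / sigma
    return M ** (-1 / (sigma - 1)) * np.sum(x ** r) ** (1 / r)

sigma = 5.0
total_input = 120.0
outputs = {M: ces_output(np.full(M, total_input / M), sigma) for M in (1, 10, 100)}
print(outputs)   # each value is 120.0 up to floating-point rounding
```

Dropping the factor would instead give $M^{1/(\sigma-1)}$ times the total input, which is the love-of-variety effect that the baseline specification removes.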

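The final quality producers take the prices of intermediate varieties, $p_{i,q_n}$, as given and solve their cost minimization problem

$$\min_{\{x_{i,q_n}\}} \sum_{i=1}^{M_{q_n}} p_{i,q_n} x_{i,q_n} \quad \text{s.t.} \quad Y^s_{q_n} = \left(\frac{1}{M_{q_n}}\right)^{\frac{1}{\sigma-1}} \left(\sum_{i=1}^{M_{q_n}} x_{i,q_n}^{\frac{\sigma-1}{\sigma}}\right)^{\frac{\sigma}{\sigma-1}} \quad \forall q_n \in Q.$$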

This yields their demand curves

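$$x_{i,q_n} = \frac{p_{i,q_n}^{-\sigma} \, M_{q_n}^{\frac{1}{\sigma-1}} \, Y^s_{q_n}}{\left(\sum_{i=1}^{M_{q_n}} p_{i,q_n}^{1-\sigma}\right)^{\frac{\sigma}{\sigma-1}}} \quad \forall q_n \in Q, \qquad (5)$$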

which the intermediate producers take as given. The final quality producers make zero profits, and the price that they charge consumers is given by

$$P_{q_n} = \frac{\sum_{i=1}^{M_{q_n}} p_{i,q_n} x_{i,q_n}}{Y^s_{q_n}} \quad \forall q_n \in Q.$$

Given the assumption of no love of variety, Pqn will be independent of the number of varieties Mqn available in the economy.
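As a check on equation (5) and on the claim that $P_{q_n}$ does not depend on $M_{q_n}$, the sketch below verifies that the demands in (5) deliver the target output and that, when all varieties share the same price, the composite price equals that price for any number of varieties. The variety prices and target output are hypothetical.

```python
import numpy as np

def ces_output(x, sigma):
    """CES aggregator with the love of variety scaled out."""
    M = len(x)
    r = (sigma - 1) / sigma
    return M ** (-1 / (sigma - 1)) * np.sum(x ** r) ** (1 / r)

def ces_demand(p, Y, sigma):
    """Equation (5): cost-minimizing demand for each variety given target output Y."""
    M = len(p)
    denom = np.sum(p ** (1 - sigma)) ** (sigma / (sigma - 1))
    return p ** (-sigma) * M ** (1 / (sigma - 1)) * Y / denom

def composite_price(p, sigma, Y=1.0):
    """Final-good price: total spending on varieties divided by output."""
    x = ces_demand(p, Y, sigma)
    return np.dot(p, x) / Y

sigma = 5.0
p = np.array([0.9, 1.0, 1.2])            # hypothetical variety prices
x = ces_demand(p, 10.0, sigma)
```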

3.3. Intermediate Goods Producers

Each variety of each quality is produced by a monopolistically competitive intermediate producer. The intermediate producers combine skilled and unskilled labor and their production function is given by

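$$x(A_{i,q_n}) = A_{i,q_n} \left(\theta_{q_n} \left(l^U_{i,q_n}\right)^{\frac{\sigma_{su}-1}{\sigma_{su}}} + (1-\theta_{q_n}) \left(l^S_{i,q_n}\right)^{\frac{\sigma_{su}-1}{\sigma_{su}}}\right)^{\frac{\sigma_{su}}{\sigma_{su}-1}}, \qquad (6)$$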

where $l^U_{i,q_n}$ is the quantity of unskilled labor hired by the variety $i$ producer of quality $q_n$, $l^S_{i,q_n}$ is the quantity of skilled labor hired by the variety $i$ producer of quality $q_n$, $\sigma_{su}$ is the elasticity of substitution between the two types of labor, $A_{i,q_n}$ is the idiosyncratic productivity level of the variety $i$ producer of quality $q_n$, and $\theta_{q_n}$ is the share parameter of unskilled labor for quality $q_n$ producers.

Solving the cost minimization problem of the intermediate goods producer subject to the production function given in equation (6) yields the marginal cost of production for variety i of quality qn which is given by

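$$k(A_{i,q_n}) = \frac{1}{A_{i,q_n}} \left(\theta_{q_n}^{\sigma_{su}} \left(\frac{1}{w_U}\right)^{\sigma_{su}-1} + (1-\theta_{q_n})^{\sigma_{su}} \left(\frac{1}{w_S}\right)^{\sigma_{su}-1}\right)^{-\frac{1}{\sigma_{su}-1}}.$$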

The marginal cost is a function of the skilled and unskilled wages and is inversely proportional to the productivity level $A_{i,q_n}$.
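The closed-form marginal cost can be verified by brute force. This sketch minimizes $w_U l^U + w_S l^S$ along the unit isoquant of equation (6) with a ternary search (valid because cost is convex along a CES isoquant) and compares the result with the formula above. $\theta_{q_n}$ and $A_{i,q_n}$ are hypothetical; $w_S = 1.6$ and $\sigma_{su} = 1.75$ follow the calibration.

```python
import numpy as np

def marginal_cost(A, theta, wU, wS, s):
    """Closed-form unit cost of the CES technology in equation (6); s = sigma_su."""
    inner = theta**s * (1 / wU) ** (s - 1) + (1 - theta) ** s * (1 / wS) ** (s - 1)
    return inner ** (-1 / (s - 1)) / A

def numerical_unit_cost(A, theta, wU, wS, s, iters=200):
    """Minimize wU*lU + wS*lS along the isoquant x = 1 by ternary search over lU."""
    r = (s - 1) / s
    target = (1 / A) ** r                  # theta*lU^r + (1-theta)*lS^r must equal this
    lU_max = (target / theta) ** (1 / r)   # lU at which the required lS would reach zero

    def cost(lU):
        lS = ((target - theta * lU**r) / (1 - theta)) ** (1 / r)
        return wU * lU + wS * lS

    lo, hi = 0.0, lU_max
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if cost(m1) < cost(m2):
            hi = m2
        else:
            lo = m1
    return cost((lo + hi) / 2)

A, theta, wU, wS, s = 1.3, 0.6, 1.0, 1.6, 1.75   # theta and A are hypothetical
closed = marginal_cost(A, theta, wU, wS, s)
numeric = numerical_unit_cost(A, theta, wU, wS, s)
```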

Intermediate quality producers will take the demand curve of final quality producers (equation 5) as given and will maximize profits. As the demand curve of final quality producers is of the constant elasticity form, the optimal price charged by intermediate producers will be a constant markup over marginal cost and is given by

$$p(A_{i,q_n}) = \frac{\sigma}{\sigma-1} \, k(A_{i,q_n}). \qquad (7)$$
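The constant markup in equation (7) can be confirmed by maximizing profit against a constant-elasticity demand curve $x = B p^{-\sigma}$, where $B$ is a hypothetical shifter standing in for the terms in equation (5) that a single producer takes as given. With σ = 5, the markup is $\sigma/(\sigma-1) = 1.25$, the 25 percent markup used in the calibration.

```python
import numpy as np

def optimal_price_numerical(k, sigma, B=1.0):
    """Grid search for the price maximizing (p - k) * B * p**(-sigma)."""
    prices = np.linspace(k, 3 * k, 200_001)
    profit = (prices - k) * B * prices ** (-sigma)
    return prices[np.argmax(profit)]

k, sigma = 0.8, 5.0                  # hypothetical marginal cost; sigma = 5 as calibrated
p_star = optimal_price_numerical(k, sigma)
p_formula = sigma / (sigma - 1) * k  # equation (7)
```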

Starting an intermediate goods plant of quality $q_n$ requires $f_{q_n}$ units of labor. A share $\alpha_{q_n}$ of this entry labor must be skilled, and this share differs across quality levels. On paying the fixed cost $f_{q_n}$, entrants receive a productivity draw from a log-normal distribution given by

$$\log(A_{i,q_n}) \sim \mathcal{N}(\mu_{q_n}, \nu^2),$$

where $g_{q_n}$ denotes the implied density of $A_{i,q_n}$.

Note that the mean of the log of the productivity draw can differ across quality levels but the variance is the same.

Free entry requires that the fixed cost paid equal the ex-ante expected profit, i.e.

$$\alpha_{q_n} f_{q_n} w_S + (1-\alpha_{q_n}) f_{q_n} w_U = \int \pi(A_i, q_n) \, g_{q_n}(A_i) \, dA_i \quad \forall q_n \in Q, \qquad (8)$$

where π(Ai, qn) is the flow profit earned by an intermediate quality producer of quality qn with productivity draw Ai and is given by

$$\pi(A_i, q_n) = \left[p(A_i, q_n) - k(A_i, q_n)\right] x(A_i, q_n).$$

The number of varieties Mqn will adjust to ensure that the free entry condition holds for all quality levels.
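Under the log-normal productivity assumption, the right-hand side of equation (8) has a closed form once demand takes the constant-elasticity form $x = B p^{-\sigma}$: profit is proportional to $A^{\sigma-1}$, and $E[A^{\sigma-1}] = \exp((\sigma-1)\mu + (\sigma-1)^2 \nu^2/2)$. The sketch below treats the demand shifter $B$ as the object that adjusts to satisfy free entry. This is a simplification: in the model it is $M_{q_n}$ that adjusts, which in turn moves $B$. The closed form is checked by Monte Carlo, and all parameter values are hypothetical.

```python
import numpy as np

def expected_profit_closed_form(B, c, sigma, mu, nu):
    """E[pi(A)] with log A ~ N(mu, nu^2), marginal cost k = c / A, demand x = B p^(-sigma)."""
    # With p = sigma/(sigma-1) * k, profit is (B/sigma) * p^(1-sigma), proportional to A^(sigma-1).
    scale = (B / sigma) * (sigma / (sigma - 1) * c) ** (1 - sigma)
    return scale * np.exp((sigma - 1) * mu + 0.5 * (sigma - 1) ** 2 * nu**2)

def expected_profit_monte_carlo(B, c, sigma, mu, nu, n, rng):
    """Average of (p - k) * x over simulated log-normal productivity draws."""
    A = np.exp(rng.normal(mu, nu, n))
    k = c / A
    p = sigma / (sigma - 1) * k
    x = B * p ** (-sigma)
    return np.mean((p - k) * x)

sigma, c, mu, nu = 5.0, 1.0, 0.0, 0.3    # hypothetical values, not the calibrated ones
f, alpha, wS, wU = 2.0, 0.3, 1.6, 1.0    # hypothetical entry requirement; wS echoes the calibration
entry_cost = alpha * f * wS + (1 - alpha) * f * wU        # left-hand side of equation (8)
B_star = entry_cost / expected_profit_closed_form(1.0, c, sigma, mu, nu)
rng = np.random.default_rng(7)
mc = expected_profit_monte_carlo(B_star, c, sigma, mu, nu, 500_000, rng)
```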

If fixed costs are larger for higher quality levels than for lower quality levels, then for the free entry condition to hold, the scale of production $x(A_i, q_n)$ must be larger for higher quality producers. Furthermore, if $\theta_{q_n} < \theta_{q_m}$ for all $n > m$, then higher quality producers use skilled labor more intensively and have a higher cost of production. Finally, differences in $\mu_{q_n}$ also translate into price differences between quality levels, as marginal costs and prices are inversely proportional to productivity.

3.4. Equilibrium

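The equilibrium in this economy is a set of prices $\left(w_S, \{\{p_{i,q_n}\}_{i \in M_{q_n}}, P_{q_n}\}_{q_n \in Q}\right)$, allocations $\{\{c_{j,q_n}\}_{j \in L}, C_{q_n}, \{x_{i,q_n}\}_{i \in M_{q_n}}, Y_{q_n}\}_{q_n \in Q}$, and masses of entrants $M_{q_n}$ such that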

  • Given prices Pqn, wages, and draws of the random utility component (εj,qn), consumers choose their optimal quality level (equations 3 and 4 hold)

  • Given prices, final quality producers demand optimal amounts of intermediate goods (demand follows equation 5)

  • Intermediate good producers maximize profits (charge the constant markup price given by equation 7)

  • Free entry conditions hold for all quality levels (equation 8)

  • Markets clear

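$$Y_{q_n} = C_{q_n} \quad \forall q_n \in Q$$

$$L(1-h) = \sum_{q_n} M_{q_n} \int l^U(A_i, q_n) \, g_{q_n}(A_i) \, dA_i + \sum_{q_n} M_{q_n} (1-\alpha_{q_n}) f_{q_n}$$

$$Lh = \sum_{q_n} M_{q_n} \int l^S(A_i, q_n) \, g_{q_n}(A_i) \, dA_i + \sum_{q_n} M_{q_n} \alpha_{q_n} f_{q_n}$$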

The last two equations are the labor market clearing conditions. The second-to-last equation says that the demand for unskilled labor, for production by the intermediate producers (summed over all quality levels) and for entry, must equal the supply of unskilled labor. Similarly, the last equation says that the demand for skilled labor from intermediate producers and entry requirements must equal the supply of skilled workers.

4. Calibration

I now calibrate the model to match the cross-sectional facts documented in Section 2 and some additional moments taken from the Indian data. I then conduct counterfactual exercises in which I simulate differences in per-capita income levels in the model and see how this affects the size distribution. The key parameters which determine the change in size distribution in the counterfactual exercises are the degree of non-homotheticity (Δ) on the consumer side and the price-size relation on the producer side. These parameters are calibrated independently of the aggregate relation between the share of employment in small plants and income levels seen across Indian states (which is what I want to explain in the counterfactual). In particular, I use the micro-facts documented in Section 2 (richer households buy higher priced goods and larger plants produce higher priced goods) to discipline these parameters of the model.

4.1. Production Parameters

For the calibration, I define an individual with less than ten years of education as unskilled. h, the share of the labor force which is skilled, is set to 0.24, which is the share of manufacturing workers with at least ten years of education in India in 2004-05. σsu, the elasticity of substitution between skilled and unskilled workers in the intermediate goods production function (equation 6), is assumed to be 1.75 which is in the range of estimates for developing countries in Behar (2009).

The elasticity of substitution between varieties for the final goods producer, σ, is set to 5, which implies a markup over marginal cost of 25 percent for the intermediate producers and is in the range of estimates in Broda and Weinstein (2006).

This leaves five sets of parameters to be calibrated on the production side: (1) $f_{q_n}$, the fixed cost for each quality level; (2) $\theta_{q_n}$, the share of unskilled workers in the production function for each quality level; (3) $\mu_{q_n}$, the mean of the log of the productivity draw for each quality level; (4) $\alpha_{q_n}$, the share of skilled labor needed for entry for each quality level; and (5) $\nu^2$, the variance of the productivity draw, which is common across all quality levels. These parameters (along with the utility parameters) are jointly calibrated, as there is no one-to-one mapping between the parameters and the target moments. However, for expositional purposes, I explain the calibration of each parameter in terms of the moments which are most informative about it.

The number of quality levels, N, is set to twelve.24

The fixed costs, $f_{q_n}$, determine the average scale of operation of the intermediate producers of each quality level. A larger fixed cost means that the average size (in terms of output and employment) of intermediate producers must be larger for the free entry condition to hold. As shown in Section 2.2, larger plants tend to produce higher price products, which is indicative of higher quality goods being produced in larger plants. Therefore, the fixed costs are chosen such that the average employment (skilled plus unskilled workers) of intermediate producers of the lowest quality level is 1.25 workers and each higher quality level has double the average size of the previous one, i.e. the average employment of the intermediate producers of the different quality levels is $size_{q_n} = \{1.25, 2.5, 5, \ldots, 2560\}$.25
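The size ladder above is a geometric sequence: twelve quality levels, each with double the average employment of the previous one, starting at 1.25 workers. A one-line sketch confirms that it ends at 2560.

```python
# Twelve quality levels: average plant size starts at 1.25 workers and doubles each step.
sizes = [1.25 * 2**n for n in range(12)]
print(sizes[0], sizes[-1])   # 1.25 2560.0
```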

The levels of the $\theta_{q_n}$ determine the demand for unskilled labor relative to skilled labor and are informative about the wage premium, $w_S$, in the economy. The ratio of unskilled to skilled workers in any quality level relative to the lowest quality level is also a function of the $\theta_{q_n}$ and is given by

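$$\text{ratio}^{U,S}_{q_n} = \frac{L^U_{q_n}/L^S_{q_n}}{L^U_{q_1}/L^S_{q_1}} = \left(\frac{\theta_{q_n}}{1-\theta_{q_n}}\right)^{\sigma_{su}} \Big/ \left(\frac{\theta_{q_1}}{1-\theta_{q_1}}\right)^{\sigma_{su}} \quad \forall q_n \in Q. \qquad (9)$$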

Therefore, the twelve θqns are chosen to match a target for the wage premium and eleven targets for unskilled to skilled ratio in different quality levels relative to the lowest quality level.
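Equation (9) can be inverted quality by quality: given $\theta_{q_1}$, each target relative ratio pins down the odds $\theta_{q_n}/(1-\theta_{q_n})$. In the paper $\theta_{q_1}$ is determined jointly with the wage premium target, so the value below is only a placeholder, and the target ratios are hypothetical rather than the Table 6 numbers. σ_su = 1.75 follows the calibration.

```python
import numpy as np

def relative_ratio(theta, sigma_su):
    """Equation (9): (L^U/L^S) of each quality relative to the lowest quality."""
    odds = (theta / (1 - theta)) ** sigma_su
    return odds / odds[0]

def thetas_from_ratios(theta1, ratios, sigma_su):
    """Invert equation (9): recover theta_qn from theta_q1 and target relative ratios."""
    odds1 = theta1 / (1 - theta1)
    odds = odds1 * np.asarray(ratios) ** (1 / sigma_su)
    return odds / (1 + odds)

sigma_su = 1.75
targets = np.array([1.0, 0.8, 0.65, 0.55, 0.5])      # hypothetical relative ratios
theta = thetas_from_ratios(0.9, targets, sigma_su)   # theta_q1 = 0.9 is a placeholder
```

Falling targets translate into falling $\theta_{q_n}$, i.e. more skill-intensive production at higher quality levels.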

The targets for these moments are obtained from the Employment-Unemployment Survey conducted by the NSS in 2004-05 (see Section 2.3 and Appendix A.4 for details about the dataset).

The target for the wage premium is set at 1.6 and is obtained from running Mincerian regressions on data from the Employment-Unemployment Survey.26 Table 6 gives the ratio of unskilled to skilled workers for three different size categories, along with the ratio relative to the smallest size category, as computed from the Employment-Unemployment Survey. Smaller plants have a much higher ratio of unskilled to skilled workers, indicating that low quality producers have higher $\theta_{q_n}$. Unfortunately, the size categories reported in the Employment-Unemployment Survey are very coarse and therefore cannot be used to compute eleven ratios for equation (9) for eleven different quality (size) levels. I use the first two data points reported in Table 6 for the unskilled to skilled ratio (column 1) and extrapolate the relation to larger sizes (with a minimum of 0.5) to compute eleven ratios, one for each quality (size) level.

Table 6:

Unskilled to Skilled Ratio for Different Size Categories

Notes: The data is from the Employment-Unemployment Survey of 2004-05. The rows of the table represent the size category of the establishment in which an individual works. The first column gives the ratio of unskilled to skilled workers in each size category, where a skilled worker is defined as an individual with at least ten years of education. The second column gives this ratio relative to the smallest size category.

$\mu_{q_n}$, the mean of the log of the productivity draw for each quality level, is informative about the average price of each quality level, as $p(A_{i,q_n}) \propto 1/A_{i,q_n}$. If the mean of the productivity draw for a particular quality is high, then the average price of that quality level will be lower. Therefore, the $\mu_{q_n}$ for each quality level are chosen to match the price-size relation seen in Table 3 (see footnote 27).

The share of skilled labor needed for entry for each quality level, αqn, is chosen to match the share of skilled labor used in the production of that quality. Therefore, high quality producers use a more skill intensive production process (lower θqn) and also have more skill intensive entry requirement.28

Finally, $\nu^2$, the variance of the log of the productivity draw (common across qualities), is chosen to match the standard deviation of the log of employment in the combined ASI and SUM dataset, which was 0.64.

4.2. Utility Parameters

The utility function in the model takes the form

$$u_{j,q_n}(c_{j,q_n}, \varepsilon_{j,q_n}) = a_{q_n} + q_n \log(c_{j,q_n}) + \varepsilon_{j,q_n} \quad \forall q_n \in Q. \qquad (10)$$

Two sets of parameters need to be calibrated: (1) qn, the quality indexes; and (2) aqn, the quality specific constant in the utility function.

As mentioned in Section 3.1, the quality indexes are parametrized as follows: $q_1 = 1$ and $q_n = q_{n-1} + \Delta$.29 The value of Δ determines the steepness of the quality Engel curve, that is, how quickly demand moves to higher quality as income levels increase. In the model, skilled workers earn wage $w_S$ (which is calibrated to be 1.6) and unskilled workers earn wage $w_U$ (which is normalized to one as the numeraire). Δ is chosen to match the price-income relation documented in Table 1 of Section 2.1. In particular, Δ is chosen such that the price-income elasticity in the model is 0.1, i.e. the average log price paid by skilled households is $0.1 \times \log(w_S/w_U)$ higher than that paid by unskilled households. As higher quality producers in the model charge higher prices, this in effect determines the extent to which demand shifts towards high quality as we move from unskilled wages to skilled wages.
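The targeted moment can be written as a function of Δ. The sketch below computes the gap in average log price between skilled and unskilled households, divided by $\log(w_S/w_U)$, for given prices and utility constants; a calibration routine would then search over Δ until this equals 0.1. The prices and constants here are hypothetical and held fixed, whereas in the full model they are equilibrium objects.

```python
import numpy as np

def logit_shares(w, a, q, P):
    """Equation (3): share of households with wage w choosing each quality level."""
    v = a + q * np.log(w / P)
    e = np.exp(v - v.max())
    return e / e.sum()

def price_income_elasticity(delta, a, P, wS=1.6, wU=1.0):
    """(E[log P | wS] - E[log P | wU]) / log(wS / wU), averaging over households."""
    q = 1.0 + delta * np.arange(len(P))          # q_1 = 1, q_n = q_{n-1} + delta
    logP = np.log(P)
    gap = np.dot(logit_shares(wS, a, q, P), logP) - np.dot(logit_shares(wU, a, q, P), logP)
    return gap / np.log(wS / wU)

a = np.zeros(4)                                  # hypothetical utility constants
P = np.array([1.0, 1.2, 1.45, 1.75])             # hypothetical, increasing in quality
for delta in (0.0, 0.25, 0.5, 1.0):
    print(delta, round(price_income_elasticity(delta, a, P), 4))
```

With Δ = 0 the wage cancels out of the choice probabilities, so the moment is zero; any positive Δ makes richer households tilt toward the higher-priced qualities.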

The quality specific constant in the utility function, aqn, determines the absolute levels of demand for different quality levels i.e. it determines ρ(qn|w) given in equation (3). A higher aqn for a specific quality means that a larger share of households are likely to buy that quality (irrespective of income level). Therefore, I choose aqn such that the size distribution in the model matches the size distribution for India as a whole in 2005-06.

In summary, the $a_{q_n}$ pin down the absolute level of demand for the different qualities and are calibrated so that the size distribution in the model matches the Indian data. Δ determines the differences in demand for high versus low quality levels between skilled and unskilled workers and is calibrated to match the price-income elasticity seen in the data.

Table 7 summarizes the calibration. Figure 6 plots the share of workers in plants of different size categories for the calibrated model and the data (combining the ASI and the SUM for 2005-06). As the model parameters were chosen to match the size distribution, it is not surprising to see that the size distribution in the model matches the data very closely. However, the model was not calibrated to match the change in size distribution as income levels change. The extent to which the size distribution changes in the model as income levels change depends crucially on the degree of non-homotheticity (Δ) on the consumer side and the price-size relation on the producer side and these parameters were calibrated using micro-data from consumer and producer surveys.

Table 7:

Calibration

Figure 6:

Size Distribution - Data vs Model


Notes: The figure plots the share of employment in different size categories in the data and in the calibrated baseline of the model. The data is for the manufacturing sector in India for 2005-06. It combines the ASI and the SUM (same as Figure 1).

5. Results

Having calibrated the model, I now conduct counterfactual exercises in which I simulate differences in per-capita income levels in the model and see how this affects the size distribution. In addition to the counterfactual exercises, I also explore the sensitivity of the results to some important parameters.

5.1. Cross-section of Indian States

I now ask the question: How much of the cross-state differences in the size distribution seen in the data can be explained by the model if per-capita income in the model varies by the same amount as it varies across Indian states? To do this I conduct counterfactual exercises in which I vary three sets of parameters in the model while keeping all the other parameters unchanged:

  1. The share of the households in the model who are skilled, h, is varied in the counterfactual exercises to match the share of workers with ten or more years of education across rich and poor states. About 13 percent of the manufacturing workers in the poorest states are skilled as compared to 43 percent in the richest states.

  2. The share parameter of unskilled labor for intermediate producers, θqn, is changed across the counterfactuals to keep the wage premia unchanged.30 This can be viewed as skill biased technical change, with richer states having a higher supply of skilled labor and also using skilled labor more intensively in the production of all quality levels.31

  3. The mean of the productivity draw of intermediate producers, μqn, is changed to match the differences in per-capita income across states and to maintain the price-size slope of 0.1 across the counterfactuals.32 Per-capita income of the poorest Indian state (Bihar) is 0.39 times India’s per-capita income while that of the richest state (Maharashtra) is 1.57 times India’s per-capita income. To generate similar differences in per-capita income in the model, the poorer states in the counterfactual exercise have lower average productivity levels compared to the richer states.33

To summarize, three sets of parameters are changed in the counterfactual exercises: the share of skilled in the population, the skill intensity of the production process, and the means of the productivity draws of intermediates. These parameters are changed to match the differences in skill composition and per-capita income levels across Indian states while keeping the wage premia and the relative prices of different quality levels unchanged.34

An increase in the productivity of intermediate producers and in the supply of skill translates into an increase in real income levels in the model. The increase in real income level leads to demand shifting towards higher quality goods due to the non-homotheticity in the preferences. This change in demand leads to a shift in the production side. The number of plants producing low quality goods declines while those producing high quality increases. This in turn implies that there is a shift in the size distribution with the share of employment in small plants falling.

The red dashed line in Figure 7 plots the share of employment in plants of size five or less predicted by the model in the counterfactual exercises. In the calibrated baseline, the share of employment in small plants in the model is 63.9 percent. When productivity and the supply of skill are lowered such that per-capita income is 0.39 times its baseline level (0.94 log points lower), the share of employment in plants of size five or less rises to 75.6 percent. On the other hand, when productivity and the supply of skill are increased such that per-capita income is 1.57 times its baseline level (0.45 log points higher), the share of employment in small plants falls to 56.3 percent.

Figure 7:

Counterfactual Across Indian States - Data vs Model

Citation: IMF Working Papers 2014, 236; 10.5089/9781498334396.001.A001

Notes: The figure plots the share of employment in plants of size five or less across Indian states in the data and for the counterfactual exercise in the model. The blue line is the linear regression line of share of employment in plants of size five or less in different Indian states on log of per-capita GDP of the state. The red line is the model predicted share of employment in plants of size five or less when conducting the counterfactual exercise.

The solid blue line in Figure 7 plots the projection from a linear regression of the share of employment in plants of size five or less on the log of per-capita State NDP across Indian states. The share of employment in small plants is computed by combining the ASI and the SUM (the same data as in Figure 2). In the data, the poorest Indian states have about 91.9 percent of employment in small plants, while the richest have 47.2 percent.

While the share of employment in small plants varies by 44.7 percentage points across Indian states in the data, the model predicts a 19.3 percentage point difference. Therefore, the model explains about 43 percent of the difference in the share of employment in small plants seen across Indian states.

Figure 8 compares how the entire size distribution (as opposed to just the share of employment in plants of size five or less) changes in the model as compared to the data as we change income levels. In the data, I pool together the three poorest states and the three richest states and compute the share of employment in different size categories for these groups of states.35

Figure 8:

Counterfactual: Changes in Distribution for 3 Richest vs 3 Poorest States


Notes: The figure plots the share of employment in the three poorest states minus the share in the three richest states for different size categories in the data and in the model (when productivity and skill levels are varied to match the differences in per-capita income across these groups of states). The data is from the ASI and SUM for 2005-06.

The light blue bars in Figure 8 show the difference (in percentage points) in the share of employment between the three poorest and the three richest states for each size category. The poorest states have about 36 percentage points more employment in plants of size five or less than the richest states. The richer states have a larger share of their employment in every larger size category than the poor states, which is why the blue bars lie below zero for all these size categories. The red bars show the same difference in employment shares by size category as predicted by the model when productivity and skill levels are varied to match the income differences across these groups of states. The model predicts that the share of employment in plants of size five or less is about 15 percentage points higher in the poorer states than in the richer states, which again accounts for about 42 percent of the difference seen in the data. As in the data, the red bars lie below zero for all the other size categories, indicating that the model predicts a larger share of employment in richer states for these categories.
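The "share explained" figures above are simple ratios of the model's predicted poor-minus-rich gap to the corresponding gap in the data. A minimal check using only the numbers quoted in this section (the helper name `share_explained` is illustrative, not from the paper):

```python
def share_explained(model_gap, data_gap):
    """Fraction of the poor-minus-rich gap in the data that the model reproduces."""
    return model_gap / data_gap


# Figure 7: model 75.6% vs 56.3%; fitted data line 91.9% vs 47.2% (percent).
figure7 = share_explained(75.6 - 56.3, 91.9 - 47.2)
# Figure 8: plants of size five or less, three poorest minus three richest states (pp).
figure8 = share_explained(15, 36)
print(round(100 * figure7), round(100 * figure8))  # -> 43 42
```

Because the Figure 8 inputs are themselves rounded ("about 15" and "about 36"), the result should be read as approximate.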

5.2. India Over Time

I now look at how well the model explains the evolution of the size distribution of manufacturing plants in India over time. Five waves of the Survey of Unorganized Manufacturing (SUM) were conducted in India between 1989-90 and 2010-11. These can be combined with the corresponding years of the Annual Survey of Industries (ASI) to obtain five data points on how the size distribution has evolved in India.

The bars in Figure 9 show the share of employment in plants of size five or smaller for 1989, 1994, 2000, 2005, and 2009.36 As can be seen, the share of employment in small plants has decreased from 77 percent of total employment in 1989 to 58 percent in 2009.

Figure 9:

Counterfactual India Over Time - Data vs Model


Notes: The red bars in the figure plot the share of employment in plants of size five or less for five years for India. The data for each year pool the SUM and the ASI for that year. The blue line plots the model predicted share of employment for each year when productivity and skill levels are varied to match the differences in per-capita income in India over time.

Per-capita income in 1989 was 0.54 times the 2005 level, while the share of manufacturing workers with ten or more years of schooling was just 14 percent. In 2009, per-capita income was 1.30 times the 2005 level, while the share of manufacturing workers with ten or more years of schooling had increased to 31 percent. The blue line in Figure 9 plots the share of employment in plants of size five or less predicted by the model when productivity and skill supply in the model are varied to match the differences in per-capita income levels and in the share of skilled workers in the data. The model was calibrated to match the share of employment in small plants in 2005; therefore, the fit in 2005 is very good by construction. The model predicts that 72 percent of employment would be in plants of size five or less in 1989, somewhat less than the 77 percent seen in the data. Similarly, the model slightly under-predicts the change in the size distribution from 2005 to 2009. Overall, the model predicts 65 percent of the change in the share of employment in small plants seen in the data between 1989 and 2009.37

5.3. Parameter Sensitivity: Love of Variety

As mentioned in Section 3.2, the baseline specification of the model assumed that the final-goods producer's production function exhibits no love of variety. A generalization of the production function of the final-goods producer of quality $q_n$ is given by

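$$Y_{q_n}^{s} = \frac{1}{M_{q_n}^{\eta}} \left( \sum_{i=1}^{M_{q_n}} x_{i,q_n}^{\frac{\sigma-1}{\sigma}} \right)^{\frac{\sigma}{\sigma-1}}, \quad \forall q_n \in Q.$$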

In the baseline specification, η was set equal to $\frac{1}{\sigma-1}$, which corresponds to the case of no love of variety. In this section, I provide results for the case when η = 0 (the case with full love of variety) and compare them to the baseline. As mentioned in Section 3.2, the no-love-of-variety assumption is the conservative case: the changes in the size distribution in the counterfactual are larger when we allow for love of variety. Furthermore, when allowing for love of variety, the results become more sensitive to the choice of $q_1$, the quality index of the lowest quality level (note that given $q_1$, all subsequent quality indexes are given by the recursion $q_n = q_{n-1} + \Delta$).
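To see why η = 1/(σ − 1) corresponds to no love of variety, one can evaluate the aggregator with M identical inputs x: output is then $M^{\sigma/(\sigma-1) - \eta} x$, which collapses to total input $Mx$ in the baseline but grows faster than M when η = 0. A minimal Python sketch of this check; σ = 4 is an arbitrary illustrative value rather than the paper's calibration, and the function name `final_output` is hypothetical.

```python
def final_output(inputs, sigma, eta):
    """Aggregate intermediate inputs x_i into final output Y for one quality level.

    Y = M^(-eta) * (sum_i x_i^((sigma-1)/sigma))^(sigma/(sigma-1)), with M = len(inputs).
    """
    m = len(inputs)
    rho = (sigma - 1) / sigma
    ces = sum(x ** rho for x in inputs) ** (1 / rho)
    return m ** (-eta) * ces


sigma = 4.0  # illustrative elasticity of substitution (assumption)
for m in (1, 2, 10):
    inputs = [1.0] * m  # M varieties, one unit of each
    baseline = final_output(inputs, sigma, eta=1 / (sigma - 1))  # no love of variety
    love = final_output(inputs, sigma, eta=0.0)                   # full love of variety
    print(f"M={m:2d}  no love of variety: {baseline:.3f}  love of variety: {love:.3f}")
```

Equivalently, holding total input fixed and splitting it across more varieties leaves output unchanged in the baseline, while with η = 0 output rises by the factor $M^{1/(\sigma-1)}$.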

Table 8 shows how much of the cross-state difference in the share of employment in small plants is explained by the model for different values of η and $q_1$. The first row and first column correspond to the baseline specification, with $\eta = \frac{1}{\sigma-1}$ (no love of variety) and $q_1 = 1$. As mentioned in Section 5.1, when productivity and the supply of skill are varied to match the differences in per-capita income across states, the model explains 43.1 percent of the difference in the share of employment in small plants seen in the data.

Table 8:

Love of Variety: Percent of Cross-State Difference Explained

Notes: The table shows the percent of cross-state variation in the share of employment in plants of size five or less that is explained by the model counterfactual for different values of η and $q_1$. $\eta = \frac{1}{\sigma-1}$ is the baseline specification of no love of variety, while η = 0 is the case of full love of variety.

Now consider the model with love of variety (η = 0). When allowing for love of variety, all other parameters are recalibrated to match the same moments as in the baseline. I then run the same counterfactual exercises as in Section 5.1. As reported in Table 8, with love of variety the model can explain 71.2 percent of the differences in the size distribution between the rich and poor states.

Why is it that in the case with love of variety, the model generates bigger changes in the size distribution in the counterfactual? The reason is that in the case with love of variety, relative prices of different quality levels change in the counterfactual, due to changes in the relative varieties of the different qualities. In particular, the CES price index (the price charged by the final producer to the consumer) for quality qn is given by

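$$P_{q_n} = M_{q_n}^{\eta - \frac{1}{\sigma-1}} \left( \int \left(p(A_{i,q_n})\right)^{1-\sigma} g_{q_n}(A_i)\, dA_i \right)^{\frac{1}{1-\sigma}}, \quad \forall q_n \in Q.$$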

In the baseline specification, because $\eta = \frac{1}{\sigma-1}$, the price index for $q_n$ is independent of the number of varieties $M_{q_n}$. However, when η = 0, the CES price index of a quality level, $P_{q_n}$, is inversely related to the number of varieties of that quality ($M_{q_n}$) available in the economy. In the counterfactual, as income levels increase, demand shifts towards higher quality, and this induces more entry at the higher quality levels. When η = 0, the increase in the number of high quality intermediate varieties causes the relative price of high quality goods to fall in the counterfactual. This causes a further shift in demand towards high quality, which in turn causes more entry into higher quality goods. This additional increase in demand for high quality, which acts through relative price changes driven by the number of varieties, does not occur in the baseline specification with $\eta = \frac{1}{\sigma-1}$. Hence, the change in the size distribution in the counterfactual is smaller in the baseline specification than in the case with love of variety. In effect, the baseline specification focuses attention on the changes in demand caused by changes in income levels alone, abstracting from any changes in relative prices caused by changes in the number of varieties in the counterfactual.
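The dependence of the price index on the number of varieties can be checked numerically. The sketch below replaces the integral over the productivity density $g_{q_n}$ with a discrete sum over M varieties, so that $P = M^{\eta}\left(\sum_i p_i^{1-\sigma}\right)^{1/(1-\sigma)}$; with identical prices p this reduces to $M^{\eta - 1/(\sigma-1)} p$, the same exponent on M as in the price-index equation. The function name and σ = 4 are illustrative assumptions.

```python
def price_index(prices, sigma, eta):
    """CES price index for one quality level with M = len(prices) varieties.

    Discrete analogue of the integral form: P = M^eta * (sum_i p_i^(1-sigma))^(1/(1-sigma)).
    """
    m = len(prices)
    return m ** eta * sum(p ** (1 - sigma) for p in prices) ** (1 / (1 - sigma))


sigma = 4.0  # illustrative value (assumption)
for m in (1, 2, 10):
    prices = [1.0] * m  # identical varieties, each priced at 1
    p_base = price_index(prices, sigma, eta=1 / (sigma - 1))  # unaffected by M
    p_love = price_index(prices, sigma, eta=0.0)               # falls as M rises
    print(f"M={m:2d}  eta=1/(sigma-1): {p_base:.3f}  eta=0: {p_love:.3f}")
```

With η = 0, a quality level that attracts more entrants becomes cheaper relative to the others, which is the extra channel that amplifies the counterfactual shift towards high quality.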

Furthermore, when allowing for love of variety, the change in the size distribution in the counterfactual becomes more sensitive to the choice of $q_1$, the quality index of the lowest quality level. When $q_1$ is set to 0.1 and η = 0, the model counterfactual explains only 53.1 percent of the difference in the size distribution, as opposed to 71.2 percent when $q_1 = 1$. As shown in Section 3.1, the share of households with wage w who choose quality level $q_n$ is given by

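$$\rho(q_n \mid w) = \frac{e^{a_{q_n}} \left(\frac{w}{P_{q_n}}\right)^{q_n}}{\sum_{i=1}^{N} e^{a_{q_i}} \left(\frac{w}{P_{q_i}}\right)^{q_i}}, \quad \forall q_n \in Q.$$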

Because $P_{q_n}$ enters the numerator through $(w/P_{q_n})^{q_n}$, the absolute level of $q_n$ approximately determines the own-price elasticity of demand for a quality level: differentiating gives $\partial \ln \rho(q_n \mid w) / \partial \ln P_{q_n} = -q_n \left(1 - \rho(q_n \mid w)\right)$. Lower absolute levels of the quality indexes imply that demand is less sensitive to changes in relative prices (of the CES price indexes). Therefore, a lower value for $q_1$ (which translates into lower values for all the quality indexes) makes the model less sensitive to the changes in relative prices induced by changes in varieties.
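A small numerical sketch illustrates the point. It implements the choice shares above with hypothetical values for the wage, the price indexes, and the taste term in the exponent (entered as one value per quality level; if that term is instead a common coefficient times $q_n$, pass those products), and compares the own-price elasticity of the lowest quality level when the indexes start at $q_1 = 1$ versus $q_1 = 0.1$ with the same step Δ = 0.5. The step size, prices, and wage are assumptions, not calibrated values.

```python
import math


def quality_shares(w, prices, qualities, tastes):
    """Share of households with wage w choosing each quality level.

    rho(q_n | w) = e^{a_n} (w / P_n)^{q_n} / sum_i e^{a_i} (w / P_i)^{q_i}
    """
    weights = [math.exp(a) * (w / p) ** q for a, p, q in zip(tastes, prices, qualities)]
    total = sum(weights)
    return [x / total for x in weights]


def own_price_elasticity(w, prices, qualities, tastes, n, h=1e-6):
    """Finite-difference d ln rho_n / d ln P_n (analytically -q_n * (1 - rho_n))."""
    base = quality_shares(w, prices, qualities, tastes)[n]
    bumped = list(prices)
    bumped[n] *= math.exp(h)  # raise ln P_n by h
    new = quality_shares(w, bumped, qualities, tastes)[n]
    return (math.log(new) - math.log(base)) / h


w, prices, tastes, delta = 2.0, [1.0, 1.5, 2.0], [0.0, 0.0, 0.0], 0.5  # hypothetical
for q1 in (1.0, 0.1):
    qualities = [q1 + k * delta for k in range(3)]  # q_n = q_{n-1} + delta
    e = own_price_elasticity(w, prices, qualities, tastes, n=0)
    print(f"q1={q1}: own-price elasticity of lowest quality = {e:.3f}")
```

With these illustrative values, shifting all quality indexes down by 0.9 shrinks the lowest level's price elasticity severalfold, so changes in the CES price indexes move demand shares much less.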

5.4. India vs US

I conduct a counterfactual in which I simulate an economy with a per-capita income level equivalent to that of the US in 2005 (seventeen times that of India) and compare the resulting size distribution with that of the calibrated baseline. I vary the levels of productivity, the supply of skill, and $\theta_{q}^{s}$ in the model to match per-capita GDP and the supply of skill in the US while keeping the wage premium and relative prices unchanged. The share of employment in plants that employ five or fewer people falls from 64 percent in the calibrated baseline to 13 percent in the counterfactual.

An important caveat applies when interpreting this cross-country result. The model was calibrated locally to India’s level of development, so a simple extrapolation to the US may be biased. The price-size relation on the producer side and the price-income relation on the consumer side are similar across states of different income levels within India but might be very different for the US. For example, in the US the producer-side relation between price and size may be flatter (or even negative), as lower quality goods might be produced in large factories while higher quality goods might be produced in small boutique organizations. Similarly, the elasticity of substitution between skilled and unskilled labor might differ considerably between the two countries. Therefore, although the US counterfactual is an interesting exercise, its results should be interpreted with much more caution than the cross-state counterfactual.

6. Inter-State Trade

The model presented above implicitly assumed that each state in India can be treated as a closed economy and that differences in income levels across states translate into differences in demand and in the size distribution at the state level. How would the possibility of inter-state trade affect the hypothesis presented in the paper?

A potential confounding effect of inter-state trade could come through the location choice of large plants. For example, if the richer states are better suited for operating large plants (due to the availability of skilled labor, better labor laws, etc.), then all the larger plants might choose to locate in these states and ship their goods to the poor states. In this case, the fact that richer states have a smaller share of employment in small plants would not reflect differences in demand across states but rather just the spatial location choice of large plants.

To address this concern, it would be ideal to have a measure of inter-state trade flows (similar to the Commodity Flow Survey in the US) to see how important this channel could be. Unfortunately, data on the extent of inter-state trade are not collected in India. Here I provide indirect evidence suggesting that inter-state trade is not entirely driving the cross-state relation seen in Figure 2.

First, transportation costs in developing countries are often very high, which makes it harder for plants to ship goods over long distances to poorer states. Atkin and Donaldson (2012) show that intranational transportation costs in two African countries are seven to fifteen times larger than similar estimates for the US. Furthermore, Hillberry and Hummels (2008) show that even in the US, manufacturing production is extremely localized, with local shipment volumes being three times larger than shipments to more distant locations. This suggests that local demand is likely to be an important determinant of the size distribution in any region, especially in developing countries.

Furthermore, if inter-state trade were driving the cross-state relation seen in Figure 2, we would expect more tradable industries to exhibit larger differences in the share of employment in small plants across states than less tradable industries. On the other hand, if the states are in fact well approximated as closed economies, we would expect the relation between the share of employment in small plants and per-capita NDP to be stronger for non-tradables. To test this, I construct two measures of tradability (within manufacturing) at the 3-digit level of the National Industrial Classification (NIC) of 2004.38 These are:

  1. Herfindahl index of geographical concentration in the US: The County Business Patterns Database of 2005 released by the United States Census Bureau provides information regarding the number of people working in each 6-digit industry of the North American Industry Classification System (NAICS) for each county in the US.39 As the tradability index is to be applied to the Indian industry classification, I first create a concordance from 6-digit NAICS to 3-digit NIC and then construct a Herfindahl Index (H-index) of geographical concentration of each 3-digit NIC across US counties.40 The H-index is defined as
    $$H_i = \sum_{c=1}^{C} \left(sh^{L}_{i,c}\right)^2,$$

    where ‘i’ indexes industries (according to the NIC), ‘c’ indexes counties, and $sh^{L}_{i,c}$ is the share of industry ‘i’ employment located in county ‘c’. The H-index for industry ‘i’ is thus the sum across counties of the squared shares of the industry's employment. Industries that are highly concentrated in a few US counties (a high value of the Herfindahl index) are considered tradable, while industries whose employment is spread over many counties (a low value of the Herfindahl index) are considered non-tradable. This US-based measure of an industry's tradability is then applied to India.

  2. Degree of international trade in India: For each 3-digit NIC in the manufacturing sector, I construct a measure of the degree of international trade carried out in the industry as a share of domestic production. In particular, I define this measure as exports plus imports in the industry as a share of the gross production of that industry carried out by domestic plants in 2005-06. The data for exports and imports for India are taken from the website of the Department of Commerce, Government of India.41 The imports and exports data are not at the industry level but are classified according to the Harmonized Commodity Description and Coding System (HS) product classification. They are converted to 3-digit NIC using the products-to-industry concordance developed by World Integrated Trade Solutions (WITS).42 The data on gross domestic production for each industry are computed by combining the ASI and the SUM. Industries in which international trade is a large percentage of domestic production are considered more tradable.
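The Herfindahl construction and the median split used to define the tradability dummy can be sketched as follows. The county-level employment counts are invented for illustration, the industry labels are placeholders rather than actual NIC codes, and the NAICS-to-NIC concordance step is omitted.

```python
from statistics import median


def herfindahl(county_employment):
    """H_i = sum over counties c of (county c's share of industry i employment)^2."""
    total = sum(county_employment.values())
    return sum((emp / total) ** 2 for emp in county_employment.values())


def tradable_dummy(index_by_industry):
    """1 if an industry's index lies above the cross-industry median, else 0."""
    cutoff = median(index_by_industry.values())
    return {ind: int(value > cutoff) for ind, value in index_by_industry.items()}


# Hypothetical employment by county for four placeholder industries.
employment = {
    "ind_A": {"c1": 90, "c2": 10},                      # concentrated
    "ind_B": {"c1": 25, "c2": 25, "c3": 25, "c4": 25},  # evenly spread
    "ind_C": {"c1": 50, "c2": 50},
    "ind_D": {"c1": 100},                               # single county
}
h_index = {ind: herfindahl(counts) for ind, counts in employment.items()}
print(h_index)
print(tradable_dummy(h_index))
```

The quartile-based classification works the same way, except that industries between the bottom and top quartile cutoffs are dropped rather than assigned a value.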

Table A.5 in the appendix lists the 3-digit industries which lie above and below the median value of the two indexes of tradability. The two measures of tradability are weakly positively correlated with the rank correlation coefficient between them being 0.25.

I run regressions of the form

$$sd_{i,s,t} = \alpha_{i,t} + \alpha_{s,t} + \gamma \ln(SNDP_{s,t}) \times tradability_i + \varepsilon_{i,s,t} \tag{11}$$

where $sd_{i,s,t}$ is the share of employment in plants of size five or less in industry ‘i’ in state ‘s’ at time ‘t’, $SNDP_{s,t}$ is the per-capita NDP of state ‘s’ at time ‘t’, and $tradability_i$ is a dummy variable that takes value 1 if an industry is classified as tradable. $\alpha_{i,t}$ represents fixed effects for industry interacted with time; it controls for the fact that different industries might have different average levels of the share of employment in small plants. $\alpha_{s,t}$ represents fixed effects for state interacted with time and controls for the fact that rich states on average have a lower share of employment in small plants.

The coefficient of interest is γ, the coefficient on the interaction of state per-capita income and the tradability dummy. A positive γ implies that the relation between the share of employment in small plants and log of per-capita income across states is stronger for non-tradables. This is because the share of employment in small plants and per-capita NDP are negatively related and therefore a positive interaction term implies that the slope for tradable industries is less negative compared to non-tradables. Therefore, a positive value of γ is supportive of the view that inter-state trade is not a major driving force behind the size distribution of plants across states.
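The way γ is identified net of the two sets of fixed effects can be illustrated with a stripped-down version of equation (11): a single period, a balanced state-by-industry panel, equal weights, and no standard errors. In a balanced panel, partialling the interaction $\ln(SNDP_s) \times tradability_i$ out of industry and state fixed effects leaves $(\ln SNDP_s - \overline{\ln SNDP})(tradability_i - \overline{tradability})$ (the Frisch–Waugh–Lovell theorem), so the OLS estimate of γ has a closed form. This is a sketch for intuition, not the estimator behind Table 9, and all data below are simulated.

```python
import math


def interaction_coefficient(y, log_income, tradable):
    """OLS estimate of gamma in y[i][s] = a_i + b_s + gamma * log_income[s] * tradable[i].

    Assumes a balanced panel (every industry observed in every state) and equal weights.
    """
    n_ind, n_st = len(tradable), len(log_income)
    x_bar = sum(log_income) / n_st
    t_bar = sum(tradable) / n_ind
    num = den = 0.0
    for i in range(n_ind):
        for s in range(n_st):
            # Interaction regressor after partialling out industry and state fixed effects.
            z = (log_income[s] - x_bar) * (tradable[i] - t_bar)
            num += z * y[i][s]
            den += z * z
    return num / den


# Simulated data: four states, four industries (two tradable), true gamma = 0.05.
log_income = [math.log(r) for r in (0.39, 0.8, 1.0, 1.57)]
tradable = [1, 1, 0, 0]
industry_fe = [0.6, 0.5, 0.7, 0.4]
state_fe = [0.7 - 0.15 * x for x in log_income]  # richer states: fewer small-plant jobs
gamma_true = 0.05
y = [[industry_fe[i] + state_fe[s] + gamma_true * log_income[s] * tradable[i]
      for s in range(len(log_income))] for i in range(len(tradable))]
print(f"estimated gamma = {interaction_coefficient(y, log_income, tradable):.4f}")
```

A positive estimate means the income slope of the small-plant share is less negative for tradables. The paper's specification additionally interacts the fixed effects with time, weights state-industry cells, and clusters standard errors at the state level.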

An industry is classified as tradable if its tradability index lies above the median (or in the top quartile) of the index across industries. Data for five waves of the SUM are combined with the corresponding year of the ASI (1989, 1994, 2000, 2005, and 2010). Only the fifteen large Indian states mentioned in footnote 44 are included, as the smaller states often have no observations for many industries at the 3-digit level.

Table 9 reports results for equation (11) for both measures of tradability. Each observation is weighted by the share of observations in the state-industry cell out of the total observations in the ASI and SUM combined for the given year.43 Column 1 uses the Herfindahl index and classifies an industry as tradable if its Herfindahl index is above the median value of the Herfindahl index across industries. The coefficient on the interaction of per-capita NDP and the tradability index is positive and marginally significant at the 10 percent level. Column 2 classifies an industry as tradable if it is in the top quartile in terms of the Herfindahl index and non-tradable if it is in the bottom quartile. The results are very similar to the first column. Columns 3 and 4 use the median and quartiles of the tradability measure based on exports and imports in India. The point estimates of the coefficient on the interaction of per-capita NDP and the tradability index are much smaller in absolute value and statistically insignificant.

Table 9:

Size Income Relation Across States for Tradables vs. Non-tradables

Notes: The data are from five rounds of the ASI and SUM. The table reports regression results for the share of employment in plants of size 5 or less in industry ‘i’ in state ‘s’ at time ‘t’ on log per-capita state NDP interacted with a dummy which takes value 1 if industry ‘i’ is classified as a tradable industry. Column 1 classifies an industry as tradable if the Herfindahl Index across US counties for the industry was above the median of Herfindahl Indexes, and non-tradable if it was below the median. Column 2 uses top and bottom quartiles of the Herfindahl Index as cutoffs. Columns 3 and 4 use the tradability index based on Indian exports and imports, with the median and the top and bottom quartiles as cutoffs, respectively. All regressions include fixed effects for industry interacted with time and state interacted with time. Each observation is weighted by the share of observations in the state-industry cell out of the total observations in the ASI and SUM combined for the given year. Standard errors are clustered at the state level. *p<0.1.

The results in Table 9 suggest that the size-income relation across states is not stronger for tradable industries as compared to non-tradable industries.

7. Conclusion

The size distribution in developing countries usually has a thick left tail compared to developed countries. The same holds across Indian states, with richer states usually having a much smaller share of their manufacturing employment in small plants. In this paper, I explore the hypothesis that this income-size relation arises from the fact that low income countries and states have high demand for low quality products which can be produced efficiently in small plants. I provide evidence which is consistent with this hypothesis from both the consumer and producer side. In particular I show that richer households buy higher price goods while larger plants produce higher price products (and use higher price inputs). Finally, a model is developed which features non-homothetic preferences with respect to quality and is calibrated to match the cross-sectional facts from the consumer and producer sides. A calibrated version of the model indicates that up to 41 percent of the cross-state variation seen in the left tail of manufacturing plants in India can be explained by the model.

Therefore, this paper suggests that a large part of the differences in size distribution that we see across countries and states is a natural consequence of the low levels of income in developing countries and is not caused by policies which discriminate against large productive plants in favor of small unproductive plants. The presence of small plants in developing countries should not be viewed as originating necessarily from policy failures.

The Size Distribution of Manufacturing Plants and Development
Author: Siddharth Kothari