Agglomeration, Innovation, and Spatial Reallocation: The Aggregate Effects of R&D Tax Credits
Author:
Alexandre Sollaci https://isni.org/isni/0000000404811396 International Monetary Fund

Abstract

I investigate the aggregate effects of R&D tax credits in the US. Because it subsidizes R&D activity and because credit rates vary between states, this policy has both spatial and dynamic effects on the economy. To address this issue, I construct an endogenous growth model with spatial heterogeneity and agglomeration spillovers in innovation. Aggregate outcomes in this model are thus affected by the spatial distribution of the population in the economy, which is itself endogenous and reacts to policy. I use this framework to identify a set of local R&D subsidies that maximize aggregate welfare.

1 Introduction

Research and Development (R&D) tax credits are one of the most important policies that foster innovation in the US. Introduced in 1981 at the federal level, these credits apply to company-funded R&D expenditures above a pre-determined baseline level. It is estimated that R&D tax credits would amount to over US$ 9 billion in foregone revenues to the federal government in 2019 – which is about 20% more than the National Science Foundation’s entire budget request for that year.1 In addition to the federal credits, most states have adopted some form of subsidy to research activity as well (see appendix figures A.1 and A.2). As a result, the US features a wide spatial dispersion of R&D tax credits, shown in figure 1.

Figure 1: R&D Tax Credit Rates per State, 2005.

Citation: IMF Working Papers 2022, 131; 10.5089/9798400212666.001.A001

Note: The figure shows the effective R&D tax credit rates as computed by Wilson (2009). Statutory and effective credit rates differ based on how the base amount is defined and whether or not the credit is “recaptured” (i.e., considered taxable income).

In this paper, I investigate how the dispersion of R&D tax credits impacts the aggregate economy. R&D tax credits (or direct subsidies) are fundamentally different from most other policy instruments because they can impact both the location of workers and firms and their decisions over time (i.e., investments in innovation). In contrast, most of the existing studies that evaluate the impact of spatial policies do so in a static setting (Kline and Moretti, 2014; Ossa, 2015; Gaubert, 2018; Fajgelbaum and Gaubert, 2020). Similarly, studies that evaluate the long-run effects of R&D policies seldom account for local externalities or the spatial heterogeneity of subsidies to innovation.

The spatial effects of R&D tax credits are a product of the interaction between pre-existing spatial heterogeneity, agglomeration externalities, and the variation of the tax credit rate over space. Inventors and firms have incentives to move to places where taxes are relatively lower; thus, changes in the spatial distribution of tax credits can change the location of agents in the economy (Moretti and Wilson, 2014, 2017; Akcigit et al., 2016). This is relevant because innovative activity benefits from agglomeration spillovers: the productivity of individual inventors (and therefore of the firms that hire them) in producing patents increases when they are located in densely populated cities (Moretti, 2021). The benefits from agglomeration, however, must be weighed against congestion costs and the fact that local benefits might be offset by losses elsewhere.2

In addition, R&D tax credits have dynamic effects by their very nature, as they subsidize the cost of innovation for firms. Combining this with the effects described above, it is apparent that the way in which R&D tax credits are distributed over space can affect the aggregate rate of innovation – and therefore growth – of the country. Note that this creates an interaction between the spatial and dynamic aspects of R&D policy, which means that both of these dimensions must be jointly considered if one is interested in assessing this policy’s impacts.

To evaluate the effects of counterfactual R&D policies, I build an endogenous growth model with local agglomeration externalities and spatial heterogeneity. This framework captures both the spatial and dynamic aspects of R&D tax credits by allowing the productivity of individual inventors to increase with the population density of the city where they live. In addition, the spatial distribution of the population in this model is itself endogenous, which means that changes in local R&D tax credits can affect the location of inventors/firms. I use this model to compare the welfare level under the current distribution of tax credits in the US with two other policies: a spatially neutral R&D subsidy (i.e., that does not vary over space) and a theoretical welfare maximizing distribution of R&D subsidies that is obtained by solving a social planner’s problem. My findings suggest that removing the spatial variation of R&D policy hurts the economy, but that the current distribution of subsidies is far from being optimal.

In its essence, the framework I develop adds a spatial dimension to Schumpeterian models of endogenous growth (Aghion et al., 2014). The economy is composed of a system of cities c ∈ {0,1,..., C}, where C → ∞. Each city is endowed with an amenity level, a stochastic productivity for innovation, and an R&D tax credit (modeled as a direct subsidy for simplicity). Cities are populated by inventors (who produce innovation), production workers (who produce goods), and by the firms that hire each of them. I assume that both inventors and production workers are freely mobile at all points in time, but can only be hired by firms located in the city where they live.

I also allow for agglomeration spillovers in the production of innovation. Specifically, inventors become more productive when they are living in cities that are densely populated by other inventors. As a result, firms that invest in R&D also benefit from locating in cities where there is a large population of inventors. One potential concern about this formulation is that communications technology has made actual physical proximity irrelevant for the efficient exchange of ideas, particularly after the Covid-19 pandemic, which would greatly diminish the benefits of agglomeration. While it is true that technology has made remote communication possible, I argue that it is not able to replace the mechanisms that drive agglomeration spillovers. In particular, one important benefit of physical proximity is the casual and unintended flow of information between workers (Puga, 2010), which cannot be replicated in virtual environments.3 Indeed, as of the writing of this paper, most companies have reverted to (at least) a hybrid work model, emphasizing the benefits of having workers in close proximity for at least part of their time. Regardless of the reason for this decision, the fact that workers will continue to share the same space means that agglomeration benefits will remain relevant.

The remainder of the structure of the model regarding innovation follows a standard quality ladder framework, with a few modifications that will be discussed in detail in the model section.4 Firms innovate over intermediate goods, which exist in a continuum of varieties, each with a different quality level. After a successful innovation, a firm is able to produce the good over which it innovated at a higher quality than any other firm, becoming the technological leader over that product. The leader is able to drive all other competitors out of the market, producing the good under monopolistic competition.

Firms in this economy produce three types of goods. Intermediate goods are freely tradable and used as inputs in the production of a final (tradable) good. A second type of final good, which is non-tradable, is produced and consumed in each city (e.g., housing). Both types of final good are produced by a representative producer under perfect competition. I also assume the existence of a central planner (government) that fully taxes all land- and firm-owners to provide R&D subsidies and a public good. Consumers derive utility from the consumption of both final goods (tradable and non-tradable) and the public good. For ease of notation, I refer to the final tradable good as “final” and the final non-tradable good as “non-tradable.” In line with Henderson (1974), the production of the non-tradable good displays decreasing returns to scale (as it requires the use of land, a fixed factor of production), which generates congestion costs and limits the size of cities.

The last ingredient of the model is the free entry of firms into any city in the economy. Along with the free mobility of inventors and production workers, this condition implies that the distribution of the population is endogenous and responds to changes in R&D policy. This is one of the key results in the model, as it allows the government/social planner to use R&D tax credits to influence the location of inventors and firms – and therefore leverage agglomeration spillovers to affect the productivity of investments in innovation.

Because the population of inventors determines the agglomeration spillovers in each city, all other variables in the model’s equilibrium will be a function of how inventors are distributed across space. The rate of growth of the economy, for example, is directly affected by both the dispersion of the population of inventors and by their location: more inventors in more productive cities will generate a higher rate of growth. One of the appealing aspects of the model is therefore that the share of inventors in each city can be expressed with a closed form solution. Furthermore, this share is completely determined by city-specific features (R&D subsidies, amenity levels, and the expected productivity of innovation) and three parameters: the elasticity of agglomeration, the elasticity of congestion, and the share of expenditures on non-tradable goods.

The estimation of the parameters in the model is done in three steps. In the first step I calibrate the value of the set of parameters that can be directly matched to quantities in the data. The second step takes advantage of the model’s structure, which suggests that the elasticity of agglomeration and the elasticity of congestion can be estimated through linear regressions. In each case, the estimation consists of regressing the relevant outcome on the population of inventors or the population of production workers in each city (in addition to other controls, including city and year fixed effects). To account for endogeneity in each city’s population, I propose an instrument that is based on shift-share research designs (Adão et al., 2019; Borusyak et al., 2022; Goldsmith-Pinkham et al., 2020). Like shift-share designs, the instrument leverages industry-specific growth in the employment shares of inventors as exogenous shifters for the population in each city.
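As a rough illustration of the shift-share idea mentioned above, the sketch below builds a generic Bartik-style predictor: each city's base-period industry shares are interacted with national industry growth rates. The function name, the toy cities and industries, and the numbers are all hypothetical, and the exact construction used in the paper (for example, any leave-one-out adjustment) is not described in this section.

```python
from typing import Dict


def shift_share_instrument(
    base_shares: Dict[str, Dict[str, float]],
    national_growth: Dict[str, float],
) -> Dict[str, float]:
    """Generic shift-share predictor (illustrative, not the paper's exact instrument).

    For each city, sum over industries of
    (base-period share of the city's inventors in that industry)
    x (national growth of inventor employment in that industry).
    """
    instrument = {}
    for city, shares in base_shares.items():
        instrument[city] = sum(
            share * national_growth[industry]
            for industry, share in shares.items()
        )
    return instrument


# Hypothetical toy data: two cities, two industries.
base_shares = {
    "city_a": {"software": 0.75, "pharma": 0.25},
    "city_b": {"software": 0.25, "pharma": 0.75},
}
national_growth = {"software": 0.08, "pharma": 0.02}

predicted = shift_share_instrument(base_shares, national_growth)
```

In this toy example the city that starts out specialized in the faster-growing industry receives the larger predicted shift, which is the variation such an instrument relies on.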

The third step recovers the remaining parameters by matching model predictions to moments in the data. I focus on moments related to the production of innovation, such as the share of the population of inventors and patents filed in each city. I verify the model’s external validity by showing that it can accurately replicate the spatial distribution of variables that were not targeted for estimation: the correlations between distributions generated by the model and the data range from 0.5 (patents per capita by city) to 0.97 (share of firms by city). I also show that changes in R&D tax credits over time can explain a large part of the variation in the share of inventors and patents filed in each city since the 1970s.5 This result suggests that R&D policy is indeed well suited to spatially reallocate innovation in the country.

Finally, I use the framework described above to analyze the welfare effects of alternative spatial distributions of R&D subsidies. Specifically, I address the following question: “Given the government’s revenue, how much can we increase aggregate welfare6 in the economy simply by reshaping the distribution of R&D subsidies among cities/states?” Heuristically, this exercise consists of fixing the average level of the subsidy but changing its value across locations. I consider the effect of two counterfactual policies: one that removes all spatial variation by implementing a common subsidy in the entire country, and one that implements an “optimal” distribution of subsidies (that is found by solving a central planner’s problem).

Quantitatively, I find that removing the spatial variation of R&D tax credits in the US would generate a slightly less concentrated distribution of the population (the Herfindahl-Hirschman index, HHI, moves from 0.027 to 0.025) and reduce welfare by 0.77%. This suggests that the states that offer the largest credits are indeed the ones that are comparatively better at producing innovation. When comparing the current welfare level with the level attained by an optimal distribution of R&D tax credits, I find that the potential welfare gains are fairly high: if subsidies are allowed to vary by city, aggregate welfare increases by at least 6% under the optimal distribution. If subsidies are only allowed to vary by state, the gains are about half as large, at 3.2%. Both cases, however, imply that part of the population of inventors in small and medium-sized cities should instead be working in large (and more productive) cities.
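For readers unfamiliar with the concentration measure quoted above, the following minimal sketch computes an HHI as the sum of squared population shares. The function name and the example inputs are illustrative only; the paper's own computation may differ in details such as which population is used.

```python
def hhi(populations):
    """Herfindahl-Hirschman index: sum of squared shares of the total."""
    total = sum(populations)
    return sum((p / total) ** 2 for p in populations)


# Four equally sized cities give 1/4; full concentration in one city gives 1.
even = hhi([1, 1, 1, 1])
concentrated = hhi([10, 0, 0])
```

Since N equally sized cities give an index of 1/N, the reciprocal can be read as an effective number of equally sized cities: 1/0.027 is roughly 37, while 1/0.025 is 40.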

1.1 Relationship to the Literature

This paper is related to several strands of the economic literature. The main findings of the paper contribute to the literature on spatial misallocation and optimal spatial policies. In line with my findings, a number of other studies have found large potential gains from reallocating resources across space in the US.7 Hsieh and Moretti (2019) argue that housing supply restrictions adopted in some of the most productive cities in the US significantly lowered the country’s rate of growth between 1964 and 2009. Fajgelbaum et al. (2019) find that tax dispersion across states leads to aggregate losses because it distorts the spatial allocation of resources. Adopting a strategy closer to mine (although not necessarily focusing on policies that foster innovation), Gaubert (2018) and Fajgelbaum and Gaubert (2020) develop general quantitative frameworks that allow them to compute optimal local subsidies designed to attract workers and firms to each city. Similarly, Ossa (2015) explores the welfare effects of subsidy competition among states, while Kline and Moretti (2014) study the long run effects of the Tennessee Valley Authority development program.

While these are important contributions, the existing literature has analyzed the welfare effects of spatial policies primarily through static frameworks. As discussed above, R&D tax credits have both spatial and dynamic effects that interact with each other. A purely spatial model would incorrectly evaluate the welfare effects of spatially reallocating R&D subsidies because it would ignore the effects of a rise/fall in the rate of creative destruction on the incentives of firms. Similarly, a purely dynamic model would predict no relationship between the location of subsidies and aggregate growth. Thus, to understand how the aggregate economy reacts to changes in the distribution of R&D subsidies, one must adopt a setting that captures both the dynamic and spatial effects of this policy.

The theoretical framework developed here is based on endogenous growth models where innovation is the main driving force of economic growth (e.g., Aghion and Howitt, 1992; Klette and Kortum, 2004; Akcigit and Kerr, 2018). It contributes to this literature by nesting a simple model of innovation through creative destruction into a spatial setting. The resulting model retains many of the features present in the growth literature, but also allows for spatial heterogeneity, agglomeration spillovers and an endogenous distribution of the population. Adding a spatial dimension to this class of models helps us to understand the linkages between innovation at the firm level, which is greatly impacted by the firm’s location through agglomeration spillovers, and broader economic growth. It is also fundamental to the investigation of the impact of spatial policies over aggregate welfare and growth.

This framework also contributes to the literature on spatial and dynamic models. One closely related paper is Duranton (2007), who embeds the quality ladder model of Grossman and Helpman (1991) into an urban structure in order to study the city size distribution and the movement of industries across cities. The main advantage of the current setting is that agglomeration and congestion externalities are microfounded and endogenously generated in the model’s equilibrium; in contrast, Duranton (2007) relies on a reduced form that captures the net effect of these externalities on the size of cities. Other papers developing spatial and dynamic models include Desmet and Rossi-Hansberg (2014), Desmet et al. (2018), and Caliendo et al. (2019). The models in those papers feature a realistic geography, which includes trade and moving costs, as well as locations that are ordered in space. In contrast, I ignore most of these geographical frictions, which enables me to focus on the dynamics of innovation for individual firms. This produces a tractable model that has a natural link between innovation at the firm level and patent data.

The remainder of this paper is organized as follows. Section 2 develops a formal endogenous growth model that replicates the linkages between city-specific R&D subsidies and aggregate growth. Section 3 solves for the model’s equilibrium in a Balanced Growth Path. Section 4 shows how to map the model’s equilibrium to the US data and estimate its parameters. In section 5, I use this framework to measure the effects of the adoption of alternative R&D policies. Section 6 concludes.

2 A Dynamic Spatial Model of Innovation and Growth

In this section I describe in detail the environment of the model; section 3 then defines and solves for its equilibrium. The main contribution of this model is to allow for spatial heterogeneity and agglomeration externalities to affect the productivity of R&D investments by firms. Not only does this imply that the spatial distribution of population in the economy matters for growth, but, because firms and workers choose where to locate, this distribution is also endogenously determined. Since the goal of the model is to capture the aggregate effects of R&D subsidies, there are no agglomeration effects in the production of goods. I also introduce local shocks to the productivity of R&D investments in each city. Assuming that the number of cities is large, this structure allows for the existence of a Balanced Growth Path where the aggregate growth rate of the economy is constant, but the growth rate and population in each individual city can still fluctuate over time.

2.1 Cities

There are C + 1 cities, indexed by c ∈ {0,1,...,C}. I also assume that there is a large number of cities, i.e., C → ∞. Cities differ from each other in their level of amenities, $\alpha_c$, the amount of land they possess, $\bar{m}_c$, and a stochastic (time-varying) city-specific productivity in the production of innovation, $\chi_c(t)$. Amenities reflect features that improve the quality of life in a particular city, making it more attractive for workers to live there. These features can be related to the city’s geographical location, such as climate or proximity to an ocean, or to the city’s history, customs, etc. Importantly, amenities are fixed over time. Land is a fixed factor in each city and is owned by absentee land owners, who are fully taxed by the government. Without loss of generality I normalize $\bar{m}_0 = 1$ and $\bar{m}_c = 1/C$ for all c ≥ 1 (a discussion of why city 0 is different from the others will follow below).8

The city-specific productivity for innovation reflects the fact that some cities are “hubs” of innovation. These cities are usually the home to more venture capitalists, who favor local investments (Gompers and Lerner, 2001; Kolympiris et al., 2011), or have a “culture” that is favorable to entrepreneurship and innovation.9 Unlike amenities or land, this productivity is not fixed over time. Specifically, I assume that it evolves stochastically according to

$$\chi_c(t) = \bar{\chi}_c \, e^{z_c(t)},$$

where $\bar{\chi}_c$ is a constant, capturing permanent differences in the productivity of investments in R&D between cities, and $z_c(t)$ follows an Ornstein-Uhlenbeck (O-U) process: $dz_c(t) = \phi\,(\mu - z_c(t))\,dt + \sigma\,dW_c(t)$, where $W_c(t)$ is a (city-specific) Wiener process.10 I assume $\mu = -\sigma^2/(4\phi)$, so that, under the process’ stationary distribution, $\mathbb{E}[e^{z_c}] = 1$.
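The normalization of µ can be checked with a short calculation. The stationary distribution of an O-U process with mean-reversion rate ϕ and volatility σ is normal with mean µ and variance σ²/(2ϕ), and the mean of the exponential of a normal variable is exp(mean + variance/2). The sketch below verifies that the stated choice of µ makes this equal to one; the function names and the parameter values are illustrative assumptions, not estimates from the paper.

```python
import math


def stationary_moments(phi, sigma):
    """Mean and variance of the stationary O-U distribution under
    the normalization mu = -sigma^2 / (4 * phi)."""
    mu = -(sigma ** 2) / (4.0 * phi)
    variance = sigma ** 2 / (2.0 * phi)  # standard O-U stationary variance
    return mu, variance


def expected_exp(mu, variance):
    """E[exp(z)] for z ~ Normal(mu, variance), i.e. the lognormal mean."""
    return math.exp(mu + variance / 2.0)


# Hypothetical parameter values, chosen only for illustration.
mu, variance = stationary_moments(phi=0.5, sigma=0.2)
mean_shock = expected_exp(mu, variance)
```

With this normalization the average city-specific shock is neutral, so the constant term alone pins down the expected productivity of a city.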

All cities are populated by workers and firms. Workers are separated into two types: inventors and production workers. Inventors are hired by firms who wish to invest in R&D, while production workers are used to produce final goods. The total population of inventors in the economy is given by I, while the total population of production workers is given by L, both of which are kept constant over time. Note that this assumption excludes the possibility that workers adjust their occupation based on the amount of R&D subsidy offered. This is not far from reality, however, as the supply of inventors is very inelastic (Goolsbee, 1998).

Both types of workers are freely mobile between cities, but are hired locally. Free mobility of labor is of course a simplification of reality, but it is in line with empirical findings that individuals react quite strongly to tax differentials between regions (Moretti and Wilson, 2017). In contrast, I assume that firms are able to enter any city they wish, but are not allowed to move once their location is set (a common assumption in spatial models; see e.g., Behrens et al., 2014). The requirement that workers are hired locally thus guarantees that firms are located in the same city where their employees live. Finally, goods can also be subdivided into three types: a final non-tradable good, a final tradable good and intermediate goods. Both the tradable final good and intermediate goods can be traded at no cost. Firms, like land, are owned by absentee firm owners who are fully taxed by the social planner/government.

2.2 Preferences

Consumers in this model are inventors and production workers. All consumers have the same utility function, but might differ in their wages. Recall that both types of worker are free to move into any city they wish, so that a worker of type h ∈ {i,ℓ} (i.e., inventor and production worker, respectively) has utility given by

$$U^h(t_0) = \int_{t_0}^{\infty} e^{-\rho t} \max_{c(t)} \left\{ u^h_{c(t)}(t)\, G(t) \right\} dt,$$

where c(t) indicates the city where the worker lives during time t, G(t) is the amount of public good consumed in period t, and $u^h_{c(t)}(t)$ is the highest utility that a worker of type h living in city c during time t can attain from the consumption of the final goods. Formally, for each c,

$$u^h_c(t) = \max_{n(t),\, y(t)} \left[ \alpha_c\, n(t) \right]^{\theta} y(t)^{1-\theta} \quad \text{s.t.} \quad y(t) + p_{n,c}(t)\, n(t) \le w^h_c(t).$$

In the expression above, n(t) is the amount of the non-tradable final good consumed by the worker in time t and y(t) is the amount of the tradable final good consumed in the same period. For simplicity of notation, I will refer to the non-tradable final good simply as “non-tradable good” and to the tradable final good as “final good” (to differentiate it from intermediate goods). All workers inelastically supply one unit of labor per period, so their income is equal to their wage, $w^h_c(t)$. To keep the model tractable, I assume that all consumers are “hand-to-mouth”, in the sense that they cannot borrow or save.11
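Because preferences over the two final goods are Cobb-Douglas and the price of the final good is normalized to 1, the static problem has the familiar constant-expenditure-share solution: a share θ of the wage is spent on the non-tradable good and the remainder on the final good. The sketch below illustrates this under the utility function as read here, with the amenity α scaling non-tradable consumption; the function names and numbers are hypothetical.

```python
def utility(n, y, amenity, theta):
    """Cobb-Douglas utility over non-tradable (n) and final (y) goods."""
    return (amenity * n) ** theta * y ** (1.0 - theta)


def optimal_bundle(wage, price_n, theta):
    """Constant expenditure shares: theta on non-tradables, 1 - theta on the final good."""
    n = theta * wage / price_n
    y = (1.0 - theta) * wage
    return n, y


# Hypothetical values for illustration only.
wage, price_n, amenity, theta = 100.0, 2.0, 1.5, 0.3
n_star, y_star = optimal_bundle(wage, price_n, theta)
u_star = utility(n_star, y_star, amenity, theta)

# Any other bundle on the same budget line should give lower utility.
u_alt = utility(n_star + 1.0, y_star - price_n, amenity, theta)
```

Note that the amenity level raises utility but does not change the chosen bundle, since it enters multiplicatively.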

2.3 Technology

2.3.1 Non-tradable Good

Non-tradable goods are locally produced by a competitive firm using land and the labor of production workers. The representative non-tradable goods producer in each city chooses how many production workers to hire and how much land to rent in order to solve

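$$\max_{\ell_{n,c}(t),\, m_c(t)} \; p_{n,c}(t)\, n(t) - w^{\ell}_c(t)\, \ell_{n,c}(t) - p_{m,c}(t)\, m_c(t) \quad \text{s.t.} \quad n(t) = \ell_{n,c}(t)^{\beta}\, m_c(t)^{1-\beta},$$

where $p_{n,c}(t)$ is the price of the non-tradable good, $p_{m,c}(t)$ is the price of land, and $w^{\ell}_c(t)$ is the wage received by production workers in city c.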

Land is a fixed factor in each city, which means that the production of the non-tradable good will have decreasing returns to scale in equilibrium. This is what generates congestion costs in cities: as the population of a city increases, so does the demand for the non-tradable good. Because of decreasing returns to scale, the higher demand will push the price pn,c up, making it less attractive to live in that city.12
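The congestion mechanism can be illustrated with a partial-equilibrium sketch. It assumes a Cobb-Douglas technology in labor and land with labor exponent β, holds land and the wage fixed, and computes the competitive price at which the producer is willing to supply a given quantity (price equals wage divided by the marginal product of labor). In the full model wages are endogenous, so this is only meant to show the direction of the effect; all names and values are hypothetical.

```python
def labor_required(output, land, beta):
    """Labor needed to produce `output` when land is fixed,
    inverting output = labor**beta * land**(1 - beta)."""
    return (output / land ** (1.0 - beta)) ** (1.0 / beta)


def supply_price(output, land, wage, beta):
    """Competitive price: wage divided by the marginal product of labor."""
    labor = labor_required(output, land, beta)
    marginal_product = beta * labor ** (beta - 1.0) * land ** (1.0 - beta)
    return wage / marginal_product


# Hypothetical values: price of the non-tradable good as demand doubles twice.
prices = [supply_price(q, land=1.0, wage=1.0, beta=0.6) for q in (1.0, 2.0, 4.0)]
```

With land fixed, each additional worker is less productive, so the price needed to cover costs rises with the quantity demanded.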

2.3.2 Final Goods

The representative final good producer uses the labor of production workers and all intermediate goods as inputs to competitively produce the final good. Like all other firms, the final good producer is free to choose where to locate. If production occurs in city c*, the final good producer solves

$$\max_{\ell_{y,c^*}(t),\, \{k_j(t)\}_{j \in \mathcal{J}}} \; Y(t) - \int_{\mathcal{J}} p_j(t)\, k_j(t)\, dj - w^{\ell}_{c^*}(t)\, \ell_{y,c^*}(t) \quad \text{s.t.} \quad Y(t) = \frac{\ell_{y,c^*}(t)^{\varepsilon}}{1-\varepsilon} \int_{\mathcal{J}} k_j(t)^{1-\varepsilon}\, q_j(t)^{\varepsilon}\, dj,$$

where $\ell_{y,c^*}$ is the number of production workers hired by the final good producer, and $k_j$ and $q_j$ are the quantity and quality of intermediate good j used in production, respectively. Each intermediate good is sold at a price $p_j$ to the final good producer and the price of the final good is normalized to 1. Recall that all intermediate goods and the final good are freely tradable across cities, so the location of their production does not affect their price.
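As a worked step, the first-order conditions of this problem can be written out explicitly. The derivation below assumes the final good technology takes the form $Y = \frac{\ell^{\varepsilon}}{1-\varepsilon}\int_{\mathcal{J}} k_j^{1-\varepsilon} q_j^{\varepsilon}\, dj$, which is how the constraint is read here; the paper's own equilibrium characterization appears in section 3 and may present these conditions differently.

```latex
\begin{align*}
\text{w.r.t. } k_j(t): \quad
  & p_j(t) = \ell_{y,c^*}(t)^{\varepsilon}\, q_j(t)^{\varepsilon}\, k_j(t)^{-\varepsilon}, \\
\text{w.r.t. } \ell_{y,c^*}(t): \quad
  & w^{\ell}_{c^*}(t)
    = \frac{\varepsilon}{1-\varepsilon}\, \ell_{y,c^*}(t)^{\varepsilon-1}
      \int_{\mathcal{J}} k_j(t)^{1-\varepsilon}\, q_j(t)^{\varepsilon}\, dj
    = \varepsilon\, \frac{Y(t)}{\ell_{y,c^*}(t)}.
\end{align*}
```

The first condition is an inverse demand curve for each intermediate good that shifts out with quality, and the second says that production workers in the final good sector receive a share ε of output.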

2.3.3 Intermediate Goods

Intermediate goods exist in a continuum of varieties $j \in \mathcal{J} \equiv [0,1]$. Intermediate good producers (or simply “firms”) can produce any number of varieties. The set of products that each firm is able to produce – and the quality at which it can produce them – is given by the set of products over which that firm has innovated in the past. For now, I take this set as given. Following Akcigit and Kerr (2018), firms compete over the production of each variety j by choosing prices according to the following.

Assumption 1. All firms able to produce intermediate good j enter a two-stage pricing game. In the first stage, firms decide whether to pay a small fee to be able to announce their price. In the second stage, they choose the price at which they propose to sell good j, given the set of firms who paid the fee in the first stage.

A direct result of assumption 1 is that only the firm that can produce good j at the highest quality (i.e., the technical leader) pays the fee and enters the second stage of the game. As will be shown in the equilibrium description below, the demand for intermediate goods increases with their quality, so technical laggards in the production of good j can never recover their fee if they choose to enter the second stage. The technical leader can therefore choose its price as if it were a monopolist in the production of good j. One crucial condition for this argument to hold is that all firms have the same cost of production, regardless of their location (this excludes a production function where labor is used to produce intermediate goods, since wages vary by city). If the location of a firm affects its cost of production, the technical leader in the production of a good j might not be able to drive low quality firms out of the market. For example, firms that produce the good at a lower quality but also face lower costs of production could remain in the market by selling that good at a lower price than the leader is willing to.13

I follow Akcigit et al. (2018) and assume that the final good is used as a factor of production for intermediate goods. Since the final good is freely tradable, all firms face the same marginal cost v > 0 and the technical leader in each variety chooses the amount of good to produce by solving

$$\max_{k_j(t)} \; p_j(k_j(t); q_j(t))\, k_j(t) - v\, k_j(t),$$

where $p_j(k_j; q_j)$ is the inverse demand function for intermediate good j and the price of the final good is normalized to 1 in all periods. Also note that there is nothing that is city-specific in the production of intermediate goods. The only incentive that firms have to locate in specific cities comes from their investments in R&D, which ultimately determines the set of products that firms produce.
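To make the leader's problem concrete, the sketch below assumes an isoelastic inverse demand of the form p = (ℓ q / k)^ε, which is what a final good technology with exponent ε on labor and quality and 1 − ε on intermediate quantities would imply. Under that assumption the profit-maximizing quantity is proportional to quality and the price is a constant markup over marginal cost, v / (1 − ε). This is an illustration with hypothetical names and values, not the paper's equilibrium derivation.

```python
def inverse_demand(k, quality, labor, eps):
    """Assumed isoelastic inverse demand: p = (labor * quality / k) ** eps."""
    return (labor * quality / k) ** eps


def profit(k, quality, labor, eps, v):
    """Leader's profit: revenue minus constant marginal cost v per unit."""
    return inverse_demand(k, quality, labor, eps) * k - v * k


def monopoly_quantity(quality, labor, eps, v):
    """Closed-form maximizer from the first-order condition
    (1 - eps) * (labor * quality) ** eps * k ** (-eps) = v."""
    return ((1.0 - eps) / v) ** (1.0 / eps) * labor * quality


# Hypothetical parameter values for illustration only.
eps, v, labor, quality = 0.4, 0.5, 2.0, 3.0
k_star = monopoly_quantity(quality, labor, eps, v)
p_star = inverse_demand(k_star, quality, labor, eps)
```

The resulting price does not depend on quality or on the firm's location, which is consistent with the observation that nothing in intermediate good production is city-specific.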

2.3.4 Research and Development

Innovation serves two purposes in this model. First, it increases the quality of intermediate goods, which is the driving force for economic growth. Second, it adds products into a firm’s portfolio by making it the quality leader for a given variety of intermediate good. Whenever a firm f innovates over a product line j, it is immediately able to produce that good with quality (1 + λ)qj(t), where λ > 0 is the quality improvement of the innovation (or the step size in the quality ladder) and qj (t) is the highest quality at which good j is currently produced by any firm. Note that the quality improvement makes the innovating firm the new technical leader on the production of good j, “stealing” this product from whichever firm currently produces it.

To produce innovation, firms must invest in R&D. For any given investment, the number of innovations realized in each period is stochastic and follows a Poisson distribution. In continuous time, this means that each firm produces at most one innovation per period.14 I also assume that firms are not able to target any specific product line with their R&D investments, so the resulting innovation is realized over any product $j \in \mathcal{J}$ with equal probability. There are two important consequences of this assumption. First, there is no strategic interaction between firms in their R&D investment decisions, as they cannot target each other’s products. Second, since all firms start with no product lines in their portfolio and can only add at most one product per period, the number of products in any firm’s portfolio is always countable. Because there is a continuum of product lines in the economy, the probability that a firm innovates over one of its own products is zero. A firm does, however, need to consider the probability that another firm will innovate over a product it is currently producing. In this case, the product is stolen and the incumbent producer removes it from its portfolio.
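The claim that a firm produces at most one innovation per period in continuous time follows from a standard property of Poisson arrivals: over a short interval, the probability of two or more arrivals becomes negligible relative to the probability of exactly one. The sketch below checks this numerically for an arbitrary arrival rate; the function names and the rate are illustrative.

```python
import math


def prob_exactly_one(rate, dt):
    """P(N = 1) for a Poisson count with mean rate * dt."""
    return rate * dt * math.exp(-rate * dt)


def prob_two_or_more(rate, dt):
    """P(N >= 2) = 1 - P(N = 0) - P(N = 1)."""
    return 1.0 - math.exp(-rate * dt) * (1.0 + rate * dt)


# Ratio of "two or more" to "exactly one" as the interval shrinks.
rate = 0.5  # hypothetical arrival rate
ratios = [
    prob_two_or_more(rate, dt) / prob_exactly_one(rate, dt)
    for dt in (1.0, 0.1, 0.01)
]
```

The ratio falls roughly in proportion to the interval length, so in the continuous-time limit multiple simultaneous innovations can be ignored.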

The arrival rate of an innovation is determined by the firm’s location and its investment in R&D. This investment requires the labor of inventors, who benefit from agglomeration externalities in the city where they work. Specifically, let $\tilde{I}_c$ be the population of inventors per unit of land in city c. The productivity of each individual inventor who resides in c in the production of innovation is proportional to $\tilde{I}_c^{\eta}$, for η ≥ 0. Hence, a firm f located in city c that hires $i_{f,c}$ inventors will produce an innovation with arrival rate

$$x_{f,c}(t) = \chi_c(t) \left( \tilde{I}_c(t)^{\eta}\, i_{f,c}(t) \right)^{\psi}. \tag{1}$$

The “strength” of the agglomeration externality is controlled by the parameter η; in particular, this model nests the case where there are no agglomeration externalities by setting η = 0.
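A minimal numerical sketch of the innovation technology in (1) follows. It takes the arrival rate to be city productivity times the ψ-th power of effective inventor input, where effective input is the number of inventors hired scaled by inventor density raised to η. The function name and all parameter values are hypothetical.

```python
def arrival_rate(chi, inventor_density, inventors_hired, eta, psi):
    """Innovation arrival rate: chi * (density**eta * inventors)**psi."""
    return chi * (inventor_density ** eta * inventors_hired) ** psi


# With eta = 0 the city's inventor density is irrelevant (no agglomeration).
no_agglomeration_sparse = arrival_rate(1.0, 10.0, 4.0, eta=0.0, psi=0.5)
no_agglomeration_dense = arrival_rate(1.0, 100.0, 4.0, eta=0.0, psi=0.5)

# With eta > 0 the same firm innovates faster in the denser city.
sparse = arrival_rate(1.0, 10.0, 4.0, eta=0.2, psi=0.5)
dense = arrival_rate(1.0, 100.0, 4.0, eta=0.2, psi=0.5)
```

In this functional form the elasticity of the arrival rate with respect to inventor density is the product ηψ, so the agglomeration effect at the firm level depends on both parameters.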

Investment in R&D also involves a fixed cost, paid in terms of the labor of inventors: firms need to hire κ > 0 inventors to cover their fixed cost. This cost is paid in every period that the firm decides to invest in R&D, and reflects managerial and maintenance costs associated with research, such as cleaning and repairing laboratory equipment and the management of inventors. Firms that decide not to invest in R&D in a given period do not need to pay the cost in that period, but will need to pay it if they decide to invest in the future. In terms of the model, this fixed cost acts as a benchmark for the value of entry – so that free entry drives the expected value of entrant firms to zero. Including a fixed cost instead of an entry cost is convenient in this model because it makes entrants and incumbents symmetric.

There are several assumptions implicit in the functional form of the production of innovation (1). The first assumption is that the population of inventors is the relevant measure of agglomeration. This is empirically tested and confirmed in appendix C.3.3. Second, innovation does not scale with firm size, so that all firms in the same city innovate at the same rate. Appendix G presents an extension of the model that allows for the scaling of innovation and the fixed cost with firm size, and shows that the aggregate predictions of the model remain unchanged.

Third, agglomeration spillovers are the same for all inhabitants of the same city, and nonexistent outside the borders of the city. In practice, agglomeration externalities are likely to change continuously with the distance between agents (Rosenthal and Strange, 2001; Carlino et al., 2012). Under this interpretation, the elasticity of agglomeration defined in this paper is in fact the aggregate effect of agglomeration within a distance that corresponds to the average size of a city. With that said, an analysis that explicitly takes into account the distance between inventors would be both impossible to conduct given the available data and most likely not add much intuition to the model. Recall also that all policies/regulations are defined over specific geographical boundaries, so analyzing the effect of agglomeration over those boundaries is consistent with the purpose of this paper.

Fourth, there is no sorting into cities because inventors and firms are homogeneous. If there is sorting in reality, the ex-post differences between the productivity of workers in different cities will be captured by $\bar\chi_c$, although this does not change the fact that sorting is endogenous and might respond to policy changes. Furthermore, it is unclear how the introduction of sorting would affect the optimal distribution of R&D subsidies. If sorting and agglomeration are complements (i.e., inventors with high productivity also benefit more from agglomeration), then it is likely that the economy would benefit from a higher spatial concentration of inventors. Nevertheless, the infra-marginal inventors moving into larger cities would also be less productive, and would reduce the average productivity in those places.

Finally, firms can’t target innovations at their own goods, so that all innovation generates creative destruction. Intuitively, this means that firms do not benefit from their own past investments to increase the quality of goods that they own. Since firms located in more productive cities innovate more frequently, the absence of “internal” innovation reduces the incentives for firms to locate in those places. Both assumptions (lack of sorting and of internal innovation), however, are crucial for the tractability of the model and are thus kept throughout the paper. The counterfactual results in section 5 should be interpreted with those caveats in mind. With that said, including both sorting and internal innovation is likely to only strengthen those results, which already imply that a higher spatial concentration of inventors is welfare enhancing.

City 0. To reflect the fact that some cities in the US have never produced a single patent, I allow for the existence of one city where $\bar\chi_0 = 0$. Intuitively, city 0 is representative of all cities in the country that do not have the minimal necessary conditions for investment in R&D (and therefore do not innovate). Firms located in city 0, however, can still engage in the production of goods.

2.4 Local Policies and the Government

The policies of interest in this paper are local R&D subsidies. These subsidies transfer a share of each firm’s R&D cost back to the firm. They are local in the sense that the subsidy rate, $s_c$, varies according to the city where the firm is located. On top of R&D subsidies, the government also provides a public good, G. The public good is nationally available and is consumed by all workers. I assume that the government fully taxes all firm- and land-owners to finance its expenditures. In this setting, the provision of the public good plays two important roles. First, it is a simple device for redistributing profits and land rents back to workers. Second, expenditures on the provision of this good are used to balance the government’s budget constraint, given that the values of the R&D subsidy rates are taken directly from the data. Throughout the model and in all counterfactuals, the amount of public good provided in each period is fixed at $G(t) = \bar G$ for all t.

Corporate and labor income taxes. Local governments also have in place several other taxes/subsidies that are not explicitly included in the model. Two important taxes in this category are corporate income and labor income taxes. For the purposes of the model, differences in either of those taxes across regions would mainly change how attractive a city is to workers or firms. In that regard, any effect of the corporate income tax would be reflected in $\bar\chi_c$, while the effect of a labor income tax would be reflected in $\alpha_c$.15

A different and important issue arises in the estimation of the model if corporate/labor income taxes are changing over time in a way that is correlated with how local R&D subsidies evolve. This would lead to an overestimation of the effect of those subsidies in the data. I explicitly address this issue in section 4.4 by showing that R&D tax credits have evolved in a way that is largely uncorrelated with other local taxes.

3 The Balanced Growth Path Equilibrium

In this section, I explicitly pose all problems faced by consumers and firms and solve for the model’s equilibrium. In what follows, I will impose two restrictions on the equilibrium. First, I will solve for a Balanced Growth Path, in which all aggregate variables grow at a constant rate. Second, I require that each one of the local shocks $z_c(t)$ follows its stationary distribution. Not surprisingly, I refer to this equilibrium as a Stationary Balanced Growth Path (SBGP), formally defined below.

Definition 1. Given initial product quality levels $\{q_j(0) > 0\}_{j\in J}$ and a set of values for the R&D subsidy, amenity level and expected productivity of innovation in each city $\{s_c, \alpha_c, \bar\chi_c\}_{c=0}^{C}$ such that $\chi_c(t) = \bar\chi_c e^{z_c(t)}$, $\bar\chi_0 = 0$ and $dz_c = \phi(\mu - z_c)\,dt + \sigma\,dW_c$, a Stationary Balanced Growth Path Equilibrium of the model consists of, for all periods t ≥ 0: (a) an allocation of goods $(Y(t), \{n_c(t)\}_{c=0}^{C}, \{k_j(t)\}_{j\in J})$, (b) a spatial distribution of inventors, production workers and firms $\{I_c(t), L_c(t), N_c(t)\}_{c=0}^{C}$ and (c) prices $\{w_c^i(t), w_c^{\ell}(t), p_{n,c}(t)\}_{c=0}^{C}$ such that

  • (i) $z_c(t)$ is normally distributed with mean µ and variance σ² for all t and all c.

  • (ii) The production of the final good Y(t), the average quality of intermediate goods in the economy $Q(t) = \int_{j\in J} q_j(t)\,dj$, “baseline” wages (i.e., congestion-adjusted), and the utility of consumers grow at a constant rate.

  • (iii) All workers are freely mobile and maximize their utility over the consumption of final and non-tradable goods, as well as over the city where they live.

  • (iv) Final and non-tradable good producers maximize profits taking prices as given. Intermediate good producers (firms) operate under monopolistic competition in the production of each product line j.

  • (v) Incumbent firms take their location as given and choose R&D investments to maximize their discounted stream of profits. There is free entry to all cities and a large mass of potential entrants.

  • (vi) All labor and goods markets clear, and the amount of public good produced, G¯, balances the government’s budget constraint.

3.1 The Firm’s Static Problem

Each firm’s problem can be broken down into a static problem and a dynamic problem. In the static problem the firm chooses how much to produce of each good, taking as given the set of products it is able to produce and their respective qualities. In the dynamic problem, the firm chooses how much to invest in R&D after observing the local productivity shock.

Final goods production. The final good producer is free to choose in which city to locate. Because agglomeration externalities do not have any effect on the production of the final good, production will take place in whichever city has the lowest wages (least congestion): city 0. Since $\chi_0(t) = 0$ for all t, no firms investing in R&D will locate there, which means that there are no inventors living in city 0 either. As a result, the only workers in city 0 are the ones hired by the final good producer and those who produce the non-tradable good. In each period, the final good producer’s profit maximization problem is therefore (I drop time from the notation as it causes no confusion)

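$$\max_{\ell_{y,0},\,\{k_j\}_{j\in J}} \; \frac{\ell_{y,0}^{\varepsilon}}{1-\varepsilon}\int_J k_j^{1-\varepsilon} q_j^{\varepsilon}\,dj \;-\; \int_J p_j k_j\,dj \;-\; w_0\,\ell_{y,0}.$$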

The first-order conditions define the wage in city 0 and the demand for intermediate goods:

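$$[\ell_{y,0}]:\quad \frac{\varepsilon}{1-\varepsilon}\,\ell_{y,0}^{\varepsilon-1}\int_J k_j^{1-\varepsilon} q_j^{\varepsilon}\,dj = w_0; \tag{2}$$
$$[k_j]:\quad p_j = \left(\frac{\ell_{y,0}\,q_j}{k_j}\right)^{\varepsilon},\quad \forall j\in J. \tag{3}$$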

Intermediate goods production. Given the demand function (3), each firm chooses kj to

$$\max_{k_j}\; p_j(k_j; q_j)\,k_j - v\,k_j.$$

Since there are no transport costs and the marginal cost of production does not vary between cities, the firm’s production decision is completely independent from the city where it is located. Solving the problem above gives the quantity of good j produced

$$k_j = q_j\left(\frac{1-\varepsilon}{v}\right)^{\frac{1}{\varepsilon}}\ell_{y,0}. \tag{4}$$

The profit made with product line j is $\pi_j = \ell_{y,0}\left(\frac{1-\varepsilon}{v}\right)^{\frac{1-\varepsilon}{\varepsilon}}\varepsilon\, q_j$.
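The static problem can be checked numerically. The sketch below uses hypothetical parameter values and helper names of my own (`intermediate_quantity`, `profit`, `closed_form_profit`) to verify that the quantity in equation (4) maximizes profits under the inverse demand in equation (3), and that the resulting profit is linear in quality.

```python
def intermediate_quantity(q, eps, v, labor):
    """Equation (4): k_j = q_j * ((1 - eps) / v)**(1 / eps) * labor."""
    return q * ((1 - eps) / v) ** (1 / eps) * labor


def profit(k, q, eps, v, labor):
    """Revenue p_j(k_j) * k_j minus cost v * k_j, with p_j from equation (3)."""
    price = (labor * q / k) ** eps
    return price * k - v * k


def closed_form_profit(q, eps, v, labor):
    """pi_j = labor * ((1 - eps) / v)**((1 - eps) / eps) * eps * q_j."""
    return labor * ((1 - eps) / v) ** ((1 - eps) / eps) * eps * q


# Hypothetical values: quality, curvature, marginal cost, final-good labor.
q, eps, v, labor = 2.0, 0.3, 0.7, 5.0

k_star = intermediate_quantity(q, eps, v, labor)
pi_star = profit(k_star, q, eps, v, labor)

print(k_star, pi_star, closed_form_profit(q, eps, v, labor))
```

At the optimum the price equals the constant markup v/(1 − ε) over marginal cost, which is why neither the price nor the production decision depends on the firm’s location.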

3.2 Local Wages and Congestion Costs

Before getting into the dynamics of firm decisions, it is useful to understand how local wages respond to the city’s population. Wage variation between cities is driven by differences in city-specific characteristics (i.e., amenities and productivity), by congestion costs and, crucially, by the fact that workers are freely mobile. Because of free mobility, it follows that $u_c^h(t) = u^h(t)$ for all c, t and h ∈ {i, ℓ} in equilibrium – that is, the utility level of workers must be the same in all cities and in all periods.16 Given this condition, wages must adjust to compensate workers for any variation in utility caused by different levels of amenities or the price of the non-tradable good between cities. Wages between inventors and production workers differ due to differences in the supply and demand for each type of worker.

Congestion costs in this model arise from the fact that the production of the non-tradable good involves a fixed factor (land), and thus displays decreasing returns to scale (DRS) in equilibrium. Combined with the fact that the demand for non-tradable goods increases with the population of workers in each city, DRS in the production of non-tradable goods implies that cities with a larger population will also have more expensive non-tradable goods – generating a congestion cost. Note that, unlike agglomeration, the congestion force in this model is not an externality, as its effects occur through prices.17 It does, however, limit the size of cities: as the population of a city increases, so does the cost of living/producing there, so firms will start to favor locating in other places. Lemma 1 describes how wages respond to the city’s population. In this lemma and throughout the paper, I use the “tilde” notation to refer to variables per unit of land: if Ic is the population of inventors in city c, then $\tilde I_c = I_c/\bar m_c$.

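Lemma 1. Define Ic and Lc, respectively, as the population of inventors and production workers in each city (with I0 = 0). Under the free mobility of workers, wages can be expressed as the product of a “baseline” wage and a term that adjusts for congestion costs in each city. The wage of inventors in cities c ∈ {1,..., C} is given by

$$w_c^i = w^i\left(\frac{\tilde I_c^{\,1-\beta}}{\alpha_c}\right)^{\frac{\theta}{1-\theta}} \quad\text{where}\quad w^i = \frac{1}{1-\theta}\left[\frac{u^i}{(1-\theta\beta)^{\theta}}\right]^{\frac{1}{1-\theta}}\left(\frac{I}{L-L_0}\right)^{\frac{\theta\beta}{1-\theta}}, \tag{5}$$

while the wages of production workers are

$$w_c^{\ell} = w^{\ell}\left(\frac{\tilde L_c^{\,1-\beta}}{\alpha_c}\right)^{\frac{\theta}{1-\theta}} \quad\text{where}\quad w^{\ell} = \frac{1}{1-\theta}\left[\frac{u^{\ell}}{(\theta\beta)^{\theta}}\right]^{\frac{1}{1-\theta}}, \tag{6}$$

where $u^i$ and $u^{\ell}$ are the utility levels of inventors and production workers, respectively. Land rents in each of those cities are given by

$$p_{m,c}\,\bar m_c = \frac{(1-\beta)\theta}{1-\theta\beta}\, w_c^i I_c.$$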

In city 0, the number of production workers hired to produce non-tradable goods is ℓn,0 = θβL0, and the number of workers hired to produce the final good is ℓy,0 = (1 – θβ)L0. The wage of production workers is given by

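$$w_0 = w^{\ell}\left(\frac{\tilde L_0^{\,1-\beta}}{\alpha_0}\right)^{\frac{\theta}{1-\theta}}(\theta\beta)^{\frac{(1-\beta)\theta}{1-\theta}} \tag{7}$$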

and the total land rent is $p_{m,0}\,\bar m_0 = (1-\beta)\theta\, w_0 L_0$.

See appendix B for the proof. The equations in lemma 1 also imply that the numbers of inventors and production workers in cities where there is innovation are proportional: Ic/I = Lc/(L – L0). As a result, the population of production workers is sufficient to characterize the population of inventors in each city, and vice-versa.18 Finally, note that the wage of production workers in city 0 can be written in an alternative form by plugging (4) into the F.O.C. of the final good producer, equation (2):

$$w_0 = \frac{\varepsilon}{1-\varepsilon}\left(\frac{1-\varepsilon}{v}\right)^{\frac{1-\varepsilon}{\varepsilon}} Q, \tag{8}$$

where $Q = \int_J q_j\,dj$ is the average quality of all intermediate goods produced in the economy.
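To illustrate how congestion maps population density into wages, the sketch below codes the congestion adjustment that multiplies the baseline wage in lemma 1, together with the land rent expression for innovating cities. The function names and all numbers are hypothetical and chosen only for illustration; the baseline wage is normalized to one.

```python
def local_wage(baseline, density, amenity, beta, theta):
    """Baseline wage times (density**(1 - beta) / amenity)**(theta / (1 - theta))."""
    return baseline * (density ** (1 - beta) / amenity) ** (theta / (1 - theta))


def land_rent_innovating_city(inventor_wage, inventors, beta, theta):
    """Total land rent in a city c >= 1: (1 - beta) * theta / (1 - theta * beta) * w_c^i * I_c."""
    return (1 - beta) * theta / (1 - theta * beta) * inventor_wage * inventors


beta, theta = 0.6, 0.3  # hypothetical values, not the paper's calibration
w_base = 1.0            # baseline wage normalized to one

w_small = local_wage(w_base, density=1.0, amenity=1.0, beta=beta, theta=theta)
w_large = local_wage(w_base, density=4.0, amenity=1.0, beta=beta, theta=theta)
w_nice = local_wage(w_base, density=4.0, amenity=2.0, beta=beta, theta=theta)
rent = land_rent_innovating_city(inventor_wage=2.0, inventors=10.0, beta=beta, theta=theta)

print(w_small, w_large, w_nice, rent)
```

Denser cities must pay higher wages to keep workers indifferent, while better amenities allow firms to pay less for the same density.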

3.3 The Firm’s Dynamic Problem

Firms can either be entrant or incumbent. In each period firms choose how much to invest in innovation, given the set of product varieties they own and the optimal production decisions described above. Incumbent firms take their location as given, and are not allowed to move. Entrant firms are free to choose which city to locate in. The timing of decisions in each period is as follows:

  • (i) The shock zc(t) is realized and observed in all cities.

  • (ii) Potential entrants decide whether or not to enter and in which city to locate.

  • (iii) Entrants and incumbents decide how many inventors to hire.

  • (iv) Innovations are realized (based on the arrival rates xf,c) and production takes place.

Incumbents. The dynamic problem faced by an incumbent firm located in city c can be described by the Hamilton-Jacobi-Bellman (HJB) equation in lemma 2 below. To make the exposition clearer, first define $q_f$ to be the multiset of the qualities of products that are currently being produced by the firm19 and D to be the rate of creative destruction of the economy – or equivalently, the aggregate rate of innovation. Because the measure of intermediate good varieties in the economy is one, D also coincides with the probability that any single product line will be “stolen” at any point in time.

Let r be the (exogenous) interest rate and $A = (Q, w^i, D, L_0)$ denote the “aggregate state” of the economy, where Q is the average quality of all intermediate goods produced, $w^i$ is the “baseline” wage of inventors, D is the rate of creative destruction, and $L_0$ is the population of production workers in city 0 (which affects the flow of profits for firms). For notational convenience, I also define $\bar\pi = (1-\theta\beta)\left(\frac{1-\varepsilon}{v}\right)^{\frac{1-\varepsilon}{\varepsilon}}\varepsilon$, so that the per-period profit from each product line is $\pi_j = \bar\pi L_0 q_j$, and $Z_c = e^{z_c}$, so that the city-specific productivity in the production of innovation is $\chi_c = \bar\chi_c Z_c$.

Lemma 2. The HJB equation that describes the problem faced by an incumbent firm located in city c ∈ {1,..., C} is

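$$\begin{aligned}
r V_c(q_f,\tilde I_c,Z_c,A) - \frac{\partial V_c(q_f,\tilde I_c,Z_c,A)}{\partial A}\frac{\partial A}{\partial t} = \max_{x_{f,c}}\Big\{ & \sum_{q_j\in q_f}\bar\pi L_0 q_j + x_{f,c}\,\mathbb{E}_j\big[V_c(q_f\cup_{+}\{(1+\lambda)q_j\},\tilde I_c,Z_c,A) - V_c(q_f,\tilde I_c,Z_c,A)\big] \\
& - (1-s_c)\,w_c^i\,(i_{f,c}+\kappa) - D\sum_{q_j\in q_f}\big[V_c(q_f,\tilde I_c,Z_c,A) - V_c(q_f\setminus\{q_j\},\tilde I_c,Z_c,A)\big] \\
& + R_c(q_f,\tilde I_c,Z_c,A)\Big\} \\
\text{s.t.}\quad & x_{f,c} = \bar\chi_c Z_c\big(\tilde I_c^{\eta}\, i_{f,c}\big)^{\psi}.
\end{aligned}$$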

The derivation of this equation can be found in appendix B. There are three groups of state variables in the firm’s problem: the first is firm-specific, the second is city-specific and the third is common across all firms in the economy. The first group includes $q_f$, the set of qualities of the product lines that are currently produced by the firm; the second group has Ĩc, the population of inventors per unit of land in city c (which determines agglomeration spillovers), and Zc, the productivity shock in city c; finally, the third group has the aggregate state A, which includes the rate of creative destruction (i.e., the probability that one of the firm’s product lines will be “stolen”), the “baseline” wage of inventors (which along with Ĩc determines the wage $w_c^i$) and the average quality of all intermediate goods produced in the economy.

The first term inside the curly brackets is the profit made through the production and sale of goods. It is followed by the expected gain from one more innovation, which introduces a new product into $q_f$ (recall that the number of innovations per period follows a Poisson distribution, so that in continuous time the probability that two or more arrivals occur can be ignored). The first term in the second line is the cost of investment in R&D (both variable and fixed), subsidized at rate $s_c$. The second term in the second line is the expected cost from the loss of a product line due to creative destruction. The remaining term, $R_c(q_f, \tilde I_c, Z_c, A)$, captures the risk that firms in city c face due to the productivity shock.

As a final note, recall from section 2 that one of the sources of revenue for the government is taxes on firm owners. Taxing firm owners in this model is the same as taxing firms’ profits; yet no corporate income taxes appear in the HJB equation above. As shown in the proof of lemma 2, this can be done because a tax on a firm’s profit will not affect any of the firm’s decisions, as long as the “profit” also includes R&D expenditures. As a result, the corporate income tax only shifts the share of the firm’s value that accrues to the firm owner, without having any effect on the allocation of resources in the economy. By fully taxing firm owners, I study the limiting case where corporate taxes are arbitrarily close to one, but where firms still behave as profit maximizers (i.e., they behave as if their value function was Vc, as in lemma 2).

Entrants. Entrants behave in the same way that incumbents do, with two exceptions: entrants do not yet have any product lines of their own and are able to choose where to locate. As before, define the aggregate state of the economy as $A = (Q, w^i, D, L_0)$. The entrant firm’s problem can be solved in two steps:

Step 1: Choose which city to locate in after observing all shocks $\{Z_c\}_{c=1}^{C}$:

$$V^e(A) = \max_{c}\; V_c^e(\tilde I_c, Z_c, A).$$

Step 2: Choose the level of innovation subject to being in city c.

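$$\begin{aligned}
r V_c^e(\tilde I_c,Z_c,A) - \frac{\partial V_c^e(\tilde I_c,Z_c,A)}{\partial A}\frac{\partial A}{\partial t} = \max_{x_{f,c}}\Big\{ & x_{f,c}\,\mathbb{E}_j\big[V_c(q_j,\tilde I_c,Z_c,A) - V_c^e(\tilde I_c,Z_c,A)\big] \\
& - (1-s_c)\,w_c^i\,(i_{f,c}+\kappa) + R_c^e(\tilde I_c,Z_c,A)\Big\} \\
\text{s.t.}\quad & x_{f,c} = \bar\chi_c Z_c\big(\tilde I_c^{\eta}\, i_{f,c}\big)^{\psi}.
\end{aligned}$$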

The HJB equation for the second stage of the entrant’s problem is exactly analogous to the incumbent’s problem, but does not include the flow of profits from current production or the risk of losing products to other firms by creative destruction. Proposition 1 describes the value function for entrants and incumbents (see appendix B for the proof).

Proposition 1. In a Stationary Balanced Growth Path Equilibrium where the production of final goods Y grows at rate g < r, the value function of an incumbent firm located in city c ≥ 1 and whose portfolio of products is qf is

$$V_c(q_f,\tilde I_c,Z_c,A) = F(D,L_0)\sum_{q_j\in q_f} q_j + \max\big\{0,\; E_c(\tilde I_c, Z_c, w^i/Q, D, L_0)\,Q\big\},$$

where $F(D,L_0) = \bar\pi L_0/(r+D)$ is the “franchise value” of adding a new product to the portfolio and Ec is the entry value for firms in city c (see the proof for a complete characterization).

In addition, the second stage value function of an entrant firm who is located in city c is

$$V_c^e(\tilde I_c,Z_c,A) = \max\big\{0,\; E_c(\tilde I_c, Z_c, w^i/Q, D, L_0)\,Q\big\}.$$

Intuitively, F can be interpreted as the quality-adjusted franchise value of adding a new product to the firm’s portfolio, while EcQ is the value at entry for all firms in city c. Note that Ec does not depend on the firm’s portfolio of products, so it does not vary across firms in the same city. The term max{0, EcQ} in the firm’s value function reflects the fact that each firm has the choice to invest in R&D or not, and as such can be interpreted as the option value of investments in innovation. When investing in R&D is optimal, Ec ≥ 0 and the arrival rate $x_{f,c}$ is strictly positive. However, firms can also choose not to invest in R&D in any given period – for example, if the realized value of the shock Zc is too low. In this case, the firm does not hire inventors ($i_{f,c} = 0$), produces no innovation ($x_{f,c} = 0$), and does not need to pay the fixed cost $w_c^i\kappa$.
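The structure of the value function in proposition 1 can be sketched in a few lines. The function names and the numerical values of $\bar\pi$, $L_0$, D, Q and the entry value are hypothetical; only r = 3.8% is taken from the calibration in section 4.1.

```python
def franchise_value(pi_bar, L0, r, D):
    """F(D, L0) = pi_bar * L0 / (r + D): profits discounted at r plus the hazard D."""
    return pi_bar * L0 / (r + D)


def incumbent_value(qualities, F, entry_value, Q):
    """Proposition 1: F * (sum of portfolio qualities) + max{0, E_c * Q}."""
    return F * sum(qualities) + max(0.0, entry_value * Q)


# pi_bar, L0, D and Q are hypothetical; r follows section 4.1.
pi_bar, L0, r, D, Q = 0.2, 10.0, 0.038, 0.15, 1.0
F = franchise_value(pi_bar, L0, r, D)

portfolio = [0.8, 1.1, 1.4]
v_free_entry = incumbent_value(portfolio, F, entry_value=0.0, Q=Q)
v_bad_shock = incumbent_value(portfolio, F, entry_value=-0.5, Q=Q)

print(F, v_free_entry, v_bad_shock)
```

A negative entry value does not reduce the value of an incumbent, because the firm can always choose not to invest in that period. Only the option to invest is lost, not the products already in the portfolio.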

The assumption that g < r is a technical requirement for the present discounted value (PDV) of firms to be finite. From an intuitive point of view, note that the firm’s value from investing in R&D grows at rate g (so long as Ec ≥ 0). Thus, if g > r, the optimal strategy for any given firm will be to invest as much as possible in R&D (for example, by borrowing capital at rate r). This will generate an expected growth in the firm’s value that is larger than the firm’s discount rate, and thus the PDV diverges. Assuming that the rate of growth is smaller than the real interest rate rules this case out.

Free Entry. The first stage of the entrant’s problem specifies that entrants are free to locate in any city they wish. Since there is a large mass of potential entrants to every city in the economy, free entry implies that, in equilibrium

$$V_c^e(\tilde I_c,Z_c,A) = 0 \quad\text{for all } c\in\{1,\ldots,C\} \text{ and all } t.$$

This condition says that the value of entry must be zero for all cities and at all times. The intuition behind it is simple: if the entry value were positive, firms would keep entering the city, which increases congestion and eventually pushes the value of entry to zero. If the entry value were negative (i.e., Ec < 0), two things would happen. First, no firms would enter the city. Second, incumbents would refrain from investing in R&D (see the discussion above). This pushes down the demand for inventors in the city, which reduces congestion costs until the value of entry rises back to zero.

One alternative way to read the free entry condition is that $V_c^e(\tilde I_c, Z_c, A) = 0$ for all cities and for any value of the shock Zc and the aggregate state A. In other words, the population of inventors must adjust so that the value of entry is zero in all cities. From proposition 1, this implies that Ec = 0 regardless of the value of the state variables. Using this interpretation of the free entry condition, proposition 2 determines the population of inventors in each city (once again, the proof can be found in appendix B).

Proposition 2. Imposing (1) free entry and (2) labor market clearing for both inventors and production workers, and (3) assuming a large number of cities C → ∞ (so that the Law of Large Numbers applies and the average of city-specific shocks converges to its mean), the population of inventors in each city is given by

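$$I_c = I \times \frac{\left(\frac{\bar\chi_c}{1-s_c}\right)^{\frac{1-\theta}{\Theta}}\alpha_c^{\frac{\theta}{\Theta}}}{\sum_{c=1}^{C}\left(\frac{\bar\chi_c}{1-s_c}\right)^{\frac{1-\theta}{\Theta}}\alpha_c^{\frac{\theta}{\Theta}}} \times Z_c^{\frac{1-\theta}{\Theta}}\, e^{-\frac{1-\theta}{\Theta}\left(\frac{1-\theta}{\Theta}-1\right)\frac{\sigma^2}{4\phi}} \tag{9}$$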

where Θ = (1 – β)θ – ψη(1 – θ). Moreover, the arrival rate of an innovation for a firm f located in city c is

$$x_{f,c} = \left(\frac{\kappa\psi}{1-\psi}\right)^{\psi}\bar\chi_c\,\tilde I_c^{\psi\eta} Z_c, \tag{10}$$

if the firm chooses to invest in R&D and zero otherwise. Similarly, the number of inventors hired by each firm in city c is $i_{f,c} = \kappa\psi/(1-\psi)$ in case of positive investment and zero otherwise. The number of firms located in city c that invest in R&D in each period is

$$N_c = \left(\frac{1-\psi}{\kappa}\right) I_c. \tag{11}$$

Finally, it can also be shown that the population of production workers in city 0 is proportional to L (i.e., $L_0$ does not vary over time), and that $w^i$ is not affected by the city-specific productivity shocks:

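$$\frac{w^i}{Q} \propto \frac{\bar\pi L_0}{r+D}\left\{\frac{1}{C}\sum_{c=1}^{C}\left(\frac{\bar\chi_c}{1-s_c}\right)^{\frac{1-\theta}{\Theta}}\alpha_c^{\frac{\theta}{\Theta}}\right\}^{\frac{\Theta}{1-\theta}}. \tag{12}$$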

Proposition 2 has three important results. First, it shows that the population of inventors in each city has a closed form solution in equation (9), with relatively simple terms. Cities with higher amenities, higher productivity for innovation, and higher R&D subsidies will have more inventors (note that Θ > 0 using the parameter values from section 4). The population of inventors in each city also reacts to the productivity shock Zc, increasing in periods where the shock is larger. The composite parameter Θ can be interpreted as the “net elasticity” of congestion: (1 – β) captures the elasticity of congestion with respect to a city’s population, while ψη is the elasticity of the production of innovation with respect to the population of inventors. Θ is then defined as the weighted difference between these two elasticities, where the weights are determined by the share of expenditure on the tradable good, θ.
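A small numerical sketch helps to see how equation (9) responds to policy. The code below computes the shock-free (expected) allocation of inventors, that is, equation (9) without the term in Zc, reading the exponents as (1 – θ)/Θ on $\bar\chi_c/(1-s_c)$ and θ/Θ on $\alpha_c$. The function names and all parameter values are hypothetical and are not the estimates of section 4.

```python
def net_congestion_elasticity(beta, theta, psi, eta):
    """Theta = (1 - beta) * theta - psi * eta * (1 - theta)."""
    return (1 - beta) * theta - psi * eta * (1 - theta)


def expected_inventors(total, chi_bar, subsidy, amenity, beta, theta, psi, eta):
    """Shock-free version of equation (9): expected inventors in each city."""
    Theta = net_congestion_elasticity(beta, theta, psi, eta)
    if Theta <= 0:
        raise ValueError("the closed form requires Theta > 0")
    weights = [
        (chi / (1 - s)) ** ((1 - theta) / Theta) * a ** (theta / Theta)
        for chi, s, a in zip(chi_bar, subsidy, amenity)
    ]
    return [total * w / sum(weights) for w in weights]


# Hypothetical three-city economy with 100 inventors.
beta, theta, psi, eta = 0.6, 0.3, 0.5, 0.1
chi_bar = [1.0, 1.0, 1.1]
amenity = [1.0, 1.0, 1.0]

no_subsidy = expected_inventors(100.0, chi_bar, [0.0, 0.0, 0.0], amenity, beta, theta, psi, eta)
with_subsidy = expected_inventors(100.0, chi_bar, [0.05, 0.0, 0.0], amenity, beta, theta, psi, eta)

print(no_subsidy)
print(with_subsidy)
```

Because the total number of inventors is fixed, a subsidy in one city attracts inventors away from every other city. The smaller Θ is, the larger the exponents and the stronger this reallocation.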

Second, proposition 2 says that the baseline wage of inventors – and therefore of production workers as well (see equation B.4) – does not react to any of the city-specific productivity shocks. Put differently, even though each city is subject to an idiosyncratic productivity shock, these shocks “average out” on the aggregate, so the economy can still operate on a Balanced Growth Path where there is no aggregate uncertainty.

Third, this proposition shows that the optimal arrival rate of innovation xf,c is uniform across firms located in the same city and who make positive investments in R&D. Note, however, that the expected value of investing in R&D is null because of the free entry condition. As a result, it is possible that some firms in the city choose to invest in R&D and some choose not to. Indeed, the number of inventors hired by firms who make positive investments does not respond to productivity shocks. Instead, the adjustment to those shocks is done entirely on the extensive margin – both by incumbents who decide whether or not to hire inventors and by the entry and exit of firms.20
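The logic behind the extensive-margin adjustment can be illustrated with a stylized calculation. The sketch below assumes a simplified flow payoff for an investing firm, `entry_payoff`, equal to the arrival rate times a fixed gain per innovation minus labor costs. This is my own simplification: it ignores the risk term in the HJB equation, `gain` stands in for the value of a new product, and `unit_cost` stands in for the subsidized wage $(1-s_c)w_c^i$. All numbers are hypothetical.

```python
def inventors_per_firm(kappa, psi):
    """Research inventors hired by an investing firm: kappa * psi / (1 - psi)."""
    return kappa * psi / (1 - psi)


def investing_firms(city_inventors, kappa, psi):
    """Equation (11): N_c = (1 - psi) / kappa * I_c."""
    return (1 - psi) / kappa * city_inventors


def entry_payoff(i, productivity, gain, unit_cost, kappa, psi):
    """ASSUMED flow payoff: productivity * i**psi * gain - unit_cost * (i + kappa)."""
    return productivity * i ** psi * gain - unit_cost * (i + kappa)


def zero_entry_cost(productivity, gain, kappa, psi):
    """Unit labor cost at which the best attainable payoff is exactly zero."""
    i_star = inventors_per_firm(kappa, psi)
    return productivity * i_star ** psi * gain / (i_star + kappa)


kappa, psi, gain = 2.0, 0.5, 1.0  # hypothetical values
i_star = inventors_per_firm(kappa, psi)
scales = (0.5, 0.9, 1.0, 1.1, 2.0)

payoffs = {}
for productivity in (0.5, 1.0, 2.0):  # stands in for chi_bar_c * Z_c * density**(psi * eta)
    cost = zero_entry_cost(productivity, gain, kappa, psi)
    payoffs[productivity] = [
        entry_payoff(s * i_star, productivity, gain, cost, kappa, psi) for s in scales
    ]

# Investing firms absorb all inventors in the city: N_c * (i + kappa) = I_c.
firms = investing_firms(100.0, kappa, psi)
absorbed = firms * (i_star + kappa)

print(i_star, payoffs, absorbed)
```

Whatever the productivity level, once labor costs adjust so that the best attainable payoff is zero, the payoff-maximizing team size is the same κψ/(1 – ψ). In this stylized setting, shocks change how many firms invest, not how much each investing firm hires.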

3.4 Determining the Growth Rate

In this section, I finish the characterization of the SBGP equilibrium of the model by determining the rate of growth of the economy. To do this, I must first determine the aggregate rate of creative destruction of the economy, D. Note that, because all firms located in the same city make the same investment in R&D, $D = \sum_{c=1}^{C} N_c x_{f,c}$. Corollary 1 shows that this rate can be expressed as a function of the spatial distribution of the population of inventors in the economy (see appendix B for the proof).

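Corollary 1. Define $\bar I_c = I\left(\frac{\bar\chi_c}{1-s_c}\right)^{\frac{1-\theta}{\Theta}}\alpha_c^{\frac{\theta}{\Theta}}\left\{\sum_{c=1}^{C}\left(\frac{\bar\chi_c}{1-s_c}\right)^{\frac{1-\theta}{\Theta}}\alpha_c^{\frac{\theta}{\Theta}}\right\}^{-1}$ to be the expected value of Ic w.r.t. the local productivity shocks. Similarly, $\bar{\tilde I}_c$ is the expected density of inventors in city c. The aggregate rate of creative destruction is then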

$$D \propto \frac{1}{C}\sum_{c=1}^{C}\bar\chi_c\,\bar{\tilde I}_c^{\,1+\psi\eta}. \tag{13}$$

Corollary 1 shows one of the key takeaways from this model: the aggregate rate of innovation in the economy depends not only on the number of inventors in the country but also on how inventors are geographically distributed. Since local R&D subsidies can change the spatial distribution of inventors by attracting more firms to a given location, it follows that changes in the spatial distribution of those subsidies affect the aggregate rate of innovation in the economy – even if the average subsidy rate (or total expenditure) is kept constant. The corollary also proves, as alluded to before, that the rate of creative destruction is indeed constant on the SBGP equilibrium (as both $\bar\chi_c$ and $\bar{\tilde I}_c$ are fixed over time).
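The role of the spatial distribution can be seen in a two-city example. The sketch below evaluates the average of $\bar\chi_c \bar{\tilde I}_c^{1+\psi\eta}$ across cities, to which corollary 1 ties D, for two hypothetical allocations of the same total number of inventors. The densities are imposed exogenously here, whereas in the model they are equilibrium outcomes; land per city is normalized to one and all values are illustrative.

```python
def destruction_index(chi_bar, density, psi, eta):
    """Average across cities of chi_bar_c * density_c**(1 + psi * eta)."""
    terms = [chi * d ** (1 + psi * eta) for chi, d in zip(chi_bar, density)]
    return sum(terms) / len(terms)


# Two equally productive cities and ten inventors in total (hypothetical).
chi_bar = [1.0, 1.0]
even = [5.0, 5.0]
concentrated = [9.0, 1.0]
psi = 0.5

# With agglomeration spillovers (eta > 0), concentration raises the index.
even_spill = destruction_index(chi_bar, even, psi, eta=0.2)
conc_spill = destruction_index(chi_bar, concentrated, psi, eta=0.2)

# Without spillovers (eta = 0), only the total number of inventors matters.
even_flat = destruction_index(chi_bar, even, psi, eta=0.0)
conc_flat = destruction_index(chi_bar, concentrated, psi, eta=0.0)

print(even_spill, conc_spill, even_flat, conc_flat)
```

The comparison isolates the agglomeration channel only. Whether more concentration is desirable in the full model also depends on the congestion costs that higher density creates.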

Proposition 3 now determines the value of the rate of growth of the economy in the SBGP equilibrium, defined by $g = \dot Y/Y$. The proof can be found in appendix B.

Proposition 3. In a Stationary Balanced Growth Path equilibrium where the production of the final good in the economy grows at rate g, the following are true:

  • (1) The average quality of intermediate goods, Q, and the baseline wage for both types of workers, $w^i$ and $w^{\ell}$, all grow at rate g. The utility levels of workers, $u^i$ and $u^{\ell}$, grow at rate (1 – θ)g.

  • (2) The value of the rate of growth is g = λD.

  • (3) Let Jc(t) be the set of intermediate goods that are produced in city c at time t, $Q_c(t) = \int_{J_c(t)} q_j(t)\,dj$ their aggregate quality, and $g_c(t) = \dot Q_c(t)/Q_c(t)$ the rate of growth of Qc at time t. For large values of t (i.e., as t grows to infinity), we have that

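$$\frac{\mathbb{E}[\dot Q_c(t)]}{\mathbb{E}[Q_c(t)]} = g \quad\text{or, equivalently,}\quad \mathbb{E}[g_c(t)] = g - \frac{\mathrm{Cov}(g_c(t), Q_c(t))}{\mathbb{E}[Q_c(t)]}.$$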

Part (1) of proposition 3 describes the rate of growth of aggregate variables in the SBGP equilibrium of the model. Part (2) then shows that the rate of growth of the economy is equal to the rate of creative destruction multiplied by the innovation step-size. Part (3) shows that the ratio between the expected variation in the aggregate quality of goods produced in city c and the expected quality of these goods converges to the national growth rate with time. Intuitively, this result is a consequence of innovation by creative destruction: cities that produce goods with higher-than-average quality will innovate over products whose qualities are, on average, lower than the goods already produced in the city – and vice-versa. Through this mechanism, creative destruction acts as a mean reverting force for the quality and the value of goods produced in each city, precluding all economic activity from concentrating in a single city.21 22 Lastly note that none of the results in the proposition are predicated upon the initial spatial distribution of economic activity: as long as no city is large enough to drive the evolution of aggregate variables by itself, proposition 3 holds regardless of where the economy starts from.
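The equivalence between the two statements in part (3) follows from the definition of covariance, since $\dot Q_c = g_c Q_c$. The sketch below checks this identity on three hypothetical, equally likely states for one city in which high-quality states grow more slowly, as the mean-reversion argument suggests. The numbers are invented for illustration only.

```python
def mean(values):
    """Average over equally likely states."""
    return sum(values) / len(values)


# Hypothetical states for one city: growth rate and aggregate quality.
growth = [0.01, 0.02, 0.04]
quality = [3.0, 2.0, 1.0]

# Change in aggregate quality in each state: Q_dot = g_c * Q_c.
q_dot = [g * q for g, q in zip(growth, quality)]

ratio_of_means = mean(q_dot) / mean(quality)             # E[Q_dot] / E[Q]
covariance = mean(q_dot) - mean(growth) * mean(quality)  # Cov(g_c, Q_c)
mean_growth = mean(growth)                               # E[g_c]

print(ratio_of_means, covariance, mean_growth)
```

With a negative covariance between growth and quality, the expected city growth rate exceeds the ratio of expectations, so cities with below-average quality tend to catch up.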

3.5 Existence and Uniqueness

The existence of a solution for the SBGP equilibrium relies on two conditions. First, it requires that r > g so that present discounted values of profits are finite (see the discussion following proposition 1). Second, the number of cities must be large enough so that the local shocks do not generate aggregate uncertainty in the economy.23 Uniqueness comes from the unique spatial distribution of inventors defined by equation (9). One important caveat here is that this distribution is only unique if the net elasticity of congestion Θ > 0. If this is not the case, the agglomeration force is always larger than congestion and therefore it is always profitable for all firms to locate in the same place. This generates multiple equilibria since the initial distribution of population can affect the entire path of the economy – for example, if the entire population is located in city ĉ in the initial period, there is no incentive for entrants to go anywhere else.

4 Identification and Estimation of the Model

The identification and subsequent estimation of the parameters in the model proceeds in three steps. In the first step (section 4.1), I calibrate a set of parameters that can be directly matched to quantities in data or reliably found in other studies in the literature. The second step (section 4.2) uses linear regressions to estimate the elasticity of the agglomeration spillover with respect to the population of inventors, as well as the elasticity of congestion with respect to the population of production workers in each city. Finally, the third step (section 4.3) identifies the remaining parameters by matching moments in the model to moments in the data. This step-by-step structure simplifies identification of the model by allowing each step of the process to take as given the values of parameters identified in previous steps.

To estimate the model, I use data on patent filings and economic activity within cities. Data on patents filed is available in the USPTO Patent Dataset, which records all patents registered in the US. It also provides data on the patent’s inventors and assignees (owners) and, importantly, their location. In addition, I use the County Business Patterns Dataset (CBP), which provides information on the demography and economic activity in each county in the US. To approximate the prices of non-tradable goods I use the Zillow Rent Index (ZRI), which estimates the median rental value per square foot across counties.24 Both the CBP and the ZRI are aggregated to the city level, whose empirical counterpart is a CBSA. CBSAs, or core-based statistical areas, are geographic areas defined by the US Office of Management and Budget that consist of one or more counties (or equivalents) anchored by an urban center of at least 10,000 people plus adjacent counties that are socioeconomically tied to the urban center by commuting. I use the most recent definitions of CBSAs, based on the 2010 Census standards. Appendix C.1 describes in detail the construction of the dataset. For most of the estimation procedure, I focus on a panel of firms (and their respective locations) that have filed patents between 1998 and 2016.

4.1 Calibration

A subset of the parameters in the model, shown in table 1, can be matched to the data in a straightforward way. In this section, I detail how to calibrate these parameters. I start by setting the rate of growth of the economy to g = 2%, which corresponds to the annualized historic rate of growth in the US. Similarly, the discount rate of consumers is also set to ρ = 2%, following a common practice in the growth literature (e.g., Acemoglu et al., 2018). The real interest rate is fixed at r = 3.8%, which corresponds to the average interest rate in the US between 1961 and 2017 according to the World Bank.25

Innovation. I rely on the economic literature to determine the value of two parameters in the innovation process: the curvature of the innovation production function, ψ, and the innovation quality multiplier (or step-size), λ. I set ψ = 0.5 following several studies that agree on this number. A series of papers identify ψ as the elasticity of patents with respect to R&D expenditure. Blundell et al. (2002) estimate this elasticity to be 0.5 using count data models. Griliches (1990) and Hall and Ziedonis (2001) find similar values. Other researchers identify ψ using the elasticity of R&D expenditure with respect to taxes/subsidies, which is equal to ψ/(1 − ψ).26 In a survey, Hall and Van Reenen (2000) conclude that this elasticity is around unity. Similar estimates have been found in more recent papers as well (Bloom et al., 2002; Wilson, 2009). Akcigit et al. (2020) add interesting nuances to this result, differentiating between the impact of taxes on inventors/firms (micro level) and on states over time (macro level). Consistent with the discussion here, they find that “a one percentage point increase in the personal tax rate leads to a 1.1 percent decline in the number of patents” at the inventor level. Note that a unit elasticity of patents relative to taxes also implies ψ = 0.5. Finally, Acemoglu et al. (2018) arrive at the same conclusion when computing the elasticity of R&D expenditures with respect to scientists’ wages using firm level data from the Census Bureau.

The innovation step-size is λ = 0.132. This is the value estimated by Acemoglu et al. (2018) in a setting close to the one presented here. Akcigit and Kerr (2018) find a similar value for the step size of “external” innovations (i.e., innovations that do not target a firm’s own products). In both cases, the estimation of the step size is achieved through a simulated method of moments procedure that targets, among others, firms’ sales and R&D costs. Intuitively, these data identify the innovation step-size because the increase in the quality of goods after an innovation is reflected on the sales to R&D cost ratio of firms.27

R&D Subsidies. R&D subsidies are equivalent to R&D tax credits in the model, to which I assign the values of existing credits in each state (i.e., the sum of federal and state-specific credits). Because of differences in the tax code for each state, the statutory credit rates can be different from the effective credit rates (i.e., the rates that are actually applicable as subsidies to firms). For example, R&D tax credits are only applicable to R&D investments over and above a given base value, which can differ across states. In addition, some states consider the federal tax credit as taxable income, thus “recapturing” part of the credit. In all that follows, I set s_c equal to the effective R&D tax credit rate that applies to the highest tier of R&D investments in each state, as computed by Wilson (2009), who has made this data publicly available.28

Nevertheless, there are a couple of details that require attention in this data. First, the value of R&D credits changes over time, while the model assumes that s_c is fixed. Most of these changes happen before 1995, but there are still some cases where there is variation in the credit rate after this period. I therefore use the average credit rate between 1998 and 2006 (the last year available in the data) as a measure of the R&D subsidy in each state. Second, the model is written in terms of cities (CBSAs), which in some instances do not fall within the geographical boundaries of a single state. In those cases, I match the CBSA to the state of its largest urban center and assume that the R&D credit rate of that state applies to the entire CBSA. For example, the New York-Newark-Jersey City, NY-NJ-PA MSA is matched to the state of New York, the Chicago-Naperville-Elgin, IL-IN-WI MSA is matched to Illinois, the Boston-Cambridge-Newton, MA-NH MSA is matched to Massachusetts, and so on.
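The two adjustments just described can be illustrated with a minimal sketch. It assumes, purely for illustration, that the state of a CBSA's largest urban center is the first state abbreviation in its title; this holds for the three examples in the text but is not guaranteed in general. The credit rates are made-up numbers, not the data from Wilson (2009), and the function names are hypothetical.

```python
from statistics import mean


def primary_state(cbsa_title: str) -> str:
    """Return the first state abbreviation in a CBSA title.

    Assumption (not from the paper): the first-listed state is the state
    of the CBSA's largest urban center.
    """
    # Titles look like "Chicago-Naperville-Elgin, IL-IN-WI MSA".
    states_part = cbsa_title.rsplit(", ", 1)[1]   # "IL-IN-WI MSA"
    return states_part.split()[0].split("-")[0]   # "IL"


def average_credit(rates_by_year: dict, first: int = 1998, last: int = 2006) -> float:
    """Average a state's effective credit rate over the years in [first, last]."""
    return mean(r for year, r in rates_by_year.items() if first <= year <= last)


# Hypothetical effective credit rates, for illustration only.
hypothetical_rates = {"NY": {1997: 0.00, 1998: 0.05, 2002: 0.07, 2006: 0.09}}

state = primary_state("New York-Newark-Jersey City, NY-NJ-PA MSA")
s_c = average_credit(hypothetical_rates[state])
```

Here the 1997 rate is ignored because it falls outside the averaging window, and the resulting s_c is applied to the whole CBSA.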

Production and Preferences. The elasticity of quality in the production of the final good, ε, coincides with the profit/sales ratio for intermediate good producers, which can be observed in the National Income and Product Accounts tables published by the Bureau of Economic Analysis (BEA). In addition, the preference parameter θ is the share of expenditure on non-tradable goods by consumers. I set this parameter to 0.6, which is roughly the share of expenditure in housing and transportation found in the Consumer Expenditure Survey, published by the Bureau of Labor Statistics (BLS) and the share of aggregate investment on non-tradable goods found by Bems (2008).

Population. The population of inventors in the economy, I, can be found through the unique inventor ID in the USPTO patent data. For each year, I compute the total number of inventors in the US – let I^{pat} be the average of this series over time.29 Since only inventors that have authored patents are identified in the data, I^{pat} = ψI, which is the total number of inventors hired by firms to produce patents (Σ_c N_c i_{f,c}), not including those hired to cover fixed costs. A simple adjustment gives I = I^{pat}/ψ.

The population of production workers or “non-inventors” in the economy is set so that L + I matches the total employed population in the CBP data. Finally, L0 is the total employed population in CBSAs that did not file any patents throughout the sample (once again averaged over time). Since the size of the economy does not affect the results in the quantitative exercises, I normalize the population so that the total number of inventors in the economy is 1.

The number of cities is also chosen to match the data. There are 917 CBSAs in the US (not counting Puerto Rico because it is not included in the CBP), of which 860 have filed at least one patent between 1998 and 2016. Given that the number of patents filed is my measure of innovation, I assume that the remaining CBSAs have not produced any innovation over my sample. As a result, C = 860 (the cities who have a positive expected productivity in innovation) and city 0 is representative of the remaining 57 CBSAs.

Table 1: Calibrated Parameters

4.2 Linear Regressions

In this section, I show how to identify and estimate η and β, which help to determine the elasticities of agglomeration and congestion, respectively. In both cases, these parameters can be estimated using linear regressions based on relationships predicted by the model.

4.2.1 The Elasticity of Agglomeration

The functional form of the production function for innovation, equation (1), leads to a log-linear relationship between the arrival rate of innovation, the number of inventors hired, and the population of inventors in the city for each firm located in city c – which suggests that the parameters of this function can be estimated by linear regression. However, the model presented in section 2 is written in continuous time, while the data is only available at a yearly frequency. Therefore, we must first transform equation (1) to reflect the same frequency as the data. This transformation is straightforward and is described in detail in appendix C.2. It leads to the following regression model

$$\log(x_{f,c,t}) = \psi \log(i_{f,c,t}) + \psi\eta \log(I_{c,t}) + \delta_c + z_{f,c,t}, \tag{14}$$

where x_{f,c,t} is the number of innovations produced by firm f, located in city c, during year t; i_{f,c,t} is the number of inventors hired by firm f during year t; I_{c,t} is the population of inventors in city c during year t; δ_c is a city fixed effect; and z_{f,c,t} is a function of the city-specific productivity shocks.

To proxy for the production of innovation in each year, I use the number of patents filed by a firm in that same year. This is a fairly common practice, but it does have some caveats. First of all, not all innovations are patented. Possible reasons for that include firms who decide to protect their intellectual property by other means (for example with trade secrets) or the fact that some innovations are not “patentable” (e.g., new managerial practices or marketing strategies).30 Second, not all patents represent an innovation over a product. Examples include defensive patenting and patent trolls.

To reduce the potential for a mismatch between patents filed and the production of innovation by firms, I include two controls in the regression above. The first is the total number of citations that the patents filed by each firm jointly receive. Patents whose main goal is not to generate an innovation to increase the quality of a product are less likely to be cited by future patents – so including the number of citations as a control helps to separate innovations over products from other types of patents. On top of that, more recent patents mechanically receive fewer citations (Hall et al., 2001), so I interact the number of citations with a dummy for the year in which the patent applications were filed.31 The second variable included is the firm’s industry, which controls for the possibility that some industries are more prone to patent innovations than others. In addition, I also add a year fixed effect to the regression to capture aggregate variations over time (for example population growth).

One important empirical prediction of the model regarding the regression model (14) is that both i_{f,c,t} and I_{c,t} are correlated with the local shock z_{f,c,t}, so that estimating the coefficients in that regression via OLS would recover neither ψ nor η. To see why the model predicts that i_{f,c,t} is correlated with z_{f,c,t}, recall that the value of the productivity shock changes the number of firms investing in R&D – so that a particularly low shock could induce some firms not to invest in R&D in a particular period, while a high value of the shock would induce more firms to invest (along with a larger number of entrants). Since z_{f,c,t} is a function of the local productivity shocks, the correlation follows. In the case of I_{c,t}, this correlation is easily seen from equation (9), where the population of inventors in each city is a function of the city’s productivity in each period (see appendix C.2 for more details).

Note, however, that the regression (14) can be rearranged as

$$\log(\text{patents}_{f,c,t}) - \psi \log(i_{f,c,t}) = \psi\eta \log(I_{c,t}) + X_{f,c,t}'\Gamma + \delta_c + \delta_t + z_{f,c,t},$$

where X_{f,c,t} includes the controls mentioned above, and δ_t is a year fixed effect. Note that, since the value of ψ is known from the previous literature, the left-hand side of the equation above can be constructed in the data. Furthermore, because my goal is to estimate η, this equation can be aggregated to the city level

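$$\frac{1}{N_{c,t}} \sum_{f=1}^{N_{c,t}} \left[\log(\text{patents}_{f,c,t}) - \psi \log(i_{f,c,t})\right] = \psi\eta \log(I_{c,t}) + X_{c,t}'\Gamma + \delta_c + \delta_t + z_{c,t}, \tag{15}$$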

where N_{c,t} is the number of firms in city c during year t and z_{c,t} is the average shock inside each city. The control vector X_{c,t} includes the average number of citations received by patents filed by firms in city c (interacted with a year dummy) and the employment shares in each NAICS 2-digit industry in city c.32 The dependent variable in this model is the average log production of patents per inventor in each firm, where the number of inventors per firm is transformed by raising it to the elasticity of labor in the production of innovation.
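As a minimal sketch of how the left-hand side of regression (15) could be built from firm-level records, the snippet below averages log(patents) minus ψ·log(inventors) over the firms in each city and year. The record layout, the function name and the sample data are hypothetical; only ψ = 0.5 comes from section 4.1. The firm count is returned alongside the average because the regressions are weighted by the number of firms in each city.

```python
import math
from collections import defaultdict

PSI = 0.5  # curvature of the innovation production function (section 4.1)


def city_dependent_variable(firm_rows, psi=PSI):
    """Average log(patents) - psi*log(inventors) over firms in each (city, year).

    firm_rows: iterable of (city, year, patents, inventors) tuples, one per
    firm-year with patents > 0 and inventors > 0 (the log-log form cannot
    use zero-patent observations).
    Returns {(city, year): (dependent variable, number of firms)}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for city, year, patents, inventors in firm_rows:
        sums[(city, year)] += math.log(patents) - psi * math.log(inventors)
        counts[(city, year)] += 1
    return {key: (sums[key] / counts[key], counts[key]) for key in sums}


# Hypothetical firm-level records, for illustration only.
rows = [("A", 2000, 4, 16), ("A", 2000, 1, 1), ("B", 2000, 9, 9)]
y = city_dependent_variable(rows)
```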

The regression model (15) simplifies the analysis by removing one of the endogenous variables from the RHS of the equation. I account for the endogeneity of the population of inventors Ic,t by constructing an instrument that leverages exogenous shocks to industries. I partition the set of products J into K industries, so that each product j can be assigned to a single industry k. Note that the population of inventors in city c can then be written as

$$I_{c,t} = \sum_{k=1}^{K} I_{k,c,t} = \sum_{k=1}^{K} I_{k,c,t-l}\,(1 + \gamma_{k,c,t-l \to t}),$$

where I_{k,c,t} is the number of inventors in industry k living in city c at time t and γ_{k,c,t−l→t} is the rate of growth of I_{k,c,t} between periods t − l and t.33 Based on this identity, I construct the following instrument for I_{c,t}:

$$I_{c,t,l} = \sum_{k=1}^{K} I_{k,c,t-l}\,(1 + \gamma_{k,c,t-l \to t}), \tag{16}$$

where γ_{k,c,t−l→t} is now the overall growth rate of employment in industry k from year t − l to year t, rather than the city-specific rate. To avoid picking up variation in the total number of inventors in each year, the industry growth rate is computed using the shares of employment in each industry relative to the total population of inventors. Slightly abusing notation to define I_{k,t} as the number of inventors in industry k during year t, γ_{k,c,t−l→t} = (I_{k,t}/I_t − I_{k,t−l}/I_{t−l}) / (I_{k,t−l}/I_{t−l}).
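The construction of the instrument can be sketched in a few lines. The snippet below computes the growth in each industry's share of inventors between t − l and t, then applies those growth rates to a city's lagged industry composition. Function names and all counts are hypothetical, and every industry is assumed to have a positive number of inventors in the lagged period.

```python
def industry_share_growth(inv_now, inv_lag):
    """Growth of each industry's share of inventors between t-l and t.

    inv_now, inv_lag: {industry: number of inventors} at t and t-l, measured
    in the reference population used to compute the shifters.
    """
    total_now, total_lag = sum(inv_now.values()), sum(inv_lag.values())
    growth = {}
    for k, lagged in inv_lag.items():
        share_lag = lagged / total_lag
        share_now = inv_now.get(k, 0) / total_now
        growth[k] = (share_now - share_lag) / share_lag
    return growth


def instrument(city_lagged, growth):
    """Predicted inventor population: sum over k of I_{k,c,t-l} * (1 + gamma_k)."""
    return sum(n * (1.0 + growth[k]) for k, n in city_lagged.items())


# Hypothetical counts, for illustration only.
gamma = industry_share_growth({"chem": 30, "ict": 70}, {"chem": 50, "ict": 50})
I_hat = instrument({"chem": 10, "ict": 40}, gamma)
```

In this toy example the "ict" share grows by 40% and the "chem" share shrinks by 40%, so a city whose lagged inventors are mostly in "ict" is predicted to grow.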

To ensure the exogeneity of the instrument, I follow Autor et al. (2013), among many others, and compute the industry growth rates γ_{k,c,t−l→t} using inventors residing outside of the US who have registered patents with the USPTO (who are responsible for about 50% of all registered patents during the period of my sample). Industries are defined based on NBER’s patent subcategories (which add up to 38 different industries) and inventors are assigned to an industry based on the modal sub-class of the patents he or she filed (see the data appendix C.1 for more details).

The instrument I_{c,t,l} has a structure that resembles the commonly used “shift-share” research design (Adao et al., 2019; Borusyak et al., 2022; Goldsmith-Pinkham et al., 2020), with the obvious difference that I_{k,c,t−l} is the population level, not a share (see appendix C.3.1 for more details). The economic content is preserved, however: the growth rate of employment in each industry acts as an exogenous shock or “shifter” and the population level I_{k,c,t−l} measures the city’s exposure to industry shocks. However, the shift-share structure can also affect the across-region correlation of the regression residuals (Adao et al., 2019). Intuitively, regions that have a similar industry composition in their population of inventors will also have a similar exposure to the shifters γ_{k,c,t−l→t}, and therefore will tend to have similar residuals as well. As a result, I include two sets of standard errors when the IV strategy is used: one that is clustered across regions and one that is adjusted using the methods developed by Adao et al. (2019).34

The results of the estimation are in table 2. The top panel shows the first stage of the IV estimation, and the bottom panel shows the second stage. Column (1) shows the OLS estimates. Columns (2) – (4) show the IV estimates with lags l between 5 and 10 years (specified at the bottom of the table). Column (5) has the IV estimates for l = t − t_{90–95}; that is, I_{k,c,t−l} = I_{k,c,t_{90–95}} is fixed at its average level between 1990 and 1995, and γ_{k,t_{90–95}→t} is computed using average industry shares in the same period as the base value (the first year of the estimation period is 1998). All regressions are weighted by the number of firms in each city, to account for the fact that the data consists of averages over firms. Standard errors are clustered at the city level to allow for serial correlation of shocks within each city. In the second stage, AKM SE indicates adjusted standard errors that account for the shift-share structure of the instrument.
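To make the weighting concrete, here is a minimal sketch of a just-identified, weighted IV estimator with a single regressor. It is not the paper's estimation code: fixed effects, controls and standard errors are omitted, and in practice the fixed effects and controls would be partialled out of all three variables first. The data are hypothetical and noiseless, so the estimator should return the true slope.

```python
def weighted_iv_slope(y, x, z, w):
    """Just-identified IV slope of y on x, with instrument z and weights w.

    Computes sum w*(z - zbar)*(y - ybar) / sum w*(z - zbar)*(x - xbar),
    where the bars denote weighted means.
    """
    total = sum(w)

    def wmean(v):
        return sum(wi * vi for wi, vi in zip(w, v)) / total

    ybar, xbar, zbar = wmean(y), wmean(x), wmean(z)
    num = sum(wi * (zi - zbar) * (yi - ybar) for wi, zi, yi in zip(w, z, y))
    den = sum(wi * (zi - zbar) * (xi - xbar) for wi, zi, xi in zip(w, z, x))
    return num / den


# Hypothetical city-level data, for illustration only.
log_inventors = [1.0, 2.0, 3.0, 4.0]      # endogenous regressor
instrument_values = [1.5, 1.8, 3.2, 3.9]  # instrument
n_firms = [10, 5, 20, 1]                  # regression weights
outcome = [2.0 + 0.1 * v for v in log_inventors]  # noiseless, true slope 0.1

slope = weighted_iv_slope(outcome, log_inventors, instrument_values, n_firms)
```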

Table 2: Estimation of the elasticity of agglomeration, ψη.

Standard errors are clustered at the CBSA level and shown in parentheses. *, **, and *** indicate that the coefficient is statistically different from 0 at the 10%, 5%, and 1% levels, respectively. AKM SE indicates alternative standard errors, calculated according to Adao et al. (2019). All specifications control for patent quality and city industry composition, as well as CBSA and year fixed effects.

By and large, the estimated coefficients are highly significant and vary between 0.07 and 0.10.35 Those numbers do not change much in most of the robustness checks – and when they do, the value of the agglomeration elasticity tends to be higher. If compared to the other estimates of the elasticity of agglomeration (which usually do not focus on innovation), the values in Table 2 are quite large. Duranton and Puga (2014), for example, state that most studies have found this elasticity to be between 0.02 and 0.05.36

Innovation can, however, be more responsive to agglomeration spillovers than the production of goods. In an exercise similar to mine, Carlino et al. (2007) compute the elasticity between the number of patents per capita and employment density in metropolitan/urban areas in the US. In their baseline specification, they find this elasticity to be approximately 0.19, which is considerably larger than the values shown in table 2. The main difference between that study and regression (15) is that the number of patents per capita does not account for differences in firm size that arise in different cities – and cities with higher agglomeration will also have more and larger firms, so they naturally produce more patents.

Identification Conditions and Robustness Checks. There are two ways to interpret the orthogonality condition for shift-share instruments, and therefore for the instrument proposed here as well. The first one, discussed at length by Goldsmith-Pinkham et al. (2020), is that it requires the exposures I_{k,c,t−l} to be uncorrelated with the local shock z_{c,t}. This is unlikely to be true for small values of the lag l, as local shocks can differentially affect industry employment shares depending on the city’s industrial composition (and be correlated with the lagged share if shocks are serially correlated). However, this condition becomes plausible when lags are larger, for example in column (4) that uses a 10-year lag, or when the industry employment levels are fixed at a period that predates the estimation sample, as in column (5).37

The second interpretation, suggested by Borusyak et al. (2022), assumes that the industry growth rates γ_{k,t−l→t} are asymptotically uncorrelated with the industry-specific average of local shocks, 𝔼[I_{k,c,t−l} z_{c,t}] (where the expectation is taken over c). Measuring growth rates γ_{k,t−l→t} outside of the US addresses many of the issues that could be raised about the plausibility of this assumption. One concern that remains is that some industries might be highly concentrated in one single city – enough that the city’s local shocks are able to affect global trends in that industry (Silicon Valley may come to mind). To address this issue, I re-run the regressions in Table 2 with a slightly different instrument that excludes industries whose employment share in a single city exceeds 15% at any point in time (varying this threshold between 10 and 25% produces comparable results). Once again, those estimates are in line with the ones presented here and can be seen in appendix C.3.2.

Another argument that might call into question the validity of the estimates above is that the log-log specification of the regression discards observations in which firms have not produced a patent. Since innovation is stochastic, this specification would introduce bias by selecting firms located in larger cities or firms located in cities that by chance experienced mostly positive shocks (both of which increase the arrival rate of innovation). To put those concerns to rest, I slightly modify (15) to interpret it as a count-data (Poisson regression) model, which allows for firms to produce zero patents in any given year. Appendix C.3.3 describes this regression model in detail and shows the estimated coefficients. The resulting elasticity of agglomeration is even slightly higher than above, estimated at approximately 0.13 to 0.15.

One last robustness check, also described in appendix C.3.3, tests the hypothesis that inventors/firms might benefit from other sources of agglomeration. For example, firms could benefit from being close to other firms that they can observe and learn from; alternatively, inventors could benefit simply from living in densely populated areas, not necessarily populated by other inventors. The results I find suggest otherwise: the coefficients on the number of firms (investing in R&D), total employment and total establishments in each city are either negative or not statistically significant (after accounting for the population of inventors).

The value of η. Taking into account the results in table 2, as well as the robustness checks in appendix C.3, I use η = 0.20 as the baseline value to compute the optimal distribution of R&D subsidies in section 5. In appendix F, I perform sensitivity analyses of my results using η = 0.15 and η = 0.25, which roughly span the range of estimated coefficients found in all specifications.

I also estimate equation 15 using different values of ψ – namely, ψ = 0.4 and ψ = 0.6. When ψ = 0.4, the estimated coefficient ψη revolves around 0.11 (0.09 if estimated via OLS), which implies η = 0.275. Conversely, when ψ = 0.6, the estimated coefficient ψη fluctuates around 0.09 (0.05 if estimated via OLS), which implies η = 0.15. In all of the IV specifications, coefficients are statistically significant at the usual levels, and the implied value for η falls in the range used for sensitivity analyses in appendix F.
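Because the regression identifies the product ψη rather than η itself, the implied η depends on the assumed ψ. The short sketch below reproduces that arithmetic for the coefficient values quoted in the text; the function name is hypothetical, and 0.10 for ψ = 0.5 is the upper end of the range reported for Table 2.

```python
def implied_eta(coef_psi_eta: float, psi: float) -> float:
    """Back out eta from an estimate of the product psi * eta."""
    return coef_psi_eta / psi


# (psi, estimated psi*eta) pairs quoted in the text.
for psi, coef in [(0.4, 0.11), (0.5, 0.10), (0.6, 0.09)]:
    print(f"psi = {psi}: implied eta = {implied_eta(coef, psi):.3f}")
```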

4.2.2 The Elasticity of Congestion

Given the share of consumer expenditures on the non-tradable good, the parameter that determines the elasticity of congestion in the model is the return to scale on the production of that good, β. With constant returns to scale (β = 1) there is no congestion force, as the production of the non-tradable good scales up with the city size. As β becomes closer to zero, congestion costs become more and more relevant – up to the point where the supply of the non-tradable good is fixed and all variation in city size is absorbed into prices. This intuition offers some insight into how best to identify β. The first-order condition of the non-tradable good producer’s problem gives p_{n,c} ∝ w_c (L_c/m̄_c)^{1−β}. Using equation (6) to substitute for wages and approximating the price of the non-tradable good by the median rental value in each city, the model implies the following empirical relationship (details in appendix C.4)

$$\log(p^h_{c,t}) = \left(\frac{1-\beta}{1-\theta}\right) \log(L_{c,t}) + \delta_c + \delta_t + z^h_{c,t}, \tag{17}$$

where p^h_{c,t} is the median rental value per square foot of housing units in city c during year t, L_{c,t} is the population of non-inventors in city c during year t, δ_c is a city fixed effect that accounts for variation in amenities and land availability, δ_t is a year fixed effect that accounts for the growth in wages/prices and z^h_{c,t} is a city-specific shock (again a function of the productivity shock z_c(t)).

Given θ = 0.6, β is identified by the coefficient on L_{c,t} in the regression above. However, like the population of inventors in each city, the model also predicts that the population of production workers is correlated with z^h_{c,t}. As a result, estimating this regression via OLS will not recover the value of β. Nevertheless, given that the population of production workers and the population of inventors in each city are highly correlated (the model predicts that they are proportional), I_{c,t,l} also serves as an instrument for the population of production workers.

Table 3 displays the estimation results from (17) using I_{c,t,l} as an instrument for L_{c,t}. As one would expect, the instrument in this case has a much lower predictive value in the first stage, although the F-statistic generally remains above commonly used thresholds. The values of the elasticity of rental prices with respect to each city’s population are also quite large. For comparison, Behrens et al. (2014) find this elasticity to be between 0.08 and 0.09. This difference is due to the inclusion of city fixed effects in my model.38 Those fixed effects reflect in part the fact that cities have different amenities, which affect the utility of consumers. Leaving them out of the regression can therefore severely bias the elasticity of prices with respect to the population, since individuals are willing to pay higher prices to live in cities where amenities are higher.

Table 3: Estimation of the elasticity of congestion, (1 − β)/(1 − θ).

Standard errors are clustered at the CBSA level and shown in parentheses. *, **, and *** indicate that the coefficient is statistically different from 0 at the 10%, 5%, and 1% levels, respectively. AKM SE indicates alternative standard errors, calculated according to Adao et al. (2019). All specifications control for CBSA and year fixed effects.

The different specifications shown in the table all produce similar results, with β between 0.5 and 0.6. As was the case with the elasticity of agglomeration, I also run this regression using an instrument that is lagged up to 12 years and when excluding industries that are highly concentrated in one single place. Those results can be found in appendix C.4.
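The mapping between the estimated coefficient, (1 − β)/(1 − θ), and β is a one-line inversion once θ is fixed at 0.6. The sketch below shows both directions; the function names are hypothetical, and the coefficient values are computed from the β values mentioned in the text rather than read from Table 3.

```python
THETA = 0.6  # expenditure share on non-tradable goods (section 4.1)


def implied_beta(coef: float, theta: float = THETA) -> float:
    """Solve coef = (1 - beta) / (1 - theta) for beta."""
    return 1.0 - coef * (1.0 - theta)


def implied_coef(beta: float, theta: float = THETA) -> float:
    """Congestion elasticity implied by a given beta."""
    return (1.0 - beta) / (1.0 - theta)


# Coefficients implied by the baseline and sensitivity values of beta.
coefs = {beta: implied_coef(beta) for beta in (0.5, 0.6, 0.8)}
```

With θ = 0.6, β = 0.6 corresponds to a unit elasticity of rents with respect to population, β = 0.5 to an elasticity of 1.25, and β = 0.8 to an elasticity of 0.5.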

Identification Conditions and Robustness Checks. The structural error in equation (17) is a function of the same productivity shock that determines the residual in the previous section. Hence, the conditions for the orthogonality of the instrument in both cases are also the same. Table 3 shows the estimated elasticity of congestion when the lag l used to construct the instrument varies from 5 to 10 years and when industry employment shares in each city are fixed at their average level between 1990 and 1995 – leveraging the fact that for large l, I_{k,c,t−l} and z^h_{c,t} are likely to be uncorrelated. Appendix C.4 presents these same results when using an instrument that excludes industries whose employment share (of inventors) in any single city exceeds 15% at any point in time.

Rental values are only available in the ZRI database after 2010, which explains the small number of observations in table 3. Because of that, I also estimate an alternative version of (17), where p^h_{c,t} is approximated by the median housing price in each city (a series that goes back to 1996).39 This regression implies a higher value of β, around 0.8, which reflects the fact that housing prices tend to be less elastic to the population than rental prices.

The value of β. Combining the estimation results in Table 3 and in appendix C.4, I adopt β = 0.6 as the baseline value to compute the optimal distribution of R&D subsidies in section 5. I also perform sensitivity analyses using β = 0.5 and β = 0.8, which span the range of estimated coefficients found in all specifications.

4.3 Moment Matching

Fixed Cost of Innovation. To estimate the size of the fixed cost of innovation, κ, I use equation (11), which relates the number of firms in each city to the number of inventors in the city. Summing both sides of that equation over cities and rearranging gives

$$\kappa = (1-\psi)\,\frac{I}{N},$$

where N is the total number of firms in the economy. Given that the average number of inventors per firm in the data is I/N ≈ 21.07 and ψ = 0.5, this relationship gives κ = 10.53.
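The calculation above can be checked directly. The sketch below uses the two inputs reported in the text, ψ = 0.5 and I/N ≈ 21.07; the function name is hypothetical. It also shows the split of inventors per firm into those covering the fixed cost and those working on patents, which follows from the statement in section 4.1 that a share ψ of inventors is hired to produce patents.

```python
def fixed_cost(inventors_per_firm: float, psi: float) -> float:
    """Fixed cost of innovation, kappa = (1 - psi) * I / N."""
    return (1.0 - psi) * inventors_per_firm


PSI = 0.5
I_OVER_N = 21.07  # average inventors per firm reported in the text (rounded)

kappa = fixed_cost(I_OVER_N, PSI)    # inventors per firm covering the fixed cost
research_per_firm = PSI * I_OVER_N   # inventors per firm producing patents
```

The result, 10.535, matches the reported κ = 10.53 up to the rounding of I/N.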

City-Specific Parameters. Next, I turn to the set of city amenities α_c and mean productivities χ̄_c in each city. For cities c ∈ {1,..., C}, these parameters can be identified from the average shares of inventors and of patents filed in each city over time. Specifically, the average share of inventors in city c over time is

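$$\frac{1}{T}\int_0^T \frac{I_c(t)}{I}\,dt = \frac{\left(\frac{\bar{\chi}_c}{1-s_c}\right)^{\frac{1-\theta}{\Theta}} \alpha_c^{\frac{\theta}{\Theta}}}{\sum_{c'=1}^{C} \left(\frac{\bar{\chi}_{c'}}{1-s_{c'}}\right)^{\frac{1-\theta}{\Theta}} \alpha_{c'}^{\frac{\theta}{\Theta}}} \times \frac{1}{T}\int_0^T Z_c(t)^{\frac{1-\theta}{\Theta}} \exp\left(-\frac{1-\theta}{\Theta}\left(\frac{1-\theta}{\Theta}-1\right)\frac{\sigma^2}{4\phi}\right) dt.$$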

Given the assumptions on the evolution of Zc(t), it is not hard to show that it is an ergodic process. As such, the ergodic theorem applies (see Bergelson et al. (2012) for more on continuous-time ergodic theorems) and the integral in the expression above converges to an expected value when T → ∞. Assuming that the number of periods available in the data is large enough so that this result approximately holds, the average share of inventors in each city is given by

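$$(\text{avg. share of inventors})_c = \frac{\left(\frac{\bar{\chi}_c}{1-s_c}\right)^{\frac{1-\theta}{\Theta}} \alpha_c^{\frac{\theta}{\Theta}}}{\sum_{c'=1}^{C} \left(\frac{\bar{\chi}_{c'}}{1-s_{c'}}\right)^{\frac{1-\theta}{\Theta}} \alpha_{c'}^{\frac{\theta}{\Theta}}} \equiv \frac{\bar{I}_c}{I}.$$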

Similarly, the average share of patents filed in each city is

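$$(\text{avg. share of patents filed})_c = \frac{1}{T}\int_0^T \frac{N_c(t)\,x_{f,c}(t)}{\sum_{c'=1}^{C} N_{c'}(t)\,x_{f,c'}(t)}\,dt = \frac{\bar{\chi}_c\,\bar{I}_c^{\,1+\psi\eta}}{\sum_{c'=1}^{C} \bar{\chi}_{c'}\,\bar{I}_{c'}^{\,1+\psi\eta}}.$$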

The two sets of equations above identify α_c and χ̄_c for every c up to a constant. Since α_c is a preference parameter, its level does not have much meaning and I normalize 𝔼_c[α_c] = 1. The scale of χ̄_c can be identified from equation (13) by imposing that the rate of growth of the model g = λD equals 2%, the historic annualized rate of growth in the US. The amenity in city 0 can be found by requiring that the share of population in city 0, L_0/(I + L), matches this share in the data. Both of these procedures are described in detail in appendix D.1. Before doing all of that, however, the value of σ²/(4ϕ) must be known.

Law of Motion of the Productivity Shock. Lastly, I describe the identification of the shock distribution parameters σ and ϕ. Since only the ratio σ²/ϕ matters for the equilibrium of the model, I set ϕ = 1. Next, σ can be found by matching the model-generated cross-sectional variance of the population of inventors between cities with the same moment in the data. Appendix D.2 derives the expression for this variance in the model and shows how to identify σ.

4.4 Comparison to Untargeted Moments

I assess the model’s external validity by measuring how well it can fit the spatial distribution of variables that were not targeted for estimation. Figure A.3 plots the model’s predictions against the data for four untargeted variables: the share of firms per city (panel a), the average number of patents per firm in each city (panel b), the share of total employed population per city (panel c) and the spatial distribution of patents per capita (panel d). In general the model matches those distributions quite well – the correlation between the share of firms per city in the model and data is particularly high, at about 0.97. The distribution of patents per firm is harder to match, as there are many cities that have on average one patent per firm. This pattern holds for cities with widely different sizes and production of patents (see the vertical alignment of points in panel (b) of figure A.3).

Panels (c) and (d) of figure A.3 show the match between the total employed population and patents per capita between model and data. As a general rule, the model tends to underestimate the total population in cities where there is a small number of inventors and overestimate the population of cities where many inventors live – recall that the model predicts that the populations of inventors and production workers are proportional; in practice, however, cities tend to specialize to some degree in innovation or in production. As a result, the model-predicted total population and model-predicted patents per capita tend to be off at each end of the city size distribution. Nevertheless, the match between model and data is not bad, with the correlations shown in panel A of table 4.

Panel B in Table 4 compares the outcomes in the model and data at different sections of the city size distribution. Specifically, it ranks cities based on their average population of inventors between 1998 and 2016 and divides them into five bins with an equal number of cities. It then compares the share of firms and the average number of patents per firm in each of those quintiles separately.40 For reference, I also include the share of inventors and patents in each quintile of the city size distribution (there is no comparison between model and data in those cases, as the match is one-to-one).
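The binning behind Panel B can be sketched as follows: rank cities by inventor population, split them into five equal-count bins, and compute each bin's share of some variable (for example, the number of firms). The function name and data are hypothetical. With 860 cities the bins divide evenly; assigning any remainder to the top bin is an assumption of this sketch, since the text does not say how remainders are handled.

```python
def quintile_shares(inventors, values):
    """Share of `values` held by each quintile of cities ranked by inventors.

    inventors, values: {city: number}. Bins are returned smallest cities first.
    Any remainder from uneven division is placed in the top bin (an assumption).
    """
    order = sorted(inventors, key=inventors.get)   # smallest cities first
    size = len(order) // 5
    bins = [order[q * size:(q + 1) * size] for q in range(4)]
    bins.append(order[4 * size:])
    total = sum(values[c] for c in order)
    return [sum(values[c] for c in b) / total for b in bins]


# Hypothetical data for ten cities, for illustration only.
inventors = {f"city{i}": i for i in range(1, 11)}
firms = {f"city{i}": 2 * i for i in range(1, 11)}
shares = quintile_shares(inventors, firms)
```

Running the same function on model-implied and observed values gives the two columns that are compared within each quintile.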

Table 4: Spatial Distribution of Untargeted Variables

4.4.1 Can R&D Tax Credits Shift the Spatial Distribution of the Economy?

Another question addressed in this section is the extent to which R&D tax credits can influence the location of firms. Moretti and Wilson (2017) offer evidence that inventors are very sensitive to state taxes, but R&D tax credits tend to have a smaller effect than other forms of taxation. Similarly, Slattery (2019) finds that state-level subsidies have an important effect over firms’ locations, but this effect includes all discretionary state subsidies and cannot be attributed to R&D tax credits alone. Understanding the extent to which R&D tax credits can change the spatial distribution of the economy is relevant for interpreting the counterfactual results in the next section, which assess the welfare effects of alternative spatial configurations of those credits. It is also relevant for policy-makers who wish to quantify the effects of R&D policy.

To answer this question, I leverage the variation of R&D tax credits over time and measure how well the model can predict the spatial dispersion of the economy in the years when the spatial distribution of R&D tax credits differed from what it is today. I focus on the spatial distribution of the population of inventors and of patents filed in the decades of 1970–1979, 1980–1989 and 1990–1999. Using averages across longer periods has two advantages. First, the equilibrium of the model assumes a BGP, so its predictions do not apply to short-term variations. Second, these three decades roughly coincide with broad trends in the adoption of R&D subsidies: in the 1970’s, there were no subsidies; in the 1980’s, there was a spatially uniform federal subsidy, plus a few states offering subsidies of their own; in the 1990’s, this policy had already been adopted by most states.

I construct the model-implied distribution of inventors and patents per city in any given year by simply providing the model with the value of the R&D tax credit rates in that year (keeping all other parameters fixed). Using those credit rates and the parameters estimated above, I construct the share of inventors and patents produced in each city for all years, then calculate their averages for each decade. Panel A of Table 5 shows the correlations between model outcomes and the data for each decade. For better visualization, I again aggregate cities according to quintiles of the city size distribution (where cities are ordered according to their population of inventors) and report the model-predicted and observed share of inventors and patents in each of those bins.
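The quintile aggregation used for these comparisons can be sketched as follows. This is only an illustration under stated assumptions: the function name `quintile_shares` and the ten-city toy data are hypothetical and are not taken from the paper's data or code.

```python
def quintile_shares(size, value, n_bins=5):
    """Share of `value` held by each size bin, smallest cities first.

    Cities are ranked by `size` (e.g. population of inventors) and split
    into `n_bins` groups with (as close as possible to) equal city counts.
    """
    order = sorted(range(len(size)), key=lambda i: size[i])
    total = sum(value)
    n = len(order)
    shares = []
    for b in range(n_bins):
        lo = b * n // n_bins
        hi = (b + 1) * n // n_bins
        shares.append(sum(value[i] for i in order[lo:hi]) / total)
    return shares


# Hypothetical toy data: ten cities (not from the paper).
inventors = [5, 10, 20, 40, 80, 160, 320, 640, 1280, 2560]
patents = [1, 3, 5, 12, 30, 70, 150, 400, 900, 2000]

# Share of patents in each quintile of the inventor-size distribution.
shares = quintile_shares(inventors, patents)
```

Applying the same function to model-implied and observed values, with the same ranking variable, gives the pairs of bin shares that are compared in the table.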

Table 5: Spatial Distribution of Inventors and Patents Over Time

To account for persistence in city size, I also analyze the model- and data-implied changes in the shares of inventors and patents in each city across time. To this end, I again compute the average share of inventors and patents produced in each city during the 1970’s, 1980’s, 1990’s and 2000’s. Next, I find the difference between those shares in the decades of 1970, 1980 and 1990 relative to their value in 2000. Figure A.4 shows the correlations between model and data outcomes in each decade, which hover around 0.6 for changes in the share of inventors and 0.8 for changes in the share of patents filed. Panel B of Table 5 displays the output of regressing the changes in shares observed in the data on their counterparts in the model, where only the value of the R&D tax credit is allowed to vary.
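As an illustration of this kind of comparison, the sketch below regresses observed changes in shares on model-implied changes and reports the slope, intercept and R-squared. It is a minimal sketch, not the paper's estimation code: the function `ols_fit` and the five data points are hypothetical.

```python
def ols_fit(x, y):
    """Slope, intercept and R-squared of a simple OLS regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # equals the squared correlation
    return slope, intercept, r_squared


# Hypothetical changes in city shares relative to the 2000's (not real data).
model_change = [-0.010, -0.004, 0.000, 0.003, 0.011]
data_change = [-0.007, -0.006, 0.001, 0.001, 0.009]

slope, intercept, r_squared = ols_fit(model_change, data_change)
# With a single regressor, the correlation is the signed root of R-squared.
correlation = (r_squared ** 0.5) * (1 if slope >= 0 else -1)
```

In a one-regressor specification such as this one, the R-squared and the correlation carry the same information, which is why the two statistics in the text move together.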

My results suggest that R&D tax credits are quite relevant for the location decisions of inventors/firms and the production of innovation. The R-squared values of the regressions in Table 5 show that changes in the R&D tax credit rate can explain about 40% of the variation of changes in population shares over time and over 60% of the variation of changes in the production of patents. The correlations in Panel A of that same table indicate that those changes often go in the direction predicted by the model.41

R&D Tax Credits vs Corporate and Labor Income Taxes. Previous research has linked changes in corporate and labor income taxes to changes in the quantity and location of innovation (Akcigit et al., 2020). As a result, it is important to check whether R&D tax credit rates have moved in tandem with those taxes – which would mean that the results above might not be related to R&D tax credits themselves, but to the effects of changes in other local policies.

To test this possibility, I calculate, for each state in the US, the year-over-year change in (1) the statutory R&D tax credit rate, (2) the state corporate income tax rate, and (3) the state labor income tax rate for the top income bracket.42 Data for all series are available between 1977 and 2006. Across all states, the correlation between changes in the R&D tax credit rate and either corporate or labor income taxes is very small (-0.01 and -0.03, respectively), and not statistically significant at the 10% level. Different specifications of a regression of R&D tax credits on both corporate and labor income taxes all yield the same result: coefficients that are small and not statistically significant.43

5 The Welfare Effects of Spatial Policies

I now turn to the main question motivating this study: can a redistribution of local R&D subsidies increase aggregate welfare in the economy? The answer to this question has two parts. First, I compare the current spatial distribution of R&D subsidies in the US with a spatially homogeneous subsidy that is implemented with the same amount of resources. In practice, each state is able to choose its own tax credit level, so the spatial distribution of R&D subsidies in the US can be understood as the outcome of a non-cooperative game played by policy makers in each state. Comparing this decentralized outcome with a spatially neutral subsidy informs us about the welfare gains of allowing states to compete by choosing R&D policy.

Second, I compare the aggregate welfare level under the current distribution of R&D subsidies with the theoretical maximum welfare level that is obtained by solving a central planner’s problem. Specifically, I assume the existence of a government that is able to choose the value of all local R&D subsidies in order to maximize welfare. The government’s problem highlights some of the main tradeoffs associated with changing the spatial distribution of agents in the economy. I compute an approximate solution for this problem, which produces a set of “optimal” R&D subsidies. This approximate solution is then used to measure the potential welfare gains from the redistribution of local R&D subsidies in the US and to inform us about which places should benefit from R&D policy.

5.1 The Government’s Problem

Aggregate welfare in this model is measured as the sum of the utility of all workers in the economy, since all firm- and land-owners are fully taxed. For convenience, I assume that the cost of producing $\bar{G}$ units of the public good is $\gamma(\bar{G}) = \bar{\pi}\,\bar{G}\,Q(t)$, and that the production of this good is fixed throughout all counterfactuals. I also define $\Pi(t)$ to be the aggregate flow of profits by all firms in the economy in period t. The government’s problem is

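$$
\max_{\{s_c\}_{c=1}^{C}} \int_0^{\infty} e^{-\rho t}\left\{\sum_{c=0}^{C}\left[L_c(t)\,u(t) + I_c(t)\,u^{i}(t)\right]\right\}\bar{G}\,dt
$$

$$
\text{s.t.}\quad \int_0^{\infty} e^{-rt}\left[\sum_{c=0}^{C} s_c\,w_c^{i}(t)\,I_c(t) + \gamma(\bar{G})\right]dt = \int_0^{\infty} e^{-rt}\left[p_{m,0}(t)\,\bar{m}_0 + \sum_{c=1}^{C} p_{m,c}(t)\,\bar{m}_c + \Pi(t)\right]dt.
$$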

Note that the population of inventors and production workers, as well as their utility, wages, the rate of creative destruction, land prices and profits are all endogenously defined in the model. Using the expressions for these variables obtained in the model’s equilibrium and defining $\bar{w}^{i} = w^{i}(t)/[\bar{\pi}\,Q(t)]$ to be the normalized (static) baseline wage of inventors, the government’s problem can be reduced to a static one (see appendix E):

max{sc}c=1C(LL0(s))θβ(1+θβL0(s)LL0(s))w¯i(s)1θρ(1θ)λD(s)s.t.e1θΘ(1β)θΘσ24ϕ1CΣc=1Csc(1αc)θ1θI¯˜c(s)1θβ1θ+G¯w¯i(s)=rλD(s)r+D(s)L0(s)w¯i(s)+(1β)θ1θβ×(18)[(1α0)θ1θ(θβIL0(s)LL0(s))1θβ1θ+e1θΘ(1β)θΘσ24ϕ1CΣc=1C(1α0)θ1θI¯˜c(s)1θβ1θ],

where I use the notation $\bar{w}^{i}(s)$ to indicate that wages are a function of the vector of R&D subsidies $s = (s_1, \ldots, s_C)$ – the same applies to $L_0(s)$, $\tilde{\bar{I}}_c(s)$ and $D(s)$. In the version of the problem shown in (18), it becomes clear that changing the value of the R&D subsidy affects the government’s problem through its effect on (1) the population distribution, $\tilde{\bar{I}}_c$ and $L_0$; (2) wages, $\bar{w}^{i}$; (3) the rate of creative destruction, $D$ (and the rate of growth, $\lambda D$); and (4) the direct effect over expenditures.

The term $\bar{w}^{i}(s)^{1-\theta} / [\rho - (1-\theta)\lambda D(s)]$ highlights one important tradeoff in the government’s problem. On the one hand, this term increases with the rate of creative destruction, $D$: a higher rate of creative destruction means that the economy grows at a higher rate, which therefore implies a higher present value of welfare. Furthermore, from the corollary, $D \propto \frac{1}{C}\sum_{c=1}^{C} \bar{\chi}_c\,\tilde{\bar{I}}_c^{\,1+\psi-\eta}$, which means that a more spatially concentrated population leads to a higher rate of innovation – especially if the population is concentrated in cities with a large $\bar{\chi}_c$. On the other hand, the normalized wage $\bar{w}^{i}$ decreases with the rate of creative destruction (see equation 12 in proposition 2). Intuitively, when the rate of creative destruction increases, so does the rate at which firms discount the future, $r + D$, because the probability that any of the firm’s product lines will be stolen by a competitor increases. This leads firms to decrease investments in R&D, which reduces the demand for inventors and pushes their wages down (the same happens for production workers through general equilibrium effects).44 Lower wages then result in lower welfare.

5.2 A Spatially Homogeneous Subsidy

The current spatial distribution of R&D subsidies in the US can be understood as the outcome of the competition among states to attract innovative firms and inventors into their jurisdiction. To evaluate the effects of this competition, consider a counterfactual economy where states are not allowed to compete, so that R&D subsidies are fixed over space: $s_c = \bar{s}$ for all c. Since taxes and other government expenditures are kept constant throughout all counterfactuals, $\bar{s}$ is fully determined by the government’s budget constraint. Under the parameter values found in the previous section, this subsidy rate is close to 19% (the average subsidy rate under the current distribution is about 16%).

Moving to a spatially homogeneous subsidy spreads the population of inventors more evenly over space: the Herfindahl-Hirschman index (HHI) of the city population shares moves from 0.027 to 0.025. Under this alternative population distribution, aggregate welfare falls by 0.77% due to a decrease in the growth rate of the economy of approximately 0.03 percentage points. In contrast, the static baseline wage $\bar{w}^{i}$ increases by 0.91%. In words, the decentralized adoption of R&D subsidies by states has led to a higher welfare level than what would be attained under a spatially neutral subsidy that is implemented using the same amount of resources. This suggests that the states that offer the largest R&D tax credits are indeed the ones that are comparatively better at producing innovation (leading to a higher growth rate). In the next section, I ask whether we can do even better by allowing a social planner to choose the value of all local subsidies.
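For readers less familiar with the concentration measure used here, the sketch below computes an HHI from city populations as the sum of squared population shares. The function name `hhi` and the toy populations are hypothetical; only the definition of the index is standard.

```python
def hhi(populations):
    """Herfindahl-Hirschman index: sum of squared shares, between 1/N and 1."""
    total = sum(populations)
    return sum((p / total) ** 2 for p in populations)


# Hypothetical city populations of inventors (not from the paper).
concentrated = [700, 100, 100, 50, 50]
even = [200, 200, 200, 200, 200]

hhi_concentrated = hhi(concentrated)  # closer to 1: population in few cities
hhi_even = hhi(even)                  # equals 1/N when all cities are equal
```

As a point of reference, with the 860 cities in the model a perfectly even distribution would give an HHI of 1/860, or about 0.0012, so the reported values of 0.027 and 0.025 both describe a population that is far more concentrated than the uniform benchmark.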

5.3 Approximating the Optimal R&D Subsidies

Finding the exact optimal subsidies that solve (18) can be computationally challenging, as this is a non-convex problem with 860 choice variables (cities). Therefore, I compute an approximate solution by imposing a functional form on $s_c$:

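$$
s_c = \begin{cases}
\zeta\,\alpha_c^{\xi}\,\bar{\chi}_c^{\omega} & \text{if } \zeta\,\alpha_c^{\xi}\,\bar{\chi}_c^{\omega} \le \tau; \\
\tau & \text{if } \zeta\,\alpha_c^{\xi}\,\bar{\chi}_c^{\omega} > \tau.
\end{cases}
$$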

This functional form is motivated by the fact that cities only differ from each other because of either $\alpha_c$ or $\bar{\chi}_c$ – and therefore any differences in the optimal subsidy across cities will necessarily be driven by differences in these two parameters. Imposing this functional form reduces the government’s problem to the choice of a few parameters, instead of the full distribution of subsidies.

I consider three different values for the subsidy cap: τ ∈ {0.3, 0.4, 0.5}. The highest credit rate currently offered in the data (combining state and federal tax credits) coincides with the lowest value of the cap, at about 30%. In each case, the parameters ξ and ω are chosen in the interval [−5, 10] to maximize aggregate welfare. The scale parameter ζ ensures that the budget constraint is satisfied.
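The role of the scale parameter can be illustrated with a short sketch. It assumes a deliberately stylized budget constraint, in which total spending is the sum of each city's subsidy rate times a fixed, hypothetical subsidy base; in the model the base responds to the subsidies, so this is not the paper's constraint. Function names and numbers are likewise hypothetical.

```python
def capped_subsidies(alpha, chi, xi, omega, zeta, tau):
    """Capped rule: s_c = min(zeta * alpha_c**xi * chi_c**omega, tau)."""
    return [min(zeta * a ** xi * c ** omega, tau) for a, c in zip(alpha, chi)]


def solve_zeta(alpha, chi, base, xi, omega, tau, budget, tol=1e-12):
    """Bisection on zeta so that sum_c s_c * base_c matches `budget`.

    Assumption: `base` is held fixed (stylized). Spending is nondecreasing
    in zeta, so bisection applies once the root has been bracketed.
    """
    def spend(z):
        s = capped_subsidies(alpha, chi, xi, omega, z, tau)
        return sum(si * bi for si, bi in zip(s, base))

    if budget > tau * sum(base):
        raise ValueError("budget exceeds what the cap allows")
    lo, hi = 0.0, 1.0
    while spend(hi) < budget:  # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if spend(mid) < budget:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical three-city example (amenities, productivity, subsidy base).
alpha = [0.2, 0.5, 0.8]
chi = [1.0, 2.0, 4.0]
base = [1.0, 1.0, 1.0]
zeta = solve_zeta(alpha, chi, base, xi=1.0, omega=1.0, tau=0.4, budget=0.6)
rates = capped_subsidies(alpha, chi, 1.0, 1.0, zeta, 0.4)
```

In this toy case the third city hits the cap, and the scale parameter adjusts so that the remaining budget is split between the other two cities in proportion to their index values. An outer search over ξ and ω would call `solve_zeta` at each candidate pair and evaluate welfare at the resulting rates.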

Figure A.5 plots aggregate welfare as a function of ξ and ω (fixing τ = 0.4) and the resulting optimal subsidy as a function of amenities and local productivity. It is clear from panel (b) that welfare is maximized when innovation is concentrated in cities with high amenities and high productivity, so the optimal subsidy rates should move the economy in this direction. This result is intuitive: cities with high productivity produce more innovation per worker, so moving the population to those cities will generate a higher growth rate. Alternatively, workers living in cities with high amenities will accept relatively lower wages, so firms in those cities experience lower congestion costs, all else equal.

Figure 2: Optimal R&D Tax Credit Rates per City

Citation: IMF Working Papers 2022, 131; 10.5089/9798400212666.001.A001

Figure 2 maps the optimal R&D tax credits for each city in the US. In accordance with the discussion above, there are two areas that are heavily subsidized under the optimal policy: Silicon Valley (San Jose) and New York City, which are already the two largest producers of patents in the country. Figure A.6 shows how the optimal R&D subsidies affect the geographical distribution of inventors and patents produced in each city, relative to their current values. Note that a big part of the effect of the optimal subsidies is to move the population from mid-sized cities to a few high productivity/high amenity cities (e.g. San Jose and NYC), dramatically increasing their share of the population and innovation.

The welfare gains from the spatial reallocation of the population caused by the optimal distribution of R&D subsidies are shown in table 6. When capping the city-level subsidy at 50%, the model predicts that total welfare would grow by at least 6% if the optimal distribution of R&D subsidies were adopted. This gain is generated in part by an increase of 0.26 percentage points in the rate of growth of the economy. However, as mentioned above, baseline wages also fall by over 7%, indicating that the higher rate of creative destruction has lowered the demand for labor by innovative firms.

Table 6: Gains from adopting optimal subsidies.

5.3.1 Subsidies by State

The results described above are predicated on the assumption that R&D subsidies can vary by city.45 In practice, however, these subsidies are chosen at the state level. Taking the geographical scope of the policy as given, I re-run the exercise above while constraining subsidies to be constant within states. To do that, let c(S) indicate a city c that is located in state S. Denote by C(S) the total number of cities in each state and the approximate optimal subsidy by

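$$
s_{c(S)} = \begin{cases}
\zeta\,\frac{1}{C(S)}\sum_{c=1}^{C(S)} \alpha_c^{\xi}\,\bar{\chi}_c^{\omega}, & \text{if } \zeta\,\frac{1}{C(S)}\sum_{c=1}^{C(S)} \alpha_c^{\xi}\,\bar{\chi}_c^{\omega} \le \tau; \\
\tau, & \text{if } \zeta\,\frac{1}{C(S)}\sum_{c=1}^{C(S)} \alpha_c^{\xi}\,\bar{\chi}_c^{\omega} > \tau.
\end{cases}
$$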

Once again, τ ∈ {0.3, 0.4, 0.5} and the parameters ξ and ω are chosen to maximize total welfare in (18). The parameter ζ ensures that the government’s budget constraint is satisfied. Panel B of Table 6 shows the welfare effects of the optimal distribution of R&D subsidies across states. Note that the distribution follows the same pattern as above, where a higher spatial concentration of innovation leads to gains in welfare. Those gains are smaller, however, as there are more constraints on the value of the subsidy. Finally, note that this pattern is again robust to different values of the agglomeration and congestion elasticities, as shown by the sensitivity analyses in appendix F.
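A minimal sketch of the state-level rule is given below: the city index is averaged within each state, scaled, capped, and then assigned to every city in that state. The function name, the state labels and all numbers are hypothetical, and the scale parameter is taken as given here rather than solved from a budget constraint.

```python
from collections import defaultdict


def state_subsidies(alpha, chi, state, xi, omega, zeta, tau):
    """State-level capped rule: every city inherits its state's rate.

    The state rate is min(zeta * mean_c(alpha_c**xi * chi_c**omega), tau),
    where the mean is taken over the cities located in that state.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for a, c, s in zip(alpha, chi, state):
        sums[s] += a ** xi * c ** omega
        counts[s] += 1
    rate = {s: min(zeta * sums[s] / counts[s], tau) for s in sums}
    return [rate[s] for s in state]


# Hypothetical example: state "X" has one hub and two small cities,
# state "Y" has a single city.
alpha = [0.9, 0.3, 0.3, 0.8]
chi = [4.0, 1.0, 1.0, 3.0]
state = ["X", "X", "X", "Y"]
rates = state_subsidies(alpha, chi, state, xi=1.0, omega=1.0, zeta=0.1, tau=0.4)
```

In this toy example the hub in state "X" would receive a rate of 0.36 under a city-level rule with the same parameters, but only 0.14 once its index is averaged with the two smaller cities, which is below the 0.24 assigned to the single-city state "Y".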

Figure 3 shows the optimal value of the subsidy in each state. It is interesting to see that the states that should be subsidized the most are not California and New York, as the results in the previous section would suggest, but California and Idaho! The reason for this difference is that the state of New York has a number of smaller cities that combined produce a sizable share of innovation in the state. Therefore, subsidizing New York state would bring more inventors to NYC, but also to all those smaller cities, not achieving a large concentration of the population.

Figure 3: Optimal R&D Tax Credit Rates per State

In contrast, innovation in Idaho is much more concentrated in Boise (which is itself an innovation “hub”, being the headquarters of companies such as Micron and Hewlett-Packard).46 As a result, the effect of an R&D subsidy in Idaho would be highly focused in that city, leading to a larger concentration of the population in one single place. One important conclusion from this discussion is that the spatial distribution of the optimal R&D subsidy can drastically change depending on the geographical scope of the policy.

5.4 Discussion

There are a few important points to keep in mind when interpreting the results found in this section. First, the welfare gains reported here are the product of a pure redistribution of R&D subsidies over space. Expenditures on those subsidies are kept constant throughout all counterfactual exercises, with the potential exception of endogenous changes in the government’s revenue caused by the reallocation of the population. As such, the optimal subsidy rates computed here require no changes in taxation by the government. Second, the gains reported in Table 6 are only a lower bound for the increase in welfare that can be obtained by the redistribution of R&D subsidies. This is a direct consequence of imposing a functional form to approximate the optimal subsidies, which does not necessarily describe the policy that solves the government’s problem.

A third point concerns the limitations of the model used to compute the optimal policies and aggregate welfare. The introduction of moving costs, for example, can have relevant effects on welfare and on the distribution of the optimal R&D tax credits. On a similar note, short-run adjustment costs (e.g., in investments in R&D) are also ignored in the model, so the results found here should be thought of as long-run responses to changes in policy. Finally, there are many other issues that are relevant for policy makers and can be affected by changes in the spatial distribution of agents in the economy (e.g., income inequality, joblessness); this paper does not address those concerns, as they are outside the scope of its research question.

6 Conclusion

This paper assesses whether there are welfare gains from the spatial reallocation of R&D tax credits in the US. As a framework to analyze counterfactual spatial distributions of the tax credit, I construct an endogenous growth model with spatial heterogeneity and agglomeration economies in the production of innovation. This framework contributes to the literature on endogenous growth by nesting a model of growth through creative destruction into a spatial setting. It also contributes to the literature on spatial and dynamic models by developing a tractable model that can be easily matched to micro data.

Qualitatively, I identify an important tradeoff that must be addressed when computing the optimal spatial distribution of R&D subsidies: increasing the geographical concentration of innovation in highly productive cities will increase the rate of growth of the economy, but it also increases the rate at which firms discount the future due to a higher rate of creative destruction. This reduces individual firms’ investments in R&D, which puts downward pressure on the wages of inventors and decreases aggregate welfare. Quantitatively, I find that concentrating the population of inventors in cities with high amenities and high productivity has positive and potentially large impacts on aggregate welfare. Furthermore, those gains are achieved through a pure redistribution of the R&D subsidy over space, keeping all taxes and other government expenditures constant.

References

  • Acemoglu, D., Akcigit, U., Alp, H., Bloom, N., and Kerr, W. (2018). Innovation, Reallocation, and Growth. American Economic Review, 108(11):3450–91.

  • Adão, R., Kolesár, M., and Morales, E. (2019). Shift-Share Designs: Theory and Inference. The Quarterly Journal of Economics, 134(4):1949–2010.

  • Aghion, P., Akcigit, U., and Howitt, P. (2014). What Do We Learn From Schumpeterian Growth Theory? In Handbook of Economic Growth, volume 2, pages 515–563. Elsevier.

  • Aghion, P. and Howitt, P. (1992). A Model of Growth Through Creative Destruction. Econometrica, 60(2):323–351.

  • Akcigit, U., Ates, S., and Impullitti, G. (2018). Innovation and Trade Policy in a Globalized World. NBER Working Paper 24543, National Bureau of Economic Research.

  • Akcigit, U., Baslandze, S., and Stantcheva, S. (2016). Taxation and the International Mobility of Inventors. American Economic Review, 106(10):2930–2981.

  • Akcigit, U., Grigsby, J., Nicholas, T., and Stantcheva, S. (2020). Taxation and Innovation in the Twentieth Century. The Quarterly Journal of Economics, 137(1):329–385.

  • Akcigit, U. and Kerr, W. (2018). Growth through Heterogeneous Innovations. Journal of Political Economy, 126(4):1374–1443.

  • Argente, D., Baslandze, S., Moreira, S., and Hanley, D. (2019). Patents to Products: Innovation, Product Creation, and Firm Growth. Working paper.

  • Austin, B., Glaeser, E., and Summers, L. (2018). Jobs for the Heartland: Place-Based Policies in 21st-Century America. Brookings Papers on Economic Activity, 49(Spring):151–255.

  • Autor, D. H., Dorn, D., and Hanson, G. H. (2013). The China Syndrome: Local Labor Market Effects of Import Competition in the United States. American Economic Review, 103(6):2121–2168.

  • Behrens, K., Duranton, G., and Robert-Nicoud, F. (2014). Productive Cities: Sorting, Selection, and Agglomeration. Journal of Political Economy, 122(3):507–553.

  • Bems, R. (2008). Aggregate Investment Expenditures on Tradable and Nontradable Goods. Review of Economic Dynamics, 11(4):852–883.

  • Bergelson, V., Leibman, A., and Moreira, C. G. (2012). From discrete- to continuous-time ergodic theorems. Ergodic Theory and Dynamical Systems, 32:383–426.

  • Bloom, N., Griffith, R., and Van Reenen, J. (2002). Do R&D tax credits work? Evidence from a panel of countries 1979–1997. Journal of Public Economics, 85(1):1–31.

  • Blundell, R., Griffith, R., and Windmeijer, F. (2002). Individual Effects and Dynamics in Count Data Models. Journal of Econometrics, 108(1):113–131.

  • Borusyak, K., Hull, P., and Jaravel, X. (2022). Quasi-Experimental Shift-Share Research Designs. The Review of Economic Studies, 89(1):181–213.

  • Caliendo, L., Dvorkin, M., and Parro, F. (2019). Trade and Labor Market Dynamics: General Equilibrium Analysis of the China Trade Shock. Econometrica, 87(3):741–835.

  • Carlino, G., Carr, J., Hunt, R., and Smith, T. (2012). The Agglomeration of R&D Labs. Working Paper 12–22, Federal Reserve Bank of Philadelphia.

  • Carlino, G. and Kerr, W. (2015). Agglomeration and Innovation. In Duranton, G., Henderson, J. V., and Strange, W. C., editors, Handbook of Regional and Urban Economics, volume 5, chapter 6, pages 349–404. Elsevier.

  • Carlino, G. A., Chatterjee, S., and Hunt, R. M. (2007). Urban Density and the Rate of Invention. Journal of Urban Economics, 61:389–419.

  • Combes, P.-P., Duranton, G., and Gobillon, L. (2008). Spatial wage disparities: Sorting matters! Journal of Urban Economics, 63:723–742.

  • Correia, S., Guimarães, P., and Zylkin, T. (2019). ppmlhdfe: Fast Poisson Estimation with High-Dimensional Fixed Effects.

  • Davis, D. R. and Dingel, J. I. (2019). A Spatial Knowledge Economy. American Economic Review, 109(1):153–170.

  • De La Roca, J. and Puga, D. (2016). Learning by Working in Big Cities. The Review of Economic Studies, 84(1):106–142.

  • Desmet, K., Nagy, D. K., and Rossi-Hansberg, E. (2018). The Geography of Development. Journal of Political Economy, 126(3):903–983.

  • Desmet, K. and Rossi-Hansberg, E. (2014). Spatial Development. American Economic Review, 104(4):1211–1243.

  • Duranton, G. (2007). Urban Evolutions: The Fast, the Slow, and the Still. American Economic Review, 97(1):197–221.

  • Duranton, G. and Puga, D. (2014). The Growth of Cities. In Aghion, P. and Durlauf, S., editors, Handbook of Economic Growth, volume 2, chapter 5, pages 781–853. Elsevier.

  • Eaton, J. and Kortum, S. (2002). Technology, Geography and Trade. Econometrica, 70(5):1741–1779.

  • Fajgelbaum, P. and Gaubert, C. (2020). Optimal Spatial Policies, Geography and Sorting. Quarterly Journal of Economics, 135(2):959–1036.

  • Fajgelbaum, P., Morales, E., Serrato, J. C. S., and Zidar, O. (2019). State Taxes and Spatial Misallocation. Review of Economic Studies, 86(1):333–376.

  • Farrokhi, F. (2021). Skill, Agglomeration, and Inequality in the Spatial Economy. International Economic Review, 62(2):671–721.

  • Gaubert, C. (2018). Firm Sorting and Agglomeration. American Economic Review, 108(11):3117–53.

  • Glaeser, E. L. and Gottlieb, J. D. (2008). The Economics of Place-Making Policies. Brookings Papers on Economic Activity, pages 155–239.

  • Glaeser, E. L. and Hausman, N. (2019). The Spatial Mismatch Between Innovation and Joblessness. NBER Working Papers 25913, National Bureau of Economic Research.

  • Glaeser, E. L. and Maré, D. C. (2001). Cities and Skills. Journal of Labor Economics, 19(2):316–342.

  • Goldsmith-Pinkham, P., Sorkin, I., and Swift, H. (2020). Bartik Instruments: What, When, Why, and How. American Economic Review, 110(8):2586–2624.

  • Gompers, P. and Lerner, J. (2001). The Venture Capital Revolution. Journal of Economic Perspectives, 15(2):145–168.

  • Goolsbee, A. (1998). Does Government R&D Policy Mainly Benefit Scientists and Engineers? American Economic Review, 88(2):298–302.

  • Greenstone, M., Hornbeck, R., and Moretti, E. (2010). Identifying Agglomeration Spillovers: Evidence from Winners and Losers of Large Plant Openings. Journal of Political Economy, 118(3):536–598.

  • Griliches, Z. (1990). Patent statistics as economic indicators: A survey. Journal of Economic Literature, 28(4):1661–1707.

  • Grossman, G. M. and Helpman, E. (1991). Quality Ladders in the Theory of Growth. Review of Economic Studies, 58(1):43–61.

  • Hall, B. and Van Reenen, J. (2000). How Effective are Fiscal Incentives for R&D? A Review of the Evidence. Research Policy, 29(4–5):449–469.

  • Hall, B. H., Jaffe, A. B., and Trajtenberg, M. (2001). The NBER Patent Citations Data File: Lessons, Insights and Methodological Tools. NBER Working Paper 8498, National Bureau of Economic Research.

  • Hall, B. H. and Ziedonis, R. H. (2001). The Patent Paradox Revisited: An Empirical Study of Patenting in the U.S. Semiconductor Industry, 1979–1995. The RAND Journal of Economics, 32(1):101–128.

  • Henderson, J. V. (1974). The Sizes and Types of Cities. American Economic Review, 64(4):640–656.

  • Hsieh, C.-T. and Moretti, E. (2019). Housing Constraints and Spatial Misallocation. American Economic Journal: Macroeconomics, 11(2):1–39.

  • Joint Committee on Taxation (2010). Estimated Budget Effects of the Revenue Provisions Contained in the President’s Fiscal Year 2011 Budget Proposal, JCX-7–10R. https://www.jct.gov/publications.html?func=startdown&id=3665.

  • Klette, T. J. and Kortum, S. (2004). Innovating Firms and Aggregate Innovation. Journal of Political Economy, 112(5):986–1018.

  • Kline, P. and Moretti, E. (2014). Local Economic Development, Agglomeration Economies, and the Big Push: 100 Years of Evidence from the Tennessee Valley Authority. The Quarterly Journal of Economics, 129(1):275–331.

  • Kolympiris, C., Kalaitzandonakes, N., and Miller, D. (2011). Spatial collocation and venture capital in the US biotechnology industry. Research Policy, 40(9):1188–1199.

  • Manso, G. (2011). Motivating Innovation. The Journal of Finance, 66(5):1823–1860.

  • Moretti, E. (2021). The Effect of High-Tech Clusters on the Productivity of Top Inventors. American Economic Review, 111(10):3328–75.

  • Moretti, E. and Wilson, D. (2014). State Incentives for Innovation, Star Scientists and Jobs: Evidence from Biotech. Journal of Urban Economics, 79(C):20–38.

  • Moretti, E. and Wilson, D. J. (2017). The effect of State Taxes on the Geographical Location of Top Earners: Evidence from Star Scientists. American Economic Review, 107(7):1858–1903.

  • Ossa, R. (2015). A Quantitative Analysis of Subsidy Competition in the U.S. NBER Working Papers 20975, National Bureau of Economic Research.

  • Puga, D. (2010). The Magnitude and Causes of Agglomeration Economies. Journal of Regional Science, 50(1):203–219.

  • Rosenthal, S. S. and Strange, W. C. (2001). The Determinants of Agglomeration. Journal of Urban Economics, 50:191–229.

  • Saxenian, A. (1994). Regional Advantage: Culture and Competition in Silicon Valley and Route 128. Harvard University Press, Cambridge, MA.

  • Shiryaev, A. (1996). Probability. Springer-Verlag New York, 2nd edition. Translated by S. S. Wilson.

  • Slattery, C. R. (2019). Bidding for Firms: Subsidy Competition in the U.S. Working paper.

  • Stokey, N. (2008). The Economics of Inaction: Stochastic Control Models with Fixed Costs. Princeton University Press.

  • Wilson, D. J. (2009). Beggar Thy Neighbor? The In-State, Out-of-State, and Aggregate Effects of R&D Tax Credits. The Review of Economics and Statistics, 91(2):431–436.

  • Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data. The MIT Press, Cambridge, MA.

Appendix

A Figures

Figure A.1: Spatial Distribution of R&D Tax Credit Rates.

Note: The figure shows the average effective R&D tax credit rate in the US. See the note in figure 1.

Figure A.2: Spatial Distribution of R&D Tax Credit Rates.

Note: The figure shows the effective R&D tax credit rates in the US. See the note in figure 1. The two discontinuities in the federal credit rate are due to (1) a change in the method for computing the federal base level in 1991 and (2) the fact that there were no federal credits in 1995.

Figure A.3: Distributions in the Model and the Data

Figure A.4: Correlation Between Changes in City Population: Model vs Data

Note: The change in city population is defined as the difference between the average population share of the city in a given decade minus the average population share of the same city between 2000 and 2006.
Figure A.5:

Results from Welfare Maximization

Figure A.6:

Changes in the spatial distribution of inventors and innovation

Note: The x-axis shows the share of inventors and patents in each percentile of the city distribution (in log scale), where cities are ordered by their current share of inventors (panel a) or patents filed (panel b). The y-axis plots the expected change in those shares should the economy move to the optimal subsidy scheme. For better visualization, each plot aggregates cities into percentiles.

B Proofs and Derivations

Proof of Lemma 1

City 0. To derive the relationship between population and wages, it is useful to separate city 0 from cities 1,..., C. The non-tradable good producer in city 0 solves the problem

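$$\max_{\ell_{n,0},\,m}\; p_{n,0}\,n - w_0\,\ell_{n,0} - p_{m,0}\,m \quad \text{s.t.} \quad n = \ell_{n,0}^{\beta}\, m^{1-\beta},$$

where $w_0$ is determined in equation (8) using the final good producer’s problem. The first-order conditions are

$$[\ell_{n,0}]:\; \beta\, p_{n,0} \left(\frac{m}{\ell_{n,0}}\right)^{1-\beta} = w_0, \qquad [m]:\; (1-\beta)\, p_{n,0} \left(\frac{\ell_{n,0}}{m}\right)^{\beta} = p_{m,0}.$$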

There are three local market clearing conditions (in the sense that they hold inside city 0). The land market clearing condition is

$$m = m_0$$

since land is a fixed factor. The labor market clearing condition is

$$L_0 = \ell_{y,0} + \ell_{n,0},$$

where $L_0$ is defined as the total population of production workers in city 0. Finally, the non-tradable good market clearing condition is

$$\theta \left[\ell_{y,0}\,\frac{w_0}{p_{n,0}} + \ell_{n,0}\,\frac{w_0}{p_{n,0}}\right] = n,$$

where the demand for the non-tradable good from each worker is $\theta w_0 / p_{n,0}$, given the familiar Cobb-Douglas utility function of workers.

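Using the F.O.C. $[\ell_{n,0}]$ from the non-tradable good producer’s problem, the supply of the non-tradable good is

$$n = \frac{w_0}{p_{n,0}}\,\frac{\ell_{n,0}}{\beta}.$$

Plugging this into the non-tradable good market clearing condition,

$$\theta\beta \left[\ell_{y,0} + \ell_{n,0}\right] = \ell_{n,0} \;\Longrightarrow\; \ell_{n,0} = \theta\beta L_0 \quad \text{and} \quad \ell_{y,0} = (1-\theta\beta)\, L_0.$$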

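We can also compute the land rent in city 0 by using the F.O.C. $[m]$ and the land market clearing condition:

$$p_{m,0} = (1-\beta)\, p_{n,0} \left(\frac{\ell_{n,0}}{m_0}\right)^{\beta}.$$

Plug in $p_{n,0}$ from the F.O.C. $[\ell_{n,0}]$ to find

$$p_{m,0}\, m_0 = (1-\beta)\,\theta\, w_0 L_0.$$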
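The city-0 block can be checked numerically. The sketch below takes the closed-form labor split and backs out prices from the two first-order conditions, then compares non-tradable supply with demand and the land rent with its closed form. The function name `city0_allocation` and all parameter values are hypothetical illustrations, not the paper's calibration.

```python
import math


def city0_allocation(theta, beta, L0, m0, w0):
    """Closed-form city-0 allocation implied by the F.O.C.s and market clearing.

    All inputs are illustrative placeholders (not the paper's calibration).
    """
    l_n = theta * beta * L0           # labor in the non-tradable sector
    l_y = (1.0 - theta * beta) * L0   # labor in the final-good sector
    # F.O.C. [l_n]: beta * p_n * (m0 / l_n)^(1 - beta) = w0
    p_n = w0 / (beta * (m0 / l_n) ** (1.0 - beta))
    # F.O.C. [m]: (1 - beta) * p_n * (l_n / m0)^beta = p_m
    p_m = (1.0 - beta) * p_n * (l_n / m0) ** beta
    n_supply = l_n ** beta * m0 ** (1.0 - beta)   # production function
    n_demand = theta * (l_y + l_n) * w0 / p_n     # Cobb-Douglas demand
    return {"l_n": l_n, "l_y": l_y, "p_n": p_n, "p_m": p_m,
            "n_supply": n_supply, "n_demand": n_demand}


if __name__ == "__main__":
    theta, beta, L0, m0, w0 = 0.3, 0.6, 50.0, 2.0, 1.5
    eq = city0_allocation(theta, beta, L0, m0, w0)
    # Goods market clears and land rent matches (1 - beta) * theta * w0 * L0.
    print(math.isclose(eq["n_supply"], eq["n_demand"], rel_tol=1e-9))
    print(math.isclose(eq["p_m"] * m0, (1 - beta) * theta * w0 * L0, rel_tol=1e-9))
```

Both comparisons should hold for any admissible parameters, since the identities are algebraic and do not depend on the values chosen.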

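Now turn to the free mobility condition $u_0 = u$. A production worker’s utility is

$$u = \left[\alpha_0\,\frac{\theta w_0}{p_{n,0}}\right]^{\theta} \left[(1-\theta)\, w_0\right]^{1-\theta}.$$

Once again, we can plug in the F.O.C. $[\ell_{n,0}]$ and the labor market clearing condition above to find

$$u = \left[\alpha_0\,\theta\beta \left(\frac{m_0}{\theta\beta L_0}\right)^{1-\beta}\right]^{\theta} \left[(1-\theta)\, w_0\right]^{1-\theta}.$$

Define $\tilde{L}_0 = L_0/m_0$ as the population per unit of land in city 0. Rearranging the expression above,

$$w_0 = \frac{1}{1-\theta} \left[\frac{u}{(\theta\beta)^{\theta\beta}}\right]^{\frac{1}{1-\theta}} \left(\frac{\tilde{L}_0^{1-\beta}}{\alpha_0}\right)^{\frac{\theta}{1-\theta}},$$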

as desired.
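The rearrangement can be spelled out in two steps, using only the expressions above and writing $\tilde{L}_0 = L_0/m_0$. Collecting the powers of $\theta\beta$ inside the first bracket gives

$$u = \alpha_0^{\theta}\,(\theta\beta)^{\theta\beta}\,\tilde{L}_0^{-\theta(1-\beta)}\,(1-\theta)^{1-\theta}\, w_0^{1-\theta},$$

since $(\theta\beta)^{\theta}\,(\theta\beta)^{-\theta(1-\beta)} = (\theta\beta)^{\theta\beta}$. Solving for $w_0^{1-\theta}$ and raising both sides to the power $1/(1-\theta)$,

$$w_0^{1-\theta} = \frac{u}{(\theta\beta)^{\theta\beta}\,(1-\theta)^{1-\theta}} \left(\frac{\tilde{L}_0^{1-\beta}}{\alpha_0}\right)^{\theta} \;\Longrightarrow\; w_0 = \frac{1}{1-\theta} \left[\frac{u}{(\theta\beta)^{\theta\beta}}\right]^{\frac{1}{1-\theta}} \left(\frac{\tilde{L}_0^{1-\beta}}{\alpha_0}\right)^{\frac{\theta}{1-\theta}}.$$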

Cities 1,..., C. The process for cities 1 through C is very similar, with the exception that these cities also have a population of inventors. In each city, the non-tradable good producer solves

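$$\max_{\ell_{n,c},\,m_c}\; p_{n,c}\,n - w_c\,\ell_{n,c} - p_{m,c}\,m_c \quad \text{s.t.} \quad n = \ell_{n,c}^{\beta}\, m_c^{1-\beta}.$$

The first-order conditions are

$$[\ell_{n,c}]:\; \beta\, p_{n,c} \left(\frac{m_c}{\ell_{n,c}}\right)^{1-\beta} = w_c, \qquad [m_c]:\; (1-\beta)\, p_{n,c} \left(\frac{\ell_{n,c}}{m_c}\right)^{\beta} = p_{m,c}.$$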

Again, there are three local market clearing conditions that must hold in equilibrium. For all $c \in \{1,\dots,C\}$, the land market clearing condition is

$$m_c = \bar{m}_c;$$

the labor market clearing condition is

$$\ell_{n,c} = L_c;$$

and the goods market clearing condition is

$$\theta \left[L_c\,\frac{w_c}{p_{n,c}} + I_c\,\frac{w_c^i}{p_{n,c}}\right] = n,$$

where $L_c$ and $I_c$ are, respectively, the population of production workers and the population of inventors in city $c$.

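Using the F.O.C. $[\ell_{n,c}]$ and the two latter market clearing conditions, we get

$$\frac{L_c}{\bar{m}_c} = \left(\frac{\beta\, p_{n,c}}{w_c}\right)^{\frac{1}{1-\beta}}$$

and

$$\theta \left[L_c\,\frac{w_c}{p_{n,c}} + I_c\,\frac{w_c^i}{p_{n,c}}\right] = \frac{w_c}{p_{n,c}}\,\frac{L_c}{\beta}.$$

This second equation simplifies to

$$L_c\, w_c = \frac{\theta\beta}{1-\theta\beta}\, I_c\, w_c^i. \tag{B.1}$$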

The utility level for production workers is therefore

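$$u = \left[\alpha_c\,\frac{\theta w_c}{p_{n,c}}\right]^{\theta} \left[(1-\theta)\, w_c\right]^{1-\theta} = \left[\alpha_c\,\theta\beta \left(\frac{\bar{m}_c}{L_c}\right)^{1-\beta}\right]^{\theta} \left[(1-\theta)\, w_c\right]^{1-\theta}.$$

Rearranging this expression and using the “tilde” to denote variables expressed per unit of land ($\tilde{L}_c = L_c/\bar{m}_c$), production workers’ wages are

$$w_c = w \left(\frac{\tilde{L}_c^{1-\beta}}{\alpha_c}\right)^{\frac{\theta}{1-\theta}} \quad \text{where} \quad w = \frac{1}{1-\theta} \left[\frac{u}{(\theta\beta)^{\theta}}\right]^{\frac{1}{1-\theta}}. \tag{B.2}$$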

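To find $w_c^i$, rewrite equation (B.1) as (recall that “tildes” indicate variables per unit of land, $\tilde{I}_c = I_c/\bar{m}_c$)

$$\tilde{L}_c\,\frac{w_c}{p_{n,c}} = \frac{\theta\beta}{1-\theta\beta}\,\tilde{I}_c\,\frac{w_c^i}{p_{n,c}}.$$

Using $p_{n,c}/w_c = \tilde{L}_c^{1-\beta}/\beta$, we get

$$\tilde{L}_c = \left[\frac{\theta}{1-\theta\beta}\,\tilde{I}_c\,\frac{w_c^i}{p_{n,c}}\right]^{\frac{1}{\beta}}.$$

Now plug this and (B.2) into (B.1) to find

$$\frac{w_c^i}{p_{n,c}} = \frac{1-\theta\beta}{\theta} \left(\frac{1}{w}\,\frac{\theta\beta}{1-\theta\beta}\, w_c^i\right)^{\frac{\beta(1-\theta)}{1-\theta\beta}} \alpha_c^{\frac{\theta\beta}{1-\theta\beta}}\, \tilde{I}_c^{-\frac{1-\beta}{1-\theta\beta}}.$$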

The utility of inventors is thus

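$$u^i = \left[\alpha_c\,\frac{\theta w_c^i}{p_{n,c}}\right]^{\theta} \left[(1-\theta)\, w_c^i\right]^{1-\theta} = \left[\alpha_c\,(1-\theta\beta) \left(\frac{1}{w}\,\frac{\theta\beta}{1-\theta\beta}\, w_c^i\right)^{\frac{\beta(1-\theta)}{1-\theta\beta}} \alpha_c^{\frac{\theta\beta}{1-\theta\beta}}\, \tilde{I}_c^{-\frac{1-\beta}{1-\theta\beta}}\right]^{\theta} \left[(1-\theta)\, w_c^i\right]^{1-\theta}.$$

Rearranging,

$$w_c^i = w^i \left(\frac{\tilde{I}_c^{1-\beta}}{\alpha_c}\right)^{\frac{\theta}{1-\theta}} \quad \text{where} \quad w^i = \left\{\frac{u^i}{(1-\theta)^{1-\theta} \left[(1-\theta\beta) \left(\frac{1}{w}\,\frac{\theta\beta}{1-\theta\beta}\right)^{\frac{\beta(1-\theta)}{1-\theta\beta}}\right]^{\theta}}\right\}^{\frac{1-\theta\beta}{1-\theta}}. \tag{B.3}$$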

Finally, going back to (B.1) and plugging in (B.2) and (B.3), we get

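$$L_c = \left(\frac{\theta\beta}{1-\theta\beta}\,\frac{w^i}{w}\right)^{\frac{1-\theta}{1-\theta\beta}} I_c$$

for cities $c \in \{1,\dots,C\}$. Summing over cities where there is innovation and using that $I = \sum_{c=1}^{C} I_c$ and $L = L_0 + \sum_{c=1}^{C} L_c$, it follows that

$$w = \left(\frac{I}{L - L_0}\right)^{\frac{1-\theta\beta}{1-\theta}} \frac{\theta\beta}{1-\theta\beta}\, w^i. \tag{B.4}$$

Finally, plug (B.4) into (B.3) to find

$$w^i = \frac{1}{1-\theta} \left[\frac{u^i}{(1-\theta\beta)^{\theta}}\right]^{\frac{1}{1-\theta}} \left(\frac{I}{L - L_0}\right)^{\frac{\theta\beta}{1-\theta}}.$$

Also note that plugging (B.2) and (B.3) into (B.1) and summing over $c \in \{1,\dots,C\}$ implies that the numbers of inventors and production workers are proportional in those cities:

$$\frac{I_c}{I} = \frac{L_c}{L - L_0}.$$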

Finally, we can find land rents in each city by plugging in the land market clearing condition and the F.O.C. $[\ell_{n,c}]$ into the F.O.C. $[m_c]$:

$$p_{m,c}\,\bar{m}_c = \frac{1-\beta}{\beta}\, w_c L_c.$$

It is convenient to write this expression in terms of the population and wage of inventors in each city. Using equation (B.1), we have

$$p_{m,c}\,\bar{m}_c = \frac{(1-\beta)\,\theta}{1-\theta\beta}\, w_c^i I_c.$$
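The closed forms for cities $1,\dots,C$ can be cross-checked numerically. The sketch below computes $w^i$ and $w$ from the aggregate ratio $I/(L-L_0)$, builds city-level wages from (B.2) and (B.3), and compares the two land-rent expressions. The function names `aggregate_wage_terms` and `city_outcomes` and all numbers are hypothetical illustrations, not the paper's calibration or code.

```python
import math


def aggregate_wage_terms(theta, beta, u_i, I, L, L0):
    """Return (w, w_i) from the closed form for w^i and equation (B.4)."""
    ratio = I / (L - L0)  # inventors per production worker outside city 0
    w_i = ((1.0 / (1.0 - theta))
           * (u_i / (1.0 - theta * beta) ** theta) ** (1.0 / (1.0 - theta))
           * ratio ** (theta * beta / (1.0 - theta)))
    w = (ratio ** ((1.0 - theta * beta) / (1.0 - theta))
         * theta * beta / (1.0 - theta * beta) * w_i)
    return w, w_i


def city_outcomes(theta, beta, w, w_i, I_c, L_c, m_c, alpha_c):
    """City wages from (B.2)-(B.3) and land rents written two ways."""
    scale = theta / (1.0 - theta)
    w_c = w * ((L_c / m_c) ** (1.0 - beta) / alpha_c) ** scale
    w_ci = w_i * ((I_c / m_c) ** (1.0 - beta) / alpha_c) ** scale
    rent_workers = (1.0 - beta) / beta * w_c * L_c
    rent_inventors = (1.0 - beta) * theta / (1.0 - theta * beta) * w_ci * I_c
    return w_c, w_ci, rent_workers, rent_inventors


if __name__ == "__main__":
    theta, beta, u_i = 0.3, 0.6, 1.2      # illustrative parameters
    I, L, L0 = 10.0, 100.0, 40.0          # illustrative aggregates
    w, w_i = aggregate_wage_terms(theta, beta, u_i, I, L, L0)
    I_c, m_c, alpha_c = 2.0, 3.0, 1.1     # one illustrative city
    L_c = I_c * (L - L0) / I              # proportionality I_c/I = L_c/(L - L0)
    w_c, w_ci, r1, r2 = city_outcomes(theta, beta, w, w_i, I_c, L_c, m_c, alpha_c)
    print(math.isclose(r1, r2, rel_tol=1e-9))  # both rent formulas agree
```

The two rent expressions agree exactly when (B.1) holds, so the comparison doubles as a check that the proportionality condition and (B.4) are mutually consistent.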

Proof of Lemma 2

As described in the main text, the firm’s HJB equation is

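$$\begin{aligned} r V_c(q_f, \tilde{I}_c, Z_c, A) = \max_{x_{f,c}} \Bigg\{ & \sum_{q_j \in q_f} \bar{\pi} L_0 q_j + x_{f,c}\, \mathbb{E}_j \Big[ V_c\big(q_f \cup_{+} \{(1+\lambda) q_j\}, \tilde{I}_c, Z_c, A\big) - V_c(q_f, \tilde{I}_c, Z_c, A) \Big] \\ & - D \sum_{q_j \in q_f} \Big[ V_c(q_f, \tilde{I}_c, Z_c, A) - V_c\big(q_f \setminus \{q_j\}, \tilde{I}_c, Z_c, A\big) \Big] \\ & - (1 - s_c)\, w_c^i\, (i_{f,c} + \kappa) + \frac{\mathbb{E}\big[dV_c(q_f, \tilde{I}_c, Z_c, A)\big]}{dt} \Bigg\} \\ \text{s.t.} \quad & x_{f,c} = \bar{\chi}_c Z_c \big(\tilde{I}_c^{\eta}\, i_{f,c}\big)^{\psi}, \end{aligned}$$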

where I have defined $Z_c = e^{z_c}$ as the local productivity shock. Since $z_c$ is an Ornstein-Uhlenbeck process with law of motion $dz_c = \phi(\mu - z_c)\,dt + \sigma\,dW_c(t)$, it follows that

$$dZ_c = \phi\left(\frac{\sigma^2}{4\phi} - \ln(Z_c)\right) Z_c\,dt + \sigma Z_c\,dW_c(t)$$

by application of Ito’s lemma and using $\mu = -\frac{\sigma^2}{4\phi}$.
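The Ito step can be written out explicitly. This is a sketch that reads the normalization as $\mu = -\sigma^2/(4\phi)$, the value under which the drift takes the form stated above. Applying Ito’s lemma to $Z_c = e^{z_c}$, with $(dz_c)^2 = \sigma^2\,dt$ and $z_c = \ln(Z_c)$,

$$dZ_c = e^{z_c}\,dz_c + \tfrac{1}{2}\, e^{z_c}\,(dz_c)^2 = \left[\phi\big(\mu - \ln(Z_c)\big) + \frac{\sigma^2}{2}\right] Z_c\,dt + \sigma Z_c\,dW_c(t).$$

Substituting $\mu = -\sigma^2/(4\phi)$ into the drift,

$$\phi\left(-\frac{\sigma^2}{4\phi} - \ln(Z_c)\right) + \frac{\sigma^2}{2} = \frac{\sigma^2}{4} - \phi \ln(Z_c) = \phi\left(\frac{\sigma^2}{4\phi} - \ln(Z_c)\right).$$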

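To prove lemma 2, we only need to determine $dV_c(q_f, \tilde{I}_c, Z_c, A)$. This can be done by applying Ito’s lemma to the firm’s value function $V_c$, while taking into account that one of the state variables, the population of inventors per unit of land in the city, $\tilde{I}_c$, is a function of the shock $Z_c$. For each city $c \geq 1$, define a function $h_c : \mathbb{R}_+ \times \mathbb{R}_+^2 \times [0,1] \times [0,L] \to [0, I/\bar{m}_c]$ such that $\tilde{I}_c = h_c(Z_c; A)$ (recall that $A = (Q, w^i, D, L_0) \in \mathbb{R}_+^2 \times [0,1] \times [0,L]$). Ito’s lemma implies that

$$dh_c = \frac{\partial h_c}{\partial A}\,\frac{\partial A}{\partial t}\,dt + \left[\phi\left(\frac{\sigma^2}{4\phi} - \ln(Z_c)\right) Z_c\,\frac{\partial h_c}{\partial Z_c} + \frac{(\sigma Z_c)^2}{2}\,\frac{\partial^2 h_c}{\partial Z_c^2}\right] dt + \sigma Z_c\,\frac{\partial h_c}{\partial Z_c}\,dW_c(t),$$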

where the arguments of the function are suppressed for convenience in the notation. Note that the first term in the equation above is the regular derivative of hc w.r.t. the vector representing the aggregate state of the economy, while the remaining two terms involve differentiating w.r.t. the stochastic process Zc.

Given the process for $h_c$, we use Ito’s lemma once again to differentiate $V_c(q_f, h_c(Z_c; A), Z_c, A)$ with respect to time:

dVc=VcAAtdt+VcZcdZc+Vchcdhc+12[2VcZc2(dZc)2+2Vchc2(dhc)2+22VcZchcdZcdhc]=VcAAtdt+VcZc[ϕ(σ24ϕln(Zc))Zcdt+σZcdWc(t)]+Vchc{[ϕ(σ24ϕln(Zc))ZchcZc+(σZc)222hcZc2]dt+σZchcZcdWc(t)+hcA