Optimum Taxation and Tax Policy
Author: Nicholas Stern
  • International Monetary Fund

What types of goods should be taxed? How progressive should the income tax be? What should be the balance between the taxation of commodities and the taxation of income? These questions are obviously central to public finance, and they have concerned many leading economists of the last two centuries, from Smith, Mill, Dupuit, Edgeworth, and Wicksell to Pigou and Ramsey. The past 15 years, however, has seen a tremendous surge in the formal analysis of the problems posed by these questions. This paper gives an introduction to this recent literature. Although much of the literature is technical, this paper aims to present a broad understanding of the methods of approach, the type of arguments used, and the main conclusions reached. Given the precise nature of the problems and the sensitivity of many of the results to particular assumptions, it is not entirely possible to avoid formal details. The paper therefore attempts to identify some fairly general and robust lessons.


The three questions posed above are at the heart of optimum taxation, which is just one part of the modern theory of public economics. Section I of the paper attempts to convey an impression of the main concerns of the overall theory in order to set optimum taxation in a broad context. In so doing, it indicates some historical antecedents of optimum taxation and emphasizes its foundation in the relaxation of the classical assumptions of welfare economics.

Section II sets out some of the main results of the theory of optimum taxation. It begins with commodity taxation and the well-known Ramsey rule for the one-consumer economy and then examines its extension to an economy with many consumers. Optimum income taxation, following the approach of Mirrlees, is presented next. Finally, the section closes with a discussion of the appropriate combination of income taxation and commodity taxation (often called the balance between direct and indirect taxation). Section III discusses the consequences of the theory of optimum taxation for production efficiency.

Some applications of the theory of optimum taxation to discussions of tax policy are presented in Section IV. It is shown that the simple principles the theory embodies can be used to discriminate among arguments with respect to public policy. Such discrimination is, of course, one of the main purposes of theoretical enquiry in economics—that is, to establish which of the many possible intuitive and informal arguments are well founded. Section V describes briefly a different sort of application—an attempt (by Ahmad and the author) to use some of the theory of reform (the welfare analysis of a small movement from a given initial position) in the analysis of Indian tax policy and to show the close relation between the theories of optimality and reform. Concluding remarks are presented in Section VI.

I. Scope of Modern Theories and Some Historical Antecedents

In the nineteenth century much of the discussion on public economics was concerned with the enunciation of general principles to guide tax policy. One example was the argument between those who espoused the benefit principle (he who benefits should pay) and those who claimed that taxation should be based on ability to pay. This latter concept was itself discussed extensively in terms of whether equal absolute, equal proportional, or equal marginal sacrifice was appropriate, where sacrifice was related to the utility of, say, income. The argument included a discussion of whether the base should be income, expenditure, or wealth. (For an analysis of this discussion and some of the classic statements, see Musgrave and Peacock (1967).) This attempt to analyze the questions of public finance in terms of a collection of principles characterizes much of the literature up to the present. (See, for example, Musgrave and Musgrave (1980).)

the approach and its relations with public economics

The modern theories of public economics diverge from the traditional ones because they are based on the standard criteria of welfare economics—usually a Bergson-Samuelson social welfare function or Pareto optimality. They are firmly individualistic in that the behavior of consumers is modeled as utility maximization and the welfare criterion counts as an improvement any change that makes one individual better off without making someone else worse off.

The unifying features of this approach provide substantial clarity and analytic power. However, many interesting ethical and economic issues are left out. For example, the approach is essentially “consequentialist” because policies are evaluated in terms of their consequences. One can argue that in taxation, as in other things, certain principles should be observed, irrespective of their consequences. An example might be the kind of information the state should be allowed to use. Furthermore, the consequences of taxation are evaluated solely in terms of changes in utility of members of the society. Again, there might be aspects of the consequences of a particular tax policy (for example, the rights it grants to individuals) that would be ignored in this approach. (For further discussion of some of these issues, see Sen and Williams (1982).) As many of the questions being discussed here are concerned with whether a given rate of tax should be increased or decreased, the difficulties just raised may not be of overwhelming importance in this context. They should not be dismissed, however, and may be of considerable relevance for some aspects of social policy (for example, the question of which instruments of policy are admissible).

The use of a Bergson-Samuelson social welfare function provides clarity, as well as a unifying theme, in its portrayal of the value judgments required for normative analysis. For much of the present analysis, it is not necessary to be specific as to how the function is chosen, and it is possible to show the consequences of using different social welfare functions embodying different ethical positions concerning, for example, income distribution. The selection of a particular policy, however, usually involves the selection of a particular welfare function, and that selection is the task of the policymakers. The economist may, however, be able to assist in the choice of the social welfare function, since he can show the policymakers in simple contexts the consequences of different specifications of that function. This may help them to make value judgments with respect to more complicated problems.

(1) The analysis of taxation in the modern theory proceeds therefore by first describing the effects of taxation and then applying a criterion (usually a social welfare function) to evaluate those effects. This view splits the subject into two sides—first, a positive side and, then, a normative side where value judgments are introduced. This paper concentrates on the normative side, but it should be recognized that a large part of modern public economics is concerned with the positive side. For example, more than half of the principal textbook on the subject (Atkinson and Stiglitz (1980)) is devoted to the analysis of the consequences of taxes before the analysis of normative issues begins. Examples of positive issues are (a) the analysis of the consequences of income or wealth taxation for risk taking, (b) the manner in which different forms of company taxation affect investment and the distribution of profits, and (c) the effects of national debt and taxation on saving and growth. The application of formal microeconomic theory to such questions is a major feature of modern public economics.

It is clear that, if the calculation of the consequences of tax policies is itself difficult, then choice of the optimum taxation runs the risk of being intractable. One is then searching over a set of options, each of which presents analytical difficulties. Thus, the normative side of public economics has, in the main, been concerned with rather simpler models than those used in purely positive analysis. (For further discussion of the positive models, see Atkinson and Stiglitz (1980).)

(2) A second important area of recent research that is not discussed here in detail concerns models of the way taxes might be determined in a nonoptimizing framework (voting, bureaucracies, political power, interest groups, and so on). Note that these non-optimizing models also require an analysis of the consequences of taxes for the interests of different groups. The work of the “public choice” theorists (see, for example, Buchanan and Tollison (1972)) contains some valuable insights (see also Atkinson and Stiglitz (1980, Lecture 10)). The optimizing and deterministic approaches to the analysis of taxes should be seen as complements rather than alternatives. Thus, for example, it would be interesting to compare the outcome of a deterministic or closed model with the solutions that might emerge from optimization models under different social welfare functions.

(3) It is important to mention a third main area of recent research that is not presented in detail here—that is, the econometric estimation of the positive models used in public economics. This involves the empirical analysis of how people react to different tax, pricing, or rationing schemes. And it has led to a closer integration between the theory and estimation of consumer choice and the behavior of firms, on the one hand, and the theory of public economics, on the other. After estimation, one can try to use the estimated demand and utility functions to analyze the welfare effects of possible changes in policy. There have been a number of recent examples of this vertical integration of the analysis of data, economic theory, econometric skills, and policy discussion, showing what economics can do. (See, for example, the Journal of Public Economics, which began publication in 1972 and where much of this research has been published.)

(4) A fourth area is that of computable general equilibrium models. It is unnecessary to discuss this area in detail here, since Shoven (1983) has provided a splendid survey.

the point of departure

The modern theory of public economics takes as its point of departure the two basic theorems of welfare economics: (1) a competitive equilibrium is Pareto efficient, and (2) one can achieve a prescribed Pareto-efficient allocation as a competitive equilibrium if prices are set appropriately and lump-sum incomes are allocated to each individual to allow him to buy the consumption bundle given in the allocation at the prices specified. The important assumptions for the first theorem are the existence of a complete set of markets and the absence of externalities. The second theorem requires, in addition, (a) for private producers, decreasing or constant returns to scale; (b) for consumers, diminishing marginal rates of substitution; and (c) for the government, the ability to arrange lump-sum transfers and taxes.

The prescribed Pareto-efficient allocation is often referred to as the “first best,” and the assumptions and policy tools of the second theorem allow achievement of this first-best allocation. With the failure of the assumptions or with more limited policy tools, there arises a problem of the “second best.” Occasionally, “first-best” and “Pareto-efficient” are used interchangeably, but it seems preferable to reserve first-best for the desired Pareto-efficient allocation (that is, the one selected among all those possible) rather than for any Pareto-efficient point. Obviously, some Pareto-efficient points may involve very unattractive distributions of welfare.

It is common to regard these results as requiring assumptions so restrictive as to be devoid of practical interest. Yet it is remarkable that the first theorem is an essential part of the argument of those who favor the market mechanism. The second theorem also provides a valuable framework for public economics: much of the subject investigates what the government may do, particularly through taxation, when the assumptions required for the second theorem fail to apply. This paper attempts to sketch the main results of the part of that investigation that concentrates on the inability to achieve a desirable set of lump-sum taxes.

The appropriate tax policies to deal with externalities have been extensively discussed in the literature (for a classic statement, see Pigou (1947)). The theory of public sector pricing is close to that of commodity taxation in that the difference between price and marginal cost is analogous to a tax (see, for example, Boiteux (1956)) and, thus, a discussion of commodity taxation essentially includes the important topic of public sector pricing. In this context, there has been valuable and interesting work in public economics on the problem of measuring marginal cost (see, for example, Drèze (1964)).

Recall that a lump-sum tax on an individual is a payment that he cannot alter by any of his actions. Thus, a tax on cigarettes is not a lump-sum tax because an individual can pay less tax if he smokes less; similarly, a wealth tax is not a lump-sum tax because one can accumulate less. It is clearly desirable to relate lump-sum transfers and taxes to individual circumstances, yet, at the same time, the collection of information for those taxes, such as earning power or wealth, effectively prevents them from being lump-sum taxes. The individual can discover what is being measured and usually can, if he wishes, adjust that dimension. Note that lump-sum taxes are not, in general, impossible. There could be differential taxation by sex or height (assuming that there would be neither direct action to change these nor emigration). It is the achievement of desirable lump-sum taxes that causes the difficulties.

This conclusion leads in two directions. The first involves a fairly robust general notion: there is an argument in favor of taxing things that are not easily varied by individuals or firms in response to taxation. An important example would be pure rent or monopoly profits, where these can be identified. (An example from commodity taxation, given later in this discussion, embodies this idea more formally.) The second involves a theory that addresses the problem of taxation in a world where lump-sum taxes are not possible. This direction leads to what is known as the theory of optimum taxation.

In concluding this section, it is interesting to note that much of the argument concerning public sector pricing and taxation just discussed was set out in a remarkable paper by Wicksell in 1896 (Musgrave and Peacock (1967)). He pointed out the importance of marginal cost pricing in the public sector and financing of losses and other government activities by lump-sum taxation, for example, on land. This is linked directly to Pareto efficiency through his notion of unanimity.

II. Optimum Taxation

The examination of optimum taxation where lump-sum taxes are impossible has been concentrated on commodity taxation and income taxation. Analysis of commodity taxation goes back to Ramsey (1927), and important papers by Boiteux (1956) and Samuelson (1951) were written shortly after World War II, but the literature expanded rapidly in the 1970s, following the Diamond-Mirrlees papers (1971). The subject of optimum income taxation was created by Mirrlees (1971).


commodity taxation

The Ramsey problem is to raise a given revenue from a consumer through the taxation of the commodities he consumes in such a way as to minimize the loss in utility that arises from taxation. Ramsey considered the case of one consumer (or, equivalently, identical consumers who are treated identically). This is a simple efficiency problem in that distributional considerations are ignored (a point that is taken up later).

It is useful for the interpretation of the results from the Ramsey problem, and for further reference below, to have a brief description of the partial equilibrium approach to the question. The two pieces of analysis below are used to demonstrate the methods and to develop some intuition, which is employed in later arguments. They are, however, obviously very simple and unsatisfactory in a number of ways.

The partial equilibrium assumption here is that the demand for a good or commodity does not depend on the prices of other goods, so that it is possible to draw the familiar demand curve DD (Figure 1). It is assumed that producer prices p are fixed, so that the effect of a tax vector t is to increase the prices q faced by consumers from p to p + t. The so-called deadweight loss from the taxation of the ith good is measured by the shaded triangle ABC in the figure. The motivation for this definition of deadweight loss is as follows. The state of affairs associated with a given tax (and, thus, with given consumer prices and demand) is evaluated by the sum of benefits to consumers (measured by consumer surplus), to the government (measured by tax revenue), and to producers (measured by profits). Note that the sum is unweighted, so that one dollar is regarded as equally valuable to each group.

Figure 1. Excise Taxation and Deadweight Loss

Citation: IMF Staff Papers 1984, 002; 10.5089/9781451946918.024.A003

Profits here are taken as zero (as producer prices are fixed, competition would drive profits to zero) and, therefore, only consumer surplus and government revenue are considered. In the absence of taxation, government revenue is zero, and consumer surplus is the area below the demand curve and above the line GC. With taxation, government revenue is given by the rectangle ABGH, and consumer surplus is the area below the demand curve and above AH. Thus, the net loss, or deadweight loss, is the triangle ABC.
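The accounting behind triangle ABC and rectangle ABGH can be checked numerically. The sketch below assumes a hypothetical linear demand curve X(q) = a − bq. The parameters a and b, the numbers, and the function names are illustrative and do not come from the paper. It confirms that the fall in consumer surplus equals tax revenue plus the deadweight triangle.

```python
# Sketch: welfare accounting of a single excise tax, assuming a hypothetical
# linear demand curve X(q) = a - b*q and a fixed producer price p.

def demand(q, a, b):
    """Quantity demanded at consumer price q (zero above the choke price a/b)."""
    return max(a - b * q, 0.0)


def consumer_surplus(q, a, b):
    """Area below the linear demand curve and above the price line at q."""
    choke_price = a / b
    if q >= choke_price:
        return 0.0
    return 0.5 * (choke_price - q) * demand(q, a, b)


def excise_accounting(p, t, a, b):
    """Return (revenue, consumer-surplus loss, deadweight loss) for a tax t.

    Profits are zero because producer prices are fixed, so the deadweight loss
    is the fall in consumer surplus not matched by tax revenue."""
    q = p + t                                    # consumer price rises by the full tax
    revenue = t * demand(q, a, b)                # rectangle ABGH
    surplus_loss = consumer_surplus(p, a, b) - consumer_surplus(q, a, b)
    deadweight = surplus_loss - revenue          # triangle ABC
    return revenue, surplus_loss, deadweight


revenue, surplus_loss, deadweight = excise_accounting(p=10.0, t=5.0, a=100.0, b=2.0)
print(revenue, surplus_loss, deadweight)         # 350.0 375.0 25.0
```

With linear demand the triangle is exactly one-half of t times the fall in quantity, here 0.5 × 5 × (80 − 70) = 25.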

One then examines the minimization of the sum across goods of triangle ABC (that is, total deadweight loss), subject to the constraint that the sum across goods of the rectangle ABGH (that is, total tax revenue) is not less than a given figure. It is straightforward to show that this leads to the result that the tax as a proportion of the consumer price of each good should be inversely related to the elasticity of demand. Formally, $t_i/q_i = \mu/\varepsilon_i$, where μ is constant across goods and $q_i$, $t_i$, and $\varepsilon_i$ are, respectively, the consumer price, the tax, and the price elasticity of demand (taken as a positive number) for the ith good. If λ denotes the Lagrange multiplier on the revenue constraint, then μ = (λ − 1)/λ.
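The sketch below illustrates the inverse-elasticity rule. It assumes hypothetical constant-elasticity demands $X_i = A_i q_i^{-\varepsilon_i}$ with fixed producer prices. It then finds μ by bisection so that the rule raises the required revenue. Under these assumptions, each good's revenue rises with μ while μ < min(1, ε_i), which is why the search is confined to that interval. The function name and all numbers are illustrative.

```python
# Sketch of the inverse-elasticity rule t_i/q_i = mu/eps_i, assuming hypothetical
# constant-elasticity demands X_i(q_i) = A_i * q_i**(-eps_i) and fixed producer prices p_i.

def inverse_elasticity_taxes(p, eps, A, R, tol=1e-13):
    """Find mu so that the rule raises revenue R; return (mu, taxes, consumer prices)."""

    def prices(mu):
        # t_i = (mu/eps_i) * q_i and q_i = p_i + t_i  imply  q_i = p_i / (1 - mu/eps_i)
        return [pi / (1.0 - mu / e) for pi, e in zip(p, eps)]

    def revenue(mu):
        total = 0.0
        for pi, e, a, q in zip(p, eps, A, prices(mu)):
            total += (q - pi) * a * q ** (-e)    # t_i * X_i(q_i)
        return total

    # Each good's revenue increases with mu while mu < min(1, eps_i), so bisect there.
    lo, hi = 0.0, min(1.0, min(eps)) * (1.0 - 1e-9)
    if revenue(hi) < R:
        raise ValueError("revenue requirement cannot be met under this rule")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if revenue(mid) < R:
            lo = mid
        else:
            hi = mid
    mu = 0.5 * (lo + hi)
    q = prices(mu)
    return mu, [qi - pi for qi, pi in zip(q, p)], q


p, eps, A = [1.0, 1.0], [0.5, 2.0], [1.0, 1.0]
mu, taxes, q = inverse_elasticity_taxes(p, eps, A, R=0.2)
# The less elastic good (eps = 0.5) bears four times the ad valorem rate of the other.
print(mu, [round(t / qi, 4) for t, qi in zip(taxes, q)])
```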

There have been a number of calculations of such triangles in the empirical literature, following the work of Harberger (1954), who applied this approach to deadweight losses from monopoly (the distance of price above marginal cost playing an analogous role to the tax). The more modern approach is to use explicit utility functions and “equivalent variation,” thus avoiding the unattractive assumption that the demand for a good does not depend on the prices of other goods (see, for example, Rosen (1978)).

Below is a brief mathematical formulation of the central result in optimum commodity taxation, the so-called Ramsey rule. This dispenses with the partial equilibrium assumption concerning demands and works directly with utility functions. To keep things simple, producer prices are assumed fixed, so that an increase in taxes implies an equal increase in consumer prices. Goods may be either bought or sold by consumers; sales are treated as negative purchases. It is convenient to treat the sale of labor differently from other goods and, thus, to identify it separately (as, say, l) in the utility function and the budget constraint. Let w be the wage faced by the consumer, M lump-sum income, and X the vector of quantities transacted, and let q ⋅ X denote $\sum_i q_i X_i$. The individual problem may then be written

$$\max_{X,\,l}\; u(X, l) \quad \text{subject to} \quad q \cdot X = wl + M \qquad (1)$$

Note that, if the prices of all goods and labor are raised by taxation in the same proportion, so that $q_i = (1 + \tau)p_i$ and $w = (1 + \tau)w_g$, where $w_g$ is the wage faced by producers (there is a wage subsidy), then the result is effectively a lump-sum tax. The reason is that the proportional change in prices is simply equivalent to a reduction of M to M/(1 + τ), as may be seen by dividing the budget constraint in equation (1) through by 1 + τ. The revenue is τM/(1 + τ). In the one-consumer economy with lump-sum incomes, this form of taxation would be optimum provided that the revenue requirement R is less than M. In this case, the optimum uniform tax rate τ, which sets τM/(1 + τ) equal to R, is given by

$$\tau = \frac{R}{M - R} \qquad (2)$$

If the revenue requirement exceeds M, then distortionary taxes (that is, those which are not equivalent to lump-sum taxes) are necessary. In particular, in a one-consumer economy without lump-sum incomes, proportional taxes (including the wage subsidy) raise no revenue, since τM/(1 + τ) = 0 when M = 0.
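The lump-sum equivalence of uniform proportional taxation can be checked with a hypothetical Cobb-Douglas consumer over one good and leisure. The functional form, parameter values, and function names are assumptions made for illustration. The check shows two things. Scaling the good's price and the wage by 1 + τ gives the same choices as cutting M to M/(1 + τ). Net revenue, that is commodity tax receipts minus the wage subsidy, equals τM/(1 + τ).

```python
# Sketch: with lump-sum income M, scaling every price and the wage by (1 + tau)
# acts like a lump-sum tax. Preferences are a hypothetical Cobb-Douglas form
# u = a*ln(X) + (1 - a)*ln(L_bar - l) over one good X and leisure L_bar - l.

def choice(q, w, M, a=0.6, L_bar=1.0):
    """Utility-maximizing (X, l) subject to q*X = w*l + M."""
    full_income = w * L_bar + M              # value of the time endowment plus M
    X = a * full_income / q
    leisure = (1.0 - a) * full_income / w
    return X, L_bar - leisure                # good demanded, labor supplied


def uniform_tax_revenue(p, wg, M, tau):
    """Commodity-tax receipts minus the wage subsidy when q = (1+tau)p, w = (1+tau)wg."""
    q, w = (1.0 + tau) * p, (1.0 + tau) * wg
    X, l = choice(q, w, M)
    return (q - p) * X - (w - wg) * l


M, R = 0.5, 0.1
tau = R / (M - R)                             # optimum uniform rate; requires R < M
print(tau, uniform_tax_revenue(p=1.0, wg=1.0, M=M, tau=tau))
```

With these numbers, labor supply stays positive (l = 0.44). The taxed consumer buys exactly what he would buy at untaxed prices with lump-sum income M/(1 + τ).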

If no lump-sum incomes exist (M = 0), only relative prices matter to the consumer, so it is possible to choose one good to be untaxed without loss of generality, and it is convenient to choose that good to be labor. When M = 0, the budget constraint is simply

$$q \cdot X = wl \qquad (3)$$

Then, for the consumer, a tax at rate τ on wage income is equivalent to raising prices to q/(1 − τ). It is assumed in what follows in this subsection that no lump-sum incomes exist and that labor is untaxed.

Consider, then, just one consumer whose individual demands X(q, w) are a function of consumer prices only. The maximum utility that an individual can achieve when facing prices q is written V(q, w). This is the indirect utility function. The problem then becomes to choose t, or, equivalently, q, to maximize V(q, w) (and, thus, minimize utility loss), subject to the constraint that the tax revenue ΣktkXk meets the requirement R. R is the value at p of the bundle of goods and factors required by the government. One need not be concerned with the precise form of the bundle required, since the government can transform its revenue at prices p into whatever goods it desires. The suffix on a vector denotes the particular component: thus, tk is the tax on the kth good.

Formally, then, the problem is

$$\max_{q}\; V(q, w) \quad \text{subject to} \quad \sum_k t_k X_k(q, w) = R \qquad (4)$$

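Taking λ as the Lagrange multiplier on the constraint, the first-order conditions for maximization are

$$\frac{\partial V}{\partial t_i} + \lambda \left( X_i + \sum_k t_k \frac{\partial X_k}{\partial t_i} \right) = 0 \quad \text{for each } i \qquad (5)$$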

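Remembering that producer prices are fixed, so that differentiation with respect to $t_i$ and $q_i$ is equivalent, the conditions become

$$\frac{\partial V}{\partial q_i} + \lambda \left( X_i + \sum_k t_k \frac{\partial X_k}{\partial q_i} \right) = 0 \qquad (6)$$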

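Use Roy's identity, $\partial V/\partial q_i = -\alpha X_i$, where α is the marginal utility of income. Use also the standard decomposition of $\partial X_k/\partial q_i$ into a symmetric substitution effect and an income effect, $\partial X_k/\partial q_i = s_{ki} - X_i\,\partial X_k/\partial M$, with $s_{ki} = s_{ik}$. Dividing through by λ, one then derives the Ramsey rule

$$\frac{\sum_k t_k s_{ik}}{X_i} = -\theta, \qquad \theta = 1 - \frac{\alpha}{\lambda} - \sum_k t_k \frac{\partial X_k}{\partial M} \qquad (7)$$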

where $s_{ik}$ is the utility-compensated change in demand for the ith good when the kth price changes, and where θ is a positive number independent of i.

An intuitive interpretation of equation (7) is as follows. Consider $\sum_k t_k s_{ik}$ as the (compensated) change in demand for the ith good resulting from the imposition of the vector of small taxes $t_k$. The typical term in the sum is

$$t_k s_{ik}$$

(which is the change in the compensated demand for good i as a result of the increase in consumer price tk, if tk is small). Summing across k gives the change arising from the vector of taxes. Strictly, of course, the size of the taxes tk is determined within the problem, and one is not really justified in assuming that tk is small. With this qualification, however, the Ramsey rule is that the proportional reduction in compensated demand as a result of the imposition of the set of taxes should be the same for all goods.
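The Ramsey rule fixes the tax vector up to the scale factor θ: $t = -\theta S^{-1}X$. This holds under a local, first-order approximation in which the Slutsky terms and demands stay at their observed values. The sketch below uses a hypothetical 2 × 2 symmetric, negative-definite Slutsky matrix. It illustrates the structure of the rule; it is not a method for computing actual optimum taxes, since in the full problem S and X change with the taxes themselves.

```python
# Sketch: taxes satisfying the Ramsey rule sum_k t_k s_ik = -theta * X_i for a
# hypothetical symmetric, negative-definite Slutsky matrix S of the taxed goods.
# S and X are held fixed (first-order approximation), so t = -theta * S^{-1} X,
# with theta scaled so that revenue t.X equals the requirement R.

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def ramsey_taxes(S, X, R):
    y = solve(S, X)                                    # y = S^{-1} X
    theta = -R / sum(xi * yi for xi, yi in zip(X, y))  # positive when S is negative definite
    return theta, [-theta * yi for yi in y]


S = [[-2.0, 0.5],
     [0.5, -1.0]]
X = [1.0, 2.0]
theta, t = ramsey_taxes(S, X, R=0.3)
# The proportional compensated reduction sum_k t_k s_ik / X_i is the same for every good.
print([sum(t[k] * S[i][k] for k in range(2)) / X[i] for i in range(2)], -theta)
```

The cross term $s_{12}$ enters each good's compensated response, which is why complements and substitutes matter in the general rule.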

This result is an important one and provides the main insight into tax rules arising from the theory of optimum commodity taxation. It should be emphasized that proportional quantity changes are equal in this rule. Thus, crudely speaking, those quantities which are relatively insensitive to price are taxed relatively more. It is important in the argument that follows that this is, in general, very different from the proposition that taxation should be uniform—that is, that all proportional price changes should be equal. The result provides a generalization of the rule that taxes should be inversely related to elasticities of demand—a result that is familiar from the less rigorous and partial equilibrium treatment that has just been presented. The Ramsey rule provides an example of the general principle that efficient taxation is directed toward goods that cannot be varied by consumers. Note, however, that one needs considerable care with respect to substitutes and complements, a question that is suppressed by the partial equilibrium approach.

Given that labor is assumed to be untaxed and that there is an endowment of time, one can interpret the Ramsey rule in terms of the complementarity and substitutability of the taxed consumer goods with leisure. A notable early example was the work of Corlett and Hague (1953). Goods that are relatively complementary with leisure should bear the higher tax rate. In particular, one can show (Deaton (1981)) that if leisure is quasi-separable from all goods, then the Ramsey rule gives uniform taxation of goods. Intuitively, quasi-separability means that all goods are equally complementary with leisure. Formally, goods i and j are quasi-separable from leisure if the marginal rate of substitution between i and j is independent of leisure at constant utility (where compensation for a change in l involves a proportional change in the vector (X, l)). Note that the issues of complementarity, substitutability, and separability with leisure arise because there is an untaxed endowment of time, and that the conclusions would be expressed in terms of another good if there were a corresponding endowment.

It should also be noted here that in one sense the one-consumer economy is an awkward vehicle for the development of the argument. The reason is that lump-sum taxation (which, in general, is first best) becomes simply a poll tax (which, it might be argued, would be feasible). Alternatively, as seen above, where there are fixed lump-sum incomes, this may be achieved equivalently through proportional taxation of all goods (including subsidies on factor supplies). The real case of interest is, of course, the many-consumer economy, and here a poll tax alone is not, in general, the best way to raise revenue, and indirect taxation is also required. Indeed, in the many-consumer case, the optimum poll tax is often negative (that is, a poll subsidy). Discussion of the Ramsey rule should therefore be seen as a way of developing intuition for application in the more general case. (See the following discussion of the many-consumer case and the next subsection on the optimum combination of income and commodity taxes.)

The Ramsey rule would seem to be rather inegalitarian in that it appears to direct commodity taxation toward “necessities,” which are usually considered fairly insensitive to price. But the formulation in terms of one consumer explicitly ignores distributional questions. The rule can, however, be generalized to many consumers in a fairly straightforward way by replacing V(q, w) in equation (4) by the social welfare function $W(u^1, u^2, \ldots, u^H)$. Here $u^h$ is the utility of the hth individual, again considered as a function of consumer prices q and the wage $w^h$. The function X(q, w) becomes $\sum_h x^h(q, w^h)$, where $x^h(q, w^h)$ is the demand function of individual h. The rule is then no longer that the proportional reduction in compensated demand should be the same for all goods; instead, the modified rule shows how it should vary across goods. The proportional reduction in quantity for a good should now be higher where the share of the rich in its total consumption is higher. Strictly, “the rich” here means those whose social marginal valuation of income is low. Following an argument similar to that used in the derivation of the Ramsey rule (equation (7)), one can show

$$\frac{\sum_h \sum_k t_k S^h_{ik}}{X_i} = -\left(1 - \bar{b}\, r_i\right) \qquad (8)$$

where $S^h_{ik}$ is the Slutsky term for household h, $\bar{b}$ is the average across households of $b^h$ (the net social marginal valuation of income of household h), and $r_i$ is the normalized covariance between the consumption of the ith commodity and the net social marginal valuation of income, plus one; equivalently, $r_i = \sum_h b^h x^h_i / (\bar{b} X_i)$. By net is meant the value of an extra dollar to individual h as perceived by the government plus any extra indirect tax revenue arising from the expenditure of the dollar. Formally, $b^h = \beta^h/\lambda + t \cdot \partial x^h/\partial m^h$. Here $\beta^h$ is the social marginal utility of income of household h, $\partial x^h/\partial m^h$ is the vector of income derivatives of its demands, and λ is the Lagrange multiplier on the revenue constraint. The number $r_i$ is a generalization of the distributional characteristic of a good (introduced by Feldstein (1972)) and indicates the relative extent to which a good is consumed by those with a high net social marginal valuation of income (see equation (19)).

Thus, the proportional reduction of compensated demand denoted by the left-hand side of equation (8) embodies the efficiency arguments for taxing necessities introduced in the Ramsey rule, together with the distributional judgment associated with $r_i$ on the right-hand side, which points toward luxuries. The implications of equation (8) for tax rates depend on the way in which these two effects combine. Much depends on the structure of preferences and the type of income tax tools available, as is seen below in the subsection on the combination of income and commodity taxes.
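The terms on the right-hand side of the many-consumer rule can be computed directly from household data. The sketch below uses made-up values of $b^h$ and $x^h_i$ for three households and two goods; all names and numbers are hypothetical. It computes r_i as one plus a normalized covariance and checks that this equals $\sum_h b^h x^h_i/(\bar b X_i)$.

```python
# Sketch: computing b_bar and r_i from hypothetical household data.
# b[h]    : net social marginal valuation of income of household h
# x[h][i] : consumption of good i by household h

def distributional_terms(b, x):
    H, n = len(b), len(x[0])
    b_bar = sum(b) / H
    r = []
    for i in range(n):
        x_bar = sum(x[h][i] for h in range(H)) / H
        # normalized covariance of b^h/b_bar and x_i^h/x_bar (both have mean one)
        cov = sum((b[h] / b_bar) * (x[h][i] / x_bar) for h in range(H)) / H - 1.0
        r.append(1.0 + cov)
    # right-hand side of the many-consumer rule, -(1 - b_bar * r_i)
    rhs = [-(1.0 - b_bar * ri) for ri in r]
    return b_bar, r, rhs


b = [1.5, 1.0, 0.5]                  # poorest household first
x = [[2.0, 0.0],                     # good 0 is consumed equally ("necessity")
     [2.0, 1.0],                     # good 1 is concentrated among the rich ("luxury")
     [2.0, 3.0]]
b_bar, r, rhs = distributional_terms(b, x)
print(b_bar, r, rhs)
```

In this example the luxury, consumed mostly by the low-$b^h$ household, has the smaller $r_i$. It therefore gets the larger target proportional reduction in compensated demand.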

income taxation

Both Adam Smith and John Stuart Mill3 argued that taxation should be linked to the ability to pay, with the former stating that “subjects should contribute toward the support of the government in proportion to their respective abilities” and the latter arguing that “whatever sacrifices it [the government] requires… should be made to bear as nearly as possible with the same pressure upon all.” The form “the pressure” should take was discussed extensively and was often based on a notion of cardinal utility, linking income to some utility level. At various points it was suggested that the sacrifice of utility should be equal for all or that an equal proportion of utility should be sacrificed. Given a utility function (assumed to be the same for everyone) and one of these principles—say, equal absolute sacrifice—one can calculate a corresponding tax function. If income is Y and the tax payable is T(Y), then, given some total revenue requirement, one can calculate T, assuming that Y is independent of the tax schedule, for each level of Y from

$$U(Y) - U\bigl(Y - T(Y)\bigr) = k, \qquad k \text{ the same for all } Y, \tag{9}$$

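the condition for equal absolute sacrifice. (For calculations in this framework, see Stern (1977).) One can show, for example, that if $U(Y) = Y^{1-\eta}/(1-\eta)$, so that η is the elasticity of the marginal utility of income, then taxation is progressive (in that the marginal rate exceeds the average rate) for η > 1. Differentiating the equal-sacrifice condition gives $1 - T'(Y) = (1 - T(Y)/Y)^{\eta}$, so the marginal rate exceeds the average rate exactly when η > 1. The logarithmic or Bernoulli form, $U(Y) = \log Y$, which is the limiting case η = 1, gives proportional taxation.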
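Solving the equal-sacrifice condition for T is straightforward with isoelastic utility. The sketch below assumes $U(Y) = Y^{1-\eta}/(1-\eta)$ (log Y when η = 1), a hypothetical set of four pretax incomes, and a hypothetical revenue requirement of 3.0, and it chooses the common sacrifice k by bisection so that total taxes meet the requirement. The calibration routine assumes η ≥ 1, where the equal-sacrifice tax is defined for every k ≥ 0.

```python
import math

def sacrifice_tax(Y, k, eta):
    """T(Y) solving U(Y) - U(Y - T) = k, with U(Y) = Y**(1 - eta) / (1 - eta), or log Y if eta == 1."""
    if eta == 1.0:
        return Y * (1.0 - math.exp(-k))           # log utility: T/Y is the same at every income
    base = Y ** (1.0 - eta) - k * (1.0 - eta)     # this equals (Y - T)**(1 - eta)
    if base <= 0.0:
        raise ValueError("sacrifice k exceeds the utility available at this income")
    return Y - base ** (1.0 / (1.0 - eta))

def calibrate_k(incomes, revenue, eta, tol=1e-12):
    """Common sacrifice k that raises `revenue` in total (bisection; assumes eta >= 1)."""
    if revenue >= sum(incomes):
        raise ValueError("revenue requirement exceeds total income")
    total = lambda k: sum(sacrifice_tax(Y, k, eta) for Y in incomes)
    lo, hi = 0.0, 1.0
    while total(hi) < revenue:                    # total taxes rise with k toward sum(incomes)
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if total(mid) < revenue else (lo, mid)
    return 0.5 * (lo + hi)

def rates(Y, k, eta, h=1e-6):
    """(average rate, marginal rate) at income Y; the marginal rate uses a central difference."""
    average = sacrifice_tax(Y, k, eta) / Y
    marginal = (sacrifice_tax(Y + h, k, eta) - sacrifice_tax(Y - h, k, eta)) / (2.0 * h)
    return average, marginal

incomes = [1.0, 2.0, 4.0, 8.0]   # hypothetical pretax incomes
for eta in (1.0, 2.0):
    k = calibrate_k(incomes, revenue=3.0, eta=eta)
    for Y in incomes:
        average, marginal = rates(Y, k, eta)
        print(f"eta={eta}: Y={Y}  average {average:.3f}  marginal {marginal:.3f}")
```

For η = 1 the average and marginal rates coincide at every income; for η = 2 the marginal rate exceeds the average rate throughout, as the differentiated condition predicts.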

These criteria are adduced, however, without any reference to guiding principles. From this point of view, the notion of “equal marginal sacrifice,” set forth by Edgeworth, has greater clarity in that it is derived from the utilitarian objective of the sum of utilities. If it is assumed again that pretax income is independent of the tax schedule and, further, that everyone has the same strictly concave utility function, then equal marginal utility implies equal post-tax incomes. Thus, the marginal tax rate is 100 percent, which raises the incentive question in a very stark manner. This incentive problem had been recognized very early in the discussion. For example, McCulloch (1863, Part I, Chapter IV, p. 146) stated: “Graduation is not an evil to be paltered with. Adopt it, and you will effectively paralyse industry…. The savages described by Montesquieu, who to get at the fruit cut down the tree, are about as good financiers as the advocates of this sort of taxes.”

Given that the incentive and distribution aspects of the income tax have long been recognized, it is perhaps surprising that a model that simultaneously examined the distribution and size of the cake was not forthcoming until the paper by Mirrlees (1971). This paper essentially created the subject of optimum income taxation. As is seen later on, Mirrlees kept his model as simple as possible, given the issue at hand, but it is nonetheless not an easy problem, and the analysis poses considerable technical difficulties because the policy tool is the whole income tax function. Thus, for each income it is necessary to specify the tax payment, and the optimization is in a space of all admissible functions. This should be contrasted with the problems usually examined in standard microtheory (for example, consumer or producer behavior), where only a finite number of variables (for example, consumption of each type of good) are considered.

The income tax problem is considerably simplified if one confines attention to a linear tax schedule where there is a lump-sum benefit or tax combined with a constant marginal rate. Following the Mirrlees paper, a number of papers examined the simpler problem. (See Atkinson and Stiglitz (1980, Lecture 13) for references.) In the discussion of numerical results, the present paper concentrates on the linear case but begins by setting out the Mirrlees nonlinear problem and explaining why it takes the form it does. It then summarizes some of the main results for the nonlinear problem. Finally, it presents numerical calculations for the linear case to bring out the sensitivity to the important parameters and to compare the computed tax rates with levels seen in practice. There has been some recent work on an intermediate case with a finite number of individuals—one might interpret them as representative of certain groups—where the optimum tax schedule can be taken as piecewise linear (Guesnerie and Seade (1982) and Stern (1982)).

Given that the nonlinear problem poses difficulties, it is sensible to begin by keeping the structure as simple as is consistent with retaining the question. From this point of view, the model concerned with distribution and incentives must have two features: individuals should not be identical, and there must be an input over which individuals exercise choice. If individuals are identical, then the optimum would be given by a poll tax with zero marginal taxation (this is the standard result of welfare economics) and, if there is no incentive problem, it has been seen, in the discussion of Edgeworth above, that the marginal rate would be 100 percent. The Mirrlees model has individuals differing in only one respect (their pretax wage, or productivity), and there is only one aspect of incentives (labor supply). Thus, in the model, labor is supplied by individuals, each of whom has an identical utility function, in order to maximize utility of consumption and leisure, given the pretax wage and the income tax schedule. The government chooses the income tax schedule so as to maximize a Bergson-Samuelson social welfare function, subject to raising some given amount of revenue.

All individuals have the same utility function u(c, l), which depends on consumption c and labor supply l. Individuals differ in their wage rates w, and the distribution of w is described by the density function f(w). One speaks of an individual as being of type w.

The problem is to choose a function g( ) that relates post-tax to pretax income in order to maximize

$$\int \phi\bigl(u(c(w), l(w))\bigr) f(w)\,dw \tag{10}$$

subject to

$$\int \bigl[w\,l(w) - g\bigl(w\,l(w)\bigr)\bigr] f(w)\,dw \ge R \tag{11}$$

and where (c, l) is chosen by the individual to maximize u(c, l), subject to

$$c \le g(wl) \tag{12}$$

The government revenue requirement is R, which is taken as fixed for equations (10) through (12). At a later stage, it is asked how the solution varies with different values of R. The maximand (10) is a Bergson-Samuelson social welfare function of additive form: ϕ(u) is added across individuals. If ϕ(u) = u, so that social welfare is simply the sum of individual utilities, then the function is utilitarian. Equation (11) represents the revenue requirement: wl is pretax income, so that wl − g(wl) is the tax payment by an individual of type w, and this is integrated, or added, across individuals. The constraint (12) represents the second-best nature of the problem in that it says that individuals make their own choices subject to the budget constraint set by their wage and the government tax function. One can express it by saying that no individual would prefer the income of some other individual, taking account of the work he would have to do to earn it.
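A discretized version of the problem can make the roles of (10) through (12) concrete. The sketch below is a hypothetical illustration, not Mirrlees's computation: it assumes the utility function u(c, l) = log c + log(1 − l), three wage types with made-up population shares, and an arbitrary linear schedule g. Each type solves its own problem by grid search over hours (constraint (12)), and welfare and net revenue are then summed across types ((10) and (11)). The helper `no_mimicking` checks the reading of (12) given above: no type gains by earning another type's pretax income.

```python
import math

def utility(c, l):
    """Assumed individual utility u(c, l) = log c + log(1 - l)."""
    return math.log(c) + math.log(1.0 - l)

def choose(w, g, steps=20000):
    """Constraint (12): type w picks hours l in [0, 1) maximizing u(g(w*l), l), by grid search."""
    best_l, best_u = 0.0, -math.inf
    for i in range(steps):
        l = i / steps
        c = g(w * l)
        if c > 0.0:
            u = utility(c, l)
            if u > best_u:
                best_l, best_u = l, u
    return best_l, best_u

def evaluate(g, types, phi=lambda u: u):
    """Discrete analogues of (10) and (11); `types` lists (wage, population share) pairs."""
    welfare = revenue = 0.0
    for w, share in types:
        l, u = choose(w, g)
        welfare += share * phi(u)
        revenue += share * (w * l - g(w * l))    # tax paid: pretax minus post-tax income
    return welfare, revenue

def no_mimicking(g, types, tol=1e-6):
    """True if no type would rather earn another type's pretax income (the reading of (12))."""
    choices = {w: choose(w, g) for w, _ in types}
    for w, _ in types:
        own_u = choices[w][1]
        for other, _ in types:
            y = other * choices[other][0]        # the other type's pretax income
            hours = y / w                        # hours type w needs to earn it
            c = g(y)
            if hours < 1.0 and c > 0.0 and utility(c, hours) > own_u + tol:
                return False
    return True

# Hypothetical wage types and a hypothetical linear schedule (grant 0.1, marginal rate 30 percent).
types = [(0.5, 0.3), (1.0, 0.5), (2.0, 0.2)]
g = lambda y: 0.1 + 0.7 * y
W, net_revenue = evaluate(g, types)
print(W, net_revenue, no_mimicking(g, types))
```

Trying other schedules g and comparing the resulting (welfare, revenue) pairs is the government's problem in miniature; the difficulty in the real model is that g ranges over all admissible functions rather than a few parameters.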

Before proceeding to results, some particular features of the model should be noted. First, as specified, the model is static, and there is no saving. This is to keep the structure as simple as possible. From a broader perspective, l might be considered as representing lifetime labor supply and c as representing lifetime consumption, but the treatment of c, l as vectors would take the discussion too far afield at this point. In the next subsection, a vector of different consumption goods is considered.

Second, equation (11) may be replaced in a general equilibrium framework by a production constraint that says that total production, a function of total effective labor ∫(lw)f(w)dw, must equal total consumption ∫cf(w)dw plus R. Here it is assumed that w measures productivity, so that wl is effective tasks performed by a person of type w in hours l. It is then straightforward to show that this general equilibrium model is equivalent to equations (10) through (12). Relative wages and effective hours or tasks per clock hour are exogenous, and the tasks performed by different individuals are perfect substitutes.

Third, note that the constraint (12) gives the model its special structure, in that it embodies the incentive constraint. Without it, it would be possible to go to the first best, using lump-sum taxation. It is interesting in this context that the first-best optimum would have utility decreasing in ability w, if consumption and leisure are normal goods (see Mirrlees (1979)). Intuitively, high lump-sum taxes on those with high skill lead to work being concentrated on the most productive (there is no difference between individuals on the consumption side). In the income tax model, it is assumed explicitly that the government cannot identify one type of individual from another and measures only an individual’s income (not his hours of work or wage). Thus, with this constraint, embodied in equation (12), utility must be nondecreasing in w because an individual of higher w always has the option of consuming the same as an individual of low w but doing less work.

Fourth, one cannot guarantee that, at the optimum, l(w) > 0 for all individuals. Thus, it may be optimum for some group of individuals with the lowest productivity to do no work.

Consider now some results in the Mirrlees model of nonlinear taxation. The general results (in the sense that they are independent of functional forms) that are available are rather few. Moreover, these results themselves may not hold if the model is modified, for example, to include complementarities between different types of labor (see, for example, Stern (1982)). The important ones in the Mirrlees model are the following.

(1) The marginal tax rate should be between zero and 1.

(2) The marginal tax rate for the person with the highest income should be zero.

(3) If the person with the lowest w is working at the optimum, then the marginal tax rate he faces should be zero.

Formal proofs of these propositions are not offered here, but some intuitive arguments are presented. (See Mirrlees (1971) and Seade (1977) for the formal treatment.) Consider first whether the marginal tax rate should ever exceed unity. This would imply that the reward for the marginal hour was negative. Hence, in the model, no one would choose to work where the marginal tax rate exceeds unity. Thus, it would be possible to replace any portion of the g( ) function that is downward sloping by a horizontal section without changing behavior (Figure 2), and attention can be confined to tax schedules with marginal rates that do not exceed unity.

Figure 2. Tax Function and Consumer Choice

Citation: IMF Staff Papers 1984, 002; 10.5089/9781451946918.024.A003

Figure 2 illustrates the tax function and consumer choice. For an individual with fixed w, indifference curves can be drawn in the pretax, post-tax income space, since the former represents work and the latter consumption. Through any point, the indifference curve for a person with higher w is, it is supposed, less steep than that for a person with lower w, since, at the given consumption level, the higher-w person is doing less work and, thus, needs less extra consumption to compensate him for doing the lower amount of extra work required for the extra dollar. This implies, in general, that a person with higher w locates to the right of (earns more money than) the person with lower w since, at the optimum for the person with lower w (tangency with g( )), the indifference curve for the person with higher w intersects g( ) from above (coming from the left).

The tax payment is given by the vertical difference from g( ) to the 45-degree line. Note that a movement of an individual parallel to the 45-degree line keeps revenue constant. It is possible to use this feature to show that the marginal tax rate cannot fall below zero. If it were to fall below zero at some income, then g( ) would be steeper than 45 degrees there, and, therefore, so would the indifference curve of any individual choosing that income. In that case, an equal-revenue shift of a person of type w in the southwest direction (less work and less consumption, parallel to the 45-degree line) would, intuitively, take him to a higher indifference curve, because the consumption he gives up is worth less to him than the work he saves.

An intuitive argument can be presented for the second result, as follows. Suppose, with some given income tax schedule, that the person with highest income earns $Y pretax and that the marginal tax rate is positive. Consider the option of lowering the marginal tax rate to zero for all incomes above $Y. The top person may now decide to work more (the reward for the marginal hour having gone up) and, if so, he is better off. The government has lost no revenue, since the tax payment on the income $Y has stayed constant. The utility of the top person has increased, that of others is no lower, government revenue is no lower, and, therefore, a Pareto-improving change that meets the constraints has been found. Accordingly, the given income tax schedule could not have been optimum, and the schedule that is optimum must have the property that the marginal tax rate at the top is zero. If those near the top elect to work more in response to the change, then they are both better off and pay more tax, so that the argument is reinforced.
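The argument can be checked numerically for a single top earner. The sketch assumes utility log c + log(1 − l), a hypothetical top wage of 3.0, and a hypothetical 40 percent linear tax; these numbers are illustrative only. It first finds the top person's choice under the linear schedule, then cuts the marginal rate to zero above his original pretax income and lets him re-optimize.

```python
import math

LEISURE_WEIGHT = 1.0   # assumed weight a in u = log c + a * log(1 - l)

def best_hours(w, tax, steps=10000):
    """Grid search over hours l in (0, 1); returns (utility, hours, tax paid) at the best l."""
    best = (-math.inf, 0.0, 0.0)
    for i in range(1, steps):
        l = i / steps
        y = w * l                          # pretax income
        c = y - tax(y)                     # post-tax income
        if c > 0.0:
            u = math.log(c) + LEISURE_WEIGHT * math.log(1.0 - l)
            if u > best[0]:
                best = (u, l, tax(y))
    return best

t, w_top = 0.4, 3.0                        # hypothetical marginal rate and top wage
u0, l0, tax0 = best_hours(w_top, lambda y: t * y)
y_top = w_top * l0                         # the top person's pretax income before the reform

# Reform: zero marginal rate above y_top. Below y_top the schedule is unchanged,
# so nobody else is affected, and the tax on y_top itself is the same as before.
u1, l1, tax1 = best_hours(w_top, lambda y: t * min(y, y_top))
print(f"hours {l0:.3f} -> {l1:.3f}, tax {tax0:.4f} -> {tax1:.4f}, utility rises: {u1 >= u0}")
```

The top person works more, is better off, and pays exactly the same tax, which is the Pareto improvement the argument describes.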

One should note that, where there is no highest income (that is, where the distribution of skills is unbounded above, so that there are individuals above any given skill level), one cannot deduce that the optimum marginal tax rate tends to zero as income rises. There are examples (Mirrlees (1971) and Atkinson and Stiglitz (1980, Lecture 13)) where it does not tend to zero (involving the Pareto distribution). One should remember, too, that the argument assumes that there are no externalities, so that making the top individual better off upsets no one. Further, the “top” may be at very high levels of income, and zero may be a poor approximation even within most of the top percentile. Nevertheless, the result is rather striking.

The argument for the third result concerning the zero marginal rate at the bottom is not given here in any detail. It proceeds along the following lines. Suppose that on a given schedule the marginal rate at the bottom is greater than zero. Consider a change in the lower end of the tax schedule that has the sole effect of inducing the bottom person to do a little more work and, thus, moving a small amount along the schedule. To the first order in utility, that person is no worse off, since his indifference curve was tangential to the schedule. But there is a first-order increase in tax revenue, since the marginal rate is positive. Hence, the given schedule is not optimum. (For formal discussion of this and the previous result, see Seade (1977).)

Thus, the general results in this particular model reveal that the marginal rate should be zero at the top and bottom. This contrasts strongly with many systems combining taxes and social security. This issue is treated briefly in Section IV.

Mirrlees (1971) presented a number of numerical calculations of the optimum nonlinear income tax, using the Cobb-Douglas utility function for consumption and leisure and using wage distributions based on data for the United Kingdom. From these examples he concluded the following.

(1) The optimum tax structure is approximately linear—that is, has a constant marginal tax rate, with an exemption level below which negative tax supplements are payable.

(2) The marginal tax rates are rather low, as Mirrlees (1971, p. 207) remarks: “I must confess that I had expected the rigorous analysis of income taxation in the utilitarian manner to provide arguments for high tax rates. It has not done so.”

(3) “The income tax is a much less effective tool for reducing inequalities than has often been thought” (Mirrlees (1971, p. 208)).

Stern (1976) investigated a wider class of utility functions and, in addition, looked at sensitivity with respect to the social welfare function and the level of government revenue but confined attention to linear taxation. He used the constant elasticity of substitution (CES) utility function:

$$u(c, l) = \left[\alpha (1 - l)^{-\mu} + (1 - \alpha)\,c^{-\mu}\right]^{-1/\mu} \tag{13}$$

with welfare criterion
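
$$W = \frac{1}{1 - v}\int u^{1-v} f(w)\,dw \quad (v \ne 1), \qquad W = \int \log u \; f(w)\,dw \quad (v = 1) \tag{14}$$
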
The tax function in the model is linear, so that the individual budget constraint is

$$c = (1 - t)\,w\,l + G \tag{15}$$

where t is the marginal tax rate and G the lump-sum grant (the same for everyone). The government budget constraint is

$$G + R = t \int w\,l(w)\, f(w)\,dw \tag{16}$$

where, as before, R is an exogenous revenue requirement and the number of individuals is normalized to 1, so that G is the total payment on lump-sum grants.

The CES in equation (13) has an elasticity of substitution between consumption and leisure of

$$\varepsilon = \frac{1}{1 + \mu} \tag{17}$$

One may use empirical estimates of labor supply functions to estimate ɛ, and Stern (1976, p. 136) suggested a number around 0.4, based on estimates for married males in the United States. Where the elasticity is less than unity, the labor supply function (for positive G) is forward sloping for low wages and backward sloping for higher wages. Note that the concept of labor supply in the models is much broader than the simple measure of hours used in the estimation of short-run supply functions. The Mirrlees (Cobb-Douglas) case corresponds to the limit as ɛ tends to 1 (μ tends to zero), and ɛ = 0 gives right-angle indifference curves (the case of zero substitution effect). One can show generally that, with ɛ = 0, the optimum marginal rate is 100 percent. Note that this is a zero compensated elasticity of labor supply, not inelastic labor supply. A selection of the results is given in Table 1.
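The backward-bending shape can be seen directly from the CES first-order condition. The sketch below assumes the CES form [α(1 − l)^(−μ) + (1 − α)c^(−μ)]^(−1/μ) with ɛ = 1/(1 + μ), an assumed weight α = 0.5 on leisure, and the linear budget c = (1 − t)wl + G; tangency fixes the ratio of consumption to leisure at [(1 − t)w(1 − α)/α]^ɛ, and the budget then pins down hours. The wage values are arbitrary.

```python
def labor_supply(w, t, G, eps, alpha=0.5):
    """Hours l chosen with CES utility [alpha*(1-l)**(-mu) + (1-alpha)*c**(-mu)]**(-1/mu),
    eps = 1/(1 + mu), and budget c = (1 - t)*w*l + G.  alpha = 0.5 is an assumed leisure weight."""
    omega = (1.0 - t) * w                           # net wage
    k = (omega * (1.0 - alpha) / alpha) ** eps      # tangency: consumption / leisure ratio
    leisure = (omega + G) / (k + omega)             # budget: c + omega * leisure = omega + G
    return max(0.0, 1.0 - leisure)                  # corner: no work if desired leisure >= 1

for eps in (0.4, 1.0):
    hours = [round(labor_supply(w, t=0.0, G=0.05, eps=eps), 3) for w in (0.001, 0.1, 10.0)]
    print(f"eps={eps}: hours at w = 0.001, 0.1, 10 -> {hours}")
```

With ɛ = 0.4, hours first rise and then fall as the wage increases; with ɛ = 1 (the Cobb-Douglas case) they rise throughout.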

Table 1.

Stern’s Calculations of Optimum Linear Tax Rates

(In percent)

Source: Stern (1976, Table 3).

A central estimate of the elasticity of substitution ɛ might be 0.4.

v = 0 corresponds roughly to an absence of aversion to inequality in incomes and v = ∞ to the Rawlsian maxi-min.

Total output (Y) in these models is around 0.25 (and endogenous), so that R = 0.05 corresponds to government spending (excluding transfer payments) of around 20 percent of gross national product. The grant G is equal to tY − R.

One may think of v as analogous to the elasticity of the social marginal utility of income, which is often used in analyses of measures of inequality using the Atkinson index (see Atkinson (1970)), since the utility function is homogeneous of degree 1 in consumption and leisure (doubling each would double utility) and is, thus, itself analogous to income. The specification of v then completes the statement of distributional value judgments. Values of v between 1 and 2 are quite commonly used. Dalton (1970, originally published in 1922, pp. 68–69) argued that Bernoulli’s law (or utility logarithmic in income, and marginal utility decreasing as the inverse of income), v = 1, “gives a rather slow rate of diminution of marginal utility” and found v = 2 “as best combining simplicity and plausibility” (although he was working in the context of equal absolute sacrifice). Whether these views of v helped him when he subsequently became Chancellor of the Exchequer is a matter for speculation.

National product in the model is endogenous but is mostly around 0.25. Hence, a revenue requirement R of 0.05 corresponds to around 20 percent of gross national product (GNP). The case v = 2, R = 0.05, ɛ = 0.4 gives a marginal tax rate of 54 percent. Of this 54 percent of GNP, 34 percent goes to transfer payments and 20 percent to goods and services. These results are not wildly out of line with tax rates (taking direct and indirect taxes together) in a number of developed countries. Thus, if one considers a wider class of cases than those used by Mirrlees, the computed tax rates may be rather higher.
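The budget arithmetic in this paragraph follows from the grant equation G = tY − R in the table notes, divided through by output Y, with the rounded values Y ≈ 0.25, R = 0.05, and t = 0.54 (all income in the model is wage income, so the tax raises tY):

$$\frac{R}{Y} \approx \frac{0.05}{0.25} = 0.20, \qquad \frac{G}{Y} = t - \frac{R}{Y} \approx 0.54 - 0.20 = 0.34$$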

In general, the tax rates increase with the aversion to inequality v, and with the revenue requirement R, but decrease with ɛ, the elasticity of substitution.
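A computation in the spirit of Table 1 can be sketched as a search over t, with the grant G set at each t by the government budget constraint G = tY − R. This is not Stern's program: the lognormal wage distribution (log-wage mean −1.0, standard deviation 0.39), the CES weight α = 0.5 on leisure, the 200 discrete types, and the 0.01 grid for t are all assumptions chosen for illustration, and the output should not be read as reproducing the table.

```python
import math
from statistics import NormalDist

ALPHA = 0.5                                  # assumed CES weight on leisure
LOG_WAGE_MEAN, LOG_WAGE_SD = -1.0, 0.39      # assumed lognormal wage distribution

def wage_types(n=200):
    """Equal-weight wage types at midpoint quantiles of the assumed lognormal distribution."""
    dist = NormalDist(LOG_WAGE_MEAN, LOG_WAGE_SD)
    return [math.exp(dist.inv_cdf((j + 0.5) / n)) for j in range(n)]

def labor_supply(w, t, G, eps):
    """CES hours with budget c = (1 - t)*w*l + G (tangency plus budget, with a corner at l = 0)."""
    omega = (1.0 - t) * w
    k = (omega * (1.0 - ALPHA) / ALPHA) ** eps
    return max(0.0, 1.0 - (omega + G) / (k + omega))

def utility(c, leisure, eps):
    if abs(eps - 1.0) < 1e-12:                 # Cobb-Douglas limit (mu -> 0)
        return leisure ** ALPHA * c ** (1.0 - ALPHA)
    mu = 1.0 / eps - 1.0
    return (ALPHA * leisure ** -mu + (1.0 - ALPHA) * c ** -mu) ** (-1.0 / mu)

def output(t, G, eps, ws):
    return sum(w * labor_supply(w, t, G, eps) for w in ws) / len(ws)

def balanced_grant(t, R, eps, ws, tol=1e-10):
    """G with G = t*Y - R (population normalized to 1); None if no positive G balances."""
    excess = lambda G: t * output(t, G, eps, ws) - R - G   # decreasing in G: leisure is normal
    if excess(0.0) <= 0.0:
        return None
    lo, hi = 0.0, t * sum(ws) / len(ws)                     # excess(hi) <= -R since hours <= 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if excess(mid) > 0.0 else (lo, mid)
    return 0.5 * (lo + hi)

def welfare(t, G, eps, v, ws):
    """Discrete version of the welfare criterion: mean of u**(1 - v)/(1 - v), or of log u if v == 1."""
    total = 0.0
    for w in ws:
        l = labor_supply(w, t, G, eps)
        u = utility((1.0 - t) * w * l + G, 1.0 - l, eps)
        total += math.log(u) if v == 1 else u ** (1.0 - v) / (1.0 - v)
    return total / len(ws)

def optimum_linear_tax(eps=0.4, v=2.0, R=0.05, ws=None):
    """Grid search over t in steps of 0.01; returns (t, G, welfare) at the best feasible t."""
    ws = ws if ws is not None else wage_types()
    best = None
    for i in range(1, 100):
        t = i / 100
        G = balanced_grant(t, R, eps, ws)
        if G is not None:
            W = welfare(t, G, eps, v, ws)
            if best is None or W > best[2]:
                best = (t, G, W)
    return best

print(optimum_linear_tax())
```

Rerunning `optimum_linear_tax` with different `v`, `R`, and `eps` is one way to explore the comparative statics stated above, under these assumed parameters.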

the combination of income and commodity taxes

The question of the appropriate combination of income and commodity taxation provides fertile ground for confusion. Prest (1975, originally published in 1960, p. 34) refers to the “first and best-known problem in tax analysis…. the contention that the allocative effects of indirect taxes are inferior to those of direct taxes….” The contention in its simple form is mistaken, since there is an excess burden, or deadweight loss, associated with the divergence between consumer and producer prices for labor (and, thus, the income tax), just as with other goods. A second example concerns the often-heard claim that a switch from income tax to indirect taxes such as a value-added tax (VAT) would increase work effort. At the simple level, this is clearly false, since an increase in prices (from the VAT), together with an increase in earnings (from the reduction in income tax), would leave the incentive to work unchanged. Perhaps the argument is intended to be more subtle, depending on intertemporal allocations and expectations, on progressivity, or on the existence of lump-sum incomes, for example, but it is usually presented in a naive form, such as “taxing spending rather than earning induces work.”
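The simple equivalence can be made explicit. The sketch assumes a single worker with no income other than wages, quasi-linear utility u = c − l²/(2b) (chosen only because it gives a closed-form labor supply), and a producer price of 1 for consumption. A 30 percent income tax and a VAT of 3/7 on consumer spending give the same real net wage, since 1/(1 + 3/7) = 0.7, and so produce the same hours and the same revenue.

```python
def hours_chosen(w, income_tax, vat, b=1.0):
    """Hours under the assumed quasi-linear utility u = c - l**2 / (2*b).

    Budget: (1 + vat) * c = (1 - income_tax) * w * l, with the producer price of c
    normalized to 1, so the first-order condition gives l = b * real net wage."""
    real_net_wage = (1.0 - income_tax) * w / (1.0 + vat)
    return b * real_net_wage

def revenue(w, income_tax, vat):
    """Income tax on earnings plus VAT on consumption valued at producer prices."""
    earnings = w * hours_chosen(w, income_tax, vat)
    consumption = (1.0 - income_tax) * earnings / (1.0 + vat)
    return income_tax * earnings + vat * consumption

w = 1.0
income_tax_only = (0.3, 0.0)
vat_only = (0.0, 3.0 / 7.0)    # 1 / (1 + 3/7) = 0.7, the same real net wage factor
for regime in (income_tax_only, vat_only):
    print(hours_chosen(w, *regime), revenue(w, *regime))
```

With a lump-sum income, such as a fixed transfer, the two regimes would no longer coincide, since the VAT would also fall on that income; this is one of the more subtle channels mentioned above.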

It transpires that one can show that, under certain conditions, one would want to tax income rather than goods, but it should be stressed that those conditions are very special. The argument depends critically on particular features of the model and involves some difficulty. Furthermore, it is not easy to come to a judgment as to how the obvious fact of the divergence of the world from these special conditions should influence views on the balance between direct and indirect taxation. Thus, the subject involves difficulty in analysis and difficulty in interpretation, and one must beware of simple arguments or contentions such as those described.

Details of the theorems on the optimum combination of income and commodity taxes are not presented here; the theorems are explained briefly to highlight the importance of the assumptions. There are essentially two theorems, dealing with (1) the case where there is a linear income tax and (2) the case where there is a nonlinear income tax.

Note that, if individuals are identical, then the basic theorem of welfare economics states that the “first best” can be reached with a poll tax to raise the required government revenue and with zero marginal taxation of income and goods. Where individuals differ, however, one needs some combination of income and commodity taxes, and each of these is distortionary in that marginal rates of substitution between labor and goods or among goods in consumption are not equal to marginal rates of transformation in production. Of course, some distortionary taxation is always optimum in second-best problems, since a marginal imposition of taxes from the point of zero taxation involves zero deadweight loss and is desirable if it improves distribution.

For the first theorem, it is assumed that a linear income tax is available in the form of a lump-sum grant or tax (the same for everyone) and that there is a constant marginal rate on labor income. As seen previously, a constant marginal rate on labor is, in this context, equivalent to a proportional tax rate on all goods (and a proportional adjustment to the lump-sum grant/tax), since it is assumed that there are no sources of income other than the lump-sum grant/tax and wages.

The first-order conditions for the optimum indirect taxes are given as before by equation (8). The condition for the optimality of the lump-sum grant is that b = 1—that is, the grant is adjusted to the point where the benefit in terms of net social welfare of the marginal dollar (the average of the social marginal utilities of income) is equal to the cost to the government (one dollar). Substituting this condition in equation (8) yields
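
$$\frac{\sum_k t_k \sum_h S_{ik}^h}{X_i} = -(1 - r_i) \tag{18}$$

where

$$r_i = \frac{\sum_h b^h x_i^h}{b\,X_i}, \qquad X_i = \sum_h x_i^h \tag{19}$$
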
Recall that $r_i$ is 1 plus the normalized covariance between consumption of the ith commodity by the hth household and the net social marginal utility of income $b^h$, and it is thought of as the distributional characteristic of good i. If the government is indifferent to distributional considerations, in that it sees $b^h$ as equal for all households, then $r_i$ is equal to 1, and the right-hand side of equation (18) is zero. Indirect taxes are then zero, and all revenue is raised through the lump-sum instrument (a poll tax), as in the case of identical individuals. Thus, in this sense indirect taxes are desirable only because distributional considerations arise (remember that there are no externalities and that revenue can be raised through a poll tax).

It has been seen that indirect taxes appear because of the interest in distribution, but this does not reveal what form the indirect taxes should take. The taxation of goods consumed by the rich provides some progressivity, but indirect taxes also play the role of raising revenue to increase the progressive lump-sum grant (or reduce the regressive tax), and the taxation of necessities may be an efficient way to do this (as in the Ramsey rule). The way in which these two considerations balance depends critically on the form of the differences among the population and on the structure of demand functions. This is illustrated by the first of the theorems, which is as follows.

If there is an optimum linear income tax, individuals differ only in the wage rate, and the direct utility function has the Stone-Geary form,


then the optimum indirect taxes are uniform—that is, the proportion of tax in consumer price (ti/qi) is the same for all goods. The result follows from equations (18) and (19), using b = 1 and substituting for the specific form of the Slutsky terms derived from equation (20). The result was established by Atkinson (1977).

Deaton (1979 and 1981) shows that this applies in a class of cases slightly wider than the linear expenditure system. The important conditions are that the Engel curves are linear and identical (that is, for each good everyone has the same constant marginal propensity to consume and the same minimum requirement $x_i^0$)⁴ and that there is weak separability (see equation (21)) between leisure and goods. Deaton (1979) also shows that if a subgroup of goods satisfies these two conditions, then taxes should be uniform for the subgroup.

The second theorem states that, if there is an optimum nonlinear income tax, individuals differ only in the wage rate, and the direct utility function has goods weakly separable from labor, in the sense that utility can be written

$$u = U\bigl(\eta(x),\, l\bigr) \tag{21}$$

where η is a scalar function of the vector of goods x, then optimum indirect taxes are uniform. Weak separability means that the marginal rate of substitution between any two goods is independent of labor (or leisure). The proof of the theorem is not attempted here, since it involves use of the calculus of variations (where one assumes a continuous distribution of wages and integrates across wages). Intuitively, individuals differ only in their wages, and hence only in their labor, which separates out from the utility function; therefore, a flexible tax instrument that concentrates on labor income cannot be improved upon by indirect taxation. The plausibility of the result suggests that it is likely that more direct proofs and arguments will be established in the future. Note that the more sophisticated income tax in the second theorem requires a weaker assumption on preferences.

The importance and interpretation of these two theorems are discussed in Section IV, but this section closes by emphasizing an important point. The taxes emerging from optimum tax models depend critically on the combination of three sets of assumptions: (1) the form of differences between households, (2) the range of tax tools assumed to be available, and (3) the structure of preferences. These are assumptions made before specific parameter values, social welfare judgments, and revenue requirements are entered into the model, and the results are also sensitive to these subsequent selections.

III. Production

Up to this point, it has been assumed that producer prices are fixed, and production has been put aside. In a competitive model, producer prices are generally independent of demand only where the nonsubstitution theorem applies (constant returns to scale, no joint production, and a single nonproduced input). The assumption of fixed producer prices allows one to concentrate on consumer welfare and government revenue, but one also wants to know a little more about taxes and production outside the framework of the nonsubstitution theorem. The original Diamond-Mirrlees papers (1971) were entitled “Optimal Taxation and Public Production,” and they devoted considerable attention to the production side, although this has received less emphasis in the subsequent literature.

The title of the Diamond-Mirrlees papers immediately emphasizes one obvious but important point. Taxation decisions and production are closely linked through the equilibrium of the economy, and fiscal choices and physical planning should be seen as part of the same overall policy framework. For example, if production of some publicly produced and nontradable good is to be restricted, then, in the absence of rationing, its price (or, equivalently, the tax) should be high. Too often the fiscal and quantity sides are separated in the policy process.


A central set of results in the optimum taxation literature concerns circumstances under which production efficiency is desirable (that is, where it is a feature of the optimum). A production plan is defined to be efficient if it is impossible to have more of one good without having less of another. (If the convention is adopted that an input is a negative output, then the definition covers factor inputs as well.) The definition is taken to be synonymous with being on the frontier of the production possibility set. A necessary condition for efficiency is that the marginal rate of transformation between two goods should be the same for all enterprises where the two goods are transformed on the margin one into the other. If the enterprises are profit maximizers at fixed prices (or are required to maximize shadow profits at shadow prices), then equality of the marginal rates of transformation across enterprises requires equality of relative prices (or relative shadow prices) across enterprises.
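A small numerical example shows why unequal marginal rates of transformation waste output. The two technologies below, 2√x and √x, each turning an input x into the same output (valued at a price of 1), are hypothetical. With 10 units of input in total, the allocation that equates the two marginal products (8 units to the first enterprise, 2 to the second) produces more than an equal split; and if both enterprises maximize profit at a common input price, each sets its marginal product equal to that price, so the efficient split emerges without central coordination.

```python
import math

# Two hypothetical enterprises turning the same input x into the same output.
def output_a(x):
    return 2.0 * math.sqrt(x)

def output_b(x):
    return math.sqrt(x)

def marginal_product(f, x, h=1e-6):
    """Central-difference marginal product: the marginal rate of transformation here."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def input_demand_a(r):
    return 1.0 / r**2              # profit maximum: 1/sqrt(x) = r

def input_demand_b(r):
    return 1.0 / (4.0 * r**2)      # profit maximum: 1/(2*sqrt(x)) = r

X = 10.0                                           # total input available
equal_split = output_a(5.0) + output_b(5.0)        # marginal products differ
efficient = output_a(8.0) + output_b(2.0)          # marginal products equal
r = math.sqrt(1.25 / X)                            # common price at which demands sum to X
print(equal_split, efficient, input_demand_a(r), input_demand_b(r))
```

If one enterprise instead faced a taxed input price, the two marginal products would differ at the profit-maximizing choices, which is the inefficiency from taxing intermediate goods discussed below.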

It is important here to distinguish between efficiency of the whole productive sector of the economy and efficiency of the public sector taken by itself. The former is called aggregate productive efficiency, and the latter is called public sector efficiency. The desirability of public sector efficiency is a very general and robust result. It says simply that, if it is possible for the public sector to produce more at no extra cost in resources, then it should do so. One requires only that the public sector should have a beneficial way of disposing of extra output.

Aggregate productive efficiency at the optimum can, in general, be established only where all final goods may be taxed and where the private sector is competitive and pure profits are zero. It is therefore a rather narrow result. The reason that zero pure profits are required is that, without this assumption, it is necessary to consider the consequences of a reform for the incomes of profit earners. Where these profits exist, an attempt to produce extra output may increase some profits and lower others with, possibly, adverse effects on income distribution. All final goods are required to be taxed because, in the absence of this assumption, one may want to tax an input as a means of taxing the output (which it may be desirable to tax for reasons embodied in the models of Section II). Broadly speaking, there are three ways of assuming away private profits for the purpose of the theorem on aggregate productive efficiency; one can assume (1) that all production is in the public sector, (2) that all pure profits are taxed, or (3) that there is perfect competition, with constant returns to scale.

The consequences of the results concerning efficiency are important. With public sector efficiency, it is required that all public sector firms should face the same relative shadow prices, or market prices, if financial targets are fixed in these terms. This requires more coordination of public sector production than is perhaps present in many countries. Also, if foreign trade at fixed prices for certain goods is part of public sector activity, then relative world prices for these goods give shadow prices. Foreign trade is simply one way of transforming one good into another, and marginal rates of transformation in this activity should be equal to those elsewhere.

In the restrictive class of cases where aggregate efficiency is desirable, public sector shadow prices should be equal to the market prices faced by private producers. Taxation of goods should fall on final goods only, for, if intermediate goods are taxed, different producers face different relative prices. It is important to appreciate the circumstances under which departures from aggregate efficiency are desirable. Inputs should be taxed only where the taxation of final outputs is not possible or where it is necessary to improve the distribution of profit income. Thus, it is natural to question closely whether the taxation of the final good or of the profits is possible before resorting to the substitute measure.

More generally, shadow prices provide a valuable tool for integrating into the analysis of tax reform questions that concern production where markets may be distorted. This is an area where research is under way. (A book edited by D.M.G. Newbery and N.H. Stern, including several chapters on this subject and with the provisional title “The Theory of Taxation for Developing Countries,” is forthcoming from the World Bank.)


It should be emphasized that the discussion of taxes and efficiency here includes tariffs. Where efficiency is desirable, there should be no tariffs on intermediate goods. A VAT system that rebates taxes paid by producers on imported inputs, or a purchase tax on final goods, would have this property. Note that there is no presumption that uniform tariffs help with efficiency in this respect. The relevant question is whether taxes on inputs are rebated, and uniformity is irrelevant to this. In general, uniform tariffs on intermediate goods imply inefficiency. Further, they would not lead to uniform taxes on final goods, were these desirable (see Sections II and IV), since uniform taxation of inputs becomes nonuniform taxation of outputs through the production process, which involves different factor intensities and different intensities in the use of imported goods.
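A small numerical sketch of the last point, under assumptions that are not in the text: two hypothetical final goods (the names and import shares are invented), producer prices normalized to one, fixed input coefficients, and a tariff on imported inputs that is not rebated and is passed forward in full. A uniform tariff then shows up as different implicit taxes on the two outputs.

```python
# Hypothetical value of imported inputs per unit of output (producer prices = 1).
IMPORT_SHARE = {"textiles": 0.10, "machinery": 0.45}


def effective_output_tax(tariff, import_share):
    """Implicit tax per unit of output when an unrebated tariff on imported
    inputs is passed forward in full (fixed input coefficients assumed)."""
    return tariff * import_share


uniform_tariff = 0.20
for good, share in IMPORT_SHARE.items():
    rate = effective_output_tax(uniform_tariff, share)
    print(f"{good}: implicit tax on output = {rate:.1%}")
```

With a uniform 20 percent tariff the sketch gives implicit output taxes of 2 percent and 9 percent. Indirect use of imported inputs (through other inputs) is ignored here; capturing it requires an input-output calculation.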

IV. Applications to General Arguments

One of the main purposes of economic theory is to sort out correct from incorrect arguments and to help establish reliable rather than unreliable intuition. This section attempts to draw together from the theory some lessons of this type. It begins by setting out three general principles emerging from the analysis, then looks at the question of uniformity of indirect taxation, and comments briefly on the income tax.

general principles

The principles are stated below in summary form before the discussion of their foundation and interpretation.

Principle 1

Tax revenue is raised most efficiently by taxing goods or factors with inelastic demand or supply. Note that this abstracts from distributional questions, that inelasticity refers to compensated demands and supplies, and that care should be taken with the pattern of complements and substitutes.
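A rough numerical illustration of Principle 1, using the familiar small-tax (Harberger-triangle) approximation rather than anything derived in this paper: for a good with expenditure E and compensated own-price elasticity ε, an ad valorem tax at rate τ costs roughly ½ετ²E in excess burden. The sketch assumes independent demands, fixed producer prices, and no distributional weights, which are exactly the qualifications the principle attaches; all numbers are hypothetical.

```python
def harberger_loss(elasticity, tax_rate, expenditure):
    """Approximate excess burden 0.5 * e * tau**2 * E for a small ad valorem tax
    (compensated own-price elasticity e, independent demands, fixed producer prices)."""
    return 0.5 * elasticity * tax_rate ** 2 * expenditure


REVENUE = 10.0       # hypothetical revenue target
EXPENDITURE = 100.0  # hypothetical spending on each candidate tax base
tax_rate = REVENUE / EXPENDITURE  # first-order approximation: ignores the fall in the base

for label, elasticity in [("lump-sum (e = 0)", 0.0),
                          ("inelastic good", 0.2),
                          ("elastic good", 1.5)]:
    print(label, round(harberger_loss(elasticity, tax_rate, EXPENDITURE), 3))
```

Raising the same revenue costs about 0.1 in excess burden from the inelastic good, 0.75 from the elastic one, and nothing from the lump-sum tax: under this approximation the loss scales with the compensated elasticity.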

Principle 2

Taxation concerned with distribution and with externalities or market failures should go as far as possible to the root of the problem. Thus, for distribution one should look for the sources of inequality (such as land endowments or earned incomes) and concentrate taxation there, and for externalities one should attempt to tax or subsidize directly the good or activity producing the externality. However, it is often impossible to deal completely with an issue directly, and this will have important consequences for other policies.

Principle 3

It must be recognized that it is impossible to deal perfectly with questions of distribution and market failure directly. Dealing fully with distribution, for example, would strictly require a full set of lump-sum taxes. Thus, the target-instrument approach may be treacherous in a second-best world. In this context, a number of policy tools are required, and for any particular policy it is necessary to ask how it affects all the objectives (including distribution). The optimum policy for any one tax is often very sensitive to assumptions concerning the existence and levels of other taxes.

The principles are discussed in turn and are related to the preceding analysis. This analysis started with the basic theorems of welfare economics, which establish clearly that the first-best way of raising revenue is a set of lump-sum taxes. The tax payment itself is then completely inelastic, in that the behavior of individuals cannot affect the payment.

(1) This points to the first principle. The discussion of the Ramsey problem concerning indirect taxes led in the same direction but cautioned that it was the compensated demands that were relevant. Any system of taxation has income effects, and one distinguishes among tax systems by the “excess burden,” which refers to distortions in compensated demands.

The pattern of substitutes and complements, in general, is central to the relevant notion of elasticity. For example, away from the optimum, a small increase in indirect taxation may yield a great deal of revenue at little cost, if it leads to a sharp switch in demand to goods that are heavily taxed. One must beware of notions of increasing marginal distortion. In a second-best world, one cannot, for example, assume that a reduction in indirect taxes and an increase in lump-sum taxes increases welfare (see Atkinson and Stern (1974))—although under certain circumstances it will, and one can apply the theory of reform (see Section V) to check whether such a switch will be desirable.

The notion underlying the first principle has been appreciated for a considerable time (by Henry George, Wicksell, Hotelling, and so on), but its application to indirect taxes and income taxes requires care. For example, it was seen in the preceding discussion of the income tax that 100 percent marginal taxation would be indicated where the compensated elasticity of substitution between consumption and leisure is zero. This is not the same as a vertical supply curve for labor, which involves simply the balancing of income and substitution effects.
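The distinction can be checked with a textbook case that is not taken from the paper: Cobb-Douglas utility u = a ln c + (1 − a) ln ℓ over consumption c and leisure ℓ, a unit time endowment, wage w, and no non-labor income. The uncompensated labor supply curve is then vertical, yet the compensated elasticity of substitution is one, not zero. The parameter values below are hypothetical.

```python
import math


def utility(a, consumption, leisure):
    return a * math.log(consumption) + (1.0 - a) * math.log(leisure)


def marshallian_labor(a, w):
    # Maximizing utility subject to c + w*leisure = w gives leisure = 1 - a:
    # income and substitution effects of a wage change cancel, so w drops out.
    return a


def compensated_labor(a, w, u):
    # Minimizing c + w*leisure subject to utility(a, c, leisure) = u gives
    # leisure = exp(u) * (a*w/(1-a))**(-a).
    leisure = math.exp(u) * (a * w / (1.0 - a)) ** (-a)
    return 1.0 - leisure


a = 0.5
u0 = utility(a, a * 1.0, 1.0 - a)  # utility actually chosen at the initial wage w = 1
for w in (1.0, 2.0):
    print(w, marshallian_labor(a, w), round(compensated_labor(a, w, u0), 3))
```

Doubling the wage leaves uncompensated labor supply at 0.5 but raises compensated labor supply to about 0.646, so a vertical supply curve says little about the compensated response that matters for the marginal rate.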

(2) Again, the basic theorem of welfare economics illustrates the second principle, in that distribution would be dealt with entirely through lump-sum taxes. And it is illustrated by the theorems of the subsection on the combination of income and commodity taxes, where the optimum income tax was the only policy tool required when differences arose solely in earning capacity. But it was seen that this result required other very strong assumptions concerning the structure of preferences. Thus, while the target-instrument approach can point to certain taxes, it should never delude the policymaker in a second-best world into thinking that he can forget about a target, such as distribution, after mentally allocating some tax to it. Where the instruments used are imperfect, it is necessary to consider all objectives in the study of any one instrument.

(3) The third principle is closely linked to the second. It has been seen that the desirability and structure of a differential system of commodity taxes depended crucially on the assumptions concerning the existence of the income tax and, indeed, on the type of income tax available. Taxation of necessities may be attractive where the revenue is used to provide a lump-sum grant but unattractive where no such lump-sum grant is possible. A narrow view of targets and instruments or of the policy tools available runs the risk of considerable error.

desirability of uniform taxation

Attention is focused now on the question of the desirability, or otherwise, of uniform taxation. One should distinguish sharply at the outset between consumer and producer taxation. In Sections II and III, it was seen that indirect taxes should, where possible, be concentrated on final goods only. Thus, apart from special or particular arguments, the discussion here concerns taxes that fall on goods equally, regardless of origin. This means that tariffs should be rebated on intermediate goods where it is possible to tax the final goods directly.

The discussion begins by questioning whether the taxes on final goods should be uniform. In general, the results from the many-person Ramsey analysis in the subsection on the combination of commodity and income taxation indicate that there is no presumption in favor of uniform taxes. It has been seen that the Ramsey rule balances two considerations. On the one hand, inelasticities are exploited, in the sense that taxes push toward equal proportional reductions in quantities; on the other, demand is reduced less for those goods consumed mainly by those who are worse off. The manner in which these two effects combine depends on the social values and the structure of demands for different individuals and groups.
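The second consideration can be made concrete with a distributional characteristic in the spirit of Feldstein (1972): a consumption-weighted average of the welfare weights β^h of the households that buy a good. A minimal sketch with invented data for three hypothetical households and two goods; dividing by aggregate consumption X_i is one common normalization and is assumed here.

```python
# Hypothetical welfare weights (higher for poorer households), poorest first,
# and household consumption of each good.
BETA = [1.0, 0.6, 0.3]
CONSUMPTION = {
    "cereals": [50.0, 60.0, 70.0],
    "cars": [0.0, 20.0, 80.0],
}


def distributional_characteristic(beta, x):
    """sum_h beta_h * x_h / X: high when a good is consumed mainly by
    households with high welfare weights."""
    total = sum(x)
    return sum(b * xh for b, xh in zip(beta, x)) / total


for good, x in CONSUMPTION.items():
    print(good, round(distributional_characteristic(BETA, x), 3))
```

Here cereals score about 0.594 and cars 0.36. Other things equal, the many-person Ramsey rule calls for a smaller proportional reduction in compensated demand for the good with the higher characteristic.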

The Ramsey rule is modified in an important way if income taxes are allowed, and it has been seen that in very special circumstances uniformity might be desirable. For the linear income tax, these circumstances involve, for example, differences in income across individuals arising only from the wage rate, a special structure of preferences (essentially the linear expenditure system), and marginal propensities to spend on each good being identical across individuals. The special nature of the conditions implies that uniformity is a poor guide for developing countries. Individuals differ in many ways, particularly in endowments (of land, for example), but also in preferences (religion, caste, and education, for example, may have an important bearing). And the income tax is often an instrument of marginal importance. Further, the linear expenditure system (as Deaton (1974) has argued persuasively) is an implausible representation of demands.

It is much more difficult to prescribe, however, than to point to the inadequacies of other prescriptions. The derivation of the appropriate set of commodity taxes requires information concerning patterns of complements and substitutes that is very difficult to extract from the data. And attempts to do this require specifications of functional forms that (as seen in the subsection on the combination of commodity and income taxes) may have a profound effect on the recommendations. As Deaton (1981, p. 1245) points out: “In consequence, it is likely that empirically calculated tax rates, based on econometric estimates of parameters, will be determined in structure, not by the measurements actually made, but by arbitrary, untested (and even unconscious) hypotheses chosen by the econometrician for practical convenience.” One way to go forward is the analysis of reform (small movements from a given starting position), which is described briefly in the next section.

It has been seen from the discussion in Section III that there is absolutely no presumption that uniformity of taxes on intermediate goods is desirable. In general, taxes on intermediate goods lead to inefficiencies, and there is no reason to suppose that uniform taxes lead to any less inefficiency than some arbitrary set. Taxation of intermediate goods should be avoided unless taxing a particular final good is difficult (and its inputs might then be taxed) or unless it improves the distribution of profits where this is not possible by other means.

behavior of the marginal rate

It was seen in the subsection on the combination of commodity and income taxes how the marginal rate of the linear income tax increases with the revenue requirement and the aversion to inequality and how it decreases with the elasticity of substitution between consumption and leisure. The discussion of the nonlinear tax showed that intuition about optimal schedules needs to be tutored carefully. There is no presumption, for example, of an increasing marginal rate. Indeed, the optimum schedule, in general, shows first an increasing marginal rate and then a decreasing one. And in the calculations of Mirrlees (1971), the peak of the marginal rate was fairly centrally placed. One should note carefully, however, that this behavior of the marginal rate need not offend against any notion of the desirability of progression. Such notions should be related to the average rate; it is quite possible for the marginal rate to have this shape while the average rate increases over much of the income range. For this it is necessary only that the marginal rate exceed the average rate, and, where there is a uniform lump-sum grant, this is quite likely to be the case over a wide range. Note, in any case, that a statement concerning the desirability, or otherwise, of an increasing average rate should itself be derived from a model of incentives and distribution and should not simply be assumed to be obvious.
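A minimal numerical illustration, using a hypothetical piecewise-linear schedule whose thresholds, rates, and grant are invented (not taken from Mirrlees (1971) or any actual tax system). The marginal rate rises from 30 to 50 percent and then falls to 45 percent. Because a uniform lump-sum grant is paid and the marginal rate stays above the average rate T(y)/y, the average rate rises throughout.

```python
# Hypothetical schedule: (lower threshold, marginal rate), thresholds ascending.
BRACKETS = [(0.0, 0.30), (1000.0, 0.50), (3000.0, 0.45)]
GRANT = 100.0  # uniform lump-sum grant paid to everyone


def marginal_rate(y):
    rate = BRACKETS[0][1]
    for threshold, r in BRACKETS:
        if y >= threshold:
            rate = r
    return rate


def net_tax(y):
    """Tax paid net of the grant: T(y) = -G + integral of the marginal rate from 0 to y."""
    total = -GRANT
    for i, (lo, r) in enumerate(BRACKETS):
        hi = BRACKETS[i + 1][0] if i + 1 < len(BRACKETS) else float("inf")
        if y > lo:
            total += r * (min(y, hi) - lo)
    return total


def average_rate(y):
    return net_tax(y) / y


for y in (500.0, 1000.0, 2000.0, 3000.0, 5000.0, 10000.0):
    print(y, marginal_rate(y), round(average_rate(y), 3))
```

In each bracket the average rate has the form m − k/y with k > 0, so it approaches the marginal rate m from below and keeps rising even after the marginal rate falls.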

A marginal rate that at first increases and then decreases is in striking contrast to the apparent state of affairs in the United Kingdom, where means-tested benefits give high marginal rates at the bottom, and the income tax schedule shows increasing marginal rates. One suspects that the policy has not been designed in any systematic way.

V. Tax Reform

Tax reform means a movement away from some given status quo. This paper concentrates on marginal movements. The methods are sketched only briefly here in order to illustrate an application and possible additional applications. (Further discussion may be found in Ahmad and Stern (1983 a and b).) Suppose that some vector of tax tools $t$ is in operation; the resulting level of social welfare is then $V(t)$ and that of government revenue is $R(t)$. $V(t)$ can be thought of as being defined by a Bergson-Samuelson social welfare function, as before. Consider an increase in the $i$th tax, $t_i$, that is sufficient to raise one dollar of extra revenue. The rate of change of revenue with respect to the tax is $\partial R/\partial t_i$; hence, to raise one extra dollar, one must increase the tax by $(\partial R/\partial t_i)^{-1}$.

The rate of change of welfare with respect to the tax is $\partial V/\partial t_i$. The fall in welfare, $\lambda_i$, is defined as the reduction in $V$ consequent upon raising one more dollar by increasing the tax on the $i$th good:

$$\lambda_i = -\frac{\partial V/\partial t_i}{\partial R/\partial t_i} \tag{22}$$

One may think of $\lambda_i$ as the marginal cost, in terms of social welfare, of raising one more dollar from the $i$th tax. If the marginal cost for tax $i$ exceeds that for tax $j$, then a beneficial reform is to switch taxation on the margin from $i$ to $j$. Thus, if $\lambda_i > \lambda_j$, there is a gain in welfare of $\lambda_i - \lambda_j$ from raising one more dollar via tax $j$ and one less dollar via tax $i$. More generally, for any reform $\Delta t$, one asks what its consequences are for welfare, $\Delta V$, and for revenue, $\Delta R$; it is beneficial if $\Delta V > 0$ and $\Delta R \geq 0$. The statistics $\lambda_i$ guide the selection of beneficial reforms.
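A minimal sketch of how the $\lambda_i$ guide a revenue-neutral switch. The three taxes and their derivatives are invented numbers standing in for estimates at the status quo; the welfare change is computed only to first order.

```python
# Hypothetical derivatives at the status quo (invented numbers).
dV = {"tax_A": -1.2, "tax_B": -0.9, "tax_C": -0.5}  # dV/dt_i
dR = {"tax_A": 1.0, "tax_B": 1.0, "tax_C": 0.8}     # dR/dt_i


def marginal_social_cost(dv, dr):
    """lambda_i = -(dV/dt_i) / (dR/dt_i): welfare lost per extra dollar raised via tax i."""
    return -dv / dr


lam = {k: marginal_social_cost(dV[k], dR[k]) for k in dV}
costly = max(lam, key=lam.get)  # raise one dollar less here...
cheap = min(lam, key=lam.get)   # ...and one dollar more here

# Tax changes that move exactly one dollar of revenue (to first order).
dt_costly = -1.0 / dR[costly]
dt_cheap = 1.0 / dR[cheap]
revenue_change = dR[costly] * dt_costly + dR[cheap] * dt_cheap  # zero by construction
welfare_gain = dV[costly] * dt_costly + dV[cheap] * dt_cheap    # equals lam[costly] - lam[cheap]

print(lam, costly, cheap, round(welfare_gain, 3))
```

With these numbers tax A has the highest marginal social cost and tax C the lowest, so the sketch suggests shifting a marginal dollar from A to C. The first-order welfare gain equals $\lambda_A - \lambda_C$, as in the text.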

As there is, in general, a whole collection of beneficial reforms, one should not expect uniqueness. Choice among beneficial reforms is usually based on criteria that cannot be put directly into the model. Second-best analysis in this case provides a range of desirable options and, thus, is far from being nihilistic or pessimistic, as it is sometimes portrayed.

The optimum is the state of affairs from which no beneficial reform is possible; thus, the theories of optimality and of reform are very close. Here, optimality requires that all the $\lambda_i$ are equal (call the common value $\lambda$); thus
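
$$\frac{\partial V}{\partial t_i} + \lambda \frac{\partial R}{\partial t_i} = 0 \quad \text{for all } i$$
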
This is precisely the first-order condition for optimality that emerges from the problem:

$$\max_{t}\; V(t) \quad \text{subject to} \quad R(t) = \bar{R} \tag{23}$$

Note that away from the optimum there are as many marginal costs of public funds as there are tax tools, and it is misleading to speak of a unique marginal cost of public funds. The Ramsey rule, the many-person Ramsey problem, and the linear income tax are all examples of optimizing models that take the form of equation (23).
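The step from problem (23) to the equality of the $\lambda_i$ can be written out with a Lagrangian. A short sketch of the standard argument, assuming an interior optimum and differentiable $V$ and $R$, with $\bar{R}$ denoting the required revenue:

$$\mathcal{L}(t,\lambda) = V(t) + \lambda\,[R(t) - \bar{R}]$$

$$\frac{\partial \mathcal{L}}{\partial t_i} = \frac{\partial V}{\partial t_i} + \lambda\,\frac{\partial R}{\partial t_i} = 0 \quad\Longrightarrow\quad \lambda = -\frac{\partial V/\partial t_i}{\partial R/\partial t_i} = \lambda_i \quad \text{for every } i$$

The multiplier $\lambda$ is thus the common marginal social cost of a dollar of revenue, and at the optimum every $\lambda_i$ in (22) takes that value.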

application in India

In work described in Ahmad and Stern (1983 a and b), this framework was applied to the question of tax reform in India. The question of resource mobilization was approached by asking about the marginal cost, in terms of social welfare, of raising revenue by different means. This included the comparison of taxation of different goods, of state and central taxes, and of indirect taxes and the income tax. For indirect taxes, for example, at fixed producer prices, the equations are
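
$$\frac{\partial V}{\partial t_i} = -\sum_h \beta^h x_i^h \tag{24}$$

$$\frac{\partial R}{\partial t_i} = X_i + \sum_j t_j \frac{\partial X_j}{\partial t_i} \tag{25}$$
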
where $\beta^h$ is the social marginal utility of income for household $h$ and the other notation is as in Section II. Equation (24) may be derived intuitively by noting that, at fixed producer prices, an increase in the tax on good $i$ raises its consumer price one for one and so costs household $h$, in money terms, the amount $x_i^h$ it consumes. The number $\beta^h$ (a value judgment) converts this money measure into social welfare. The indirect taxes are to be selected by the decision maker. Equations (24) and (25) yield $\lambda_i$, as in equation (22).

The data requirements are, then, a consumer expenditure survey for the $x_i^h$ (and thus $X_i$), knowledge of the tax rates $t$, and aggregate demand responses $\partial X_j/\partial t_i$. For many countries, some information on all these things is likely to be available. Only aggregate demand elasticities are necessary, and these may be estimated from time-series data. Given that tax design, not short-term demand management, is at issue here, one requires, in principle, medium-term or long-term elasticities. The value judgments $\beta^h$ should be the subject of sensitivity analysis to show how results vary in response to different specifications.
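Equations (22), (24), and (25) can be combined into a short computation. The sketch below is an illustration, not the procedure used in the Indian study: the three households, two goods, tax rates, and demand derivatives are invented, and the welfare weights use the constant-elasticity form $\beta^h = (y^1/y^h)^e$, an assumed specification with $y^1$ the lowest income and $e$ a hypothetical inequality-aversion parameter. Varying $e$ is one way to carry out the sensitivity analysis just described.

```python
import numpy as np

# Hypothetical data: 3 households x 2 goods.
income = np.array([100.0, 200.0, 400.0])   # household incomes y^h
x = np.array([[30.0, 5.0],                 # x[h, i]: consumption of good i by household h
              [40.0, 20.0],
              [45.0, 60.0]])
t = np.array([0.05, 0.20])                 # current tax per unit on each good
dX_dt = np.array([[-20.0, 4.0],            # dX_dt[i, j] = dX_j / dt_i (aggregate responses)
                  [3.0, -30.0]])


def welfare_weights(income, e):
    """Assumed form beta^h = (y^1 / y^h)**e, with e the inequality aversion."""
    return (income.min() / income) ** e


def marginal_social_costs(beta, x, t, dX_dt):
    X = x.sum(axis=0)          # aggregate demands X_i
    dV = -(beta @ x)           # equation (24): -sum_h beta^h x_i^h
    dR = X + dX_dt @ t         # equation (25): X_i + sum_j t_j dX_j/dt_i
    return -dV / dR            # equation (22)


for e in (0.0, 1.0, 2.0):
    lam = marginal_social_costs(welfare_weights(income, e), x, t, dX_dt)
    print(f"e = {e}: lambda = {np.round(lam, 3)}")
```

With these made-up numbers good 1 has the higher marginal social cost when $e = 0$, but the ranking reverses for $e = 1$ and $e = 2$, because good 0 is consumed relatively more by the poorest household.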

A major effort was required in the above application to India to calculate the tax rates. Note that the $t_j$ in equation (25) represent taxes actually levied on final goods. Thus, it is necessary to work with actual tax collections and to calculate the effects of taxing intermediate goods on taxes effectively levied on final goods. These are called "effective taxes." This involves a specification of the input-output process.
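The effective-tax calculation can be sketched as a Leontief system. The sketch rests on simplifying assumptions introduced here, not taken from the paper: fixed input-output coefficients, producer prices normalized to one, specific taxes on inputs that are not rebated and are passed forward in full, and no distinction between imports and domestic output. Under these assumptions the tax embodied in a unit of good $j$ is its own nominal tax plus the effective taxes on the inputs it uses, $t^e_j = t_j + \sum_i t^e_i a_{ij}$, so that $t^e = t\,(I - A)^{-1}$ as a row vector. The coefficients and rates below are invented.

```python
import numpy as np

# A[i, j]: hypothetical input of good i used per unit of output of good j.
A = np.array([[0.10, 0.30, 0.20],   # good 0: a basic good used widely as an input
              [0.00, 0.05, 0.10],
              [0.05, 0.00, 0.05]])
t = np.array([0.25, 0.00, -0.05])   # nominal taxes per unit; good 2 is nominally subsidized


def effective_taxes(A, t):
    """Solve t_e = t_e @ A + t, i.e. t_e = t @ inv(I - A): the tax embodied in one
    unit of each good, counting taxes on its inputs, their inputs, and so on."""
    n = len(t)
    # Transposing turns the row-vector equation into a standard linear solve.
    return np.linalg.solve((np.eye(n) - A).T, t)


print(np.round(effective_taxes(A, t), 4))
```

In this made-up example the third good carries a nominal subsidy but a positive effective tax, because it uses the heavily taxed first good as an input, and the second good is effectively taxed although it carries no nominal tax.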

The method could be applied in practice, and it led to a number of conclusions. First, the calculation of effective taxes provided useful information in itself. Often governments do not know the effects of their taxes on intermediate goods. In India, the central excise tax is concentrated on the production of a number of basic goods. These are involved in many production processes, so that the excise taxes spread out through the input-output system. This leads to effective taxation of some goods that are notionally subsidized. Further, the resulting system is much less progressive than might be assumed at first sight. In many cases, the effective taxes on domestic production are higher than on imports and higher than export rebates.

The marginal social cost of taxing different goods is quite sensitive to distributional value judgments. Cereal subsidies, for example, would be unattractive if one had little concern for inequality, but they would be more attractive otherwise.5 Increases in income tax have lower social cost than increases in indirect taxes. The state sales tax seems a more attractive source of extra revenue (lower marginal social cost) than the central excise tax. The results were not very sensitive to changes in estimates of the aggregate demand elasticities; over plausible ranges, changes in the term $\sum_j t_j (\partial X_j/\partial t_i)$ (see equation (25)) seemed less important than those in $\beta^h$ (see equation (24)). Note that the consumer expenditures $x_i^h$ are not varied when aggregate elasticities are changed.

Thus, it may be concluded that the reform approach can be applied in a way that produces useful results for policy.

other applications

Applications of the methods of modern public economics to a broad range of problems will be included in a forthcoming book.6 In addition to developing the basic theory, applications will be included concerning the Indian tax system, agricultural pricing in Korea, energy pricing in Thailand, education in Kenya, and so on. Thus, it can be seen that the methods are being applied to many different problems of taxation and pricing in various countries and are likely to be applied extensively in the future.

VI. Concluding Remarks

As a number of the general conclusions have been presented in Section IV, they are not repeated here. The purpose of the paper has been to develop and to explain the main results of the modern theory of optimum taxation and to show how they might be applied to guide tax policy.

It has been shown, on the one hand, that the theory implies that a number of simple statements, such as "efficiency requires uniform commodity taxes" or "egalitarianism implies increasing marginal income tax rates," must be treated with great circumspection. On the other hand, it has been argued that the theory does yield a number of general principles (see Section IV) that are useful in guiding the practical decision maker. Further, applications to detailed calculations of possible tax reforms are possible and have been carried out in a number of contexts.

In conclusion, however, it is desirable to indicate important aspects of the theories that have been left out, at least up to now. First, the theories are medium term in scope. They do not refer to short-run stabilization policy and, as yet, have not been directed toward considerations of growth. Second, administrative costs have been ignored. But those interested in short-run management should surely be informed by a view of where it is desirable to go in the medium term. And the discussion of administration should be influenced by a judgment as to which taxes are attractive in light of the ethical values and the application of economic analysis.


  • Ahmad, E., and Nicholas H. Stern (1983 a), Effective Taxes and Tax Reform in India, Discussion Paper No. 25 (Coventry, England: University of Warwick, Development Economics Research Center, 1983).

  • Ahmad, E., and Nicholas H. Stern (1983 b), Tax Reform, Pareto Improvements, and the Inverse Optimum, Discussion Paper No. 30 (Coventry, England: University of Warwick, Development Economics Research Center, 1983).

  • Atkinson, Anthony B., “On the Measurement of Inequality,” Journal of Economic Theory (New York), Vol. 2 (September 1970), pp. 244–63.

  • Atkinson, Anthony B., “Optimal Taxation and the Direct Versus Indirect Tax Controversy,” Canadian Journal of Economics (Toronto), Vol. 10 (November 1977), pp. 590–606.

  • Atkinson, Anthony B., and Nicholas H. Stern, “Pigou, Taxation, and Public Goods,” Review of Economic Studies (Clevedon, England), Vol. 41 (January 1974), pp. 119–28.

  • Atkinson, Anthony B., and Joseph E. Stiglitz, Lectures on Public Economics (New York: McGraw-Hill, 1980).

  • Boiteux, M., “Sur la gestion des monopoles publics astreints à l’équilibre budgétaire,” Econometrica (Evanston, Illinois), Vol. 24 (January 1956), pp. 22–40. English translation, under title “On the Management of Public Monopolies Subject to Budgetary Constraint,” in Journal of Economic Theory (New York), Vol. 3 (September 1971), pp. 219–40.

  • Buchanan, James M., and Robert D. Tollison, eds., Theory of Public Choice: Political Applications of Economics (Ann Arbor, Michigan: University of Michigan Press, 1972).

  • Corlett, W.J., and D.C. Hague, “Complementarity and the Excess Burden of Taxation,” Review of Economic Studies (Clevedon, England), Vol. 21 (No. 1, 1953), pp. 21–30.

  • Dalton, Hugh, Principles of Public Finance, 4th ed. (London: Routledge and Kegan Paul, 1970).

  • Deaton, Angus, “A Reconsideration of the Empirical Implications of Additive Preferences,” Economic Journal (London), Vol. 84 (June 1974), pp. 338–48.

  • Deaton, Angus, “Optimally Uniform Commodity Taxes,” Economics Letters (Amsterdam), Vol. 2 (No. 4, 1979), pp. 357–61.

  • Deaton, Angus, “Optimal Taxes and the Structure of Preferences,” Econometrica (Evanston, Illinois), Vol. 49 (September 1981), pp. 1245–60.

  • Diamond, Peter A., and James A. Mirrlees, “Optimal Taxation and Public Production: Part I—Production Efficiency; Part II—Tax Rules,” American Economic Review (Nashville, Tennessee), Vol. 61 (March and June 1971), pp. 8–27 and 261–78.

  • Drèze, Jacques H., “Some Postwar Contributions of French Economists to Theory and Public Policy, with Special Emphasis on Problems of Resource Allocation,” American Economic Review (Nashville, Tennessee), Vol. 54 (June 1964, Part 2), pp. 1–64.

  • Feldstein, Martin S., “Distributional Equity and the Optimal Structure of Public Prices,” American Economic Review (Nashville, Tennessee), Vol. 62 (March 1972), pp. 32–36.

  • Guesnerie, Roger, and Jesus Seade, “Nonlinear Pricing in a Finite Economy,” Journal of Public Economics (Amsterdam), Vol. 17 (March 1982), pp. 157–59.

  • Harberger, Arnold C., “Monopoly and Resource Allocation,” American Economic Review, Papers and Proceedings of the Sixty-Sixth Annual Meeting of the American Economic Association, Washington, D.C., December 28–30, 1953 (Nashville, Tennessee), Vol. 44 (May 1954), pp. 77–87.

  • McCulloch, John Ramsay, A Treatise on the Principles and Practical Influence of Taxation and the Funding System, Scottish Economic Society Edition, ed. by D.P. O’Brien (Edinburgh: Scottish Academic Press, 1975). Reprint of 1863 edition (Edinburgh: A. and C. Black).

  • Mirrlees, James A., “An Exploration in the Theory of Optimum Income Taxation,” Review of Economic Studies (Clevedon, England), Vol. 38 (April 1971), pp. 175–208.

  • Mirrlees, James A., “The Theory of Optimal Taxation,” Chapter 24 in Handbook of Mathematical Economics, Vol. 3, Part 4, ed. by Kenneth J. Arrow and Michael D. Intriligator (Amsterdam: North-Holland, 1981; New York: American Elsevier, 1981).

  • Musgrave, Richard A., The Theory of Public Finance: A Study in Public Economy (New York: McGraw-Hill, 1959).

  • Musgrave, Richard A., and Peggy B. Musgrave, Public Finance in Theory and Practice, 4th ed. (New York: McGraw-Hill, 1984).

  • Musgrave, Richard A., and Alan T. Peacock, eds., Classics in the Theory of Public Finance (London: Macmillan, 1967; New York: St. Martin’s Press, 1967).

  • Pigou, Arthur C., The Economics of Welfare, 4th ed., enlarged (London: Macmillan, 1960).

  • Prest, Alan R., Public Finance in Theory and Practice, 5th ed. (London: Weidenfeld and Nicolson, 1975).

  • Ramsey, F.P., “A Contribution to the Theory of Taxation,” Economic Journal (London), Vol. 37 (March 1927), pp. 47–61.

  • Rosen, Harvey S., “The Measurement of Excess Burden with Explicit Utility Functions,” Journal of Political Economy (Chicago), Vol. 86 (April 1978, Part 2), pp. S121–S135.

  • Seade, J.K., “On the Shape of Optimal Tax Schedules,” Journal of Public Economics (Amsterdam), Vol. 7 (April 1977), pp. 203–35.

  • Sen, Amartya K., and Bernard Williams, eds., Utilitarianism and Beyond (Cambridge, England, and New York: Cambridge University Press, 1982).

  • Shepherd, David, Jeremy Turk, and Aubrey Silberston, eds., Microeconomic Efficiency and Macroeconomic Performance (Deddington, Oxford, England: Philip Allan, 1983).

  • Shoven, John B., “Applied General Equilibrium Tax Modeling,” Staff Papers, International Monetary Fund (Washington), Vol. 30 (June 1983), pp. 350–93.

  • Stern, Nicholas H., “On the Specification of Models of Optimum Income Taxation,” Journal of Public Economics, Papers on Taxation Theory from the International Seminar in Public Economics, Paris, January 1975 (Amsterdam), Vol. 6 (July/August 1976), pp. 123–62.

  • Stern, Nicholas H., “Welfare Weights and the Elasticity of the Marginal Valuation of Income,” in Studies in Modern Economic Analysis: Proceedings of the Annual Conference of the Association of University Teachers of Economics, Edinburgh, 1976, ed. by Michael J. Artis and A. Robert Nobay (London: Basil Blackwell, 1977).

  • Stern, Nicholas H., “Optimum Taxation with Errors in Administration,” Journal of Public Economics (Amsterdam), Vol. 17 (March 1982), pp. 181–211.

  • Wicksell, K., “A New Principle of Just Taxation,” in Classics in the Theory of Public Finance, ed. by Richard A. Musgrave and Alan T. Peacock (London: Macmillan, 1967; New York: St. Martin’s Press, 1967), pp. 72–118. Originally published in German in 1896.


Mr. Stern, Professor of Economics and Director of the Development Economics Research Center at the University of Warwick, Coventry, England, was an undergraduate at Cambridge University and received his doctorate from the University of Oxford. He is editor of the Journal of Public Economics. This paper was written while he was a Visiting Scholar in the Fiscal Affairs Department of the Fund during the summer of 1983. The author is grateful for the comments of A.B. Atkinson, A.S. Deaton, J.A. Mirrlees, D.M.G. Newbery, and participants at a seminar in the Fiscal Affairs Department of the Fund, where the paper was presented on August 4, 1983.


Some of the presentation of standard theory in this section is taken from the author’s chapter, entitled “Taxation for Efficiency,” in Shepherd, Turk, and Silberston (1983).


The presentation and discussion of Figure 1 is taken from the author’s chapter, “Taxation for Efficiency,” in Shepherd, Turk, and Silberston (1983).


For references to the early debate, see Atkinson and Stiglitz (1980, Lecture 13), and Musgrave (1959, Chapter 5). Quotations are from Adam Smith, The Wealth of Nations (New York: G.P. Putnam’s Sons, 1904), Vol. 2, p. 310; and John Stuart Mill, Principles of Political Economy (New York: Collier and Son, 1900), Vol. 2, p. 308.


Deaton emphasizes linearity, but the proof also uses the assumption that the curves are identical.


Subsidies become attractive for value judgments that involve the social marginal utility of income falling faster than the inverse of income. See the discussion following Table 1.


Newbery, D.M.G., and Nicholas H. Stern, eds., The Theory of Taxation for Developing Countries (Washington: World Bank, forthcoming).