Mechanism Design Made Simple

One of the most elegant corners of economic theory is mechanism design. Its apparent generality combined with the simplicity of its answers makes it an exciting field to study. But many expositions are either not rich enough to convey the beauty of the theory or too technical, relying on the tools of convex analysis, sometimes at the cost of obscuring the underlying economic intuition. In this post, I will attempt to find a middle ground. I will present an exposition of a special case of the general mechanism design problem that is accessible at the level of anyone who is able to work through Hal Varian’s textbook. As a result, I will not always define my terms with absolute mathematical rigor, but I will strive to be precise enough to convey the big ideas clearly.

The setting we will focus on is that of quasi-linear utility with a one dimensional type space. Specifically, let q be an outcome in \mathbb R^+ such that associated with each agent is a utility function

U = \theta q - c

where c is the cost that the agent must pay. All this is saying is that we assume our agents to be indifferent between getting another unit of q or getting \theta dollars. Another way of saying this is that the marginal rate of substitution remains fixed for each agent.

As we will see, this functional form assumption will buy us a lot of tractability in our models, but it may seem a bit restrictive. However, when q is a relatively small part of what the agents are spending money on, it may be a reasonable first order approximation. To see why, consider motivating the demand for p via the canonical utility maximization problem:

(q,\mathbf x) = \underset{q,\mathbf x}{\mathrm{argmax}}\, U(q, \bar x)\quad \mathrm{s.t.}\ (q,\mathbf x)\cdot (p,\mathbf{p_x}) \leq M

Define the partial indirect utility function as

V(q,p) = \underset{\mathbf x}{\mathrm{max}}\, U(q, \bar x)\quad \mathrm{s.t.}\ (q,\mathbf x)\cdot (p,\mathbf{p_x}) \leq M

Assuming an interior solution, the envelope theorem gives us

V_q = U_q - \lambda p

V_p = -\lambda q

where \lambda is the associated Lagrange multiplier with the optimization problem in the definition of V. Then Taylor expanding the indirect utility function around q^*, p^* gives

V(q,p) \approx V(q^*, p^*) + (p^* - p) \lambda q^* + (q - q^*)(U_q(q^*) - \lambda p^*)

Now, if we take q^*, p^* be such that U_q(q^*) - \lambda p^* > 0 (if the optimization problem is convex, this corresponds to the situation when q^* is less than the optimal quantity of q), and normalize the utility function so that all terms not involving p, q cancel to 0, we get

V(q,p) \approx q (U_q(q^*) - \lambda p^*) - \lambda p q^* \equiv \theta  q - \lambda c

Picking a monotone transformation of the utility function so that \lambda = 1, we get

V(q, p) \equiv V(q, c) = \theta q - c

Again by envelope theorem, the case that \lambda = 1 is precisely the case where the marginal utility of wealth is 1, which confirms the intuition that the form for quasilinear utility above simply denominates utility in dollar terms.

Remark 1: It is notable that in fact, we have the identity that U_q(q^*) - \lambda p^* = \theta in the above making \theta equal to the marginal benefit minus the marginal cost of consuming another unit given prices p^*.

Remark 2: In light of the fact that we always chose the normalization \lambda = 1 to coerce the utility function to be of the form U = \theta q - p (i.e. with no multiplier in front of p), a common criticism against quasilinear utility is precisely that it ignores wealth effects. Choosing this normalization is equivalent to identifying utility with willingness to pay, which may in turn be dependent on wealth. As a result, the welfare implications of the theory taken too seriously can be disagreeable, depending on your philosophical views on the optimal distribution of wealth. See for example, this paper.

Having provided some justification for why quasi-linear utility makes sense as an approximation, let us turn to the actual theory. Mechanism design problems can come in a variety of flavors, but here are some common features:

  1. Agents have utility functions based on some type.
  2. There is some sort of information asymmetry about types – agents typically know more about their own type than others, but the distributions of types may be known and common knowledge.
  3. A designer wants to design a game such that when agents play according to some solution concept of the game, the outcome achieves some sort of goal.

We will focus on a very simple example, but note that much of the formalism here carries over to other settings. We consider the case of a monopolist selling to some mass of agents. We will assume that agents have quasilinear utilities parameterized by their type \theta so that U = \theta q - T where q is the amount of product received and T is the tariff. We assume that \theta \in [\underline{\theta},\overline{\theta}], and is drawn from a distribution with density f(\theta) and CDF F(\theta) known to the monopolist (this compact support assumption is not necessary, but will allow us to avoid some technicalities with swapping integration and differentiation). The monopolist does not know the type of any given individual, and therefore must set up some sort of game where agents play, and as a result of the game, they end up paying T for some quantity q. Assume, additionally, that the monopolist faces some cost function c(q), so it is trying to maximize expected profits: \mathbb E[T - c]. An outcome is called implementable if it is a Nash equilibrium of some game the monopolist plays. The space of possible actions seems quite large at first, but is simplified by this staple of mechanism design:

Theorem 3 (Revelation Principle): The monopolist can do no better than to only consider the set of games of the form:

  1. Each agent tells the monopolist their type
  2. According to their type, the monopolist charges tariff T(\theta) and gives quantity q(\theta) so that each agent can do no better than to honestly report their type.

Sketch: The idea is simply that any complicated game is isomorphic to a simple decision rule of the above form. In particular, suppose we have any such game. Then an alternative game would be simply that before the game, each agent independently tells the monopolist their type, and the monopolist plays a Nash equilibrium of the game for them. Then it sets T(\theta) and q(\theta) to be the allocation based on this equilibrium. Honest type reporting is exactly what it means to be in equilibrium.

Such a pair (T(\theta),q(\theta)) is called an incentive compatible direct mechanism (ICDM). Pricing schemes which satisfy the above condition are called incentive compatible (IC). Let us derive the first and second order conditions for what an ICDM must look like. Specifically, we have that for an individual with fixed type \theta, they must choose a reported type r to maximize U(r) = \theta q(r) - T(r). Assuming that T,q are smooth, we have first order condition:

\theta q'(\theta) = T'(\theta)

Integrating this, we get that a necessary condition for incentive compatibility to hold for all types is that

T(\theta) = z + \int_{\underline\theta}^{\theta} \theta' q'(\theta')\,\mathrm d\theta' = z + q(\theta) \theta - \int_{\underline\theta}^{\theta} q(\theta')\,\mathrm d\theta'

where the last equality is from integration by parts. We can restrict the space of incentive compatible solutions even further. First, differentiating the above integral equation twice, we get

T''(\theta) = \theta q''(\theta) + q'(\theta)

Then, looking at the second order condition, we have that we need

0 \geq U''(\theta) = \theta q''(\theta) - T''(\theta) = -q'(\theta) \implies q'(\theta) \geq 0

so actually, we need that individuals who value the good more get more good. This condition is intuitive – if not, then it is always possible to find an individual who can pretend to be a lower type and get more of the good for below their willingness to pay. This shows that when q'(\theta) \geq 0 and T'(\theta) = \theta q'(\theta), an individual of type \theta is at a local maximum when they report their true type. To see that this is also a global maximum, notice that

U'(r) = \theta q'(r) - T'(r) = (\theta - r) q'

so U'(r) \geq 0 whenever \theta > r and U'(r) \leq 0 whenever \theta < r.

Remark 4: One reason why the above derivation is not used is that it assumes second order differentiability when in reality, such a priori smoothness restrictions are not really necessary. For non-pedagogical purposes, the convex analysis derivations are considerably more rigorous and general. Nonetheless, doing the derivation this way makes it more apparent how this theory relates to the settings one might encounter in an intermediate micro class.

After incentive compatibility, it is sensible to impose one more condition known as individual rationality (IR). In particular, we assume that each individual always has the choice to not buy from the monopolist, and we normalize the value of the outside option to be 0. Defining

V(\theta) \equiv \theta q(\theta) - T(\theta)

we need to have that V(\theta) \geq 0. Notice that V satisfies

V'(\theta) \equiv \theta q'(\theta) + q(\theta) - T'(\theta) \geq 0

so satisfying IR for individuals of type \underline \theta suffices to satisfy IR for all other individuals. In fact, having schedule q(\theta), it is clearly optimal to set p(\underline\theta) = 0 since otherwise, it is possible to shift the entire schedule of prices upwards and capture more profits. Plugging this into our integral equation thus gives z = 0. Summarizing then, insisting on IR and IC means that once we fix a schedule mapping type to quantity, we immediately have fixed:

T(\theta) = \theta q(\theta)+ \int_{\underline\theta}^\theta q(\theta')\,\mathrm d\theta'

Our profit function is therefore given by

\pi(q) = \int_{\underline \theta}^{\overline\theta} \left[\theta q(\theta) -  \int_{\underline\theta}^\theta q(\theta')\,\mathrm d\theta' - c(q(\theta))\right]f(\theta)\,\mathrm d\theta

Let us call the set of IC quantity functions \mathscr C.

Exercise 5: Show that \mathscr C is convex.

Optimizing this function seems intimidating at first, since we are trying to choose an entire function (and hence must search over an infinite dimensional space). However, a simple trick from non-linear functional analysis will allow us to easily characterize a solution. The idea is fairly simple – we need to solve this “infinite dimensional” optimization problem, but a solution to this infinite dimensional problem must also be a solution to a single dimensional optimization in the following sense: let h(\theta) be some function. Then we must have that for all \alpha such that q + \alpha h is non-decreasing,

\Delta(q;h;\alpha) \equiv \pi(q + \alpha h) \leq \pi(q)

Based on this observation, we define the Gateaux derivative in the direction of h to be

D(q;h) = \lim_{\alpha \downarrow 0} \Delta(q;h;\alpha)

Exercise 6: Show that

D(q;h) = \int_{\underline \theta}^{\overline\theta} \left[\theta h(\theta) -  \int_{\underline\theta}^\theta h(\theta')\,\mathrm d\theta' - c'(q(\theta)) h(\theta)\right]f(\theta)\,\mathrm d\theta

From this, we get the usual first order condition that D(q;h) = 0 for any “direction” h that stays within C for sufficiently small \alpha, we must have D(q;h) = 0. In particular, the space spanned by h_t(\theta) = \mathbf 1\{\theta \geq t\} is dense in the space of possible directions, so we can restrict our attention to checking first order conditions for these. Using exercise 5, we have that for any h_t such that q - \alpha h_t is non-decreasing for sufficiently small \alpha (call any such h an admissible direction), we must have the first order condition:

0 = D(q;h_t) = \int_t^{\overline\theta} \left[\theta - (\theta - t) - c'(q(\theta))\right]f(\theta)\,\mathrm d\theta = \int_t^{\overline\theta} \left[t - c'(q(\theta))\right]f(\theta)\,\mathrm d\theta

To make this first order condition a bit more interpretable, we make the following computation (which one can derive, for example, using an integration by parts argument):

\int_t^{\overline\theta} \left[t - c'(q(\theta))\right]f(\theta)\,\mathrm d\theta = \int_t^{\overline\theta} \left[\frac{(1 - F(t))}{\theta f(\theta)} + 1 - \frac{c'(q(\theta))}{\theta}\right]\theta f(\theta)\,\mathrm d\theta = \int_t^{\overline\theta} \left[\frac{(1 - F(t))}{\theta f(\theta)} + \frac{\theta - c'(q(\theta))}{\theta}\right]\theta f(\theta)\,\mathrm d\theta

It might seem somewhat unnatural at first that we would want to write our “first order condition” in this way, but let’s look a bit more closely at the expression in the brackets.

In particular, we use the identity p'(\theta) = \theta q'(\theta) to get \theta = \frac{p'}{q'} = \frac{\partial p}{\partial q}. Additionally, we can think of the complementary CDF 1 - F(\theta) as a sort of “demand function” (in an auction theory context where each individual has quasilinear preferences, but unit demand, this is in fact exactly the demand function). Then it is easy to see that the first term in the above is nothing more than a negative inverse elasticity of demand. And the second term becomes

\dfrac{\frac{\partial p}{\partial q} - \frac{\partial c}{\partial q}}{\frac{\partial p}{\partial q}}

So \frac{\theta - c'(q(\theta))}{\theta} is nothing more than a generalized Lerner index! In fact, if all directions are admissible, we in fact have

-\frac{1}{\varepsilon(\theta)} = \frac{\theta - c'(q(\theta))}{\theta},\quad \forall \theta

which is just a functional Lerner rule. Using the notation introduced earlier, the condition that all directions are admissible is another way of saying that the solution lies on the interior of \mathscr C. So after all of this work, we’ve basically come to discover that what one might term the “fundamental theorem of economics” holds in this setting as well: for interior solutions, set marginal benefits equal to marginal costs. The fact that \mathscr C is convex now plays an important role. In particular, it shows that this business of checking directions is sufficient. In particular, if \pi(\alpha) = \pi(q + \alpha h_t) is a convex function (where we take the domain to be all \alpha where q + \alpha h_t is non-decreasing) for all t, then q satisfying the above for all h_t is indeed an optimum. But now, notice that in fact, we have

\pi''(\alpha) = -\int_t^{\overline\theta} c''(q(\theta) + \alpha)

In particular, under the assumption of a convex cost function, it is clear that the first order condition is, in fact, sufficient.

Remark 7: An attentive reader might wonder, what happens when the cost function is linear? This example actually serves as a good warning to where quasi-linear models can break down. In this case, the issue arises because under quasi-linearity, demand curves under linear pricing are undefined. As a result, the model will say that the monopolist can sell an infinite amount of a good to some subset of consumers by pricing below their rate of substitution. Of course, by the time that the monopolist tries to sell such a quantity, the first order approximation we used to justify quasi-linearity starts to break down.

Remark 8: If the monopolist does face a linear cost function, but we still want to use the quasilinear setting, we will need to constrain demand in some way so that we are never taken to a point where the appropriateness of quasilinear utility breaks down. For example, if we constrain each individual to demand at most a single unit (or maybe a small number of units), we end up in the world of auction theory.

Exercise 9: In this exercise, we will derive another way to try to make the monopolist’s problem well defined under constant marginal costs. In the beginning of the post, I referred to q as an “outcome” instead of the implied interpretation as “quantity”. The utility of this way of thinking about q will become clear here. In particular, suppose instead that U = \theta g(q) - p for some increasing function g.

  1. Derive a new characterization of incentive compatibility as an integral equation.
  2. Using this new characterization, repeat the arguments above for characterizing the “first order condition” and derive the new “Lerner” rule for interior solutions under this new utility function.
  3. What conditions on g guarantee that the first order condition is sufficient.

Remark 10: There is a geometric way to interpret the results of exercise 8. We can think of g as encoding some sort of deformation of outcome space with the new functional form for U representing the first order approximation of the utility function with respect to this deformed space. For example, if g = \log, the deformation in question is to turn absolute changes into percent changes. A well chosen deformation will allow us to pick a convenient shape for c(g^{-1}(q)) to make the model well defined while still being able to justify interpreting the utility function as some sort of first order approximation.

Exercise 11: In usual producer theory, it is often useful to be able switch between thinking about the firm as setting price or as setting quantity. The same skill can be useful in this setting. We begin by reparametrizing quasilinear utility: U = q - \eta p. Clearly, we have the identity \eta =  \frac1\theta.

  1. Let g(\eta) be the pdf of the distribution of \eta (with cdf G). Use the change of variables formula to express g in terms of f.
  2. Repeat the IC/IR characterization above to get an expression for what q must be given a choice of p.
  3. Using the result from 2, write profit as a function of p
  4. Take the first order conditions for an interior solution and re-derive the Lerner rule when we think of the monopolist as setting price.

A final consideration we must think about is what happens when our solution is not interior to \mathscr C. This happens if and only if q(\theta) has some constant interval. In this case, it may be possible that the integral above is \leq 0. This is really no different for how the KKT conditions may be inequalities at certain boundaries. In these cases, the analysis features some tedious reasoning about what the shape of q must look like, but since I find the arguments less interesting, I will simply refer the interested reader to the original paper that this post was inspired by.

Before I conclude, I wanted to add a bit about further directions one could take these arguments that are not pursued in the Mussa and Rosen paper linked above. In particular, the reason why the directional derivative argument worked so well here is that the IC restriction gave us a convex set \mathscr C with a tractable but dense set of directions that kept us in the relative interior of \mathscr C. It may be worth thinking about when further restrictions on \mathscr C still retain this convexity and tractability property and when these restrictions correspond to economic limitations (or when convexity is even needed at all). An example that comes to mind is: what if we restrict ourselves to pricing with a finite menu of two-part tariffs? Such extensions may be an interesting avenue for making the above theory tractable in applied settings.

We started off this post by thinking about a monopolist who wanted to price discriminate. As a theoretical crutch, we showed that we can always think of this problem as the monopolist asking each consumer “how much do you like this good” and giving appropriate prices and quantities so as to incentivize truth telling. After carrying through the analysis, we showed that the optimal scheme is just characterized by a generalized Lerner rule. But the form of the arguments above are far more general than a monopolist’s problem. They apply, generally, to any situation where some decision maker needs to elicit information to determine optimal policy subject to incentive constraints. Imagine, for instance, a local government that wants to build a small park. An important question is if it is possible to build the park in a way that

  1. Assigns higher taxes to people who want the park more
  2. Is budget balanced
  3. Does not coerce anyone into participating

and if not, what is the set of feasible alternatives. Mechanism design provides a common framework through which all sorts of questions like this can be answered.

Leave a comment