Delta Hedging, Stochastic Calculus, and Black-Scholes

I will preface this post by saying that the idea is entirely unoriginal. The main goal we will be building towards here is an intuitive understanding of the Black-Scholes equation. But this is a grossly overdone topic by now. Wikipedia, as well as countless other sources, offers its own intuitive derivation. My first time really hearing an intuitive explanation of Black-Scholes also came from a friend who set out to write a similarly intuitive derivation for his undergraduate finance club. Nonetheless, since I’ve recently been reading through some stochastic calculus lecture notes (which I highly recommend to anyone who is familiar with some basic measure theory), I figured that writing a post about one of the most famous applications of stochastic calculus would be a good way to test my own progress. The following is my attempt to regurgitate the intuitions I absorbed from watching my friend’s undergrad talks, now that I am more familiar with the technical machinery.

Let’s begin with the simple economic ideas that go into the formula. In particular, we begin by looking at how some seemingly uncontroversial assumptions will allow us to price assets in a simple world without risk. We will then turn to the price of options in stylized discrete time models. This will hint at the relevant extension to continuous time, at which point, I will introduce the main technical ideas that will allow us to carry out this intuition. This ordering will allow us to separate the technicalities from the economics, which hopefully will help better highlight the simple intuitions at play. Let us begin with everything in discrete time.

Assumption 1: There is a riskless asset with perfectly elastic supply/demand that investors can sell/buy at $1 and that pays off amount r each period.

In practice this assumption usually refers to a US Treasury bond, which is the closest we’ll get to the platonic ideal of “riskless” in the real world.

Assumption 2: There is no arbitrage – i.e. there is no way for an investor to make money in a risk free way that surpasses the rate of return on the asset from assumption 1.

Taken together, assumptions 1 & 2 already allow us to price any deterministic asset.

Exercise 3: Suppose some asset pays off x_t in each period t. Show that the value of this asset must be given by \sum_{t=0}^\infty \frac{x_t}{(1 + r)^t} by showing that with any other price, there will be some way to violate assumption 2 by buying and selling some combination of this asset and the asset from assumption 1.
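As a quick sanity check on the formula in Exercise 3, here is a minimal Python sketch that discounts a finite payoff stream. The function name `present_value` and the numbers are illustrative choices of mine, and I am reading assumption 1 as saying that $1 put into the riskless asset grows to 1 + r after one period.

```python
def present_value(payoffs, r):
    """Discounted value of a deterministic payoff stream x_0, x_1, ...

    payoffs[t] is the amount paid in period t; r is the per-period
    riskless rate from assumption 1 (assumed: $1 grows to 1 + r).
    """
    return sum(x / (1.0 + r) ** t for t, x in enumerate(payoffs))


# Hypothetical example: an asset paying 10 in each of periods 1, 2 and 3.
r = 0.05
price = present_value([0.0, 10.0, 10.0, 10.0], r)
print(price)
```

If the asset traded for more than this, one could sell it and replicate its payoffs more cheaply with the riskless asset; if it traded for less, one could do the reverse.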

Let us now introduce some risk into this world. Imagine that there is an asset which can be purchased at price p at time t = 0. With probability \pi, this asset will be worth p + dp in period t = 1 and with probability 1 - \pi, it will be worth p - dp. Note that this does not violate any of the assumptions above since we are now moving to a world with risk, so assumption 2 is silent. Suppose now that an entrepreneur comes in and wants to sell the following product: at some price, you can buy a contract in period 0 that gives you the option (but not the obligation) to buy the asset in question in period 1 at price K. This is a stylized options contract. Surprisingly, for a fixed value of K, we can determine the price of the option uniquely without any appeal to preferences. In particular, we will consider the strategy of delta hedging. The intuition will be that we wish to use the assets available to us to construct a completely deterministic asset, which, by the above exercise, we can price.

Let us begin by buying a single unit of the options contract, and suppose that p - dp < K < p + dp (otherwise the option is either worthless or certain to be exercised). Then we have 2 states of the world. In the first state, the price of the asset exceeds the exercise price of the option, so we should exercise our right to buy and pocket the difference as profit, making p + dp - K. In the second state of the world, the exercise price of the option is higher than the price of the asset, so we should not exercise our option, and we make nothing. But now, let’s say we also sell x units of the underlying asset short with the intention of buying them back in period 1. Then in state 1, our period 1 payoff is p + dp - K - x(p + dp) = (1-x) \cdot (p + dp) - K. In state 2, it is - x\cdot (p - dp). By choosing x carefully, we can now make this portfolio completely deterministic. In particular, equating the payoffs in the 2 states of the world, we have

(1-x)\cdot (p + dp) - K = -x\cdot (p-dp) \implies p + dp - K = 2 \cdot x \cdot dp \implies x = \frac{p + dp - K}{2 dp}

Exercise 4: Compute the revenues in periods 0 and 1 from this strategy in terms of K. Use assumption 1 and exercise 3 to derive the price as a function of K.

Remark 5: Shockingly, the price of the option does not depend on the transition probabilities at all. This is simply a reflection of the fact that in the derivation above, the option can be used to create a risk free portfolio, so any dependence on the underlying stochastic process is hedged away.
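To make Remark 5 concrete, here is a small Python sketch of the one-period hedging argument. It is one possible reading of Exercise 4 rather than a definitive answer: the function name and the numbers are hypothetical, and I again assume that $1 in the riskless asset grows to 1 + r, so that a sure period 1 payoff is discounted by 1 + r. Notice that \pi never appears anywhere in the code.

```python
def one_period_option_price(p, dp, K, r):
    """Price the one-period option by the delta-hedging argument.

    Assumes p - dp < K < p + dp, so the option is exercised only in the
    up state. Returns (x, price), where x is the number of units of the
    underlying sold short against one option.
    """
    x = (p + dp - K) / (2.0 * dp)             # hedge ratio from the text
    payoff_up = (p + dp - K) - x * (p + dp)   # exercise, then buy back the short
    payoff_down = -x * (p - dp)               # option expires, buy back the short
    assert abs(payoff_up - payoff_down) < 1e-9  # the portfolio is riskless
    # Period 0 cash flow is x*p - price (short-sale proceeds minus option cost).
    # No arbitrage forces the present value of all cash flows to be zero:
    #   x*p - price + payoff_down / (1 + r) = 0
    price = x * p + payoff_down / (1.0 + r)
    return x, price


# Hypothetical numbers: p = 100, dp = 10, strike 105, riskless rate 2%.
x, price = one_period_option_price(100.0, 10.0, 105.0, 0.02)
print(x, price)
```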

The above model can obviously be extended to an arbitrary number of periods and to arbitrary transition laws for the underlying price (we used an arithmetic random walk here, but importantly, the Black-Scholes model uses a geometric one). In particular, given a known probability law for the underlying asset, the same argument prices a European options contract (i.e. one where you can only exercise the option at the time of maturity) described by a pair (K,T), where K is the price paid upon exercise (the “strike” price) and T is the number of periods until the option expires. In fact, to move to continuous time, intuitively, what we want to do is to extend the number of periods to infinity while shrinking the time increments the periods are supposed to represent down to 0. So we need to use calculus.
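The multi-period extension can be sketched by applying the one-period hedge at every node of a tree and working backwards from maturity. The code below is a minimal illustration under my own assumptions: an arithmetic walk with steps of \pm dp each period, $1 in the riskless asset growing to 1 + r per period, and a hypothetical function name `tree_option_price`.

```python
def tree_option_price(p, dp, K, r, n_periods):
    """Backward induction on an arithmetic (additive) binomial tree.

    After n up-moves out of t periods the underlying is worth
    p + (2*n - t)*dp. Each node is priced by the same riskless-portfolio
    argument as in the one-period case.
    """
    T = n_periods
    # Option values at maturity: exercise only when it pays to do so.
    values = [max(p + (2 * n - T) * dp - K, 0.0) for n in range(T + 1)]
    for t in range(T - 1, -1, -1):
        new_values = []
        for n in range(t + 1):
            s = p + (2 * n - t) * dp          # underlying price at this node
            c_down, c_up = values[n], values[n + 1]
            x = (c_up - c_down) / (2.0 * dp)  # units of underlying to short
            riskless = c_down - x * (s - dp)  # next-period payoff in both states
            new_values.append(x * s + riskless / (1.0 + r))
        values = new_values
    return values[0]


# Hypothetical numbers: two periods, strike equal to the starting price.
print(tree_option_price(100.0, 10.0, 100.0, 0.0, 2))
```

With a single period this reduces to the hedge computed by hand above, and the transition probabilities are again nowhere to be seen.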

Mathematical Interlude: Stochastic Calculus Intuition.

Consider some process that moves around the real number line over time: y(t). One way to model this process is just to think of y as a function taking a real number to another real number. In this view, the whole trajectory of y can be viewed as a single object. However, modeling y in this way “forgets” the fact that y actually represents a process: it may be more physically intuitive to think about y as something that accumulates small changes. In other words, at each moment in time, y_{t+1} \approx y_t + \Delta y_t. This is an equivalent way to characterize y’s trajectory, and in particular, we may write y_t \approx y_0 + \sum_{s=0}^{t-1} \Delta y_s. Taking the limit as the steps become small gives us the usual Riemann integral: y(t) = y(0) + \int_0^t \frac{\mathrm dy}{\mathrm ds}\,\mathrm ds. Sometimes, however, time itself is not the most natural way to model the medium through which y travels. Consider, for example, a y that tracks some other process \alpha(t). Then yes, y moves through time, but only because \alpha does. In this case, it may be more natural to represent our process y as something looking like y = \int \frac{\mathrm dy}{\mathrm d\alpha}\,\mathrm d\alpha(t). Depending on your familiarity with real analysis, you may recognize this as essentially being the Stieltjes integral. It provides a slight generalization of the usual Riemann integral, but it can be shown that when \alpha is well behaved, it really is just another lens through which to view the same underlying phenomenon. For example, when \alpha is differentiable, we have

\int f(t) \,\mathrm d\alpha(t) = \int f(t) \frac{\mathrm d\alpha}{\mathrm dt}\,\mathrm dt
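A quick numerical check of this identity may help. The sketch below compares a Riemann-Stieltjes sum against an ordinary Riemann sum for one hypothetical choice, f(t) = t and \alpha(t) = t^2 on [0, 1]; the helper names are mine, and both sums use left endpoints.

```python
def stieltjes_sum(f, alpha, a, b, n):
    """Left-endpoint Riemann-Stieltjes sum of f with respect to alpha on [a, b]."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        t0, t1 = a + k * h, a + (k + 1) * h
        total += f(t0) * (alpha(t1) - alpha(t0))
    return total


def riemann_sum(g, a, b, n):
    """Left-endpoint Riemann sum of g on [a, b]."""
    h = (b - a) / n
    return sum(g(a + k * h) * h for k in range(n))


def f(t):
    return t


def alpha(t):
    return t * t      # differentiable, with d(alpha)/dt = 2*t


lhs = stieltjes_sum(f, alpha, 0.0, 1.0, 100_000)
rhs = riemann_sum(lambda t: f(t) * 2.0 * t, 0.0, 1.0, 100_000)
print(lhs, rhs)  # both should be close to the exact value 2/3
```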

I bring up this idea of the natural “medium” through which a process travels to motivate what stochastic calculus models. To begin with, we need to define what “stochastic process” means. The actual definition is just that it is a collection of random variables indexed by time. But more intuitively, this definition is really begging us to think of a stochastic process as a model of something that evolves through time in a way that features some randomness. Then given some stochastic process X_t, a major goal of stochastic calculus is to make sense of what an infinitesimal increment in this process, \mathrm dX_t, means. Most relevant for what’s to come, Brownian motion turns out to be characterized by the fact that \Delta B_t \sim N(0, \Delta t) for small increments, and we would like to define a mathematical object to capture the limit of this characterization as \Delta t \to 0. Since the trajectory of stochastic processes can be quite non-smooth, it is hard to even define \mathrm dX_t on its own. Instead, the trick will be to study these processes by studying how they behave when we try to integrate with respect to them. In particular, it turns out to be much easier to make sense of the following formalism:

Y_t = Y_0 + \int \xi_t \,\mathrm dX_t

In fact, even the integral itself is too difficult a starting point in general. One way to proceed is to instead first restrict \xi_t to only be allowed to make jumps at a predetermined, finite set of points. In this case, labeling the points t_k, it will be sensible to write

\int \xi_t \,\mathrm dX_t = \sum_{k=1}^n \xi_{t_{k-1}} (X_{t_k} - X_{t_{k-1}})

Then, using some standard tricks from the typical construction of the Lebesgue integral, this definition can be extended to most functions we care about (this essentially amounts to showing that finer and finer approximations by these “elementary” integrals converge uniquely to a sensible value).
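To see what these elementary integrals look like in practice, here is a small simulation sketch. It takes \xi = B and X = B for a standard Brownian motion and evaluates the integrand at the left end of each interval (the usual Ito convention). The function names, seed and step count are arbitrary choices of mine. The sum lands near (B_T^2 - T)/2 rather than the B_T^2/2 that ordinary calculus would suggest, a gap that Theorem 7 below accounts for.

```python
import math
import random


def brownian_increments(T, n, rng):
    """n independent N(0, T/n) increments of a standard Brownian motion."""
    sd = math.sqrt(T / n)
    return [rng.gauss(0.0, sd) for _ in range(n)]


def ito_sum_B_dB(increments):
    """Elementary integral of xi = B against B, using left-endpoint values."""
    b = 0.0      # B at the left end of the current interval
    total = 0.0
    for db in increments:
        total += b * db   # xi_{t_{k-1}} * (B_{t_k} - B_{t_{k-1}})
        b += db
    return total, b       # the sum and the terminal value B_T


rng = random.Random(0)    # fixed seed so the run is reproducible
T, n = 1.0, 100_000
ito, b_T = ito_sum_B_dB(brownian_increments(T, n, rng))
# Compare: the sum, the ordinary-calculus guess, and the Ito answer.
print(ito, 0.5 * b_T ** 2, 0.5 * (b_T ** 2 - T))
```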

While this is an easy way to define \mathrm d X_t, it is not necessarily the best way to think about it conceptually. We really want to be thinking about \mathrm d X_t as a small change of a process over a small timescale. Fortunately, the formalism of the fundamental theorem of calculus allows us to express the above equation in a way closer in spirit to this dynamical view:

\mathrm dY_t = \xi_t \,\mathrm dX_t

If we forget for a moment that we’re working with random variables, the above looks a lot like \frac{\mathrm dY_t}{\mathrm dX_t} = \xi_t (and in fact, there are formal ways to make sense of this exact expression), so just like with a derivative, when we see differential equations of the above form, in our minds, we should think of \xi_t as a quantity that relates changes in some underlying stochastic process X_t to changes in another stochastic process, Y_t.

Much of the elementary theory of stochastic integration is fairly boring. By construction, it behaves like ordinary integration ought to: it is linear and satisfies similar limit theorems to the Lebesgue integral. However, where stochastic calculus gets interesting, and becomes a theory in its own right, is when we try to do things like change of variables or integration by parts. In particular, let’s look at that expression for the Stieltjes integral again:

\int f(t) \,\mathrm d\alpha(t) = \int f(t) \frac{\mathrm d\alpha}{\mathrm dt}\,\mathrm dt

This is essentially the chain rule, and it reflects the fact that in ordinary calculus, processes we want to model can often be expected to be smooth. As a result, a first order approximation (the chain rule only looks at the first derivative of \alpha) is good enough to characterize how the integral behaves under nonlinear transformations.

However, stochastic processes cannot be expected to have the same regularity properties. Even the standard Brownian motion, one of the most well behaved commonly used stochastic processes, has the property that it is differentiable nowhere. In fact, even worse, within any interval, it will change its direction an infinite number of times.

Exercise 6: Use the following facts about Brownian motion to prove the infinite direction changes claim made above. For a standard Brownian motion, B_t,

  1. For any t > s, B_t - B_s \sim \mathcal N(0,t-s), where I am using the notational convention \mathcal N(\mu, \sigma^2) for normal distributions (although this distinction doesn’t matter for the sake of this exercise)
  2. For any t_2 > t_1 > t_0, we have B_{t_2} - B_{t_1} is independent of B_{t_1} - B_{t_0}

(Hint: like with most good things in probability theory, this one uses a Borel-Cantelli lemma)


One of the great achievements of stochastic calculus is to show that in fact, taking a second order approximation is sufficient to smooth out the issues raised by the aforementioned irregularities. In particular, the following theorem is the stochastic calculus analogue of the chain rule:

Theorem 7 (Ito’s Lemma in one dimension for continuous processes): Let X_t be a continuous stochastic process (technically, a continuous semimartingale, which covers every process used in this post) and let f be a twice continuously differentiable function. Then

f(X_t) - f(X_0) = \int_0^t f'(X_s)\,\mathrm dX_s + \frac12 \int_0^t f''(X_s)\,\mathrm dX_s^2

Remark 8: The proof of the above formula is essentially just a Taylor expansion, and showing that an o(\varepsilon^2) approximation error is sufficiently small to not mess up the result in the limit. The main difference between this and the change of variables formula from ordinary calculus is that in ordinary integration, a first order Taylor expansion was enough because an o(\varepsilon) error term was enough to vanish in the limit of finer approximations. In stochastic calculus, the irregularity of stochastic processes means we need to make a better approximation, hence, we need to expand to second order.

Remark 9: The \mathrm d X_t^2 term is known as the quadratic variation and it is a measure of the degree of irregularity that the stochastic process has. It turns out that standard Brownian motion has a particularly nice expression for quadratic variation: \mathrm d B_t^2 = \mathrm dt. Quadratic variation also has the nice property that \mathrm d (a X_t)^2 = a^2 \,\mathrm dX_t^2 for constant a.
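The claims in Remark 9 are easy to check by simulation. The sketch below sums squared increments along a sampled Brownian path on [0, T] and compares the result with T, then does the same for a scaled path and for a smooth path (a sine curve), whose quadratic variation should be essentially zero. The helper names, seed and grid size are my own choices.

```python
import math
import random


def quadratic_variation(path):
    """Sum of squared increments along a discretely sampled path."""
    return sum((b - a) ** 2 for a, b in zip(path, path[1:]))


def brownian_path(T, n, rng):
    """Standard Brownian motion sampled at n equal steps on [0, T]."""
    sd = math.sqrt(T / n)
    path = [0.0]
    for _ in range(n):
        path.append(path[-1] + rng.gauss(0.0, sd))
    return path


rng = random.Random(1)    # fixed seed so the run is reproducible
T, n, a = 2.0, 200_000, 3.0
B = brownian_path(T, n, rng)
qv_B = quadratic_variation(B)                        # should be close to T
qv_aB = quadratic_variation([a * b for b in B])      # should be a**2 times qv_B
qv_smooth = quadratic_variation([math.sin(k * T / n) for k in range(n + 1)])
print(qv_B, qv_aB, qv_smooth)
```

The smooth path is the reason ordinary calculus gets away with a first order expansion: its squared increments vanish in the limit, while those of Brownian motion do not.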

Remark 10: As mentioned earlier, we can turn any statement about integrals into a statement about differentials: in particular, in stochastic calculus, the following notation for expressing Ito’s lemma makes sense and is the more common way it is expressed:

\mathrm df(X_t) = f'(X_t) \,\mathrm dX_t + \frac12 f''(X_t)\, \mathrm dX_t^2

Back to the Economics

With these basic tools from stochastic calculus, we will be able to derive the Black-Scholes equation. First, let us state the assumptions of the Black-Scholes model. The key substantive assumption is that we are going to impose a strong functional form on the dynamics of stock price:

Assumption 11: The underlying stock price S follows a geometric Brownian motion (this is the same as saying that log prices follow a Brownian motion with drift):

\frac{\mathrm dS_t}{S_t} = \mu\, \mathrm dt + \sigma\,\mathrm dB_t

Here, \mu accounts for the fact that stock prices empirically grow at a steady rate in the long run, and the \sigma captures the variability of the stock price. Multiplying a Brownian motion by \sigma is akin to multiplying a standard normal distribution by \sigma to get another normal distribution with variance \sigma^2, and adding the \mu \,\mathrm dt term is like adding a constant to a normal random variable to shift it. Therefore, in words, this functional form says that the stock price grows at a constant rate \mu on average (the log price drifts at the slightly lower rate \mu - \frac12 \sigma^2, by Ito’s lemma), and over any fixed increment, the distribution of log returns is normal with variance proportional to the size of the increment. This turns out to be a decent but clearly wrong assumption in practice. We will also need the following technical assumption:

Assumption 12: Stocks and options are infinitesimally divisible, can be traded infinitesimally quickly, and there are no transaction costs.
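Going back to Assumption 11 for a moment, here is a rough simulation sketch of geometric Brownian motion using a simple Euler discretization of the stochastic differential equation. The parameters, seed and function name are hypothetical. Under these assumptions, the average terminal price should sit near S_0 e^{\mu T}, while the average log return should sit near (\mu - \frac12 \sigma^2) T, which is what Ito’s lemma applied to \log S predicts.

```python
import math
import random


def simulate_gbm_terminal(s0, mu, sigma, T, n_steps, rng):
    """Euler scheme for dS = mu*S dt + sigma*S dB; returns S_T."""
    dt = T / n_steps
    s = s0
    for _ in range(n_steps):
        dB = rng.gauss(0.0, math.sqrt(dt))   # Brownian increment over dt
        s += s * (mu * dt + sigma * dB)
    return s


rng = random.Random(2)                        # fixed seed for reproducibility
s0, mu, sigma, T = 100.0, 0.08, 0.2, 1.0      # hypothetical parameters
terminal = [simulate_gbm_terminal(s0, mu, sigma, T, 200, rng) for _ in range(5_000)]

mean_price = sum(terminal) / len(terminal)
mean_log_return = sum(math.log(s / s0) for s in terminal) / len(terminal)
print(mean_price, s0 * math.exp(mu * T))              # average price vs S_0 e^{mu T}
print(mean_log_return, (mu - 0.5 * sigma ** 2) * T)   # average log return vs Ito drift
```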

We are now ready to proceed with the argument. As in the stylized binary stock price world, we want to be able to put a price on a contract with strike price K and maturity T when the underlying price is S. Note that the dynamics of S are completely specified by the model, which is the starting point for the derivation. We begin by using a favorite trick of economists: writing down a value function before we know what it looks like. In particular, fixing K and T, let V(S, t) be the value of the option when the underlying stock’s price is S at time t (so that the time remaining to maturity is T - t). In order to use Ito’s lemma, we need to check that V is twice differentiable in S (and differentiable in t), which is a bit of a technical chore. For our purposes here, we will simply assume that V is sufficiently differentiable, in which case, we have (dropping time subscripts for convenience)

\mathrm dV = \frac{\partial V}{\partial t} \,\mathrm dt + \frac{\partial V}{\partial S} \,\mathrm dS + \frac12 \frac{\partial^2 V}{\partial S^2} \,\mathrm dS^2

We now plug in our model for stock price to get

\mathrm dV = \left(\mu S \frac{\partial V}{\partial S} + \frac{\partial V}{\partial t} + \frac12 \sigma^2 S^2 \frac{\partial^2 V}{\partial S^2}\right)\,\mathrm dt + \sigma S\frac{\partial V}{\partial S} \,\mathrm dB

Looking at this expression, we see only a single stochastic term: the one depending on \mathrm dB (a subtle point worth mentioning here is that S is a random variable, but at any given point in time, the current value of the stock is known to the investor, and thus, the investor treats this as given). All that remains is to derive the right dynamics for buying/selling the underlying stock to exactly cancel out the stochastic term. In particular, if we adopt a short position in the option, it will suffice to be long the underlying stock with \frac{\partial V}{\partial S} units (it should be clear from looking at the expression given by Ito’s formula why this is the position we want to take). In particular, letting the value of the whole portfolio now be:

\Pi = -V(S,t) + \frac{\partial V}{\partial S} S

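and plugging in again with \mathrm d\Pi = -\mathrm dV + \frac{\partial V}{\partial S}\,\mathrm dS (the hedge ratio is held fixed over the instant), the \mu S \frac{\partial V}{\partial S}\,\mathrm dt and \sigma S \frac{\partial V}{\partial S}\,\mathrm dB terms cancel, and it is a simple matter to show that

\mathrm d\Pi = -\left(\frac{\partial V}{\partial t} + \frac12 \sigma^2 S^2 \frac{\partial^2 V}{\partial S^2}\right)\,\mathrm dt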

We’re now almost done. By the above arguments, we have just constructed a completely deterministic asset, and using assumptions 1 and 2, we can price it! In particular, the continuous time analogue of exercise 3 says that

\Pi r \,\mathrm dt = \mathrm d\Pi

But now, substituting the definition of \Pi and the derived value of \mathrm d\Pi and simplifying, we recover the Black-Scholes equation:

rV = r S\frac{\partial V}{\partial S} + \frac12 \sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} + \frac{\partial V}{\partial t}

So what does it all mean? One way to think about the Black-Scholes equation is that it allows us to decompose the value of the option into three components with intuitive interpretations:

\underbrace{rV}_{\text{Value of option}} = \underbrace{r S\frac{\partial V}{\partial S}}_{\text{Value from underlying}}\quad +\quad \underbrace{\frac12 \sigma^2 S^2 \frac{\partial^2 V}{\partial S^2}}_{\text{Value of hedging}}\quad +\quad \underbrace{\frac{\partial V}{\partial t}}_{\text{Continuation value}}

Essentially, then, the Black-Scholes equation is just the dynamic programming first order condition implied by the Black-Scholes model. This gives us an analytic form for the dynamics of the problem, but now we would like to turn these dynamics into a full solution for the value of the option. Combined with some reasonable boundary conditions motivated by economic intuition, this PDE can be solved exactly in some cases and numerically in others, although at this point, I sadly must admit that my grasp of differential equations is substantially weaker than my grasp of probability theory. The Wikipedia link above does carry the argument through to its completion, so I would recommend checking it out as well as the others.
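One case where the PDE can be solved exactly is the European call, with boundary condition V(S, T) = \max(S - K, 0). I have not derived the solution in this post, so treat the formula in the sketch below as imported from the standard textbook statement rather than as something shown here. What the code does is check numerically, by finite differences, that this formula satisfies the Black-Scholes equation above. The parameter values, step sizes and function names are my own choices.

```python
import math


def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bs_call(S, t, K, T, r, sigma):
    """Textbook Black-Scholes European call value at calendar time t < T."""
    tau = T - t   # time remaining to maturity
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)


def pde_residual(S, t, K, T, r, sigma, hS=1e-2, ht=1e-4):
    """r*V - (r*S*V_S + 0.5*sigma^2*S^2*V_SS + V_t), by central differences."""
    def V(s, u):
        return bs_call(s, u, K, T, r, sigma)

    V_S = (V(S + hS, t) - V(S - hS, t)) / (2 * hS)
    V_SS = (V(S + hS, t) - 2 * V(S, t) + V(S - hS, t)) / hS ** 2
    V_t = (V(S, t + ht) - V(S, t - ht)) / (2 * ht)
    return r * V(S, t) - (r * S * V_S + 0.5 * sigma ** 2 * S ** 2 * V_SS + V_t)


# Hypothetical parameters.
K, T, r, sigma = 100.0, 1.0, 0.05, 0.2
for S in (80.0, 100.0, 120.0):
    # The residual should be zero up to finite-difference error.
    print(S, bs_call(S, 0.0, K, T, r, sigma), pde_residual(S, 0.0, K, T, r, sigma))
```

Note that \mu does not enter the formula at all, which is the continuous time counterpart of Remark 5.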

It is worth mentioning that the above seems to me like a good example of how the process of mathematical modeling ought to flow:

  1. Have an intuition
  2. Translate that intuition into a precise language (math)
  3. Use the black box of that precise language to get implications of the intuition
  4. Look at the resulting outcomes to glean more intuition
  5. If the interpretation of 4 is insufficient, return to 3 and try some more, or change 2 to gain additional tractability.

In practice, sophisticated firms have since moved on from the normality (Brownian motion) assumption of Black-Scholes to more sophisticated models of stock prices that try to account for relaxations of the model like the overly fat tails of stock returns observed in the data. These models will typically not have closed form solutions and will require simulations to fully study. Despite the real world moving on, Black-Scholes still represents a major achievement. It’s an excellent example of how some simplifying assumptions (for tractability) and rigorous arguments can give intuitively useful ways to decompose a complex problem into something more manageable, even as the real world is full of additional technicalities.
