An Addendum on Compactness

An important result in analysis is the Arzelà–Ascoli theorem. It provides a fairly simple way to check compactness in the space $C(X)$ of continuous functions on compact Hausdorff spaces $X$ where $C(X)$ is endowed with the uniform norm $||f|| = \sup_{x\in X} f(x)$ . It can be used, for example, in the Peano existence theorem for ordinary differential equations. More relevant to the econometrician, Arzelà–Ascoli can be used to prove the Rellich-Kondrachov theorem in Soblev space theory, which is an important part of the approach taken by Newey and Powell (2003) building off of Gallant and Nychka (1987) to show consistency of nonparametric 2SLS estimators under certain regularity conditions (in particular, one of the key technical difficulties in the linked paper is to ensure that the metric assumed in Theorem 4.2 makes a certain set compact).

As with the last post, I will restrict myself to metric spaces $X$ , which suffices for most applications. In order to state the theorem, I will first need the following definition:

Definition 1: A family of functions $\{f_\alpha\}$ is equicontinuous if $\forall \varepsilon$ , $\exists \delta$ such that for all $\alpha$ , $||x-y|| < \delta \implies |f_\alpha(x)-f_\alpha(y)| < \varepsilon$ .

It is clear that this definition is a direct generalization of uniform continuity for a single function and as was a theme in the previous post, this definition allows us to control how $f_\alpha(x)$ “moves” as we vary $x$ in a way that does not depend on $\alpha$ . It is unsurprising, then, that this definition features prominently in a theorem about compactness:

Theorem 2: Consider $\{f_\alpha\} \subset C(X)$ . Then the following conditions hold if and only if for any sequence $f_1,f_2,\ldots \in \{f_\alpha\}$ , there exists a subsequence that converges in $C(X)$ :

$\{f_\alpha\}$ is equicontinuous
$\forall x\in X$ , $\sup_\alpha |f_\alpha(x)| \leq M$ for some $M < \infty$ (pointwise bounded).

Note that this is “compactness” in the sense that $\{f_\alpha\}$ almost satisfies the definition of sequential compactness. The only difference is that the limit of the subsequence does not have to lie in $C(X)$ . But clearly, if we take the closure of $\{f_\alpha\}$ , we recover compactness. This motivates the following definition:

Definition 3: A set $K \subset X$ is pre-compact if its closure is compact.

Exercise 4: Show that the following are equivalent:

$K$ is pre-compact
$K$ is sequentially pre-compact: for any sequence $f_1,f_2,\ldots \in \{f_\alpha\}$ , there exists a subsequence that converges in $C(X)$ .
$K$ is totally bounded

As we will see, the connection between this theorem and compactness is so close that the style of proof will be almost identical. I will be begin by showing the “easy” direction of Arzelà–Ascoli.

Proof of “only if”: It is clear that pointwise boundedness is necessary since otherwise, we can find some $x$ and $f_1,f_2,\ldots$ such that $f_n(x) > n$ , which clearly does not have a convergent subsequence. It will thus suffice to show the necessity of equicontinuity. To that end, suppose a set is not equicontinuous. We will proceed by constructing a non-convergent subsequence (does this look familiar?). By assumption, for some $\varepsilon$ , for any $n$ , we can find a function $f_n$ and points $x_n, y_n$ such that $||x_n - y_n|| < \frac1n$ but $|f_n(x_n) - f_n(y_n)| > \varepsilon$ . We will pick a subsequence as follows. Pick $f_{n_1} = f_1$ . Suppose we have picked $k-1$ functions. Since continuous functions on compact sets are uniformly continuous, these first $k-1$ functions as a set are in fact equicontinuous, hence $\exists \delta$ such that $||x - y|| < \delta$ but $|f_{n_i}(x) - f_{n_i}(y)| < \frac{\varepsilon}{2}$ for some $a_k$ to be picked later. Then pick $n_k$ to be the smallest integer such that $\frac1{n_k} < \delta$ and let $x_{n_k},y_{n_k}$ be described as above. By the triangle inequality, we have

$|f_{n_k}(x_{n_k}) - f_{n_k}(y_{n_k})| \leq$

$|f_{n_k}(x_{n_k}) - f_{n_i}(x_{n_k})| + |f_{n_i}(x_{n_k}) - f_{n_i}(y_{n_k})| + |f_{n_k}(y_{n_k}) - f_{n_i}(y_{n_k})|$

for all $i < k$ . Rearranging and using the equicontinuity of the first $k-1$ functions and our choice of $n_k$ , we have

$|f_{n_k}(x_{n_k}) - f_{n_i}(x_{n_k})| + |f_{n_k}(y_{n_k}) - f_{n_i}(y_{n_k})| \geq |f_{n_k}(x_{n_k}) - f_{n_k}(y_{n_k})| - \frac{\varepsilon}{2} \geq \varepsilon - \frac{\varepsilon}{2}$

which by a pigeonhole principle argument implies

$\max\{|f_{n_k}(x_{n_k}) - f_{n_i}(x_{n_k})|,|f_{n_k}(y_{n_k}) - f_{n_i}(y_{n_k})|\} \geq \frac{\varepsilon}{4}$

But this implies that each $f_{n_k}$ is distance greater than $\frac{\varepsilon}4$ from the previous functions in the uniform norm. Then this subsequence is not Cauchy and hence cannot be convergent.

This should all look a bit like a repeat of the last post. It may come at no surprise then that the standard proof of the “if” direction uses a diagonalization argument:

Proof of “if” (version 1): Let $\{f_\alpha\}$ be uniformly bounded and equicontinuous, pick a sequence $f_1,f_2,\ldots$ . Take $\varepsilon = 1$ . By equicontiunity, pick $\delta$ such that $||x - y|| < \delta \implies |f_n(x) - f_n(y)| < \frac{\varepsilon}2$ . By compactness of $X$ , using total boundedness, pick $x_1,\ldots,x_n$ such that $\forall x \in X$ , $\exists 1 \leq i \leq n$ such that $||x - x_i|| < \delta \implies |f(x) - f(x_i)| < \frac{\varepsilon}2$ . By pointwise boundedness, we can pick a subsequence such that $f_{n_{1k}}(x_i)$ converges as $k \to \infty$ for all $x_i$ . Now, having taken the $(n-1)^{th}$ such subsequence, it is clear that you can construct the $n^{th}$ such subsequence with $\varepsilon = \frac{1}{n}$ at each stage.

Exercise 5: Use the same diagonalization trick from last time to complete the proof.

After a bit of thought, I’m not a huge fan of this proof. The diagonalization argument is already employed in proving that sequential compactness is equivalent to closedness and total boundedness in metric spaces. To employ this trick again appears somewhat redundant, especially in light of the fact that we’re still essentially just talking about equivalent characterizations of compactness; why do the hard work twice? Here, I prove the “if” direction without recourse to the diagonalization argument by hiding it behind the black box of the equivalence between sequential pre-compactness and total boundedness. I believe that doing so highlights insights about the result that the usual proof obscures.

Proof of “if” (version 2): As discussed, it suffices to to show that $\{f_\alpha\}$ is totally bounded. Fix $\varepsilon$ . By equicontinuity and compactness of $X$ ,, $\exists \delta$ and finite $n$ with $x_1,\ldots,x_n \in X$ such every $x\in X$ has $|f(x) - f(x_i)| < \frac{\varepsilon}{2}$ for some $1 \leq i \leq n$ . It therefore suffices to show that the set

$K = \{y \in \mathbb R^d: \exists \alpha \text{ s.t. } y = (f_\alpha(x_1), f_\alpha(x_2),\ldots,f_\alpha(x_n))\}$

is totally bounded. But by pointwise boundedness, $\exists M < \infty$ such that
$K \subset [-M,M]^d$
and hence $K$ is totally bounded, since a bounded subset of Euclidean space is totally bounded.

Aside from giving a slightly more elegant proof, what did we gain? Let $X = \{1,2,\ldots,n\}$ . Then $C(X)$ (where $X$ has the discrete topology metrized by the discrete metric) can be identified with $\mathbb R^n$ . Furthermore, any subset of $C(X)$ is equicontinuous in the discrete metric. Thus, Arzelà–Ascoli in light of the above proof is really nothing more than a generalization of the Heine-Borel theorem. The pointwise boundedness condition is a generalization of the boundedness of Heine-Borel, and the equicontinuity can be thought of as a statement that the set of functions can be arbitrarily well approximated by the value of these functions on a finite set of points. Going in the other direction in terms of generality, we can view Arzelà–Ascoli as a special case of Tychonoff’s theorem. In particular, the boundedness means that $\{f_\alpha\}$ can be identified with a subset of the product topology $\prod_{x \in X} [M_{-x},M_x]$ , and equicontinuity is the technical condition required to upgrade convergence in the product topology to convergence in the uniform topology on $C(X)$ when restricted to this subset. We therefore have this nice nesting of conceptually related results in increasing generality: