Hunter Liu's Website

10. Week 9: On Convolutions and On Derivatives


There are two small and mostly disjoint things I’d like to address today, as suggested by the title.

An Application of Convolutions

Recall the following setup: one has a “convolution kernel” or “mollifier” \(\varphi : \mathbb{R}\to \mathbb{R}\) satisfying the following properties:

  1. \(\varphi\) is smooth (or at least continuous);
  2. \(\varphi (x)\geq 0\) for all \(x\in \mathbb{R}\);
  3. \(\varphi \equiv 0\) on \(\mathbb{R}\setminus (-1, 1)\);
  4. \(\int _{\mathbb{R}} \varphi (x)\, dx = 1\).

One defines the sequence of rescalings \(\varphi _n(x) = n \varphi (nx)\), which preserves the nonnegativity and the integral, but now makes it so that \(\varphi _n\equiv 0\) on \(\mathbb{R}\setminus \left( -\frac{1}{n},\frac{1}{n} \right)\).

Then, as was proven in lecture (hopefully), one always has for any continuous function \(f : \mathbb{R}\to \mathbb{R}\) that the convolutions \(\varphi _n * f\to f\) uniformly on any compact subset of its domain. Moreover, if \(f\) is also uniformly continuous, then the convergence is uniform on \(\mathbb{R}\).
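To see this in action, here is a minimal numerical sketch (my own, not from lecture): the particular bump kernel, the test function \(f(x) = \left\lvert \sin (3x) \right\rvert\), and the crude Riemann-sum integrals are all illustrative choices, but the shrinking sup-norm errors on a compact interval are exactly the convergence described above.

```python
import numpy as np

# A standard bump-function mollifier: smooth, nonnegative, zero outside (-1, 1).
def bump(x):
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    inside = np.abs(x) < 1
    out[inside] = np.exp(-1.0 / (1.0 - x[inside] ** 2))
    return out

# Normalize numerically so that phi integrates to 1 over R.
grid = np.linspace(-1, 1, 4001)
Z = np.sum(bump(grid)) * (grid[1] - grid[0])
phi = lambda x: bump(x) / Z

# The rescalings phi_n(x) = n * phi(n x), supported in (-1/n, 1/n).
phi_n = lambda x, n: n * phi(n * x)

# A continuous (but not smooth) test function.
f = lambda x: np.abs(np.sin(3 * x))

# (phi_n * f)(x) = \int phi_n(y) f(x - y) dy, approximated by a Riemann sum.
def conv(x, n, pts=2001):
    y = np.linspace(-1.0 / n, 1.0 / n, pts)
    return np.sum(phi_n(y, n) * f(x - y)) * (y[1] - y[0])

xs = np.linspace(-2, 2, 201)               # a compact piece of the domain
for n in (1, 4, 16, 64):
    fn = np.array([conv(x, n) for x in xs])
    print(n, np.max(np.abs(fn - f(xs))))   # sup-norm error shrinks as n grows
```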

An anonymous student emailed me over the long weekend asking about why these convolutions are important at all. After all, they’re an extremely explicit and somewhat cumbersome method of producing uniform approximations, especially when we already know that polynomials are dense.

However, Weierstrass’ theorem only applies when the domain is compact. Moreover, the explicit nature of these convolutions (and specifically the algebraic properties of the convolution operator) is what gives this method an advantage. Here’s an example to demonstrate this:

Problem 1.

Suppose \(f:\mathbb{R}\to \mathbb{R}\) is continuous and satisfies \[f(x) = \frac{1}{2} \left( f(x+h) + f(x-h) \right)\] for any \(x, h\in \mathbb{R}\). Show that \(f\) is a line, i.e. \(f(x) = ax + b\) for some constants \(a, b\).

An idea is to say that if \(f\) is a polynomial, then this is obvious; then take a sequence of polynomials (which must be lines) uniformly converging to \(f\). However, when we uniformly approximate \(f\) by a polynomial, the functional equation dissolves into a fine dust, and our argument is halted in its tracks. But otherwise we’re kind of out of options, and I can’t think of any solutions that use only tools like the intermediate value theorem.

We have a little bit more to work with if we assume \(f\) is smooth: one can take a second order Taylor expansion of the right hand side and get \[f(x) = \frac{1}{2}\left( f(x) + f’(x) h + \frac{1}{2}f’’\left( \xi _+ \right) h^2 + f(x) - f’(x)h + \frac{1}{2} f’’\left( \xi _- \right) h^2 \right).\]

Here, we have \(x - h < \xi _- < x < \xi _+ < x + h\). Collecting terms and rearranging, we have \[f(x) = f(x) + \frac{h^2}{4} \left( f’’\left( \xi _+ \right) + f’’ \left( \xi _- \right) \right).\] Now if \(f’’\) is continuous and \(f’’(x)\neq 0\), then for \(h\) sufficiently small, both \(f’’\left( \xi _- \right)\) and \(f’’\left( \xi _+ \right)\) will have the same sign as \(f’’(x)\). But then the last term is nonzero, which contradicts the equality above.

It follows that if \(f\) is smooth (even just twice continuously differentiable), then \(f’’ \equiv 0\). By applying the fundamental theorem of calculus, it follows that \(f’\) is a constant and \(f\) is a line.

This is where convolution comes in: it allows us to make the assumption that \(f\) is smooth and also keep the functional equation in the problem. Indeed, if \(f_n = \varphi _n * f\), we know that \(f_n \to f\) uniformly on compact subsets of \(\mathbb{R}\), and by expanding the definition of convolution we have \[\begin{align*} f_n(x) &= \int \varphi _n(y) f(x-y) dy \\ &= \int \varphi _n(y) \cdot \frac{1}{2} \left( f(x+h-y) + f(x-h-y) \right) dy \\ &= \frac{1}{2} \left( \int \varphi _n(y) f(x+h-y) dy + \int \varphi _n(y) f(x-h-y) dy \right) \\ &= \frac{1}{2} \left( f_n(x+h) + f_n(x-h) \right). \end{align*} \]

This holds for any \(x, h\in \mathbb{R}\). Moreover, since \(\varphi _n\) is smooth, so is \(f_n\). Thus, by the argument above, \(f_n\) is a line for every \(n\), and it follows that \(f\) must be a line too (this last step is left as an exercise).

This is magic: not only do we get to approximate \(f\) by something smooth, but we also get to keep the functional equation that \(f\) satisfies! Contrast this with some of the examples we saw last week, where approximating by some non-explicit function causes us to lose some assumptions about \(f\).

Some remarks to make:

  1. If \(f : [0, 1] \to \mathbb{R}\), for instance, one has to be careful with what domain \(\varphi _n * f\) is even defined on. None of these convolutions are defined at the endpoints, in particular, and one has to either (a) just live with it or (b) carefully extend \(f\) to a slightly larger domain before convolving.
  2. This kind of argument is frequently used when studying differential equations because of identities like \[\frac{d}{dx} \left( \varphi _n * f \right)(x) = \left( \varphi _n * f’\right)(x) = \left( \varphi _n’ * f \right)(x),\] where the middle expression requires \(f\) to be differentiable but the last one does not; a crude numerical check of the outer equality is sketched below. Additionally, convolutions behave well when interacting with the Fourier transform, though I don’t think I should say any more.
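Here is a hypothetical numerical sanity check of the identity \(\frac{d}{dx}\left( \varphi _n * f \right) = \varphi _n’ * f\). The bump kernel \(k\), the test function \(f\), and every parameter below are my own illustrative choices; the integrals are Riemann sums, the left-hand derivative is a finite difference, and the kernel is deliberately left unnormalized since the identity does not require it. This is a sketch of the identity, not a proof.

```python
import numpy as np

# Sketch: check d/dx (k_n * f) = (k_n' * f) numerically, where k_n(x) = n k(n x)
# for a smooth bump k supported in (-1, 1).  (Normalization is irrelevant here.)

def k(x):
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    m = np.abs(x) < 1
    out[m] = np.exp(-1.0 / (1.0 - x[m] ** 2))
    return out

def k_prime(x):  # the analytic derivative of the bump
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    m = np.abs(x) < 1
    out[m] = k(x[m]) * (-2.0 * x[m]) / (1.0 - x[m] ** 2) ** 2
    return out

f = lambda x: np.abs(np.sin(3 * x))   # a continuous test function

def conv(kernel, x, n, pts=4001):
    # \int kernel_n(y) f(x - y) dy by Riemann sum, where kernel_n(y) = n * kernel(n y)
    y = np.linspace(-1.0 / n, 1.0 / n, pts)
    return np.sum(n * kernel(n * y) * f(x - y)) * (y[1] - y[0])

n, h, x0 = 8, 1e-4, 0.3
lhs = (conv(k, x0 + h, n) - conv(k, x0 - h, n)) / (2 * h)   # d/dx (k_n * f)(x0)
rhs = conv(lambda t: n * k_prime(t), x0, n)                 # (k_n' * f)(x0), as k_n'(y) = n^2 k'(n y)
print(lhs, rhs)   # the two numbers agree up to discretization error
```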

What does Continuously Differentiable Mean?

Let me start this section off with the following question: suppose \(X\) is a metric space and \(f : X\to \mathbb{R}^n\) is continuous (with respect to the standard Euclidean metric on \(\mathbb{R}^n\)). Let’s write \(f = \left( f_1, \ldots, f_n \right)\) in terms of its component functions. Is each component continuous? Conversely, if each component is continuous, is \(f\) itself continuous too?

Question 2.

Recall \(\ell^2\) is the space of all sequences \(\left\lbrace x_n \right\rbrace\) of real numbers such that \(\sum _{n=1}^{\infty} \left\lvert x_n \right\rvert^2 < \infty\). Suppose \( f : X\to \ell^2\), and let us write \(f\) in terms of its component functions \(f_1, f_2, f_3, \ldots\).

If \(f\) is continuous, are each of the \(f_n\)’s continuous? If each of the \(f_n\)’s is continuous, is \(f\) continuous too?

For \(f : X\to \mathbb{R}^n\), both answers are yes, and there’s not really much to think about. But for \(f: X\to \ell^2\), only one direction is a yes. The converse is, quite sadly, false. Try thinking of a counterexample.

The reason is quite clear. When \(f :X \to \mathbb{R}^n\), continuity of \(f\) means that for every \(\epsilon > 0\), there exists a \(\delta > 0\) such that \(d_X(x, y) < \delta \) implies \[\left\lVert f(x) - f(y) \right\rVert_2 = \sqrt{\sum _{j=1}^{n} \left( f_j(x) - f_j(y) \right)^2} < \epsilon .\] Since everything under the square root is nonnegative, this implies that \(\left\lvert f_j(x) - f_j(y) \right\rvert < \epsilon \) for each \(j\), i.e. that every component of \(f\) is continuous.

Conversely, if \(f_1,\ldots, f_n\) are all continuous, then for every \(\epsilon > 0\) there exist \(\delta _1, \ldots, \delta _n > 0\) such that \(d_X\left( x, y \right) < \delta _j\) implies that \(\left\lvert f_j(x) - f_j(y) \right\rvert < \epsilon \). Taking \(\delta = \min \left\lbrace \delta _1, \ldots, \delta _n \right\rbrace\), we have \(d_X (x, y) < \delta \) implies \[\max _{j=1,\ldots, n} \left\lvert f_j(x) - f_j(y) \right\rvert < \epsilon \implies \left\lVert f(x) - f(y) \right\rVert _2 < \epsilon \sqrt n.\] Notice here, by the way, that the left side is itself a norm: it’s essentially a metric that “respects” scalars. Specifically, we have \(\left\lVert v \right\rVert_\infty = \max _{j=1,\ldots, n} \left\lvert v_j \right\rvert\) (where of course \(v_j\) are the components of a vector \(v\)).
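The implication in the last display uses nothing more than the elementary comparison between the \(\infty\)-norm and the Euclidean norm, spelled out here for completeness: for any \(v\in \mathbb{R}^n\), \[\left\lVert v \right\rVert_\infty = \max _{j=1,\ldots,n} \left\lvert v_j \right\rvert \leq \sqrt{\sum _{j=1}^{n} v_j^2} = \left\lVert v \right\rVert_2 \leq \sqrt{n \cdot \max _{j=1,\ldots,n} v_j^2} = \sqrt{n}\, \left\lVert v \right\rVert_\infty .\]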

From each norm we get a metric, e.g. \(d_\infty(v, w) = \left\lVert v-w \right\rVert_\infty\), and so we’ve shown above that continuity with respect to the Euclidean metric is the same thing as continuity with respect to the “uniform metric” on \(\mathbb{R}^n\). Evidently, this argument fails for \(\ell^2\).

Theorem 3. All norms on \(\mathbb{R}^n\) are equivalent

Let \(\left\lVert \cdot \right\rVert_a\) and \(\left\lVert \cdot \right\rVert_b\) be two norms on \(\mathbb{R}^n\), and let \(d_a(v, w) = \left\lVert v-w \right\rVert_a\) and \(d_b(v, w) = \left\lVert v-w \right\rVert_b\) be the induced metrics on \(\mathbb{R}^n\). Then \(f : X\to \mathbb{R}^n\) is continuous with respect to \(d_a\) if and only if \(f\) is continuous with respect to \(d_b\).

Proof Idea
Let \(e_1,\ldots, e_n\) be the standard basis for \(\mathbb{R}^n\). It suffices to compare each norm against the \(\infty\)-norm from before. Define the constant \[C_a = \sum _{j=1}^{n} \left\lVert e_j \right\rVert_a.\] Writing \(v = \sum _j v_j e_j\) and using the triangle inequality, show that \(\left\lVert v \right\rVert_a \leq C_a \left\lVert v \right\rVert_\infty\) for every \(v\). This inequality says that \(v\mapsto \left\lVert v \right\rVert_a\) is a continuous function on \(\left( \mathbb{R}^n, d_\infty \right)\); since the unit sphere \(\left\lbrace v : \left\lVert v \right\rVert_\infty = 1 \right\rbrace\) is compact, \(\left\lVert \cdot \right\rVert_a\) attains a minimum value \(c_a > 0\) on it, and rescaling gives \(c_a \left\lVert v \right\rVert_\infty \leq \left\lVert v \right\rVert_a\) for every \(v\). Doing the same for \(\left\lVert \cdot \right\rVert_b\) and chaining the inequalities, conclude that the identity maps between metric spaces \[\begin{align*} \left( \mathbb{R}^n, d_a \right) \to \left( \mathbb{R}^n, d_b \right) && \textrm{and} && \left( \mathbb{R}^n, d_b \right)\to \left( \mathbb{R}^n, d_a \right)\end{align*}\] are Lipschitz, hence continuous.

This is also true when \(\mathbb{R}^n\) is the domain rather than the codomain: think through what changes in our argument!

The point is, when we’re working with Euclidean space, there are a bunch of different ways to characterise continuity, and some notions of continuity are more suited to certain scenarios than others. This theorem says that it doesn’t matter what you mean by “continuous function into \(\mathbb{R}^n\)” or “continuous function on \(\mathbb{R}^n\)”, as long as your notion of distance is based on a norm, it’s all the same thing.

For us, this is particularly important when we talk about derivatives in several variables. Recall that given a function \(f : \mathbb{R}^n\to \mathbb{R}^m\) and a point \(x_0\in \mathbb{R}^n\), the derivative of \(f\) at \(x_0\) is the unique linear function \(L: \mathbb{R}^n\to \mathbb{R}^m\) satisfying \[\lim _{\left\lVert h \right\rVert\to 0} \frac{\left\lVert f\left( x_0 + h \right) - f\left( x_0 \right) - L h \right\rVert}{ \left\lVert h \right\rVert} = 0.\] We often denote \(L\) by \(f’\left( x_0 \right)\) or \(\left. Df\right\rvert_{x_0}\). (I will stick with the latter, since \(f’\) is kind of ambiguous.)

First and foremost, this coincides completely with our usual definition of the derivative when \(f: \mathbb{R}\to \mathbb{R}\): all the norm bars go away, and the linear function \(Lh\) is just multiplication of \(h\) by a constant (i.e. \(f’\left( x_0 \right)\)). Thus, \(f’\) too is a function \(\mathbb{R}\to \mathbb{R}\), and we know exactly what it means for \(f’\) to be continuous, no questions asked.

When \(f : \mathbb{R}^n\to \mathbb{R}^m\) instead, the derivative is a linear transformation \(\left. Df\right\rvert_{x_0} : \mathbb{R}^n\to \mathbb{R}^m\), and we may think of it as an \(m\times n\) matrix of real numbers at every point. Thus really the derivative is a map \(Df : \mathbb{R}^n\to \mathbb{R} ^{m\times n}\).

This is not a good way to think about the derivative because it’s basis dependent. (It would be a great shame if the continuity of our derivative was basis dependent, for instance.) So instead, one often describes the codomain of \(Df\) as \(L \left( \mathbb{R}^n, \mathbb{R}^m \right)\), the space of linear transformations \(\mathbb{R}^n\to \mathbb{R}^m\). This is a real vector space of dimension \(mn\), and we give it the operator norm \[\left\lVert A \right\rVert _{\operatorname{op}} = \sup \left\lbrace \left\lVert Ax \right\rVert_2 : \left\lVert x \right\rVert_2 = 1 \right\rbrace.\] One can check that this really is a norm, it’s basis-independent, and it works really well with the algebraic structure of \(L \left( \mathbb{R}^n, \mathbb{R}^m \right)\). In any proof involving the words “continuously differentiable” in this chapter, we will almost certainly use this norm to characterise continuity. It even fits right into the way we express differentiability in our definition…

However, can you imagine verifying that a function \(Df : \mathbb{R}^n\to L \left( \mathbb{R}^n, \mathbb{R}^m \right)\) is continuous with respect to this horrible metric? I think I would struggle to even compute the operator norm of a single matrix, and that’s on a good day.
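To be fair, a computer has less trouble: with respect to the Euclidean norms, the operator norm of a matrix is its largest singular value, which numpy computes directly. Here is a small sketch (the matrix \(A\) is an arbitrary choice of mine) comparing it against the brute-force definition and against the entrywise norms; of course, this does nothing to make the analytic verification of continuity any easier.

```python
import numpy as np

# The operator norm (w.r.t. Euclidean norms on domain and codomain) equals the
# largest singular value of the matrix; numpy exposes it as the matrix 2-norm.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0]])        # an arbitrary 2x3 example

op   = np.linalg.norm(A, 2)            # operator norm = largest singular value
frob = np.linalg.norm(A)               # Euclidean norm of A viewed as a vector in R^{2*3}
maxe = np.max(np.abs(A))               # the corresponding infinity-norm

# Brute-force check against the definition sup{ ||A x||_2 : ||x||_2 = 1 }.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 20000))
x /= np.linalg.norm(x, axis=0)         # 20000 random unit vectors in R^3
brute = np.max(np.linalg.norm(A @ x, axis=0))

print(op, brute)                       # brute-force sup approaches op from below
print(maxe <= op <= frob)              # True: a concrete comparison of these norms
```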

Thus the purpose of our preceding discussion surfaces. By selecting any bases of \(\mathbb{R}^n\) and \(\mathbb{R}^m\), we get a (basis-dependent) linear isomorphism \(L \left( \mathbb{R}^n, \mathbb{R}^m \right) \cong \mathbb{R} ^{m\times n}\), i.e. by representing each linear map by its matrix. We may then give \(\mathbb{R} ^{m\times n}\) (and therefore \(L \left( \mathbb{R}^n, \mathbb{R}^m \right)\)) either the standard Euclidean norm \(\left\lVert \cdot \right\rVert_2\) or the \(\infty\)-norm we described earlier. This induces a metric on \(L \left( \mathbb{R}^n, \mathbb{R}^m \right)\), where the distance between two linear maps is the distance between their matrices with respect to whichever norm we picked.

In the standard bases of \(\mathbb{R}^n\) and \(\mathbb{R}^m\), the entries of \(Df\) are merely the partial derivatives of the components of \(f\). Thus, continuity of \(Df\) with respect to this mysterious operator norm is the exact same thing as the continuity of each partial derivative of \(f\).
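For instance, here is a made-up map \(f : \mathbb{R}^2\to \mathbb{R}^2\) of my own choosing, together with its Jacobian matrix in the standard bases; the finite-difference comparison below is only a sketch illustrating that the entries really are the partial derivatives of the components.

```python
import numpy as np

# An illustrative map f : R^2 -> R^2 and its Jacobian in the standard bases:
# the (i, j) entry is the partial derivative of f_i in the j-th variable.
f = lambda x, y: np.array([x**2 * y, np.sin(x) + y])
Df = lambda x, y: np.array([[2 * x * y, x**2],
                            [np.cos(x), 1.0]])

# Finite-difference approximation of each partial derivative at a point.
def Df_numeric(x, y, h=1e-6):
    dx = (f(x + h, y) - f(x - h, y)) / (2 * h)   # partials in the first variable
    dy = (f(x, y + h) - f(x, y - h)) / (2 * h)   # partials in the second variable
    return np.column_stack([dx, dy])

x0, y0 = 0.7, -1.2
print(Df(x0, y0))
print(Df_numeric(x0, y0))   # entries agree up to O(h^2), as expected
```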

Thus, if \(f\) is continuously differentiable (i.e. \(Df : \mathbb{R}^n\to L \left( \mathbb{R}^n, \mathbb{R}^m \right)\) is continuous), then every partial derivative of \(f\) is continuous too, and even every directional derivative of \(f\) is continuous. This seems intuitive and obvious, and perhaps one can furnish a direct proof of this with some time. But what’s really great is

Theorem 4.

Suppose \(f : \mathbb{R}^n\to \mathbb{R}^m\) has all continuous first-order partial derivatives. Then \(Df\) exists everywhere and \(f\) is continuously differentiable.

The hard part is the existence of \(Df\): we’ve seen several examples of functions with all directional derivatives on their domains but no full derivative at a point. But once you get existence, you get for free that \(Df\) is continuous, simply because you know the entries of its matrix are continuous at every point.