
10. The Inverse Function Theorem and More


Last week, we looked at two different generalisations of the derivative to multiple dimensions. Although partial derivatives are convenient and familiar both conceptually and computationally, we ultimately decided that the Fréchet derivative was the better way to go. Again, the intuition is that a differentiable function is a “locally linear” function in a quantitative sense.

More specifically, let $F: \mathbb{R}^n\to \mathbb{R}^m$ be a function that’s differentiable at some $x_0\in \mathbb{R}^n$. Then, its derivative $\left. DF\right\rvert_{x_0}$ is a linear function from $\mathbb{R}^n$ to $\mathbb{R}^m$, and for $h\in \mathbb{R}^n$ sufficiently small, one has $$F\left( x_0 + h \right) \approx F\left( x_0 \right)+\left. DF\right\rvert_{x_0}(h).$$ The difference between the two sides is negligible compared to the size of $h$.
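For a concrete instance (a worked example of my own, not one from lecture), take $F:\mathbb{R}^2\to\mathbb{R}^2$ given by $F(x,y) = \left( x^2+y^2,\ xy \right)$ and $x_0 = (1,1)$. Then $$\left. DF\right\rvert_{(1,1)} = \begin{pmatrix} 2 & 2 \\ 1 & 1 \end{pmatrix}, \qquad F\left( (1,1)+(h,k) \right) \approx (2,1) + \left( 2h+2k,\ h+k \right),$$ and the error is exactly $\left( h^2+k^2,\ hk \right)$, which is quadratically small in $(h,k)$.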

This numerical estimate can be extended in a more significant way. Recall the $1$-dimensional change of variables formula. It says if $u(x):\left[ a,b \right]\to \left[ c,d \right]$ is sufficiently nice ($C^1$ with nonvanishing derivative is enough), then for any Riemann integrable $f:\left[ c,d \right]\to \mathbb{R}$, one has $$\int _{c}^{d}f(t)\ dt =\int _{a}^{b}f\left( u(x) \right)u'(x) \ dx.$$ Morally speaking, this is because $\sum f(t) \Delta t \approx \sum f\left( u(x) \right) u'(x) \Delta x$ when you partition the intervals in the right way. A box of width $\Delta t$ is distorted to a box of width approximately $u'(x) \Delta x$. Draw a picture!
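As a small worked example (mine, just for illustration): take $u(x) = x^2$ on $[1,2]$, so that $u:[1,2]\to[1,4]$ and $u'(x) = 2x$. The formula reads $$\int_1^4 f(t)\ dt = \int_1^2 f\left( x^2 \right)\cdot 2x\ dx,$$ and for, say, $f(t)=t$, both sides evaluate to $\frac{15}{2}$.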

More broadly, you will prove in lecture that given two reasonable subsets $U_1, U_2\subseteq \mathbb{R}^n$ and a diffeomorphism $\varphi : U_1\to U_2$, one has for any integrable function $f: U_2\to \mathbb{R}$ that $$\int _{U_2} f(y)\ dV = \int _{U_1} f\left( \varphi(x) \right) \cdot \left\lvert \det \left(\left. D\varphi\right\rvert_{x}\right) \right\rvert \ dV.$$ In principle, $\varphi$ scales the volume of a small box containing $x$ by a factor of $\left\lvert \det \left( \left. D\varphi\right\rvert_{x} \right) \right\rvert$, in much the same way $u$ scaled the length of a small interval near $x$ by a factor of $u'(x)$. One can say that $u$ and $\varphi$ inherit the volume-scaling properties of their derivatives.
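The most familiar instance (ordinary polar coordinates, included here as a sanity check) is $\varphi(r,\theta) = \left( r\cos\theta,\ r\sin\theta \right)$, which maps $U_1 = (0,1)\times(0,2\pi)$ onto the open unit disc $U_2$, missing only a radial slit of measure zero. One computes $$\det\left( \left. D\varphi\right\rvert_{(r,\theta)} \right) = \det \begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix} = r,$$ so the formula becomes $\int_{U_2} f\ dV = \int_{U_1} f\left( r\cos\theta, r\sin\theta \right)\, r\ dV$: a small box near $(r,\theta)$ has its area scaled by a factor of $r$.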

The next question to ask is, in what other ways does $F$ resemble its derivative? There are some qualitative questions that make sense, such as:

1. If $\left. DF\right\rvert_{x_0}$ is injective, is $F$ injective on a neighbourhood of $x_0$?
2. If $\left. DF\right\rvert_{x_0}$ is surjective, does $F$ map neighbourhoods of $x_0$ onto neighbourhoods of $F\left( x_0 \right)$?
3. If the level sets of $\left. DF\right\rvert_{x_0}$ are $d$-dimensional, are the level sets of $F$ near $x_0$ also $d$-dimensional?

For the third question, it’s not immediately obvious what it means for the level sets of $F$ to be $d$-dimensional, but there is some intuition for what $d$-dimensional subsets (as opposed to subspaces) are. For instance, the circle $\left\lbrace (x, y)\in \mathbb{R}^2 : x^2+y^2=1 \right\rbrace$ appears to be a $1$-dimensional object, while the sphere $\left\lbrace (x,y,z)\in \mathbb{R}^3 : x^2+y^2+z^2=1 \right\rbrace$ appears to be $2$-dimensional. We will revisit this question later.

The Inverse Function Theorem

Those of you who were here last quarter saw the inverse function theorem several times; these two problems provide a good look at which parts of the inverse function theorem hold when one drops the assumption of continuous derivatives. Let us state the theorem as most people know it:

Theorem 1. Inverse Function Theorem

Let $U\subseteq \mathbb{R}^n$ be open, and let $F:U\to \mathbb{R}^n$ be continuously differentiable on $U$. Let $x_0\in U$ be such that $\left. DF\right\rvert_{x_0}$ is nonsingular. Then, there exists an open neighbourhood $V$ of $x_0$ and an open neighbourhood $W$ of $F\left( x_0 \right)$ such that $F$ is a bijection $V\to W$, and its inverse $F^{-1}: W\to V$ is continuously differentiable.

This is a somewhat complicated statement, but ultimately, this theorem boils down to: if a function locally looks invertible, it’s locally invertible.

I must point out that nonsingular means invertible. When $n>1$, this is strictly different from having a nonzero derivative. In fact, there are many, many maps with singular but nonzero derivatives that are not locally invertible, such as $F:\mathbb{R}^2\to \mathbb{R}^2$ via $\left( x, y \right)\mapsto \left( x,0 \right)$.
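To see this concretely, the projection map above has $$\left. DF\right\rvert_{(x,y)} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$$ at every point: the derivative is nonzero but singular, and $F$ collapses the plane onto the $x$-axis, so it cannot be injective on any open set.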

Besides this all-too-common mistake, the following exercises highlight a few more pitfalls and consequences of this theorem.

Exercise 2.

Show that the function $f(x)=\frac{x}{2}+x^2\sin\left( \frac{1}{x} \right)$ for $x\neq 0$ and $f(0)=0$ is differentiable at $x=0$ and that $f'(0)=\frac{1}{2}$. Show that $f$ is not injective on any neighbourhood of $x=0$.

Exercise 3.

Let $F:\mathbb{R}^n\to \mathbb{R}^n$ be a continuously differentiable function such that $\left. DF\right\rvert_{x}$ is nonsingular for all $x\in \mathbb{R}^n$. Show that $F$ is an open map: that is, if $U\subseteq \mathbb{R}^n$ is open, then $F(U)$ is open.

Challenge: show that this is still true if $F$ is differentiable but not necessarily continuously differentiable.

The Implicit Function Theorem

The inverse function theorem addresses all three of the questions that were posed, in a very special scenario and assuming continuous derivatives. Let’s return to the more general question of whether or not injectivity/surjectivity are “inherited” from a function’s derivatives.

One glaring issue with the inverse function theorem that I purposefully chose not to mention earlier is that it only applies to functions whose domain and codomain have equal dimensions. Yet one might expect similar ideas to apply to the more general case. Suppose $F:\mathbb{R}^n\to \mathbb{R}^m$ is differentiable at $x_0$. If $\left. DF\right\rvert_{x_0}$ is injective, one might hope that $F$ is injective on a neighbourhood of $x_0$; if $\left. DF\right\rvert_{x_0}$ is surjective, one might hope that $F$ maps every neighbourhood of $x_0$ onto a neighbourhood of $F\left( x_0 \right)$.

These seem entirely plausible, especially if one assumes a continuous derivative. Although the inverse function theorem does not apply out-of-the-box, one can actually adapt the proof of the theorem to these two scenarios! I’ll leave this as an exercise for the committed student.

By the rank-nullity theorem, if $n< m$, it’s impossible for $\left. DF\right\rvert_{x_0}$ to be surjective. Likewise, if $n> m$, it’s impossible for $\left. DF\right\rvert_{x_0}$ to be injective.

Remark 4.

It’s hard to believe that there could even be a surjective function $\mathbb{R}\to \mathbb{R}^2$ or an injective function $\mathbb{R}^2\to \mathbb{R}$. However, the two sets have the same cardinality, and it is possible to construct set-theoretic bijections between the two.

Okay sure, you say, it’s harder yet to believe that there could be a continuous surjection $\mathbb{R}\to \mathbb{R}^2$ or a continuous injection $\mathbb{R}^2\to \mathbb{R}$. Continuous maps have to retain some idea of dimensionality, right? It so turns out that there are ways to continuously and surjectively map $\left[ 0, 1 \right]\to \left[ 0, 1 \right]\times \left[ 0,1 \right]$ via space-filling curves. That is, one can continuously and surjectively map low-dimensional spaces onto high-dimensional spaces. I am unsure if one can continuously and injectively go the other way.

The answer to both of the questions posed above is yes. We are not equipped to prove this in general, but we can look at a very specific scenario that we are well-equipped to handle.

Theorem 5. Implicit Function Theorem

Let $F:\mathbb{R}^n\to \mathbb{R}$ be a continuously differentiable function. Suppose $\vec v = \left( v_1,\ldots, v_n \right)\in \mathbb{R}^n$ such that $F\left( \vec v \right) = 0$ and $\frac{\partial F}{\partial x_n}\left( \vec v \right)\neq 0$. Then, there is an open subset $U\subseteq \mathbb{R}^{n-1}$ containing $\left( v_1,\ldots, v_{n-1} \right)$ and a continuously differentiable function $f: U\to \mathbb{R}$ such that $f\left( v_1,\ldots, v_{n-1} \right) = v_n$ and $$F\left( x_1,\ldots, x_{n-1}, f\left( x_1,\ldots, x_{n-1} \right) \right) = 0$$ for all $\left(x_1,\ldots, x_{n-1}\right)\in U$.

In words, what this theorem says is that $F^{-1}(0)$ looks like the graph of a function $f : \mathbb{R}^{n-1}\to \mathbb{R}$ near $\vec v$. This graph should be an $(n-1)$-dimensional object!
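As a sanity check (this is just the circle from earlier, with the details filled in by hand), take $F(x,y) = x^2+y^2-1$ and $\vec v = (0,1)$. Then $\frac{\partial F}{\partial y}(0,1) = 2 \neq 0$, and on $U = (-1,1)$ the function $f(x) = \sqrt{1-x^2}$ satisfies $f(0)=1$ and $F\left( x, f(x) \right) = 0$: near $(0,1)$, the unit circle is the graph of $f$.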

Based on the title of today’s discussion, it seems as though this is a consequence of the inverse function theorem. Indeed it is, but we need to circumvent the dimensional mismatch problem from earlier.

Proof

Define the function $G: \mathbb{R}^{n}\to \mathbb{R}^{n}$ by $$G \left( \vec x \right) = \left( x_1,\ldots, x_{n-1}, F\left( \vec x \right) \right).$$ $G$ is continuously differentiable — it has continuous first-order partial derivatives in every component. Moreover, with $\vec v$ as above, $$\left. DG\right\rvert_{\vec v} = \begin{pmatrix} 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \\ \frac{\partial F}{\partial x_1} & \frac{\partial F}{\partial x_2} & \cdots & \frac{\partial F}{\partial x_{n-1}} & \frac{\partial F}{\partial x_n} \end{pmatrix}$$ is nonsingular (it is lower triangular, and all of its diagonal entries are nonzero). So, by the inverse function theorem, it must have a continuously differentiable inverse, say $H$, defined on a neighbourhood of $G \left( \vec v \right)$.

Write $H = \left( h_1,\ldots, h_n \right)$, where each $h_i$ is defined on a neighbourhood of $G\left(\vec v\right)$. By using the fact that $G\circ H$ is the identity, we actually get that $h_1\left( \vec x \right) = x_1$, and likewise for $h_2,\ldots, h_{n-1}$. In other words, $H\left( \vec x \right) = \left( x_1,\ldots, x_{n-1}, h_n \left( \vec x \right) \right)$.

Since $\left( v_1,\ldots, v_{n-1}, 0 \right) = G \left( \vec v \right)$, as long as $\left( x_1,\ldots, x_{n-1} \right)$ is close enough to $\left(v_1,\ldots, v_{n-1}\right)$, $h_n \left( x_1,\ldots, x_{n-1}, 0 \right)$ will be defined. Moreover, one has by construction that $$G\circ H\left( x_1,\ldots, x_{n-1}, 0 \right) = \left( x_1,\ldots, x_{n-1}, 0 \right).$$ Unpacking the last coordinate of $G\circ H$ yields $$F\left( x_1,\ldots, x_{n-1}, h_n \left( x_1,\ldots, x_{n-1}, 0 \right) \right) = 0.$$ Taking $f\left( x_1,\ldots, x_{n-1} \right) = h_n\left( x_1,\ldots, x_{n-1}, 0 \right)$, this is the desired result! $\square$
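To see the construction in action (a small example of my own, not part of the proof), run the argument on the circle again: with $F(x,y) = x^2+y^2-1$ and $\vec v = (0,1)$, one has $G(x,y) = \left( x, x^2+y^2-1 \right)$. Near $(0,1)$, the inverse is $H(a,b) = \left( a, \sqrt{1+b-a^2} \right)$, so $h_2(a,0) = \sqrt{1-a^2}$, which recovers the function $f(a) = \sqrt{1-a^2}$ from before.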

It should be added that one cannot weaken any of these conditions, especially the condition that the $n$-th partial is nonzero. Draw a picture to see what could happen when the $n$-th partial is zero!

Bonus: The Rank Theorem

I don’t think I’ll have time to talk about this during discussion, but the rank theorem is a big part of the picture that gives a unified answer to the question: to what extent does a function resemble its derivative?

The answer is, given continuity of the derivative, the resemblance is extremely strong. If $F:\mathbb{R}^n\to \mathbb{R}^m$ is continuously differentiable in a neighbourhood $U$ around $\vec x_0$ and $DF$ has rank $r$ on this neighbourhood, then there is a change of coordinates such that $F$ looks like $DF$.

Specifically, there are open neighbourhoods $V$ of $x_0$ and $W$ of $F\left( x_0 \right)$, open neighbourhoods $V'\subseteq \mathbb{R}^n$ and $W'\subseteq \mathbb{R}^m$ both containing the origin, and $C^1$ functions $\phi : V\to V'$ and $\psi : W\to W'$ with $C^1$ inverses such that $\psi \circ F\circ \phi^{-1}: V' \to V \to W \to W'$ is given by $$\left( x_1,\ldots, x_n \right) \mapsto \left( x_1,\ldots, x_r, 0,\ldots, 0 \right)$$ (where the number of zeroes, possibly none, is chosen to match the dimension $m$).

In words, what this is saying is that you can move the origin in $\mathbb{R}^n$ to $\vec x_0$ and the origin in $\mathbb{R}^m$ to $F\left( \vec x_0 \right)$, then wiggle the standard coordinate axes so that $F$ does look like a rank $r$ linear map.
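Here is a toy example of my own (not one from lecture) with $n = m = 2$ and $r = 1$: let $F(x,y) = \left( x+y,\ (x+y)^2 \right)$ and $\vec x_0 = (0,0)$. Its derivative $$\left. DF\right\rvert_{(x,y)} = \begin{pmatrix} 1 & 1 \\ 2(x+y) & 2(x+y) \end{pmatrix}$$ has rank $1$ everywhere. Taking $\phi(x,y) = \left( x+y,\ y \right)$ and $\psi(a,b) = \left( a,\ b-a^2 \right)$, both of which are $C^1$ with $C^1$ inverses and fix the origin, one computes $$\psi\circ F\circ \phi^{-1}(u,v) = \psi\left( F(u-v,\ v) \right) = \psi\left( u,\ u^2 \right) = (u, 0),$$ which is exactly the rank $1$ linear map promised by the theorem.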

This strictly generalises both the inverse function theorem and the implicit function theorem; in fact, it relies on the technique of padding $F$ in such a way that the problem gets reduced to analysing a map $\mathbb{R}^k\to \mathbb{R}^k$ with nonsingular derivative.

Indeed, the moral of this story is: continuously differentiable functions behave locally just like their derivatives!