
2. Root-Finding Algorithms


In class, you learned about several different methods of solving equations, either using a root-finding algorithm or a fixed point algorithm. These algorithms are essentially equivalent in the sense that they address the same kind of problem (numerically solving an equation), but they differ vastly in their applicability. Specifically, you should have learned about:

We’ll discuss the first and the third points, both from a theoretical and practical perspective, as well as some important concepts pertaining to rates and orders of convergence.

Inline Functions and Functions as Parameters in MATLAB

Something I neglected to talk about during our discussion last week was the concept of inline functions or anonymous functions. These functions are declared on a single line without all the crazy syntax of creating a function.

The syntax for declaring an anonymous function is:

FUNCTION_NAME = @( INPUTS ) OUTPUT; 

You are only allowed to have one output, and it must fit on a single line. For instance, if I wanted to create a function \(f\) that takes a number \(x\) as input and returns the value \(e ^{3x-\sin x}\), I would write

f = @(x) exp(3 * x - sin(x)); 

You can sort of cheat with the “only one output” thing and return a vector or matrix, but this is an easy way to make mistakes if you’re not careful about the order of the entries!
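For example, here is a minimal sketch of an anonymous function returning a \(1\times 2\) vector (the name trig_pair is purely illustrative):

% returns both cos(x) and sin(x), packed into a 1x2 row vector
trig_pair = @(x) [cos(x), sin(x)];
values = trig_pair(pi / 4);   % values(1) is cos(pi/4), values(2) is sin(pi/4)

Swapping the order of the entries inside the brackets silently changes which value is which, so keep track of the order carefully.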

You can have multiple inputs, and these would correspond to functions of many variables. For instance, if I wanted to define the function \(g(x, y) = x^2 + \sin(xy)\), I could write

g = @(x, y) x^2 + sin(x * y); 

Finally, you can pass functions as inputs to other functions (even other anonymous functions)! This is helpful when you want to run the same algorithm (e.g. fixed point iteration) with multiple different functions. For instance, the following function creates a graph of the input \(f\) on the domain \(\left[ x_1, x_2 \right]\) using \(N\) points.

function plot_graph(f, x1, x2, N)
    x = linspace(x1, x2, N);
    y = f(x);
    plot(x, y);
end

Note that there are no outputs in this function, which is also something I neglected to mention last week. This is how MATLAB emulates void functions from other languages!
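For example, one could call it like this (the particular function and domain are just for illustration; note the elementwise operator, for the reasons in the warning below):

% plot sin(x) .* exp(-x) on [0, 10] using 200 sample points
plot_graph(@(x) sin(x) .* exp(-x), 0, 10, 200);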

Warning 1.

Be very careful around using * vs. .* (likewise, ^ vs. .^ and / vs. ./) when you define inline functions. It’s common to apply a function to a vector or matrix as in the above example, and using the wrong operator can cause your code to break.
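As a quick illustration of the warning (the names h_good and h_bad are just for demonstration):

x = linspace(0, 1, 100);    % a 1x100 row vector
h_good = @(x) x.^2 + 1;     % elementwise square; works on vectors
h_bad  = @(x) x^2 + 1;      % matrix power; errors on a 1x100 row vector
y = h_good(x);              % fine
% y = h_bad(x);             % would throw an error, since x is not a square matrix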

Fixed Point Iteration

Given an equation of the form \(f(x)=0\), we can equivalently solve \(g(x) = x\), where \(g(x) = f(x)+x\). In words, finding a root of \(f(x)\) is the same as finding a fixed point of \(f(x)+x\). One way to do this is to guess a solution \(p_0\), then recursively define \[p _{n+1}= g\left( p_n \right)\] for \(n\geq 0\).

Theorem 2.

If \(g\) is continuously differentiable, \(x_*\) is a fixed point of \(g\), and \(\left\lvert g’\left( x_* \right) \right\rvert < 1\), then \(p_n\) converges to \(x_*\) as \(n\to\infty\) if \(p_0\) is sufficiently close to \(x_*\).

Let’s write some MATLAB code for this algorithm. For this discussion, we’ll be writing a function that takes (1) a function \(g\) to iterate, (2) a starting guess \(p_0\), and (3) some number of iterations \(N\) to perform. We will return a \(1\times N\) row vector containing the sequence \(p_1,\ldots, p_N\).

function sequence = fixed_point_iteration(g, p0, N)
    sequence = zeros(1, N);

    % compute p_1 and store it in the first index
    sequence(1, 1) = g(p0);

    for i = 2:N
        sequence(1, i) = g(sequence(1, i - 1));
    end % of for loop
end % of function
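For example, iterating \(g(x) = \frac{1}{2}\left( x + \frac{2}{x} \right)\), whose fixed point is \(\sqrt 2\), might look like this:

g = @(x) (x + 2 ./ x) / 2;              % fixed point at sqrt(2)
sequence = fixed_point_iteration(g, 1, 20);
sequence(end)                           % approximately 1.41421356...

Here \(g’\left( \sqrt 2 \right) = 0\), so Theorem 2 guarantees convergence for starting guesses near \(\sqrt 2\).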

Something to point out is that this method is not guaranteed to work if the iteration function \(g\) is chosen poorly.

First, consider the equation \(\cos x = -x\), which has the solution \(x_* = -0.7390851332…\). Equivalently, we would be solving the equation \(\cos x + x = 0\); adding \(x\) to both sides turns this into the fixed point problem \(\cos x + 2x = x\). So, we could attempt to solve this by running fixed point iteration on the function \(g(x) = \cos x + 2x\).

Exercise 3.

Write MATLAB code that runs 50 iterations of the fixed point iteration method on the function \(g(x) = \cos x + 2x\) with the following initial guesses:

  1. \(p_0 = 0\).
  2. \(p_0 = 1\).
  3. \(p_0 = -0.7\).
  4. \(p_0 = -0.73909\).

What happens?

You should find that in all four cases, the sequence \(p_n\) blows up, i.e. \(\left\lvert p_n \right\rvert\to\infty\). This makes us moderately depressed.

Of course, it’s better to run fixed point iteration on the equivalent equation \(-\cos x = x\), i.e. to iterate \(g(x) = -\cos x\). Since \(\left\lvert g’\left( x_* \right) \right\rvert = \left\lvert \sin\left( x_* \right) \right\rvert \approx 0.67 < 1\), this does not have the convergence issues that the above example does. Although somewhat contrived, the point is that there are often many ways to rewrite an equation as a fixed point problem, and this choice matters a lot!
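As a quick sanity check, reusing fixed_point_iteration from above:

g = @(x) -cos(x);
sequence = fixed_point_iteration(g, 0, 50);
sequence(end)   % approximately -0.7390851332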

A sequence blowing up to \(\infty\) is not the only way that fixed point iteration can fail.

Exercise 4.

Consider the equation \[-1.47665369898x^2 + 1.66944251811x + 1.41421356237 = 0.\] Run 20 iterations of the fixed point iteration algorithm with \[g(x) = -1.47665369898x^2 + 2.66944251811x + 1.41421356237\] and \(p_0 = 0\). What happens?

In both of these examples, the primary issue is that the derivative at the fixed point is too big. Formally, this phenomenon can be described by the following exercise:

Exercise 5. (Section 2.2, Exercise 19)

Let \(g\) be continuously differentiable, and let \(x_*\) be a fixed point of \(g\). Suppose \(\left\lvert g’\left( x_* \right) \right\rvert > 1\). Show that for any \(x\) sufficiently close to \(x_*\), one has \(\left\lvert x - x_* \right\rvert < \left\lvert g(x) - x_* \right\rvert\). How does this show that fixed point iteration will generally fail under these assumptions?

Newton’s Method and the Secant Method

Let’s start with Newton’s method. Again, one wants to solve \(f(x) = 0\), and we need a starting guess \(p_0\). One can use the first-order Taylor approximation \[f(x) \approx f\left( p_0 \right)+f’\left( p_0 \right)\left( x-p_0 \right).\] Since we want \(f(x) = 0\), we end up with \[f\left( p_0 \right)+f’\left( p_0 \right)\left( x-p_0 \right) = 0 \implies x = p_0 - \frac{f\left( p_0 \right)}{f’\left( p_0 \right)}.\]
This is not an exact solution, but it will generally get us closer to the root than where we started. Thus, the algorithm iterates this idea to construct the sequence \[p _{n+1} = p_n - \frac{f\left( p_n \right)}{f’\left( p_n \right)}.\] Unlike fixed point iteration, the barrier to convergence occurs when \(f’ \approx 0\) close to the solution. In fact, this is more or less the only possible way convergence can fail:

Theorem 6.

If \(f\) is \(C^2\) (twice continuously differentiable), \(f\left( x_* \right) = 0\), and \(f’\left( x_* \right)\neq 0\), then \(p_n\to x_*\) whenever \(p_0\) is sufficiently close to \(x_*\).
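Here is a minimal MATLAB sketch of Newton’s method, written in the same style as fixed_point_iteration above (the name newtons_method and its signature are my own choices for illustration, not a prescribed interface):

function sequence = newtons_method(f, df, p0, N)
    % f  -- the function whose root we want; df -- its derivative
    % p0 -- initial guess; N -- number of iterations to perform
    sequence = zeros(1, N);
    sequence(1, 1) = p0 - f(p0) / df(p0);
    for i = 2:N
        p = sequence(1, i - 1);
        sequence(1, i) = p - f(p) / df(p);
    end
end

For instance, newtons_method(@(x) cos(x) + x, @(x) 1 - sin(x), 0, 10) converges rapidly to the root \(x_* \approx -0.7390851332\) from earlier.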

The weakness of this method is that one does not always know \(f\) in explicit form, and one may not be able to (efficiently) compute \(f’\). In this case, we would prefer to use the secant method. This requires two initial guesses \(p_0\) and \(p_1\), and it instead uses the recursive definition \[p _{n+2} = p _{n+1} - \frac{f\left( p _{n+1} \right)\left( p _{n+1}-p _n \right)}{f\left( p _{n+1} \right) - f \left( p_n \right)}.\] It’s rather difficult to pin down the order of convergence of this method; generally, it’s slightly poorer than that of Newton’s method.
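Here is a corresponding sketch of the secant method (again, the name secant_method and its interface are just illustrative). Note that it never needs \(f’\), and each loop iteration evaluates \(f\) only once:

function sequence = secant_method(f, p0, p1, N)
    % returns p_2, ..., p_{N+1}; p0 and p1 are the two initial guesses
    sequence = zeros(1, N);
    p_prev = p0;  f_prev = f(p0);
    p_curr = p1;  f_curr = f(p1);
    for i = 1:N
        p_next = p_curr - f_curr * (p_curr - p_prev) / (f_curr - f_prev);
        sequence(1, i) = p_next;
        p_prev = p_curr;  f_prev = f_curr;
        p_curr = p_next;  f_curr = f(p_next);   % the only new evaluation of f
    end
end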

Exercise 7. (Section 2.3, Exercise 30)

The iteration method for the secant method can be expressed as \[p _{n+2} = \frac{f\left( p _{n+1} \right) p _{n} - f\left( p_n \right)p _{n+1}}{f \left( p _{n+1} \right) - f\left( p_n \right)}.\] Explain why this iteration equation is likely to be less accurate than the one given above in general.

You may have heard in lecture or read in the textbook that the secant method only needs one function evaluation per iteration while Newton’s method requires one function evaluation and one derivative evaluation. This matters because in some scenarios, the function and/or its derivative can be extremely costly or difficult to compute. For instance:

You should keep these issues in mind when selecting which algorithm to use and when implementing these algorithms.

Order of Convergence and Big-O Notation

Our last topic today is order of convergence; closely related is big-O notation, a piece of mathematical shorthand that concisely conveys the statement, “\(f\) grows no faster than \(g\)”.

Given two functions \(f\) and \(g\), we say \(f(x)\) is \(O\left( g(x) \right)\) as \(x\to\infty\) (pronounced “big-O of \(g\)”) if there exists some constant \(C\) (independent of \(x\)) such that \(\left\lvert f(x) \right\rvert \leq C \left\lvert g(x) \right\rvert\) for all \(x\) large enough. Likewise, we say \(f(x)\) is \(O\left( g(x) \right)\) as \(x\to x_0\) for some fixed quantity \(x_0\) if there exists some constant \(C\) (independent of \(x\)) such that \(\left\lvert f(x) \right\rvert \leq C \left\lvert g(x) \right\rvert\) for all \(x\) sufficiently close to \(x_0\).

Equivalently, \(f(x) = O\left( g(x) \right)\) if the quantity \(\frac{\left\lvert f(x) \right\rvert}{\left\lvert g(x) \right\rvert}\) stays bounded as \(x\to\infty\) (or as \(x\to x_0\), respectively). Intuitively, this means that the graph of \(f\) eventually ends up below the graph of \(g\) after rescaling.
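For example, \(3x^2+5x+7\) is \(O\left( x^2 \right)\) as \(x\to\infty\): for all \(x\geq 1\), one has \(\left\lvert 3x^2+5x+7 \right\rvert \leq 3x^2+5x^2+7x^2 = 15x^2\), so the constant \(C = 15\) works.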

Big-O notation is amazing for making mathematical arguments more concise, as you avoid the hassle of carrying around a bunch of constants that you don’t care about. However, there are some common traps to avoid:

Exercise 8. Distributivity of Big-O over Division

Suppose \(f_1(x)=O\left( g_1(x) \right)\) and \(f_2(x) = O\left( g_2(x) \right)\). Assuming \(f_2(x)\neq 0\) for any \(x\), is it true that \[\frac{f_1(x)}{f_2(x)} = O\left( \frac{g_1(x)}{g_2(x)} \right) ? \] Prove your answer, or give a counterexample.

Exercise 9.

Suppose \(f(x) = O\left( g(x) \right)\). Is \(\frac{1}{f(x)} = O\left( \frac{1}{g(x)} \right)\)? Is \(\frac{1}{g(x)} = O\left( \frac{1}{f(x)} \right)\)?

Exercise 10. Comparability of Big-O

Suppose \(f_1, f_2\) are functions such that \(f_1(x)=O\left( 1 \right)\) and \(f_2(x)=O\left( 1 \right)\). Is it true that \(f_1(x)=O\left( f_2(x) \right)\) or that \(f_2(x) = O\left( f_1(x) \right)\)? Prove it or give a counterexample.

Here is a chain of commonly seen big-O relationships that may be helpful. In each of the following, the big-O is taken as \(x\to\infty\).

One way to describe this hierarchy is as follows: \[\textrm{constants} < \textrm{logarithms} < \textrm{polynomials} < \textrm{exponentials} < x^x.\]

Warning: this only applies when \(x\to\infty\)! These relationships are no longer true if \(x\) approaches something else (particularly, \(x\to 0\)).

Now let \(p_n\) be a sequence converging to \(p\). Recall that the rate of convergence of the sequence is a function \(f(x)\) such that \(\left\lvert p_n-p \right\rvert = O\left( f(n) \right)\). Typically \(\lim _{x\to\infty}f(x)=0\) in order for this to be meaningful. For instance, the sequence \(p_n=\frac{1}{n+\sin(n)}\) converges to \(p=0\) at a rate \(f(n)=\frac{1}{n}\).
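Indeed, since \(\left\lvert \sin(n) \right\rvert \leq 1\), we have \(\left\lvert p_n - 0 \right\rvert = \frac{1}{n+\sin(n)} \leq \frac{1}{n-1} \leq \frac{2}{n}\) for all \(n\geq 2\), so the constant \(C=2\) works in the definition above.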

It’s often not helpful to analyse the rate of convergence of root-finding algorithms, as this requires advance knowledge of the limit point/solution that’s often unavailable. Instead, one analyses how quickly the sequence \(\left\lvert p_n-p \right\rvert\) decays from one term to the next.

Definition 11.

A sequence \(p_n\) converging to \(p\) has order of convergence \(\alpha \) if \[\lim _{n\to\infty} \frac{\left\lvert p _{n+1}-p \right\rvert}{\left\lvert p_n -p\right\rvert ^ \alpha } < \infty.\]
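For example, the sequence \(p_n = 2 ^{-2^n}\) converges to \(p=0\) with order \(\alpha = 2\): one computes \[\frac{\left\lvert p _{n+1}-p \right\rvert}{\left\lvert p_n-p \right\rvert^2} = \frac{2 ^{-2 ^{n+1}}}{2 ^{-2\cdot 2^n}} = 1\] for every \(n\), so the limit is certainly finite.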

Typically \(\alpha \) will be a whole number. One of the most important tools in analysing the rate or the order of convergence of an algorithm (such as the ones we’ve described today) is Taylor’s formula:

Theorem 12. Taylor's Formula

Suppose \(f\) is an \(n\)-times continuously differentiable function. Then, for any fixed choice of \(x_0\), one has \[f(x) = \sum _{i=0}^{n-1} f ^{(i)} \left( x_0 \right) \cdot \frac{\left( x-x_0 \right)^i}{i!} + O\left( \left( x-x_0 \right)^n \right),\] where the big-O is taken as \(x\to x_0\).

You should think about using this whenever you want to relate some sum or difference of a function’s values to its derivatives.
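For example, taking \(n=2\) and \(x = x_0 + h\) in Taylor’s formula gives \(f\left( x_0+h \right) - f\left( x_0 \right) = f’\left( x_0 \right) h + O\left( h^2 \right)\) as \(h\to 0\), so the forward difference \(\frac{f\left( x_0+h \right)-f\left( x_0 \right)}{h}\) approximates \(f’\left( x_0 \right)\) with error \(O(h)\).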

Exercise 13.

Let \(f(x)=e ^{x^2}\). Show that the sequence \[a_n = n^2 \cdot \left( f\left( 1-\frac{1}{n} \right) -2f(1) + f\left( 1+\frac{1}{n} \right) \right)\] converges to \(6e\) at a rate of \(\frac{1}{n}\).

Exercise 14.

Determine the order of convergence of the bisection method.

Exercise 15.

Let \(f\) be a twice continuously differentiable function, and suppose \(x_*\) is a root of \(f\). Show that Newton’s method has a quadratic (\(\alpha = 2\)) order of convergence if \(f’\left( x_* \right) \neq 0\).

Exercise 16. (Section 2.4, Exercise 13)

Let \(f\) be a smooth function, and let \(x_*\) be a root of \(f\). Let \[g(x) = x - \frac{f(x)}{f’(x)} - \frac{f’’(x)}{2f’(x)} \cdot \left( \frac{f(x)}{f’(x)} \right)^2.\] Show that for \(p_0\) sufficiently close to \(x_*\), the sequence defined by \(p _{n+1}= g\left( p_n \right)\) converges to \(x_*\) with order \(\alpha = 3\).

You may use without proof that \(g’\left( x_* \right)=g’’\left( x_* \right)=0\).