5.2 Newton Method with multiple variables#

Problem Definition:

We want to find a vector \(\mathbf{x} = (x_1, \dots, x_n)\), given a continuous vector-valued function \(\mathbf{f} = (f_1, \dots, f_n)\) mapping from \(\mathbb{R}^n\) to \(\mathbb{R}^n\), such that \( \mathbf{f}(\mathbf{x}) = \mathbf{0} \).

A solution \(\mathbf{x}\) is called a root of the vector-valued function.

Linear Model: We can use the first-order Taylor approximation (the tangent plane) at the current point and apply the Newton method to find a root of the function with multiple variables:

\[ \mathbf{f}(\mathbf{x}+\mathbf{h}) = \mathbf{f}(\mathbf{x}) + \mathbf{J}(\mathbf{x})\mathbf{h} + O(\| \mathbf{h} \|^2), \]

where \(\mathbf{J}(\mathbf{x})\) is the Jacobian matrix of \(\mathbf{f}\) at \(\mathbf{x}\):

\[\begin{split} \mathbf{J}(\mathbf{x}) = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \cdots & \frac{\partial f_1}{\partial x_n}\\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & \cdots & \frac{\partial f_2}{\partial x_n}\\ \vdots & \vdots & & \vdots\\ \frac{\partial f_n}{\partial x_1} & \frac{\partial f_n}{\partial x_2} & \cdots & \frac{\partial f_n}{\partial x_n} \end{bmatrix} = \left[ \frac{\partial f_i}{\partial x_j} \right]_{i,j=1,\ldots,n}. \end{split}\]
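For example, for the function \(\mathbf{f}(\mathbf{x}) = (x_1^2 + x_2^2 - 1,\ x_1 - x_2^2)\), which we will use again below, the Jacobian can be computed by hand:

\[\begin{split} \mathbf{J}(\mathbf{x}) = \begin{bmatrix} 2x_1 & 2x_2\\ 1 & -2x_2 \end{bmatrix}, \qquad \text{e.g.} \quad \mathbf{J}((1,1)) = \begin{bmatrix} 2 & 2\\ 1 & -2 \end{bmatrix}. \end{split}\]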

We set the linear model at \(\mathbf{x}_{n+1} = \mathbf{x}_n + \mathbf{h}\) to zero, i.e., \(\mathbf{f}(\mathbf{x}_{n+1}) \approx \mathbf{f}(\mathbf{x}_n) + \mathbf{J}(\mathbf{x}_n)\mathbf{h} = \mathbf{0}\), and solve for the step \(\mathbf{h}\):

\[\begin{split} \mathbf{0} &= \mathbf{f}(\mathbf{x}_n) + \mathbf{J}(\mathbf{x}_n)\mathbf{h}\\ \mathbf{h} &= -\mathbf{J}(\mathbf{x}_n)^{-1}\mathbf{f}(\mathbf{x}_n)\\ \mathbf{x}_{n+1} &= \mathbf{x}_n + \mathbf{h} = \mathbf{x}_n - \mathbf{J}(\mathbf{x}_n)^{-1}\mathbf{f}(\mathbf{x}_n). \end{split}\]

This update rule is applied iteratively: each step produces a new iterate \(\mathbf{x}\) that is (ideally) closer to the root. The calculation rule according to the Newton method is therefore:

\[ \mathbf{x}_0 = \text{start value} \]
\[ \mathbf{x}_{1} = \mathbf{x}_0 - \mathbf{J}(\mathbf{x}_0)^{-1}\mathbf{f}(\mathbf{x}_0) \]
\[ \mathbf{x}_{2} = \mathbf{x}_{1} - \mathbf{J}(\mathbf{x}_{1})^{-1}\mathbf{f}(\mathbf{x}_{1}) \]
\[ \vdots \]
\[ \mathbf{x}_{n+1} = \mathbf{x}_n - \mathbf{J}(\mathbf{x}_n)^{-1}\mathbf{f}(\mathbf{x}_n) \]

We do this until we have reached the desired accuracy, i.e., until \(\mathbf{f}(\mathbf{x}_{n+1})\) (or, as in the implementation below, the step \(\mathbf{h}\)) is sufficiently close to \(\mathbf{0}\).
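As a concrete one-step illustration (using the example function from above), start at \(\mathbf{x}_0 = (1, 1)\), where \(\mathbf{f}(\mathbf{x}_0) = (1, 0)\):

\[\begin{split} \mathbf{x}_1 = \mathbf{x}_0 - \mathbf{J}(\mathbf{x}_0)^{-1}\mathbf{f}(\mathbf{x}_0) = \begin{bmatrix} 1\\ 1 \end{bmatrix} - \begin{bmatrix} 2 & 2\\ 1 & -2 \end{bmatrix}^{-1} \begin{bmatrix} 1\\ 0 \end{bmatrix} = \begin{bmatrix} 1\\ 1 \end{bmatrix} - \begin{bmatrix} 1/3\\ 1/6 \end{bmatrix} = \begin{bmatrix} 2/3\\ 5/6 \end{bmatrix}, \end{split}\]

which is already noticeably closer to the root \((0.618\ldots,\ 0.786\ldots)\) computed numerically below.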

Summary: The Newton method is an iterative method for finding a root of a function of multiple variables. Looking at the update rule of the Newton method:

\[\mathbf{x}_{n+1} = \mathbf{x}_n - \mathbf{J}(\mathbf{x}_n)^{-1}\mathbf{f}(\mathbf{x}_n)\]

we see that in every iteration we need to evaluate the function \(\mathbf{f}(\mathbf{x})\), the derivative of the function (the Jacobian \(\mathbf{J}(\mathbf{x})\)), and, at least formally, the inverse of the Jacobian matrix \(\mathbf{J}(\mathbf{x})^{-1}\).

Revisiting Derivatives and Linear Equations#

In Exercise 5 we have already seen how to estimate the Jacobian of a function \(\mathbf{f}(\mathbf{x})\). We can use forward-mode automatic differentiation (ForwardDiff.jl) to compute the Jacobian of \(\mathbf{f}\) at a point \(\mathbf{x}\):

import Pkg
Pkg.instantiate()  # install the packages tracked by the current environment
using ForwardDiff

f(x) = [x[1]^2 + x[2]^2 - 1, x[1] - x[2]^2]  # the example system from above
q = [1.0, 1.0]                               # evaluation point
J_fd = ForwardDiff.jacobian(f, q)
2×2 Matrix{Float64}:
 2.0   2.0
 1.0  -2.0
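This agrees with the Jacobian computed by hand above. As a quick sanity check (the helper J_analytic is our own, not part of ForwardDiff.jl):

J_analytic(x) = [2x[1] 2x[2]; 1 -2x[2]]  # hand-derived Jacobian of f
J_analytic(q) ≈ J_fd                     # true: automatic and analytic results agree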

Alternatively, we can use reverse-mode automatic differentiation (ReverseDiff.jl) to estimate the Jacobian of a function \(\mathbf{f}(\mathbf{x})\) at a point \(\mathbf{x}\):

using ReverseDiff

J_rd = ReverseDiff.jacobian(f, q)
2×2 Matrix{Float64}:
 2.0   2.0
 1.0  -2.0

In the last exercise we also saw how to solve a system of linear equations:

A = rand(3,3)
b = rand(3)
@show x = A\b # Simple version of solving a linear system
x = A \ b = [-0.6602871471263836, 1.4279595665173477, 0.7267683110023274]
3-element Vector{Float64}:
 -0.6602871471263836
  1.4279595665173477
  0.7267683110023274

We can also solve the linear system explicitly via LU factorization with partial pivoting (this is what the backslash operator uses internally for square matrices):

using LinearAlgebra
L, U, p = lu(A)     # PA = LU: factorization with permutation vector p
x = U \ (L \ b[p])  # forward substitution with L, then back substitution with U
3-element Vector{Float64}:
 -0.6602871471263836
  1.4279595665173477
  0.7267683110023274
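A quick way to check a computed solution (a small sanity check of our own) is the residual \(\|A\mathbf{x} - \mathbf{b}\|\), which should be close to machine precision:

norm(A*x - b)  # ≈ 1e-16, i.e., x solves the system up to round-off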

We also saw how to calculate the inverse of a matrix using the inv function:

inv(A) 
3×3 Matrix{Float64}:
  1.05317    0.65815   -1.7307
 -4.53311    2.87165    4.28247
  0.516812  -0.575288   0.960754
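As a sanity check of our own, multiplying A by its inverse should give (approximately) the identity matrix:

norm(A * inv(A) - I)  # ≈ 0 up to round-off; I is the identity from LinearAlgebra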

Newton Method in Julia#

We have already learned that computing the inverse explicitly is not a good way to solve a linear system. Instead we can use LU factorization: we first compute the LU factorization of the Jacobian matrix with the lu function and then use forward and backward substitution to evaluate \(\mathbf{J}(\mathbf{x}_n)^{-1}\mathbf{f}(\mathbf{x}_n)\) without ever forming the inverse (see the LU_solver function below). Note that both \(\mathbf{f}(\mathbf{x}_n)\) and the Jacobian \(\mathbf{J}(\mathbf{x}_n)\) have to be recomputed in every iteration, since the Jacobian depends on the current point \(\mathbf{x}_n\).

First we define the function J(x) which calculates the Jacobian. We can use the ForwardDiff.jl package to calculate the Jacobian matrix of a function \(\mathbf{f}(\mathbf{x})\) at a point \(\mathbf{x}\):

J(x) = ForwardDiff.jacobian(f, x)
J (generic function with 1 method)

Now we define a function newton(f, J, x, solver) which implements the Newton method, where f is the function \(\mathbf{f}(\mathbf{x})\), J is the function which computes the Jacobian matrix \(\mathbf{J}(\mathbf{x})\), x is the start value \(\mathbf{x}_0\), and solver solves the linear system \(\mathbf{J}(\mathbf{x}_n)\mathbf{h} = \mathbf{f}(\mathbf{x}_n)\) for the Newton step \(\mathbf{h}\). The function applies the Newton update iteratively until the desired accuracy is reached and returns the (approximate) root of \(\mathbf{f}(\mathbf{x})\). The accuracy is controlled by the tolerance on \(\|\mathbf{h}\|\), which is hard-coded to 1e-10, and the iteration aborts with an error after 1000 iterations.

function newton(f, J, x, solver)
   h = Inf64          # initial step, so the loop runs at least once
   tolerance = 1e-10  # stop once the Newton step is small enough
   iter = 0
   while norm(h) > tolerance
      iter += 1
      if iter > 1000
         error("Too many iterations")
      end
      h = solver(J(x), f(x))  # solve J(x) * h = f(x) for the Newton step
      x = x - h               # Newton update
   end
   return x
end
newton (generic function with 1 method)
function LU_solver(J_x, f_x)
    L, U, p = lu(J_x)     # LU factorization of the Jacobian with pivoting
    h = U \ (L \ f_x[p])  # forward and back substitution instead of the inverse
    return h
end
LU_solver (generic function with 1 method)
function invers_solver(A, b)
    A_inv = inv(A)  # explicitly forms the inverse (expensive; for comparison only)
    return A_inv * b
end
invers_solver (generic function with 1 method)
function simple_solver(A, b)
    return A \ b  # backslash picks a suitable factorization internally
end
simple_solver (generic function with 1 method)

Let’s test the Newton method for the function \(\mathbf{f}(\mathbf{x})\):

\[\begin{split} \mathbf{f}(\mathbf{x}) = \begin{bmatrix} x_1^2 + x_2^2 - 1\\ x_1 - x_2^2 \end{bmatrix} \end{split}\]

Try to find the two roots of the function \(\mathbf{f}(\mathbf{x})\) by using the Newton method:

f(x) = [x[1]^2 + x[2]^2 - 1, x[1] - x[2]^2]
x = [0.6, -1]
newton(f, J, x, LU_solver)
2-element Vector{Float64}:
  0.6180339887498949
 -0.7861513777574234
x = [0.6, 1]
newton(f, J, x, LU_solver)
2-element Vector{Float64}:
 0.6180339887498949
 0.7861513777574234
x = [0.6, -1]
newton(f, J, x, invers_solver) # This gives the same result as LU_solver
2-element Vector{Float64}:
  0.6180339887498949
 -0.7861513777574234
x = [0.6, -1]
newton(f, J, x, simple_solver) # This gives the same result as LU_solver
2-element Vector{Float64}:
  0.6180339887498949
 -0.7861513777574234
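To confirm that these are indeed roots, we can evaluate \(\mathbf{f}\) at a returned point; both components should be numerically zero (up to the 1e-10 tolerance):

root = newton(f, J, [0.6, -1], LU_solver)
f(root)  # both entries ≈ 0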

Let’s try to find the two roots of another function \(\mathbf{f}(\mathbf{x})\) (geometrically, the intersection points of two circles with centers \((1, 2)\) and \((6, 1)\) and radii \(5\) and \(6.2\)):

\[\begin{split} \mathbf{f}(\mathbf{x}) = \begin{bmatrix} (1-x_1)^2 + (2-x_2)^2 - 5^2\\ (6-x_1)^2 + (1-x_2)^2 - 6.2^2 \end{bmatrix} \end{split}\]

f(x) = [(1-x[1])^2 + (2-x[2])^2 - 5^2, (6-x[1])^2 + (1-x[2])^2 - 6.2^2]
x = [1, -2]
newton(f, J, x, LU_solver)
2-element Vector{Float64}:
  1.2573252032868072
 -2.993373983565962
x = [3, 6]
newton(f, J, x, LU_solver)
2-element Vector{Float64}:
 3.158059412097807
 6.510297060489039

Iterative solvers#

In the lecture we looked at the Jacobi and Gauss-Seidel methods for solving linear systems iteratively (a minimal sketch of the Jacobi method follows below). Many iterative solvers for linear equations are provided by the IterativeSolvers.jl package: https://iterativesolvers.julialinearalgebra.org/dev/
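For reference, here is a minimal sketch of the Jacobi method, assuming a strictly diagonally dominant matrix so that the iteration converges (the function jacobi and the test matrix are our own illustration, not code from the lecture or the package):

# Jacobi iteration: x_{k+1} = D^{-1} (b - (A - D) x_k)
function jacobi(A, b; iters = 100)
    D = Diagonal(A)          # diagonal part of A (from LinearAlgebra)
    R = A - D                # off-diagonal remainder
    x = zeros(length(b))
    for _ in 1:iters
        x = D \ (b - R * x)  # update all components simultaneously
    end
    return x
end

A_dd = [4.0 1.0 0.0; 1.0 5.0 2.0; 0.0 2.0 6.0]  # strictly diagonally dominant
b_dd = [1.0, 2.0, 3.0]
jacobi(A_dd, b_dd) ≈ A_dd \ b_dd  # true: Jacobi converges for this matrix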

Let’s try some of them out:

Pkg.add("IterativeSolvers")
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.9/Project.toml`
  No Changes to `~/.julia/environments/v1.9/Manifest.toml`
using IterativeSolvers
A = rand(3, 3)
b = rand(3)
x = A \ b
3-element Vector{Float64}:
 -0.6845587524328699
  2.1862727242460775
 -0.8028715135646114

Check the documentation (https://iterativesolvers.julialinearalgebra.org/dev/) and try the IDR(s) and the BiCGStab(l) solvers. What do you observe?

x = zeros(3)
idrs!(x, A, b)
3-element Vector{Float64}:
 -0.684558752432891
  2.186272724246137
 -0.8028715135646396
x = zeros(3)
bicgstabl!(x, A, b)
3-element Vector{Float64}:
  0.5360236922568408
 -0.4379751864680906
  0.23566987725944047
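In the run above, idrs! reproduced the direct solution x = A\b, while bicgstabl! returned a clearly different vector: for small, dense, non-symmetric random matrices, BiCGStab(l) can stagnate or stop before converging. Checking the residual (a check of our own; the exact numbers depend on the random A) makes this visible:

norm(A*x - b)  # far from zero here, so bicgstabl! did not converge for this A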