Laurent Hoeltgen

Arnoldi Iterations

2021-07-17, update: 2021-08-18 ·  10 min read  ·  Mathematics

How to compute eigenvalues and eigenvectors

This is the second post in my series on Krylov subspaces. The first post is here and the third one is here.

Arnoldi Iterations

The Arnoldi iteration is an algorithm for finding eigenvalues and eigenvectors of general matrices. It constructs orthogonal bases of suitable Krylov subspaces and extracts the eigendata from the representation of the matrix with respect to these bases. The method is named after the American engineer Walter Edwin Arnoldi and was published in 1951 [1]. Arnoldi based his results on the findings of Lanczos [2]. We will present the Lanczos method in a later post since it is a special case of the Arnoldi iteration: the Lanczos method targets symmetric matrices, whereas the Arnoldi iteration works for arbitrary matrices.

Starting point is the (homogeneous) Fredholm equation $(\lambda I - A)u = 0$. Pairs $(\lambda_k, u_k)$ with $u_k \neq 0$ that solve this equation are known as eigenvalues and eigenvectors of $A$. Eigenvalues and eigenvectors are a very important concept in mathematics: knowing the eigendata of a linear operator tells you essentially everything about its behaviour. We would like to find such pairs of eigendata by using Arnoldi iterations. The next paragraphs present the necessary ideas and follow the presentation in [1] rather closely.

We are given a square matrix $A \in \mathbb{R}^{n,n}$ and we want to solve the Fredholm equation $(\lambda I - A)u = 0$ from the previous paragraph. To this end, let us first consider a set of linearly independent vectors $\{e_i\}_i$ with $e_i \in \mathbb{R}^n$ for all $i$. If we place these vectors $e_i$ as columns in a matrix $E$, then for any vector $u$ in the space spanned by these vectors there exist coefficients $\alpha_i$ such that $u = E\alpha$. Here $\alpha$ is the vector that contains the coefficients $\alpha_i$. Plugging this expression for $u$ into $(\lambda I - A)u = 0$ gives us $(\lambda I - A)E\alpha = 0$. We can also multiply the equation with $E^\top$ from the left on both sides and obtain the expression $E^\top(\lambda I - A)E\alpha = 0$. The latter can also be rewritten as

$$
\left(\lambda I - (E^\top E)^{-1} E^\top A E\right)\alpha = 0\quad .
$$

Arnoldi refers to the previous equation as the reduced equation and to $(E^\top E)^{-1} E^\top A E$ as the reduced matrix in his work [1]. Note that the reduced matrix can be much smaller than the original matrix $A$; its size depends on how many linearly independent columns the matrix $E$ has. Arnoldi continues by noting that it would be useful if the matrix $E^\top E$ were diagonal, since then its inverse is trivial to compute. If $\langle e_i, e_j \rangle = 0$ whenever $i \neq j$ (i.e. the vectors $e_i$ are orthogonal to each other), then $E^\top E$ will indeed be diagonal and our previous equation becomes

$$
\left( \lambda I - \left( \frac{e_i^\top A e_j}{e_i^\top e_i} \right)_{i,j} \right) \alpha = 0\quad .
$$

The eigenvalues and eigenvectors of $A$ will be extracted from this system. First, however, we need a way to generate suitable linearly independent vectors: unless we choose them properly, we do not gain anything from the previous transformations.
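
Before we turn to the construction of these vectors, here is a minimal numerical sketch of the reduction itself. The $4 \times 4$ matrix and the two orthogonal basis vectors below are made up purely for illustration.

(* Toy example: project a 4x4 matrix onto the span of two orthogonal vectors
   and form the reduced matrix (E^T E)^(-1) E^T A E. *)
a = {{2., 1., 0., 0.},
     {1., 3., 1., 0.},
     {0., 1., 4., 1.},
     {0., 0., 1., 5.}};
emat = Transpose[{{1., 1., 0., 0.}, {1., -1., 0., 0.}}];  (* orthogonal columns *)
reduced = Inverse[Transpose[emat] . emat] . Transpose[emat] . a . emat
(* reduced is a 2x2 matrix, much smaller than the original 4x4 one *)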

Arnoldi suggests the following iterative strategy to obtain the vectors $e_i$. This idea is also already present in the work of Lanczos [2]. Starting from a given vector $e_1 \in \mathbb{R}^n$, we compute

$$
e_{i+1} = A e_i - \sum_{k=0}^{i-1} \mu_{i-k,k}\, e_{i-k}\quad .
$$

In order to determine the scalar coefficients $\mu_{i,k}$ we exploit our orthogonality requirement. If $e_1$ is given, we want to generate a vector $e_2$ orthogonal to $e_1$. Next we generate a vector $e_3$ orthogonal to both $e_1$ and $e_2$, and so on. Overall we get the following identities.

$$
\begin{aligned}
e_i^\top e_{i+1} &= e_i^\top A e_i - \mu_{i,0}\, e_i^\top e_i = 0 \\
e_i^\top e_{i+2} &= e_i^\top A e_{i+1} - \mu_{i,1}\, e_i^\top e_i = 0 \\
e_i^\top e_{i+3} &= e_i^\top A e_{i+2} - \mu_{i,2}\, e_i^\top e_i = 0 \\
&\;\;\vdots
\end{aligned}
$$

Solving these equations for the coefficients $\mu_{i,k}$ is straightforward. We immediately get

$$
\mu_{i,k} = \frac{e_i^\top A e_{i+k}}{e_i^\top e_i}\quad .
$$
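
A minimal Mathematica sketch of these two steps (the helper nextVector and the example data are my own and are not part of the implementation further below): given the vectors generated so far, the next vector is obtained by computing the coefficients $\mu$ for the newest vector and subtracting the corresponding multiples.

(* One step of the recurrence: orthogonalize A times the newest basis vector
   against all previously generated vectors. *)
nextVector[basis_, mat_] := Module[{w, mus},
   w = mat . basis[[-1]];
   mus = (# . w)/(# . #) & /@ basis;
   w - Total[mus*basis]
   ];

(* Quick check on a made-up symmetric 3x3 matrix: the generated vectors are
   mutually orthogonal. *)
mat3 = {{2., 1., 0.}, {1., 3., 1.}, {0., 1., 4.}};
e1 = {1., 0., 0.};
e2 = nextVector[{e1}, mat3];
e3 = nextVector[{e1, e2}, mat3];
Chop[{e1 . e2, e1 . e3, e2 . e3}]   (* {0, 0, 0} *)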

This recurrence, together with the coefficient formula above, is commonly referred to as the Arnoldi iteration in the literature, cf. this Wikipedia article. If you compute all $n$ iterates (see below for why there cannot be more than $n$ iterates), then you have generated a matrix $E$ such that $E^\top A E$ is in Hessenberg form. This is quite nice, but you might wonder what you can do with that representation. Arnoldi's paper actually goes further and shows how to extract the eigendata. Unfortunately, that part is omitted in many textbooks.

As mentioned, our construction of the orthogonal basis vectors implies that the matrix $E^\top A E$ is in Hessenberg form: all entries below the first subdiagonal are $0$. Furthermore, the eigenvalues of this reduced matrix approximate the eigenvalues of $A$. After $n$ steps they are even identical, because at that point we have effectively applied a similarity transform.
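
To make the structure explicit, the reduced matrix has the following shape, where $\ast$ marks a (generally) nonzero entry.

$$
E^\top A E =
\begin{pmatrix}
\ast & \ast & \ast & \cdots & \ast \\
\ast & \ast & \ast & \cdots & \ast \\
0 & \ast & \ast & \cdots & \ast \\
\vdots & \ddots & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & \ast & \ast
\end{pmatrix}
$$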

The eigenvalues correspond to the roots of the characteristic polynomial. Since our matrix is in Hessenberg form, computing its characteristic polynomial becomes much simpler because we can approach it recursively. Arnoldi shows in his work that we obtain the following sequence of polynomials in parallel with the basis vectors.

$$
\begin{aligned}
p_1(\lambda) &= \lambda - \mu_{1,0} \\
p_2(\lambda) &= (\lambda - \mu_{2,0})\, p_1(\lambda) - \mu_{1,1} \\
p_3(\lambda) &= (\lambda - \mu_{3,0})\, p_2(\lambda) - \mu_{2,1}\, p_1(\lambda) - \mu_{1,2} \\
&\;\;\vdots
\end{aligned}
$$
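
One way to see where this recursion comes from: the recurrence gives $e_{j+1}^\top A e_j = e_{j+1}^\top e_{j+1}$, so the subdiagonal entries of the reduced matrix $\bigl(e_i^\top A e_j / e_i^\top e_i\bigr)_{i,j}$ are all equal to $1$. Computing the characteristic polynomial of its leading $2 \times 2$ block then reproduces $p_2$:

$$
\det \begin{pmatrix} \lambda - \mu_{1,0} & -\mu_{1,1} \\ -1 & \lambda - \mu_{2,0} \end{pmatrix}
= (\lambda - \mu_{2,0})(\lambda - \mu_{1,0}) - \mu_{1,1}
= (\lambda - \mu_{2,0})\, p_1(\lambda) - \mu_{1,1} = p_2(\lambda)\quad .
$$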

The roots of these polynomials approximate the eigenvalues of $A$. Once you have computed the roots $\lambda_k$, you can also compute the coefficients $\alpha_j$ that solve the equation $E^\top(\lambda I - A)E\alpha = 0$. As with the $\lambda_k$, the $\alpha_j$ can be computed recursively.

$$
\begin{aligned}
\alpha_m &= 1 \\
\alpha_{m-1} &= \lambda_k - \mu_{m,0} \\
\alpha_{m-2} &= (\lambda_k - \mu_{m-1,0})\,\alpha_{m-1} - \mu_{m-1,1}\,\alpha_m \\
&\;\;\vdots
\end{aligned}
$$

Finally you get an approximation to the eigenvector corresponding to $\lambda_k$ by evaluating

$$
u = \sum_{k=1}^m \alpha_k\, e_k\quad .
$$

Again, once you have gathered all $n$ vectors $e_i$, this vector will be an exact eigenvector of $A$. Also note that if you want to compute all eigenvectors, then you have to compute the coefficients $\alpha_j$ separately for each $\lambda_k$.
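
Here is a small sketch of this back substitution in Mathematica. The names are my own; h is assumed to hold the reduced Hessenberg matrix, lam one of the computed roots, and basis the vectors $e_1, \dots, e_m$. Dividing by the subdiagonal entry keeps the sketch valid even if that entry is not scaled to $1$.

(* Recover the alpha coefficients by back substitution through the rows of
   (lam*I - h), starting from alpha_m = 1, and assemble the eigenvector. *)
alphaCoefficients[h_, lam_] := Module[{m = Length[h], alphas},
   alphas = ConstantArray[0, m];
   alphas[[m]] = 1;
   Do[
    alphas[[j - 1]] =
      ((lam - h[[j, j]])*alphas[[j]] -
         Sum[h[[j, k]]*alphas[[k]], {k, j + 1, m}])/h[[j, j - 1]],
    {j, m, 2, -1}];
   alphas
   ];

eigenvectorSketch[h_, lam_, basis_] :=
  Normalize[Total[alphaCoefficients[h, lam]*basis]];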

It is important to emphasize that our representation of the eigenvector $u$ is always a finite sum. Both Arnoldi and Lanczos cite several other formulas in their works that also yield the eigenvector (e.g. the Schmidt expansion or the Liouville–Neumann expansion). However, these representations are based on infinite sums or require that you know an eigenvalue upfront. Arnoldi's and Lanczos' merit is to provide a representation in terms of a finite sum.

What does this have to do with Krylov subspaces?

Let's have another look at how the iterates are generated.

  1. We begin with an arbitrary start vector $e_1$.

  2. We compute the second basis vector $e_2$ as follows:

     $$ e_2 = A e_1 - \frac{e_1^\top A e_1}{e_1^\top e_1}\, e_1 $$

  3. We compute the third basis vector $e_3$ as follows:

     $$ e_3 = A e_2 - \frac{e_2^\top A e_2}{e_2^\top e_2}\, e_2 - \frac{e_1^\top A e_2}{e_1^\top e_1}\, e_1 $$

  4. We keep iterating until we have generated the desired number of vectors or until the construction breaks down by producing the zero vector.

The first thing we notice is that we essentially compute

$$ e_1,\quad A e_1,\quad A e_2,\quad \ldots $$

and, up to components along the previously generated vectors, these are exactly the vectors $e_1, A e_1, A^2 e_1, \ldots$ whose span defines the Krylov subspaces. Next, by subtracting the terms $\frac{e_i^\top A e_j}{e_i^\top e_i}\, e_i$ from $A e_j$, we are actually applying a Gram-Schmidt orthogonalization to the iterates $A e_j$. Hence, in each iteration we enlarge the Krylov subspace and maintain an orthogonal basis for it.

The algorithm then projects the original matrix onto these Krylov subspaces and computes approximations to the eigenvalues there. Since these spaces are much smaller than the original space, the computations are much cheaper, and the projected matrix has a much nicer structure.
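
As a quick sanity check of this observation, one can verify numerically that the Gram-Schmidt basis spans the same subspace as $e_1, A e_1, A^2 e_1$. The matrix and start vector below are made up purely for illustration, and Orthogonalize performs the Gram-Schmidt step (with normalization).

(* Toy check: the orthogonalized vectors span the same Krylov subspace as
   {e1, A.e1, A.A.e1}. *)
a = {{2., 1., 0., 0.}, {1., 3., 1., 0.}, {0., 1., 4., 1.}, {0., 0., 1., 5.}};
e1 = {1., 0., 0., 0.};
krylov = {e1, a . e1, a . a . e1};
basis = Orthogonalize[krylov];
MatrixRank[Join[krylov, basis]] == MatrixRank[krylov] == MatrixRank[basis]
(* True: both sets span the same three-dimensional subspace *)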

Implementation

The following Mathematica code implements a very simple Arnoldi Iteration to compute eigenvalues and eigenvectors.

(* Hermitian inner product; for real vectors this is the ordinary dot product. *)
ComplexDot[a_, b_] := Conjugate[a] . b

(* Entry of the reduced matrix: <leftvec, A.rightvec> / <leftvec, leftvec>. *)
ReducedMatrixEntry[leftvec_, rightvec_, A_] := Module[{},
   Return[
     ComplexDot[leftvec, A . rightvec]/ComplexDot[leftvec, leftvec]
     ];
   ];

(* Reduced matrix for the current basis (all pairwise entries at once). *)
ReducedMatrix[basis_, A_] := Module[{},
   Return[
     Outer[ReducedMatrixEntry[#1, #2, A] &, basis, basis, 1]
     ];
   ];

(* One Gram-Schmidt step: append A times the newest basis vector, orthogonalized
   against all previous vectors with the given weights (a column of the reduced
   matrix). *)
GramSchmidtStep[basis_, weights_, A_] := Module[{},
   Return[
     Append[basis,
      A . basis[[-1]] - Total[MapThread[#1*#2 &, {weights, basis}]]]
     ];
   ];

(* Append the next characteristic polynomial p_k, built from the last column of
   the reduced matrix. *)
EigenvaluePolynomials[polynomials_, weights_, lambda_] := Module[{},
   Return[
     Append[polynomials,
      (lambda - weights[[-1]])*polynomials[[-1]] - 
       Total[weights[[1 ;; -2]]*polynomials[[1 ;; -2]]]]
     ];
   ];

(* Prepend the next eigenvector coefficient polynomial alpha, built from one row
   of the reduced matrix (working from the last row upwards). *)
EigenvectorPolynomials[polynomials_, weights_, lambda_] := 
  Module[{},
   Return[
     Prepend[
      polynomials,
      (lambda - weights[[1]])*polynomials[[1]] - 
       Total[weights[[2 ;; -1]]*polynomials[[2 ;; -1]]]]
     ];
   ];

(* Eigenvalue estimates: the roots of the final characteristic polynomial. *)
EigenvalueEstimates[polynomial_, lambda_] := Module[{},
   Return[lambda /. Solve[polynomial == 0, lambda]]
   ];

(* For each eigenvalue estimate, evaluate the alpha polynomials at it and return
   the pair {eigenvalue, normalized eigenvector estimate}. *)
EigenvectorEstimate[polynomials_, basis_, eigenvalues_, lambda_] :=
   Module[{},
   Return[
     Map[{#, 
        Normalize[Total[(polynomials /. lambda -> #)*basis]]} &, 
      eigenvalues]
     ];
   ];

(* Main driver: run maxIts iterations and return eigenvalue and eigenvector
   estimates. *)
ArnoldiEigendata[A_, start_, maxIts_] := Module[{
    VecK = {Normalize[start]},
    CoeffMat,
    ColIdx = 0,
    RowIdx = 0,
    EigenvaluePolys = {1},
    EigenvectorPolys = {1},
    EigenvalEstimates,
    EigenvecEstimates,
    lambda
    },
   
   For[ColIdx = 1, ColIdx <= maxIts, ColIdx++,
    CoeffMat = ReducedMatrix[VecK, A];
    VecK = GramSchmidtStep[VecK, CoeffMat[[1 ;; ColIdx, -1]], A];
    EigenvaluePolys = 
     EigenvaluePolynomials[EigenvaluePolys, 
      CoeffMat[[1 ;; ColIdx, -1]], lambda];
    ];
   
   EigenvalEstimates = 
    EigenvalueEstimates[EigenvaluePolys[[-1]], lambda];
   
   For[RowIdx = maxIts, RowIdx >= 1, RowIdx--,
    EigenvectorPolys = 
      EigenvectorPolynomials[EigenvectorPolys, 
       CoeffMat[[RowIdx, RowIdx ;; -1]], lambda];
    ];
   
   EigenvecEstimates = 
    EigenvectorEstimate[EigenvectorPolys[[2 ;; -1]], VecK[[1 ;; -2]], 
     EigenvalEstimates, lambda];
   
   Return[{EigenvalEstimates, EigenvecEstimates}];
   ];

Example

We consider the following matrix (in Mathematica notation)

{
  {1.943350, 0.578511, 1.163850, 0.268453, 1.73745, 0.98200},
  {0.578511, 1.246780, 0.910821, 0.090292, 1.62437, 1.35639},
  {1.163850, 0.910821, 0.409511, 0.265599, 1.74996, 0.67720},
  {0.268453, 0.090292, 0.265599, 0.232830, 1.23293, 0.35352},
  {1.737450, 1.624370, 1.749960, 1.232930, 1.41587, 1.07492},
  {0.982009, 1.356390, 0.677200, 0.353520, 1.07492, 1.76505}
}

Its eigenvalues are given by

{-1.34007, -0.49569, 0.33907, 0.754853, 1.34977, 6.40546}

and its eigenvectors are given by the columns of the following matrix

{
  {-0.460203, -0.554847,  0.164253,  0.585615,  0.326414, -0.062352},
  {-0.398644,  0.480159,  0.287389, -0.248932,  0.459752, -0.504578},
  {-0.363666, -0.164665,  0.410593, -0.158344, -0.763497, -0.253072},
  {-0.174360, -0.143923,  0.459720, -0.454511,  0.225959,  0.692752},
  {-0.548404, -0.185839, -0.710719, -0.397555, -0.010979,  0.037753},
  {-0.407301,  0.615814, -0.073336,  0.453195, -0.219035,  0.442875}
}

Our Arnoldi implementation above gives us the following sequence of eigenvalue and eigenvector estimates as the number of iterations increases.
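
A run of this kind could be produced, for instance, with a driver like the one below; the variable name matA and the start vector are my own choices and are not taken from the computation that produced the numbers that follow.

(* Hypothetical driver: matA is assumed to hold the 6x6 matrix above. *)
Do[
 Print[ArnoldiEigendata[matA, {1., 0., 0., 0., 0., 0.}, its]],
 {its, 2, 6}
 ]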

{0.549131, 6.06347}

{
  {-0.864387, 0.502827},
  { 0.121370, 0.208641},
  { 0.244172, 0.419744},
  { 0.056320, 0.096817},
  { 0.364510, 0.626613},
  { 0.206022, 0.354163}
}
{-0.723417, 1.0684, 6.40053}

{
  { 0.380137, -0.801777, 0.461139},
  { 0.569453,  0.496034, 0.393024},
  {-0.340171,  0.056756, 0.379099},
  { 0.294660,  0.249443, 0.190803},
  {-0.572385,  0.043831, 0.548051},
  {-0.032460,  0.209139, 0.390385}
}
{-1.09743, 0.247749, 1.22842, 6.40536}

{
  {-0.255709,  0.483779, -0.699114, 0.460229},
  {-0.469832,  0.195576,  0.569868, 0.399032},
  {-0.017525, -0.657474, -0.210288, 0.361940},
  {-0.578735, -0.409707,  0.043735, 0.175555},
  { 0.555310, -0.200571,  0.020606, 0.550674},
  { 0.265066,  0.295547,  0.374075, 0.404846}
}
{-1.33928, -0.492637, 0.750416, 1.34907, 6.40546}

{
  { 0.164574, -0.327319,  0.587271, -0.555971, 0.460203},
  { 0.277366, -0.489640, -0.195416,  0.493952, 0.39867},
  { 0.402705,  0.746602, -0.129910, -0.156531, 0.363682},
  { 0.474268, -0.185201, -0.524148, -0.159935, 0.174327},
  {-0.710545,  0.010519, -0.399970, -0.185071, 0.548404},
  {-0.063307,  0.247569, 0.4066220,  0.602143, 0.407275}
}
{-1.34007, -0.49569, 0.33907, 0.754853, 1.34977, 6.40546}

{
  {-0.164253,  0.326414, -0.062352,  0.585615, -0.554847, 0.460203},
  {-0.287389,  0.459752, -0.504578, -0.248932,  0.480159, 0.398644},
  {-0.410593, -0.763497, -0.253072, -0.158344, -0.164665, 0.363666},
  {-0.459720,  0.225959,  0.692752, -0.454511, -0.143923, 0.174360},
  { 0.710719, -0.010979,  0.037753, -0.397555, -0.185839, 0.548404},
  { 0.073336, -0.219035,  0.442875,  0.453195,  0.615814, 0.407301}
}

At this point the eigenvalues correspond exactly to those computed by Mathematica, and the eigenvectors coincide up to sign (which is perfectly fine, since eigenvectors are only determined up to scaling).

References

[1] W. E. Arnoldi, The principle of minimized iterations in the solution of the matrix eigenvalue problem, Quarterly of Applied Mathematics 9, 17–29, 1951, doi:10.1090/QAM/42792

[2] C. Lanczos, An iteration method for the solution of the eigenvalue problem of linear differential and integral operators, Journal of Research of the National Bureau of Standards 45, 255–282, 1950

[3] J. Liesen, Z. Strakoš, Krylov Subspace Methods: Principles and Analysis, Oxford University Press, 2013

[4] G. H. Golub, C. F. Van Loan, Matrix Computations, Johns Hopkins University Press, 2013

How to cite this page

Hoeltgen, Laurent: Arnoldi Iterations, 2021-08-18.

BibLaTeX code:
@online{arnoldi-iteration,
  author   = {Hoeltgen, Laurent},
  title    = {Arnoldi Iterations},
  date     = {2021-08-18},
  language = {english},
  url      = {https://www.laurenthoeltgen.name/content/blog/arnoldi-iteration}
}
