# Krylov Subspaces

This is the first post in a (planned) series on Krylov subspaces, projection processes, and related algorithms. We already discussed projection processes when talking about the Bregman algorithm and we will see that the Krylov (sub-)spaces will be generated by a set of vectors that are not necessarily orthogonal. Getting an orthogonal basis for these spaces can for example be done by applying Gram-Schmidt. Forthcoming posts will discuss further interesting applications such as the Arnoldi iteration or the Lanczos method.

## Krylov Subspaces

Krylov subspaces are probably best known for their use in solving linear systems of equations. However, this was not exactly what Aleksey Nikolaevich Krylov had in mind when he published his paper  in 1931. He was interested in oscillations of mechanical systems (see  for an overview on his scientific achievements). Actually he worked on linear differential equations at that time. Linear systems of equations were basically just one discretisation step away. In his publication  Aleksey Krylov proposed a method to reformulate the characteristic equation of a matrix such that computations get easier. The characteristic equation of a matrix $A$ is given by $\det(\lambda I - A) = 0$ and its solutions tell you a lot about the linear mapping $A$.

Krylov’s approach is based on the following observation: Assume we have a square matrix $A\in\mathbb{R}^{n,n}$ and a non-zero vector $v\in\mathbb{R}$. If we iteratively multiply $A$ with $v$, then we obtain a sequence $v$, $Av$, $A^2v$, …. At some point (at latest after $n$ steps) this sequence will consist of linearly dependent vectors and adding an additional iterate will not enlarge the spanned space any further. Let’s call $d$ the smallest integer for which we obtain such a set of linearly dependent vectors. Then it follows that there exist coefficients $\alpha_j$ such that $$A^d v = \sum_{j=0}^{d-1} \alpha_j A^j v$$ We can rewrite this equation as $p(A)v = 0$, where $p(A)$ is given by $$A^d - \sum_{j=0}^{d-1} \alpha_j A^j$$ The polynomial $p$ that we have obtained in this way is called the minimal polynomial of $v$ with respect to $A$.

For the sake of simplicity let us now assume that we have a vector $v\in\mathbb{R}^n$ for which $d=n$ (so $d$ is as large as possible). We consider the following homogeneous linear system

\begin{align} \begin{pmatrix} v && \cdots && A^{n-1}v && A^nv \\ 1 && \cdots && \lambda^{n-1} && \lambda^n - p(\lambda) \end{pmatrix} \begin{pmatrix} -\alpha_0 \\ \vdots \\ -\alpha_{n-1} \\ 1 \end{pmatrix} &= \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 0 \end{pmatrix} \end{align}

The first $n$ equations correspond to $p(A)v = 0$. The last equation is just $p(\lambda) - p(\lambda) = 0$. Since this system has a non-trivial solution, the system matrix must be singular and hence its determinant must be $0$. Using the linearity of the determinant in the last row one can derive $$p(\lambda) = \frac{\det\begin{pmatrix} v && \cdots && A^{n-1}v && A^nv \\ 1 && \cdots && \lambda^{n-1} && \lambda^n \end{pmatrix}}{\det\begin{pmatrix} v && \cdots && A^{n-1}v \end{pmatrix}}$$ Now we just have to observe that $p(\lambda)$ is also equal to the characteristic polynomial of $A$ and we have found our sought expression! The benefit of the previous expression is that $\lambda$ only appears in a single line of the determinant in the numerator. For the usual formula, $\lambda$ appears in every line.

Of course, certain conditions on $A$ and $v$ must apply for the previous formula to apply (we already assumed that $d=n$ above). One of Krylovs observations is (roughly speaking) that for each matrix $A$ and each vector, the nested sequence of spaces $$\mathrm{span}\left\lbrace v, Av, \ldots, A^{n-1}v\right\rbrace,\quad n=1, 2, \ldots$$ will eventually stop to grow and become invariant under $A$. These spaces in the previous equation are nowadays called Krylov spaces. In the forthcoming we will denote them by $$V_n(A, v) := \mathrm{span}\lbrace v, Av, A^2v, \ldots, A^{n-1}v \rbrace$$

### Relationship to Power Iterations

Looking at the definition of $V_n(A, v)$ we immediately notice the similarity to the classic power method. The power method starts with a vector $w_0=v$ and iteratively computes $$w_{k+\frac{1}{2}} = A w_k,\quad w_{k+1} = \frac{w_{k+\frac{1}{2}}}{\left\lVert w_{k+\frac{1}{2}}\right\rVert}$$ If certain requirements are met, then the sequence $(w_k)_k$ converges towards the eigenvector corresponding to the largest eigenvalue in magnitude. Hence the iterates from the power method form a basis of the Krylov subspace.

### Examples

If $A=I$, i.e. $A$ is the identity matrix, then, for any vector $v$ we have $$V_0(A,v) = V_0(I,v) = \mathrm{span}\lbrace v \rbrace = V_1(I,v) = \ldots = V_n(I,v)$$ for any $n>0$. In this case all our Krylov subspaces are identical and have dimension 1.

In general however, we have $V_k(A,v) \subseteq V_{k+1}(A,v)$ because you add $A^kv$ to the span. If we consider $$A = \begin{pmatrix} 8 & 4 & 2 \\ 4 & 2 & 1 \\ 2 & 1 & 0 \end{pmatrix},\quad v = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$$ then it’s easy to check that $V_0(A,v) \subset V_1(A,v)$ and $V_1(A,v) \subset V_2(A,v) = \mathbb{R}^3$.

Estimating the maximal dimension of a Krylov subspace, without explicitly computing all spaces $V_k(A,v)$ would be out of scope for this article.

 A. N. Krylov, О численном решении уравнения, которым в технических вопросах определяются частоты малых колебаний материальных систем (De la résolution numérique de l’équation servant à déterminer dans des questions de mécanique appliquée les fréquences de petites oscillations des systèmes matériels), Izvestiia Akademii nauk SSSR 7(4): 491–539, 1931. In Russian

 J. Liesen, Zdenĕk Strakoš, Krylov Subspace Methods: Principles and Analysis; Oxford University Press, 2013