The Bregman Algorithm (2/4)

Optimization With Bregman Divergences

In a previous post we discussed how to find a common point in a family of convex sets by using the Bregman algorithm. Actually the algorithm is capable of more. We can use it to solve constrained optimization problems. Since there is no change on the underlying algorithm, we will focus on an example here.

Assume we are given a quadratic form $f(x) = x^\top Q x$ with a symmetric positive definite matrix $Q$. A simple computation shows that its Bregman divergence is given by $D(x,y) = (x-y)^\top Q (x-y)$. Lets also assume that we are given a linear system of equations $Ax=b$. We want to minimize $f$ over the set of solutions of the linear system. We assume the system has infinitely many solutions, otherwise the task is trivial.

Applying the Bregman algorithm to the linear system of equations and using the Bregman divergence for $f$ not only finds a solution of the linear system, it also minimizes $f$. The advantage of the Bregman algorithm is that in each iteration your constraint is a single affine equation. Instead of having to deal with $Ax=b$, we only have to deal with a single equation of the form $n^\top x = q$. Actually we have to solve the following optimization problem.

$$\begin{equation} x^{(n+1)} = \arg\min_z\left\lbrace D(z, x^{(n)})\ \vert\ n^\top z = q\right\rbrace \end{equation}$$

This optimisation problem has an analytic solution, which is given by the following expression.

$$\begin{equation} x^{(n+1)} = x^{(n)} - \frac{n^\top x^{(n)} - q}{n^\top Q^{-1} n} Q^{-1}n \end{equation}$$

We remark that we assumed $Q$ to be positive definite, thus $Q^{-1}$ really does exist. For larger problems it might pay off to factorize $Q$ (e.g. with a Cholesky decomposition) to evaluate $Q^{-1}n$. We also note that the formula above coincides with the formula used in the ProjectOntoLine method in our previous post when $Q$ is the identity matrix.

Finally, there is a condition on the starting point $x^{(0)}$. For technical reasons we must choose $x^{(0)}$ such that there exists a vector $u$ which fulfills the equation $D[f](x^{(0)}) = A^\top u$. Such conditions can be found in the literature under the name source conditions. Here, $D[f](x^{(0)})$ is the Jacobian of $f$ at $x^{(0)}$. Note that if $f$ has a global minimum (without taking the constraints into consideration), then that global minimum may be used as a starting point.

Note: Satisfying the source condition is important. If we violate the source condition, the algorithm will still give you a solution of the linear constraint, but it won’t necessarily be a solution that minimizes the quadratic form.

A Concrete Example

We consider the problem described above with the following parameters.

$$\begin{align} Q &= \begin{pmatrix} 3 & -1 \\ -1 & 3\end{pmatrix} \\ A &= \begin{pmatrix} 1 & -4 \\ -1 & 4\end{pmatrix} \\ b &= \begin{pmatrix} 8 \\ -8 \end{pmatrix} \\ x^{(0)} &= \begin{pmatrix} 0 \\ 0 \end{pmatrix} \end{align}$$

In this case the algorithm returns the correct solution already after a single iteration. Namely, it gives you the following values.

$$\begin{equation} x = \begin{pmatrix} -0.186047 \\ -2.04651 \end{pmatrix},\quad f(x) = 11.907 \end{equation}$$

Julia Code

Adapting our previous implementation is straightforward. We just have to bring the matrix $Q$ into play.

using LinearAlgebra

struct Line

# Compute the orthogonal projection onto a line
function ProjectOntoLine(line, point)
    println(dot(line.normal_vector, Q\line.normal_vector))
    return point - ((dot(line.normal_vector, point) - line.offset) /
        dot(line.normal_vector, Q\line.normal_vector)) * (Q\line.normal_vector)

function BregmanAlgorithm(matrix, rhs, x0)
    solution = x0
    i = 0
    # stop when the residual drops below 1.0e-10
    while (norm(A * solution - b) > 1.0e-10)
        # Extract a line from from the system of equations.
        line = Line(matrix[(i%2)+1,:], rhs[(i%2)+1])
        # Compute the projection of the previous iterate onto the line.
        solution = ProjectOntoLine(line, solution)
        i = i + 1
    return solution

Q = [3 -1;
     -1 3]
A = [1 -4;
     -1 4]
b = [8; -8]
x0 = [0; 0]

x = BregmanAlgorithm(A, b, x0)
println("Final solution: ", x)
println("Error: ", norm(A*x - b))
println("Energy: ", dot(x, Q*x))


[1] Bregman, L. M. “The Relaxation Method of finding the common point of convex sets and its application to the solution of problems in convex programming.” USSR Computational Mathematics and Mathematical Physics, 1967, 7, 200–217 (English translation, Russian original available here)


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License

Mathematician and
Software Engineer

Researcher, Engineer, Tinkerer, Scholar, Philosopher, and Hyrox afficionado