## introduction

Pre-conditioning is a technique for accelerating iterative solvers of linear systems of equations $$Ax=b$$. Typically, a non-singular matrix $$P$$ that is cheap to apply is chosen to transform the system into

$APy=b$

where $$y = P^{-1}x$$ is the new unknown. It is also possible to apply the pre-conditioning matrix from the left, i.e.,

$PAx = Pb.$

The goal is to reduce the condition number of the system so that the iterative method converges faster. One also implicitly assumes that $$A$$ has full rank, so that the solution is unique; in that case pre-conditioning does not affect the solution.
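To make this concrete, here is a minimal numpy sketch of both variants on an arbitrary full-rank system (the matrices are chosen purely for illustration); both recover the same solution as the original system:

```python
import numpy as np

A = np.array([[4.0, 1.0], [1.0, 3.0]])   # an arbitrary full-rank matrix
b = np.array([1.0, 2.0])
P = np.diag(1.0 / np.diag(A))            # a simple diagonal (Jacobi-style) choice

# right pre-conditioning: solve A P y = b, then recover x = P y
y = np.linalg.solve(A @ P, b)
x_right = P @ y

# left pre-conditioning: solve P A x = P b directly
x_left = np.linalg.solve(P @ A, P @ b)

x_direct = np.linalg.solve(A, b)
print(np.allclose(x_right, x_direct), np.allclose(x_left, x_direct))  # True True
```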

In this post, I would like to discuss the situation when $$A$$ is rank-deficient. The iterative solver I picked is conjugate gradient (CG), and the model system is

$A = [1, 1], \quad b=1 \\ P=\begin{bmatrix}1&0\\0&\alpha\end{bmatrix}.$

Obviously, this system does NOT have a unique solution. But does CG know that?

## observations

Using the following Python code, we end up with the solution

$\tilde x = [0.5, 0.5]^T$
```python
import numpy as np
import scipy.sparse.linalg as linalg

# rank-deficient system: the two rows are identical
A = np.array([[1.0, 1.0], [1.0, 1.0]])
b = np.array([1.0, 1.0])

# cg returns a (solution, info) tuple; info == 0 signals convergence
x, info = linalg.cg(A, b)
```


Here I augmented $$A$$ to a $$2\times2$$ matrix so that scipy does not complain about $$A$$ not being square. This augmentation can be viewed either as copying the single row of $$A$$ as a second row, or as solving the normal equation

$A^TA x = A^T b.$

Interestingly, this solution coincides with the one given by the pseudo-inverse, which for a matrix with linearly independent rows takes the explicit form

$\tilde x = A^T(AA^T)^{-1} b.$

(Be very careful about the ordering of $$A$$ and $$A^T$$ here.)

This formula can be derived from the minimization problem

$\min_x\|x\|^2\quad \text{s.t.}\quad Ax=b.$

The derivation is included at the end of this post. In other words, it is the minimal norm solution for the under-determined system.
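As a quick sanity check (a short numpy sketch), the formula agrees with `np.linalg.pinv` on the toy system:

```python
import numpy as np

A = np.array([[1.0, 1.0]])   # the 1x2 model system
b = np.array([1.0])

print(A.T @ np.linalg.inv(A @ A.T) @ b)   # [0.5 0.5]
print(np.linalg.pinv(A) @ b)              # [0.5 0.5]
```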

With the pre-conditioner $$P$$, the system matrix becomes $$AP = [1, \alpha]$$, and the normal equation in the new unknown $$y = P^{-1}x$$ becomes

$\begin{bmatrix}1&\alpha\\\alpha&\alpha^2\end{bmatrix}y=\begin{bmatrix}1\\\alpha\end{bmatrix}.$

And again the CG result (in $$y$$) matches the minimum norm solution; mapping back via $$x = Py$$ gives

$\tilde x = \frac{1}{1+\alpha^2}\begin{bmatrix}1\\\alpha^2\end{bmatrix}.$
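A short sketch to verify this for a concrete $$\alpha$$ (the value 2 is arbitrary): run CG with a zero initial condition on the normal equations of $$AP$$, then map $$y$$ back to $$x$$:

```python
import numpy as np
import scipy.sparse.linalg as linalg

alpha = 2.0
A = np.array([[1.0, 1.0]])
P = np.diag([1.0, alpha])
AP = A @ P                     # right pre-conditioned system matrix [1, alpha]

# CG on the normal equations of AP gives the minimum norm y
y, info = linalg.cg(AP.T @ AP, AP.T @ np.array([1.0]))
x = P @ y                      # map back to the original unknown

print(x)                                           # [0.2 0.8]
print(np.array([1.0, alpha**2]) / (1 + alpha**2))  # [0.2 0.8]
```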

This behavior seems to hold for randomly generated $$A$$ as well. You can check it with the following code; feel free to change the matrix dimensions too.

```python
A = np.random.rand(3, 4)
b = np.random.rand(3)

# CG on the normal equations A^T A x = A^T b
x, info = linalg.cg(A.T @ A, A.T @ b)
print(x)
# minimum norm solution A^T (A A^T)^{-1} b for comparison
print(A.T @ np.linalg.inv(A @ A.T) @ b)
```


## caveats

It seems that we have a nice result that CG converges to the minimum norm solution. However, there are two caveats to this observation:

- The initial condition is set to 0.
- The normal equation is used.

With other initial conditions, the solutions vary, as the sketch below demonstrates. Also, if I make $$A$$ square by copying its rows instead of forming the normal equation, CG does not converge to the minimum norm solution.
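The first caveat is easy to reproduce on the toy system: the initial condition $$[1, 0]^T$$ already satisfies the normal equation, so CG stops there instead of finding the minimum norm solution. A minimal sketch:

```python
import numpy as np
import scipy.sparse.linalg as linalg

A = np.array([[1.0, 1.0]])
b = np.array([1.0])

# start CG from x0 = [1, 0], which is already a solution
x, info = linalg.cg(A.T @ A, A.T @ b, x0=np.array([1.0, 0.0]))
print(x)   # [1. 0.], not the minimum norm solution [0.5 0.5]
```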

## summary

So far we have numerically demonstrated that CG converges to the minimum norm solution for a rank-deficient system, provided that a zero initial condition and the normal equation are used.

The more interesting observation is that the solution depends on the pre-conditioner $$P$$. In some sense, this provides a systematic way to ‘manipulate’ the answer. In the literature this idea seems to go by the name ‘prior-conditioning’, which refers to constraining the structure of $$x$$ through $$P$$ using prior information.

In our simple example, the pre-conditioner $$P$$ rescales the components of $$x$$. It may be motivated by arguing that $$x_2$$ is expected to be greater than $$x_1$$, so a pre-conditioner with $$\alpha>1$$ makes the rescaled unknowns more ‘equal’. Looking at the minimum norm solution, this indeed does the trick: it yields $$x_2>x_1$$.

In practice, the system may be so big (hello medical imaging) that it is not feasible to check explicitly whether $$A$$ is rank-deficient, or even to form $$A$$ explicitly. One then needs to be cautious about the stopping criterion, the initial condition, and the pre-conditioner of the iterative solver.

## appendix

### derivation of the pseudo-inverse

The Lagrangian is defined as

$\mathcal L = x^Tx - \lambda^T(Ax-b)$

where the vector $$\lambda$$ is the Lagrange multiplier.

The stationary condition gives rise to

$\frac{\partial \mathcal L}{\partial x}=0\rightarrow 2x - A^T\lambda = 0\rightarrow x=\frac{A^T\lambda}{2}.$

Plugging this into the constraint $$Ax=b$$, we get

$\lambda = 2(AA^T)^{-1}b.$

Thus the minimum norm solution, given by the pseudo-inverse, is

$x = A^T(AA^T)^{-1}b.$
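As a numerical check of this derivation, here is a sketch with a randomly generated fat matrix (which has linearly independent rows with probability one):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((3, 5))     # fat matrix with linearly independent rows
b = rng.random(3)

lam = 2 * np.linalg.inv(A @ A.T) @ b     # Lagrange multiplier
x = A.T @ lam / 2                        # stationary point of the Lagrangian

print(np.allclose(A @ x, b))                   # constraint holds: True
print(np.allclose(x, np.linalg.pinv(A) @ b))   # matches the pseudo-inverse: True
```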