My next two or three posts are going to deal with the QR and SVD decompositions. QR can be computed in several different ways. I’m going to work through two methods of computing QR that will shed some light on our SVD implementation. So, these two topics dovetail nicely. Onward!
QR via Householder Matrices
For our first method, HouseholderQR, we are going to use Householder vectors and matrices. We talked about these a bit when we generated random symmetric positive definite matrices to test our Cholesky implementation. They’re baaaaaaaack.
Here’s a quick reminder. Householder matrices look like \(H = I - 2vv^T\) for \(v\) a unit-length, \(n\)-element column vector (for a general nonzero \(v\), we divide by \(v^Tv\), as we will below). \(H\) is shaped \(n \times n\). \(H\) is orthogonal (also, unitary – not Unitarian – but we aren’t discussing either). We only made use of the orthogonality of Householder matrices, but they have other nice properties too. Today, the property that is most important to us is that Householders can be used to transform a nonzero vector into a vector with mostly zeros (all but one entry will be zero), and likewise for rows/columns of a matrix. Since \(QR\) produces an upper-triangular \(R\), we want lots of zeros. I’m going to attack this problem algebraically; if you’d like to see a geometric take on constructing the Householder form, check out pages 11–12 here.
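If you’d like to convince yourself of those properties numerically, here’s a tiny standalone check (there is nothing special about the random \(v\) below):
import numpy as np
np.random.seed(0)
v = np.random.randn(4)
H = np.eye(4) - 2 * np.outer(v, v) / v.dot(v)  # Householder from an arbitrary v
print(np.allclose(H.T.dot(H), np.eye(4)))      # orthogonal: H^T H = I
print(np.allclose(H, H.T))                     # and symmetric, too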
I’m Your Zero
Here’s how zeros get introduced. If we have a column vector \(\vec{x}{\in}\mathbb{R}^n\), we would like to find \(\vec{Hx}=\vec{y}\ {\in}\ \mathbb{R}^n\) with the properties that \(y_1\) is some fixed scalar value (we’ll call it \(\alpha\)) and the other values are 0. I’ll use \(\vec{0_n}\) for a vector of \(n\) zeros, and \(\vec{e}_{i,n}\) for the \(i\)-th standard basis vector in \(\mathbb{R}^n\). I apologize for using 1-based indexing in the math and 0-based indexing in the code. Mea culpa.
\(\renewcommand{\vec}[1]{\mathbf{#1}}\renewcommand{\norm}[1]{\|\vec{#1}\|}\renewcommand{\abs}[1]{\left\lvert#1\right\lvert}\)
\[
y_i= \begin{cases}
\alpha & i = 1 \\
0 & i \neq 1
\end{cases}
\hspace{.5 in} \text{and} \hspace{.5 in}
\vec{y} = \left[\begin{array}{c} \alpha \\ \mathbf{0}_{n-1} \end{array}\right] = \alpha\,\vec{e}_{1,n}
\]
\(\vec{H}\) is a hint that we’re going to use a Householder matrix to map \(\vec{x}\rightarrow\vec{y}\). So, take \(H\) as a Householder matrix. \(H\) is orthogonal, which means \(\vec{H}\) preserves vector length (we’re using the 2-norm): \(\norm{\vec{Hx}}=\norm{\vec{x}}\). Together with the fact that most of \(\vec{y}\) is zero, which gives us \(\norm{\vec{y}}=\abs{y_1}\), we get \(\norm{\vec{x}} = \abs{y_1} = \abs{\alpha}\). If we take the positive root, we now know that \(\alpha\) is equal to the length of \(\vec{x}\).
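In case the length-preservation step feels like magic, it follows in one line from orthogonality (\(\vec{H}^T\vec{H} = \vec{I}\)):
\[
\norm{\vec{Hx}}^2 = (\vec{Hx})^T(\vec{Hx}) = \vec{x}^T\vec{H}^T\vec{Hx} = \vec{x}^T\vec{x} = \norm{\vec{x}}^2
\]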
Let’s turn our attention to \(\vec{Hx}\) and apply the expanded Householder form:
\[\begin{align}
\vec{Hx} &= \left(\vec{I} - \frac{2\vec{vv^T}}{\vec{v^Tv}}\right)\vec{x} \\
&= \vec{x} - \frac{2\vec{vv^Tx}}{\vec{v^Tv}} & \vec{v^Tx},\ \vec{v^Tv} \text{ are scalars} \\
&= \vec{x} - C\vec{v} = \vec{y} = \left[\begin{array}{c} \alpha \\ \mathbf{0}_{n-1} \end{array}\right] & C = 2\frac{\vec{v^Tx}}{\vec{v^Tv}}
\end{align}
\]
Now, the value of \(C\) is a function of both \(\vec{v}\) and \(\vec{x}\). But, all we need is one satisfactory \(\vec{v}\). And, if we can pick the form of \(\vec{v}\) nicely, we won’t have to do much work to create it from \(\vec{x}\). Simplicity and simple thoughts lead to numbers like one and zero. Zero seems right out, but what about \(C=1\)? Let’s add a constraint that \(C=1\) and see if we end up with a consistent result. So:
\[
\begin{align}
\vec{y} = \vec{x}-\vec{v} = \left[\begin{array}{c} \alpha \\ \mathbf{0}_{n-1} \end{array}\right]
\qquad \implies \qquad
y_i = x_i - v_i = \begin{cases}
\alpha = \norm{x} & i = 1\\
0 & i \neq 1
\end{cases}
\end{align}
\]
So, the elements of our (hopeful) \(\vec{v}\) are given by: \(v_1 = x_1 - \alpha = x_1 - \norm{x}\) and \(v_{2 \dots n} = x_{2 \dots n}\). Now, since we pulled \(C\) out of a hat, we need to confirm that these values of \(\vec{v}\) and \(\vec{x}\) really do yield \(C=1\). To simplify things, let’s note that the dot-product of \(\vec{x}\) with itself, over all but its first element, is \(d_{2{\dots}n}=\sum_{i=2}^n x_i^2 = \norm{x}^2 - x_1^2\).
Now we have, \[\begin{align}
\vec{v^Tv} &= v_1^2 + d_{2{\dots}n} \\
&= (x_1 - \norm{x})^2 + \norm{x}^2 - x_1^2 \\
&= x_1^2 - 2x_1\norm{x} + \norm{x}^2 + \norm{x}^2 - x_1^2 \\
&= 2\norm{x}^2 - 2x_1\norm{x} \\
&= 2(\norm{x}^2 - x_1\norm{x})
\end{align}\]
and
\[\begin{align}
\vec{v^Tx} &= v_1x_1 + d_{2{\dots}n} \\
&= (x_1 - \norm{x})x_1 + (\norm{x}^2 - x_1^2) \\
&= x_1^2 - x_1\norm{x} + \norm{x}^2 - x_1^2 \\
&= \norm{x}^2 - x_1\norm{x}
\end{align}\]
So, \(\frac{2\vec{v^Tx}}{\vec{v^Tv}} = \frac{2 (\norm{x}^2 - x_1\norm{x})}{2(\norm{x}^2 - x_1\norm{x})}=1\) which means our vectors magically – well, I’m sure there is a nice linear algebra based proof about why that works – are consistent with \(C=1\).
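Here’s a quick numeric sanity check of that algebra; the particular values in \(\vec{x}\) are just an example:
import numpy as np
x = np.array([4.0, 1.0, 1.0])
v = x.copy()
v[0] = x[0] - np.linalg.norm(x)   # v_1 = x_1 - ||x||; the rest of v copies x
C = 2 * v.dot(x) / v.dot(v)
print(C)                          # ~1.0, as advertised
print(x - C * v)                  # ~(||x||, 0, 0)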
Codin’
The punchline of this is that we can make \(\vec{v}\) very easily from \(\vec{x}\): copy it and replace the first element with \(x_1 - \norm{x}\). And presto, we have a Householder matrix that will annihilate (zero) the other elements of \(\vec{x}\). I’m getting code-hungry. Too many math-carbs. We need just a few last points to tie the mathematics to the code.
If we normalize \(\vec{v}\) so that \(v_1=1\) then we can store our \(\vec{v}\) in \(n-1\) slots by knowing the \(1\) is there implicitly. This will be very handy for reusing memory, if we are willing to overwrite an input matrix. As we produce an upper triangular \(R\), we can store the \(Q\) (which is a sequence of \(\vec{H}\)s coming from a sequence of \(\vec{v}\)s) in the lower triangular portion of the input matrix. In our code, we will use a normalized \(\vec{v}\): \(\vec{v}_{code} = \vec{v}/v_1\).
We also have a strange \(\beta\) running through our code. It comes from G&VL and it shows up all over the place in LAPACK. So, let’s explain where it comes from. \(\renewcommand{\nvec}[2]{\vec{#1}_\mathrm{#2}}\)
Using the normalized \(\nvec{v}{code}\):
\[\nvec{H}{code}=\left(\vec{I} – \frac{2\nvec{v}{code}\nvec{v}{code}^T}{\nvec{v}{code}^T\nvec{v}{code}}\right)=\left(\vec{I}-\beta\nvec{v}{code}\nvec{v}{code}^T\right)\]
What is \(\beta\)? Well,
\[\beta=\frac{2}{\nvec{v}{code}^T\nvec{v}{code}}
=\frac{2}{\frac{v_1^2}{v_1^2} + \frac{1}{v_1^2}\left( d_{2{\dots}n}\right)}
=\frac{2v_1^2}{v_1^2 + d_{2{\dots}n}}\]
To reconstruct our \(\vec{H}\) properly, we will need both \(\beta\) and \(\nvec{v}{code}\). Last note (really!): it turns out that computing \(v_1 = x_1 - \norm{x}\) directly can get us into numerical trouble if \(\vec{x}\) is very close to a multiple of \(\vec{e}_{1,n}\) — that is, when almost all of \(\vec{x}\)’s length is in \(x_1\). In that case, \(x_1 - \norm{x}\) subtracts two nearly equal values and we lose precision to cancellation. A formula by Parlett resolves this.
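The trick behind Parlett’s formula is to rationalize the troublesome difference; this identity is exactly what the Parlett branch in the code below computes:
\[
v_1 = x_1 - \norm{x} = \frac{(x_1 - \norm{x})(x_1 + \norm{x})}{x_1 + \norm{x}} = \frac{x_1^2 - \norm{x}^2}{x_1 + \norm{x}} = \frac{-d_{2{\dots}n}}{x_1 + \norm{x}}
\]
When \(x_1 > 0\), the right-hand form adds two positive quantities in the denominator instead of subtracting two nearly equal ones.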
Householder Vectors
from math import sqrt
import numpy as np
import numpy.linalg as nla
np.set_printoptions(suppress=True)
# G&vL 3rd pg. 210
def make_house_vec(x):
    n = x.shape[0]
    dot_1on = x[1:].dot(x[1:])  # sum of squares of the tail of x
    # v is our return vector; we hack on v[0]
    v = np.copy(x)
    v[0] = 1.0
    if dot_1on < np.finfo(float).eps:
        # x is already (a multiple of) e_1; nothing to annihilate
        beta = 0.0
    else:
        # apply Parlett's formula (G&vL page 210) for a safe v[0] = x[0] - norm(x)
        norm_x = sqrt(x[0]**2 + dot_1on)
        if x[0] <= 0:
            v[0] = x[0] - norm_x
        else:
            # rationalized form: avoids cancellation when x[0] > 0
            v[0] = -dot_1on / (x[0] + norm_x)
        beta = 2 * v[0]**2 / (dot_1on + v[0]**2)
        v = v / v[0]  # normalize so v[0] == 1
    return v, beta
A = np.array([[4, 1, 1],
              [1, 4, 1],
              [1, 1, 4]], dtype=np.float64)
v, beta = make_house_vec(A[:,0])
print(beta, v)
If you remember, the whole point of this exercise was to introduce zeros into \(\vec{x}\) (and eventually, into a column of \(\vec{A}\)). Because we’ve done some massaging to the form of \(v\), and since \(v\) and \(\beta\) are used together to define our \(H\) (which actually does the transformation to a multiple of a basis vector, \(\vec{e}_{1,n}\)), we need to extract the full \(\vec{H}\) to apply it to \(\vec{x}\). We do that by using the definition of \(H\) from above. Here we do it for a single Householder vector:
def fullH_1d(v, beta):
    n = v.shape[0]
    return np.eye(n) - beta * np.outer(v, v)
And we apply \(H\) to \(A\) to get \(HA\):
H = fullH_1d(v, beta)
H.dot(A)
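As a quick sanity check on our example, the first column of \(HA\) should be \((\norm{\vec{x}}, 0, 0)\):
# the first column of HA should be (||A[:,0]||, 0, 0)
print(np.allclose(H.dot(A)[:, 0], [nla.norm(A[:, 0]), 0.0, 0.0]))  # expect True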
On to \(QR\)
If we do that a few times in a row (well, actually across the columns), we’re going to end up with zeros below the diagonal — which is exactly the \(R\) (upper triangular) matrix we’re seeking for \(QR\).
# householder QR .. G&VL 3rd pg 224
def house_qr(A):
    '''
    Update A in place with its QR factorization: R is upper triangular
    (at and above the diagonal). Below the diagonal, A holds the "essential"
    parts of the Householder vectors. The essential parts get shorter as we
    go, because each one is only applied to the remaining bottom-right
    submatrix of A.
    '''
    m, n = A.shape
    betas = np.empty(n)
    for j in range(n):
        v, betas[j] = make_house_vec(A[j:, j])
        # apply H_j to the trailing submatrix
        A[j:, j:] = (np.eye(m - j) - betas[j] * np.outer(v, v)).dot(A[j:, j:])
        if j < m:
            # store the essential part of v (the leading 1 is implicit)
            A[j+1:, j] = v[1:]
    return betas
A = np.array([[4, 1, 1],
              [1, 4, 1],
              [1, 1, 4]], dtype=np.float64)
betas = house_qr(A)
print(betas)
print(A)
print(np.triu(A))
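Here’s a quick check against numpy; as we’ll see below, whole rows of \(R\) may differ from LAPACK’s by a sign flip, so we compare magnitudes:
# rows of R may differ from numpy's by a sign flip, so compare |R|
_, R_np = nla.qr(np.array([[4, 1, 1],
                           [1, 4, 1],
                           [1, 1, 4]], dtype=np.float64))
print(np.allclose(np.abs(np.triu(A)), np.abs(R_np)))  # expect True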
If we want to see the explicit \(Q\) matrix, we need to apply each of the \(v_i \rightarrow H_i\) transformations and get \(\prod_i H_i = Q\). In practice, you may never need to do this (see G&VL, 3rd, pp. 211-215). Because of their nice structure, you can almost always apply them in their packed/compressed (read "factored") form efficiently.
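For instance, here’s a minimal sketch of applying \(Q^T\) to a vector straight from the packed form (the function name and layout are mine, not G&VL’s); each \(H_j\) costs only a dot product and a vector update:
def apply_packed_qt(A, betas, b):
    '''compute Q^T b using the packed Householder vectors in A's lower triangle'''
    m, n = A.shape
    b = b.copy()
    for j in range(n):
        v = A[j:, j].copy()
        v[0] = 1.0                            # restore the implicit leading 1
        b[j:] -= betas[j] * v * v.dot(b[j:])  # H_j acts only on rows j and below
    return b
Since each \(H_j\) is symmetric, applying them in reversed order gives \(Q\vec{b}\) instead. Now, back to building the explicit \(Q\):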
# for explicit Q see G&VL 3rd pg. 213 ... algo 5.1.5
def fullQ_house(A, betas):
    '''
    Take the "packed" Householder vectors stored in A's lower triangle,
    along with their respective betas, and expand them into a full Q matrix.
    (Note: R still lives in A's upper triangle.)
    '''
    m, n = A.shape
    Q = np.eye(n)
    for j in reversed(range(n)):
        # NOTE: G&VL store the Householder vector with a normalized v[0] == 1
        # so it can be packed into the overwritten input matrix A.
        # We extract the "essential" part plus one spot and rewrite that spot
        # with an explicit 1.
        v = A[j:, j].copy()
        v[0] = 1.0
        # accumulate H_j into Q, working backward from the last column
        Q[j:, j:] = (np.eye(n - j) - betas[j] * np.outer(v, v)).dot(Q[j:, j:])
    return Q
A = np.array([[4, 1, 1],
              [1, 4, 1],
              [1, 1, 4]], dtype=np.float64)
betas = house_qr(A)
Q = fullQ_house(A, betas)
print(Q)
print(np.triu(A))
print(Q.dot(np.triu(A)))
_A = np.array([[4, 1, 1],
               [1, 4, 1],
               [1, 1, 4]], dtype=np.float64)
nqr = nla.qr(_A)
print(nqr[0])
print(nqr[1])
print(nqr[0].dot(nqr[1]))
You’ll notice that these are the same, up to some differences in signs. Negating both \(Q\) and \(R\) certainly yields a new, valid \(QR\) decomposition. It also turns out that we can arbitrarily flip the signs of (one or more of) the elements on the diagonal of \(R\) (see Atkinson, Intro. to Numerical Analysis, pg. 614). If we do that, the rest of that row of \(R\) flips along with it, and we need to flip the corresponding column of \(Q\) so the entries of the product \(QR\) keep their signs. Details, schmetails.
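To make the sign business concrete, here is a small sketch: flip the sign of one diagonal entry of \(R\) via a diagonal \(\pm 1\) matrix \(D\) (which is its own inverse) and compensate in \(Q\); the product is unchanged because \(QR = QDDR = (QD)(DR)\):
# flip the sign of R's second diagonal entry and compensate in Q
D = np.diag([1.0, -1.0, 1.0])       # D.dot(D) == I
R = np.triu(A)
Q2, R2 = Q.dot(D), D.dot(R)         # column 1 of Q flips, row 1 of R flips
print(np.allclose(Q2.dot(R2), Q.dot(R)))  # expect True: same product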
Additional Resources
You can grab a copy of this notebook.
Even better, you can view it using nbviewer.
License
Unless otherwise noted, the contents of this notebook are under the following license. The code in the notebook should be considered part of the text (i.e., licensed and treated as follows).
DrsFenner.org Blog And Notebooks by Mark and Barbara Fenner is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Permissions beyond the scope of this license may be available at drsfenner.org/blog/about-and-contacts.