Householder Bidiagonalization

We’re going to make use of Householder reflections to turn a generic matrix into a bidiagonal matrix. I’ve laid a lot of foundation for these topics in the posts I just linked. Check them out! Onward.

Reducing a Generic Matrix to a Bidiagonal Matrix Using Householder Reflections

Why would we want to do this? Well, it’s a long chain of logic (and this is abbreviated), but starting from an input matrix \(A\), we eventually want to compute the SVD \(A=U_S D V^T_S\). BTW, I’m using \(D\) instead of the usual \(\Sigma\) because I don’t want folks thinking it is a variance/covariance matrix (and it is diagonal, in any case).

  1. If we have \(U^T_B A V_B = B\) with \(B\) upper-bidiagonal and \(U_B,V_B\) orthogonal, then \(V^T_B(A^T A)V_B=B^T B=T\) with \(T\) tridiagonal and symmetric. Amazingly (serendipity?), the \(U_B\) terms disappear since \((XYZ)^T=Z^T Y^T X^T\) and \(U_B\) is orthogonal (\(U_B U_B^T = I\)).
  2. The singular values of \(A\) are the positive square roots of the non-zero eigenvalues of \(A^TA\). Take a look at the inner term of the second equation in point 1.
  3. \(A^TA\) is similar to \(B^TB=T\), and similar matrices have the same eigenvalues.
  4. A symmetric tridiagonal matrix gives rise to a particularly nice characteristic equation which allows "easy" determination of eigenvalues (they end up forming a Sturm sequence which has nice properties).
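
Before moving on, points 1 and 2 are easy to sanity-check numerically: for an upper-bidiagonal \(B\), the product \(B^TB\) comes out symmetric and tridiagonal, and the squared singular values of \(B\) match the eigenvalues of \(T=B^TB\). A quick sketch with made-up values:

```python
import numpy as np

# an upper-bidiagonal B with made-up diagonal and superdiagonal values
B = np.array([[2., 1., 0., 0.],
              [0., 3., 1., 0.],
              [0., 0., 4., 1.],
              [0., 0., 0., 5.]])

T = B.T.dot(B)

print(np.allclose(T, T.T))             # symmetric
print(np.allclose(np.triu(T, 2), 0))   # nothing above the superdiagonal
print(np.allclose(np.tril(T, -2), 0))  # nothing below the subdiagonal

# point 2: squared singular values of B match the eigenvalues of T
sv = np.linalg.svd(B, compute_uv=False)
ev = np.linalg.eigvalsh(T)
print(np.allclose(sorted(sv**2), sorted(ev)))
```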

I don’t want to leave a misleading impression: we won’t end up explicitly using \(B^TB\). The reasons are the same reasons we avoid that form when solving least squares via the direct normal equations: it has bad numerical properties and can be large. However, we do need the bidiagonal form \(B\). Here’s make_house_vec from the post on using Householder reflections to perform QR:

In [1]:
import numpy as np
import numpy.linalg as nla
np.set_printoptions(suppress=True)

# G&vL 3rd pg. 210
def make_house_vec(x):
    n = x.shape[0]
    dot_1on = x[1:].dot(x[1:])

    # v is our return vector; we hack on v[0]
    v = np.copy(x)
    v[0] = 1.0
    
    if dot_1on < np.finfo(float).eps:
        beta = 0.0
    else:
        # apply Parlett's formula (G&vL page 210) for a safe v_0 = x_0 - norm(x)
        norm_x = np.sqrt(x[0]**2 + dot_1on)
        if x[0] <= 0:
            v[0] = x[0] - norm_x
        else:
            v[0] = -dot_1on / (x[0] + norm_x)
        beta = 2 * v[0]**2 / (dot_1on + v[0]**2)
        v = v / v[0]
    return v, beta
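
A quick standalone check of what the resulting reflection does to a made-up vector. Here I build \(v\) with the direct \(v = x - \|x\|e_1\) construction rather than Parlett’s safer variant (the reflection is the same; make_house_vec also normalizes so that \(v_0=1\)):

```python
import numpy as np

x = np.array([3., 4., 0.])
v = x.copy()
v[0] -= np.linalg.norm(x)   # v = x - ||x|| e_1
beta = 2.0 / v.dot(v)       # so H = I - beta v v^T is a reflection
H = np.eye(3) - beta * np.outer(v, v)
print(H.dot(x))             # -> [5., 0., 0.]: everything below x_0 is zeroed
```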

Longwinded and Explicit

Using Householder vectors, we’re going to introduce "lots" of zeros into the columns of our target matrix. This is very similar to the Householder QR process. However, instead of simply zeroing out below the diagonal one column at a time, we’re also going to zero out above the superdiagonal. That is, we work down the column and then across the row. So, for position \((p,p)\) on the diagonal of an \((m,n)\) matrix, we’ll zero out elements in that column (using zero-based indexing):

\[(p+1,p),\ (p+2,p),\ \ldots\ ,\ (m-1,p)\]

and we’ll zero out elements in that row:

\[(p,p+2),\ (p,p+3),\ \ldots\ ,\ (p, n-1)\]

We start across the row at \(p+2\) because we need to skip the superdiagonal element. To keep the implementation a bit more direct, we’re going to hold off on one optimization: instead of "packing" the reflection information back into A, we’re going to compute the reflection matrices explicitly. You don’t need to do this in practice, but hold that thought.
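
To make the pattern concrete, here’s a throwaway sketch marking which entries the step at \(p=0\) zeroes in a \(4\times3\) matrix ('c' for the column reflection, 'r' for the row reflection, '.' untouched at this step):

```python
import numpy as np

m, n, p = 4, 3, 0
mask = np.full((m, n), '.', dtype='U1')
mask[p+1:, p] = 'c'    # (p+1,p) ... (m-1,p): below the diagonal
mask[p, p+2:] = 'r'    # (p,p+2) ... (p,n-1): right of the superdiagonal
print(mask)
# [['.' '.' 'r']
#  ['c' '.' '.']
#  ['c' '.' '.']
#  ['c' '.' '.']]
```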

In [2]:
def full_house(n, col, v, beta):
    ''' for size n, apply a Householder vector v in the lower right corner of 
        I_n to get a full-sized matrix with a smaller Householder matrix component'''
    full = np.eye(n)
    full[col:, col:] -= beta * np.outer(v,v)
    return full

# G&VL Algo. 5.4.2 with explicit reflections
def house_bidiag_explicit_UV(A):
    m,n = A.shape
    assert m >= n
    U,Vt = np.eye(m), np.eye(n)
    
    for col in range(n):
        v, beta = make_house_vec(A[col:,col])
        A[col:,col:] = (np.eye(m-col) - beta * np.outer(v,v)).dot(A[col:,col:])
        Q = full_house(m, col, v, beta)
        U = U.dot(Q)
        
        if col <= n-2:
            # transpose here, reflection for zeros above diagonal in A
            # col+1 keeps us off the super diagonal
            v,beta = make_house_vec(A[col,col+1:].T)
            A[col:,col+1:] = A[col:, col+1:].dot(np.eye(n-(col+1)) - beta * np.outer(v,v))
            P = full_house(n, col+1, v, beta)
            Vt = P.dot(Vt)
    return U, A, Vt
In [3]:
A = np.arange(1,13.0).reshape(4,3)
A_test = np.copy(A)
print(A)

U, B, Vt = house_bidiag_explicit_UV(A)
print("B:\n", B)
print("U:\n", U)
print("Vt:\n", Vt)

print("should equal input:", np.allclose(U.dot(B).dot(Vt), A_test))
[[  1.   2.   3.]
 [  4.   5.   6.]
 [  7.   8.   9.]
 [ 10.  11.  12.]]
B:
[[ 12.88409873  21.87643283   0.        ]
 [  0.           2.24623524  -0.61328133]
 [ -0.           0.           0.        ]
 [ -0.           0.           0.        ]]
U:
[[ 0.07761505  0.83305216 -0.54737648  0.01946767]
 [ 0.31046021  0.45123659  0.74434565  0.38203344]
 [ 0.54330537  0.06942101  0.15343813 -0.8224699 ]
 [ 0.77615053 -0.31239456 -0.35040731  0.42096879]]
Vt:
[[ 1.          0.          0.        ]
 [ 0.          0.66700225  0.7450557 ]
 [ 0.          0.7450557  -0.66700225]]
should equal input: True

Refactored and Explicit

Reading (let alone coding) the row, col, +1, n-col values in that version is painful. For performance purposes, it might be worthwhile to keep so many long-winded indexing operations: they are fast (faster than Python-level function calls). But they lack human-readable semantics (I apologize to our numerical wizard siblings who speak in binary). So, here’s another look at it, with some functions to apply the Householder vectors to both the input matrix and the rotations that we are carrying along for the ride.

In [4]:
def apply_house_left(submatrix, hvec, beta, rotation, fullrows):
    m,n = submatrix.shape
    
    # A needs a reduced H (just the bottom-right); 
    # rotation needs a full H
    fullH = np.eye(fullrows)
    partH = fullH[fullrows-m:, fullrows-m:]
    partH -=  beta*np.outer(hvec,hvec)
    
    # mutate instead of (new) object assignment
    submatrix[:] = partH.dot(submatrix)
    rotation[:] = rotation.dot(fullH)


def apply_house_right(submatrix, hvec, beta, rotation, fullcols):
    m,n = submatrix.shape

    # A needs a reduced H (just the bottom-right); 
    # rotation needs a full H
    fullH = np.eye(fullcols)
    partH = fullH[fullcols-n:, fullcols-n:]
    partH -= beta*np.outer(hvec,hvec)
    
    # mutate instead of (new) object assignment
    submatrix[:] = submatrix.dot(partH)
    rotation[:]  = fullH.dot(rotation)

def hbd_simple(A):
    m,n = A.shape
    U,Vt = np.eye(m), np.eye(n)
    
    for col in range(n):
        # zero down the column
        u, beta_u = make_house_vec(A[col:,col])
        apply_house_left(A[col:, col:], u, beta_u, U, m)
        if col <= n-2:
            # zero across the row
            v, beta_v = make_house_vec(A[col,col+1:].T)
            apply_house_right(A[col:, col+1:], v, beta_v, Vt, n)
    return U, A, Vt
In [5]:
A = np.arange(1,13.0).reshape(4,3)
A_test = np.copy(A)
print(A)

U, B, Vt = hbd_simple(A)
print("B:\n", B)
print("U:\n", U)
print("Vt:\n", Vt)

print("should equal input:", np.allclose(U.dot(B).dot(Vt), A_test))
[[  1.   2.   3.]
 [  4.   5.   6.]
 [  7.   8.   9.]
 [ 10.  11.  12.]]
B:
[[ 12.88409873  21.87643283   0.        ]
 [  0.           2.24623524  -0.61328133]
 [ -0.           0.           0.        ]
 [ -0.           0.           0.        ]]
U:
[[ 0.07761505  0.83305216 -0.54737648  0.01946767]
 [ 0.31046021  0.45123659  0.74434565  0.38203344]
 [ 0.54330537  0.06942101  0.15343813 -0.8224699 ]
 [ 0.77615053 -0.31239456 -0.35040731  0.42096879]]
Vt:
[[ 1.          0.          0.        ]
 [ 0.          0.66700225  0.7450557 ]
 [ 0.          0.7450557  -0.66700225]]
should equal input: True

Some Optimization

Now that we have the basic process down, let’s see if we can save some resources. The major insight is that our input matrix A, which ends up with lots of zeros, has plenty of room to hold other information. And to store our reflections, we really only need the essential part of each Householder vector. We can stash those in the zeroed-out areas above and below the diagonal and superdiagonal band. Here’s what it looks like visually:

\[
\begin{bmatrix}
B & & \huge{0} \\
& B & \\
\huge{0} & & B \\
& \huge{0} &
\end{bmatrix}
\]

And here, we lay out the use of the zero spots for our \(\vec{v_r}\) (row vectors) and \(\downarrow u_c\)‘s (column vectors). Yes, I think the column vector/down arrow is very cute. The value after the colon is the length of the vector.

\[
\begin{bmatrix}
d_1 & s_1 & \vec{v_1}:n-2 & & \\
\downarrow u_1:{m-1} & d_2 & s_2 & \vec{v_2}:n-3 & \\
& \downarrow u_2:{m-2} & d_3 & s_3 & \vec{v_3}:n-4 \\
& & & \ddots & & \\
& & \downarrow u_{n-3}:m-(n-3) & d_{n-2} & s_{n-2} & \vec{v_{n-2}}:1 \\
& & & \downarrow u_{n-2}:m-(n-2)=2 & d_{n-1} & s_{n-1} \\
& & & & \downarrow u_{n-1}:m-(n-1)=1 & d_{n}
\end{bmatrix}
\]

When we store the information like that, we need a way to extract it. It’s not too bad: if you followed the explicit U/V code above, it’s very similar. We’re simply rebuilding \(U\) and \(V^T\) from the original Householder vectors. There’s one detail with the storage method: we don’t explicitly store the \(1\) value at the head of each Householder vector.

We usually don’t have to explicitly construct \(U\) or \(V^T\) to apply them. That is, we can compute the product of \(U\) or \(V\) with other matrices efficiently straight from the packed form. But we’ll make them explicit here so we can debug our code and check our output.
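
For a single reflection, "applying without constructing" looks like the following sketch: multiplying by \(H = I - \beta vv^T\) needs only a dot product and a scaled update, \(O(m)\) work instead of the \(O(m^2)\) of an explicit matrix product (random made-up data below):

```python
import numpy as np

rng = np.random.default_rng(0)
v = rng.standard_normal(5)
v[0] = 1.0                   # Householder-style vector with leading 1
beta = 2.0 / v.dot(v)        # makes H = I - beta v v^T an exact reflection
y = rng.standard_normal(5)

Hy_fast = y - beta * v.dot(y) * v                     # matrix-free, O(m)
Hy_slow = (np.eye(5) - beta * np.outer(v, v)).dot(y)  # explicit, O(m^2)
print(np.allclose(Hy_fast, Hy_slow))
```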

Extraction

In [6]:
# Based on G&VL 3rd. pg. 213 ... 5.1.5
def extractU(A, betas):
    m,n = A.shape
    Q = np.eye(m)
    for j in reversed(range(n)): # ignore anything beyond n
        v = A[j:,j].copy()
        v[0] = 1.0
        Q[j:,j:] = (np.eye(m-j) - betas[j] * np.outer(v,v)).dot(Q[j:,j:])
    return Q

def extractVt(A, betas):
    m,n = A.shape
    Q = np.eye(n)
    # stupidly confusing to code b/c j (our index into the Q
    # we are building) is "off by one" from the indices where
    # the vectors are stored in the input A
    # this has two practical results:
    #   1.  the last row/col in Q is ignored (it's an identity row/col)
    #   2.  to convert from input (A) land to Q land, we have to
    #       subtract 1 from the input land indices (it's all shifted)
    for j in reversed(range(n)):
        v = A[j-1,j+1-1:].copy()
        v[0] = 1.0
        Q[j:,j:] = (np.eye(n-j) - betas[j-1] * np.outer(v,v)).dot(Q[j:,j:])
    return Q

These two pieces of code are very similar. We can generalize them to a single extract that will pull out either the upper (\(V^T\)) or lower (\(U\)) parts.

In [7]:
def extract_house_reflection(_A, betas, side="lower"):
    A, shift = (_A, 0) if side == "lower" else (_A.T,1)
    m,n = A.shape

    Q = np.eye(m)
    for j in reversed(range(min(m,n))):
        v = A[j:,j-shift].copy()
        v[0] = 1.0
        miniQ = np.eye(m-j) - betas[j-shift] * np.outer(v,v)
        Q[j:,j:] = (miniQ).dot(Q[j:,j:])
    return Q

And since we will want to extract all the pieces of the packed result, we can wrap that up in one nice call:

In [8]:
def extract_upper_bidiag(M):
    '''from M, pull out the diagonal and superdiagonal 
       like np.triu or np.tril would  ... assume rows >= cols
       (works for non-square M)'''
    B = np.zeros_like(M)

    shape = B.shape[1]
    step = shape + 1 # # cols + 1
    end  = shape**2  # top square (shape,shape)
    
    B.flat[ :end:step] = np.diag(M)     # diag
    B.flat[1:end:step] = np.diag(M,+1)  # super diag
    return B

def extract_packed_house_bidiag(H):
    # note: betas_U and betas_V are read from the enclosing (global) scope;
    # house_bidiag (below) produces them
    U  = extract_house_reflection(H, betas_U, side="lower")
    Vt = extract_house_reflection(H, betas_V, side="upper")
    B  = extract_upper_bidiag(H)
    return U, B, Vt
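
The flat-striding trick in extract_upper_bidiag deserves a standalone look. On a 4x3 input, stepping through B.flat by ncols+1 walks the main diagonal, and starting one slot later walks the superdiagonal:

```python
import numpy as np

M = np.arange(1, 13.0).reshape(4, 3)
B = np.zeros_like(M)
ncols = B.shape[1]
B.flat[ :ncols*ncols:ncols+1] = np.diag(M)      # flat slots 0, 4, 8: the diagonal
B.flat[1:ncols*ncols:ncols+1] = np.diag(M, +1)  # flat slots 1, 5: the superdiagonal
print(B)
# [[ 1.  2.  0.]
#  [ 0.  5.  6.]
#  [ 0.  0.  9.]
#  [ 0.  0.  0.]]
```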

Implicit Storage

And now we turn back to G&VL (3rd) Algorithm 5.4.2 with implicit reflections and reusing A‘s storage for u and v.

In [9]:
def house_bidiag(A):
    m,n = A.shape
    assert m >= n
    U,V = np.eye(m), np.eye(n)
    
    betas_U = np.empty(n)
    betas_V = np.empty(n-1)
    for j in range(n):
        u,betas_U[j] = make_house_vec(A[j:,j])
        miniHouse = np.eye(m-j) - betas_U[j] * np.outer(u,u)
        A[j:,j:] = (miniHouse).dot(A[j:,j:])
        # A's rows go 0 to m-1
        #  A[j,j] and A[j,j+1] are part of the bidiagonal result
        #  A[j+1:,j]  (remainder of this COLUMN) stores u
        #  A[j+1:,j] covers indices j+1, j+2, ..., m-1: that's m-j-1 positions
        A[j+1:,j] = u[1:]

        if j < n-1:
            v,betas_V[j] = make_house_vec(A[j,j+1:].T)
            miniHouse = np.eye(n-(j+1)) - betas_V[j] * np.outer(v,v)
            A[j:,j+1:] = A[j:, j+1:].dot(miniHouse)
            A[j ,j+2:] = v[1:]
    return betas_U, betas_V
In [10]:
A = np.arange(1,13.0).reshape(4,3)
A_test = A.copy()
print(A)

betas_U, betas_V = house_bidiag(A)
U, B, Vt = extract_packed_house_bidiag(A)

print(" U:\n", U)
print("Vt:\n", Vt)
print(" B:\n", B)

print("should equal A:", np.allclose( U.dot(B).dot(Vt), A_test))
[[  1.   2.   3.]
 [  4.   5.   6.]
 [  7.   8.   9.]
 [ 10.  11.  12.]]
 U:
[[ 0.07761505  0.83305216 -0.54737648  0.01946767]
 [ 0.31046021  0.45123659  0.74434565  0.38203344]
 [ 0.54330537  0.06942101  0.15343813 -0.8224699 ]
 [ 0.77615053 -0.31239456 -0.35040731  0.42096879]]
Vt:
[[ 1.          0.          0.        ]
 [ 0.          0.66700225  0.7450557 ]
 [ 0.          0.7450557  -0.66700225]]
 B:
[[ 12.88409873  21.87643283   0.        ]
 [  0.           2.24623524  -0.61328133]
 [  0.           0.           0.        ]
 [  0.           0.           0.        ]]
should equal A: True

Additional Resources

You can grab a copy of this notebook.

Even better, you can view it using nbviewer.

License

Unless otherwise noted, the contents of this notebook are under the following license. The code in the notebook should be considered part of the text (i.e., licensed and treated as follows).

Creative Commons License
DrsFenner.org Blog And Notebooks by Mark and Barbara Fenner is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Permissions beyond the scope of this license may be available at drsfenner.org/blog/about-and-contacts.