Last time, we looked at using Givens rotations to perform a QR factorization of a matrix. Today, we’re going to be a little more selective and use Givens rotations to walk values off the edge of a special class of matrices
Old Friends
Here are few friends that we introduced last time. I updated the interfaces a little bit: the parts that define the Givens rotation are tupled together and the target matrix is the left or right argument, depending on which multiplication we are doing.
import numpy as np
import numpy.linalg as nla
import matplotlib.pyplot as plt
%matplotlib inline
np.set_printoptions(suppress=True)
# GvL pg. 216 : algo 5.1.3
def zeroing_givens_coeffs(x,z):
''' for the values x,z compute cos th, sin th
s.t. applying a Givens rotation G(cos th,sin th)
on 2 rows(or cols) with values x,z will
maps x --> r and z --> 0'''
if z == 0.0: # better to check for "small": abs(z) < np.finfo(np.double).eps
return 1.0,0.0
r = np.hypot(x,z) # C99 hypot is safe for under/overflow
return x/r, -z/r
# GvL, pg. 216 .... Section 5.1.9
def left_givensT((c,s,r1,r2), A):
''' update A <- G.T.dot(A) ... affects rows r1 and r2 '''
givensT = np.array([[ c, -s], # manually transposed
[ s, c]])
A[[r1,r2],:] = np.dot(givensT, A[[r1,r2],:])
def right_givens(A, (c,s,c1,c2)):
''' update A <- A.dot(G) ... affects cols c1 and c2 '''
givens = np.array([[ c, s],
[-s, c]])
A[:,[c1, c2]] = np.dot(A[:,[c1, c2]], givens)
Zeroing in a Bidiagonal Matrix (Take 1)
\[\renewcommand{\vec}[1]{\mathbf{#1}}\] With the tools above and the lessons from the Givens QR post we can come up with other uses for zeroing Givens rotations which I’ll abbreviate as \(\vec{G_0}(c,s,i,z)\).
We’re going to make use of \(\vec{G_0}\) in at least two different ways, when we perform an SVD (singular value decomposition). When we perform an SVD, we’ll take an intermediate step of making an (upper) bidiagonal matrix — that is a matrix where the only non-zero entries are on the main and super-diagonals. Along the way, the clean bidiagonal matrix may become blemished and have a non-zero entry off the main upper bidiagonal band creating a blemished bidiagonal matrix (BBM). We’ll use Givens rotations to "walk" the non-zero blemish out of the matrix. We can do that in three directions: across a row, up a column, or zig-zagging the main diagonal band.
To get started, we need to easily create upper bidiagonal matrices.
def make_upper_bidiag(d1,d2):
''' from two arrays, fill the diagonal and superdiagonal'''
B = np.diag(d1)
B.flat[1::d1.size+1] = d2 # awkward, but np.diag_indices is main diag only
return B
def extract_upper_bidiag(M):
'''from M, pull out the diagonal and superdiagonal
like np.triu or np.tril would'''
B = np.triu(M) * np.tril(np.ones_like(M), 1)
return B
extract_upper_bidiag(np.arange(36.0).reshape(6,6))
We might start by trying to "swap" the 8 with the 0 to its right. Basically, we’d try to take \(x=0 \rightarrow 8\) and \(z=8 \rightarrow 0\); that is, calling zeroing_givens_coeffs(0, 8)
). Before we see the results, what rows or columns are going to be modified, if our \(i,j\) are \(1,2\) and we use a right Givens rotation?
def move_nonzero_using_nextright(M, row, col):
cos_th,sin_th = zeroing_givens_coeffs(M[row,col+1], M[row,col])
right_givens(M, (cos_th,sin_th, col+1, col))
B = extract_upper_bidiag(np.arange(36.0).reshape(6,6))
move_nonzero_using_nextright(B, 1, 2)
print B
The Givens rotation makes use of the next column from the row,col
values we passed to move_nonzero_using_nextright
. So, the rotation affects columns 2 and 3 and introduces non-zeros at B[1,3], B[3,2]
. We modified more values than we really wanted. In particular, when we affect B[3,2]
we are modifying below the diagonal and making more blemishes. Further, we don’t really care that the 8
stays an 8
when it is "swapped". So, instead of "swapping" a zero in, let’s find a value that is non-zero (and can stay any non-zero value). We need look no further than the main diagonal.
Zeroing in a Bidiagonal Matrix (Take 2)
When I saw "swapping", I what I really want to do is introduce a zero off the diagonal and compute a new associated \(r\) on the diagonal. So, let’s find a new position to "swap" with. Any of the values on the diagonal are non-zero and, as long as they stay non-zero, they don’t change the bidiagonal form of the matrix. Another aside: we actually care that the bidiagonal part is fully filled with non-zeros, or you might say, dense – although that’s a slight abuse of the term.
Anywho. Let’s try to target a "swap" between a spot \(row,col\) and an entry on one of the two diagonal entries it shares in its row and column. When we do this, we will preserve much of the bidagonal matrix structure, since in a bidiagonal matrix, we already have entries on the diagonal. However, we will introduce or propogate a blemish. More on that in second. Here’s some code that works by using the diagonal elements. A mental quick-check: for right and left Givens rotations, do we use the shared row or column diagonal elements for our "swapping"?
def givens_zero_wdiag_left(A, row, col):
''' update two whole rows and at col: A[row,col] <- 0; A[col,col] <- r
moves value up/down to diagonal ... within the shared column'''
c,s = zeroing_givens_coeffs(A[col,col], A[row,col])
G = (c,s,col,row) # swapped order to zero *second* arg
left_givensT(G, A) # G tells affected *rows* - one col gets 0
return G
def givens_zero_wdiag_right(A, row, col):
''' update two whole cols and at row: A[row,col] <- 0; A[row,row] <- r
moves value left/right to diagonal ... within the shared row'''
c,s = zeroing_givens_coeffs(A[row,row], A[row,col])
G = (c,s,row,col)
right_givens(A, G) # G tells affected *cols* - one row gets 0
return G
I mentioned, in the last post, that I didn’t like the coupling of two function calls (creating the coefficients and applying them). In the cell directly above, we couple the two calls inside a higher level function. This doesn’t really fix the argument ordering issue or the that fact that using a givens_zero_wdiag_left
with a right_givens
(application) would be legal code with a non-sensical result.
If we really wanted to "fix" the software engineering issue, and stay with a functional (non-object oriented) solution, we could bundle the correct application (left or right) function with the result of the tuple arguments — which is basically what a class constructor would do. I did write a small set of classes to encapsulate this further, but for now I want to focus on the Givens parts, not the software engineering parts. The classes were even more useful because shortly we’ll have a special case where we can write a custom multiplication routine and save work on the left and right Givens applications. This gives quite a nice use of inheriting the basics and overriding the special case.
Back to the code we just wrote. On the left, the application looks like:
B = extract_upper_bidiag(np.arange(36.0).reshape(6,6))
givens_zero_wdiag_left(B, 1, 2)
print B
At this point, you’re ready to strangle me. That isn’t upper bidiagonal any more. But wait! What happens if we have and upper-bidiagonal with a zero on the diagonal?
B = extract_upper_bidiag(np.arange(36.0).reshape(6,6))
B[1,1] = 0
givens_zero_wdiag_left(B, 1, 2)
print B
We move a value to the right and keep the rest of the matrix upper bidiagonal (outside the blemish in the row we’re manipulating). That leads us to …
Blemishes
I also left out one point about the form of the matrix we’ll be using attacking. In reality, le-subject-d’-attack will be an upper-bidiagonal with a zero somewhere on the diagonal. The super-diagonal entry in the same row will be pushed across the row and out. Once it is off the super-diagonal, it is a "blemish" to the "pure" upper-bidiagonal matrix. Once we push it out, we’ll have two upper-bidiagonal matrices (one above and one below) the row we modified.
Here’s the general upper-bidiagonal matrix (no blemishes):
\[G=\begin{bmatrix}
d_1 & s_1 & 0 & & \dots & 0 \\
0 & d_2 & s_2 & & & 0 \\
& \ddots & \ddots & \ddots & \ddots & \vdots \\
\vdots & & & d_{n-2} & s_{n-2} & 0 \\
& & & & d_{n-1} & s_{n-1} \\
0 & & \dots & & 0 & d_n
\end{bmatrix}\]
If we introduce a \(0\) on the diagonal somewhere, we can talk about "walking" the superdiagonal element in that same row off to the right. Here, the two red values are passed to zeroing_givens_coeffs
to compute the coefficient values that will zero the value in the upper row. To the right of an arrow, the blue elements are newly computed updates. The \(\color{red}{B}\), our blemish, has also been updated: \(0 \rightarrow \color{red}{B}\). Note, the \(x\) and \(B\) names merely indicate the zeroness of a value – two different \(x\) or \(B\) will have different values. Generally speaking \(B\) will be different in different steps.
\[
\renewcommand{\red}[1]{\color{red}{#1}}
\renewcommand{\blue}[1]{\color{blue}{#1}}
\begin{align}
A = \begin{bmatrix}
x & x & 0 & 0 & 0 \\
0 & 0 & \red{x} & 0 & 0 \\
0 & 0 & \red{x} & x & 0 \\
0 & 0 & 0 & x & x \\
0 & 0 & 0 & 0 & x \\
\end{bmatrix}
& \xrightarrow{G^T_z(c,s,2,1) A} &
\begin{bmatrix}
x & x & 0 & 0 & 0 \\
0 & 0 & \blue{0} & \red{B} & 0 \\
0 & 0 & \blue{x} & \blue{x} & 0 \\
0 & 0 & 0 & \color{red}{x} & x \\
0 & 0 & 0 & 0 & x \\
\end{bmatrix} \\
& \xrightarrow{G^T_z(c,s,3,2) A} &
\begin{bmatrix}
x & x & 0 & 0 & 0 \\
0 & 0 & 0 & \blue{0} & \red{B} \\
0 & 0 & x & x & 0 \\
0 & 0 & 0 & \blue{x} & \blue{x} \\
0 & 0 & 0 & 0 & \color{red}{x} \\
\end{bmatrix} \\
& \xrightarrow{G^T_z(c,s,4,2) A} &
\begin{bmatrix}
x & x & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & \blue{0} \\
0 & 0 & x & x & 0 \\
0 & 0 & 0 & x & x \\
0 & 0 & 0 & 0 & \blue{x} \\
\end{bmatrix}
\end{align}
\]
When we move A[1,2]
to the right, we have to involve the \((1,2)\)-plane. That is, we need a Givens rotation on the \((1,2)\)-plane (and zeroing in the 1 plane). Since we’re applying a left-Givens, we are only updating two rows: 1
and 2
. Within those rows, the only nonzero entries are in columns 2
and 3
. So, using a similar argument to that we used for the QR algorithm last time, we can see that columns 0
, 1
, and 4
are unchanged and remain zero.
We can get a bit more specific about the values for columns 2
and 3
. Let’s name the relevant \(x\) values:
- \(b\) the blemish. In \(\vec{A}\) it’s the only non-zero in row \(1\); it’s also the upper red \(x\).
- \(d\) the diagonal value below \(b\). In \(\vec{A}\), the lower red \(x\).
- \(f\) the friend. The value to the right of \(d\) – it’s the super-diagonal super-Friend of \(d\). In \(\vec{A}\), it is at
A[2,3]
.
These are the only values from \(\vec{A}\) we need to calculate \(\vec{G^TA}\). Everything else disappears to zero. In the last iteration, there is no value to the right of \(d\), so we can ignore it. We’ll can use this knowledge to optimize the matrix multiplication from two dot-products (the length of the whole rows) down to a couple of multiplications and additions.
Efficient Blemish Updating
Based on the patterns of blue and red above, you might be thinking "gee, I don’t think we need the whole row" to do that update. We don’t. Turns out, we can do a "custom" multiplication routine for a Givens rotation against a blemished upper bidiagonal matrix, or BBM. Here’s the same left Givens multiplication (as the first \(\vec{G^T_z}\) above) written out by hand but not doing any unnecessary computations with zeros. Remember, friend
is the resulting blue, non-zero superdiagonal entry involved in that step.
B = extract_upper_bidiag(np.arange(36.0).reshape(6,6))
B[1,1] = 0
# the setup:
# . . . . .
# . 0 B 0 0
# . 0 D F 0
#
row,col = 1,2
blemish_loc = (row,col)
diag_loc = (col,col)
friend_loc = (col, col+1)
blemish = B[blemish_loc]
diag = B[diag_loc]
friend = B[friend_loc]
c,s = zeroing_givens_coeffs(diag, blemish)
# G^T =
# [s c] --> zeroing row
B[blemish_loc] = diag * s + blemish * c # == 0
B[diag_loc] = diag * c - blemish * s
new_blemish_loc = (row, col+1)
B[new_blemish_loc] = s * friend
B[friend_loc] = c * friend
print B
Do we get the same thing using our machinery? We got the same result in In[8]
. We don’t need to be nearly as explicit as I was in In[9]
, nor do we need to save all of those values in temporaries. Trimming it down, we can do this:
def givens_zero_wdiag_left_on_BBM(BBM, row, col):
#assert really BBM
m,n = BBM.shape
c,s = zeroing_givens_coeffs(BBM[col,col], BBM[row,col])
# col col+1
# row blem new_blem G^T =
#
# col diag friend [s c] ---> zeroing row
old_blemish = BBM[row,col]
BBM[row,col] = 0 # from zeroing row
BBM[col,col] = BBM[col,col] * c - old_blemish * s # from top row
if col < n-1: # update these in all but last column
BBM[row, col+1] = s * BBM[col, col+1]
BBM[col, col+1] *= c
return (c,s,row,col) # == G
B = extract_upper_bidiag(np.sqrt(np.arange(1.0,26.0)).reshape(5,5))
B[1,1] = 0.0
print B
givens_zero_wdiag_left_on_BBM(B, 1,2)
print B
And finally, we can put that in a loop to walk a value all the way off the right edge of the matrix:
def walk_blemish_out_right(B, row, start_col): # usually start_col = row+1
m,n = B.shape
for col in xrange(start_col, n):
givens_zero_wdiag_left_on_BBM(B, row, col)
B = extract_upper_bidiag(np.arange(25.0).reshape(5,5))
B[1,1] = 0.0
print B
walk_blemish_out_right(B, 1, 2)
print B
Woot! Err, well, at least it’s done. When you look at the result, you might notice that we took an "almost-full" bidiagonal matrix (with one bad row) and converted it into two "full" bidiagonal matrices separated by an empty row. If that looks like decomposing a larger problem into smaller ones, you’re right! Although, we are not leading up to the algorithm that is traditionally called "divide-and-conquer SVD".
Walking Up
We have one other case to consider. If the zero on the diagonal is in the bottom-right corner, than we can’t swap to the diagonal in that column because we’d end up taking a zero to a non-zero. So instead, we will walk the blemish up the column, swapping to the diagonal in the same row. From one perspective, this is the opposite what we did when walking to the right.
Here’s a version using the full Givens multiplication. We can use it to check the output of a customized "blemish-aware" version in a minute.
B = np.sqrt(extract_upper_bidiag(np.arange(9.0).reshape(3,3)))
B[2,2] = 0
print B
givens_zero_wdiag_right(B, 1, 2)
print B
def givens_zero_wdiag_right_on_BBM(BBM, row, col):
#assert really BBM
m,n = BBM.shape
c,s = zeroing_givens_coeffs(BBM[row,row], BBM[row,col])
# row col zero column
# row-1 friend new_blem G =
#
# row diag blem [-s c]
old_blemish = BBM[row,col]
BBM[row,col] = 0 # from zeroing col
BBM[row,row] = BBM[row,row] * c - old_blemish * s # from left col
if row > 0: # update these in all but last row
BBM[row-1, col] = s * BBM[row-1, row]
BBM[row-1, row] *= c
return (c,s,row,col) # == G
B = np.sqrt(extract_upper_bidiag(np.arange(9.0).reshape(3,3)))
B[2,2] = 0.0
print B
givens_zero_wdiag_right_on_BBM(B,1,2)
print B
def walk_blemish_out_up(B, start_row, col): # usually start_row = col-1
m,n = B.shape
for row in reversed(xrange(start_row+1)):
givens_zero_wdiag_right_on_BBM(B, row, col)
B = extract_upper_bidiag(np.arange(25.0).reshape(5,5))
B[4,4] = 0.0
print B
walk_blemish_out_up(B, 3, 4)
print B
Here, we’ve reduced the larger matrix to a slightly smaller matrix. This allows us to solve a single, slightly smaller, problem. It’s just one subproblem, instead of the two subproblems we got when we walked a blemish out along a row.
And Finally, Cleaning up the Superdiagonal
In G&VL, they call the decomposing of a large problem into two smaller problems "decoupling". And, I’ll call the reduction part reduce. So, depending on where we find a zero on the main diagonal, we’ll either decouple or reduce. We can put those together:
def decouple_or_reduce(B, zero_idxs):
# see GvL, pg. 454 and pray for mojo
m,n = B.shape
for kk in zero_idxs:
if kk < n-1:
walk_blemish_out_right(B, kk, kk+1) # decouple
elif kk == n-1:
walk_blemish_out_up(B, kk-1, kk) # reduce
B = extract_upper_bidiag(np.arange(25.0).reshape(5,5))
B[4,4] = 0.0
print B
decouple_or_reduce(B, [4])
print B
B = extract_upper_bidiag(np.arange(25.0).reshape(5,5))
B[1,1] = 0.0
print B
decouple_or_reduce(B, [1])
print B
One Other Walk Out
One other scenario we need to deal with (in computing the SVD), is bouncing or zig-zagging a blemish down the band of a "dense" bidiagonal matrix. The red values are used to compute the zeroing coefficients for that \(G_z\). Here’s what it looks like:
\[
\begin{align}
A =
\begin{bmatrix}
\red{x} & x & 0 & 0 & 0 \\
\red{B} & x & x & 0 & 0 \\
0 & 0 & x & x & 0 \\
0 & 0 & 0 & x & x \\
0 & 0 & 0 & 0 & x \\
\end{bmatrix}
& \xrightarrow{G^T_z(c,s,0,1) A} &
\begin{bmatrix}
x & \red{x} & \red{B} & 0 & 0 \\
0 & x & x & 0 & 0 \\
0 & 0 & x & x & 0 \\
0 & 0 & 0 & x & x \\
0 & 0 & 0 & 0 & x \\
\end{bmatrix}
& \xrightarrow{A G_z(c,s,1,2)} &
\begin{bmatrix}
x & x & 0 & 0 & 0 \\
0 & \red{x} & x & 0 & 0 \\
0 & \red{B} & x & x & 0 \\
0 & 0 & 0 & x & x \\
0 & 0 & 0 & 0 & x \\
\end{bmatrix} \\
& \xrightarrow{G^T_z(c,s,1,2) A} &
\begin{bmatrix}
x & x & 0 & 0 & 0 \\
0 & x & \red{x} & \red{B} & 0 \\
0 & 0 & x & x & 0 \\
0 & 0 & 0 & x & x \\
0 & 0 & 0 & 0 & x \\
\end{bmatrix}
& \xrightarrow{A G_z(c,s,2,3)} &
\begin{bmatrix}
x & x & 0 & 0 & 0 \\
0 & x & x & 0 & 0 \\
0 & x & \red{x} & x & 0 \\
0 & 0 & \red{B} & x & x \\
0 & 0 & 0 & 0 & x \\
\end{bmatrix} \\
& \xrightarrow{G^T_z(c,s,2,3) A} &
\begin{bmatrix}
x & x & 0 & 0 & 0 \\
0 & x & x & 0 & 0 \\
0 & 0 & x & \red{x} & \red{B} \\
0 & 0 & 0 & x & x \\
0 & 0 & 0 & 0 & x \\
\end{bmatrix}
& \xrightarrow{A G_z(c,s,3,4)} &
\begin{bmatrix}
x & x & 0 & 0 & 0 \\
0 & x & x & 0 & 0 \\
0 & x & x & x & 0 \\
0 & 0 & 0 & \red{x} & x \\
0 & 0 & 0 & \red{B} & x \\
\end{bmatrix}
\end{align}
\]
There is a very important difference between this zig-zagging – it looks like we’ve got a drunken blemish on our hands – and the "straighline" walkouts we performed above. Instead of "swapping" to the diagonal, we are going to "swap" to an adjacent element on the band. The adjacent element will either be (1) in the same row and to the left of us or (2) in the same column and above us. If we don’t use those neighbors or friends, we end up with more blemishes. Since we have a BBM we use those friends, we keep one, and only one, blemish.
However, using those friend positions, means that we need some slightly different left and right Givens rotations. Here’s the corresponding functions that perform these two steps:
def givens_zero_adjacent_onband_left(A, row, col):
# affecting two rows: need the row above me
# assert col==row-1 ... below diag ... row>col
c,s = zeroing_givens_coeffs(A[row-1,col], A[row,col]) # same col
G = (c,s,row-1,row)
left_givensT(G, A) # G tells affected *rows* - one col gets 0
return G
def givens_zero_adjacent_onband_right(A, row, col):
# affecting two cols: need col left of me
# assert col-2==row ... above diag ... col>row
c,s = zeroing_givens_coeffs(A[row,col-1], A[row,col]) # same row
G = (c,s,col-1,col)
right_givens(A, G) # G tells affected *cols* - one row gets 0
return G
As with givens_zero_wdiag_left
and givens_zero_wdiag_right
, when applied to a BBM these have a very efficient form. I’m going to leave that as an exercise to the highly motivated reader. But, it will be very close to the forms for givens_zero_wdiag_left_on_BBM
(and for the right side application). If you follow the Givens rotations under "One Other Walkout", that seems pretty clear. We can turn that into code pretty quickly (adding one last left rotation to clear out the blemish in the bottom row):
def bounce_blemish_down_diagonal(BBM):
m,n = BBM.shape
for k in xrange(n-2):
givens_zero_adjacent_onband_left( BBM, k+1, k)
givens_zero_adjacent_onband_right(BBM, k, k+2)
givens_zero_adjacent_onband_left(BBM, n-1, n-2)
A = extract_upper_bidiag(np.arange(1.0, 17.0).reshape(4,4))
A[1,0] = 99
print A
bounce_blemish_down_diagonal(A)
print A
Additional Resources
You can grab a copy of this notebook.
Even better, you can view it using nbviewer.
License
Unless otherwise noted, the contents of this notebook are under the following license. The code in the notebook should be considered part of the text (i.e., licensed and treated as as follows).
DrsFenner.org Blog And Notebooks by Mark and Barbara Fenner is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Permissions beyond the scope of this license may be available at drsfenner.org/blog/about-and-contacts.