DC SVD II: From Values to Vectors

In our last installment, we discussed solutions to the secular equation. Those solutions are the eigenvalues (or singular values) of matrices with a particular form. Since this post is otherwise light on technical content, I’ll dive into those matrix forms now.

Setting up the Secular Equation to Solve the Eigen and Singular Problems

In dividing-and-conquering to solve the symmetric tridiagonal eigenvalue problem, we start with a symmetric tridiagonal matrix \(T\), and eventually, we have to find the eigendecomposition – eigenvalues and eigenvectors – of a "rank-one update" of a diagonal matrix: \(ROU = D + zz^T\). For my mathematician friends, I’m using \(ROU\) as a multi-character name for one variable. Sorry – I’ve moved on from Fortran 77.

Similarly, in DC SVD, we start with a (lower) bidiagonal matrix \(B\); eventually, we compute the SVD (singular values and right/left singular vectors) of a matrix \(M\) (it’s the "middle matrix" in an equation that derives DC SVD) which has non-zero entries only in its first column and on its diagonal:

\[M =
\begin{bmatrix}
z_1 \\
z_2 & d_2 \\
\vdots & & \ddots \\
z_n & & & d_n
\end{bmatrix}
\]

Two quick notes. (1) In \(M\), the top-left entry is \(z_1\); there is no separate \(d_1\), so we introduce \(d_1 = 0.0\) to simplify our bookkeeping. (2) \(T \neq ROU\) in the eigen case and \(B \neq M\) in the SVD case. These decompositions are subcomputations needed in a larger process that produces the full decompositions.

We can do a little bit of matrix algebra to highlight the similarities between the eigen and singular problems. Treat \(z\) as a column vector, take \(d_1=0\) in the \(M\) (singular value) case, and recall that the singular values of \(M\) are \(\sqrt{\text{eig}(MM^T)}\). Strictly, the last fact holds when \(M\) is wide or square (i.e., cols \(\ge\) rows), but rest assured, there’s a corresponding argument when \(M\) is tall instead of wide. We get the following:

\[
\begin{aligned}
ROU = D + zz^T =
\begin{bmatrix}
d_1 & & & \\
& d_2 & & \\
& & \ddots & \\
& & & d_n
\end{bmatrix}
+
\begin{bmatrix}
z_1 z_1 & & z_1 z_n \\
& \ddots & \\
z_n z_1 & & z_n z_n
\end{bmatrix}
=
\begin{bmatrix}
d_1 + z_1^2 & z_1 z_2 & & z_1 z_n \\
z_1 z_2 & d_2 + z_2^2 & & \\
& & \ddots & \\
z_1 z_n & & & d_n + z_n^2
\end{bmatrix}
\overset{eig}{\rightarrow} Q \Lambda Q^T
\end{aligned}
\]

\[
\begin{aligned}
M =
\begin{bmatrix}
z_1 & & & \\
z_2 & d_2 & & \\
& & \ddots & \\
z_n & & & d_n
\end{bmatrix}
\end{aligned}
\quad \text{and} \quad
\begin{aligned}
M^T =
\begin{bmatrix}
z_1 & z_2 & & z_n\\
& d_2 & & \\
& & \ddots & \\
& & & d_n
\end{bmatrix}
\end{aligned}
\implies
\begin{aligned}
MM^T =
\begin{bmatrix}
0 + z_1^2& z_1 z_2 & & z_1 z_n \\
z_1 z_2 & d_2^2 + z_2^2 & & \\
& & \ddots & \\
z_1 z_n & & & d_n^2 + z_n^2
\end{bmatrix}
\overset{svd}{\rightarrow} U \Omega V^T
\end{aligned}
\]

So, up to the difference between using \(d\) and \(d^2\), we have \(ROU=MM^T\) and \(\text{eig}(ROU) = \text{eig}(MM^T)\). If we really wanted to massage out the differences, we could define \(M\)'s diagonal with the square roots of the values. But that seems excessive and gets us further away from the formulations used in the literature.
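To convince ourselves numerically, here is a tiny sanity check (a sketch with made-up data, assuming NumPy imported as np) that \(MM^T\) really is a rank-one update of a diagonal matrix built from \(d^2\) and \(z\), with \(d_1 = 0\):

import numpy as np

n = 5
rng = np.random.default_rng(0)   # arbitrary example data
d = np.sort(rng.random(n))
d[0] = 0.0                       # d_1 = 0 by our bookkeeping convention
z = rng.random(n)

M = np.diag(d)
M[:, 0] = z                      # z fills the first column (and the top-left slot)

print(np.allclose(M @ M.T, np.diag(d**2) + np.outer(z, z)))   # True

This is just the algebra above restated in code; nothing here is specific to the divide-and-conquer machinery.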

Back to the Secular Equation

For clarity, the secular equations for the eigen and singular cases are:

\[f_\text{eig}(\lambda) = 1 + \sum_{i=1}^n \frac{z_{i}^2}{d_i - \lambda} \qquad
f_\text{sv}(\omega) = 1 + \sum_{i=1}^n \frac{z_{i}^2}{d^2_i - {\omega}^2}\]

I won’t wade through the mathematics that gets us there, but about 10 lines of linear algebra will get the job done. You just have to get the right 10 lines. :grin: Seriously though, in the references from last time, Arbenz lays it out in some detail and Demmel does it short-and-sweet (and leaves some of it as an exercise!).
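To make the formulas concrete, here is a small sketch (my own illustration, with made-up data and NumPy as np) that evaluates the two secular functions and checks that NumPy’s eigenvalues of \(ROU\) are, numerically, roots of \(f_\text{eig}\):

import numpy as np

def f_eig(lam, d, z):
    # 1 + sum_i z_i^2 / (d_i - lambda)
    return 1.0 + np.sum(z**2 / (d - lam))

def f_sv(omega, d, z):
    # 1 + sum_i z_i^2 / (d_i^2 - omega^2)
    return 1.0 + np.sum(z**2 / (d**2 - omega**2))

d = np.array([0.0, 1.0, 2.0, 3.0])   # arbitrary, sorted, distinct d values
z = np.array([0.5, 0.3, 0.7, 0.2])   # arbitrary, nonzero z values
lambdas = np.linalg.eigvalsh(np.diag(d) + np.outer(z, z))
print([f_eig(lam, d, z) for lam in lambdas])   # each value is (nearly) zero

The corresponding check for \(f_\text{sv}\) would use the singular values of an \(M\) built as in test_sv below.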

Practically speaking, when we want to solve the eigen or SV problem, we are looking at code like this, which compares a "known good" NumPy solution against the methods we are building:

In [ ]:
import numpy as np           # imports used throughout
import numpy.linalg as nla

def test_eig(d,z):
    ROU = np.diag(d) + np.outer(z,z) # rank-one updated matrix
    
    expected = nla.eig(ROU)
    actual   = outer_eigval_eigvec(d,z)
    
    return compare_eig_results(expected, actual)

def test_sv(d,z):
    M = np.diag(d)
    M[:,0] = z

    expected = nla.svd(M)
    actual   = outer_singval_singvec(d**2,z)
    
    return compare_sv_results(expected, actual)
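The compare_* helpers aren’t defined here. Comparing decompositions takes some care, because eigenvector and singular-vector columns can legitimately differ in order and sign. As a loose, hypothetical stand-in, a values-only check might look like this:

import numpy as np

def compare_eigvals_loosely(expected, actual):
    # hypothetical, values-only comparison; a real comparison also has to
    # reconcile the order and signs of the eigenvector columns
    exp_vals, _exp_vecs = expected
    act_vals, _act_vecs = actual
    return np.allclose(np.sort(exp_vals), np.sort(act_vals))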

The code for the outer_XX functions is very, very similar. They are so similar that it is almost worthwhile refactoring them into a common function. The argument against doing so is two-fold. (1) The problems being solved are slightly different: compare \(\lambda\) and \({\omega}^2\) above and the different matrices in the corresponding decompositions, \(ROU \rightarrow Q \Lambda Q^T\) versus \(M \rightarrow U \Omega V^T\). (2) The names and concepts of the two don’t unify perfectly. For example, \(Q\) and \(U\) are basically the same. Calling the variable q_or_u sounds like a band name to me. In fact, there is a mathrock band called Q and Not U. What is mathrock? Think of absurd key signatures and time changes.

The visible differences in computing the SVD are: (1) passing in \(d^2\) instead of \(d\) (these terms act as constants), (2) adding an np.sqrt to get from \({\omega}^2\) to \({\omega}\), and (3) adding a series of steps to compute \(V^T\). There are one or two more differences behind the scenes. If we ignore the reasons against refactoring (because we are lazy, err, efficient, and we follow the DRY principle: don’t repeat yourself), we get code that looks like this:

In [ ]:
def outer_eigval_eigvec(d,z):
    if z.size==1: # for consistency with singval/vec
        return z**2, np.ones((1,1))
    lambdas, Q = unified(d,z)
    Q /= nla.norm(Q, axis=0)   # normalize each column (the row of norms broadcasts)
    return lambdas, Q

def outer_singval_singvec(d_sq,z):
    if z.size==1: # slightly ill-behaved without this
        return np.ones((1,1)), z, np.ones((1,1))

    omega_sq, U = unified(d_sq,z)
    U *= np.sign(z).reshape(-1,1)
    
    # the calculations for V are described below
    V = U * np.sqrt(d_sq).reshape(-1,1)
    V[0,:] = -1.0
    U /= nla.norm(U, axis=0)
    V /= nla.norm(V, axis=0)
    return U, np.sqrt(omega_sq), V

For both problems, once we have the \(\lambda\)s (or \(\omega\)s), we can compute a temporary vector: \[
t_k=\sqrt{\frac
{\prod\limits_{i=1}^{k-1}(d_k - {\lambda}_i) \prod\limits_{i=k}^{n}({\lambda}_i - d_k)}
{\prod\limits_{i=1}^{k-1}(d_k - d_i) \prod\limits_{i=k+1}^{n}(d_i - d_k)}}
= \sqrt{\left|
\frac{\prod\limits_{i=1}^{n}(d_k - {\lambda}_i)}
{\prod\limits_{i \neq k}(d_k - d_i)}
\right|}
\]
We get the second equality by noting that the effect of the partial products is to make sure each individual factor is positive. If that is confusing, remember that the \(d\) values are sorted in ascending order and the corresponding \({\lambda}\) values sit between them. So, we can simplify by taking an absolute value at the end. Note that in the denominator, we have to skip the term \(d_k - d_k = 0\) to avoid division by zero. In the code below, we build the product from term-by-term ratios to keep very large or small intermediate values from accumulating (if we wanted to be very careful, we could work with logarithms).
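For completeness, here is a minimal sketch (not the approach used below) of the logarithm idea: sum the logs of the absolute values of the factors and exponentiate once at the end, so no intermediate product can overflow or underflow.

import numpy as np

def t_k_via_logs(k, d, lambdas):
    # |prod_i (d_k - lambda_i)| / |prod_{i != k} (d_k - d_i)|, then a square root,
    # computed entirely in log-space
    log_numer = np.sum(np.log(np.abs(d[k] - lambdas)))
    others = np.arange(d.size) != k
    log_denom = np.sum(np.log(np.abs(d[k] - d[others])))
    return np.exp(0.5 * (log_numer - log_denom))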

In the eigen case, we then compute the column vectors \(q_k\) that make up \(Q\). For the SV case, \(U\) is identical except that we let \(\text{sign}(t_i) = \text{sign}(z_i)\) to get the directions of the vectors correct (where "correct" means doing the same thing LAPACK does). Note that the normalization (the \(\|\cdot\|_2\) values in the denominators) happens after the results are returned from the unified function. This is because the \(V\) calculation for the SV case needs the unnormalized vectors. At least I think so; I’m fairly certain I broke a test case when I tried to normalize before calculating \(V\).

\[
\begin{aligned}
q_k \leftarrow
\left(
\frac{t_1}{d_1 - {\lambda}_k},
\dots,
\frac{t_n}{d_n - {\lambda}_k}
\right)^T \big/ \|q_k\|_2
\end{aligned}
\qquad
\begin{aligned}
u_k \leftarrow
\left(
\frac{t_1}{d_1^2 - {\omega}_k^2},
\dots,
\frac{t_n}{d_n^2 - {\omega}_k^2}
\right)^T \big/ \|u_k\|_2
\end{aligned}
\]

In [ ]:
def unified(d,z):
    N = z.size
    # column k of d_m_lambda holds d - lambda_k (filled in column by column)
    lambdas, d_m_lambda = np.empty(N), np.empty((N,N))
    for k in range(N):
        lambdas[k], d_m_lambda[:,k] = solve_secular_oneroot(d, z, k)

    # t_k, built from term-by-term ratios to keep magnitudes balanced
    T = np.empty(N)
    for k in range(N):
        numers, denoms = d_m_lambda[k], d[k] - d
        lower_prod = np.prod(numers[:k] / denoms[:k])
        upper_prod = np.prod(numers[k:-1] / denoms[k+1:])
        term = lower_prod * upper_prod * numers[-1]
        T[k] = np.abs(term)

    np.sqrt(T, out=T)
    with np.errstate(invalid='ignore', divide='warn'):
        # Q is Q for eigen, U for SVD
        Q = T.reshape(N,1) / d_m_lambda  # T (a column) broadcasts across the columns
        Q[np.logical_or(np.isnan(Q), np.isinf(Q))] = 1.0
    return lambdas, Q

Speaking of \(V\), it is calculated in outer_singval_singvec above, but here’s the mathematical definition of the columns:

\[
v_k \leftarrow \left(
-1.0,
\frac{d_2 t_2}{d_2^2 - {\omega}_k^2},
\dots,
\frac{d_n t_n}{d_n^2 - {\omega}_k^2}
\right)^T \big/ \|v_k\|_2
\]

The denominators in \(V\) are the same as in \(U\). The numerators in \(V\) simply need an additional \(d\) factor. So, we can basically copy \(U\) and update a few things. A quick note: we take the "easy-coding" way out by reusing \(U\), calculating the first row (which we don’t actually need), and then overwriting that first row with \(-1\)s. If we were being completely memory conscious and receiving pre-allocated memory to return our answer in, we could fill in the correct values once, without repeated work. I’m saving up a whole series of posts on how to do that "one big memory allocation and make use of it in subcalls": this is one of the ways LAPACK stays lean-and-mean. But we can do much the same thing in Python, if we’re careful and tricky. In that order, please!
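As an aside, here is a hypothetical sketch that builds \(V\) straight from the \(v_k\) formula instead of reusing \(U\); it ignores the sign fix-up applied to \(U\) above and assumes t, d_sq, and omega_sq are the unnormalized quantities already in hand:

import numpy as np

def make_V_directly(t, d_sq, omega_sq):
    # numerators d_i * t_i (zero in the first row, since d_1 = 0),
    # denominators d_i^2 - omega_k^2; the first row is then set to -1
    numer = (np.sqrt(d_sq) * t).reshape(-1, 1)
    V = numer / (d_sq.reshape(-1, 1) - omega_sq.reshape(1, -1))
    V[0, :] = -1.0
    return V / np.linalg.norm(V, axis=0)

It still computes a throwaway first row before overwriting it, so it is no more memory-clever than the version above; it just makes the formula-to-code mapping explicit.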

As you may have noticed, these cells don’t do anything other than define several functions. I promise that I’ll get to the test code after we discuss deflation in the next post.

A Postscript

A quick note about the mathematics behind the scenes. When we perform these calculations, we start with a given matrix \(S\). To simplify the discussion, I’ll talk about singular values, but the same applies to the eigen case. When we calculate the singular values of \(S\), we are actually computing numerical approximations to them. That approximation error can be magnified in the calculations for \(U\) and \(V\) and can result in "reasonable" singular values but "unreasonable" singular vectors. Specifically, the singular vectors can lose their orthogonality. Doh!

What do we do? The approximate singular values we calculated can be used to construct a new matrix \(\hat{S}\). We build \(\hat{S}\) in such a way that the approximate singular values of \(S\) are the exact singular values of \(\hat{S}\). Then the singular vectors we get by applying the formulas for \(U\) and \(V\) (really \(\hat{U}\) and \(\hat{V}\)) will be highly accurate. How does this help us? \(S\) and \(\hat{S}\) are numerically close, as are the respective \(U\)s and \(V\)s. Phew. This postscript doesn’t do nearly enough justice to the mathematical sleuthery (new word alert!) that solved this problem in such an elegant way. In the "old days", people had to hack together extra-precision solutions for the singular values to create more accurate singular vectors.

Additional Resources

You can grab a copy of this notebook.

Even better, you can view it using nbviewer.

License

Unless otherwise noted, the contents of this notebook are under the following license. The code in the notebook should be considered part of the text (i.e., licensed and treated as follows).

Creative Commons License
DrsFenner.org Blog And Notebooks by Mark and Barbara Fenner is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Permissions beyond the scope of this license may be available at drsfenner.org/blog/about-and-contacts.