DC SVD I: Just Can’t Let It Go

It’s been way, way, way too long since I’ve posted. I haven’t been slacking though, I’ve merely been busy. Really.

I decided to dive back into the SVD problem and look at an alternative to the QR based SVD computations. Namely, I’m going to give a breakdown of the divide-and-conquer approach to computing the SVD. Similar to the relationship between the bidiagonal SVD and tridiagonal QR computations, there is a close relationship between the divide-and-conquer approaches to the eigenvalue and SVD problems. I’m going to start from "the inside out" with the innermost task of the DC (divide-and-conquer) process: solving a particular equation known as the secular equation (secular meaning "not heavenly" – i.e., earthly or planetary – check it out on Wikipedia).

One note: I feel very guilty about still using Python 2. I had intended to transition this project to Python 3 over the course of the last year. Alas, my major training clients over the last year were using Python 2 and I didn’t really want to have a mixed development environment on my laptop. Well, at least there’s something to do if I make a book out of these posts!

Enough preamble, let’s get down to business.

The Secular Equation

Divide-and-conquer SVD is built on computing the roots of the secular equation. The roots, or zeros, of an equation are the \(\lambda\)s such that \(f(\lambda) = 0\). The secular equation is (where we take \(\rho=1\)): \[f(\lambda) = 1 + \rho \sum_{i=1}^n \frac{z_{i}^2}{d_i - \lambda} = 1 + \sum_{i=1}^n \frac{z_{i}^2}{d_i - \lambda}\]

Here is a graph of a secular equation and its roots:

In [1]:
import numpy as np
import numpy.linalg as nla
import matplotlib.pyplot as plt
%matplotlib inline 
xs = np.linspace(-1,10,10000)

# graph secular equation (blue curve)
# plot z,d --> y
zs = np.array([.6,1.2,1.8])
ds = np.array([0.0,3,5])
with np.errstate(divide='ignore'):
    # plain vanilla to "fancy-schmancy": three equivalent ways to evaluate f
    #ys1 = 1.0 + (0.36 / (0.0-xs)) + (1.44 / (3.0-xs)) + (3.24 / (5.0-xs))
    #ys2 = 1.0 + (zs[0]**2 / (ds[0]-xs)) + (zs[1]**2 / (ds[1]-xs)) + (zs[2]**2 / (ds[2]-xs))
    ys3 = 1.0 + ((zs**2).reshape(3,1) / np.subtract.outer(ds, xs)).sum(0)
    # assert np.allclose(ys1, ys2) and np.allclose(ys2, ys3)
plt.plot(xs, ys3)

# add an x-axis (yellow horizontal line)
plt.plot(xs, np.zeros_like(xs), 'y-')

# add poles/asymptotes (grey vertical lines)
plt.vlines(ds, -10, 10, '.75')

# add roots (red dots)
# use equivalent matrix (for these z,d) and eigenvalue computation to find roots
# ds are diagonal entries, zs form the rank-one update (note, ds[0] is 0.0 by definition)
ROU = np.diag(ds) + np.outer(zs, zs)
zeros_act = nla.eig(ROU)[0]
plt.plot(zeros_act, np.zeros_like(zeros_act), 'r.')

#scaling
plt.ylim(-10,10), plt.xlim(-1, 10);

A quick note: the secular equation as written above is most directly used to compute eigenvalues, not singular values. The two sets of values are closely related (for instance, the singular values of a matrix \(B\) are the square roots of the eigenvalues of \(B^TB\)), and the eigenvalue and singular value problems lead to only slightly different forms of the secular equation. I mention this because you might see slightly different forms depending on whether you are reading about eigenvalues or singular values. We will use the form above for solving both problems by slightly modifying the resulting \(\lambda\)s.

Back to our regularly scheduled program. We will find our desired roots by isolating \(f(\lambda)\) between each pair of poles. The poles occur at the values of \(d_i\). So, on the interval \((d_i, d_{i+1})\), the problem simplifies to finding a single root of the secular equation. Throughout this post, we’ll assume that \(d_i < d_j\) for \(i<j\) (in English, the \(d_i\)s are distinct and sorted). We’ll do single-root finding \(n\) times and find all \(n\) roots. The cleverest among you will now point out that \(n\) poles only hold \(n-1\) zeros between them. The last zero is to the right of \(d_n\).
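
As a quick numerical sanity check on that picture, we can confirm the interlacing with the example values from the plotting cell above (this sketch assumes `ds`, `zs`, and `zeros_act` are still in scope):

roots = np.sort(zeros_act)
inner_ok = all(ds[k] < roots[k] < ds[k+1] for k in xrange(len(ds) - 1))
last_ok  = ds[-1] < roots[-1] <= ds[-1] + np.sum(zs**2)   # the straggler is bounded on the right
print "interlacing holds?", inner_ok and last_ok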

How do we find the single roots?

Newton’s Method

Let’s take a second and review Newton’s method for finding a root of an equation \(f(x)\) near \(x_0\) (a minimal code sketch follows the list).

  1. Approximate \(f(x)\) by a linear function \(l(x) = ax+b\)
  2. Apply constraints such that \(f(x_0) = l(x_0)\) and \(f'(x_0) = l'(x_0)\). These determine the coefficients of \(l(x)\).
  3. Find the root of \(l(x)\) and call it \(x_1\). In this case, the root is the \(x\)-intercept of \(l(x)\).
  4. That root of \(l(x)\) is a better guess as to the root of \(f(x)\).
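
In code, the classic iteration is just a few lines. Here is a minimal sketch (not production-grade root finding; the tolerance and iteration cap are arbitrary picks):

def newton(f, f_prime, x0, tol=1e-12, max_iter=50):
    # repeatedly jump to the x-intercept of the tangent line at the current guess
    x = x0
    for _ in xrange(max_iter):
        step = f(x) / f_prime(x)
        x = x - step
        if abs(step) < tol:
            break
    return x

# example: the positive root of x**2 - 2
print newton(lambda x: x*x - 2.0, lambda x: 2.0*x, 1.0)   # ~1.4142135623730951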

Newton’s method is a great technique and it is used broadly because it is conceptually and formally simple, easy to implement, and the iteration steps are reasonably fast. However, in the case of the secular equation, as the numerators get very small, the bend in the graph sharpens from a gentle curve into a hard corner. In other words, the graph goes from vertical to horizontal very quickly. When this happens, a linear approximation on the nearly horizontal part (which is the majority of the pole-to-pole interval) will also be nearly horizontal and will be aimed far away from the true root in this interval.
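
Here’s a tiny illustration of the failure (the two-pole values are made up purely for demonstration). With small \(z\) entries, \(f\) is essentially flat at the midpoint of the bracket \((0, 1)\), so the Newton step shoots far outside it:

d_demo, z_demo = np.array([0.0, 1.0]), np.array([1e-4, 1e-4])
lam = 0.5                                     # midpoint of the bracket (0, 1)
f_val   = 1.0 + np.sum(z_demo**2 / (d_demo - lam))    # essentially 1.0 (flat region)
f_prime = np.sum(z_demo**2 / (d_demo - lam)**2)       # tiny slope
print "Newton lands at:", lam - f_val / f_prime       # about -1.25e7, nowhere near (0, 1)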

A Modified Newton’s Method

So, we need to try something else. Fortunately, we can keep the outline of Newton’s method while using a different approximating function. Instead of a linear form, we will use the following rational function of \(\lambda\), which has poles at \(d_k\) and \(d_{k+1}\), the endpoints of the interval we are searching:

\[h(\lambda) = \frac{C_1}{d_k-\lambda} + \frac{C_2}{d_{k+1}-\lambda} + C_3\]

So, we are approximating \(f(\lambda)\) with \(h(\lambda)\):

\[f(\lambda) = 1 + \sum_{i=1}^n \frac{z_{i}^2}{d_i - \lambda} \approx
\frac{C_1}{d_k-\lambda} + \frac{C_2}{d_{k+1}-\lambda} + C_3 = h(\lambda)\]

Also, since we want to avoid numerical problems with term cancellation, we will break the sum in \(f(\lambda)\) into two parts: (1) the sum up to term \(k\) and (2) the sum from term \(k+1\) on. Along with the ordering assumption on the \(d_i\), this means that the lower terms are all negative and the upper terms are all positive for \(\lambda \in (d_k, d_{k+1})\). We will also give names to those partial sums.

\[
\begin{eqnarray}
f(\lambda) &=& 1 + & \sum_{i=1}^k \frac{z_{i}^2}{d_i - \lambda} + \sum_{i=k+1}^n \frac{z_{i}^2}{d_i - \lambda} &=& 1 + \Psi_1(\lambda) + \Psi_2(\lambda)\\
f'(\lambda) &=& & \sum_{i=1}^k \frac{z_i^2}{(d_i - \lambda)^2} + \sum_{i=k+1}^n \frac{z_i^2}{(d_i - \lambda)^2} &=& \Psi'_1(\lambda) + \Psi'_2(\lambda)
\end{eqnarray}
\]

Note that the derivative (and the derivatives of the partial sums) is strictly positive everywhere it is defined, that is, except at the poles \(\lambda = d_i\) of \(f\).

Since we have broken \(f\) into pieces, we will also break \(h\) into pieces:

\[h(\lambda)=1 + h_1(\lambda) + h_2(\lambda)\]

and we will give them each the same form, which is similar to that of \(h\):

\[h_1(\lambda) = \hat{c}_1 + \frac{C_1}{d_k - \lambda} \quad
h_2(\lambda) = \hat{c}_2 + \frac{C_2}{d_{k+1} - \lambda}\]

We will also enforce the "Newton conditions" that both the value of the approximations \(h_i\) and the value of the derivative of the approximations \(h'_i\) agree with the target function at \(\lambda_{\alpha}\), our current guess at the root:

\[
\begin{align}
h_1(\lambda_{\alpha}) = \Psi_1(\lambda_{\alpha}) &\quad& h'_1(\lambda_{\alpha}) = \Psi'_1(\lambda_{\alpha})\\
h_2(\lambda_{\alpha}) = \Psi_2(\lambda_{\alpha}) &\quad& h'_2(\lambda_{\alpha}) = \Psi'_2(\lambda_{\alpha})
\end{align}
\]

The result of these constraints is that: \[ C_1 = \Psi'_1(\lambda_{\alpha})(d_k - \lambda_{\alpha})^2 \\
\hat{c}_1 = \Psi_1(\lambda_{\alpha}) - \Psi'_1(\lambda_{\alpha})(d_k - \lambda_{\alpha})\]

\[ C_2 = \Psi'_2(\lambda_{\alpha})(d_{k+1} - \lambda_{\alpha})^2 \\
\hat{c}_2 = \Psi_2(\lambda_{\alpha}) - \Psi'_2(\lambda_{\alpha})(d_{k+1} - \lambda_{\alpha})\]

\[C_3 = 1 + \hat{c}_1 + \hat{c}_2\]
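
If you want to convince yourself the constraint algebra is right, here is a quick numeric check. The \(d\), \(z\), \(k\), and \(\lambda_\alpha\) values are arbitrary picks for illustration, not from any particular matrix:

d_chk = np.array([0.0, 3.0, 5.0]); z2 = np.array([.6, 1.2, 1.8])**2
k, lam_a = 1, 4.0                        # a guess in the interval (d[1], d[2]) = (3, 5)
psi1   = np.sum(z2[:k+1] / (d_chk[:k+1] - lam_a))
psi1_p = np.sum(z2[:k+1] / (d_chk[:k+1] - lam_a)**2)
psi2   = np.sum(z2[k+1:] / (d_chk[k+1:] - lam_a))
psi2_p = np.sum(z2[k+1:] / (d_chk[k+1:] - lam_a)**2)
C1, c1_hat = psi1_p * (d_chk[k]   - lam_a)**2, psi1 - psi1_p * (d_chk[k]   - lam_a)
C2, c2_hat = psi2_p * (d_chk[k+1] - lam_a)**2, psi2 - psi2_p * (d_chk[k+1] - lam_a)
h   = 1.0 + c1_hat + C1 / (d_chk[k] - lam_a) + c2_hat + C2 / (d_chk[k+1] - lam_a)
h_p = C1 / (d_chk[k] - lam_a)**2 + C2 / (d_chk[k+1] - lam_a)**2
# both the value and the derivative of h match f at lam_a
print np.allclose(h, 1.0 + psi1 + psi2), np.allclose(h_p, psi1_p + psi2_p)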

Since we are seeking a root of \(h(\lambda)\), we can set

\[h(\lambda) = \frac{C_1}{d_k-\lambda} + \frac{C_2}{d_{k+1}-\lambda} + C_3 = 0\]

and multiply out the denominators. This gives us:

\[0 = C_1 (d_{k+1} - \lambda) + C_2 (d_k - \lambda) + C_3(d_k - \lambda)(d_{k+1} - \lambda) \qquad \star\]

In our original Newton formulation of root finding, we went from a guess of the root \(x_0\) to \(x_1\). We could phrase that as \(x_1=x_0+\kappa\) where \(\kappa\) is a correction to our old guess \(x_0\). We will use that "\(\kappa\)-technique" here. Call our current approximation or guess \(\lambda_\alpha\) and call our correction \(\eta\). Then we can write \(\lambda_{\alpha} + \eta = \lambda\). Another way to think about this is as a variable substitution (or function shift) \(\lambda \rightarrow \lambda - \lambda_{\alpha} = \eta\) which centers the function horizontally at \(\lambda_{\alpha}\). If we solve for a \(\lambda\) that gives a zero, we would also have a corresponding \(\eta\) value where \(\lambda_{\alpha} + \eta\) gives a zero. Also, the equality means we can write \(\lambda_\alpha = \lambda - \eta\).

Now, to help simplify the terms, let \(\Delta_k = d_k - \lambda_\alpha\) and \(\Delta_{k+1} = d_{k+1} - \lambda_\alpha\). Together, these give us (and likewise for \(\Delta_{k+1}\)):

\[\Delta_k = d_k - \lambda_\alpha = d_k - (\lambda - \eta) = d_k - \lambda + \eta\]

Rearranging:

\[d_k - \lambda = \Delta_k - \eta \quad \text{and} \quad d_{k+1} - \lambda = \Delta_{k+1} - \eta\]

Since \(\lambda_\alpha\) is a constant during an iteration of Newton’s method, we will drop the functional dependence on it. Our \(C\) constants become: \[\begin{align}
C_1 &= \Psi'_1 \Delta_{k}^2 \\
C_2 &= \Psi'_2 \Delta_{k+1}^2 \\
C_3 &= 1 + \Psi_1 + \Psi_2 - \Psi'_1 \Delta_{k} - \Psi'_2 \Delta_{k+1}
\end{align}
\]

The punchline is an expression we can solve for \(\eta\) (the correction to our old guess \(\lambda_\alpha\)):

\[\begin{align}
C_1 (d_{k+1} - \lambda) + C_2 (d_k - \lambda) + C_3(d_k - \lambda)(d_{k+1} - \lambda)
&= C_1 (\Delta_{k+1} - \eta) + C_2 (\Delta_{k} - \eta) + C_3 (\Delta_{k} - \eta) (\Delta_{k+1} - \eta) \\
&= \eta^2 C_3 \\
&\quad - \eta \left(C_1 + C_2 + C_3(\Delta_k + \Delta_{k+1})\right) \\
&\quad + C_1\Delta_{k+1} + C_2\Delta_{k} + C_3\Delta_k\Delta_{k+1}
\end{align}
\]

So, rewriting these constants into the form of the classic quadratic equation \(a\eta^2+b\eta+c=0\), we have (introducing the shorthand \(\Psi_{\text{sum}} = 1 + \Psi_1 + \Psi_2\)):

\[
\begin{align}
a &= C_3 \\
&= (1 + \Psi_1 + \Psi_2) - \Psi'_1 \Delta_{k} - \Psi'_2 \Delta_{k+1} \\
&= \Psi_{\text{sum}} - \Psi'_1 \Delta_{k} - \Psi'_2 \Delta_{k+1} \\
\\
-b &= C_1 + C_2 + C_3(\Delta_k + \Delta_{k+1}) \\
&= \Psi'_1 \Delta_{k}^2 + \Psi'_2 \Delta_{k+1}^2 + C_3(\Delta_k + \Delta_{k+1}) \\
&= \Psi'_1 \Delta_{k}^2 + \Psi'_2 \Delta_{k+1}^2 + (\Psi_{\text{sum}} - \Psi'_1 \Delta_{k} - \Psi'_2 \Delta_{k+1})(\Delta_k + \Delta_{k+1})\\
&= \Psi_{\text{sum}} (\Delta_k + \Delta_{k+1}) - \Psi'_1 \Delta_{k} \Delta_{k+1} - \Psi'_2 \Delta_{k} \Delta_{k+1} \\
&= \Psi_{\text{sum}} (\Delta_k + \Delta_{k+1}) - (\Psi'_1 + \Psi'_2)\Delta_{k} \Delta_{k+1} \\
b &= (\Psi'_1 + \Psi'_2)\Delta_{k} \Delta_{k+1} - \Psi_{\text{sum}} (\Delta_k + \Delta_{k+1}) \\
\\
c &= C_3\Delta_k\Delta_{k+1} + C_1\Delta_{k+1} + C_2\Delta_{k} \\
&= (\Psi_{\text{sum}} - \Psi'_1 \Delta_{k} - \Psi'_2 \Delta_{k+1})\Delta_k\Delta_{k+1} + \Psi'_1 \Delta_{k}^2\Delta_{k+1} + \Psi'_2 \Delta_{k+1}^2 \Delta_{k} \\
&= \Psi_{\text{sum}} \Delta_{k} \Delta_{k+1}
\end{align}
\]
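
A quick numeric spot check of the \(b\) and \(c\) simplifications. The \(\Psi\) and \(\Delta\) values below are arbitrary stand-ins (chosen to respect the sign pattern \(\Psi_1 < 0 < \Psi_2\), \(\Psi' > 0\), \(\Delta_k < 0 < \Delta_{k+1}\), but not derived from a real problem); since the identities are pure algebra, any values work:

psi1, psi2, psi1_p, psi2_p = -0.7, 0.4, 0.9, 1.3
Dk, Dkp1 = -0.6, 1.4
psi_sum = 1.0 + psi1 + psi2
C1, C2 = psi1_p * Dk**2, psi2_p * Dkp1**2
C3 = psi_sum - psi1_p*Dk - psi2_p*Dkp1
# b and c straight from the C's ...
b = -(C1 + C2 + C3*(Dk + Dkp1))
c = C1*Dkp1 + C2*Dk + C3*Dk*Dkp1
# ... versus the simplified Psi forms
print np.allclose(b, (psi1_p + psi2_p)*Dk*Dkp1 - psi_sum*(Dk + Dkp1))
print np.allclose(c, psi_sum * Dk * Dkp1)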

The solutions \(\eta_1, \eta_2\) to the quadratic above are well known: \(\eta_1, \eta_2 = \frac{-b \pm \sqrt{b^2-4ac}}{2a} = \frac{-b \pm \text{root}}{2a}\). However, there is an interesting (and useful!) alternative formula for the solutions: \(\eta_2,\eta_1 = \frac{2c}{-b \mp \sqrt{b^2-4ac}} = \frac{2c}{-b \mp \text{root}}\). These come from either (1) using different steps to solve the generic quadratic form or (2) applying Vieta’s formulas to the standard solution. Due to some algebraic magic and inequalities I haven’t quite figured out yet, the solution that we need can be picked out based on the sign of \(b\). (Li claims – and tests on running code show – that based on the quadratic equation and the constraints above – including \(C_1,C_2 > 0\) – the correct \(\eta\) is chosen by the sign of \(b\). I can’t demonstrate that, despite trying many different algebraic contortions. I did get to the point of showing \(C_3>0\), but alas. I’d love help and I’ll buy you a cookie or a beer for it.) The punchline is:

\[
\eta=\begin{cases}
\frac{2c}{-b + \text{root}} & b<0 \\
\frac{-b – \text{root}}{2a} & b \geq 0
\end{cases}
\]
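
The reason for the case split is numerical: each branch adds quantities of the same sign instead of subtracting nearly equal ones. A small standalone demonstration (the coefficients are contrived just to force cancellation):

def stable_quadratic_root(a, b, c):
    # both branches compute the same root; pick the form that avoids cancellation
    root = np.sqrt(b*b - 4.0*a*c)
    return (2.0*c) / (-b + root) if b < 0.0 else (-b - root) / (2.0*a)

a, b, c = 1.0, -1e8, 1.0                       # true small root is very nearly 1e-8
print stable_quadratic_root(a, b, c)           # full precision
print (-b - np.sqrt(b*b - 4*a*c)) / (2.0*a)    # naive form here: digits lost to cancellation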

Two last pieces lead us to our improvised (MacGyvered?) Newton’s method.

We need to know where to start and when to stop. These topics make varied appearances in the literature. Since all but one root falls between the poles, we can start most root-finding iterations at the midpoint of the poles \(\frac{d_i + d_{i+1}}{2}\). For the straggler root (which lies to the right of \(d_n\)), we use a starting point of \(d_n + \|z\|^2\) (following Arbenz’s lead). I don’t use a midpoint to the right (i.e., \(\frac{d_n + (d_n + \|z\|^2)}{2} = d_n + \frac{\|z\|^2}{2}\)) because I’ve had small numerical differences with LAPACK’s results. Another note: Gu & Eisenstat say the root should be to the left of \(d_n + \|z\|\). So, there’s some discrepancy here. Oh well. It happens.

Gu & Eisenstat propose stopping when \(|\Psi_{\text{sum}}| < 10.0\,\epsilon\,(1 + |\Psi_1| + |\Psi_2|)\); note that \(\Psi_{\text{sum}}\) is just \(f(\lambda)\) at the current guess. (The code below also scales the threshold by \(n\).)

Code Outline

For all that ridiculously long development, the code becomes:

  1. Setup: set some variables (including initial \(\lambda\)), deal with the \(n\)-th root, determine the closest \(d_i\), and shift (center) the problem there
  2. Loop until stopping condition is met
    1. Calculate \(\Psi\)s, \(\Delta\)s, \(a, b, c\), and solve the quadratic for \(\eta\).
    2. Update \(\lambda\).
  3. Undo the shift.
  4. Et voila.
In [2]:
import itertools as it

EPSILON = np.finfo(float).eps
NEPS = 10.0 * EPSILON # via Gu & Eisenstat

def solve_secular_oneroot(d, z, i):
    # translated and modified from:  Arbenz, Peter. see citation below.
    # there are no "l" (ells) in the following code, they are all "1" (ones)
    n = d.size; d_i = d[i]
    z = z*z                 # from here on, z holds the *squared* entries
    
    if i < n - 1:
        d_ip1   = d[i+1]
        lambda_ = (d_i + d_ip1) / 2.0
    else:
        # z is already squared: np.sum(z) == ||z||**2 (max protects against a small norm)
        adder = max(np.sum(z), 1.0)
        d_ip1   = d[-1] + adder
        lambda_ = d_ip1

    left_idx, right_idx = slice(0,i+1), slice(i+1, None)
    left_z, right_z = z[left_idx], z[right_idx]

    denoms = d - lambda_    # PN: this makes a new vector
    psi1 = np.sum( left_z / denoms[ left_idx])   # left_z, right_z hold squared entries,
    psi2 = np.sum(right_z / denoms[right_idx])   # so no further squaring here

    # determine whether soln is closer to right or left pole
    # (remember, we are currently at lambda == midpt)
    psi_sum = 1.0 + psi1 + psi2
    left_half = psi_sum > 0.0
    active_d = d_i if left_half else d_ip1

    # shift poles (so one of them is zero), initial guess, and d_i,d_ip1
    d = d - active_d
    lambda_ = lambda_ - active_d

    if left_half:
        d_i, d_ip1 = 0.0, d_ip1-d_i
    else:
        d_i, d_ip1 = d_i-d_ip1, 0.0

    stop_cond = NEPS * n * (1.0 + np.abs(psi1) + np.abs(psi2))
    itct = it.count()
    while (np.abs(psi_sum) > stop_cond and itct.next() < 10):
        denoms = d - lambda_
        
        psi1, psi1_p = (np.sum(left_z / denoms[ left_idx]),
                        np.sum(left_z / denoms[ left_idx]**2))
        psi2, psi2_p = (np.sum(right_z / denoms[right_idx]),
                        np.sum(right_z / denoms[right_idx]**2))
        psi_sum = 1.0 + psi1 + psi2


        # D is \Delta : recall, we've shifted d_i, d_ip1 in the setup above
        if left_half:
            D_i, D_ip1 = -lambda_, d_ip1 - lambda_
        else:
            D_i, D_ip1 = d_i - lambda_, -lambda_

        #a \eta^2 + b \eta + c = 0
        a = psi_sum - (D_i * psi1_p) - (D_ip1 * psi2_p)
        b = - (D_i + D_ip1) * psi_sum + (D_i * D_ip1 * (psi1_p + psi2_p))
        c = D_i * D_ip1 * psi_sum

        root = np.sqrt(b**2 - 4*a*c)
        eta = (2.0 * c) / (-b + root) if b < 0.0 else (-b - root)/(2.0*a)
        lambda_ = lambda_ + eta

        stop_cond = NEPS * n * (1.0 + np.abs(psi1) + np.abs(psi2))

    d_m_lambda = d - lambda_      # (both shifted) - this will be discussed next post
    lambda_ = lambda_ + active_d
    return lambda_, d_m_lambda
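
Before the more organized testing below, a quick sanity spin: with the `ds`, `zs`, and `zeros_act` from the plotting cell still in scope, the solver should land on the red dots from that figure:

print np.sort([solve_secular_oneroot(ds, zs, i)[0] for i in xrange(ds.size)])
print np.sort(zeros_act)    # the LAPACK-computed eigenvalues, for comparison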

Testing

I’d be remiss if I didn’t offer some code to test the root-finding code. The basic process is to use NumPy’s (and hence LAPACK’s) eigenvalue machinery to calculate the roots in a "known good" fashion. NumPy wants a matrix as input, so we take our \(d\) and \(z\) values and put them in an appropriate matrix. The actual test cases here are a bit ad hoc – they came up during development of the larger D&C SVD code. I’ll have more thorough testing either in the next post or as a standalone follow-up to the next post.

Here goes:

In [4]:
np.set_printoptions(precision=10,suppress=False)

def run_for(d,z, verbose=False):
    assert d[0] == 0.0 and d.shape == z.shape
    # ROU is a "rank-one updated" matrix
    ROU = np.diag(d) + np.outer(z,z)
    if verbose:
        print "cond(ROU): %5.2g" % nla.cond(ROU)
        print "Effective ROU:\n", ROU
    expected = np.sort(nla.eigvals(ROU))
    sols = [solve_secular_oneroot(d,z,i)[0] for i in xrange(d.size)]
    actual = np.sort(np.array(sols))
    return np.allclose(expected, actual)

testcases = [([0, 2-.1, 2+.1, 5], [1, .1, .1, 1]),
             ([0, 2-10e-8, 2+10e-8, 5], [1, 10e-8, 10e-8, 1]),
             ([0.0,5], [7.0,2]),
             ([0.0, 10], [1e-4, 1e-4]),
             ([0.0, 1e5],[1.0, 4e-9])]

def run_straight_tests():
    print "all pass?", all(run_for(np.array(d), np.array(z))
                               for d, z in testcases)

run_straight_tests()
all pass? True

References

The main references for this post come from:

  1. Arbenz, Peter. Lecture Notes for Solving Large Scale Eigenvalue Problems, Chapter 5: Cuppen’s Divide and Conquer Algorithm. The single most valuable piece of this is working Matlab code for solving the secular equation. I started with this code in octave, translated it to Python+NumPy, factored out some common pieces, renamed some variables, incorporated a different stopping condition, and massaged the variable names to match the discussion in this post.
  2. Li, Ren-Cang. Solving Secular Equations Stably and Efficiently. This describes the theory behind LAPACK’s solution of the secular equation for the symmetric, tridiagonal eigenvalue problem. Arbenz follows the presentation of Li fairly closely. The LAPACK implementation – basically the LAPACK slaed series of routines – is described by Rutter in a separate LAPACK Working Note.
  3. Demmel, James. Applied Numerical Linear Algebra, Section 5.3.3: Divide-and-Conquer. Great discussion of the connection between classical Newton’s method and the non-linear version we use. Great mathematical presentation of the setup for the approximation, but he punts on the details of solving the quadratic form.

Additional Resources

You can grab a copy of this notebook.

Even better, you can view it using nbviewer.

License

Unless otherwise noted, the contents of this notebook are under the following license. The code in the notebook should be considered part of the text (i.e., licensed and treated as follows).

DrsFenner.org Blog And Notebooks by Mark and Barbara Fenner is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Permissions beyond the scope of this license may be available at drsfenner.org/blog/about-and-contacts.