Into the Weeds I – LAPACK, LAPACKE, and Three+ Paths

In the last post, we looked at three ways of solving the least-squares regression problem (Cholesky, QR, and SVD decompositions) using LAPACK routines in a round-about fashion. The different decompositions (i.e., factorizations) were performed by (1) calling a NumPy np.linalg routine, which in turn called (2) a LAPACK driver routine for a "bigger" problem like linear-equation solving or least-squares. The driver routine performed one of the decompositions internally.

LAPACK Routines

Since the QR and SVD solutions both used a copy of our \(X\) matrix (or at least the memory usage hinted at that), I want to dive into the lower level of LAPACK (avoiding NumPy) and call the driver routines directly. We'll try to avoid doing the decompositions by hand, if we can! The most immediate issue in this series of experiments is: how do we call the LAPACK routines? Here is a table of the LAPACK routines used by the solver/least-squares routines in NumPy, SciPy, and cvxopt.

Python Library Routine      | Underlying LAPACK Routine | Matrix Type                        | Algorithm
sla.solve (w/ sym_pos=True) | posv                      | symmetric positive definite matrix | solve eqn using Cholesky
nla.solve, sla.solve        | gesv                      | square matrix                      | solve eqn using LU
cla.solve                   | gels                      | general matrix (full rank)         | least-squares via QR
sla.lstsq                   | gelss                     | general matrix                     | least-squares via SVD
nla.lstsq                   | gelsd                     | general matrix                     | least-squares via Divide-and-Conquer SVD

The different libraries make use of different LAPACK routines and, in general, each library only exposes the LAPACK routines that it needs. For example, scipy.linalg does not provide dgels (as of scipy-0.16.1); it is not listed in the scipy.linalg.lapack function list. It is almost possible to pick and choose from each library to get the desired underlying LAPACK routines, but I want to do something slightly different.
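If you want to see what your own install exposes (the exact set varies by SciPy version), a quick sanity check is to poke at scipy.linalg.lapack directly. A minimal sketch, using the double-precision routine names from the table above:

import scipy.linalg.lapack as sll

# which of the table's double-precision routines does this SciPy wrap?
for name in ["dposv", "dgesv", "dgels", "dgelss", "dgelsd"]:
    print name, hasattr(sll, name)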

LAPACK: Fortran to C (Sharks/Jets? Montague/Capulet?)

One of the biggest obstacles to using LAPACK in Python is the fact that Python (at least CPython) and NumPy (and friends) are written in C, while LAPACK is written in Fortran. This isn't a showstopper. There are a number of interoperability options out there: (1) converting the Fortran source to C (the f2c program) and (2) linking Fortran binaries into a C program. The first gives you what amounts to a fancy pure-C program. The second generally requires declaring a C function prototype for the Fortran routine you want to call (with something like extern "C" void dgels_(arg1, arg2, etc.), typically with a trailing underscore in the symbol name) and then simply calling the function. Here's a quick overview.

Declaring your own prototypes is a real drag, and it can also leave some holes in the conversion between C-land and Fortran-land. The biggest hole is the fact that Fortran stores data in column-major order, while C stores data in row-major order (wikipedia:Row-major). There are other issues as well, and having a well-defined interface between the two layers is a really good idea. BLAS (lower-level routines than LAPACK) has had one for years, called CBLAS. However, it took significantly longer for LAPACK to get a standard C interface: LAPACKE. A quick side note: CLAPACK is an f2c'd version of LAPACK, so it provides similar functionality in a different way.
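A small NumPy illustration of that storage-order difference (the values are identical; only the order of the underlying 1-D buffer changes):

import numpy as np

A_c = np.arange(6.0).reshape(2, 3)   # C order: rows are contiguous in memory
A_f = np.asfortranarray(A_c)         # same values, column-major storage

print A_c.flags['C_CONTIGUOUS'], A_f.flags['F_CONTIGUOUS']   # True True
print A_c.strides, A_f.strides                               # (24, 8) vs (8, 16)
# the underlying 1-D buffers differ: row-by-row vs column-by-column
print A_c.ravel(order='K')           # [ 0.  1.  2.  3.  4.  5.]
print A_f.ravel(order='K')           # [ 0.  3.  1.  4.  2.  5.]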

LAPACKE and Python

After all that lead-in, my goal is to use LAPACKE to call LAPACK routines, from Python. It turns out that this is really easy. py-liblapack is a tiny shim built using a Python CFFI library called cffi. Here, CFFI stands for C foreign function interface, and that acronym shows up all over the place. One example is Lisp's CFFI, where the C stands for Common. Anyway, cffi lets us do some really cool things (a minimal standalone example follows the list):

  1. Define and allocate C-level objects (variables). For example, workspace = ffi.new("int[]", n) creates a Python reference to a length-n int array
  2. Read in declarations from a .h file: header_file = open("liblapack.h"); ffi.cdef(header_file.read())
  3. Access functions defined in a C library. For example, liblapack = ffi.dlopen("libreflapacke.so") gives us dot-access (via Python attributes) to functions in libreflapacke. We can use dgels by coding: liblapack.LAPACKE_dgels.
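Here is that minimal standalone sketch, exercising all three capabilities against the C math library. The library name "libm.so.6" assumes Linux; adjust it for your platform:

from cffi import FFI

ffi = FFI()
ffi.cdef("double cos(double x);")    # (2) declare the prototype, as if read from a .h file
libm = ffi.dlopen("libm.so.6")       # (3) open a shared library (name assumes Linux)
print libm.cos(0.0)                  # dot-access to the C function -> 1.0

xs = ffi.new("double[]", 3)          # (1) allocate a C array of 3 doubles, zero-initialized
xs[1] = 2.5
print xs[0], xs[1], xs[2]            # 0.0 2.5 0.0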

These three capabilities let mikefc write a ~10-line script that exposes LAPACKE for us in Python. Granted, more work went into making a cffi-compatible set of .h files for the LAPACKE routines, but that was done by another set of Python scripts. py-liblapack exposes all of the single- and double-precision (s and d) routines from LAPACKE; it would be straightforward to also expose the complex and double-precision complex (c and z) routines.

To use the py-liblapack code, all you need are two files from the source repository: liblapack.py and liblapack.h. Then, make sure the header path and shared-library name in liblapack.py are right for your system:

import os
from cffi import FFI
ffi = FFI()

this_dir = os.path.dirname(os.path.abspath(__file__)) + "/"
with open(this_dir + "liblapack.h") as header_file:
    ffi.cdef(header_file.read())
liblapack = ffi.dlopen("libreflapacke.so")

In the examples below, I'll assume that liblapack.py lives in the same directory as my notebook, and I'll be importing it shortly.

Generic Setup and Imports

In [1]:
import operator as op

import numpy as np
import numpy.linalg as nla
import scipy.linalg as sla

from cvxopt import matrix as cvx_matrix
import cvxopt.lapack as cla

%load_ext memory_profiler
In [2]:
def create_data(n=5000, p=100):
    ''' 
        n is number cases/observations/examples
        p is number of features/attributes/variables
    '''
    X      = np.random.uniform(0,10, (n,p))
    coeffs = np.random.uniform(0,10, p)
    def f(X): return X.dot(coeffs)

    noise = np.random.normal(0.0, 2.0, n)
    Y = f(X) + noise
    Y = Y.reshape((n,1))
    
    return X,Y
In [3]:
n_obs = 2**16
n_ftrs = 2**10
X,Y = create_data(n_obs,n_ftrs)
print "about", 10**int(np.log10(n_obs * n_ftrs)), "entries"
print "about", (X.itemsize * X.size) / float(1024**2), "MB"
about 10000000 entries
about 512.0 MB

liblapack and Utilities

In [4]:
from liblapack import liblapack as lap
from cffi import FFI
ffi = FFI()

#
# numpy array to underlying C array
#
def get_dptr(arr):
    return ffi.cast("double *", arr.__array_interface__['data'][0])

def get_iptr(arr):
    return ffi.cast("int *", arr.__array_interface__['data'][0])

Cholesky via dposv

In [5]:
# posv
# http://www.netlib.org/lapack/explore-html/dc/d7f/lapacke__dposv_8c.html
# https://software.intel.com/en-us/node/520982
# 
# lapack_int LAPACKE_dposv(int matrix_layout,
#                          char uplo,
#                          lapack_int n, lapack_int nrhs,
#                          double *a, lapack_int lda,
#                          double *b, lapack_int ldb)
def sls_cholesky_orig(X, Y):
    XtX = X.T.dot(X)
    XtY = X.T.dot(Y)

    m,n = XtX.shape
    nrhs = XtY.shape[1] if XtY.ndim == 2 else 1

    info = lap.LAPACKE_dposv(lap.LAPACK_ROW_MAJOR,
                             "L", # prefer L.dot(L.T) factorization
                             n, nrhs,
                             get_dptr(XtX), n,    # ldXtX = n
                             get_dptr(XtY), nrhs) # ldXtY = nrhs
    assert not info
    return XtY[:X.shape[1]]
In [6]:
# %memit sls_cholesky_orig(X,Y)
# peak memory: 581.86 MiB, increment: 11.87 MiB
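The small increment makes sense: the normal-equations objects XtX and XtY are only p-by-p and p-by-1, tiny compared to X. A quick back-of-the-envelope for our shapes, which lines up with the ~12 MiB increment:

p = n_ftrs
print "XtX:", (p * p * 8) / float(1024**2), "MB"   # 8.0 MB
print "XtY:", (p * 8) / float(1024), "KB"          # 8.0 KB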

LU via dgesv

In [7]:
# gesv
# http://www.netlib.org/lapack/explore-html/d6/de8/lapacke__dgesv_8c.html
# https://software.intel.com/en-us/node/520973
#
#lapack_int LAPACKE_dgesv(int matrix_layout,
#                         lapack_int n, lapack_int nrhs,
#                         double *a, lapack_int lda,
#                         lapack_int *ipiv,
#                         double *b, lapack_int ldb)
def sls_lu_orig(X,Y):
    # note: gesv expects a *square* system; as called here it only sees the
    # leading n x n block of X (and the first n entries of Y), so this is an
    # exact solve of a square subsystem, not a least-squares fit over all m
    # rows (which is why it sits out the correctness check below)
    m,n = X.shape
    nrhs = Y.shape[1] if Y.ndim == 2 else 1

    workspace = ffi.new("int[]", n) # pivot index array (ipiv) filled by dgesv

    info = lap.LAPACKE_dgesv(lap.LAPACK_ROW_MAJOR,
                             n, nrhs,
                             get_dptr(X), n,    # ldX = n
                             workspace,
                             get_dptr(Y), nrhs) # ldY = nrhs

    assert not info
    return Y[:n]
In [8]:
# %memit sls_lu_orig(X,Y)
# peak memory: 576.93 MiB, increment: 6.71 MiB

QR via dgels

In [9]:
# gels
# http://www.netlib.org/lapack/explore-html/d9/da2/lapacke__dgels_8c.html
# https://software.intel.com/en-us/node/521112
#
#
#lapack_int LAPACKE_dgels(int matrix_layout,
#                         char trans,
#                         lapack_int m, lapack_int n, lapack_int nrhs,
#                         double* a, lapack_int lda,
#                         double* b, lapack_int ldb);
def sls_qr_orig(X,Y):
    m,n  = X.shape
    nrhs = Y.shape[1] if Y.ndim == 2 else 1
    info = lap.LAPACKE_dgels(lap.LAPACK_ROW_MAJOR,
                             "N", # do not transpose
                             m, n, nrhs,
                             get_dptr(X), n,
                             get_dptr(Y), nrhs)

    assert not info
    return Y[:n]
In [10]:
# %memit sls_qr_orig(X,Y)
# peak memory: 1083.94 MiB, increment: 514.06 MiB
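Note that the ~514 MiB increment is almost exactly the size of X itself (512 MiB, computed earlier), which echoes the suspicion from the intro that a full copy of X is being made somewhere along the way:

print (X.itemsize * X.size) / float(1024**2), "MB"   # 512.0 MB, roughly the whole increment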

QR via dgels: Take Two

In [11]:
def sls_qr_minwork(X,Y):
    m,n  = X.shape
    nrhs = Y.shape[1] if Y.ndim == 2 else 1
    

    # http://www.netlib.org/lapack/explore-html/d8/dde/dgels_8f_source.html
    # min LWORK >= max( 1, MN + max( MN, NRHS ) ), where MN = min(M, N)
    mn = min(m, n)
    work_size = max(1, mn + max(mn, nrhs))
    work = np.zeros(work_size)
    
    info = lap.LAPACKE_dgels_work(lap.LAPACK_ROW_MAJOR,
                                 "N", # do not transpose
                                 m, n, nrhs,
                                 get_dptr(X), n,
                                 get_dptr(Y), nrhs,
                                 get_dptr(work), work_size)

    assert not info
    return Y[:n]
In [12]:
# %memit sls_qr_minwork(X,Y)
#peak memory: 1084.91 MiB, increment: 517.84 MiB
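For the record, the documented minimum workspace is tiny for our shapes, so the workspace itself can't account for the large memory increment. A quick calculation, using the LWORK formula quoted in the code:

mn = min(n_obs, n_ftrs)                # MN = min(M, N) = 1024
min_lwork = max(1, mn + max(mn, 1))    # NRHS = 1  ->  2048 doubles
print min_lwork, "doubles =", (min_lwork * 8) / 1024.0, "KB"   # 16.0 KB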

SVD via dgelss

In [13]:
# gelss
#
#
# http://www.netlib.org/lapack/explore-html/db/d77/lapacke__dgelss_8c.html
# https://software.intel.com/en-us/node/521114
#
#
#lapack_int LAPACKE_dgelss(int matrix_layout, 
#                          lapack_int m, lapack_int n, lapack_int nrhs, 
#                          double* a, lapack_int lda, 
#                          double* b, lapack_int ldb, 
#                          double* s, 
#                          double rcond, 
#                          lapack_int* rank )
#
def sls_svd_dgelss(X,Y):
    m,n = X.shape
    nrhs = Y.shape[1] if Y.ndim == 2 else 1

    singular_values = np.empty(min(m,n))
    rank = np.zeros(1, dtype=np.intc)  # lapack_int is a plain C int here

    info = lap.LAPACKE_dgelss(lap.LAPACK_ROW_MAJOR,
                              m,n,nrhs,
                              get_dptr(X), n,
                              get_dptr(Y), nrhs,
                              get_dptr(singular_values),
                              -1,  # rcond < 0: use machine precision for the rank cutoff
                              get_iptr(rank))
    assert not info
    return Y[:n]
In [14]:
# %memit sls_svd_dgelss(X,Y)
# peak memory: 1083.97 MiB, increment: 513.99 MiB

SVD via dgelsd

In [15]:
# gelsd
#
# http://www.netlib.org/lapack/explore-html/df/de3/lapacke__dgelsd_8c.html
# https://software.intel.com/en-us/node/521115
#
#
#lapack_int LAPACKE_dgelsd(int matrix_layout,
#                          lapack_int m, lapack_int n, lapack_int nrhs,
#                          double* a, lapack_int lda,
#                          double* b, lapack_int ldb,
#                          double* s,
#                          double rcond,
#                          lapack_int* rank );
#
def sls_svd_orig(X,Y):
    m,n = X.shape
    nrhs = Y.shape[1] if Y.ndim == 2 else 1

    singular_values = np.empty(min(m,n))
    rank = np.zeros(1, dtype=np.intc)  # lapack_int is a plain C int here

    info = lap.LAPACKE_dgelsd(lap.LAPACK_ROW_MAJOR,
                              m,n,nrhs,
                              get_dptr(X), n,
                              get_dptr(Y), nrhs,
                              get_dptr(singular_values),
                              -1,  # rcond < 0: use machine precision for the rank cutoff
                              get_iptr(rank))
    assert not info
    return Y[:n]
In [16]:
# %memit sls_svd_orig(X,Y)
#peak memory: 1084.62 MiB, increment: 514.18 MiB

Check for Correctness

Here, we test the routines to make sure they all give the same \(\beta\)s. Note that we make copies because most of these routines modify X and Y in place. This differs a bit from the semantics of earlier posts, but it isn't terribly important because, within this post, we are still comparing apples to apples. When we go back to the outside world, we may still need to make copies, depending on our use cases for X and Y.

In [17]:
_X,_Y = create_data(n_obs,n_ftrs)

X, Y = np.copy(_X), np.copy(_Y)
betas_chol = sls_cholesky_orig(X,Y)

X, Y = np.copy(_X), np.copy(_Y)
betas_qr = sls_qr_orig(X,Y)

X, Y = np.copy(_X), np.copy(_Y)
betas_svd = sls_svd_orig(X,Y)

betas = [betas_chol, betas_qr, betas_svd]
print "All same betas?:", all(np.allclose(b1, b2) for b1 in betas[:-1] for b2 in betas[1:])
All same betas?: True

And Do a Quick Timing Run

In [18]:
%timeit sls_cholesky_orig(X,Y)
%timeit sls_lu_orig(X,Y)
%timeit sls_qr_orig(X,Y)
%timeit sls_qr_minwork(X,Y)
%timeit sls_svd_dgelss(X,Y)
%timeit sls_svd_orig(X,Y)
1 loops, best of 3: 1.54 s per loop
10 loops, best of 3: 32.6 ms per loop
1 loops, best of 3: 8.99 s per loop
1 loops, best of 3: 8.77 s per loop
1 loops, best of 3: 9.17 s per loop
1 loops, best of 3: 9.19 s per loop

Summary Results

Algorithm      | Called Via | Memory Increment (MB) | ~Time (s)
Cholesky       | dposv      | 12                    | 1.5
LU             | dgesv      | 7                     | >0.0
QR             | dgels      | 514                   | 9
QR (min LWORK) | dgels      | 517                   | 9
SVD            | dgelss     | 514                   | 9
SVD            | dgelsd     | 514                   | 9

As you can see, things are still not "right". There are some hints scattered through the code and discussion above as to what the remaining issue is. I won't spill the beans, but stay tuned for next time.

Additional Resources

You can grab a copy of this notebook.

Even better, you can view it using nbviewer.

License

Unless otherwise noted, the contents of this notebook are under the following license. The code in the notebook should be considered part of the text (i.e., licensed and treated as follows).

Creative Commons License
DrsFenner.org Blog And Notebooks by Mark and Barbara Fenner is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Permissions beyond the scope of this license may be available at drsfenner.org/blog/about-and-contacts.