In the last post, we looked at three ways of solving the least-squares regression problem (Cholesky, QR, and SVD decompositions) using LAPACK routines in a round-about fashion. The different decompositions (i.e., factorizations) were performed by (1) calling a NumPy np.linalg
routine that called (2) an LAPACK driver routine for a "bigger" problem like linear equation solving or least-squares. The driver routine called one of the decompositions internally.
LAPACK Routines
Since the QR and SVD solutions both used a copy of our \(X\) matrix (or at least the memory used hinted at that), I want to dive into the lower level of LAPACK (avoiding NumPy) and call the driver routines directly. We’ll try to avoid doing the decompositions by hand, if we can!. The most immediate issue, in this series of experiments is: how do we call the LAPACK routines? Here is a table of the LAPACK routines used by the solver/least squares routines in NumPy, SciPy, and cvxopt.
Python Library Routine |
Underlying LAPACK Routine |
Matrix Type | Algorithm |
---|---|---|---|
sla.solve w/ sym_pos=True |
posv |
symmetric positive definite matrix |
solve eqn using Cholesky |
nla.solve sla.solve |
gesv |
square matrix | solve eqn using LU |
cla.solve |
gels |
general matrix (full rank) | least-squares via QR |
sla.lstsq |
gelss |
general matrix | least-squares via SVD |
nla.lstsq |
gelsd |
general matrix | least-squares via Divide-and-Conquer SVD |
The different libraries make use of different LAPACK routines. And, in general, the libraries only expose the LAPACK routines that they need. For example, scipy.linalg
does not provide dgels
(as of scipy-0.16.1) it is not listed in the scipy.linalg.lapack function list. It is almost possible to pick and choose from each library to get the desired, underlying LAPACK routines, but I want to do something slightly different.
LAPACK: Fortran to C (Sharks/Jets? Montague/Capulet?)
One of the biggest obstacles to using LAPACK in Python is the fact that Python (at least CPython) and NumPy (and friends) are written in C. LAPACK is written in Fortran. This isn’t a show stopper. There are a number of interoperabiliity options out there: (1) converting Fortran to C (f2c
program) and (2) linking Fortran binaries into a C program. The first is just a fancy pure-C program. The second generally requires declaring a C function prototype for the Fortran function you want to call (with something like: extern "C" void dgels(arg1, arg2, etc.)
) and then simply calling the function. Here’s a quick overview.
Declaring your own prototypes is a real drag and it can also leave some holes in the conversion between C-land and Fortran-land. The biggest hole is the fact that Fortran stores data in column-major order, while C stores data in row-major order (wikipedia:Row-major). There are other issues as well and having a well-defined interface between the two layers is a really good idea. BLAS (lower level routines than LAPACK) has had one for years called CBLAS. However, it took significantly longer for LAPACK to get a standard C-interface: LAPACKE. A quick side-note: CLAPACK is a f2c
‘d version of LAPACK, so it provides similar functionality in a different way.
LAPACKE and Python
After all that lead-in, my goal is to use LAPACKE to call LAPACK routines — from Python. It turns out that this is really easy. py-liblapack is a tiny shim built using a Python CFFI library called cffi. Here, CFFI stands for C foreign function interface and that acronym shows up all over the place. One example is LISP’s CFFI where the C stands for Common. Anyway, cffi
let’s us do some really cool things:
- Define and allocate C-level objects (variables). For example,
workspace = ffi.new("int[]", n)
creates a Python reference to a length-n
int
array - Read in declarations from a
.h
file:header_file = open("liblapack.h"); ffi.cdef(header_file.read())
- Access functions defined in a C library. For example,
liblapack = ffi.dlopen("libreflapacke.so")
gives us dot-access (via Python attributes) to functions inlibreflapacke
. We can usedgels
by coding:liblapack.LAPACKE_dgels
.
These three capabilities let mikefc write an ~10 line script that exposes LAPACKE for us in Python. Granted, more work went into making a cffi
-compatible set of .h
files for the LAPACKE routines, but that was done by another set of Python scripts. py-liblapack
exposes all of the single
and double
(s
and d
) routines from LAPACKE; it would be straight-forward to also expose the complex
and double-precision complex
(c
and z
) routines.
To use the py-liblapack
code, all you need are two files from the source repository: liblapack.py
and liblapack.h
. Then, put the proper file names in liblapack.py
:
import os
from cffi import FFI
ffi = FFI()
this_dir = os.path.dirname(os.path.abspath(__file__)) + "/"
with open(this_dir + "liblapack.h") as header_file:
ffi.cdef(header_file.read())
liblapack = ffi.dlopen("libreflapacke.so")
In the examples below, I’ll assume that liblapack.py
lives in the same directory as my notebook and I’ll be import
ing it, shortly.
Generic Setup and Imports
import operator as op
import numpy as np
import numpy.linalg as nla
import scipy.linalg as sla
from cvxopt import matrix as cvx_matrix
import cvxopt.lapack as cla
%load_ext memory_profiler
def create_data(n=5000, p=100):
'''
n is number cases/observations/examples
p is number of features/attributes/variables
'''
X = np.random.uniform(0,10, (n,p))
coeffs = np.random.uniform(0,10, p)
def f(X): return X.dot(coeffs)
noise = np.random.normal(0.0, 2.0, n)
Y = f(X) + noise
Y = Y.reshape((n,1))
return X,Y
n_obs = 2**16
n_ftrs = 2**10
X,Y = create_data(n_obs,n_ftrs)
print "about", 10**int(np.log10(n_obs * n_ftrs)), "entries"
print "about", (X.itemsize * X.size) / float(1024**2), "MB"
liblapack
and Utilities
from liblapack import liblapack as lap
from cffi import FFI
ffi = FFI()
#
# numpy array to underlying C array
#
def get_dptr(arr):
return ffi.cast("double *", arr.__array_interface__['data'][0])
def get_iptr(arr):
return ffi.cast("int *", arr.__array_interface__['data'][0])
Cholesky via dposv
# posv
# http://www.netlib.org/lapack/explore-html/dc/d7f/lapacke__dposv_8c.html
# https://software.intel.com/en-us/node/520982
#
# lapack_int LAPACKE_dposv(int matrix_layout,
# char uplo,
# lapack_int n, lapack_int nrhs,
# double *a, lapack_int lda,
# double *b, lapack_int ldb)
def sls_cholesky_orig(X, Y):
XtX = X.T.dot(X)
XtY = X.T.dot(Y)
m,n = XtX.shape
nrhs = XtY.shape[1] if XtY.ndim == 2 else 1
info = lap.LAPACKE_dposv(lap.LAPACK_ROW_MAJOR,
"L", # prefer L.dot(L.T) factorization
n, nrhs,
get_dptr(XtX), n, # ldXtX = n
get_dptr(XtY), nrhs) # ldXtY = nrhs
assert not info
return XtY[:X.shape[1]]
# %memit sls_cholesky_orig(X,Y)
# peak memory: 581.86 MiB, increment: 11.87 MiB
LU via dgesv
# gesv
# http://www.netlib.org/lapack/explore-html/d6/de8/lapacke__dgesv_8c.html
# https://software.intel.com/en-us/node/520973
#
#lapack_int LAPACKE_dgesv(int matrix_layout,
# lapack_int n, lapack_int nrhs,
# double *a, lapack_int lda,
# lapack_int *ipiv,
# double *b, lapack_int ldb)
def sls_lu_orig(X,Y):
m,n = X.shape
nrhs = Y.shape[1] if Y.ndim == 2 else 1
workspace = ffi.new("int[]", n) # workspace array
info = lap.LAPACKE_dgesv(lap.LAPACK_ROW_MAJOR,
n, nrhs,
get_dptr(X), n, # ldX = n
workspace,
get_dptr(Y), nrhs) # ldY = nrhs
assert not info
return Y[:n]
# %memit sls_orig(X,Y)
# peak memory: 576.93 MiB, increment: 6.71 MiB
QR via dgels
# gels
# http://www.netlib.org/lapack/explore-html/d9/da2/lapacke__dgels_8c.html
# https://software.intel.com/en-us/node/521112
#
#
#lapack_int LAPACKE_dgels(int matrix_layout,
# char trans,
# lapack_int m, lapack_int n, lapack_int nrhs,
# double* a, lapack_int lda,
# double* b, lapack_int ldb);
def sls_qr_orig(X,Y):
m,n = X.shape
nrhs = Y.shape[1] if Y.ndim == 2 else 1
info = lap.LAPACKE_dgels(lap.LAPACK_ROW_MAJOR,
"N", # do not transpose
m, n, nrhs,
get_dptr(X), n,
get_dptr(Y), nrhs)
assert not info
return Y[:n]
# %memit sls_qr_orig(X,Y)
# peak memory: 1083.94 MiB, increment: 514.06 MiB
QR via dgels
: Take Two
def sls_qr_minwork(X,Y):
m,n = X.shape
nrhs = Y.shape[1] if Y.ndim == 2 else 1
# http://www.netlib.org/lapack/explore-html/d8/dde/dgels_8f_source.html
# min LWORK >= max( 1, MN + max( MN, NRHS ) )
work_size = X.size + max(X.size,nrhs)
work = np.zeros(work_size)
info = lap.LAPACKE_dgels_work(lap.LAPACK_ROW_MAJOR,
"N", # do not transpose
m, n, nrhs,
get_dptr(X), n,
get_dptr(Y), nrhs,
get_dptr(work), work_size)
assert not info
return Y[:n]
# %memit sls_qr_minwork(X,Y)
#peak memory: 1084.91 MiB, increment: 517.84 MiB
SVD via dgelss
# gelss
#
#
# http://www.netlib.org/lapack/explore-html/db/d77/lapacke__dgelss_8c.html
# https://software.intel.com/en-us/node/521114
#
#
#lapack_int LAPACKE_dgelss(int matrix_layout,
# lapack_int m, lapack_int n, lapack_int nrhs,
# double* a, lapack_int lda,
# double* b, lapack_int ldb,
# double* s,
# double rcond,
# lapack_int* rank )
#
def sls_svd_dgelss(X,Y):
m,n = X.shape
nrhs = Y.shape[1] if Y.ndim == 2 else 1
singular_values = np.empty(min(m,n))
rank = np.zeros(1, dtype=np.int64)
info = lap.LAPACKE_dgelss(lap.LAPACK_ROW_MAJOR,
m,n,nrhs,
get_dptr(X), n,
get_dptr(Y), nrhs,
get_dptr(singular_values),
-1,
get_iptr(rank))
return Y[:n]
# %memit sls_svd_dgelss(X,Y)
# peak memory: 1083.97 MiB, increment: 513.99 MiB
SVD via dgelsd
# gelsd
#
# http://www.netlib.org/lapack/explore-html/df/de3/lapacke__dgelsd_8c.html
# https://software.intel.com/en-us/node/521115
#
#
#lapack_int LAPACKE_dgelsd(int matrix_layout,
# lapack_int m, lapack_int n, lapack_int nrhs,
# double* a, lapack_int lda,
# double* b, lapack_int ldb,
# double* s,
# double rcond,
# lapack_int* rank );
#
def sls_svd_orig(X,Y):
m,n = X.shape
nrhs = Y.shape[1] if Y.ndim == 2 else 1
singular_values = np.empty(min(m,n))
rank = np.zeros(1, dtype=np.int64)
info = lap.LAPACKE_dgelsd(lap.LAPACK_ROW_MAJOR,
m,n,nrhs,
get_dptr(X), n,
get_dptr(Y), nrhs,
get_dptr(singular_values),
-1,
get_iptr(rank))
return Y[:n]
# %memit sls_svd_orig(X,Y)
#peak memory: 1084.62 MiB, increment: 514.18 MiB
Check for Correctness
Here, we test the code to make sure they all give the same \(\beta\)s. Note that we make a copy because most of these routines modify X
adn Y
in-place. This differs a bit from our semantics of early posts, but it isn’t terribly important because within this post, we are still comparing apples to apples. When we go back to the outside world, we may still need to make copies, depending on our use-cases for X
and Y
.
_X,_Y = create_data(n_obs,n_ftrs)
X, Y = np.copy(_X), np.copy(_Y)
betas_chol = sls_cholesky_orig(X,Y)
X, Y = np.copy(_X), np.copy(_Y)
betas_qr = sls_qr_orig(X,Y)
X, Y = np.copy(_X), np.copy(_Y)
betas_svd = sls_svd_orig(X,Y)
betas = [betas_chol, betas_qr, betas_svd]
print "All same betas?:", all(np.allclose(b1, b2) for b1 in betas[:-1] for b2 in betas[1:])
And Do a Quick Timing Run
%timeit sls_cholesky_orig(X,Y)
%timeit sls_lu_orig(X,Y)
%timeit sls_qr_orig(X,Y)
%timeit sls_qr_minwork(X,Y)
%timeit sls_svd_dgelss(X,Y)
%timeit sls_svd_orig(X,Y)
Summary Results
Algorithm | Called Via | Memory Increment(MB) |
~Time(s) |
---|---|---|---|
Cholesky | dposv |
12 | 1.5 |
LU | dgesv |
7 | >0.0 |
QR | dgels |
514 | 9 |
QR (min LWORK) | dgels |
517 | 9 |
SVD | dgelss |
514 | 9 |
SVD | dgelsd |
514 | 9 |
As you can see, things are still not "right". And there are some hints running around in the code and discussion above as to what the remaining issue is. I won’t spill the beans, but stay tuned for next time.
Additional Resources
You can grab a copy of this notebook.
Even better, you can view it using nbviewer.
License
Unless otherwise noted, the contents of this notebook are under the following license. The code in the notebook should be considered part of the text (i.e., licensed and treated as as follows).
DrsFenner.org Blog And Notebooks by Mark and Barbara Fenner is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Permissions beyond the scope of this license may be available at drsfenner.org/blog/about-and-contacts.