In the last post, we tried to use lower-level LAPACK/LAPACKE calls to remove some of the copying overhead of our operations. Well, it turns out that it didn’t work out so well (see the table at the end of the post). The fundamental reason is that LAPACKE took the easy way out when it wrapped the LAPACK functions. In particular, whenever there is a C-/Fortran-ordering mismatch, LAPACKE simply makes a copy.
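To make the ordering issue concrete, here is a small NumPy sketch (my own illustration, not code from the post). NumPy arrays are C-ordered (row-major) by default, while LAPACK works on Fortran-ordered (column-major) storage. A transpose is a copy-free view whose memory is already in Fortran order; forcing the original array itself into Fortran order with np.asfortranarray allocates a new buffer, which is the same kind of copy described above.

```python
import numpy as np

# A 3 x 2 matrix stored in NumPy's default C (row-major) order
A = np.arange(6, dtype=np.float64).reshape(3, 2)
print(A.flags['C_CONTIGUOUS'], A.flags['F_CONTIGUOUS'])    # True False

# Transposing only swaps the strides: no data moves, and the result
# is a Fortran-ordered (column-major) view of the same memory
At = A.T
print(At.flags['F_CONTIGUOUS'], np.shares_memory(A, At))   # True True

# Asking for a Fortran-ordered version of A itself forces a new buffer
A_f = np.asfortranarray(A)
print(A_f.flags['F_CONTIGUOUS'], np.shares_memory(A, A_f))  # True False
print(np.array_equal(A, A_f))                               # True: same values, new layout
```

The design point is that the layout, not the values, is what differs: a row-major caller can sometimes hand LAPACK the transpose for free, but when the problem genuinely needs the untransposed matrix in column-major order, a copy is unavoidable.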
Author Archives: Mark Fenner
Into the Weeds I – LAPACK, LAPACKE, and Three+ Paths
In the last post, we looked at three ways of solving the least-squares regression problem (Cholesky, QR, and SVD decompositions) using LAPACK routines in a roundabout fashion. The different decompositions (i.e., factorizations) were performed by (1) calling a NumPy np.linalg routine that called (2) an LAPACK driver routine for a "bigger" problem like linear-equation solving or least squares. The driver routine called one of the decompositions internally.
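Here is a minimal sketch of that driver-level route on made-up data: one np.linalg call per solve, with the factorization hidden inside LAPACK. NumPy's documentation states that np.linalg.solve uses LAPACK's _gesv (an LU-based driver); as far as I know, np.linalg.lstsq dispatches to the SVD-based _gelsd driver, but treat that second detail as an assumption about current NumPy builds rather than something taken from the post.

```python
import numpy as np

# Hypothetical toy data: a column of ones (intercept) plus one feature
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.normal(size=20)])
y = X @ np.array([1.0, 2.0]) + 0.1 * rng.normal(size=20)

# Least-squares driver: one call, the factorization happens inside LAPACK
beta_lstsq, residuals, rank, svals = np.linalg.lstsq(X, y, rcond=None)

# Linear-equation driver applied to the normal equations X^T X beta = X^T y
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

print(np.allclose(beta_lstsq, beta_normal))  # True for this well-conditioned X
```

Neither call exposes the factor it computed, which is exactly why reaching the decompositions directly requires going one level lower.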
Three Paths to Least Squares Linear Regression
In the prior installment, we talked about solving the basic problem of linear regression using a standard least-squares approach. Along the way, I showed a few methods that gave the same solution to the least-squares problem. Today, I want to look at three methods for solving linear regression:
- solving the full normal equations via Cholesky decomposition
- solving a least-squares problem via QR decomposition
- solving a least-squares problem via SVD (singular value decomposition)
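A compact NumPy sketch of the three routes follows, using explicit factorizations rather than whatever specific calls the full post makes; the toy data and the function names fit_cholesky, fit_qr, and fit_svd are hypothetical. NumPy has no dedicated triangular solver, so np.linalg.solve stands in for the forward and back substitutions: correct, but it does not exploit the triangular structure.

```python
import numpy as np

def fit_cholesky(X, y):
    # Normal equations (X^T X) beta = X^T y, with X^T X = L L^T
    L = np.linalg.cholesky(X.T @ X)
    z = np.linalg.solve(L, X.T @ y)      # forward substitution: L z = X^T y
    return np.linalg.solve(L.T, z)       # back substitution: L^T beta = z

def fit_qr(X, y):
    # Reduced QR: X = Q R, so minimizing ||y - X beta|| gives R beta = Q^T y
    Q, R = np.linalg.qr(X)
    return np.linalg.solve(R, Q.T @ y)

def fit_svd(X, y):
    # Thin SVD: X = U diag(s) V^T, so beta = V diag(1/s) U^T y
    # (assumes full column rank, i.e. no zero singular values)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt.T @ ((U.T @ y) / s)

rng = np.random.default_rng(42)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
y = X @ np.array([0.5, -1.0, 3.0]) + 0.05 * rng.normal(size=50)

betas = [f(X, y) for f in (fit_cholesky, fit_qr, fit_svd)]
print(all(np.allclose(betas[0], b) for b in betas[1:]))  # True
```

The trade-off is the usual one: Cholesky on \(\bf{X}^T\bf{X}\) is cheapest but squares the condition number, while QR and SVD work on \(\bf{X}\) directly at extra cost.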
Linear Regression Basic Solution
Before we diverged into an investigation of representation/storage of Python variables \(\mathtt{X}\) and \(\mathtt{Y}\) (math variables \(\bf{X}\) and \(\bf{Y}\)), we had the following equation:
\[\min_{\beta} RSS(\beta), \qquad RSS(\beta)=(\bf{y}-\bf{X}\beta)^T(\bf{y}-\bf{X}\beta)\]
The question is, how do we solve it? Well, if you recall your calculus, you should remember (or I’ll remind you) that we can optimize a function (find its minima and maxima) by taking derivatives and setting them to zero. Let’s back up to the summation version:
\[RSS(\beta)=\sum_{i=1}^N \left(y_i-(\beta_0+\sum_{j=1}^p x_{ij}\beta_j)\right)^2\]
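Carrying out that calculus step gives the normal equations. This is a worked sketch, assuming \(\bf{X}\) is the \(N \times (p+1)\) matrix whose first column is all ones (written \(x_{i0} \equiv 1\)), so that \(\beta_0\) is absorbed into \(\beta\). Differentiating with respect to each coefficient and setting the result to zero:

\[\frac{\partial RSS}{\partial \beta_j} = -2\sum_{i=1}^N x_{ij}\left(y_i-(\beta_0+\sum_{k=1}^p x_{ik}\beta_k)\right) = 0, \qquad j = 0, 1, \dots, p\]

Stacking those \(p+1\) equations gives the matrix form:

\[\frac{\partial RSS}{\partial \beta} = -2\bf{X}^T(\bf{y}-\bf{X}\beta) = 0 \quad\Longrightarrow\quad \bf{X}^T\bf{X}\beta = \bf{X}^T\bf{y}\]

If \(\bf{X}^T\bf{X}\) is invertible, \(\hat\beta = (\bf{X}^T\bf{X})^{-1}\bf{X}^T\bf{y}\); in practice, we solve the system rather than forming the inverse.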
LinearRegression And Notation
The precision of mathematical notation is a two-edged sword. On one side, it should clearly and concisely represent formal statements of mathematics. On the other side, sloppy notation, abbreviations, and time/space-saving shortcuts (mainly in the writing of mathematics) can lead to ambiguity, which is a time sink for the reader! Even acceptable mathematical notation can hide important aspects of the mathematics. Notation can leave underlying assumptions hidden from view, and those assumptions may stay hidden until a problem solution, a working implementation, or a model application exposes the deficiency. Since I want to write a few posts on linear regression, I’m going to look at an example of the downstream effects that notation can cause …
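Here is one way the storage behind the notation can bite. This is an illustration of the general point, not necessarily the example the full post uses, and the data are hypothetical. The math treats \(\bf{y}\) as an \(N \times 1\) column vector, but in NumPy a 1-D array of shape (N,) and a 2-D array of shape (N, 1) are different objects, and broadcasting silently turns a mismatched residual into an \(N \times N\) matrix.

```python
import numpy as np

# Hypothetical data: N = 4 observations, columns = (intercept, feature)
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y_1d = np.array([1.0, 3.0, 5.0, 7.0])   # shape (4,)
y_col = y_1d.reshape(-1, 1)             # shape (4, 1): the column vector of the math

beta_col = np.array([[1.0], [2.0]])     # beta stored as a (2, 1) column

# Column minus column: a (4, 1) residual, as the math intends
r_ok = y_col - X @ beta_col
print(r_ok.shape)                       # (4, 1)

# 1-D y minus a (4, 1) column broadcasts to (4, 4): no error, wrong object
r_bad = y_1d - X @ beta_col
print(r_bad.shape)                      # (4, 4)

# RSS = r^T r is a 1 x 1 matrix in the correct case
print((r_ok.T @ r_ok).item())           # 0.0, since y = 1 + 2x exactly
```

Nothing in \((\bf{y}-\bf{X}\beta)^T(\bf{y}-\bf{X}\beta)\) tells you which storage you have; only the shapes do.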
Game of Life in NumPy
In a previous post, we explored generating the grid-neighborhoods of an element in a NumPy array. The punch line of that work was grid_nD, shown below.
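import numpy as np
import matplotlib.pyplot as plt
import time
from IPython import display
%matplotlib inline

from numpy.lib.stride_tricks import as_strided

def grid_nD(arr):
    # every axis needs at least 3 entries to hold one full 3-wide neighborhood
    assert all(_len > 2 for _len in arr.shape)
    nDims = len(arr.shape)
    # one neighborhood per interior element (one edge row/column trimmed per side) ...
    newShape = [_len - 2 for _len in arr.shape]
    # ... and each neighborhood is a 3 x 3 x ... x 3 block
    newShape.extend([3] * nDims)
    # stepping to the next neighborhood and stepping within a neighborhood
    # both move by one element along an axis, so the strides are repeated
    newStrides = arr.strides + arr.strides
    # a view into arr's memory: no data is copied
    return as_strided(arr, shape=newShape, strides=newStrides)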
Game of Life in NumPy (Preliminaries)
You might be familiar with Conway’s Game of Life. It is a simple game based on the idea of modeling a dynamic system progressing over time. Time advances in discrete steps, \(t_i \rightarrow t_{i+1}\), and we apply very simple rules at each step. The game is played on a square grid of discrete cells. Each cell on the grid has eight neighbors (the four cardinal directions, plus the four diagonals). The transition to the next time step is given by the following rules:
- If I’m alive and I have fewer than two alive neighbors, I die of loneliness.
- If I’m alive and I have two or three alive neighbors, I live.
- If I’m alive and I have more than three alive neighbors, I die of starvation.
- If I’m dead and I have three (exactly) live neighbors, I become alive by spontaneous combustion.
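As a plain-NumPy baseline (my own sketch, not the post’s code, and deliberately not using stride tricks), the four rules collapse into two conditions: a cell is alive at the next step if it has exactly three live neighbors, or if it is alive now and has exactly two. The hypothetical helper life_step below counts neighbors by summing eight shifted slices of a zero-padded board, so cells beyond the edge count as dead.

```python
import numpy as np

def life_step(board):
    """One Game of Life update on a 2-D array of 0/1 values (edges treated as dead)."""
    padded = np.pad(board, 1)                      # border of zeros (dead cells)
    rows, cols = board.shape
    # Sum the eight shifted copies of the board: one per neighbor direction
    neighbors = sum(
        padded[1 + dr : 1 + dr + rows, 1 + dc : 1 + dc + cols]
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # Exactly 3 neighbors: birth or survival. Alive with exactly 2: survival.
    alive_next = (neighbors == 3) | ((board == 1) & (neighbors == 2))
    return alive_next.astype(board.dtype)

# A "blinker": a horizontal bar of three turns vertical, then back again
blinker = np.zeros((5, 5), dtype=int)
blinker[2, 1:4] = 1
step1 = life_step(blinker)
print(step1)
print(np.array_equal(life_step(step1), blinker))   # True
```

Zero-padding is a design choice; wrapping the grid into a torus (e.g. with np.roll) is the other common convention and changes the behavior at the edges.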
Typically, these rules are applied by loops and conditionals. I want to show off a lesser-known part of NumPy: as_strided, from NumPy’s dark corner of stride_tricks. We’ll get to know as_strided today and, in a future post, we’ll play the Game of Life using stride tricks. Woot!
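As a first taste of as_strided, here is a minimal 1-D sketch (my own illustration): a view of every length-3 window of an array, built only by choosing a shape and strides, with no data copied. as_strided does no bounds checking, so a wrong shape or stride silently reads arbitrary memory; NumPy 1.20 and later also provide the safer sliding_window_view in the same module.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(6)                      # [0 1 2 3 4 5]
window = 3
n_windows = a.size - window + 1       # 4 windows fit

# Both the step to the next window and the step within a window
# advance by one element, so the same stride appears twice
windows = as_strided(a, shape=(n_windows, window),
                     strides=(a.strides[0], a.strides[0]))
print(windows)
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]
#  [3 4 5]]

print(np.shares_memory(a, windows))   # True: a view, not a copy
print(windows.sum(axis=1))            # [ 3  6  9 12]
```

The grid_nD function above is the same idea generalized: one set of strides to move between neighborhoods and a repeated set to move within each one.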
Pushing IPython Notebooks to WordPress
It turns out that I really, really like writing documents as IPython notebooks, at least shorter documents. I’m also (as you can tell, if you are reading this) into blogging about my technical (and recreational) endeavors. Since blog posts are shorter documents (Captain Obvious?), it is a small step to wanting to do my blog posting, or at least the posts with Python code, as IPython notebooks. However, WordPress is primarily a PHP and database mashup, and it doesn’t work well (natively) with Python or IPython.
This has led several folks (like Jake VdV) to jump ship from WordPress to Pelican and other folks (like Matt R) to not use notebooks directly in their posts. For myself, I decided to explore another option (because I already had web hosting and I didn’t want to spin up additional servers, migrate my blog yet again, etc.).
Visualizing Probabilities in a Venn Diagram
I came across a little problem that lends itself nicely to visualization. And, it’s the kind of visualization that many of us have seen before: Venn diagrams.
(YAGD) Yet Another Git Demo
There are many, many git tutorials out there. http://rogerdudler.github.io/git-guide/ is pretty cool and gives a nice overview. However, it doesn’t quite set you up to play around with the commands. Here is a completely self-contained walk-through that doesn’t require any networking to experiment with a reasonable git setup.