Author Archives: Mark Fenner

Into the Weeds II – LAPACK, LAPACKE, and Two Paths without Copies

In the last post, we tried to use lower-level LAPACK/LAPACKE calls to remove some of the copying overhead of our operations. Well, it turns out that it didn’t work out so well (see the table at the end of the post). The fundamental reason why is that LAPACKE took the easy way out, when they decided to wrap LAPACK functions. In particular, whenever there is a C-/Fortran-ordering issue (for a reminder), LAPACKE simply makes a copy. Continue reading →

Into the Weeds I – LAPACK, LAPACKE, and Three+ Paths

In the last post, we looked at three ways of solving the least-squares regression problem (Cholesky, QR, and SVD decompositions) using LAPACK routines in a round-about fashion. The different decompositions (i.e., factorizations) were performed by (1) calling a NumPy np.linalg routine that called (2) an LAPACK driver routine for a "bigger" problem like linear equation solving or least-squares. The driver routine called one of the decompositions internally. Continue reading →

Three Paths to Least Squares Linear Regression

In the prior installment, we talked about solving the basic problem of linear regression using a standard least-squares approach. Along the way, I showed a few methods that gave the same results to the least squares problem. Today, I want to look at three methods for solving linear regression:

solving the full normal equations via Cholesky decomposition
solving a least-squares problem via QR decomposition
solving a least-squares problem via SVD (singular value decomposition)

Continue reading →

Linear Regression Basic Solution

Before we diverged into an investigation of representation/storage of Python variables \(\mathtt{X}\) and \(\mathtt{Y}\) (math variables \(\bf{X}\) and \(\bf{Y}\)), we had the following equation:

\[\min RSS(\beta)=(\bf{y}-\bf{X}\beta)^T(\bf{y}-\bf{X}\beta)\]

The question is, how do we solve it? Well, if you recall your calculus, you should remember (or I’ll remind you) that we can optimize (find the minima and maxima) of function by taking derivatives. Let’s back up to the summation version:

\[RSS(\beta)=\sum_{i=1}^N (y_i-(\beta_0+\sum_{j=1}^px_{i}\beta_j))^2\]

Continue reading →

LinearRegression And Notation

The precision of mathematical notation is a two-edged sword. On one side, it should clearly and concisely represent formal statements of mathematics. On the other side, sloppy notation, abbreviations, and time/space-saving shortcuts (mainly in the writing of mathematics) can lead to ambiguity — which is a time sink for the reader! Even acceptable mathematical notation can hide important aspects of the mathematics. Notation can leave underlying assumptions hidden from view, and these assumptions may not be revealed until a problem solution, working implementation, or model application reveal the deficiency. Since I want to write a few posts on linear regression, I’m going to look at an example of the downstream effects that notation can cause … Continue reading →

Game of Life in NumPy

In a previous post, we explored generating the grid-neighborhoods of an element in a NumPy array. The punch line of that work was grid_nD, shown below.

In [1]:

import numpy as np
import matplotlib.pyplot as plt
import time
from IPython import display

%matplotlib inline

from numpy.lib.stride_tricks import as_strided

In [2]:

def grid_nD(arr):
    assert all(_len>2 for _len in arr.shape)
    
    nDims = len(arr.shape)
    newShape = [_len-2 for _len in arr.shape]
    newShape.extend([3] * nDims)
    
    newStrides = arr.strides + arr.strides
    return as_strided(arr, shape=newShape, strides=newStrides)

Continue reading →

Game of Life in NumPy (Preliminaries)

You might be familiar with Conway’s Game of Life. It is a simple game based on the idea of modeling a dynamic system progressing over time. We progress from discrete time points \(t_i \rightarrow t_i+1\) and apply very simple rules at each time point. The game progresses on a square grid of discrete points. Each cell on the grid has eight neighbors (the four cardinal directions, plus diagonals). The progression to the next time interval is given by the following:

If I’m alive and I have fewer than two alive neighors, I die of loneliness.
If I’m alive and I have two or three alive neighbors, I live.
If I’m alive and I have more than three alive neighbors, I die of starvation.
If I’m dead and I have three (exactly) live neighbors, I become alive by spontaneous combustion.

Typically, these rules are applied by loops and conditionals. I want to show-off a lesser known corner of NumPy by looking at as_strided from NumPy’s dark corner of stride_tricks. We’ll get to know as_strided today and, in a future post, we’ll play the game of life using stride tricks – woot! Continue reading →

Pushing IPython Notebooks to WordPress

It turns out that I really, really like writing documents as iPython notebooks – at least shorter documents. I’m also (as you can tell, if you are reading this) into blogging about my technical (and recreational) endavors. Since blog posts are shorter documents … Captain Obvious? … it is a small step to wanting to do my blog posting, or much of it … or any of it with Python code, as an iPython notebook. However, WordPress is primarily a php and database mashup and it doesn’t really work well (natively) with Python or iPython.

This has led several folks (like Jake VdV) to bail ship from WordPress to Pelican) and other folks (like Matt R) to not use notebooks directly in their posts. For myself, I decided to explore another option (b/c I already had web hosting and I didn’t want to spin up additional servers, migrate my blog — again, etc.).

Continue reading →

Visualizing Probabilities in a Venn Diagram

I came across a little problem that lends itself nicely to visualization. And, it’s the kind of visualization that many of us have seen before: Venn diagrams. Continue reading →

(YAGD) Yet Another Git Demo

There are many, many git tutorials out there. http://rogerdudler.github.io/git-guide/ is pretty cool and gives a nice overview. However, it doesn’t quite set you up to play around with the commands. Here is a completely self-contained walk-through that doesn’t require any networking to experiment with a reasonable git setup. Continue reading →

drsfenner.org

Blogging our way through …

Author Archives: Mark Fenner

Into the Weeds II – LAPACK, LAPACKE, and Two Paths without Copies

Into the Weeds I – LAPACK, LAPACKE, and Three+ Paths

Three Paths to Least Squares Linear Regression

Linear Regression Basic Solution

LinearRegression And Notation

Game of Life in NumPy

Game of Life in NumPy (Preliminaries)

Pushing IPython Notebooks to WordPress

Visualizing Probabilities in a Venn Diagram

(YAGD) Yet Another Git Demo