Category Archives: SciMathStatPython

Into the Weeds II – LAPACK, LAPACKE, and Two Paths without Copies

In the last post, we tried to use lower-level LAPACK/LAPACKE calls to remove some of the copying overhead of our operations. Well, it turns out that it didn’t work out so well (see the table at the end of the post). The fundamental reason is that LAPACKE took the easy way out when it wrapped the LAPACK functions: whenever there is a C-/Fortran-ordering mismatch, LAPACKE simply makes a copy. Continue reading
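The cost of that ordering mismatch is visible from NumPy alone: converting a C-ordered (row-major) matrix to the Fortran (column-major) layout that LAPACK expects forces a full copy. A minimal sketch (NumPy only; no LAPACKE call needed to see the effect):

```python
import numpy as np

# A 2-D NumPy array is C-ordered (row-major) by default.
a = np.arange(6.0).reshape(2, 3)
print(a.flags['C_CONTIGUOUS'])   # True

# LAPACK wants Fortran (column-major) data; converting forces a full
# copy for any array that is not already F-contiguous.
f = np.asfortranarray(a)
print(np.shares_memory(a, f))    # False -- a copy was made
print(f.flags['F_CONTIGUOUS'])   # True
```

This is exactly the copy LAPACKE performs behind the scenes when handed row-major data.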

Into the Weeds I – LAPACK, LAPACKE, and Three+ Paths

In the last post, we looked at three ways of solving the least-squares regression problem (Cholesky, QR, and SVD decompositions) using LAPACK routines in a round-about fashion. Each decomposition (i.e., factorization) was performed by (1) calling a NumPy np.linalg routine that in turn called (2) a LAPACK driver routine for a "bigger" problem like linear-equation solving or least squares. The driver routine invoked one of the decompositions internally. Continue reading
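To make the "driver routine" idea concrete: np.linalg.solve is a thin wrapper over LAPACK's linear-equation driver (gesv, which does an LU factorization internally), while np.linalg.lstsq wraps the least-squares driver gelsd (which uses an SVD internally). The factorization happens inside the driver, never exposed to the caller. A quick sketch showing the two routes agree on a least-squares problem:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)

# np.linalg.solve drives LAPACK's gesv (LU inside); it needs a
# square system, so feed it the normal equations.
x_solve = np.linalg.solve(A.T @ A, A.T @ b)

# np.linalg.lstsq drives LAPACK's gelsd (SVD inside) on the
# rectangular problem directly.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_solve, x_lstsq))  # True
```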

Three Paths to Least Squares Linear Regression

In the prior installment, we talked about solving the basic problem of linear regression using a standard least-squares approach. Along the way, I showed a few methods that gave the same solution to the least-squares problem. Today, I want to look at three methods for solving linear regression:

  1. solving the full normal equations via Cholesky decomposition
  2. solving a least-squares problem via QR decomposition
  3. solving a least-squares problem via SVD (singular value decomposition)
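The three paths above can be sketched with NumPy building blocks (the post's actual routines may differ; this just shows that all three agree on a well-conditioned problem):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
y = rng.standard_normal(50)

# 1. Normal equations via Cholesky: X^T X = L L^T, then two
#    triangular solves (np.linalg.solve does not exploit the
#    triangular structure, but the answer is the same).
L = np.linalg.cholesky(X.T @ X)
beta_chol = np.linalg.solve(L.T, np.linalg.solve(L, X.T @ y))

# 2. Least squares via QR: X = QR, then solve R beta = Q^T y.
Q, R = np.linalg.qr(X)
beta_qr = np.linalg.solve(R, Q.T @ y)

# 3. Least squares via SVD: X = U S V^T, then beta = V S^{-1} U^T y.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
beta_svd = Vt.T @ ((U.T @ y) / s)

print(np.allclose(beta_chol, beta_qr))  # True
print(np.allclose(beta_qr, beta_svd))   # True
```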

Continue reading

Linear Regression Basic Solution

Before we diverged into an investigation of representation/storage of Python variables \(\mathtt{X}\) and \(\mathtt{Y}\) (math variables \(\bf{X}\) and \(\bf{Y}\)), we had the following equation:

\[\min RSS(\beta)=(\bf{y}-\bf{X}\beta)^T(\bf{y}-\bf{X}\beta)\]

The question is, how do we solve it? Well, if you recall your calculus, you should remember (or I’ll remind you) that we can optimize (find the minima and maxima of) a function by taking derivatives. Let’s back up to the summation version:

\[RSS(\beta)=\sum_{i=1}^N (y_i-(\beta_0+\sum_{j=1}^p x_{ij}\beta_j))^2\]
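The matrix form and the summation form compute the same quantity, with the intercept \(\beta_0\) folded into \(\bf{X}\) as a leading column of ones. A quick numerical check:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 20, 3
x = rng.standard_normal((N, p))
y = rng.standard_normal(N)
beta = rng.standard_normal(p + 1)   # beta_0 plus p slopes

# Matrix form: RSS = (y - X beta)^T (y - X beta), with the intercept
# folded into X as a leading column of ones.
X = np.column_stack([np.ones(N), x])
r = y - X @ beta
rss_matrix = r @ r

# Summation form: sum_i (y_i - (beta_0 + sum_j x_ij beta_j))^2
rss_sum = sum((y[i] - (beta[0] + sum(x[i, j] * beta[j + 1]
                                     for j in range(p))))**2
              for i in range(N))

print(np.isclose(rss_matrix, rss_sum))  # True
```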

Continue reading

LinearRegression And Notation

The precision of mathematical notation is a two-edged sword. On one side, it should clearly and concisely represent formal statements of mathematics. On the other side, sloppy notation, abbreviations, and time/space-saving shortcuts (mainly in the writing of mathematics) can lead to ambiguity — which is a time sink for the reader! Even acceptable mathematical notation can hide important aspects of the mathematics. Notation can leave underlying assumptions hidden from view, and these assumptions may not surface until a problem solution, a working implementation, or a model application exposes the deficiency. Since I want to write a few posts on linear regression, I’m going to look at an example of the downstream effects that notation can cause … Continue reading

Game of Life in NumPy

In a previous post, we explored generating the grid-neighborhoods of an element in a NumPy array. The punch line of that work was grid_nD, shown below.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import time
from IPython import display

%matplotlib inline

from numpy.lib.stride_tricks import as_strided
In [2]:
def grid_nD(arr):
    # Each axis needs at least one interior element flanked by two
    # neighbors, so every dimension must be longer than 2.
    assert all(_len > 2 for _len in arr.shape)

    nDims = len(arr.shape)
    # One 3 x ... x 3 neighborhood per interior element of arr.
    newShape = [_len - 2 for _len in arr.shape]
    newShape.extend([3] * nDims)

    # Reuse arr's strides for both the "which neighborhood" axes and
    # the "within a neighborhood" axes -- no data is copied.
    newStrides = arr.strides + arr.strides
    return as_strided(arr, shape=newShape, strides=newStrides)
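A quick look at what grid_nD returns: for a 2-D array, one 3×3 neighborhood per interior element (grid_nD is repeated here so the sketch stands alone):

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def grid_nD(arr):
    # As defined above.
    assert all(_len > 2 for _len in arr.shape)
    nDims = len(arr.shape)
    newShape = [_len - 2 for _len in arr.shape]
    newShape.extend([3] * nDims)
    return as_strided(arr, shape=newShape,
                      strides=arr.strides + arr.strides)

board = np.arange(25).reshape(5, 5)
hoods = grid_nD(board)

# One 3x3 neighborhood per interior element of the 5x5 board.
print(hoods.shape)   # (3, 3, 3, 3)

# The neighborhood centered on board[1, 1] is its 3x3 patch.
print(hoods[0, 0])
# [[ 0  1  2]
#  [ 5  6  7]
#  [10 11 12]]
```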

Continue reading

Game of Life in NumPy (Preliminaries)

You might be familiar with Conway’s Game of Life. It is a simple game based on the idea of modeling a dynamic system progressing over time. We progress from discrete time points \(t_i \rightarrow t_{i+1}\) and apply very simple rules at each time point. The game progresses on a square grid of discrete points. Each cell on the grid has eight neighbors (the four cardinal directions, plus diagonals). The progression to the next time interval is given by the following:

  • If I’m alive and I have fewer than two alive neighbors, I die of loneliness.
  • If I’m alive and I have two or three alive neighbors, I live.
  • If I’m alive and I have more than three alive neighbors, I die of starvation.
  • If I’m dead and I have three (exactly) live neighbors, I become alive by spontaneous combustion.
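Given a 0/1 board and a same-shaped array of live-neighbor counts, the four rules above collapse to one vectorized expression — a sketch of where this series is headed (the neighbor counting itself is the subject of the follow-up post):

```python
import numpy as np

def step_rules(alive, neighbors):
    """Apply the four Life rules elementwise.

    alive     -- 0/1 (or boolean) board
    neighbors -- count of live neighbors for each cell
    """
    alive = alive.astype(bool)
    # Live with 2 or 3 neighbors survives; dead with exactly 3 is
    # born. Every other case (loneliness, starvation) dies.
    survive = alive & ((neighbors == 2) | (neighbors == 3))
    born = ~alive & (neighbors == 3)
    return (survive | born).astype(int)

# A lonely live cell (0 neighbors) dies; a dead cell with exactly
# 3 live neighbors comes alive.
alive = np.array([[1, 0], [0, 1]])
neighbors = np.array([[0, 3], [1, 2]])
print(step_rules(alive, neighbors))
# [[0 1]
#  [0 1]]
```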

Typically, these rules are applied with loops and conditionals. I want to show off a lesser-known corner of NumPy: as_strided, from the dark little module that is stride_tricks. We’ll get to know as_strided today and, in a future post, we’ll play the Game of Life using stride tricks – woot! Continue reading

Sorting Friends and Permutations

Introduction

You may be familiar with permutations: a rearrangement of a sequence. If you are, you probably know that for a sequence of length \(n\), there are \(n!\) arrangements (if the elements are all distinct – duplicates reduce the number of unique rearrangements). There are many, many uses of permutations and many specific permutations of interest.
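The \(n!\) count (and the reduction caused by duplicates) is easy to verify for small \(n\) with the standard library:

```python
import math
from itertools import permutations

seq = 'abcd'
perms = list(permutations(seq))
# All distinct elements: exactly n! arrangements.
print(len(perms), math.factorial(len(seq)))  # 24 24

# With duplicates, the number of *distinct* arrangements shrinks:
# 'aab' has 3!/2! = 3 unique rearrangements, not 3! = 6.
print(len(set(permutations('aab'))))  # 3
```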

Continue reading

Some Iron Condor Heuristics

In the world of Iron Condors (an option position composed of a short, higher-strike (bear) call spread and a short, lower-strike (bull) put spread), there are two heuristics that are bandied about without much comment. I’m generally unwilling to accept some Internet author’s word for something (unless it completely gels with everything I know), so I set out to convince myself that these heuristics held. Here are some demonstrations (I hesitate to use the word proof) that convinced me. Continue reading