{"id":373,"date":"2015-11-11T15:54:33","date_gmt":"2015-11-11T15:54:33","guid":{"rendered":"http:\/\/drsfenner.org\/blog\/?p=373"},"modified":"2025-02-12T00:39:07","modified_gmt":"2025-02-12T00:39:07","slug":"linearregression-and-notation","status":"publish","type":"post","link":"https:\/\/drsfenner.org\/blog\/2015\/11\/linearregression-and-notation\/","title":{"rendered":"LinearRegression And Notation"},"content":{"rendered":"<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\"><\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>The precision of mathematical notation is a two-edged sword. On one side, it <em>should<\/em> clearly and concisely represent formal statements of mathematics. On the other side, sloppy notation, abbreviations, and time\/space-saving shortcuts (mainly in the writing of mathematics) can lead to ambiguity &#8212; which is a time sink for the reader! Even acceptable mathematical notation can hide important aspects of the mathematics. Notation can leave underlying assumptions hidden from view, and these assumptions may not be revealed until a problem solution, working implementation, or model application reveal the deficiency. Since I want to write a few posts on linear regression, I&#8217;m going to look at an example of the downstream effects that notation can cause &#8230; <!--more--><\/p>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\"><\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p> Since I happen to have Hastie, et al.&#8217;s learning book (<em>The Elements of Statistical Learning<\/em>) open and in front of me, I&#8217;m going to follow along with their easy walkthrough of linear regression (namely, multiple linear regression with least squares). 
Immediately, they start out with a problem representation and then they quickly change the representation on you. And, I recall this being a hang up for me (even in grad. school). Maybe I&#8217;m admitting I&#8217;m not the quickest mathematician out there (I&#8217;m not &#8212; you can figure out whether that&#8217;s a statement about my admission or my mathematical skills). But I&#8217;m hoping to add a pedagogical and computational flavor to the normal presentations of mathematical machine learning (which has a high degree of overlap with computational statistics).<\/p>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\"><\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<h3 id=\"the-case-of-the-disappearing-coefficient\">The Case of the Disappearing Coefficient<\/h3>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\"><\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>So, here goes. In the first formulation, we have (2.1):<br \/>\n<span class=\"math display\">\\[\\hat{Y} = \\hat{\\beta}_0 + \\sum_{j=1}^p X_j \\hat{\\beta}_j\\]<\/span><\/p>\n<p>In the second formulation (2.2), we have <span class=\"math display\">\\[\\hat{Y} = \\sum_{j=0}^p X_j \\hat{\\beta}_j\\]<\/span><\/p>\n<p>It might be fairly clear (how&#8217;s that for equivocation) that the <span class=\"math inline\">\\(\\hat{\\beta}_0\\)<\/span> has been rolled into the summation. 
However, what is less clear, and possibly lost in the text, is:<\/p>\n<blockquote><p>Often it is convenient to include the constant variable 1 in <span class=\"math inline\">\\(X\\)<\/span> and include <span class=\"math inline\">\\(\\hat{\\beta}_0\\)<\/span> in the vector of coefficients <span class=\"math inline\">\\(\\hat{\\beta}\\)<\/span><\/p><\/blockquote>\n<p>Nod your head, act like you know what&#8217;s going on (nothing to see here), and move along. Maybe you (advanced reader) feel like I&#8217;m overstating the confusion such changes can instigate. I&#8217;m not. Having taught undergraduates (strong, weak, technical, <em>a<\/em>technical, and everyone in-between) for 15 years, I&#8217;m not overstating. Less tongue-in-cheek, this modification of <span class=\"math inline\">\\(X\\)<\/span> can also be looked at as a transition from the <em>data matrix<\/em> to a <em>design matrix<\/em>.<\/p>\n<p>There are also deeper (modeling) issues that occur with the presence or absence of <span class=\"math inline\">\\(\\hat{\\beta}_0\\)<\/span> (note, both formulas above have it present &#8212; just in different forms). <span class=\"math inline\">\\(\\hat{\\beta}_0\\)<\/span> is called the <em>constant<\/em> or <em>intercept<\/em> term. Briefly (at the risk of being too brief and misleading), if (1) <span class=\"math inline\">\\({\\beta}_0=0\\)<\/span> (you have background information that the actual value is zero), (2) you are using a different type of linear model &#8211; anova\/categorical data, or (3) using standardized data, you can get away without <span class=\"math inline\">\\(\\hat{\\beta}_0\\)<\/span>. Otherwise, you want to have the <em>0<\/em>-th term because it will give unbiased <span class=\"math inline\">\\(\\hat{\\beta}\\)<\/span>s and zero-mean residuals. 
(Double-check those if you have a dissertation defense coming up &#8212; or are working on a medical system.)<\/p>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\"><\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<h3 id=\"pulling-back-the-covers\">Pulling Back the Covers<\/h3>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\"><\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>So, let&#8217;s figure this out. I&#8217;m going to use HTF&#8217;s notation, unless otherwise stated. Here&#8217;s a simple example to get us started. Here is a single case or observation <span class=\"math inline\">\\((X,Y)\\)<\/span> with 3 input (independent) variables or attributes <span class=\"math inline\">\\(X=(X_1,X_2,X_3)\\)<\/span> and an output (dependent) variable <span class=\"math inline\">\\(Y\\)<\/span>. 
Our goal will be to take measurement <span class=\"math inline\">\\(X\\)<\/span> and predict an output <span class=\"math inline\">\\(\\hat{Y}\\)<\/span> for cases where we haven&#8217;t already seen <span class=\"math inline\">\\(Y\\)<\/span>.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\"><\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<table>\n<thead>\n<tr class=\"header\">\n<th align=\"left\">Obs<\/th>\n<th align=\"left\"><span class=\"math inline\">\\(X_1\\)<\/span><\/th>\n<th align=\"left\"><span class=\"math inline\">\\(X_2\\)<\/span><\/th>\n<th align=\"left\"><span class=\"math inline\">\\(X_3\\)<\/span><\/th>\n<th align=\"left\"><span class=\"math inline\">\\(Y\\)<\/span><\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr class=\"odd\">\n<td align=\"left\">1<\/td>\n<td align=\"left\">7.01<\/td>\n<td align=\"left\">3.46<\/td>\n<td align=\"left\">4.21<\/td>\n<td align=\"left\">25.22<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\"><\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>More generally, <span class=\"math inline\">\\(X\\)<\/span> can have <span class=\"math inline\">\\(p\\)<\/span>-dimensions (our example above has <span class=\"math inline\">\\(p=3\\)<\/span>). With <em>p<\/em>-dimensions, <span class=\"math inline\">\\(X=(X_1, X_2, \\dots, X_p)\\)<\/span>. We started our indices at 1; we&#8217;re going to monkey with that in a second. You might see this general setup as &#8220;a problem with a <span class=\"math inline\">\\(p\\)<\/span>-dimensional input space&#8221;.<\/p>\n<p>What about the other terms <span class=\"math inline\">\\(\\hat{\\beta}_i\\)<\/span>? These are the coefficients we are trying to estimate. 
When we fill them in with estimated values (e.g., <span class=\"math inline\">\\(\\hat{\\beta}_1=\\hat{\\beta}_2=1.25\\)<\/span> and <span class=\"math inline\">\\(\\hat{\\beta}_0=\\hat{\\beta}_3=0.5\\)<\/span>), we have an equation that is linear in the <span class=\"math inline\">\\(X_i\\)<\/span> &#8212; we&#8217;re drawing a line (or a higher dimensional analogue). With those <span class=\"math inline\">\\(\\hat{\\beta}_i\\)<\/span>, we have: <span class=\"math display\">\\[\\hat{Y} = \\frac{1}{2} + 1.25X_1 + 1.25X_2 + \\frac{1}{2}X_3\\]<\/span><\/p>\n<p>When we fill in the <span class=\"math inline\">\\(X_i\\)<\/span> with values from a particular observation, we get out one predicted value for <span class=\"math inline\">\\(Y\\)<\/span>. For the case above, we would predict:<\/p>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In\u00a0[1]:<\/div>\n<div class=\"inner_cell\">\n<div class=\"input_area\">\n<div class=\" highlight hl-ipython2\">\n<pre><span class=\"n\">Y_hat<\/span> <span class=\"o\">=<\/span> <span class=\"o\">.<\/span><span class=\"mi\">5<\/span> <span class=\"o\">+<\/span> <span class=\"mf\">1.25<\/span><span class=\"o\">*<\/span><span class=\"mf\">7.01<\/span> <span class=\"o\">+<\/span> <span class=\"mf\">1.25<\/span><span class=\"o\">*<\/span><span class=\"mf\">3.46<\/span> <span class=\"o\">+<\/span> <span class=\"o\">.<\/span><span class=\"mi\">5<\/span><span class=\"o\">*<\/span><span class=\"mf\">4.21<\/span>\n<span class=\"k\">print<\/span><span class=\"p\">(<\/span><span class=\"n\">Y_hat<\/span><span class=\"p\">)<\/span>\n<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"output_wrapper\">\n<div class=\"output\">\n<div class=\"output_area\">\n<div class=\"prompt\"><\/div>\n<div class=\"output_subarea output_stream output_stdout output_text\">\n<pre>15.6925\n<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell 
 border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\"><\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Note that this is not a perfect prediction &#8212; I pulled the <span class=\"math inline\">\\(\\hat{\\beta}_i\\)<\/span> out of a hat (hah!). Usually, when we try to fit a line to the data, we don&#8217;t do a perfect job, but we do tend to improve over random guessing.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\"><\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<h3 id=\"sleuthing\">Sleuthing<\/h3>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\"><\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Hopefully, it is obvious that <span class=\"math inline\">\\(\\hat{\\beta}_0=1\\hat{\\beta}_0\\)<\/span>. So, <span class=\"math inline\">\\(X_0=1\\)<\/span> (i.e., <span class=\"math inline\">\\(X=(X_0,X_1,\\dots,X_p)=(1,X_1,\\dots,X_p)\\)<\/span>).<\/p>\n<p>So, we have: <span class=\"math display\">\\[<br \/>\n\\begin{eqnarray}<br \/>\n\\hat{Y} &amp;=&amp; \\hat{\\beta}_0 + \\sum_{j=1}^p X_j \\hat{\\beta}_j \\\\<br \/>\n&amp;=&amp; 1\\hat{\\beta}_0 + \\sum_{j=1}^p X_j \\hat{\\beta}_j \\\\<br \/>\n&amp;=&amp; X_0\\hat{\\beta}_0 + \\sum_{j=1}^p X_j \\hat{\\beta}_j \\\\<br \/>\n&amp;=&amp; \\sum_{j=0}^p X_j \\hat{\\beta}_j \\\\<br \/>\n\\hat{Y} &amp;=&amp; X\\hat{\\beta}^T \\\\<br \/>\n\\end{eqnarray}<br \/>\n\\]<\/span><\/p>\n<p>Whenever we have a sum of products of values, we can rewrite the summation and multiplication as a dot-product. We do that in the last line above. 
<span class=\"math inline\">\\(X\\)<\/span> becomes a <span class=\"math inline\">\\(p+1\\)<\/span> dimensional row-vector; likewise, <span class=\"math inline\">\\(\\beta\\)<\/span> is a <span class=\"math inline\">\\(p+1\\)<\/span> dimensional row-vector (transposed to a column-vector). If you recall your linear algebra, we utilize a &#8220;row-against-column&#8221; multiplication rule and <span class=\"math inline\">\\((1,p+1)\\times(p+1,1)\\)<\/span> yields a <span class=\"math inline\">\\((1,1)\\)<\/span> dimensional result.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\"><\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<h3 id=\"more-than-one-observation\">More Than One Observation<\/h3>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\"><\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Now, if you&#8217;re like me (computer scientist first, mathematician &#8212; lovingly, but &#8212; second) you&#8217;re interested in what this means from a computational perspective. In this case, I&#8217;m not necessarily concerned about what the different equations mean in terms of time-complexity (we&#8217;ve got <span class=\"math inline\">\\(O(p)\\)<\/span> additions and multiplications in both equations), but what about space-complexity and code expression?<\/p>\n<p>Now, you&#8217;ll notice that we expressed the relationship between the output and the input for one pair of input-output values. What if we want to do that for <span class=\"math inline\">\\(N\\)<\/span> input values? 
Then we have (HTF, 2.3):<\/p>\n<p><span class=\"math display\">\\[RSS(\\beta)=\\sum_{i=1}^N (y_i-x_{i}\\beta^T)^2\\]<\/span><\/p>\n<p>I&#8217;m going to diverge ever so slightly from HTF&#8217;s notation and make <span class=\"math inline\">\\(x_i\\)<\/span> a row-vector. So, we drop their <span class=\"math inline\">\\(\\cdot^T\\)<\/span> on <span class=\"math inline\">\\(x_i\\)<\/span>.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\"><\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Now, recall what I just said: if we have a sum of products, we can rewrite the calculation as a dot-product. Here, the product is &#8220;hidden&#8221; in the square (i.e., the <span class=\"math inline\">\\({\\cdot}^2\\)<\/span>). So, we can take the dot-product of <span class=\"math inline\">\\(y_i-x_{i}\\beta^T\\)<\/span> with itself &#8230; except, we need the subscripted items (<span class=\"math inline\">\\(y\\)<\/span> and <span class=\"math inline\">\\(x\\)<\/span>) to expand in structure to handle the different values that each carries across the summation (i.e., in general, <span class=\"math inline\">\\(x_1 \\neq x_2\\)<\/span>). HTF comes up with (2.4):<\/p>\n<p><span class=\"math display\">\\[RSS(\\beta)=(\\bf{y}-\\bf{X}\\beta)^T(\\bf{y}-\\bf{X}\\beta)\\]<\/span><\/p>\n<p>Now, if you&#8217;re like me, you might be a little bit grumpy. Even though I like a dynamically typed language like Python, I still like <em>strong<\/em> typing. Thus, <span class=\"math inline\">\\(X\\)<\/span> going from a <span class=\"math inline\">\\((1,p+1)\\)<\/span> vector <span class=\"math inline\">\\(\\rightarrow (N,p+1)\\)<\/span> matrix sort of annoys me. But, if you look very, very carefully you will notice that <span class=\"math inline\">\\(X \\rightarrow \\bf{X}\\)<\/span>. Yes, <span class=\"math inline\">\\(X\\)<\/span> became bold-faced. 
Such utterly subtle changes in presentation with large changes in meaning are a hallmark of mathematical chicanery (I mean beauty). It&#8217;s the same reason that <span class=\"math inline\">\\(i\\)<\/span> and <span class=\"math inline\">\\(j\\)<\/span> are used for subscripts and that <span class=\"math inline\">\\(r\\)<\/span> and <span class=\"math inline\">\\(n\\)<\/span> are frequently used in binomial coefficients (i.e., <span class=\"math inline\">\\(\\binom{n}{r}\\)<\/span>). If you&#8217;re confused, just imagine writing those letters repeatedly on the chalkboard. And yes, I&#8217;ve seen folks getting soft and writing <span class=\"math inline\">\\(\\binom{n}{k}\\)<\/span>, but I digress &#8230; And please don&#8217;t be offended &#8212; computer scientists pull the same sort of crap in different ways.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\"><\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>They don&#8217;t say it, but the use of matrices enables us to say: <span class=\"math display\">\\[\\bf{\\hat{Y}} = \\bf{X}\\hat{\\beta}^T\\]<\/span> Here we have some mathematical beauty. It says, very concisely, that the predicted outputs are a linear function of the inputs and the estimated coefficients. 
Although, it hides whether or not there is a <span class=\"math inline\">\\(\\hat{\\beta}_0\\)<\/span> term.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\"><\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<h3 id=\"computational-implications-of-notation-implementation\">Computational Implications of Notation: Implementation<\/h3>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\"><\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>We have some mathematical ideas sketched out. Let&#8217;s take a brief pause (from the mathematics) and turn to some code. We could work with a standard dataset (e.g., a UCI dataset like <em>iris<\/em>). But, for now, I&#8217;m going to go with a simple synthetic example that couples an actual linear relationship with some noise. We&#8217;ll start with a single <span class=\"math inline\">\\(X\\)<\/span> (err, <span class=\"math inline\">\\(x_0\\)<\/span>) and then we&#8217;ll build up to <span class=\"math inline\">\\(\\bf{X}\\)<\/span>. To be clear, in the Python code I&#8217;m going to use <span class=\"math inline\">\\(\\mathtt{x}\\)<\/span> for a single example and <span class=\"math inline\">\\(\\mathtt{X}\\)<\/span> for a set of examples. 
Also, we&#8217;ll use the following model: <span class=\"math inline\">\\(Y=f(X)=1+\\sum_{i=1}^{3}2^i X_i=1+2 X_1 + 4 X_2 + 8 X_3\\)<\/span> and we&#8217;ll restrict the <span class=\"math inline\">\\(X_i\\)<\/span> to come from the range <span class=\"math inline\">\\([0,10)\\)<\/span>.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In\u00a0[2]:<\/div>\n<div class=\"inner_cell\">\n<div class=\"input_area\">\n<div class=\" highlight hl-ipython2\">\n<pre><span class=\"kn\">import<\/span> <span class=\"nn\">numpy<\/span> <span class=\"kn\">as<\/span> <span class=\"nn\">np<\/span>\n<span class=\"n\">x<\/span><span class=\"o\">=<\/span><span class=\"n\">np<\/span><span class=\"o\">.<\/span><span class=\"n\">random<\/span><span class=\"o\">.<\/span><span class=\"n\">uniform<\/span><span class=\"p\">(<\/span><span class=\"mi\">0<\/span><span class=\"p\">,<\/span><span class=\"mi\">10<\/span><span class=\"p\">,<\/span><span class=\"mi\">3<\/span><span class=\"p\">)<\/span>\n<span class=\"k\">print<\/span> <span class=\"n\">x<\/span>\n<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"output_wrapper\">\n<div class=\"output\">\n<div class=\"output_area\">\n<div class=\"prompt\"><\/div>\n<div class=\"output_subarea output_stream output_stdout output_text\">\n<pre>[ 8.64408799  7.97358871  5.18087896]\n<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In\u00a0[3]:<\/div>\n<div class=\"inner_cell\">\n<div class=\"input_area\">\n<div class=\" highlight hl-ipython2\">\n<pre><span class=\"n\">noise<\/span> <span class=\"o\">=<\/span> <span class=\"n\">np<\/span><span class=\"o\">.<\/span><span class=\"n\">random<\/span><span class=\"o\">.<\/span><span class=\"n\">normal<\/span><span class=\"p\">(<\/span><span class=\"mf\">0.0<\/span><span 
class=\"p\">,<\/span> <span class=\"mf\">2.0<\/span><span class=\"p\">,<\/span> <span class=\"mi\">1<\/span><span class=\"p\">)<\/span>\n<span class=\"k\">print<\/span> <span class=\"n\">noise<\/span>\n<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"output_wrapper\">\n<div class=\"output\">\n<div class=\"output_area\">\n<div class=\"prompt\"><\/div>\n<div class=\"output_subarea output_stream output_stdout output_text\">\n<pre>[ 3.73810917]\n<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In\u00a0[4]:<\/div>\n<div class=\"inner_cell\">\n<div class=\"input_area\">\n<div class=\" highlight hl-ipython2\">\n<pre><span class=\"k\">def<\/span> <span class=\"nf\">f<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">):<\/span>  \n    <span class=\"k\">return<\/span> <span class=\"mi\">1<\/span> <span class=\"o\">+<\/span> <span class=\"n\">np<\/span><span class=\"o\">.<\/span><span class=\"n\">dot<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">,<\/span><span class=\"mi\">2<\/span><span class=\"o\">**<\/span><span class=\"n\">np<\/span><span class=\"o\">.<\/span><span class=\"n\">arange<\/span><span class=\"p\">(<\/span><span class=\"mi\">1<\/span><span class=\"p\">,<\/span><span class=\"mi\">4<\/span><span class=\"p\">))<\/span>\n<span class=\"k\">print<\/span> <span class=\"n\">f<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">)<\/span>\n<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"output_wrapper\">\n<div class=\"output\">\n<div class=\"output_area\">\n<div class=\"prompt\"><\/div>\n<div class=\"output_subarea output_stream output_stdout output_text\">\n<pre>91.6295624877\n<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt 
input_prompt\">In\u00a0[5]:<\/div>\n<div class=\"inner_cell\">\n<div class=\"input_area\">\n<div class=\" highlight hl-ipython2\">\n<pre><span class=\"n\">y<\/span> <span class=\"o\">=<\/span> <span class=\"n\">f<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">)<\/span> <span class=\"o\">+<\/span> <span class=\"n\">noise<\/span>\n<span class=\"k\">print<\/span> <span class=\"n\">y<\/span>\n<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"output_wrapper\">\n<div class=\"output\">\n<div class=\"output_area\">\n<div class=\"prompt\"><\/div>\n<div class=\"output_subarea output_stream output_stdout output_text\">\n<pre>[ 95.36767166]\n<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\"><\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>If you haven&#8217;t seen it before, you may be a little bit confused at this point. Why are we creating data out of thin air? And why did we add <em>noise<\/em>? We&#8217;re using synthetic data because it is easy to create, easy to control, and when we want to understand the relationship between the data and the model, we can do some sleuthing relatively easily. As to noise, the idea is that in the &#8220;real world&#8221; we typically have less than perfect knowledge of the output (math: <span class=\"math inline\">\\(Y\\)<\/span>, Python: <span class=\"math inline\">\\(\\mathtt{y}\\)<\/span>) values. We account for this by explicitly adding (controlled) noise. In the real world, the noise is there implicitly.<\/p>\n<p>One last issue. This has to do with the <span class=\"math inline\">\\(\\beta_0\\)<\/span> <span class=\"math inline\">\\(X_0\\)<\/span> issue. Mathematically, it is convenient to roll the coefficient into the summation and <em>expand<\/em> the <span class=\"math inline\">\\(X_0=1\\)<\/span> into the data. 
We can certainly do that:<\/p>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In\u00a0[6]:<\/div>\n<div class=\"inner_cell\">\n<div class=\"input_area\">\n<div class=\" highlight hl-ipython2\">\n<pre><span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">np<\/span><span class=\"o\">.<\/span><span class=\"n\">concatenate<\/span><span class=\"p\">([<\/span><span class=\"n\">np<\/span><span class=\"o\">.<\/span><span class=\"n\">ones<\/span><span class=\"p\">(<\/span><span class=\"mi\">1<\/span><span class=\"p\">),<\/span> <span class=\"n\">x<\/span><span class=\"p\">])<\/span>\n<span class=\"k\">print<\/span> <span class=\"n\">x<\/span>\n<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"output_wrapper\">\n<div class=\"output\">\n<div class=\"output_area\">\n<div class=\"prompt\"><\/div>\n<div class=\"output_subarea output_stream output_stdout output_text\">\n<pre>[ 1.          8.64408799  7.97358871  5.18087896]\n<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\"><\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Voila. Please take a second to look closely at the assignment. We just added three steps to (meet) the mathematical convenience. We (1) created a new array (<em>np.ones<\/em>), (2) created a new Python list (the content inside of <em>[]<\/em>), and (3) <em>created a new NumPy array<\/em>. That third bit could be disastrous: for the benefit of adding one element, we just made a second allocation of <span class=\"math inline\">\\(O(p)\\)<\/span> memory elements and copies to fill them. Yes, premature optimization is the root of all evil. And yes, <span class=\"math inline\">\\(2p\\)<\/span> is still <span class=\"math inline\">\\(O(p)\\)<\/span>. But! 
We care immensely about going from using half of system memory to using all of system memory. And, I&#8217;m showing you the optimization issue here for pedagogical purposes: the same principle applies when we <em>must<\/em> use the optimization. There&#8217;s a moral to this story. We want to do one (and only one) memory allocation if we can help it. There&#8217;s a second moral coming in a moment, but we&#8217;ll take them one at a time. We can do a single allocation in one of two ways:<\/p>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In\u00a0[7]:<\/div>\n<div class=\"inner_cell\">\n<div class=\"input_area\">\n<div class=\" highlight hl-ipython2\">\n<pre><span class=\"n\">x1<\/span><span class=\"o\">=<\/span><span class=\"n\">np<\/span><span class=\"o\">.<\/span><span class=\"n\">empty<\/span><span class=\"p\">((<\/span><span class=\"mi\">4<\/span><span class=\"p\">,))<\/span>\n<span class=\"n\">x1<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">]<\/span> <span class=\"o\">=<\/span> <span class=\"mf\">1.0<\/span>\n<span class=\"n\">x1<\/span><span class=\"p\">[<\/span><span class=\"mi\">1<\/span><span class=\"p\">:]<\/span><span class=\"o\">=<\/span><span class=\"n\">np<\/span><span class=\"o\">.<\/span><span class=\"n\">random<\/span><span class=\"o\">.<\/span><span class=\"n\">uniform<\/span><span class=\"p\">(<\/span><span class=\"mi\">0<\/span><span class=\"p\">,<\/span><span class=\"mi\">10<\/span><span class=\"p\">,<\/span><span class=\"mi\">3<\/span><span class=\"p\">)<\/span>\n<span class=\"k\">print<\/span> <span class=\"n\">x1<\/span>\n<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"output_wrapper\">\n<div class=\"output\">\n<div class=\"output_area\">\n<div class=\"prompt\"><\/div>\n<div class=\"output_subarea output_stream output_stdout output_text\">\n<pre>[ 1.          
0.48824435  8.03740956  8.83808266]\n<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In\u00a0[8]:<\/div>\n<div class=\"inner_cell\">\n<div class=\"input_area\">\n<div class=\" highlight hl-ipython2\">\n<pre><span class=\"n\">x2<\/span>    <span class=\"o\">=<\/span> <span class=\"n\">np<\/span><span class=\"o\">.<\/span><span class=\"n\">random<\/span><span class=\"o\">.<\/span><span class=\"n\">uniform<\/span><span class=\"p\">(<\/span><span class=\"mi\">0<\/span><span class=\"p\">,<\/span><span class=\"mi\">10<\/span><span class=\"p\">,<\/span><span class=\"mi\">4<\/span><span class=\"p\">)<\/span>\n<span class=\"n\">x2<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">]<\/span> <span class=\"o\">=<\/span> <span class=\"mf\">1.0<\/span>\n<span class=\"k\">print<\/span> <span class=\"n\">x2<\/span>\n<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"output_wrapper\">\n<div class=\"output\">\n<div class=\"output_area\">\n<div class=\"prompt\"><\/div>\n<div class=\"output_subarea output_stream output_stdout output_text\">\n<pre>[ 1.          9.85313908  5.0465084   0.38967288]\n<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\"><\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>I&#8217;ll leave it as an exercise to the reader to discuss the benefits of each of these. As a take home, try not to use concatenation after you&#8217;ve moved past the point of sketching out ideas. A side note: <code>np.concatenate<\/code> sometimes has use when you are working with partial arrays as you build incremental solutions. 
Then you don&#8217;t pay for allocating over your whole data rows or columns (<em>p<\/em> or <em>N<\/em>) &#8212; particularly when you don&#8217;t know those final values. If we&#8217;re reading data from a file, this also introduces complexity because we need to account for the &#8220;padded&#8221; 1.0s in each row as we do the input. Also, remember that we&#8217;re going to have a second moral, related to the <span class=\"math inline\">\\(0\\)<\/span>-index coefficient parameter and input value, in a little bit. For now, let&#8217;s make a lot of data (i.e., let&#8217;s go from <code>x<\/code> to <code>X<\/code> in Python).<\/p>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In\u00a0[9]:<\/div>\n<div class=\"inner_cell\">\n<div class=\"input_area\">\n<div class=\" highlight hl-ipython2\">\n<pre><span class=\"n\">X<\/span> <span class=\"o\">=<\/span> <span class=\"n\">np<\/span><span class=\"o\">.<\/span><span class=\"n\">random<\/span><span class=\"o\">.<\/span><span class=\"n\">uniform<\/span><span class=\"p\">(<\/span><span class=\"mi\">0<\/span><span class=\"p\">,<\/span><span class=\"mi\">10<\/span><span class=\"p\">,<\/span><span class=\"mi\">4<\/span><span class=\"o\">*<\/span><span class=\"mi\">1000<\/span><span class=\"p\">)<\/span>\n<span class=\"n\">X<\/span> <span class=\"o\">=<\/span> <span class=\"n\">X<\/span><span class=\"o\">.<\/span><span class=\"n\">reshape<\/span><span class=\"p\">((<\/span><span class=\"mi\">1000<\/span><span class=\"p\">,<\/span><span class=\"mi\">4<\/span><span class=\"p\">))<\/span>\n<span class=\"n\">X<\/span><span class=\"p\">[:,<\/span><span class=\"mi\">0<\/span><span class=\"p\">]<\/span> <span class=\"o\">=<\/span> <span class=\"mf\">1.0<\/span>\n<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\"><\/div>\n<div 
class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>You might say, &#8220;Ack!&#8221; An extra 1000 uniform number generations (the whole first column gets overwritten)? Isn&#8217;t that expensive? Possibly. So, let&#8217;s do:<\/p>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In\u00a0[10]:<\/div>\n<div class=\"inner_cell\">\n<div class=\"input_area\">\n<div class=\" highlight hl-ipython2\">\n<pre><span class=\"n\">X<\/span>       <span class=\"o\">=<\/span> <span class=\"n\">np<\/span><span class=\"o\">.<\/span><span class=\"n\">empty<\/span><span class=\"p\">((<\/span><span class=\"mi\">1000<\/span><span class=\"p\">,<\/span><span class=\"mi\">4<\/span><span class=\"p\">))<\/span>\n<span class=\"n\">X<\/span><span class=\"p\">[:,<\/span><span class=\"mi\">0<\/span><span class=\"p\">]<\/span>  <span class=\"o\">=<\/span> <span class=\"mf\">1.0<\/span>\n<span class=\"n\">X<\/span><span class=\"p\">[:,<\/span><span class=\"mi\">1<\/span><span class=\"p\">:]<\/span> <span class=\"o\">=<\/span> <span class=\"n\">np<\/span><span class=\"o\">.<\/span><span class=\"n\">random<\/span><span class=\"o\">.<\/span><span class=\"n\">uniform<\/span><span class=\"p\">(<\/span><span class=\"mi\">0<\/span><span class=\"p\">,<\/span><span class=\"mi\">10<\/span><span class=\"p\">,(<\/span><span class=\"mi\">1000<\/span><span class=\"p\">,<\/span><span class=\"mi\">3<\/span><span class=\"p\">))<\/span>\n<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\"><\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Well, now, I&#8217;m wondering, &#8220;Why the heck do we have an extra 1000 <span class=\"math inline\">\\(1.0\\)<\/span> values running around?&#8221; They are doing nothing but trying to smooth out our mathematical convenience. 
If we have a gig (10^9 or 2^30, depending on your math\/comp preferences: either way, a billion or so) of examples, we&#8217;re now taking up a good bit of excess memory:<\/p>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In\u00a0[11]:<\/div>\n<div class=\"inner_cell\">\n<div class=\"input_area\">\n<div class=\" highlight hl-ipython2\">\n<pre><span class=\"mi\">10<\/span><span class=\"o\">**<\/span><span class=\"mi\">9<\/span> <span class=\"o\">*<\/span> <span class=\"n\">np<\/span><span class=\"o\">.<\/span><span class=\"n\">array<\/span><span class=\"p\">([<\/span><span class=\"mf\">1.0<\/span><span class=\"p\">])<\/span><span class=\"o\">.<\/span><span class=\"n\">nbytes<\/span>\n<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"output_wrapper\">\n<div class=\"output\">\n<div class=\"output_area\">\n<div class=\"prompt output_prompt\">Out[11]:<\/div>\n<div class=\"output_text output_subarea output_execute_result\">\n<pre>8000000000<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\"><\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Not surprisingly, at 8 bytes per 64-bit float, a billion of them take 8 billion bytes: 8 GB, or about 7.45 GiB. The point is, as the data grows, we&#8217;re wasting a good bit of space. And, we have to pay to get the 1.0 values there. 
So, while the explicit 1.0 is useful for the mathematical formalism, it seems like the cost to <em>directly<\/em> mimic the mathematics in code is not worth the code and time complexity (code complexity means cost in read-, write-, and maintainability).<\/p>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\"><\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<h3 id=\"calculating-rss\">Calculating RSS<\/h3>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\"><\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Let&#8217;s return to the point of computing the residual sum-of-squares <span class=\"math inline\">\\(RSS\\)<\/span> for a dataset. I&#8217;m going to take the advice from above and <em>not<\/em> put in any explicit <span class=\"math inline\">\\(1.0\\)<\/span>s in the <code>X<\/code> (Python) matrix.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In\u00a0[12]:<\/div>\n<div class=\"inner_cell\">\n<div class=\"input_area\">\n<div class=\" highlight hl-ipython2\">\n<pre><span class=\"n\">X<\/span>     <span class=\"o\">=<\/span> <span class=\"n\">np<\/span><span class=\"o\">.<\/span><span class=\"n\">random<\/span><span class=\"o\">.<\/span><span class=\"n\">uniform<\/span><span class=\"p\">(<\/span><span class=\"mi\">0<\/span><span class=\"p\">,<\/span><span class=\"mi\">10<\/span><span class=\"p\">,(<\/span><span class=\"mi\">1000<\/span><span class=\"p\">,<\/span><span class=\"mi\">3<\/span><span class=\"p\">))<\/span>\n<span class=\"n\">noise<\/span> <span class=\"o\">=<\/span> <span class=\"n\">np<\/span><span class=\"o\">.<\/span><span class=\"n\">random<\/span><span class=\"o\">.<\/span><span 
class=\"n\">normal<\/span><span class=\"p\">(<\/span><span class=\"mf\">0.0<\/span><span class=\"p\">,<\/span> <span class=\"mf\">2.0<\/span><span class=\"p\">,<\/span> <span class=\"mi\">1000<\/span><span class=\"p\">)<\/span>\n<span class=\"n\">Y<\/span>     <span class=\"o\">=<\/span> <span class=\"n\">f<\/span><span class=\"p\">(<\/span><span class=\"n\">X<\/span><span class=\"p\">)<\/span> <span class=\"o\">+<\/span> <span class=\"n\">noise<\/span>\n\n<span class=\"k\">print<\/span> <span class=\"n\">X<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">],<\/span> <span class=\"n\">f<\/span><span class=\"p\">(<\/span><span class=\"n\">X<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">]),<\/span> <span class=\"n\">Y<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">]<\/span>\n<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"output_wrapper\">\n<div class=\"output\">\n<div class=\"output_area\">\n<div class=\"prompt\"><\/div>\n<div class=\"output_subarea output_stream output_stdout output_text\">\n<pre>[ 6.6557709   6.35524414  6.67009511] 93.0932792752 91.449699275\n<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\"><\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Remember, this is synthetic data. We know the underlying model (and thus, the <span class=\"math inline\">\\(\\beta\\)<\/span> coefficients) used to generate the outputs from the inputs. But in the real-world we would have the inputs and outputs without those coefficients. We would go through some process (that we&#8217;ll discuss soon!) to recover some &#8220;best guesses&#8221; &#8212; really, it&#8217;s more than that &#8212; at the coefficients. 
For the sake of argument, let&#8217;s say we came up with three different sets of <span class=\"math inline\">\\(\\beta\\)<\/span>&#8217;s: <span class=\"math inline\">\\(\\hat{\\beta}_a=\\)<\/span> <code>beta_hat_a<\/code>, <span class=\"math inline\">\\(\\hat{\\beta}_b=\\)<\/span> <code>beta_hat_b<\/code>, and <span class=\"math inline\">\\(\\hat{\\beta}_c=\\)<\/span> <code>beta_hat_c<\/code>. I&#8217;m using the letters <em>a<\/em>, <em>b<\/em>, and <em>c<\/em>, rather than numbers, to reduce confusion with the indexed mathematical <span class=\"math inline\">\\(\\beta\\)<\/span> coefficients. Incidentally, hat (i.e., <span class=\"math inline\">\\(\\hat{\\cdot}\\)<\/span>) is used for values that are <em>inferred<\/em> from (i.e., are functions of) the dataset. &#8220;Hat&#8221;, I guess, sounds better and looks more formal than &#8220;best guess given the data&#8221;.<\/p>\n<p>Moving along:<\/p>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In\u00a0[13]:<\/div>\n<div class=\"inner_cell\">\n<div class=\"input_area\">\n<div class=\" highlight hl-ipython2\">\n<pre><span class=\"n\">beta_hat_a<\/span> <span class=\"o\">=<\/span> <span class=\"n\">np<\/span><span class=\"o\">.<\/span><span class=\"n\">ones<\/span><span class=\"p\">(<\/span><span class=\"mi\">4<\/span><span class=\"p\">)<\/span>               <span class=\"c\"># \\betas = 1,1,1,1<\/span>\n<span class=\"n\">beta_hat_b<\/span> <span class=\"o\">=<\/span> <span class=\"p\">(<\/span><span class=\"n\">np<\/span><span class=\"o\">.<\/span><span class=\"n\">arange<\/span><span class=\"p\">(<\/span><span class=\"mi\">4<\/span><span class=\"p\">)<\/span><span class=\"o\">+<\/span><span class=\"mf\">1.0<\/span><span class=\"p\">)<\/span> <span class=\"o\">*<\/span> <span class=\"mi\">2<\/span>   <span class=\"c\"># \\betas = 2,4,6,8, force float typing<\/span>\n<span class=\"n\">beta_hat_c<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">2<\/span><span class=\"o\">**<\/span><span class=\"n\">np<\/span><span class=\"o\">.<\/span><span class=\"n\">arange<\/span><span 
class=\"p\">(<\/span><span class=\"mi\">4<\/span><span class=\"p\">)<\/span>          <span class=\"c\"># \\betas = 1,2,4,8 (integer dtype)<\/span>\n<span class=\"k\">def<\/span> <span class=\"nf\">RSS<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">,<\/span><span class=\"n\">y<\/span><span class=\"p\">,<\/span><span class=\"n\">beta<\/span><span class=\"p\">):<\/span>  \n    <span class=\"n\">errors<\/span> <span class=\"o\">=<\/span> <span class=\"n\">beta<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">]<\/span> <span class=\"o\">+<\/span> <span class=\"n\">np<\/span><span class=\"o\">.<\/span><span class=\"n\">dot<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">,<\/span><span class=\"n\">beta<\/span><span class=\"p\">[<\/span><span class=\"mi\">1<\/span><span class=\"p\">:])<\/span> <span class=\"o\">-<\/span> <span class=\"n\">y<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">np<\/span><span class=\"o\">.<\/span><span class=\"n\">dot<\/span><span class=\"p\">(<\/span><span class=\"n\">errors<\/span><span class=\"p\">,<\/span> <span class=\"n\">errors<\/span><span class=\"p\">)<\/span>\n\n<span class=\"k\">print<\/span> <span class=\"s\">\"beta_hat_a: <\/span><span class=\"si\">%10s<\/span><span class=\"s\"> RSS: <\/span><span class=\"si\">%11.3f<\/span><span class=\"s\">\"<\/span>  <span class=\"o\">%<\/span> <span class=\"p\">(<\/span><span class=\"n\">beta_hat_a<\/span><span class=\"p\">,<\/span> <span class=\"n\">RSS<\/span><span class=\"p\">(<\/span><span class=\"n\">X<\/span><span class=\"p\">,<\/span><span class=\"n\">Y<\/span><span class=\"p\">,<\/span><span class=\"n\">beta_hat_a<\/span><span class=\"p\">))<\/span>\n<span class=\"k\">print<\/span> <span class=\"s\">\"beta_hat_b: <\/span><span class=\"si\">%10s<\/span><span class=\"s\"> RSS: <\/span><span class=\"si\">% 11.3f<\/span><span class=\"s\">\"<\/span> <span class=\"o\">%<\/span> 
<span class=\"p\">(<\/span><span class=\"n\">beta_hat_b<\/span><span class=\"p\">,<\/span> <span class=\"n\">RSS<\/span><span class=\"p\">(<\/span><span class=\"n\">X<\/span><span class=\"p\">,<\/span><span class=\"n\">Y<\/span><span class=\"p\">,<\/span><span class=\"n\">beta_hat_b<\/span><span class=\"p\">))<\/span>\n<span class=\"k\">print<\/span> <span class=\"s\">\"beta_hat_c: <\/span><span class=\"si\">%10s<\/span><span class=\"s\"> RSS: <\/span><span class=\"si\">% 11.3f<\/span><span class=\"s\">\"<\/span> <span class=\"o\">%<\/span> <span class=\"p\">(<\/span><span class=\"n\">beta_hat_c<\/span><span class=\"p\">,<\/span> <span class=\"n\">RSS<\/span><span class=\"p\">(<\/span><span class=\"n\">X<\/span><span class=\"p\">,<\/span><span class=\"n\">Y<\/span><span class=\"p\">,<\/span><span class=\"n\">beta_hat_c<\/span><span class=\"p\">))<\/span>\n<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"output_wrapper\">\n<div class=\"output\">\n<div class=\"output_area\">\n<div class=\"prompt\"><\/div>\n<div class=\"output_subarea output_stream output_stdout output_text\">\n<pre>beta_hat_a: [ 1.  1.  1.  1.] RSS: 3537775.852\nbeta_hat_b: [ 2.  4.  6.  8.] RSS:  506959.425\nbeta_hat_c:  [1 2 4 8] RSS:    3931.634\n<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\"><\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Look at what happened: even though I kept the <code>1.0<\/code>s out of <code>X<\/code>, <em>I still included them in my model<\/em>. I included them by including a <code>beta[0]<\/code> term when I put together the predicted <code>y<\/code> value: <code>beta[0] + np.dot(x,beta[1:])<\/code>. 
This is the second moral with a few parts: (1) factoring out common data is a powerful technique that can save memory, (2) mathematical beauty and computational simplicity\/efficiency can sometimes be at odds, and (3) just because you know the form of the data <code>X<\/code> doesn&#8217;t mean you know the model being applied &#8212; you also need to know whether a <code>beta[0]<\/code> term is showing up in the calculations.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\"><\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<h3 id=\"a-note-on-np.dot-and-broadcasting\">A Note on <code>np.dot<\/code> and Broadcasting<\/h3>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\"><\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>The guess of all 1.0 coefficients has quite a bit more error (residual) than the multiples of two. How can we get a &#8220;really good&#8221; guess at the coefficients? Ah, young grasshopper, that is for another lesson. But, I can give you one other tip. Be careful with <code>np.dot(x,beta)<\/code>.<\/p>\n<p>If <code>beta<\/code> is a row-vector (a 1-D NumPy array with shape <code>(p,)<\/code>), then <code>np.dot(beta, x)<\/code> will work when <code>x<\/code> is a 1-D array of shape <code>(p,)<\/code> <em>but it will fail<\/em> when <code>x<\/code> is a 2-D array of shape <code>(n,p)<\/code> with <code>n != p<\/code>. <em>However<\/em>, <code>np.dot(x,beta)<\/code> will work for both the multiple data <code>[n x p] dot [p]<\/code> and for the single data <code>[p] dot [p]<\/code>. 
The reason is tied up in how <code>np.dot<\/code> pairs up the axes of its arguments (it sums over the last axis of the first argument and, when the second argument is 2-D, over that argument&#8217;s second-to-last axis), which we won&#8217;t dive into now.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In\u00a0[14]:<\/div>\n<div class=\"inner_cell\">\n<div class=\"input_area\">\n<div class=\" highlight hl-ipython2\">\n<pre><span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">np<\/span><span class=\"o\">.<\/span><span class=\"n\">arange<\/span><span class=\"p\">(<\/span><span class=\"mf\">1.0<\/span><span class=\"p\">,<\/span><span class=\"mf\">5.0<\/span><span class=\"p\">)<\/span>\n<span class=\"n\">xs<\/span> <span class=\"o\">=<\/span> <span class=\"n\">np<\/span><span class=\"o\">.<\/span><span class=\"n\">row_stack<\/span><span class=\"p\">([<\/span><span class=\"n\">x<\/span><span class=\"p\">,<\/span><span class=\"n\">x<\/span><span class=\"p\">])<\/span>\n<span class=\"n\">beta<\/span> <span class=\"o\">=<\/span> <span class=\"mf\">1.0<\/span> <span class=\"o\">\/<\/span> <span class=\"n\">np<\/span><span class=\"o\">.<\/span><span class=\"n\">arange<\/span><span class=\"p\">(<\/span><span class=\"mf\">1.0<\/span><span class=\"p\">,<\/span><span class=\"mf\">5.0<\/span><span class=\"p\">)<\/span>\n\n<span class=\"k\">print<\/span> <span class=\"s\">\"with dot(&lt;data&gt;, beta)\"<\/span>\n<span class=\"k\">print<\/span>  <span class=\"n\">x<\/span><span class=\"o\">.<\/span><span class=\"n\">shape<\/span><span class=\"p\">,<\/span> <span class=\"n\">beta<\/span><span class=\"o\">.<\/span><span class=\"n\">shape<\/span><span class=\"p\">,<\/span> <span class=\"n\">np<\/span><span class=\"o\">.<\/span><span class=\"n\">dot<\/span><span class=\"p\">(<\/span> <span class=\"n\">x<\/span><span class=\"p\">,<\/span> <span class=\"n\">beta<\/span><span class=\"p\">)<\/span>\n<span class=\"k\">print<\/span> <span class=\"n\">xs<\/span><span class=\"o\">.<\/span><span class=\"n\">shape<\/span><span 
class=\"p\">,<\/span> <span class=\"n\">beta<\/span><span class=\"o\">.<\/span><span class=\"n\">shape<\/span><span class=\"p\">,<\/span> <span class=\"n\">np<\/span><span class=\"o\">.<\/span><span class=\"n\">dot<\/span><span class=\"p\">(<\/span><span class=\"n\">xs<\/span><span class=\"p\">,<\/span> <span class=\"n\">beta<\/span><span class=\"p\">)<\/span>\n\n<span class=\"k\">print<\/span> <span class=\"s\">\"with dot(beta, &lt;data&gt;)\"<\/span>\n<span class=\"k\">print<\/span> <span class=\"n\">beta<\/span><span class=\"o\">.<\/span><span class=\"n\">shape<\/span><span class=\"p\">,<\/span>  <span class=\"n\">x<\/span><span class=\"o\">.<\/span><span class=\"n\">shape<\/span><span class=\"p\">,<\/span> <span class=\"n\">np<\/span><span class=\"o\">.<\/span><span class=\"n\">dot<\/span><span class=\"p\">(<\/span><span class=\"n\">beta<\/span><span class=\"p\">,<\/span>  <span class=\"n\">x<\/span><span class=\"p\">)<\/span>\n<span class=\"k\">try<\/span><span class=\"p\">:<\/span>\n    <span class=\"k\">print<\/span> <span class=\"n\">beta<\/span><span class=\"o\">.<\/span><span class=\"n\">shape<\/span><span class=\"p\">,<\/span> <span class=\"n\">xs<\/span><span class=\"o\">.<\/span><span class=\"n\">shape<\/span><span class=\"p\">,<\/span> <span class=\"n\">np<\/span><span class=\"o\">.<\/span><span class=\"n\">dot<\/span><span class=\"p\">(<\/span><span class=\"n\">beta<\/span><span class=\"p\">,<\/span> <span class=\"n\">xs<\/span><span class=\"p\">)<\/span>\n<span class=\"k\">except<\/span> <span class=\"ne\">ValueError<\/span><span class=\"p\">:<\/span>\n    <span class=\"k\">print<\/span> <span class=\"s\">\"failed\"<\/span>\n<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"output_wrapper\">\n<div class=\"output\">\n<div class=\"output_area\">\n<div class=\"prompt\"><\/div>\n<div class=\"output_subarea output_stream output_stdout output_text\">\n<pre>with dot(&lt;data&gt;, beta)\n(4,) (4,) 4.0\n(2, 4) (4,) [ 4.  
4.]\nwith dot(beta, &lt;data&gt;)\n(4,) (4,) 4.0\n(4,) (2, 4) failed\n<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\"><\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<h3 id=\"final-words\">Final Words<\/h3>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\"><\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>You might be wondering why I spent so much time working through the mathematical and computational representations, only to point out the one that has some pretty clear advantages (the implicit <span class=\"math inline\">\\(0\\)<\/span>-index constant). The reason is that many (most? all?) attempts to express mathematics as computation (and to translate from mathematics to computation) run into similar trade-offs between efficiency and representation.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\"><\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<h3>Additional Resources<\/h3>\n<p>You can grab a <a href=\"http:\/\/drsfenner.org\/public\/notebooks\/LinearRegression1.ipynb\">copy of this notebook<\/a>.<\/p>\n<p>Even better, you can <a href=\"http:\/\/nbviewer.ipython.org\/url\/drsfenner.org\/public\/notebooks\/LinearRegression1.ipynb\">view it using nbviewer<\/a>.<\/p>\n<h3>License<\/h3>\n<p>Unless otherwise noted, the contents of this notebook are under the following license. 
The code in the notebook should be considered part of the text (i.e., licensed and treated as follows).<\/p>\n<p><a href=\"http:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/\" rel=\"license\"><img decoding=\"async\" style=\"border-width: 0;\" src=\"https:\/\/i.creativecommons.org\/l\/by-nc-sa\/4.0\/88x31.png\" alt=\"Creative Commons License\" \/><\/a><br \/>\nDrsFenner.org Blog And Notebooks by <a href=\"drsfenner.org\" rel=\"cc:attributionURL\">Mark and Barbara Fenner<\/a> is licensed under a <a href=\"http:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/\" rel=\"license\">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License<\/a>.<br \/>\nPermissions beyond the scope of this license may be available at <a href=\"drsfenner.org\/blog\/about-and-contacts\" rel=\"cc:morePermissions\">drsfenner.org\/blog\/about-and-contacts<\/a>.<\/p>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>The precision of mathematical notation is a two-edged sword. On one side, it should clearly and concisely represent formal statements of mathematics. On the other side, sloppy notation, abbreviations, and time\/space-saving shortcuts (mainly in the writing of mathematics) can lead to ambiguity &#8212; which is a time sink for the reader! 
Even acceptable mathematical notation [&hellip;]<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6,7],"tags":[],"class_list":["post-373","post","type-post","status-publish","format-standard","hentry","category-mrdr","category-sci-math-stat-python"],"_links":{"self":[{"href":"https:\/\/drsfenner.org\/blog\/wp-json\/wp\/v2\/posts\/373","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/drsfenner.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/drsfenner.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/drsfenner.org\/blog\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/drsfenner.org\/blog\/wp-json\/wp\/v2\/comments?post=373"}],"version-history":[{"count":4,"href":"https:\/\/drsfenner.org\/blog\/wp-json\/wp\/v2\/posts\/373\/revisions"}],"predecessor-version":[{"id":468,"href":"https:\/\/drsfenner.org\/blog\/wp-json\/wp\/v2\/posts\/373\/revisions\/468"}],"wp:attachment":[{"href":"https:\/\/drsfenner.org\/blog\/wp-json\/wp\/v2\/media?parent=373"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/drsfenner.org\/blog\/wp-json\/wp\/v2\/categories?post=373"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/drsfenner.org\/blog\/wp-json\/wp\/v2\/tags?post=373"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}