{"id":392,"date":"2015-12-22T15:00:06","date_gmt":"2015-12-22T15:00:06","guid":{"rendered":"http:\/\/drsfenner.org\/blog\/?p=392"},"modified":"2016-02-05T16:30:24","modified_gmt":"2016-02-05T16:30:24","slug":"into-the-weeds-ii-lapack-lapacke-and-two-paths-without-copies","status":"publish","type":"post","link":"http:\/\/drsfenner.org\/blog\/2015\/12\/into-the-weeds-ii-lapack-lapacke-and-two-paths-without-copies\/","title":{"rendered":"Into the Weeds II &#8211; LAPACK, LAPACKE, and Two Paths without Copies"},"content":{"rendered":"<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\">\n<\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>In the <a href=\"http:\/\/drsfenner.org\/blog\/2015\/12\/into-the-weeds-i-lapack-lapacke-and-three-paths\/\">last post<\/a>, we tried to use lower-level LAPACK\/LAPACKE calls to remove some of the copying overhead of our operations. Well, it turns out that it didn&#8217;t work out so well (see the table at the end of the post). The fundamental reason why is that LAPACKE took the easy way out, when they decided to wrap LAPACK functions. In particular, whenever there is a C-\/Fortran-ordering issue (<a href=\"https:\/\/en.wikipedia.org\/wiki\/Row-major_order\">for a reminder<\/a>), LAPACKE simply makes a copy.<!--more--><\/p>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\">\n<\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<h3 id=\"diatribe\">Diatribe<\/h3>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\">\n<\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Honestly, I can&#8217;t believe it. LAPACK is <em>it<\/em>, when it comes to high-performance numerical libraries. And the &quot;major, official&quot; C-interface for LAPACK <em>PUNTS<\/em> on performance and makes a copy? In the modern world, Fortran is more and more of a historical footnote (outside of numerical analysis), C is <em>almost<\/em> a <em>lingua franca<\/em> underneath many other programming systems, and data is more likely to live in row-major order (&quot;natively&quot;), and &quot;FLOPS are free&quot; (memory is the bottleneck in many cases) &#8230; well, I won&#8217;t go on. Point is, if we &#8211; as engineers &#8211; want to make use of <em>LAPACK<\/em> proper and we live in C-land (which includes, at least, CPython), we have to make adjustments. Here, we&#8217;ll look at what those adjustments get us.<\/p>\n<p>BTW, we (in C-land) don&#8217;t have to clutch at LAPACK forever. There are other libraries out there &#8212; unfortunately, some of them are so wedded to LAPACK that they too, keep an emphasis on column-major data. Two that are more flexible, or at least allow for data in row- or column-major order to be used with high-performance numerical algorithms <em>without copying<\/em>, are <a href=\"http:\/\/www.cs.utexas.edu\/~flame\/web\/libFLAME.html\">libFLAME<\/a> and <a href=\"http:\/\/apfel.mathematik.uni-ulm.de\/~lehn\/FLENS\/index.html\">FLENS<\/a>. Unfortunately, FLENS missed a great opportunity to move beyond the FORTRAN77 names (6 characters only) of raw BLAS and LAPACK. <i>*sigh*<\/i> Someone should make a <em>readable<\/em> interface around the FLENS library. One other, older project that looked interesting while I was researching these posts is <a href=\"http:\/\/homepage.math.uiowa.edu\/~dstewart\/meschach\/\">Meschach<\/a> which is a pure C numerical linear algebra library.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\">\n<\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>One last comment: all hope is not lost for C-to-LAPACK. Here are some hints in the slow moving world of &quot;building a better mousetrap&quot; for numerical linear algebra. Two are from the GSL (GNU Scientific Library) mailing list and one from the LAPACK mailing list itself:<\/p>\n<ul>\n<li>https:\/\/sourceware.org\/ml\/gsl-discuss\/2008-q3\/msg00028.html<\/li>\n<li>https:\/\/sourceware.org\/ml\/gsl-discuss\/2008-q3\/msg00029.html<\/li>\n<li>and, especially, http:\/\/icl.cs.utk.edu\/lapack-forum\/viewtopic.php?f=14&amp;t=4469<\/li>\n<\/ul>\n<p>I think part of the issue is that the best people in the world for implementing (new) numerical methods &#8211; numerical methods researchers developing new, faster, more efficient, more stable algorithms &#8211; don&#8217;t actually <em>use<\/em> those methods very much. They don&#8217;t go on to write other, bigger systems in which the numerical method is &quot;just&quot; one cog. They quickly move on to making new, better numerical methods. And we should be thankful! (I am!) But, they implement the new methods in <em>the language they&#8217;ve always used: F77<\/em>. Then, they dump them in LAPACK and move on to the next research project. I&#8217;m not actually being critical. It&#8217;s just how the game works. But, for those of use on the application\/user side of the coin, we&#8217;re left wanting.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\">\n<\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>Enough talk. Let&#8217;s fight. Err, sorry about that. What I meant was: let&#8217;s see if we can actually remove the copy. Here&#8217;s some example code from LAPACKE_work that demonstrates what happens (see <a href=\"http:\/\/www.netlib.org\/lapack\/explore-html\/db\/d32\/lapacke__dgelsd__work_8c_source.html\">lapacke_dgelsd_work<\/a> lines 43-86).<\/p>\n<div class=\"sourceCode\">\n<pre class=\"sourceCode c\"><code class=\"sourceCode c\">     <span class=\"kw\">if<\/span>( matrix_layout == LAPACK_COL_MAJOR ) {\r\n            <span class=\"co\">\/* Call LAPACK function and adjust info *\/<\/span>\r\n            ...\r\n     } <span class=\"kw\">else<\/span> <span class=\"kw\">if<\/span>( matrix_layout == LAPACK_ROW_MAJOR ) {\r\n            ...\r\n           <span class=\"co\">\/* Allocate memory for temporary array(s) *\/<\/span>\r\n           a_t = (<span class=\"dt\">double<\/span>*)LAPACKE_malloc( <span class=\"kw\">sizeof<\/span>(<span class=\"dt\">double<\/span>) * lda_t * MAX(<span class=\"dv\">1<\/span>,n) );\r\n           ...\r\n           b_t = (<span class=\"dt\">double<\/span>*)LAPACKE_malloc( <span class=\"kw\">sizeof<\/span>(<span class=\"dt\">double<\/span>) * ldb_t * MAX(<span class=\"dv\">1<\/span>,nrhs) );\r\n           ...\r\n           <span class=\"co\">\/* Transpose input matrices *\/<\/span>\r\n           LAPACKE_dge_trans( matrix_layout, m, n, a, lda, a_t, lda_t );\r\n           LAPACKE_dge_trans( matrix_layout, MAX(m,n), nrhs, b, ldb, b_t, ldb_t );\r\n           <span class=\"co\">\/* Call LAPACK function and adjust info *\/<\/span>\r\n           ...\r\n     }<\/code><\/pre>\n<\/div>\n<p>So, the punch-line is: if we want to avoid a transpose-copy, then we must call LAPACKE with a LAPACK_COL_MAJOR matrix. The easiest way to do that is to <em>create our data in column-major (Fortran) order<\/em>. In some respects, this also punts on the issue. If we have a large, C-based system, we can expect to work with our data in row-major order. But, we&#8217;ll take a mulligan there and look at it in another post. As a hint, can we compute the decomposition of a <em>transposed<\/em> matrix and answer questions about the <em>original, non-transposed<\/em> matrix. But that&#8217;s for next time. Let&#8217;s see what happens when we use Fortran-happy data.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\">\n<\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<h3 id=\"spin-up\">Spin Up<\/h3>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In&nbsp;[1]:<\/div>\n<div class=\"inner_cell\">\n<div class=\"input_area\">\n<div class=\" highlight hl-ipython2\">\n<pre><span class=\"kn\">import<\/span> <span class=\"nn\">operator<\/span> <span class=\"kn\">as<\/span> <span class=\"nn\">op<\/span>\r\n\r\n<span class=\"kn\">import<\/span> <span class=\"nn\">numpy<\/span> <span class=\"kn\">as<\/span> <span class=\"nn\">np<\/span>\r\n<span class=\"kn\">import<\/span> <span class=\"nn\">numpy.linalg<\/span> <span class=\"kn\">as<\/span> <span class=\"nn\">nla<\/span>\r\n<span class=\"kn\">import<\/span> <span class=\"nn\">scipy.linalg<\/span> <span class=\"kn\">as<\/span> <span class=\"nn\">sla<\/span>\r\n\r\n<span class=\"kn\">from<\/span> <span class=\"nn\">cvxopt<\/span> <span class=\"kn\">import<\/span> <span class=\"n\">matrix<\/span> <span class=\"k\">as<\/span> <span class=\"n\">cvx_matrix<\/span>\r\n<span class=\"kn\">import<\/span> <span class=\"nn\">cvxopt.lapack<\/span> <span class=\"kn\">as<\/span> <span class=\"nn\">cla<\/span>\r\n\r\n<span class=\"kn\">from<\/span> <span class=\"nn\">liblapack<\/span> <span class=\"kn\">import<\/span> <span class=\"n\">liblapack<\/span> <span class=\"k\">as<\/span> <span class=\"n\">lap<\/span>\r\n<span class=\"kn\">from<\/span> <span class=\"nn\">cffi<\/span> <span class=\"kn\">import<\/span> <span class=\"n\">FFI<\/span>\r\n<span class=\"n\">ffi<\/span> <span class=\"o\">=<\/span> <span class=\"n\">FFI<\/span><span class=\"p\">()<\/span>\r\n\r\n<span class=\"c\">#<\/span>\r\n<span class=\"c\"># numpy array to underlying C array<\/span>\r\n<span class=\"c\">#<\/span>\r\n\r\n<span class=\"k\">def<\/span> <span class=\"nf\">get_dptr<\/span><span class=\"p\">(<\/span><span class=\"n\">arr<\/span><span class=\"p\">):<\/span>\r\n    <span class=\"k\">return<\/span> <span class=\"n\">ffi<\/span><span class=\"o\">.<\/span><span class=\"n\">cast<\/span><span class=\"p\">(<\/span><span class=\"s\">&quot;double *&quot;<\/span><span class=\"p\">,<\/span> <span class=\"n\">arr<\/span><span class=\"o\">.<\/span><span class=\"n\">__array_interface__<\/span><span class=\"p\">[<\/span><span class=\"s\">&#39;data&#39;<\/span><span class=\"p\">][<\/span><span class=\"mi\">0<\/span><span class=\"p\">])<\/span>\r\n\r\n<span class=\"k\">def<\/span> <span class=\"nf\">get_iptr<\/span><span class=\"p\">(<\/span><span class=\"n\">arr<\/span><span class=\"p\">):<\/span>\r\n    <span class=\"k\">return<\/span> <span class=\"n\">ffi<\/span><span class=\"o\">.<\/span><span class=\"n\">cast<\/span><span class=\"p\">(<\/span><span class=\"s\">&quot;int *&quot;<\/span><span class=\"p\">,<\/span> <span class=\"n\">arr<\/span><span class=\"o\">.<\/span><span class=\"n\">__array_interface__<\/span><span class=\"p\">[<\/span><span class=\"s\">&#39;data&#39;<\/span><span class=\"p\">][<\/span><span class=\"mi\">0<\/span><span class=\"p\">])<\/span>\r\n\r\n<span class=\"o\">%<\/span><span class=\"k\">load_ext<\/span> memory_profiler\r\n<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\">\n<\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<h3 id=\"data-creation-with-an-ecumenical-flavor\">Data Creation (with an Ecumenical Flavor)<\/h3>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\">\n<\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>I was going to call this agnostic data creation, but really, it is creating data in either row- or column-major order. So, ecumenical felt right. Here goes. We have to rely on the correct interactions between <code>np.dot<\/code> and NumPy&#8217;s broadcasting mechanisms to properly <em>apply<\/em> our model (i.e., the coefficients) to our data. On lines 14 and 18, the net effect of swapping <code>X.dot(coeffs)<\/code> for <code>coeffs.dot(X)<\/code> is that we get <span class=\"math inline\">\\(row\\ x\\ col\\)<\/span> in the first case and <span class=\"math inline\">\\(row\\ x\\ row\\)<\/span> in the second. Two other notes on the code: (1) the single line <code>def<\/code>&#8216;s are not often seen in Python &#8211; that also works for any <code>:<\/code> terminated, compound-statement starter and (2) on line #22, the argument to <code>Y.reshape<\/code> is selected by a ternery operator that picks the shape. This is to keep a nicely parallel <code>X.T,Y.T<\/code> in the <code>return<\/code> statement. If you can&#8217;t see line numbers, when you read this post, just load the raw notebook (there&#8217;s a link at the bottom), navigate to this cell, and hit Control-M followed by L.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In&nbsp;[2]:<\/div>\n<div class=\"inner_cell\">\n<div class=\"input_area\">\n<div class=\" highlight hl-ipython2\">\n<pre><span class=\"k\">def<\/span> <span class=\"nf\">create_data<\/span><span class=\"p\">(<\/span><span class=\"n\">n<\/span><span class=\"o\">=<\/span><span class=\"mi\">5000<\/span><span class=\"p\">,<\/span> <span class=\"n\">p<\/span><span class=\"o\">=<\/span><span class=\"mi\">100<\/span><span class=\"p\">,<\/span> <span class=\"n\">order<\/span><span class=\"o\">=<\/span><span class=\"s\">&quot;C&quot;<\/span><span class=\"p\">):<\/span>\r\n    <span class=\"sd\">&#39;&#39;&#39; <\/span>\r\n<span class=\"sd\">        n is number cases\/observations\/examples<\/span>\r\n<span class=\"sd\">        p is number of features\/attributes\/variables<\/span>\r\n\r\n<span class=\"sd\">        order = &quot;F&quot; or &quot;T&quot; to create fortran\/transposed data<\/span>\r\n<span class=\"sd\">    &#39;&#39;&#39;<\/span>\r\n    <span class=\"k\">assert<\/span> <span class=\"n\">order<\/span> <span class=\"ow\">in<\/span> <span class=\"p\">{<\/span><span class=\"s\">&quot;F&quot;<\/span><span class=\"p\">,<\/span> <span class=\"s\">&quot;T&quot;<\/span><span class=\"p\">,<\/span> <span class=\"s\">&quot;C&quot;<\/span><span class=\"p\">}<\/span>\r\n\r\n    <span class=\"n\">coeffs<\/span> <span class=\"o\">=<\/span> <span class=\"n\">np<\/span><span class=\"o\">.<\/span><span class=\"n\">random<\/span><span class=\"o\">.<\/span><span class=\"n\">uniform<\/span><span class=\"p\">(<\/span><span class=\"mi\">0<\/span><span class=\"p\">,<\/span><span class=\"mi\">10<\/span><span class=\"p\">,<\/span> <span class=\"n\">p<\/span><span class=\"p\">)<\/span>\r\n    <span class=\"k\">if<\/span> <span class=\"n\">order<\/span> <span class=\"o\">==<\/span> <span class=\"s\">&quot;C&quot;<\/span><span class=\"p\">:<\/span>\r\n        <span class=\"n\">X<\/span> <span class=\"o\">=<\/span> <span class=\"n\">np<\/span><span class=\"o\">.<\/span><span class=\"n\">random<\/span><span class=\"o\">.<\/span><span class=\"n\">uniform<\/span><span class=\"p\">(<\/span><span class=\"mi\">0<\/span><span class=\"p\">,<\/span><span class=\"mi\">10<\/span><span class=\"p\">,<\/span> <span class=\"p\">(<\/span><span class=\"n\">n<\/span><span class=\"p\">,<\/span><span class=\"n\">p<\/span><span class=\"p\">))<\/span>\r\n        <span class=\"c\"># row x col<\/span>\r\n        <span class=\"k\">def<\/span> <span class=\"nf\">f<\/span><span class=\"p\">(<\/span><span class=\"n\">X<\/span><span class=\"p\">):<\/span> <span class=\"k\">return<\/span> <span class=\"n\">X<\/span><span class=\"o\">.<\/span><span class=\"n\">dot<\/span><span class=\"p\">(<\/span><span class=\"n\">coeffs<\/span><span class=\"p\">)<\/span>        \r\n    <span class=\"k\">else<\/span><span class=\"p\">:<\/span>\r\n        <span class=\"n\">X<\/span> <span class=\"o\">=<\/span> <span class=\"n\">np<\/span><span class=\"o\">.<\/span><span class=\"n\">random<\/span><span class=\"o\">.<\/span><span class=\"n\">uniform<\/span><span class=\"p\">(<\/span><span class=\"mi\">0<\/span><span class=\"p\">,<\/span><span class=\"mi\">10<\/span><span class=\"p\">,<\/span> <span class=\"p\">(<\/span><span class=\"n\">p<\/span><span class=\"p\">,<\/span><span class=\"n\">n<\/span><span class=\"p\">))<\/span>\r\n        <span class=\"c\"># row x row<\/span>\r\n        <span class=\"k\">def<\/span> <span class=\"nf\">f<\/span><span class=\"p\">(<\/span><span class=\"n\">X<\/span><span class=\"p\">):<\/span> <span class=\"k\">return<\/span> <span class=\"n\">coeffs<\/span><span class=\"o\">.<\/span><span class=\"n\">dot<\/span><span class=\"p\">(<\/span><span class=\"n\">X<\/span><span class=\"p\">)<\/span>\r\n\r\n    <span class=\"n\">noise<\/span> <span class=\"o\">=<\/span> <span class=\"n\">np<\/span><span class=\"o\">.<\/span><span class=\"n\">random<\/span><span class=\"o\">.<\/span><span class=\"n\">normal<\/span><span class=\"p\">(<\/span><span class=\"mf\">0.0<\/span><span class=\"p\">,<\/span> <span class=\"mf\">2.0<\/span><span class=\"p\">,<\/span> <span class=\"n\">n<\/span><span class=\"p\">)<\/span>\r\n    <span class=\"n\">Y<\/span> <span class=\"o\">=<\/span> <span class=\"n\">f<\/span><span class=\"p\">(<\/span><span class=\"n\">X<\/span><span class=\"p\">)<\/span> <span class=\"o\">+<\/span> <span class=\"n\">noise<\/span>\r\n    <span class=\"n\">Y<\/span> <span class=\"o\">=<\/span> <span class=\"n\">Y<\/span><span class=\"o\">.<\/span><span class=\"n\">reshape<\/span><span class=\"p\">((<\/span><span class=\"n\">n<\/span><span class=\"p\">,<\/span><span class=\"mi\">1<\/span><span class=\"p\">)<\/span> <span class=\"k\">if<\/span> <span class=\"n\">order<\/span><span class=\"o\">==<\/span><span class=\"s\">&quot;C&quot;<\/span> <span class=\"k\">else<\/span> <span class=\"p\">(<\/span><span class=\"mi\">1<\/span><span class=\"p\">,<\/span><span class=\"n\">n<\/span><span class=\"p\">))<\/span> <span class=\"c\"># too cute<\/span>\r\n\r\n    <span class=\"k\">return<\/span> <span class=\"p\">(<\/span><span class=\"n\">X<\/span><span class=\"p\">,<\/span><span class=\"n\">Y<\/span><span class=\"p\">)<\/span> <span class=\"k\">if<\/span> <span class=\"n\">order<\/span><span class=\"o\">==<\/span><span class=\"s\">&quot;C&quot;<\/span> <span class=\"k\">else<\/span> <span class=\"p\">(<\/span><span class=\"n\">X<\/span><span class=\"o\">.<\/span><span class=\"n\">T<\/span><span class=\"p\">,<\/span> <span class=\"n\">Y<\/span><span class=\"o\">.<\/span><span class=\"n\">T<\/span><span class=\"p\">)<\/span>\r\n\r\n<span class=\"c\">#<\/span>\r\n<span class=\"c\"># Don&#39;t Forget Us!<\/span>\r\n<span class=\"c\">#<\/span>\r\n<span class=\"n\">n_obs<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">2<\/span><span class=\"o\">**<\/span><span class=\"mi\">16<\/span>\r\n<span class=\"n\">n_ftrs<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">2<\/span><span class=\"o\">**<\/span><span class=\"mi\">10<\/span>\r\n<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\">\n<\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<h3 id=\"go-time\">Go Time<\/h3>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\">\n<\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>There are a few arguments that we have to pull out of <code>X<\/code> and <code>Y<\/code> for each LAPACKE call. This makes life easier below and it also let&#8217;s us massage away order problems in one spot.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In&nbsp;[3]:<\/div>\n<div class=\"inner_cell\">\n<div class=\"input_area\">\n<div class=\" highlight hl-ipython2\">\n<pre><span class=\"c\">#<\/span>\r\n<span class=\"c\"># don&#39;t repeat yourself (not lazy - efficient! at least for the programmer)<\/span>\r\n<span class=\"c\"># <\/span>\r\n<span class=\"k\">def<\/span> <span class=\"nf\">make_common_args<\/span><span class=\"p\">(<\/span><span class=\"n\">X<\/span><span class=\"p\">,<\/span> <span class=\"n\">Y<\/span><span class=\"p\">,<\/span> <span class=\"n\">order<\/span><span class=\"o\">=<\/span><span class=\"s\">&quot;C&quot;<\/span><span class=\"p\">):<\/span>\r\n    <span class=\"n\">m<\/span><span class=\"p\">,<\/span><span class=\"n\">n<\/span>  <span class=\"o\">=<\/span> <span class=\"n\">X<\/span><span class=\"o\">.<\/span><span class=\"n\">shape<\/span>\r\n    <span class=\"n\">nrhs<\/span> <span class=\"o\">=<\/span>  <span class=\"n\">Y<\/span><span class=\"o\">.<\/span><span class=\"n\">shape<\/span><span class=\"p\">[<\/span><span class=\"mi\">1<\/span><span class=\"p\">]<\/span> <span class=\"k\">if<\/span> <span class=\"n\">Y<\/span><span class=\"o\">.<\/span><span class=\"n\">ndim<\/span> <span class=\"o\">==<\/span> <span class=\"mi\">2<\/span> <span class=\"k\">else<\/span> <span class=\"mi\">1<\/span>\r\n    \r\n    <span class=\"k\">if<\/span> <span class=\"n\">order<\/span><span class=\"o\">==<\/span><span class=\"s\">&quot;C&quot;<\/span><span class=\"p\">:<\/span>\r\n        <span class=\"n\">ldX<\/span><span class=\"p\">,<\/span> <span class=\"n\">ldY<\/span><span class=\"p\">,<\/span> <span class=\"n\">la_order<\/span> <span class=\"o\">=<\/span> <span class=\"n\">n<\/span><span class=\"p\">,<\/span> <span class=\"n\">nrhs<\/span><span class=\"p\">,<\/span> <span class=\"n\">lap<\/span><span class=\"o\">.<\/span><span class=\"n\">LAPACK_ROW_MAJOR<\/span>\r\n    <span class=\"k\">else<\/span><span class=\"p\">:<\/span>\r\n        <span class=\"n\">ldX<\/span><span class=\"p\">,<\/span> <span class=\"n\">ldY<\/span><span class=\"p\">,<\/span> <span class=\"n\">la_order<\/span> <span class=\"o\">=<\/span> <span class=\"n\">m<\/span><span class=\"p\">,<\/span> <span class=\"n\">m<\/span><span class=\"p\">,<\/span>    <span class=\"n\">lap<\/span><span class=\"o\">.<\/span><span class=\"n\">LAPACK_COL_MAJOR<\/span>\r\n\r\n    <span class=\"k\">return<\/span> <span class=\"n\">m<\/span><span class=\"p\">,<\/span> <span class=\"n\">n<\/span><span class=\"p\">,<\/span> <span class=\"n\">nrhs<\/span><span class=\"p\">,<\/span> <span class=\"n\">ldX<\/span><span class=\"p\">,<\/span> <span class=\"n\">ldY<\/span><span class=\"p\">,<\/span> <span class=\"n\">la_order<\/span>\r\n<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\">\n<\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>And a couple of short routines to use the underlying decompositions via LAPACK routines whose primary duty is to solve least-squares. In <code>sls_svd<\/code>, we allocate some memory on the heap to store the singular values. LAPACK itself (not LAPACKE) goes to great lengths to avoid this. &quot;All&quot; (as near as I can tell) memory must be allocated in the parent caller &#8211; hence the need for <code>WORK<\/code> arrays.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In&nbsp;[4]:<\/div>\n<div class=\"inner_cell\">\n<div class=\"input_area\">\n<div class=\" highlight hl-ipython2\">\n<pre><span class=\"k\">def<\/span> <span class=\"nf\">sls_qr<\/span><span class=\"p\">(<\/span><span class=\"n\">X<\/span><span class=\"p\">,<\/span> <span class=\"n\">Y<\/span><span class=\"p\">,<\/span> <span class=\"n\">order<\/span><span class=\"o\">=<\/span><span class=\"s\">&quot;C&quot;<\/span><span class=\"p\">):<\/span>\r\n    <span class=\"n\">m<\/span><span class=\"p\">,<\/span> <span class=\"n\">n<\/span><span class=\"p\">,<\/span> <span class=\"n\">nrhs<\/span><span class=\"p\">,<\/span> <span class=\"n\">ldX<\/span><span class=\"p\">,<\/span> <span class=\"n\">ldY<\/span><span class=\"p\">,<\/span> <span class=\"n\">la_order<\/span> <span class=\"o\">=<\/span> <span class=\"n\">make_common_args<\/span><span class=\"p\">(<\/span><span class=\"n\">X<\/span><span class=\"p\">,<\/span> <span class=\"n\">Y<\/span><span class=\"p\">,<\/span> <span class=\"n\">order<\/span><span class=\"p\">)<\/span>\r\n    \r\n    <span class=\"n\">info<\/span> <span class=\"o\">=<\/span> <span class=\"n\">lap<\/span><span class=\"o\">.<\/span><span class=\"n\">LAPACKE_dgels<\/span><span class=\"p\">(<\/span><span class=\"n\">la_order<\/span><span class=\"p\">,<\/span>\r\n                             <span class=\"s\">&quot;N&quot;<\/span><span class=\"p\">,<\/span> <span class=\"c\"># do not transpose<\/span>\r\n                             <span class=\"n\">m<\/span><span class=\"p\">,<\/span> <span class=\"n\">n<\/span><span class=\"p\">,<\/span> <span class=\"n\">nrhs<\/span><span class=\"p\">,<\/span>\r\n                             <span class=\"n\">get_dptr<\/span><span class=\"p\">(<\/span><span class=\"n\">X<\/span><span class=\"p\">),<\/span> <span class=\"n\">ldX<\/span><span class=\"p\">,<\/span>\r\n                             <span class=\"n\">get_dptr<\/span><span class=\"p\">(<\/span><span class=\"n\">Y<\/span><span class=\"p\">),<\/span> <span class=\"n\">ldY<\/span><span class=\"p\">)<\/span>\r\n    <span class=\"k\">assert<\/span> <span class=\"ow\">not<\/span> <span class=\"n\">info<\/span>\r\n    <span class=\"k\">return<\/span> <span class=\"n\">Y<\/span><span class=\"p\">[:<\/span><span class=\"n\">n<\/span><span class=\"p\">]<\/span>\r\n\r\n\r\n<span class=\"k\">def<\/span> <span class=\"nf\">sls_svd<\/span><span class=\"p\">(<\/span><span class=\"n\">X<\/span><span class=\"p\">,<\/span><span class=\"n\">Y<\/span><span class=\"p\">,<\/span><span class=\"n\">order<\/span><span class=\"o\">=<\/span><span class=\"s\">&quot;C&quot;<\/span><span class=\"p\">):<\/span>\r\n    <span class=\"n\">m<\/span><span class=\"p\">,<\/span> <span class=\"n\">n<\/span><span class=\"p\">,<\/span> <span class=\"n\">nrhs<\/span><span class=\"p\">,<\/span> <span class=\"n\">ldX<\/span><span class=\"p\">,<\/span> <span class=\"n\">ldY<\/span><span class=\"p\">,<\/span> <span class=\"n\">la_order<\/span> <span class=\"o\">=<\/span> <span class=\"n\">make_common_args<\/span><span class=\"p\">(<\/span><span class=\"n\">X<\/span><span class=\"p\">,<\/span> <span class=\"n\">Y<\/span><span class=\"p\">,<\/span> <span class=\"n\">order<\/span><span class=\"p\">)<\/span>\r\n\r\n    <span class=\"n\">singular_values<\/span> <span class=\"o\">=<\/span> <span class=\"n\">np<\/span><span class=\"o\">.<\/span><span class=\"n\">empty<\/span><span class=\"p\">(<\/span><span class=\"nb\">min<\/span><span class=\"p\">(<\/span><span class=\"n\">m<\/span><span class=\"p\">,<\/span><span class=\"n\">n<\/span><span class=\"p\">))<\/span>\r\n    <span class=\"n\">rank<\/span> <span class=\"o\">=<\/span> <span class=\"n\">np<\/span><span class=\"o\">.<\/span><span class=\"n\">zeros<\/span><span class=\"p\">(<\/span><span class=\"mi\">1<\/span><span class=\"p\">,<\/span> <span class=\"n\">dtype<\/span><span class=\"o\">=<\/span><span class=\"n\">np<\/span><span class=\"o\">.<\/span><span class=\"n\">int64<\/span><span class=\"p\">)<\/span>\r\n\r\n    <span class=\"n\">info<\/span> <span class=\"o\">=<\/span> <span class=\"n\">lap<\/span><span class=\"o\">.<\/span><span class=\"n\">LAPACKE_dgelsd<\/span><span class=\"p\">(<\/span><span class=\"n\">la_order<\/span><span class=\"p\">,<\/span>\r\n                              <span class=\"n\">m<\/span><span class=\"p\">,<\/span> <span class=\"n\">n<\/span><span class=\"p\">,<\/span> <span class=\"n\">nrhs<\/span><span class=\"p\">,<\/span>\r\n                              <span class=\"n\">get_dptr<\/span><span class=\"p\">(<\/span><span class=\"n\">X<\/span><span class=\"p\">),<\/span> <span class=\"n\">ldX<\/span><span class=\"p\">,<\/span>\r\n                              <span class=\"n\">get_dptr<\/span><span class=\"p\">(<\/span><span class=\"n\">Y<\/span><span class=\"p\">),<\/span> <span class=\"n\">ldY<\/span><span class=\"p\">,<\/span>\r\n                              <span class=\"n\">get_dptr<\/span><span class=\"p\">(<\/span><span class=\"n\">singular_values<\/span><span class=\"p\">),<\/span>\r\n                              <span class=\"o\">-<\/span><span class=\"mi\">1<\/span><span class=\"p\">,<\/span> \r\n                              <span class=\"n\">get_iptr<\/span><span class=\"p\">(<\/span><span class=\"n\">rank<\/span><span class=\"p\">))<\/span>\r\n    <span class=\"k\">assert<\/span> <span class=\"ow\">not<\/span> <span class=\"n\">info<\/span>\r\n    <span class=\"k\">return<\/span> <span class=\"n\">Y<\/span><span class=\"p\">[:<\/span><span class=\"n\">n<\/span><span class=\"p\">]<\/span>\r\n<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\">\n<\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<h3 id=\"same-results\">Same Results?<\/h3>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\">\n<\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>So, let&#8217;s run each of these routines with row- and column-major data and verify we get the same output.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In&nbsp;[5]:<\/div>\n<div class=\"inner_cell\">\n<div class=\"input_area\">\n<div class=\" highlight hl-ipython2\">\n<pre><span class=\"c\">#<\/span>\r\n<span class=\"c\"># we have to copy the data so we can use the same data in the next call<\/span>\r\n<span class=\"c\"># lapack\/e overwrite its inputs with the results (decomps and betas)<\/span>\r\n<span class=\"c\">#<\/span>\r\n<span class=\"k\">def<\/span> <span class=\"nf\">copy_and_do<\/span><span class=\"p\">(<\/span><span class=\"n\">func<\/span><span class=\"p\">,<\/span> <span class=\"n\">_X<\/span><span class=\"p\">,<\/span> <span class=\"n\">_Y<\/span><span class=\"p\">,<\/span> <span class=\"n\">order<\/span><span class=\"o\">=<\/span><span class=\"s\">&quot;C&quot;<\/span><span class=\"p\">):<\/span>\r\n    <span class=\"n\">X<\/span><span class=\"p\">,<\/span><span class=\"n\">Y<\/span> <span class=\"o\">=<\/span> <span class=\"n\">np<\/span><span class=\"o\">.<\/span><span class=\"n\">copy<\/span><span class=\"p\">(<\/span><span class=\"n\">_X<\/span><span class=\"p\">,<\/span> <span class=\"n\">order<\/span><span class=\"p\">),<\/span> <span class=\"n\">np<\/span><span class=\"o\">.<\/span><span class=\"n\">copy<\/span><span class=\"p\">(<\/span><span class=\"n\">_Y<\/span><span class=\"p\">,<\/span> <span class=\"n\">order<\/span><span class=\"p\">)<\/span>\r\n    <span class=\"k\">return<\/span> <span class=\"n\">func<\/span><span class=\"p\">(<\/span><span class=\"n\">X<\/span><span class=\"p\">,<\/span><span class=\"n\">Y<\/span><span class=\"p\">,<\/span><span class=\"n\">order<\/span><span class=\"p\">)<\/span>\r\n\r\n<span class=\"n\">_X<\/span><span class=\"p\">,<\/span><span class=\"n\">_Y<\/span> <span class=\"o\">=<\/span> <span class=\"n\">create_data<\/span><span class=\"p\">(<\/span><span class=\"n\">n_obs<\/span><span class=\"p\">,<\/span> <span class=\"n\">n_ftrs<\/span><span class=\"p\">)<\/span>\r\n\r\n<span class=\"n\">betas_qr_c<\/span>  <span class=\"o\">=<\/span> <span class=\"n\">copy_and_do<\/span><span class=\"p\">(<\/span><span class=\"n\">sls_qr<\/span><span class=\"p\">,<\/span><span class=\"n\">_X<\/span><span class=\"p\">,<\/span><span class=\"n\">_Y<\/span><span class=\"p\">)<\/span>\r\n<span class=\"n\">betas_qr_f<\/span>  <span class=\"o\">=<\/span> <span class=\"n\">copy_and_do<\/span><span class=\"p\">(<\/span><span class=\"n\">sls_qr<\/span><span class=\"p\">,<\/span><span class=\"n\">_X<\/span><span class=\"p\">,<\/span><span class=\"n\">_Y<\/span><span class=\"p\">,<\/span> <span class=\"s\">&quot;F&quot;<\/span><span class=\"p\">)<\/span>\r\n<span class=\"n\">betas_svd_c<\/span> <span class=\"o\">=<\/span> <span class=\"n\">copy_and_do<\/span><span class=\"p\">(<\/span><span class=\"n\">sls_svd<\/span><span class=\"p\">,<\/span><span class=\"n\">_X<\/span><span class=\"p\">,<\/span><span class=\"n\">_Y<\/span><span class=\"p\">)<\/span>\r\n<span class=\"n\">betas_svd_f<\/span> <span class=\"o\">=<\/span> <span class=\"n\">copy_and_do<\/span><span class=\"p\">(<\/span><span class=\"n\">sls_svd<\/span><span class=\"p\">,<\/span><span class=\"n\">_X<\/span><span class=\"p\">,<\/span><span class=\"n\">_Y<\/span><span class=\"p\">,<\/span> <span class=\"s\">&quot;F&quot;<\/span><span class=\"p\">)<\/span>\r\n\r\n<span class=\"n\">betas<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[<\/span><span class=\"n\">betas_qr_c<\/span><span class=\"p\">,<\/span> <span class=\"n\">betas_qr_f<\/span><span class=\"p\">,<\/span> <span class=\"n\">betas_svd_c<\/span><span class=\"p\">,<\/span> <span class=\"n\">betas_svd_f<\/span><span class=\"p\">]<\/span>\r\n<span class=\"k\">print<\/span> <span class=\"s\">&quot;All same betas?:&quot;<\/span><span class=\"p\">,<\/span> <span class=\"nb\">all<\/span><span class=\"p\">(<\/span><span class=\"n\">np<\/span><span class=\"o\">.<\/span><span class=\"n\">allclose<\/span><span class=\"p\">(<\/span><span class=\"n\">b1<\/span><span class=\"p\">,<\/span> <span class=\"n\">b2<\/span><span class=\"p\">)<\/span> <span class=\"k\">for<\/span> <span class=\"n\">b1<\/span> <span class=\"ow\">in<\/span> <span class=\"n\">betas<\/span><span class=\"p\">[:<\/span><span class=\"o\">-<\/span><span class=\"mi\">1<\/span><span class=\"p\">]<\/span> <span class=\"k\">for<\/span> <span class=\"n\">b2<\/span> <span class=\"ow\">in<\/span> <span class=\"n\">betas<\/span><span class=\"p\">[<\/span><span class=\"mi\">1<\/span><span class=\"p\">:])<\/span>\r\n<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"output_wrapper\">\n<div class=\"output\">\n<div class=\"output_area\">\n<div class=\"prompt\"><\/div>\n<div class=\"output_subarea output_stream output_stdout output_text\">\n<pre>All same betas?: True\r\n<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\">\n<\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<h3 id=\"memory-usage\">Memory Usage<\/h3>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In&nbsp;[6]:<\/div>\n<div class=\"inner_cell\">\n<div class=\"input_area\">\n<div class=\" highlight hl-ipython2\">\n<pre><span class=\"c\"># X,Y = create_data(n_obs, n_ftrs)<\/span>\r\n<span class=\"c\"># %memit sls_qr( X,Y) # peak memory: 1084.82 MiB, increment: 513.27 MiB<\/span>\r\n<span class=\"c\"># %memit sls_svd(X,Y) # peak memory: 1600.71 MiB, increment: 514.94 MiB<\/span>\r\n<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In&nbsp;[7]:<\/div>\n<div class=\"inner_cell\">\n<div class=\"input_area\">\n<div class=\" highlight hl-ipython2\">\n<pre><span class=\"c\"># X,Y = create_data(n_obs, n_ftrs, order=&#39;F&#39;)<\/span>\r\n<span class=\"c\"># %memit sls_qr( X,Y, order=&#39;F&#39;) # peak memory: 574.21 MiB, increment: 3.11 MiB<\/span>\r\n<span class=\"c\"># %memit sls_svd(X,Y, order=&#39;F&#39;) # peak memory: 574.00 MiB, increment: 2.90 MiB<\/span>\r\n<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\">\n<\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>And now, it&#8217;s time for a little happy dance: we avoided copies!<\/p>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\">\n<\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<h3 id=\"timing\">Timing<\/h3>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In&nbsp;[8]:<\/div>\n<div class=\"inner_cell\">\n<div class=\"input_area\">\n<div class=\" highlight hl-ipython2\">\n<pre><span class=\"n\">_X<\/span><span class=\"p\">,<\/span><span class=\"n\">_Y<\/span> <span class=\"o\">=<\/span> <span class=\"n\">create_data<\/span><span class=\"p\">(<\/span><span class=\"n\">n_obs<\/span><span class=\"p\">,<\/span> <span class=\"n\">n_ftrs<\/span><span class=\"p\">)<\/span>\r\n<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In&nbsp;[9]:<\/div>\n<div class=\"inner_cell\">\n<div class=\"input_area\">\n<div class=\" highlight hl-ipython2\">\n<pre><span class=\"c\"># forcing copies since all of these calls place the results in the input (overwrite our data)<\/span>\r\n<span class=\"n\">X<\/span><span class=\"p\">,<\/span><span class=\"n\">Y<\/span> <span class=\"o\">=<\/span> <span class=\"n\">np<\/span><span class=\"o\">.<\/span><span class=\"n\">copy<\/span><span class=\"p\">(<\/span><span class=\"n\">_X<\/span><span class=\"p\">),<\/span> <span class=\"n\">np<\/span><span class=\"o\">.<\/span><span class=\"n\">copy<\/span><span class=\"p\">(<\/span><span class=\"n\">_Y<\/span><span class=\"p\">)<\/span>\r\n<span class=\"o\">%<\/span><span class=\"k\">timeit<\/span> sls_qr(X,Y)\r\n\r\n<span class=\"n\">X<\/span><span class=\"p\">,<\/span><span class=\"n\">Y<\/span> <span class=\"o\">=<\/span> <span class=\"n\">np<\/span><span class=\"o\">.<\/span><span class=\"n\">copy<\/span><span class=\"p\">(<\/span><span class=\"n\">_X<\/span><span class=\"p\">),<\/span> <span class=\"n\">np<\/span><span class=\"o\">.<\/span><span class=\"n\">copy<\/span><span class=\"p\">(<\/span><span class=\"n\">_Y<\/span><span class=\"p\">)<\/span>\r\n<span class=\"o\">%<\/span><span class=\"k\">timeit<\/span> sls_svd(X,Y)\r\n<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"output_wrapper\">\n<div class=\"output\">\n<div class=\"output_area\">\n<div class=\"prompt\"><\/div>\n<div class=\"output_subarea output_stream output_stdout output_text\">\n<pre>1 loops, best of 3: 8.78 s per loop\r\n1 loops, best of 3: 9.02 s per loop\r\n<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing code_cell rendered\">\n<div class=\"input\">\n<div class=\"prompt input_prompt\">In&nbsp;[10]:<\/div>\n<div class=\"inner_cell\">\n<div class=\"input_area\">\n<div class=\" highlight hl-ipython2\">\n<pre><span class=\"c\"># another way to get &#39;F&#39; ordered data<\/span>\r\n<span class=\"n\">X<\/span><span class=\"p\">,<\/span><span class=\"n\">Y<\/span> <span class=\"o\">=<\/span> <span class=\"n\">np<\/span><span class=\"o\">.<\/span><span class=\"n\">copy<\/span><span class=\"p\">(<\/span><span class=\"n\">_X<\/span><span class=\"p\">,<\/span> <span class=\"s\">&quot;F&quot;<\/span><span class=\"p\">),<\/span> <span class=\"n\">np<\/span><span class=\"o\">.<\/span><span class=\"n\">copy<\/span><span class=\"p\">(<\/span><span class=\"n\">_Y<\/span><span class=\"p\">,<\/span> <span class=\"s\">&quot;F&quot;<\/span><span class=\"p\">)<\/span>\r\n<span class=\"o\">%<\/span><span class=\"k\">timeit<\/span> sls_qr(X,Y, &quot;F&quot;)\r\n\r\n<span class=\"n\">X<\/span><span class=\"p\">,<\/span><span class=\"n\">Y<\/span> <span class=\"o\">=<\/span> <span class=\"n\">np<\/span><span class=\"o\">.<\/span><span class=\"n\">copy<\/span><span class=\"p\">(<\/span><span class=\"n\">_X<\/span><span class=\"p\">,<\/span> <span class=\"s\">&quot;F&quot;<\/span><span class=\"p\">),<\/span> <span class=\"n\">np<\/span><span class=\"o\">.<\/span><span class=\"n\">copy<\/span><span class=\"p\">(<\/span><span class=\"n\">_Y<\/span><span class=\"p\">,<\/span> <span class=\"s\">&quot;F&quot;<\/span><span class=\"p\">)<\/span>\r\n<span class=\"o\">%<\/span><span class=\"k\">timeit<\/span> sls_svd(X,Y, &quot;F&quot;)\r\n<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"output_wrapper\">\n<div class=\"output\">\n<div class=\"output_area\">\n<div class=\"prompt\"><\/div>\n<div class=\"output_subarea output_stream output_stdout output_text\">\n<pre>1 loops, best of 3: 6.9 s per loop\r\n1 loops, best of 3: 7.31 s per loop\r\n<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\">\n<\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<h3 id=\"summary\">Summary<\/h3>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\">\n<\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<p>So, at long last, we finally got rid of the (large) copy operation and are simply doing the work of the algorithm. Both QR and SVD still take &quot;a lot&quot; of time (especially compared to Cholesky which was about 1.5s in the last post). But, Cholesky is far less stable than QR and SVD. SVD is generally overkill for solving least-squares, but sometimes rouding (numerical) error can result in a singular (not invertible) <span class=\"math inline\">\\(R\\)<\/span> and then SVD is necessary. An analog of those small errors appears in SVD calculations, but buried inside the SVD algorithm is a parameter that determines when &quot;small&quot; becomes &quot;zero&quot;. Our next step is to look at ways of dealing with the dreaded copy, without leaving C-land (or at least only leaving it when absolutely necessary). Then, we will look at the underlying algorithms themselves (oooooo, spooky!).<\/p>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"cell border-box-sizing text_cell rendered\">\n<div class=\"prompt input_prompt\">\n<\/div>\n<div class=\"inner_cell\">\n<div class=\"text_cell_render border-box-sizing rendered_html\">\n<h3>\nAdditional Resources<br \/>\n<\/h3>\n<p>\nYou can grab a <a href=\"http:\/\/drsfenner.org\/public\/notebooks\/OLS_LAPACK_and_LAPACKE_2.ipynb\">copy of this notebook<\/a>.\n<\/p>\n<p>\nEven better, you can <a href=\"http:\/\/nbviewer.ipython.org\/url\/drsfenner.org\/public\/notebooks\/OLS_LAPACK_and_LAPACKE_2.ipynb\">view it using nbviewer<\/a>.\n<\/p>\n<h3>\nLicense<br \/>\n<\/h3>\n<p>\nUnless otherwise noted, the contents of this notebook are under the following license. The code in the notebook should be considered part of the text (i.e., licensed and treated as as follows).\n<\/p>\n<p><a rel=\"license\" href=\"http:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/\"><img decoding=\"async\" alt=\"Creative Commons License\" style=\"border-width:0\" src=\"https:\/\/i.creativecommons.org\/l\/by-nc-sa\/4.0\/88x31.png\" \/><\/a><br \/><span xmlns:dct=\"http:\/\/purl.org\/dc\/terms\/\" property=\"dct:title\">DrsFenner.org Blog And Notebooks<\/span> by <a xmlns:cc=\"http:\/\/creativecommons.org\/ns#\" href=\"drsfenner.org\" property=\"cc:attributionName\" rel=\"cc:attributionURL\">Mark and Barbara Fenner<\/a> is licensed under a <a rel=\"license\" href=\"http:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/\">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License<\/a>.<br \/>Permissions beyond the scope of this license may be available at <a xmlns:cc=\"http:\/\/creativecommons.org\/ns#\" href=\"drsfenner.org\/blog\/about-and-contacts\" rel=\"cc:morePermissions\">drsfenner.org\/blog\/about-and-contacts<\/a>.<\/p>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>In the last post, we tried to use lower-level LAPACK\/LAPACKE calls to remove some of the copying overhead of our operations. Well, it turns out that it didn&#8217;t work out so well (see the table at the end of the post). The fundamental reason why is that LAPACKE took the easy way out, when they [&hellip;]<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6,7],"tags":[],"class_list":["post-392","post","type-post","status-publish","format-standard","hentry","category-mrdr","category-sci-math-stat-python"],"_links":{"self":[{"href":"http:\/\/drsfenner.org\/blog\/wp-json\/wp\/v2\/posts\/392","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/drsfenner.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/drsfenner.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/drsfenner.org\/blog\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"http:\/\/drsfenner.org\/blog\/wp-json\/wp\/v2\/comments?post=392"}],"version-history":[{"count":2,"href":"http:\/\/drsfenner.org\/blog\/wp-json\/wp\/v2\/posts\/392\/revisions"}],"predecessor-version":[{"id":428,"href":"http:\/\/drsfenner.org\/blog\/wp-json\/wp\/v2\/posts\/392\/revisions\/428"}],"wp:attachment":[{"href":"http:\/\/drsfenner.org\/blog\/wp-json\/wp\/v2\/media?parent=392"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/drsfenner.org\/blog\/wp-json\/wp\/v2\/categories?post=392"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/drsfenner.org\/blog\/wp-json\/wp\/v2\/tags?post=392"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}