Category Archives: SciMathStatPython

Some Iron Condor Heuristics

I’m still debating the best way to work with ipython notebooks in this blog. However, until I come to a “final” answer (which might mean moving away from wordpress to a github-pages/pelican solution, à la Jake VdP), I’m just going to (hopefully) upload the notebooks and link to them via nbviewer. Here goes:

The raw notebooks: Two Iron Condor Heuristics (Raw)

As seen by nbviewer: Two Iron Condor Heuristics (Through nbviewer)

PyData NYC Nov. 2013

PyData 2013 NYC was a pretty great time.  It is always fun to meet folks as passionate about your favorite tools as you are.  There’s probably too much to really mention, but I definitely want to throw together a few of my thoughts and ideas.  Without further ado …

Some of the talks I went to:

  • Travis talking about conda (and blog post and blog post).  While I’m an admitted gentoo fanboy (actually, I don’t fan at all; I just use it), having a lighter weight option for the Python eco-system (across *nix (including OSX) and Windows) is really nice.  If I had realized a few things about conda last year (I’m not sure how far along it was at the time), I might have used it for some internal code deployment.
  • Yves talking about Performance Python (and an ipython notebook of the same; some other talk material is at his website).  Not much here was new to me — but — being reminded of the fundamentals and low-hanging fruit is always good.
  • Dan Blanchard talking about skll (and a link to the talk).  skll seems to take care of several procedural meta-steps in scikit-learn programs:  train/test/CV splits and model parameter grid searches.
  • Thomas Wiecki talking about pymc3 (most of the talk material shows up in the pymc3 docs; he also mentioned Quantopian’s zipline project and he has a few interesting git repos).
  • Peter Wang’s keynote was insightful, thought-provoking, and not the typical painful keynote that has you checking email the whole time.  He mentioned a Jim Gray paper that seems worthwhile.  By reputation, everything Jim Gray did was worthwhile.  [Gray disappeared while sailing a few years back.]

A thought that I’ve had over the years and that I’d love to see come to (ongoing) completion is some sort of CI job (continuous integration) that grabs the main Python learning systems, builds them, and runs [some|many|most|all] of the learning algorithms on synthetic, random, and/or standard (UCI, kaggle, etc.) datasets.  Of course, we would measure resource usage (time/memory) and error rates.  While the time performance is what would really get most people interested (and also cause the most dissent:  you weren’t fair to XYZ), I’m more interested in verifying that random forest in scikit-learn and orange give marginally similar results.  Throwing in some R and matlab options would give some comparison to the outside world, as well.
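To make the idea a bit more concrete, here is a minimal sketch of what one benchmark step in such a CI job might look like. Everything in it is an assumption for illustration: the two toy learners (MajorityClass, NearestCentroid) stand in for real library models such as a scikit-learn or Orange random forest, make_blobs is a hypothetical synthetic-data helper, and the fit/predict interface is just a convenient convention. It only uses the standard library, so it shows the measurement loop (time, peak memory, error rate), not any particular library’s API.

```python
import random
import time
import tracemalloc
from collections import Counter


def make_blobs(n, seed=0):
    """Hypothetical synthetic data: one 2-D Gaussian blob per class."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 2.0 if label else -2.0
        point = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
        data.append((point, label))
    return data


class MajorityClass:
    """Toy baseline: always predict the most common training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class NearestCentroid:
    """Toy learner: predict the class whose training mean is closest."""

    def fit(self, X, y):
        sums = {}
        for (x0, x1), label in zip(X, y):
            s = sums.setdefault(label, [0.0, 0.0, 0])
            s[0] += x0
            s[1] += x1
            s[2] += 1
        self.centroids_ = {k: (sx / n, sy / n) for k, (sx, sy, n) in sums.items()}
        return self

    def predict(self, X):
        def closest(x):
            return min(
                self.centroids_,
                key=lambda k: (x[0] - self.centroids_[k][0]) ** 2
                + (x[1] - self.centroids_[k][1]) ** 2,
            )

        return [closest(x) for x in X]


def benchmark(model, train, test):
    """Fit and predict once, recording wall time, peak memory, and error rate."""
    X_train, y_train = zip(*train)
    X_test, y_test = zip(*test)
    tracemalloc.start()
    start = time.perf_counter()
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    elapsed = time.perf_counter() - start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    errors = sum(p != t for p, t in zip(predictions, y_test))
    return {
        "seconds": elapsed,
        "peak_bytes": peak_bytes,
        "error_rate": errors / len(y_test),
    }


data = make_blobs(400, seed=42)
train, test = data[:300], data[300:]
results = {
    type(model).__name__: benchmark(model, train, test)
    for model in (MajorityClass(), NearestCentroid())
}
for name, r in results.items():
    print(name, round(r["seconds"], 4), r["peak_bytes"], r["error_rate"])
```

In a real job, each entry in the model tuple would be a thin wrapper around one library’s implementation, and the interesting check would be whether two wrappers around “the same” algorithm report error rates within some tolerance of each other on the same split.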

Doing these comparisons in the right way has a number of difficulties, as I discussed with Jake VanderPlas.  In just a few minutes, we were worried about data format differences (less important for numpy-based alternatives, though Orange uses its own ExampleTable, which you can convert to/from numpy arrays), default and hard-coded parameters (which might make it impossible to compare equivalent models), and social issues.

iPython

Oddly, although I’ve been using the scientific Python stack for about 10 years, I’ve never jumped on the iPython bandwagon.  Well, I’ve fired it up once or twice, but I’ve never really dug into using it.  I’ve gotten more curious about the iPython notebook facility (I assume this idea traces back to Mathematica’s notebook nomenclature).  Anyway, I decided to see what all the fuss is about (and by fuss, I mean the fact that a large number of folks at SciPy 2013 gave their presentations in iPython or https://www.wakari.io/gallery notebooks).  So, here goes.

Obviously, I could start with wakari in about five minutes, but I always like to try to “spin my own” first.  I’m on gentoo and I have iPython installed.  So, after browsing through the iPython notebook page (linked above), I found the magical incantation:

ipython notebook

As is fairly typical in the roll-your-own universe, I got an error:

Traceback (most recent call last):
 File "/usr/bin/ipython-python2.7", line 9, in <module>
 load_entry_point('ipython==0.13.2', 'console_scripts', 'ipython')()
 File "/usr/lib64/python2.7/site-packages/IPython/frontend/terminal/ipapp.py", line 388, in launch_new_instance
 app.initialize()
<snip> boy error messages are "noisy" </snip>
File "/usr/lib64/python2.7/site-packages/IPython/frontend/html/notebook/__init__.py", line 8, in <module>
 raise ImportError(msg)
ImportError: The IPython Notebook requires tornado >= 2.1.0

But after a quick emerge -av www-servers/tornado and another ipython notebook, I was off and running:

[screenshot: ipython-start]

A simple click on the New Notebook button:

[screenshot: new-notebook-button]

and I had an (i)python prompt.  Part of my goal with this is to experiment with formatting of hybrid text, mathematics, and (mostly python) source code for a book.  I have an idea in mind, but much like artistry, I have to figure out (clarify and implement) what I have in my head.  Anyway, as a simple example, I’m going to play with some description, mathematics, and code for the time-honored method of linear regression.  And for that, I’m going to switch over to my notebook.  Hopefully, I’ll be able to share it at http://nbviewer.ipython.org/.  I’ll also try the static html export facility to dump it here.
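For a flavor of the kind of code that pairs with a regression write-up, here is a minimal sketch of simple (one-variable) least squares. This is not the notebook’s actual contents; fit_line and the sample data are made up for illustration. It uses the textbook closed form: slope = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2) and intercept = mean_y - slope * mean_x.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data lying exactly on y = 2x + 1, so the fit should recover it.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

With noisy data the same function returns the line that minimizes the sum of squared vertical residuals; it assumes the x values are not all identical (otherwise sxx is zero).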

So, it turns out I was a bit ahead of myself.  Emerging tornado installed a half-baked and somewhat broken MathJax setup.  This resulted in some weird “this works, that doesn’t” LaTeX support.  I uninstalled iPython and www-servers/tornado and I made some edits to /etc/portage/package.use:

mfenner [528] % grep dev-python/ipython /etc/portage/package.use
dev-python/ipython -wxwidgets matplotlib examples notebook

The change resulted in a much happier installation.
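The original failure boiled down to an unmet version requirement (tornado >= 2.1.0), and that sort of thing can be checked from Python before launching anything. Below is a minimal sketch; it assumes a modern Python (3.8+, for importlib.metadata) rather than the 2.7 install shown above, and parse_version and meets_minimum are hypothetical helpers with deliberately naive version parsing (purely numeric dotted components, padded to three).

```python
from importlib import metadata


def parse_version(text):
    """Turn '2.1.0' into (2, 1, 0); stop at the first non-numeric component."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    # Pad so that '2.1' compares equal to '2.1.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(dist_name, minimum):
    """Return (installed_version_or_None, requirement_satisfied)."""
    try:
        installed = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None, False
    return installed, parse_version(installed) >= parse_version(minimum)


installed, ok = meets_minimum("tornado", "2.1.0")
if installed is None:
    print("tornado is not installed")
else:
    print("tornado", installed, "is new enough" if ok else "is too old")
```

This only confirms the version; it would not have caught the broken MathJax setup described above, which needed the USE-flag change.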

I dug up (“Googled”) a few resources for editing the notebooks, but the underlying URLs are a bit ugly (they might break): Rich Display System and Markdown Cells.