Pushing IPython Notebooks to WordPress

It turns out that I really, really like writing documents as iPython notebooks, at least shorter documents. I’m also (as you can tell, if you are reading this) into blogging about my technical (and recreational) endeavors. Since blog posts are shorter documents … Captain Obvious? … it is a small step to wanting to do my blog posting, or much of it … or any of it with Python code, as an iPython notebook. However, WordPress is primarily a PHP and database mashup and it doesn’t really work well (natively) with Python or iPython.

This has led several folks (like Jake VdV) to jump ship from WordPress to Pelican and other folks (like Matt R) to not use notebooks directly in their posts. For myself, I decided to explore another option (b/c I already had web hosting and I didn’t want to spin up additional servers, migrate my blog again, etc.).

I decided to make use of html as a common display format and use iPython’s conversion systems to feed the notebook. This process turned out to be more difficult than I wanted. I had originally hoped to add an element within my blog page that would be a "direct" link to a notebook that was rendered by nbviewer. That didn’t work out so well, mainly b/c displaying a webpage within another webpage is a pain-in-the-tail. It is understandable that it is difficult, because arbitrarily linking to external content can be a severe security issue.

There are ways around the cross linking issue (iframe is the "builtin" html construct, and it sucks): ajax, or requesting the remote resource on the server side and then displaying "joined" content to the client using php. I did the latter and it sort of worked, but it required me to manually write a "wrapper" post that included a php call to render the notebook on nbviewer. What was more difficult was getting all the elements of the nbviewer css integrated with the blog css theme. I don’t have patience for that sort of web development (not my cup of tea), so I kept looking for another option. In the end, I did have to do a bit of css work, but it amounted to getting two pre-created css files onto my server. I can deal with that.

My Solution

What I settled on gave me the following capabilities.

In [1]:
from IPython.display import Image
Image('./resources/chart.png')
Out[1]:

As you can see, I wanted to give my readers the maximum flexibility to view the notebook in the way that worked best for them. You can:

  • view it within my blog (and I’m assuming that’s how folks will find the notebook the first time),
  • view it through nbviewer, or
  • download the notebook and experiment.

I love option three because it really gets at the heart of what is most enjoyable about programming, data analysis, and this whole technical experience (for me).

All that said, here’s how I put together a simple system to accomplish this (on WordPress).

A Python Script

A few imports and globals
In [ ]:
#!/usr/bin/ipython --                                                                      
import os
import os.path as osp

# host and blog constants                                                                  
hostUser = "username"
remoteHost = "hostname"
remoteNotebookDir = "server/directory/for/notebooks"
blogUser = "blogusername"
blog_xml_url = "http://%s/path/to/blog/xmlrpc.php" % remoteHost

I’ve never actually needed to import IPython.<xyz> before. Turns out, you need to be using an ipython environment to do that. So, my shebang line calls out to ipython.

pushNotebook
In [ ]:
def pushNotebook(nbFilepath):
    nbFilename = osp.basename(nbFilepath)

    scpTgt = "%s@%s:%s" % (hostUser, remoteHost, osp.join(remoteNotebookDir,
                                                          nbFilename))
    print("Copying to %s" % scpTgt)
    os.system("scp %s %s" % (nbFilepath, scpTgt))

Pushing the raw notebook file is pretty trivial. There are some higher level scp interfaces for Python, but they were all much more work than this one-liner. I do have to enter a password to access my private ssh key file when I run the program (which I could alleviate with an ssh-agent).
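
If a notebook filename ever contains spaces or other characters the local shell would interpret, one variation is to hand scp an argument list rather than a formatted shell string. What follows is only a sketch, written for Python 3. The helper names build_scp_command and push_notebook are hypothetical (they are not part of the script above), and the remote path may still be interpreted by a shell on the server side.

```python
import os.path
import posixpath
import subprocess


def build_scp_command(nb_filepath, user, host, remote_dir):
    """Return the scp argument list for copying one notebook (hypothetical helper)."""
    # Assumption: the server is a POSIX host, so the remote path is joined with "/".
    remote_path = posixpath.join(remote_dir, os.path.basename(nb_filepath))
    target = "%s@%s:%s" % (user, host, remote_path)
    return ["scp", nb_filepath, target]


def push_notebook(nb_filepath, user, host, remote_dir):
    """Run scp without a local shell; raises CalledProcessError if scp fails."""
    cmd = build_scp_command(nb_filepath, user, host, remote_dir)
    print("Copying to %s" % cmd[-1])
    subprocess.check_call(cmd)
```

Splitting the command construction from the call keeps the string handling easy to check without touching the network, and check_call turns a failed copy into an exception instead of a silently ignored exit status.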

createBasicHTML

[Update as of 2015-03: Using iPython 2.x broke the process because it is, in general, smarter (it uses node.js to convert markdown, instead of pandoc). However, I need the old behavior (using pandoc) because it resolves all of the MathJax to <class> entries.]

In [ ]:
def createBasicHTML(nbFilepath):
    from IPython.nbconvert import HTMLExporter
    from IPython.config import Config
    from IPython.nbconvert.filters.markdown import markdown2html_pandoc

    # force pandoc so blog content has no $, $$ left
    # (mimics ipython 1.x behavior)
    myConfig = Config({"HTMLExporter":
                       {'filters':
                        {"markdown2html":markdown2html_pandoc}}})

    exporter = HTMLExporter(myConfig, template_file="basic")
    res = exporter.from_filename(nbFilepath)

    # for debugging                                                                        
    # open("tmp.html", "w").write(res[0])                                                  

    return res[0]

Here I’m leveraging iPython’s built-in HTML conversion machinery. Not very difficult. Win!

uploadPost
In [2]:
def uploadPost(content, title, cats):
    # http://python-wordpress-xmlrpc.readthedocs.org/en/latest/index.html                  
    from wordpress_xmlrpc import WordPressPost
    from wordpress_xmlrpc import Client
    from wordpress_xmlrpc.methods import posts, taxonomies

    import getpass

    # create post                                                                          
    post = WordPressPost()
    post.title = title
    # now using <!--raw--> so it doesn't show up in title page summaries
    post.content = "\n<!--raw-->\n%s\n<!--/raw-->" % content

    # create client connection                                                             
    blogPasswd = getpass.getpass("Enter blog password: ")
    client = Client(blog_xml_url, blogUser, blogPasswd)

    if cats:
        # make sure cats are valid                                                         
        serverCats = client.call(taxonomies.GetTerms('category'))
        serverCats = set(s.name for s in serverCats)
        assert set(cats).issubset(serverCats), "Invalid Cats"
        # apply cats to post                                                               
        post.terms_names = {'category':cats}

    post.id = client.call(posts.NewPost(post))

This was the most difficult piece to get working. The code isn’t complicated, and it even includes some error checking. But, it also hints at the fact that there are some things that have to happen on the blog server to get this to work. Among other steps, I have to have a mathjax plugin installed.
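
The category check in uploadPost relies on assert, which Python skips entirely when run with -O. A minimal sketch of the same idea as an explicit exception, written for Python 3, is below. The function name validate_categories is hypothetical, and it takes plain lists of names (for example, the names pulled from the GetTerms result) so it can be tried without a blog connection.

```python
def validate_categories(requested, available):
    """Return the cleaned requested names, or raise ValueError for unknown ones.

    requested: category names asked for on the command line
    available: category names that already exist on the blog
    """
    # Strip whitespace and drop empty entries such as the "" from "".split(",").
    cleaned = [c.strip() for c in requested if c.strip()]
    unknown = sorted(set(cleaned) - set(available))
    if unknown:
        raise ValueError("Invalid categories: %s" % ", ".join(unknown))
    return cleaned
```

Naming the offending categories in the error message makes a typo on the command line easier to spot than a bare "Invalid Cats".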

__main__
In [ ]:
if __name__ == "__main__":
    from optparse import OptionParser

    # %prog expands to the script name
    usage = "usage: %prog [options] ipynb_file"
    optParser = OptionParser(usage=usage)
    optParser.add_option("--cats", dest="cats", default="",
                         help="Cats for post (comma-separated string)")
    optParser.add_option("--title", dest="title", default="",
                         help="Title for post (string)")
    (options, args) = optParser.parse_args()

    if len(args) != 1:
        optParser.error("expected exactly one ipynb_file")

    nbFilepath = args[0]
    print("Processing %s" % nbFilepath)

    # drop empty entries so that omitting --cats gives [] (not [""],
    # which would fail the category check in uploadPost)
    cats = [t.strip() for t in options.cats.split(",") if t.strip()]

    html = createBasicHTML(nbFilepath)
    uploadPost(html, title=options.title, cats=cats)

    pushNotebook(nbFilepath)

I make use of OptionParser to deal with the command-line interface. Also, I only call pushNotebook after I’ve done the (more error prone?) step of uploading the post.
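
The same command-line interface could also be written with argparse, the standard library’s newer parser. This is just a sketch for Python 3, not the script I actually run. The names split_cats and parse_args are hypothetical, and the option names simply mirror the ones above.

```python
import argparse


def split_cats(text):
    """Turn 'a, b,,c' into ['a', 'b', 'c']; an empty string gives []."""
    return [t.strip() for t in text.split(",") if t.strip()]


def parse_args(argv=None):
    """Parse the pusher's arguments; argv=None means use sys.argv[1:]."""
    parser = argparse.ArgumentParser(
        description="Push a notebook to the blog (sketch).")
    parser.add_argument("ipynb_file", help="path to the notebook")
    parser.add_argument("--cats", type=split_cats, default=[],
                        help="Cats for post (comma-separated string)")
    parser.add_argument("--title", default="",
                        help="Title for post (string)")
    return parser.parse_args(argv)


# Example: parse an explicit argument list instead of the real command line.
opts = parse_args(["--cats", "python, wordpress", "--title", "Hello", "post.ipynb"])
print(opts.ipynb_file, opts.title, opts.cats)
# prints: post.ipynb Hello ['python', 'wordpress']
```

Because the notebook path is a required positional argument, argparse reports a usage error on its own when it is missing, and doing the comma splitting in a type function means the rest of the script only ever sees a list.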

Blog Configuration

I had to make a few modifications on the WordPress server side to get posts to play nicely. The simplest was to install and enable the Mathjax-LaTeX plugin (I’m currently using version 1.3.3) and the Raw HTML plugin (looks like my current version is 1.4.14). It’s not obvious, but for the raw plugin, you can use either [raw] or <!--raw--> style tags. The latter is nice because if the HTML code shows up in a different context, the square bracket form might leak through as visible text (an HTML comment stays invisible).

The second set of mods was a bit more involved. In my WordPress theme’s functions.php file, I added the following:

Additions to functions.php
function enable_ipynb_css(){
    wp_register_style('ipynb_notebook_colors',
                      get_template_directory_uri() . '/css/minimal.ipynb.colors.css');          
    wp_register_style('ipynb_notebook_flair',
                      get_template_directory_uri() . '/css/flair.ipynb.css');

    wp_enqueue_style('ipynb_notebook_colors');
    wp_enqueue_style('ipynb_notebook_flair');
}
add_action('wp_enqueue_scripts', 'enable_ipynb_css');

Since I don’t do web dev, I had to hack together the two css files. I can’t actually remember how I put them together now (at least, I don’t remember where I scraped the colors from for minimal.ipynb.colors.css). It was either in one of iPython’s distributed css files (IPython/static/style/*.css) or from the css files that nbviewer uses. I’m leaning towards nbviewer.

For the second file, I took some hints from Frank Cleary, who built a very nice, clean style for his standalone exported notebooks. I had to edit it a bit to get it to work when embedded in a blog post (I had to add some space to the left margin, I think).

flair.ipynb.css
div#notebook {
    margin-top: 50px;
    margin-bottom: 100px;
}
div.cell {
    max-width: 60em;
    margin-left: auto;
    margin-right: auto;
}
div.input_prompt, div.output_prompt {
    display: none;
}
div.input, div.output_wrapper {
    margin-top: 1em;
    margin-bottom: 1em;
}
div.text_cell_render {
    margin-top: -2px;
    margin-bottom: -2px;
    padding-top: 2px;
    padding-bottom: 2px;
    padding-left: 6px;
    border-left: 2px solid #505050;
    border-collapse: collapse;
    border-top: none;
    border-bottom: none;
}

minimal.ipynb.colors.css
.highlight .hll { background-color: #ffffcc }
.highlight  { background: #f8f8f8; }
.highlight .c { color: #408080; font-style: italic } /* Comment */
.highlight .err { border: 1px solid #FF0000 } /* Error */
.highlight .k { color: #008000; font-weight: bold } /* Keyword */
.highlight .o { color: #666666 } /* Operator */
.highlight .cm { color: #408080; font-style: italic } /* Comment.Multiline */
.highlight .cp { color: #BC7A00 } /* Comment.Preproc */
.highlight .c1 { color: #408080; font-style: italic } /* Comment.Single */
.highlight .cs { color: #408080; font-style: italic } /* Comment.Special */
.highlight .gd { color: #A00000 } /* Generic.Deleted */
.highlight .ge { font-style: italic } /* Generic.Emph */
.highlight .gr { color: #FF0000 } /* Generic.Error */
.highlight .gh { color: #000080; font-weight: bold } /* Generic.Heading */
.highlight .gi { color: #00A000 } /* Generic.Inserted */
.highlight .go { color: #888888 } /* Generic.Output */
.highlight .gp { color: #000080; font-weight: bold } /* Generic.Prompt */
.highlight .gs { font-weight: bold } /* Generic.Strong */
.highlight .gu { color: #800080; font-weight: bold } /* Generic.Subheading */
.highlight .gt { color: #0044DD } /* Generic.Traceback */
.highlight .kc { color: #008000; font-weight: bold } /* Keyword.Constant */
.highlight .kd { color: #008000; font-weight: bold } /* Keyword.Declaration */
.highlight .kn { color: #008000; font-weight: bold } /* Keyword.Namespace */
.highlight .kp { color: #008000 } /* Keyword.Pseudo */
.highlight .kr { color: #008000; font-weight: bold } /* Keyword.Reserved */
.highlight .kt { color: #B00040 } /* Keyword.Type */
.highlight .m { color: #666666 } /* Literal.Number */
.highlight .s { color: #BA2121 } /* Literal.String */
.highlight .na { color: #7D9029 } /* Name.Attribute */
.highlight .nb { color: #008000 } /* Name.Builtin */
.highlight .nc { color: #0000FF; font-weight: bold } /* Name.Class */
.highlight .no { color: #880000 } /* Name.Constant */
.highlight .nd { color: #AA22FF } /* Name.Decorator */
.highlight .ni { color: #999999; font-weight: bold } /* Name.Entity */
.highlight .ne { color: #D2413A; font-weight: bold } /* Name.Exception */
.highlight .nf { color: #0000FF } /* Name.Function */
.highlight .nl { color: #A0A000 } /* Name.Label */
.highlight .nn { color: #0000FF; font-weight: bold } /* Name.Namespace */
.highlight .nt { color: #008000; font-weight: bold } /* Name.Tag */
.highlight .nv { color: #19177C } /* Name.Variable */
.highlight .ow { color: #AA22FF; font-weight: bold } /* Operator.Word */
.highlight .w { color: #bbbbbb } /* Text.Whitespace */
.highlight .mf { color: #666666 } /* Literal.Number.Float */
.highlight .mh { color: #666666 } /* Literal.Number.Hex */
.highlight .mi { color: #666666 } /* Literal.Number.Integer */
.highlight .mo { color: #666666 } /* Literal.Number.Oct */
.highlight .sb { color: #BA2121 } /* Literal.String.Backtick */
.highlight .sc { color: #BA2121 } /* Literal.String.Char */
.highlight .sd { color: #BA2121; font-style: italic } /* Literal.String.Doc */
.highlight .s2 { color: #BA2121 } /* Literal.String.Double */
.highlight .se { color: #BB6622; font-weight: bold } /* Literal.String.Escape */
.highlight .sh { color: #BA2121 } /* Literal.String.Heredoc */
.highlight .si { color: #BB6688; font-weight: bold } /* Literal.String.Interpol */
.highlight .sx { color: #008000 } /* Literal.String.Other */
.highlight .sr { color: #BB6688 } /* Literal.String.Regex */
.highlight .s1 { color: #BA2121 } /* Literal.String.Single */
.highlight .ss { color: #19177C } /* Literal.String.Symbol */
.highlight .bp { color: #008000 } /* Name.Builtin.Pseudo */
.highlight .vc { color: #19177C } /* Name.Variable.Class */
.highlight .vg { color: #19177C } /* Name.Variable.Global */
.highlight .vi { color: #19177C } /* Name.Variable.Instance */
.highlight .il { color: #666666 } /* Literal.Number.Integer.Long */

Last Thoughts and Future Work

Some of you may have noticed that for my little information flow graphic above, I used ipython’s display system to render the graphic. I did that because my notebook pusher, currently, does not handle additional resources (such as linked local graphics). I’ll consider that future work and, for now, simply display additional resources inline. If I want to use examples with sample data files, I’ll (probably) just manually push the data to a common, global url so it can be used from anywhere with an absolute path.

Also, I wanted the license of all the blog content (and notebooks) to be clear. So, I’m (manually) attaching the following cell to all of the notebooks. Eventually, I’ll just have the notebook_pusher do it. The notebook itself is simple json, so appending an entry at the end of the notebook should be pretty simple. I just haven’t done it yet.
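
Appending that cell could look something like the sketch below, written for Python 3. The function names are hypothetical, and it assumes the usual notebook layouts: nbformat 4 keeps a top-level "cells" list, while nbformat 3 (the format used by iPython 1.x and 2.x) nests the cells inside the first entry of "worksheets". A notebook with no worksheets at all is not handled.

```python
import json


def append_markdown_cell(nb, text):
    """Append a markdown cell holding `text` to a notebook dict, in place."""
    cell = {"cell_type": "markdown", "metadata": {}, "source": text}
    if "cells" in nb:
        # nbformat 4: cells live at the top level
        nb["cells"].append(cell)
    else:
        # nbformat 3: cells live inside the first worksheet
        nb["worksheets"][0]["cells"].append(cell)
    return nb


def add_license(nb_filepath, license_text):
    """Read a notebook file, append the license cell, and write it back."""
    with open(nb_filepath) as f:
        nb = json.load(f)
    append_markdown_cell(nb, license_text)
    with open(nb_filepath, "w") as f:
        json.dump(nb, f, indent=1)
```

Since add_license rewrites the file in place, running it twice appends the cell twice; a real version would probably want to check whether the last cell already holds the license text.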

Additional Resources

You can grab a copy of this notebook.

Even better, you can view it using nbviewer.

License

Unless otherwise noted, the contents of this notebook are under the following license. The code in the notebook should be considered part of the text (i.e., licensed and treated as follows).

Creative Commons License
DrsFenner.org Blog And Notebooks by Mark and Barbara Fenner is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Permissions beyond the scope of this license may be available at drsfenner.org/blog/about-and-contacts.