April 13, 2016 / Damien Irving

Keeping up with Continuum

I’m going to spend the next few hundred characters gushing over a for-profit company called Continuum Analytics. I know that seems a little weird for a blog that devotes much of its content to open science, but stick with me. It turns out that if you want to keep up with the latest developments in data science, then you need to be on top of what this company is doing.

If you’ve heard the name Continuum Analytics before, it’s probably in relation to a widely used Python distribution called Anaconda. In a nutshell, Travis Oliphant (who was the primary creator of NumPy) and his team at Continuum developed Anaconda, gave it away for free to the world, and then built a thriving business around it. Continuum makes its money by providing training, consultation and support to paying customers who use Anaconda (and who are engaged in data science/analytics more generally), in much the same way that Red Hat provides support to customers using Linux.

The great thing about companies like Red Hat and Continuum is that because their business fundamentally depends on open source software, they contribute a great deal back to the open source community. If you’ve ever been to a SciPy conference (something I would highly recommend), you would have noticed that there are always a few presentations from Continuum staff, whose primary job appears to be to simply work on the coolest open source projects going around. What’s more, the company seems to have a knack for supporting projects that make life much, much easier for regular data scientists (i.e. people who know how to analyse data in Python, but for whom things like system administration and web programming are out of reach). For instance, the projects they support (see the full list here) can help you install software without having to know anything about system admin (conda), create interactive web visualisations without knowing JavaScript (bokeh), process data arrays larger than the available RAM without knowing anything about multi-core parallel processing (dask) and even speed up your code without having to resort to a low-level language (numba).

Of these examples, the most important achievement (in my opinion) is the conda package manager, which I’ve talked about previously. Once you’ve installed either Anaconda (which comes with 75 of the most popular Python data science libraries already installed) or Miniconda (which essentially just comes with conda and nothing else), you can then use conda to install pretty much any library you’d like with one simple command line entry. That’s right. If you want pandas, just type conda install pandas and it will be there, along with its dependencies, playing nicely with all your other libraries. If you decide you’d like to access pandas from the jupyter notebook, just type conda install jupyter and you’re done. There are about 330 libraries available directly like this and because they are maintained by the Continuum team, they are guaranteed to work.
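As a rough illustration of that workflow, here is a minimal sketch that checks which libraries are missing from the current environment and prints the conda command that would install them. The helper names (missing_packages, conda_install_command) are made up for this example, and it assumes the import name matches the conda package name, which is not true for every library.

```python
import importlib.util


def missing_packages(names):
    """Return the names that cannot be imported in the current environment."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


def conda_install_command(names):
    """Build the command line entry described in the post."""
    return " ".join(["conda", "install"] + list(names))


# Assumption: import names and conda package names are identical here.
wanted = ["pandas", "jupyter"]
missing = missing_packages(wanted)

if missing:
    print("Run:", conda_install_command(missing))
else:
    print("All requested libraries are already importable.")
```

The sketch only prints the command rather than running it, so nothing is installed until you paste the line into a terminal yourself.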

While this is all really nice, other Python distributions like Canopy also come with a package manager for installing widely used libraries. What sets conda apart is the ease with which the wider community can contribute. If you’ve written a library that you’d like people to be able to install easily, you can write an associated installation package and post it at Anaconda Cloud. For instance, Andrew Dawson (a climate scientist at Oxford) has written eofs, a Python library for doing EOF analysis. Rather than have users of his software mess around installing the dependencies for eofs, he has posted a conda package for eofs at his channel on Anaconda Cloud. Just type conda install -c https://conda.anaconda.org/ajdawson eofs and you’re done; it will install eofs and all its dependencies for you. Some users (e.g. the US Integrated Ocean Observing System) even go a step further and post packages for a wide variety of Python libraries that are relevant to the work they do. This vast archive of community-contributed conda packages means there isn’t a single library I use in my daily work that isn’t available via either conda install or Anaconda Cloud. In fact, a problem I often face is that there is more than one installation package for a particular library (i.e. which one do I use? And if I get an error, where should I ask for assistance?). To solve this problem, conda-forge has recently been launched. The idea is that it will house the lone instance of every community-contributed package, in order to (a) avoid duplication of effort, and (b) make it clear where questions (and suggested updates / bug fixes) should be directed.
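To make the channel syntax concrete, the sketch below assembles the same install command from a username and a package name. The function names are hypothetical, and the base URL is simply the one that appears in the command above; it is an assumption that every Anaconda Cloud channel follows that pattern.

```python
import shlex

# Base URL copied from the install command quoted in the post.
ANACONDA_CLOUD = "https://conda.anaconda.org"


def channel_url(user):
    """Return the channel URL for a user (assumed pattern: base URL + username)."""
    return f"{ANACONDA_CLOUD}/{user}"


def install_from_channel(user, package):
    """Return the argument list for installing a package from a user's channel."""
    return ["conda", "install", "-c", channel_url(user), package]


# Reproduce the eofs example; the command is printed, not executed.
command = install_from_channel("ajdawson", "eofs")
print(shlex.join(command))
```

Keeping the command as a list of arguments means it could later be handed to subprocess.run without any extra quoting, if you did want to execute it from a script.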

The final mind-blowing feature of conda is the ease with which you can manage different environments. Rather than lump all your Python libraries in together, it can be nice to have a clean and completely separate environment for each discrete aspect of the work you do (e.g. I have separate environments for my ocean data analysis, atmosphere data analysis and for testing new libraries). This will sound familiar to anyone who has used virtualenv, but again the value of conda environments is the ease with which the community can share. As an example, I’ve shared the details of my ocean data analysis environment (right down to the precise version of every single Python library). I started by exporting the details of the environment by typing conda env export -n ocean-environment -f blog-example, before posting it to my channel at Anaconda Cloud (conda env upload -f blog-example). Anyone can now come along and recreate that environment on their own computer by typing conda env create damienirving/blog-example (and then source activate blog-example to get it running). This is obviously huge for the reproducibility of my work, so for my next paper I’ll be posting a corresponding conda environment to Anaconda Cloud.
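The export and recreate steps could be scripted along the following lines. This is only a sketch: the helper names are invented for illustration, the commands are copied as written above, and the exact syntax may differ between conda versions (conda env --help will show what yours accepts). By default nothing is executed; the commands are just printed.

```python
import shutil
import subprocess


def export_command(env_name, out_file):
    """Command that writes the details of an environment to a file."""
    return ["conda", "env", "export", "-n", env_name, "-f", out_file]


def create_command(spec):
    """Command that recreates a shared environment, e.g. 'user/name'."""
    return ["conda", "env", "create", spec]


def run(command, dry_run=True):
    """Print the command; execute it only if dry_run is False and conda is found."""
    print(" ".join(command))
    if dry_run or shutil.which("conda") is None:
        return None
    return subprocess.run(command, check=True)


# Dry run of the two steps from the post.
run(export_command("ocean-environment", "blog-example"))
run(create_command("damienirving/blog-example"))
```

Passing dry_run=False would actually invoke conda, so it is worth reading the printed commands first.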

If you want to know more about Continuum, I highly recommend this Talk Python To Me podcast with Travis Oliphant.
