Keeping up with Continuum
I’m going to spend the next few hundred characters gushing over a for-profit company called Continuum Analytics. I know that seems a little weird for a blog that devotes much of its content to open science, but stick with me. It turns out that if you want to keep up with the latest developments in data science, then you need to be on top of what this company is doing.
If you’ve heard the name Continuum Analytics before, it’s probably in relation to a widely used Python distribution called Anaconda. In a nutshell, Travis Oliphant (who was the primary creator of NumPy) and his team at Continuum developed Anaconda, gave it away for free to the world, and then built a thriving business around it. Continuum makes its money by providing training, consultation and support to paying customers who use Anaconda (and who are engaged in data science/analytics more generally), in much the same way that Red Hat provides support to customers using Linux.
Of Continuum’s contributions, the most important (in my opinion) is the conda package manager, which I’ve talked about previously. Once you’ve installed either Anaconda (which comes with 75 of the most popular Python data science libraries already installed) or Miniconda (which essentially just comes with conda and nothing else), you can use conda to install pretty much any library you’d like with one simple command line entry. That’s right. If you want pandas, just type
    conda install pandas
and it will be there, along with its dependencies, playing nicely with all your other libraries. If you decide you’d like to access pandas from the Jupyter notebook, just type
    conda install jupyter
and you’re done. There are about 330 libraries available directly like this, and because they are maintained by the Continuum team, they are guaranteed to work.
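If you want to confirm that an install actually landed in the environment you’re working in, something like the following rough sketch would do it. The function name missing_libraries and the list of library names are just illustrations of mine, and the check assumes the import name matches the conda package name (which is true for pandas and numpy, but not for every library).

```python
import importlib.util


def missing_libraries(names):
    """Return the names that cannot be imported from the current environment."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Illustrative list only: pandas, plus numpy as one of its dependencies.
    wanted = ["pandas", "numpy"]
    for name in missing_libraries(wanted):
        print(f"{name} is missing; try: conda install {name}")
```

Run it with the Python interpreter from the environment you care about, since the answer depends on which interpreter does the looking.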
While this is all really nice, other Python distributions like Canopy also come with a package manager for installing widely used libraries. What sets conda apart is the ease with which the wider community can contribute. If you’ve written a library that you’d like people to be able to install easily, you can write an associated installation package and post it at Anaconda Cloud. For instance, Andrew Dawson (a climate scientist at Oxford) has written eofs, a Python library for doing EOF analysis. Rather than have users of his software mess around installing the dependencies for eofs, he has posted a conda package for eofs at his channel on Anaconda Cloud. Just type
    conda install -c https://conda.anaconda.org/ajdawson eofs
and you’re done; it will install eofs and all its dependencies for you. Some users (e.g. the US Integrated Ocean Observing System) even go a step further and post packages for a wide variety of Python libraries that are relevant to the work they do. This vast archive of community-contributed conda packages means there isn’t a single library I use in my daily work that isn’t available via either conda install or Anaconda Cloud. In fact, a problem I often face is that there is more than one installation package for a particular library (which one do I use? And if I get an error, where should I ask for assistance?). To solve this problem, conda-forge has recently been launched. The idea is that it will house the lone instance of every community-contributed package, in order to (a) avoid duplication of effort, and (b) make it clear where questions (and suggested updates / bug fixes) should be directed.
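To make the difference between the default packages and a community channel concrete, here is a small sketch that assembles the install command for each case. The helper conda_install_command is a hypothetical name of mine, not part of conda; the only conda feature it relies on is the -c flag used above. The last call assumes a conda-forge build of eofs exists, which I haven’t verified here.

```python
import shlex


def conda_install_command(package, channel=None):
    """Build the argument list for `conda install`, optionally from one channel."""
    args = ["conda", "install"]
    if channel is not None:
        # -c points conda at an extra channel (a URL or a channel name).
        args += ["-c", channel]
    args.append(package)
    return args


if __name__ == "__main__":
    # Default channel, maintained by Continuum.
    print(shlex.join(conda_install_command("pandas")))
    # A personal channel on Anaconda Cloud, as in the eofs example.
    print(shlex.join(conda_install_command(
        "eofs", channel="https://conda.anaconda.org/ajdawson")))
    # Assumption: the same package is also built on conda-forge.
    print(shlex.join(conda_install_command("eofs", channel="conda-forge")))
```

The function only builds the command; passing the returned list to subprocess.run would execute it if conda is on your path.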
The final mind-blowing feature of conda is the ease with which you can manage different environments. Rather than lump all your Python libraries in together, it can be nice to have a clean and completely separate environment for each discrete aspect of the work you do (e.g. I have separate environments for my ocean data analysis, for my atmosphere data analysis and for testing new libraries). This will sound familiar to anyone who has used virtualenv, but again the value of conda environments is the ease with which the community can share them. As an example, I’ve shared the details of my ocean data analysis environment (right down to the precise version of every single Python library). I started by exporting the details of the environment by typing
    conda env export -n ocean-environment -f blog-example
before posting it to my channel at Anaconda Cloud with
    conda env upload -f blog-example
Anyone can now come along and recreate that environment on their own computer by typing
    conda env create damienirving/blog-example
and then
    source activate blog-example
to get it running. This is obviously huge for the reproducibility of my work, so for my next paper I’ll be posting a corresponding conda environment to Anaconda Cloud.
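The reason an exported environment helps with reproducibility is that each dependency is pinned as name=version=build. As a rough sketch of how you might check a recreated environment against the one used for a paper, the code below compares two lists of pins. The function names and all of the version and build strings are made up for illustration; they are not the contents of my actual environment.

```python
def parse_pin(spec):
    """Split a pin such as 'numpy=1.10.4=py27_0' into (name, version, build)."""
    parts = spec.split("=")
    name = parts[0]
    version = parts[1] if len(parts) > 1 else None
    build = parts[2] if len(parts) > 2 else None
    return name, version, build


def changed_pins(old_specs, new_specs):
    """Map each package whose version differs to (old_version, new_version)."""
    old = {name: version for name, version, _ in map(parse_pin, old_specs)}
    new = {name: version for name, version, _ in map(parse_pin, new_specs)}
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }


if __name__ == "__main__":
    # Hypothetical pins, written in the name=version=build form.
    paper = ["numpy=1.10.4=py27_0", "pandas=0.17.1=np110py27_0"]
    mine = ["numpy=1.11.0=py27_0", "pandas=0.17.1=np110py27_0"]
    for name, (before, after) in changed_pins(paper, mine).items():
        print(f"{name}: {before} -> {after}")
```

A package that appears in only one of the two lists shows up with None on the other side, so additions and removals are reported as well as version changes.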
If you want to know more about Continuum, I highly recommend this Talk Python To Me podcast with Travis Oliphant.