Getting started with Python
When you’re just starting out in the weather/climate sciences, it can be difficult to know which programming language to adopt. In fact, even if you’ve been around for years, you probably find yourself re-assessing your relationship with your current programming language from time to time. I’ve discussed the pros and cons of the various options in a previous post, but for the purposes of this post let’s assume you’ve decided that Python is the language for you.
First off, congratulations on making a great choice – you’re certainly not alone. In a recent issue of the Bulletin of the American Meteorological Society (BAMS), Johnny Lin (Python guru with a very useful homepage) wrote a great article explaining why Python is the ‘new wave’ in earth sciences computing. It would be fair to say that this is a widely held view, particularly considering that the AMS has hosted an annual Python symposium for the last 3 years running. Since the most difficult part of learning a new language is often figuring out where to start, I’ve put together a 5-step guide to getting started with Python:
Step 1: Install Anaconda
There are thousands of Python packages out there, written for all sorts of computing applications. To create an effective working environment, you need to collect all the packages that are relevant to your work (e.g. packages for data visualisation, statistics, reading/writing netCDF files) and then install them in such a way that they all interact nicely. This used to be a time-consuming nightmare, until a company called Continuum Analytics came along. Their Anaconda distribution (which is free to download here) bundles together 200 of the most popular Python libraries for science, engineering and data analysis. What’s more, for libraries that aren’t part of the core 200 and can’t be easily installed with pip (the default Python package installer), they’ve developed their own package manager called conda. Anaconda also comes with a number of development environments (IPython QtConsole, IPython Notebook, Spyder), so you can choose whichever you like best.
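Once Anaconda is installed, a quick way to see whether a given environment already has the libraries you care about is to ask Python which ones it can find. The snippet below is only a minimal sketch: the package list and the helper name find_missing are illustrative choices of mine, not part of Anaconda, and it relies solely on the standard library.

```python
import importlib.util

# Illustrative list of top-level package names; adjust it to your own work.
WANTED = ["numpy", "matplotlib", "pandas", "netCDF4"]


def find_missing(names):
    """Return the names that cannot be imported in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = find_missing(WANTED)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All requested packages are available.")
```

Anything reported as missing can then be installed with conda or pip.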
Step 2: Learn the basics of Python programming
The quickest way to learn the basics of Python is to attend a Software Carpentry workshop. This volunteer-based organisation has been teaching programming to scientists for a number of years now and their lessons introduce a series of fundamental best practices for scientific computing. If there isn’t a workshop coming up in your region, you can contact them to request a workshop and/or look over their online lesson materials.
Step 3: Familiarise yourself with the core Python libraries used in the atmospheric and ocean sciences
The default Python library for dealing with large arrays of numeric data (e.g. four-dimensional latitude/longitude/altitude/time data arrays) is numpy, while the default for reading and writing netCDF files is netCDF4 (the capability to read and write text files, including .csv, is built into numpy). Because these are both generic libraries, it usually takes a fair bit of wrangling to get them to do tasks that are common to the atmospheric and ocean sciences. What’s more, most people end up doing similar wrangling, so there’s a lot of duplication of effort. Recognising this as a problem, Stephan Hoyer and others at The Climate Corporation have written a library called xray, which builds on numpy, netCDF4 and another popular data analysis library called pandas (which is great for tabular data). You’ll want to use xray for all your netCDF input/output and also for routine tasks like calculating climatologies and anomalies.
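To give a feel for the kind of wrangling involved, here is a rough sketch of a monthly climatology and anomaly calculation in plain numpy. The data are synthetic, the function names are my own, and the sketch assumes the time axis comes first, starts in January and covers whole years; it is not how xray does it internally.

```python
import numpy as np


def monthly_climatology(monthly):
    """Average each calendar month over all years.

    Assumes axis 0 is time, starts in January and spans whole years.
    """
    by_year = monthly.reshape(-1, 12, *monthly.shape[1:])
    return by_year.mean(axis=0)


def monthly_anomalies(monthly):
    """Subtract the calendar-month mean from every time step."""
    by_year = monthly.reshape(-1, 12, *monthly.shape[1:])
    anomalies = by_year - by_year.mean(axis=0)  # broadcasts over the year axis
    return anomalies.reshape(monthly.shape)


# Synthetic example: 3 years of monthly values on a 2 x 2 grid (made-up numbers)
rng = np.random.default_rng(0)
data = rng.normal(size=(36, 2, 2))  # (time, lat, lon)

clim = monthly_climatology(data)
anom = monthly_anomalies(data)
print(clim.shape, anom.shape)
```

Even in this small case you have to keep track of axis order and calendar alignment yourself, which is the sort of bookkeeping a higher-level library can take off your hands.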
In a similar vein, the team at the Met Office realised that there’s a fair bit of wrangling involved in using the generic Python plotting library (matplotlib) to create typical atmospheric and ocean science plots. They’ve therefore written iris and cartopy, which greatly simplify the use of matplotlib and add some useful new functionality, especially when it comes to plotting on different map projections.
The xray, iris and cartopy websites explain how to install them using conda and also provide links to excellent tutorials.
Step 4: Find the specialised libraries you need
Now that you’ve got your head around the core Python libraries used for data processing and visualisation, you’ll want to hunt around and see if there are any libraries out there for the highly specialised aspects of your work. For instance, there are libraries available for dealing with radar data (Py-ART), analysing and plotting skew-T diagrams (SkewT), performing computations on global wind fields in spherical geometry (windspharm) and so on. A listing of most of the packages out there can be found at the packages tab above (please let us know if there are any missing!).
Step 5: Sign up to the PyAOS mailing list
The PyAOS mailing list (sign up here) is the place to keep up to date with the latest Python developments relevant to the atmospheric and ocean sciences.