June 11, 2013 / Damien Irving

Picking the right programming language

One of the really nice things about data analysis in the weather/climate sciences is that our profession has pretty much universally settled on Network Common Data Form (netCDF) as the file format for storing our data (see this previous post for details). As such, a number of command line utilities known as the NetCDF Operators (NCO) and Climate Data Operators (CDO) have been developed for performing common tasks on these files. The former focuses on simple data curation (e.g. viewing the contents of a file, selecting a subset of the data or editing the metadata within a file), while the latter provides for simple statistical analysis (e.g. calculating the climatology, percentile, correlation or heat wave index). Of course, all these tasks could be achieved by writing your own code, but it’s much quicker to use these command line utilities.
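To give a feel for what "writing your own code" means for one of these tasks, here is a minimal sketch of a monthly climatology in plain Python. The function name `monthly_climatology` and the temperature values are made up for illustration; a real workflow would read the data from a netCDF file rather than a hand-typed list.

```python
from collections import defaultdict
from statistics import mean


def monthly_climatology(records):
    """Return {month: mean value} from (year, month, value) records."""
    by_month = defaultdict(list)
    for _year, month, value in records:
        # Group every value by its calendar month, ignoring the year.
        by_month[month].append(value)
    return {month: mean(values) for month, values in sorted(by_month.items())}


# Made-up temperatures (deg C) for two Januaries and two Julys.
records = [
    (2001, 1, 24.0),
    (2001, 7, 12.0),
    (2002, 1, 26.0),
    (2002, 7, 14.0),
]

climatology = monthly_climatology(records)
print(climatology)  # {1: 25.0, 7: 13.0}
```

Even this toy version needs a function, a loop and some bookkeeping, which is why a ready-made command line utility is usually the quicker option for standard tasks.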

With freely available utilities like NCO and CDO out there, it’s tempting to think that you might be able to avoid ever having to write your own code. While this would be great, the unfortunate reality is that most of us do some pretty serious data processing from time to time, which goes far beyond the limits of what NCO and CDO can do. In a typical weather/climate science institution, people will use one or more of Fortran, C, MATLAB, Python, NCL, IDL, Ferret or R to do this. This is an intimidating list for the uninitiated, so people often find it difficult to decide which language to use. In fact, even when people have settled on an option, they often spend a lot of time wondering whether they should switch to a different one. Hopefully, the following summary will assist with these tough decisions.

A brief summary of the options…

Fortran and C are variously described as “statically typed”, “compiled”, “low-level” or “system programming” languages. Code written in these languages isn’t particularly concise, intuitive or programmer friendly. The pay-off is that the code is translated into machine instructions before it runs, rather than being interpreted line by line, meaning it runs really fast.

MATLAB and IDL are proprietary software, meaning that you or your employer would need to purchase a license. They are complete scientific computing environments that consist of not only their own “dynamically typed”, “interpreted”, “high-level” or “scripting” programming language, but also built-in functions for data analysis and visualisation, as well as a fancy graphical user interface for viewing the data while you analyse it. High-level languages are popular because the resulting code is much shorter and more intuitive, which reduces the time required to write it. However, there’s a price to pay for writing concise, easy-to-understand code: it generally runs slower than equivalent low-level code.

R, Ferret and NCL are somewhat similar to MATLAB and IDL, in that they are complete scientific computing environments with their own high-level language and lots of functions built in. The associated graphical user interface isn’t as fancy (or is simply absent) and the documentation isn’t as good, but that’s because they’re free. The greatest strength of NCL is that it has lots of weather/climate specific functions and creates very attractive images, while R has the most extensive library of statistical functions. Ferret has many functions that are useful for analysing oceanographic data, and is particularly popular with that community.

The problem with the environment-specific languages that come with MATLAB, IDL, NCL, Ferret and R is that they are fairly simple/primitive. It’s also difficult (or impossible) to link them with code written in other languages. Python, on the other hand, is a ‘real’, fully fledged, high-level, free and open-source programming language (in the same sense that Fortran and C are real, fully fledged languages). It offers the clean and simple syntax of popular computing environments like MATLAB, but also has lots of tools for interacting with code written in different languages. In theory, the flexibility of Python would allow you to build your own MATLAB-like scientific computing environment. In practice this task would be an absolute nightmare, so it’s very fortunate that Continuum Analytics is such a nice company. A couple of years ago they released (for free) Anaconda, which bundles together around 200 of the most popular Python libraries for science, maths, engineering and data analysis. What’s more, if you need a library that isn’t part of the core 200 and can’t be installed easily with pip (the default Python package installer), they’ve developed their own package manager called Conda to make the process painless (see this post on software installation for more details).
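As a small illustration of Python interacting with code written in another language, the sketch below calls the `sqrt` function from the system's C maths library using `ctypes`, which ships with Python. It assumes a POSIX system (Linux or macOS) where that library can be found under the name "m"; the helper name `load_c_sqrt` is made up for this example.

```python
import ctypes
import ctypes.util


def load_c_sqrt():
    """Return the C library's sqrt as a callable Python object."""
    # Assumption: POSIX system. If the lookup returns None, CDLL(None)
    # falls back to the symbols already loaded into the Python process.
    path = ctypes.util.find_library("m")
    libm = ctypes.CDLL(path)

    c_sqrt = libm.sqrt
    # Declare the C signature: double sqrt(double).
    c_sqrt.argtypes = [ctypes.c_double]
    c_sqrt.restype = ctypes.c_double
    return c_sqrt


c_sqrt = load_c_sqrt()
print(c_sqrt(9.0))  # 3.0
```

For Fortran or larger C codes, people typically reach for more specialised wrapping tools, but the basic idea is the same: the compiled routine does the heavy lifting and Python calls it.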

How to pick one…

Computers are so fast these days that unless you’re writing code that is many thousands of lines long (e.g. a global climate model, which is typically written in Fortran), the biggest bottleneck in your data processing will be the speed at which you can write your code, not the speed at which it can be executed. This is why scientific computing environments like MATLAB, IDL, R, NCL, Ferret and Python/Anaconda have become so popular: high-level code is much faster to write, especially when you have tools available to view your data as you go. You should therefore weigh up the pros and cons of these six environments and simply pick one (i.e. consider the license fees, requirements of your work, what your colleagues use, etc). I personally think that Python/Anaconda is the best choice, but the truth of the matter is that you can find highly effective weather/climate scientists using any one of these six options. The key is that once you’ve picked one, you need to commit to learning it really well. In my experience it’s useful to become highly proficient in one language, so that you can make efficient use of that language for the majority of your work. You can then pick up the basics of other languages on an as-needed basis, for those occasional tasks that can’t be achieved with your language of choice.

