December 30, 2018 / Damien Irving

Data analysis and ocean model grids

Since making the shift from an atmosphere-based PhD to an ocean-based postdoc, I’ve spent more time than I’d like getting my head around ocean model grids. In particular, the convention for modelling groups participating in CMIP5 was to archive ocean variables on the native model grid. Some of the CMIP5 ocean models run on a rectilinear grid (i.e. a regular latitude/longitude grid like the atmosphere model), but most are run on some sort of curvilinear grid (i.e. a grid where the coordinate lines are curved). The details of these curvilinear grids are in most cases not widely available, but the (netCDF) data files do contain auxiliary coordinate information that specifies the location of each curvilinear grid point in two-dimensional latitude/longitude space.

At some point, most analysis of CMIP5 ocean data involves remapping from the native curvilinear grid to a regular latitude/longitude grid. I’ve found that the level of complexity involved in this remapping depends on whether you’re dealing with a scalar or vector quantity, and whether there are conservation properties that need to be maintained (e.g. conservation of energy or moisture).

The discussion that follows is my attempt to step through each level of complexity, exploring the software packages and analysis approaches currently used for remapping from a curvilinear to rectilinear grid.

1. Scalar quantity

The simplest case involves remapping a scalar quantity such as temperature or salinity from a curvilinear to a rectilinear grid. There are a number of libraries out there for doing this.
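
As an illustration, here is a minimal sketch using the xESMF package (one library that handles curvilinear-to-rectilinear remapping). The input file name is hypothetical; xESMF builds its weights from the two-dimensional lat/lon coordinate variables in the input dataset:

```python
import xarray as xr
import xesmf as xe

# Hypothetical CMIP5 ocean temperature file on the native curvilinear grid
ds = xr.open_dataset('thetao_Omon_model_historical_r1i1p1.nc')

# Target 1x1 degree rectilinear grid; the regridder uses the 2D lat/lon
# auxiliary coordinates in the input dataset to compute the weights
ds_out = xe.util.grid_global(1.0, 1.0)
regridder = xe.Regridder(ds, ds_out, method='bilinear', periodic=True)

# Apply the weights to remap the scalar field
thetao_rect = regridder(ds['thetao'])
```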

2. Scalar quantity with conservation

The next level of complexity involves remapping a scalar quantity such as ocean heat content (OHC), where it can be important to conserve the global total. The remapping packages listed above approximately conserve the global mean, but unfortunately not the global sum. This is presumably because some ocean area/volume is gained or lost as the continents subtly change shape between the old and new grid. One of the reasons why CMIP5 ocean data is archived on the native model grid is that providing remapped data would make it impossible to close energy budgets.

To work around this problem, people tend to devise their own scheme for moving from curvilinear to rectilinear space. This typically involves making use of the auxiliary coordinate information contained in the CMIP5 data files. For instance, in calculating the zonally integrated OHC (while conserving the global total), Nummelin et al (2017) essentially iterate over each one-degree latitude band, adding up the OHC from all grid cells whose auxiliary coordinate falls within that band.
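
A minimal sketch of that kind of latitude-band summation is given below. The file and variable names are hypothetical, but the idea is simply to bin the native grid cells by their auxiliary latitude coordinate:

```python
import numpy as np
import xarray as xr

# Hypothetical file of OHC per grid cell (J) on the native curvilinear grid
ds = xr.open_dataset('ohc_native_grid.nc')
ohc = ds['ohc'].values     # dimensions (j, i); variable name assumed
lat2d = ds['lat'].values   # 2D auxiliary latitude coordinate, dimensions (j, i)

lat_edges = np.arange(-90, 91, 1.0)   # one-degree latitude bands
zonal_ohc = np.zeros(len(lat_edges) - 1)
for k in range(len(lat_edges) - 1):
    # Add up the OHC from every native cell whose latitude falls in this band
    in_band = (lat2d >= lat_edges[k]) & (lat2d < lat_edges[k + 1])
    zonal_ohc[k] = np.nansum(ohc[in_band])

# Because every ocean cell is counted exactly once, the sum over all
# bands equals the global total
```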

3. Vector quantity

The most complex case is when remapping vector quantities such as heat flux (hf) or water velocity. The x and y components of such variables are archived as two separate variables (e.g. hfx and hfy). The complexity arises because the x and y directions on a curvilinear grid are not everywhere the same as the x (eastward) and y (northward) directions on a geographic latitude/longitude grid. This means you can’t apply the aforementioned software packages or work-arounds to move from curvilinear to rectilinear space – any remapping requires the use of information from both the native x and y components.

The most obvious solution to this problem would be to convert the native x and y components into eastward and northward components using the grid angle at each grid cell. Unfortunately, the CMIP5 modelling groups didn’t archive a grid angle variable, and even if they had, it turns out that complications can still arise. Using grid angles obtained for the NorESM1-M model, Outten et al (2018) calculated the northward heat transport from the native hfx and hfy components and found small differences compared to the northward heat transport output directly from the model (the hfbasin variable). Investigation revealed that the grid angle was given for the p points of the grid cells (i.e. the center of each grid cell), while hfy and hfx were given on the top/bottom and left/right edges of the grid cell, respectively, as per a standard C-grid configuration. Thus the angles given were not precisely accurate for the locations of hfy and hfx. It’s possible that these issues might be overcome using the xgcm package (which interpolates variables from one position to another on a C-grid), but the use of that package requires detailed knowledge of the grid configuration (i.e. you need to know that it’s a C-grid in the first place), which isn’t readily accessible for most CMIP5 models.
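
For reference, if an accurate grid angle were available at the same locations as the vector components, the rotation itself would be straightforward. The sketch below assumes the angle is measured anticlockwise from geographic east to the native x direction (angle conventions vary between models, so this would need checking case by case):

```python
import numpy as np

def rotate_to_geographic(x_comp, y_comp, angle):
    """Rotate native-grid vector components to eastward/northward components.

    angle is the angle (radians) between the native grid x direction and
    geographic east, defined at the same locations as the vector components.
    """
    east = x_comp * np.cos(angle) - y_comp * np.sin(angle)
    north = x_comp * np.sin(angle) + y_comp * np.cos(angle)
    return east, north
```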

The “zigzag” solution that Outten et al (2018) came up with was somewhat similar to the scalar work-around described above. In their method, grid cells were selected along a single line of latitude. A zonal boundary was then identified from the edges of these grid cells, and the fluxes across this boundary were summed to give the meridional transport at that latitude. At latitudes close to the model grid poles where the grid is curved, the identified cells at a single latitude were not all in the same row, so the transport across the boundary included heat transport in both the y and x directions. The process was repeated at each latitude to obtain the complete meridional ocean heat transport.

Where possible, others get around the vector quantity issue by converting to scalar quantities. For instance, when dealing with water velocity it is possible to use Helmholtz decomposition to write each vector in terms of the streamfunction and velocity potential. These scalar quantities can be remapped to a rectilinear grid, and then gradients can be calculated on the new grid to recover the eastward and northward components of the water velocity.
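
For example, once the streamfunction (psi) and velocity potential (chi) have been remapped to a regular latitude/longitude grid, the components follow from u = dchi/dx - dpsi/dy and v = dchi/dy + dpsi/dx. A rough numpy sketch (hypothetical function, with spherical geometry approximated using a constant Earth radius):

```python
import numpy as np

EARTH_RADIUS = 6371000.0  # metres

def velocity_from_sf_vp(psi, chi, lat, lon):
    """Recover eastward (u) and northward (v) velocity components from the
    streamfunction (psi) and velocity potential (chi), both given on a
    regular latitude/longitude grid with dimensions (lat, lon)."""
    # Grid spacing in metres (per latitude row and per longitude column)
    dy = np.deg2rad(np.gradient(lat)) * EARTH_RADIUS
    dx = np.deg2rad(np.gradient(lon)) * EARTH_RADIUS * np.cos(np.deg2rad(lat))[:, np.newaxis]

    dpsi_dy = np.gradient(psi, axis=0) / dy[:, np.newaxis]
    dpsi_dx = np.gradient(psi, axis=1) / dx
    dchi_dy = np.gradient(chi, axis=0) / dy[:, np.newaxis]
    dchi_dx = np.gradient(chi, axis=1) / dx

    u = dchi_dx - dpsi_dy
    v = dchi_dy + dpsi_dx
    return u, v
```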

Conclusion

Once you move beyond the simple case of remapping a scalar quantity (where conservation isn’t important) from a curvilinear to rectilinear grid, there aren’t any “off-the-shelf” software packages available for the job. It’s difficult to think of a simple solution for remapping when conservation of the global sum is required (changing the shape of the continents is unavoidable), but we could do much better when it comes to vector quantities. If grid angles and configurations were archived/documented, it would be possible to write software packages that ingest that information in order to perform the remapping.

September 10, 2018 / Damien Irving

Data Carpentry for atmosphere and ocean scientists

This post originally appeared on the Data Carpentry blog.

Back in late 2012, I was a couple of years into my first job out of college. My undergraduate studies had left me somewhat underprepared for the coding associated with analysing climate model data for a national science organisation, so I was searching online for assistance with Python programming. I stumbled upon the website of an organisation called Software Carpentry, which at the time was a relatively small group of volunteers running two-day scientific computing “bootcamps” for researchers. I reached out to ask if they’d be interested in running a workshop alongside the 2013 Annual Conference of the Australian Meteorological and Oceanographic Society (AMOS), and to my surprise Greg Wilson – the co-founder of the organisation – flew out to Australia to teach at our event in Melbourne and another in Sydney (the first ever bootcamps outside of North America and Europe). I trained up as an instructor soon after, and from 2014-2017 I hosted Software Carpentry workshops alongside the AMOS conference, as well as other ad hoc workshops in various meteorology and oceanography departments.

While these workshops were very popular and well received (Software Carpentry workshops always are), in the back of my mind I wanted to have a go at running a workshop designed specifically for atmosphere and ocean scientists. Instead of teaching generic skills in the hope that people would figure out how to apply them in their own context, I wanted to cut out the middle step and run a workshop in the atmosphere and ocean science context. This idea of discipline (or data-type) specific workshops was the driving force behind the establishment of Data Carpentry, so this year with their assistance I’ve developed lesson materials for a complete one-day workshop:
https://carpentrieslab.github.io/python-aos-lesson

The workshop centers on the task of writing a Python script that calculates and plots the seasonal rainfall climatology (i.e. the average rainfall) from the output of any arbitrary climate model. Such data is typically stored in netCDF file format and follows the strict Climate and Forecast (CF) metadata conventions. Along the way, we learn about the PyAOS stack (i.e. the ecosystem of libraries used in the atmosphere and ocean sciences), how to manage and share a software environment using conda, how to write modular/reusable code, how to write scripts that behave like other command line programs, version control, defensive programming strategies and how to capture and record the provenance of the data files and figures that we produce.
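
To give a flavour of the core calculation, here is a rough sketch of the kind of xarray code involved (the file name is hypothetical; 'pr' is the standard CMIP variable name for precipitation):

```python
import xarray as xr
import matplotlib.pyplot as plt

# Hypothetical CMIP precipitation file
ds = xr.open_dataset('pr_Amon_model_historical_r1i1p1.nc')

clim = ds['pr'].groupby('time.season').mean('time')   # seasonal climatology
clim = clim * 86400                                    # kg m-2 s-1 -> mm/day

clim.sel(season='JJA').plot()                          # plot one season
plt.show()
```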

I’ve run the workshop twice now (at the 2018 AMOS Conference in Sydney and at Woods Hole Oceanographic Institution last month), which means I’ve completed the alpha stage of the Data Carpentry lesson development cycle. Moving from the alpha to beta stage involves having people other than me teach, which is where you come in. If you’re a qualified Carpentries instructor and would be interested in teaching the lessons (some experience with the netCDF file format and xarray Python library is useful), please get in touch with either me or Francois Michonneau (Curriculum Development Lead for Data Carpentry). You can also request a workshop at your institution by contacting us and we’ll reach out to instructors. There is no fee for a pilot workshop, but you would need to cover travel expenses for instructors. I’d also be happy to hear any general feedback about the lesson materials at the associated GitHub repository.

March 16, 2018 / Damien Irving

Collaborative lesson development

If you’re a regular reader of this blog, you’ve no doubt heard me talk about The Carpentries – a global community committed to teaching foundational computational and data science skills to researchers. Hundreds of Software Carpentry and Data Carpentry workshops are held around the world every year, which is a monumental effort for an initiative that only got started in earnest five or so years ago.

While these workshops have had a massive impact on the computational literacy of the research community, in my opinion the most revolutionary thing about The Carpentries is not what we teach, but how we teach it. More specifically, the revolution lies in what we do in the background to develop and maintain our lessons. When a volunteer instructor is preparing to teach a workshop, they have an extensive collection of open and easily accessible lesson materials to select from. Unlike a static textbook produced by a small group of authors, these lessons are continually refined and updated by a large and diverse community of contributors, which means they are (by a wide margin) the state-of-the-art lessons in their discipline.

This process of community lesson development is probably best explained by reflecting on my own personal experience. I participated in The Carpentries instructor training program back in 2013, which among other things gave me a grounding in various evidence-based best-practices of teaching (i.e. an understanding of the fundamental pedagogical principles underpinning the lessons). Upon teaching my first few workshops, I started to contribute back to the lessons by fixing typos and other minor issues identified by participants. As I got more experience with teaching the materials, I started to make more substantive contributions, proposing and participating in re-writes of entire sections. Now with over twenty workshops under my belt, I’m writing a whole new set of Data Carpentry lessons specifically for atmosphere and ocean scientists (see here for a sneak peek).

The contrast between this process and that typically followed by university lecturers could not be more stark. While contributing back to The Carpentries lesson materials can be a little tedious at times, it is WAY less work (and results in a far superior product) than if I had to develop the materials for my workshops myself. Most lecturers get some hand-me-down materials from the staff member that went before them (if they’re lucky), and then they’re on their own. What’s more, anything they learn about teaching their discipline better has no impact beyond the four walls of their own classroom.

By following the community-based lead of The Carpentries, the quality of teaching at universities could be improved, while at the same time saving substantial time for lecturers. For instance, I can think of at least five universities that teach detailed courses on the weather and climate of Australia. If the teachers of these courses got together (perhaps facilitated by the Australian Meteorological and Oceanographic Society) and started to collaboratively develop and maintain a set of lesson materials, the educational experience for students (and all the good things that come from that; e.g. student retention, graduate quality and numbers) could improve markedly.

Of course, the one thing I’ve skipped over here is all the ingredients that make community lesson development work. What platforms are best for hosting the materials? What do you do when people disagree on the direction of the lessons? How do you structure the lessons for ease of use and contribution? To try and assist people with this, a bunch of Carpentries people (myself included) got together recently and published Ten Simple Rules for Collaborative Lesson Development. It distills everything we’ve learned over the years, and is hopefully a useful resource for anyone thinking of giving it a try.

 

 

February 8, 2018 / Damien Irving

Semantic versioning for our major climate reports?

I’m attending the joint AMOS / ICSHMO Conference in Sydney this week, where among other topics there’s been a lively discussion about preliminary plans for the next generation of climate projections for Australia.

In the past, new projections have been released by our major national science agencies (i.e. the Bureau of Meteorology and CSIRO) every 5-10 years: 1992, 1996, 2001, 2007 and 2015 (Whetton et al, 2016). This approach made sense in the 1990s, but given the widespread use of the internet nowadays and the rapid pace of advances in climate research, the custom of publishing a periodic static report (and/or website) seems somewhat dated.

An alternative approach would be to follow the lead of the software development community. After a software package is released, it is common practice to keep track of subsequent updates and improvements via the use of semantic versioning. The convention for labeling each new version release is MAJOR.MINOR.PATCH (e.g. version 2.5.3), where you increment the:

  • MAJOR version when you make incompatible API changes,
  • MINOR version when you add functionality in a backwards-compatible manner, and
  • PATCH version when you make backwards-compatible bug fixes

The authors of the software then update the changelog for the project, which lists the notable changes for each new version. (The new changelog entry is often emailed to users of the software.)

It’s easy to imagine a situation where national climate projections could be posted online (e.g. https://www.climatechangeinaustralia.gov.au), with updates released on an as-needs basis by increments to the:

  • MAJOR version when there is a fundamental change to the projections themselves (e.g. a critical new dataset becomes available, like CMIP6)
  • MINOR version when content is added for a highly topical issue (e.g. an international climate target/agreement is reached, a major climate event occurs such as the warming hiatus)
  • PATCH version when minor research updates are added

The most obvious advantage of this approach is that frequent incremental updates ensure that the projections remain up-to-date with both user demands and the latest science. Most of the work would still get done in the lead up to the release of a new major version, but incremental improvements could be made in the meantime, rather than waiting years for the next report.

October 24, 2017 / Damien Irving

Best practices for scientific software

Code written by a research scientist typically lies somewhere on a continuum ranging from “scientific code” that was simply hacked together for individual use (e.g. to produce a figure for a journal paper) to “scientific software” that has been formally packaged and released for use by the wider community.

I’ve written at length (e.g. how to write a reproducible paper) about the best practices that apply to the scientific code end of the spectrum, so in this post I wanted to turn my attention to scientific software. In other words, what’s involved in turning scientific code into something that anyone can use?

My attempt at answering this question is based on my experiences as an Associate Editor with the Journal of Open Research Software. I’m focusing on Python since (a) most new scientific software in the weather/ocean/climate sciences is written in that language, and (b) it’s the language I’m most familiar with.

Hosting

First off, you’ll need to create a repository on a site like GitHub or Bitbucket to host your (version controlled) software. As well as providing the means to make your code available to the community, these sites have features that help with things like community discussion and software release management. One of the first things you’ll need to include in your repository is a software license. Jake VanderPlas has an excellent post on why you need a license and how to pick one.

Packaging / installation

If you want people to use your software, you need to make it as easy as possible for them to install it. In Python, this means packaging the code in such a way that it can be made available via the Python Package Index (PyPI). If your code and all the libraries it depends on are written purely in Python, then this is all you need to do. People will simply be able to “pip install” your software from the command line.
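
For a pure-Python package, the packaging step can be as simple as adding a setup.py along the lines of the sketch below (the package name, metadata and dependencies are placeholders):

```python
# setup.py for a hypothetical pure-Python package called "mypackage"
from setuptools import setup, find_packages

setup(
    name='mypackage',
    version='0.1.0',
    description='Tools for analysing climate model output',
    author='Your Name',
    url='https://github.com/yourname/mypackage',
    packages=find_packages(),
    install_requires=['numpy', 'xarray'],  # pure-Python dependencies from PyPI
)
```

With that in place you can build a source distribution, upload it to PyPI (e.g. using twine), and anyone can then “pip install mypackage”.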

If your software has non-Python dependencies (e.g. netCDF libraries), then it’s a good idea to make sure that it can also be installed via conda. Using recipes that developers (i.e. you, in this case) submit to conda-forge, this popular package manager installs software and all its dependencies at once. I’ve talked extensively about conda in a previous post.

Documentation

While it might seem like the documentation pages for your favourite Python libraries were painstakingly typed by hand, they were almost certainly created using software that automatically extracts the docstrings from your code and formats them nicely for display on the web. In most cases, people use Sphinx to generate the documentation and Read the Docs to publish it (here’s a nice description of that whole process).
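
In practice this means most of the documentation effort goes into writing good docstrings. A purely illustrative example of a hypothetical function documented in the NumPy docstring style (which Sphinx can render via its napoleon extension):

```python
def zonal_mean(darray, lon_dim='lon'):
    """Calculate the zonal mean of a data array.

    Parameters
    ----------
    darray : xarray.DataArray
        Input data on a regular latitude/longitude grid.
    lon_dim : str, optional
        Name of the longitude dimension (default: 'lon').

    Returns
    -------
    xarray.DataArray
        The input data averaged along the longitude dimension.
    """
    return darray.mean(dim=lon_dim)
```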

Assistance

In providing assistance to users, software projects will typically use a combination of encouraging people to submit issues on their GitHub/Bitbucket page (for technical questions that will possibly require a change to the code) and platforms like Google Groups and/or Gitter (a chat client provided by GitLab) for more general questions about how to use the software.

The bonus of GitHub issues, Google Groups and Gitter is that anyone can view the questions and answers, not just the lead developers of the software. This means that random people from the community can chime in with answers (reducing your workload) and it also helps reduce the incidence of getting the same question from many people.

Testing

If you want users (and your future self) to trust that your code actually works, you’ll need to develop a suite of tests using one of the many testing libraries available in Python. You can then use a platform like Travis CI to automatically run those tests each time you change your code, to make sure you haven’t broken anything. Many people add a little code coverage badge to the README file in their code repository using Coveralls, to indicate how much of the code is covered by the tests.
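
As a minimal sketch of what such a test might look like with pytest (the function and module under test here are hypothetical):

```python
# test_units.py -- run with "pytest"
import numpy as np
from mypackage.units import convert_pr_units  # hypothetical function under test

def test_convert_pr_units():
    """Precipitation flux in kg m-2 s-1 should convert to mm/day."""
    pr_flux = np.array([1.0, 2.0])
    expected = np.array([86400.0, 172800.0])
    np.testing.assert_allclose(convert_pr_units(pr_flux), expected)
```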

Academic publishing

To make sure you get the academic credit you deserve for the hard work associated with releasing and maintaining scientific software, it’s important to publish an academic article about your software (i.e. so that people can cite it in the methods sections of their papers). If there isn’t an existing journal dedicated to the type of software you’ve written (e.g. Geoscientific Model Development), then the Journal of Open Research Software or Journal of Open Source Software are good options.

 

This is obviously a very broad overview of what’s involved in packaging and releasing scientific software. Depending on where you sit on the scientific code / scientific software spectrum, not all of the things listed above will be necessary. For instance, if you’re writing code that only needs to be used by a group of 5 people working on the same computer system, hosting on GitHub, testing using Travis CI and the use of GitHub issues and Gitter for discussion might be useful, but perhaps not packaging with PyPI or a journal paper with the Journal of Open Research Software.

A great resource for more detailed advice is the Software Sustainability Institute (their online guides are particularly useful). It’s also worth checking out the gold standards in the weather/ocean/climate space. In terms of individual researchers releasing their own software, this would be the eofs and windspharm packages from Andrew Dawson. Packages like MetPy (UCAR / Unidata), Py-ART (ARM Climate Research Facility) and Iris / Cartopy (UK Met Office) are good examples of what can be achieved with some institutional support.

October 20, 2017 / Damien Irving

Talk Python To Me

I’m a big fan of the Talk Python To Me podcast, so I was very excited to be invited on the show this week to record an episode about how Python is used in climate science!

If you like the podcast, the episodes with Jonah Duckles from Software Carpentry and Travis Oliphant from Continuum Analytics are super interesting. I’ve also added some new entries to my list of weather/climate science podcasts, so there’s plenty out there to listen to! 🙂

May 8, 2017 / Damien Irving

A vision for CMIP6 in Australia

Most climate researchers would be well aware that phase 6 of the Coupled Model Intercomparison Project (CMIP6) is now underway. The experiments have been designed, the modelling groups are gearing up to run them, and data should begin to come online sometime next year (see this special issue of Geoscientific Model Development for project details). As is always the case with a new iteration of CMIP, this one is going to be bigger and better than the last. By better I mean cooler experiments and improved model documentation (via the shiny new Earth System Documentation website), and by bigger I mean more data. At around 3 Petabytes in total size, CMIP5 was already so big that it was impractical for most individual research institutions to host their own copy. In Australia, the major climate research institutions (e.g. Bureau of Meteorology, CSIRO, ARC Centre of Excellence for Climate System Science – ARCCSS) got around this problem by enlisting the help of the National Computational Infrastructure (NCI) in Canberra. A similar arrangement is currently being planned for CMIP6, so I wanted to share my views (as someone who has spent a large part of the last decade wrangling CMIP3 and CMIP5 data) on what is required to help Australian climate researchers analyse that data with a minimum of fuss.

(Note: In focusing solely on researcher-related issues, I’m obviously ignoring vitally important technical issues related to data storage and funding issues etc. Assuming all that gets sorted, this post looks at how the researcher experience might be improved.)

 

1. A place to analyse the data

In addition to its sheer size, it’s important to note that the CMIP6 dataset will be in flux for many years as modelling groups begin to contribute data (and then revise and re-issue erroneous data) from 2018 onwards. For both these reasons, it’s not practical for individual researchers and/or institutions to be creating their own duplicate copies of the dataset. Recognising this issue (which is not unique to the CMIP projects), NCI have built a whole computational infrastructure directly on top of their data library, so that researchers can do their data processing without having to copy/move data anywhere. This computational infrastructure consists of Raijin (a powerful supercomputer) and the NCI High Performance Cloud for super complex and/or data-intensive tasks, while for everyday work they have their Virtual Desktop Infrastructure. These virtual desktops have more grunt than your personal laptop or desktop computer (4 CPUs, 20 GB RAM, 66 GB storage) and come with a whole bunch of data exploration tools pre-installed. Better still, they are isolated from the rest of the system in the sense that unlike when you’re working on Raijin (or any other shared supercomputer), you don’t have to submit processes that will take longer than 15 or so minutes to the queuing system. I’ve found the virtual desktops to be ideal for analysing CMIP5 data (I do all my CMIP5 data analysis on them, including large full-depth ocean data processing) and can’t see any reason why they wouldn’t be equally suitable for CMIP6.

 

2. A way to locate and download data

Once you’ve logged into a virtual desktop, you need to be able to (a) locate the CMIP data of interest that’s already been downloaded to the NCI data library, and (b) find out if there’s data of interest available elsewhere on the international Earth System Grid. In the case of CMIP5, Paola Petrelli (with help from the rest of the Computational Modelling Support team at the ARCCSS) has developed an excellent package called ARCCSSive that does both these things. For data located elsewhere on the grid, it also gives you the option of automatically sending a request to Paola for the data to be downloaded to the NCI data library. (They also have a great help channel on Slack if you get stuck and have questions.)

Developing and maintaining a package like ARCCSSive is no trivial task, particularly as the Earth System Grid Federation (ESGF) continually shift the goalposts by tweaking and changing the way the data is made available. In my opinion, one of the highest priority tasks for CMIP6 would be to develop and maintain an ARCCSSive-like tool that researchers can use for data lookup and download requests.

 

3. A way to systematically report and handle errors in the data

Before a data file is submitted to a CMIP project, it is supposed to have undergone a series of checks to ensure that the data values are reasonable (e.g. nothing crazy like a negative rainfall rate) and that the metadata meets community agreed standards. Despite these checks, data errors and metadata inconsistencies regularly slip through the cracks and many hours of research time are spent weeding out and correcting these issues. For CMIP5, there is a process (I think) for notifying the relevant modelling group (via the ESGF maybe?) of an error you’ve found, but it can be many months (if ever) before a file gets corrected and re-issued. For easy-to-fix errors, researchers will therefore often generate a fixed file (which is only available in their personal directories on the NCI system) and then move on with their analysis.

The obvious problem with this sequence is that the original file hasn’t been flagged as erroneous (and no details of how to fix it archived), which means the next researcher who comes along will experience the same problem all over again. The big improvement I think we can make between CMIP5 and CMIP6 is a community effort to flag erroneous files, share suggested fixes and ultimately provide temporary corrected data files until the originals are re-issued. This is something the Australian community has talked about for CMIP5, but the farthest we got was a wiki that is not widely used. (Paola has also added warning/errata functionality to the ARCCSSive package so that users can filter out bad data.)

In an ideal world, the ESGF would coordinate this effort. I’m imagining a GitHub page where CMIP6 users from around the world could flag data errors and for simple cases submit code that fixes the problem. A group of global maintainers could then review these submissions, run accepted code on problematic data files and provide a “corrected” data collection for download. As part of the ESGF, the NCI could push for the launch of such an initiative. If it turns out that the ESGF is unwilling or unable, NCI could facilitate a similar process just for Australia (i.e. community fixes for the CMIP data that’s available in the NCI data library).

 

4. Community maintained code for common tasks

Many Australian researchers perform the same CMIP data analysis tasks (e.g. calculate the Nino 3.4 index from sea surface temperature data or the annual mean surface temperature over Australia), which means there’s a fairly large duplication of effort across the community. To try and tackle this problem, computing support staff from the Bureau of Meteorology and CSIRO launched the CWSLab workflow tool, which was an attempt to get the climate community to share and collaboratively develop code for these common tasks. I actually took a one-month break during my PhD to work on that project and even waxed poetic about it in a previous post. I still love the idea in principle (and commend the BoM and CSIRO for making their code openly available), but upon reflection I feel like it’s a little ahead of its time. The broader climate community is still coming to grips with the idea of managing its personal code with a version control system; it’s a pretty big leap to utilising and contributing to an open source community project on GitHub, and that’s before we even get into the complexities associated with customising the VisTrails workflow management system used by the CWSLab workflow tool. I’d much prefer to see us aim to get a simple community error handling process off the ground first, and once the culture of code sharing and community contribution is established the CWSLab workflow tool could be revisited.

 

In summary, as we look towards CMIP6 in Australia, here’s how things look from the perspective of a scientist who’s been wrangling CMIP data for years:

  1. The NCI virtual desktops are ready to go and fit for purpose.
  2. The ARCCSS software for locating and downloading CMIP5 data is fantastic. Developing and maintaining a similar tool for CMIP6 should be a high priority.
  3. The ESGF (or failing that, NCI) could lead a community-wide effort to identify and fix bogus CMIP data files.
  4. A community maintained code repository for common data processing tasks (i.e. the CWSLab workflow tool) is an idea that is probably ahead of its time.

April 11, 2017 / Damien Irving

Attention scientists: Frustrated with politics? Pick a party and get involved.

The March for Science is coming up on 22 April, so I’m taking a quick detour from my regular focus on research best practice. I’ve been invited to speak at the march in Hobart, Australia, so I thought I’d share what I’m going to say…

In today’s world of alternative facts and hyper-partisan public debate, there are growing calls for scientists to get involved in politics. This might take the form of speaking out on your area of expertise, participating in a non-partisan advocacy group and/or getting involved with a political party. If you think the latter sounds like the least attractive option of the three, you’re not alone. Membership of political parties has been in decline for years, to the point where many sporting clubs have more members. While this might sound like a good reason not to join a political party, I’ve found that it means your involvement can have a bigger impact than ever before.

A little over twelve months ago, I moved to Hobart to take up a postdoctoral fellowship. As part of a new start in a new town, I decided to get actively involved with the Tasmanian Greens. Fast forward a year and I’m now the Convenor of the Denison Branch of the Party. Bob Brown (the father of the environment movement in Australia) started his political career as a Member for Denison in the Tasmanian Parliament and our current representative (Cassy O’Connor MP) is the leader of the Tasmanian Greens, so it’s been a fascinating and humbling experience so far.

Upon taking the plunge into politics, the first thing that struck me was the overwhelming reliance on volunteers. The Tasmanian Greens have very few staff, which means there are countless ways for volunteers to get involved. If your motivation lies in changing party policy in your area of expertise, you can take a lead role in re-writing that policy and campaigning for the support of the membership. If you’re happy with party policy and want to help achieve outcomes, your professional skills can definitely be put to good use. My data science skills have been in particularly high demand, and I’m now busily involved in managing our database of members and supporters. Besides this practical contribution, the experience has also been great for my mental wellbeing. Rather than simply despair at the current state of politics (which most often means ranting to like-minded friends and followers on social media), I now have an outlet for actively improving the situation.

If you’re a scientist (or simply someone who cares about the importance of knowledge, evidence and objectivity in the political process) and aren’t currently involved with a political party, I’d highly recommend giving it a go. Any party would benefit from the unique knowledge and skills you bring to the table. As with most volunteer experiences, you’ll also get out a whole lot more than you put in.

There are going to be over 400 marches around the world, so check the map and get along to the one nearest you (or better still, contact the organiser and offer to speak).

February 15, 2017 / Damien Irving

The research police

You know who I’m talking about. I’m sure every research community has them. Those annoying do-gooders who constantly advocate for things to be done the right way. When you’re trying to take a shortcut, it’s their nagging voice in the back of your mind. You appreciate that what they’re saying is important, but with so much work and so little time, you don’t always want to hear it. Since I’m fond of creating lists on this blog, here’s my research police of the weather, ocean and climate sciences:

 

Statistics

Dan Wilks is widely regarded as the statistics guru of the atmospheric sciences. He is the author of the most clearly written statistics textbook I’ve ever come across, as well as great articles such as this recent essay in BAMS, which is sure to make you feel bad if you’ve ever plotted significance stippling.

 

Data visualisation

Ed Hawkins’ climate spiral visualisation received worldwide media coverage in 2016 (and even featured in the opening ceremony of the Rio Olympics). He makes the list of research police due to his “end the rainbow” campaign, which advocates for the use of more appropriate colour scales in climate science.

 

Communication

David Schultz is the Chief Editor of Monthly Weather Review and has authored well over 100 research articles, but is probably best known as the “Eloquent Science guy.” His book and blog are a must read for anyone wanting to improve their academic writing, reviewing and speaking.

 

Programming

Unfortunately I’m going to have to self-nominate here, as I’ve been a strong advocate for publishing reproducible computational results for a number of years now (see related post and BAMS essay). To help researchers do this, I’ve taught at over 20 Software Carpentry workshops and I’m the lead author of their climate-specific lesson materials.

 

If I’ve missed any other research police, please let me know in the comments!

January 11, 2017 / Damien Irving

Need help with reproducible research? These organisations have got you covered.

The reproducibility crisis in modern research is a multi-faceted problem. If you’re working in the life sciences, for instance, experimental design and poor statistical power are big issues. For the weather, ocean and climate sciences, the big issue is code and software availability. We don’t document the details of the code and software used to analyse and visualise our data, which means it’s impossible to interrogate our methods and reproduce our results.

(For the purposes of this post, research “software” is something that has been packaged and released for use by the wider community, whereas research “code” is something written just for personal use. For instance, I might have written some code to perform and plot an EOF analysis, which calls and executes functions from the eofs software package that is maintained by Andrew Dawson at Oxford University.)
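
To make that distinction concrete, here is a rough sketch of what such personal code might look like when calling the eofs package (random data stands in for real sea surface temperature anomalies):

```python
import numpy as np
from eofs.standard import Eof

# Random data standing in for SST anomalies with dimensions (time, lat, lon)
sst_anom = np.random.randn(120, 72, 144)

solver = Eof(sst_anom)                    # the "code": set up an EOF solver
eof1 = solver.eofs(neofs=1)               # leading spatial pattern
pc1 = solver.pcs(npcs=1, pcscaling=1)     # associated principal component time series
var1 = solver.varianceFraction(neigs=1)   # fraction of variance explained
```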

Unbeknown to most weather, ocean and climate scientists, there are a number of groups out there that want to help you make your work more reproducible. Here’s a list of the key players and what they’re up to…

 

Software Sustainability Institute (SSI)

The SSI is the go-to organisation for people who write and maintain scientific software. They provide training and support, advocate for formal career paths for scientific software developers and manage the Journal of Open Research Software, where you can publish the details of your software so that people can cite your work. They focus mainly on researchers in the UK, so it’s my hope that organisations like SSI will start popping up in other countries around the world.

 

OntoSoft

The OntoSoft project in the US has a bit of overlap with the SSI (e.g. they’re working on “software commons” infrastructure where people can submit their geoscientific software so that it can be searched and discovered by others), but in addition their Geoscientific Paper of the Future (GPF) initiative has been looking at the broader issue of how researchers should go about publishing the details of the digital aspects of their research (i.e. data, code, software and provenance/workflow). In a special GPF issue of Earth and Space Science, researchers from a variety of geoscience disciplines share their experiences in trying to document their digital research methods. The lead paper from that issue gives a fantastic overview of the options available to researchers. (My own work in this area gives a slightly more practical overview but in general covers many of the same ideas.)

 

Software Carpentry

The global network of volunteer Software Carpentry instructors run hundreds of two-day workshops around the world each year, teaching the skills needed to write reusable, testable and ultimately reproducible code (i.e. to do the things suggested by the GPF). Their teaching materials have been developed and refined for more than a decade and every instructor undergoes formal training, which means you won’t find a better learning experience anywhere. To get a workshop happening at your own institution, you simply need to submit a request at their website. They’ll then assist with finding local instructors and all the other logistics that go along with running a workshop. A sibling organisation called Data Carpentry has recently been launched, so it’s also worth checking to see if their more discipline-specific, data-centric lessons would be a better fit.

 

Mozilla Science Lab

Once you’ve walked out of a two-day Software Carpentry workshop, it can be hard to find ongoing support for your coding. The best form of support usually comes from an engaged and well connected local community, so the Mozilla Science Lab assists researchers in forming and maintaining in-person study groups. If there isn’t already a study group in your area, I’d highly recommend their study group handbook. It has a bunch of useful advice and resources for getting one started, plus they periodically run online orientation courses to go through the handbook content in detail.

 

Hopefully one or more of those organisations will be useful in your attempts to make your work more reproducible – please let me know in the comments if there are other groups or resources that I’ve missed!