November 5, 2015 / Damien Irving

A call for reproducible research volunteers

Around the time that I commenced my PhD (May 2012… yes, I know I should have finished by now!) there were lots of editorial-style articles popping up in prestigious journals like Nature and Science about the reproducibility crisis in computational research. Most papers do not make the data and code underpinning their key findings available, nor do they adequately specify the software packages and libraries used to execute that code, which means it’s impossible to replicate and verify their results. Upon reading a few of these articles, I decided that I’d try and make sure that the results presented in my PhD research were fully reproducible from a code perspective (my research uses publicly available reanalysis data, so the data availability component of the crisis wasn’t so relevant to me).

While this was an admirable goal, I quickly discovered that despite the many editorials pointing to the problem, I could find very few (none, in fact) regular weather/climate papers that were actually reproducible. (By “regular” I mean papers where code was not the main focus of the work, like it might be in a paper describing a new climate model.) A secondary aim of my thesis therefore became to consult the literature on (a) why people don’t publish their code, and (b) best practices for scientific computing. I would then use that information to devise an approach to publishing reproducible research that reduced the barriers for researchers while also promoting good programming practices.

My first paper using that approach was recently accepted for publication in the Journal of Climate (see the post-print here on Authorea), and the Bulletin of the American Meteorological Society has just accepted an essay I’ve written explaining the rationale behind the approach. In a nutshell, the approach requires the author to provide three key supplementary items:

  1. A description of the software packages and operating system used
  2. A (preferably version controlled and publicly accessible) code repository, and
  3. A collection of supplementary log files that capture the data processing steps taken in producing each key result
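
To make the third item concrete: one simple way to generate such a log (this is just a sketch, with hypothetical file names rather than the exact code from my paper) is to have every script append the command line that invoked it to a log file associated with its output:

    import sys
    from datetime import datetime

    def write_log(log_file):
        """Record the command line that produced an output file,
        plus a timestamp, so each processing step can be traced."""
        entry = '{}: {}\n'.format(datetime.now().isoformat(), ' '.join(sys.argv))
        with open(log_file, 'a') as logfile:
            logfile.write(entry)

    # e.g. called at the end of a script that was run as
    #   python calc_climatology.py input.nc output.nc
    write_log('output.nc.log')

Chaining these entries together (the output of one step becomes the input to the next) yields a complete record of the processing behind each key result.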

The essay then goes on to suggest how academic journals (and institutions that have an internal review process) might implement this as a formal minimum standard for the communication of computational results. I’ve contacted the American Meteorological Society (AMS) Board on Data Stewardship about this proposed minimum standard (they’re the group who decide the rules that AMS journals impose around data and code availability) and they’ve agreed to discuss it when they meet at the AMS Annual Meeting in January.

This is where you come in. I’d really love to find a few volunteers who would be willing to try and meet the proposed minimum standard when they write their next journal paper. These volunteers could then give feedback on the experience, which would help inform the Board on Data Stewardship in developing a formal policy around code availability. If you think you might like to volunteer, please get in touch!

 


8 Comments

  1. Georgy Ayzel / Nov 5 2015 22:49

    Hi! It’s a great idea, not only for weather/climate research but for all planetary-system-related studies. I plan to write an article about some hydrological forecasting issues and may use your framework, why not? But more information about your standard workflow is needed to start this project 🙂

    • Damien Irving / Nov 6 2015 14:34

      Hi Georgy. Fantastic to hear that you might try and use the framework for your next paper. My workflow involves writing programs (usually in Python) that act like any other command line tool and then I combine them using Make. You can read about this approach in my post on workflow automation and Software Carpentry have a great lesson on using Make here.
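
      To give a rough idea of what such a program looks like (the script below is just a placeholder sketch, not code from an actual paper), each step reads input files and writes output files via a standard command line interface:

          import argparse

          def main(infile, outfile):
              """Stand-in for an actual data processing step."""
              with open(infile) as reader, open(outfile, 'w') as writer:
                  writer.write(reader.read())

          if __name__ == '__main__':
              parser = argparse.ArgumentParser(description='Process a data file.')
              parser.add_argument('infile', help='input data file')
              parser.add_argument('outfile', help='output data file')
              args = parser.parse_args()
              main(args.infile, args.outfile)

      Because each script behaves like any other command line tool, Make can chain them together by declaring each output file as a target that depends on its inputs and on the script that produces it.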

      • Georgy Ayzel / Nov 6 2015 16:38

        I see that I have a very similar workflow in my research too, but for me it’s still better to use the IPython notebook for prototyping my code (with command line interactions). Thanks for the info about Make: a very powerful tool for pipelining different code chunks without diving deep into OOP 🙂

      • Damien Irving / Nov 6 2015 16:42

        Absolutely – I also use IPython notebook for prototyping code before I write it up in a Python command line script.

  2. Katelyn Watson / Dec 31 2015 12:16

    Happy to see others are thinking about reproducibility in computational science. If you haven’t already seen the Geosciences Papers of the Future Initiative (http://www.ontosoft.org/gpf/), you might be interested; they’re another group thinking about similar concerns.

    • Damien Irving / Jan 1 2016 14:21

      I wasn’t aware of the Geosciences Papers of the Future Initiative… looks great! They should definitely team up with Software Carpentry for their training sessions.

  3. Damien Irving / Jan 3 2016 13:35

    If you’re interested in a gold standard for reproducible research (i.e. as opposed to a minimum standard), this post from Bill Mills outlines just how far you can go: http://billmills.github.io/blog/full-stack/

Trackbacks

  1. How to write a reproducible paper | Dr Climate
