July 8, 2021 / Damien Irving

We wrote a book!

For the best part of a decade, I’ve volunteered as an instructor with The Carpentries, a global community committed to teaching foundational coding and data science skills to researchers. From humble beginnings, it has grown to the point where hundreds of Carpentries workshops are hosted around the globe every year. The audience at these workshops is typically researchers who are self-taught programmers (i.e. they can cobble together enough Python or R to clean, analyse and plot their research data), and we introduce them to a number of coding best practices that are grounded in research and experience and that improve productivity and reliability.

A big part of The Carpentries’ success is the two-day format (it’s not an overwhelming time commitment for busy researchers), but over the years I’ve often wondered what we’d teach if we had more time. With an entire semester, for instance, you could take a researcher through the full lifecycle of a data analysis project, from the initial setup and code development through to a fully automated data processing pipeline and a published software package. A few years ago, Greg Wilson (who co-founded The Carpentries) assembled a small group of Carpentries instructors to try to write such a book. I very happily joined in, and I’m excited to say that Research Software Engineering with Python is now available for purchase (all proceeds go to The Carpentries). The book’s content and its associated code are licensed under CC-BY 4.0 and the MIT License, respectively, so there’s also a freely available web version of the book. A corresponding book for R users (which I’m not involved with) is also under development (see this landing page for related projects).

The book follows Amira and Sami as they work together to write a software package to address a real research question. The data analysis task relates to a fascinating result in the field of quantitative linguistics. Zipf’s Law states that the frequency of a word in a large body of text is roughly inversely proportional to its rank: the second most common word appears about half as often as the most common, the third most common appears a third as often, and so on. To test whether Zipf’s Law holds for a collection of classic novels that are freely available from Project Gutenberg, Amira and Sami write a software package that counts and analyses the word frequency distribution in an arbitrary body of text.
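To make the idea concrete, here is a minimal sketch (not the book’s code) of a rough Zipf check: count the words in a text, rank them by frequency, and compare each observed count with the prediction f(r) = f(1) / r, where f(1) is the count of the most common word. The regex tokenizer and the function names `word_counts` and `zipf_table` are hypothetical choices made for illustration.

```python
import re
from collections import Counter


def word_counts(text):
    """Count lowercase words using a simple regex tokenizer (an assumption, not the book's)."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


def zipf_table(counts, top_n=5):
    """Pair each of the top_n words with its Zipf prediction f(r) = f(1) / r."""
    ranked = counts.most_common(top_n)
    if not ranked:
        return []
    top_count = ranked[0][1]  # f(1): count of the most common word
    rows = []
    for rank, (word, observed) in enumerate(ranked, start=1):
        expected = top_count / rank
        rows.append((rank, word, observed, expected))
    return rows


if __name__ == "__main__":
    sample = "the cat and the dog and the bird"
    for rank, word, observed, expected in zipf_table(word_counts(sample), top_n=3):
        print(f"{rank:>4} {word:<6} {observed:>4} {expected:>8.2f}")
```

A toy sentence like this one is far too short for the law to apply; on a full novel, a more rigorous test would fit the exponent of a power law to the whole rank-frequency distribution rather than eyeballing a few ratios.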

In the process of writing and publishing this Python package to test Zipf’s Law, the book covers how to do the following:

  • Organise small and medium-sized data science projects.
  • Use the Unix shell to efficiently manage your data and code.
  • Write Python programs that can be used on the command line.
  • Use Git and GitHub to track and share your work.
  • Work productively in a small team where everyone is welcome.
  • Use Make to automate complex workflows.
  • Enable users to configure your software without modifying it directly.
  • Test your software and know which parts have not yet been tested.
  • Find, handle, and fix errors in your code.
  • Publish your code and research in open and reproducible ways.
  • Create Python packages that can be installed in standard ways.

The book was written to be used as the material for a semester-long course at the university level (complete with exercises and solutions), although it can also be used for independent self-study. Comments and suggestions are more than welcome at the book’s GitHub repository. We’d be particularly keen to hear from anyone who uses it as a textbook for a semester course or an extended Carpentries-style workshop, whether before, during or after they teach it.

2 Comments

  1. Anonymous / Jul 22 2021 19:32

    Great initiative Damien (and team). Thanks for sharing. – Malcolm

Trackbacks

  1. Enlaces de la quincena (2021-07-12 a 2021-07-25) [Links of the fortnight, 2021-07-12 to 2021-07-25] – Pybonacci
