November 16, 2012 / Damien Irving

Version control

In a previous post on best practices for scientific computing, I listed the 10 recommendations put forward by a group of computing experts in a recent draft paper (see manuscript here). While all 10 will undoubtedly improve your computing, in my opinion number 5 is the most important: use version control.

For those who aren’t familiar, a version control system stores a master copy of all your code in a repository, which you can never edit directly. Instead, you check out a working copy of the repository, edit that copy as you wish, and then commit your changes back to the repository once you’re done. The repository stores the entire revision history of your code, so that you can retrieve and compare previous versions, together with metadata such as comments on what was changed and the author of those changes.
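To make the check out / edit / commit cycle concrete, here is a toy sketch in Python. It is not how Subversion or Git actually store anything (real systems are far more efficient), and the class names, the file name and the author "alice" are all made up for illustration. It simply keeps a full snapshot of the files plus some metadata for every commit.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Revision:
    number: int
    author: str
    message: str
    files: dict  # filename -> file contents at this revision


@dataclass
class ToyRepository:
    """Toy in-memory repository (hypothetical, for illustration only)."""
    history: list = field(default_factory=list)

    def checkout(self, number=None):
        """Return a working copy of the given revision (default: the latest)."""
        if not self.history:
            return {}
        revision = self.history[-1] if number is None else self.history[number - 1]
        # Hand back a copy so that edits never touch the stored history.
        return copy.deepcopy(revision.files)

    def commit(self, working_copy, author, message):
        """Store the working copy as a new revision and return its number."""
        number = len(self.history) + 1
        self.history.append(Revision(number, author, message, copy.deepcopy(working_copy)))
        return number

    def log(self):
        """Return the metadata for every revision, oldest first."""
        return [(r.number, r.author, r.message) for r in self.history]


repo = ToyRepository()

# Check out, edit the working copy, commit.
working = repo.checkout()
working["plot.py"] = "print('temperature')\n"
repo.commit(working, "alice", "Add plotting script")

# Same cycle again for a second change.
working = repo.checkout()
working["plot.py"] = "print('temperature anomaly')\n"
repo.commit(working, "alice", "Plot anomalies instead of raw values")

print(repo.log())                   # the revision history with its metadata
print(repo.checkout(1)["plot.py"])  # the file as it was at revision 1
```

The point to notice is that the working copy is always a separate copy: nothing in the history changes until you commit, and every earlier revision stays retrievable afterwards.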

Version control systems were originally designed for people developing code in teams, for two main reasons:

  1. It gives everyone in the team access to the most up-to-date version of the code (i.e. by checking out the latest version of the repository)
  2. It prevents people from overwriting each other’s work by forcing them to merge concurrent changes before committing
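The second point can be sketched in a few lines of Python. This is a toy model only: the class, the error name and the two team members are invented for the example, and real systems will often do the merge for you when the changes don't overlap. All it shows is the refusal step, where a commit based on an out-of-date revision is turned away.

```python
class OutOfDateError(Exception):
    """Raised when a commit is based on a revision that is no longer the latest."""


class SharedFile:
    """Toy model of one tracked file shared by a team (hypothetical)."""

    def __init__(self, text):
        self.revisions = [text]

    @property
    def latest(self):
        return len(self.revisions) - 1

    def checkout(self):
        """Return the latest text and the revision number it came from."""
        return self.revisions[-1], self.latest

    def commit(self, new_text, base_revision):
        """Accept the change only if it was made against the latest revision."""
        if base_revision != self.latest:
            raise OutOfDateError(
                f"working copy is at revision {base_revision}, "
                f"repository is at revision {self.latest}"
            )
        self.revisions.append(new_text)
        return self.latest


shared = SharedFile("line one\n")

# Two people check out the same revision and edit independently.
alice_text, alice_base = shared.checkout()
bob_text, bob_base = shared.checkout()

# Alice commits first, so her change is accepted.
shared.commit(alice_text + "alice's line\n", alice_base)

# Bob's working copy is now out of date, so his commit is refused.
try:
    shared.commit(bob_text + "bob's line\n", bob_base)
except OutOfDateError as err:
    print("Commit refused:", err)

# Bob updates to the latest text, re-applies his change on top, and commits.
latest_text, latest_base = shared.checkout()
shared.commit(latest_text + "bob's line\n", latest_base)
print(shared.revisions[-1])
```

Because Bob has to start from the latest text before his commit goes through, Alice's line survives instead of being silently overwritten.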

People soon came to realise, however, that version control is useful even if you’re not (a) working in a team, or (b) developing code. For instance, let’s say you’re writing your PhD thesis using LaTeX. Since LaTeX files (usually denoted .tex) are simply plain old text files** and therefore no different to a Python (.py), Fortran (.f) or Matlab (.m) script, a version control system will happily track them (i.e. keep a complete revision history). So if you deleted a paragraph in your introduction chapter three weeks ago, but now realise that you actually liked the first couple of sentences, it’s no problem. You can easily pull up the three-week-old version of that chapter and retrieve the deleted text.
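Comparing two versions of a text file is something you can try without any version control system at all. The sketch below uses difflib from Python's standard library on two made-up versions of an introduction (the file name and the sentences are placeholders). It is only meant to show the kind of line-by-line comparison a version control system gives you, not how any particular system implements it.

```python
import difflib

# The chapter as it was three weeks ago (placeholder text).
old_version = [
    "\\section{Introduction}\n",
    "First sentence that I later deleted.\n",
    "Second sentence that I later deleted.\n",
    "A sentence that is still in the chapter.\n",
]

# The chapter as it is today.
new_version = [
    "\\section{Introduction}\n",
    "A sentence that is still in the chapter.\n",
]

diff = list(
    difflib.unified_diff(
        old_version,
        new_version,
        fromfile="intro.tex (three weeks ago)",
        tofile="intro.tex (today)",
    )
)
print("".join(diff))

# The first two lines of a unified diff are the file headers. After those,
# any line starting with "-" exists only in the old version (it was deleted).
deleted = [line[1:] for line in diff[2:] if line.startswith("-")]
print("".join(deleted))
```

The second print gives back exactly the two deleted sentences, ready to be pasted into the current version of the chapter.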

Subversion (or svn) and Git are the most widely used version control systems, and are both open source and freely available. The fundamental difference is that Subversion is a centralised system, while Git is distributed. In a centralised system there is a single repository holding the history, which everyone checks out from, whereas in a distributed system every working copy carries its own full copy of the history. There is a lot of debate about which is better (search “why svn is better than git” or vice versa if you don’t believe me), but it’s kind of like debating the forehands of Rafael Nadal and Roger Federer. Both are great and certainly get the job done.

The final thing you’ll need in order to get started is a web-based hosting service, which is where an external copy of your repository is kept. Widely used (and free for small-scale usage) services include Bitbucket, GitHub, SourceForge and Google Code – again, it doesn’t really matter which one you choose. While it is possible to use svn or Git without linking up to a web-based hosting service, most people use one because it makes it really easy to share your code. For instance, if you want to give someone a copy of your code, they can simply copy the repository URL from your page on the hosting service website and then type “git clone URL” (or “svn checkout URL”) at their command line. Then, hey presto, they have a copy of your code. It’s also an extra layer of backup, because if your computer crashes you can simply enter the same command to restore your code repository. Most hosting service websites also allow you to view your code (including visual displays of the difference between old and new versions of code), track bugs, manage software releases, create mailing lists, and produce wiki-based documentation.

Many open source software projects make their hosting service webpages public, so that people can download and install the latest version of the code, view documentation for the project, post issues/bugs that they find and even contribute code themselves. For example, check out the GitHub page for Matplotlib, an open source plotting package written in Python. If that page seems a little overwhelming, don’t stress – your personal host page won’t be nearly as large and complicated. My personal Bitbucket page is probably a better example of what a single user repository looks like.

So if you aren’t already using version control, why not start today? To help you get started, here’s a list of some of my favourite tutorials/resources for Git (as that’s what I use):

 

**It’s also now possible to track binary files (as opposed to text files), but the process is a bit messy (e.g. see here for a description of how to do it using Git). Unless the binary format you’re tracking is very common (e.g. Microsoft Word), it’s also unlikely that you’ll be able to do many of the cool things that you can do with text files, like view the difference between two versions.
