Wednesday, July 03, 2013

Write good commit messages

Over the years I've become more and more verbose in commit messages. For example, here's a recent commit message for something I'm working on at CloudFlare (I've obscured some details). This is actually a one line change to a Makefile but gives a good example of what I'm aiming for.
commit 6769d6679019623a6749783ea285043d9449d009
Author: John Graham-Cumming
Date:   Mon Jul 1 13:04:05 2013 -0700

    Sort the output of $(wildcard) as it is unsorted in GNU Make 3.82+

    The Makefile was relying on the output of $(wildcard) to be sorted. This is
    important because the XXXXXXXXXXXX rules have files that are numbered and
    must be handled in order. The XXXXXXX relies on this order to build the rules
    in the correct order (and set the order attributes in the JSON files). This
    worked with GNU Make 3.81

    In GNU Make 3.82 the code that globs has been changed to add the GLOB_NOSORT
    option and so the output of $(wildcard) is no longer ordered and the build
    would break. For example,

       make clean-out && make

    would fail because the XXXXXXXXXXXXXXXX (which is used for the XXXXX action)
    which appears in XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX would not have been
    parsed before the XXXXX action was used in some other XXXXX file. That would
    generate a fatal error.

    The solution is simple: wrap the $(wildcard) in $(sort). The actual change
    uses a $(foreach) loop because it's necessary to keep the directories in the
    order specified in the Makefile and the files within the directory are sorted
    by the $(sort $(wildcard ...)). The directory order is important because
    XXXXXXXXXXXX must be processed before the other rule directories because it
    contains XXXXXXXXXXXXXXXXXXXXXXXXXXX which sets the XXXXXXXXXX thresholds.
The first line gives a brief summary of the commit. But the rest explains in detail why this change was made (a change in GNU Make 3.82 in this case), why the change in GNU Make 3.82 caused a problem, verification of what actually changed that caused the problem, how to reproduce the problem and finally a note about the specific implementation. The final note is there so that someone looking at the commit later can understand what I was thinking and assumptions that went into the change.

I've come to like long commit messages for a number of reasons.

Firstly, I tend to forget why I changed things. And I certainly forget detailed reasoning about a change.

Secondly, I'm not working alone. Other people are looking at my code (and its changes) and need to be able to understand how the code evolves.

And these long commit messages overcome the problem of code comments that get out of date. Because the commit message is tied to a specific diff (and hence state of the code) it never gets out of date.

There's another interesting effect. These log messages take just a minute or two to write, but they force me to write clearly what I've been doing. Sometimes this causes me to stop and go "Oh wait, I've forgotten X". Some part of writing down a description of what I'm doing (for someone else to read) makes my brain apply different neurons to the task.

Here's another example from a different project:
commit 86db749caf52b20c682b3230d2488dad08b7b7fe
Author: John Graham-Cumming
Date:   Mon Jul 1 10:14:49 2013 -0700

    Handle SIGABRT and force a panic

    It can be useful to crash XXXXXXX via a signal to get a stack trace of every
    running goroutine. To make this reliable have added handling of SIGABRT.

    If you do,

       kill -ABRT 

    A panic is generated with message "panic: SIGABRT called" followed by
    a stack trace of every goroutine.

5 comments:

Jim Kyr said...

But tell me honestly, does anyone read these messages ? To me it seems better to read the code than the message itself.

It's a bit funny to write documentation in one commit. Better to prompt a short message in commit to the documentation related to the new change - if exists. Then you have separated consisted documentation which has more other benefits like accesibility, discoverability etc etc.

John Graham-Cumming said...

@Jim: I reread them when I need to understand something.

Also, I've had others at CloudFlare tell me that they are useful because they've been able to work with my code without asking me anything. These is especially important as I work across timezones and am frequently asleep while others work on my code.

Braden said...

I agree with Jim--I'd rather see this description in code or in a issue tracker, both of which generally have better tooling than VCSes. If you have a commit message that says "Sort the output of $(wildcard) (CF-42)", then people can go discuss that issue, and you can easily edit or clarify. It'll be much easier to search and to browse changes on a particular issue over time.

Lars Thorup said...

I have a rule of thumb to always use the word "because" (or its cousins, like "as") in commit messages. Helps me a lot (via the blame function of the VCS) when I need to figure out _why_ some change was made to the code.

36f30bfc-4ccc-11e0-9a64-000bcdcb2996 said...

Thanks for this! It captures my sentiments regarding commit messages almost exactly. I've been meaning to formally write it down for ages.

Interestingly, I find that it's often easier to get away with a shorter message when the diff is longer. Small changes often (but not always) need a lot of justification and background as the bug may have been rather subtle. As I've developed the technique I tend to write a fair bit of fairly dense prose even for larger changes.

As for reading the message, I find that people who don't read the commit messages don't tend to read much of anything else either. Most developers I know are "lazy" enough that they're definitely not going to chase a pointer to a bug tracker and even if they do they're unlikely to then get stuck in discussing something that's already been "fixed".

I don't think it's sufficient to read the code and not the message. Often the whole story isn't in the code. If you write a good message then sometimes I don't need to read your code at all or, if I do, then it's not only easy to scan really quickly but I can also pick up bugs in it without too much scrutiny.

I think DVCS really helps with a culture of good commit messages because you can spend time working on your patches before sending them out. More centralised systems tended to enforce an "all my work since last Tuesday" style of message or really really bitty commits where you don't get the whole picture for a particular feature because the committer didn't want to save up their merge pain.

Patches, not programs, really are the key output of the work of a software engineer. A patch/commit consisting of code and documentation really is an art form that turns out to be really useful.

I totally agree that the process of writing the message is useful in itself as I often find juicy bugs or ways to simplify or refactor things at that stage that would otherwise have caused lots of trouble further down the line.


A technique such as this is helpful in any team but it's one of the key things that allow remote teams to even work at all. For the same reason they also tend to reduce meetings or allow meetings to focus on design critique and approach. I think learnt a lot by watching how patches are designed on LKML and the early git mailing list.


Thanks again for the post.


Regards,
@ndy