
Write good commit messages

Over the years I've become more and more verbose in commit messages. For example, here's a recent commit message for something I'm working on at CloudFlare (I've obscured some details). This is actually a one-line change to a Makefile, but it gives a good example of what I'm aiming for.
commit 6769d6679019623a6749783ea285043d9449d009
Author: John Graham-Cumming
Date:   Mon Jul 1 13:04:05 2013 -0700

    Sort the output of $(wildcard) as it is unsorted in GNU Make 3.82+

    The Makefile was relying on the output of $(wildcard) to be sorted. This is
    important because the XXXXXXXXXXXX rules have files that are numbered and
    must be handled in order. The XXXXXXX relies on this order to build the rules
    in the correct order (and set the order attributes in the JSON files). This
    worked with GNU Make 3.81

    In GNU Make 3.82 the code that globs has been changed to add the GLOB_NOSORT
    option and so the output of $(wildcard) is no longer ordered and the build
    would break. For example,

       make clean-out && make

    would fail because the XXXXXXXXXXXXXXXX (which is used for the XXXXX action)
    which appears in XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX would not have been
    parsed before the XXXXX action was used in some other XXXXX file. That would
    generate a fatal error.

    The solution is simple: wrap the $(wildcard) in $(sort). The actual change
    uses a $(foreach) loop because it's necessary to keep the directories in the
    order specified in the Makefile and the files within the directory are sorted
    by the $(sort $(wildcard ...)). The directory order is important because
    XXXXXXXXXXXX must be processed before the other rule directories because it
    contains XXXXXXXXXXXXXXXXXXXXXXXXXXX which sets the XXXXXXXXXX thresholds.
The first line gives a brief summary of the commit. But the rest explains in detail why this change was made (a change in GNU Make 3.82 in this case), why the change in GNU Make 3.82 caused a problem, verification of what actually changed that caused the problem, how to reproduce the problem and finally a note about the specific implementation. The final note is there so that someone looking at the commit later can understand what I was thinking and assumptions that went into the change.

I've come to like long commit messages for a number of reasons.

Firstly, I tend to forget why I changed things. And I certainly forget detailed reasoning about a change.

Secondly, I'm not working alone. Other people are looking at my code (and its changes) and need to be able to understand how the code evolves.

And these long commit messages overcome the problem of code comments that get out of date. Because the commit message is tied to a specific diff (and hence state of the code) it never gets out of date.

There's another interesting effect. These log messages take just a minute or two to write, but they force me to write clearly what I've been doing. Sometimes this causes me to stop and go "Oh wait, I've forgotten X". Some part of writing down a description of what I'm doing (for someone else to read) makes my brain apply different neurons to the task.

Here's another example from a different project:
commit 86db749caf52b20c682b3230d2488dad08b7b7fe
Author: John Graham-Cumming
Date:   Mon Jul 1 10:14:49 2013 -0700

    Handle SIGABRT and force a panic

    It can be useful to crash XXXXXXX via a signal to get a stack trace of every
    running goroutine. To make this reliable have added handling of SIGABRT.

    If you do,

       kill -ABRT 

    A panic is generated with message "panic: SIGABRT called" followed by
    a stack trace of every goroutine.


Unknown said…
But tell me honestly, does anyone read these messages? To me it seems better to read the code than the message itself.

It's a bit funny to write documentation in a commit. Better to put a short message in the commit that points to the documentation related to the new change, if it exists. Then you have separate, consistent documentation, which has other benefits such as accessibility and discoverability.
@Jim: I reread them when I need to understand something.

Also, I've had others at CloudFlare tell me that they are useful because they've been able to work with my code without asking me anything. This is especially important as I work across timezones and am frequently asleep while others work on my code.
Braden said…
I agree with Jim--I'd rather see this description in code or in an issue tracker, both of which generally have better tooling than VCSes. If you have a commit message that says "Sort the output of $(wildcard) (CF-42)", then people can go discuss that issue, and you can easily edit or clarify. It'll be much easier to search and to browse changes on a particular issue over time.
Unknown said…
I have a rule of thumb to always use the word "because" (or its cousins, like "as") in commit messages. Helps me a lot (via the blame function of the VCS) when I need to figure out _why_ some change was made to the code.
Anonymous said…
Thanks for this! It captures my sentiments regarding commit messages almost exactly. I've been meaning to formally write it down for ages.

Interestingly, I find that it's often easier to get away with a shorter message when the diff is longer. Small changes often (but not always) need a lot of justification and background as the bug may have been rather subtle. As I've developed the technique I tend to write a fair bit of fairly dense prose even for larger changes.

As for reading the message, I find that people who don't read the commit messages don't tend to read much of anything else either. Most developers I know are "lazy" enough that they're definitely not going to chase a pointer to a bug tracker and even if they do they're unlikely to then get stuck in discussing something that's already been "fixed".

I don't think it's sufficient to read the code and not the message. Often the whole story isn't in the code. If you write a good message then sometimes I don't need to read your code at all or, if I do, then it's not only easy to scan really quickly but I can also pick up bugs in it without too much scrutiny.

I think DVCS really helps with a culture of good commit messages because you can spend time working on your patches before sending them out. More centralised systems tended to enforce an "all my work since last Tuesday" style of message or really really bitty commits where you don't get the whole picture for a particular feature because the committer didn't want to save up their merge pain.

Patches, not programs, really are the key output of the work of a software engineer. A patch/commit consisting of code and documentation really is an art form that turns out to be really useful.

I totally agree that the process of writing the message is useful in itself as I often find juicy bugs or ways to simplify or refactor things at that stage that would otherwise have caused lots of trouble further down the line.

A technique such as this is helpful in any team but it's one of the key things that allow remote teams to even work at all. For the same reason they also tend to reduce meetings or allow meetings to focus on design critique and approach. I think I learnt a lot by watching how patches are designed on LKML and the early git mailing list.

Thanks again for the post.

