## Thursday, January 25, 2007

### The Tao of Debugging

I hate debuggers.

And not only do I hate them, I rarely use them. For me, a debugger is (almost) always the wrong tool. People who habitually use debuggers are making a big mistake, because they don't truly understand their code. I suspect that the people who use debuggers all the time are the same people who don't unit test their code.

Any programmer not writing unit tests for their code in 2007 should be considered a pariah (*). The truth is that if you haven't written unit tests for your code then it's unlikely to actually work. Over the years I've become more and more radical about this: an untested line of code is a broken line of code.

Just the other day I came across a wonderful quote from Brian Kernighan:

> The most effective debugging tool is still careful thought, coupled with judiciously placed print statements.

He wrote that in 1979, and it's still true today. There are two really important ideas there: careful thought and print statements. If I may update his quotation, I'd say:

> The most effective debugging tools are: your brain, a unit test, and the print statement.

If you want to be a great debugger, those are the only three things you are going to need. To get there, follow these steps (which I jokingly call the Tao of Debugging):

WRITE A UNIT TEST THAT TICKLES THE BUG

1. Once you have a bug in front of you, add a failing unit test to your unit test suite that tickles the bug. This should be the absolute smallest, simplest, and fastest unit test you can write. You want the test to be fast because you are going to run it over and over again as you debug.

Once, when I was working on a strange problem with Novell's IPX protocol, the only way to tickle the bug was to play the Beverly Hills Cop theme tune across the network. After a few runs my coworkers were going insane, so I pulled the speaker cable off the motherboard. Nevertheless, a few bars of Axel F were enough to crash the IPX stack and reproduce the bug.

You may have to do a lot of work to write the unit test, since writing it means narrowing down the bug as much as possible. The best way to do this is to write a unit test each time you find a way to reproduce the bug. These tests may start out complex, but as you delve into the code you'll be able to write simpler and simpler tests until you are faced with just the ultimate cause of the bug. Along the way you'll have written a small test suite that makes sure this bug doesn't happen again.

2. Once you've got the unit tests in place (or while you are writing ever simpler tests) you can start the actual debugging. Since the tests point to the place in the code where the bug is located, it's time to instrument the code with printf or similar statements. Each time you run the unit test you'll be able to examine and refine the output of those printf statements to reveal the location and cause of the bug.
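Step 2 might look like this when applied to a hypothetical `last_n` helper with a slicing bug. It uses temporary print statements that show the inputs, an intermediate value, and the output. Writing to stderr and prefixing each line with `DEBUG:` are just conventions assumed here. They keep the output separate from normal program output and make the lines easy to find and delete later.

```python
import sys


def last_n(items, n):
    """Hypothetical helper under test: return the last n items of a list."""
    # Temporary instrumentation: show the inputs...
    print(f"DEBUG: last_n items={items!r} n={n!r}", file=sys.stderr)
    start = -n
    # ...the intermediate value the slice depends on...
    print(f"DEBUG: last_n slice start={start!r}", file=sys.stderr)
    result = items[start:]
    # ...and the output, so the test run shows exactly where it goes wrong.
    print(f"DEBUG: last_n result={result!r}", file=sys.stderr)
    return result
```

Calling it with `n=0` prints `slice start=0`, and that's the moment the bug becomes obvious: `-0` is just `0`, so the slice starts at the front of the list instead of the end.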

Most debugger freaks would at this point start a debugger, set a breakpoint and then single step through the code, examining all sorts of variables, setting up watch statements, and generally using the seductive machinery of debuggers. The problem is that the debugger is an enormous time sink: you have to tell it what you are looking for and which variables to display, and then you single step through the code.

The problem with single stepping through code is that it's rarely useful. To narrow down a bug you need to start at the highest level (if I do X, Y happens) and take into account the inputs and outputs of the broken code. As you narrow down the bug you are essentially looking at the inputs (the state of various variables, or bits of memory) and outputs (states of other variables, and memory) of an ever smaller piece of code (until you get to the actual location of the bug). The ideal way to monitor those inputs and outputs is the print statement.
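A small tracing decorator lets you watch just the inputs and outputs of one piece of code without retyping the same print statements. This is a sketch, not a standard library facility: `trace` and the `parse_price` function it wraps are hypothetical. As the bug narrows, you move the decorator onto a smaller function and rerun the unit test.

```python
import functools
import sys


def trace(func):
    """Print a function's arguments and its return value (or exception) to stderr."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print(f"TRACE: {func.__name__} args={args!r} kwargs={kwargs!r}",
              file=sys.stderr)
        try:
            result = func(*args, **kwargs)
        except Exception as exc:
            # Report the failure at this boundary, then let it propagate.
            print(f"TRACE: {func.__name__} raised {exc!r}", file=sys.stderr)
            raise
        print(f"TRACE: {func.__name__} returned {result!r}", file=sys.stderr)
        return result
    return wrapper


@trace
def parse_price(text):
    """Hypothetical function under suspicion: '$1,234.50' -> 123450 cents."""
    cleaned = text.replace("$", "").replace(",", "")
    return round(float(cleaned) * 100)
```

Because the decorator only looks at what goes in and what comes out, it works at whatever level you are currently investigating. The code in between stays out of sight until you decide it matters.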

Single stepping is totally the wrong approach: it starts at the lowest level and the debugger wrangler gets lost in all sorts of fine detail that's totally irrelevant to the debugging process. Even if you step over subroutine calls and loops it's still the wrong level of detail. You need to start wide and narrow down, writing unit tests along the way.

And if you are writing multi-threaded code, then I'll wager that a debugger will make things exponentially worse than print statements will (with the number of running threads as the base).
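When you debug threaded code with print statements, it helps if every line says which thread wrote it and when. Here is a minimal sketch using Python's standard `logging` module. That module is thread-safe, so each line is written whole rather than being interleaved mid-line with another thread's output. The `worker` and `run_workers` functions are hypothetical.

```python
import logging
import threading

# Every line carries the elapsed time and the name of the thread that wrote it.
FORMAT = "%(relativeCreated)8.1f ms [%(threadName)s] %(message)s"
log = logging.getLogger("tao.debug")


def worker(shared, lock, count):
    """Hypothetical worker that appends to a shared list under a lock."""
    for i in range(count):
        with lock:
            shared.append(i)
            log.debug("appended %d, len(shared)=%d", i, len(shared))


def run_workers(n_threads, count):
    """Start n_threads named workers, wait for them all, and return the list."""
    shared, lock = [], threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(shared, lock, count),
                         name=f"worker-{n}")
        for n in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG, format=FORMAT)
    run_workers(2, 3)
```

The thread names show who did what. The timestamps give you the actual interleaving from a real run, which is exactly what stopping threads at a breakpoint tends to disturb.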

THINK

3. Use your noggin (**). Many bugs can be found (once you've got your unit test telling you where to look) by just staring at the code and rereading it. Or you can 'single step' in your head (this is way more flexible than a debugger single step because you can intelligently ignore irrelevant detail and jump over blocks of code in an arbitrary fashion).

In doing so, you'll often see small improvements that can be made to the existing code. As you narrow down a bug you'll frequently realize that you could write a better comment, or fix a small bug, along the way. If you are in a text editor (and not a debugger) looking at the code, then you can fix those things while narrowing down the bug. The overall effect is that you fix the bug and improve the code.

I find it very helpful to reread my own code. In trying to understand it, I'll often find myself saying "Well, that can't happen" (when, of course, it can), and it's usually when I think I know what's happening (but don't) that I find the bug in question.

CAVEATS

There are, of course, times when a debugger is worth using.

When I used to write device drivers, the debugger was sometimes helpful for looking at a very complex situation involving the handling of interrupts between a network adapter card and the host machine (and for that SoftICE reigned supreme). But even there, printf-style debugging (where each 'printf' was in fact a single character written directly to screen memory in assembly language) was usually enough to capture what was happening.

Some odd memory corruption bugs can benefit from a debugger that can monitor a section of memory and break when the corruption happens.

NOTES

I frequently disagree with Linus Torvalds, but his post on not having a kernel debugger is priceless.

I used to work for John Ousterhout and consider him to be way smarter than me. He uses a debugger all the time; perhaps I'm just not smart enough to use one.

Yes, I know I wrote a debugger for GNU Make. I'm not saying debuggers should be banned, but they should be relegated to the list of tools you break out on rare occasions. And GNU Make having no debugger at all wasn't a good thing, because some people just can't live without one.

(*) Yes, I'm aware that POPFile's unit test suite is currently broken :-)
