Thursday, January 25, 2007

What Makefile am I in?

A common request when using GNU Make is: "Is there a way to find the name and path of the current Makefile?". By 'current' people usually mean the Makefile that GNU Make is currently parsing. There's no built-in way to quickly get the answer, but there is a way using the GNU Make variable MAKEFILE_LIST.

MAKEFILE_LIST (documented in the manual here) is the list of Makefiles currently loaded or included. Each time a Makefile is loaded or included its name is appended to the variable. The paths and names in the variable are relative to the current working directory (where GNU Make was started or where it moved to with the -C or --directory option). The current working directory is stored in the CURDIR variable.

So you can quite easily define a GNU Make function (let's call it where-am-i) that will return the current Makefile (it uses $(word) to get the last Makefile name from the list):
where-am-i = $(CURDIR)/$(word $(words $(MAKEFILE_LIST)),$(MAKEFILE_LIST))

then whenever you want to find out the full path to the current Makefile write the following at the top of the Makefile (the 'at the top' part is important because any include statement in the Makefile will change the value of MAKEFILE_LIST so you want to grab the location of the current Makefile right at the top):
THIS_MAKEFILE := $(call where-am-i)


For example, here's Makefile:
where-am-i = $(CURDIR)/$(word $(words $(MAKEFILE_LIST)),$(MAKEFILE_LIST))

include foo/Makefile

foo/Makefile contains:
THIS_MAKEFILE := $(call where-am-i)
$(warning $(THIS_MAKEFILE))

include foo/bar/Makefile

foo/bar/Makefile contains:
THIS_MAKEFILE := $(call where-am-i)
$(warning $(THIS_MAKEFILE))

Running this on my machine (with the first Makefile in /tmp) gives the output:

foo/Makefile:2: /tmp/foo/Makefile
foo/bar/Makefile:2: /tmp/foo/bar/Makefile

The Tao of Debugging

I hate debuggers.

And not only do I hate them, I rarely use them. For me, a debugger is (almost) always the wrong tool. And people who habitually use debuggers are making a big mistake, because they don't truly understand their code. I suspect that the people who use debuggers all the time are the same people who don't unit test their code.

Any programmer not writing unit tests for their code in 2007 should be considered a pariah (*). The truth is that if you haven't written unit tests for your code then it's unlikely to actually work. Over the years I've become more and more radical about this: an untested line of code is a broken line of code.

Just the other day I came across a wonderful quote from Brian Kernighan:
The most effective debugging tool is still careful thought, coupled with judiciously placed print statements.
He wrote that in 1979, but it's still true today. There are two really important ideas there: careful thought and print statements. If I allow myself to update his quotation then I'd say:
The most effective debugging tools are: your brain, a unit test, and the print statement.
If you want to be a great debugger then those are the only three things you are going to need. To be a great debugger you need to follow these steps (that I jokingly call the Tao of Debugging):


1. Once you have a bug in front of you, add a failing unit test to your unit test suite that tickles the bug. This should be the absolute smallest, simplest, and fastest unit test you can write. You want the test to be fast because you are going to have to run it over and over again as you debug.

Once when I was working on a strange problem with Novell's IPX protocol the only way to tickle the bug was to play the Beverly Hills Cop theme tune across the network. After a few runs my coworkers were going insane so I pulled the speaker cable off the motherboard. Nevertheless, a few bars of Axel F were enough to crash the IPX stack and reproduce the bug.

You may have to do a lot of work to write the unit test, since writing the unit test means narrowing down the bug as much as possible. The best way to do this is to write unit tests each time you have a way to reproduce the bug. These tests may start out complex, but as you delve into the code you'll be able to write simpler and simpler tests until you are faced with just the ultimate cause of the bug. Along the way you've written a small test suite that makes sure this bug doesn't happen again.


2. Once you've got the unit tests in place (or while you are writing ever simpler tests) you can start the actual debugging. Since the tests point to the place in the code where the bug is located it's time to instrument the code with printf or similar statements. Each time you run the unit test you'll be able to examine and refine the output of those printf statements to reveal the location of and reason for the bug.

Most debugger freaks would at this point start a debugger and set a breakpoint and then start single stepping through the code, examining all sorts of variables, setting up watch statements, and generally using the seductive machinery of debuggers. The problem is that the debugger is an enormous time sink: you have to tell it what you are looking for, which variables to display, and then you single step through code.

The problem with single stepping through code is that it's rarely useful. To narrow down a bug you need to start at the highest level (if I do X, Y happens) and take into account the inputs and outputs of the broken code. As you narrow down the bug you are essentially looking at the inputs (the state of various variables, or bits of memory) and outputs (states of other variables, and memory) of an ever smaller piece of code (until you get to the actual location of the bug). The ideal way to monitor those inputs and outputs is the print statement.

Single stepping is totally the wrong approach: it starts at the lowest level and the debugger wrangler gets lost in all sorts of fine detail that's totally irrelevant to the debugging process. Even if you step over subroutine calls and loops it's still the wrong level of detail. You need to start wide and narrow down, writing unit tests along the way.

And if you are writing multi-threaded code then I'll wager that the debugger will make things exponentially worse (base is the number of running threads) than using print statements.


3. Use your noggin(**). Many bugs can be found (once you've got your unit test telling you where to look) by just staring at the code and rereading it. Or you can 'single step' in your head (this is way more flexible than a debugger single step because you can intelligently ignore irrelevant detail and jump over blocks of code in an arbitrary fashion).

In doing so, you'll often see small improvements that can be made to the existing code. As you narrow down a bug you'll frequently realize that you could write a better comment, or fix a small bug, along the way. If you are in a text editor (and not a debugger), looking at the code, then you can fix those things while narrowing down the bug. The overall effect is that you fix the bug and improve the code.

I find it very helpful to look at my own code. In trying to understand it I'll often find myself saying "Well, that can't happen" (when, of course, it can) and it's usually when I think I know what's happening (but don't) that I find the bug in question.
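
As a rough illustration of steps 1 and 2, here is a minimal Python sketch. The function `average`, its floor-division bug, and the `debug` flag are all invented for this example: the test is the smallest input that tickles the bug, and the print statements show the inputs and outputs of the suspect code each time the test runs.

```python
import unittest


def average(values, debug=False):
    """Hypothetical function under test: arithmetic mean of a list."""
    total = sum(values)
    count = len(values)
    if debug:
        # Print the inputs of the suspect piece of code...
        print(f"average: total={total!r} count={count!r}")
    # The (invented) bug was floor division on this line: total // count
    result = total / count
    if debug:
        # ...and its output, so each test run shows both.
        print(f"average: result={result!r}")
    return result


class AverageRegressionTest(unittest.TestCase):
    def test_two_values_with_fractional_mean(self):
        # Smallest input that tickled the bug: the buggy version returned 1.
        self.assertEqual(average([1, 2], debug=True), 1.5)
```

Saved to a file, the test can be run with `python -m unittest`. Once the bug is fixed the print statements can be removed, while the test stays in the suite to make sure the bug doesn't come back.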


There are, of course, some times when a debugger can be used.

When I used to write device drivers the debugger was sometimes helpful to look at a very complex situation related to the handling of interrupts between a network adapter card and the host machine (and for that SoftICE reigned supreme), but even there the printf-style (where in fact each 'printf' was the output of a single character directly to the screen memory in assembly language) was usually enough to capture what was happening.

Some odd memory corruption bugs can benefit from a debugger that can monitor a section of memory and break when the corruption happens.


I frequently disagree with Linus Torvalds, but his post on not having a kernel debugger is priceless.

I used to work for John Ousterhout and consider him to be way smarter than me. He uses a debugger all the time; perhaps I'm just not smart enough to use one.

Yes, I know I wrote a debugger for GNU Make. I'm not saying debuggers should be banned, but they should be relegated to the list of tools you break out on rare occasions. And the fact that GNU Make didn't have a debugger at all isn't a good thing because some people just can't live without one.

(*) Yes, I'm aware that POPFile's unit test suite is currently broken :-)
(**) British slang for your head or brain.

Friday, January 19, 2007

SpamOrHam shut down

Today I shut down SpamOrHam after a total of 357,380 messages were examined by volunteers around the world in 9 months. (If this is the first time you are hearing about SpamOrHam then read this).

At the same time, I'm happy to announce that the associated competition was won by Alan Wylie in the UK. He clicked through 456 messages in a row without once disagreeing with the corpus gold standard classification. His prize is on its way to him today.

As promised I'm happy to make the data associated with SpamOrHam public. If you would like to get the details of all 357,380 classifications and how they match up to the TREC 2005 data please email me and I'll point you to the download location.

Finally, thanks to everyone who participated in SpamOrHam. I look forward to being able to report on the results of the experiment at a later date, and anyone who wishes to do their own analysis can simply ask for the raw data by email.

Time for a little encryption of my email

Out of the blue I received a PGP (actually GnuPG) encrypted email using the public key listed on my home page. That key had lain unused for 2 years, and when I went to get the private key off the USB key it was stored on I had a problem. The USB key was unreadable and, with it, the message. All this led me back to finally doing something about using GnuPG for my mail.

Since I use Thunderbird it turned out to be simple. I created a new key using GnuPG, downloaded the Enigmail extension and changed one configuration item: I set up signing of all messages by default.

So from today the key with ID 0xBDE7FE10 is valid for my main email address. You can download it from the link above or any well-known public key server.

Tuesday, January 09, 2007

Book review: Loose Wire

Here's the ideal stocking stuffer. Oops, I'm two weeks late with this review. Well, perhaps it's the ideal present for the Chinese New Year on February 18.

Jeremy Wagstaff (who writes about technology for the Wall Street Journal) was kind enough to send me a copy of his new book Loose Wire. It's a collection of his columns starting in 2000 (yes, 7 years ago) and is subtitled 'a personal guide to making technology work for you'. I give this book high praise by saying that you should buy a copy for any friend who loves technology and stick it in their toilet. More on that later.

First a disclaimer: my name appears in the index of the book and he mentions POPFile a couple of times (even recommending it) and Jeremy and I have discussed various machine learning related things over the last few years.

When I first received the book I was wondering how he was going to pull off turning 7 years of columns into something worth reading today. After all, technology moves very quickly and thinking back to 2000 you'd begin to wonder if anything said back then is worth repeating. But it is, and Jeremy has neatly woven together his columns by grouping them into subjects. Some of the subjects are pretty general (e.g. Traveling or Contacts), others are very techie (e.g. RSS). He's also bound the columns together with his own humour: he frequently points out what he got wrong in previous columns and makes jokes about his friends' ability with technology.

The overall effect is that the book is readable as a collection of totally separate chapters. And it's readable in no particular order. It's quite possible to sit down with the book, pick a subject that you think is interesting and read Jeremy's summary of the technology and his own trials and tribulations making it work. His advice is good and he covers a broad enough range of subjects to please most people.

Jeremy includes his own (amusing) glossary of terms (that he's invented) such as 'wantage (n): the shortfall between your present computer's capacity and that required to run the program you just bought' and 'devizes (n): gadgets you bought, used once and then, realizing they took up more time than they saved, threw in a drawer'. To that I'd like to add: 'wagstaff (v): to poke any new technology with a long stick, make sure it does what it says on the box, and summarize the experience in less than 2,000 words'.

This book isn't for the hardcore techie. At times, I found myself saying 'I know that!' while reading the book (although, I'll admit to the odd, 'I've never heard of that, must check it out' moment). This book is for the technophile. If you (or a friend) are the sort of person who's really interested in technology, but not a programmer, then Jeremy's explanations are just your cup of tea.

I said this book should be in the toilet. In fact, I think it's such a good book for reading in small doses in a small, quiet room, that a global band of Gideons-like technology evangelists should be leaving copies in the smallest room in the house of any technophile.

Introducing rpnbuddy

rpnbuddy is an AIM bot that implements a Reverse Polish notation calculator with floating point, fixed decimal places, decimal, binary, hexadecimal and octal modes. It's currently somewhat beta, but feel free to chat to it using the AIM user name 'rpnbuddy'.

Theoretically, rpnbuddy is online all the time, but if you have trouble accessing it please drop me a line. As usual, rpnbuddy is a totally free service from me; I'll do my best to fix bugs and add features as requested. At this time the source code is not open.

Typing help makes rpnbuddy print out the following help information:

Arithmetic operators: + - * / % (mod)
Bitwise operators: and, xor, not, or
Functions: sqrt, e^x, 10^x, x^y, cos, sin, tan, ln, log, int, abs, x^2, 1/x
Set word size: push word size (8, 16, 32, 64) then wsize
Set base: dec, bin, oct, hex
Set floating point mode: float
Set fixed decimal places: push decimal places (up to 16) then fix
Show stack: stack
Show state: state
Useful constants: pi, e

And here's the log of an rpnbuddy session calculating the area of a circle of radius 10:

(13:27:31) jgc: pi
(13:27:31) rpnbuddy: 3.14159265358979
(13:27:39) jgc: 10
(13:27:41) jgc: x^2
(13:27:42) rpnbuddy: 100
(13:27:43) jgc: *
(13:27:43) rpnbuddy: 314.159265358979
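
For readers new to RPN, the stack discipline behind a session like this can be sketched in a few lines of Python. This is not rpnbuddy's code (the source isn't open); the names `rpn_step` and `rpn_eval` and the small operator tables are assumptions made for illustration, covering only a handful of the operators listed in the help text.

```python
import math

# Operators that pop two values and push one result.
BINARY = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}

# Functions that replace the top of the stack.
UNARY = {
    "x^2": lambda a: a * a,
    "1/x": lambda a: 1 / a,
    "sqrt": math.sqrt,
}

CONSTANTS = {"pi": math.pi, "e": math.e}


def rpn_step(stack, token):
    """Apply one token to the stack in place and return the new top."""
    if token in BINARY:
        b = stack.pop()  # the most recently pushed value is the right operand
        a = stack.pop()
        stack.append(BINARY[token](a, b))
    elif token in UNARY:
        stack.append(UNARY[token](stack.pop()))
    elif token in CONSTANTS:
        stack.append(CONSTANTS[token])
    else:
        stack.append(float(token))  # anything else is treated as a number
    return stack[-1]


def rpn_eval(tokens):
    """Feed a sequence of tokens through the calculator, return the stack."""
    stack = []
    for token in tokens:
        rpn_step(stack, token)
    return stack


# Same sequence of inputs as the session above: pi, 10, x^2, *
area = rpn_eval(["pi", "10", "x^2", "*"])[-1]
print(area)
```

Each message in the log corresponds to one call to `rpn_step`: numbers and constants are pushed, `x^2` replaces the top of the stack, and `*` combines the top two entries.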

Or switching to hexadecimal mode you can do a little bitmasking:

(13:28:28) jgc: 0
(13:28:29) jgc: hex
(13:28:30) rpnbuddy: Unsigned; Base: hex; Word size: 32
(13:28:30) rpnbuddy: 00000000
(13:28:38) jgc: 1234abcd
(13:28:42) jgc: fefefefe
(13:28:43) jgc: and
(13:28:43) rpnbuddy: 1234aacc
(13:28:52) jgc: ff
(13:28:53) jgc: xor
(13:28:53) rpnbuddy: 1234aa33
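
The same masking can be checked in Python. This is only a sketch of fixed word size arithmetic, not rpnbuddy's implementation; the helper names `to_hex` and `bit_not` are made up for the example.

```python
def to_hex(value, word_size=32):
    """Format value as an unsigned, zero-padded hex word of word_size bits."""
    mask = (1 << word_size) - 1
    return format(value & mask, f"0{word_size // 4}x")


def bit_not(value, word_size=32):
    """Bitwise NOT truncated to word_size bits (Python ints are unbounded)."""
    mask = (1 << word_size) - 1
    return ~value & mask


masked = 0x1234abcd & 0xfefefefe   # clear the low bit of every byte
flipped = masked ^ 0xff            # invert the lowest byte

print(to_hex(masked))   # 1234aacc
print(to_hex(flipped))  # 1234aa33
```

The mask matters mostly for `not`: without it, Python's `~` yields a negative number rather than an unsigned word.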