Thursday, April 27, 2006

Free GNU Make documentation

If, like me, you use GNU Make a lot then you should be aware of two really important pieces of documentation for GNU Make that are totally free (speech and beer):

1. The GNU Make Manual. This is the standard manual that comes with GNU Make and is available on the web here: http://www.gnu.org/software/make/manual/make.html

2. Robert Mecklenburg's "Managing Projects with GNU Make". This is the book published by O'Reilly but released under the Free Documentation License. PDFs of each of the sections are available here: http://www.oreilly.com/catalog/make3/book/index.csp

It's great that these two resources are freely available, but don't let that stop you buying them. Supporting the FSF and Mecklenburg with a little cash is a good way of keeping free documents free.

Monday, April 24, 2006

Bayesian Poisoning paper pointers

Over in the POPFile forums there's a lively discussion going on about a potential way of getting spam through POPFile and then corrupting the POPFile user's database to increase the false positive rate.

This particular attack uses common English words and relies on an implementation detail of POPFile: the fact that POPFile counts the number of times a word appears in an email. That detail is a little different from most spam filters that might consider a restricted range of hammy or spammy words. However, the attack described by POPFile user Olivier Guillion probably would work. On the other hand, I don't think we're likely to see it in the field primarily because it only affects POPFile and not other spam filters.

During the discussion I mentioned a number of papers that I thought Olivier should read concerning attacks on Bayesian filters. I then realized that they are not necessarily easily available. So, for the sake of everyone being able to read up quickly on this areas, here's a quick bibliography:

Any I've missed?

Thursday, April 20, 2006

Stateless web pages with hashes

Recently I've been working on a web application that requires some state to be passed between pages. I really didn't want to keep server side state and then give the user a cookie or some other token that I'd have to track in the server side application, age out if discarded etc.

I hit upon the idea of keeping everything in hidden fields passed between page transitions by form POSTs. Of course, the problem with hidden fields is that someone could fake the information and submit a form with their notion of state. For example, if this were a commerce application, someone could alter the contents of their own shopping cart and perhaps even the prices they have to pay.

To get around this problem I include two extra pieces of information in the form: the Unix epoch time when the form was delivered to the user and a hash that covers all the contents of the form. For example, a typical form might look like:
<form action=http://www.mywebsite.com/form.cgi method=POST>
<input type=hidden name=hash value=ff9f5c4a0d10d7ab384ad0f95ff3727f>
<input type=hidden name=now value=1145516605>
<input type=hidden name=cart value=agwiji8973cnwiei938943>
<input type=submit value="Checkout" name=checkout>
</form>
Here the form contains a cart value that is just an encoded version of the contents of the user's cart (note I say encoded and not encrypted; there's no protection inherent in the string encoding the cart contents: it's just safe to be passed in a form).

The time the form was sent to the user is in the now field and is just the Unix epoch time when the page was generated on the server side.

The hash is an MD5 hash of the now, the cart, the IP address of the person who requested the page and a salt value known only to the web server. The salt prevents an attacker from generating their own hashes and hence faking the form values, but it means that the web server can verify the validity of the form.

The now value means that old forms can be timed out just be checking the epoch time against the value in the form. The hashing of the IP address means that only the person for whom the form was generated can submit it.

I'm sure this isn't new to anyone who's written web applications. And it appears that Steve Gibson over at GRC is doing something similar with his e-commerce system and there's apparently something called View State in ASP.

Anyone who is a web expert care to comment?

Wednesday, April 19, 2006

Would you buy a "GNU Make Cookbook" e-book?

I've been thinking about taking all the recipes for GNU Make things that I've written over the years as articles, or blog entries, or answers to people's questions on help-make and writing them up as an e-book for purchase and download from this web site.

Here's a sample recipe in PDF format so that you can see what I'm taking about.



So, the critical questions:

1. Would you buy such a book?
2. If so, how much would you pay for it?
3. What format would be best? PDF?

John.

A small bug fix to my keep state shoehorning

A while back I wrote about a way to shoehorn Sun Make's "keep state" functionality into GNU Make. With a fairly simple Makefile it's possible to get GNU Make to rebuild targets when the targets' commands have changed. I blogged this here and wrote it up for Ask Mr Make here.

One reader was having trouble with the system because every single Make he did caused a certain target to be built. It turned out this was because he'd done something like:

target:
$(call do, commands)

The space after the , and before the commands was messing up my signature system's comparison and causing it to think that the commands changed every time. This is easily fixed by stripping the commands.

Here's the updated signature file (with the changed parts highlighted in blue):

include gmsl

last_target :=

dump_var = \$$(eval $1 := $($1))

define new_rule
@echo "$(call map,dump_var,@ % < ? ^ + *)" > $S
@$(if $(wildcard $F),,touch $F)
@echo [email protected]: $F >> $S
endef

define do
$(eval S := $*.sig)$(eval F := $*.force)$(eval C := $(strip $1))
$(if $(call sne,[email protected],$(last_target)),$(call new_rule),$(eval last_target := [email protected]))
@echo "$(subst $$,\$$,$$(if $$(call sne,$(strip $1),$C),$$(shell touch $F)))" >> $S
$C
endef

Another common thing people have asked for is that the signature system rebuild targets when the Makefile has changed. Currently the signature system cannot spot an edit to the Makefile that changes the commands. It's pretty simple to Make this happen (although this will cause all targets to be built when the Makefile is updated) by adding the following line in new_rule above:

define new_rule
@echo "$(call map,dump_var,@ % < ? ^ + *)" > $S
@$(if $(wildcard $F),,touch $F)
@echo [email protected]: $F >> $S
@echo $F: Makefile >> $S
endef

It's an exercise or the reader to replace Makefile with the actual name of the Makefile that is including active when new_rule is called.

Tuesday, April 18, 2006

Rebuilding when the hash has changed, not the timestamp

GNU Make decides whether to rebuild a file based on whether any of its prerequisites are newer or if the file is missing. But sometimes this isn't desirable: when using GNU Make with a source code control system the time on a prerequisite might be updated by the source code system when the files are checked out, even though the file itself hasn't changed.

It's desirable, in fact, to change GNU Make to check a hash of the file contents and only rebuild if the file has actually changed (and ignore the timestamp).

You can hack this into GNU Make using md5sum (I'm assuming you're on a system with Unix-like commands). Here's a little example that builds foo.o from foo.c but only updates foo.o when foo.h has changed... and changed means that its checksum has changed:

.PHONY: all

to-md5 = $(patsubst %,%.md5,$1)
from-md5 = $(patsubst %.md5,%,$1)

all: foo.o

foo.o: foo.c
foo.o: $(call to-md5,foo.h)

%.md5: FORCE
@$(if $(filter-out $(shell cat [email protected] 2>/dev/null),
$(shell md5sum $*)),md5sum $* > [email protected])

FORCE:

This works because when foo.h was mentioned in the prerequisite list of foo.o it was changed to foo.h.md5 by the to-md5 function. So GNU Make sees the prerequisites of foo.o to be foo.c and foo.h.md5.

Then there's a pattern rule to build foo.h.md5 (the %.md5 rule) that will only update the .md5 file if the checksum has changed. Thus if and only if the checksum has changed does the .md5 file get changed and foo.o rebuilt.

The %.md5 rule is forced to run by having a dummy prereq called FORCE so that every MD5 hash is checked for every prerequisite that GNU Make needs to examine.

First the rule uses a filter-out/if combination to check to see if the MD5 hash has changed. If it has then the %.md5 rule will run md5sum $* > [email protected] (in the example md5sum foo.h > foo.h.md5). This will both update the hash in the file and change the .md5 file's timestamp and force foo.o to build.

If within the rule for foo.o $?, $^ or other automatics that work on the prerequisite list were used these need to be passed through from-md5 to remove the .md5 extension so that the real prerequisite is used in the commands to build foo.o.

In the example this isn't necessary.

If the foo.h.md5 file does not exist then the %.md5 rule will create it and force foo.o to get built.

You can also adapt this tip to work with different definitions of 'changed'. For example, the .md5 file could store the version number of a file from the source control system and rebuilds would only happen when the version had changed.