Skip to main content

Posts

Showing posts from 2006

Bug fix: 12 Tasty Make Recipes Part II

Back in May, 2006 I gave a talk called 12 Tasty Make Recipes Part II in which I talked about a user-defined GNU Make function to recursively search from a directory for a file or set of files.

The function was written like this:

search = $(foreach d,$(wildcard $1/*),$(call search,$d)$(filter $(subst *,%,$2),$d))

Unfortunately, there was a small mistake in the code that appeared in the presentation. However, two bugs collided to cause the mistake to have no effect. In the above function I wrote $(call search,$d) when I should have written $(call search,$d,$2). But since GNU Make had a bug that caused it to not reset $2 (or any other arguments in the form $n) on a nested $(call) and since I was reusing $2 as the second argument, my function worked.

That is, until GNU Make 3.81 was released and fixed bug 1744.

The correct version of search which will work with all versions of GNU Make is:

search = $(foreach d,$(wildcard $1/*),$(call search,$d,$2)$(filter $(subst *,%,$2),$d))

Thanks for Lou…

Two weeks of image spam innovation

Since I last blogged about image spam I've received numerous image spams myself, and reports from Nick FitzGerald, Sorin Mustaca and Nicholas Johnston about interesting image spams that I've just got to see.

On November 14, I received a report of image spams where the background noise had been updated from dots or small lines to polygons. Here's a sample:



A day later it was clear that the same spammer was randomizing the image size, the background noise and the color palette used:



Then on November 17 I was shown this interesting pump'n'dump technique:


83622874400056543 047183602660 41478311028418 100278 84807407350 05087016712772
78810870435635016 71651855827222 4725576405300038 84252840 12157351038885 630188325737443
23414 23133 41104 4312 7131 6341874402 48244 02522 1428 3224
55263 16021 42114 6654 …

Yet more spammer image optimization; this time it's pretty

These something new in the image-spam wave: pretty colours! This spammer is working hard to randomize his images and avoid OCR. Here's a sample:



And to give you an idea of the randomization here's another:



Thanks to Nick FitzGerald and Sorin Mustaca for samples. Notice how the letters are misaligned both vertically and horizontally to try to avoid OCR, and the background polygons are randomized. Also the aspect ratio and size of the messages have been changed for each image.

Ransom note spam

Back in January I added a trick called The Small Picture to The Spammers' Compendium, and in August I updated The tURLing Test trick with an example of its use in image-based spam.

The Small Picture consists of sending individual letter images attached to a message. These letter images are then used to display a message and break up words that the spammer might think a spam filter would find suspicious. Here's an example of The Small Picture where certain letters (look carefully!) are formed using images rather than text:



The tURLing Test consists of disguising a URL by breaking it up and then explaining to the user how to type in the URL, thus proving that a human is reading the spam not a spam filter. This is done with URLs so that URL blacklists are bypassed. Here's an example of that from an image-based spam:



Now comes a combination of the two, that deserves the name 'Ransom Note Spam': it combines both The Small Picture (the letters are individual images atta…

Why OCRing spam images is useless

Nick FitzGerald forwards me another animated GIF spam that takes the animation plus transparency trick I outlined in the blog post A spam image that slowly builds to reveal its message to a new level. And it shows why spammers will work around OCR as fast as they can.

Here's what you see in the spam image:



Looks simple enough until you take a look at the GIF file that actually generated what you see. It's animated and it has three frames:





The first image is the GIF's background and is displayed for 10ms then the second image is layered on top with a transparent background so that the two images merge together and the image the spammer wants you to see appears. That image remains on screen for 100,000 ms (or 1 minute 40 seconds). After that the image is completely blanked out by the third frame.

My favourite touch is that it's not the entire image that's transparent, not even the white background, but just those pixels necessary to make the black pixels underneath s…

A spam image that slowly builds to reveal its message

Nick FitzGerald sent me a stunning example of lateral thinking on the part of a spammer. The spammer has taken a standard stock pump-and-dump spam image and split it horizontally into strips.



Each of the 17 horizontal strips cuts fairly randomly through the text making OCR on each strip not very useful. The spammer has then mounted each strip in its correct position on a transparent background and put each strip into an animated GIF. Here, for example, are a couple of strips:




The end result is that only once the entire image animation has completed is the complete spam visible making this a challenge for spam filters. And the spammer has thrown in a couple of frames at the end of the image, that get displayed after such a long delay (8 minutes) that they essentially never get shown. But those final frames are there just to throw off a spam filter trying to find the actual image.

Here's what gets displayed:



and here's the final image in the animation:



Very clever! (I'm ca…

Ye Olde OCR Buster

Regular spam-correspondent Nick FitzGerald writes with an example of a spam that he believes is trying to get around both hash busting and OCR in an image.



The image has random dots in the bottom left hand corner to mess up hashing of the GIF itself, and the fonts used are badly rendered unusual fonts.

A Test::Class gotcha

I'm working on a project that involves building a prototype application in Perl. I've made extensive use of Perl's OO features and have a collection of classes that implement the mathematical calculations necessary to drive the web site running the application. Naturally, as I've been building the classes I've been building a unit test suite.

Since Test::Class is the closest thing Perl has to junit or cppunit I'm using it to test all the class methods in my Perl classes. Everything was looking good until I told the guy writing the server to integrate with my code. His code died with an error like this:
Can't locate object method "new" via package "Class::A" (perhaps you
forgot to load "Class::A" at Class/B.pm line 147.
Taking a quick look inside Class::B revealed that it did try to create a new Class::A object and that, sure enough, there was no use Class::A; anywhere in Class::B. Easy enough bug to fix, but what left …

Image spam filtering BOF at Virus Bulletin 2006 Montreal

I'm leading a BOF meeting at Virus Bulletin 2006 in Montreal next month. The idea is to get together in one room for a practical, tactical meeting to share experiences on how people are currently filtering image spam and what might be done in future (and what we expect spammers to do). I've already got commitments from major anti-spam vendors to be there and talk (as much as they are permitted) about their approach and I'll try to cover what the Bayesian guys are doing.

If you are interested please email me, or post a comment here. If you represent a vendor and want to be involved I'm especially interested to hear from you as I want to get all experiences out on the table (as much as is practical).


Date and Time Confirmed: Thursday, October 12. 17:40 to 18:40 in the Press Room.

Downloadable PDF flyer.

A C implementation of my simple GPS code

Reader Chris Kuethe wrote in with a version of my simple code for entering latitude and longitude to GPS devices written in C (my demonstration code was in Perl).

Seems Chris is a bit of a GPS fanatic and maintains a page on GPS hackery.

He ported my Perl code to C and is releasing the code freely. He gave me the choice of releasing under two clause BSD license or making it public domain. I think the most generous is public domain (especially since the Perl code was public domain).

Here's the code to compute a SOC:

#include <sys/types.h>
#include <stdio.>

int
main(int argc, char **argv){
int i, j;
unsigned long long lat, lon, c, p, soc_num;
char soc[11], *alpha = "ABCDEFGHJKLMNPQRTUVWXY0123456789";
int primes[] = { 2, 3, 5, 7, 11, 13, 17, 23, 29, 31, 37 };
float f;

if (argc != 3){
printf("Usage: %s <lat> <lon>\n", argv[0]);
exit(1);
}

sscanf(argv[1], "%f", &f);
lat = (int)((f + 90.0) * 10000.0);

sscanf(argv[2], "%f"…

Optimal SMS keyboard layouts

One of the things I find very frustrating about typing SMS messages on my phone is that I often find that the next letter I want to type is actually on the same key that I just pressed. And that slows me down because either I wait for the timeout, or I click the right arrow key to move on.

For example, here's a standard keyboard on cell phones:

abc def
1 2 3

ghi jkl mno
4 5 6

pqrs tuv wxyz
7 8 9

Very common English letter pairs such as 'ed' and 'on' appear on the same key meaning that if you need to type one of these you are going to incur the cost of dealing with the 'next letter is on same key' problem. In addition, the most common English letters are more than one click away; the most common English letter 'e' is two clicks, 'o' is three, 'n' is two, etc.

What you really want is a keyboard layout that means that most common letters are as few clicks away as possible, and that the common letter pairs are on differe…

Subliminal advertising in spam?

Nick FitzGerald sent me a great example of subliminal advertising in a spam message. At least that's what he thinks the spammer might have been up to. The spam contains an animated GIF with four frames. One of the frames (which contains the actual spam message) remains visible for 17 seconds. The other three frames are displayed for 10ms or 40ms, and each of those contains a little random noise and the word BUY in random positions.

Was the spammer really hoping to make us fall for his pump and dump scam with a quick flash of BUY on screen?

Here's the actual GIF with the animation in place (watch out you might be forced to BUY :-)



And there are the four separate frames:

10ms
17s
40ms
40ms

A simple code for entering latitude and longitude to GPS devices

This post proposes a coding system for entering any location on earth with 10m of accuracy using a 10 character code that includes features to prevent errors in entering the code.

The idea is that any one could publish their location by writing something like VUF DDC F8UG. This short code could be entered into a GPS device giving you any spot on the globe.

I'm calling it the SOC: Simple Orientation Code.

Some example uses:
I could print my company's SOC on my business cards and visitors could punch it into their car navigation system and come visitA restaurant could publish its SOC along with its phone number (after all it's the same length as a phone number so it's something people can easily grok) making the restaurant easy to findGeocachers could publish SOC trails for people hunting down cachesSCUBA divers could refer to dive sites by their SOC (10m of accuracy is enough surface accuracy for most people)Here's how the code works.

First you need the latitude and lon…

How I love my HP-16C

A while ago I bought an HP-16C calculator on eBay. It wasn't cheap and there was no manual; the calculator itself works fine and is in almost mint condition. Since then I've fallen in love with the device.



You probably think I'm nuts to be using a calculator that was discontinued in 1989 and only 203 bytes of memory. And I had to pay extra to get a PDF version of the scanned original manual.

Perhaps I am crazy, but here's why I love this little machine:

1. RPN. You either love this or hate it. This is my first RPN calculator and for me RPN is the right way to use a calculator. I read a short introduction to RPN tricks (of which there are very few, but filling the stack for repeated operations is one and using LST x to prevent the stack from moving is another).

2. The industrial design of HP calculators is pure art. They are the right size for your hand, the keyboard is clearly marked, keys are spaced far apart (which avoids fat fingers like mine) and the keys give…

Free GNU Make documentation

If, like me, you use GNU Make a lot then you should be aware of two really important pieces of documentation for GNU Make that are totally free (speech and beer):

1. The GNU Make Manual. This is the standard manual that comes with GNU Make and is available on the web here: http://www.gnu.org/software/make/manual/make.html

2. Robert Mecklenburg's "Managing Projects with GNU Make". This is the book published by O'Reilly but released under the Free Documentation License. PDFs of each of the sections are available here: http://www.oreilly.com/catalog/make3/book/index.csp

It's great that these two resources are freely available, but don't let that stop you buying them. Supporting the FSF and Mecklenburg with a little cash is a good way of keeping free documents free.

Bayesian Poisoning paper pointers

Over in the POPFile forums there's a lively discussion going on about a potential way of getting spam through POPFile and then corrupting the POPFile user's database to increase the false positive rate.

This particular attack uses common English words and relies on an implementation detail of POPFile: the fact that POPFile counts the number of times a word appears in an email. That detail is a little different from most spam filters that might consider a restricted range of hammy or spammy words. However, the attack described by POPFile user Olivier Guillion probably would work. On the other hand, I don't think we're likely to see it in the field primarily because it only affects POPFile and not other spam filters.

During the discussion I mentioned a number of papers that I thought Olivier should read concerning attacks on Bayesian filters. I then realized that they are not necessarily easily available. So, for the sake of everyone being able to read up quickly on…

Stateless web pages with hashes

Recently I've been working on a web application that requires some state to be passed between pages. I really didn't want to keep server side state and then give the user a cookie or some other token that I'd have to track in the server side application, age out if discarded etc.

I hit upon the idea of keeping everything in hidden fields passed between page transitions by form POSTs. Of course, the problem with hidden fields is that someone could fake the information and submit a form with their notion of state. For example, if this were a commerce application, someone could alter the contents of their own shopping cart and perhaps even the prices they have to pay.

To get around this problem I include two extra pieces of information in the form: the Unix epoch time when the form was delivered to the user and a hash that covers all the contents of the form. For example, a typical form might look like:
<form action=http://www.mywebsite.com/form.cgi method=POST>
<inp…

Would you buy a "GNU Make Cookbook" e-book?

I've been thinking about taking all the recipes for GNU Make things that I've written over the years as articles, or blog entries, or answers to people's questions on help-make and writing them up as an e-book for purchase and download from this web site.

Here's a sample recipe in PDF format so that you can see what I'm taking about.



So, the critical questions:

1. Would you buy such a book?
2. If so, how much would you pay for it?
3. What format would be best? PDF?

John.

A small bug fix to my keep state shoehorning

A while back I wrote about a way to shoehorn Sun Make's "keep state" functionality into GNU Make. With a fairly simple Makefile it's possible to get GNU Make to rebuild targets when the targets' commands have changed. I blogged this here and wrote it up for Ask Mr Makehere.

One reader was having trouble with the system because every single Make he did caused a certain target to be built. It turned out this was because he'd done something like:

target:
$(call do, commands)

The space after the , and before the commands was messing up my signature system's comparison and causing it to think that the commands changed every time. This is easily fixed by stripping the commands.

Here's the updated signature file (with the changed parts highlighted in blue):

include gmsl

last_target :=

dump_var = \$$(eval $1 := $($1))

define new_rule
@echo "$(call map,dump_var,@ % < ? ^ + *)" > $S
@$(if $(wildcard $F),,touch $F)
@echo [email protected]: $F >> $S
endef

def…

Rebuilding when the hash has changed, not the timestamp

GNU Make decides whether to rebuild a file based on whether any of its prerequisites are newer or if the file is missing. But sometimes this isn't desirable: when using GNU Make with a source code control system the time on a prerequisite might be updated by the source code system when the files are checked out, even though the file itself hasn't changed.

It's desirable, in fact, to change GNU Make to check a hash of the file contents and only rebuild if the file has actually changed (and ignore the timestamp).

You can hack this into GNU Make using md5sum (I'm assuming you're on a system with Unix-like commands). Here's a little example that builds foo.o from foo.c but only updates foo.o when foo.h has changed... and changed means that its checksum has changed:

.PHONY: all

to-md5 = $(patsubst %,%.md5,$1)
from-md5 = $(patsubst %.md5,%,$1)

all: foo.o

foo.o: foo.c
foo.o: $(call to-md5,foo.h)

%.md5: FORCE
@$(if $(filter-out $(shell cat [email protected] 2>/…

Shoehorning Keep State into GNU Make

Sun Make has a lovely feature called Keep State: if the commands used to build a target change from build to build the target is rebuilt, even if looking at file time stamps shows that the target is "up to date". Why is this a lovely feature? Because it means that make followed by make DEBUG=1 will do that right thing. In Make's that only check time stamps the make DEBUG=1 would probably report that there was no work to do.

Of course, you can get round these problems if you really try (e.g. for the DEBUG case you could encode the fact that the objects are debug objects in either the name or path and then Make would do the right thing).

A recent post on the GNU Make mailing list got me thinking about this problem again and I've come up with a very simple solution that shoehorns Keep State into GNU Make. There's no code change to GNU Make at all; it's all done with existing GNU Make functions.

Here's an example Makefile that I've modified to rebuild fo…

Deconstructing Sundance with POPFile

The brave folks at Unspam have taken POPFile and years of data surrounding the Sundance Film Festival in attempt to predict the outcome of the 2006 edition. Specifically, they want to get invited to parties, and they've chosen to geek out on data, bend POPFile to their use (boy, that must have been hard work) and try to predict the hits from the festival.

They claim an 81% accuracy over the past festivals and have a nice web site giving their predictions for this year. The predictions also include a breakdown of how the decisions are made indicating the most important (and worst) words to appear in a review of a movie.

There's also movie metadata like the type of film it was shot on, or who reviewed it. That's a very interesting use of POPFile and if they get it right and are invited to all the cool parties I hope they fulfill my wish and put a good word in for me with Neve Campbell.

The WMF SetAbortProc problem is not a backdoor

Microsoft recently fixed a problem wherein a Windows MetaFile (WMF) could contain arbitrary code that would be executed when the WMF was "played". The problem was particularly bad because a WMF could be played automatically in Internet Explorer by referencing it in an IFRAME if the Windows Picture and Fax Viewer was installed and registered (which by default on recent versions of Windows it was), because that program would automatically handle WMFs and play them to display them.

Steve Gibson suggested in a podcast, which then became a big news story, that he was convinced that this functionality was in fact an intentional backdoor inserted by Microsoft for their own purposes (or at last for the purpose of a rogue engineer inside the company).

Microsoft responded to that accusation with a blog posting by Stephen Toulouse in which he gave some more details of the problem and essentially said that Gibson was wrong (without naming him).

I've looked carefully at this and am con…

Do spammers fear OCR?

Nick FitzGerald recently sent me two sample spams that seem to indicate that some spammers fear that using images in place of words isn't enough. They've started to obscure their messages to prevent optical character recognition.

The first spam appears to be a scan of a document that's been skewed slightly. Now this could be a simple and bad scan.



But the second is even more interesting. It appears to be perfectly normal:



Until you look at the fact that this was actually constructed using <DIV> tags for layout and the breaks between the lines are in the middle of words. Here Nick has kindly inserted borders showing that the words are broken horizontally and then put back in the right position:



But is anyone doing OCR, or are spam filters getting good enough that the spammers are being really paranoid about what they are sending?

The funny think about the second example is that the URL they include is not obfuscated, is clickable and appears in the SURBL :-) So desp…

Python is the new Tcl; Ruby is the new Perl

Given how bad I am at predictions I'd like to give you the following (which you'd better take with a grain of salt): Python is the new Tcl and Ruby is the new Perl. My prediction for 2006 is that most Perl 5 programmers will decide that Ruby looks cool and learn the language and that Python will be a niche language that a die hard core continues to love and embellish.

I think Ruby's the new Perl because of the confluence of three things: Perl 6, RubyGems and Ruby's own Perliness. Firstly, given how hideous Perl is if you're a Perl person (like me) why learn Perl 6 when you could learn Ruby? Ruby's core language is clean, regular expressions are strong and there are many Perl influences (which Ruby is slowly removing to ensure that language doesn't become Perl). Secondly, RubyGems will do for Ruby what CPAN did for Perl.

Python's lack of good package management (if you ignore ActiveState's PyPPM and the recently started Cheese Shop) and idiosync…