Friday, September 22, 2006

A Test::Class gotcha

I'm working on a project that involves building a prototype application in Perl. I've made extensive use of Perl's OO features and have a collection of classes that implement the mathematical calculations necessary to drive the web site running the application. Naturally, as I've been building the classes I've been building a unit test suite.

Since Test::Class is the closest thing Perl has to junit or cppunit I'm using it to test all the class methods in my Perl classes. Everything was looking good until I told the guy writing the server to integrate with my code. His code died with an error like this:
Can't locate object method "new" via package "Class::A" (perhaps you
forgot to load "Class::A" at Class/B.pm line 147.
Taking a quick look inside Class::B revealed that it did try to create a new Class::A object and that, sure enough, there was no use Class::A; anywhere in Class::B. Easy enough bug to fix, but what left me scratching my head was why the unit test suite didn't show this.

For each class I have an equivalent test class (so there's Class::A::Test and Class::B::Test) which are loaded using a .t file which in turn is loaded with prove. The test classes all use Test::Class.

The classes are tested with a Makefile that does the following:
test:
@prove classes.t
And classes.t consists of:
use strict;
use warnings;

use Class::A::Test;
use Class::B::Test;

Test::Class->runtest;
Since the test suite for Class::A does a use Class::A; and the test suite for Class::B does a use Class::B; and the two test suites are loaded using use in classes.t, both Class::A and Class::B are loaded before running the tests. This means that the fact that use Class::A; was missing from Class::B is masked in the test suite.

The solution is to have two .t files one for each class so that only the class being tested is loaded. So I dumped classes.t and created class_a.t and class_b.t as follows:
use strict;
use warnings;

use Class::A::Test;

Test::Class->runtest;
and
use strict;
use warnings;

use Class::B::Test;

Test::Class->runtest;
and the Makefile is changed to do:
test:
@prove class_a.t class_b.t
This now works correctly. The missing use Class::A; causes a fatal error in the test suite.

Friday, September 15, 2006

Image spam filtering BOF at Virus Bulletin 2006 Montreal

I'm leading a BOF meeting at Virus Bulletin 2006 in Montreal next month. The idea is to get together in one room for a practical, tactical meeting to share experiences on how people are currently filtering image spam and what might be done in future (and what we expect spammers to do). I've already got commitments from major anti-spam vendors to be there and talk (as much as they are permitted) about their approach and I'll try to cover what the Bayesian guys are doing.

If you are interested please email me, or post a comment here. If you represent a vendor and want to be involved I'm especially interested to hear from you as I want to get all experiences out on the table (as much as is practical).


Date and Time Confirmed: Thursday, October 12. 17:40 to 18:40 in the Press Room.

Downloadable PDF flyer.

Thursday, September 14, 2006

A C implementation of my simple GPS code

Reader Chris Kuethe wrote in with a version of my simple code for entering latitude and longitude to GPS devices written in C (my demonstration code was in Perl).

Seems Chris is a bit of a GPS fanatic and maintains a page on GPS hackery.

He ported my Perl code to C and is releasing the code freely. He gave me the choice of releasing under two clause BSD license or making it public domain. I think the most generous is public domain (especially since the Perl code was public domain).

Here's the code to compute a SOC:

#include <sys/types.h>
#include <stdio.>

int
main(int argc, char **argv){
int i, j;
unsigned long long lat, lon, c, p, soc_num;
char soc[11], *alpha = "ABCDEFGHJKLMNPQRTUVWXY0123456789";
int primes[] = { 2, 3, 5, 7, 11, 13, 17, 23, 29, 31, 37 };
float f;

if (argc != 3){
printf("Usage: %s <lat> <lon>\n", argv[0]);
exit(1);
}

sscanf(argv[1], "%f", &f);
lat = (int)((f + 90.0) * 10000.0);

sscanf(argv[2], "%f", &f);
lon = (int)((f +180.0) * 10000.0);

p = lat * 3600000 + lon;
soc_num = p * 128;

c = 0;
for(i = 0; i < (sizeof(primes)/sizeof(primes[0])); i++){
c += ((p % 32) * primes[i]);
p /= 32;
}

c %= 127;
soc_num += c;

for(i = 9; i >= 0; i--){
j = soc_num % 32;
soc[i] = alpha[j];
soc_num /= 32;
}
soc[10] = '\0';

printf("%s\n", soc);
}

And to compute latitude and longitude from a SOC:

#include <sys/types.h>
#include <stdio.h>

int
main(int argc, char **argv){
int i, j, c, k;
unsigned long long x, y, p, soc_num;
char soc[11], *alpha = "ABCDEFGHJKLMNPQRTUVWXY0123456789";
int primes[] = { 2, 3, 5, 7, 11, 13, 17, 23, 29, 31, 37 };
float lat, lon;

if ((argc != 2 )|| (strlen(argv[1]) != 10)){
printf("Usage: %s <10-digit-SOC>\n", argv[0]);
exit(1);
}

soc_num = 0;
for (i = 0; i < 10; i++){
c = (char)argv[1][i];
c = c & 0xff;
c = toupper(c);
switch(c){
case 'I': c = '1'; break;
case 'O': c = '0'; break;
case 'S': c = '5'; break;
case 'Z': c = '2'; break;
default: ;
}
for (j = 0; j < strlen(alpha); j++)
if (c == alpha[j]){
soc_num = (soc_num * 32 + j);
}
}

p = soc_num / 128;
k = soc_num % 128;

lon = ((p % 3600000) / 10000.0) -180.0;
lat = ((p / 3600000) / 10000.0) - 90.0;

c = 0;
for (i = 0; i < (sizeof(primes)/sizeof(primes[0])); i++){
c += ((p % 32) * primes[i]);
p /= 32;
}

c %= 127;
if (c != k)
printf("warning: checksum mismatch - %d %d\n", c, k);
printf("%0.4f %0.4f\n", lat, lon);
}

Thanks Chris!

Update: Chris writes to say that B1NLADEN02 can be found in Antarctica: -76.7847/-106.0187 and JIMMYHOFFA is here: -23.3433/-61.6087.

Monday, September 04, 2006

Optimal SMS keyboard layouts

One of the things I find very frustrating about typing SMS messages on my phone is that I often find that the next letter I want to type is actually on the same key that I just pressed. And that slows me down because either I wait for the timeout, or I click the right arrow key to move on.

For example, here's a standard keyboard on cell phones:

abc def
1 2 3

ghi jkl mno
4 5 6

pqrs tuv wxyz
7 8 9

Very common English letter pairs such as 'ed' and 'on' appear on the same key meaning that if you need to type one of these you are going to incur the cost of dealing with the 'next letter is on same key' problem. In addition, the most common English letters are more than one click away; the most common English letter 'e' is two clicks, 'o' is three, 'n' is two, etc.

What you really want is a keyboard layout that means that most common letters are as few clicks away as possible, and that the common letter pairs are on different keys so that you can maximize typing speed. And if possible make the layout as close as the current one so that it's easy to learn.

There are some people who propose squeezing QWERTY into the the current keyboard. This ends up with a key that starts with 'q' and another that starts with 'z': two of the least common keys are given pride of place on the keyboard.

Other propose using dictionaries. I think the fastest typing would be on an intelligently laid out keyboard without the need for a dictionary.

I took the 1000 most common words in English as a test set, and performed three keyboard optimizations: one by hand and two using different sets of common letter pairs and tested them against the 1000 most common words. Each set received a score equal to the number of clicks required to type all 1000 words. A single click on a key was worth one click (so typing 'a' is one click, typing 'q' is two, etc. on the standard keyboard), and the cost of handling the 'next letter is on same key' was set at the same time as two clicks.

I used letter and letter pair frequency information from the excellent book Cryptanalysis. And of course I wrote some code to perform the layout of the keyboards optimizing for the least number of clicks per common letter and the least number of same key clicks for letter pairs.

The standard keyboard layout gets a score of 12,447 clicks.

The following machine generated layout can be used to type the same words in 8757 clicks (70.35% of the clicks of the standard keyboard):

acb euwj
1 2 3

ipg hmx olv
4 5 6

sfq tdyz nrk
7 8 9

This keyboard doesn't look anything like the original keyboard so I then used a shorter list of letter pairs and hand optimized the keyboard to balance clicks and similarity. The result is 8,912 clicks (or 71.6% of the standard keyboard) and a nice layout:

adc efb
1 2 3

igj hlk omz
4 5 6

srpq tuv nwyx
7 8 9

Now, if there was only a way to get that on my RAZR I could save 30% of my typing time.

Subliminal advertising in spam?

Nick FitzGerald sent me a great example of subliminal advertising in a spam message. At least that's what he thinks the spammer might have been up to. The spam contains an animated GIF with four frames. One of the frames (which contains the actual spam message) remains visible for 17 seconds. The other three frames are displayed for 10ms or 40ms, and each of those contains a little random noise and the word BUY in random positions.

Was the spammer really hoping to make us fall for his pump and dump scam with a quick flash of BUY on screen?

Here's the actual GIF with the animation in place (watch out you might be forced to BUY :-)



And there are the four separate frames:

10ms
17s
40ms
40ms