Friday, September 22, 2006

A Test::Class gotcha

I'm working on a project that involves building a prototype application in Perl. I've made extensive use of Perl's OO features and have a collection of classes that implement the mathematical calculations necessary to drive the web site running the application. Naturally, as I've been building the classes I've been building a unit test suite.

Since Test::Class is the closest thing Perl has to JUnit or CppUnit, I'm using it to test all the methods in my Perl classes. Everything was looking good until I asked the guy writing the server to integrate with my code. His code died with an error like this:
Can't locate object method "new" via package "Class::A" (perhaps you
forgot to load "Class::A"?) at Class/ line 147.
Taking a quick look inside Class::B revealed that it did try to create a new Class::A object and that, sure enough, there was no use Class::A; anywhere in Class::B. Easy enough bug to fix, but what left me scratching my head was why the unit test suite didn't show this.

For each class I have an equivalent test class (so there's Class::A::Test and Class::B::Test). These are loaded by a .t file, which in turn is run with prove. The test classes all use Test::Class.

The classes are tested with a Makefile that does the following:
@prove classes.t
And classes.t consists of:
use strict;
use warnings;

use Class::A::Test;
use Class::B::Test;

Since the test suite for Class::A does a use Class::A; and the test suite for Class::B does a use Class::B; and the two test suites are loaded using use in classes.t, both Class::A and Class::B are loaded before running the tests. This means that the fact that use Class::A; was missing from Class::B is masked in the test suite.

The solution is to have two .t files, one for each class, so that only the class being tested is loaded. So I dumped classes.t and created class_a.t and class_b.t as follows:
use strict;
use warnings;

use Class::A::Test;

use strict;
use warnings;

use Class::B::Test;

and the Makefile is changed to do:
@prove class_a.t class_b.t
This now works correctly. The missing use Class::A; causes a fatal error in the test suite.

Thursday, September 21, 2006

BOUTS: Best of UseTheSource

Long, long ago (OK, 1999), I registered the UseTheSource domain name (as in Use the source, Luke!) and used it to start a web site that these days would be called a blog. The site was powered by Slashcode, Slashdot's code, and featured a mix of my commentary on the news and original articles. You can still read the old site online. The site even got me an appearance with Leo Laporte on The Screen Savers.

Most of the articles published are irrelevant today. The commentary is often on startups that fizzled out long ago, and I shut down the site in 2004. But some of the articles are worth repeating. So, from time to time, I'll be republishing original pieces from UseTheSource as BOUTS entries in this blog.

To get things rolling here's an article I wrote back in 2002 about calculating the area of an annulus based on the length of a tangent.

Originally published June 12, 2002

Take a look at the following shape. It's an annulus: two concentric circles, something like a simple washer or donut.

Imagine that you know only one fact about this shape: the length of a line tangent to the inner circle, measured between the two points where it meets the outer circle. Call that length x. Can you calculate the area of the yellow shaded part?

This problem was presented on the NPR radio show CarTalk a few weeks back and after I solved it I realized that there were a couple of interesting ways of calculating the area. Both require knowledge of the formula for the area of a circle: πr², where r is the radius of the circle. One requires remembering Pythagoras' Theorem, the other a little logical reasoning.

Solution by logical reasoning

Insight: there must be many such concentric circles where it's possible that the tangent has length x.

In fact if I start with the small circle in the middle it must always be possible to choose the size of the outer circle so that the tangent is x.

So what if I make the inner circle have size zero? Then all I need is an outer circle with diameter x.

Since we know there's only one solution (surely the person posing this question knew that there was only one solution), then we can just calculate the area of the outer circle when the inner circle has zero radius.

The outer circle in that case has diameter x or radius x/2 and so the area is π(x/2)² or πx²/4.

Solution by Pythagoras

To calculate the area of the annulus we need to calculate the area of the big circle and subtract the area of the small circle. If we name the radius of the big circle r and the radius of the small circle s then we need to calculate πr² - πs² or π(r² - s²). Hmm. That r² - s² bit looks a lot like something we might get from Pythagoras' Theorem (the square on the hypotenuse is equal to the sum of the squares on the other two sides).

For Pythagoras we need a right-angled triangle. Lo and behold, we have one. Since we have a tangent we know it's at right angles to a radius of the inner circle. The complete triangle has sides r, s and x/2, with r as the hypotenuse.

So run through Pythagoras on this triangle and we get r² = (x/2)² + s². Subtract s² from both sides and you've got r² - s² = (x/2)². Now we know how to calculate r² - s²: it's (x/2)², and so the area of the annulus is π(x/2)² or πx²/4.
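As a quick numerical check of the result, here is a minimal C sketch. The function names and the PI constant are my own choices for illustration, not part of the original article. It computes the area once from the tangent length alone and once as big circle minus small circle, so the two can be compared.

```c
#include <assert.h>

#define PI 3.14159265358979323846

/* Area of the annulus from the tangent length x alone: pi * (x/2)^2. */
double annulus_area_from_tangent(double x) {
    assert(x >= 0.0);
    return PI * (x / 2.0) * (x / 2.0);
}

/* Area computed directly: big circle (radius r) minus small circle (radius s). */
double annulus_area_from_radii(double r, double s) {
    assert(s >= 0.0 && r >= s);
    return PI * r * r - PI * s * s;
}
```

For example, with r = 5 and s = 3 the triangle is the 3-4-5 triangle, so x/2 = 4 and x = 8. Both functions then give 16π, as does the degenerate case r = 4, s = 0 from the logical-reasoning solution.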

iPod nano in diagnostic mode

This morning I was running out to do some shopping. I grabbed my iPod nano... when I slipped it out of its case it was somehow in diagnostic mode. Here's what it looked like:

This text interface includes a text scroll bar on the right hand side with a $ sign indicating the current position. The menu is scrolled with the << and >> keys. Clicking on the FiveInOne test reveals:

Great, SDRAM is OK and I have a 3.81GB HD. Asking for HD information from the menu reveals that there's a FAT32 partition on the device, along with the ID number of the drive:

And then I got just too excited on seeing the screen pattern test and couldn't hold the camera still:

More on this here.

Wednesday, September 20, 2006

Watching a phishing attack live

Yesterday a phishing mail for a community bank in a US east coast state (throughout this blog post I have obscured many details including names, domains and IP addresses) slipped through GMail's spam/phish filter and then right through POPFile. Only Thunderbird bothered to warn me that it might be a scam.

The message itself was sent from an ADSL connected machine in China.

Of course, since I don't have an account with this bank it was an obvious phish, but I was curious about it so I followed the link in the message.

The link appeared to go to https://***** but actually went to http://***.***.164.158:82/***** Clearly a phish running on a compromised host.

A reverse DNS lookup on the IP address of the host revealed that the phish was being handled by a web server installed in a school in a small central Californian town. The machine appeared to be running IIS, but the phishing server identified itself (on port 82) as Apache/2.0.55 (Win32) Server.

The Start.html page looked identical to the actual sign-on page used by the bank. In fact, a screenshot of the real page and a screenshot of the phishing page were identical; even the MD5 checksums of the two images were the same. Naturally, not everything was the same in the HTML.

Although almost all the HTML was identical (with the phishing site even pulling its images off the real bank's site), the name of the script that handled validation of the user name and password had been changed from SignOn.asp (the actual bank uses ASP) to verify.php (the phisher used PHP).

The only significant diff between the phisher site and the real site is:

< <form action="verify.php" method="post" id="form1" name="form1">
> <form action="SignOn.asp" method="post" id="form1" name="form1">

Once a username and password were entered the phishee was taken to a page asking for name, email address, credit card number, CVV2 number and PIN (with the PIN asked for a second time for validation). After that the user was thanked for verifying their details.

The user name, password, credit card number, CVV2 number and PINs were saved to a file called red.txt in the same directory as the HTML and PHP files used to make the phishing site. How do I know that? Simple, by popping up one level in the phishing URL to http://***.***.164.158:82/***** I was able to get a directory listing. In the directory there were three HTML files, two PHP scripts and red.txt. Clicking on that file gave me access to the phished details as they came in.

I quickly informed the bank and US CERT of the phishing site. I tried to figure out how to contact the school, but it was 0500 in California.

Here's a sample entry from the actual log file.

Tue Sep 19, 2006 5:33 am
Username: youare
Password: stuipd
Tue Sep 19, 2006 5:34 am
cc: 4111111111111111
expm: 10
expy: 2006
cvv: 321
pin: 1122
pin2: 1122

The time is local to California and you can see the details that the person entered. Here, clearly, a vigilante has decided to mess with the phisher by entering bogus details. In fact, the last time I was able to access the site (before it was pulled down) there were 33 entries in the log file. Of these, 32 contained either nothing or offensive user names and passwords.

But one seemed to contain legitimate information.

The log file's first entry was at 0454 California time from a machine owned by MessageLabs (I assume that they are doing some automated testing of phishing sites); the last entry was at 1226 California time.

The one legitimate entry contained a valid Visa card number (valid in the sense that the number validated against the standard Luhn check digit algorithm). Also the user name and password looked legitimate and a quick Google search revealed that the username was also used as part of the email address of a small business in the same town as one of this small bank's branches. It looked very likely that this entry was legitimate and the person had given away their real card number and PIN.
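For readers unfamiliar with the Luhn check mentioned above, here is a minimal sketch in C. The function name luhn_valid and the choice to skip spaces and dashes are my own assumptions for illustration. Passing the check only means the digits are internally consistent; it says nothing about whether the card actually exists.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Returns 1 if the digit string passes the Luhn check, 0 otherwise.
 * Spaces and dashes are skipped (an assumption made for convenience);
 * any other non-digit character makes the number invalid. */
int luhn_valid(const char *number) {
    int sum = 0, double_it = 0, digits = 0;
    size_t i;

    assert(number != NULL);

    /* Walk from the rightmost digit, doubling every second one. */
    for (i = strlen(number); i > 0; i--) {
        unsigned char ch = (unsigned char)number[i - 1];
        int d;

        if (ch == ' ' || ch == '-')
            continue;
        if (!isdigit(ch))
            return 0;

        d = ch - '0';
        if (double_it) {
            d *= 2;
            if (d > 9)
                d -= 9;   /* same as adding the two digits of d */
        }
        sum += d;
        double_it = !double_it;
        digits++;
    }

    return digits > 0 && sum % 10 == 0;
}
```

The bogus card number in the sample log entry, 4111111111111111, passes this check, which is why a Luhn-valid number on its own is only weak evidence that an entry is genuine.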

US CERT quickly responded with an auto-response assigning me an incident number and I received an email from the bank's IT Ops Manager Jack. Jack told me that he was already aware of the site and that this was the third time this little bank had been phished from machines in California and Germany. I gave Jack the name of the school in California, and he said he'd get in contact with them (he'd already called the FBI). I also told Jack about the one card number that looked totally legitimate; he told me he was in charge of all card operations at the bank and had the power to deal with it.

Some hours after that the site went offline.

Friday, September 15, 2006

Image spam filtering BOF at Virus Bulletin 2006 Montreal

I'm leading a BOF meeting at Virus Bulletin 2006 in Montreal next month. The idea is to get together in one room for a practical, tactical meeting to share experiences on how people are currently filtering image spam and what might be done in future (and what we expect spammers to do). I've already got commitments from major anti-spam vendors to be there and talk (as much as they are permitted) about their approach and I'll try to cover what the Bayesian guys are doing.

If you are interested please email me, or post a comment here. If you represent a vendor and want to be involved I'm especially interested to hear from you as I want to get all experiences out on the table (as much as is practical).

Date and Time Confirmed: Thursday, October 12. 17:40 to 18:40 in the Press Room.

Downloadable PDF flyer.

Thursday, September 14, 2006

A C implementation of my simple GPS code

Reader Chris Kuethe wrote in with a version of my simple code for entering latitude and longitude to GPS devices written in C (my demonstration code was in Perl).

Seems Chris is a bit of a GPS fanatic and maintains a page on GPS hackery.

He ported my Perl code to C and is releasing the code freely. He gave me the choice of releasing under two clause BSD license or making it public domain. I think the most generous is public domain (especially since the Perl code was public domain).

Here's the code to compute a SOC:

#include <sys/types.h>
#include <stdio.h>

int main(int argc, char **argv){
    int i, j;
    unsigned long long lat, lon, c, p, soc_num;
    char soc[11], *alpha = "ABCDEFGHJKLMNPQRTUVWXY0123456789";
    int primes[] = { 2, 3, 5, 7, 11, 13, 17, 23, 29, 31, 37 };
    float f;

    if (argc != 3){
        printf("Usage: %s <lat> <lon>\n", argv[0]);
        return 1;
    }

    /* shift latitude into 0..180 and scale to 4 decimal places */
    sscanf(argv[1], "%f", &f);
    lat = (int)((f + 90.0) * 10000.0);

    /* shift longitude into 0..360 and scale to 4 decimal places */
    sscanf(argv[2], "%f", &f);
    lon = (int)((f +180.0) * 10000.0);

    /* pack both into one number, leaving 7 bits for the checksum */
    p = lat * 3600000 + lon;
    soc_num = p * 128;

    /* checksum: base-32 digits of p weighted by the primes */
    c = 0;
    for(i = 0; i < (sizeof(primes)/sizeof(primes[0])); i++){
        c += ((p % 32) * primes[i]);
        p /= 32;
    }

    c %= 127;
    soc_num += c;

    /* encode as 10 base-32 characters, most significant first */
    for(i = 9; i >= 0; i--){
        j = soc_num % 32;
        soc[i] = alpha[j];
        soc_num /= 32;
    }
    soc[10] = '\0';

    printf("%s\n", soc);
    return 0;
}

And to compute latitude and longitude from a SOC:

#include <sys/types.h>
#include <stdio.h>
#include <string.h>
#include <ctype.h>

int main(int argc, char **argv){
    int i, j, c, k;
    unsigned long long x, y, p, soc_num;
    char soc[11], *alpha = "ABCDEFGHJKLMNPQRTUVWXY0123456789";
    int primes[] = { 2, 3, 5, 7, 11, 13, 17, 23, 29, 31, 37 };
    float lat, lon;

    if ((argc != 2 )|| (strlen(argv[1]) != 10)){
        printf("Usage: %s <10-digit-SOC>\n", argv[0]);
        return 1;
    }

    /* decode the 10 base-32 characters */
    soc_num = 0;
    for (i = 0; i < 10; i++){
        c = (char)argv[1][i];
        c = c & 0xff;
        c = toupper(c);
        /* map easily confused letters onto the digits they resemble */
        switch (c){
        case 'I': c = '1'; break;
        case 'O': c = '0'; break;
        case 'S': c = '5'; break;
        case 'Z': c = '2'; break;
        default: ;
        }
        for (j = 0; j < strlen(alpha); j++)
            if (c == alpha[j]){
                soc_num = (soc_num * 32 + j);
            }
    }

    /* split into the packed position and the 7-bit checksum */
    p = soc_num / 128;
    k = soc_num % 128;

    lon = ((p % 3600000) / 10000.0) -180.0;
    lat = ((p / 3600000) / 10000.0) - 90.0;

    /* recompute the checksum and compare with the one in the SOC */
    c = 0;
    for (i = 0; i < (sizeof(primes)/sizeof(primes[0])); i++){
        c += ((p % 32) * primes[i]);
        p /= 32;
    }

    c %= 127;
    if (c != k)
        printf("warning: checksum mismatch - %d %d\n", c, k);
    printf("%0.4f %0.4f\n", lat, lon);
    return 0;
}

Thanks Chris!

Update: Chris writes to say that B1NLADEN02 can be found in Antarctica: -76.7847/-106.0187 and JIMMYHOFFA is here: -23.3433/-61.6087.

Wednesday, September 13, 2006

Apologia: Sophos and SoftScan

After reading all the blog posts, mailing list and personal mail concerning my post yesterday (Did SoftScan, Sophos and Panda rip off my blog?) I think I need to apologize to two of the companies involved.

As I mention in the updated post, both SoftScan and Sophos explain that it's a coincidence, and since I have no evidence that they copied stuff from this blog (even though it appeared on the front page of Slashdot before their PR), I think I owe them an apology. It probably would have been prudent of me to restrict yesterday's posting to just Panda and ignore SoftScan and Sophos.


*bows head in shame*

Tuesday, September 12, 2006

Did SoftScan, Sophos and Panda rip off my blog? (Update: SoftScan and Sophos say 'no')

This morning I saw a news article about subliminal spam messages on ZDNet. I was intrigued to read about it because a few days ago Nick FitzGerald wrote to me with an example spam that he dubbed 'subliminal'. I wrote back and told him I was going to blog about it and he said go ahead.

The blog post is Subliminal advertising in spam? and was posted on Monday, September 4, 2006. That same day Slashdot picked up my blog post here. Later it was also picked up by Digg.

So I was a little surprised that the ZDNet article didn't mention Nick, me, my blog, Slashdot, or Digg. In fact, the article contains a link to Panda's press release on the subject: PandaLabs detects a new spam technique in which they state "PandaLabs has detected a spam message that uses subliminal advertising techniques.". No mention of this blog anywhere there either, but there are two images of such a spam, both of which I believe were lifted directly from my blog without attribution. The press release is dated the day after my post/Slashdot headline: Tuesday, September 5, 2006.

Here are the images side by side for comparison

Image from my blog post

Image from Panda's press release (local archive of the image)

And I named my image sub2.gif when I extracted it from the spam, and Panda named the same image sub2.gif. The MD5 checksum of my image is 9cace353b2d8b2db1d8868c07986f768 and the Panda image has the checksum 9cace353b2d8b2db1d8868c07986f768. And I also thought the original was a bit large for my blog so I reduced it from 603x451 to 302x226, the Panda image has the same reduced dimension. Hmm. Exactly the same image.

The other image in the press release is also, I believe, from my blog:

Image from my blog post

Image from Panda's press release (local archive of the image)

Once again, I named my image sub3.gif when I extracted it from the spam, and Panda named the same image sub3.gif. The MD5 checksum of my image is 6e16df2d3b67a7578ca7b09f0ccb9fc1 and the Panda image has the checksum 6e16df2d3b67a7578ca7b09f0ccb9fc1. Again I thought the original was a bit large for my blog so I reduced it from 603x451 to 302x226, the Panda image has the same reduced dimension. Hmm. Exactly the same image, again.

So it looks a lot to me like Panda heard about my blog post (perhaps through Slashdot) and then passed Nick's example off as their own research. Of course, it's possible that Panda, the day after my blog post, independently found the same thing, named it subliminal spam, named the frames within the GIF the same thing as me, extracted them from exactly the same spam image (which they managed to capture even though spammers are adding random noise so that hashing is impossible) and issued their press release.

On Wednesday, September 6, 2006 (two days after my blog post/Slashdot headline) Sophos put out a press release Spammers use subliminal messages in latest pump-and-dump scams in which they state: "Experts at SophosLabs™, Sophos's global network of virus, spyware and spam analysis centers, have identified a "pump-and-dump" stock spam campaign which uses an animated graphic to display a "subliminal" message to potential investors."

Once again the release doesn't mention me, Nick, this blog, Slashdot, Digg, ... It too includes an image that appears to be from the same spam campaign I was blogging about (a pump and dump for the stock TMXO), but there's no image borrowing here. The image is from the same campaign but different, and they no doubt didn't borrow any images from me.

Clearly, Sophos could have seen the same spam campaign as Nick and I and come to the same conclusion and called it 'subliminal' spam.

On Thursday, September 7, 2006 it appears that SoftScan got into the game too. They are mentioned in this article where it's written: "SoftScan's analysis of the latest pump-and-dump scam has discovered that an image appears for a split second every so often in the email with the word 'buy' repeated several times."

Disclaimer: I can't prove that any of these companies saw my blog post on Slashdot and then issued press releases, but the timing is interesting: my blog post comes first, followed by press releases and articles using either the same image or the same campaign, and all calling it 'subliminal spam'. Perhaps 'subliminal' spam was an obvious name, and I'm crazy, but...

An offer: on the other hand, if any company would like free rein to pass off things on my blog as their own work I have a simple offer for you: give me a small stock option in your company, call me a 'technical advisor' or similar, and feel free to take what you want from here.

UPDATE: SoftScan's Corporate Communications Manager Bo Engelbrechtsen comments below (see comments section) that they independently found this, and had never heard of this blog before.

UPDATE: In a private email a Sophos employee I know well says: "I personally alerted Sophos's PR team about this spammer trick [...] The word "subliminal" was the first thing that came to my mind when I saw it. [...] I don't read John's blog and am very disappointed with this insinuation. We receive millions of spam e-mails to our traps every day, many of which get analyzed and looked at by spam analysts around the world. We don't need to steal someone else's story..."

Wednesday, September 06, 2006

Slashdot effect = 3.5 * Digg effect

On Monday a post on this blog was on the front page of Slashdot and then on Tuesday the same link made it to the front page of Digg. Since my blog has Google Analytics enabled this gives me an unprecedented opportunity to measure the number of visitors from each site for the same story.

Here are the referrer stats for the period, largest first (the top two are Slashdot and Digg): 45,473; 13,009; 1,975; 1,197; 988; 248.

So Slashdot brought in 45,473 unique visitors and Digg 13,009. That means the posting on Slashdot was worth 3.5 times as many visitors as Digg.

There's one big question which means that the Slashdot effect might be bigger than stated here. Monday was Labor Day in the US, with a lot of people taking time off. Perhaps Slashdot's readership was lower on Monday than normal, meaning that the Slashdot effect is more than 3.5 times the Digg effect.

Monday, September 04, 2006

Optimal SMS keyboard layouts

One of the things I find very frustrating about typing SMS messages on my phone is that I often find that the next letter I want to type is actually on the same key that I just pressed. And that slows me down because either I wait for the timeout, or I click the right arrow key to move on.

For example, here's a standard keyboard on cell phones:

       abc    def
 1      2      3

ghi    jkl    mno
 4      5      6

pqrs   tuv    wxyz
 7      8      9

Very common English letter pairs such as 'ed' and 'on' appear on the same key meaning that if you need to type one of these you are going to incur the cost of dealing with the 'next letter is on same key' problem. In addition, the most common English letters are more than one click away; the most common English letter 'e' is two clicks, 'o' is three, 'n' is two, etc.

What you really want is a keyboard layout where the most common letters are as few clicks away as possible, and where the common letter pairs are on different keys, so that you can maximize typing speed. And if possible, make the layout as close as possible to the current one so that it's easy to learn.

There are some people who propose squeezing QWERTY into the current keyboard. This ends up with a key that starts with 'q' and another that starts with 'z': two of the least common letters are given pride of place on the keyboard.

Others propose using dictionaries. I think the fastest typing would be on an intelligently laid out keyboard without the need for a dictionary.

I took the 1000 most common words in English as a test set and performed three keyboard optimizations: one by hand and two using different sets of common letter pairs. I then tested them against the 1000 most common words. Each layout received a score equal to the number of clicks required to type all 1000 words. A single press of a key was worth one click (so typing 'a' is one click, typing 'q' is two, etc. on the standard keyboard), and the cost of handling the 'next letter is on same key' problem was set equal to two clicks.
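The scoring rule can be sketched in C roughly as follows. This is my reading of the rule described above, not the original optimization code: the names word_clicks, find_key, STANDARD_LAYOUT and HAND_LAYOUT are made up for illustration, and I assume the two-click penalty also applies when the same letter is typed twice in a row, since it sits on the same key.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define NUM_KEYS 8
#define SAME_KEY_PENALTY 2   /* cost of the 'next letter is on same key' problem */

/* Letters on keys 2..9, in click order. */
const char *const STANDARD_LAYOUT[NUM_KEYS] =
    { "abc", "def", "ghi", "jkl", "mno", "pqrs", "tuv", "wxyz" };
const char *const HAND_LAYOUT[NUM_KEYS] =
    { "adc", "efb", "igj", "hlk", "omz", "srpq", "tuv", "nwyx" };

/* Finds the key holding `letter`; stores its click count in *clicks.
 * Returns the key index, or -1 if the letter is not on the keypad. */
static int find_key(const char *const layout[NUM_KEYS], char letter, int *clicks) {
    int k;
    for (k = 0; k < NUM_KEYS; k++) {
        const char *pos = strchr(layout[k], letter);
        if (pos != NULL) {
            *clicks = (int)(pos - layout[k]) + 1;
            return k;
        }
    }
    return -1;
}

/* Total clicks needed to type `word` on `layout`. */
int word_clicks(const char *const layout[NUM_KEYS], const char *word) {
    int total = 0, prev_key = -1;
    const char *p;

    assert(layout != NULL && word != NULL);

    for (p = word; *p != '\0'; p++) {
        int clicks = 0;
        int key = find_key(layout, (char)tolower((unsigned char)*p), &clicks);
        if (key < 0) {          /* not a letter: skip it and reset */
            prev_key = -1;
            continue;
        }
        if (key == prev_key)
            total += SAME_KEY_PENALTY;
        total += clicks;
        prev_key = key;
    }
    return total;
}
```

Under these assumptions 'ed' costs 5 clicks on the standard layout (two for 'e', a two-click penalty, then one for 'd') and 3 on the hand-optimized layout; 'on' drops from 7 clicks to 2. Summing word_clicks over a word list gives a layout's score.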

I used letter and letter pair frequency information from the excellent book Cryptanalysis. And of course I wrote some code to perform the layout of the keyboards optimizing for the least number of clicks per common letter and the least number of same key clicks for letter pairs.

The standard keyboard layout gets a score of 12,447 clicks.

The following machine generated layout can be used to type the same words in 8757 clicks (70.35% of the clicks of the standard keyboard):

       acb    euwj
 1      2      3

ipg    hmx    olv
 4      5      6

sfq    tdyz   nrk
 7      8      9

This keyboard doesn't look anything like the original keyboard so I then used a shorter list of letter pairs and hand optimized the keyboard to balance clicks and similarity. The result is 8,912 clicks (or 71.6% of the standard keyboard) and a nice layout:

       adc    efb
 1      2      3

igj    hlk    omz
 4      5      6

srpq   tuv    nwyx
 7      8      9

Now, if only there were a way to get that on my RAZR I could save nearly 30% of my typing clicks.

Subliminal advertising in spam?

Nick FitzGerald sent me a great example of subliminal advertising in a spam message. At least that's what he thinks the spammer might have been up to. The spam contains an animated GIF with four frames. One of the frames (which contains the actual spam message) remains visible for 17 seconds. The other three frames are displayed for 10ms or 40ms, and each of those contains a little random noise and the word BUY in random positions.

Was the spammer really hoping to make us fall for his pump and dump scam with a quick flash of BUY on screen?

Here's the actual GIF with the animation in place (watch out you might be forced to BUY :-)

And there are the four separate frames:


Friday, September 01, 2006

The hell of Dell France

Last October I started a company in France. The French government kindly supplied my details to various companies without me asking. I suspect this happened because information about companies is a public record and certain marketing-savvy companies slurped up my information and sent me 'useful' junk mail: catalogs for office equipment, for example.

One of the companies that felt the sudden urge to write to me was Dell. For a while I owned Dell computers and for various reasons (mostly to do with their terrible support for small businesses and their weird 'you need to buy Dell racks for your gear' attitude) I stopped buying anything from them.

So as each piece of junk mail came in I would unsubscribe. Sometimes this was a phone call, sometimes a fax and sometimes it was necessary to return the item with 'Désinscription' or similar written on it.

And it worked great, except for Dell.

For 10 months I've tried to unsubscribe.

I've emailed them at [email protected] as requested. I've faxed them on 0825 004 682 as they also suggest and I've mailed them at Koba D/03-F, ZI de Chevreuil, F-60490 Ressons Sur Metz. And still their junk keeps coming.