Tuesday, June 26, 2007

Pretty Darn Fancy: Even More Fancy Spam

Looks like the PDF wave of spam is proceeding with a totally different tack. Sorin Mustaca sent me a PDF file that was attached to a pump and dump spam. Unlike the previous incarnation of PDF spams, this one looks a lot like a 'classic' image-spam.

It's some text (which has been misaligned to fool OCR systems), but there's a little twist. Zooming in on a portion of the spam shows that the letters are actually made up of many different colors (a look inside the PDF it's actually an image.

I assume that the colors and the font misalignement is all there to make it difficult to OCR the text (and wrapping it in a PDF will slow down some filters).

Extracting the image and passing it through gocr gave the following result:

_ Ti_ I_ F_ _ Clih! %

%_ % _c. (sR%)
t0.42 N %x

g0 _j_ __ h_ climb mis _k
d% % %g g nle_ Frj%.
_irgs__Dw_s _ rei_ � a
f%%d __. n_ia une jg gtill
oo_i_. Ib _ _ _ _ _ _ 90I

And a run through tesseract resulted in:

9{Eg Takes I:-west.tY`S Far Gawd Climb! UP
fatued Sta=:khlauih. This we 1: still

Close, but no cigar.

Thursday, June 21, 2007

Pretty Darn Fancy: Stock spammers using PDF files

Joe Chongq sent me a fascinating spam that is the first I've seen that's using a PDF file to send its information. I've long predicted that we'll see a wave of complex formats used for spam as broadband penetration increases and sending large spams becomes possible.

This particular spam has a couple of interesting attributes:

1. The PDF file itself is a really nicely formatted report about a particular stock that's being pumped'n'dumped.

2. The file name of the PDF was chosen to entice the user further by using their first name. In this case it was called joe_report.pdf.

3. The PDF is compressed using the standard PDF 'Flate' algorithm and totals 84,398 bytes. That's fairly large, but we've certainly seen image spams that were larger. Use of compression here means that a spam filter that's not aware of PDF formats would be unable to read the message content.

Here's what the actual PDF looks like (click for a larger view):


Wednesday, June 20, 2007

jeaig (jgc's email address image generator) launches

Today is launch day for a simple little web site called jgc's email address image generator. It's a web service that enables anyone to generate a CAPTCHA like image containing their email address for insertion on their web site.

Since web crawlers are currently unable to look inside images to scrape email address this means that your email address will not be scraped from a web site, but can be written down by a human.

Here's an example email address:

Made using jeaig

and the code that generates that is:

<img src="http://jeaig.org/image/UmFuZG9tSVacujftR-zi3H6s63Vd-
<br />
<font size="-2">Made using <a href="http://jeaig.org/">jeaig</a></font>

The server does not store the email address: when a user enters an email address on the site it is padded with a random amount of random data (both before and after the address) with randomness being supplied by /dev/urandom. Then the address is encrypted using Blowfish using a secret key known only to the jeaig server (this key was also generated from /dev/urandom) and a random IV is chosen.

The encrypted data is then modified base-64 encoded so that it can be used in the URL.

When the image is requested using the special base-64 filename the email address is decrypted and then rendered using CAPTCHA code to produce the image. The image changes each time the email address is loaded.

And, yes, this is a free service.

Monday, June 18, 2007

Escaping comma and space in GNU Make

Sometimes you need to hide a comma or a space from GNU Make's parser because GNU Make might strip it (if it's a space) or interpret it as an argument separator (for example, in a function invocation).

First the problem. If you wanted to change every , into a ; in a string in GNU Make you'd probably head for the $(subst) function and do the following:

$(subst ,,;,$(string))

See the problem? The argument separator for functions in GNU Make is , and hence the first , (the search text) is considered to be separator. Hence the search text in the above is actually the empty string, the replacement text is also the empty string and the ;, is just preprended to whatever is in $(string).

A similar problem occurs with spaces. Suppose you want to replace all spaces with ; in a string. You get a similar problem with $(subst), this time because the leading space is stripped:

$(subst ,;,$(string))

That extra space isn't an argument it's just extraneous whitespace and hence it is ignored. GNU Make just ends up appending ; to the $(string).

So, how do you solve this?

The answer is define variables that contain just a comma and just a space and use them. Because the argument splitting is done before variable expansion it's possible to have an argument that's a comma or a space.

For a comma you just do:

comma := ,
$(subst $(comma),;,$(string))

And everything works.

For a space you need to get a space into a string, I find the easiest way is like this:

space :=
space +=
$(subst $(space),;,$(string))

That works because += always space separates the value of the variable with the appended text.

Now, GNU Make has really liberal variable naming rules. Pretty much anything goes, so it's possible to define a variable with the name , or even having the name consisting of a space character.

First, here's how to define them:

, := ,
space :=
space +=
$(space) :=
$(space) +=

The first line is clear, it does an immediate define of a , to the variable named ,. The second one is a little more complex. First, I define a variable called space which contains a space character and then I use it to define a variable whose name is that space character.

You can verify that these work using $(warning) (I like to wrap the variable being printed in square brackets for absolute certainty of the content):

$(warning [$(,)])
$(warning [$( )])

$ make
Makefile:1: [,]
Makefile:2: [ ]

Yes, that's pretty odd looking, but it gets stranger. Since GNU Make will interpret $ followed by a single character as a variable expansion you can drop the braces and write:

$(warning [$,])
$(warning [$ ])

Now that starts to look like escaping. In the examples above you can use these variables to make things a little clearer:

$(subst $(,),;,$(string))
$(subst $ ,;,$(string))

Note that you have to use the $(,) form because function argument splitting occurs before the expansion and GNU Make gets confused. In the second line the space is 'escaped' with the $ sign.

You might be wondering about other crazy variable names: here are a few that's possible with GNU Make:

# Defining the $= or $(=) variable which has the value =
equals := =
$(equals) := =

# Define the $# or $(#) variable which has the value #
hash := \#
$(hash) := \#

# Define the $: or $(:) variable which has the value :
colon := :
$(colon) := :

# Define the $($$) variable which has the value $
dollar := $$
$(dollar) := $$

; := ;
% := %

You probably don't need any of those, but you never know...

POPFile v0.22.5 Released

Here are the details:

Welcome to POPFile v0.22.5

This version is a bug fix and minor feature release that's been over a
year in the making (mostly due to me being preoccupied by other


SourceForge has announced their 'Community Choice Awards' for 2007 and
is looking for nominations. If you feel that POPFile deserves such an
honour please visit the following link and nominate POPFile in the
'Best Project for Communications' category. POPFile requires multiple
nominations (i.e. as many people as possible) to get into the list of




1. POPFile now defaults to using SQLite2 (the Windows installer will
convert existing installations to use SQLite2).

2. Various improvements to the handling of Japanese messages and
improvements for the 'Nihongo' environment:

Performance enhancement for converting character encoding by
skipping conversion which was not needed. Fix a bug where the
wrong character set was sometimes used when the charset was not
defined in the mail header. Fix a bug where several HTML
entities caused misclassification. Avoid a warning 'uninitialized
value'. Fix a bug that the word's links in the bucket's detail
page were not url-encoded.

3. Single Message View now has a link to 'download' the message from
the message history folder.

4. Log file now indicates when an SSL connection is being made to the
mail server.

5. A number of small bug fixes to the POPFile IMAP interface.

6. Installer improvements:

Email reconfiguration no longer assumes Outlook Express is
present. Add/Remove Programs entries for POPFile now show a more
realistic estimate of the program and data size. Better support
for proxies when downloading the SSL Support files. The SSL
patches are no longer 'hard-coded', they are downloaded at
install time. This will make it easier to respond to future
changes to the SSL Support files. The Message Capture Utility
now has a Start Menu shortcut to make it easier to use the
utility. The minimal Perl has been updated to the latest
version. The installer package has been updated to make it work
better on Windows Vista (but further improvements are still




An introduction to installing and using POPFile can be found in the
QuickStart guide:



SSL Support is offered as one of the optional components by the
installer. If the SSL Support option is selected the installer will
download the necessary files during installation.

If SSL support is not selected when installing (or upgrading) POPFile
or if the installer was unable to download all of the SSL files then
the command

setup.exe /SSL

can be used to run the installer again in a special mode which will
only add SSL support to an existing installation.


The current version of SQLite (v3.x) is not compatible with POPFile.
You must use DBD:SQLite2 to access the database.

Users of SSL on non-Windows platforms should NOT use IO::Socket::SSL
v0.97 or v0.99. They are known to be incompatible with POPFile; v1.07
is the most recent release of IO::Socket::SSL that works correctly.


If you are upgrading from pre-v0.22.0 please read the v0.22.0 release
notes for much more information:



Thank you to everyone who has clicked the Donate! button and donated
their hard earned cash to me in support of POPFile. Thank you also to
the people who have contributed their time through patches, feature
requests, bug reports, user support and translations.



Big thanks to all who've contributed to POPFile over the last year.


Friday, June 15, 2007

Is this me?

A friend forwarded me this image with the subject line: Is this you? Looks like an old publicity for Byte magazine, and sure enough it does look rather like me:

So, all I can say is (Austin Powers accent): "Groovy, man. Yeah, baby, that's me."

BTW. The machine looks a lot like a Versatile 2, or perhaps it's a Vector 3. That would date this picture to around 1977.

Monday, June 11, 2007

Measuring my inbox depth

Some time ago I wrote about how I manage the flow of email hitting me. Back then I didn't talk about what happens to the email after it's been filtered and automatically sorted.

Here's a quick picture of my email folders:

POPFile does the automatic sorting into the sub-folders when I download mail (read the article linked above for details on that) and I then manually move mail that I can't respond to immediately to the ACTION folder (to make this quick I use the lovely TB QuickMove Extension which makes all my mail moving a CTRL-# click away).

Anything that's in ACTION needs to be dealt with. Sometimes that means those messages will be gone in a day, sometimes in weeks, just depends on the contents. There are also a couple of other 'action' type folders called Next Newsletter (where I store interesting stuff to mention in my next spam and anti-spam newsletter) and Next GNU Make (where I store stuff that'll go into next month's CM Basics Mr Make column).

It occurred to me that the depth of my inbox might be an interesting thing to track so I wrote a quick Perl script that parses the mbox file associated with the ACTION folder looking for non-deleted messages (I look at the X-Mozilla-Status header and check that the 0x8 bit is not set) to get a count of the messages in ACTION.

Then I pump that data into rrdtool using the RRD::Simple interface from CPAN and use it to create graphs of the number of waiting items. All this runs off a cron job every five minutes (and once an hour for graph creation).

You can see the 'live' graphs here (these will update hourly when my machine is on):

Tuesday, June 05, 2007

A little light relief in an 'enlargement' spam

Jason Steer from IronPort sent me over an image taken from a recent spam for an 'enlargement' product. Here's the image:

Since the text is large I thought it would be fun to run this through gocr to OCR out the text and URL. Here's the output of gocr on the image (I removed a few blanks lines for clarity here):

__ ___

_ ____

So, that worked out well :-) Nevertheless, the domain is listed in the SURBL:

$ dig relies.net.multi.surbl.org
;relies.net.multi.surbl.org. IN A

relies.net.multi.surbl.org. 2100 IN A

So, if you can extract the domain name from the image it's possible to check it against the SURBL and blacklist the message. Switching over to Google's Tesseract OCR system revealed the following:

(I LIT inl 3
IE\)' :
Q ii ,g @1

H ; ik
s$i` wg `i?! J