Friday, June 23, 2006

Proposed uniform naming scheme for spammer/phisher content trickery

This post proposes renaming all the tricks in The Spammers' Compendium (TSC) under a uniform scheme, so that spam filtering products can refer to tricks easily, each name carries information about the trick's purpose and technology, and every trick keeps a unique name.

I'd love to hear comments on this.

Each name consists of three parts separated by !: a purpose, a name, and a technology. The purpose is the reason for the trick (for example, the trick is used to obscure a URL, or to insert innocent words). The name is derived from the current TSC pejorative name. The technology identifies the way in which the trick is coded (for example, with HTML or MIME).

For a single name there could be multiple tricks using different technologies (e.g. some tricks might be implemented using HTML or CSS), or for different purposes (words might be inserted to fool a Bayesian filter or break a hash).

I propose the following purposes for a trick:
  • BWO (Bad Word Obfuscation) Making it hard for a filter to parse potentially bad words (e.g. Viagra)
  • GWI (Good Word Insertion) Adding words likely to confuse a statistical filter
  • HB (Hash Busting) Inserting randomness designed to make message hashing hard
  • TA (Tokenization Avoidance) Preventing a filter from tokenizing a message
  • UH (URL Hiding) Hiding a URL so that a user is fooled into clicking an incorrect link
  • UO (URL Obfuscation) Making it hard for a filter to identify a URL and check it against a black list
  • WB (Web Bugs) Inserting a beacon that tells the spammer that a message has been read
The following technologies would be recognized in the naming scheme:
  • CSS Use of CSS
  • HTML Any HTML without using CSS
  • Javascript Use of Javascript for trickery
  • MIME Manipulation of MIME
  • Plain Plain text
For example, the original Invisible Ink trick written using HTML would be referred to as GWI!Invisible!HTML and a CSS variant would be GWI!Invisible!CSS. Names would only be generated for tricks actually seen in the wild.
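
As an illustration of how a filtering product might handle such names, here is a minimal Python sketch that splits a name into its three parts and checks the purpose and technology against the lists above. The names TrickName and parse_trick_name are hypothetical, chosen only for this example; nothing here is part of TSC or any existing tool.

```python
from typing import NamedTuple

# Purpose codes and technologies as listed in this proposal.
PURPOSES = {
    "BWO": "Bad Word Obfuscation",
    "GWI": "Good Word Insertion",
    "HB": "Hash Busting",
    "TA": "Tokenization Avoidance",
    "UH": "URL Hiding",
    "UO": "URL Obfuscation",
    "WB": "Web Bugs",
}
TECHNOLOGIES = {"CSS", "HTML", "Javascript", "MIME", "Plain"}


class TrickName(NamedTuple):
    """Hypothetical container for one purpose!name!technology triple."""
    purpose: str
    name: str
    technology: str

    def __str__(self) -> str:
        return f"{self.purpose}!{self.name}!{self.technology}"


def parse_trick_name(text: str) -> TrickName:
    """Split a name on '!' and validate the purpose and technology parts."""
    parts = text.split("!")
    if len(parts) != 3:
        raise ValueError(f"expected three !-separated parts: {text!r}")
    purpose, name, technology = parts
    if purpose not in PURPOSES:
        raise ValueError(f"unknown purpose: {purpose!r}")
    if not name:
        raise ValueError("empty trick name")
    if technology not in TECHNOLOGIES:
        raise ValueError(f"unknown technology: {technology!r}")
    return TrickName(purpose, name, technology)


trick = parse_trick_name("GWI!Invisible!CSS")
print(trick.name, "is a", PURPOSES[trick.purpose], "trick using", trick.technology)
```

Rejecting unknown purposes and technologies is a design choice for this sketch; a real tool might prefer to accept them so that the scheme can grow.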

With such uniform naming it would be possible to analyze spams and phishes (perhaps even with a specific Perl recognizer written for each trick) and then build up trends over time to see how individual tricks and whole classes of tricks are changing.
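
To make the trend idea concrete, here is a rough Python sketch that counts sightings per purpose for each time period. It assumes some recognizer has already produced (period, trick name) pairs; the function purpose_trends and the sample observations are made up for illustration and are not real measurements.

```python
from collections import Counter, defaultdict


def purpose_trends(observations):
    """Count sightings per purpose code for each period.

    observations: iterable of (period, trick_name) pairs, where
    trick_name follows the purpose!name!technology scheme.
    """
    trends = defaultdict(Counter)
    for period, trick_name in observations:
        purpose, _name, _technology = trick_name.split("!")
        trends[period][purpose] += 1
    return trends


# Made-up observations, purely illustrative.
sample = [
    ("2006-05", "GWI!Invisible!HTML"),
    ("2006-05", "BWO!Space!Plain"),
    ("2006-06", "UH!TreasureMap!HTML"),
    ("2006-06", "UH!Caption!HTML"),
    ("2006-06", "GWI!Invisible!CSS"),
]

trends = purpose_trends(sample)
for period in sorted(trends):
    print(period, dict(trends[period]))
```

The same grouping could be done on the full name or on the technology part to follow individual tricks rather than classes of tricks.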

Currently, TSC contains 55 tricks, although I'm not sure that all of them are suitable for renaming. Here's my proposed naming of the current state of TSC:

The Big Picture TA!BigPicture!HTML
Invisible Ink GWI!Invisible!HTML and GWI!Invisible!CSS
The Daily News GWI!BigTag!HTML
Hypertextus Interruptus BWO!Interruptus!HTML
Slice and Dice TA!SliceNDice!HTML
Lost in Space BWO!Space!Plain
Enigma UO!Enigma!HTML
Script Writer TA!Script!Javascript
Ze Foreign Accent BWO!Accent!Plain
Speaking in Tongues HB!Tongues!Plain
The Black Hole BWO!BlackHole!HTML
A Numbers Game BWO!Numbers!HTML
Bogus Login UO!BogusLogin!HTML
Honey, I Shrunk the Font GWI!ShrunkFont!HTML
No Whitespace, No Cry TA!NoWhitespace!Plain
Honorary Title GWI!Title!HTML
Camouflage GWI!Camouflage!HTML
And in the right corner HB!RightCorner!Plain
A Form of Desperation GWI!Form!HTML and BWO!Form!HTML
It's Mini Marquee! GWI!Marquee!HTML
You've been framed BWO!Framed!HTML
Control Freak TA!ControlFreak!Plain
Don't Cramp My Style GWI!Style!CSS
The Microdot BWO!Microdot!CSS
WYSI_not_WYG UH!WYSINotWYG!Javascript
Ultra See Enigma
Internet Exploiter UH!InternetExploiter!HTML
Style Wars: Episode 1 Included in other tricks
The tURLing Test UO!TurlingTest!Plain
Flex Hex BWO!FlexHex!CSS
Sound of Silence WB!Silence!HTML
Blankety Blank BWO!BlanketyBlank!HTML
Doing the Splits BWO!Splits!Plain
But is it art? BWO!ASCIIArt!Plain
Absolute Zero Same as Control Freak
Spell Breaker BWO!Splelnig!Plain
About Face BWO!AboutFace!HTML
Catch a Wave TA!Wave!HTML
Treasure Map UH!TreasureMap!HTML
You cannot be serious UO!Mcenroe!HTML
The Matrix TA!Matrix!Plain
Sticky Fingers BWO!StickyFingers!Plain
Floatation Device TA!Floatation!CSS
The Small Picture TA!SmallPicture!HTML
Chop GUI TA!ChopGUI!HTML or perhaps HB!ChopGUI!HTML
Big Header-ed ? Not sure of the purpose of this perhaps TA?
The Rake BWO!TheRake!CSS
Now you see it; now you don't BWO!Copperfield!CSS
Slick Click Trick UH!Caption!HTML
Whiter shade of Pale TA!Pale!HTML

This list is in order of discovery. It's interesting to see the rise of UH (URL Hiding) tricks as phishing has grown.


nih said...

It's a damn good idea. I hope this somehow comes about into mainstream usage. I'll be bookmarking it so I can say "I was there first" when my nerdy grandkids come into existence.

Anonymous said...

.... has a small drawback:

makes it easier for spammers to identify what's hurting them and what they should watch for without reading code.

"creative" naming makes it harder to decipher, having to read the rules.

J.D. said...

Part of the confusion out there stems from the fact that many people already don't understand what we anti-spam experts are talking about. Will it help to introduce new, even more obscure acronyms?

Kevin W. Gagel said...

I like your idea. I like that it could be used for trending. Trending is a good way to see what is coming and what we might have to do for prevention.

I don't think making it easier to identify what causes a spammer's mail to be tagged is much of a problem. They still have to work to bypass the filters.

As for acronyms, they don't help (or hurt) but prediction based on trends will help any administrator who is actively working on the prevention side of things.

You've got my vote.

Sorin said...

Hi John,

The question here is:
What are you trying to achieve?

Are you trying to get all anti-spam products to name the spam tricks in the same way? This most probably won't work.

Are you trying to give not-so-technical people a way to play with these names? This might work.

Here is a link which I'm sure you know:
The Wildlist organization has been trying for over 8 years to make malware naming uniform. So far, they haven't succeeded.

I suggest for spam techniques something more like CME:

We give a short unique identifier (and not name) to each method and then a friendly name (which Marketing will love!) derived from TSC.
This way, we can be sure that we are always talking about the same thing and also Marketing can play with things like "hypertextus interruptus" as much as they like.

The UID could be derived according to some algorithm which has TSC as its basis. (I am thinking of something like a table with all the tricks from TSC, in which we could 'check' which features are present in a spam/phishing message. The result would be the UID.)

My 2 (euro)cents.


John Graham-Cumming said...

1. I'm not worried about making it harder for the general public to understand spammer tricks. The naming scheme is not designed to deal with people who aren't in the industry.

2. I don't think that worrying about wildlist is a problem. The anti-virus industry has come up with separate names for viruses because of a marketing need. I don't see the same need in anti-spam and hence the tricks could be unified under one name if we start now.

Secondly, IMHO the MITRE scheme is horrible. Oh, yeah, who recalls CME-328? Saying Bagle is much clearer.

My scheme has the benefit of an easily remembered name (e.g. Camouflage) and a strict name that can be used by different people in the industry.


Paul Maddox said...

Hi John,

Looks good. I have two suggestions:

1. Rather than having multiple names, eg.


how about allowing them both separately, or together either as:




2. I know it's minor, but I think the ! makes the names harder to read. How about a . instead? I know it's a bit virus-like, so maybe a hyphen or something..





Keep up the good work.