Wednesday, December 17, 2008

How to write an InfoWorld article

While browsing Hacker News I came across a submission that read like a press release, but was presented as a story. It didn't take much searching to find the associated press release.

Here's a breakdown of the original article (in blue) and the similar text from the press release (in red).

SensAble Technologies, provider of haptic devices, applications, and toolkits, is offering OpenHaptics 3.0, a software development kit to simplify touch-enabling of computer applications.

SensAble Technologies, the leading provider of haptic devices, applications and toolkits, announced the immediate availability of OpenHaptics® version 3.0, a software development toolkit that dramatically simplifies and speeds the touch-enabling of computer applications.

New categories of developers such as scientists and simulation and training providers can add a sense of touch to their applications, the company said. The product works with SensAble Phantom force-feedback haptic devices, which simulate, for example, the feeling of tissues in a human body by pushing back on a user's hand.

SensAble’s enhanced toolkit opens the door to entirely new categories of developers, such as scientists and simulation and training providers, who want to add a realistic sense of touch to their applications, but lack the years of advanced programming and haptics expertise previously required to do so.

The upgraded kit features the QuickHaptics micro API, enabling users with a basic familiarity of C++ to add kinesthetic feedback to what is seen or heard on a computer screen.

Designed to work with SensAble’s PHANTOM® force-feedback haptic devices, the release includes the new QuickHaptics™ micro API, which enables any professional with even passing familiarity with C++ to quickly and easily add kinesthetic feedback to what users see and/or hear on a computer screen.

(Earlier this year, Samsung launched a haptic cell phone.)

The API streamlines three types of complex programming, including operating system-specific windowing, scene graph management, and force rendering in haptics threads, SensAble said. An application to touch and manipulate a 3-D model can be written with 8 lines of programming code instead of 300 lines, the company said.

The new QuickHaptics micro API in OpenHaptics 3.0 greatly streamlines the three distinct types of complex programming typically required when writing a haptics application: operating system-specific windowing, scene graph management, and force rendering in haptics threads. For example, with QuickHaptics, a simple application to touch and manipulate a 3D model can be written with just 8 lines of programming code – instead of 300.

For medical, scientific, and training applications, Version 3.0 supports faster addition of exceptional realism, which is a virtual environment in which users can add touch to applications.

Faster and easier addition of exceptional realism, which is vital to medical, scientific, and training applications.

As an example of how the product works, a developer with no graphics or haptics experience could use QuickHaptics to prototype a training application for veterinary students, SensAble said. The developer could import 3-D models of an animal's anatomy and assign haptic material properties allowing a trainee to use the Phantom device to feel the difference between healthy and diseased organs.

For example, a developer without any graphics or haptics experience could use QuickHaptics to prototype a training application for veterinary students. Using QuickHaptics, the developer could easily import all the necessary 3D models of an animal’s anatomy and then quickly assign haptic material properties, allowing the trainee to use the PHANTOM device to literally “feel” the differences between healthy and diseased internal organs.

"Haptics experts will find QuickHaptics to be invaluable in helping them add virtual touch to their applications in innovative ways. On the other hand, developers who have no experience with haptics programming can easily get to work and be productive quickly," said David Chen, chief technology officer of SensAble, in a statement released by the company.

“Haptics experts will find QuickHaptics to be invaluable in helping them add virtual touch to their applications in innovative ways. On the other hand, developers who have no experience with haptics programming can easily get to work and be productive quickly.”

Version 3.0 also features the ability to build mashups, which combine programming code from various sources into existing applications. Reuse of source code is enabled.

Perform mash-ups – combining programming code from various sources into existing applications

Users also can load 3-D models with textures in standard formats using a single command. There is no need to convert models into specialized file formats before haptic programming.

For example, users can load 3D models with textures in a variety of widely adopted, standard file formats using a single command – eliminating the need to convert models into specialized file formats prior to haptic programming.

OpenHaptics 3.0 is available for 32-bit Windows XP and Vista for $950 per seat for commercial developers. It is free for academic developers. Current OpenHaptics commercial customers on software maintenance contracts get the software at no additional charge.

OpenHaptics v3.0 for Microsoft Windows® 32-bit XP and Vista is available now and is priced at $950 (USD) per seat for commercial developers, and is available at no charge to academic developers and SensAble’s OpenHaptics commercial customers with active software maintenance contracts.

Linux and 64-bit versions will be available early next year.

Linux® as well as 64-bit versions will be available in early 2009.

So the entire article is a small diff from the press release. To be fair to the article's author he did repeatedly add "the company said".

More on the 7 point scale

Some time ago I wrote about my love of seven point scales for measuring things (especially human attitudes). In the original article I mentioned John Ousterhout's scale for hiring people and Kinsey's scale of human sexuality.

I've come across a few more seven pointers:

1. In The God Delusion Dawkins presents a seven point scale for measuring how much you are a theist or atheist.

2. Then there's the Bristol Stool Scale describing the seven types of human faeces.

3. And finally, The International Banana Association's ripeness scale.

Got any more?

Friday, December 12, 2008

POPFile v1.1.0 Released

There's a new POPFile out (v1.1.0) and I had almost nothing to do with it. This is the first release where the new (global) POPFile Core Team did all the work. Thanks Brian in the UK, Joseph in the US, Manni in Germany and Naoki in Japan. A truly global effort.

As part of the v1.1.0 release POPFile has moved from SourceForge to its own server and has a totally new web site.

v1.1.0 also includes some great new features: it is the first to use a
SQLite 3.x database and it is the first to offer a Mac OS X installer in addition
to the usual cross-platform and Windows installer versions.

And there are a raft of bug fixes as well which you can read about in the release notes.

Spaces are a pain in painless non-recursive Make

In my book GNU Make Unleashed I published a pattern for doing Make without having a recursive descent into directories. It works well and I know that many people are using it.

But the other day I received an email from Terry V. Bush at VMware saying that he had trouble with it because of 'the third-party problem'. The third-party problem is my name for the problem that occurs when your beautifully written Make system has to incorporate some wart of source code from a third-party vendor. In Terry's case that third party's code has spaces in its path names.

Spaces in path names are a real bind in Make (that's another topic I cover in GNU Make Unleashed) and Terry really wanted to use my non-recursive Make pattern but needed to handle this ugly third-party code.

I'll let him continue the story...

What happens is that if you have a directory name with a space in it your functions fail to find the root. Also, they always walk the entire tree up to the top even after they have found the root of the tree. Here is the version published in "GNU Make Unleashed":

sp :=
sp +=

_walk = $(if $1,$(wildcard /$(subst $(sp),/,$1)/$2) \
$(call _walk,$(wordlist 2,$(words $1),x $1),$2))

_find = $(firstword $(call _walk,$(strip $(subst /, ,$1)),$2))
_ROOT := $(patsubst %/root.mak,%,$(call _find,$(CURDIR),root.mak))

What I have done to solve these two issues is:

1: Add an "if" that returns when the root is found. This actually makes other parts of this function simpler. It also makes it slightly faster, albeit very slightly...

2: Substituted a "|" char (any char that is highly unlikely to be in a real directory name will work) for each space in the path and then put the spaces back when necessary.

Also, to simplify things a little, I added an eval that puts the result of wildcard into a temp var "_X" so that returning it when found is trivial.

sp :=
sp +=
_walk = $(if $1, \
$(if $(eval _X=$(wildcard /$(subst |,\$(sp),$(subst \
$(sp),/,$1))/$2))$(_X),$(_X), \
$(call _walk,$(wordlist 2,$(words $1),x $1),$2)))
_find = $(call _walk,$(strip $(subst /,$(sp),$(subst $(sp),|,$1))),$2)
_ROOT := $(patsubst %/root.mak,%,$(call _find,$(CURDIR),root.mak))

My plan for this is to combine your "Painless non-recursive Make" with Paul D. Smith's "Advanced Auto-Dependency Generation" Make code to produce a fast and extensible Make environment for products at VMware.


Nice, and "Advanced Auto-Dependency Generation" is also covered in GNU Make Unleashed.
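For readers who find the Make functions hard to follow, here is a minimal Python sketch of the same idea: walk upwards from the current directory and stop at the first directory containing root.mak. It is not part of Terry's solution or the book. The function name find_root and its parameters are assumptions made for illustration. Because the path is handled as a path object rather than a space-separated word list, spaces in directory names need no special treatment here.

```python
from pathlib import Path
from typing import Optional


def find_root(start: Path, marker: str = "root.mak") -> Optional[Path]:
    """Return the nearest directory at or above `start` that contains `marker`.

    Hypothetical helper: mirrors the early-exit behaviour of the revised
    _walk function, which stops as soon as the marker file is found.
    """
    start = start.resolve()
    # Check the starting directory first, then each parent up to the top.
    for directory in (start, *start.parents):
        if (directory / marker).is_file():
            return directory  # stop at the first (deepest) match
    return None


if __name__ == "__main__":
    print(find_root(Path.cwd()))
```

Run from anywhere inside a source tree, this prints the directory holding root.mak, or None if there isn't one.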

Friday, December 05, 2008

Help save Bletchley Park: donate a power of 2

Bletchley Park in the UK needs money to keep operating. For any self-respecting geek there are a few places of pilgrimage in the world, and Bletchley is near the top of the list.

Not only did code breakers crack Enigma and other codes during the Second World War at Bletchley, but it was where much early computing work was done by great scientists including the man: Alan Turing.

Bletchley accepts donations on their web site via PayPal or credit cards. If you want to help may I suggest taking a look at your finances and donating a power of 2?

I just logged into my PayPal and donated the maximum I could afford: 2^8.

See if you can donate $1, $2, $4, $8, or more. And pass on the link to this blog posting to get the 'A Power of 2 for Bletchley' meme going.

Thursday, December 04, 2008

"The Geek Atlas" gets a proper home page

O'Reilly has put up the permanent home page of my book "The Geek Atlas" here. It has more details of the book:

With this unique traveler's guide, you'll learn about 128 destinations around the world where discoveries in science, mathematics, or technology occurred or is happening now. Travel to Munich to see the world's largest science museum, watch Foucault's pendulum swinging in Paris, ponder a descendant of Newton's apple tree at Trinity College, Cambridge, and more. Each site in The Geek Atlas focuses on discoveries or inventions, and includes information about the people and the science behind them.

(Click through for even more information).

I've also created a Flickr group, Geek Atlas, to which people who've visited sites in the book can add their photographs. The following sites have been publicly revealed at this point:

  1. Experimental Breeder Reactor 1

  2. Bletchley Park in the UK, where the Enigma code was broken

  3. The Horn Antenna in Holmdel, New Jersey, where the Big Bang theory was accidentally confirmed

  4. The Trinity Test site in New Mexico, where the first atomic bomb was exploded

  5. The Alan Turing Memorial in Manchester, England

  6. The National Cryptologic Museum in Fort Meade, Maryland

  7. The Joint Genome Institute in Walnut Creek, California


121 more great places to visit in the book.

Monday, December 01, 2008

"The Geek Atlas" mailing list

A number of people have mentioned to me that they've preordered my book The Geek Atlas from Amazon. If you are interested in the book, but don't want to go for a preorder, then I'm setting up a mailing list so that you can hear from me with further details as they become available.

To get on the mailing list send mail to



and I will mail you with news about its availability.

Tuesday, November 25, 2008

Writing my own bio

I came across the Amazon.com entry for my forthcoming book The Geek Atlas: 128 Places Where Science and Technology Come Alive and it contains an ugly bio for me. I asked the publisher to change it, to which they said... please write a new bio for yourself.

Ugh. It's hard enough writing a book without coming up with something about yourself. Finally, I submitted:

John Graham-Cumming is a wandering computer programmer who has lived in the UK, California, New York and France. Along the way he's worked for a succession of technology start-ups, created the award-winning open source POPFile email program and written articles for publications such as The Guardian newspaper, Dr Dobbs, and Linux Magazine. His previous effort writing a book was the obscure and self-published computer manual 'GNU Make Unleashed' which saturated its target market of 100 readers in a matter of just months.

He has a doctorate in computer security but has forgotten everything he wrote in his thesis and is now deeply suspicious of people who insist on being called Dr., but he doesn't mind if you refer to him as a geek. He is the proud owner of a three-letter domain name where he hosts his web site: http://www.jgc.org/.

Given that I now realize that people get to write their own bios on the back of books it really makes me wonder about some of the pieces of outright puffery that authors come up with.

One author I know describes himself as a 'world-renowned researcher' on a topic that he appears to know almost nothing about.

Wednesday, November 19, 2008

Testing book titles using Google AdWords

My 'travel book for nerds' book, The Geek Atlas: 128 Places Where Science and Technology Come Alive, will be published in April 2009 by O'Reilly. As part of the process of writing the book I had to come up with a title. I had three titles that I liked: A Voyaging Mind, A Mind Forever Voyaging and A World of Discovery.

Ultimately, O'Reilly came up with the current title after doing their own market research, but before that I wanted to figure out which of the three titles would work best.

To do that I bought ads on Google AdWords that were relevant to the book (such as when people search for 'science museum') and set up three ads that would appear randomly. The ads all had the same text except for the main title which was one of the three possible book titles.



I let the campaign run for 30 days and then analyzed the results to see which one had the greatest clickthrough rate. There was a clear winner: A Voyaging Mind.



And for a long time A Voyaging Mind was going to be the book's title.

It seems to me that Google AdWords could readily be used for other such experiments: it's cheap, it's simple to target your experiment based on keywords so that you can choose the type of people exposed to the experiment and by setting up random display of a set of ads you can try out variations of an idea easily.

Obviously book titles are just one possibility. What other things could be tested using Google AdWords?
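The analysis step is small enough to sketch in a few lines of Python. This is only an illustration of comparing clickthrough rates across ad variants. The function names and the counts in the example are invented placeholders, not figures from my campaign.

```python
from typing import Dict, Tuple


def clickthrough_rate(clicks: int, impressions: int) -> float:
    """Clicks divided by impressions; 0.0 if the ad was never shown."""
    return clicks / impressions if impressions else 0.0


def best_variant(results: Dict[str, Tuple[int, int]]) -> str:
    """Return the variant name with the highest clickthrough rate.

    `results` maps a variant name to (clicks, impressions).
    """
    return max(results, key=lambda name: clickthrough_rate(*results[name]))


if __name__ == "__main__":
    # Placeholder numbers purely for illustration.
    example = {
        "Title A": (10, 1000),
        "Title B": (30, 1000),
        "Title C": (20, 1000),
    }
    for name, (clicks, shown) in example.items():
        print(name, f"{clickthrough_rate(clicks, shown):.2%}")
    print("Winner:", best_variant(example))
```

With real data you would also want enough impressions per variant that the difference isn't just noise.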

Friday, November 07, 2008

I am hiring in London

Are you a really good web-based UI developer? You've worked on rich Internet applications using technologies like AJAX, Flex, Flash or AIR?

Do you want to work in a small, venture-backed start up that's building a rich web-based user interface used to navigate terabytes of data?

You are based in London?

Email me.

Thursday, October 09, 2008

Phishers are using economic problems to catch the unwary

It's hardly surprising, at least to anyone who's spent time looking at phishing scams, but the recent economic turmoil has led phishers to get creative. Here's an example email that preys on the unwary by exploiting the Wachovia/Citibank merger.



Once you visit the site you are asked to download an executable (which actually starts downloading automatically via a refresh after 15 seconds).



The executable contains a nasty piece of work: Mal-EncPk/BU.



I didn't go further and actually unpack the executable to find out what kind of nastiness it contains, but there's plenty of it to go around.

Thursday, October 02, 2008

Introducing The Equationater

I had an itch that needed scratching... I needed to render equations on the web and I didn't want to rely on something like MathML because of poor browser coverage.

So, I created The Equationater. Type in an equation in LaTeX format and it is instantly turned into a PNG file that you can download or link to.

Here's a quick example:



Note. The Equationater is an equal opportunity renderer... it doesn't discriminate against inequalities.

Saturday, September 27, 2008

Monday, September 15, 2008

Dear Nature

Dear Nature,

I'm doing some historical research and needed to read Chadwick's short paper called Possible Existence of a Neutron where Chadwick posited the existence of the neutron from the radiation emitted by beryllium when an alpha particle hits it. This paper was written in 1932... 76 years ago.

So I go to your web site and you have it available from the archive. That's great! But you want to sell it to me. And you want to sell it to me for $32. How do you justify selling a PDF of a 76 year old paper that contains just over 700 words for $32?

As a point of comparison the neutron was reported in the New York Times in 1932. I was able to buy a copy of that article (all 872 words) for $3.95.

But the New York Times is hardly a journal of record for scientists; Nature is. Why are your archives not either free or open for a reasonable fee?

Chadwick won the Nobel Prize in 1935 for this discovery and his lecture on the subject is available for free online. But you still insist on $32 (almost a nickel a word) for the original paper.

PS. You need to fix your web site. It states the price for this article as:



Perhaps spend part of the $32 on that.

PPS. The entire text of Chadwick's article is available on line. Here, here and here. So what does $32 get me? Oh, right, the Nature logo on a PDF.

Friday, September 12, 2008

The Ultimate Nerd Honeymoon

There's been a major gap in this blog because I'm in the midst of writing a book for O'Reilly. As part of the research on the book I came across the ultimate nerd honeymoon.

In 1812, Sir Humphry Davy, the British chemist and inventor who is best remembered today for the Davy lamp used in mines, married.

In October 1813 Davy and his wife set off on a honeymoon across Europe. First stop was Paris to pick up a medal from Napoleon. But Davy needed a valet to help out, so he took Michael Faraday with him. That way Davy and Faraday could perform experiments along the way.

While in Paris they got together with Joseph Louis Gay-Lussac (of Gay-Lussac's Law) and showed that iodine was an element. And André-Marie Ampère stopped by for a chat.

Off they went to Italy to hang out with Alessandro Volta. They also did an experiment, setting fire to a diamond using the sun's rays, and demonstrated that a diamond is made of carbon.

The honeymoon lasted 18 months.

Sunday, July 20, 2008

Switch off the light when you are not in the room, John!

The machinations of governments never cease to amaze me. This week I read a stunning article---stunning to anyone whose mother told them to switch off lights when they are not in the room---about UK government plans to green its IT.

The article, entitled Whitehall to become carbon neutral with aid of smart PCs, contains statements to make the eyes of any regular computer user water:

The proposals, including desktop computers that switch themselves off if they are inactive for too long, are aimed at making energy consumption from all of Whitehall's information and communication technology carbon neutral by 2012.

and

"Turning off every desktop PC in central government for the 16 hours that fall outside the standard working day could save up to 117,500 tonnes of CO2 per year," a Cabinet Office briefing document says.

Wait a minute. So when a central government civil servant goes home they just get up and walk away from their computer, leaving it on for the next 16 hours. Wow. The actual briefing document has more information on the radical proposal: Greening Government ICT (BTW, if you have a hard time reading that link it's because the UK government decided that a suitable file extension for a PDF is .ashx).

There are a number of things that are pretty amazing about this:

1. The government's own guidelines for consumers (see here) start with Turn off your appliances – don’t leave them on standby.

2. Computers have had this new found 'smart PC' ability for 16 years. Intel's nice document on the history of power management lays it out pretty clearly. The first power management functionality was defined in 1992 (called APM 1.0) and introduced by Microsoft in Windows 3.1. The really advanced version of power management was introduced in 1997 (ACPI 1.0). So the UK government has had between 11 and 16 years to make their machines shut themselves off.

3. The document also proposes reducing the use of screensavers. There's nothing wrong with screensavers if you are doing power management in the first place. You can have a few minutes of screensaver and then the power management kicks in and shuts off the display, or shuts down the machine.

4. Even if you weren't awake for the last 16 years of power management innovation you could have just followed your mother's advice and turned the computer off when you went home. How hard is that?

It really only leaves me with one possible explanation: civil servants don't know how to turn their PCs on, so they have to be left on all night.

The article also states a typical government solution to the problem:

A government source told the Guardian that a centralised system would switch off computers detected as inactive.

Huh? I can understand centralized management of settings, but why can't a computer just do its own detection of whether it's idle or not? (This is possibly just a journalist not understanding the details).

There's also a security implication to all this. If PCs are left on all night, are civil servants actually logging off?

Thursday, July 10, 2008

Photoshopped Iranian Missile Launch

Some time ago I wrote about my implementation of an algorithm for detecting copy/paste for doctoring photographs. Today the New York Times reports that a photograph of an Iranian missile launch appears to have been doctored.

Here's the picture:



I ran it through my code and it quickly shows up large chunks of smoke that appear to have been displaced:







Tweaking the knobs on the algorithm would probably show more smoke copying, but I don't have the free time.

Wednesday, July 09, 2008

Holiday ideas for the geek?

Summer is upon us (at least in the Northern Hemisphere) and I'm looking for holiday ideas. But not just any old holiday.

Where would you go to geek out on something mathematical, scientific, or technological? I'm not looking for the run of the mill (like the London Science Museum), but something exceptional (like a trip to the site of the Trinity Test).

Have you been somewhere really nerdy? I'd like to know.

Sunday, June 29, 2008

Advice to a young programmer

I received a mail from an acquaintance who'd come to the realization that his 13-year-old wanted to be a programmer, specifically a games programmer. Here's the advice I gave. Perhaps others have things to add:

1. I'm tempted to tell you that the right way to learn to be a programmer is to start with LISP, or the lambda calculus, or even denotational semantics but you can come back to those after a few years getting your feet wet.

2. Lots of programming involves logic (or at least thinking logically) so learning about and enjoying logic is probably a good foundation. You could start by learning about boolean algebra since it's simple and fun and the basis for a lot of what computers do.

3. Since games programming involves a lot of physics, you should also learn about Newton's Three Laws and Universal Gravitation and play around with things like springs and pendulums.

4. Basic trigonometry is important to the games programmer. It'll be handy to know about Pythagoras and the relationship with sin, cos and tan.

5. Above all, start with a programming language and a good book and commence hacking: try stuff out, make little simple programs (even if it's a program that prints out "Hello" on the screen, or a program that prints out "Hello" ten times, or asks you for the number of times to print "Hello" and then does it). Just write code, whatever takes your fancy.

6. A good starting language is Python. Get the O'Reilly book Learning Python.

7. Python is dynamic so you'll be able to make progress very quickly, but for games programming you are probably going to need to get a little closer to the machine. And for that you should learn C by reading the classic The C Programming Language.

8. As you learn more there are some great books that will expand on what you can do: read Programming Pearls and The Practice of Programming. Think about getting: Algorithms in C. Read Structure and Interpretation of Computer Programs.

9. Also: avoid debuggers, learn to unit test. Debuggers are useful in limited circumstances; most code can be debugged by using your head and a few 'print's. Unit tests will save your life as you go forward.

10. When you are ready, try to write a version of the first ever computer game: Spacewar!

...

11. When your first company goes public think of me; I'll be an old man and probably won't have saved enough for retirement.

Friday, June 20, 2008

RPNBuddy returns as RPNChat

A long, long time ago I created an IM buddy called RPNBuddy that implements a reverse polish notation calculator as a chat bot. It ran for a while on one of my machines but didn't get a whole lot of use.

A few months ago, Hans Nordhaug, an associate professor at Molde University College in Norway, wrote to ask what had happened to RPNBuddy. I offered him the source code under the General Public License and he readily accepted, improved it and has now relaunched the service as RPNChat.

Connect to the RPNChat buddy on AIM and you can use it as a calculator. Here's a session of me calculating Roger Bannister's average speed in mph when he ran the under 4 minute mile in 1954 (3 min 59.4 s).
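For anyone who hasn't used a reverse polish notation calculator, here is a minimal Python sketch of how one evaluates an expression with a stack. It is not RPNBuddy's or RPNChat's source code. The function name rpn and the exact keystrokes are assumptions chosen to illustrate the Bannister calculation (3,600 seconds in an hour divided by the 239.4 seconds he took for the mile).

```python
import operator

# Supported binary operators; a real calculator would offer many more.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def rpn(expression: str) -> float:
    """Evaluate a space-separated reverse polish notation expression."""
    stack = []
    for token in expression.split():
        if token in OPS:
            # Operators consume the top two values: left was pushed first.
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[token](left, right))
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]


if __name__ == "__main__":
    # 3 min 59.4 s in seconds is 3 * 60 + 59.4; mph is 3600 / seconds.
    print(rpn("3600 3 60 * 59.4 + /"))
```

The answer comes out at roughly 15 mph.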

Friday, June 06, 2008

The Colarie: A new way of measuring calorie intake

Recommended daily energy intake for a man is generally considered to be roughly 2,500 Calories (or kilocalories: 1 Calorie = 1,000 calories) and for a woman it's 2,000. The problem with those figures is that they are rather abstract. If you are trying to count your energy intake it would be much easier to deal with something smaller and easier to understand.

Hence my idea for the Colarie.

1 Colarie is the number of Calories in a single can of non-diet Coca Cola. It's easy to appreciate that a single can of Coke isn't very good for you and so comparing a food stuff to a can of Coke is an easy measure of whether you are eating something that's got too much fat or sugar in it.

The actual Calorie count for a Coke can varies by country. In France there are 139 Calories in a can, in the US there are 155. So I've settled on 147 as a good measure. So 1 Colarie = 147 Calories.

That means a man needs to consume the equivalent of 17 cans of Coke per day; for a woman it's about 13.6 cans of Coke per day. That isn't a recommended diet, however!

So next time you are faced with a snack bar, use the Colarie measure. Just the other day I was presented with a small biscuit to go with a cup of tea on a BA flight. Looking at the Calorie count it was around 230 Calories for this tiny biscuit. That's 1.5 Colaries!

I didn't eat it.
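The Colarie arithmetic is easy to check. Here is a small Python sketch using only the figures quoted above (139 and 155 Calories per can, averaged to 147); the names are my own invention for illustration.

```python
FRANCE_CALORIES_PER_CAN = 139
US_CALORIES_PER_CAN = 155

# The Colarie is defined as the midpoint of the two figures.
CALORIES_PER_COLARIE = (FRANCE_CALORIES_PER_CAN + US_CALORIES_PER_CAN) / 2


def to_colaries(calories: float) -> float:
    """Convert Calories (kilocalories) to Colaries (cans of Coke)."""
    return calories / CALORIES_PER_COLARIE


if __name__ == "__main__":
    for label, calories in [("man, daily", 2500),
                            ("woman, daily", 2000),
                            ("BA biscuit", 230)]:
        print(f"{label}: {to_colaries(calories):.1f} Colaries")
```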

Thursday, June 05, 2008

GNU Make Unleashed release

For 4 years I've written the Ask Mr Make column over at CM Crossroads (and I continue to write it). Since there's been great interest in the column, I've put together all 4 years of columns plus additional unpublished material as a book and ebook.

All the material has been rechecked for accuracy, errata have been incorporated and the text re-edited. The result is a 230 page book covering everything from basics of GNU Make to advanced topics like eliminating recursive make, doing arithmetic in GNU Make or dealing with spaces in file names.





The book contains 43 separate articles about GNU Make, plus a complete reference to the GNU Make Standard Library.

You can buy a copy in either form here.

A big thank you to everyone who's commented, emailed, or made suggestions on my GNU Make articles over the years.

Monday, May 26, 2008

POPFile v1.0.1 released plus a glimpse of the future

POPFile v1.0.1 was released today; this is the first ever POPFile release that I didn't do. POPFile is now being managed by a core team of developers: Manni Heumann (in Germany), Brian Smith (in the UK), me (in France), Joseph Connors (in the US) and Naoki Iimura (in Japan). A truly international effort. The actual release binaries were built by Brian Smith who, for a long time, has been the installer guru.

This release contains minor feature improvements and a number of bug fixes. Some of the bug fixes were for annoying bugs that showed up only occasionally: that makes it a worthwhile upgrade.

Since I pulled back from being involved in every detail of POPFile's evolution the core team has been liberated to work on the project. v1.0.1 is their first release, and it is minor, but much greater things are coming:

1. A native Mac installer

2. A SOHO version of POPFile. Some time ago I did most, but not all, of the work to make a multi-user version of POPFile. That work is being completed by the core team and will allow a single POPFile installation to be shared by multiple users.

Thank you to the POPFile Core Team for this great start to a new chapter in POPFile history.

Monday, May 19, 2008

A post (anti-spam-) retirement note

One of the anti-spam companies I was/am involved with, MailChannels, made an interesting announcement recently about a commercial offering for SpamAssassin. What makes the announcement interesting to me is that Justin Mason (who wrote SpamAssassin) is also an advisor to MailChannels.

The program, Traffic Control 3 for SpamAssassin, is a free download and for sites that process less than 10,000 messages per day there's no charge at all (and no need to go and get a license from MailChannels).

Basically, the new product acts as a front-end to SpamAssassin, traffic-shaping incoming messages so that load is taken off SpamAssassin and the mail server.

Saturday, May 17, 2008

Breaking the Fermilab Code

A story appeared on Slashdot about a mysterious fax received at Fermilab written in an unknown code. The full story is here. I looked at it and immediately noticed a few things:

1. The first part looked like ternary (base 3) with digits 1 (|), 2(||) and 3(|||).

2. The last part looked like binary with digits 1(|) and 2(||)

3. The middle bit looked like either a weird substitution code, or I wondered if it might be machine code.

4. In the last part the digit 2 (||) never occurs more than once, perhaps it was actually a separator and the last part is not binary.

The first step was to convert the bars into numbers. Here's a copy of my marked up print out:



The first part has the numbers (or at least I thought):

323233331112132
333231322123312
111331132312233
333212123213113
311333313331111
211333323232211
232313331121231
33231312

Noticing this had 113 digits (which is a prime number) I went off on a wild goose chase around primes, and then around the interpretation of this number in hexadecimal as a string in ASCII, Unicode or binary... waste of time.

Then I started thinking about ternary again and wrote down the largest ternary numbers that can be expressed with 1, 2, 3, ... digits:

2 (base 3) = 2 (base 10)
22 (base 3) = 8 (base 10)
222 (base 3) = 26 (base 10)
2222 (base 3) = 80 (base 10)

One of those stood out: with three digits the maximum number is 26 and there are 26 letters in the alphabet! Then the only question was how to map the three digits used in the code (1, 2, 3) to the three ternary digits (0, 1, 2).

To simplify things I wrote a small Perl program that tries out all the possible mappings and outputs the ternary interpreted as a string (with 001 = A, etc.):

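use strict;
use warnings;

# The digit string from the message (1s, 2s and 3s) is the first argument.
my $top = $ARGV[0];

# Rename the message's digits to placeholders so they cannot be
# confused with the ternary digits they are about to be mapped to.
$top =~ tr/321/abc/;

# Split the message into groups of three symbols.
my @chunks;

while ( $top =~ s/^([abc]{3})// ) {
    push @chunks, $1;
}

my @digits = ( '0', '1', '2' );

# Try every assignment of the ternary digits 0, 1, 2 to a, b, c.
foreach my $d0 (@digits) {
    foreach my $d1 (grep {!/$d0/} @digits) {
        foreach my $d2 (grep {!/[$d0$d1]/} @digits) {
            print "($d0$d1$d2) ";
            foreach my $c (@chunks) {
                # Read the group as a base 3 number, least
                # significant digit first.
                my $v = 0;
                my $m = 1;
                foreach my $d (reverse split( //, $c )) {
                    $d =~ s/a/$d0/;
                    $d =~ s/b/$d1/;
                    $d =~ s/c/$d2/;
                    $v += $d * $m;
                    $m *= 3;
                }
                # 1 = A, 2 = B, ...; 0 prints as @
                print chr( 64 + $v );
            }
            print "\n";
        }
    }
}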

With my initial interpretation of the top part of the coded message I got the following output:

(012) CIBYS@KDUGZBSGI@PUOXH@FBZQ@CJQJFBWKAF
(021) FRANK@SHOEMAKER@WOULD@CAMV@FTVTCAPSBC
(102) JDNXUMEISOZNUODMFSGYQMPNZHMJCHCPNTELP
(120) PVLBEMUQGK@LEKVMTGSAIMJL@RMPWRWJLFUNJ
(201) THYLOZGRKUMYOUHZCKENVZWYMDZTFDFWYJGXW
(210) WQXAGZOVES@XGSQZJEKBRZTX@IZWPIPTXCOYT

A ha! The 021 block (which corresponds to the mapping 3 -> 0, 2 -> 2, 1 -> 1) seems to have a partial message: FRANK@SHOEMAKER@WOULD and then it's garbage. Going back to the original message I realized that 113 is not divisible by three and that I'd either missed a symbol, or had two too many.

After much fiddling around I discovered that the correct interpretation of the top block is that two of the threes are wrapped from one line to another (there appears to be some indentation in the message that indicates this; take a look at the original, but this could be just random).

323 233 331 112 132
333 231 322 123 312
111 331 132 312 233
333 212 123 213 113
311 333 313 331 113
113 333 232 322 133
231 333 112 123 133
231 312

Rerunning my Perl program output the full message:

(012) CIBYS@KDUGZBSGI@PUOXH@FBXX@JDRK@YURKG
(021) FRANK@SHOEMAKER@WOULD@CALL@THIS@NOISE
(102) JDNXUMEISOZNUODMFSGYQMPNYYMCIVEMXSVEO
(120) PVLBEMUQGK@LEKVMTGSAIMJLAAMWQDUMBGDUK
(201) THYLOZGRKUMYOUHZCKENVZWYNNZFRQGZLKQGU
(210) WQXAGZOVES@XGSQZJEKBRZTXBBZPVHOZAEHOS

So much for the first part. The second part took me off into Z-80, 6502 and 6809 machine code wondering if it was a program and then nowhere. I still don't understand what this part is trying to say.

The third part looked initially like binary but on closer examination I decided that the 2s (||) were actually separators and the message should be interpreted as numbers separated by 2s, obtained by counting the 1s (|). That yields:

31211112111312
32213123123331
12213111332312
23333333233123
12313123332311
33223232312312
112

(Once again there was a wrapping 'problem' in the message where a run of 8 |s was actually 3 |s then 1 || and 3 more |s.) Using the little Perl program reveals:

(012) GZWXUNGG@YOZAGI@ABKKG@KRLJGGY
(021) EMPLOYEE@NUMBER@BASSE@SIXTEEN
(102) OZTYSBOOMXGZLODMLNEEOMEVACOOX
(120) K@FAGXKKMBS@NKVMNLUUKMUDYWKKB
(201) UMJNKAUUZLEMXUHZXYGGUZGQBFUUL
(210) S@CBELSSZAK@YSQZYXOOSZOHNPSSA

So, the same mapping between digits is used.
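
For anyone who would rather check the decoding without Perl, here is a minimal Python sketch of the same idea. It is not a port of the program above: it applies only the one mapping that worked (3 -> 0, 2 -> 2, 1 -> 1), and it prints the value 0 as a space instead of @, which is my assumption about what the author of the fax intended. The names decode, TOP and BOTTOM are just for illustration; the digit strings are the corrected top block and the third part as transcribed above.

```python
# Mapping that produced readable text: 3 -> 0, 2 -> 2, 1 -> 1.
MAPPING = {"3": 0, "2": 2, "1": 1}

def decode(digits, mapping=MAPPING):
    """Read the digits three at a time as base 3 numbers (1 = A ... 26 = Z)."""
    digits = "".join(digits.split())  # ignore spaces and line breaks
    usable = len(digits) - len(digits) % 3  # drop an incomplete final group
    letters = []
    for i in range(0, usable, 3):
        value = 0
        for d in digits[i:i + 3]:
            value = value * 3 + mapping[d]
        # Assumption: the value 0 is a word separator.
        letters.append(" " if value == 0 else chr(64 + value))
    return "".join(letters)

TOP = """
323 233 331 112 132
333 231 322 123 312
111 331 132 312 233
333 212 123 213 113
311 333 313 331 113
113 333 232 322 133
231 333 112 123 133
231 312
"""

BOTTOM = """
31211112111312
32213123123331
12213111332312
23333333233123
12313123332311
33223232312312
112
"""

print(decode(TOP))
print(decode(BOTTOM))
```

The two print calls should show the same text as the (021) lines above with the @ signs turned into spaces.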

That leaves some final questions:

1. Who is Frank Shoemaker?
2. Why is base spelt incorrectly?
3. Is the extra S in BASSE a reference to the middle section where three symbols start with S?
4. If #3 is correct, then those three symbols could be interpreted as FC (base 16) which is 252. Could this be the employee number of the author?
5. Why is the letter A missing from the middle section when all the other hexadecimal digits are there?

Thursday, May 15, 2008

Which countries have the most beautiful women? (My deeply flawed analysis)

So, I happened upon the Wikipedia page about the Miss World pageant and noticed that it had a list of winners by country. For example, India has won Miss World 5 times. But, of course, India has a very large population so you'd expect it to be able to churn out a few beauties. So, to get a better idea here is a population adjusted list of countries that have won Miss World:

| Country | Wins | Pop. | Wins/Pop. | Normalized |
|---|---|---|---|---|
| Bermuda | 1 | 66163 | 0.0000151141876879827 | 100.00% |
| Iceland | 3 | 316252 | 0.00000948610601672084 | 62.76% |
| Grenada | 1 | 110000 | 0.00000909090909090909 | 60.15% |
| Guam | 1 | 173456 | 0.00000576515081634536 | 38.14% |
| Jamaica | 3 | 2651000 | 0.000001131648434553 | 7.49% |
| Trinidad and Tobago | 1 | 1305000 | 0.000000766283524904215 | 5.07% |
| Sweden | 3 | 9182927 | 0.000000326693221017656 | 2.16% |
| Puerto Rico | 1 | 3994259 | 0.000000250359328225836 | 1.66% |
| Austria | 2 | 8316487 | 0.000000240486157195941 | 1.59% |
| Ireland | 1 | 4339000 | 0.000000230467849734962 | 1.52% |
| Finland | 1 | 5308208 | 0.000000188387493481793 | 1.25% |
| Venezuela | 5 | 28199822 | 0.000000177306083705067 | 1.17% |
| Israel | 1 | 7282000 | 0.000000137324910738808 | 0.91% |
| Netherlands | 2 | 16408557 | 0.000000121887622415548 | 0.81% |
| Dominican Republic | 1 | 9760000 | 0.000000102459016393443 | 0.68% |
| Czech Republic | 1 | 10381130 | 0.0000000963286270377117 | 0.64% |
| Australia | 2 | 21290000 | 0.0000000939408172851104 | 0.62% |
| Greece | 1 | 11216708 | 0.0000000891527175353054 | 0.59% |
| Peru | 2 | 28674757 | 0.0000000697477575834383 | 0.46% |
| UK | 4 | 60487300 | 0.0000000661295842267716 | 0.44% |
| Argentina | 2 | 40301927 | 0.0000000496254186555397 | 0.33% |
| South Africa | 2 | 43700000 | 0.000000045766590389016 | 0.30% |
| Poland | 1 | 38518241 | 0.0000000259617255107781 | 0.17% |
| France | 1 | 64473140 | 0.0000000155103350015216 | 0.10% |
| Turkey | 1 | 70586256 | 0.0000000141670639111387 | 0.09% |
| Egypt | 1 | 80335036 | 0.0000000124478689472424 | 0.08% |
| Germany | 1 | 82210000 | 0.0000000121639703199124 | 0.08% |
| Russia | 1 | 142008838 | 0.00000000704181524251329 | 0.05% |
| Nigeria | 1 | 148000000 | 0.00000000675675675675676 | 0.04% |
| US | 2 | 304072000 | 0.00000000657738956562919 | 0.04% |
| Brazil | 1 | 186757608 | 0.00000000535453420457174 | 0.04% |
| India | 5 | 1132446000 | 0.00000000441522156464856 | 0.03% |
| China | 1 | 1321851888 | 0.000000000756514409124179 | 0.01% |

So, far and away, the top three are Bermuda, Iceland and Grenada. Given that Bermuda is the winner, and a tax-haven, and has a sub-tropical climate... Hamilton here I come!

Thursday, May 01, 2008

The Spammers' Compendium finds a new home

Shortly after I announced that I was getting out of anti-spam the folks at Virus Bulletin contacted me about taking over The Spammers' Compendium. I was delighted.

Today the transfer is complete and the new home is here. It will be maintained and updated by Virus Bulletin. Please send submissions to them.

Tuesday, April 22, 2008

Bookmark-based registration

Recently, I've been learning Ruby on Rails and I can never learn anything unless I build something with it. I also recently read Programming Collective Intelligence and had a desire to use some of those algorithms too.

I'll post more about the actual web site I created another time; currently it's in alpha form running on Heroku. The web site is used for naming babies. The initial alpha release is the girls'-names-only site EmilyOrEmma?, which uses "Hot or Not" style voting and manages to incorporate in one page the Levenshtein distance, the Metaphone algorithm and both item-based and user-based filtering.



But my biggest bugbear with baby naming web sites is the need to create an account. You can browse baby names all you want, but as soon as you want to do something like add a name to a list of favourites you are forced into registration. Or you don't have to register but anything you do is ephemeral.

For my site, I came up with a better solution: bookmark-based registration. If you visit the site you'll see at the bottom of every page a link to bookmark. This link is unique to you. It contains your user id and a hash which I used to prevent forgery.



Bookmark this link and you can return to your recommendations, saved names, etc. any time.
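
Here is a minimal sketch of how such a link could be built and checked. The site itself is written in Rails, so this Python is only an illustration and not the code that runs on the site. Which hash is used isn't stated above; HMAC-SHA256 with a server-side secret is my assumption, and SECRET_KEY, the example.com URL layout and the function names are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical server-side secret; anyone who knows it can forge links.
SECRET_KEY = b"replace-with-a-long-random-value"

def sign(user_id):
    """Return a short keyed hash of the user id."""
    digest = hmac.new(SECRET_KEY, str(user_id).encode(), hashlib.sha256)
    return digest.hexdigest()[:16]

def bookmark_url(user_id, base="https://example.com/u"):
    """Build the per-user link shown at the bottom of every page."""
    return f"{base}/{user_id}/{sign(user_id)}"

def verify(user_id, token):
    """Accept the link only if the hash matches the user id."""
    return hmac.compare_digest(sign(user_id), token)

url = bookmark_url(42)
user_id, token = url.rsplit("/", 2)[1:]
print(url, verify(user_id, token))
```

Changing the user id in the URL without knowing the secret makes verify return False, which is the forgery protection the hash is there for.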

Tuesday, April 08, 2008

Interesting real-world Apache Problem

I'm working with a large client who has a number of web servers behind a load balancer. This morning Apache 1.3 had failed to come up on one of them. The client sends a SIGUSR1 to each Apache once an hour to force a graceful reload. This particular machine had operated correctly, restarting Apache once per hour for 54 hours (since a recent reboot of the machine), and then died.

A quick look in the Apache error.log file showed the following:

module "mod_jk.c" could not be loaded, because the dynamic module limit was reached. Please increase DYNAMIC_MODULE_LIMIT and recompile.

Naturally I went looking for a problem with mod_jk which was the wrong place to look. Scrolling through the log file I noticed that every time Apache restarted we'd get the error:

Cannot remove module mod_include.c: not found in module list

This was where the real problem lay. A quick httpd -l showed that mod_include was compiled into the client's Apache and looking in the httpd.conf revealed that mod_include was also being loaded with LoadModule:

LoadModule includes_module modules/mod_include.so

When a module is both statically linked into Apache and dynamically loaded you run into a nasty problem: Apache doesn't complain when you start, but it will fail to unload the double loaded module on exit. So for every SIGUSR1 a single slot of the DYNAMIC_MODULE_LIMIT was used up. The default DYNAMIC_MODULE_LIMIT is 64, and with 10 real dynamic modules and a restart once per hour it took 54 hours to consume every slot in the module limit.

Removing the erroneous LoadModule fixed the problem.
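
The check I did by eye can be sketched in a few lines of Python. This is only an illustration under some assumptions: it expects the text printed by httpd -l and the contents of httpd.conf as plain strings, it matches modules purely by file name (mod_include.c against mod_include.so), and the function names are made up for this example.

```python
import re

def compiled_in(httpd_l_output):
    """Module names (without .c) listed by 'httpd -l'."""
    pattern = r"^\s*(mod_\w+)\.c\s*$"
    return {m.group(1) for m in re.finditer(pattern, httpd_l_output, re.M)}

def dynamically_loaded(conf_text):
    """Module names (without .so) from uncommented LoadModule lines."""
    pattern = r"^\s*LoadModule\s+\S+\s+\S*?(mod_\w+)\.so\s*$"
    return {m.group(1) for m in re.finditer(pattern, conf_text, re.M)}

def double_loaded(httpd_l_output, conf_text):
    """Modules that are both statically linked and loaded with LoadModule."""
    return compiled_in(httpd_l_output) & dynamically_loaded(conf_text)

def restarts_until_limit(limit=64, real_modules=10, leaked_per_restart=1):
    """Graceful restarts before the leaked slots exhaust the module limit."""
    return (limit - real_modules) // leaked_per_restart

listing = """Compiled-in modules:
  http_core.c
  mod_include.c
  mod_so.c
"""
conf = """LoadModule includes_module modules/mod_include.so
LoadModule jk_module modules/mod_jk.so
#LoadModule status_module modules/mod_status.so
"""
print(double_loaded(listing, conf))
print(restarts_until_limit())
```

With the sample strings above only mod_include is reported, and the default figures reproduce the 54 hourly restarts seen on the client's machine.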

Friday, April 04, 2008

Digg 3 Million

A quick update on my previous estimate of Digg users shows that, as predicted, Digg passed the 3 million user mark during March, 2008.



Comparing this estimated data and data from the Digg API shows that around 20% of the 3m accounts are not active. I speculated before that these accounts had been banned for spamming or other activities (that's around 600,000 bad accounts).

Growth appears to be the same as before, adding around 150,000 accounts per month.

Thursday, April 03, 2008

Juxtaposition

It's 3am and there's a crisis somewhere in the world

If you follow US politics then you'll know that Hillary Clinton has a couple of ads that start with a 3am phone call to the White House. The first ad was intended as a slam against Barack Obama implying that he didn't have the experience to deal with such a crisis. The second is going up against John McCain claiming he doesn't want to do anything about the housing finance problem in the US.

You know if it's 3am and there's a crisis in the world there's only one place and only one man to call.

CTU and Jack Bauer.

First of all, he's already up. 3am is nothing to Jack. Hillary and McCain both look like they could use the sleep and Obama looks like he gets his beauty sleep every night. So, Jack's ready to go before any of them.

As much as you might think Obama is David Palmer (safe pair of hands), he's more like a Wayne Palmer (a slick little fighter) and you know what that means: gets blown up within five minutes of being president and then pops a brain vein and the evil VP has to take over. The only good thing to say about Obama is that he would call Jack.

Now, McCain might look like a Bauer type with his military background and heroic time spent as a PoW. But here's the difference. McCain was 5 years as a PoW, no one came to get him out so he can't be that valuable. Also, if Jack had been held hostage in North Vietnam for 5 years there wouldn't be a North Vietnam now. Because you can bet the life of the next random CTU cast member that Jack would have (a) escaped and (b) annihilated everyone involved.

So, that leaves Hillary. She can't tell sniper fire from a little girl with flowers. In fact she reminds me more and more of the evil Vice Presidents that pop up in every 24 trying to take power from the real president. Hell, she's probably even got backing from Phillip Bauer.

Only one word of warning: make sure it's a man that calls Jack. If it's a woman he's bound to have been involved with her, she'll turn out to be a double agent or her father will be evil, and Jack'll be distracted.

If it has to be a woman, make sure it's Chloe.

Monday, March 31, 2008

Multi-route (email and phone) self-aware phishing

Today, I received the following email:

This communication was sent to safeguard your account against any
unauthorized activity.

Max Federal Credit Union is aware of new phishing e-mails
that are circulating. These e-mails request consumers to click
a link due to a compromise of a credit card account.

You should not respond to this message.

For your security we have deactivate your card.

How to activate your card

Call +1 (800)-xxx-9629

Our automated system allows you to quickly activate your card

Card activation will take approximately one minute to complete.


Of course, I don't have an account with Max Federal Credit Union and this is obviously a phish. Notice that the English isn't quite right:

"For your security we have deactivate your card." is ungrammatical, and "You should not respond to this message." doesn't make sense in context.

What's more interesting is that the message itself warns you about phishing emails and asks you to call an 800 number.

If you call the 800 number an electronic voice reminds you again to never give your PIN, password or SSN in email and then proceeds to ask you for the card number, PIN, expiry date and CVV2. The assumption is that you've been warned twice not to do something in email, so it's OK by phone.

It's painful to see the phisher use the existence of phishing as a way to phish.

Names: Boys vs. Girls

Using data from the 1990 US Census I was amazed to discover that 90% of the US male population has one of 1,219 first names, but 90% of the female population has one of 4,275. There are 3.5x as many female first names as male first names.

The top 10 male first names are: James, John, Robert, Michael, William, David, Richard, Charles, Joseph and Thomas (which account for 23.2% of the male population; 50% of the population have one of only 60 names).

The top 10 female first names are: Mary, Patricia, Linda, Barbara, Elizabeth, Jennifer, Maria, Susan, Margaret and Dorothy (which account for 10.7% of the female population; 50% of the population have one of 139 names).

You can also see that all the variety in names happens between 80% and 90% of the population. For males 80% of the population is covered by 27% of the names; for females 80% is covered by 19% of the names.



The large number of female names appears to be because there are lots of variants of female names compared to male names. A quick run through calculating the Levenshtein distance between names and selecting the 10 closest for each gives an average distance of:

Male: 2.62
Female: 2.01

So female names are more 'similar' than male names, hence the variety created by all these variants.
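
As a rough illustration of that calculation, here is a minimal Python sketch. It assumes that "the 10 closest for each" means: for every name, take the 10 other names with the smallest Levenshtein distance and average all of those distances. The function names are mine, and the tiny name list in the usage lines is only there to show the call; it is not the census data.

```python
def levenshtein(a, b):
    """Edit distance counting insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]

def mean_nearest_distance(names, k=10):
    """Average distance from each name to its k closest other names."""
    total = 0
    count = 0
    for name in names:
        nearest = sorted(levenshtein(name, other)
                         for other in names if other != name)[:k]
        total += sum(nearest)
        count += len(nearest)
    return total / count

print(levenshtein("MARIA", "MARY"))
print(mean_nearest_distance(["ANN", "ANNA", "ANNE"], k=2))
```

A lower average means the names cluster into families of close variants, which is the pattern the female figure suggests.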

The other thing we can extract from this data is the prevalence of names beginning with certain letters, weighted by the occurrence of each name.





Things are much more polarized when you look at trailing letters (for example, the trailing letter A is an almost sure sign that it's a woman; the opposite is true of D):





So combining the two it's possible to give a 'maleness' score (the blue part) to each final letter:

BOUTS: The Complete Song Parodies

Back when my site UseTheSource was a "blog" (this was late 90s/early naughties) I wrote a number of song parodies. Someone emailed me and asked where they were.

So, here are my complete song parodies from back in 2001:

March 7, 2001: "Candle in the Wind"

Yahoo! was in trouble with banner ad sales falling, profits disappearing and the then CEO, Tim Koogle, was off to spend more time with his family.

Goodbye Tim Koogle
Though I never Yahoo! at all
You had the faith, to sell banners
While those around you failed
They crashed into the deadpool
And they whispered into your brain
You need to begin to charge
For things you give for free

And it seems to me you lived your life
Like a candle in the wind
Never knowing what to add next
To your list of links
And I would have liked to have told you
But I was just a geek
Your web brand burned out long before
Your stock price ever did

Jerry Yang was tough
The toughest boss you ever had
Softbank created a superstar
And pain was the price you paid
As the whole web died
Oh CNET still hounded you
All they had to say
Was that Google was the site to use

[Repeat chorus]

Goodbye Tim Koogle
From the young man on the DSL link
Who sees you as something more successful
More than just our long lost CEO

[Repeat chorus]

March 22, 2001: "Don't Cry for Me Argentina"

Steve Jobs was back at Apple, the blue iMac was out, Microsoft was in big anti-trust trouble and had just invested $100m in Apple, Steve had bought out NeXT. But the future wasn't yet assured:

This won't be easy,
you'll think it's strange.
When we try to explain what we need
that we now need your help
after all that we've said.

You won't believe us.
All you'll see is Apple you once knew,
although we've crashed down in the dumps
begging for Microsoft cash.

It didn't have to happen.
We should have won
Better software and patents than Bill
Looking down on Windows,
staying far from Sun.
So we chose NeXT.
Running aground, trying computers in blue.
But nothing revived us at all.
You never expected it to.

Don't cry for us, William H. Gates.
The truth is we're dead without you.
We need your dollars
We need Mac Office
You need a rival, for your survival.

And as for Fortune,
and as for Time,
We never invited them in
though it appeared, to the world,
they were all Steve desired.
Even Adobe,
they're making solutions for Windows right now
The answer was here all the time.
We need you, and hope you need us.

[chorus]

Have we begged too much?
There's nothing more we can think of to say to you.
But all you have to do,
is look at us to know,
we're through without you.

April 1, 2001: "I Just Called To Say I Love You"

In the midst of the crash, .coms were going out of business like crazy:

No IPO to celebrate
No friends and family stocks and shares to give away
No big opening
No first day ping
In fact here's just another ordinary day

No Aeron chair
No onsite chef
No working Saturday until the site is done
But what it is, is something blue
Made up of these few words that I must say to you

We just failed to get more funding
We just failed to keep our doors open
We just failed to get more funding
And we need it just to avoid bankruptcy

No free massage
No free soda
No caffeine trip to keep us working every night
No dry cleaning
No stock option
Not even time for us to pack our things and leave

No beanbag room
No Maui trip
No giving thanks to all that NASDAQ did for us
But what it is, though old so new
Grab what you can before your jobs right here are through

[chorus]

[chorus]

April 6, 2001: "Uptown Girl"

Ah, to be in love with a marketing .com girl:

.com girl
She's been living in her .com world
I bet she never had a software guy
I bet her mama never told her why
I'm gonna try for a .com girl
She's been living in her wide web world
As long as anyone in marketing can
And now she's looking for a comp. sci. man
That's what I am

And when she knows what
She wants from her time
And when she wakes up
And makes up her mind

She'll see I'm not a nerd
Just because
I'm in love with a .com girl
You know I've seen her in her online world
She's getting tired of her high tech toys
And all her presents from her VC boys
She's got a choice
.com girl
You know I can't afford to buy her a Porsche
But maybe someday when my stock cashes in
She'll understand what kind of guy I've been
And then I'll win

And when she's walking
She's using her Nokia
And when she's talking
She'll say that she's mine

She'll say I'm not a nerd
Just because
I'm in love
With a .com girl
She's been living in her latte world
As long as anyone in marketing can
And now she's looking for a comp. sci. man
That's what I am
.com girl
She's my .com girl
You know I'm in love
With a .com girl

April 13, 2001: "Gangsta's Paradise"

Linus Torvalds was the flavor of the day as one of the thorns in Microsoft's side:

As I drive through the Valley of the Silicon Dream
I take a look at my life and realize there's nothing left
'Cause I've been coding and debuggin' so long
That even my manager thinks that my mind has gone
But I ain't never crossed a man that didn't deserve it
Linus treated like a punk, ya know that's unheard of
Ya better watch how ya postin'
And what ya codin'
Or you Dr Tanenbaum'll be lined in chalk
I really hate Minix and FreeBSD
As they croak, I see myself in the pistol smoke
Fool, I'm the kinda hacker script kiddies wanna be like
On the Net in the night, writin' layers of the core code

CHORUS:
Been spending most our lives living in a Windows paradise
Been spending most our lives living in a Windows paradise
Keep spending most our lives living in Bill Gates' paradise
Keep spending most our lives living in Bill Gates' paradise

Look at the situation they got me facing
I can't live a normal life, I was raised on the PC
So I gotta be down with the kernel team
Too much crazy Usenet posting got me chasing dreams
I'm a educated fool with Posix on my mind
Speak Swedish in my home and English on the phone
I'm a loc'd out hacker, wrote my life story
And my homies is down so don't arouse my anger
Fool, death ain't nothin' but a heart beat away
I'm livin' life, do or die, what can I say?
I'm 28 now, but will I ever see 29?
The way things is going, I don't know

Tell me why are we so blind to see
That Microsoft's a monopoly

CHORUS
CHORUS

Tell me why are we so blind to see
That Microsoft's a monopoly

Tell me why are we so blind to see
That Microsoft's a monopoly

April 18, 2001: "Copacabana"

Carly Fiorina was fighting for her life as she tried to merge HP and Compaq with Walter Hewlett attempting a proxy fight to stop her in the name of the family:

Her name was Carly, she was a VP
With Lucent and AT&T and a degree from MIT
She went to HP and wowed the board room
And while she tried to be a star, sometimes went a bit too far
And then September 4, Compaq became the score
They were failing and needed each other
Leaning drunks galore!

At the HP, HP/Com-pa-q
The merger that upset the family
At the HP, HP/Com-pa-q
David and William were always the fashion
At the HP... they ran the show

His name was Walter, his dad was famous
He wasn't present for the board, but he wouldn't be ignored
And what she pro-posed, "Dad would've hated"
Then Walter went a bit too far, "Carly: time for au revoir!"
And then the insults flew and careers were smashed in two
There were ads and a lot of bankers, but just who screwed who?

At the HP, HP/Com-pa-q
The merger that upset the family
At the HP, HP/Com-pa-q
David and Walter are today the fashion
At the HP... they run the show

Her name is Carly, she was CEO
But that was 30 weeks ago, when she used to run the show
Now she's a VC, but that's our Carly
Still in the suit she used to wear, new blonde highlights in her hair
She sits there so refined, and drinks to Walter's health
She lost her job and she lost the proxy, now she enjoys her wealth!

At the HP, well just the HP
The toughest job belongs to Walter
At the HP, well just the HP
William and David were always the fashion
At the HP, don't buy the stock...


If you find these funny and can sing... feel free to set them to music and give me a laugh.

Saturday, March 29, 2008

More 11:11 mystical nuttery

Out of the blue I received an email about my post the other day about Benford's Law and 11:11:

every time i look at the clock the number add up to 11.
how does that get explained

OK. Well, it turns out that that's pretty simple to explain: the sum 11 is the most common sum you'll see on a clock. The following graph shows the count for each sum of digits. You'll see that for a 12 hour clock the peak is at 11 and for a 24 hour clock the peak is at 12 with 11 being a close second.



For a 12 hour clock the probability that the sum of digits will be 11 is 128/1440 (or about 9% of the time). For a 24 hour clock it's 124/1440 (also about 9% of the time). So it's unsurprising that 11 comes up a lot here.
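
The counts are easy to reproduce. This is a small Python sketch of the tally (not the code used to draw the graph): it walks through every hour and minute a clock can show and counts the digit sums. A 12 hour clock shows each time twice a day, which is why its count is doubled to compare against 1440 minutes.

```python
from collections import Counter

def digit_sum_counts(hours):
    """Count how often each digit sum appears for the given hour values."""
    counts = Counter()
    for hour in hours:
        for minute in range(60):
            shown = f"{hour}{minute:02d}"  # a leading zero adds nothing
            counts[sum(int(c) for c in shown)] += 1
    return counts

twelve = digit_sum_counts(range(1, 13))   # 1:00 to 12:59
twenty_four = digit_sum_counts(range(24)) # 00:00 to 23:59

print(2 * twelve[11], twelve.most_common(1)[0][0])         # 128 11
print(twenty_four[11], twenty_four.most_common(1)[0][0])   # 124 12
```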

Another area of 11 craziness is airline seating. This is probably because people get freaked out by flying and look for patterns. Suppose you sit in economy on a British Airways long haul flight. You'll be sitting in a 747, 767 or 777. You then take your seat number, add up the digits and then add the letter on using its place in the alphabet (e.g. sitting in 14F then you have 1 + 4 + 6 = 11). Using the British Airways seat maps you can compute the value for each seat in economy:



On a 767 11 is the most frequently occurring sum, on a 747 it's 10 (with 11 close behind) and on a 777 11 is just beaten out by 12.

Tuesday, March 25, 2008

"Retiring" from anti-spam

Today, I'm "retiring" from anti-spam work. Practically, that means the following:
  • No more updates to The Spammers' Compendium or Anti-spam Tool League Table pages. These remain on line, but are not being maintained.
  • I'm looking for a new leader for the POPFile project.
  • I'm no longer active on any anti-spam mailing lists.
  • I am leaving all anti-spam conference committees.
  • My anti-spam newsletter is no longer being published.
I will, however, be continuing with commercial anti-spam work where I have agreements currently in place with customers. No change to their support, terms or assistance.

The obvious question is why? For me, the interest just isn't there. The battle against spam continues but is now about trench warfare rather than creating new weapons. We'll continue to see innovation, but for any hacker it's the new, new thing that's important. For me, spam is yesterday's news. Watching companies squabble and refuse to cooperate, seeing a decline in quality at anti-spam conferences, and major companies essentially killing their consumer anti-spam means anti-spam just isn't where I want to be.

Of course, there are many really good people fighting spam out there. This post isn't meant to demean them.

Thank you to everyone who has supported what I've done over the last 7 years, and good luck!

Saturday, March 22, 2008

Building a temperature probe for the OLPC XO-1 laptop

I bought an OLPC XO-1 laptop through the G1G1 program and was intrigued to discover the Measure activity.

The Measure activity uses the internal audio system to measure a value input on the microphone socket. With nothing connected this application reads the value of the internal microphone and displays a waveform. You can have fun just by whistling, speaking or singing with Measure running.

But since you can measure a voltage input into the microphone socket, it's possible to build sensors and connect them to the OLPC XO-1. On the Measure web site they mention building a simple temperature sensor using an LM35 temperature sensor that looks like this:

The LM35 can measure a temperature between 0 and 155 Celsius just by hooking it up to a 5v supply. It outputs 10mv per degree so a temperature of 20 Celsius corresponds to 0.200v.

Since the OLPC XO-1 has a USB port it's possible to get 5v from the laptop by hacking a USB connector, and connect 5v to the LM35 and then take the signal coming from the LM35 (the middle pin) and connect it to the microphone socket.

I did this by building two parts: a generic adapter which gives me 5v and a signal line out of a standard stereo 3.5mm jack:

The stereo jack is wired up so that the tip is +5v, the base is Gnd and the middle is the signal going to the microphone socket. The USB plug has only two wires connected (for +5v and Gnd), and the jack going to the microphone socket (which is mono) has its tip connected to the middle of the stereo jack, and its base is Gnd. All the grounds are joined together.

When plugged into the OLPC XO-1 it creates a generic connector for any other projects I might work on:

For the temperature sensor I simply connected the LM35 to a stereo socket with the correct connections to match up with the stereo jack plug. Then I created a probe with an old plastic pen and some waterproofing compound (so that I can do things like shove the probe in a cup of coffee without wetting the contacts on the LM35). Here it is:

Connect the two together and run the standard Measure activity and you can start to look at the output of the sensor and hence the temperature.

But there's a problem. The microphone input can only handle voltages in the range 0.3v to 1.9v (and my measurements of my OLPC XO-1 show this range to actually be 0.4v to 1.9v). So that means that, as is, the probe can be used to measure temperatures in the range 40 Celsius to 155 Celsius. That low end is a bit high for the sorts of experimentation you can do at home (e.g. measure the temperature in the fridge, or a glass of cold water, or even the temperature inside your mouth).

So we need to scale the voltages coming from the sensor to fit better into the range that's readable by the laptop. The standard way to do that is with an operational amplifier which is used to add two voltages together: the voltage coming from the sensor and a reference voltage. Doing this will move the voltage up.

For that I used the LM1458 which in a single 8 pin package contains a pair of operational amplifiers.

Here's the circuit diagram:

The circuit has three parts: a voltage divider, a summing amplifier and an inverting amplifier.

Voltage divider: the reference voltage is created by taking the 5v available from the USB port and passing it through resistors R8 and R9. The voltage at the middle point of these two resistors is determined by the standard formula for a voltage divider of 5v * R9/(R8 + R9) = 5v * 1 / ( 10 + 1 ) = 0.45v. In my actual circuit with 1% tolerance resistors the measured voltage was 0.41v.

Summing amplifier: the middle portion of the circuit takes the two inputs and adds them together (and because of the nature of the circuit inverts the summed value). So its output going into R7 is -ve the sum of the reference voltage and the sensor voltage.

Inverting amplifier: the final part just inverts the voltage so that the output is +ve and in the range that the OLPC XO-1 can read.
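
To see what the offset buys, here is a small Python sketch of the arithmetic. It assumes that both amplifier stages have unity gain, so the laptop sees exactly sensor voltage plus reference voltage; the resistor values that set the gain aren't given above, so treat that as an assumption. The 0.4v to 1.9v input range and the 10mv per degree figure are the ones quoted earlier, and the function names are just for illustration.

```python
VOLTS_PER_DEGREE = 0.010  # LM35 output: 10mv per degree Celsius

def divider(v_in, r_top, r_bottom):
    """Voltage at the midpoint of a two-resistor voltage divider."""
    return v_in * r_bottom / (r_top + r_bottom)

def readable_range(v_ref, v_min=0.4, v_max=1.9):
    """Temperatures whose shifted voltage lands inside the input range.

    Assumes the output is sensor voltage + v_ref (unity gain).
    """
    return ((v_min - v_ref) / VOLTS_PER_DEGREE,
            (v_max - v_ref) / VOLTS_PER_DEGREE)

print(round(divider(5.0, 10.0, 1.0), 2))  # nominal reference voltage
print(readable_range(0.0))                 # no offset: low end is 40 Celsius
print(readable_range(0.41))                # with the measured 0.41v reference
```

Under those assumptions the measured 0.41v reference moves the low end of the readable range from about 40 Celsius down to about -1 Celsius, which covers the fridge and the glass of cold water.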

One complexity is that this circuit requires +9v, Gnd and -9v to operate. I obtain that with a pair of 9v batteries linked together giving Gnd where the two are connected. Here's the final circuit with appropriate connectors to hook up to my existing probe and laptop adapter:



And here's what it looks like when it's all hooked together:

Now, this wouldn't be any fun without a bit of software, and since the Measure activity can only display the voltage being presented (which is now a mixture of the sensor voltage and the reference voltage), what's needed is a new activity.

I found the developer documentation to be very hard to follow and I ended up hacking the existing Measure activity and renaming it Temperature.

The critical code is in the file drawWaveform.py where it reads self.avg (the value coming from the microphone input via the ADC) and scales it for display. I measured voltages coming from my probe for a couple of known temperatures and worked out a scale factor (the +32768 is because self.avg ranges from -32768 to 32767):

layout.set_text("Temperature: %.1f C" % (0.00221833*(self.avg+32768)) )
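
The conversion in that line can be pulled out into a function, and a scale factor can be worked out from a few readings taken at known temperatures. The sketch below is standalone Python, not part of the Measure activity. The least-squares fit through zero is my assumption about one reasonable way to get the factor (how it was actually derived isn't spelled out above), and the sample readings are synthetic, generated from the factor itself purely to show the call.

```python
SCALE = 0.00221833  # degrees Celsius per ADC step, from the line above
OFFSET = 32768      # shifts the signed range -32768..32767 to start at 0

def raw_to_celsius(avg, scale=SCALE):
    """Convert a raw reading (like self.avg) into degrees Celsius."""
    return scale * (avg + OFFSET)

def fit_scale(samples):
    """Least-squares scale factor for (raw reading, known Celsius) pairs.

    Assumes the line passes through zero, matching the formula above.
    """
    numerator = sum((avg + OFFSET) * celsius for avg, celsius in samples)
    denominator = sum((avg + OFFSET) ** 2 for avg, _ in samples)
    return numerator / denominator

# Synthetic calibration points, not real measurements.
samples = [(avg, raw_to_celsius(avg)) for avg in (-25000, -20000, -10000)]
print(fit_scale(samples))
print("Temperature: %.1f C" % raw_to_celsius(-23000))
```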

Here's a screenshot of Temperature running on the laptop and measuring the ambient temperature in my office:

You can download my Temperature activity using the browser on your OLPC XO-1 to install it.

Thursday, March 20, 2008

Sleeping with the enemy

I loaded the dish washer:



My SO loaded the dish washer:



Who needs help?

Tuesday, March 11, 2008

First assume all new email is useless

When I download email none of it goes in my Inbox. In fact, I don't have an Inbox. I work on the assumption that all new email is useless.

Many reports tell us that between 80% and 90% of all email is spam, so for starters only 10% to 20% is at all likely to be useful. Then, if you account for being on mailing lists, being CC:ed needlessly and receiving automatic updates such as order confirmations from Amazon.com, you'll see that almost all email is useless. Only a tiny fraction of the mail you receive is useful. And by useful I mean requiring action.

I use Thunderbird and my email folder structure looks like this:



When email arrives it is automatically sorted using POPFile into the folders: Family, GNU Make, Misc, polymail, POPFile and Spam. These six folders are the categories of mail that I receive:

  • POPFile: Since I wrote POPFile I get lots of mail about it and I use this as a general box for other open source projects I work on and anything else about anti-spam
  • polymail: Anything to do with my commercial product polymail and my consulting business
  • GNU Make: Anything to do with GNU Make or the company, Electric Cloud, that I co-founded
  • Family: Anything from my family
  • Misc: Order confirmations, airline tickets, PayPal statements, etc.
  • Spam: spam

POPFile uses Naive Bayesian text classification to automatically sort my email (with just a point and click interface for training) and then six rules (which never need updating) move the incoming mail based on POPFile's classification to one of those folders.
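To give a feel for how Naive Bayesian classification sorts mail into folders, here's a toy Python sketch. It is not POPFile's code (the class, its methods and the training messages are all invented for illustration); it just counts words per folder and picks the folder with the highest probability, using add-one smoothing for unseen words.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    def __init__(self):
        self.word_counts = defaultdict(Counter)  # folder -> word -> count
        self.doc_counts = Counter()              # folder -> messages trained

    def train(self, folder, text):
        self.doc_counts[folder] += 1
        self.word_counts[folder].update(text.lower().split())

    def classify(self, text):
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        total_docs = sum(self.doc_counts.values())
        best_folder, best_score = None, -math.inf
        for folder, n_docs in self.doc_counts.items():
            counts = self.word_counts[folder]
            total_words = sum(counts.values())
            # Log probabilities avoid underflow on long messages.
            score = math.log(n_docs / total_docs)
            for word in text.lower().split():
                # Add-one smoothing so an unseen word doesn't zero the score.
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            if score > best_score:
                best_folder, best_score = folder, score
        return best_folder

nb = NaiveBayes()
nb.train("Family", "dinner with mum and dad on sunday")
nb.train("Spam", "cheap gold buy cheap gold now")
nb.train("Misc", "your order confirmation and receipt")

print(nb.classify("buy gold now"))
print(nb.classify("dinner on sunday"))
```

Training here is just a call to train() with the right folder, which is the programmatic equivalent of the point and click correction described above.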

Of course, POPFile can be used to sort mail in any way you choose: my categories are unlikely to be yours. You might use POPFile to sort Work from Home from Spam. At least one journalist I know uses POPFile to sort Interesting from Boring from Spam so that he only gets to read interesting press releases.

When I identify mail that does need action taken I move it to the ACTION folder (which is the closest I've got to an Inbox). Moving mail there is a snap because I use the QuickMove extension for Thunderbird and have ALT-number keys mapped to each folder: one key press and the message is moved into or out of ACTION.

To keep on top of things I publish the number of items in my ACTION folder on my web site. Here's a live view over the last 24 hours. Currently, 9 items need dealing with.



My rules for managing email:

  1. Assume that all new email is useless
  2. Automatically sort email into folders on delivery
  3. Take control of your Inbox: only you put email in it

Monday, March 10, 2008

Why Rails rules: continuous forward motion

Lately I've been playing with Ruby on Rails and I'm impressed. Not by the documentation (I was pulling my hair out trying to map my copy of the Rails book that deals with 1.x to Rails 2.0 installed on my machine). Not by the screencasts, or by DHH being arrogant.

I'm impressed by the fact that Rails keeps you (or at least me) in continuous forward motion.

Yesterday sitting in an airport I decided to learn Rails. I had the two books (one on Ruby, one on Rails) which I'd read before, but I'd never actually coded anything. I had an idea for an application that was CRUD worthy.

Tonight, after a total of 4 hours of programming I have a working application in Rails that allows me to track health care expenses (appointments, bills, insurance reimbursements, payments, ...). Zero knowledge to working application in four hours isn't meant to illustrate my genius; it illustrates that Rails/Ruby is easy to learn and that the combination of generators and scaffolding keeps you moving.

I've noticed in the past when working on apps that I'll come up against a difficult bit and go work on something easier ("Oh, I don't want to come up with foo-bar algorithm right now, I'll go design the buttons"). And the easier things are lower value.

Rails keeps me going after the functionality because it puts in place most of the functionality and then lets me evolve it. My application looks horrible (I've wasted no time on the CSS or HTML), but the functionality is there.

Some sleep and a little design work and it'll look like something.

Anyone else like access to a free application for health care expense management?

Monday, March 03, 2008

To the idiotic spammer posting comment spam on this site

Since your name is two Chinese characters I'm going to address you as "Dude".

Dude,

Lately you've been posting comment spam on my blog for your World of Warcraft Gold. This is a little silly:

1. I'm fairly well known in anti-spam circles, did you really think I was going to let comment spam through on this site?

2. Comment moderation is turned on on this site. So your comment spam goes nowhere when I click the Discard button.

3. There has been a little collateral damage from your World of Warcraft spamming. I accidentally killed two comments by Hypermechanic and I can't retrieve them. He/she wanted to say something useful about an old post:


You could do that like cameroid.com .
I guess in JAVA or .NET.

and

Cool I will hunt for it… This is a very sweet look app you have here. Even though you down play your role this is still brilliant.

Thank you for something new and useful.

Thursday, February 28, 2008

Any sufficiently simple explanation is indistinguishable from magic

Well, that's true if you are a fool.

Take for example the mystical belief that the number 11 or 11:11 is somehow significant. Uri Geller goes on about this on his web site. To quote from Geller's web site (and you'll find other similar thinking on many 11:11 web sites):

String theory is said to be the theory of everything. It is a way of describing every force and matter regardless of how large or small or weak or strong it is. There are a few eleven's that have been found in string theory.

I find this to be interesting since this theory is supposed to explain the universe! The first eleven that was noticed is that string theory has to have 11 parallel universes (discussed in the beginning of the "11.11" article) and without including these universes, the theory does not work.

The second is that Brian Greene has 11 letters in his name. For those of you who do not know, he is a physicist as well as the author of The Elegant Universe, which is a book explaining string theory. (His book was later made into a mini series that he hosted.) Another interesting find is that Isaac Newton (who's ideas kicked off string theory many years later) has 11 letters in his name as well as John Schwarz. Schwarz was one of the two men who worked out the anomalies in the theory. Plus, 1 person + 1 person = 2 people = equality.

Also, the two one's next to each other is 11. The two men had to find the same number (496) on both sides of the equation in order for the anomalies to be worked out, so the equation had to have equality! There were two matching sides to the equation as well because they ultimately got 496 on both sides. So, the 1 + 1 = 2 = equality applies for the equation as well.

I added a little bold type there because it amused me; pity that Mr Geller didn't look up the definition of equation before writing that line.

But key to this whole belief is that the number 11 keeps turning up at random. When I first read about this I looked up at the clock and it was 11:43. Whoa! Spooky!

But then I remembered Benford's Law. Benford's Law is essentially that in lots of real-life data the leading digit is 1 with a probability of about 30% (instead of the roughly 11% you'd expect if the first digit was random from 1 through 9) and hence numbers beginning with 1 occur more often than numbers starting with any other digit.
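The usual statement of Benford's Law gives the probability of leading digit d as log10(1 + 1/d). A few lines of Python (the function name is my own) show how quickly that falls away from 1 to 9:

```python
import math

def benford_first_digit(d):
    """Benford's Law probability that the leading digit is d (1 through 9)."""
    return math.log10(1 + 1 / d)

for d in range(1, 10):
    print(d, "%.1f%%" % (100 * benford_first_digit(d)))
```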

A simple illustration is my clock experience. If you look at a clock at random, what's the probability that the first digit is a 1? Well, it's more likely than any other digit.

A clock showing 12 hour time cycles through: 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11. A simple count will show you that the number 1 is the first digit for 8 out of the 24 hours and that all the other digits occur 2 times in 24 hours. So what's the probability that if I glance at a clock at random I'll see a 1 at the beginning? 8/24 or 1/3 of the time... which is close to the 30% that Benford's Law predicts.
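If you don't fancy counting by hand, this small Python sketch tallies the leading digit of the hour over a full day on a 12 hour clock (the variable names are mine):

```python
from collections import Counter

hours = [12] + list(range(1, 12))   # one trip round a 12 hour clock face
day = hours * 2                     # twice round for 24 hours

leading = Counter(str(h)[0] for h in day)

for digit in sorted(leading):
    print(digit, leading[digit], "out of", len(day))
```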

Now, Benford's Law isn't restricted to time. It occurs all over the place (Wikipedia lists: electricity bills, street addresses, stock prices, population numbers, death rates, lengths of rivers, physical and mathematical constants) and so if you walk through life looking at random numbers you'll see numbers starting with a 1 more often than any other number. In 1988 a mathematician named Ted Hill showed why this is the case for many real-world systems.

But, what about 11? I hear you ask. Well if the first digit is more likely to be 1 than any other then it's clear that you are more likely to see numbers in the range 10 through 19 more than other two digit numbers, but a more interesting offshoot of Benford's Law is explained here.

Essentially as you walk through the digits of a number you are more likely to see a 1 than another digit, but that effect diminishes the longer the number gets. The probability that the second digit is a 1 is about 11% (instead of the expected 10%) and given that the probability that the first digit is a 1 is 30% you are bound to come across 11 more frequently than you'd expect (if numbers were random).
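Those figures can be checked with the generalised form of Benford's Law, in which a number starts with the digits n with probability log10(1 + 1/n). The Python below is a sketch with function names of my own choosing; the second-digit probability is obtained by summing over every possible first digit.

```python
import math

def benford_leading_pair(pair):
    """Probability that a number starts with the two digits in pair (10..99)."""
    return math.log10(1 + 1 / pair)

def benford_second_digit(d):
    """Probability that the second digit is d, summed over first digits 1..9."""
    return sum(benford_leading_pair(10 * first + d) for first in range(1, 10))

p_second_is_1 = benford_second_digit(1)
p_starts_11 = benford_leading_pair(11)
p_uniform = 1 / 90   # chance of any one pair if 10..99 were equally likely

print("second digit is 1: %.1f%%" % (100 * p_second_is_1))
print("starts with 11:    %.1f%%" % (100 * p_starts_11))
print("uniform pair:      %.1f%%" % (100 * p_uniform))
```

On this model a number starts with 11 more than three times as often as it would if all two digit openings were equally likely.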

So, it's no surprise that we see lots of 11s, and hence there's a simple explanation for all those 11s. Either that or I've been missing the call of the 11:11 Spirit Guardians all these years:

These 11:11 Wake-Up Calls on your digital clocks, mobile phones, VCR’s and microwaves are the "trademark" prompts of a group of just 1,111 fun-loving Spirit Guardians, or Angels. Once they have your attention, they will use other digits, like 12:34, or 2:22 to remind you of their presence. Invisible to our eyes, they are very real.