Thursday, February 28, 2008

Any sufficiently simple explanation is indistinguishable from magic

Well, that's true if you are a fool.

Take for example the mystical belief that the number 11 or 11:11 is somehow significant. Uri Geller goes on about this on his web site. To quote from Geller's web site (and you'll find other similar thinking on many 11:11 web sites):

String theory is said to be the theory of everything. It is a way of describing every force and matter regardless of how large or small or weak or strong it is. There are a few eleven's that have been found in string theory.

I find this to be interesting since this theory is supposed to explain the universe! The first eleven that was noticed is that string theory has to have 11 parallel universes (discussed in the beginning of the "11.11" article) and without including these universes, the theory does not work.

The second is that Brian Greene has 11 letters in his name. For those of you who do not know, he is a physicist as well as the author of The Elegant Universe, which is a book explaining string theory. (His book was later made into a mini series that he hosted.) Another interesting find is that Isaac Newton (who's ideas kicked off string theory many years later) has 11 letters in his name as well as John Schwarz. Schwarz was one of the two men who worked out the anomalies in the theory. Plus, 1 person + 1 person = 2 people = equality.

Also, the two one's next to each other is 11. The two men had to find the same number (496) on both sides of the equation in order for the anomalies to be worked out, so the equation had to have equality! There were two matching sides to the equation as well because they ultimately got 496 on both sides. So, the 1 + 1 = 2 = equality applies for the equation as well.

I added a little bold type there because it amused me; pity that Mr Geller didn't look up the definition of equation before writing that line.

But key to this whole belief is that the number 11 keeps turning up at random. When I first read about this I looked up at the clock and it was 11:43. Whoa! Spooky!

But then I remembered Benford's Law. Benford's Law is essentially that in lots of real-life data the leading digit is 1 with a probability of about 30% (instead of the roughly 11% you'd expect if the first digit were equally likely to be anything from 1 through 9) and hence numbers beginning with 1 occur more often than numbers starting with any other digit.

A simple illustration is my clock experience. What's the probability that, if you look at a clock at random, the first digit is a 1? Well, it's more likely than any other digit.

For a clock showing 12 hour time it cycles through: 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11. A simple count will show you that the number 1 is the first digit for 8 out of the 24 hours and that all the other digits occur 2 times in 24 hours. So what's the probability that if I glance at a clock at random I'll see a 1 at the beginning? 8/24 or 1/3 of the time... which is Benford's Law.
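The count is easy to check mechanically. Here's a minimal Python sketch (the variable names are mine, purely for illustration) that tallies the leading digit of each hour shown by a 12-hour clock over a full day:

```python
from collections import Counter

# Hours shown by a 12-hour clock over one day: 12, 1, ..., 11, twice.
hours = ([12] + list(range(1, 12))) * 2

# Tally the first digit of each displayed hour.
leading = Counter(str(h)[0] for h in hours)

p_one = leading["1"] / len(hours)
print(leading["1"], "of", len(hours), "hours start with 1:", round(p_one, 3))
```

The hours 1, 10, 11 and 12 each turn up twice a day, which is where the 8 comes from.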

Now, Benford's Law isn't restricted to time. It occurs all over the place (Wikipedia lists: electricity bills, street addresses, stock prices, population numbers, death rates, lengths of rivers, physical and mathematical constants) and so if you walk through life looking at random numbers you'll see numbers starting with a 1 more often than any other number. In 1995 a mathematician named Ted Hill showed why this is the case for many real-world systems.

But, what about 11? I hear you ask. Well, if the first digit is more likely to be 1 than any other then it's clear that you are more likely to see numbers in the range 10 through 19 than other two-digit numbers, but a more interesting offshoot of Benford's Law is explained here.

Essentially, as you walk through the digits of a number you are more likely to see a 1 than another digit, but that effect diminishes the longer the number gets. The probability that the second digit is a 1 is about 11% (instead of the expected 10%) and given that the probability that the first digit is a 1 is 30% you are bound to come across 11 more frequently than you'd expect (if numbers were random).
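Those percentages can be reproduced from the usual logarithmic statement of Benford's Law. The following is a small sketch, assuming the standard formula P(leading digits = m) = log10(1 + 1/m); the function names are my own:

```python
import math

def first_digit_prob(d):
    """Benford probability that the leading digit is d (1-9)."""
    return math.log10(1 + 1 / d)

def second_digit_prob(d):
    """Benford probability that the second digit is d (0-9),
    summed over every possible leading digit k."""
    return sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))

def leading_pair_prob(m):
    """Benford probability that a number starts with the two digits m (10-99)."""
    return math.log10(1 + 1 / m)

print("first digit is 1: ", round(first_digit_prob(1), 4))
print("second digit is 1:", round(second_digit_prob(1), 4))
print("starts with 11:   ", round(leading_pair_prob(11), 4))
print("uniform two-digit:", round(1 / 90, 4))
```

Comparing the last two lines shows how much more often a leading 11 turns up under Benford's Law than it would if all two-digit prefixes were equally likely.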

So, it's no surprise that we see lots of 11s, and hence there's a simple explanation for all those 11s. Either that or I've been missing the call of the 11:11 Spirit Guardians all these years:

These 11:11 Wake-Up Calls on your digital clocks, mobile phones, VCR’s and microwaves are the "trademark" prompts of a group of just 1,111 fun-loving Spirit Guardians, or Angels. Once they have your attention, they will use other digits, like 12:34, or 2:22 to remind you of their presence. Invisible to our eyes, they are very real.

Wednesday, February 27, 2008

Would they hide me?

This is pretty much exclusively a technical blog, but I was very struck by something legendary investor Warren Buffett said when answering the question "How do you define happiness and what about your life makes you most happy?":

I know a woman in her 80’s, a Polish Jew woman forced into a concentration camp with her family but not all of them came out. She says, “I am slow to make friends because when I look at people, I have one question in mind; would they hide me?” If you get to be my age, or younger for that matter, and have a lot of people that would hide you, then you can feel pretty good about how you’ve lived your life.

I know people on the Forbes 400 list whose children would not hide them. “He’s in the attic, he’s in the attic.” Some of them keep compensating by joining board seats or getting honorary degrees, but it doesn’t change the fact that no one will give a damn when they are gone. The most powerful force in the world is unconditional love. To hoard it is a terrible mistake in life. The more you try to give it away, the more you get it back.

Thursday, February 14, 2008

The sum of the first n odd numbers is always a square

I was staring at the checked pattern on the back of an airline seat the other day when I suddenly saw that the sum of the first n odd numbers is always a square. For example,

1 + 3 = 4
1 + 3 + 5 = 9
1 + 3 + 5 + 7 = 16

And, of course, it occurred to me that it would be nice to be able to prove it. There are lots of ways to do that. Firstly, this is just the sum of an arithmetic progression starting at a = 1 with a difference of d = 2. So the standard formula gives us:

sum_odd(n) = n(2a + (n-1)d)/2
= n(2 + (n-1)2)/2
= n(1 + n - 1)
= n^2

So, the sum of the first n odd numbers is n^2.

But using standard formulae is annoying, so how about trying a little induction.

sum_odd(1) = 1

sum_odd(n+1) = sum_odd(n) + (2n + 1)
= n^2 + 2n + 1
= (n+1)^2

But back to the airline seat. Here's what I saw (I added the numbering, Lufthansa isn't kind enough to do that for you :-):

The other thing I noticed was this:

You can view the square as the sum of two simpler progressions (the sum of the first n numbers and the sum of the first n-1 numbers):

1 + 3 + 5 + 7 =
1 + 2 + 3 + 4 +
1 + 2 + 3

And given that we know from Gauss that the sum of the first n numbers is n(n+1)/2 we can easily calculate:

sum_odd(n) = sum(n) + sum(n-1)
= n(n+1)/2 + (n-1)n/2
= (n^2 + n + n^2 - n)/2
= n^2
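None of this replaces the proofs, but a quick brute-force check is reassuring. Here's a short Python sketch (the function names are mine) that confirms both the n^2 result and the split into two triangular sums for the first thousand values of n:

```python
def sum_odd(n):
    """Sum of the first n odd numbers, computed the slow way."""
    return sum(2 * k - 1 for k in range(1, n + 1))

def triangle(n):
    """Gauss's formula for 1 + 2 + ... + n."""
    return n * (n + 1) // 2

for n in range(1, 1001):
    assert sum_odd(n) == n * n
    assert sum_odd(n) == triangle(n) + triangle(n - 1)

print("checked n = 1 .. 1000")
```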

What do you do on long flights?

Wednesday, February 13, 2008

Tonight, I'm going to write myself an Aston Martin

This is the story of my attempt to 'cheat' in an on-line spot-the-ball competition to win an Aston Martin. It's also the story of my failure, but you get free source code that implements automatic detection of image alteration using copy/paste or tools like the Clone Tool in Photoshop.

First, take a look at this photo:

Notice anything strange? In fact this image has been tampered with to cover up a truck. The truck is completely hidden by foliage. Here's the original:

Wouldn't it be nice to be able to detect that automatically? It is possible. Here's an image automatically generated by my code showing what was moved. All of the red was moved to the blue (or the other way around).

I was motivated to work on this program by greed (or at least my never-ending love of having a little flutter on things). Best of the Best runs spot-the-ball competitions in airports to win very expensive cars. But they also run the same competition online. That meant I could get my hands on the actual image used... could I process it to discover where the ball had been removed? (In reality, this isn't the right way to win because the actual ball position is not governed by where it actually was, but where a judge thinks it was).

Would it be cheating if I could? Apparently not, the competition rules say I should use my skill and judgment in determining the ball position. Surely, skill covers my programming ability.

So, I went looking for tampering algorithms and eventually came across Detection of Copy-Move Forgery in Digital Images written by Jessica Fridrich at SUNY Binghamton. The paper describes an algorithm for detecting just the sort of changes I thought I was looking for.

Unfortunately, I know nothing about image processing. Fortunately, the paper is written in a very clear style and a bit of Internet research enabled me to track down the knowledge I didn't have. (Also, thanks to Jessica for sending me the original images she used to test my implementation).

In brief the algorithm does the following:
  1. Slide a 16x16 block across the entire image from the top left-hand corner to the bottom right-hand corner. For each 16x16 block perform a discrete cosine transform (DCT) on it and then quantize the 16x16 block using an expanded version of the standard JPEG quantization matrix.

  2. Each quantized DCT transformed block is stored in a matrix with one row per (x,y) position in the original image (the (x,y) being the upper left hand corner of the 16x16 block being examined).

  3. The resulting matrix is lexicographically sorted and then rows that match in the matrix are identified. For each pair of matching rows (x1,y1) and (x2,y2) the shift vector (x1-x2,y1-y2) (normalized by swapping if necessary so that the first value is +ve) is computed and for each shift vector a count is kept of the number of times it is seen.

  4. Finally the shift vectors with a count > some threshold are examined, the corresponding pair of positions in the image are found and the 16x16 blocks they represent are highlighted.
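The four steps can be sketched in a few lines of Python. This is not my C program and not a faithful copy of the paper: to stay short and dependency-free it assumes a greyscale image held as a list of lists, uses a small block size (b = 4) instead of 16, and divides every DCT coefficient by one flat step q instead of using the expanded JPEG quantization matrix. The names find_copy_move, dct2, b, q and threshold are all made up for this illustration.

```python
import math
from collections import defaultdict

def dct2(block):
    """Naive orthonormal 2-D DCT-II of a square block (list of lists)."""
    n = len(block)
    cos = [[math.cos((2 * x + 1) * u * math.pi / (2 * n)) for x in range(n)]
           for u in range(n)]
    scale = [math.sqrt((1 if u == 0 else 2) / n) for u in range(n)]
    return [[scale[u] * scale[v] * sum(block[x][y] * cos[u][x] * cos[v][y]
                                       for x in range(n) for y in range(n))
             for v in range(n)] for u in range(n)]

def find_copy_move(img, b=4, q=8.0, threshold=4):
    """Return {shift vector: [(pos1, pos2), ...]} for suspicious shifts."""
    h, w = len(img), len(img[0])
    rows = []
    # Steps 1 and 2: one row of quantized DCT coefficients per block position.
    for y in range(h - b + 1):
        for x in range(w - b + 1):
            block = [img[y + j][x:x + b] for j in range(b)]
            key = tuple(round(c / q) for line in dct2(block) for c in line)
            rows.append((key, (x, y)))
    # Step 3: sort lexicographically so identical rows become neighbours,
    # then count the normalized shift vector of each matching pair.
    rows.sort()
    shifts = defaultdict(list)
    for (k1, p1), (k2, p2) in zip(rows, rows[1:]):
        if k1 != k2:
            continue
        dx, dy = p1[0] - p2[0], p1[1] - p2[1]
        if dx < 0 or (dx == 0 and dy < 0):
            dx, dy, p1, p2 = -dx, -dy, p2, p1
        shifts[(dx, dy)].append((p1, p2))
    # Step 4: keep only the shifts seen more often than the threshold.
    return {s: pairs for s, pairs in shifts.items() if len(pairs) > threshold}
```

Here q plays the role of the quality factor (a bigger step makes more blocks look alike) and threshold is the minimum number of blocks that must share a shift before it is reported.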

Here's another picture showing a golfing image that's been touched up to remove something from the grass:

To get access to image data I used the FreeImage library and wrote a small C program that implements Jessica's algorithm. You can download the source here; it's released to you under the GNU GPL.

The program has two key parameters that affect how the image is processed: the quality factor and the threshold.

The quality factor is a number used to 'blur' the image (actually it changes the quantization): the higher the factor the more blurring and hence more 16x16 blocks are likely to seem the same to the algorithm. Increasing the quality factor will tend to increase the false matches.

The threshold is simply the number of blocks that have to appear to have been copied together. This prevents us from seeing a single 16x16 block as evidence of copying. Increasing the threshold means ever larger groups of blocks have to match together before they are flagged as copying.

Back at Best of the Best I grabbed the image for Supercar Competition (SC-272), cut out a section that I thought the ball had to be in (just to speed up processing) and ran the algorithm. After some parameter tweaking the algorithm came up only with what look like false matches to me (along the bar where it's all one color):

And, of course, that's not where the judge thought the ball was. So, I guess I won't be driving home in the V8 Vantage, but what geek needs that when they've got a cool piece of software that detects copy/move forgery in images?

Which leaves me with one question: how are spot-the-ball images generated? Is this an algorithm problem, a problem because they use JPG (which is already transformed) for their images, or are these images generated in some other way?

Tuesday, February 12, 2008

Interface to SQLite database in 23 lines of Arc

One thing that the first release of Arc was missing was access to any sort of database, but that's easily remedied. Here are 23 lines of Arc code that provide access to a SQLite database:

(= db! 'nil)

(def db+ (name (o host "localhost") (o port 49153))
(let (i o) (connect-socket host port)
(db> o name)
(if (db< i) (list i o))))

(def sql ((i o) q)
(db> o q)
(if (db< i) (readall i 200)))

(def db- (db)
(map close db))

(def db> (o s)
(write s o)
(writec #\return o)
(writec #\newline o)
(flush-socket o))

(def db< (i)
(= db! (read i))
(iso db! 200))

The three functions you need to care about are db+ (get a connection to a named SQLite database), db- (close a connection to a database) and sql (execute a SQL query and return a list (or lists) of rows). There's also db! which contains the status of the last command (200 for OK, or 500 followed by a string explaining the error).

Here's a little Arc session creating a database, putting some data in it and then querying it. The database called test didn't exist at the start of this session:

arc> (= db (db+ "test"))
(#<input-port> #<output-port>)
arc> (sql db "create table foo (id integer primary key, text varchar(255))")
arc> (sql db "select * from foo")
arc> (sql db "insert into foo (text) values ('first');")
arc> (sql db "select * from foo")
(("1" "first"))
arc> (sql db "insert into foo (text) values ('something else')")
arc> (sql db "select * from foo")
(("1" "first") ("2" "something else"))
arc> (db- db)

To make this work I had to write a TCP server that wraps SQLite (it's just a small C program that you can get here). The C program listens on a port for connections from your Arc program and handles queries.

I did have to make a small patch to Arc itself (since arc0 doesn't contain any outgoing socket code). My patch adds the ability to make a TCP connection to a remote machine and to flush an output port (add this to your ac.scm):

(xdef 'connect-socket (lambda (host port)
(let-values ([(in out) (tcp-connect host port)]) (list in out))))
(xdef 'flush-socket (lambda (s) (flush-output s)))

(Apologies if I have abused Scheme there, I'm a Scheme n00b)

All this code is released under the same license as Arc itself.

The leakiness of web mail

Many people seem to use web mail systems like Hotmail or Yahoo! Mail as a way of providing anonymity. This is a mistake because all these systems leak the IP address of the machine the user is typing on!


Hotmail

Here is part of the headers of a message that a family member sent me from their Hotmail account:

Received: from mail pickup service by with Microsoft SMTPSVC;
Received: from by with HTTP;
X-Originating-IP: []

This leaks the original IP address twice: once in an X-Originating-IP header and once in the first Received header, which indicates that the message was received from the same IP address using HTTP (i.e. using the web). A quick lookup shows that that IP address is in Birmingham, UK (which I happen to know is correct). So, if they were trying to keep their location secret, they've failed.

A whois lookup on that IP address tells me even more, including the fact that it belongs to Aston University. So, it's easy to conclude that this family member was a student or member of staff at that university.

Yahoo! Mail

Yahoo! Mail leaks in a similar way. Here is part of the headers of a message I received from someone with what looks like a random email address and no name:

Received: from [] by via HTTP;

Geo locating that IP address shows me that the writer is in Tunisia.

Another Yahoo! Mail leak from an old colleague in California lets me track down their home city from their DSL line.

Received: from [] by via HTTP;

AOL Mail

Here are some headers from a message sent from an AOL web mail account. They reveal that the sender is in Germany and appear to give away the name of the company they work for in the DNS name of the machine:

X-MB-Message-Source: WebUI

The X-AOL-IP header gives the IP address of the machine that generated the message (i.e. where the web browser is running) and the helpful X-MB-Message-Source tells us they are using the web interface.


Here's an email I received from the editor of Wired who was using Earthlink:

Nice one! When I get off dialup from the French countryside, I'll blog

Was he really in France?




A search of my own email showed me that X-Originating-IP is a popular leak point (used by Network Solutions and others).
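Checking a message for these leaks is simple to automate. Here's a rough Python sketch using the standard email module; the function name leaked_ips, the sample message, the host name mail.example.com and the address 192.0.2.44 (from a range reserved for documentation) are all invented for illustration, and the regular expression only looks for IPv4 addresses.

```python
import re
from email import message_from_string

# Loose IPv4 pattern: good enough for spotting addresses in headers.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def leaked_ips(raw_message):
    """Return (header name, IP) pairs that point at the sender's machine."""
    msg = message_from_string(raw_message)
    found = []
    # Headers that exist purely to record the originating address.
    for name in ("X-Originating-IP", "X-AOL-IP"):
        for value in msg.get_all(name, []):
            found += [(name, ip) for ip in IPV4.findall(value)]
    # Received headers that record a hand-off over HTTP, i.e. from a browser.
    for value in msg.get_all("Received", []):
        if re.search(r"\bHTTP\b", value):
            found += [("Received", ip) for ip in IPV4.findall(value)]
    return found

SAMPLE = """\
Received: from mail pickup service by mail.example.com with Microsoft SMTPSVC;
Received: from 192.0.2.44 by mail.example.com with HTTP;
X-Originating-IP: [192.0.2.44]
Subject: hello

body
"""

for header, ip in leaked_ips(SAMPLE):
    print(header, ip)
```

Feeding the resulting addresses to a geolocation or whois lookup is the step that turns the leak into a city or an employer.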

Google Mail and Hushmail

Neither Google Mail nor Hushmail appears to leak the IP address. They may include the IP address (for example, in the Message-ID) but it does not appear to be readily discoverable.

Monday, February 11, 2008

My first Arc project: a simple Wiki

The only way to learn a programming language is to write something in it. So, I decided it was time to dig into Arc and my first project is a very simple Wiki.

Here's the source (wiki.arc):

; A wiki written in Arc (arc0)
; Copyright (c) 2008 John Graham-Cumming
; (load "wiki.arc")
; (wsv)
; Then go to http://localhost:8080/show

(load "web.arc")
(load "util.arc")

(= pagedir* "wiki/")

(def histfiles (page)
(sort > (map [coerce _ 'int] (rem [is "current" _] (dir (pagepath page))))))

(def nexthist (page)
(let h (histfiles page)
(if h (++ (car h)) 0)))

(def pagepath (page)
(string pagedir* (page 0) "/" (page 0) (page 1) "/" page ))

(def pagefile (page (o file))
(string (pagepath page) "/" (or file "current")))

(def slurp (page (o file))
(let p (pagefile page file)
(if (file-exists p) (readfile p))))

(def upperlen (word)
(len (keep upper word)))

(def is-wikilink (word)
(if (alphas word)
(if (~is (word 0) (downcase (word 0)))
(>= (upperlen word) 2))))

(mac url-show (page)
`(string "show?p=" ,page))

(mac url-edit (page)
`(string "edit?p=" ,page))

(mac link-show (page text)
`(link ,text (url-show ,page)))

(mac link-edit (page text)
`(link ,text (url-edit ,page)))

(def wikify (word)
(if (is-wikilink word)
(if (file-exists (pagefile word))
(link-show word word)
(pr word)(link-edit word "?"))
(pr word)))

(mac spew-raw (page)
`(spew ,page [pr _ " "]))

(mac spew-wiki (page (o file))
`(spew ,page [wikify (string _)] ,file))

(def spew (page f (o file))
(let p (pagepath page)
(if (dir-exists p)
(map f (flat (map tokens (slurp page file))))
(pr "This page does not yet exist."))))

(def squash (file body)
(writefile1 body file))

(def save-page (req)
(w/$ p
(w/$ t
(squash (pagefile p) t)
(squash (string (pagepath p) "/" (nexthist p)) t))
(url-show p)))

(mac mtime (f)
`(datetime (file-mtime ,f)))

(def last-modified (page)
(let f (pagefile page)
(if (file-exists f)
(pr "Last modified: " (mtime f)))))

(mac show-page (page)
`(whitepage
(tag h1 (link-show ,page ,page))
(spew-wiki ,page)
(br 2)
(last-modified ,page)
(link-edit ,page "[edit]")
(link "[history]" (string "history?p=" ,page))))

(mac edit-page (page)
`(whitepage
(tag h1 (pr (string "Editing " ,page)))
(arform save-page
(textarea "t" 25 80 (spew-raw ,page))
(hidden "p" ,page)
(submit "Save"))
(link-show ,page "[cancel]")
(br 2)))

(def revision (page rev)
(tag li
(pr "Revision: " )
(link rev (string "revision?p=" page "&r=" rev))
(pr " modified " (mtime (string (pagepath page) "/" rev)))))

(mac history-page (page)
`(whitepage
(tag h1 (pr (string "History of " ,page)))
(tag ul (map [revision ,page _] (histfiles ,page)))
(link-show ,page (string "Back to " ,page))))

(mac revision-page (page rev)
`(whitepage
(tag h1 (pr "Showing revision " ,rev " of " ,page))
(spew-wiki ,page ,rev)
(br 2)
(last-modified ,page)
(link-show ,page (string "Back to " ,page))))

(defop show req
(w/$ p
(if p
(show-page ($ "p"))
(show-page "HomePage"))))

(defop edit req
(w/$ p
(ensure-dir (pagepath p))
(edit-page p)))

(defop history req
(history-page ($ "p")))

(defop revision req
(revision-page ($ "p") ($ "r")))

(def wsv ()
(ensure-dir pagedir*)
(asv))

It loads two helpers. The first contains common utilities that aren't really Wiki-related (util.arc):

(def alpha (c)
(or (<= #\a c #\z) (<= #\A c #\Z)))

(def alphas (str)
(is (keep alpha str) str))

(def upper (c)
(is (upcase c) c))

(def datetime ((o time (seconds)))
(let val (tostring
(system (string "date -u -r " time " \"+%Y-%m-%d %H:%M\"")))
(subseq val 0 (- (len val) 1))))

And the second contains enhancements to Arc's web/HTML handling (web.arc):

(mac hidden (name val)
`(gentag input type 'hidden name ,name value ,val))

(mac hr ()
`(gentag hr))

(mac ws ()
`(pr " "))

(mac $ (r)
`(arg req ,r))

(mac w/$ (r . body)
`(with (,r ($ (string ',r))) ,@body))

In web.arc there are a couple of bits of syntax to make accessing form/URL arguments easier: ($ "p") (which gets the value of the argument p) and (w/$ p ...) which sets a variable called p to the value of the argument p and then evaluates the rest of the expression.

All this is released under the same license as Arc. (Since I have never programmed in Arc before, and it's been almost 20 years since I stopped coding in LISP or ML, I'd appreciate constructive comments).

PPP3 (final version) in Java and C

Steve Gibson has released the final version of his PPP system: PPPv3 and so I've updated my code to be compatible.

Two versions of PPPv3 are available:

Both are released, as before, under the BSD license.

Friday, February 08, 2008

The Arc Challenge Explained

When I first looked at the Arc Challenge code my reaction, like that of many people, was WTH? Here's the code:

(defop said req
(aform [w/link (pr "you said: " (arg _ "foo"))
(pr "click here")]
(input "foo")
(submit)))

Within the context of the Arc web/app server this creates a page called /said which has a form on it:

<form method=post action="x">
<input type=hidden name="fnid" value="JtCw8ju328">
<input type=text name="foo" value="" size=10>
<input type=submit value="submit">
</form>

That form accepts a single parameter called foo and redirects to /x.

When clicking submit the user is taken to a page with a single link on it:

<a href="x?fnid=bHJpJ5G1DH">click here</a>

Following that link brings up a page showing what you typed in the first page; here's the output when I typed hello in the form:

you said: hello

So, how does that work?

Firstly, the defop defines an 'operation' (which is just a page within the web server). In this case the page is called said and hence is bound to /said. There's a single argument, called req, which will contain the HTTP request when said is called by the server.

When said is called it uses aform to create an HTML form. To see this more clearly I've removed the clever part (and replaced it with X):

(aform X
(input "foo")
(submit))

So aform creates a form with a simple HTML input with the name foo and a submit button. The clever bit is what happens when the form is submitted.

By default the form submits to the page /x. This is hard-coded in the source of the Arc server. It makes use of a neat feature of the Arc server: fnids. When the form was generated a hidden field was inserted with a unique 'function id' (the fnid). This fnid is used by the /x URL to look up a function to call when /x is activated. (Note that this example uses URLs and hidden form fields for the fnid; there's no reason why it couldn't be stored in a cookie.)

The function called is actually the first argument to aform which has been stored away to be called when necessary. Here's the function definition:

[w/link (pr "you said: " (arg _ "foo")) (pr "click here")]

[ ... _ ... ] is special Arc syntax for a function with a single argument called _. So the first argument to aform is a function definition, and that function is assigned a unique fnid and that fnid can be used to look up that function and call it. The single argument consists of the HTTP request used to activate the function.

The w/link macro creates a page consisting of the words click here linked to another page. The link is, once again, done using a function and fnid. The function called when the link is clicked is:

(pr "you said: " (arg _ "foo"))

w/link's first argument is an expression that will be evaluated within the context of a function (which is entirely hidden inside the server) and used to output the page. It retrieves the foo argument from the HTTP request at the time of the initial POST.

What's neat here is the mapping between functions and fnids so that pages are just functions and the lookup of the right page to go to is handled automatically.
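To make the fnid idea concrete, here is a toy imitation in Python. It is not how the Arc server is written; it is just a sketch of the same trick, with a dictionary standing in for the server's table of stored functions and plain dicts standing in for HTTP requests. Every name in it (fns, new_fnid, said, x, fnid_in) is invented for the example.

```python
import re
import secrets

fns = {}  # fnid -> function stored away until its URL is hit

def new_fnid(fn):
    """Store fn under a fresh random id and return the id."""
    fnid = secrets.token_urlsafe(8)
    fns[fnid] = fn
    return fnid

def said():
    """Render /said: a form whose hidden fnid names the submit handler."""
    def on_submit(req):
        # The closure remembers req (and so foo) from the POST.
        def on_click(_req):
            return "you said: " + req["foo"]
        return '<a href="x?fnid=%s">click here</a>' % new_fnid(on_click)
    return ('<form method=post action="x">'
            '<input type=hidden name="fnid" value="%s">'
            '<input type=text name="foo"><input type=submit></form>'
            % new_fnid(on_submit))

def x(req):
    """Handle /x: look up the fnid and call the stored function."""
    return fns[req["fnid"]](req)

def fnid_in(html):
    """Pull the fnid out of a generated form or link."""
    return re.search(r'(?:fnid=|value=")([\w-]+)', html).group(1)

form = said()
link = x({"fnid": fnid_in(form), "foo": "hello"})
print(x({"fnid": fnid_in(link)}))
```

The second request carries no foo at all; the text survives because the stored function closed over the first request, which is the same reason the Arc version can print it after the click.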

Wednesday, February 06, 2008

A clever, targeted email/web scam with a nasty sting

Steve Kirsch sent me an interesting message he'd received, apparently from a Better Business Bureau email address, containing an apparent complaint from a customer submitted through the BBB. The email itself was actually sent from a BellSouth ADSL line (i.e. almost certainly a zombie machine). That machine's address was not authorized to send mail for the BBB's domain according to the BBB's SPF records.
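For anyone who hasn't looked inside an SPF record, the check boils down to asking whether the sending IP address is on the domain's published list. Here's a deliberately simplified Python sketch: it only understands the ip4, ip6 and all mechanisms (a real evaluator also handles a, mx, include, redirect and needs DNS lookups), and the record and addresses in the usage lines are made up from documentation ranges, not taken from the BBB's actual DNS.

```python
import ipaddress

def spf_ip_check(record, sender_ip):
    """Evaluate the ip4/ip6/all terms of an SPF record for one sender IP.

    Returns 'pass', 'fail', 'softfail' or 'neutral'.
    """
    ip = ipaddress.ip_address(sender_ip)
    qualifiers = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}
    for term in record.split()[1:]:          # skip the leading "v=spf1"
        result = "pass"                      # default qualifier is "+"
        if term[0] in qualifiers:
            result, term = qualifiers[term[0]], term[1:]
        if term == "all":
            return result
        if term.startswith(("ip4:", "ip6:")):
            net = ipaddress.ip_network(term[4:], strict=False)
            if ip.version == net.version and ip in net:
                return result
    return "neutral"

# Hypothetical record: only 192.0.2.0/24 may send mail for the domain.
record = "v=spf1 ip4:192.0.2.0/24 -all"
print(spf_ip_check(record, "192.0.2.10"))
print(spf_ip_check(record, "203.0.113.7"))
```

A message from a random ADSL line falls through to the final -all term, which is the "not authorized" result described above.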

But the content of the email message is very interesting. Here's a screenshot:

Notice how the email contains the correct address for Steve, his name and the name of his company and thus appears to be a real complaint. The link below the complaint, where you can get full details, is the first of two nasty stings in this message.

The actual URL points at the BBB's own web site (making it seem even more likely that this is a genuine message). The link manipulates the search option on the BBB web site, using the lnk parameter to perform a redirect to a second site, which in turn redirects to a third. It's on that last, presumably hacked, site that the real scam starts.

If you are not using Microsoft Internet Explorer you'll be presented with the following web page:

Once you've upgraded you get told that the web site requires the "Adobe Acrobat ActiveX" control and you need to install it.

The control itself is embedded using the following code:

<object classid="clsid:D68E2896-9FD9-4b70-A9AE-CCDF0C321C45" height="0" width="0" codebase=""></object>

Notice how instead of pointing to Adobe's web site to get the control, the codebase points at a file served locally. So when you follow the instructions you download and install an ActiveX control from the scammer web site.

Once you've done that you get told that in fact the customer has withdrawn their complaint and there's nothing to worry about:

Now for the second sting. There must be something about this ActiveX control that's malicious... the scammer didn't go to all that trouble for nothing. But none of the current anti-virus programs report any problems with the file.

For example, my Sophos anti-virus says nothing, and online scanners such as Kaspersky's say that it's clean:

So, perhaps the file really is clean, but I suspect that this is a new threat which isn't currently detected by anti-virus. I'll post again when I get a response from Sophos' anti-virus brainiacs. Perhaps, I'm wrong but be very wary of these mails.

There's further information about BBB-related scams on the BBB's web site.

UPDATE: McAfee WebImmune tells me that this is a new detection of spyware which steals information about your web surfing.

UPDATE: A scan using VirusTotal shows that very few anti-virus programs are detecting this (although their version of Kaspersky is finding it---curious that the online Kaspersky scanner does not).

Tuesday, February 05, 2008

The worst designed telephone in the world

OK, that might be an exaggeration, but the Doro Matra 5035 suffers from a bang-your-head-against-the-wall design fault. Here's the phone:

The phone includes an answering machine which has two indicators: a flashing red light and a small LCD US-style letter box symbol. Here's what the indicator looks like:

Great, very clear. Except... it's on the back of the phone and when the phone is in its cradle it is completely hidden from view. So hidden, in fact, that even in a darkened room the red light is invisible.

Which means you have to pick the phone up every time to see if someone left a message. As for the LCD message waiting indicator, it's minuscule, there's no LCD backlight and the LCD is deeply recessed in the phone, meaning that even ambient light is blocked from making it easier to see the indicator. In fact, the LCD is so useless that you might as well pick up the phone to see the flashing indicator on the back.

Got a dumb product in your home? Write to me about it.

Friday, February 01, 2008

The Digg Heat Map

After my post about the number of registered Digg users got picked up by bloggers (Cliff Notes: 2.7m registered users, about 19% have been banned) I took a look at the information available through the Digg API.

The API has an endpoint to get user information (in chunks of up to 100 users) and that user information includes the date and time of registration to the nearest second (along with the user's name and icon and number of profile views).

So I wrote up a little script and used it to pull down information on 100,000 Digg users so that I could look at the pattern in registration times. I was specifically looking to see if there was evidence that Digg's audience is based in a certain geography, or that the geography had changed over time.

To make sure that I had a good spread of data I plotted the chart of signups over time for comparison with the chart I generated for my previous blog post. Comparing the two it looks like I have good coverage of the life of Digg:

I then plotted the registration times from the Digg API on a simple heat map: the x-axis is the hour during the day that the user registered (US West Coast time) and the y-axis is the month and year. Here's the map:

The brighter the red, the more people signed up during that hour. Note that brightness is calculated on a per-row basis so that although the number of users has increased each row is considered on the same scale (0-255).
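The per-row scaling is the only slightly subtle part, so here's a minimal Python sketch of it. It is not the script I used; it assumes the registration times have already been converted to US West Coast time and are held as datetime objects, and the name heat_rows and the sample data are invented for the example.

```python
from collections import Counter
from datetime import datetime

def heat_rows(timestamps):
    """Map 'YYYY-MM' to 24 brightness values (0-255), scaled per row."""
    counts = Counter((t.strftime("%Y-%m"), t.hour) for t in timestamps)
    rows = {}
    for month in sorted({m for m, _ in counts}):
        row = [counts[(month, h)] for h in range(24)]
        peak = max(row)  # at least 1, because the month has a signup
        rows[month] = [round(255 * c / peak) for c in row]
    return rows

# Made-up signups: a quiet early month and a much busier later one.
sample = ([datetime(2005, 1, 3, 9)] * 4 + [datetime(2005, 1, 4, 10)]
          + [datetime(2007, 6, 1, 7)] * 50)
for month, row in heat_rows(sample).items():
    print(month, row)
```

Because each row is divided by its own peak, the busiest hour of every month comes out at 255 whether that month saw five signups or fifty thousand.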

In 2004 Digg was really only just starting and the early developers were clearly not morning people :-)

It's very obvious looking at this that Digg's audience has not changed much geographically since 2005. There's a strong band of signups between 0600 and 1500 US West Coast time. That indicates to me that Digg's audience is overwhelmingly US based.

If you consider the 4 US time zones, and imagine people are working between 0900 and 1800 (most web surfing is done at work) then it's easy to draw a little chart that shows when you'd expect the peak to be. The following chart shows the number of US time zones that are working relative to US West Coast time:

That corresponds very nicely to the hot zone of Digg registrations.
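The working-hours count behind that comparison is simple enough to reproduce. This Python sketch assumes the four zones are Pacific, Mountain, Central and Eastern (0 to 3 hours ahead of the West Coast) and treats the working day as starting at 0900 and stopping just before 1800; the function name is my own.

```python
def zones_working(hour_pacific, start=9, end=18):
    """How many of the 4 US time zones are at work at this Pacific hour."""
    offsets = (0, 1, 2, 3)  # Pacific, Mountain, Central, Eastern
    return sum(start <= (hour_pacific + off) % 24 < end for off in offsets)

for hour in range(24):
    n = zones_working(hour)
    print("%02d00 %s" % (hour, "#" * n))
```

Under those assumptions the count is zero until the East Coast starts work at 0600 Pacific, sits at all four zones from 0900 until the East Coast goes home at 1500, and then tails off.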

It's also obvious the US registrations vastly outweigh UK and European. The UK is 8 hours ahead of the US West Coast, and most of Europe is 9. So at midnight in California it's 0800 in the UK and 0900 in Europe. Looking at the blackness of night in the heat chart indicates that there aren't that many European Digg users.

So, my summary is that I think Digg remains mostly a US phenomenon with most of the users signing up while at work.

Thank you Digg for providing such a nice API. And if anyone else has suggestions for data I can poke at, please email me.