Tuesday, January 29, 2008

How many users does Digg have?

According to my calculations: around 2.7m registered users.

I obtained this number by sampling random Digg users and extracting their user ids. The user id is stored in a hidden HTML form input field on each Digg user's page, and the same page gives the date the user registered. Using these two pieces of information I was able to plot the user ids month by month, from December 2004 (when Kevin Rose registered) up to this month.

Here's a picture:

Of course, the user id might be generated in some other way, but I suspect that it's an auto_increment field in their database. To validate my data I checked the Digg blog: in March 2007 Kevin Rose wrote that Digg had 1 million users. This ties in directly with my data, and hence I think I'm correct.

If this data is correct, then over the last 6 months Digg has been growing at a rate of around 110,000 users per month.

Now that doesn't say how many are active, but it's interesting data nonetheless.
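A minimal sketch of the estimate, assuming (as above) that user ids come from an auto_increment column: group the scraped (registration year, month, user id) samples by month, keep the largest id seen in each month as a lower bound on the accounts created by then, and average the increase over the last six months. The function names are illustrative only.

```python
from collections import defaultdict


def month_index(year, month):
    """Turn (year, month) into one integer so that months can be subtracted."""
    return year * 12 + (month - 1)


def max_id_by_month(samples):
    """Map (year, month) -> largest user id seen among users registered that month.

    samples is an iterable of (year, month, user_id) tuples taken from user pages.
    If ids are auto_increment, the largest id in a month is a lower bound on the
    number of accounts created by the end of that month.
    """
    best = defaultdict(int)
    for year, month, user_id in samples:
        best[(year, month)] = max(best[(year, month)], user_id)
    return dict(sorted(best.items()))


def monthly_growth(max_ids, months=6):
    """Average number of new ids per month over the last `months` months."""
    latest = max(max_ids)  # tuples compare by year, then month
    earlier_index = month_index(*latest) - months
    earlier = (earlier_index // 12, earlier_index % 12 + 1)
    if earlier not in max_ids:
        raise KeyError(f"no samples for {earlier}")
    return (max_ids[latest] - max_ids[earlier]) / months
```

The largest id across all samples is then the estimate of total registered accounts; it counts every account ever created, active or not.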


1. The inflection point at June 2006 corresponds to the launch of Digg v3, which marks the point at which Digg stopped being just about technology and added categories like World, Business and Entertainment.

2. Kevin Rose was on the cover of Business Week in August 2006.

3. Another calibration point is this post listing the number of Digg users as around 700,000 in December 2006. That also matches my data.

4. There's a big jump in users in December 2006, when Digg added multimedia features like videos.

5. The May 2007 jump corresponds to the AACS key "controversy".

6. Yet more confirmation that these figures are correct: TechCrunch reports that Digg had 200,000 users in March 2006.

7. "Kurt" pointed out that the Digg API gives a user count of 2,211,964. That's interesting because it gives us a way to estimate the number of spammer/abuser accounts banned by Digg: about 500k, or around 19% of registered users.

Monday, January 21, 2008

Proof that the sum of the squares of the first n whole numbers is n^3/3 + n^2/2 + n/6

A recent thread on Hacker News that I started with a flippant comment turned into a little mathematical puzzle.

What's the sum of the squares of the first n whole numbers?

It's well known that the sum of the first n whole numbers is n(n+1)/2. But what's the value of sum(i=1..n) i^2? (I'll call this number S for the remainder of this post.)

It turns out that it's easy to prove that S = n^3/3 + n^2/2 + n/6 by induction. But how is the formula derived in the first place? To help with the reasoning, here's a little picture of the first 4 squares stacked one on top of the other:

If we fill in the blank squares to make a rectangle we have the basis of a derivation of the formula:

Looking at the formerly blank squares (which I've numbered to help with the thinking), we can see that the columns contain 1, then 1+2, then 1+2+3 and finally 1+2+3+4 squares. Thus the columns are sums of consecutive whole numbers (for which we already have the n(n+1)/2 formula).

Now the total rectangle is n+1 squares wide (in this case 5) and its height is the sum of the whole numbers up to n, i.e. n(n+1)/2 (in the image it's 4 x 5 / 2 = 10). So the total number of squares in the rectangle is (n+1)n(n+1)/2 (in the example that's 5 x 10 = 50).
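A quick brute-force check of the rectangle argument, as a sketch separate from the derivation itself: for each n, count the squares in the filled-in rectangle, subtract the formerly blank columns, and compare the result with S computed directly and with the closed form n^3/3 + n^2/2 + n/6 (evaluated with exact fractions so no rounding creeps in).

```python
from fractions import Fraction


def triangular(k):
    """1 + 2 + ... + k, using the known n(n+1)/2 formula."""
    return k * (k + 1) // 2


def rectangle_count(n):
    """The filled-in rectangle: n+1 columns, each n(n+1)/2 squares high."""
    return (n + 1) * triangular(n)


def blank_count(n):
    """The formerly blank squares: columns of 1, 1+2, ..., 1+2+...+n."""
    return sum(triangular(i) for i in range(1, n + 1))


def sum_of_squares(n):
    """S computed directly as 1^2 + 2^2 + ... + n^2."""
    return sum(i * i for i in range(1, n + 1))


def closed_form(n):
    """n^3/3 + n^2/2 + n/6 in exact rational arithmetic."""
    return Fraction(n**3, 3) + Fraction(n**2, 2) + Fraction(n, 6)


# The n = 4 example from the picture: 50 - 20 = 30.
print(rectangle_count(4), blank_count(4), sum_of_squares(4))  # → 50 20 30

for n in range(1, 200):
    assert rectangle_count(n) - blank_count(n) == sum_of_squares(n) == closed_form(n)
```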

So we can calculate S as the total rectangle minus the formerly blank squares which gives:

S = (n+1)n(n+1)/2 - sum(i=1..n)sum(j=1..i) j
= (n(n+1)^2)/2 - sum(i=1..n) i(i+1)/2
2S = n(n+1)^2 - sum(i=1..n) i(i+1)
= n(n+1)^2 - sum(i=1..n) i^2 - sum(i=1..n) i
= n(n+1)^2 - S - n(n+1)/2
3S = n(n+1)^2 - n(n+1)/2
= n(n+1)( n+1 - 1/2 )
= n(n+1)(n+1/2)
= (n^2+n)(n+1/2)
= n^3 + n^2/2 + n^2 + n/2
= n^3 + 3n^2/2 + n/2
S = n^3/3 + n^2/2 + n/6
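For completeness, here is the induction proof mentioned at the start, written out as a sketch. S(n) is just S with its dependence on n made explicit. The base case is n = 1, where 1/3 + 1/2 + 1/6 = 1 = 1^2. For the inductive step, assume S(n) = n^3/3 + n^2/2 + n/6 and expand the formula at n+1:

```latex
\begin{aligned}
\frac{(n+1)^3}{3} + \frac{(n+1)^2}{2} + \frac{n+1}{6}
  &= \frac{n^3 + 3n^2 + 3n + 1}{3} + \frac{n^2 + 2n + 1}{2} + \frac{n + 1}{6} \\
  &= \left(\frac{n^3}{3} + \frac{n^2}{2} + \frac{n}{6}\right) + n^2 + 2n + 1 \\
  &= S(n) + (n+1)^2 \\
  &= S(n+1)
\end{aligned}
```

The constant terms 1/3 + 1/2 + 1/6 collapse to 1, which is what turns the leftovers into exactly (n+1)^2.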

Thursday, January 17, 2008

Another use of POPFile: detecting weakly encrypted email

Almost all users use POPFile as a spam filter, and most of them also take advantage of the fact that POPFile can sort mail into arbitrary categories. However, some people have pushed POPFile even further: Martin Overton (of IBM) has used POPFile to discover email-borne malware, even finding that POPFile could automatically detect mutations. Now some researchers in Japan have used POPFile to detect weak encryption of email with 80% accuracy.

The researchers were building a system to detect improper sending of personal information by email. Their system first checked whether the mail used strong encryption (if the mail is strongly encrypted then there's no need to worry about eavesdropping). It also checked non-encrypted mail for things like telephone numbers, email addresses and other personal data.
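The post doesn't say how those checks were implemented, so the following is only a hypothetical sketch of the shape of such a filter: treat PGP-armored mail as strongly encrypted and skip it, otherwise scan the body with simple regular expressions for email addresses and hyphenated phone numbers. The patterns, the PGP-only test and the function name are all assumptions; real personal-data detection needs far more than two regexes.

```python
import re

# Hypothetical detectors: not the researchers' actual rules.
PGP_ARMOR = "-----BEGIN PGP MESSAGE-----"
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\b\d{2,4}-\d{2,4}-\d{3,4}\b")  # e.g. 03-1234-5678


def personal_data_findings(body):
    """Return (kind, text) pairs for personal data found in an outgoing mail body.

    Mail that looks strongly encrypted (here: PGP-armored) is skipped, since
    eavesdropping is not a concern for it.
    """
    if PGP_ARMOR in body:
        return []
    findings = [("email", m) for m in EMAIL_RE.findall(body)]
    findings += [("phone", m) for m in PHONE_RE.findall(body)]
    return findings
```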

But they also wanted to detect poor encryption (such as ROT-13), and for that they turned to POPFile. After being trained on a mere 30 emails, POPFile was able to distinguish plain text messages from those encrypted with weak ciphers.
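POPFile is a naive Bayes classifier, and a toy version shows why this works: ROT-13 turns ordinary words into tokens that never appear in plain text, so a category trained on enciphered mail has a completely different vocabulary. This is a sketch assuming whitespace tokens and add-one smoothing, not POPFile's own code, and the training sentences are made up.

```python
import codecs
import math
from collections import Counter


def rot13(text):
    """A classic weak cipher: rotate each letter 13 places."""
    return codecs.encode(text, "rot_13")


def tokenize(text):
    return text.lower().split()


class NaiveBayes:
    """Word-level naive Bayes with add-one smoothing (a toy, not POPFile)."""

    def __init__(self):
        self.words = {}        # category -> Counter of word occurrences
        self.docs = Counter()  # category -> number of training messages
        self.vocab = set()

    def train(self, text, category):
        tokens = tokenize(text)
        self.words.setdefault(category, Counter()).update(tokens)
        self.docs[category] += 1
        self.vocab.update(tokens)

    def classify(self, text):
        total_docs = sum(self.docs.values())
        scores = {}
        for category, counts in self.words.items():
            total_words = sum(counts.values())
            # log prior plus log likelihood of each token, smoothed
            score = math.log(self.docs[category] / total_docs)
            for token in tokenize(text):
                score += math.log((counts[token] + 1) / (total_words + len(self.vocab)))
            scores[category] = score
        return max(scores, key=scores.get)


plain = [
    "please send the quarterly report to the finance team",
    "the meeting with the client is on friday afternoon",
    "call me when you get this message",
]
clf = NaiveBayes()
for message in plain:
    clf.train(message, "plain")
    clf.train(rot13(message), "weak")

print(clf.classify("send the meeting report"))         # → plain
print(clf.classify(rot13("send the meeting report")))  # → weak
```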

Some details in their paper.

Monday, January 07, 2008

First release of my 'shimmer' project

A couple of months ago I blogged about a system for opening and closing ports based on a cryptographic algorithm that makes it hard for an attacker to guess the right port. It's a sort of port knocking scheme that I called C3PO.

Many commenters, by email, on the blog and in other forums, asked me to open source the code. I couldn't do that because the code was a nasty hack put together for my own machine, but I've gone one better.

Today I'm releasing the first version of shimmer. shimmer is completely new, GPL-licensed code, implementing my original idea. Read more about it on the site.

Hit the right port and you're in; hit the wrong one and you're blacklisted. Ports change every 60 seconds.
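The post doesn't describe shimmer's algorithm, so this is only a guess at the general shape of such a scheme: client and server share a secret and derive the current port from an HMAC of the 60-second time window, and any IP that knocks on the wrong port is blacklisted. The function names, the port range and the choice of HMAC-SHA256 are all assumptions.

```python
import hashlib
import hmac
import struct

PERIOD = 60                        # ports change every 60 seconds (from the post)
PORT_MIN, PORT_MAX = 10000, 65535  # hypothetical port range


def port_for(secret: bytes, now: float) -> int:
    """Derive the open port for the 60-second window containing `now`.

    HMAC-SHA256 over the window number is an assumption, not shimmer's method.
    """
    window = int(now // PERIOD)
    digest = hmac.new(secret, struct.pack(">Q", window), hashlib.sha256).digest()
    value = int.from_bytes(digest[:4], "big")
    return PORT_MIN + value % (PORT_MAX - PORT_MIN + 1)


def handle_knock(secret: bytes, ip: str, port: int, now: float, blacklist: set) -> bool:
    """Right port: let the client in. Wrong port: blacklist the client's IP."""
    if ip in blacklist:
        return False
    if port == port_for(secret, now):
        return True
    blacklist.add(ip)
    return False
```

A real implementation would also have to cope with clock skew between client and server, for example by also accepting the port for the previous window.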