Monday, January 21, 2008

Proof that the sum of the squares of the first n whole numbers is n^3/3 + n^2/2 + n/6

A recent thread on Hacker News that I started with a flippant comment turned into a little mathematical puzzle.

What's the sum of the square of the first n whole numbers?

It's well known that the sum of the first n whole numbers is n(n+1)/2. But what's the value of sum(i=1..n) n^2? (I'll call this number S for the remainder of this post).

It turns out that it's easy to prove that S = n^3/3 + n^2/2 + n/6 by induction. But how is the formula derived? To help with reasoning here's a little picture of the first 4 squares stacked up one on top of the other:



If we fill in the blank squares to make a rectangle we have the basis of a derivation of the formula:



Looking at the formerly blank squares (that I've numbered to assist with the thinking) we can see that the columns have 1 then 1+2 then 1+2+3 and finally 1+2+3+4 squares. Thus the columns are sums of consecutive whole numbers (for which we already have the n(n+1)/2 formula.

Now the total rectangle is n+1 squares wide (in this case 5) and its height is the final sum of whole numbers up to n or n(n+1)/2 (in the image it's 4 x 5 / 2 = 10. So the total number of squares in the rectangle is (n+1)n(n+1)/2 (in the example that's 5 x 10 = 50).

So we can calculate S as the total rectangle minus the formerly blank squares which gives:

S = (n+1)n(n+1)/2 - sum(i=1..n)sum(j=1..i) j
= (n(n+1)^2)/2 - sum(i=1..n) i(i+1)/2
2S = n(n+1)^2 - sum(i=1..n) i(i+1)
= n(n+1)^2 - sum(i=1..n) i^2 - sum(i=1..n) i
= n(n+1)^2 - S - n(n+1)/2
3S = n(n+1)^2 - n(n+1)/2
= n(n+1)( n+1 - 1/2 )
= n(n+1)(n+1/2)
= (n^2+n)(n+1/2)
= n^3 + n^2/2 + n^2 + n/2
= n^3 + 3n^2/2 + n/2
S = n^3/3 + n^2/2 + n/6

Thursday, January 17, 2008

Another use of POPFile: detecting weakly encrypted email

Almost all users use POPFile as a spam filter, most of them also use the fact that POPFile can sort in arbitrary categories of mail. However, some people have pushed POPFile even further... Martin Overton (of IBM) has used POPFile to discover email borne malware, even finding that POPFile could automatically detect mutations. Now, some researchers in Japan have used POPFile to detect weak encryption of email with 80% accuracy.

The researchers were building a system to detect improper sending of personal information by email. Their system first checked for the use of strong encryption (if the mail is strongly encrypted then there's no need to worry about eavesdropping), the system also checked for things like telephone numbers, email addresses and other personal data in non-encrypted mail.

But they also wanted a system to detect poor encryption (such as ROT-13), and for that they turned to POPFile. After a mere 30 emails had been trained in POPFile it was able to distinguish plain text messages from those encrypted with weak ciphers.

Some details in their paper.