Friday, July 28, 2006

Unbanned from Digg

After two emails to Digg's abuse email address, and an intervention by Leo Laporte (thanks Leo!) direct to Kevin Rose, my Digg account has been unbanned.

Here's the official word I got from Digg:

Your account has been unbanned. Your account was banned for violating digg Terms Of Use, submitting the exact same story in less then 3 minutes time, that's what spammers usually do on digg. As a consequence, we banned your account. Your account was NOT banned for linking to or for submitting a joke.

-The Digg Watch Team.

So I'm letting this go now. They say they were OK with the joke, and just thought (erroneously) that I was spamming. I never believed that it was because of some reddit/digg rivalry.

So, it's over.

But a note to Digg: please alter the way Bury Story works so that it's obvious when a story has been buried, that the reason it was buried is clear, and that the list of people who buried the story is given (just like you do for who dugg a story). Just doing that one thing would end a lot of confusion.

Update on August 1, 2006: I just came across a blog entry that makes some untrue claims about me concerning this:

1. The problem was, then he submitted the story multiple times. Actually, I submitted the same story twice, not multiple times and as I've explained this was because I screwed up the URL on the Reddit side and had to start again. You can check for yourself how many times I submitted the story my looking at my Digg account submission history.

You can also see that the two stories have different URLs, and that one story has just one digg. That's partly because I buried that story myself once I realized I had screwed up.

2. and then created multiple fake accounts and dugg his own stories. Once I discovered that my account was banned I did two things: I emailed [email protected] asking why and I created a new account for myself. So multiple here is 1, and I only did that because Digg killed my regular account.

Thursday, July 27, 2006

Badges of Honor

So after yesterday's little recursion hack I've received a few badges of honor that I'm very proud of:
  1. I've been banned from Digg and the Digg folks don't have the decency to answer my polite email asking for an explanation of my banning. At this point, I consider this to be a badge of honor: if a technical web site bans you for submitting a clever link that demonstrates a well known programming paradigm and harms no one, and the founder of the web site claims to be such a cool hax0r, then your submission revealed something very important: that site is run by fools.

  2. The reddit folks honored me with a special reddit logo for the day celebrating my never ending recursion between Digg and reddit. I'm probably violating reddit's terms of service by publishing this, but here's a copy of the logo:

  3. My story was also #1 on reddit yesterday and so I've been awarded a Golden Reddit in recognition. Cool, thanks guys!

  4. And I guess the reddit guys really liked it because they've offered me a free shirt from their store.
So, it seems like the reddit people and community have a sense of humor, and the Digg people don't. Not only did their staff ban me, but it looks like part of their community buried the story and since Digg has not feedback on when stories are buried it just magically disappeared.

Wednesday, July 26, 2006

Sense of humor failure at Digg

This afternoon I decided to play a little joke on Digg and Reddit by submitting a story about recursion that pointed from the Reddit story to the Digg story and back. Since Digg URLs are totally predictable it was possible to set this up by writing the Reddit story first (and they don't verify that URLs work) and then posting the correct Digg story with the Reddit link (which I couldn't predict).

It's a classic programmer joke (it's even in the Digg Programming section).

And for that my Digg account was cancelled without warning.

You can start the recursion at Digg here.

But Digg folks, get a life, ok? It was a joke, a classic programming joke. And this from a guy who's a 'dark tipper' (scary music).

A well informed reader writes to me:
Your amusing recursion hack has just had a practical consequence:
you inadvertantly caused the smoking gun proving Digg censors
stories on the front page. Your link actually made the digg
front page, according to the rss feed used by sites like popurls:

but it's been censored from the actual front page.

This also shows an unexpected consequence of feeds: you can't
get away with censoring your site, unless you have some kind of
delay in the feed.
So I looked into this and sure enough the story is buried. If you try searching on Digg for 'reddit recursion' you get no results, but if you look at the main RSS feed my story is there as #2 on the front page. Yet on the actual page the story is not there.

In the RSS feed it appears between "Bug or Conspiracy? Digg users go CRAZY modding down comments!" and "Metallica finally joins iTMS". I'll email anyone a copy of the RSS feed showing my story.

And finally the number of stories promoted to the front page for user jgrahamc (my now dead account) just dumped from 1 to 2... I guess my silly recursion story should be on the front page but it isn't. This appears to be because members of the Digg community buried the story because they didn't like it, but Digg has no mechanism for showing when a story is buried.

Tuesday, July 25, 2006

The Open Source Elephant in the room: the source sucks

Firstly, since I'm about to slam Free and Open Source Software let me point out that I am not a Microsoft fan boy, that I use Firefox, Thunderbird, Ubuntu, EMACS, all the GNU tools, the GIMP, Apache, TRAC, OpenOffice, etc. on a daily basis. I live and work on FOSS.

But there's a real problem with most FOSS: for something that prides itself on the source being readable by everyone, and even cames up with 'laws' like 'given enough eyeballs all bugs are shallow', the actual source code of most FOSS is horrible, unreadable, garbage. Actually, I wonder if 'Linus' Law' shouldn't actually be something like 'Linus' Necessity': given that the source is so horrible we need lots of people so that one of them will be able to figure out what the hell it was we wrote.

When I started my well-known open source project I decided that I'd better make the code readable for two reasons: firstly, I was sure that I wouldn't get to work on it often so I'd have to come back and read old code and comments and other coding standards would make that easier; secondly, I was sure that other people were going to read my code.

The second thing turned out to be really important for two reasons: firstly, other people were able to read my code and contribute and I kept them to a similar coding standard and style and hence the code is (reasonably, I'm not claiming I'm perfect) readable. Perhaps more importantly one day I was being interviewed for a job and the interviewer said: "Yes, we've all read your code". They'd downloaded my project and checked me out. (I got the job).

Now, I'm not trying to slam all FOSS here and for the purposes of this entry I have not examined some of the most famous projects (e.g. Linux kernel, Apache, Firefox, ...), but I decided to take a look at the top 50 most downloaded projects of all time on SourceForge.

Then I would pick at random two source files (each source file had to be fairly large, i.e. more than 100 lines of code) and score them being as generous as possible using the following categories and assigned a score to each. I weighted the scores heavily towards doing simple things that have a high benefit (for example, describing the purpose of a file of function):
  • File Description (FD): did the file I open have some sort of description (near the top) of what the purpose of the file was for. I wasn't asking for a detailed explanation, but just a little helper so that a new reader could get going on the purpose. Score: +5 (if present), -5 (if not)
  • Function/Interface Description (FID): did any of the functions, or interfaces, in the file have a description. I would have liked to have seen all the arguments specified and return codes and caveats explained, but I was extremely generous: even if one function had a little header with a minimal description of the function it got into this category. Score: +5 (if present), -5 (if not)
  • Useful Comments (UC): did the file contain at least one useful comment. A useful comment points out something that isn't obvious to the reader, or some trap for the unwary. Score: +1 (if present), -1 (if not)
  • Stupid Comments (SC): did the file contain at least one stupid comment like 'increment i' or 'loop through records'. Score: -1 (if present), +1 (if not)
  • Understandable (U): did I feel like I would be able to understand most of the code given 30 minutes of reading the file and browsing the rest of the source. This was very subjective, but was used to take into account things like clearly named functions, or really well named member variables. Score: +5 (if understandable), -5 (if not)
  • Commented out code (COC): people we have source code control systems. Don't // out your code, or #if 0 it. ok? Score: -1 (if present), +1 (if not)
  • Bonus (B): I had a special bonus category which I could hand out if I felt like it. A positive score here was for particularly well documented, and written code, neutral for most code and negative for really hideous stuff. Score: +10 (loved it), -10 (yuck), 0 (in general)
Of the top 50 projects one (XAMPP) was excluded because it's a distribution of other code and not new code.

What I found was not a pretty picture:
  • 65% don't bother describing even in the most minimal way even one of the functions I saw
  • 60% of the projects don't bother with describing the purpose of a file
  • 59% of the projects scored negatively using my system
  • 53% contained useless comments
  • 40% looked incomprehensible to me without major effort
  • 33% contained commented out or #if 0 code
There was one bright spot: 85% contained at least one useful comment. But given that my percentages underestimate the problems (because I was very generous) these figures are horrible.

The best projects were (in order of score): GNUWin32 (thanks GNU Project!), GTK+ and The GIMP installers for Windows, NASA World Wind, Ghostscript, WINE, Miranda, MinGW (thanks GNU Project!), Erases, and DC++.

Come on FOSS people. Have some pride in your work! Remember, writing some decent comments is a gift you are given to people who read your code, and to yourself.

(Note that if you are the author of one of the projects above it's possile that I made a mistake and just happened to pick the wrong files to read. Send me examples of how great your code is and I'll publish a rebuttal here).

Here's a table with all the data:

Azureus -1-111110-62
BitTorrent -1-1-1-1-1-10-143
DC++ 11111-10164
Ares Galaxy 1-11-1-1-10-25
CDex -1-111110-66
VirtualDub -1-11-1-1-10-127
Shareaza -1-111110-68
eMule Plus 1-11-111069
GTK+ and The GIMP installers for Windows 111-11-112810
7-Zip -1-11-1-1-10-1211
FileZilla -1-1-1-1-1-10-1412
guliverkli 1-1-11-1-10-613
Gaim 1-11-11-10814
Audacity 1-11-1-1-10-215
phpBB -1-1111-10-416
ZSNES -1-11-1110-417
TightVNC -1-111110-618
phpMyAdmin 1111-1-10619
NASA World Wind 111-11-101820
ABC [Yet Another Bittorrent Client] -11111-10621
Dev-C++ -1-11-11-10-222
aMSN -1111110423
ffdshow -1-1-1-1-1-1-1-2424
WinSCP -1-11-1-1-10-1225 111-1-1-10826
AC3Filter -1-111-1-10-1427
Ghostscript 11111-101628
[email protected] -1-111-110-1629
Webmin 11-1-1-1-10630
PDFCreator -1-111-1-10-1431
VisualBoyAdvance -1-11-1-110-1432
MinGW - Minimalist GNU for Windows 11111-101633
eMule Morph -1-111-110-1634
GnuWin32 111-11-112835
The CvsGui project 11-111-101436
FlasKMPEG -1-11-1110-437
VirtualDubMod -1-11-1-110-1438
Gallery 1-11-1110639
DOSBox DOS Emulator -1-1111-10-441
Miranda 11111-101642
DScaler Deinterlacer/Scaler -11111111443
Celestia -1-11-11-10-244
PeerGuardian -11111-10645
XOOPS Dynamic Web CMS -1-11-11-10-246
Eraser 111-11101647
Wine Is Not an Emulator 11111-101649
burst! -1-1-1-1-1-10-1450

Monday, July 10, 2006

A simple code for entering latitude and longitude to GPS devices

This post proposes a coding system for entering any location on earth with 10m of accuracy using a 10 character code that includes features to prevent errors in entering the code.

The idea is that any one could publish their location by writing something like VUF DDC F8UG. This short code could be entered into a GPS device giving you any spot on the globe.

I'm calling it the SOC: Simple Orientation Code.

Some example uses:
  1. I could print my company's SOC on my business cards and visitors could punch it into their car navigation system and come visit
  2. A restaurant could publish its SOC along with its phone number (after all it's the same length as a phone number so it's something people can easily grok) making the restaurant easy to find
  3. Geocachers could publish SOC trails for people hunting down caches
  4. SCUBA divers could refer to dive sites by their SOC (10m of accuracy is enough surface accuracy for most people)
Here's how the code works.

First you need the latitude and longitude of the location you are talking about to 4 decimal places of accuracy. 4 decimal places gives about 10m of accuracy. So treating latitude as ranging from 0 to 180 degrees (basically change it from -90 to 90 degrees by adding 90) and longitude as from 0 to 360 degrees (ignoring east/west or +/- values) and then treating the two numbers as integers (i.e. take the 4 decimal place latitude or longitude and multiply by 10000) you get two numbers: La and Lo.

La varies from 0 to 1,799,999 and Lo from 0 to 3,599,999. These two numbers can be combined to form a single number that I call P (your position) like this:

P = La * 3600000 + Lo

Extracting the La and Lo from P is simply a matter of dividing P by 3,600,000 (to get La) and calculating the remainder (to get Lo).

P varies from 0 to 6,479,998,200,000 which can be stored in 43 bits.

Now encoding P in some form typeable by a human requires an alphabet. The SOC alphabet consists of the following 32 characters:


This is the standard English alphabet plus Arabic numerals 0 through 9 with the following letters removed: I, O, S, and Z. These are removed because I is easily confused with both 1 and J; O is easily confused with 0; S is easily confused with 5 and Z is easily confused with 2. These characters are removed to ensure that the code is minimally affected by bad handwriting.

Moreover an implementation using the SOC should silently perform the following translations: I becomes 1; O becomes 0; S becomes 5 and Z becomes 2. This way the user will not have to correct a poorly written SOC.

Each character in the alphabet represents a number between 0 and 31.

A(0) B(1) C(2) D(3) E(4) F(5) G(6) H(7) J(8) K(9) L(10) M(11) N(12) P(13)
Q(14) R(15) T(16) U(17) V(18) W(19) X(20) Y(21) 0(22) 1(23) 2(24) 3(25)
4(26) 5(27) 6(28) 7(29) 8(30) 9(31)

P can be encoded using 10 characters from this alphabet. Since each character contains 5 bits of information and only 43 bits are needed for the position that leaves 7 bits for an error checking code. The algorithm used to generate the check digit is a variant of the scheme used for ISBNs.

The 43 bit P is broken into 11 4 bit numbers with a zero padded on the left of P. The 11 numbers are p0 through p10. A check digit C is calculated as follows:

C = ( p0 * 37 + p1 * 31 + p2 * 29 + p3 * 23 + p4 * 17 + p5 * 13 + p6 * 11 + p7 * 7 + p8 * 5 + p9 * 3 + p10 * 2 ) mod 127

C is then appended to P to create the SOC.

Now for some Perl code that implements the coding and encoding of SOCs.

Converting a latitude and longitude to a SOC:

use strict;

if ( $#ARGV != 1 ) {
die "Usage: to-soc ";

my ( $lat, $lon ) = @ARGV;

my $alpha = 'ABCDEFGHJKLMNPQRTUVWXY0123456789';
my @alphabet = split(//,$alpha);

$lat += 90;
$lon += 180;

$lat *= 10000;
$lon *= 10000;

my $p = $lat * 3600000 + $lon;

my $soc_num = $p * 128;

my @primes = ( 2, 3, 5, 7, 11, 13, 17, 23, 29, 31, 37 );

my $c = 0;

foreach my $prime (@primes) {
$c += ($p % 32) * $prime;
$p = int($p / 32);

$c %= 127;

$soc_num += $c;

my $digits = 10;

my $soc = '';

while ( $digits > 0 ) {
my $d = $soc_num % 32;
$soc = $alphabet[$d] . $soc;
$soc_num = int($soc_num/32);

print "$soc\n";

Converting a SOC back to a latitude and longitude:

use strict;

if ( $#ARGV != 0 ) {
die "Usage: from-soc <10-digit-soc>";

my $soc = uc($ARGV[0]);
$soc =~ tr/IOSZ/1052/;

my $alphabet = 'ABCDEFGHJKLMNPQRTUVWXY0123456789';

my $soc_num = 0;

foreach my $letter (split(//, $soc)) {
$soc_num *= 32;
$alphabet =~ /(.*)$letter/;
$soc_num += length($1);

my $p = int($soc_num / 128);
my $check = $soc_num % 128;

my $lon = $p % 3600000;
my $lat = int($p / 3600000);

$lat /= 10000;
$lon /= 10000;

$lat -= 90;
$lon -= 180;

my @primes = ( 2, 3, 5, 7, 11, 13, 17, 23, 29, 31, 37 );

my $c = 0;

foreach my $prime (@primes) {
$c += ($p % 32) * $prime;
$p = int($p / 32);

$c %= 127;

if ( $check != $c ) {
die "Incorrect SOC";
} else {
print "$lat $lon\n";

This idea and code is being released by me into the public domain.

Those of you with a twisted mind like to try to find points on the globe that have human-readable SOCs. For example, by picking coordinates that contain a word in the SOC. Challenge: find a location on the blog that's something along the lines of TREASURE or STARTHERE.