Tuesday, July 25, 2006

The Open Source Elephant in the room: the source sucks

Firstly, since I'm about to slam Free and Open Source Software, let me point out that I am not a Microsoft fanboy, and that I use Firefox, Thunderbird, Ubuntu, EMACS, all the GNU tools, the GIMP, Apache, TRAC, OpenOffice, etc. on a daily basis. I live and work on FOSS.

But there's a real problem with most FOSS: for something that prides itself on the source being readable by everyone, and even comes up with 'laws' like 'given enough eyeballs all bugs are shallow', the actual source code of most FOSS is horrible, unreadable garbage. Actually, I wonder if 'Linus' Law' shouldn't be something like 'Linus' Necessity': given that the source is so horrible, we need lots of people so that one of them will be able to figure out what the hell it was we wrote.

When I started my well-known open source project I decided that I'd better make the code readable for two reasons: firstly, I was sure that I wouldn't get to work on it often, so I'd have to come back and read old code, and comments and coding standards would make that easier; secondly, I was sure that other people were going to read my code.

The second thing turned out to be really important for two reasons: firstly, other people were able to read my code and contribute, and I kept them to a similar coding standard and style, so the code is readable (reasonably so; I'm not claiming I'm perfect). Perhaps more importantly, one day I was being interviewed for a job and the interviewer said: "Yes, we've all read your code". They'd downloaded my project and checked me out. (I got the job.)

Now, I'm not trying to slam all FOSS here and for the purposes of this entry I have not examined some of the most famous projects (e.g. Linux kernel, Apache, Firefox, ...), but I decided to take a look at the top 50 most downloaded projects of all time on SourceForge.

For each project I picked two source files at random (each source file had to be fairly large, i.e. more than 100 lines of code) and scored them, being as generous as possible, in the following categories. I weighted the scores heavily towards doing simple things that have a high benefit (for example, describing the purpose of a file or function):
  • File Description (FD): did the file I opened have some sort of description (near the top) of what the file was for? I wasn't asking for a detailed explanation, just a little helper so that a new reader could get going on the purpose. Score: +5 (if present), -5 (if not)
  • Function/Interface Description (FID): did any of the functions, or interfaces, in the file have a description? I would have liked to have seen all the arguments specified and return codes and caveats explained, but I was extremely generous: even if only one function had a little header with a minimal description of the function, the file got into this category. Score: +5 (if present), -5 (if not)
  • Useful Comments (UC): did the file contain at least one useful comment? A useful comment points out something that isn't obvious to the reader, or some trap for the unwary. Score: +1 (if present), -1 (if not)
  • Stupid Comments (SC): did the file contain at least one stupid comment like 'increment i' or 'loop through records'? Score: -1 (if present), +1 (if not)
  • Understandable (U): did I feel like I would be able to understand most of the code given 30 minutes of reading the file and browsing the rest of the source? This was very subjective, but was used to take into account things like clearly named functions, or really well named member variables. Score: +5 (if understandable), -5 (if not)
  • Commented out code (COC): people, we have source code control systems. Don't // out your code, or #if 0 it, OK? Score: -1 (if present), +1 (if not)
  • Bonus (B): I had a special bonus category which I could hand out if I felt like it. A positive score here was for particularly well documented and well written code, neutral for most code, and negative for really hideous stuff. Score: +10 (loved it), -10 (yuck), 0 (in general)
Of the top 50 projects one (XAMPP) was excluded because it's a distribution of other code and not new code.
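
To make the arithmetic concrete, here is a minimal Python sketch of how a project's total could be computed from the categories above. It is not the tooling used for this post: the `WEIGHTS` table, the `score` function, and the convention that each category is recorded as +1 (present/yes) or -1 (absent/no), with the bonus recorded as +1, 0 or -1, are assumptions chosen so that the totals agree with the data table at the end of the post.

```python
# Weights come from the category list above. The flag convention
# (+1 = present/yes, -1 = absent/no, bonus = +1, 0 or -1) is an assumption.
# SC and COC carry negative weights because their presence costs a point.
WEIGHTS = {"FD": 5, "FID": 5, "UC": 1, "SC": -1, "U": 5, "COC": -1, "B": 10}


def score(flags):
    """Return the weighted total for one project's category flags."""
    return sum(weight * flags[name] for name, weight in WEIGHTS.items())


# Flags for DC++ as they appear in the data table.
dc_plus_plus = {"FD": 1, "FID": 1, "UC": 1, "SC": 1, "U": 1, "COC": -1, "B": 0}
print(score(dc_plus_plus))  # 16
```

Under these assumptions the best possible total is 28 and the worst is -28.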

What I found was not a pretty picture:
  • 65% don't bother describing, even in the most minimal way, a single one of the functions I saw
  • 60% of the projects don't bother with describing the purpose of a file
  • 59% of the projects scored negatively using my system
  • 53% contained useless comments
  • 40% looked incomprehensible to me without major effort
  • 33% contained commented out or #if 0 code
There was one bright spot: 85% contained at least one useful comment. But given that my percentages underestimate the problems (because I was very generous), these figures are horrible.
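
Each of these percentages is a simple count over the per-project flags. As an illustration only, the sketch below computes one such percentage for the first five rows of the data table at the end of this post. The `ROWS` layout, the `percent_with` helper, and the +1/-1 flag convention are assumptions, and a five-row sample will not reproduce the figures quoted above.

```python
# Column order used in each row after the project name (assumed layout).
COLUMNS = ("FD", "FID", "UC", "SC", "U", "COC", "B")

# First five rows of the data table: (project, FD, FID, UC, SC, U, COC, B).
ROWS = [
    ("eMule", -1, -1, 1, 1, 1, 1, 0),
    ("Azureus", -1, -1, 1, 1, 1, 1, 0),
    ("BitTorrent", -1, -1, -1, -1, -1, -1, 0),
    ("DC++", 1, 1, 1, 1, 1, -1, 0),
    ("Ares Galaxy", 1, -1, 1, -1, -1, -1, 0),
]


def percent_with(rows, column, value):
    """Percentage of rows whose flag in `column` equals `value`."""
    idx = COLUMNS.index(column) + 1  # +1 skips the project name
    hits = sum(1 for row in rows if row[idx] == value)
    return 100.0 * hits / len(rows)


# Share of the sample with no function/interface description at all.
print(percent_with(ROWS, "FID", -1))  # 80.0 for this five-row sample
```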

The best projects were (in order of score): GnuWin32 (thanks GNU Project!), GTK+ and The GIMP installers for Windows, NASA World Wind, Ghostscript, WINE, Miranda, MinGW (thanks GNU Project!), Eraser, and DC++.

Come on, FOSS people. Have some pride in your work! Remember, writing some decent comments is a gift you are giving to the people who read your code, and to yourself.

(Note that if you are the author of one of the projects above, it's possible that I made a mistake and just happened to pick the wrong files to read. Send me examples of how great your code is and I'll publish a rebuttal here.)

Here's a table with all the data:

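Project                                  | FD | FID | UC | SC |  U | COC |  B | Score | Pop.
-----------------------------------------+----+-----+----+----+----+-----+----+-------+-----
eMule                                    | -1 |  -1 |  1 |  1 |  1 |   1 |  0 |    -6 |    1
Azureus                                  | -1 |  -1 |  1 |  1 |  1 |   1 |  0 |    -6 |    2
BitTorrent                               | -1 |  -1 | -1 | -1 | -1 |  -1 |  0 |   -14 |    3
DC++                                     |  1 |   1 |  1 |  1 |  1 |  -1 |  0 |    16 |    4
Ares Galaxy                              |  1 |  -1 |  1 | -1 | -1 |  -1 |  0 |    -2 |    5
CDex                                     | -1 |  -1 |  1 |  1 |  1 |   1 |  0 |    -6 |    6
VirtualDub                               | -1 |  -1 |  1 | -1 | -1 |  -1 |  0 |   -12 |    7
Shareaza                                 | -1 |  -1 |  1 |  1 |  1 |   1 |  0 |    -6 |    8
eMule Plus                               |  1 |  -1 |  1 | -1 |  1 |   1 |  0 |     6 |    9
GTK+ and The GIMP installers for Windows |  1 |   1 |  1 | -1 |  1 |  -1 |  1 |    28 |   10
7-Zip                                    | -1 |  -1 |  1 | -1 | -1 |  -1 |  0 |   -12 |   11
FileZilla                                | -1 |  -1 | -1 | -1 | -1 |  -1 |  0 |   -14 |   12
guliverkli                               |  1 |  -1 | -1 |  1 | -1 |  -1 |  0 |    -6 |   13
Gaim                                     |  1 |  -1 |  1 | -1 |  1 |  -1 |  0 |     8 |   14
Audacity                                 |  1 |  -1 |  1 | -1 | -1 |  -1 |  0 |    -2 |   15
phpBB                                    | -1 |  -1 |  1 |  1 |  1 |  -1 |  0 |    -4 |   16
ZSNES                                    | -1 |  -1 |  1 | -1 |  1 |   1 |  0 |    -4 |   17
TightVNC                                 | -1 |  -1 |  1 |  1 |  1 |   1 |  0 |    -6 |   18
phpMyAdmin                               |  1 |   1 |  1 |  1 | -1 |  -1 |  0 |     6 |   19
NASA World Wind                          |  1 |   1 |  1 | -1 |  1 |  -1 |  0 |    18 |   20
ABC [Yet Another Bittorrent Client]      | -1 |   1 |  1 |  1 |  1 |  -1 |  0 |     6 |   21
Dev-C++                                  | -1 |  -1 |  1 | -1 |  1 |  -1 |  0 |    -2 |   22
aMSN                                     | -1 |   1 |  1 |  1 |  1 |   1 |  0 |     4 |   23
ffdshow                                  | -1 |  -1 | -1 | -1 | -1 |  -1 | -1 |   -24 |   24
WinSCP                                   | -1 |  -1 |  1 | -1 | -1 |  -1 |  0 |   -12 |   25
JBoss.org                                |  1 |   1 |  1 | -1 | -1 |  -1 |  0 |     8 |   26
AC3Filter                                | -1 |  -1 |  1 |  1 | -1 |  -1 |  0 |   -14 |   27
Ghostscript                              |  1 |   1 |  1 |  1 |  1 |  -1 |  0 |    16 |   28
[email protected]                        | -1 |  -1 |  1 |  1 | -1 |   1 |  0 |   -16 |   29
Webmin                                   |  1 |   1 | -1 | -1 | -1 |  -1 |  0 |     6 |   30
PDFCreator                               | -1 |  -1 |  1 |  1 | -1 |  -1 |  0 |   -14 |   31
VisualBoyAdvance                         | -1 |  -1 |  1 | -1 | -1 |   1 |  0 |   -14 |   32
MinGW - Minimalist GNU for Windows       |  1 |   1 |  1 |  1 |  1 |  -1 |  0 |    16 |   33
eMule Morph                              | -1 |  -1 |  1 |  1 | -1 |   1 |  0 |   -16 |   34
GnuWin32                                 |  1 |   1 |  1 | -1 |  1 |  -1 |  1 |    28 |   35
The CvsGui project                       |  1 |   1 | -1 |  1 |  1 |  -1 |  0 |    14 |   36
FlasKMPEG                                | -1 |  -1 |  1 | -1 |  1 |   1 |  0 |    -4 |   37
VirtualDubMod                            | -1 |  -1 |  1 | -1 | -1 |   1 |  0 |   -14 |   38
Gallery                                  |  1 |  -1 |  1 | -1 |  1 |   1 |  0 |     6 |   39
DOSBox DOS Emulator                      | -1 |  -1 |  1 |  1 |  1 |  -1 |  0 |    -4 |   41
Miranda                                  |  1 |   1 |  1 |  1 |  1 |  -1 |  0 |    16 |   42
DScaler Deinterlacer/Scaler              | -1 |   1 |  1 |  1 |  1 |   1 |  1 |    14 |   43
Celestia                                 | -1 |  -1 |  1 | -1 |  1 |  -1 |  0 |    -2 |   44
PeerGuardian                             | -1 |   1 |  1 |  1 |  1 |  -1 |  0 |     6 |   45
XOOPS Dynamic Web CMS                    | -1 |  -1 |  1 | -1 |  1 |  -1 |  0 |    -2 |   46
Eraser                                   |  1 |   1 |  1 | -1 |  1 |   1 |  0 |    16 |   47
Wine Is Not an Emulator                  |  1 |   1 |  1 |  1 |  1 |  -1 |  0 |    16 |   49
burst!                                   | -1 |  -1 | -1 | -1 | -1 |  -1 |  0 |   -14 |   50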
