Monday, April 30, 2012

Make your own 'prime factorization' diagram

The Prime Factorization Sweater is a lovely idea and I thought it would be fun to reproduce the same idea electronically so that I could print out a poster version for home.

Enter Processing.

With it I've developed a small program that draws the first 100 numbers as a diagram: each number is a circle broken up into arcs, and each arc is a prime factor.  As in the original sweater, each factor gets a unique color (assigning unique colors is rather complex; I ended up using the color difference method based on CMC l:c and a nice online tool that does the work for you).
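The actual program is written in Processing; what follows is only a rough Python sketch of the underlying idea, not the code from the repository.  The function names are made up for illustration, and giving every factor an equal-sized arc (with repeated factors getting one arc each) is an assumption.

```python
def prime_factors(n):
    """Return the prime factors of n with multiplicity, e.g. 12 -> [2, 2, 3]."""
    factors = []
    divisor = 2
    # Trial division: strip out each divisor completely before moving on.
    while divisor * divisor <= n:
        while n % divisor == 0:
            factors.append(divisor)
            n //= divisor
        divisor += 1
    # Whatever is left above 1 is itself prime.
    if n > 1:
        factors.append(n)
    return factors


def arcs(n):
    """Split a full circle into equal arcs, one per prime factor of n.

    Returns (prime, start_degrees, end_degrees) tuples.  Equal-sized arcs
    are an assumption; the real diagram may size them differently.
    """
    factors = prime_factors(n)
    if not factors:
        return []  # 1 has no prime factors
    step = 360.0 / len(factors)
    return [(p, i * step, (i + 1) * step) for i, p in enumerate(factors)]
```

Each prime would then be mapped to its color and each tuple drawn as one arc of the circle for that number.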

Here's the finished product.  The top left corner is the number 1 and the numbers read left to right.  So the first red circle is a prime number (2), the second the next number (3, which is prime) and so on.

There's also an option to print the numbers involved.

The source code is in the pfd repository on GitHub and licensed under GPLv2. Processing is a really nice environment for this sort of rapid hacking of anything graphical. See, for example, how I used it to visualize Ikea Lillabo Train Set layouts.

PS After encouragement in the comments from the person who had the original idea for the prime factorization sweater I've made a CafePress store in which you can buy men's and women's T-shirts printed with the prime factorization diagram.

Friday, April 27, 2012

tacoli: a simple logging format

A post on Hacker News entitled Log Everything As JSON. Make Your Life Easier reminded me of my private logging strategy which has the following properties:

1. Easy to parse and analyze with Unix command-line tools such as grep, cut, sort, uniq, and wc

2. Easy to parse and analyze in code using Perl, Ruby, or Go

3. Compact

4. Easily expandable and lacking the ambiguity of simple delimited log formats

I call it tacoli (which stands for Tabs, Colons and Lines).  Here are the tacoli logging rules:

1. Each log entry is a single line that starts with the date/time

2. The second entry on the line is a string called the 'generator' which indicates where the log line came from (such as the program or module)

3. All the other entries have the format "key: value"

4. Entries are tab-delimited and no tabs are allowed in keys, values or the generator name

That's it.  Here's an example log line from Apache in this format:

22/Apr/2012:06:29:07 +0000      apache  ip: method: GET     uri: /example.html code:301        size:305        referer:        agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.162

Note that it's easy to make Apache output this format just by using tabs and adding the appropriate key: to each field in the LogFormat.  No special logger module required.  In fact, anything that can 'printf' a string can create tacoli lines trivially.
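To show how little is needed, here is a minimal sketch of a tacoli writer in Python.  The function name and the choice to raise an error on embedded tabs or newlines are my assumptions; the rules above only say that tabs are not allowed.

```python
def tacoli_line(timestamp, generator, fields):
    """Build one tacoli log line from a timestamp, a generator and a dict.

    Hypothetical helper: rejecting tabs and newlines with ValueError is an
    assumption; the format itself only forbids tabs inside entries.
    """
    parts = [str(timestamp), str(generator)]
    for key, value in fields.items():
        parts.append(f"{key}: {value}")
    for part in parts:
        if "\t" in part or "\n" in part:
            raise ValueError(f"tab or newline not allowed in entry: {part!r}")
    # Entries are tab-delimited; the caller adds the trailing newline.
    return "\t".join(parts)
```

Writing an entry is then just a matter of printing tacoli_line(...) to the log file.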

It's trivial to parse in code: all you need is 'split' to break on the tabs, and then split again to separate the key name from the value.  No specialized JSON (or other) parser required.
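The two splits look something like this in Python.  This is a sketch with a made-up function name; splitting on the first colon only and tolerating a missing space after it (as in the Apache example above) are my choices rather than part of the format.

```python
def parse_tacoli(line):
    """Parse one tacoli line into (timestamp, generator, fields)."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) < 2:
        raise ValueError("a tacoli line needs a timestamp and a generator")
    timestamp, generator = parts[0], parts[1]
    fields = {}
    for part in parts[2:]:
        # Split on the first colon only, so values may contain colons.
        key, _, value = part.partition(":")
        fields[key] = value.lstrip(" ")
    return timestamp, generator, fields
```

Because only the third and later entries are split on a colon, the colons in the timestamp cause no trouble.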

It's trivial to extend without breaking any tools.  Just add a new field (anywhere on the line) with a new key.

It's simple to work with using Unix tools.  Since the format is 'one log entry per line' it works well with wc -l to count instances of anything and it interfaces with all the other Unix tools that expect to work with lines (and even in code the line oriented nature is helpful since getting a complete entry is a single line read).

If you want to extract a single field from each line of the log file then it's easy to do with grep.  Here's an example that finds all the lines that have an ip entry and prints just that entry:

grep -Po "\tip: [^\t]+" access.log

The key name can be trivially removed using cut:

grep -Po "\tip: [^\t]+" access.log | cut -d: -f2-

and the output can be fed into the other Unix tools.  Also, if you know that your log file format hasn't changed you can still use the positional information to simplify parsing and fall back to cut.

It isn't quite as compact as a log file format that only uses position to indicate meaning, but compression largely overcomes that problem and key names can be chosen to be short and unique.
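The same kind of analysis is short in code too.  Here is a sketch of a Python equivalent of piping one field through sort and uniq -c; the function name is hypothetical.

```python
from collections import Counter


def count_field(lines, key):
    """Count how often each value of `key` occurs across tacoli lines."""
    prefix = key + ":"
    counts = Counter()
    for line in lines:
        # Skip the timestamp and generator, then look for the wanted key.
        for part in line.rstrip("\n").split("\t")[2:]:
            if part.startswith(prefix):
                counts[part[len(prefix):].lstrip(" ")] += 1
                break
    return counts
```

Since an open file is an iterable of lines, count_field(open("access.log"), "code") would give a tally of status codes for the log used in the grep examples.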

The Greatest Machine That Never Was

I was invited to talk at TEDx Imperial College and gave a talk about Charles Babbage's Analytical Engine called The Greatest Machine That Never Was. Here's the video of that talk:

All the other talks are here. The project to build the Analytical Engine is Plan 28.

Wednesday, April 25, 2012

Simply user hostile

The other day I was in San Francisco and stayed at an upmarket hotel.  I decided to use their 'checkout using the TV option' and found the right menu option.  The first step was to review the charges for the room and I was presented with the following message:

Wow.  Talk about user hostile.  This message can be rewritten as: our system is insecure, guests in other rooms can spy on your information displayed on this screen.  OK? Press Select to sign a legal waiver.

Utterly brain dead.  

Either secure the damn system or don't offer this option.  Asking the guest to confirm that they are OK with the fact that someone else might see their personal details and bill is ridiculous.

Sunday, April 22, 2012

Deglitching a Sparkfun 7-segment Serial Display

The display in my Ambient Bus Arrival Monitor is a Sparkfun 7-segment Serial Display connected to the TTL serial port.  I had noticed that occasionally the display would reset itself to 0000 (or sometimes 0, 00 or 000).  It was even possible to make it do this by touching the body of the bus.  It didn't happen often so I was able to ignore it but then it began to happen more and more.

After a very long and tedious investigation I discovered why.  I started out by blaming my code, my soldering, the cable I was using, the quality of the connectors, ...  Only having eliminated everything that I'd touched did I realize it must be something else.

The display has two input methods: serial (which I am using) and SPI (which I am not).  The SPI interface has a clock signal (which in the case of the display is acting as an input) called SCK.  The manual for the display says "The display is configured using SPI mode 0 (CPOL = 0, CPHA = 0), so the clock line should idle low".

If you take a quick look at the schematic for the display you'll discover that the SCK pin on the microcontroller is just connected to a solder pad for connection.

The upshot is that if you (the user) don't connect that pin to something Sparkfun aren't doing it for you.  The display glitch I was seeing was that this floating clock input would sometimes go high and the firmware for the device would then read a byte (all zeroes of course) and write to the display.

There were two possible fixes: hack the firmware so that it ignores SPI completely, or force the SCK pin low all the time.  I opted for the latter (since it was a quick fix) and connected a 10k resistor between that pin and ground.  Glitch gone.

Pity that Sparkfun didn't include a pull-down resistor on SCK (and possibly on RX as well).

Wednesday, April 18, 2012

Microsoft is holding back the secure web

Today on Hacker News there's a story about getting round the problem that SSL can only manage one host (i.e. domain name) per IP address.  If you want more than one secure web site on a machine it's going to need one IP address per web site.  The Hacker News story is about a hack that works for one cloud provider, but doesn't address the fundamental issue.

The actual solution to this problem is called Server Name Indication and allows the connecting web browser to specify the web site domain name when making an SSL/TLS connection.  It's been around in various incarnations since 2003.
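To see what SNI does on the wire, here is a small sketch using Python's standard ssl module.  It starts a TLS handshake in memory (no network connection is made) and returns the first bytes a client would send; the function name is made up and example.com is just a placeholder host.

```python
import ssl


def client_hello(hostname):
    """Return the bytes of the first handshake message for `hostname`.

    Nothing is sent anywhere: the TLS engine writes into a memory buffer.
    """
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    incoming = ssl.MemoryBIO()   # bytes from the (absent) server
    outgoing = ssl.MemoryBIO()   # bytes the client wants to send
    # server_hostname is what puts the SNI extension in the ClientHello.
    tls = context.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass  # expected: the client is now waiting for the server's reply
    return outgoing.read()
```

The host name appears in clear text in the returned bytes (b"example.com" in client_hello("example.com") is true), which is exactly what lets a server pick the right certificate before any encryption has been set up.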

Currently it's supported by all the major web servers and (almost) all the major web browsers.  But there's one important platform/browser combination that's holding back its widespread use: any version of Internet Explorer running on Windows XP.

Although Windows XP is ancient history it hasn't disappeared.  In fact, far from it.  Many large corporations standardized on Windows XP long ago and will not change operating system once they have a stable desktop setup.  Here's the worldwide distribution of desktop operating systems from StatCounter:

It shows Windows XP declining but still in use by over 30% of users.  Here's how Asia, Europe and North America compare.

A secure web site cannot ignore Windows XP no matter where in the world its users come from.

Microsoft has to fix this problem.  SNI could be rolled out everywhere if they were to patch Internet Explorer on Windows XP to support SNI.  Other Windows XP browsers already support SNI, just not Internet Explorer.

Enabling the free use of SNI would greatly reduce the complexity of SSL (especially for cloud providers who could start offering SNI certificates on cloud-based IP addresses that are typically shared across users) and allow for a more secure web.

Come on, Microsoft.  Fix it.

Monday, April 16, 2012

Getting around the London 2012 branding police

The Guardian reported the other day on the London 2012 Olympics branding police who ensure that words like London, 2012 or Games aren't being used by people who didn't pay to use them:

As well as introducing an additional layer of protection around the word "Olympics", the five-rings symbol and the Games' mottoes, the major change of the legislation is to outlaw unauthorised "association". This bars non-sponsors from employing images or wording that might suggest too close a link with the Games. Expressions likely to be considered a breach of the rules would include any two of the following list: "Games, Two Thousand and Twelve, 2012, Twenty-Twelve".

Using one of those words with London, medals, sponsors, summer, gold, silver or bronze is another likely breach. The two-word rule is not fixed, however: an event called the "Great Exhibition 2012" was threatened with legal action last year under the Act over its use of "2012" (Locog later withdrew its objection).

And today I received a funny email from Novotel where they are forced to use euphemisms for the London 2012 Olympic Games because they are clearly not an official sponsor.  It reminded me of the wonderful world of email penis enlargement spam where filters would spot all the common terms and spammers would insert "male enhancement" and other terms to get through.

Here Novotel is forced to refer to "London's summer of sport" and "London's Big Event".  So, that's how you do it folks, think like a penis enlargement spammer and you can talk about the London 2012 Olympics all you want.

My blog is in no way associated with the Londinium 0x7DC Ολυμπιακοί Demonstrations of World Class Athletic Ability.  But if you do wish to talk about them, may I suggest the official name Londinium MMXII and hash tag #mmxii.  And here's an alternative logo representing the five interlocking benzene rings of benzopyrene:

Why benzopyrene you ask?  Well, it's because benzopyrene messes up DNA transcription (or copying).  So benzopyrene is a reminder to not copy any of the official DNA of London 2012.

Philippe Starck: the impractical designer

There's news today that famed French industrial designer Philippe Starck is working with Apple on something.  This is bad news because for all his renown and fame Starck's products are often horrible to live with.  And the one thing Apple seems to try to do is make livable products.

Take Starck's famous juicer.  I was given one as a gift.  It looks really cool.  Perhaps it makes you think of War of the Worlds, or a squid.

If you own one of these you'll know that it's the worst juicer ever. Why?


This juicer has no way to catch pips.  That means that you have to use it with something like a sieve. And the legs are very close together so you need a small bowl and a small sieve.  Compare that with a cheap, glass juicer like this.  Or in fact pretty much any juicer other than the Starck one.

Another present I received was Starck's Mangetoo cutlery.  Here's the knife:

Now this knife does cut, but what it won't do is sit on the plate. Unless you very carefully balance it on the knife blade and handle, it will fall over and rattle around.  It cannot be laid down on its side because it will flip over due to the shape and weight of the handle.  So, at the end of the meal you can't simply lay the knife and fork down on the plate.

And then there's Starck's bicycle designed for the city of Bordeaux:

Set aside for a moment the mechanical issue of the strain that's going to be on that platform at the bottom and ask yourself two things: who needs a platform to place their feet on while cycling and doesn't something look odd with the pedals?

The platform is there apparently because you'll be able to use the bike as a scooter (a combination that no one appears to have ever thought of, undoubtedly for a good reason I fail to think of).  But the pedals are a more serious problem.

If you look at any bike you'll see a line from the seat to the crank.  On this bike the line is broken with the pedal set back (notice how the frame bends backwards): I wonder if any humans have tried cycling like that?  The reason the saddle is usually behind the crank is that when you sit your upper body weight is partly supported by the seat; if it's too far forward then you end up holding yourself up on the handlebars (which is tiring).

Goodness knows what exciting product Apple and S+arck will be able to come up with.

More support for open software in science

In the space of two months the two most famous scientific journals in the world have both published pieces arguing for open source code.

Back in February two co-authors and I had a paper in Nature arguing for open software in science.  That paper was entitled The Case for Open Computer Programs.  Last week the US journal Science published a piece entitled Shining Light into Black Boxes arguing the same thing and giving policy recommendations.

Is it not now time for an international cooperation on defining standards for code openness and associated policies?  The Science paper lays out suggested policies and could be used as a starting point:

Saturday, April 14, 2012

Brief Plan 28 Update

Starting today people who asked to be kept informed about Plan 28 and the construction of Babbage's Analytical Engine have started to receive emails asking them to confirm subscription to the official mailing list.  People who want to join the mailing list can subscribe here.  The official Twitter account is @plan28.

Finally, Plan 28 is getting moving.

Over the next few weeks expect announcements about initial funding and the general schedule for the project.

Tuesday, April 10, 2012

Bletchley Park is Blooming

Despite the persistent drizzly rain yesterday it was clear that spring time had come to Bletchley Park in more ways than one.  The trees and flowers around the grounds were starting to blossom and bloom and inside the slightly rickety Second World War walls the museum is undergoing its own springtime.

After years of struggle to first save, then preserve and now, finally, improve this precious part of British history, the hard work by staff and volunteers is beginning to become obvious to even the most casual visitor.

By flickr user Draco2008
I've been visiting Bletchley Park for a long time and for a while it was hard to take a non-enthusiast around because the museum itself was a bit of a jumble.  BP simply didn't have the money (or spare time away from fighting for survival) to create a fantastic museum suitable for all.  But now it's really happening and it's easy to see how Bletchley Park's spring time can turn into summer.

It's easy for me to sing the praises of Bletchley Park because I'm so fascinated by the technical history of the place, but it's important to realize that Bletchley Park has something that most museums do not: the place is part of the exhibit.

Bletchley Park doesn't contain a collection of objects or stories of things that happened elsewhere.  When you walk through the front gates you are entering a time warp world.  Your first clue comes in the form of the low-rise buildings hastily constructed during the Second World War that first housed the code breakers and now house the museum itself.

For Bletchley Park is both place and museum, and unlike some stuffily preserved country house, it's full of life.  For as well as having the place and the exhibits, Bletchley Park is filled with the stories of what happened there.  And these stories are brought to life by a continuous stream of enthusiastic volunteers and veterans.

Of course, Bletchley Park is not today at the same level of sophistication as many British museums that have had years to perfect their displays and explanations (and in some cases drive out any enthusiasm that was present in their staff).

But the new things that are happening at Bletchley Park show the route to a glorious future to reflect its glorious past.  The new Alan Turing Exhibit has been deservedly nominated for the Art Fund Prize and puts the rebuilt Bombe in proper context.  Colossus has finally got a proper viewing gallery.  And the Radio Society of Great Britain has opened the National Radio Centre.

Couple that with the constant activities available (yesterday children were following the Easter Bunny around on a children-themed visit) and Bletchley Park is becoming a great day out.  And it's easy to reach.  If you haven't visited Bletchley Park do so now before it becomes so popular that you are forced to apply for tickets online with timed entry!

Of course, Bletchley Park isn't out of the woods yet.  Support is still needed and it still doesn't have any continuous form of government funding.  Donation information is here.

And, one specific project is looking for sponsors.  I've written before about the project to build one of Alan Turing's other inventions: Delilah.  Delilah was a secure speech system (or scrambler) that Turing worked on and thanks to the declassification of documents surrounding it, it is currently being reconstructed by the team that worked on the Bombe.  They are currently funding it out of their own pockets (to the tune of £1,000s) and are looking for sponsors (corporate or personal) to help finish the machine.  Contact me if you are interested.

Tuesday, April 03, 2012

In praise of... text files and protocols

The other night I had to debug a problem where CMYK colors specified in an OmniGraffle file weren't making it into an exported PDF (or at least appeared not to be). At first it looked like it might be a nightmare because what I really wanted to do was ignore the OmniGraffle UI and look inside the .graffle file and the PDF itself.

But salvation was at hand: both .graffle and PDF are text formats. The OmniGraffle file is actually an XML document (in some cases it's a gzipped XML document but it can be decompressed with gunzip). Here, for example, is part of the Colors.graffle file that's provided as a sample. It's easy to see the RGB colors that are specified and just as easy to modify them by adjusting the text file.

Yes, it's an image of text.  Just like a binary file is an 'image' of something that could have been easy to manipulate.

While fiddling around in the .graffle file looking at the CMYK colors I spotted that some straight lines that had been drawn in OmniGraffle were not quite straight. That's quite tricky to see in the UI, but dead easy in the XML document and you can simply fix the coordinates. Here, for example, are segments of a line drawn from the Nucleobases.graffle sample file:

The text format made it easy to examine what was happening under the hood of the fancy UI, to quickly fix small problems and to manipulate the file using other programs. Similarly, once I'd determined that in the file I was working with the CMYK colors were fine I exported a PDF and decompressed the result using pdftk. It was fairly easy to follow a color specification through from the .graffle file and into the PDF. Here, for example, is an RGB color specified in the Nucleobases.graffle and the corresponding color appearing in the exported PDF of the same file:

And with that I was able to determine that the CMYK colors were correct and that any problem lay with the person I was sending the PDF to.

The deeper story is that human-readable text formats are wonderful: they are easily debugged, they are easily manipulated (with text editors and other tools like awk and sed), and they can be compressed using common compression programs if space is a problem.
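Even the 'sometimes gzipped' wrinkle mentioned above is only a couple of lines of code.  Here's a sketch in Python that reads a text file whether or not it has been gzipped; the function name and the UTF-8 default are my assumptions, not anything specific to OmniGraffle.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def read_text_maybe_gzipped(path, encoding="utf-8"):
    """Return the text in `path`, transparently gunzipping if needed."""
    with open(path, "rb") as handle:
        data = handle.read()
    if data[:2] == GZIP_MAGIC:
        data = gzip.decompress(data)
    return data.decode(encoding)
```

Once the text is in hand, ordinary string searching or an XML parser is all that's needed to poke around in it.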

Similarly text based protocols (such as HTTP, IMAP, SMTP, FTP and POP3) make it easy for humans to write, read and debug. One of the things that made POPFile easy to implement was that all the mail protocols are text based (the entire POP3 proxying module is able to use simple string matching and regular expressions to handle POP3). And they are also line oriented (a command is read by reading to the line ending). That makes programs to handle them very easy to implement.
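As an illustration of how far simple string matching gets you, here is a sketch of parsing one POP3 command line in Python.  This is not POPFile's code (POPFile is written in Perl); the function name is hypothetical, and the only protocol fact relied on is that POP3 keywords are three or four characters long and case-insensitive.

```python
import re

# A keyword of three or four letters, optionally followed by arguments.
COMMAND_RE = re.compile(r"^([A-Za-z]{3,4})(?: (.*))?$")


def parse_pop3_command(line):
    """Split a POP3 command line into (KEYWORD, argument string).

    Returns None when the line does not look like a command.
    """
    match = COMMAND_RE.match(line.rstrip("\r\n"))
    if match is None:
        return None
    keyword, argument = match.groups()
    return keyword.upper(), argument or ""
```

A proxy can read up to the line ending, call this, and decide from the keyword alone whether to pass the line through untouched or to act on it.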

Recently I used an undocumented API that was entirely text-based (using JSON) to obtain live bus arrival times in London and make an Ambient Bus Arrival Monitor.

Another great example of a text format appears in the code that's behind Hacker News and UseTheSource.  In the Lisp philosophy your program is also data it can consume and the data about users is simply sent to a file as Arc code meaning that any admin tasks that don't (yet) have UI can be performed trivially by hand:

Of course, the downside is that text takes up extra space and for low-level protocols (such as IP) it makes sense to use binary. But for almost everything else it's best to use text. Only use binary protocols where the performance is so sensitive that it's worth the implementation and debugging downside. The upside is that no special tools are needed.

I wonder how much of the success of the Internet can be put down to the decision to use text-based protocols for almost everything that people will need to implement.  And how much we owe the early writers of the RFCs for deciding that text was best.

PS A reader points to Eric Raymond's Art of Unix Programming and specifically the chapter called Textuality.

PPS A commenter over at Hacker News makes the very good point that it's easy to version/diff text files and very hard with binary.

PPPS Another commenter over at Hacker News points out that there's a chapter in The Pragmatic Programmer called The Power of Plain Text.