Wednesday, May 16, 2012

To boldly Go where Node man has gone before

With all the chatter about how uber-amazing Node.js is I figured I'd do a little comparison with my favorite language du jour: Go.  Node's claim is that it's "a platform built on Chrome's JavaScript runtime for easily building fast, scalable network applications."

So, easy to build; fast; scalable.

Here's the canonical Node program for Hello, World from the Node home page.

var http = require('http');
http.createServer(function (req, res) {
    res.writeHead(200, {'Content-Type': 'text/plain'});
    res.end('Hello World\n');
}).listen(1337, '127.0.0.1');

console.log('Server running at http://127.0.0.1:1337/');

And here's the equivalent program written in Go. It's a little longer because Go insists on explicitly importing the things you use and has a little more boilerplate (such as having a func main()).

package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	// Register one handler for every path.
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/plain")
		fmt.Fprint(w, "Hello, World\r\n")
	})

	log.Printf("Server running at http://127.0.0.1:1337/")
	// ListenAndServe only returns on failure, so log the error and exit.
	log.Fatal(http.ListenAndServe("127.0.0.1:1337", nil))
}

So, in terms of 'easy to build' there's no clear winner. Node is a little more compact, but the core functionality is the same: start a server and do a callback when a connection is made.

So, then there's 'fast' and 'scalable'.  To test those I used ab on Ubuntu on a MacBook Pro with 8GB of RAM.  Here are the results.

First test was ab -n 1000000 (i.e. 1,000,000 requests):

Language  Elapsed time (seconds)  Requests/second  ms per request  Transfer rate (KBps)  Peak real memory (KB)  Peak virtual memory (KB)
Go        137.542                 7270.51          0.138           681.61                4,120                  145,308
Node      200.341                 4989.26          0.200           370.30                49,258                 638,700

The second test was ab -n 1000000 -c 100 (i.e. 1,000,000 requests with 100 simultaneously)

Language  Elapsed time (seconds)  Requests/second  ms per request  Transfer rate (KBps)  Peak real memory (KB)  Peak virtual memory (KB)
Go        141.824                 7051.02          0.142           661.00                21,684                 902,884
Node      177.472                 5634.68          0.177           418.20                50,724                 643,912

So, Node was always slower than Go and (almost always) used more memory.   The only time Go was 'worse' than Node was in virtual memory usage in the second test.

I'm unimpressed by Node.  Go's approach (here it is spawning a goroutine per connection) is much simpler from a programming perspective and more performant.  The code handling the connection doesn't have to be concerned about blocking/non-blocking calls or whether something is asynchronous.  You just write the code to handle that particular URL.

PS I should add that I did these tests in an Ubuntu VM which was restricted to running on a single processor core.  That was done so that any advantage Go would get because it can inherently use multiple cores would be eliminated.  Bottom line is that Go is faster, and easy to write.

If you enjoyed this blog post, you might enjoy my travel book for people interested in science and technology: The Geek Atlas. Signed copies of The Geek Atlas are available. Looking for a new job? Try UseTheSource.

Thursday, May 10, 2012

Simonoids. It's 'Simon' in an Altoids can powered by Arduino

The classic game Simon came out in 1978 and used a microcontroller (one of the first) at its heart: the TMS 1000.  The circuit for Simon is pretty simple and can be seen in the patent filing #4,207,087.  Its simplicity makes it ideal for implementation as a hobby project with a microcontroller such as Arduino.

My Simonoids is an Arduino-based Simon game that fits inside an Altoids can:


The four illuminated keys are a SparkFun 2x2 Button Pad and the associated PCB fitted with four different colored LEDs.  Each LED is connected to a single Arduino digital pin with a 330 ohm resistor to limit the current passing through it to about 10mA.

The four LEDs are soldered into the PCB with the cathodes tied together; the anodes go to the Arduino via the 330 ohm resistors.  The four buttons are also tied together and jumpers are in place in the middle of the board to wire them correctly.  In total the PCB uses four digital pins for the LEDs and four digital pins for the buttons.  The internal pull-up resistors on the Arduino are used to ensure that the button pins are held high and only pulled low when a button is pressed.


The button/LED PCB and the Arduino Pro are linked together with connecting wires and fit together as a sandwich that will just fit inside the Altoids can.  I superglued some little rubber feet to the bottom of the button/LED PCB to prevent it from being squashed onto the Arduino Pro.



The final parts are a battery connector, on/off switch and a piezo buzzer connected to another Arduino digital pin.  Setting the digital pin to 1/0 with the right frequency will generate a square wave.  The frequencies are chosen to be the same as those used on the original Simon (G below middle C, middle C, A above middle C, G above middle C).

The button pad is kept in place with a rubber band around the boards; it keeps the sandwich together.


The sandwich is placed inside the Altoids can and screwed into place through holes in the Arduino Pro.  The power switch gets its own hole and the key pad fits through holes drilled and then carefully bent to shape using pliers.  To help make the large holes I first made a cardboard template and stuck it to the can for drilling out:


To make programming the game easy I positioned the Arduino Pro so that the I/O port header points into the space where the battery goes.  That makes it possible to insert an FTDI cable and program the game without unscrewing the boards.


Once programmed (the code I wrote is in my Simonoids repository on GitHub) it's just a matter of connecting the battery.


Close the lid and you've got a portable, home-made Simon game.


And, finally, here's a short video of it in action:


The game has four possible lengths of play: 8, 14, 20 and 31 flashes.  These are selected by pressing one of the four buttons (8 = TL, 14 = TR, 20 = BL, 31 = BR).  In the video I pressed the top left button to get a game of 8 flashes: the first time around I failed, the second time I succeeded.  

My other Arduino-based projects are: Cansole (video games console in a can), programmable 7x7 Color Display (from a set of hacked Christmas lights), and GAGA-1 (high-altitude balloon).  If you're into this sort of thing then you might also enjoy the Ambibus (an ambient 'next bus' monitor in a model bus).

PS While writing this blog post I discovered that SparkFun sells a Simon Kit for those who like to start with a complete set of components. 


Thursday, May 03, 2012

Patching the Internet

When CloudFlare approached me about joining the company there was one thing that really stood out about the potential for their service: the ability to 'patch the Internet'.

CloudFlare sits between people's browsers and the web servers they are trying to reach.  All the traffic (DNS, HTTP, and HTTPS) passes through the CloudFlare network.  This blog post was served up (and protected and accelerated) by CloudFlare.

But as the traffic passes through CloudFlare it's possible to modify it, and that opens up huge potential for fixing Internet problems on an enormous scale.

Today, CloudFlare has rolled out a service that informs people that they've been infected by the nasty DNSChanger malware.  This makes sense for CloudFlare to do because so many of the web's users touch CloudFlare sites every month.  In this case CloudFlare is helping to protect end-users, just as it protects web sites.

And this sort of virtual patching can come anywhere in the network stack from fixing DDoS attacks, to filtering out an Apache Range vulnerability, to deleting hashing attacks, to killing SQL injections.  As new attacks arise we are able to, for our users, 'patch the Internet'.

Patching allows us to do other things like insert any service automatically across a web site (such as adding web analytics), to filter out private information (such as an email address) if the visitor might be malicious, or simply insert a message notifying visitors of, for example, an upcoming service disruption.

It also lets us do things like add SSL quickly to a site, enable IPv6 even when the site is on IPv4 only and will, soon, allow us to turn on new protocols like SPDY even when the actual web site only supports HTTP.

The potential for this two-way patching is very large and we've recently announced a developer program to let people build their own apps that can be installed with a single click of an On button in the CloudFlare UI.

I'd be interested in hearing from people about ideas on how best to 'patch the Internet'.  I'll personally send a signed copy of The Geek Atlas to the person with the best idea.


Wednesday, May 02, 2012

Reverse authentication for banking

A persistent problem with retail banks is that they phone you and then ask you for information.  A common scenario is that the bank's fraud department calls because of a suspicious debit or credit card transaction.  What follows is, from a security perspective, dangerous.

Either the bank just assumes that you'll accept that they really are your bank (and not some random person trying to get private information from you) or they'll go through some weak authentication (such as telling me half my postcode).  Sometimes they have the audacity to call me and then ask me to prove that I am me even though they just called me.

All this ridiculous nonsense can be fixed by use of the two factor authentication tokens that banks are now giving out.  In the UK Barclays has PINSentry, HSBC has Secure Key, NatWest has Card-Reader.  These tokens are usually used for logging in to online banking or authorizing a transaction, i.e. they are used so that you can prove to your bank that you are you.

But they can be used the other way around.

Imagine the phone ringing in your home:

Caller: Hello, Mr Foo, it's Barclays Fraud Department calling.  We need to ask you about a transaction on your account.

You: OK

Caller: Do you have your PINSentry handy? I'd like to use it to prove that this is Barclays calling.

You: Yes, I have it right here.

Caller: Please switch it on.  A six digit number will appear on the screen.  I'm going to tell you the first three digits.

You: OK, it's on.

Caller: The first three digits are 4 7 2.  You should be able to see them on the screen.  That proves that this really is Barclays calling as only we would be able to predict those digits.

You: Yes, I see that.

Caller: And can you tell me the other three digits?  That way I'll know you really are Mr. Foo.

You: Yes, it reads 4 9 7.

Caller: Great.  Let's talk about the transaction our system has flagged...

With a simple conversation like that you've proved that you are you, and the bank has proved that they are who they say they are. Additional levels of authentication can be added (such as asking for personal information), but the key is that the two factor device contains a secret shared between your bank and you.
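
Here is a minimal sketch of how the two halves of such a code could come from one shared secret. It uses HOTP (RFC 4226) purely as a stand-in: I am not claiming PINSentry or any other bank token works this way, and the secret is the RFC's published test key, not a real one. The helper names hotp and halves are made up for illustration.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha1"
	"encoding/binary"
	"fmt"
)

// hotp returns the six-digit RFC 4226 code for a shared secret and counter.
// Bank and token can each compute it because both hold the secret.
func hotp(secret []byte, counter uint64) string {
	var msg [8]byte
	binary.BigEndian.PutUint64(msg[:], counter)

	mac := hmac.New(sha1.New, secret)
	mac.Write(msg[:])
	sum := mac.Sum(nil)

	// Dynamic truncation: the low nibble of the last byte picks 4 bytes.
	offset := int(sum[len(sum)-1] & 0x0f)
	code := binary.BigEndian.Uint32(sum[offset:offset+4]) & 0x7fffffff
	return fmt.Sprintf("%06d", code%1000000)
}

// halves splits the code into the part the caller reads out and the part
// the customer reads back.
func halves(code string) (bankHalf, customerHalf string) {
	return code[:3], code[3:]
}

func main() {
	secret := []byte("12345678901234567890") // RFC 4226 test secret
	bankHalf, customerHalf := halves(hotp(secret, 0))
	fmt.Println("caller reads out:", bankHalf)
	fmt.Println("customer replies:", customerHalf)
}
```

With the RFC's test secret and counter 0 the code is 755224, so the caller would read out 755 and the customer would reply 224.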


Monday, April 30, 2012

Make your own 'prime factorization' diagram

The Prime Factorization Sweater is a lovely idea and I thought it would be fun to reproduce the same idea electronically so that I could print out a poster version for home.

Enter Processing.

With it I've developed a small program that produces a diagram of the first 100 numbers and for each number there's a circle broken up into arcs.  Each arc is a prime factor.  As in the original sweater each factor gets a unique color (assigning unique colors is rather complex and I ended up using the color difference method based on CMC l:c and a nice online tool that does the work for you).

Here's the finished product.  The top left corner is the number 1 and the numbers read left to right.  So the first red circle is a prime number (2), the second the next number (3, which is prime) and so on.


There's also an option to print the numbers involved.

The source code is in the pfd repository on GitHub and licensed under GPLv2. Processing is a really nice environment for this sort of rapid hacking of anything graphical. See, for example, how I used it to visualize Ikea Lillabo Train Set layouts.

PS After encouragement in the comments from the person who had the original idea for the prime factorization sweater I've made a CafePress store in which you can buy men's and women's T-shirts printed with the prime factorization diagram.



Friday, April 27, 2012

tacoli: a simple logging format

A post on Hacker News entitled Log Everything As JSON. Make Your Life Easier reminded me of my private logging strategy which has the following properties:

1. Easy to parse and analyze with Unix command-line tools such as grep, cut, sort, uniq, and wc

2. Easy to parse and analyze in code using Perl, Ruby, or Go

3. Compact

4. Easily expandable and lacking the ambiguity of simple delimited log formats

I call it tacoli (which stands for Tabs, Colons and Lines).  Here are the tacoli logging rules:

1. Each log entry is a single line that starts with the date/time

2. The second entry on the line is a string called the 'generator' which indicates where the log line came from (such as the program or module)

3. All the other entries have the format "key: value"

4. Entries are tab-delimited and no tabs are allowed in keys, values or the generator name

That's it.  Here's an example log line from Apache in this format:

22/Apr/2012:06:29:07 +0000	apache	ip: 18.12.25.55	method: GET	uri: /example.html	code: 301	size: 305	referer: http://blog.jgc.org/2009/08/geek-weekend-day-1-bletchley-park.html	agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.162

Note that it's easy to make Apache output this format just by using tabs and adding the appropriate key: to each field in the LogFormat.  No special logger module required.  In fact, anything that can 'printf' a string can create tacoli lines trivially.

It's trivial to parse in code: all you need is 'split' to break on the tabs, and then split again to break the key name from the value.  No specialized JSON (or other) parser required.
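
The two splits can be sketched like this in Go. The entry type and the parseTacoli name are made up for illustration. Splitting each entry on the first colon only is what lets values such as URLs contain colons of their own.

```go
package main

import (
	"fmt"
	"strings"
)

// entry is one parsed tacoli log line.
type entry struct {
	when      string
	generator string
	fields    map[string]string
}

// parseTacoli splits a line on tabs, then splits each remaining entry on
// its first colon to separate the key from the value.
func parseTacoli(line string) (entry, error) {
	parts := strings.Split(strings.TrimRight(line, "\r\n"), "\t")
	if len(parts) < 2 {
		return entry{}, fmt.Errorf("tacoli: need date/time and generator, got %d entries", len(parts))
	}
	e := entry{when: parts[0], generator: parts[1], fields: map[string]string{}}
	for _, p := range parts[2:] {
		kv := strings.SplitN(p, ":", 2)
		if len(kv) != 2 {
			return entry{}, fmt.Errorf("tacoli: entry %q has no colon", p)
		}
		// Drop the single space that follows the colon, if present.
		e.fields[kv[0]] = strings.TrimPrefix(kv[1], " ")
	}
	return e, nil
}

func main() {
	line := "22/Apr/2012:06:29:07 +0000\tapache\tip: 18.12.25.55\tmethod: GET\turi: /example.html"
	e, err := parseTacoli(line)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(e.when, e.generator, e.fields["ip"], e.fields["uri"])
}
```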

It's trivial to extend without breaking any tools.  Just add a new field (anywhere on the line) with a new key.

It's simple to work with using Unix tools.  Since the format is 'one log entry per line' it works well with wc -l to count instances of anything and it interfaces with all the other Unix tools that expect to work with lines (and even in code the line oriented nature is helpful since getting a complete entry is a single line read).

If you want to extract a single field from each line of the log file then it's easy to do with grep.  Here's an example that finds all the lines that have an ip entry and extracts just that entry:

grep -Po "\tip: [^\t]+" access.log

The key name can be trivially removed using cut:

grep -Po "\tip: [^\t]+" access.log | cut -d: -f2-

and the output can be fed into the other Unix tools.  Also, if you know that your log file format hasn't changed you can still use the positional information to simplify parsing and fall back to cut.

It isn't quite as compact as a log file format that only uses position to indicate meaning, but compression largely overcomes that problem and key names can be chosen to be short and unique.


The Greatest Machine That Never Was

I was invited to talk at TEDx Imperial College and gave a talk about Charles Babbage's Analytical Engine called The Greatest Machine That Never Was. Here's the video of that talk:


All the other talks are here. The project to build the Analytical Engine is Plan 28.

