## Monday, April 29, 2013

### How I coded in 1985

Back in 1985 I worked on the computerization of a machine designed to stick labels on bottles. The company that made the machines was using electromechanical controls to spool labels off a reel and onto products (such as bottles of shampoo) passing by on a conveyor. The entire thing needed to work with mm accuracy because consumers don't like labels that aren't perfectly aligned.

Unfortunately, electromechanical controls aren't as flexible as computer controls and so the company had contracted a local technical college (where I was studying electronics) to prototype computer control using a KIM-1. Another student had put together the machine with a conveyor, a mechanism for delivering the labels, control of stepper motors, and infrared sensors for detecting labels and products.

My job was to write the software in 6502 assembly. Unfortunately, there was no assembler, and the KIM-1 had just a hex keypad and a small display. That meant writing the code out by hand, assembling it by hand, and typing the machine code in. The code looked like this:

It was immediately obvious that computer control was going to be more flexible. The program first did automatic calibration: it measured the length of the labels on the spool, it measured the distance between labels, and it let an operator quickly set up the 'overhang' distance (how much of the label sticks out so that the product can catch onto it).

While running, it could automatically detect how fast the conveyor was moving and compensate, and it could spot when a label was missing from the supply spool (which happened when one peeled off by accident).

Of course, writing code like this is a pain. You first had to write the code (the blue), then turn it into machine code (the red), and work out memory locations for each instruction and relative jumps. At the time I didn't own a calculator capable of doing hex, so I did most of the calculations needed (such as those for relative jumps) in my head.
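
For those who've never done it by hand: a 6502 branch instruction stores a signed one-byte offset measured from the instruction that follows the two-byte branch. Here's a sketch of that arithmetic in Python (the addresses below are invented for illustration):

```python
def branch_offset(branch_addr: int, target: int) -> int:
    """Compute the one-byte offset for a 6502 relative branch.

    The offset is relative to the address of the instruction that
    follows the two-byte branch instruction (branch_addr + 2).
    """
    off = target - (branch_addr + 2)
    if not -128 <= off <= 127:
        raise ValueError("target out of range for a relative branch")
    return off & 0xFF  # two's-complement byte, as typed on the keypad

# A branch at $0200 jumping forward to $0210:
print(hex(branch_offset(0x0200, 0x0210)))  # 0xe
# A branch at $0210 jumping back to $0200:
print(hex(branch_offset(0x0210, 0x0200)))  # 0xee
```

The backward jump is the fiddly case: you have to compute a negative offset in two's complement, which is exactly the sort of hex arithmetic that had to be done mentally without a calculator.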

But it taught me two things: to get it right the first time, and to run code in my own head. The latter has remained important to this day: I continue to run code in my head when debugging, and typically I reach for the brain debugger before gdb or similar. The KIM-1 had only the most basic debugging functions, and I built a few more into the program, but most of the debugging was done by staring at the output (on the hex display), watching the behaviour (of the steppers), and running the code through in my head.

Here's the full program for the curious.

PS A number of people have pointed out that in 1985 the KIM-1 was far from the state of the art and that we had lots of niceties like compilers. True. In fact, prior to this I had been programming in BASIC and ZASM (Z80 assembler) under CP/M, but you go to war with the army you have: the technical college had a KIM-1 to spare, it had good I/O, and thus it made a fine prototyping system for an embedded controller.

## Tuesday, April 23, 2013

### The minimum coin problem (with prize)

I have in my pocket the following British coins: one £2, two £1, two 50p, three 20p, one 10p, two 5p, two 2p and two 1p. I wish to buy a copy of today's paper at a cost of £1.70.

What is the maximum number of coins I can use to pay for the paper, with the restriction that I pay in a reasonable manner? For example, I could give exactly £1.70, or an amount greater than that, but without handing over extraneous coins: giving the seller all my coins would not be acceptable, whereas giving him one £1 and two 50ps and expecting 30p in change is fine. 'Reasonable' means simply: if I give the seller a set of coins, he should not be able to return any of them to me without the sum of the remaining coins dropping below £1.70.

It looks like the answer is: two 50p, three 20p, one 5p, two 2p and one 1p. That gives away 9 of the 15 coins I have.

The general statement of this problem is as follows. There are $$n$$ coin denominations in a currency: $$d_0$$ to $$d_{n-1}$$ (normalize all to a single unit, e.g. a £2 coin has denomination 200) where $$d_0 < d_1 < ... < d_{n-1}$$. I have $$x_0$$ to $$x_{n-1}$$ of each coin ($$x_i \ge 0$$). I wish to pay for an item costing $$P$$.

Problem 1 (giving away the maximum number of coins while paying the exact amount): find amounts $$g_0$$ to $$g_{n-1}$$, where $$g_i \le x_i$$, that satisfy

$$\sum_{i=0}^{n-1} g_i \times d_i = P$$

and minimize

$$\sum_{i=0}^{n-1} (x_i - g_i)$$

Problem 2 (ending up with the minimum number of coins when change is optionally given): find amounts $$g_0$$ to $$g_{n-1}$$, where $$g_i \le x_i$$, that satisfy

$$\sum_{i=0}^{n-1} g_i \times d_i \ge P$$

and minimize

$$\sum_{i=0}^{n-1} (x_i - g_i + c_i)$$

where $$c_i$$ is the number of coins of change in denomination $$d_i$$ given by the newspaper seller.
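
For a pocketful this small, Problem 1 can be solved by brute force; the search space is only a few thousand combinations. A sketch in Python (an illustration, not a contest-worthy algorithm) that checks the example above:

```python
from itertools import product

# Coins from the post: one £2, two £1, two 50p, three 20p,
# one 10p, two 5p, two 2p, two 1p (denominations in pence).
denoms = [200, 100, 50, 20, 10, 5, 2, 1]
counts = [1, 2, 2, 3, 1, 2, 2, 2]

def max_coins_exact(denoms, counts, price):
    """Problem 1 by brute force: maximize the number of coins
    handed over while paying exactly `price`."""
    best = None
    for give in product(*(range(c + 1) for c in counts)):
        if sum(g * d for g, d in zip(give, denoms)) != price:
            continue
        if best is None or sum(give) > sum(best):
            best = give
    return best

best = max_coins_exact(denoms, counts, 170)
print(sum(best))                # 9
print(dict(zip(denoms, best)))  # which coins to hand over
```

Running this confirms the nine-coin answer given above. A serious entry would want something polynomial (e.g. dynamic programming over the amount), since the product of the counts explodes for larger pockets.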

Bonus: take into account the weight of each coin and minimize the weight in my pocket not the number of coins.

Prize: a free copy of my book, The Geek Atlas, to be picked by me on May 15, 2013 from algorithms and implementations posted in the comments of this blog post.

## Monday, April 22, 2013

### Update on Go 1.1 crypto performance

Some time ago I integrated OpenSSL's MD5, SHA1 and RC4 functions into Go because Go's native implementations were slow. The Go team has made great strides in fixing this (for 386 and AMD64) with Go 1.1 with faster RC4, SHA1 and MD5.

Rerunning my tester program shows the following:

|  | MD5 | SHA1 | RC4 |
| --- | --- | --- | --- |
| Native Go (pre-1.1) | 404 MB/s | 123 MB/s | 111 MB/s |
| Via OpenSSL | 607 MB/s | 636 MB/s | 744 MB/s |
| Go 1.1 | 602 MB/s | 383 MB/s | 662 MB/s |
| Go 1.1 vs. OpenSSL | 0.99x | 0.60x | 0.89x |

So, the Go team has essentially reached parity with OpenSSL on MD5, is close on RC4, but could still do some work to get the SHA1 performance closer. (All testing of Go 1.1 was done with 84e21a8c2137 on Ubuntu 12.04, on a 2.4 GHz Intel Core i7 with 4 cores and 2 GB of RAM.)

As a further comparison, Go's own benchmark tests show the following.

- MD5: via OpenSSL, BenchmarkHash8K 593 MB/s, BenchmarkHash8KUnaligned 581 MB/s; Go 1.1, BenchmarkHash8K 584 MB/s, BenchmarkHash8KUnaligned 589 MB/s
- SHA1: via OpenSSL, BenchmarkHash8K 606 MB/s; Go 1.1, BenchmarkHash8K 373 MB/s
- RC4: via OpenSSL, BenchmarkRC4_8K 740 MB/s; Go 1.1, BenchmarkRC4_8K 681 MB/s

So, that puts Go 1.1 MD5 at parity with OpenSSL, RC4 at around 90% of the speed and SHA1 at around 60%.

PS Reader Luit points to the fact that AES performance has been greatly improved as well: 6x to 7x faster than before on AMD64.

## Thursday, April 18, 2013

### The importance of open code

Last February Professor Darrel Ince, Professor Les Hatton and I had a paper published in Nature arguing for openness in the code used for scientific papers. The paper is called The Case for Open Computer Programs.

In a coda to that piece Darrel wrote the following:
> Our intent was not to criticise; indeed we have admiration for scientists who have to cope with the difficult problems we describe. One thesis of the article is that errors occur within many systems and does not arise from incompetence or lack of care. Developing a computer system is a complex process and problems will, almost invariably, occur. By providing the ability for code to be easily perused improvement will happen. This is the result detailed in both the boxes in the article: the Met Office data is more accurate, admittedly by a small amount, and because of feedback to developers the geophysical software was considerably improved.

Recently, an important paper in economics has been in the news because its conclusions turn out to be inaccurate for a number of reasons. One of those reasons is a programming error using the popular Microsoft Excel program. This error, in an unreleased spreadsheet, highlights just how easy it is to make a mistake in a 'simple' program and how closed programs make reproducing results difficult.

The original paper by Reinhart and Rogoff is Growth in a Time of Debt and it concludes the following:
> [...] the relationship between government debt and real GDP growth is weak for debt/GDP ratios below a threshold of 90 percent of GDP. Above 90 percent, median growth rates fall by one percent, and average growth falls considerably more.

They point to a serious problem with growth rates once the debt/GDP ratio is above 90%. As this is an important economic topic at the moment, other economists have attempted to replicate their findings from the original data. One such reproduction is Does High Public Debt Consistently Stifle Economic Growth? A Critique of Reinhart and Rogoff, which finds:
> Herndon, Ash and Pollin replicate Reinhart and Rogoff and find that coding errors, selective exclusion of available data, and unconventional weighting of summary statistics lead to serious errors that inaccurately represent the relationship between public debt and GDP growth among 20 advanced economies in the post-war period. They find that when properly calculated, the average real GDP growth rate for countries carrying a public-debt-to-GDP ratio of over 90 percent is actually 2.2 percent, not −0.1 percent as published in Reinhart and Rogoff. That is, contrary to RR, average GDP growth at public debt/GDP ratios over 90 percent is not dramatically different than when debt/GDP ratios are lower.

The coding error referred to there is a mistake in an Excel spreadsheet that excluded data for certain countries. And the original authors have admitted that this reproduction is correct:

> On the first point, we reiterate that Herndon, Ash and Pollin accurately point out the coding error that omits several countries from the averages in figure 2. Full stop. HAP are on point. The authors show our accidental omission has a fairly marginal effect on the 0-90% buckets in figure 2. However, it leads to a notable change in the average growth rate for the over 90% debt group.

All this brought to mind my own discovery of errors in code (first error, second error) written by the Met Office: code that was not released publicly.

There's a striking similarity between the two situations. The errors made by the Met Office and by Reinhart and Rogoff were trivial and in the same type of code. The Met Office made mistakes calculating averages, as did Reinhart and Rogoff. Here's the latter's spreadsheet with the error:

The reality of programming is that it is very easy to make mistakes like this. I'll repeat that: very easy. Professional programmers do it all the time (their defense against this type of mistake is to have suites of tests that double check what they are doing). We should expect errors like this to be occurring all the time.

What's vital is that scientists (including the dismal kind) consider their code (be it in Excel or another language) as an important product of their work. Publishing of data and code must become the norm for the simple reason that it makes spotting errors like this very, very quick.

If Herndon, Ash and Pollin had had access to the original Excel spreadsheet along with the data, they would very quickly have been able to see the original authors' error. In this case Excel even highlights the cells involved in the average calculation. Without it they were forced to do a ground-up reproduction. In this particular case they couldn't get the same results as Reinhart and Rogoff and had to ask them for the original code.

An argument against openness in code is that bad code may propagate. I call this the 'scientists protecting other scientists from themselves' argument and believe it is a bad argument. It is certainly the case that it's possible to take existing code and copy it and in doing so copy its errors, but I believe that the net result of open code will be better science not worse. Errors like those created by the Met Office and Reinhart and Rogoff can be quickly seen and stamped out while others are reproducing their work.

A good scientist will do their own reproduction of a result (including writing new code); if they can't reproduce a result then, with open code, they can quickly find out why (if the reason is a coding error). With closed code they cannot and science is slowed.

It is vital that papers be published with data and code for the simple reason that even the best organizations and scientists make rudimentary errors in code that are hard to track down when the code is closed.

PS It's a pity that, one year after the Met Office argued for open data and code, the code to reproduce CRUTEM4 has yet to be released. I hope, one day, that when papers are published the code and data will be available at the same time. We have the networking technology and the storage space to do this.

## Tuesday, April 16, 2013

### PDF and eBook version of "A non-mathematical introduction to one-way functions and their use in securing passwords"

My four blog posts on this subject (starting here) have been very popular, with well over 100,000 page views in four days. Because of their popularity, and because some people may wish to share the story of Alice and Bob in a different format, I've put together an 11-page PDF and eBook containing an edited, illustrated and slightly expanded version of the posts.

## Monday, April 15, 2013

### Alice strikes back against Bob's 'reverse dictionary'

This is part 3 of a series of blog posts about one way functions and their use in securing passwords. If you haven't read the first two parts start here.

Now Alice is angry. She realizes that Bob has defeated her clever one way function with his reverse dictionary and that now that Bob's created it he can use it over and again. All his initial effort will be amortized day after day as he steals crossword answers from Alice.

And so Alice strikes back.

The following day she calls Bob and says that she's making a small change to her one way function. It'll still involve looking up five words in the dictionary and following the definitions, but the order will change. In fact, she'll get to pick the order.

In the original scheme the first word of the first definition was used, the second word of the second and so on. Alice proposes to verify that she knows the answer to 4D (TANGLE) by giving Bob the final word (just as before) but also five numbers indicating which word to pick from each definition. In the original scheme the order was 1,2,3,4,5 (indicating the first word from the first definition, the second word from the second definition and so on).

But today she tells Bob that the answer is HUMAN and the order is 2,2,4,5,1. So Bob has to start from TANGLE (if he knows it!) follow the second word of the first definition, the second word of the second definition, the fourth word of the third, the fifth word of the fourth and take the first word of the fifth definition (which should be HUMAN).

This renders Bob's reverse dictionary useless. It's only good for the original order; 1,2,3,4,5. Now Alice is free to pick any five numbers from 1 to 5 before working out her one way function. Bob can still easily verify that Alice knows TANGLE, as long as she gives him the order, because he too can follow the definitions from TANGLE to HUMAN. But, of course, he has to have solved TANGLE by himself!

And five numbers in the range 1 to 5 gives Alice 5 * 5 * 5 * 5 * 5 = 3,125 possible orders in which to follow definitions. That would force Bob to create 3,125 reverse dictionaries for each of the possible combinations that Alice could use. He doesn't have the room in his house for that.

And Alice is free to pick a new sequence for every word Bob wants to verify. So Bob has to either work backwards through the one way function (which is really hard/almost impossible) or go through the entire dictionary trying to find which word Alice started with.

Because Alice can pick a different order each time she wants to disguise a word she can even make the same word turn into completely different words. With the order 2,2,4,5,1 TANGLE became HUMAN, but with 1,4,5,3,2 TANGLE becomes AUTOMATIC. So, Bob won't even be able to learn from past words.

In the real, mathematical world of one way functions and passwords the extra piece of information that Alice has chosen (the numbers) is called salt. It is typically chosen randomly and either added to the password to be stored or used as a parameter to the encryption used to store the password. It need not be numbers; in fact it's often just random characters.

The password database will contain both the salt and the result of the one way function applied to the salt and password combined. It's still possible to check a password when the user types it in (the web site computes the one way function of the salt and typed in password combination and compares it with what's stored in the database).

But if the password database is stolen an attacker is frustrated, because each password will have a different salt and cracking the passwords will be greatly slowed down. Pre-calculated 'reverse dictionaries' or rainbow tables will be useless and each password will have to be cracked individually (without salt, if two or more people have the same password it only has to be cracked once). This means an attacker has to run through all possible passwords for each entry in the database, since the salt is different for each one.

The salt value that goes along with each password is not a secret. All it needs to be is long and different for each password stored in the database. Just as Alice tells Bob the 'order', a well-designed password system does not rely on the secrecy of the salt: the strength of the system relies on the strength of the one way function (i.e. how hard it is to calculate 'backwards').
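
In code, the whole scheme fits in a few lines. Here's a minimal Python sketch with SHA-256 standing in for the one way function (real systems prefer deliberately slow functions such as bcrypt or PBKDF2, but the role of the salt is the same):

```python
import hashlib
import os

def store_password(password: str) -> tuple[bytes, bytes]:
    # The salt is not secret; it just needs to be long and
    # unique per user. Both values go into the database.
    salt = os.urandom(16)
    digest = hashlib.sha256(salt + password.encode()).digest()
    return salt, digest

def check_password(password: str, salt: bytes, digest: bytes) -> bool:
    # Recompute the one way function and compare.
    return hashlib.sha256(salt + password.encode()).digest() == digest

# Two users with the same password get different database entries:
s1, d1 = store_password("tangle")
s2, d2 = store_password("tangle")
print(d1 != d2)                          # True: the salts differ
print(check_password("tangle", s1, d1))  # True
print(check_password("wrong", s1, d1))   # False
```

Because the two stored digests differ even for identical passwords, a pre-computed table keyed on the digest alone is useless, which is exactly the effect Alice achieves with her changing 'order'.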

Fortunately, or unfortunately, this isn't the end of the story. Even with salt it's possible to crack passwords.

PART 4: Bob's wife Carla lends a speedy mind

PS There's actually a way round the need to create 3,125 dictionaries which is entirely separate from what Carla's going to do! Either see if you can figure this out yourself or read this comment.

## Thursday, April 11, 2013

### A non-mathematical explanation of one way functions

Alice and Bob are crossword enthusiasts. Every morning they rush to complete the Daily Telegraph cryptic crossword. One morning Alice finishes the crossword and telephones Bob to gloat. Bob challenges her by asking for the solution to 21D as proof that she's completed the entire thing.

Alice knows the answer (FOLIO) but doesn't want to tell Bob because that would give away an answer that Bob might not yet have. And she knows that Bob may ask her for more solutions as further challenges, and she really doesn't want to help him complete the crossword.

So, she proposes to Bob that they both get out their Shorter Oxford English Dictionaries. Alice proceeds as follows:

1. She finds the word FOLIO and its definition: "an individual leaf of paper or parchment, either loose as one of a series or forming part of a bound volume, which is numbered on the recto or front side only."  She finds the first word (that's not an article) from the definition, in this case INDIVIDUAL.

2. She finds the word INDIVIDUAL and its definition: "a single human being as distinct from a group". From its definition she finds the second word, in this case HUMAN.

3. She finds the word HUMAN and its definition: "relating to or characteristic of humankind". From its definition she finds the third word, in this case OR.

4. She finds the word OR and its definition: "used to link alternatives". From its definition she finds the fourth word (if the definition were shorter than four words she would have just wrapped around to the beginning of the definition counting up to four). In this case the word is ALTERNATIVES.

5. Finally, she finds ALTERNATIVES and its definition: "(of one or more things) available as another possibility or choice". The fifth word is THINGS.

She tells Bob the word THINGS. If Bob has completed 21D then he can follow the same procedure as Alice, arrive at THINGS and know that Alice had the answer to 21D all along.

But if Bob didn't know the answer to 21D (FOLIO) he'll have a very hard time figuring it out from THINGS. To do that he'd need to work backwards and find every definition in the dictionary whose fifth word is THINGS, and for each of those find the definitions with the corresponding fourth word, and so on. Not only is this procedure very slow (Bob would do better to solve the crossword), it will result in a number of possibilities for the solution to 21D, as paths through the dictionary from different words will overlap.
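
Alice's procedure is easy to express as code. Here's a toy Python model using a tiny invented 'dictionary' (the entries are made up to mirror the example, not quoted from the OED):

```python
# A tiny invented "dictionary": each word maps to the words of its
# definition (articles already dropped, as in Alice's procedure).
toy_dict = {
    "FOLIO": ["INDIVIDUAL", "LEAF", "PAPER", "PARCHMENT"],
    "INDIVIDUAL": ["SINGLE", "HUMAN", "BEING"],
    "HUMAN": ["RELATING", "CHARACTERISTIC", "OR", "HUMANKIND"],
    "OR": ["USED", "TO", "LINK", "ALTERNATIVES"],
    "ALTERNATIVES": ["ONE", "OR", "MORE", "AVAILABLE", "THINGS"],
}

def one_way(word: str, rounds: int = 5) -> str:
    """In round i, look up the current word and take the i-th word
    of its definition, wrapping around if the definition is short."""
    for i in range(1, rounds + 1):
        definition = toy_dict[word]
        word = definition[(i - 1) % len(definition)]
    return word

print(one_way("FOLIO"))  # THINGS
```

Going forward takes five quick lookups; going backward means searching every definition in the dictionary for each step, which is the asymmetry that makes it a one way function.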

So Alice's dictionary procedure is a 'one way function': it's easy to go from one word to another in one direction, but very hard to go back in the other. Such one way functions are widespread in computer security.

For example, one class of one way functions is often used for password verification. A web site need not store your actual password, just a one way function of it. That way if the database of stored passwords is stolen it's hard for an attacker to retrieve the actual passwords (just as Bob would have a hard time going from THINGS to FOLIO), but in day to day use it's easy to verify a password by calculating a one way function of what you type in (just as how Bob can easily go from FOLIO to THINGS) and comparing the result with what's stored in the database.
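
The verification side is equally short in code. A minimal Python sketch, with SHA-256 standing in for whatever one way function a site might use (real password systems add refinements such as salt):

```python
import hashlib

# The site stores only the digest, never the password itself.
stored = hashlib.sha256(b"FOLIO").hexdigest()

def verify(attempt: bytes) -> bool:
    # The easy direction: recompute the one way function and compare.
    return hashlib.sha256(attempt).hexdigest() == stored

print(verify(b"FOLIO"))   # True
print(verify(b"THINGS"))  # False
```

Checking a typed-in password is one cheap forward computation; recovering the password from `stored` would require working the function backwards, just as Bob would have to work backwards from THINGS.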

Of course, computers are much too fast and capable to use actual dictionaries: they use mathematical functions that are easy for computers to calculate in one direction, but difficult for them to compute in the other. Nevertheless the principle is the same.

PS In the algorithm above Alice proceeds through the dictionary looking up five words. There's nothing magical about the number five. In fact, she could just look up the first word found in the definition of FOLIO and give that to Bob (INDIVIDUAL). But following a deeper chain of words makes any attempt by Bob to work backwards harder. I picked five 'rounds' of word look ups to illustrate the principle, but any other number could be chosen. And in the real world of mathematical one way functions it's not uncommon to use multiple 'rounds' to increase the difficulty of reversing the function.

PART 2: Bob outsmarts Alice's one way function

This entire series of blog posts is available for purchase as an illustrated PDF or eBook for \$1.99.

## Thursday, April 04, 2013

### lulip: a line-level profiler for LuaJIT

If you are profiling Lua code running under LuaJIT it can be useful to know where your code is 'hot'. Using debug.sethook I've put together a small profiler that outputs an HTML page showing execution times on a per-line basis. It's called lulip.

Usage is simple:
And it creates an HTML page showing the lines that consume the most time, with the total time in ms, the number of times each line was executed, and the line itself, colorized.

For example,

| file:line | count | elapsed (ms) | line |
| --- | --- | --- | --- |
| wr.lua:1129 | 2 | 822.455 | hash = ngx_sha1_bin(value) |
| wr.lua:1172 | 428 | 470.849 | captures, err = ngx_re_match(v, p) |
| wr.lua:1197 | 3762 | 207.487 | x = string_find(v, f) |
| wr.lua:2121 | 57 | 154.386 | string_gsub(v, "//([^/]+)//", "%1") |
| wr.lua:1196 | 3788 | 87.475 | for i=1,g() do |
| wr.lua:1158 | 1563 | 52.906 | if not f() then |

LuaJIT is required because I used the FFI to get at the gettimeofday function for microsecond timing. LuaJIT makes this trivial to do:

## Wednesday, April 03, 2013

### Performance of array creation in Lua

The Lua 5.1 Optimization Notes say: "Short inline expressions can be faster than function calls. t[#t+1] = 0 is faster than table.insert(t, 0)." But the question is: what's the fastest way to insert a number of items into a table (as an array)?

Naive base test case (naive.lua): using Lua 5.2 it takes (average of 5 runs) 1718ms.

Making table.insert a local (local.lua): using Lua 5.2 it takes (average of 5 runs) 1408ms (vs. base 82.0%).

Following the advice from the optimization tips (optimized.lua): using Lua 5.2 it takes (average of 5 runs) 1180ms (vs. base 68.7%).

Using a simple implementation with a counter (counter.lua): using Lua 5.2 it takes (average of 5 runs) 193ms (vs. base 11.2%).

So, the rather simplistic implementation with a counter is almost 9x faster than the naive implementation and it's 6x faster than the advice from the optimization page.

Moral: it pays to measure.
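
The same moral applies outside Lua. The 'make table.insert a local' trick has a direct analogue in Python, caching the bound append method in a local, and you can measure it the same way with timeit (absolute times will vary by machine; the point is to measure rather than guess):

```python
import timeit

N = 100_000

def method_lookup():
    t = []
    for _ in range(N):
        t.append(0)       # t.append is looked up on every iteration
    return t

def cached_local():
    t = []
    append = t.append     # cache the bound method, like a Lua local
    for _ in range(N):
        append(0)
    return t

for f in (method_lookup, cached_local):
    print(f.__name__, timeit.timeit(f, number=20))
```

Both build identical lists; only the lookup cost per iteration differs.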

Switching to the most recent LuaJIT gives the following times: naive base case: 845ms, table.insert is local: 836ms, following the advice: 837ms, simple implementation with counter: 55ms. So, the simple implementation is 15x faster than any other implementation.