Friday, December 12, 2008

POPFile v1.1.0 Released

There's a new POPFile out (v1.1.0) and I had almost nothing to do with it. This is the first release where the new (global) POPFile Core Team did all the work. Thanks Brian in the UK, Joseph in the US, Manni in Germany and Naoki in Japan. A truly global effort.

As part of the v1.1.0 release POPFile has moved from SourceForge to its own server and has a totally new web site.

v1.1.0 also includes some great new features: it is the first to use a
SQLite 3.x database and it is the first to offer a Mac OS X installer in addition
to the usual cross-platform and Windows installer versions.

And there are a raft of bug fixes as well which you can read about in the release notes.

Spaces are a pain in painless non-recursive Make

In my book GNU Make Unleashed I published a pattern for doing Make without having a recursive descent into directories. It works well and I know that many people are using it.

But the other day I received an email from Terry V. Bush at VMWare saying that he had trouble with it because of 'the third-party problem'. The third-party problem is my name for the problem that occurs when your beautifully written Make system has to incorporate some wart of source code from a third-party vendor. In Terry's case that third-party has spaces in the path names.

Space is path names are a real bind in Make (that's another topic I cover in GNU Make Unleashed) and Terry really wanted to use my non-recursive Make pattern but needed to handle this ugly third-party.

I'll let him continue the story...

What happens is that if you have a directory name with a space in it your functions fail to find the root. Also, they always walk the entire tree up to the top even after they have found the root of the tree. Here is the version published in "GNU Make Unleashed":

sp :=
sp +=

_walk = $(if $1,$(wildcard /$(subst $(sp),/,$1)/$2) \
$(call _walk,$(wordlist 2,$(words $1),x $1),$2))

_find = $(firstword $(call _walk,$(strip $(subst /, ,$1)),$2))
_ROOT := $(patsubst %/root.mak,%,$(call _find,$(CURDIR),root.mak))

What I have done to solve these two issues is:

1: Add an "if" that returns when the root is found. This actually makes other parts of this function simpler. It also makes is slightly faster, albeit very slightly...

2: Substituted a "|" char (any char that is highly unlikely to be in a real directory name will work) for each space in the path and then put the spaces back when necessary.

Also, to simplify things a little, I added an eval that puts the result of wildcard into a temp var "_X" so that returning it when found is trivial.

sp :=
sp +=
_walk = $(if $1, \
$(if $(eval _X=$(wildcard /$(subst |,\$(sp),$(subst \
$(sp),/,$1))/$2))$(_X),$(_X), \
$(call _walk,$(wordlist 2,$(words $1),x $1),$2)))
_find = $(call _walk,$(strip $(subst /,$(sp),$(subst $(sp),|,$1))),$2)
_ROOT := $(patsubst %/root.mak,%,$(call _find,$(CURDIR),root.mak))

My plan for this is to combine your "Painless non-recursive Make" with Paul D. Smith's "Advanced Auto-Dependency Generation" Make code to produce a fast and extensible Make environment for products at VMware.


Nice, and "Advanced Auto-Dependency Generation" is also covered in GNU Make Unleashed.

Friday, September 12, 2008

The Ultimate Nerd Honeymoon

There's been a major gap in this blog because I'm in the midst of writing a book for O'Reilly. As part of the research on the book I came across the ultimate nerd honeymoon.

In 1812, Sir Humphry Davy, the British chemist and inventor who is best remembered today for Davy Lamp used in mines, married.

In October 1813 Davy and his wife set off on a honeymoon across Europe. First stop was Paris to pick up a medal from Napoleon. But Davy needed a valet to help out, so he took Michael Faraday with him. That way Davy and Faraday could perform experiments along the way.

While in Paris they got together with Joseph Louis Gay-Lussac (of Gay-Lussac's Law) and showed that iodine was an element. And André-Marie Ampère stopped by for a chat.

Off they went to Italy to hang out with Alessandro Volta and also did an experiment setting fire to a diamond using the sun's rays and demonstrated that a diamond is made of carbon.

The honeymoon lasted 18 months.

Sunday, June 29, 2008

Advice to a young programmer

I received a mail from an acquaintance who'd come to the realization that his 13-year-old wanted to be programmer, specifically a games programmer. Here's the advice I gave. Perhaps others have things to add:

1. I'm tempted to tell you that the right way to learn to be a programmer is to start with LISP, or the lambda calculus, or even denotational semantics but you can come back to those after a few years getting your feet wet.

2. Lots of programming involves logic (or at least thinking logically) so learning about and enjoying logic is probably a good foundation. You could start by learning about boolean algebra since it's simple and fun and the basis for a lot of what computers do.

3. Since games programmer involves a lot of physics, you should also learn about Newton's Three Laws and Universal Gravitation and play around with things like springs and pendulums.

4. Basic trigonmetry is important to the games programmer. It'll be handy to know about Pythagoras and the relationship with sin, cos and tan.

5. Above all, start with a programming language and a good book and commence hacking: try stuff out, make little simple programs (even if it's a program that prints out "Hello" on the screen, or a program that prints out "Hello" ten times, or asks you for the number of times to print "Hello" and then does it). Just write code, whatever takes your fancy.

6. A good starting language is Python. Get the O'Reilly book Learning Python.

7. Python is dynamic so you'll be able to make progress very quickly, but for games programming you are probably going to need to get a little closer to the machine. And for that you should learn C by reading the classic The C Programming Language.

8. As you learn more there are some great books that will expand on what you can do: read Programming Pearls and The Practice of Programming. Think about getting: Algorithms in C. Read Structure and Interpretation of Computer Programs.

9. Also: avoid debuggers, learn to unit test. Debuggers are useful in limited circumstances, most code can be debugged by using your head and a few 'print's. Unit tests will save your life as you go forward.

10. When you are ready, try to write a version of the first ever computer game: Spacewar!

...

11. When your first company goes public think of me; I'll be an old man and probably won't have saved enough for retirement.

Friday, June 06, 2008

The Colarie: A new way of measuring calorie intake

Recommended daily energy intake for a man is generally considered to be roughly 2,500 Calories (or kilocalories: 1 Calorie = 1,000 calories) and for a woman it's 2,000. The problem with those figures is that they are rather abstract. If you are trying to count your energy intake it would be much easier to deal with something smaller and easier to understand.

Hence my idea for the Colarie.

1 Colarie is the number of Calories in a single can of non-diet Coca Cola. It's easy to appreciate that a single can of Coke isn't very good for you and so comparing a food stuff to a can of Coke is an easy measure of whether you are eating something that's got too much fat or sugar in it.

The actual Calorie count for a Coke can varies by country. In France there are 139 Calories in a can, in the US there are 155. So I've settled on 147 as a good measure. So 1 Colarie = 147 Calories.

That means a man needs to consume the equivalent of 17 cans of Coke per day; for a woman it's 13.5 cans of Coke per day. That isn't a recommended diet, however!

So next time you are faced with a snack bar, use the Colarie measure. Just the other day I was presented with a small biscuit to go with a cup of tea on a BA flight. Looking at the Calorie count it was around 230 Calories for this tiny biscuit. That's 1.5 Colaries!

I didn't eat it.

Thursday, June 05, 2008

GNU Make Unleashed release

For 4 years I've written the Ask Mr Make column over at CM Crossroads (and I continue to write it). Since there's been great interest in the column, I've put together all 4 years of columns plus additional unpublished material as a book and ebook.

All the material has been rechecked for accuracy, errata have been incorporated and the text re-edited. The result is a 230 page book covering everything from basics of GNU Make to advanced topics like eliminating recursive make, doing arithmetic in GNU Make or dealing with spaces in file names.





The book contains 43 separate articles about GNU Make, plus a complete reference to the GNU Make Standard Library.

You can buy a copy in either form here.

A big thank you to everyone who's commented, emailed, or made suggestions on my GNU Make articles over the years.

Monday, May 26, 2008

POPFile v1.0.1 released plus a glimpse of the future

POPFile v1.0.1 was released today; this is the first ever POPFile release that I didn't do. POPFile is now being managed by a core team of developers: Manni Heumann (in Germany), Brian Smith (in the UK), me (in France), Joseph Connors (in the US) and Naoki Iimura (in Japan). A truly international effort. The actual release binaries were built by Brian Smith who, for a long time, has been the installer guru.

This release contains minor feature improvements and a number of bug fixes. Some of the bugs fixes were for annoying bugs that showed up only occasionally: that makes it a worthwhile upgrade.

Since I pulled back from being involved in every detail of POPFile's evolution the core team has been liberated to work on the project. v1.0.1 is their first release, and it is minor, but much greater things are coming:

1. A native Mac installer

2. A SOHO version of POPFile. Some time ago I did most, but not all, of the work to make a multi-user version of POPFile. That work is being completed by the core team and will allow a single POPFile installation to be shared by multiple users.

Thank you to the POPFile Core Team for this great start to a new chapter in POPFile history.

Saturday, May 17, 2008

Breaking the Fermilab Code

A story appeared on Slashdot about a mysterious fax received at Fermilab written in an unknown code. The full story is here. I looked at it and immediately noticed a few things:

1. The first part looked like ternary (base 3) with digits 1 (|), 2(||) and 3(|||).

2. The last part looked like binary with digits 1(|) and 2(||)

3. The middle bit looked like either a weird substitution code, or I wondered if it might be machine code.

4. In the last part the digit 2 (||) never occurs more than once, perhaps it was actually a separator and the last part is not binary.

The first step was to convert the bars into numbers. Here's a copy of my marked up print out:



The first part has the numbers (or at least I thought):

323233331112132
333231322123312
111331132312233
333212123213113
311333313331111
211333323232211
232313331121231
33231312

Noticing this had 113 digits (which is a prime number) I went off on a wild goose chase around primes, and then around the interpretation of this number in hexadecimal as a string in ASCII, Unicode or binary... waste of time.

Then I started thinking about ternary again and wrote down the largest ternary numbers that can be expressed with 1, 2, 3, ... digits:

23 = 210
223 = 810
2223 = 2610
22223 = 8010

One of those stood out: with three digits the maximum number is 26 and there are 26 letters in the alphabet! Then the only question was was how to map the three digits used in the code (1, 2, 3) to the three ternary digits (0, 1, 2).

To simplify things I wrote a small Perl program that tries out all the possible mappings and outputs the ternary interpreted as a string (with 001 = A, etc.):

use strict;
use warnings;

my $top = $ARGV[0];

$top =~ tr/321/abc/;

my @chunks;

while ( $top =~ s/^([abc]{3})// ) {
push @chunks, $1;
}

my @digits = ( '0', '1', '2' );

foreach my $d0 (@digits) {
foreach my $d1 (grep {!/$d0/} @digits) {
foreach my $d2 (grep {!/[$d0$d1]/} @digits) {
print "($d0$d1$d2) ";
foreach my $c (@chunks) {
my $v = 0;
my $m = 1;
foreach my $d (reverse split( //, $c )) {
$d =~ s/a/$d0/;
$d =~ s/b/$d1/;
$d =~ s/c/$d2/;
$v += $d * $m;
$m *= 3;
}
print chr( 64 + $v );
}
print "\n";
}
}
}

With my initial interpretation of the top part of the coded message I got the following output:

(012) [email protected]@[email protected]@CJQJFBWKAF
(021) [email protected]@[email protected]@FTVTCAPSBC
(102) JDNXUMEISOZNUODMFSGYQMPNZHMJCHCPNTELP
(120) [email protected]@RMPWRWJLFUNJ
(201) THYLOZGRKUMYOUHZCKENVZWYMDZTFDFWYJGXW
(210) [email protected]@IZWPIPTXCOYT

A ha! The 021 block (which corresponds to the mapping 3 -> 0, 2 -> 2, 1 -> 1) seems to have a partial message: [email protected]@WOULD and then it's garbage. Going back to the original message I realized that 113 is not divisible by three and that I'd either missed a symbol, or had two too many.

After much fiddling around I discovered that the correct interpretation of the top block is that two of the threes are wrapped from one line to another (there appears to me some indentation in the message that indicates this, take a look at the original, but this could be just random).

323 233 331 112 132
333 231 322 123 312
111 331 132 312 233
333 212 123 213 113
311 333 313 331 113
113 333 232 322 133
231 333 112 123 133
231 312

Rerunning my Perl program output the full message:

(012) [email protected]@[email protected]@[email protected]
(021) [email protected]@[email protected]@[email protected]
(102) JDNXUMEISOZNUODMFSGYQMPNYYMCIVEMXSVEO
(120) [email protected]
(201) THYLOZGRKUMYOUHZCKENVZWYNNZFRQGZLKQGU
(210) [email protected]

So much for the first part. The second part took me off into Z-80, 6502 and 6809 machine code wondering if it was a program and then nowhere. I still don't understand what this part is trying to say.

The third part looked initially like binary but on closer examination I decided that the 2s (||) were actually separators and the message should be interpreted as number separated by 2s by counting the 1s (|). That yields:

31211112111312
32213123123331
12213111332312
23333333233123
12313123332311
33223232312312
112

(Once again there was a wrapping 'problem' in the message where a run of 8 |s was actually 3 |s then 1 || and 3 more |s.) Using the little Perl program reveals:

(012) [email protected]@[email protected]
(021) [email protected]@[email protected]
(102) OZTYSBOOMXGZLODMLNEEOMEVACOOX
(120) [email protected]@NKVMNLUUKMUDYWKKB
(201) UMJNKAUUZLEMXUHZXYGGUZGQBFUUL
(210) [email protected]@YSQZYXOOSZOHNPSSA

So, the same mapping between digits is used.

That leaves some final questions:

1. Who is Frank Shoemaker?
2. Why is base spelt incorrectly?
3. Is the extra S in BASSE a reference to the middle section where three symbols start with S.
4. If #3 is correct, then those three symbols could be intepreted as FC16 which is 252. Could this be the employee number of the author?
5. Why is the letter A missing from the middle section when all the other hexadecimal digits are there?

Thursday, May 01, 2008

The Spammers' Compendium finds a new home

Shortly after I announced that I was getting out of anti-spam the folks at Virus Bulletin contacted me about taking over The Spammers' Compendium. I was delighted.

Today the transfer is complete and the new home is here. It will be maintained and updated by Virus Bulletin. Please send submissions to them.

Monday, March 31, 2008

Multi-route (email and phone) self-aware phishing

Today, I received the following email:

This communication was sent to safeguard your account against any
unauthorized activity.

Max Federal Credit Union is aware of new phishing e-mails
that are circulating. These e-mails request consumers to click
a link due to a compromise of a credit card account.

You should not respond to this message.

For your security we have deactivate your card.

How to activate your card

Call +1 (800)-xxx-9629

Our automated system allows you to quickly activate your card

Card activation will take approximately one minute to complete.


Of course, I don't have an account with Max Federal Credit Union and this is obviously a phish. Notice that the English is quite right:

"For your security we have deactivate your card." and "You should not respond to this message." doesn't make sense in context.

What's more interesting is that the message itself warns you about phishing emails and asks you to call an 800 number.

If you call the 800 number an electronic voice reminds you again to never give your PIN, password or SSN in email and then proceeds to ask you for the card number, PIN, expiry date and CVV2. The assumption is that you've been warned twice not to do something in email, so it's OK by phone.

It's painful to see the phisher use the existence of phishing as a way to phish.

Saturday, March 22, 2008

Building a temperature probe for the OLPC XO-1 laptop

I bought an OLPC XO-1 laptop through the G1G1 program and was intrigued to discover the Measure activity.

The measure activity uses the internal audio system to measure a value input on the microphone socket. With nothing connected this application reads the value of the internal microphone and displays a waveform. You can have fun just by whistling, speaking or singing with Measure running.

But since you can measure a voltage input into the microphone socket, it's possible to build sensors and connect them to th OLPC XO-1. On the Measure web site they mention building a simple temperature sensor using an LM35 temperature sensor that looks like this:

The LM35 can measure a temperature between 0 and 155 Celsius just by hooking it up to a 5v supply. It outputs 10mv per degree so a temperature of 20 Celsius corresponds to 0.200v.

Since the OLPC XO-1 has a USB port it's possible to get 5v from the laptop by hacking a USB connector, and connect 5v to the LM35 and then take the signal coming from the LM35 (the middle pin) and connect it to the microphone socket.

I did this by building two parts: a generic adapter which gives me 5v and a signal line out of a standard stereo 3.5mm jack:

The stereo jack is wired up so that the tip is +5v, the base is Gnd and the middle is the signal going to the microphone socket. The USB plug has only two wires connected (for +5v and Gnd), and the jack going to the microphone socket (which is mono) has the connected to the middle of the stereo jack, and the base is Gnd. All the grounds are joined together.

When plugged into the OLPC XO-1 it creates a generic connector for any other projects I might work on:

For the temperature sensor I simply connected the LM35 to a stereo socket with the correct connections to match up with the stereo jack plug. Then I created a probe with an old plastic pen and some waterproofing compound (so that I can do things like shove the probe in a cup of coffee without wetting the contacts on the LM35). Here it is:

Connect the two together and run the standard Measure activity and you can start to look at the output of the sensor and hence the temperature.

But there's a problem. The microphone input can only handle voltages in the range 0.3v to 1.9v (and my measurements of my OLPC XO-1 show this range to actually be 0.4v to 1.9v). So that means as is the probe can be used to measure temperatures in the range 40 Celsius to 155 Celsius. That low end is a bit high for the sorts of experimentation you can do at home (e.g. measure the temperature in the fridge, or a glass of cold water, or even the temperature inside your mouth).

So we need to scale the voltages coming from the sensor to fit better into the range that's readable by the laptop. The standard way to do that is with an operational amplifier which is used to add two voltages together: the voltage coming from the sensor and a reference voltage. Doing this will move the voltage up.

For that I used the LM1458 which in a single 8 pin package contains a pair of operational amplifiers.

Here's the circuit diagram:

The circuit has three parts: a voltage divider, a summing amplifier and an inverting amplifier.

Voltage divider: the reference voltage is created by taking the 5v available from the USB port and passing it through resistors R8 and R9. The voltage at the middle point of these two resistors is determined by the standard formula for a voltage divider of 5v * R9/(R8 + R9) = 5v * 1 / ( 10 + 1 ) = 0.45v. In my actual circuit with 1% tolerance resistors the measured voltage was 0.41v.

Summing amplifier: the middle portion of the circuit takes the two inputs and adds them together (and because of the nature of the circuit inverts the summed value). So its output going into R7 is -ve the sum of the reference voltage and the sensor voltage.

Inverting amplifier: the final part just inverts the voltage so that the output is +ve and in the range that the OLPC XO-1 can read.

One complexity is that this circuit requires +9v, Gnd and -9v to operate. I obtain that with a pair of 9v batteries linked together giving Gnd where the two are connected. Here's the final circuit with appropriate connectors to hook up to my existing probe and laptop adapter:



And here's what it looks like when it's all hooked together:

Now, this wouldn't be any fun without a bit of software and since the Measure activity can only display the voltage being presented (which is now a mixture of the sensor voltage and the reference voltage) what's needed as a new activity.

I found the developer documentation to be very hard to follow and I ended up hacking the existing Measure activity and renaming it Temperature.

The critical code is in the file drawWaveform.py where it reads self.avg (the value coming from the microphone input via the ADC) and scale it for display. I measured voltages coming from my probe for a couple of known temperatures and worked out a scale factor (The +32768 is because the self.avg ranges from -32768 to 32767):

layout.set_text("Temperature: %.1f C" % (0.00221833*(self.avg+32768)) )

Here's a screenshot of Temperature running on the laptop and measuring the ambient temperature in my office:

You can download my Temperature activity using the browser on your OLPC XO-1 to install it.

Thursday, February 14, 2008

The sum of the first n odd numbers is always a square

I was staring at the checked pattern on the back of an airline seat the other day when I suddenly saw that the sum of the first n odd numbers is always a square. For example,

1
1 + 3 = 4
1 + 3 + 5 = 9
1 + 3 + 5 + 7 = 16

And, of course, it occurred to me that it would be nice to be able to prove it. There are lots of ways to do that. Firstly, this is just the sum of an arithmetic progression starting at a = 1 with a difference of d = 2. So the standard formula gives us:

sum_odd(n) = n(2a + (n-1)d)/2
= n(2 + (n-1)2)/2
= n(1 + n - 1)
= n^2

So, the sum of the first n odd numbers is n^2.

But using standard formulae is annoying, so how about trying a little induction.

sum_odd(1) = 1

sum_odd(n+1) = sum_odd(n) + (2n + 1)
= n^2 + 2n + 1
= (n+1)^2

But back to the airline seat. Here's what I saw (I added the numbering, Lufthansa isn't kind enough to do that for you :-):



The other thing I noticed was this:



You can view the square as the sum of two simpler progressions (the sum of the first n numbers and the sum of the first n-1 numbers):

1 + 3 + 5 + 7 =
1 + 2 + 3 + 4 +
1 + 2 + 3

And given that we know from Gauss the sum of the first n numbers if n(n+1)/2 we can easily calculate:

sum_odd(n) = sum(n) + sum(n-1)
= n(n+1)/2 + (n-1)n/2
= (n^2 + n + n^2 - n)/2
= n^2

What do you do on long flights?

Wednesday, February 13, 2008

Tonight, I'm going to write myself an Aston Martin

This is the story of my attempt to 'cheat' in an on-line spot-the-ball competition to win an Aston Martin. It's also the story of my failure, but you get free source code that implements automatic detection of image alteration using copy/paste or tools like the Clone Tool in Photoshop.

First, take a look at this photo:



Notice anything strange? In fact this image has been tampered with to cover up a truck. The truck is completely hidden by foliage. Here's the original:



Wouldn't it be nice to be able to detect that automatically? It is possible. Here's an image automatically generated by my code showing what was moved. All of the red was moved to the blue (or the other way around).



I was motivated to work on this program by greed (or at least my never-ending love of having a little flutter on things). Best of the Best runs spot-the-ball competitions in airports to win very expensive cars. But they also run the same competition online. That meant I could get my hands on the actual image used... could I process it to discover where the ball had been removed? (In reality, this isn't the right way to win because the actual ball position is not governed by where it actually was, but where a judge thinks it was).

Would it be cheating if I could? Apparently not, the competition rules say I should use my skill and judgment in determining the ball position. Surely, skill covers my programming ability.

So, I went looking for tampering algorithms and eventually came across Detection of Copy-Move Forgery in Digital Images written by Jessica Fridrich at SUNY Binghamton. The paper describes an algorithm for detecting just the sort of changes I thought I was looking for.

Unfortunately, I know nothing about image processing. Fortunately, the paper is written in a very clear style and a bit of Internet research enabled me to track down the knowledge I didn't have. (Also, thanks to Jessica for sending me the original images she used to test my implementation).

In brief the algorithm does the following:
  1. Slide a 16x16 block across the entire image from left hand corner to bottom right hand corner. For each 16x16 block perform a discrete cosine transform (DCT) on it and then quantize the 16x16 block using an expanded version of the standard JPEG quantization matrix.

  2. Each quantized DCT transformed block is stored in a matrix with one row per (x,y) position in the original image (the (x,y) being the upper left hand corner of the 16x16 block being examined).

  3. The resulting matrix is lexicographically sorted and then rows that match in the matrix are identified. For each pair of matching rows (x1,y1) and (x2,y2) the shift vector (x1-x2,y1-y2) (normalized by swapping if necessary so that the first value is +ve) is computed and for each shift vector a count is kept of the number of times it is seen.

  4. Finally the shift vectors with a count > some threshold are examined, the corresponding pair of positions in the image are found and the 16x16 blocks they represent are highlighted.

Here's another picture showing a golfing image that's been touched up to remove something from the grass:






To get access to image data I used the FreeImage library and wrote a small C program that implements Jessica's algorithm. You can download the source here; it's released to you under the GNU GPL.

The program has two key parameters that affect how the image is processed: the quality factor and the threshold.

The quality factor is a number used to 'blur' the image (actually it changes the quantization): the higher the factor the more blurring and hence more 16x16 blocks are likely to seem the same to the algorithm. Increasing the quality factor will tend to increase the false matches.

The threshold is simply the number of blocks that have to appear to have been copied together. This prevents us from seeing a single 16x16 block as evidence of copying. Increasing the threshold means ever larger groups of blocks have to be identified together before they are identified as copying.

Back at Best of the Best I grabbed the image for Supercar Competition (SC-272), cut out a section that I thought the ball had to be in (just to speed up processing) and ran the algorithm. After some parameter tweaking the algorithm came up only with what look like false matches to me (along the bar where it's all one color):



And, of course, that's not where the judge thought the ball was. So, I guess I won't be driving home in the V8 Vantage, but what geek needs that when they've got a cool piece of software that detects copy/move forgery in images?

Which leaves me with one question: how are spot-the-ball images generated? Is this an algorithm problem, a problem because they use JPG (which is already transformed) for their images, or are these images generated in some other way?

Tuesday, February 12, 2008

Interface to SQLite database in 23 lines of Arc

One thing that the first release of Arc was missing was access to any sort of database, but that's easily remedied. Here are 23 lines of Arc code that provide access to a SQLite database:

(= db! 'nil)

(def db+ (name (o host "localhost") (o port 49153))
(let (i o) (connect-socket host port)
(db> o name)
(if (db< i) (list i o))))

(def sql ((i o) q)
(db> o q)
(if (db< i) (readall i 200)))

(def db- (db)
(map close db))

(def db> (o s)
(write s o)
(writec #\return o)
(writec #\newline o)
(flush-socket o))

(def db< (i)
(= db! (read i))
(iso db! 200))

The three functions you need to care about are db+ (get a connection to a named SQLite database), db- (close a connection to a database) and sql (execute a SQL query and return a list (or lists) of rows. There's also db! which contains the status of the last command (200 for OK, or 500 followed by a string explaining the error).

Here's a little Arc session creating a database, putting some data in it and then querying it. The database called test didn't exist at the start of this session:

arc> (= db (db+ "test"))
(#<input-port> #<output-port>)
arc> (sql db "create table foo (id integer primary key, text varchar(255))")
nil
arc> (sql db "select * from foo")
nil
arc> (sql db "insert into foo (text) values ('first');")
nil
arc> (sql db "select * from foo")
(("1" "first"))
arc> (sql db "insert into foo (text) values ('something else')")
nil
arc> (sql db "select * from foo")
(("1" "first") ("2" "something else"))
arc> (db- db)
nil

To make this work I had to write a TCP server that wraps SQLite (it's just a small C program that you can get here). The C program listens on a port for connections from your Arc program and handles queries.

I did have to make a small patch to Arc itself (since arc0 doesn't contain any outgoing socket code). My patch adds the ability to make a TCP connection to a remote machine and to flush an output port (add this to your ac.scm):

(xdef 'connect-socket (lambda (host port)
(let-values ([(in out) (tcp-connect host port)]) (list in out))))
(xdef 'flush-socket (lambda (s) (flush-output s)))

(Apologies if I have abused Scheme there, I'm a Scheme n00b)

All this code is released under the same license as Arc itself.

Monday, February 11, 2008

My first Arc project: a simple Wiki

The only way to learn a programming language is to write something in it. So, I decided it was time to dig into Arc and my first project is a very simple Wiki.

Here's the source (wiki.arc):

; A wiki written in Arc (arc0)
;
; Copyright (c) 2008 John Graham-Cumming
;
; (load "wiki.arc")
; (wsv)
;
; Then go to http://localhost:8080/show

(load "web.arc")
(load "util.arc")

(= pagedir* "wiki/")

(def histfiles (page)
(sort > (map [coerce _ 'int] (rem [is "current" _] (dir (pagepath page))))))

(def nexthist (page)
(let h (histfiles page)
(if h (++ (car h)) 0)))

(def pagepath (page)
(string pagedir* (page 0) "/" (page 0) (page 1) "/" page ))

(def pagefile (page (o file))
(string (pagepath page) "/" (or file "current")))

(def slurp (page (o file))
(if
(let p (pagefile page file)
(if (file-exists p) (readfile p)))))

(def upperlen (word)
(len (keep upper word)))

(def is-wikilink (word)
(if (alphas word)
(if (~is (word 0) (downcase (word 0)))
(>= (upperlen word) 2))))

(mac url-show (page)
`(string "show?p=" ,page))

(mac url-edit (page)
`(string "edit?p=" ,page))

(mac link-show (page text)
`(link ,text (url-show ,page)))

(mac link-edit (page text)
`(link ,text (url-edit ,page)))

(def wikify (word)
(if (is-wikilink word)
(if (file-exists (pagefile word))
(link-show word word)
(pr word)(link-edit word "?"))
(pr word))
(ws))

(mac spew-raw (page)
`(spew ,page [pr _ " "]))

(mac spew-wiki (page (o file))
`(spew ,page [wikify (string _)] ,file))

(def spew (page f (o file))
(let p (pagepath page)
(if (dir-exists p)
(map f (flat (map tokens (slurp page file))))
(pr "This page does not yet exist."))))

(def squash (file body)
(writefile1 body file))

(def save-page (req)
(w/$ p
(w/$ t
(squash (pagefile p) t)
(squash (string (pagepath p) "/" (nexthist p)) t))
(url-show p)))

(mac mtime (f)
`(datetime (file-mtime ,f)))

(def last-modified (page)
(let f (pagefile page)
(if (file-exists f)
(pr "Last modified: " (mtime f)))))

(mac show-page (page)
`(whitepage
(tag h1 (link-show ,page ,page))
(spew-wiki ,page)
(br 2)
(hr)
(last-modified ,page)
(br)
(link-edit ,page "[edit]")
(ws)
(link "[history]" (string "history?p=" ,page))))

(mac edit-page (page)
`(whitepage
(tag h1 (pr (string "Editing " ,page)))
(arform save-page
(textarea "t" 25 80 (spew-raw ,page))
(hidden "p" ,page)
(br)
(submit "Save"))
(link-show ,page "[cancel]")
(br 2)))

(def revision (page rev)
(tag li
(pr "Revision: " )
(link rev (string "revision?p=" page "&r=" rev))
(pr " modified " (mtime (string (pagepath page) "/" rev)))))

(mac history-page (page)
`(whitepage
(tag h1 (pr (string "History of " ,page)))
(tag ul (map [revision ,page _] (histfiles ,page)))
(hr)
(link-show ,page (string "Back to " ,page))))

(mac revision-page (page rev)
`(whitepage
(tag h1 (pr "Showing revision " ,rev " of " ,page))
(spew-wiki ,page ,rev)
(br 2)
(hr)
(last-modified ,page)
(br)
(link-show ,page (string "Back to " ,page))))

(defop show req
(w/$ p
(if p
(show-page ($ "p"))
(show-page "HomePage"))))

(defop edit req
(w/$ p
(ensure-dir (pagepath p))
(edit-page p)))

(defop history req
(history-page ($ "p")))

(defop revision req
(revision-page ($ "p") ($ "r")))

(def wsv ()
(ensure-dir pagedir*)
(asv))

It loads two helpers. The first contains common utilities that aren't really Wiki-related (util.arc):

(def alpha (c)
(or (<= #\a c #\z) (<= #\A c #\Z)))

(def alphas (str)
(is (keep alpha str) str))

(def upper (c)
(is (upcase c) c))

(def datetime ((o time (seconds)))
(let val (tostring
(system (string "date -u -r " time " \"+%Y-%m-%d %H:%M\"")))
(subseq val 0 (- (len val) 1))))


And the second contains enhancement to Arc's web/HTML handling (web.arc):

(mac hidden (name val)
`(gentag input type 'hidden name ,name value ,val))

(mac hr ()
`(gentag hr))

(mac ws ()
`(pr " "))

(mac $ (r)
`(arg req ,r))

(mac w/$ (r . body)
`(with (,r ($ (string ',r))) ,@body))

In web.arc there are a couple of bits of syntax to make accessing form/URL arguments easier: ($ "p") (which gets the value of the argument p) and (w/$ p ...) which sets a variable called p to the value of the argument p and then evaluates the rest of the expression.

All this is released under the same license as Arc. (Since I have never programmed in Arc before, and it's been almost 20 years since I stopped coding in LISP or ML, I'd appreciate constructive comments).

PPP3 (final version) in Java and C

Steve Gibson has released the final version of his PPP system: PPPv3 and so I've updated my code to be compatible.

Two versions of PPPv3 are available:

Both are released, as before, under the BSD license.

Friday, February 08, 2008

The Arc Challenge Explained

When I first looked at the Arc Challenge code my reaction, like that of many people, was WTH? Here's the code:

(defop said req
(aform [w/link (pr "you said: " (arg _ "foo"))
(pr "click here")]
(input "foo")
(submit)))

Within the context of the Arc web/app server this creates a page called /said which has a form on it:

<form method=post action="x">
<input type=hidden name="fnid" value="JtCw8ju328">
<input type=text name="foo" value="" size=10>
<input type=submit value="submit">
</form>

That form accepts a single parameter called foo and redirects to /x.

When clicking submit the user is taken to a page with a single link on it:

<a href="x?fnid=bHJpJ5G1DH">click here</a>

Following that link brings up a page showing what you typed in the first; here's the output when I typed hello in the form:

you said: hello

So, how does that work?

Firstly, the defop defines an 'operation' (which is just a page within the web server). In this case the page is called said and hence is bound to /said. There's a single argument, called req, which will contain the HTTP request when said is called by the server.

When said is called it uses aform to create an HTML form. To see this more clearly I've removed the clever part (and replaced it with X):

(aform X
(input "foo")
(submit))

So aform creates a form with an simple HTML input with the name foo and a submit button. The clever bit is what happens when the form is submitted.

By default the form submits to the page /x. This is hard-coded in the source of the Arc server. It makes use of a neat feature of the Arc server: fnids. When the form was generated a hidden field was inserted with a unique 'function id' (the fnid). This fnid is used by the /x URL to lookup a function to call when /x is activated. (Note this example uses URLs/hidden form fields for the fnid, there's no reason why it couldn't be stored in a cookie).

The function called is actually the first argument to aform which has been stored away to be called when necessary. Here's the function definition:

[w/link (pr "you said: " (arg _ "foo")) (pr "click here")]

[ ... _ ... ] is special Arc syntax for a function with a single argument called _. So the first argument to aform is a function definition, and that function is assigned a unique fnid and that fnid can be used to lookup that function and call it. The single argument consists of the HTTP request used to activate the function.

The w/link macro creates a page consisting of the words click here linked to another page. The link is, once again, done using a function and fnid. The function called when the link is clicked is:

(pr "you said: " (arg _ "foo"))

w/link's first argument is an expression that will be evaluated within the context of a function (which is entirely hidden inside the server) and used to output the page. It retrieves the foo argument from the HTTP request at the time of the initial POST.

What's neat here is the mapping between functions and fnids so that pages are just functions and the lookup of the right page to go to is handled automatically.

Monday, January 21, 2008

Proof that the sum of the squares of the first n whole numbers is n^3/3 + n^2/2 + n/6

A recent thread on Hacker News that I started with a flippant comment turned into a little mathematical puzzle.

What's the sum of the square of the first n whole numbers?

It's well known that the sum of the first n whole numbers is n(n+1)/2. But what's the value of sum(i=1..n) n^2? (I'll call this number S for the remainder of this post).

It turns out that it's easy to prove that S = n^3/3 + n^2/2 + n/6 by induction. But how is the formula derived? To help with reasoning here's a little picture of the first 4 squares stacked up one on top of the other:



If we fill in the blank squares to make a rectangle we have the basis of a derivation of the formula:



Looking at the formerly blank squares (that I've numbered to assist with the thinking) we can see that the columns have 1 then 1+2 then 1+2+3 and finally 1+2+3+4 squares. Thus the columns are sums of consecutive whole numbers (for which we already have the n(n+1)/2 formula.

Now the total rectangle is n+1 squares wide (in this case 5) and its height is the final sum of whole numbers up to n or n(n+1)/2 (in the image it's 4 x 5 / 2 = 10. So the total number of squares in the rectangle is (n+1)n(n+1)/2 (in the example that's 5 x 10 = 50).

So we can calculate S as the total rectangle minus the formerly blank squares which gives:

S = (n+1)n(n+1)/2 - sum(i=1..n)sum(j=1..i) j
= (n(n+1)^2)/2 - sum(i=1..n) i(i+1)/2
2S = n(n+1)^2 - sum(i=1..n) i(i+1)
= n(n+1)^2 - sum(i=1..n) i^2 - sum(i=1..n) i
= n(n+1)^2 - S - n(n+1)/2
3S = n(n+1)^2 - n(n+1)/2
= n(n+1)( n+1 - 1/2 )
= n(n+1)(n+1/2)
= (n^2+n)(n+1/2)
= n^3 + n^2/2 + n^2 + n/2
= n^3 + 3n^2/2 + n/2
S = n^3/3 + n^2/2 + n/6

Thursday, January 17, 2008

Another use of POPFile: detecting weakly encrypted email

Almost all users use POPFile as a spam filter, most of them also use the fact that POPFile can sort in arbitrary categories of mail. However, some people have pushed POPFile even further... Martin Overton (of IBM) has used POPFile to discover email borne malware, even finding that POPFile could automatically detect mutations. Now, some researchers in Japan have used POPFile to detect weak encryption of email with 80% accuracy.

The researchers were building a system to detect improper sending of personal information by email. Their system first checked for the use of strong encryption (if the mail is strongly encrypted then there's no need to worry about eavesdropping), the system also checked for things like telephone numbers, email addresses and other personal data in non-encrypted mail.

But they also wanted a system to detect poor encryption (such as ROT-13), and for that they turned to POPFile. After a mere 30 emails had been trained in POPFile it was able to distinguish plain text messages from those encrypted with weak ciphers.

Some details in their paper.