Skip to main content

tacoli: a simple logging format

A post on Hacker News entitled Log Everything As JSON. Make Your Life Easier reminded me of my private logging strategy which has the following properties:

1. Easy to parse and analyze with Unix command-line tools such as grep, cut, sort, uniq, and wc

2. Easy to parse and analyze in code using Perl, Ruby, or Go

3. Compact

4. Easily expandable and lacking the ambiguity of simple delimited log formats

I call it tacoli (which stands for Tabs, Colons and Lines).  Here are the tacoli logging rules: Each log entry is a single line that starts with the date/time; the second entry on the line is a string called the 'generator' which indicates where the log line came from (such as the program or module); all the other entries have the format "key: value"; and entries are tab-delimited and no tabs are allowed in keys, values or the generator name.

That's it.  Here's an example log line from Apache in this format:

22/Apr/2012:06:29:07 +0000      apache  ip: 18.12.25.55 method: GET     uri: /example.html code:301        size:305        referer: http://blog.jgc.org/2009/08/geek-weekend-day-1-bletchley-park.html        agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.162

Note that it's easy to make Apache output this format just by using tabs and adding the appropriate key: to each field in the LogFormat.  No special logger module required.  In fact, anything that can 'printf' a string can create tacoli lines trivially.

It's trivial to parse in code, all you need is 'split' to break on the tabs, and then split again to break the key name from the value.  No specialized JSON (or other parser) required.

It's trivial to extend without breaking any tools.  Just add a new field (anywhere on the line) with a new key.

It's simple to work with using Unix tools.  Since the format is 'one log entry per line' it works well with wc -l to count instances of anything and it interfaces with all the other Unix tools that expect to work with lines (and even in code the line oriented nature is helpful since getting a complete entry is a single line read).

If you want to extract a single field from each line of the log file then it's easy to do with grep.  Here's an example that extracts all the lines that have an ip entry and just extracts that

grep -Po "\tip: [^\t]+" access.log

The key name can be trivially removed using cut

grep -Po "\tip: [^\t]+" access.log | cut -d: -f2-

and the output can be fed into the other Unix tools.  Also, if you know that your log file format hasn't changed you can still use the positional information to simplify parsing and fall back to cut.
It isn't quite as compact as a log file format that only uses position to indicate meaning, but compression largely overcomes that problem and key names can be chosen to be short and unique.

Comments

zdw said…
The example date format given here is nonideal.

You should be using ISO8601 for date/time stamps, unless you require more accuracy, in which case something like TAI64 may be more appropriate.
Anonymous said…
Seems like a good idea at first blush, but this will cause you no end of heartache as soon as your log data contains embedded colons. Those will pop up much more often than you might expect: Windows pathnames, standard format timestamps, even many user-agent strings contain colons (I'm looking at you, Firefox!).

JSON or other marginally structured formats protect you against that, and that's well worth the cost.
Unknown said…
The leading space from the IP address can be removed by changing the cut(1) to

cut -d' ' -f2-

It will leave further leading spaces but then they're significant since they were present in the outputted field.

printf 'a\tip: 1.2.3.4 5.6.7.8\tb\n' |
grep -Po "\tip: [^\t]+" |
cut -d' ' -f2-

sed 's/[^ ]* //' would also achieve the same and for some reason I tend to prefer it over cut(1); less counting of fields and more thinking about the change?
Nils said…
Ericm: That isn't the problem. You split on tabs, then on _the first_ colon. Unless you allow colons in the key, that is. Embedded tabs though...
curt504 said…
I tried living in a JSON encoded log file world. I literally got headaches! No gain, I felt pain. This is a good idea. Pair it with Splunk or Kibana log parsing DB / search tool and you'd have a good comprimise between still human readable and machine parsible.
curt504 said…
This is a good compromise between still being human readible and better machine parsibility. Combine with Splunk or Kibana log DB / search engines and you have a good combination.
Rob said…
You could also use a positive lookbehind assertion and get rid of cut all together:

grep -Po '(?<=ip: )[^\t]+'

Popular posts from this blog

Your last name contains invalid characters

My last name is "Graham-Cumming". But here's a typical form response when I enter it:


Does the web site have any idea how rude it is to claim that my last name contains invalid characters? Clearly not. What they actually meant is: our web site will not accept that hyphen in your last name. But do they say that? No, of course not. They decide to shove in my face the claim that there's something wrong with my name.

There's nothing wrong with my name, just as there's nothing wrong with someone whose first name is Jean-Marie, or someone whose last name is O'Reilly.

What is wrong is that way this is being handled. If the system can't cope with non-letters and spaces it needs to say that. How about the following error message:

Our system is unable to process last names that contain non-letters, please replace them with spaces.

Don't blame me for having a last name that your system doesn't like, whose fault is that? Saying "Your last name …

Importing an existing SSL key/certificate pair into a Java keystore

I'm writing this blog post in case anyone else has to Google that. In Java 6 keytool has been improved so that it now becomes possible to import an existing key and certificate (say one you generated outside of the Java world) into a keystore.

You need: Java 6 and openssl.

1. Suppose you have a certificate and key in PEM format. The key is named host.key and the certificate host.crt.

2. The first step is to convert them into a single PKCS12 file using the command: openssl pkcs12 -export -in host.crt -inkey host.key > host.p12. You will be asked for various passwords (the password to access the key (if set) and then the password for the PKCS12 file being created).

3. Then import the PKCS12 file into a keystore using the command: keytool -importkeystore -srckeystore host.p12 -destkeystore host.jks -srcstoretype pkcs12. You now have a keystore named host.jks containing the certificate/key you need.

For the sake of completeness here's the output of a full session I performe…

More fun with toys: the Ikea LILLABO Train Set

As further proof of my unsuitability to be a child minder (see previous post) I found myself playing with an Ikea LILLABO 20-piece basic set train.


The train set has 16 pieces of track (12 curves, two straight pieces and a two part bridge) and 4 pieces of train. What I wondered was... how many possible looping train tracks can be made using all 16 pieces?

The answer is... 9. Here's a picture of the 9 different layouts.


The picture was generated using a little program written in Processing. The bridge is red, the straight pieces are green and the curves are blue or magenta depending on whether they are oriented clockwise or anticlockwise. The curved pieces can be oriented in either way.

To generate those layouts I wrote a small program which runs through all the possible layouts and determines which form a loop. The program eliminates duplicate layouts (such as those that are mirror images of each other).

It outputs a list of instructions for building loops. These instructions con…