Friday, April 27, 2012

tacoli: a simple logging format

A post on Hacker News entitled Log Everything As JSON. Make Your Life Easier reminded me of my private logging strategy which has the following properties:

1. Easy to parse and analyze with Unix command-line tools such as grep, cut, sort, uniq, and wc

2. Easy to parse and analyze in code using Perl, Ruby, or Go

3. Compact

4. Easily expandable and lacking the ambiguity of simple delimited log formats

I call it tacoli (which stands for Tabs, Colons and Lines).  Here are the tacoli logging rules: Each log entry is a single line that starts with the date/time; the second entry on the line is a string called the 'generator' which indicates where the log line came from (such as the program or module); all the other entries have the format "key: value"; and entries are tab-delimited and no tabs are allowed in keys, values or the generator name.

That's it.  Here's an example log line from Apache in this format:

22/Apr/2012:06:29:07 +0000      apache  ip: method: GET     uri: /example.html code:301        size:305        referer:        agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.162

Note that it's easy to make Apache output this format just by using tabs and adding the appropriate key: to each field in the LogFormat.  No special logger module required.  In fact, anything that can 'printf' a string can create tacoli lines trivially.

It's trivial to parse in code, all you need is 'split' to break on the tabs, and then split again to break the key name from the value.  No specialized JSON (or other parser) required.

It's trivial to extend without breaking any tools.  Just add a new field (anywhere on the line) with a new key.

It's simple to work with using Unix tools.  Since the format is 'one log entry per line' it works well with wc -l to count instances of anything and it interfaces with all the other Unix tools that expect to work with lines (and even in code the line oriented nature is helpful since getting a complete entry is a single line read).

If you want to extract a single field from each line of the log file then it's easy to do with grep.  Here's an example that extracts all the lines that have an ip entry and just extracts that

grep -Po "\tip: [^\t]+" access.log

The key name can be trivially removed using cut

grep -Po "\tip: [^\t]+" access.log | cut -d: -f2-

and the output can be fed into the other Unix tools.  Also, if you know that your log file format hasn't changed you can still use the positional information to simplify parsing and fall back to cut.
It isn't quite as compact as a log file format that only uses position to indicate meaning, but compression largely overcomes that problem and key names can be chosen to be short and unique.

If you enjoyed this blog post, you might enjoy my travel book for people interested in science and technology: The Geek Atlas. Signed copies of The Geek Atlas are available.


<$BlogCommentDateTime$> <$BlogCommentDeleteIcon$>

Post a Comment

Links to this post:

<$BlogBacklinkControl$> <$BlogBacklinkTitle$> <$BlogBacklinkDeleteIcon$>
Create a Link

<< Home