Friday, April 27, 2012

tacoli: a simple logging format

A post on Hacker News entitled "Log Everything As JSON. Make Your Life Easier" reminded me of my private logging strategy, which has the following properties:

1. Easy to parse and analyze with Unix command-line tools such as grep, cut, sort, uniq, and wc

2. Easy to parse and analyze in code using Perl, Ruby, or Go

3. Compact

4. Easily expandable and lacking the ambiguity of simple delimited log formats

I call it tacoli (which stands for Tabs, Colons and Lines).  Here are the tacoli logging rules:

1. Each log entry is a single line that starts with the date/time.

2. The second entry on the line is a string called the 'generator', which indicates where the log line came from (such as the program or module).

3. All the other entries have the format "key: value".

4. Entries are tab-delimited, and no tabs are allowed in keys, values, or the generator name.

That's it.  Here's an example log line from Apache in this format:

22/Apr/2012:06:29:07 +0000      apache  ip: method: GET     uri: /example.html code: 301        size: 305        referer:        agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.162

Note that it's easy to make Apache output this format just by using tabs and adding the appropriate key: to each field in the LogFormat.  No special logger module required.  In fact, anything that can 'printf' a string can create tacoli lines trivially.
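
To make that concrete, here's a minimal sketch in Go (one of the languages mentioned below for parsing); `tacoliLine` is an invented helper name, since the format itself needs nothing more than string joining:

```go
package main

import (
	"fmt"
	"strings"
)

// tacoliLine joins a timestamp, a generator name, and any number of
// "key: value" fields with tabs -- the whole format in one call.
// (tacoliLine is an invented helper name for illustration.)
func tacoliLine(ts, generator string, fields ...string) string {
	return ts + "\t" + generator + "\t" + strings.Join(fields, "\t")
}

func main() {
	fmt.Println(tacoliLine("22/Apr/2012:06:29:07 +0000", "myapp",
		"user: alice", "code: 301", "size: 305"))
}
```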

It's trivial to parse in code: all you need is 'split' to break on the tabs, and then a second split to separate the key name from the value.  No specialized JSON (or other) parser required.
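
A minimal Go sketch of that two-split parse (`parseTacoli` is an invented name, and it assumes a well-formed line with at least the date/time and generator entries):

```go
package main

import (
	"fmt"
	"strings"
)

// parseTacoli splits one tacoli line into its timestamp, generator,
// and a key -> value map. Each field is split on the *first* colon
// only, so colons inside values (timestamps, Windows paths, user
// agents) are preserved.
func parseTacoli(line string) (ts, gen string, fields map[string]string) {
	parts := strings.Split(line, "\t")
	ts, gen = parts[0], parts[1]
	fields = make(map[string]string)
	for _, p := range parts[2:] {
		kv := strings.SplitN(p, ":", 2)
		val := ""
		if len(kv) == 2 {
			// Drop the single space after the colon, if present.
			val = strings.TrimPrefix(kv[1], " ")
		}
		fields[kv[0]] = val
	}
	return
}

func main() {
	ts, gen, f := parseTacoli("22/Apr/2012:06:29:07 +0000\tapache\turi: /example.html\tcode: 301\treferer:")
	fmt.Println(ts, gen, f["uri"], f["code"])
}
```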

It's trivial to extend without breaking any tools.  Just add a new field (anywhere on the line) with a new key.

It's simple to work with using Unix tools.  Since the format is 'one log entry per line', it works well with wc -l to count instances of anything, and it interfaces with all the other Unix tools that expect to work with lines.  (Even in code the line-oriented nature is helpful, since getting a complete entry is a single line read.)

If you want to extract a single field from each line of the log file, it's easy to do with grep.  Here's an example that finds all the lines that have an ip entry and extracts just that field:

grep -Po "\tip: [^\t]+" access.log

The key name can be trivially removed using cut:

grep -Po "\tip: [^\t]+" access.log | cut -d: -f2-

and the output can be fed into the other Unix tools.  Also, if you know that your log file format hasn't changed, you can still use the positional information to simplify parsing and fall back to cut.

It isn't quite as compact as a log file format that only uses position to indicate meaning, but compression largely overcomes that problem, and key names can be chosen to be short and unique.


zdw said...

The example date format given here is non-ideal.

You should be using ISO8601 for date/time stamps, unless you require more accuracy, in which case something like TAI64 may be more appropriate.

Anonymous said...

Seems like a good idea at first blush, but this will cause you no end of heartache as soon as your log data contains embedded colons. Those will pop up much more often than you might expect: Windows pathnames, standard format timestamps, even many user-agent strings contain colons (I'm looking at you, Firefox!).

JSON or other marginally structured formats protect you against that, and that's well worth the cost.

Unknown said...

The leading space from the IP address can be removed by changing the cut(1) to

cut -d' ' -f2-

It will leave further leading spaces, but then they're significant, since they were present in the output field.

printf 'a\tip:\tb\n' |
grep -Po "\tip: [^\t]+" |
cut -d' ' -f2-

sed 's/[^ ]* //' would also achieve the same and for some reason I tend to prefer it over cut(1); less counting of fields and more thinking about the change?

Nils said...

Ericm: That isn't the problem. You split on tabs, then on _the first_ colon. Unless you allow colons in the key, that is. Embedded tabs though...

curt504 said...

I tried living in a JSON-encoded log file world. I literally got headaches! No gain, I felt pain. This is a good idea. Pair it with a Splunk or Kibana log parsing DB / search tool and you'd have a good compromise between still human readable and machine parsable.

Rob said...

You could also use a positive lookbehind assertion and get rid of cut altogether:

grep -Po '(?<=ip: )[^\t]+'