Logging

Being unable to find any useful "best practices" or guidelines for logging, here I've tried to put something like that together.

Technologies

Syslog and journald are the primary ways to write logs on unixy systems nowadays. Syslog is more widely used and standardized, but it only has structured data support in RFC 5424, which apparently isn't going to be implemented in glibc, and generally poorly supported as of 2022. Using custom databases, manually managing logfiles, or reinventing the wheel, especially if it's done instead of regular logging, and not in addition to it, leads to maintenance hell for both programming and administration.

There are fancy logging libraries, though I prefer simple ones: hsyslog combined with formatting, abstracted into a function that also passes some system-specific environment variables into structured data fields:

logM :: MonadIO m => Priority -> Format (m ()) a -> a

Similar to the syslog function in C, but safe. Used to use libsystemd-journal, but journald was more buggy than common syslog daemons, and then libsystemd-journal did not support GHC (and base) from Debian 12 repositories (though then it did, after reporting the issue), so I gave up on structured fields and finally switched to hsyslog, as I considered doing for a while at that point.

Log contents

It's tricky to decide what to write apart from error messages: audit trails are useful, and there is event sourcing along those lines, but it would be impractical to log a message before and/or after each action that could fail. One way is to let them evolve: if a debug or info message was not helpful for a while, it should be removed (or limited to debug mode if it's present).

There are log scopes, stack traces, and memory dumps to write more information (particularly context) just on error; they are more complicated, possibly less reliable, and sometimes less useful than simple audit trails, though may be useful in addition to those.

It's also pretty hard to make the logs readable by users/administrators: there are libraries that produce error messages for programmers familiar with those on one side, and users who can't read even carefully prepared messages on the other.

Structure and analysis

To analyze logs reliably (i.e., without relying on NLP too much), they should be structured. There are standard fields (timestamp, priority, process, etc), possibly custom fields, and a non-structured textual message; while the fields are useful, most of the program-specific information is still in text. Some of that text comes from libraries or external sources, and can't be structured with reasonable effort, but we still can try to keep a structure for what we compose.

A single software structure — unless it describes a whole language — would be language-specific and project-specific, inflexible. Message identifiers would be cumbersome to maintain and not syslog-friendly. It may be useful to store variable parameters in custom fields, though it would require to define a large set of standard fields to make it generic, would be tricky to write from a shell script (i.e., without special support), and neither it is syslog-friendly unless RFC 5424 messages (with structured data) can be used.

A language that is both computer-readable and human-readable, and capable of knowledge representation, would be handy here, as well as for various other purposes. But that's hard, and writing textual logs in a language other than English would be awkward, potentially leading to a system only maintainable by its author.

Though a rather good example of dealing with that is postfix: it has nice human-readable textual messages, but they are specified, and can be parsed reliably.

Quotation

One easy step towards structured logging is to enquote variables, though it may slightly harm legibility. For instance:

Received `42` bytes from `1.2.3.4:56`
User `foo` has logged in

In Haskell with the formatting library, I'm using custom formatters for that (they also escape those quotes with backslashes), though it can be done even in a shell script rather easily. If there are multiline variables, they could be moved out of the first line in order to simplify analysis, and all message types could be extracted with a basic sed script then:

#!/bin/sed -rf

# Remove date (journalctl -o short-iso), server name, PID
s/^[^ ]* [^ ]* ([^[]*)\[[0-9]*\]:/\1/

# Skip if it's not the first (date) line
T skip

# Hide the variables
s/([^\\]|^)`([^`\\]|\\.)*`/\1_/g

# End
p
: skip
d

Or s/^[^ ]* [^ ]* [^[]*\[[0-9]*\]: // to remove program names as well. And a shell script to retrieve logs may look like this:

#!/bin/sh
journalctl -a -o short-iso --since $1 \
           _UID=$(id -u user1) + _UID=$(id -u user2)

So that get-logs.sh -6h | hide-variables.sed | sort | uniq would extract a nice list of message templates, e.g.:

Received _ bytes from _
User _ has logged in

And uniq -c | sort -gbr would summarize those logs, e.g.:

12 Received _ bytes from _
 3 User _ has logged in

Now we can translate logs into a different language or format, annotate, filter with grep or logcheck, make it more accessible for those who normally don't read logs, or even perform some statistical analysis using those parameters, and/or message frequences and times.

Most of those operations would still be project-specific (though not affecting programs apart from log analyzers), manually built on top of the extracted message templates, and requiring maintenance/updates whenever the messages get updated.

Sentence structure

In order to assist filtering, the textual messages should be unified in their structure. Using the main verb in its base form would help to grep easily, suggesting the "(program) managed/failed/planning to <verb> <tail>" form. That's also helpful to ensure that the messages are more or less meaningful to others: compare "time: <time>" with "managed to receive time: <time>". Yet those "managed to" prefixes are rather verbose, so a better option might be to use "[+]", "[-]", and "[ ]" to indicate success, failure, and intention. Success and failure could be inferred from priorities, but it's tricky to separate successful actions from plans using standard priorities alone, though one could pretend that planning is an action, or reserve "debug" for plans and "info" for actions. Though that might be confusing.

To illustrate, the above example may be rewritten as:

[+] receive _ bytes from _
[+] authenticate user _