[Lognorm] Libnormalize issue

david at lang.hm
Thu Nov 3 18:19:50 CET 2011


On Thu, 3 Nov 2011, Rainer Gerhards wrote:

> The problem here is that liblognorm primarily aims at semi-structured data,
> that is text data without an easily parsable structure. Iptables actually
> provides structured data and liblognorm is not great at processing that kind
> of data. It becomes even worse if there are any permutations in field order.
> In that case, you need exponentially many rules in the worst case.
>
> I was thinking about adding a special name/value parsing capability to
> support that type of data. But then it is vitally important that the data has
> a header that clearly identifies the message; otherwise normalization will
> result in a big mess of garbage. The chance that such a very generic
> parser misinterprets things is very high, especially in the iptables case, as
> a single word (like "df" above) is a valid (binary) "name/value pair", so it
> is hard to tell during parsing whether the message really is from iptables.
> Even if we assume it is, the parser probably consumes a lot of data before it
> detects a mismatch, so we need to backtrack over a lot of data. In essence, one
> such rule could probably double the processing time of all rules. And if you
> have 10 such rules, you could end up with rule parsing that is 1024 times
> slower in the worst case (that's the problem that bugs the usual regex
> approach and severely limits extraction speed).
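The two cost arguments in the quoted text can be written out as a rough worked bound. This assumes, as the quoted estimate does, that each generic name/value rule can at most double the processing time through backtracking; $T_0$ (the time without such rules) and $R(n)$ (the number of fixed-order rules for $n$ permutable fields) are illustrative symbols, not figures from the thread.

```latex
% Rules needed to spell out every order of n permutable fields:
R(n) = n! \;\ge\; 2^{\,n-1}, \qquad R(5) = 5! = 120
% Worst-case time with k generic name/value rules, each at most doubling it:
T(k) \le 2^{k}\, T_0, \qquad T(10) \le 2^{10}\, T_0 = 1024\, T_0
```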

how about adding a couple of new tag types:

1. a single name=value pair

2. one or more name=value pairs

then you could write a rule that matches the fixed part of a log message and
then lets the name=value pairs in the message itself specify the rest of it
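A minimal sketch of the two proposed tag types, written in Python purely for illustration (liblognorm itself is a C library, and none of these names exist in it). `parse_nv_pair` stands in for the single-pair tag, `parse_nv_pairs` for the one-or-more tag, and `match_rule` matches a literal fixed part before handing the remainder to the pair parser. The sample line and its `IPT-DROP: ` prefix are made up for the example.

```python
from __future__ import annotations

import re
from typing import Optional

# Hypothetical tag type 1: a single name=value pair such as "SRC=192.0.2.10".
# Names are word characters; the value runs to the next space and may be
# empty (iptables-style logs contain fields like "OUT=").
NV_PAIR = re.compile(r"(\w+)=(\S*)")


def parse_nv_pair(token: str) -> Optional[tuple[str, str]]:
    """Return (name, value) if the whole token is one name=value pair, else None."""
    m = NV_PAIR.fullmatch(token)
    return (m.group(1), m.group(2)) if m else None


def parse_nv_pairs(text: str) -> Optional[dict[str, Optional[str]]]:
    """Hypothetical tag type 2: one or more whitespace-separated name=value pairs.

    Bare words (like the "DF" flag) are kept with the value None, which is
    the ambiguity discussed in the thread: almost any word is accepted.
    Returns None if the text contains no name=value pair at all.
    """
    fields: dict[str, Optional[str]] = {}
    saw_pair = False
    for token in text.split():
        pair = parse_nv_pair(token)
        if pair is not None:
            fields[pair[0]] = pair[1]
            saw_pair = True
        else:
            fields[token] = None  # bare flag word
    return fields if saw_pair else None


def match_rule(fixed_part: str, line: str) -> Optional[dict[str, Optional[str]]]:
    """Match the fixed part literally, then let the pair tag consume the rest."""
    if not line.startswith(fixed_part):
        return None
    return parse_nv_pairs(line[len(fixed_part):])


# Made-up, iptables-style sample; "IPT-DROP: " is a hypothetical prefix.
line = ("kernel: IPT-DROP: IN=eth0 OUT= SRC=192.0.2.10 DST=192.0.2.1 "
        "LEN=60 TTL=64 DF PROTO=TCP SPT=51000 DPT=22")
fields = match_rule("kernel: IPT-DROP: ", line)
print(fields["SRC"], fields["DPT"], fields["DF"])  # 192.0.2.10 22 None
```

The bare word `DF` is accepted as a flag, which illustrates the objection in the quoted text: a generic pair parser takes nearly any whitespace-separated input, so the literal fixed part is what keeps such a rule from matching unrelated messages.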

>
> So iptables actually presents a pretty hard problem. I'd still like to tackle
> it, but unfortunately I am short on time at the moment. In any case,
> normalization is still on my agenda, so it is probably one of the first things
> to look at when there is time left (CEE has a new draft standard out and I'd
> like to make the necessary adaptations). Perhaps a solution is to provide this
> "iptables" normalizer, maybe even as a different API call, with the
> controlling application first selecting the normalizer to use based on other
> information.

one key thing is that iptables lets you specify a tag for the log
messages. by default it is just 'kernel', but if you want to
identify them for parsing, you really should set this (and then you can
unambiguously match them)
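A minimal Python sketch of that arrangement, combining the quoted suggestion (select the normalizer up front) with keying on a distinctive tag. Everything here is hypothetical: the tag `fw-drop`, the `DISPATCH` table, and both normalizer functions are illustrative names, and `normalize_default` merely stands in for ordinary rule-based normalization. The point is that the permissive name/value splitting only runs on messages already identified by their tag.

```python
from __future__ import annotations

from typing import Callable, Dict, Optional

Fields = Dict[str, Optional[str]]
Normalizer = Callable[[str], Optional[Fields]]


def normalize_firewall(msg: str) -> Optional[Fields]:
    """Hypothetical specialized normalizer: split every token on the first "=".

    This generic splitting is only safe because the caller has already
    identified the message by its tag; bare words such as "DF" become flags
    with the value None.
    """
    fields: Fields = {}
    for token in msg.split():
        name, sep, value = token.partition("=")
        fields[name] = value if sep else None
    return fields


def normalize_default(msg: str) -> Optional[Fields]:
    """Placeholder for the ordinary rule-based normalizer (not implemented here)."""
    return None


# Hypothetical routing table: syslog tag -> specialized normalizer.
# Messages left with the default "kernel" tag fall through to the default path.
DISPATCH: Dict[str, Normalizer] = {
    "fw-drop": normalize_firewall,
}


def normalize(tag: str, msg: str) -> Optional[Fields]:
    """Select the normalizer from the tag first, then apply it to the message."""
    return DISPATCH.get(tag, normalize_default)(msg)


print(normalize("fw-drop", "IN=eth0 OUT= SRC=192.0.2.10 DF PROTO=TCP DPT=22"))
# {'IN': 'eth0', 'OUT': '', 'SRC': '192.0.2.10', 'DF': None, 'PROTO': 'TCP', 'DPT': '22'}
```

Identification happens once, cheaply, on the tag, so the permissive parser never has to backtrack across messages that were never firewall logs in the first place.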

David Lang

