[Lognorm] Libnormalize issue

Mon Nov 7 18:31:05 CET 2011

> -----Original Message-----
> From: lognorm-bounces at lists.adiscon.com [mailto:lognorm-
> bounces at lists.adiscon.com] On Behalf Of Rainer Gerhards
> Sent: Thursday, November 03, 2011 6:25 PM
> To: lognorm
> Subject: Re: [Lognorm] Libnormalize issue
> 
> > -----Original Message-----
> > From: lognorm-bounces at lists.adiscon.com [mailto:lognorm-
> > bounces at lists.adiscon.com] On Behalf Of david at lang.hm
> > Sent: Thursday, November 03, 2011 6:20 PM
> > To: lognorm
> > Subject: Re: [Lognorm] Libnormalize issue
> >
> > On Thu, 3 Nov 2011, Rainer Gerhards wrote:
> >
> > > The problem here is that liblognorm primarily aims at
> > > semi-structured data, that is text data without an easily parsable
> > > structure. Iptables actually provides structured data and liblognorm
> > > is not great at processing that kind of data. It becomes even worse
> > > > if there are any permutations in field order.
> > > > In that case, you need exponentially many rules in the worst case.
> > >
> > > I was thinking about adding a special name/value parsing capability
> > > to support that type of data. But then it is vitally important that
> > > the data has a header that clearly identifies the message, otherwise
> > > normalization will result in a big mess of garbage. Because the
> > > chance that such a very generic parser mis-interprets things is very
> > > > high, especially in the iptables case as a single word (like "df"
> > > > above) is a valid (binary) "name/value-pair", so it is hard to
> > > > detect during parsing if that really is iptables or not. Even if we
> > > > assume it is:
> > > > the parser probably consumes a lot of data before it detects a
> > > > mismatch. So we need to backtrack over a lot of data. In essence,
> > > > one such rule could probably double the processing time of all
> > > > rules. And if you have 10 such rules, you could end up with
> > > > 1024-times slower rule parsing in the worst case (that's the
> > > > problem that bugs the usual regex approach and severely limits
> > > > extraction speed).
> >
> > how about adding a couple of new tag types
> >
> > 1. name=value pair
> >
> > 2. one or more name=value pairs
> >
> > > then you could make a rule that would match the fixed part of a log
> > > and then let the log specify the rest of it
> 
> That's (especially 2) what I am thinking about.
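
For reference, the 1024 figure in the quoted text is just repeated doubling.
The sketch below assumes, as the quoted text does, that each generic rule
doubles the worst-case parsing cost; the function name is made up for
illustration and is not part of liblognorm.

```python
def worst_case_slowdown(generic_rules, factor=2):
    """Multiplicative worst-case slowdown, assuming every generic rule
    multiplies the parsing cost by `factor` (2 = "doubles", an assumption)."""
    return factor ** generic_rules


for n in (1, 2, 10):
    # One column per rule count: number of rules, resulting slowdown factor.
    print(n, worst_case_slowdown(n))
```

With 10 such rules this gives 2**10 = 1024.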

I have added some experimental code to liblognorm to handle this case. The
code is currently available only via git.

This rule:
rule=:%date:date-rfc3164% %host:word% %tag:char-to:\x3a%: %dummy:iptables%

used with this message:
Apr  8 13:58:26 host.example.net iptables: IN=ppp0 OUT= MAC=
SRC=121.11.80.101 DST=my_ext_ip LEN=40 TOS=0x00 PREC=0x00 TTL=108 ID=256 DF
PROTO=TCP SPT=6000 DPT=1433 WINDOW=16384 RES=0x00 SYN URGP=0

leads to this output (JSON-formatted):
'{"IN": "ppp0", "OUT": "", "MAC": "", "SRC": "121.11.80.101", "DST":
"my_ext_ip", "LEN": "40", "TOS": "0x00", "PREC": "0x00", "TTL": "108", "ID":
"256", "DF": "[*PRESENT*]", "PROTO": "TCP", "SPT": "6000", "DPT": "1433",
"WINDOW": "16384", "RES": "0x00", "SYN": "[*PRESENT*]", "URGP": "0", "tag":
"iptables", "host": "host.example.net", "date": "Apr  8 13:58:26"}'

Note that things like DF show up with value "[*PRESENT*]".
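
The name/value handling shown above can be sketched roughly as follows. This
is not the liblognorm code (that is C and lives in git); it is a minimal
Python illustration under the assumption that fields are separated by
whitespace and that a bare word is a flag. The function name and the
ValueError on a token such as "=foo" are hypothetical choices for this
sketch; only the "[*PRESENT*]" marker is taken from the output above.

```python
PRESENT = "[*PRESENT*]"  # marker value as it appears in the output above


def parse_name_value(text):
    """Turn whitespace-separated NAME=VALUE tokens into a dict.

    NAME=VALUE -> {"NAME": "VALUE"}; an empty value ("OUT=") is kept as "".
    A bare word such as DF or SYN is stored with the PRESENT marker.
    """
    fields = {}
    for token in text.split():
        name, sep, value = token.partition("=")
        if not name:
            # Hypothetical strictness: reject "=foo" rather than guess.
            raise ValueError(f"malformed token: {token!r}")
        fields[name] = value if sep else PRESENT
    return fields


msg = ("IN=ppp0 OUT= MAC= SRC=121.11.80.101 DST=my_ext_ip LEN=40 TOS=0x00 "
       "PREC=0x00 TTL=108 ID=256 DF PROTO=TCP SPT=6000 DPT=1433 "
       "WINDOW=16384 RES=0x00 SYN URGP=0")
fields = parse_name_value(msg)
```

Here fields["DF"] and fields["SYN"] hold the marker, while fields["OUT"] and
fields["MAC"] are empty strings. Note how readily such a parser accepts
arbitrary text: any plain word is a valid "flag".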

The code currently does not check the iptables part for malformed input. Most
probably the code will segfault if something is malformed. I have not been
able to conduct broader tests, especially as part of a larger rule base. I'd
deeply appreciate it if someone (Champ?) could try out the new code in a
real-world setting. I'd expect that it would considerably reduce the effort
required to handle iptables logs inside a semi-structured log stream. Just
make sure that you assign a unique tag as suggested by David for iptables,
else recognition will be a mess.

Feedback deeply appreciated.
Rainer