[Lognorm] Libnormalize issue
James Lay
jlay at slave-tothe-box.net
Mon Nov 7 18:40:55 CET 2011
>
>
>> -----Original Message-----
>> From: lognorm-bounces at lists.adiscon.com
>> [mailto:lognorm-bounces at lists.adiscon.com] On Behalf Of Rainer Gerhards
>> Sent: Thursday, November 03, 2011 6:25 PM
>> To: lognorm
>> Subject: Re: [Lognorm] Libnormalize issue
>>
>> > -----Original Message-----
>> > From: lognorm-bounces at lists.adiscon.com
>> > [mailto:lognorm-bounces at lists.adiscon.com] On Behalf Of david at lang.hm
>> > Sent: Thursday, November 03, 2011 6:20 PM
>> > To: lognorm
>> > Subject: Re: [Lognorm] Libnormalize issue
>> >
>> > On Thu, 3 Nov 2011, Rainer Gerhards wrote:
>> >
>> > > The problem here is that liblognorm primarily aims at
>> > > semi-structured data, that is text data without an easily parsable
>> > > structure. Iptables actually provides structured data and liblognorm
>> > > is not great at processing that kind of data. It becomes even worse
>> > > if there are any permutations in field order.
>> > > In that case, you need exponentially many rules in the worst case.
>> > >
>> > > I was thinking about adding a special name/value parsing capability
>> > > to support that type of data. But then it is vitally important that
>> > > the data has a header that clearly identifies the message, otherwise
>> > > normalization will result in a big mess of garbage. The chance that
>> > > such a very generic parser misinterprets things is very high,
>> > > especially in the iptables case, as a single word (like "df" above)
>> > > is a valid (binary) "name/value-pair", so it is hard to detect
>> > > during parsing whether that really is iptables or not. Even if we
>> > > assume it is: the parser probably consumes a lot of data before it
>> > > detects a mismatch. So we need to backtrack over a lot of data. In
>> > > essence, one such rule could probably double the processing time of
>> > > all rules. And if you have 10 such rules, you could end up with
>> > > 1024-times slower rule parsing in the worst case (that's the problem
>> > > that plagues the usual regex approach and severely limits extraction
>> > > speed).
>> >
>> > how about adding a couple of new tag types
>> >
>> > 1. name=value pair
>> >
>> > 2. one or more name=value pairs
>> >
>> > then you could make a rule that would match the fixed part of a log
>> > and then let the log specify the rest of it
>>
>> That's (especially 2) what I am thinking about.
>
> I have added some experimental code to liblognorm to handle this case. The
> code is currently available via git only.
>
> This rule:
> rule=:%date:date-rfc3164% %host:word% %tag:char-to:\x3a%: %dummy:iptables%
>
> used with this message:
> Apr 8 13:58:26 host.example.net iptables: IN=ppp0 OUT= MAC=
> SRC=121.11.80.101 DST=my_ext_ip LEN=40 TOS=0x00 PREC=0x00 TTL=108 ID=256 DF
> PROTO=TCP SPT=6000 DPT=1433 WINDOW=16384 RES=0x00 SYN URGP=0
>
> Leads to this format (json-formatted):
> '{"IN": "ppp0", "OUT": "", "MAC": "", "SRC": "121.11.80.101", "DST":
> "my_ext_ip", "LEN": "40", "TOS": "0x00", "PREC": "0x00", "TTL": "108",
> "ID": "256", "DF": "[*PRESENT*]", "PROTO": "TCP", "SPT": "6000",
> "DPT": "1433", "WINDOW": "16384", "RES": "0x00", "SYN": "[*PRESENT*]",
> "URGP": "0", "tag": "iptables", "host": "host.example.net",
> "date": "Apr 8 13:58:26"}'
>
> Note that things like DF show up with value "[*PRESENT*]".
>
> The code currently does not check whether the iptables part is malformed.
> Most probably the code will segfault if something is malformed. I have not
> been able to conduct broader tests, especially as part of a larger rule
> base. I'd deeply appreciate it if someone (Champ?) could try out the new
> code in a real-world setting. I'd expect that it would considerably reduce
> the effort required to handle iptables logs inside a semi-structured log
> stream. Just make sure that you assign a unique tag for iptables, as
> suggested by David, else recognition will be a mess.
>
> Feedback deeply appreciated.
> Rainer
>
Count me in, Rainer... I'll get this rockin' on the home machine either today
or tomorrow and see the impact. I'll let you know my findings. Thanks for
all your hard work on this!
James
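
For anyone following the thread, the name/value handling described above
can be illustrated outside liblognorm. This is only a rough Python sketch,
not liblognorm's actual parser: the function name parse_iptables_fields is
made up for illustration, and it assumes that fields are separated by
whitespace and that a bare word such as DF is a flag, reported with the
"[*PRESENT*]" value seen in the quoted output.

```python
# Marker value copied from the JSON output quoted in the thread.
PRESENT = "[*PRESENT*]"


def parse_iptables_fields(text):
    """Turn space-separated NAME=VALUE tokens into a dict.

    Hypothetical helper for illustration only. A token without "="
    (for example DF or SYN) is treated as a flag and mapped to PRESENT.
    A token like "OUT=" keeps an empty string as its value.
    """
    fields = {}
    for token in text.split():
        name, sep, value = token.partition("=")
        fields[name] = value if sep else PRESENT
    return fields


# The iptables part of the sample message from the thread
# (everything after the "iptables: " tag).
sample = (
    "IN=ppp0 OUT= MAC= SRC=121.11.80.101 DST=my_ext_ip LEN=40 "
    "TOS=0x00 PREC=0x00 TTL=108 ID=256 DF PROTO=TCP SPT=6000 "
    "DPT=1433 WINDOW=16384 RES=0x00 SYN URGP=0"
)
fields = parse_iptables_fields(sample)
```

Like the experimental code described above, this sketch does no
validation: any word at all is accepted as a flag, which is exactly why a
unique tag in front of the iptables part matters.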