[Lognorm] Libnormalize issue

Thu Nov 3 11:42:52 CET 2011

I have just uploaded an overview paper that may help to provide context for
this discussion (though admittedly it does not address this issue precisely):

http://www.gerhards.net/download/LogNormalizationV2.pdf

HTH
Rainer

> -----Original Message-----
> From: lognorm-bounces at lists.adiscon.com [mailto:lognorm-
> bounces at lists.adiscon.com] On Behalf Of Rainer Gerhards
> Sent: Thursday, November 03, 2011 9:05 AM
> To: lognorm
> Subject: Re: [Lognorm] Libnormalize issue
> 
> > -----Original Message-----
> > From: lognorm-bounces at lists.adiscon.com [mailto:lognorm-
> > bounces at lists.adiscon.com] On Behalf Of James Lay
> > Sent: Wednesday, November 02, 2011 8:12 PM
> > To: lognorm
> > Subject: Re: [Lognorm] Libnormalize issue
> >
> > >
> > > The rule does not match, because "(Unhandled..." is not matching the
> > > sample.
> > > So it did not extract any fields at all.
> > >
> > > I'll elaborate a bit later why we need to have perfect matches.
> > > Think about false positives...
> > >
> > > rainer
> >
> > Thanks for responding so quickly.  As I look at my test setup, I see
> > that you are spot on...if it doesn't match the WHOLE thing, nothing
> > gets parsed.  That leaves me with two options as it relates to the
> > examples below:
> >
> > IN=ppp0 OUT= MAC= SRC=121.11.80.101 DST=my_ext_ip LEN=40 TOS=0x00
> > PREC=0x00 TTL=108 ID=256 PROTO=TCP SPT=6000 DPT=1433 WINDOW=16384
> > RES=0x00 SYN URGP=0
> >
> > IN=ppp0 OUT= MAC= SRC=121.11.80.101 DST=my_ext_ip LEN=40 TOS=0x00
> > PREC=0x00 TTL=108 ID=256 DF PROTO=TCP SPT=6000 DPT=1433 WINDOW=16384
> > RES=0x00 SYN URGP=0
> >
> > Easy to miss, but the DF there is where I have an issue...some have
> > it, and some don't.  Without a regex to ignore junk (LEN=.*DF),
> > what are my options?  I can create 2 different rules, one to match the
> > above with a %DF:word%, and one without, but now I have two separate
> > entries for pretty much the same info...not optimal.
> 
> The problem here is that liblognorm primarily aims at semi-structured data,
> that is, text data without an easily parsable structure. Iptables actually
> provides structured data, and liblognorm is not great at processing that
> kind of data. It becomes even worse if there are any permutations in field
> order. In that case, you need exponentially many rules in the worst case.
> 
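To make the growth in rule count concrete, here is a minimal Python sketch. It
is not liblognorm code. It assumes a hypothetical helper, rule_variants, and
uses illustrative field placeholders in the style of the %DF:word% notation
quoted above. It emits one exact-match rule per combination of optional
tokens, so k optional tokens require 2**k rules (field permutations would
grow even faster).

```python
from itertools import product


def rule_variants(template):
    """Expand a template into one exact-match rule per optional-token subset.

    template is a list of (token, is_optional) pairs in message order.
    This is an illustrative helper, not part of liblognorm.
    """
    optional_idx = [i for i, (_, opt) in enumerate(template) if opt]
    variants = []
    # Each optional token is either absent (False) or present (True).
    for mask in product([False, True], repeat=len(optional_idx)):
        include = dict(zip(optional_idx, mask))
        tokens = [tok for i, (tok, opt) in enumerate(template)
                  if not opt or include[i]]
        variants.append(" ".join(tokens))
    return variants


# Fragment of the iptables sample: DF may or may not appear after ID=.
template = [
    ("ID=%id:number%", False),
    ("DF", True),
    ("PROTO=%proto:word%", False),
]

for rule in rule_variants(template):
    print(rule)
```

With the single optional DF token this yields the two rules mentioned above;
three optional tokens would already need eight.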
> I was thinking about adding a special name/value parsing capability to
> support that type of data. But then it is vitally important that the data
> has a header that clearly identifies the message; otherwise normalization
> will result in a big mess of garbage. The chance that such a very generic
> parser misinterprets things is very high, especially in the iptables case,
> as a single word (like "DF" above) is a valid (binary) "name/value pair",
> so it is hard to detect during parsing whether that really is iptables or
> not. Even if we assume it is, the parser probably consumes a lot of data
> before it detects a mismatch, so we need to backtrack over a lot of data.
> In essence, one such rule could probably double the processing time of all
> rules. And if you have 10 such rules, you could end up with 1024-times
> slower rule parsing in the worst case (that's the problem that plagues the
> usual regex approach and severely limits extraction speed).
> 
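As a rough illustration of the name/value idea and of why a header check
matters, here is a minimal Python sketch. It is not an existing liblognorm
feature. The function parse_name_value and its header parameter are
hypothetical, and treating a leading "IN=" as the identifying header is an
assumption based only on the two samples in this thread.

```python
def parse_name_value(msg, header="IN="):
    """Parse a whitespace-separated name=value message into a dict.

    Returns None unless msg starts with the identifying header.
    A bare word such as DF or SYN is stored as a binary flag (True).
    """
    if not msg.startswith(header):
        return None
    fields = {}
    for token in msg.split():
        name, sep, value = token.partition("=")
        fields[name] = value if sep else True
    return fields


WITHOUT_DF = ("IN=ppp0 OUT= MAC= SRC=121.11.80.101 DST=my_ext_ip LEN=40 "
              "TOS=0x00 PREC=0x00 TTL=108 ID=256 PROTO=TCP SPT=6000 "
              "DPT=1433 WINDOW=16384 RES=0x00 SYN URGP=0")
WITH_DF = WITHOUT_DF.replace("ID=256 ", "ID=256 DF ")

for sample in (WITHOUT_DF, WITH_DF):
    fields = parse_name_value(sample)
    print(fields["SRC"], fields["DPT"], fields.get("DF", False))
```

One rule now covers both samples, but the parser is very permissive: once
the header check passes, any bare word is accepted as a flag, so a weak
header lets unrelated text through as "fields". That is the false-positive
risk described above.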
> So iptables actually presents a pretty hard problem. I'd still like to
> tackle it, but unfortunately I am short on time at the moment. In any case,
> normalization is still on my agenda, so this is probably one of the first
> things to look at when there is time left (CEE has a new draft standard out
> and I'd like to make the necessary adaptations). A probable solution is to
> provide this "iptables" normalizer, maybe even as a different API call, and
> the controlling application must first select the normalizer to use based
> on other information.
> 
> >
> > I'm guessing that my questions and comments are from my ignorance of
> > how this all works.  From my dealings so far with Sagan, it looks like
> > my rule file should match first, then send to normalize, yes?  I would
> > think that would reduce false positives since my rule has already done
> > the job of matching, and liblognorm's job is to parse out the specific
> > info...yes?
> > Again, maybe I'm TOTALLY missing something.
> 
> I don't know how it is implemented in Sagan. But if you do that, you'll
> lose all of liblognorm's performance benefits (which it gains from doing
> everything in a *single* pass). Note: I am not saying what you intend to do
> is bad for your context, I am just saying how it relates to liblognorm.
> 
> rainer
> >
> > I'll continue to test this out...my goal is to correlate snort entries
> > with firewall rules, but so far it's been an uphill battle.  Again,
> > thanks for any light you can shed, and for taking your time to make
> > liblognorm.
> >
> > James
> >
> > _______________________________________________
> > Lognorm mailing list
> > Lognorm at lists.adiscon.com
> > http://lists.adiscon.net/mailman/listinfo/lognorm