[Lognorm] Libnormalize issue

Thu Nov 3 08:40:51 CET 2011

> -----Original Message-----
> From: lognorm-bounces at lists.adiscon.com
> [mailto:lognorm-bounces at lists.adiscon.com] On Behalf Of david at lang.hm
> Sent: Thursday, November 03, 2011 3:05 AM
> To: lognorm
> Subject: Re: [Lognorm] Libnormalize issue
> 
> On Wed, 2 Nov 2011, cclark wrote:
> 
> > On Wed, 2 Nov 2011 13:11:34 -0600, James Lay wrote:
> >> I'm guessing that my questions and comments are from my ignorance of how
> >> this all works.  From my dealings so far with Sagan, it looks like my rule
> >> file should match first, then send to normalize, yes?  I would think that
> >> would reduce false positives since my rule has already done the job of
> >> matching, and liblognorm's job is to parse out the specific info, yes?
> >> Again, maybe I'm TOTALLY missing something.
> >
> > If the first normalize rule doesn't match, it'll move on to the second
> > rule. That is, _right_ when liblognorm "sees" it's not going to match,
> > it's already moving on to the next rule. If you have pcre/regexp, it
> > would then have to pump that data via libpcre. That'd create more
> > overhead than you think.  Hence the reason I encourage users (for
> > Sagan/Snort rules) to use "content:" over "pcre:", because pcre adds
> > extra CPU overhead.
> >
> > I'm sure Rainer can explain better, and I know this has come up on the
> > list before, but adding regexp/libpcre to the mix will actually make it
> > more complex and less efficient. Efficiency is the name of the game
> > here.
> >
> > I actually think of liblognorm as more of a "mask" than a rule.  That
> > is, if my log is:
> 
> The thing to remember is that liblognorm is creating a parse tree, not a
> set of regex rules to match.
> 
> So it's not evaluating the rules one at a time as each line arrives.
> 
> Instead it's evaluating them all at the same time.
> 
> It's essentially creating a mini program where it looks at the first
> character of the input and says 'this character means that it could match
> this set of rules, but not this other set', then it looks at the next
> character and says 'of the rules that were possible after the last step,
> this set is still possible' and repeats this until there is only one rule
> left in the 'possible' set. Then it goes through that rule to assign
> values to variables.
> 
> This process makes it so that it takes very close to the same amount of
> time to evaluate a large number of rules as a small number of rules.

That's an excellent description; David actually said it better than I could
;)

rainer
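
For anyone reading this in the archive, the parse-tree idea David describes
can be illustrated with a toy prefix tree. This is only a sketch under
stated assumptions, not liblognorm's actual implementation or rulebase
syntax: the names Node, add_rule and match, the ("lit", ...) / ("word", ...)
rule format, and the sample log lines are all made up for illustration. It
also assumes that rules sharing a prefix use the same field name at the same
position, and it does no backtracking.

```python
class Node:
    """One position in the (hypothetical) parse tree shared by all rules."""
    def __init__(self):
        self.literal = {}   # next character -> child Node
        self.field = None   # (field_name, child Node) for a "word" capture
        self.rule = None    # name of the rule that ends at this node, if any


def add_rule(root, name, parts):
    """Merge one rule into the tree.

    parts is a list of ("lit", text) or ("word", field_name) tuples.
    Rules with a common prefix share the same nodes.
    """
    node = root
    for kind, value in parts:
        if kind == "lit":
            for ch in value:
                node = node.literal.setdefault(ch, Node())
        else:
            if node.field is None:
                node.field = (value, Node())
            node = node.field[1]
    node.rule = name


def match(root, line):
    """Walk the tree once over the input; return (rule, fields) or None."""
    node, pos, fields = root, 0, {}
    while pos < len(line):
        ch = line[pos]
        if ch in node.literal:
            # This character keeps some rules possible; follow that branch.
            node = node.literal[ch]
            pos += 1
        elif node.field is not None:
            # Capture a run of non-whitespace characters into the field.
            name, child = node.field
            end = pos
            while end < len(line) and not line[end].isspace():
                end += 1
            if end == pos:
                return None
            fields[name] = line[pos:end]
            node, pos = child, end
        else:
            # No rule is possible any more.
            return None
    return (node.rule, fields) if node.rule else None


root = Node()
add_rule(root, "login",
         [("lit", "Accepted password for "), ("word", "user"),
          ("lit", " from "), ("word", "ip")])
add_rule(root, "fail",
         [("lit", "Failed password for "), ("word", "user"),
          ("lit", " from "), ("word", "ip")])
add_rule(root, "logout",
         [("lit", "session closed for "), ("word", "user")])

print(match(root, "Failed password for bob from 10.0.0.5"))
print(match(root, "Invalid line"))
```

Note that match() never loops over the rules. Each input character either
selects a branch or ends the walk, so the work depends on the length of the
line rather than on how many rules were loaded, which is the point David
makes above.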