[Lognorm] getting started document

Champ Clark III [Softwink] champ at softwink.com
Wed Jan 19 15:53:04 CET 2011


> > based on another thread, some additional things that are needed (or at
> > least need to be documented)
> > 
> > how to escape a '%'
> > 
> > it would be good to have some regex features available for the rules.
> 
> I know this seems desirable, but it is very dangerous. The speed of
> liblognorm depends on very fast processing and limited backtracking. The more
> regexps are used, the more you get into the "normal" slow processing. I
> agree there is a need for this, but I really intend to make it hard to use
> regexps so that they are only used if there is no way around them. Otherwise, I
> fear people will turn back to the familiar regexps, because they can be used
> to do what all other parsers do. But they do this at an immense performance
> cost - a cost so high that it can actually ruin the base concept. Having said
> that, I am not even sure if I really intend to support them at all...

	I'm with Rainer on this one. I know we talk about log
normalization, but to me, lognorm really does more "masking" of the
log entries to extract the usable/desired information from them. I haven't
seen a situation __YET__ where regex/pcre would be "helpful". That's
not to say it _wouldn't_ be, but I've just not seen it.

> > 
> > ones that I think are very useful
> > 
> > alternate words (this|that|other)
> 
> *Very* evil -- you can do this with separate rules.

	I ran into a similar situation yesterday.... Like this:

Domain of sender example.com does not exist|resolv 

	Break it down to one rule:

Domain of sender example.com does not %reason:word%

	While probably not desirable 100% of the time,  it'll allow
you to deal with the this|that|other situation.  It _might_ be nice
to do something like...

Domain of sender example.com does not %multi-word:resolv|exist%

	That could be slick :)
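
	To make the idea concrete, here is a rough sketch of how such a field could be expanded into separate plain rules before the rulebase is loaded. It is only an illustration: "%multi-word:...%" is the hypothetical syntax from this mail, not an existing liblognorm parser type, and expand_multi_word() is a made-up helper name. The sketch uses a regex itself, but only once per rule at load time, never per log line.

```python
import re

# Hypothetical field syntax from this thread: %multi-word:alt1|alt2|...%
# This is NOT an existing liblognorm parser type.
MULTI_WORD = re.compile(r"%multi-word:([^%]+)%")


def expand_multi_word(rule):
    """Return one plain rule per alternative of every multi-word field.

    The first field found is expanded, then the function recurses so that
    rules containing several such fields are fully expanded as well.
    """
    match = MULTI_WORD.search(rule)
    if match is None:
        return [rule]
    expanded = []
    for alternative in match.group(1).split("|"):
        candidate = rule[:match.start()] + alternative + rule[match.end():]
        expanded.extend(expand_multi_word(candidate))
    return expanded


rules = expand_multi_word(
    "Domain of sender example.com does not %multi-word:resolv|exist%"
)
for rule in rules:
    print(rule)
```

	Each printed line is an ordinary literal rule, so the matching side would not need to know about alternatives at all.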

> > in word matches, being able to specify that this must match a pattern
> > as
> > well (eth* or [sh]da[0-9]* for example)
> 
> Yeah, that's a possibility.

	We're getting back into regex :)

> > I don't know how far it makes sense to go, but if the regex can be
> > compiled down to the parse tree with the other rules it shouldn't hurt
> > performance, and if we can use regex rules that people have created for
> > other tools, it should jumpstart the ruleset development.
> 
> I am with you on the jumpstart, but if we base this on regexps, we can use
> the other tools in the first place (because they are probably not bad, just
> comparatively slow).
> 
> But I think I now got your full message: what I said is true iff a parser is
> built using the regular regexp libraries. If liblognorm compiles the regexp
> itself into the parse tree, the problem does not exist. This sounds very
> good, but obviously is quite some work to do. But I will keep it in mind and
> think about how I could best implement it. Something like the multi-word
> functionality could probably be done via a preprocessing stage with
> relatively little effort. Thanks for the useful idea!

	I do like the multi-word idea. That'd be pretty nice. I can
only speak from experience with PCRE (i.e. www.pcre.org), but even if
you take the "rule" and precompile the patterns with pcre_compile, you
still take a performance hit. On the surface, it doesn't look like much.
However, once you start throwing millions of log lines at it, that's
when you'll notice. This is the same reason software like Snort
(IPS/IDS) prefers the use of "content:" over "pcre:", when possible.
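
	For anyone who wants to measure this on their own box, below is a small timing sketch. It is only an analogy: it uses Python's re module rather than PCRE, the sample line is just the message text from earlier in this mail, and match_literal()/match_regex() are made-up helper names. Absolute numbers will differ from a C implementation, so treat the result as a rough indication at best.

```python
import re
import timeit

# Sample line taken from the message text quoted earlier in this thread.
LINE = "Domain of sender example.com does not exist"
LITERAL = "Domain of sender example.com does not "

# Compiled once up front, analogous to calling pcre_compile() when the
# rule is loaded instead of once per log line.
PATTERN = re.compile(r"Domain of sender example\.com does not (\w+)")


def match_literal(line):
    """Literal prefix check, then return the next whitespace-delimited word."""
    if not line.startswith(LITERAL):
        return None
    parts = line[len(LITERAL):].split(None, 1)
    return parts[0] if parts else None


def match_regex(line):
    """Same extraction using the precompiled regular expression."""
    found = PATTERN.match(line)
    return found.group(1) if found else None


# Time both approaches over the same number of iterations.
RUNS = 100_000
literal_seconds = timeit.timeit(lambda: match_literal(LINE), number=RUNS)
regex_seconds = timeit.timeit(lambda: match_regex(LINE), number=RUNS)
print("literal: %.4f s for %d lines" % (literal_seconds, RUNS))
print("regex:   %.4f s for %d lines" % (regex_seconds, RUNS))
```

	Both functions pull out the same word, so any difference in the timings comes from the matching approach alone.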

	I've toyed with this in my head a little bit, and I can see
certain situations where pcre/regex might be useful. My fear is that
lognorm will end up as 'yet another regex log parser'. I'm just not
100% sure if regexp will be worth it (?). May or may not be.

-- 
        Champ Clark III | Softwink, Inc | 800-538-9357 x 101
                     http://www.softwink.com

GPG Key ID: 58A2A58F
Key fingerprint = 7734 2A1C 007D 581E BDF7  6AD5 0F1F 655F 58A2 A58F
If it wasn't for C, we'd be using BASI, PASAL and OBOL.