[Lognorm] regex engine for lognorm

Wed Apr 30 07:45:35 CEST 2014

Hello.

Liblognorm is not a regex engine, nor does it try to be. For the sake of 
flexibility, it does implement a form of backtracking while searching 
the tree, which is needed to match variable fields to parsers. But it 
performs best when backtracking is not needed.
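
To make the idea concrete, here is a rough Python sketch of a prefix tree 
with literal edges, field parsers and backtracking. It is an illustration 
only, not liblognorm's actual code or API: the names Node, add_rule, match 
and parse_word are made up for this example, and the literal rules are the 
three words from David's example quoted below.

```python
# Illustrative sketch only; NOT liblognorm's real data structure or API.
# All names here (Node, add_rule, match, parse_word) are hypothetical.

class Node:
    def __init__(self):
        self.literals = {}   # next character -> child Node
        self.fields = []     # (field name, parser function, child Node)
        self.rule = None     # rule id if a rule ends at this node


def parse_word(text, pos):
    """Hypothetical field parser: consume characters up to the next space."""
    end = text.find(" ", pos)
    if end == -1:
        end = len(text)
    return end if end > pos else None


def add_rule(root, parts, rule_id):
    """Insert a rule; parts mixes literal strings and (name, parser) tuples."""
    node = root
    for part in parts:
        if isinstance(part, str):
            for ch in part:
                node = node.literals.setdefault(ch, Node())
        else:
            name, parser = part
            for fname, fparser, child in node.fields:
                if fname == name and fparser is parser:
                    node = child
                    break
            else:
                child = Node()
                node.fields.append((name, parser, child))
                node = child
    node.rule = rule_id


def match(node, text, pos=0):
    """Return (rule id, extracted fields), or None if nothing matches."""
    if pos == len(text):
        return (node.rule, {}) if node.rule is not None else None
    # Fast path: follow the literal edge for the next character.
    child = node.literals.get(text[pos])
    if child is not None:
        result = match(child, text, pos + 1)
        if result is not None:
            return result
    # Backtracking: the literal path failed, so try the field parsers here.
    for name, parser, child in node.fields:
        end = parser(text, pos)
        if end is not None:
            result = match(child, text, end)
            if result is not None:
                rule, fields = result
                return rule, {name: text[pos:end], **fields}
    return None


root = Node()
add_rule(root, ["approximately"], 1)
add_rule(root, ["apart"], 2)
add_rule(root, ["apple"], 3)
add_rule(root, ["a", ("rest", parse_word)], 4)   # literal 'a', then a field

print(match(root, "apple"))   # pure literal walk, no backtracking
print(match(root, "apex"))    # literal path dead-ends, falls back to rule 4
```

Matching "apple" only follows literal edges, which is the cheap case. 
Matching "apex" first walks 'a', 'p', finds no edge for 'e', and has to go 
back to the node after 'a' to try the field parser. That second case is 
the backtracking, and it is the part that costs time.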

This way of doing things is not as flexible as regex, but it is orders 
of magnitude faster than conventional regex. Believe me, I've tested it.

In your particular example, you can match the select clause with the 
"char-to" parser.
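
As far as I understand it, "char-to" takes one or more characters up to 
the next occurrence of a given character. Here that character would be 
the closing single quote, so a field along the lines of 
%action:char-to:'% should do (please check the liblognorm documentation 
for the exact rule syntax). The Python below only imitates that 
behaviour on a shortened copy of your log line; the function char_to is 
a hypothetical stand-in, not the library's implementation.

```python
# Sketch of what a "char-to" style field does; not liblognorm code.

def char_to(text, pos, stop):
    """Return the index of the first `stop` after pos, or None.

    At least one character must be consumed, so an immediate `stop`
    (or no `stop` at all) counts as a failed match.
    """
    end = text.find(stop, pos)
    if end <= pos:          # -1 means not found; == pos means empty field
        return None
    return end


# Shortened copy of the log line from the question.
line = "ACTION :[021] 'select * from products' DATABASE USER:[3] 'sys'"

prefix = "ACTION :[021] '"              # literal text before the field
start = line.index(prefix) + len(prefix)
end = char_to(line, start, "'")         # stop at the closing quote
action = line[start:end]
print("action=" + action)
```

The amount of whitespace inside the quotes does not matter, since 
everything up to the closing quote is taken as is. The obvious limit is 
that it stops at the first quote, so SQL text that itself contains a 
single quote would be cut short.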

--
Pavel Levshin

30.04.2014 2:56, Xuri Nagarin:
> On Mon, Apr 28, 2014 at 1:13 PM, David Lang <david at lang.hm> wrote:
>> liblognorm has a completely different way of operating than you are
>> envisioning.
>>
>> It compiles all the rules into a parse tree and it walks that parse tree
>> _once_ and has the log identified
>>
>>
>> as a trivial example
>>
>> if you have the following 'rules'
>>
>> 1. approximately
>> 2. apart
>> 3. apple
>>
>> liblognorm would create a tree
>>
>> ap - art
>>    \- p - roximately
>>       \- le
>>
>> so when it tries to match apple, it's not three full comparisons. It starts
>> at the beginning, sees 'ap', then sees that the next character is a 'p' and
>> goes down that branch, then sees that the next character is an 'l', goes
>> down that branch, reaches rule 3 and says "this is rule 3"
>>
>> With this sort of matching, the number of rules has virtually no impact on
>> the parsing speed, it's just the length of what you are matching.
> I think I understand the disconnect or lack of my understanding here.
> Re-reading the liblognorm documentation, I see that you are
> implementing a subset of the regex language by defining tokens as
> "word", "number" or "ipv4". These are pre-packaged regex expressions.
> Supporting a basic set of regex allows you to avoid creating a full
> blown regex engine and lets you implement a faster parsing mechanism
> like parse trees. But I am wondering if this simplification comes at a
> cost of flexibility?
>
> Take for example this log line that I want to break up into key/value pairs:
> 2014-04-29T21:24:42+00:00 hostnameA.abc.com Oracle Audit[31611]:
> LENGTH : '172' ACTION :[021] 'select * from products' DATABASE
> USER:[3] 'sys' PRIVILEGE :[6] 'SYSDBA' CLIENT USER:[6] 'oracle' CLIENT
> TERMINAL:[0] '' STATUS:[1] '0' DBID:[10] '2796591309'
>
> /ACTION :[021] 'select * from products'/ needs to get normalized to
> "action=/'select * from products'/"
>
> The action or sql text can be of varying length and have varying
> number of whitespaces between two keywords. ' select * from products'
> is just as valid as 'select       * from    products'. If I am doing
> regex, I can use /.+/ followed by the string that is expected to
> succeed the "action" value. In this case, use "DATABASE USER:" as a
> boundary where "action" ends. Of course, this is easily doable in
> regex, but I am not sure how the liblognorm rule language handles it.
>
>