[Lognorm] Tokenized-multivalue field-type for liblognorm

Fri Oct 31 02:56:08 CET 2014

On Fri, Oct 31, 2014 at 5:12 AM, David Lang <david at lang.hm> wrote:

> On Thu, 30 Oct 2014, singh.janmejay wrote:
>
>> Hi,
>>
>> This patch-set introduces a log-norm field-type called tokenized, which
>> allows parsing of token-separated values.
>>
>> A lot of applications such as nginx write fields in logs that are
>> comma+space separated etc. For instance, nginx upstream_addrs field writes
>> comma-separated ip+port combinations to access logs.
>>
>> Parsing such logs takes a significant amount of regex and exec-template
>> work and leads to a rather ugly solution for something as simple as a
>> tokenized string.
>>
>> With this patch, parsing a list of ip-addresses separated by ', ' (comma +
>> space), for instance, would require a rule similar to:
>>
>> rule=ips:%my_ips:tokenized:, :ipv4%
>>
>
> What terminates the list?
>
> It looks like this allows multi-character tokens, is that correct?
>
> David Lang
>
>
>> This requires a small patch to libestr as well, so this mail has 3 patches
>> attached.
>>
>> libestr patch:
>>
>> 0001-Changed-some-functions-that-don-t-modify-their-arg-t.patch
>>
>> liblognorm patch:
>>
>> 0001-Moved-from-parser-receving-data-as-escaped-string-to.patch
>> 0002-added-support-for-field_type-tokenized-which-parses-.patch
>>
>> Patches go in order of prefix-number.
>>
>>
> _______________________________________________
> Lognorm mailing list
> Lognorm at lists.adiscon.com
> http://lists.adiscon.net/mailman/listinfo/lognorm
>
Yes, it does allow multiple chars.

The match stops for one of the following three reasons:

1. The last set of chars matched the tokenizer, but the next set of chars
doesn't match the field-type (as in, they don't match ipv4).
E.g.
text: "10, 20, 30, abcd"
match stops at: "... 30"
remaining text: ", abcd"

2. The next set of chars doesn't match the tokenizer.
E.g.
text: "10, 20 30, abcd"
match stops at: "... 20"
remaining text: " 30, abcd"

3. The parser reaches EOL.
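
To make the three cases concrete, here is a rough Python sketch of the
stopping behaviour. It is not the code from the patch: parse_tokenized and
FIELD are hypothetical names, a plain run of digits stands in for the real
field-type parser (matching the shorthand values in the examples above), and
returning an empty list when the very first item fails is my assumption.

```python
import re

# Hypothetical stand-in for the field-type parser (e.g. ipv4). A run of
# digits is used only because the examples above use "10", "20", "30".
FIELD = re.compile(r"\d+")


def parse_tokenized(text, separator=", ", field=FIELD):
    """Return (values, remaining_text) for a separator-delimited list."""
    values = []
    m = field.match(text, 0)
    if m is None:
        # Assumption: if the first item is not a valid field, nothing matches.
        return values, text
    values.append(m.group())
    pos = m.end()

    while pos < len(text):
        # Case 2: the next chars are not the separator, so stop here.
        if not text.startswith(separator, pos):
            break
        m = field.match(text, pos + len(separator))
        # Case 1: the separator matched, but what follows is not a valid
        # field. Stop and leave the separator unconsumed.
        if m is None:
            break
        values.append(m.group())
        pos = m.end()

    # Case 3: the loop also ends when the end of the input is reached.
    return values, text[pos:]
```

With the two example inputs above, the sketch leaves ", abcd" and
" 30, abcd" as the remaining text, respectively.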

-- 
Regards,
Janmejay
http://codehunk.wordpress.com