[Lognorm] Tokenized-multivalue field-type for liblognorm

Fri Oct 31 07:46:55 CET 2014

On Fri, Oct 31, 2014 at 12:05 PM, Pavel Levshin <pavel at levshin.spb.ru>
wrote:

>  Hi,
>
> I'll look at this a little later.
>
> Do you use it in production? Is this (JSON arrays) compatible with the
> lognormalizer tool? Can a %tokenized field contain other %tokenized
> fields (i.e., allow for recursion)? Would you write some docs on the
> feature?
>
> Why do you use the 'const' modifier for non-pointer arguments, for example,
> 'const unsigned char c'?
>
>
> --
> Pavel
>
>
>
> 30.10.2014 14:03, singh.janmejay:
>
>  Hi,
>
> This patch-set introduces a log-norm field-type called tokenized, which
> allows parsing of token-separated values.
>
> A lot of applications such as nginx write fields in logs that are
> comma+space separated etc. For instance, nginx upstream_addrs field writes
> comma-separated ip+port combinations to access logs.
>
> Parsing such logs takes a significant amount of regex and exec-template work
> and leads to a rather ugly solution for something as simple as a tokenized
> string.
>
> With this patch, parsing a list of ip-addresses separated by ', ' (comma +
> space), for instance, would require a rule similar to:
>
> rule=ips:%my_ips:tokenized:, :ipv4%
>
> This requires a small patch to libestr as well, so this mail has 3 patches
> attached.
>
> libestr patch:
>
> 0001-Changed-some-functions-that-don-t-modify-their-arg-t.patch
>
> liblognorm patch:
>
> 0001-Moved-from-parser-receving-data-as-escaped-string-to.patch
> 0002-added-support-for-field_type-tokenized-which-parses-.patch
>
> Patches go in order of prefix-number.
>
> --
> Regards,
> Janmejay
> http://codehunk.wordpress.com
>
>
> _______________________________________________
> Lognorm mailing list
> Lognorm at lists.adiscon.com
> http://lists.adiscon.net/mailman/listinfo/lognorm
>
>
>
>
The const modifier on non-pointer args is just habit; it's not intentional.
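
For anyone following the const question: in C, a top-level const on a
by-value parameter only makes the callee's local copy read-only. It is not
part of the function's type, so callers are unaffected. A minimal sketch
follows; the function names are made up for illustration and are not taken
from the libestr or liblognorm patches.

```c
#include <assert.h>
#include <stddef.h>

/* Declaration without the qualifier ... */
int is_separator(unsigned char c, unsigned char sep);

/* ... and a definition with it. Top-level const on a by-value parameter is
 * ignored when function types are compared, so the two are compatible. */
int is_separator(const unsigned char c, const unsigned char sep)
{
    /* Writing "c = sep;" here would be a compile error: the local copy is
     * read-only inside the function body. The caller never sees this. */
    return c == sep;
}

/* Contrast: const on the pointee IS part of the interface. It promises the
 * caller that the function will not write through the pointer. */
int count_separators(const unsigned char *buf, size_t len, unsigned char sep)
{
    int n = 0;

    assert(buf != NULL);
    for (size_t i = 0; i < len; i++) {
        if (is_separator(buf[i], sep))
            n++;
    }
    return n;
}
```

So dropping the qualifier from 'const unsigned char c' changes nothing for
callers; it only affects what the function body may do with its own copy.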

I have done a lot of testing locally (on my box), but it's not on my prod
cluster yet.

Tokenizer followed by tokenizer is something that I have in mind too. But I
promised myself that I'd write a test for that instead of testing it
manually :-). Will add that patch on this thread once I get a chance to
work on it.
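
To make the nested case concrete, here is a rough sketch of splitting on an
outer separator and then splitting each piece on an inner one. It is not the
code from the patches: split_tokens and count_inner_tokens are hypothetical
helpers, the "; " outer separator is an assumption, and the real field type
would also run a sub-parser (such as ipv4) on each token rather than just
copying it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TOKEN_LEN 64

/* Hypothetical helper, not a liblognorm function: splits `text` on the
 * multi-character separator `sep` and copies at most `max_tokens` tokens
 * into `out`. Returns the number of tokens stored, or -1 if a token is too
 * long to fit into MAX_TOKEN_LEN bytes. */
int split_tokens(const char *text, const char *sep,
                 char out[][MAX_TOKEN_LEN], int max_tokens)
{
    size_t sep_len = strlen(sep);
    int n = 0;

    assert(sep_len > 0);
    while (n < max_tokens) {
        const char *end = strstr(text, sep);
        size_t len = end ? (size_t)(end - text) : strlen(text);

        if (len >= MAX_TOKEN_LEN)
            return -1;
        memcpy(out[n], text, len);
        out[n][len] = '\0';
        n++;

        if (end == NULL)
            break;              /* last token consumed */
        text = end + sep_len;   /* continue after the separator */
    }
    return n;
}

/* Two levels: split on `outer_sep`, then split each piece on `inner_sep`.
 * Returns the total number of inner tokens, or -1 on an oversized token. */
int count_inner_tokens(const char *text, const char *outer_sep,
                       const char *inner_sep)
{
    char outer[8][MAX_TOKEN_LEN];
    char inner[8][MAX_TOKEN_LEN];
    int total = 0;
    int n_outer = split_tokens(text, outer_sep, outer, 8);

    if (n_outer < 0)
        return -1;
    for (int i = 0; i < n_outer; i++) {
        int n_inner = split_tokens(outer[i], inner_sep, inner, 8);

        if (n_inner < 0)
            return -1;
        total += n_inner;
    }
    return total;
}
```

For example, "10.0.0.1, 10.0.0.2; 10.0.0.3" split on "; " and then on ", "
gives three inner tokens.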

However, since you are asking about that kind of form, let me discuss
something else that I was thinking about.

The idea is to have another field type called recurse.

Similar to how tokenized uses a ctx to parse matching text, recurse will
parse it using the current context. AFAIK, the context is stateless, so I
don't see any problems with that. I also plan to support tag-based picking
of which rules the text may match; if it matches something else, it
should be considered a no-match.
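
A toy sketch of the tag check, under heavy assumptions: the types and names
below (toy_rule, toy_ctx, toy_recurse) are invented for illustration and do
not exist in liblognorm, and a real context is not a flat rule list. The
point is only that the context is read and never modified, so the same one
can be reused for the nested parse, and that a match on a rule with the
wrong tag is reported as no match.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins, not liblognorm types: a "context" is an ordered list of
 * rules, and each rule is a tag plus a predicate over the text. */
typedef int (*match_fn)(const char *text);

struct toy_rule {
    const char *tag;
    match_fn matches;
};

struct toy_ctx {
    const struct toy_rule *rules;
    size_t n_rules;
};

/* Returns the index of the first rule that matches `text`. If `want_tag` is
 * non-NULL and that rule carries a different tag, the result is -1: a match
 * on "something else" counts as no match. The context is const, so calling
 * this from inside another parse on the same context is safe. */
int toy_recurse(const struct toy_ctx *ctx, const char *text,
                const char *want_tag)
{
    assert(ctx != NULL && text != NULL);
    for (size_t i = 0; i < ctx->n_rules; i++) {
        if (!ctx->rules[i].matches(text))
            continue;
        if (want_tag != NULL && strcmp(ctx->rules[i].tag, want_tag) != 0)
            return -1;
        return (int)i;
    }
    return -1;
}

/* Two deliberately trivial matchers for demonstration. */
int looks_like_ip(const char *text) { return strchr(text, '.') != NULL; }
int looks_like_kv(const char *text) { return strchr(text, '=') != NULL; }
```

With rules tagged "ip" and "kv" in that order, asking for tag "ip" on the
text "user=bob" yields -1, because the rule that matches is tagged "kv".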

Instead of typing it out here, I'll attach a picture I took after thinking
through it briefly (I'll attach it to the next mail).

-- 
Regards,
Janmejay
http://codehunk.wordpress.com