From singh.janmejay at gmail.com  Thu Oct 30 12:03:01 2014
From: singh.janmejay at gmail.com (singh.janmejay)
Date: Thu, 30 Oct 2014 16:33:01 +0530
Subject: [Lognorm] Tokenized-multivalue field-type for liblognorm
Message-ID: <CAGB1VvzYqFhaGPa0=BNZ0btcjqZRryMJe1HR9ZVytAMVKTS__w@mail.gmail.com>

Hi,

This patch-set introduces a log-norm field-type called tokenized, which
allows parsing of token-separated values.

A lot of applications, such as nginx, write fields in logs that are
comma+space separated etc. For instance, nginx's upstream_addr field writes
comma-separated ip+port combinations to access logs.

Parsing such logs takes a significant amount of regex and exec-template work
and leads to a rather ugly solution for something as simple as a tokenized
string.

With this patch, parsing a list of ip-addresses separated by ', '(comma +
space) for instance, would require a rule similar to:

rule=ips:%my_ips:tokenized:, :ipv4%

This requires a small patch to libestr as well, so this mail has 3 patches
attached.

libestr patch:

0001-Changed-some-functions-that-don-t-modify-their-arg-t.patch

liblognorm patch:

0001-Moved-from-parser-receving-data-as-escaped-string-to.patch
0002-added-support-for-field_type-tokenized-which-parses-.patch

Patches go in order of prefix-number.

-- 
Regards,
Janmejay
http://codehunk.wordpress.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-added-support-for-field_type-tokenized-which-parses-.patch
Type: text/x-patch
Size: 9316 bytes
Desc: not available
URL: <http://lists.adiscon.net/pipermail/lognorm/attachments/20141030/a2fd5d8e/attachment-0003.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Changed-some-functions-that-don-t-modify-their-arg-t.patch
Type: text/x-patch
Size: 3202 bytes
Desc: not available
URL: <http://lists.adiscon.net/pipermail/lognorm/attachments/20141030/a2fd5d8e/attachment-0004.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Moved-from-parser-receving-data-as-escaped-string-to.patch
Type: text/x-patch
Size: 8336 bytes
Desc: not available
URL: <http://lists.adiscon.net/pipermail/lognorm/attachments/20141030/a2fd5d8e/attachment-0005.bin>

From singh.janmejay at gmail.com  Thu Oct 30 12:11:03 2014
From: singh.janmejay at gmail.com (singh.janmejay)
Date: Thu, 30 Oct 2014 16:41:03 +0530
Subject: [Lognorm] Tokenized-multivalue field-type for liblognorm
In-Reply-To: <CAGB1VvzYqFhaGPa0=BNZ0btcjqZRryMJe1HR9ZVytAMVKTS__w@mail.gmail.com>
References: <CAGB1VvzYqFhaGPa0=BNZ0btcjqZRryMJe1HR9ZVytAMVKTS__w@mail.gmail.com>
Message-ID: <CAGB1Vvx_9bsMH-4E9PQ8qr6MZcjrhZLZBqA8f7K9JO+F65tEpg@mail.gmail.com>

The token-string can be escaped using the same mechanism as char-to, e.g.
\x3a for a colon (:).

Also, the tokenized field-type lets the user pick the field-type of each
tokenized fragment, and it produces a multi-valued variable (it's a
JSON array), similar to event.tags.
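
To illustrate the escaping mentioned above, here is a minimal Python sketch
that decodes \xNN hex escapes in a token-string. The function name
decode_separator is hypothetical, and the sketch assumes \xNN is the only
escape form; the actual char-to mechanism in liblognorm may differ.

```python
import re


def decode_separator(escaped: str) -> str:
    """Replace each \\xNN hex escape with the character it names.

    Assumption: only two-digit \\xNN escapes are handled; everything
    else is passed through unchanged.
    """
    return re.sub(
        r"\\x([0-9A-Fa-f]{2})",
        lambda m: chr(int(m.group(1), 16)),
        escaped,
    )


print(decode_separator(r"\x3a"))         # prints ":"
print(repr(decode_separator(r",\x20")))  # prints "', '"
```

With this, a colon separator can be written as \x3a, so it does not clash
with the colons that delimit the parts of a field definition.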




-- 
Regards,
Janmejay
http://codehunk.wordpress.com

From david at lang.hm  Fri Oct 31 00:42:40 2014
From: david at lang.hm (David Lang)
Date: Thu, 30 Oct 2014 16:42:40 -0700 (PDT)
Subject: [Lognorm] Tokenized-multivalue field-type for liblognorm
In-Reply-To: <CAGB1VvzYqFhaGPa0=BNZ0btcjqZRryMJe1HR9ZVytAMVKTS__w@mail.gmail.com>
References: <CAGB1VvzYqFhaGPa0=BNZ0btcjqZRryMJe1HR9ZVytAMVKTS__w@mail.gmail.com>
Message-ID: <alpine.DEB.2.02.1410301640390.26139@nftneq.ynat.uz>

On Thu, 30 Oct 2014, singh.janmejay wrote:

> Hi,
>
> This patch-set introduces a log-norm field-type called tokenized, which
> allows parsing of token-separated values.
>
> A lot of applications such as nginx write fields in logs that are
> comma+space separated etc. For instance, nginx upstream_addrs field writes
> comma-separated ip+port combinations to access logs.
>
> Parsing such logs takes significant amount of regex and exec-template work
> and leads to rather ugly solution for something as simple as tokenized
> string.
>
> With this patch, parsing a list of ip-addresses separated by ', '(comma +
> space) for instance, would require a rule similar to:
>
> rule=ips:%my_ips:tokenized:, :ipv4%

What terminates the list?

It looks like this allows multi-character tokens, is that correct?

David Lang

-------------- next part --------------
_______________________________________________
Lognorm mailing list
Lognorm at lists.adiscon.com
http://lists.adiscon.net/mailman/listinfo/lognorm

From singh.janmejay at gmail.com  Fri Oct 31 02:56:08 2014
From: singh.janmejay at gmail.com (singh.janmejay)
Date: Fri, 31 Oct 2014 07:26:08 +0530
Subject: [Lognorm] Tokenized-multivalue field-type for liblognorm
In-Reply-To: <alpine.DEB.2.02.1410301640390.26139@nftneq.ynat.uz>
References: <CAGB1VvzYqFhaGPa0=BNZ0btcjqZRryMJe1HR9ZVytAMVKTS__w@mail.gmail.com>
	<alpine.DEB.2.02.1410301640390.26139@nftneq.ynat.uz>
Message-ID: <CAGB1Vvwzm5jfdSu8Mo-Bcnhr-_j6Dh1d5jakNbArb4tYfeCSSA@mail.gmail.com>

On Fri, Oct 31, 2014 at 5:12 AM, David Lang <david at lang.hm> wrote:

> On Thu, 30 Oct 2014, singh.janmejay wrote:
>
>  Hi,
>>
>> This patch-set introduces a log-norm field-type called tokenized, which
>> allows parsing of token-separated values.
>>
>> A lot of applications such as nginx write fields in logs that are
>> comma+space separated etc. For instance, nginx upstream_addrs field writes
>> comma-separated ip+port combinations to access logs.
>>
>> Parsing such logs takes significant amount of regex and exec-template work
>> and leads to rather ugly solution for something as simple as tokenized
>> string.
>>
>> With this patch, parsing a list of ip-addresses separated by ', '(comma +
>> space) for instance, would require a rule similar to:
>>
>> rule=ips:%my_ips:tokenized:, :ipv4%
>>
>
> What terminates the list?
>
> It looks like this allows multi-character tokens, is that correct?
>
> David Lang
>
Yes, it does allow multiple chars.

The match may stop for one of the following three reasons:

1. The last set of chars matched the tokenizer, but the next set of chars
doesn't match the field-type (as in, they don't match ipv4).
E.g.
text: "10, 20, 30, abcd"
match stops at: "... 30"
remaining text: ", abcd"

2. The next set of chars doesn't match the tokenizer.
E.g.
text: "10, 20 30, abcd"
match stops at: "... 20"
remaining text: " 30, abcd"

3. The parser reaches EOL.
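
The three stop conditions above can be sketched in Python as follows. This
is only an illustration of the described behaviour, not the code from the
patch: match_tokenized and match_number are hypothetical names, and a plain
number matcher stands in for the field-type, as in the examples.

```python
import re
from typing import Callable, List, Optional, Tuple


def match_number(text: str) -> Optional[int]:
    """Stand-in field matcher: length of the leading digit run, or None."""
    m = re.match(r"\d+", text)
    return m.end() if m else None


def match_tokenized(
    text: str,
    separator: str,
    match_field: Callable[[str], Optional[int]],
) -> Tuple[List[str], str]:
    """Return (matched values, remaining text)."""
    if not separator:
        raise ValueError("separator must not be empty")
    values: List[str] = []
    n = match_field(text)
    if n is None:
        return values, text  # nothing matched at all
    values.append(text[:n])
    pos = n
    # Stops when the separator is missing (reason 2) or at EOL (reason 3).
    while text.startswith(separator, pos):
        start = pos + len(separator)
        n = match_field(text[start:])
        if n is None:
            break  # reason 1: separator matched, but the field-type did not
        values.append(text[start:start + n])
        pos = start + n
    return values, text[pos:]


print(match_tokenized("10, 20, 30, abcd", ", ", match_number))
print(match_tokenized("10, 20 30, abcd", ", ", match_number))
```

Note that in case 1 the separator is not consumed, so it stays in the
remaining text for whatever follows the tokenized field in the rule.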

-- 
Regards,
Janmejay
http://codehunk.wordpress.com

From pavel at levshin.spb.ru  Fri Oct 31 07:35:17 2014
From: pavel at levshin.spb.ru (Pavel Levshin)
Date: Fri, 31 Oct 2014 09:35:17 +0300
Subject: [Lognorm] Tokenized-multivalue field-type for liblognorm
In-Reply-To: <CAGB1VvzYqFhaGPa0=BNZ0btcjqZRryMJe1HR9ZVytAMVKTS__w@mail.gmail.com>
References: <CAGB1VvzYqFhaGPa0=BNZ0btcjqZRryMJe1HR9ZVytAMVKTS__w@mail.gmail.com>
Message-ID: <54532DA5.8030107@levshin.spb.ru>

Hi,

I'll look at this a little later.

Do you use it in production? Is this (JSON arrays) compatible with the
lognormalizer tool? Can a %tokenized field contain other %tokenized
fields (i.e., allow for recursion)? Would you write some docs on the
feature?

Why do you use the 'const' modifier for non-pointer arguments, for example,
'const unsigned char c'?


--
Pavel



From singh.janmejay at gmail.com  Fri Oct 31 07:46:55 2014
From: singh.janmejay at gmail.com (singh.janmejay)
Date: Fri, 31 Oct 2014 12:16:55 +0530
Subject: [Lognorm] Tokenized-multivalue field-type for liblognorm
In-Reply-To: <54532DA5.8030107@levshin.spb.ru>
References: <CAGB1VvzYqFhaGPa0=BNZ0btcjqZRryMJe1HR9ZVytAMVKTS__w@mail.gmail.com>
	<54532DA5.8030107@levshin.spb.ru>
Message-ID: <CAGB1VvyP+Xdqw2KgwsCmNhR+WgGc5KnBoh5nb9TO8zpZeU-qQg@mail.gmail.com>

On Fri, Oct 31, 2014 at 12:05 PM, Pavel Levshin <pavel at levshin.spb.ru>
wrote:

>  Hi,
>
> I'll look at this little later.
>
> Do you use it in production? Is this (JSON arrays) compatible with
> lognormalizer tool? Can a %tokenized field contain another %tokenized
> fields (i.e., allow for recursion)? Would you write some docs on the
> feature?
>
> Why do you use 'const' modifier for non-pointer arguments, for example,
> 'const unsigned char c'?
>
>
The const modifier for non-pointer args is just habit; it's not intentional.

I have done a lot of testing locally (on my box), but it's not on my prod
cluster yet.

Tokenizer followed by tokenizer is something that I have in mind too. But I
promised myself that I'd write a test for that instead of testing it
manually :-). Will add that patch on this thread once I get a chance to
work on it.

However, since you are asking about those kinds of forms, let me discuss
something else that I was thinking about.

The idea is to have another field type called recurse.

Similar to how tokenized uses a ctx to parse matching text, recurse will
parse it using the current context. AFAIK, the context is stateless, so I
don't see any problems with that. I also plan to support tag-based picking
of which rules the text may match, and if it matches something else, it
should be considered a no-match.

Instead of typing it out here, I'll attach a picture I took after thinking
through it briefly (I'll attach it to the next mail).

-- 
Regards,
Janmejay
http://codehunk.wordpress.com

From singh.janmejay at gmail.com  Fri Oct 31 08:34:01 2014
From: singh.janmejay at gmail.com (singh.janmejay)
Date: Fri, 31 Oct 2014 13:04:01 +0530
Subject: [Lognorm] Tokenized-multivalue field-type for liblognorm
In-Reply-To: <CAGB1VvywD7wFdX9kpsd89HNb=4b223ytDQa1nRxUHsuXdAE3og@mail.gmail.com>
References: <CAGB1VvzYqFhaGPa0=BNZ0btcjqZRryMJe1HR9ZVytAMVKTS__w@mail.gmail.com>
	<54532DA5.8030107@levshin.spb.ru>
	<CAGB1VvyP+Xdqw2KgwsCmNhR+WgGc5KnBoh5nb9TO8zpZeU-qQg@mail.gmail.com>
	<CAGB1VvywD7wFdX9kpsd89HNb=4b223ytDQa1nRxUHsuXdAE3og@mail.gmail.com>
Message-ID: <CAGB1VvzDxVERT0=9yqsORg6gGTbQ6sc-U099P1W3=KkjikYqiw@mail.gmail.com>

OK, the last mail was held for moderation because of the large attachment size.

Here is the url to the image:
https://drive.google.com/file/d/0B_XhUZLNFT4dN3RqdGE2VmN5UW1lMDZITkN4WW5wUUxQOE9F/view?usp=sharing

On Fri, Oct 31, 2014 at 12:19 PM, singh.janmejay <singh.janmejay at gmail.com>
wrote:

> Here is the image. I'll type it out if it's illegible.
>
> --
> Regards,
> Janmejay
>
> PS: Please blame the typos in this mail on my phone's uncivilized soft
> keyboard sporting it's not-so-smart-assist technology.
>


-- 
Regards,
Janmejay
http://codehunk.wordpress.com

From david at lang.hm  Fri Oct 31 10:53:42 2014
From: david at lang.hm (David Lang)
Date: Fri, 31 Oct 2014 02:53:42 -0700 (PDT)
Subject: [Lognorm] Tokenized-multivalue field-type for liblognorm
In-Reply-To: <CAGB1VvyP+Xdqw2KgwsCmNhR+WgGc5KnBoh5nb9TO8zpZeU-qQg@mail.gmail.com>
References: <CAGB1VvzYqFhaGPa0=BNZ0btcjqZRryMJe1HR9ZVytAMVKTS__w@mail.gmail.com>
	<54532DA5.8030107@levshin.spb.ru>
	<CAGB1VvyP+Xdqw2KgwsCmNhR+WgGc5KnBoh5nb9TO8zpZeU-qQg@mail.gmail.com>
Message-ID: <alpine.DEB.2.02.1410310251330.6113@nftneq.ynat.uz>

On Fri, 31 Oct 2014, singh.janmejay wrote:

> Tokenizer followed by tokenizer is something that I have in mind too. But I
> promised myself that i'd write a test for that instead of testing it
> manually :-). Will add that patch on this thread once I get a chance to
> work on it.

At least in the short term, you can use the ability to call mmnormalize on a
variable to parse subvariables.

How are the resulting fields addressed? Rsyslog doesn't have array addressing yet.

David Lang


From singh.janmejay at gmail.com  Fri Oct 31 11:06:21 2014
From: singh.janmejay at gmail.com (singh.janmejay)
Date: Fri, 31 Oct 2014 15:36:21 +0530
Subject: [Lognorm] Tokenized-multivalue field-type for liblognorm
In-Reply-To: <alpine.DEB.2.02.1410310251330.6113@nftneq.ynat.uz>
References: <CAGB1VvzYqFhaGPa0=BNZ0btcjqZRryMJe1HR9ZVytAMVKTS__w@mail.gmail.com>
	<54532DA5.8030107@levshin.spb.ru>
	<CAGB1VvyP+Xdqw2KgwsCmNhR+WgGc5KnBoh5nb9TO8zpZeU-qQg@mail.gmail.com>
	<alpine.DEB.2.02.1410310251330.6113@nftneq.ynat.uz>
Message-ID: <CAGB1Vvx2+G30BVBqqKjk99fC4APN0wGE3oT=AzRpnVwuYSo1Og@mail.gmail.com>

It writes it as a JSON array; here is a fragment from my manual tests:

[ "15", "26", "15" ]

It was using a time in hh:mm:ss format and tokenizing by colon (:). I'll add
tests for it soon, but until then pasting output here is the best I can do.

The idea behind this is to generate structured content from semi-structured
or unstructured log messages. So an array is a good representation for a
tokenized value (it is multi-valued by nature, and an array is a good way to
represent that).
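
As a rough illustration of the output shape (not the patch itself), the
hh:mm:ss case can be mimicked in Python. tokenize_to_json is a hypothetical
helper, and Python's json module spaces the array slightly differently from
the fragment above.

```python
import json


def tokenize_to_json(value: str, separator: str) -> str:
    """Split a value on the separator and serialize the parts as a JSON array."""
    return json.dumps(value.split(separator))


print(tokenize_to_json("15:26:15", ":"))  # prints ["15", "26", "15"]
```

Each token stays a string here; typing the elements would be the job of the
per-token field-type.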

But eventually we should allow the user to register value-transformers so that
a value can be pre-processed before it's emitted. Maybe have a canned set of
transformers, and allow the user to plug in new ones.

My first instinct was to utilize variable support for this; in fact this was
the motivator for variable support. But it still leads to a fairly complex
config for an access log with 15 - 20 fields, especially given those fields
can have colon-separated entries inside comma-separated entries etc.

So I felt the need for a simpler way of doing it, hence this and the other
(recurse) field-type.



-- 
Regards,
Janmejay
http://codehunk.wordpress.com

From david at lang.hm  Fri Oct 31 11:14:33 2014
From: david at lang.hm (David Lang)
Date: Fri, 31 Oct 2014 03:14:33 -0700 (PDT)
Subject: [Lognorm] Tokenized-multivalue field-type for liblognorm
In-Reply-To: <CAGB1Vvx2+G30BVBqqKjk99fC4APN0wGE3oT=AzRpnVwuYSo1Og@mail.gmail.com>
References: <CAGB1VvzYqFhaGPa0=BNZ0btcjqZRryMJe1HR9ZVytAMVKTS__w@mail.gmail.com>
	<54532DA5.8030107@levshin.spb.ru>
	<CAGB1VvyP+Xdqw2KgwsCmNhR+WgGc5KnBoh5nb9TO8zpZeU-qQg@mail.gmail.com>
	<alpine.DEB.2.02.1410310251330.6113@nftneq.ynat.uz>
	<CAGB1Vvx2+G30BVBqqKjk99fC4APN0wGE3oT=AzRpnVwuYSo1Og@mail.gmail.com>
Message-ID: <alpine.DEB.2.02.1410310310390.6113@nftneq.ynat.uz>

On Fri, 31 Oct 2014, singh.janmejay wrote:

> It writes it as a json array, here is a fragment from my manual tests:
>
> [ "15", "26", "15" ]

Right, but how do you access it in rsyslog?

If you have { 'foo': { 'bar': '10' } } you access this as $!foo!bar and get the
result '10'.

What would you use to access the value '26' in your example?

We also don't have anything like foreach() in our template language, which makes
it hard to make use of these values as anything other than a JSON string.

I'm not saying that it's not useful, but I am pointing out the problems that we
will have using it.

David Lang


From singh.janmejay at gmail.com  Fri Oct 31 11:25:34 2014
From: singh.janmejay at gmail.com (singh.janmejay)
Date: Fri, 31 Oct 2014 15:55:34 +0530
Subject: [Lognorm] Tokenized-multivalue field-type for liblognorm
In-Reply-To: <alpine.DEB.2.02.1410310310390.6113@nftneq.ynat.uz>
References: <CAGB1VvzYqFhaGPa0=BNZ0btcjqZRryMJe1HR9ZVytAMVKTS__w@mail.gmail.com>
	<54532DA5.8030107@levshin.spb.ru>
	<CAGB1VvyP+Xdqw2KgwsCmNhR+WgGc5KnBoh5nb9TO8zpZeU-qQg@mail.gmail.com>
	<alpine.DEB.2.02.1410310251330.6113@nftneq.ynat.uz>
	<CAGB1Vvx2+G30BVBqqKjk99fC4APN0wGE3oT=AzRpnVwuYSo1Og@mail.gmail.com>
	<alpine.DEB.2.02.1410310310390.6113@nftneq.ynat.uz>
Message-ID: <CAGB1Vvyhf=DJefWhpS4zJb92BEy0f95mpd4482qu2EQHCRJVPw@mail.gmail.com>

Yes, I didn't have a need to address tokens individually, but you have a
point.

Any suggestions on what we want to do for addressing array elements?

I wonder if it's possible to do in $!... notation without breaking backward
compatibility. How about a function?

I'll be happy to implement support for addressing it in $!... notation if
we don't mind breaking a corner case in backward compatibility. E.g.
$!foo!bar![0] ? It's kinda ugly though, or so I think.
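
To make the proposal concrete, here is a small Python sketch of how a path
in $!... notation with an extra [N] component might be resolved against a
JSON tree. It only illustrates the idea floated above; the [N] syntax is a
suggestion from this thread, not existing rsyslog behaviour, and resolve and
the event data are made up for the example.

```python
import re
from typing import Any


def resolve(tree: Any, path: str) -> Any:
    """Walk a '$!a!b![0]'-style path through nested dicts and lists."""
    if not path.startswith("$!"):
        raise ValueError("path must start with '$!'")
    node = tree
    for part in path[2:].split("!"):
        index = re.fullmatch(r"\[(\d+)\]", part)
        if index:
            node = node[int(index.group(1))]  # proposed array addressing
        else:
            node = node[part]  # existing key lookup
    return node


event = {"foo": {"bar": "10"}, "time": ["15", "26", "15"]}
print(resolve(event, "$!foo!bar"))   # prints 10
print(resolve(event, "$!time![1]"))  # prints 26
```

The backward-compatibility corner case shows up directly in the sketch: a
key that is literally named "[0]" could no longer be addressed this way.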

On Fri, Oct 31, 2014 at 3:44 PM, David Lang <david at lang.hm> wrote:

> On Fri, 31 Oct 2014, singh.janmejay wrote:
>
>> It writes it as a JSON array; here is a fragment from my manual tests:
>>
>> [ "15", "26", "15" ]
>
> right, but how do you access it in rsyslog?
>
> if you have { 'foo': { 'bar': '10' } } you access this as $!foo!bar and
> get the result '10'
>
> what would you use to access the value '26' in your example?
>
> we also don't have anything like foreach() in our template language, which
> makes it hard to make use of these values as anything other than a JSON
> string.
>
> I'm not saying that it's not useful, but I am pointing out the problems
> that we will have using it.
>
> David Lang
>
>> It was using time in hh:mm:ss format and tokenizing by colon (:). I'll add
>> tests for it soon, but until then pasting output here is the best I can
>> do.
>>
>> The idea behind this is to generate structured content from semi-structured
>> or unstructured log messages. So an array is a good representation for a
>> tokenized value (it is multi-valued by nature, and an array is a good way
>> to represent that).
>>
>> But eventually we should allow the user to register value-transformers so
>> that a value can be pre-processed before it's emitted. Maybe have a canned
>> set of transformers, and allow the user to plug in new ones.
>>
>> My first instinct was to utilize variable support for this; in fact this was
>> the motivator for variable support. But it still leads to a fairly complex
>> config for an access log with 15 - 20 fields, especially given those fields
>> can have colon-separated entries inside comma-separated entries etc.
>>
>> So I felt the need for a simpler way of doing it, hence this and the other
>> (recurse) field-type.
>>
>> On Fri, Oct 31, 2014 at 3:23 PM, David Lang <david at lang.hm> wrote:
>>
>>> On Fri, 31 Oct 2014, singh.janmejay wrote:
>>>
>>>> Tokenizer followed by tokenizer is something that I have in mind too. But
>>>> I promised myself that I'd write a test for that instead of testing it
>>>> manually :-). Will add that patch on this thread once I get a chance to
>>>> work on it.
>>>
>>> At least in the short term, you can use the ability to call mmnormalize on
>>> a variable to parse subvariables.
>>>
>>> How are the resulting fields addressed? Rsyslog hasn't had array
>>> addressing yet.
>>>
>>> David Lang
>>>
>>>> However, since you are asking about those kinds of forms, let me discuss
>>>> something else that I was thinking about.
>>>>
>>>> The idea is to have another field type called recurse.
>>>>
>>>> Similar to how tokenized uses a ctx to parse matching text, recurse will
>>>> parse it using the current context. AFAIK, the context is stateless, so I
>>>> don't see any problems with that. I also plan to support tag-based picking
>>>> of which rules the text may match, and if it matches something else, it
>>>> should be considered no-match.
>>>>
>>>> Instead of typing it out here, I'll attach a picture I took after thinking
>>>> through it briefly (I'll attach it to the next mail).


-- 
Regards,
Janmejay
http://codehunk.wordpress.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.adiscon.net/pipermail/lognorm/attachments/20141031/21274db3/attachment-0001.html>

From david at lang.hm  Fri Oct 31 11:45:05 2014
From: david at lang.hm (David Lang)
Date: Fri, 31 Oct 2014 03:45:05 -0700 (PDT)
Subject: [Lognorm] Tokenized-multivalue field-type for liblognorm
In-Reply-To: <CAGB1Vvyhf=DJefWhpS4zJb92BEy0f95mpd4482qu2EQHCRJVPw@mail.gmail.com>
References: <CAGB1VvzYqFhaGPa0=BNZ0btcjqZRryMJe1HR9ZVytAMVKTS__w@mail.gmail.com>
	<54532DA5.8030107@levshin.spb.ru>
	<CAGB1VvyP+Xdqw2KgwsCmNhR+WgGc5KnBoh5nb9TO8zpZeU-qQg@mail.gmail.com>
	<alpine.DEB.2.02.1410310251330.6113@nftneq.ynat.uz>
	<CAGB1Vvx2+G30BVBqqKjk99fC4APN0wGE3oT=AzRpnVwuYSo1Og@mail.gmail.com>
	<alpine.DEB.2.02.1410310310390.6113@nftneq.ynat.uz>
	<CAGB1Vvyhf=DJefWhpS4zJb92BEy0f95mpd4482qu2EQHCRJVPw@mail.gmail.com>
Message-ID: <alpine.DEB.2.02.1410310341220.6113@nftneq.ynat.uz>

On Fri, 31 Oct 2014, singh.janmejay wrote:

> Yes, I didn't have a need to address tokens individually, but you have a
> point.
>
> Any suggestions on what we want to do for addressing array elements?
>
> I wonder if its possible to do in $!... notation without breaking backward
> compatibility. How about a function?
>
> I'll be happy to implement support for addressing it in $!... notation if
> don't mind breaking a corner case in backward compatibility. Eg.
> $!foo!bar![0] ? Its kinda ugly though, or so I think.

I was thinking just $!foo!bar[0]. It's a bit ugly, but not too bad. It does mean
that you can't have '[' in a variable name, but I don't think that's likely to
be a real problem. I don't think there's ever a really clean way to do something
like $!foo[2]!bar[2]!baz; no matter what your syntax, it gets messy.
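
A minimal sketch of this addressing scheme, written in Python rather than
rsyslog's own code, and only under the assumptions stated here: path segments
are separated by '!', a segment may carry any number of trailing [N] indexes,
and '[' is therefore not allowed inside a name. The function name resolve and
the sample data are hypothetical.

```python
import re

# One path segment: a name without '!', '[' or ']', then zero or more [N] indexes.
_SEGMENT = re.compile(r"^([^!\[\]]+)((?:\[\d+\])*)$")
_INDEX = re.compile(r"\[(\d+)\]")


def resolve(path, root):
    """Resolve a path such as '$!foo!bar[0]' against nested dicts and lists."""
    if not path.startswith("$!"):
        raise ValueError("path must start with '$!'")
    node = root
    for segment in path[2:].split("!"):
        match = _SEGMENT.match(segment)
        if match is None:
            raise ValueError("bad segment: %r" % segment)
        name, indexes = match.groups()
        node = node[name]                     # descend into the object
        for index in _INDEX.findall(indexes):
            node = node[int(index)]           # then into each array level
    return node


event = {"foo": {"bar": ["15", "26", "15"]}}
value = resolve("$!foo!bar[1]", event)

nested = {"foo": [0, 1, {"bar": [0, 0, {"baz": "x"}]}]}
deep = resolve("$!foo[2]!bar[2]!baz", nested)
```

Names that contain '[' are rejected outright here, which is the backward
compatibility corner case mentioned above.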

For templates, we probably need some sort of foreach(array, pattern) function
that takes the pattern and repeats it for each item in the array.
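
What such a foreach(array, pattern) could do, sketched in Python as an
illustration only. The '%item%' placeholder and the joiner argument are
assumptions made for this example; nothing like this exists in the template
language yet.

```python
def foreach(array, pattern, joiner=""):
    """Repeat pattern once per array item, substituting the '%item%' placeholder."""
    return joiner.join(pattern.replace("%item%", str(item)) for item in array)


line = foreach(["15", "26", "15"], "part=%item%", joiner=" ")
empty = foreach([], "part=%item%")
```

An empty array simply yields an empty string, so a template using it would not
need a special case for fields that matched no tokens.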

David Lang

-------------- next part --------------
_______________________________________________
Lognorm mailing list
Lognorm at lists.adiscon.com
http://lists.adiscon.net/mailman/listinfo/lognorm

From singh.janmejay at gmail.com  Fri Oct 31 14:08:59 2014
From: singh.janmejay at gmail.com (singh.janmejay)
Date: Fri, 31 Oct 2014 18:38:59 +0530
Subject: [Lognorm] Tokenized-multivalue field-type for liblognorm
In-Reply-To: <alpine.DEB.2.02.1410310341220.6113@nftneq.ynat.uz>
References: <CAGB1VvzYqFhaGPa0=BNZ0btcjqZRryMJe1HR9ZVytAMVKTS__w@mail.gmail.com>
	<54532DA5.8030107@levshin.spb.ru>
	<CAGB1VvyP+Xdqw2KgwsCmNhR+WgGc5KnBoh5nb9TO8zpZeU-qQg@mail.gmail.com>
	<alpine.DEB.2.02.1410310251330.6113@nftneq.ynat.uz>
	<CAGB1Vvx2+G30BVBqqKjk99fC4APN0wGE3oT=AzRpnVwuYSo1Og@mail.gmail.com>
	<alpine.DEB.2.02.1410310310390.6113@nftneq.ynat.uz>
	<CAGB1Vvyhf=DJefWhpS4zJb92BEy0f95mpd4482qu2EQHCRJVPw@mail.gmail.com>
	<alpine.DEB.2.02.1410310341220.6113@nftneq.ynat.uz>
Message-ID: <CAGB1Vvws8svhkHXowX59TqW__oZFqERXgRDC2peCfv13k03wYQ@mail.gmail.com>

Cool, I'll implement $!foo!bar[0].

Let us process this patch-set, because it is kinda hard to keep track of old
patches and re-send in one shot.

I'll send the new patch once done (I'll only get to work on it on
Monday).

Do existing patches look ok except for the indexed-addressing feature?



-- 
Regards,
Janmejay
http://codehunk.wordpress.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.adiscon.net/pipermail/lognorm/attachments/20141031/c3fd0f77/attachment.html>

From rgerhards at hq.adiscon.com  Fri Oct 31 14:12:29 2014
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Fri, 31 Oct 2014 14:12:29 +0100
Subject: [Lognorm] Tokenized-multivalue field-type for liblognorm
In-Reply-To: <CAGB1Vvws8svhkHXowX59TqW__oZFqERXgRDC2peCfv13k03wYQ@mail.gmail.com>
References: <CAGB1VvzYqFhaGPa0=BNZ0btcjqZRryMJe1HR9ZVytAMVKTS__w@mail.gmail.com>
	<54532DA5.8030107@levshin.spb.ru>
	<CAGB1VvyP+Xdqw2KgwsCmNhR+WgGc5KnBoh5nb9TO8zpZeU-qQg@mail.gmail.com>
	<alpine.DEB.2.02.1410310251330.6113@nftneq.ynat.uz>
	<CAGB1Vvx2+G30BVBqqKjk99fC4APN0wGE3oT=AzRpnVwuYSo1Og@mail.gmail.com>
	<alpine.DEB.2.02.1410310310390.6113@nftneq.ynat.uz>
	<CAGB1Vvyhf=DJefWhpS4zJb92BEy0f95mpd4482qu2EQHCRJVPw@mail.gmail.com>
	<alpine.DEB.2.02.1410310341220.6113@nftneq.ynat.uz>
	<CAGB1Vvws8svhkHXowX59TqW__oZFqERXgRDC2peCfv13k03wYQ@mail.gmail.com>
Message-ID: <CADk+mPAKD0tS54a=9WfeKeKAsbs9q8B4kXEOHub1PZWwtHZCbg@mail.gmail.com>

2014-10-31 14:08 GMT+01:00 singh.janmejay <singh.janmejay at gmail.com>:

> Cool, I'll implement $!foo!bar[0].

+1

> Let us process this patch-set, because it is kinda hard to keep track of old
> patches and re-send in one shot.

Would you mind cloning on GitHub and maintaining a feature branch there? That
would make it much easier for me, as I could merge the branch when you are
done. If not, it's no problem and I'll maintain that branch.

> I'll send the new patch once done (I'll only get to work on it on
> Monday).

I haven't had a chance to look, as I am now busy building test environments
and looking at the testbench [yes, one guy!] ;) But I see Pavel has asked
some questions. He recently did a lot of work on the lib, so it is best to
coordinate that part with him.

Rainer

> Do existing patches look ok except for the indexed-addressing feature?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.adiscon.net/pipermail/lognorm/attachments/20141031/6d5cf2dd/attachment-0001.html>

From singh.janmejay at gmail.com  Fri Oct 31 14:16:53 2014
From: singh.janmejay at gmail.com (singh.janmejay)
Date: Fri, 31 Oct 2014 18:46:53 +0530
Subject: [Lognorm] Tokenized-multivalue field-type for liblognorm
In-Reply-To: <CADk+mPAKD0tS54a=9WfeKeKAsbs9q8B4kXEOHub1PZWwtHZCbg@mail.gmail.com>
References: <CAGB1VvzYqFhaGPa0=BNZ0btcjqZRryMJe1HR9ZVytAMVKTS__w@mail.gmail.com>
	<54532DA5.8030107@levshin.spb.ru>
	<CAGB1VvyP+Xdqw2KgwsCmNhR+WgGc5KnBoh5nb9TO8zpZeU-qQg@mail.gmail.com>
	<alpine.DEB.2.02.1410310251330.6113@nftneq.ynat.uz>
	<CAGB1Vvx2+G30BVBqqKjk99fC4APN0wGE3oT=AzRpnVwuYSo1Og@mail.gmail.com>
	<alpine.DEB.2.02.1410310310390.6113@nftneq.ynat.uz>
	<CAGB1Vvyhf=DJefWhpS4zJb92BEy0f95mpd4482qu2EQHCRJVPw@mail.gmail.com>
	<alpine.DEB.2.02.1410310341220.6113@nftneq.ynat.uz>
	<CAGB1Vvws8svhkHXowX59TqW__oZFqERXgRDC2peCfv13k03wYQ@mail.gmail.com>
	<CADk+mPAKD0tS54a=9WfeKeKAsbs9q8B4kXEOHub1PZWwtHZCbg@mail.gmail.com>
Message-ID: <CAGB1VvxK9HwuLQptb48i8i4ubEAUF7GFAuEuS9jOdcsaAJKBHg@mail.gmail.com>

Sure, I'll fork on github.



-- 
Regards,
Janmejay
http://codehunk.wordpress.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.adiscon.net/pipermail/lognorm/attachments/20141031/429c6df7/attachment.html>