From singh.janmejay at gmail.com Thu Oct 30 12:03:01 2014
From: singh.janmejay at gmail.com (singh.janmejay)
Date: Thu, 30 Oct 2014 16:33:01 +0530
Subject: [Lognorm] Tokenized-multivalue field-type for liblognorm
Message-ID:

Hi,

This patch-set introduces a log-norm field-type called tokenized, which
allows parsing of token-separated values.

A lot of applications, such as nginx, write fields in logs that are
comma+space separated etc. For instance, the nginx upstream_addrs field
writes comma-separated ip+port combinations to access logs.

Parsing such logs takes a significant amount of regex and exec-template work
and leads to a rather ugly solution for something as simple as a tokenized
string.

With this patch, parsing a list of ip-addresses separated by ', ' (comma +
space), for instance, would require a rule similar to:

rule=ips:%my_ips:tokenized:, :ipv4%

This requires a small patch to libestr as well, so this mail has 3 patches
attached.

libestr patch:

0001-Changed-some-functions-that-don-t-modify-their-arg-t.patch

liblognorm patch:

0001-Moved-from-parser-receving-data-as-escaped-string-to.patch
0002-added-support-for-field_type-tokenized-which-parses-.patch

Patches go in order of prefix-number.

--
Regards,
Janmejay
http://codehunk.wordpress.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-added-support-for-field_type-tokenized-which-parses-.patch
Type: text/x-patch
Size: 9316 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Changed-some-functions-that-don-t-modify-their-arg-t.patch
Type: text/x-patch
Size: 3202 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Moved-from-parser-receving-data-as-escaped-string-to.patch
Type: text/x-patch
Size: 8336 bytes
Desc: not available
URL:

From singh.janmejay at gmail.com Thu Oct 30 12:11:03 2014
From: singh.janmejay at gmail.com (singh.janmejay)
Date: Thu, 30 Oct 2014 16:41:03 +0530
Subject: [Lognorm] Tokenized-multivalue field-type for liblognorm
In-Reply-To:
References:
Message-ID:

The token-string can be escaped using the same mechanism as char-to, e.g.
\x3a for colon (:).

Also, the tokenized field-type lets the user pick the field-type of each
tokenized fragment, and it produces a multi-valued variable (a JSON array),
similar to event.tags.

--
Regards,
Janmejay
http://codehunk.wordpress.com

From david at lang.hm Fri Oct 31 00:42:40 2014
From: david at lang.hm (David Lang)
Date: Thu, 30 Oct 2014 16:42:40 -0700 (PDT)
Subject: [Lognorm] Tokenized-multivalue field-type for liblognorm
In-Reply-To:
References:
Message-ID:

On Thu, 30 Oct 2014, singh.janmejay wrote:

> With this patch, parsing a list of ip-addresses separated by ', '(comma +
> space) for instance, would require a rule similar to:
>
> rule=ips:%my_ips:tokenized:, :ipv4%

What terminates the list?

It looks like this allows multi-character tokens, is that correct?

David Lang
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-added-support-for-field_type-tokenized-which-parses-.patch
Type: text/x-patch
Size: 9316 bytes
Desc:
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Changed-some-functions-that-don-t-modify-their-arg-t.patch
Type: text/x-patch
Size: 3202 bytes
Desc:
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Moved-from-parser-receving-data-as-escaped-string-to.patch
Type: text/x-patch
Size: 8336 bytes
Desc:
URL:
-------------- next part --------------
_______________________________________________
Lognorm mailing list
Lognorm at lists.adiscon.com
http://lists.adiscon.net/mailman/listinfo/lognorm

From singh.janmejay at gmail.com Fri Oct 31 02:56:08 2014
From: singh.janmejay at gmail.com (singh.janmejay)
Date: Fri, 31 Oct 2014 07:26:08 +0530
Subject: [Lognorm] Tokenized-multivalue field-type for liblognorm
In-Reply-To:
References:
Message-ID:

On Fri, Oct 31, 2014 at 5:12 AM, David Lang wrote:

> What terminates the list?
>
> It looks like this allows multi-character tokens, is that correct?

Yes, it does allow multiple chars.

The match may stop for one of the following three reasons:

1. The last set of chars matched the tokenizer, but the next set of
characters doesn't match the field-type (as in, they don't match ipv4).
Eg.
text: "10, 20, 30, abcd"
match stops at: "... 30"
remaining text: ", abcd"

2. The next set of chars doesn't match the tokenizer.
Eg.
text: "10, 20 30, abcd"
match stops at: "...20"
remaining text: " 30, abcd"

3. The parser reaches EOL.

--
Regards,
Janmejay
http://codehunk.wordpress.com

From pavel at levshin.spb.ru Fri Oct 31 07:35:17 2014
From: pavel at levshin.spb.ru (Pavel Levshin)
Date: Fri, 31 Oct 2014 09:35:17 +0300
Subject: [Lognorm] Tokenized-multivalue field-type for liblognorm
In-Reply-To:
References:
Message-ID: <54532DA5.8030107@levshin.spb.ru>

Hi,

I'll look at this a little later.

Do you use it in production? Is this (JSON arrays) compatible with the
lognormalizer tool? Can a %tokenized field contain other %tokenized fields
(i.e., allow for recursion)? Would you write some docs on the feature?

Why do you use the 'const' modifier for non-pointer arguments, for example,
'const unsigned char c'?


--
Pavel

From singh.janmejay at gmail.com Fri Oct 31 07:46:55 2014
From: singh.janmejay at gmail.com (singh.janmejay)
Date: Fri, 31 Oct 2014 12:16:55 +0530
Subject: [Lognorm] Tokenized-multivalue field-type for liblognorm
In-Reply-To: <54532DA5.8030107@levshin.spb.ru>
References: <54532DA5.8030107@levshin.spb.ru>
Message-ID:

On Fri, Oct 31, 2014 at 12:05 PM, Pavel Levshin wrote:

> Hi,
>
> I'll look at this little later.
>
> Do you use it in production? Is this (JSON arrays) compatible with
> lognormalizer tool? Can a %tokenized field contain another %tokenized
> fields (i.e., allow for recursion)? Would you write some docs on the
> feature?
>
> Why do you use 'const' modifier for non-pointer arguments, for example,
> 'const unsigned char c'?

Const modifier for non-pointer args is just habit, it's not intentional.

I have done a lot of testing locally (on my box), but it's not on my prod
cluster yet.

Tokenizer followed by tokenizer is something that I have in mind too. But I
promised myself that I'd write a test for that instead of testing it
manually :-). Will add that patch on this thread once I get a chance to work
on it.

However, since you are asking about those kinds of forms, let me discuss
something else that I was thinking about.

The idea is to have another field type called recurse.

Similar to how tokenized uses a ctx to parse matching text, recurse will
parse it using the current context.
AFAIK, the context is stateless, so I don't see any problems with that. I
also plan to support tag-based picking of which rules the text may match; if
it matches something else, it should be considered a no-match.

Instead of typing it out here, I'll attach a picture I took after thinking
through it briefly (I'll attach it to the next mail).

--
Regards,
Janmejay
http://codehunk.wordpress.com

From singh.janmejay at gmail.com Fri Oct 31 08:34:01 2014
From: singh.janmejay at gmail.com (singh.janmejay)
Date: Fri, 31 Oct 2014 13:04:01 +0530
Subject: [Lognorm] Tokenized-multivalue field-type for liblognorm
In-Reply-To:
References: <54532DA5.8030107@levshin.spb.ru>
Message-ID:

Ok, the last mail was caught for moderation because of the large attachment
size. Here is the url to the image:
https://drive.google.com/file/d/0B_XhUZLNFT4dN3RqdGE2VmN5UW1lMDZITkN4WW5wUUxQOE9F/view?usp=sharing

On Fri, Oct 31, 2014 at 12:19 PM, singh.janmejay wrote:

> Here is the image. I'll type it out if it's illegible.
>
> --
> Regards,
> Janmejay
>
> PS: Please blame the typos in this mail on my phone's uncivilized soft
> keyboard sporting it's not-so-smart-assist technology.

--
Regards,
Janmejay
http://codehunk.wordpress.com

From david at lang.hm Fri Oct 31 10:53:42 2014
From: david at lang.hm (David Lang)
Date: Fri, 31 Oct 2014 02:53:42 -0700 (PDT)
Subject: [Lognorm] Tokenized-multivalue field-type for liblognorm
In-Reply-To:
References: <54532DA5.8030107@levshin.spb.ru>
Message-ID:

On Fri, 31 Oct 2014, singh.janmejay wrote:

> Tokenizer followed by tokenizer is something that I have in mind too. But I
> promised myself that i'd write a test for that instead of testing it
> manually :-). Will add that patch on this thread once I get a chance to
> work on it.

At least in the short term, you can use the ability to call mmnormalize on a
variable to parse subvariables.

How are the resulting fields addressed? Rsyslog hasn't had array addressing
yet.

David Lang

From singh.janmejay at gmail.com Fri Oct 31 11:06:21 2014
From: singh.janmejay at gmail.com (singh.janmejay)
Date: Fri, 31 Oct 2014 15:36:21 +0530
Subject: [Lognorm] Tokenized-multivalue field-type for liblognorm
In-Reply-To:
References: <54532DA5.8030107@levshin.spb.ru>
Message-ID:

It writes it as a JSON array; here is a fragment from my manual tests:

[ "15", "26", "15" ]

It was using time in hh:mm:ss format and tokenizing by colon (:). I'll add
tests for it soon, but until then pasting output here is the best I can do.

The idea behind this is to generate structured content from semi-structured
or unstructured log messages. So an array is a good representation for a
tokenized value (it is multi-valued by nature).

But eventually we should allow the user to register value-transformers so
that a value can be pre-processed before it's emitted. Maybe have a canned
set of transformers, and allow the user to plug in new ones.

My first instinct was to utilize variable support for this; in fact this was
the motivator for variable support. But it still leads to a fairly complex
config for an access log with 15 - 20 fields, especially given that those
fields can have colon-separated entries inside comma-separated entries etc.

So I felt the need for a simpler way of doing it, hence this and the other
(recurse) field-type.

--
Regards,
Janmejay
http://codehunk.wordpress.com

From david at lang.hm Fri Oct 31 11:14:33 2014
From: david at lang.hm (David Lang)
Date: Fri, 31 Oct 2014 03:14:33 -0700 (PDT)
Subject: [Lognorm] Tokenized-multivalue field-type for liblognorm
In-Reply-To:
References: <54532DA5.8030107@levshin.spb.ru>
Message-ID:

On Fri, 31 Oct 2014, singh.janmejay wrote:

> It writes it as a json array, here is a fragment from my manual tests:
>
> [ "15", "26", "15" ]

Right, but how do you access it in rsyslog?

If you have { 'foo': { 'bar': '10' } } you access this as $!foo!bar and get
the result '10'.

What would you use to access the value '26' in your example?

We also don't have anything like foreach() in our template language, which
makes it hard to make use of these values as anything other than a JSON
string.

I'm not saying that it's not useful, but I am pointing out the problems that
we will have using it.

David Lang

From singh.janmejay at gmail.com Fri Oct 31 11:25:34 2014
From: singh.janmejay at gmail.com (singh.janmejay)
Date: Fri, 31 Oct 2014 15:55:34 +0530
Subject: [Lognorm] Tokenized-multivalue field-type for liblognorm
In-Reply-To:
References: <54532DA5.8030107@levshin.spb.ru>
Message-ID:

Yes, I didn't have a need to address tokens individually, but you have a
point.

Any suggestions on what we want to do for addressing array elements?

I wonder if it's possible to do in $!... notation without breaking backward
compatibility. How about a function?

I'll be happy to implement support for addressing it in $!... notation if
you don't mind breaking a corner case in backward compatibility. Eg.
$!foo!bar![0] ? It's kinda ugly though, or so I think.
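
One way to read the $!foo!bar![0] idea is to treat [N] as just another
!-separated path segment. The sketch below is plain Python for illustration,
not rsyslog code. The resolve() helper is hypothetical, and the rule that [N]
only acts as an index when the current node is an array is an assumption made
here to show how the backward-compatibility corner case (an existing key
literally named "[0]") might be sidestepped.

```python
import json
import re

# Hypothetical helper, not part of rsyslog: walks a '$!a!b![N]' style path.
INDEX_SEGMENT = re.compile(r"\[(\d+)\]")

def resolve(event, path):
    """Return the value that a '$!'-rooted path selects from a parsed JSON event."""
    if not path.startswith("$!"):
        raise ValueError("path must start with '$!'")
    node = event
    for segment in path[2:].split("!"):
        match = INDEX_SEGMENT.fullmatch(segment)
        if match and isinstance(node, list):
            # '[N]' indexes into an array only when the current node is one.
            node = node[int(match.group(1))]
        else:
            # Otherwise the segment is an ordinary object key, even if it
            # happens to look like '[0]'.
            node = node[segment]
    return node

event = json.loads('{"foo": {"bar": ["15", "26", "15"]}}')
print(resolve(event, "$!foo!bar"))      # the whole array
print(resolve(event, "$!foo!bar![1]"))  # the second token
```

With this reading, existing paths keep their meaning and only paths that walk
into an array gain the index behaviour.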

--
Regards,
Janmejay
http://codehunk.wordpress.com

From david at lang.hm Fri Oct 31 11:45:05 2014
From: david at lang.hm (David Lang)
Date: Fri, 31 Oct 2014 03:45:05 -0700 (PDT)
Subject: [Lognorm] Tokenized-multivalue field-type for liblognorm
In-Reply-To:
References: <54532DA5.8030107@levshin.spb.ru>
Message-ID:

On Fri, 31 Oct 2014, singh.janmejay wrote:

> Yes, I didn't have a need to address tokens individually, but you have a
> point.
>
> Any suggestions on what we want to do for addressing array elements?
>
> I wonder if its possible to do in $!... notation without breaking backward
> compatibility. How about a function?
>
> I'll be happy to implement support for addressing it in $!... notation if
> don't mind breaking a corner case in backward compatibility. Eg.
> $!foo!bar![0] ? Its kinda ugly though, or so I think.

I was thinking just $!foo!bar[0].

It's a bit ugly, but not too bad. It does mean that you can't have '[' in a
variable name, but I don't think that's likely to be a real problem.

I don't think there's ever a really clean way to do something like

$!foo[2]!bar[2]!baz

No matter what your syntax, it gets messy.

For templates, we probably need some sort of foreach(array, pattern)
function that takes the pattern and repeats it for each item in the array.

David Lang
>> >> David Lang >> >> >> It was using time in hh:mm:ss format and tokening by colon(:). I'll add >>> tests for it soon, but until then pasting output here is the best I can >>> do. >>> >>> The idea behind this is to generate structured content from >>> semi-structured >>> or unstructured log messages. So array is a good representation for >>> tokenized-value (it is multi-valued by nature, and array is a good way to >>> represent that). >>> >>> But eventually we should allow user to register value-transformers so that >>> it can be pre-processed before its emitted. May be have a canned set of >>> transformers, and allow user to plug in new ones. >>> >>> My first instinct was to utilize variable support for this, infact this >>> was >>> the motivator for variable support. But it still leads to a fairly complex >>> config for an access log with 15 - 20 fields, especially given those >>> fields >>> can have colon separated entries inside comma separated entries etc. >>> >>> So I felt the need for a simpler way of doing it, hence this and other >>> (recurse) field-type. >>> >>> On Fri, Oct 31, 2014 at 3:23 PM, David Lang wrote: >>> >>> On Fri, 31 Oct 2014, singh.janmejay wrote: >>>> >>>> Tokenizer followed by tokenizer is something that I have in mind too. >>>> But >>>> >>>>> I >>>>> promised myself that i'd write a test for that instead of testing it >>>>> manually :-). Will add that patch on this thread once I get a chance to >>>>> work on it. >>>>> >>>>> >>>> At least in the short term, you can use the ability to call mmnormalize >>>> on >>>> a variable to parse subvariables. >>>> >>>> How are the resulting fields addressed? Rsyslog hasn't had array >>>> addressing yet. >>>> >>>> David Lang >>>> >>>> >>>> However, since you are asking about those kind of forms, let met discuss >>>> >>>>> something else that I was thinking about. >>>>> >>>>> The idea is to have another field type called recurse. 
>>>>> >>>>> Similar to how tokenized uses a ctx to parse matching text, recurse will >>>>> parse it using the current context. AFAIK, the context is stateless, so >>>>> I >>>>> don't see any problems with that. I also plan to support tag based >>>>> picking >>>>> of which rules the text may match, and if it matches something else, it >>>>> should be considered no-match. >>>>> >>>>> Instead of typing it out here, i'll attach a picture I took after >>>>> thinking >>>>> through it briefly(i'll attach it to the next mail). >>>>> >>>>> >>>>> _______________________________________________ >>>> Lognorm mailing list >>>> Lognorm at lists.adiscon.com >>>> http://lists.adiscon.net/mailman/listinfo/lognorm >>>> >>>> _______________________________________________ >>>> Lognorm mailing list >>>> Lognorm at lists.adiscon.com >>>> http://lists.adiscon.net/mailman/listinfo/lognorm >>>> >>>> >>>> >>> >>> >> _______________________________________________ >> Lognorm mailing list >> Lognorm at lists.adiscon.com >> http://lists.adiscon.net/mailman/listinfo/lognorm >> >> _______________________________________________ >> Lognorm mailing list >> Lognorm at lists.adiscon.com >> http://lists.adiscon.net/mailman/listinfo/lognorm >> >> > > > -------------- next part -------------- _______________________________________________ Lognorm mailing list Lognorm at lists.adiscon.com http://lists.adiscon.net/mailman/listinfo/lognorm From singh.janmejay at gmail.com Fri Oct 31 14:08:59 2014 From: singh.janmejay at gmail.com (singh.janmejay) Date: Fri, 31 Oct 2014 18:38:59 +0530 Subject: [Lognorm] Tokenized-multivalue field-type for liblognorm In-Reply-To: References: <54532DA5.8030107@levshin.spb.ru> Message-ID: Cool, I'll implement $!foo!bar[0]. Let us process this patch-set, because is kinda hard to keep track of old patches and re-send in one shot. i'll send the new patch once done(i'll now only get to work on it on monday). Do existing patches look ok except for the indexed-addressing feature? 
--
Regards,
Janmejay
http://codehunk.wordpress.com
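The $!foo!bar[0] addressing agreed on above can be sketched as a small path resolver over a parsed JSON tree. This is only an illustration of the proposed syntax, assuming '!' separates object members and a trailing [n] indexes into an array; the resolve() helper is hypothetical and is not how rsyslog implements property lookup.

```python
import re

# One path segment: a member name (no brackets), then zero or more [n].
_SEGMENT = re.compile(r"^([^\[\]]+)((?:\[\d+\])*)$")


def resolve(tree, path):
    """Resolve a "$!a!b[0]" style path against nested dicts and lists."""
    if not path.startswith("$!"):
        raise ValueError("path must start with '$!'")
    node = tree
    for segment in path[2:].split("!"):
        match = _SEGMENT.match(segment)
        if match is None:
            raise ValueError("bad path segment: %r" % segment)
        name, indexes = match.groups()
        node = node[name]  # object member lookup
        for index in re.findall(r"\[(\d+)\]", indexes):
            node = node[int(index)]  # array element lookup
    return node


event = {"foo": {"bar": ["15", "26", "15"]}}
print(resolve(event, "$!foo!bar[1]"))  # 26
```

Because '[' ends the member name in this sketch, a variable name containing '[' can no longer be addressed, which is the backward-compatibility corner case mentioned in the discussion.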
From rgerhards at hq.adiscon.com Fri Oct 31 14:12:29 2014
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Fri, 31 Oct 2014 14:12:29 +0100
Subject: [Lognorm] Tokenized-multivalue field-type for liblognorm
In-Reply-To:
References: <54532DA5.8030107@levshin.spb.ru>
Message-ID:

2014-10-31 14:08 GMT+01:00 singh.janmejay:

> Cool, I'll implement $!foo!bar[0].

+1

> Let us process this patch-set, because is kinda hard to keep track of old
> patches and re-send in one shot.

Would you mind cloning on github and maintaining a feature branch there?
That would make it much easier for me, as I could merge the branch when you
are done. If not, it's no problem and I'll maintain that branch.

> i'll send the new patch once done(i'll now only get to work on it on
> monday).

I haven't had a chance to look, as I am now busy building test environments
and looking at the testbench [yes, one guy!] ;) But I see Pavel has asked
some questions. He recently did a lot of work on the lib, so it is best to
coordinate that part with him.

Rainer

> Do existing patches look ok except for the indexed-addressing feature?

From singh.janmejay at gmail.com Fri Oct 31 14:16:53 2014
From: singh.janmejay at gmail.com (singh.janmejay)
Date: Fri, 31 Oct 2014 18:46:53 +0530
Subject: [Lognorm] Tokenized-multivalue field-type for liblognorm
In-Reply-To:
References: <54532DA5.8030107@levshin.spb.ru>
Message-ID:

Sure, I'll fork on github.

On Fri, Oct 31, 2014 at 6:42 PM, Rainer Gerhards wrote:

> 2014-10-31 14:08 GMT+01:00 singh.janmejay:
>
>> Cool, I'll implement $!foo!bar[0].
>
> +1
>
>> Let us process this patch-set, because is kinda hard to keep track of old
>> patches and re-send in one shot.
--
Regards,
Janmejay
http://codehunk.wordpress.com
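The foreach(array, pattern) template function that David Lang suggests earlier in this thread is only named, not specified. A rough sketch of the idea, assuming the pattern refers to the current element through a %item% placeholder (that placeholder is invented here; the thread does not say how the pattern would reference the item):

```python
def foreach(array, pattern):
    """Repeat `pattern` once per array item and concatenate the results.

    "%item%" is a hypothetical placeholder chosen for this sketch.
    """
    return "".join(pattern.replace("%item%", str(item)) for item in array)


# Render each token of the tokenized time example on its own line.
print(foreach(["15", "26", "15"], "part=%item%\n"), end="")
```

An empty array simply yields an empty string, so a template using such a function would not need a separate check for fields that matched no tokens.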