<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Oct 31, 2014 at 5:12 AM, David Lang <span dir="ltr"><<a href="mailto:david@lang.hm" target="_blank">david@lang.hm</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Thu, 30 Oct 2014, singh.janmejay wrote:<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Hi,<br>

<br>

This patch-set introduces a log-norm field-type called tokenized, which<br>

allows parsing of token-separated values.<br>

<br>

A lot of applications such as nginx write fields in logs that are<br>

comma+space separated etc. For instance, nginx upstream_addrs field writes<br>

comma-separated ip+port combinations to access logs.<br>

<br>

Parsing such logs takes a significant amount of regex and exec-template work<br>

and leads to a rather ugly solution for something as simple as a tokenized<br>

string.<br>

<br>

With this patch, parsing a list of ip-addresses separated by ', '(comma +<br>

space) for instance, would require a rule similar to:<br>

<br>

rule=ips:%my_ips:tokenized:, :ipv4%<br>

</blockquote>

<br></span>

What terminates the list?<br>

<br>

It looks like this allows multi-character tokens, is that correct?<span class="HOEnZb"><font color="#888888"><br>

<br>

David Lang</font></span><div class="HOEnZb"><div class="h5"><br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

This requires a small patch to libestr as well, so this mail has 3 patches<br>

attached.<br>

<br>

libestr patch:<br>

<br>

0001-Changed-some-functions-that-don-t-modify-their-arg-t.patch<br>

<br>

liblognorm patch:<br>

<br>

0001-Moved-from-parser-receving-data-as-escaped-string-to.patch<br>

0002-added-support-for-field_type-tokenized-which-parses-.patch<br>

<br>

Patches go in order of prefix-number.<br>

<br>

</blockquote>

</div></div><br>_______________________________________________<br>

Lognorm mailing list<br>

<a href="mailto:Lognorm@lists.adiscon.com">Lognorm@lists.adiscon.com</a><br>

<a href="http://lists.adiscon.net/mailman/listinfo/lognorm" target="_blank">http://lists.adiscon.net/mailman/listinfo/lognorm</a><br>


<br></blockquote></div><br></div><div class="gmail_extra">Yes, the tokenizer (separator) can be more than one character long.<br><br></div><div class="gmail_extra">The match stops for one of the following three reasons:<br></div><div class="gmail_extra"><br>1. The preceding characters matched the tokenizer, but the characters that follow do not match the field-type (in the rule above, they do not match ipv4).<br></div><div class="gmail_extra">Eg., using short numbers in place of IP addresses:<br>text: "10, 20, 30, abcd"<br></div><div class="gmail_extra">match stops at: "... 30"<br></div><div class="gmail_extra">remaining text: ", abcd"<br></div><div class="gmail_extra"><br>2. The characters that follow a matched value do not match the tokenizer.<br></div><div class="gmail_extra">Eg.<br></div><div class="gmail_extra">text: "10, 20 30, abcd"<br></div><div class="gmail_extra">match stops at: "... 20"<br></div><div class="gmail_extra">remaining text: " 30, abcd"<br><br></div><div class="gmail_extra">3. The parser reaches the end of the line (EOL).<br clear="all"></div><div class="gmail_extra"><br>-- <br>Regards,<br>Janmejay<br><a href="http://codehunk.wordpress.com">http://codehunk.wordpress.com</a><br>
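<br>The three stopping conditions described in this reply can be sketched in Python. This is only an illustration of the behaviour, not the liblognorm implementation: the function name parse_tokenized is hypothetical, and a digits-only regex stands in for the real field-type parser (ipv4 in the rule from the patch description).<br>
<pre>
```python
import re


def parse_tokenized(text, separator, element_pattern):
    """Illustrative sketch: split `text` into elements joined by `separator`.

    `element_pattern` is a regex standing in for the field-type parser
    (an assumption for this example). Returns (elements, remaining_text).
    """
    element = re.compile(element_pattern)
    items = []
    end = 0  # offset just past the last element that matched
    pos = 0  # offset where the next element is expected to start
    while True:
        m = element.match(text, pos)
        if m is None or m.end() == pos:
            # Reason 1: the separator matched, but the text after it
            # is not a valid element. The separator stays unconsumed.
            break
        items.append(m.group())
        end = m.end()
        if end == len(text):
            # Reason 3: end of line.
            break
        if not text.startswith(separator, end):
            # Reason 2: the element is not followed by the separator.
            break
        pos = end + len(separator)
    return items, text[end:]


print(parse_tokenized("10, 20, 30, abcd", ", ", r"\d+"))
print(parse_tokenized("10, 20 30, abcd", ", ", r"\d+"))
print(parse_tokenized("10, 20, 30", ", ", r"\d+"))
```
</pre>
The first call leaves ", abcd" as remaining text, the second leaves " 30, abcd", and the third consumes the whole line, matching the three cases listed in this reply.<br>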

</div></div>