From rgerhards at hq.adiscon.com Thu Nov 7 17:35:39 2013
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Thu, 7 Nov 2013 17:35:39 +0100
Subject: [Lognorm] [rsyslog] liblognorm
In-Reply-To: <52716335.9090505@levshin.spb.ru>
References: <526E77CC.6040904@levshin.spb.ru> <52716335.9090505@levshin.spb.ru>
Message-ID:

Pavel,

On Wed, Oct 30, 2013 at 8:51 PM, Pavel Levshin wrote:

>
> So, I have taken the opportunity and refactored liblognorm to use json-c
> instead of libee. Some parts of libee are now present in liblognorm, notably
> field parsers and encoders. They were rewritten to get rid of libee data
> structures. At the same time, many bugs were fixed, and many were
> undoubtedly introduced.
>

I still haven't had a chance to merge, but perhaps you could lend me a helping hand on a related topic. As you know, I am rewriting the rsyslog core engine. The new engine can execute action instances in parallel, one for each worker thread (wti).

I am about to update mmnormalize to the new calling interface. The question is whether liblognorm can be called concurrently (reentrant and thread-safe) with *the same* handle. That would obviously be best. It is not (yet) documented to have this capability. However, if I remember correctly, there is no problem in that regard. As you have recently worked on probably the whole body of code, what's your opinion on that? If we are not sure enough, we can always check later, but I wonder whether the first shot at the new mmnormalize implementation should rather be on the safe side.

Thanks,
Rainer
From pavel at levshin.spb.ru Thu Nov 7 20:12:54 2013
From: pavel at levshin.spb.ru (Pavel Levshin)
Date: Thu, 07 Nov 2013 23:12:54 +0400
Subject: [Lognorm] [rsyslog] liblognorm
In-Reply-To:
References: <526E77CC.6040904@levshin.spb.ru> <52716335.9090505@levshin.spb.ru>
Message-ID: <527BE636.30402@levshin.spb.ru>

Yes, to me it looks thread-safe and reentrant.

--
Pavel Levshin

> _______________________________________________
> Lognorm mailing list
> Lognorm at lists.adiscon.com
> http://lists.adiscon.net/mailman/listinfo/lognorm
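As a rough illustration of the calling pattern in question (one shared handle, each worker thread calling into it concurrently), here is a self-contained C sketch. It does not use liblognorm: `struct rulebase` and `normalize_line()` are hypothetical stand-ins for a loaded context and `ln_normalize()`. It shows the property that makes sharing safe, assuming it holds for the real library: after loading, the shared handle is only read, and each call writes its result into storage owned by the caller, so no locking is needed.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define MAX_WORKERS 16

/* Hypothetical stand-in for a loaded rulebase: set up once,
 * then only read by the workers. */
struct rulebase {
    const char *prefix;   /* literal text a message must start with */
};

/* Reentrant: reads the shared rulebase and writes only to the
 * caller-owned buffer `out`; no static or global state. */
static int normalize_line(const struct rulebase *rb, const char *msg,
                          char *out, size_t outlen)
{
    size_t plen = strlen(rb->prefix);
    if (strncmp(msg, rb->prefix, plen) != 0)
        return -1;                          /* no rule matched */
    snprintf(out, outlen, "%s", msg + plen);
    return 0;
}

struct job {
    const struct rulebase *rb;              /* same handle in every job */
    const char *msg;
    char result[64];                        /* private to this worker */
    int rc;
};

static void *worker(void *arg)
{
    struct job *j = arg;
    j->rc = normalize_line(j->rb, j->msg, j->result, sizeof j->result);
    return NULL;
}

/* Normalizes up to MAX_WORKERS messages in parallel against one shared
 * rulebase. Returns the number of messages that matched. */
static int run_workers(const struct rulebase *rb, const char **msgs,
                       struct job *jobs, int n)
{
    pthread_t tids[MAX_WORKERS];
    int created[MAX_WORKERS];
    int matched = 0;

    if (n > MAX_WORKERS)
        n = MAX_WORKERS;
    for (int i = 0; i < n; i++) {
        jobs[i].rb = rb;
        jobs[i].msg = msgs[i];
        created[i] = pthread_create(&tids[i], NULL, worker, &jobs[i]) == 0;
        if (!created[i])
            worker(&jobs[i]);               /* fall back to running inline */
    }
    for (int i = 0; i < n; i++) {
        if (created[i])
            pthread_join(tids[i], NULL);
        if (jobs[i].rc == 0)
            matched++;
    }
    return matched;
}
```

If the real context kept any mutable state inside the handle (caches, counters, error buffers), this pattern would instead need a lock or one handle per worker.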
From pavel at levshin.spb.ru Thu Nov 21 17:17:30 2013
From: pavel at levshin.spb.ru (Pavel Levshin)
Date: Thu, 21 Nov 2013 20:17:30 +0400
Subject: [Lognorm] [rsyslog] LibLogNorm way of doing things
In-Reply-To: <380070311.222513.1385044782392.JavaMail.root@lezard-visuel.com>
References: <380070311.222513.1385044782392.JavaMail.root@lezard-visuel.com>
Message-ID: <528E321A.70200@levshin.spb.ru>

Well, I fully agree with you, and I've addressed some of these issues already. The next version of liblognorm is here:

https://github.com/flicker581/liblognorm/tree/master-json-c

It will be merged upstream soon, I hope. But this version is a rewrite, and it is not compatible with the current mmnormalize. Therefore, it will be available in rsyslog 7.5 and 8.x, but most probably not in 7.4. The main purpose of the rewrite was to replace libee data structures with json-c objects, which should improve performance.

More comments below:

21.11.2013 18:39, Walid Moghrabi:
> Hi,
>
> This message will certainly appear to be a "complaint" message, but I hope it can give some ideas on ways to improve LibLogNorm. Or maybe I'm simply not using it properly and someone could help me, since I had great difficulty finding clear documentation or help on this topic.
>
> So, what is wrong with it?
>
> Well, first of all, building a rulebase file is not well documented (to say the least), and it is pretty difficult to build one and test it.

This is indeed a problem, and some quirks of the implementation make it even harder. There is a tool to test your rulebase, called "lognormalizer"; did you know? Its debug output is pretty useful... for someone who already knows the internals.

I've added a sample rulebase, but it is still incomplete; for example, I've not used prefixes in it:

https://github.com/flicker581/liblognorm/blob/master-json-c/rulebases/sample.rulebase

Maybe it would be useful to have at least an indication of which rule left the "unparsed-data" part?
> But what is really annoying is the way it works: it's a go/no-go approach, and it is pretty painful... let me explain:
>
> I use MMNormalize to normalize messages coming from my web servers: I split the message into key/value pairs in order to store them in a database so that I can use them with LogAnalyzer.
> Great, but... sometimes, when you are working with logs, since they are not all well normalized themselves, their content may vary a little, and sometimes your rulebase simply doesn't work because a trailing whitespace character was added at the end of the message, since one logger version behaves a bit differently from another.
> This would be just fine if MMNormalize simply ignored it and normalized what it can, but it does not work that way... as soon as there is unparsed data, it simply stops and passes the message through untouched, and thus not normalized at all, which completely messes up the db (the whole message is stored in the MSG field, but the other fields for normalized data are simply empty).
>
> You might say that this is better than simply dropping the message, but really, it is very annoying.
>
> The same applies to added fields... at first, I was getting every field from the classic Apache "combined" log format, but I had to add a few fields (vhost, SSL state, ...).
> The first part of the LogFormat didn't change; I added the fields at the end of the message. So, if MMNormalize worked the way I'd love it to, it would have retrieved the fields in the rulebase, stored the new elements in the "unparsed data" field and worked normally. But no... it simply ignores everything: I get no normalized fields, only the raw message.
>
> Last but not least...
> I have some log files from dedicated applications that are partly normalized: let's say the first 2-3 fields are normalized and thus usable for normalization, but the last part of the message is random text with no normalization at all.
> I can't ask for a change in the format, and thus I can't ask for a quoted string that I could handle.
> There would be a nice way to handle this: a selector that says "from this point, take everything until the end of the line".
> That would be great.

I've implemented it as the "rest" type:

# Snow White and the Seven Dwarfs
rule=tale:Snow White and %company:rest%

It matches zero or more characters until the end of the line. As far as I understand everything you've said above, this type can do everything you want.

> I tried with char-to selectors but never found a way to do this.

For "char-to" fields, there is still an open question. In the current implementation, char-to cannot match zero characters. This can be a problem, because in some cases you may need to match a zero-length field between two separators. I did not change this aspect, because it is an incompatible change: with it, older rulebases could begin to match unexpectedly. Nevertheless, it can be done. Maybe it is better done as a separate type, based on "char-to". Comments and suggestions are welcome.

--
Pavel Levshin

From rgerhards at hq.adiscon.com Fri Nov 22 17:18:39 2013
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Fri, 22 Nov 2013 17:18:39 +0100
Subject: [Lognorm] [rsyslog] liblognorm
In-Reply-To: <52716335.9090505@levshin.spb.ru>
References: <526E77CC.6040904@levshin.spb.ru> <52716335.9090505@levshin.spb.ru>
Message-ID:

On Wed, Oct 30, 2013 at 8:51 PM, Pavel Levshin wrote:

>
> So, I have taken the opportunity and refactored liblognorm to use json-c
> instead of libee. Some parts of libee are now present in liblognorm, notably
> field parsers and encoders. They were rewritten to get rid of libee data
> structures.
> At the same time, many bugs were fixed, and many were
> undoubtedly introduced.
>
> The current state of the library can be seen here:
>
> https://github.com/flicker581/liblognorm/tree/master-json-c
>

I have finally merged this into my git, but so far only under the liblognorm1 branch. The idea is to have liblognorm0 for apps requiring that API, and liblognorm1 for those that need the new one.

I was not yet bold enough to merge to master and prepare a release, as I currently do not know how I can make both available on a system AND have pkg-config detect the right version. I've never done this before, and if someone could lend me a helping hand, I'd greatly appreciate it.

Thanks,
Rainer

From pavel at levshin.spb.ru Fri Nov 22 18:53:05 2013
From: pavel at levshin.spb.ru (Pavel Levshin)
Date: Fri, 22 Nov 2013 21:53:05 +0400
Subject: [Lognorm] [rsyslog] liblognorm
In-Reply-To:
References: <526E77CC.6040904@levshin.spb.ru> <52716335.9090505@levshin.spb.ru>
Message-ID: <528F9A01.1060209@levshin.spb.ru>

Basically, you cannot have both versions available for compiling, because they have conflicting header files. pkg-config is a tool that works at compile time. It will detect just one installed version of the library, and you will build against that version. To have both versions available for compiling, you would need to install them under separate "prefixes" and select one manually at build time. I don't think that is really needed.

Once you have built your software against some version of the library, that software requires a compatible version of the shared library to start. Older software requires liblognorm.so.0, while newer software will look for liblognorm.so.1. Therefore, you can have both versions of the shared library installed on the same system.

Please note that I've just found a regression in my code: it is unable to strip quote marks from "quote-string" fields.
I need to redo this part.

--
Pavel Levshin

From pavel at levshin.spb.ru Fri Nov 22 21:19:51 2013
From: pavel at levshin.spb.ru (Pavel Levshin)
Date: Sat, 23 Nov 2013 00:19:51 +0400
Subject: [Lognorm] [rsyslog] liblognorm
In-Reply-To: <528F9A01.1060209@levshin.spb.ru>
References: <526E77CC.6040904@levshin.spb.ru> <52716335.9090505@levshin.spb.ru> <528F9A01.1060209@levshin.spb.ru>
Message-ID: <528FBC67.6040205@levshin.spb.ru>

22.11.2013 21:53, Pavel Levshin:
>
> Please note that I've just found a regression in my code: it is unable
> to strip quote marks from "quote-string" fields. I need to redo this part.
>

It is fixed now.
I really feel that we need to implement some facility to parse comma-separated logs, where a field can have zero length. It is easy to do, and it is quite a common log format (I'm using it, for example). Currently, liblognorm is unable to parse it. There are two ways:

- change char-to behaviour to allow zero-length matches
- create a new field type, based on the current char-to

Which should we prefer?

--
Pavel Levshin

From rgerhards at hq.adiscon.com Fri Nov 22 21:33:45 2013
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Fri, 22 Nov 2013 21:33:45 +0100
Subject: [Lognorm] [rsyslog] liblognorm
In-Reply-To: <528FBC67.6040205@levshin.spb.ru>
References: <526E77CC.6040904@levshin.spb.ru> <52716335.9090505@levshin.spb.ru> <528F9A01.1060209@levshin.spb.ru> <528FBC67.6040205@levshin.spb.ru>
Message-ID:

Always looking for backward compatibility, I'd say create a new one.

Sent from phone, thus brief.
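The char-to question can be made concrete with a toy matcher in C. This is not liblognorm's parser: `match_char_to()`, `match_csv_rule()` and the `allow_empty` flag are hypothetical, modelling the current behaviour described in the thread (no zero-length match) and the proposed variant. A rule here is all-or-nothing, so on a line like `a,,c` the strict matcher fails the whole rule at the empty field, while the lenient one matches.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical matcher modelled on char-to as described in this thread:
 * consume characters up to (not including) `delim`. With allow_empty == 0
 * a zero-length match is rejected (current char-to behaviour); with
 * allow_empty == 1 an empty field between two delimiters is accepted.
 * Returns the field length, or -1 if there is no match. */
static long match_char_to(const char *s, char delim, int allow_empty)
{
    const char *end = strchr(s, delim);
    if (end == NULL)
        return -1;                       /* delimiter never found */
    if (end == s && !allow_empty)
        return -1;                       /* zero-length field rejected */
    return (long)(end - s);
}

/* A rule of `nfields` (>= 1) comma-separated fields: the first nfields-1
 * are char-to fields, the last takes the remainder of the line (possibly
 * empty). Returns 1 if the whole rule matches, 0 otherwise. */
static int match_csv_rule(const char *line, int nfields, int allow_empty)
{
    for (int i = 0; i < nfields - 1; i++) {
        long len = match_char_to(line, ',', allow_empty);
        if (len < 0)
            return 0;                    /* one failed field fails the rule */
        line += len + 1;                 /* skip the field and its comma */
    }
    return 1;
}
```

Keeping the lenient behaviour in a separate field type, as decided above, means existing rulebases that rely on char-to rejecting empty fields keep matching exactly as before.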
From rgerhards at hq.adiscon.com Sat Nov 23 17:27:54 2013
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Sat, 23 Nov 2013 17:27:54 +0100
Subject: [Lognorm] [rsyslog] liblognorm
In-Reply-To: <528F9A01.1060209@levshin.spb.ru>
References: <526E77CC.6040904@levshin.spb.ru> <52716335.9090505@levshin.spb.ru> <528F9A01.1060209@levshin.spb.ru>
Message-ID:

Thanks a lot, this was very useful!

Rainer
From pavel at levshin.spb.ru Sun Nov 24 11:35:14 2013
From: pavel at levshin.spb.ru (Pavel Levshin)
Date: Sun, 24 Nov 2013 14:35:14 +0400
Subject: [Lognorm] [rsyslog] liblognorm
In-Reply-To:
References: <526E77CC.6040904@levshin.spb.ru> <52716335.9090505@levshin.spb.ru> <528F9A01.1060209@levshin.spb.ru> <528FBC67.6040205@levshin.spb.ru>
Message-ID: <5291D662.2010009@levshin.spb.ru>

Done, in the same branch on GitHub:

https://github.com/flicker581/liblognorm/commits/master-json-c

Also, I have written a brief document on the current rulebase syntax and field types. As a user, I would say that liblognorm lacks proper documentation, and this doc is better than nothing. It is not sufficient, though.

--
Pavel Levshin
From rgerhards at hq.adiscon.com Mon Nov 25 09:24:12 2013
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Mon, 25 Nov 2013 09:24:12 +0100
Subject: [Lognorm] [rsyslog] liblognorm
In-Reply-To: <5291D662.2010009@levshin.spb.ru>
References: <526E77CC.6040904@levshin.spb.ru> <52716335.9090505@levshin.spb.ru> <528F9A01.1060209@levshin.spb.ru> <528FBC67.6040205@levshin.spb.ru> <5291D662.2010009@levshin.spb.ru>
Message-ID:

On Sun, Nov 24, 2013 at 11:35 AM, Pavel Levshin wrote:

>
> Done, in the same branch on GitHub:
>
> https://github.com/flicker581/liblognorm/commits/master-json-c
>

Thanks, already merged. Later today I will see about integrating the mmnormalize changes, tying things together. It will initially go into v8.

> Also, I have written a brief document on current rulebase syntax and field
> types.
> As a user, I would say that liblognorm lacks proper documentation,
> and this doc is better than nothing. It is not sufficient, though.
>

Full ack, but there is so much to do and so little time. And nobody wants to do the docs, so...

Thanks for this effort, it's definitely useful.

Rainer
From pavel at levshin.spb.ru Thu Nov 28 11:56:49 2013
From: pavel at levshin.spb.ru (Pavel Levshin)
Date: Thu, 28 Nov 2013 14:56:49 +0400
Subject: [Lognorm] [rsyslog] liblognorm
In-Reply-To:
References: <526E77CC.6040904@levshin.spb.ru> <52716335.9090505@levshin.spb.ru> <528F9A01.1060209@levshin.spb.ru> <528FBC67.6040205@levshin.spb.ru> <5291D662.2010009@levshin.spb.ru>
Message-ID: <52972171.90605@levshin.spb.ru>

Rainer,

I see you are preparing the 1.0.0 release of liblognorm. Removing libestr from the internals can be postponed to the next minor release, but we need to settle the external interface. Instead of the current definition, which is

int ln_normalize(ln_ctx ctx, char *str, es_size_t strLen, struct json_object **json_p);

I propose this:

int ln_normalize(ln_ctx ctx, const char *str, size_t strLen, struct json_object **json_p);

Why use es_size_t if we are getting rid of libestr?

This is a reminder, just in case.

--
Pavel Levshin
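A small sketch of why `const char *` plus `size_t` is convenient for callers. `normalize_stub()` is a hypothetical function with the same parameter shape as the proposed `ln_normalize()`; it does not call liblognorm and merely counts space-separated fields. Because `strlen()` already returns `size_t`, its result passes straight through; the `const` qualifier lets callers hand over string literals or read-only buffers without a cast; and the explicit length lets them normalize a prefix of a buffer without copying it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in with the parameter shape of the proposed
 * ln_normalize(ctx, const char *str, size_t strLen, json_p): the input
 * is read-only and its length is passed as size_t. Here it only counts
 * fields separated by single spaces within the first strLen bytes. */
static int normalize_stub(const char *str, size_t strLen, size_t *nfields)
{
    if (str == NULL || nfields == NULL)
        return -1;
    size_t count = strLen > 0 ? 1 : 0;
    for (size_t i = 0; i < strLen; i++)
        if (str[i] == ' ')
            count++;
    *nfields = count;
    return 0;
}

/* Caller side: strlen() yields size_t, so no conversion is needed,
 * and a string literal can be passed to the const parameter as-is. */
static size_t fields_in(const char *msg)
{
    size_t n = 0;
    if (normalize_stub(msg, strlen(msg), &n) != 0)
        return 0;
    return n;
}
```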
From rgerhards at hq.adiscon.com Thu Nov 28 12:00:09 2013
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Thu, 28 Nov 2013 12:00:09 +0100
Subject: [Lognorm] [rsyslog] liblognorm
In-Reply-To: <52972171.90605@levshin.spb.ru>
References: <526E77CC.6040904@levshin.spb.ru> <52716335.9090505@levshin.spb.ru> <528F9A01.1060209@levshin.spb.ru> <528FBC67.6040205@levshin.spb.ru> <5291D662.2010009@levshin.spb.ru> <52972171.90605@levshin.spb.ru>
Message-ID:

On Thu, Nov 28, 2013 at 11:56 AM, Pavel Levshin wrote:

> Why use es_size_t, if we are getting rid of libestr?
>

Ahh... excellent point, I have to admit I overlooked it. es_size_t has the advantage (IMHO) that it is 32 bits and saves us the overhead of passing 64 bits around. I cannot see any reasonable case where we would have messages larger than 2 GB. How about using uint32_t instead (or just plain int, but...)?

You were just in time to prevent publishing of the package ;)

Rainer
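If the public signature takes `size_t` but a 32-bit length is kept internally, a narrowing conversion has to happen somewhere. Below is a minimal sketch, with a hypothetical helper name `len_to_u32()`, of doing it once and explicitly at the API boundary. An implicit conversion from `size_t` to `uint32_t` is well-defined in C, but on 64-bit targets it silently wraps values above `UINT32_MAX`; the checked version refuses them instead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Converts an external size_t length to a 32-bit internal length.
 * Returns false instead of silently truncating when it does not fit. */
static bool len_to_u32(size_t len, uint32_t *out)
{
#if SIZE_MAX > UINT32_MAX
    if (len > UINT32_MAX)
        return false;            /* would wrap around if converted */
#endif
    *out = (uint32_t)len;
    return true;
}
```

On a 32-bit target the range check is compiled out, since every `size_t` value already fits in `uint32_t`.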
From pavel at levshin.spb.ru Thu Nov 28 12:10:46 2013
From: pavel at levshin.spb.ru (Pavel Levshin)
Date: Thu, 28 Nov 2013 15:10:46 +0400
Subject: [Lognorm] [rsyslog] liblognorm
In-Reply-To:
References: <526E77CC.6040904@levshin.spb.ru> <52716335.9090505@levshin.spb.ru> <528F9A01.1060209@levshin.spb.ru> <528FBC67.6040205@levshin.spb.ru> <5291D662.2010009@levshin.spb.ru> <52972171.90605@levshin.spb.ru>
Message-ID: <529724B6.7040407@levshin.spb.ru>

28.11.2013 15:00, Rainer Gerhards:
>
> ahh... excellent point, I have to admit I overlooked it. es_size_t has
> the advantage (IMHO) that it is 32 bits and saves us the overhead of
> passing 64 bits around. I cannot see any reasonable case where we have
> >2gig messages. How about using uint32_t instead (or just plain int,
> but...).
>

I'm not sure. size_t is universally portable, but uint32_t is also sufficient for all current architectures I can think of. Is anyone running liblognorm on a 16-bit CPU?..

For the external interface, I'd vote for size_t.
--
Pavel Levshin

From rgerhards at hq.adiscon.com Thu Nov 28 12:19:37 2013
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Thu, 28 Nov 2013 12:19:37 +0100
Subject: [Lognorm] [rsyslog] liblognorm
In-Reply-To: <529724B6.7040407@levshin.spb.ru>
References: <526E77CC.6040904@levshin.spb.ru> <52716335.9090505@levshin.spb.ru> <528F9A01.1060209@levshin.spb.ru> <528FBC67.6040205@levshin.spb.ru> <5291D662.2010009@levshin.spb.ru> <52972171.90605@levshin.spb.ru> <529724B6.7040407@levshin.spb.ru>
Message-ID:

On Thu, Nov 28, 2013 at 12:10 PM, Pavel Levshin wrote:

> I'm not sure. Size_t is universally portable, but uint32_t is also
> sufficient for all current architectures, which I could think of. Is there
> someone running liblognorm at 16 bit CPU?..
>

Even there, uint32_t should be a 32-bit unsigned integer, right?

>
> For external interface, I'd vote for size_t.
>

Yeah, I have known this problem for many years. I was totally fine with size_t when we moved from 16 to 32 bits. With 64 bits, I have some concern about using it generally in cases like this. Performance-wise, manipulating 64 bits takes more space in registers and cache lines, especially when we know we will never have strings that long (can you envision a 4 GB syslog message being run through lognorm? I can't...). But I agree that the "clean" solution would be size_t; I am undecided myself...

Can you give me a good argument (besides standards) why size_t would be good here? Or, better said, why it does not hurt?
Rainer

From pavel at levshin.spb.ru Thu Nov 28 12:40:44 2013
From: pavel at levshin.spb.ru (Pavel Levshin)
Date: Thu, 28 Nov 2013 15:40:44 +0400
Subject: [Lognorm] [rsyslog] liblognorm
In-Reply-To:
References: <526E77CC.6040904@levshin.spb.ru> <52716335.9090505@levshin.spb.ru> <528F9A01.1060209@levshin.spb.ru> <528FBC67.6040205@levshin.spb.ru> <5291D662.2010009@levshin.spb.ru> <52972171.90605@levshin.spb.ru> <529724B6.7040407@levshin.spb.ru>
Message-ID: <52972BBC.5090208@levshin.spb.ru>

28.11.2013 15:19, Rainer Gerhards:

> Yeah, I know this problem for many years. I've been totally fine with
> size_t when we moved vom 16 to 32 bits. With 64 bits, I have a bit of
> concern as general use in cases like this. Thinking performance
> wise-manipulating 64 bits requires a lot of space in registers and
> cache lines. Especially when we know we will never have string that
> long (can you envision a 4GB syslog message that is run through
> lognorm -me not...). But I agree that the "clean" solution would is
> size_t --- I am undecided myself...
>
> Can you give me a good argument (besides standards) why size_t would
> be good here? Or better said: does not hurt.
>

I can think only of going back to 16 and 8 bits. Sometimes I work with microcontrollers, where int32_t is too big and int16_t is the standard int. This is not to say that I'll use liblognorm on these platforms.

Nevertheless, strlen() and sizeof() return size_t. There is an implicit conversion any time you use a different type.

A 64-bit value does not use more registers on a 64-bit architecture (x86_64, namely); it is still just one register.
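The implicit conversion Pavel mentions is well defined when the target type is unsigned: converting to `uint32_t` reduces the value modulo 2^32, so the upper bits are simply dropped. A minimal sketch, assuming C99 `<stdint.h>`; `low32` and `narrow_len` are hypothetical helpers, the second showing an explicit range check a caller could use instead of silent truncation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Converting an unsigned value to uint32_t keeps it modulo 2^32:
 * the upper 32 bits are discarded, nothing traps or fails. */
static uint32_t low32(uint64_t v)
{
    return (uint32_t)v;
}

/* Hypothetical checked narrowing for a length: stores the value and
 * returns 0 if it fits into uint32_t, returns -1 otherwise. */
static int narrow_len(size_t len, uint32_t *out)
{
    assert(out != NULL);
    if ((uint64_t)len > UINT32_MAX)
        return -1;
    *out = (uint32_t)len;
    return 0;
}
```

On a platform where `size_t` is already 32 bits, the range check in `narrow_len` can never fail and the conversion costs nothing.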
As far as I understand, it is also 8 bytes long on the stack, because registers are used for stack manipulation. Therefore, this size does not matter as long as it is not used in heap/dynamic variables.

-- 
Pavel Levshin

From rgerhards at hq.adiscon.com Thu Nov 28 12:49:01 2013
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Thu, 28 Nov 2013 12:49:01 +0100
Subject: [Lognorm] [rsyslog] liblognorm
In-Reply-To: <52972BBC.5090208@levshin.spb.ru>
References: <526E77CC.6040904@levshin.spb.ru> <52716335.9090505@levshin.spb.ru> <528F9A01.1060209@levshin.spb.ru> <528FBC67.6040205@levshin.spb.ru> <5291D662.2010009@levshin.spb.ru> <52972171.90605@levshin.spb.ru> <529724B6.7040407@levshin.spb.ru> <52972BBC.5090208@levshin.spb.ru>
Message-ID:

On Thu, Nov 28, 2013 at 12:40 PM, Pavel Levshin wrote:

> Nevertheless, strlen() and sizeof() are returning size_t. There is an
> implicit conversion anytime you are using different type.
Yup - I had *expected* (not verified) that the optimizer just throws away the upper 32 bits.

> 64 bits do not use more registers on 64bit architecture (x86_64, namely),
> it is still just one register.

I am not sure about GCC, but I think I read that some compilers use the hardware split registers to store two 32-bit ints in a single 64-bit register, thus effectively freeing more registers. Again, not sure about gcc.

> As far as I understand, it is 8 byte long on stack, also, because
> registers are used for stack manipulation.

That would be a very convincing argument, as then it really doesn't matter at all (and may even be counterproductive). On the other hand, this sounds like a bit of a waste performance-wise.

Rainer

From david at lang.hm Thu Nov 28 13:19:07 2013
From: david at lang.hm (David Lang)
Date: Thu, 28 Nov 2013 04:19:07 -0800 (PST)
Subject: [Lognorm] [rsyslog] liblognorm
In-Reply-To:
References: <526E77CC.6040904@levshin.spb.ru> <52716335.9090505@levshin.spb.ru> <528F9A01.1060209@levshin.spb.ru> <528FBC67.6040205@levshin.spb.ru> <5291D662.2010009@levshin.spb.ru> <52972171.90605@levshin.spb.ru> <529724B6.7040407@levshin.spb.ru> <52972BBC.5090208@levshin.spb.ru>
Message-ID:

On Thu, 28 Nov 2013, Rainer Gerhards wrote:

> On Thu, Nov 28, 2013 at 12:40 PM, Pavel Levshin wrote:
>
>> Therefore, this size does not matter as long as it is not used in
>> heap/dynamic variables.

Performance-wise, 64-bit variables are just as fast as, or faster than, 32-bit variables on x86-64, as long as they are aligned properly (things slow down if they are not aligned, but as I understand it, only to the same speed as 32-bit access).
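The difference between values held in registers and values stored in memory shows up in struct sizes. In this hypothetical sketch (`field_span_sz`, `field_span_u32` and `records_per_line` are illustrative names, and the 64-byte cache line is an assumption typical of current x86-64 CPUs), a record of two `size_t` offsets takes 16 bytes on x86-64, while the `uint32_t` version takes 8, so twice as many fit into one cache line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record describing where a parsed field sits in a
 * message, once with size_t members and once with uint32_t members. */
struct field_span_sz {
    size_t start;
    size_t len;
};

struct field_span_u32 {
    uint32_t start;
    uint32_t len;
};

/* Number of whole records that fit into one cache line of line_bytes. */
static size_t records_per_line(size_t record_size, size_t line_bytes)
{
    assert(record_size > 0);
    return line_bytes / record_size;
}
```

A single length passed by value costs the same either way on x86-64; the saving only appears when many such values are stored, for example in arrays of these records.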
In any case, this is a very small difference. The advantage of using 32-bit values starts to show up when you are storing the data: more data fits in a single CPU cache line, etc.

David Lang

From rgerhards at hq.adiscon.com Thu Nov 28 14:17:52 2013
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Thu, 28 Nov 2013 14:17:52 +0100
Subject: [Lognorm] [rsyslog] liblognorm
In-Reply-To:
References: <526E77CC.6040904@levshin.spb.ru> <52716335.9090505@levshin.spb.ru> <528F9A01.1060209@levshin.spb.ru> <528FBC67.6040205@levshin.spb.ru> <5291D662.2010009@levshin.spb.ru> <52972171.90605@levshin.spb.ru> <529724B6.7040407@levshin.spb.ru> <52972BBC.5090208@levshin.spb.ru>
Message-ID:

On Thu, Nov 28, 2013 at 1:19 PM, David Lang wrote:

> performance wise, 64 bit variables are just as fast or faster to use on
> x86-64 than 32 bit variables. This is as long as they are aligned properly
> (things slow down if they are not aligned, but as I understand it, it's to
> the same speed as 32 bit access.
>
> In any case, this is a very small difference.
>

Yeah, I agree here. But an important question is whether the stack always holds 64-bit values, even if the variable is 32 bits wide. Then it would boil down to a cache-line question, actually the same one we have for structs. I could not quickly find the exact specifics of what gcc does, so I looked up Intel's processor manual [1]. In section 7.3.1.5 (Vol. 1, p. 7-7) it says:

====
7.3.1.5 Stack Manipulation Instructions in 64-Bit Mode

In 64-bit mode, the stack pointer size is 64 bits and cannot be overridden by an instruction prefix. In implicit stack references, address-size overrides are ignored. Pushes and pops of 32-bit values on the stack are not possible in 64-bit mode. 16-bit pushes and pops are supported by using the 66H operand-size prefix. PUSHA, PUSHAD, POPA, and POPAD are not supported.
====

So this boils down to: 64 bits are indeed always pushed onto the stack (the compiler could optimize two 32-bit parameters into a single one, but that doesn't always work, looks a bit clumsy and wouldn't apply in our case anyway...).

Bottom line: I am now convinced that size_t is a good fit.

Rainer

[1] http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf

From friedl at adiscon.com Thu Nov 28 18:08:22 2013
From: friedl at adiscon.com (Florian Riedl)
Date: Thu, 28 Nov 2013 18:08:22 +0100
Subject: [Lognorm] liblognorm 1.0.0 released
Message-ID:

Hi all,

We have just released liblognorm 1.0.0. This is a completely revamped and enhanced version. It introduces incompatible API changes, which were unavoidable.
For details please visit http://www.liblognorm.com/news/on-liblognorm-1-0-0/

Changes:

Version 1.0.0, 2013-11-28
- WARNING: this version has an incompatible interface and older programs will not compile with it. For details see http://www.liblognorm.com/news/on-liblognorm-1-0-0/
- libestr is not used any more in interface functions. Traditional C strings are used instead. Internally, libestr is still used, but scheduled for removal.
- libee is not used any more. JSON-C is used for object handling instead. Parsers and formatters are now part of liblognorm.
- added new field type "rest", which simply consumes everything up to the end of the string.
- added support for gluing two fields together, without a literal between them. It allows for constructs like:
  %volume:number%%unit:word%
  which matches the string "1000Kbps"
- Fix incorrect merging of trees with empty literal at end
  Thanks to Pavel Levshin for the patch
- this version has survived many bugfixes

Download: http://www.liblognorm.com/download/liblognorm-1-0-0/

As always, feedback is appreciated.

Best regards,
Florian Riedl
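The following is one way two glued fields could split "1000Kbps" when no literal separates them. It is a hypothetical illustration, not liblognorm's parser. It assumes `number` consumes a run of decimal digits and `word` a run of non-space characters, and `match_number_word` and `match_rest` are made-up names. `match_rest` mimics the described behaviour of the new "rest" field type.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical matcher for %volume:number%%unit:word%: the number part
 * takes the longest run of digits, the word part takes the following
 * run of non-space characters. Returns 0 on match, -1 otherwise. */
static int match_number_word(const char *s,
                             const char **num, size_t *num_len,
                             const char **word, size_t *word_len)
{
    size_t i = 0;

    assert(s != NULL);
    while (isdigit((unsigned char)s[i]))
        ++i;
    if (i == 0)
        return -1;              /* number needs at least one digit */
    *num = s;
    *num_len = i;

    size_t j = i;
    while (s[j] != '\0' && s[j] != ' ')
        ++j;
    if (j == i)
        return -1;              /* word needs at least one character */
    *word = s + i;
    *word_len = j - i;
    return 0;
}

/* Hypothetical "rest" field: everything up to the end of the string. */
static const char *match_rest(const char *s, size_t *len)
{
    *len = strlen(s);
    return s;
}
```

With "1000Kbps", the digits "1000" go to the number field and "Kbps" to the word field; input without a trailing word, such as "1000", does not match.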