[Lognorm] [rsyslog] liblognorm
Rainer Gerhards
rgerhards at hq.adiscon.com
Thu Oct 31 22:02:02 CET 2013
That's great news! Please bear with me for a short time; I am right in the
middle of that big rule engine refactoring. I am very interested in
merging this as soon as I have sufficient time to do it decently. Great work!
Rainer
Sent from phone, thus brief.
Am 30.10.2013 20:51 schrieb "Pavel Levshin" <pavel at levshin.spb.ru>:
>
> So, I have taken the opportunity and refactored liblognorm to use json-c
> instead of libee. Some parts of libee are now present in liblognorm, notably
> field parsers and encoders. They were rewritten to get rid of libee data
> structures. At the same time, many bugs were fixed, and many were
> undoubtedly introduced.
>
> Current state of the library can be seen here:
>
> https://github.com/flicker581/liblognorm/tree/master-json-c
>
> It is a work in progress, though. Lognormalizer works fine, but mmnormalize
> has not been updated yet. The new version is somewhat slower than older
> versions used to be. In my tests it was ~40% slower. This slowdown is
> attributable to more complex memory management due to bigger allocations by
> json-c. Still, it should be much faster than mmnormalize with the older
> liblognorm.
>
> Comments are greatly welcome.
>
> * b) there is no terminator until the end of the buffer
>>>
>>>
>>> same problem. The broader the simple parsers are, the higher the chances
>> for false positives or much more backtracking (in the end-of-line case
>> it's
>> just false positive). The core idea is to use (lots of) very special
>> parsers, and resort to generic ones only if there is no way around that.
>>
>
> The char-to parser stops at a certain character. Therefore, the only way to
> match is to have this character after the field. If the literal character
> follows the field, it is safe to allow the field to be "empty", I think.
> It should not even break any existing meaningful rules.
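
A minimal sketch of the char-to behavior under discussion, assuming a simplified signature. `parse_char_to` and its `allow_empty` flag are hypothetical names for illustration, not the actual libee or liblognorm code. It shows the two format-error cases, (a) and (b), and what relaxing (a) would mean for an empty field:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, NOT the liblognorm/libee API.
 * Scans buf[offs..len) for the terminator `term`.
 * Returns 0 and stores the field length in *parsed on a match,
 * or -1 on a format error. allow_empty != 0 relaxes restriction (a). */
static int parse_char_to(const char *buf, size_t len, size_t offs,
                         char term, int allow_empty, size_t *parsed)
{
    size_t i = offs;

    assert(buf != NULL && parsed != NULL);
    if (offs >= len)
        return -1;              /* nothing left to scan */
    if (buf[i] == term && !allow_empty)
        return -1;              /* (a) already positioned on the terminator */
    while (i < len && buf[i] != term)
        ++i;
    if (i == len)
        return -1;              /* (b) no terminator before end of buffer */
    *parsed = i - offs;         /* the field excludes the terminator itself */
    return 0;
}
```

With the input `a,,b` and `,` as terminator, the second field starts directly on the terminator: the strict mode rejects it, while the relaxed mode accepts it as a zero-length field. The trailing `b` fails in both modes because of (b).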
>
> Both break CSV parsing.
>>>
>>>
>>> Isn't there a CSV parser already?
>>
>>
> No, there is not. But it is just an example.
>
>
> --
> Pavel Levshin
>
>
> 28.10.2013 20:01, Rainer Gerhards:
>
>> On Mon, Oct 28, 2013 at 3:42 PM, Pavel Levshin <pavel at levshin.spb.ru>
>> wrote:
>>
>> Hello.
>>>
>>> Is it OK to discuss liblognorm here?
>>>
>>>
>>> I think it's fine, but it may be a good idea to CC the lognorm list,
>> there
>> may be one or two folks over there ;)
>>
>>
>> This approach to log parsing seems attractive to me. In its current
>>> state, though, it is not very usable under high load, and if there is no high
>>> load, then one can use regexps to do the same. So I would like to extend
>>> the idea to something fitting our purposes, instead of writing a custom
>>> parsing module.
>>>
>>> There are a few shortcomings now:
>>>
>>> 1. Liblognorm uses libee for parsing and event handling, then the
>>> event gets converted to JSON and imported into json-c structures. This is
>>> said to be inefficient. I'll do my own tests of how inefficient it is in
>>> reality. Then, what is the preferred way of overcoming it? Liblognorm could
>>> be extended to support json-c natively, or it could present some callback
>>> interface to populate fields in mmnormalize. It is questionable whether we
>>> should continue to use libee, then. Or libee could be rewritten to use
>>> json-c, maybe...
>>>
>>>
>> That would probably require a much longer answer, but let me at least go
>> for a quick one. There is a lot of legacy with liblognorm and libee. libee
>> was intended to become the reference lib for Mitre's CEE effort, before it
>> began to hibernate (to phrase it politely). Even worse, libee is written
>> to a much older spec, and is very bloated in many of its objects. The
>> long-term approach should probably be to get rid of libee altogether, but
>> there are some other apps that depend on it, so we must be somewhat
>> careful (Champ, any comments?). BTW: the same is true for libestr, which
>> was meant to be used as a common string lib for CEE, as CEE initially
>> thought they would desperately need to support embedded NUL chars,
>> something that was later dropped (but still is part of libestr).
>>
>> Finally, json-c is probably even an interim solution for rsyslog. It is
>> quite generic, which also means it is slower and more memory-hungry than
>> absolutely necessary. There has been thinking about replacing it when we
>> have time to do so (or forking it as a slimmer version).
>>
>> As a tactical solution, my preference would still be to port liblognorm
>> to work with native json-c objects. I think that would also clean up larger
>> parts of the code.
>>
>>
>> 2. Liblognorm is unable to match the last part of a string in some cases.
>>> There is no field type which could fit anything till the end of the string.
>>> This quirk may arise from some ideology, but it makes it impossible, for
>>> example, to parse the common CSV format, unless the last field happens to fit
>>> one of the predefined field types. Currently, parsers are defined in
>>> libee, and there is no interface to add one, which presents us with a
>>> choice: extend libee or use our own parsers. There can be other useful field
>>> types, as well.
>>>
>>> This came up on the list before. I thought there was some "rest of
>> line" type of syntax, but I had no time to check that. Looks like there
>> isn't. I think it would be a useful thing to have, even though this may
>> lead to some problems during the parser run.
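
To illustrate why a "rest of line" field matters for the CSV case, here is a naive sketch, assuming plain separator-delimited input with no quoting or escaping. `struct field` and `split_fields` are hypothetical names for illustration and are not part of liblognorm. Every field except the last ends at a separator; the last one simply takes whatever remains in the buffer:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types and names, NOT part of liblognorm. */
struct field {
    size_t offs;    /* start of the field within the line */
    size_t len;     /* field length, separator excluded */
};

/* Splits line[0..len) on `sep`. The final field is a "rest of line"
 * field: it consumes everything up to the end of the buffer.
 * Returns the number of fields stored in out (at most max). */
static size_t split_fields(const char *line, size_t len, char sep,
                           struct field *out, size_t max)
{
    size_t n = 0, start = 0, i;

    assert(line != NULL && out != NULL);
    /* Stop one slot early so the last slot is free for the rest field. */
    for (i = 0; i < len && n + 1 < max; ++i) {
        if (line[i] == sep) {
            out[n].offs = start;
            out[n].len = i - start;
            ++n;
            start = i + 1;
        }
    }
    if (n < max) {
        out[n].offs = start;        /* rest of line, possibly empty */
        out[n].len = len - start;
        ++n;
    }
    return n;
}
```

Note that the rest field matches unconditionally, even when nothing is left, which is exactly the kind of breadth that raises the false-positive concern mentioned in this thread.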
>>
>>
>> For the latter, what is the reason behind these two restrictions in the
>>> char-to parser:
>>>
>>> It is considered a format error if
>>> * a) the to-be-parsed buffer is already positioned on the terminator
>>> character
>>>
>>> don't remember exactly, but it for sure has to do with avoiding false
>> positives
>>
>>
>> * b) there is no terminator until the end of the buffer
>>>
>>>
>>> same problem. The broader the simple parsers are, the higher the chances
>> for false positives or much more backtracking (in the end-of-line case
>> it's
>> just false positive). The core idea is to use (lots of) very special
>> parsers, and resort to generic ones only if there is no way around that.
>>
>>
>> Both break CSV parsing.
>>>
>>>
>>> Isn't there a CSV parser already?
>>
>>
>> HTH
>> Rainer
>>
>> --
>>> Pavel Levshin
>>>
>>> _______________________________________________
>>> rsyslog mailing list
>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>> http://www.rsyslog.com/professional-services/
>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
>>> DON'T LIKE THAT.
>>>
>>
>
> _______________________________________________
> Lognorm mailing list
> Lognorm at lists.adiscon.com
> http://lists.adiscon.net/mailman/listinfo/lognorm
>