[rsyslog] multi-message handling and databases
rgerhards at hq.adiscon.com
Mon Apr 20 19:57:55 CEST 2009
> -----Original Message-----
> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
> bounces at lists.adiscon.com] On Behalf Of david at lang.hm
> Sent: Monday, April 20, 2009 7:21 PM
> To: rsyslog-users
> Subject: Re: [rsyslog] multi-message handling and databases
> On Mon, 20 Apr 2009, Rainer Gerhards wrote:
> > David,
> > I start with some quick pointers. I think it makes sense to move the
> > of this discussion into a document - or alternatively move it to the
> wiki, if
> > you (or others) find this useful. I have to admit that I am a bit
> > about the wiki, I guess mail is better for discussion here. But I
> wanted to
> > mention this option.
> > Now on to the meat:
> >> -----Original Message-----
> >> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
> >> bounces at lists.adiscon.com] On Behalf Of david at lang.hm
> >> Sent: Saturday, April 18, 2009 12:29 AM
> >> To: rsyslog-users
> >> Subject: [rsyslog] multi-message handling and databases
> >> the company that I work for has decided to sponsor multi-message
> >> output capability, they have chosen to remain anonymous (I am
> >> from
> >> my personal account)
> >> there are two parts to this.
> >> 1. the interaction between the output module and the queue
> >> 2. the configuration of the output module for its interaction with
> >> database
> >> for the first part (how the output module interacts with the queue),
> >> the
> >> criteria are that
> >> 1. it needs to be able to maintain guaranteed delivery (even in the
> >> face
> >> of crashes, assuming rsyslog is configured appropriately)
> >> 2. at low-volume times it must not wait for 'enough' messages to
> >> accumulate, messages should be processed with as little latency as
> >> possible
> >> to meet these criteria, what is being proposed is the following
> >> a configuration option to define the max number of messages to be
> >> processed at once.
> >> the output module goes through the following loop
> > This sentence covers much of the complexity of this change ;)
> > The "problem" is that it is the other way around. It is not the
> output module
> > that asks the queue engine for data, it is the queue engine that
> pushes data
> > to the output module. While this sounds like a simple change of
> positions, it
> > has greater implications.
> > ... especially if you think about the data flow. At this point, it
> may make
> > sense to review the data flow. I have described it here:
> > http://www.rsyslog.com/Article350.phtml
> I will do this later today.
> > Even if you don't listen to the presentation, the diagram is useful.
> In it,
> > you see there are n queues, with n being 1 + number of actions. The
> > is the main message queue. So each message moves first into the main
> > is dequeued there (in the push-way described above), run through the
> > engine and then placed into the relevant action queues.
> > So the new interface does not necessarily need to modify the main
> queue (but
> > there is much benefit in doing so). But it must change the way action
> > deliver messages. That, in turn, means that the new batch mode can
> only work
> > if the action is configured to use any actual queueing mode (not the
> > "DIRECT" mode, where incoming messages are directly handed over to
> the action
> > processing without any actual in-memory buffering).
> hmm, I suspect that having the 'direct' mode able to do this IFF (if
> and only if) all output modules are able to do the multi-message
> would be a win.
You can't do that, because if it is in direct mode, there always is at most
one message inside the queue. You can not operate on the main message queue
"batch", as this is not yet filtered, so you do not know which message is for
which action. So, from the action perspective, nothing is queued at this
point. Thus, you need a queue running in a real queue mode. I hope it will
become more clear if you have looked at the data flow (otherwise I need to
write some big overview about it...).
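The chained-queue picture described above can be sketched as follows. This is only an illustration under stated assumptions, not rsyslog's actual code: the names `Action`, `submit`, `flush` and `process_main_queue` are hypothetical, and filtering is reduced to a simple predicate. It shows why an action in direct mode only ever sees one message at a time, while an action with a real queue can be handed a batch.

```python
from collections import deque


class Action:
    """Hypothetical stand-in for one action and its own queue."""

    def __init__(self, name, matches, direct=False):
        self.name = name
        self.matches = matches      # filter predicate (assumption)
        self.direct = direct        # True = "DIRECT" mode, no buffering
        self.queue = deque()        # action queue (unused in direct mode)
        self.batches = []           # what the output module was handed

    def submit(self, msg):
        if self.direct:
            # Direct mode: the message is handed over immediately, so the
            # output module never sees more than one message at once.
            self.batches.append([msg])
        else:
            self.queue.append(msg)

    def flush(self, max_batch):
        # Real queue mode: filtered messages have accumulated, so they
        # can be pushed to the output module in batches.
        while self.queue:
            n = min(max_batch, len(self.queue))
            self.batches.append([self.queue.popleft() for _ in range(n)])


def process_main_queue(main_queue, actions, max_batch):
    while main_queue:
        # Dequeue several messages at once from the main queue. They are
        # not filtered yet, so this batch cannot go to any action as-is.
        n = min(max_batch, len(main_queue))
        batch = [main_queue.popleft() for _ in range(n)]
        for msg in batch:
            for action in actions:
                if action.matches(msg):
                    action.submit(msg)
    for action in actions:
        action.flush(max_batch)


kern = Action("kern", lambda m: m.startswith("kern"))
mail = Action("mail", lambda m: m.startswith("mail"), direct=True)
main = deque(["kern: a", "mail: b", "kern: c", "mail: d"])
process_main_queue(main, [kern, mail], max_batch=3)
```

In this sketch the queued action ends up with one batch of two messages, while the direct-mode action is called twice with one message each, even though the main queue itself was dequeued in batches.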
> specifically I expect to find that the locking process to deliver a
> message is expensive enough
This is handled by the main queue batch. So even in direct mode, we have the
benefit from the locking code improvement (I agree, potentially a *very big*
gain). I guess you currently think of a single big queue inside rsyslog,
which is the wrong picture. We have chained queues and you always need to
look which part of the message processing works on which queues. Very
> that it's a big win even for the simple
> default case of writing to a file. I also expect to see wins for moving
> events from the main queue to the action queues.
Yup, thus the direct mode of the action queue does not affect the main queue
at all (and in direct mode we have no locking in the action queues, why should
we ... nothing needs to be synchronized if you just stick the message into
> > So the approach is probably to enhance the queue object (which drives
> > the main and action queues) to support dequeueing of multiple
> messages at
> > once (which, as a side effect, will also greatly reduce locking
> > Under normal operations, this is relatively straightforward.
> so far so good.
> > It gets messy when there is failure in the actions and it gets very
> > if we think about the various shutdown scenarios (not to mention disk
> > assisted queues actually running in DA mode). I have begun to look at
> > issues (part of today's and over-the-weekend thinking ;)), but this
> > probably need some more time to finally solve - plus some discussion,
> > guess...
> would it simplify things significantly to say that the multi-message
> output and having multiple worker threads are exclusive?
Unlikely (but I don't want to rule it out entirely; probability less than 5%)
> >> X=max_messages
> >> if (messages in queue)
> >> mark that it is going to process the next X messages
> >> grab the messages
> >> format them for output
> >> attempt to deliver the messages
> >> if (messages delivered successfully)
> >> mark messages in the queue as delivered
> >> X=max_messages (reset X in case it was reduced due to delivery
> >> errors)
> >> else (delivering this batch failed, reset and try to deliver the
> >> first half)
> > I think, in our previous discussion (mailing list archive), we
> concluded that
> > there is no value in re-trying with half of the batch.
> very possibly, I'm not remembering it.
> not doing so will simplify the code considerably, but the advantages of
> retrying with half the batch are:
> 1. you deliver as much as you can
> 2. when you finally get stuck, you can pinpoint directly what message
> were stuck on (in case you have a failure based on the data, say quotes
> something that then gets formatted into a database, or slashes in
> something that becomes a filename component)
> your call
I need to refer you back to our previous discussion. Unfortunately, it was
private. I dug the link out and sent it via private mail. Sorry all others,
please stand by a little moment. If I have not read it wrong, it boiled down
to we have no non-transactional sources that were problematic and we had not
identified cases where it would be useful to retry with fewer elements.
I'd provide a more complete description, but that would probably take me
another 2...4 hours, and I hope to get around (yes, it was a reeeaaaly long
discussion). David, if you like to quote anything from me, feel free to do
> >> unmark the messages that it tried to deliver (putting them back
> >> into the status where no delivery has been attempted)
> >> X=int(# messages attempted / 2)
> >> if (X == 0)
> >> unable to deliver a single message, do existing message error
> >> process
> >> this approach is more complex than a simple 'wait for X messages,
> >> insert them all', but it has some significant advantages
> >> 1. no waiting for 'enough' things to happen before something gets
> >> written
> >> 2. if you have one bad message, it will transmit all the good
> >> before the bad one, then error out only on the bad one before
> >> up
> >> with the ones after the bad one.
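The proposed loop can be sketched in Python as below. This is a minimal illustration of the pseudocode in this message, not an rsyslog implementation: `drain_queue`, `deliver` and `on_undeliverable` are hypothetical names, the "mark/unmark" steps are modeled by simply leaving messages in the queue until delivery succeeds, and the "existing message error process" is assumed to be a callback that takes the single bad message off the queue.

```python
from collections import deque
from itertools import islice


def drain_queue(queue, deliver, max_messages, on_undeliverable):
    """Deliver everything in `queue`, halving the batch after a failure."""
    if max_messages < 1:
        raise ValueError("max_messages must be at least 1")
    x = max_messages
    while queue:
        # Take whatever is there, up to x; never wait for a full batch.
        batch = list(islice(queue, x))
        if deliver(batch):
            # Only now are the messages removed ("marked as delivered").
            for _ in batch:
                queue.popleft()
            x = max_messages        # reset in case it was reduced
        else:
            # Messages stay queued ("unmarked"); retry with half.
            x = len(batch) // 2
            if x == 0:
                # A single message failed on its own: hand it to the
                # (assumed) error handler and carry on with the rest.
                on_undeliverable(queue.popleft())
                x = max_messages


delivered, rejected = [], []


def deliver(batch):
    # Toy output: the whole batch fails if it contains the bad message.
    if "BAD" in batch:
        return False
    delivered.extend(batch)
    return True


queue = deque(["a", "b", "BAD", "c", "d"])
drain_queue(queue, deliver, 4, rejected.append)
```

With this toy input the good messages before and after the bad one are delivered, and only the bad one reaches the error handler. Whether the extra delivery attempts are worth it for a real output is exactly the open question in this thread.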
> > This needs to be specified. Again, I think our prior conclusion was
> that this
> > would not make much sense. After all, if e.g. a SQL statement is
> invalid in
> > the template, how should it recover? If the sql statement is correct,
> > should it eternally fail? Or should we drop a message if it fails
> after n
> > attempts (OK, we can do that already ;)). Hard to do for non-
> > outputs.
> as noted above, I'm thinking in terms of the data in the particular log
> message being something that it shouldn't be, that causes problems for
> output module
> for databases this could be quotes
> for file output with dynamic files you could get a hostname or program
> that has a slash (or ../../../../../../etc/shadow) in it.
> in theory these should all be detected by the module and scrubbed
> being submitted, in practice bugs happen (especially if/when rsyslog
> starts dealing with unicode messages), being able to pinpoint 'this is
> message that I was unable to deal with' is very helpful.
> with a vector interface, another option would be to allow the output
> module to report back how many of the submitted messages it successfully
> delivered. that way any 'retry half' type logic could be in the module,
> and only if it makes sense. for a file output module, if you ran out of
> disk space partway through the write, it could report on the number
> it successfully wrote.
> as I said before, your call.
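The "report back how many were delivered" variant could look roughly like the sketch below. It is an assumption-laden illustration only: `drain_with_partial_ack` and `TinyDisk` are made-up names, and the output is assumed to accept a prefix of the batch and return the count it managed to write.

```python
from collections import deque
from itertools import islice


def drain_with_partial_ack(queue, deliver, max_messages):
    """Push batches; `deliver` returns how many leading messages it took."""
    while queue:
        batch = list(islice(queue, max_messages))
        done = deliver(batch)
        # Commit only the part the output module confirmed.
        for _ in range(done):
            queue.popleft()
        if done < len(batch):
            # Partial delivery: the rest stays queued; any retry or
            # suspend policy is left to the caller.
            return False
    return True


class TinyDisk:
    """Toy file output that runs out of space after `capacity` messages."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.written = []

    def write_batch(self, batch):
        room = self.capacity - len(self.written)
        accepted = batch[:room]
        self.written.extend(accepted)
        return len(accepted)


disk = TinyDisk(capacity=3)
pending = deque(["m0", "m1", "m2", "m3", "m4"])
finished = drain_with_partial_ack(pending, disk.write_batch, 4)
```

Here the first batch of four is only partly written, the three confirmed messages are removed, and the remaining two stay queued. Any "retry half" logic would then live inside the output module, where it makes sense for that particular output.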
Let's go through the previous argument first. We are re-iterating ;)
> David Lang