[rsyslog] output plugin calling interface

david at lang.hm david at lang.hm
Wed May 6 18:06:32 CEST 2009


On Wed, 6 May 2009, Rainer Gerhards wrote:

>> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
>> bounces at lists.adiscon.com] On Behalf Of david at lang.hm
>
>>>> in section 3.11 you say
>>>> "However, the action itself may also end the transaction and notify the
>>>> caller." by this do you mean that the action may abort the transaction?
> or
>>>> that the action could decide to commit (complete) the transaction?
>>>
>>> The later case.
>>>
>>>>
>>>> if you mean abort the transaction, this makes sense (essentially on any
>>>> doAction() call the return code could be 'fatal error, transaction
>>>> aborted' and the queue walker code would have to fail the entire batch
> and
>>>> retry) if you mean allow it to decide to commit the transaction early if
>>>> it chooses, this strikes me as a wrong thing to do.
>>>
>>> You need to think broader than databases. For example, the tcp forwarder
>>> NEEDS to commit every record, simply because it has no other chance in
> doing
>>> things. Well, it may commit only after a given buffer size, but it
>> definitely
>>> can not (or should not) wait until the caller is finished. Even if it
> waited
>>> until endTransaction is called(), it could only then move data to the
> actual
>>> output, where it than - right in the middle - may see problems. It is far
>>> better if it can commit in between.
>>
>> I'm not sure I agree with you on this. if you have lots of small messages,
>> you do have advantages to sending them all to the stack at once. you don't
>> _have_ to do so (tcp will combine the messages as it's waiting to send
>> them out, but you could waste bandwith with small packets if the data
>> comes in slowly enough)
>
> I agree with you on the performance. I disagree that this means the output
> transaction and the transaction from the upper layer must exactly match.
> Think RELP. Would that mean that a relp window must always be as large as the
> largest batch? OK, in this case I control the protocol and so could change
> it. But what with a SNMP trap? Or a rfc3195 conversation? They *have*
> different notation of a transaction. So the best thing I can do IMHO is
> permit the output plugin to tell the engine when its transaction was
> finished. That's no problem for the upper layer, it just needs to mark these
> messages as committed. But it is simply impossible for all outputs to have
> the same idea of transaction than the upper layer may have.

I guess I see the upper layer having a larger definition of a batch than 
the lower level to be a configuration error.

not one to refuse to boot with, but one that could result in wasted 
effort.

in the situations you describe another way of handling it would be to wait 
until the endTransaction() call and then return that it only suceeded with 
the first N messages (N being the number that fit it's limit)

that is less efficiant than what you are proposing, (in that the messages 
>N need to be resubmitted), but it simplifies things by avoiding the need 
for the handling the unexpected commit (and buffering errors for the next 
call, etc)

it ends up combining this situation with the 'disk full, only could write 
X messages' issue. and makes it so that transaction results only happen 
when you tell it to do a transaction.

>>
>>>> in the case of rsyslog
>>>> (where we are commiting a set of unrelated messages) it is not
> nessasarily
>>>> a fatal problem, but it just seems to complicate things with little
>>>> benifit.
>>>
>>> It actually simplifies things - because we need not take different
>> approaches
>>> to different type of output plugins.
>>
>> I'm seeing it as the other way around, this complicates things by making
>> there be different ways for the transaction to be committed.
>
> OK, but what is complicated by that? Also think about third-party (already
> existing) output modules, which I need to define either a totally different
> output interface for or use the the extensible one I described. Even if I
> force a totally new interface, I still need to support non-transactional
> outputs in the upper layers. But then I need different code pathes to do
> that.

I was thinking about this, and am wondering if it's really a bad thing to 
seperate the code paths.

I'm thinking along the lines that if the module doesn't support the 
doTransaction method you set batch_size=1 and use the old doAction() 
interface.

if it does support the doTransaction method you use that exclusivly (the 
module internally may share code between doTransaction and doAction, and 
most will)

as it is you already will have code paths that you only follow in the case 
where the module supports doTransaction, is it really simpler to have the 
two partially combined instead of keeping them seperate?

>>>> if you really do want doAction() to be able to finish a transaction and
>>>> start another one, the state diagram will be far more complicated.
>>>>
>>>> one final note on locking, I expect that the process of processing
> objects
>>>> in the queue (marking them as pending, formatting them, and calling
>>>> doAction() on them), is going to require some locking in the face of
>>>> multiple worker threads (to prevent two threads from processing the same
>>>> message). I see two ways of doing this.
>>>>
>>>> processbatch(
>>>>    foreach message (up to limit or number in queue){
>>>>      lock queue
>>>>      mark message pending
>>>>      unlock queue
>>>>      formatMessage()
>>>>      doAction()
>>>>    }
>>>>    endTransaction()
>>>>    lock queue
>>>>    foreach message{
>>>>      mark completed
>>>>    }
>>>>    unlock queue
>>>> )
>>>>
>>>> processbatch(
>>>>    lock queue
>>>>    foreach message (up to limit or number in queue){
>>>>      mark message pending
>>>>      formatMessage()
>>>>      doAction()
>>>>    }
>>>>    unlock queue
>>>>    endTransaction()
>>>>    lock queue
>>>>    foreach message{
>>>>      mark completed
>>>>    }
>>>>    unlock queue
>>>> )
>>>
>>> doAction cannot be called within the queue worker. A simple reason is
> that
>>> this does not support direct mode. Also, it would take far too long to
>>> complete. If we have infinite retries, it may sit for a day or more
> inside
>>> this call ;) [not precisely in that call, but in a loop surrounding it].
>>
>> Ok, I am missing things again. I had understood that doAction() _was_
>> called by the queue worker,
>
> It depends on whether or not there is a queue worker. A direct queue does not
> have one, so here it is called by the enqueuer (but you are right, you may
> say "queue worker" as an abstraction. Still, the pseudocode above is very far
> from how things work. It is much more like this:
>
> queueworker{
>   lock queue
>   dequeue batch
>   unlock queue
>   process batch
> }
>
> If I assume it is useful to make this ultra-reliable (and I still doubt it
> is), that would change to something like this:
>
> queueworker{
>   lock queue
>   mark previous batch as done
>   dequeue batch & mark messages as being processed
>   unlock queue
>   process batch
> }
>
> Please note that the ultra-reliability looks rather simple in that
> pseudocode, it is much harder in reality.

Ok, makes sense. and by combining the marking of the prior batch completed 
inside the same lock as starting the next batch you avoid one set of lock 
transactions compared to what I was thinking.

>>>> I suspect that the overhead in manipulating the lock is high enough that
>>>> the second approach will be a win (similar to the efficiancies that were
>>>> gained in the UDP input module by letting it add multiple messages to
> the
>>>> queue inside one lock).
>>>>
>>>> As such I am seeing significant value in making the doAction() call be
>>>> lightweight under all conditions, which is an argument against having it
>>>> do any more than nessasary.
>>>
>>> We do not know what this may be. Again, don't think "database only".
>>
>> I am not. even if the action is writing to disk (with fsync), to a pipe,
>> or calling an external program it can take a significant amount of time.
>> you say above that doAction could block for hours, so I am confused a bit
>> here. (the new version of the document may clear this up)
>
> My sentence should better be phrased "can not be called where you described
> it". See my chart and you'll see that the queue mutex is never looked for an
> extended period of time - not even in ultra-reliable mode (part of why it is
> so hard to do it).
>
> My understanding is that you work under the assumption that doAction() is
> called during dequeueing. As we don't know what doAction() will do - it may
> be very lengthy - it is strictly de-coupled from anything that holds the
> queue mutex (this also applies to v3 and is the base design). You may take
> the word "base design" as a hint that probably the whole engine would need to
> be rewritten if that design is to be changed. I do not see any reasons for
> such a change ;)
>
> I guess our discussion circles around the point that I did not yet convey the
> full picture on how things work. At least I have the "feeling" that you have
> a different architecture on your mind.

yes, from prior discussions I was thinking that you were planning to have 
the queue worker iterate through the messaes on the queue, calling 
doAction() for each one as it got to it, and then when it hits the limit 
doing endTransaction()

if you are instead going to do what you just described (pull the entire 
list of messages, then unlock the queue and process them) then it doesn't 
matter if the work is done in doAction() or endTransaction()

the question of if it makes sense to have the batch mode be a completely 
seperate codepath, or have the two logical paths combined is still a good 
one.

thinking out loud here, how hard would it be to detect that there is no 
processBatch() method and create one that is a wrapper around the 
doAction() interface when you load the module, and then let all the code 
after that use the new interface?

David Lang



More information about the rsyslog mailing list