[rsyslog] output plugin calling interface

david at lang.hm david at lang.hm
Wed May 6 18:51:17 CEST 2009


On Wed, 6 May 2009, Rainer Gerhards wrote:

>>>>>> in the case of rsyslog
>>>>>> (where we are commiting a set of unrelated messages) it is not
>>> nessasarily
>>>>>> a fatal problem, but it just seems to complicate things with little
>>>>>> benifit.
>>>>>
>>>>> It actually simplifies things - because we need not take different
>>>> approaches
>>>>> to different type of output plugins.
>>>>
>>>> I'm seeing it as the other way around, this complicates things by making
>>>> there be different ways for the transaction to be committed.
>>>
>>> OK, but what is complicated by that? Also think about third-party
> (already
>>> existing) output modules, which I need to define either a totally
> different
>>> output interface for or use the the extensible one I described. Even if I
>>> force a totally new interface, I still need to support non-transactional
>>> outputs in the upper layers. But then I need different code pathes to do
>>> that.
>>
>> I was thinking about this, and am wondering if it's really a bad thing to
>> seperate the code paths.
>>
>> I'm thinking along the lines that if the module doesn't support the
>> doTransaction method you set batch_size=1 and use the old doAction()
>> interface.
>
> Oh, that's a *very serious* performance hit. Because batches not only support
> transactions but are the defining factor for the number of queue lock
> operations on a busy system. So with a batch size of 32, you do 1 lock. With
> a size of 1, you do 32 locks - both to process 32 messages.

exactly as I expected. note that the performance hit is no worse than how 
things work today.

> And now consider the usual config, where almost everything runs in direct
> mode. So if at least one output does not support batches, the main message
> queue must run at batches of one.
>
> Plus, I need complex logic to detect all that. A unified interface is simple
> and elegant and does not show any of these problems.

yes, you need to either limit your batch size to the minumum of the batch 
sizes defined for all the outputs that you are dealing with, or you need 
very complicated logic

for example:

does a message that is filtered out count against the batch size?

if you get 10,000 logs/sec, set a batch size of 100, and 1% of the log 
messages need to go to your database, does this mean that you will end up 
doing 100 database transactions/sec, or 1 database transaction/sec?

there are a lot of interesting and nasty interactions that you can get in 
these situations.


I was thinking that anyone wanting to get top performance would basicly 
be required to define seperate action queues (not nessasarily large ones, 
I would say probably ~batch size * (# worker threads +1) would be a good 
default) and the main queue walker would just filter the messages into the 
action queues. then the action queue walkers would do the batching. no 
need to create a new way to buffer things, we already have a buffer 
mechanism, it's called a seperate action queue.

>> if it does support the doTransaction method you use that exclusivly (the
>> module internally may share code between doTransaction and doAction, and
>> most will)
>
> What would that simplify?

I am seeing that this would complicate the setup code a bit, but simplify 
the logic for processing messages from the queue as that wouldn't have to 
deal with batch and non-batch modes, it would only have the batch mode.

>> as it is you already will have code paths that you only follow in the case
>> where the module supports doTransaction, is it really simpler to have the
>> two partially combined instead of keeping them seperate?
>
> These separate code pathes will probably be a single "if" statement that does
> the endTransaction() call at the end of the batch. So it is actually a single
> statement and three lines of code. The rest, I think, is absolutely the same.

I'm not seeing that. it's not just the code at the end of the batch, it's 
also pulling the appropriate number of messages off the queue in the first 
place. so I would see the minimum as two 'if' blocks

>>>>>> I suspect that the overhead in manipulating the lock is high enough
> that
>>>>>> the second approach will be a win (similar to the efficiancies that
> were
>>>>>> gained in the UDP input module by letting it add multiple messages to
>>> the
>>>>>> queue inside one lock).
>>>>>>
>>>>>> As such I am seeing significant value in making the doAction() call be
>>>>>> lightweight under all conditions, which is an argument against having
> it
>>>>>> do any more than nessasary.
>>>>>
>>>>> We do not know what this may be. Again, don't think "database only".
>>>>
>>>> I am not. even if the action is writing to disk (with fsync), to a pipe,
>>>> or calling an external program it can take a significant amount of time.
>>>> you say above that doAction could block for hours, so I am confused a
> bit
>>>> here. (the new version of the document may clear this up)
>>>
>>> My sentence should better be phrased "can not be called where you
> described
>>> it". See my chart and you'll see that the queue mutex is never looked for
> an
>>> extended period of time - not even in ultra-reliable mode (part of why it
> is
>>> so hard to do it).
>>>
>>> My understanding is that you work under the assumption that doAction() is
>>> called during dequeueing. As we don't know what doAction() will do - it
> may
>>> be very lengthy - it is strictly de-coupled from anything that holds the
>>> queue mutex (this also applies to v3 and is the base design). You may
> take
>>> the word "base design" as a hint that probably the whole engine would
> need
>> to
>>> be rewritten if that design is to be changed. I do not see any reasons
> for
>>> such a change ;)
>>>
>>> I guess our discussion circles around the point that I did not yet convey
>> the
>>> full picture on how things work. At least I have the "feeling" that you
> have
>>> a different architecture on your mind.
>>
>> yes, from prior discussions I was thinking that you were planning to have
>> the queue worker iterate through the messaes on the queue, calling
>> doAction() for each one as it got to it, and then when it hits the limit
>> doing endTransaction()
>
> I am not sure if you overlooked that message, the new queue worker already
> *exists*. So you can look at it in actual code. Same for parts of the action
> interface (without the "real" transaction support).
>
> http://git.adiscon.com/?p=rsyslog.git;a=blob;f=runtime/queue.c;h=c2df928b6303
> 34449d117aa191ec1d502525346b;hb=multi-dequeue#l1396
>
> All of this, of course, is experimental and will not work if only the
> slightest error occurs somewhere in the system (and it may crash even if
> everything else runs fine ;))

I'll try to look at this

>>
>> if you are instead going to do what you just described (pull the entire
>> list of messages, then unlock the queue and process them) then it doesn't
>> matter if the work is done in doAction() or endTransaction()
>>
>> the question of if it makes sense to have the batch mode be a completely
>> seperate codepath, or have the two logical paths combined is still a good
>> one.
>>
>> thinking out loud here, how hard would it be to detect that there is no
>> processBatch() method and create one that is a wrapper around the
>> doAction() interface when you load the module, and then let all the code
>> after that use the new interface?
>
> That sounds pretty much like what I am proposing above ;)
>
> You can consider this to be the wrapper:
>
> http://git.adiscon.com/?p=rsyslog.git;a=blob;f=action.c;h=928b30dc3c5cbb2f6af
> 41cdf2ea02a3f1c14b06a;hb=multi-dequeue#l544
>
> This is the too-simplistic version, but I hope it conveys the idea.
>
> The bottom line is that it is more complicated to create a real wrapper than
> to implement the interface in the way I described. I fail to see the
> complexity that you are so concerned about...

Ok, I could be wrong here.

David Lang



More information about the rsyslog mailing list