[rsyslog] output plugin calling interface

david at lang.hm david at lang.hm
Wed May 6 15:26:34 CEST 2009


On Wed, 6 May 2009, Rainer Gerhards wrote:

>> -----Original Message-----
>> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
>> bounces at lists.adiscon.com] On Behalf Of david at lang.hm
>>
>> On Wed, 6 May 2009, Rainer Gerhards wrote:
>>
>>>
>>> I have put on interim version of my model online:
>>>
>>> http://www.rsyslog.com/download/design.pdf
>>>
>>> Note that it is far from being ready, and some things are not as clear
> (or
>>> even correct) as I would like to see them. Also, it currently contains
>>> definitions for the various objects, but not any processing information.
> My
>>> aim is to get a very clear (and dense!) picture of the objects and the
>>> relationships between them from where I am convinced we can find some
> simple
>>> processing functions (which may or may not be easily enough translated in
>>> code).
>>>
>>> Feedback is appreciate, but please keep in mind that the document is a
>> moving
>>> target. I'll be working on it full day today so I hope to have a better
>>> version this afternoon.
>>
>> thanks for sending this.
>
> I have been updating it throughout the day, so I right now another version
> than you downloaded may be available.
>>
>> unfortuntntly one question right away (and one that doesn't translate into
>> text well :-(
>>
>> at the end of the first paragraph you say (cut-n-paste into text, symbols
>> mangled)
>>
>> 'Often, objects Oi; i 2 N; i  partition O,
>> but this is not necessarily the case.'
>>
>> I'm not sure what is meant by this
>>
>> I read that as
>> often
>> the set of objects identified by i, with i being a subset of N such that i
> <=
>> |gothic O|
>> partition gothic O, but this is not nessasarily the case
>> I am lost from the point N is introduced
>
> It just describes i as an index. So, we have these "all-sets" gothic-o of all
> instances that exist in a running system. When you build sets O from these
> instances, they may or may not partition the all-sets. As I do not know how
> many of these O-sets exist, I use O_i to index them. So let's assume we have
> 5 instances e.g. of a message inside rsyslog. Then we have 5 or less of the
> O-sets inside the system. With I, that samply means we have O_1, O_2, ...,
> O_5 sets. Often messages may be in just one of these sets at a given time
> (partitions them). Now replace "message" with any of the other objects. That
> means that while often the all-set is partitioned in sets, this must not
> necessarily be the case (one object may be in two O-sets at the same time).

Ok, makes sense

>> in section 2 you say
>>
>> be Sm={} the totally ordered set of message states
>>
>>   I think a better thing to say would be either
>>
>> define Sm={} as the totally ordered set of messge states
>> or
>> let Sm={} be the totally orderd set of message states
>>
>>
>> it may just be that I have been out of school for a long time and am
>> forgetting defintitions, but it seems to me that you ae skipping the
>> defintitions of a bunch of stuff in section 2. I think I'm puzzling it
>> out, but eventually it's something to clean up.
>
> That's why I said it is preliminary ;)
>
> In fact, the current version has mangled much with the state sets, but not
> yet in a consistent way. I am not even sure if the structure of section 2
> will survive or be moved to some other parts. While this is tempting, it is
> nice to have everything very condendsed close together, after all, that was
> the initial motivation.
>
> The sentences you quote above are also changed now. Besides that, I noticed
> that I a missing a couple of German <-> English translations, which add some
> extra problems ;)
>
>> you use the term 'tuple' a lot in this section, and i'm not sure it's
>> always correct. to me a tuple is a set of two items,
>
> that's an ordered pair...
>
>> like coordinates
>> (X,Y), but you use it in at least one case where you can have an arbatrary
>> number of items in the list. I think that in many cases where you say
>> tuple you really would be better off saying list.
>
> ... that's mostly the same term, but I have to admit that in the context I am
> using it, at least to me tuple sounds more intuitive. In code it isn't. At
> least Wikipedia seems to take the same position:
>
> http://en.wikipedia.org/wiki/Tuple

now that I think about it, I've seen people use n-tuple to refer to 
arbatrarily wide sets.

it just struck me as odd

>> in section 3.11 you say
>> "However, the action itself may also end the transaction and notify the
>> caller." by this do you mean that the action may abort the transaction? or
>> that the action could decide to commit (complete) the transaction?
>
> The later case.
>
>>
>> if you mean abort the transaction, this makes sense (essentially on any
>> doAction() call the return code could be 'fatal error, transaction
>> aborted' and the queue walker code would have to fail the entire batch and
>> retry) if you mean allow it to decide to commit the transaction early if
>> it chooses, this strikes me as a wrong thing to do.
>
> You need to think broader than databases. For example, the tcp forwarder
> NEEDS to commit every record, simply because it has no other chance in doing
> things. Well, it may commit only after a given buffer size, but it definitely
> can not (or should not) wait until the caller is finished. Even if it waited
> until endTransaction is called(), it could only then move data to the actual
> output, where it than - right in the middle - may see problems. It is far
> better if it can commit in between.

I'm not sure I agree with you on this. if you have lots of small messages, 
you do have advantages to sending them all to the stack at once. you don't 
_have_ to do so (tcp will combine the messages as it's waiting to send 
them out, but you could waste bandwith with small packets if the data 
comes in slowly enough)

>> in the case of rsyslog
>> (where we are commiting a set of unrelated messages) it is not nessasarily
>> a fatal problem, but it just seems to complicate things with little
>> benifit.
>
> It actually simplifies things - because we need not take different approaches
> to different type of output plugins.

I'm seeing it as the other way around, this complicates things by making 
there be different ways for the transaction to be committed.

>> this says that either the entire transaction is submitted, or it all
>> fails, I think that the optimization of allowing the endTransaction() call
>> to return 'the first X suceeded, the rest failed' may be worth supporting.
>> it's FAR simpler than the 'doAction() may trigger an endTransaction()
>> transparently' that you were exploring earlier.
>
> I may be thinking wrong, but this sounds much more complex to me (again, do
> not think databases at this time). Just think about the buffering needs.

I'm not seeing how this increases the buffering needed

>> if you really do want doAction() to be able to finish a transaction and
>> start another one, the state diagram will be far more complicated.
>>
>> one final note on locking, I expect that the process of processing objects
>> in the queue (marking them as pending, formatting them, and calling
>> doAction() on them), is going to require some locking in the face of
>> multiple worker threads (to prevent two threads from processing the same
>> message). I see two ways of doing this.
>>
>> processbatch(
>>    foreach message (up to limit or number in queue){
>>      lock queue
>>      mark message pending
>>      unlock queue
>>      formatMessage()
>>      doAction()
>>    }
>>    endTransaction()
>>    lock queue
>>    foreach message{
>>      mark completed
>>    }
>>    unlock queue
>> )
>>
>> processbatch(
>>    lock queue
>>    foreach message (up to limit or number in queue){
>>      mark message pending
>>      formatMessage()
>>      doAction()
>>    }
>>    unlock queue
>>    endTransaction()
>>    lock queue
>>    foreach message{
>>      mark completed
>>    }
>>    unlock queue
>> )
>
> doAction cannot be called within the queue worker. A simple reason is that
> this does not support direct mode. Also, it would take far too long to
> complete. If we have infinite retries, it may sit for a day or more inside
> this call ;) [not precisely in that call, but in a loop surrounding it].

Ok, I am missing things again. I had understood that doAction() _was_ 
called by the queue worker, that was how it passed the item to the output 
module code, and it would block if the doAction() call stalled (if you 
need to avoid it blocking, define an action queue)

> If we need to close the tiny potential message loss window (and you have not
> yet convinced me there is reason to do so - looking forward to your
> disk-queue results), we must still do a "dequeue pending", then process "as
> usual" and go over the batch again to actually remove messages from the
> queue.

yep, still pending. I had an emergancy come up, that among other things 
had me in the office from 4pm saturday to 2pm sunday :-(

>> I suspect that the overhead in manipulating the lock is high enough that
>> the second approach will be a win (similar to the efficiancies that were
>> gained in the UDP input module by letting it add multiple messages to the
>> queue inside one lock).
>>
>> As such I am seeing significant value in making the doAction() call be
>> lightweight under all conditions, which is an argument against having it
>> do any more than nessasary.
>
> We do not know what this may be. Again, don't think "database only".

I am not. even if the action is writing to disk (with fsync), to a pipe, 
or calling an external program it can take a significant amount of time. 
you say above that doAction could block for hours, so I am confused a bit 
here. (the new version of the document may clear this up)

>> I think that the beginTransaction()
>> functionality is probably fast enough that adding that into doAction() is
>> acceptable, but the endTransaction() functionality definantly has the
>> potential to block for a substantial amount of time (and therefor must not
>> happen while locks are held that could cause other threads to be waiting)
>>
>> except for your isolation of the output modules (which I have questioned
>> elsewhere), it could even be a win to move the formatMessage() step out of
>> the inner loop and have that called from the output module (doAction()
>> would just pass the pointer to the queue object, the output module would
>> remember them and do the formatting inside endTransaction()), this would
>> still be thread-safe as the output module would only be reading the
>> message queue.
>>
>> hmm, the more I think about this, the cleaner it seems to be.
>
> The problem is that you are still looking from a different design ;) Why
> should the output do this if it can be provided with the already fabricated
> information? At this stage in processing, no looking at all is involved
> (except for the action lock, which is required because the interface
> guarantees that actions are not called concurrently from multiple threads).

if you don't have locking, what stops two worker threads from trying to 
de-queue the same message at the same time?

reasons that I see to not format the message before putting it in the 
action queue.

1. performance bottleneck, if the queue walker for the main queue needs to 
format the message, it can't be moving messages, testing filters. for the 
default setup where you just use the entire string this isn't that big a 
deal, but if you have a more complicated format (escaping characters, 
doing substrings, etc) it can take substantially more time, some of that 
time is spent just parsing the format string to figure out what you need 
to do.

2. locking efficiancy this could be part of #1, if you are doing a lot 
then you want to drop and later re-aquire any locks you have so that other 
threads can get at the data, if you are doing very little then you can 
avoid doing.

3. single-instance-store, if you don't change the message between the main 
queue and the action queue you have the ability to just have one copy of 
the message contents (which you dynamicly allocate space for), instead of 
one copy for each queue. since memory is significantly slower than the 
CPU, avoiding the need to copy the data can speed things up as well

>> It would also delay the need to do any buffer allocation until the
>> endTransaction() step. this doesn't nessasarily eliminate the peak memory
>> usage (all threads could be in endTransaction at the same time), but it
>> will significantly reduce the average memory useage (normally they _won't_
>> all be in endTransaction() with maximum size messages at the same time)
>>
>> there would be a fixed size array (based on the max batch size), to
>> track what messages are in the batch. I think this is already needed for
>> the worker thread to know which messages to mark as completed when the
>> transaction completes
>>
>> doAction() would just put a pointer (logical pointer, not C memory
>> pointer) to the message contents into the next slot in the array.
>>
>> there would be no need for a startTransaction.
>>
>> there would be a helper function, something like
>> formatMessage(char *format, msg *message, char *output, int bufsize)
>> that would format the message and write it to the output buffer, output
>> would then point at the null at the end of the string
>>
>> endTransaction() would do all the work. it would allocate a buffer
>> needed (note that it would know the actual message sizes, so could
>> allocate based on the actual amount of data involved), do the work
>> that was planned for startTransaction() (probably putting
>> boilerplate in the buffer), call formatMessage() for each message in the
>> list, output the messages, free the buffers, and return
>
> That complicates the programming of all non-db type of output plugins
> considerably. I don't think this is justified.

I am not seeing what's db specific here, but I'll think about it more

>> note that this assumes that when you have action queues, the action queues
>> contain the same data that was in the main queue, and the formatting is
>> done by the queue walker thread that called doAction(), not by the thread
>> that walks the main queue and puts the messages in the action queue.
>
> That's correct.

this seems to contridict what you said above.

as I understand it (this is part of the diagram I didn't get created 
yet )-:

rsyslog basicly breaks down into the following pieces from a 
process/thread point of view


master thread (housekeeping, handling signals, etc)

N input threads
   recieve messages
   parse each message into msg structure
   put the parsed msg structure in the main queue

N main queue worker threads
   process messages in the main queue
   foreach message
     foreach filters
       if filter applies
         perform action (which is usually)
           format the message
           put formatted message into the action queue
             if action queue mode is DIRECT call doAction()

N action queue worker threads
   process messages in the action queue
   foreach message call doAction()



More information about the rsyslog mailing list