[rsyslog] output plugin calling interface

Rainer Gerhards rgerhards at hq.adiscon.com
Wed May 6 14:36:47 CEST 2009


> -----Original Message-----
> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
> bounces at lists.adiscon.com] On Behalf Of david at lang.hm
> Sent: Wednesday, May 06, 2009 1:58 PM
> To: rsyslog-users
> Subject: Re: [rsyslog] output plugin calling interface
> 
> On Wed, 6 May 2009, Rainer Gerhards wrote:
> 
> >
> > I have put on interim version of my model online:
> >
> > http://www.rsyslog.com/download/design.pdf
> >
> > Note that it is far from being ready, and some things are not as clear
(or
> > even correct) as I would like to see them. Also, it currently contains
> > definitions for the various objects, but not any processing information.
My
> > aim is to get a very clear (and dense!) picture of the objects and the
> > relationships between them from where I am convinced we can find some
simple
> > processing functions (which may or may not be easily enough translated in
> > code).
> >
> > Feedback is appreciate, but please keep in mind that the document is a
> moving
> > target. I'll be working on it full day today so I hope to have a better
> > version this afternoon.
> 
> thanks for sending this.

I have been updating it throughout the day, so I right now another version
than you downloaded may be available. 
> 
> unfortuntntly one question right away (and one that doesn't translate into
> text well :-(
> 
> at the end of the first paragraph you say (cut-n-paste into text, symbols
> mangled)
> 
> 'Often, objects Oi; i 2 N; i  partition O,
> but this is not necessarily the case.'
> 
> I'm not sure what is meant by this
> 
> I read that as
> often
> the set of objects identified by i, with i being a subset of N such that i
<=
> |gothic O|
> partition gothic O, but this is not nessasarily the case
> I am lost from the point N is introduced

It just describes i as an index. So, we have these "all-sets" gothic-o of all
instances that exist in a running system. When you build sets O from these
instances, they may or may not partition the all-sets. As I do not know how
many of these O-sets exist, I use O_i to index them. So let's assume we have
5 instances e.g. of a message inside rsyslog. Then we have 5 or less of the
O-sets inside the system. With I, that samply means we have O_1, O_2, ...,
O_5 sets. Often messages may be in just one of these sets at a given time
(partitions them). Now replace "message" with any of the other objects. That
means that while often the all-set is partitioned in sets, this must not
necessarily be the case (one object may be in two O-sets at the same time).

> 
> I think that you are saying that a set of messages to be processed  are
> frequently contiguous, but not always.

Important is that o is not "message", but object. Here, o can be anything, be
it a message, an action, a queue, whatever. It's just a generic container.

> 
> 
> 
> in section 2 you say
> 
> be Sm={} the totally ordered set of message states
> 
>   I think a better thing to say would be either
> 
> define Sm={} as the totally ordered set of messge states
> or
> let Sm={} be the totally orderd set of message states
> 
> 
> it may just be that I have been out of school for a long time and am
> forgetting defintitions, but it seems to me that you ae skipping the
> defintitions of a bunch of stuff in section 2. I think I'm puzzling it
> out, but eventually it's something to clean up.

That's why I said it is preliminary ;) 

In fact, the current version has mangled much with the state sets, but not
yet in a consistent way. I am not even sure if the structure of section 2
will survive or be moved to some other parts. While this is tempting, it is
nice to have everything very condendsed close together, after all, that was
the initial motivation.

The sentences you quote above are also changed now. Besides that, I noticed
that I a missing a couple of German <-> English translations, which add some
extra problems ;)

> you use the term 'tuple' a lot in this section, and i'm not sure it's
> always correct. to me a tuple is a set of two items, 

that's an ordered pair... 

> like coordinates
> (X,Y), but you use it in at least one case where you can have an arbatrary
> number of items in the list. I think that in many cases where you say
> tuple you really would be better off saying list.

... that's mostly the same term, but I have to admit that in the context I am
using it, at least to me tuple sounds more intuitive. In code it isn't. At
least Wikipedia seems to take the same position:

http://en.wikipedia.org/wiki/Tuple

> 
> 
> 
> in section 3.11 you say
> "However, the action itself may also end the transaction and notify the
> caller." by this do you mean that the action may abort the transaction? or
> that the action could decide to commit (complete) the transaction?

The later case.

> 
> if you mean abort the transaction, this makes sense (essentially on any
> doAction() call the return code could be 'fatal error, transaction
> aborted' and the queue walker code would have to fail the entire batch and
> retry) if you mean allow it to decide to commit the transaction early if
> it chooses, this strikes me as a wrong thing to do. 

You need to think broader than databases. For example, the tcp forwarder
NEEDS to commit every record, simply because it has no other chance in doing
things. Well, it may commit only after a given buffer size, but it definitely
can not (or should not) wait until the caller is finished. Even if it waited
until endTransaction is called(), it could only then move data to the actual
output, where it than - right in the middle - may see problems. It is far
better if it can commit in between.

> in the case of rsyslog
> (where we are commiting a set of unrelated messages) it is not nessasarily
> a fatal problem, but it just seems to complicate things with little
> benifit.

It actually simplifies things - because we need not take different approaches
to different type of output plugins.

> 
> also the fact that you use the term 'notify the caller' could be read to
> mean that it executes a callback function from the main system in this
> case, I think it's simpler to just return an error code (which is the
> other way to read this)

That's already in the newest draft [but other things are probably
inconsistent in that].

> 
> a bit further down you clarify this to mean that it may commit early. my
> initial reaction is that this is premature optimization. I see why you
> want to do this (the case of buffers filling up is a good example), but I
> think the complications that this produces (which you then touch on) are
> ugly enough that it may be worth the inefficiancy of failing the
> transaction (and having to resubmit a partial batch) rather than dealing
> with the need to say things like 'the transaction suceeded, but this
> message failed' as part of a doAction() call.
> 
> 
> 
> 
> in defining what happens when a transaction is aborted you say 'However,
> not all outputs work on actually transactional destination.', I think the
> better way to say this (borrowing from the database terminology) is that
> not all outputs provide atomic transactions.
> 
> 
> you say
> 
>    An output transaction is started by calling beginTransaction() either
> ex-plicitely or implicitely by a call to doAction() without calling
> beginTransaction() before
> 
> I think it's a mistake to do both. pick one or the other and standarzie on
> it. I see the value in letting the first doAction() call trigger the
> beginTransaction() in simplifying the calling code, but if the output
> module must handle this case, make that the standard way of doing things
> and eliminate the seperate call of beginTransaction entirely

Good point! Makes a lot of sense.

> 
> 
> 
> this says that either the entire transaction is submitted, or it all
> fails, I think that the optimization of allowing the endTransaction() call
> to return 'the first X suceeded, the rest failed' may be worth supporting.
> it's FAR simpler than the 'doAction() may trigger an endTransaction()
> transparently' that you were exploring earlier.

I may be thinking wrong, but this sounds much more complex to me (again, do
not think databases at this time). Just think about the buffering needs.
> 
> 
> in your transaction diagram, a sucessful retry should move to to sucess,
> not ready (ready is only a state that the action will be in immediatly
> after it starts, before it has processed anything. as such it's fairly
> meaningless and could probably be combined with success as 'all prior work
> is done and ready to start a new transaction'.

I just dumped the whole diagram. It still does not model action state
correctly...

> 
> you may also need a state 'not initialized' for startup, or you may just
> say that the initialization needs to be done before the queues are setup,
> so it's not relavent to this discussion.
> 
> 
> if you really do want doAction() to be able to finish a transaction and
> start another one, the state diagram will be far more complicated.
> 
> one final note on locking, I expect that the process of processing objects
> in the queue (marking them as pending, formatting them, and calling
> doAction() on them), is going to require some locking in the face of
> multiple worker threads (to prevent two threads from processing the same
> message). I see two ways of doing this.
> 
> processbatch(
>    foreach message (up to limit or number in queue){
>      lock queue
>      mark message pending
>      unlock queue
>      formatMessage()
>      doAction()
>    }
>    endTransaction()
>    lock queue
>    foreach message{
>      mark completed
>    }
>    unlock queue
> )
> 
> processbatch(
>    lock queue
>    foreach message (up to limit or number in queue){
>      mark message pending
>      formatMessage()
>      doAction()
>    }
>    unlock queue
>    endTransaction()
>    lock queue
>    foreach message{
>      mark completed
>    }
>    unlock queue
> )

doAction cannot be called within the queue worker. A simple reason is that
this does not support direct mode. Also, it would take far too long to
complete. If we have infinite retries, it may sit for a day or more inside
this call ;) [not precisely in that call, but in a loop surrounding it].

If we need to close the tiny potential message loss window (and you have not
yet convinced me there is reason to do so - looking forward to your
disk-queue results), we must still do a "dequeue pending", then process "as
usual" and go over the batch again to actually remove messages from the
queue.

> 
> I suspect that the overhead in manipulating the lock is high enough that
> the second approach will be a win (similar to the efficiancies that were
> gained in the UDP input module by letting it add multiple messages to the
> queue inside one lock).
> 
> As such I am seeing significant value in making the doAction() call be
> lightweight under all conditions, which is an argument against having it
> do any more than nessasary. 

We do not know what this may be. Again, don't think "database only". 

> I think that the beginTransaction()
> functionality is probably fast enough that adding that into doAction() is
> acceptable, but the endTransaction() functionality definantly has the
> potential to block for a substantial amount of time (and therefor must not
> happen while locks are held that could cause other threads to be waiting)
> 
> except for your isolation of the output modules (which I have questioned
> elsewhere), it could even be a win to move the formatMessage() step out of
> the inner loop and have that called from the output module (doAction()
> would just pass the pointer to the queue object, the output module would
> remember them and do the formatting inside endTransaction()), this would
> still be thread-safe as the output module would only be reading the
> message queue.
> 
> hmm, the more I think about this, the cleaner it seems to be.

The problem is that you are still looking from a different design ;) Why
should the output do this if it can be provided with the already fabricated
information? At this stage in processing, no looking at all is involved
(except for the action lock, which is required because the interface
guarantees that actions are not called concurrently from multiple threads).

> 
> It would also delay the need to do any buffer allocation until the
> endTransaction() step. this doesn't nessasarily eliminate the peak memory
> usage (all threads could be in endTransaction at the same time), but it
> will significantly reduce the average memory useage (normally they _won't_
> all be in endTransaction() with maximum size messages at the same time)
> 
> there would be a fixed size array (based on the max batch size), to
> track what messages are in the batch. I think this is already needed for
> the worker thread to know which messages to mark as completed when the
> transaction completes
> 
> doAction() would just put a pointer (logical pointer, not C memory
> pointer) to the message contents into the next slot in the array.
> 
> there would be no need for a startTransaction.
> 
> there would be a helper function, something like
> formatMessage(char *format, msg *message, char *output, int bufsize)
> that would format the message and write it to the output buffer, output
> would then point at the null at the end of the string
> 
> endTransaction() would do all the work. it would allocate a buffer
> needed (note that it would know the actual message sizes, so could
> allocate based on the actual amount of data involved), do the work
> that was planned for startTransaction() (probably putting
> boilerplate in the buffer), call formatMessage() for each message in the
> list, output the messages, free the buffers, and return

That complicates the programming of all non-db type of output plugins
considerably. I don't think this is justified.
> 
> 
> 
> 
> note that this assumes that when you have action queues, the action queues
> contain the same data that was in the main queue, and the formatting is
> done by the queue walker thread that called doAction(), not by the thread
> that walks the main queue and puts the messages in the action queue. 

That's correct.
> If that isn't the case (if instead, the action queues contain the formatted
> strings, not the raw messages), then this won't work, but we should have
> another discussion around this (including the potential advantages of
> single-instance-store if you have lots of action queues)

Rainer



More information about the rsyslog mailing list