[rsyslog] output plugin calling interface

Rainer Gerhards rgerhards at hq.adiscon.com
Wed May 6 17:23:30 CEST 2009


> -----Original Message-----
> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
> bounces at lists.adiscon.com] On Behalf Of david at lang.hm
> Sent: Wednesday, May 06, 2009 3:27 PM
> To: rsyslog-users
> Subject: Re: [rsyslog] output plugin calling interface


> >> in section 3.11 you say
> >> "However, the action itself may also end the transaction and notify the
> >> caller." by this do you mean that the action may abort the transaction?
or
> >> that the action could decide to commit (complete) the transaction?
> >
> > The later case.
> >
> >>
> >> if you mean abort the transaction, this makes sense (essentially on any
> >> doAction() call the return code could be 'fatal error, transaction
> >> aborted' and the queue walker code would have to fail the entire batch
and
> >> retry) if you mean allow it to decide to commit the transaction early if
> >> it chooses, this strikes me as a wrong thing to do.
> >
> > You need to think broader than databases. For example, the tcp forwarder
> > NEEDS to commit every record, simply because it has no other chance in
doing
> > things. Well, it may commit only after a given buffer size, but it
> definitely
> > can not (or should not) wait until the caller is finished. Even if it
waited
> > until endTransaction is called(), it could only then move data to the
actual
> > output, where it than - right in the middle - may see problems. It is far
> > better if it can commit in between.
> 
> I'm not sure I agree with you on this. if you have lots of small messages,
> you do have advantages to sending them all to the stack at once. you don't
> _have_ to do so (tcp will combine the messages as it's waiting to send
> them out, but you could waste bandwith with small packets if the data
> comes in slowly enough)

I agree with you on the performance. I disagree that this means the output
transaction and the transaction from the upper layer must exactly match.
Think RELP. Would that mean that a relp window must always be as large as the
largest batch? OK, in this case I control the protocol and so could change
it. But what with a SNMP trap? Or a rfc3195 conversation? They *have*
different notation of a transaction. So the best thing I can do IMHO is
permit the output plugin to tell the engine when its transaction was
finished. That's no problem for the upper layer, it just needs to mark these
messages as committed. But it is simply impossible for all outputs to have
the same idea of transaction than the upper layer may have.

> 
> >> in the case of rsyslog
> >> (where we are commiting a set of unrelated messages) it is not
nessasarily
> >> a fatal problem, but it just seems to complicate things with little
> >> benifit.
> >
> > It actually simplifies things - because we need not take different
> approaches
> > to different type of output plugins.
> 
> I'm seeing it as the other way around, this complicates things by making
> there be different ways for the transaction to be committed.

OK, but what is complicated by that? Also think about third-party (already
existing) output modules, which I need to define either a totally different
output interface for or use the the extensible one I described. Even if I
force a totally new interface, I still need to support non-transactional
outputs in the upper layers. But then I need different code pathes to do
that.

> 
> >> this says that either the entire transaction is submitted, or it all
> >> fails, I think that the optimization of allowing the endTransaction()
call
> >> to return 'the first X suceeded, the rest failed' may be worth
supporting.
> >> it's FAR simpler than the 'doAction() may trigger an endTransaction()
> >> transparently' that you were exploring earlier.
> >
> > I may be thinking wrong, but this sounds much more complex to me (again,
do
> > not think databases at this time). Just think about the buffering needs.
> 
> I'm not seeing how this increases the buffering needed

That was related to the need to buffer to-be-processed messages until we
finally get an endTranscation(). 
> 
> >> if you really do want doAction() to be able to finish a transaction and
> >> start another one, the state diagram will be far more complicated.
> >>
> >> one final note on locking, I expect that the process of processing
objects
> >> in the queue (marking them as pending, formatting them, and calling
> >> doAction() on them), is going to require some locking in the face of
> >> multiple worker threads (to prevent two threads from processing the same
> >> message). I see two ways of doing this.
> >>
> >> processbatch(
> >>    foreach message (up to limit or number in queue){
> >>      lock queue
> >>      mark message pending
> >>      unlock queue
> >>      formatMessage()
> >>      doAction()
> >>    }
> >>    endTransaction()
> >>    lock queue
> >>    foreach message{
> >>      mark completed
> >>    }
> >>    unlock queue
> >> )
> >>
> >> processbatch(
> >>    lock queue
> >>    foreach message (up to limit or number in queue){
> >>      mark message pending
> >>      formatMessage()
> >>      doAction()
> >>    }
> >>    unlock queue
> >>    endTransaction()
> >>    lock queue
> >>    foreach message{
> >>      mark completed
> >>    }
> >>    unlock queue
> >> )
> >
> > doAction cannot be called within the queue worker. A simple reason is
that
> > this does not support direct mode. Also, it would take far too long to
> > complete. If we have infinite retries, it may sit for a day or more
inside
> > this call ;) [not precisely in that call, but in a loop surrounding it].
> 
> Ok, I am missing things again. I had understood that doAction() _was_
> called by the queue worker, 

It depends on whether or not there is a queue worker. A direct queue does not
have one, so here it is called by the enqueuer (but you are right, you may
say "queue worker" as an abstraction. Still, the pseudocode above is very far
from how things work. It is much more like this:

queueworker{
   lock queue
   dequeue batch
   unlock queue
   process batch
}

If I assume it is useful to make this ultra-reliable (and I still doubt it
is), that would change to something like this:

queueworker{
   lock queue
   mark previous batch as done
   dequeue batch & mark messages as being processed
   unlock queue
   process batch
}

Please note that the ultra-reliability looks rather simple in that
pseudocode, it is much harder in reality.

> that was how it passed the item to the output
> module code, and it would block if the doAction() call stalled (if you
> need to avoid it blocking, define an action queue)
> 
> > If we need to close the tiny potential message loss window (and you have
not
> > yet convinced me there is reason to do so - looking forward to your
> > disk-queue results), we must still do a "dequeue pending", then process
"as
> > usual" and go over the batch again to actually remove messages from the
> > queue.
> 
> yep, still pending. I had an emergancy come up, that among other things
> had me in the office from 4pm saturday to 2pm sunday :-(

:-(

> 
> >> I suspect that the overhead in manipulating the lock is high enough that
> >> the second approach will be a win (similar to the efficiancies that were
> >> gained in the UDP input module by letting it add multiple messages to
the
> >> queue inside one lock).
> >>
> >> As such I am seeing significant value in making the doAction() call be
> >> lightweight under all conditions, which is an argument against having it
> >> do any more than nessasary.
> >
> > We do not know what this may be. Again, don't think "database only".
> 
> I am not. even if the action is writing to disk (with fsync), to a pipe,
> or calling an external program it can take a significant amount of time.
> you say above that doAction could block for hours, so I am confused a bit
> here. (the new version of the document may clear this up)

My sentence should better be phrased "can not be called where you described
it". See my chart and you'll see that the queue mutex is never looked for an
extended period of time - not even in ultra-reliable mode (part of why it is
so hard to do it).

My understanding is that you work under the assumption that doAction() is
called during dequeueing. As we don't know what doAction() will do - it may
be very lengthy - it is strictly de-coupled from anything that holds the
queue mutex (this also applies to v3 and is the base design). You may take
the word "base design" as a hint that probably the whole engine would need to
be rewritten if that design is to be changed. I do not see any reasons for
such a change ;) 

I guess our discussion circles around the point that I did not yet convey the
full picture on how things work. At least I have the "feeling" that you have
a different architecture on your mind.

> 
> >> I think that the beginTransaction()
> >> functionality is probably fast enough that adding that into doAction()
is
> >> acceptable, but the endTransaction() functionality definantly has the
> >> potential to block for a substantial amount of time (and therefor must
not
> >> happen while locks are held that could cause other threads to be
waiting)
> >>
> >> except for your isolation of the output modules (which I have questioned
> >> elsewhere), it could even be a win to move the formatMessage() step out
of
> >> the inner loop and have that called from the output module (doAction()
> >> would just pass the pointer to the queue object, the output module would
> >> remember them and do the formatting inside endTransaction()), this would
> >> still be thread-safe as the output module would only be reading the
> >> message queue.
> >>
> >> hmm, the more I think about this, the cleaner it seems to be.
> >
> > The problem is that you are still looking from a different design ;) Why
> > should the output do this if it can be provided with the already
fabricated
> > information? At this stage in processing, no looking at all is involved
> > (except for the action lock, which is required because the interface
> > guarantees that actions are not called concurrently from multiple
threads).
> 
> if you don't have locking, what stops two worker threads from trying to
> de-queue the same message at the same time?
> 
> reasons that I see to not format the message before putting it in the
> action queue.
> 
> 1. performance bottleneck, if the queue walker for the main queue needs to
> format the message, it can't be moving messages, testing filters. for the
> default setup where you just use the entire string this isn't that big a
> deal, but if you have a more complicated format (escaping characters,
> doing substrings, etc) it can take substantially more time, some of that
> time is spent just parsing the format string to figure out what you need
> to do.
> 
> 2. locking efficiancy this could be part of #1, if you are doing a lot
> then you want to drop and later re-aquire any locks you have so that other
> threads can get at the data, if you are doing very little then you can
> avoid doing.
> 
> 3. single-instance-store, if you don't change the message between the main
> queue and the action queue you have the ability to just have one copy of
> the message contents (which you dynamicly allocate space for), instead of
> one copy for each queue. since memory is significantly slower than the
> CPU, avoiding the need to copy the data can speed things up as well
> 
> >> It would also delay the need to do any buffer allocation until the
> >> endTransaction() step. this doesn't nessasarily eliminate the peak
memory
> >> usage (all threads could be in endTransaction at the same time), but it
> >> will significantly reduce the average memory useage (normally they
_won't_
> >> all be in endTransaction() with maximum size messages at the same time)
> >>
> >> there would be a fixed size array (based on the max batch size), to
> >> track what messages are in the batch. I think this is already needed for
> >> the worker thread to know which messages to mark as completed when the
> >> transaction completes
> >>
> >> doAction() would just put a pointer (logical pointer, not C memory
> >> pointer) to the message contents into the next slot in the array.
> >>
> >> there would be no need for a startTransaction.
> >>
> >> there would be a helper function, something like
> >> formatMessage(char *format, msg *message, char *output, int bufsize)
> >> that would format the message and write it to the output buffer, output
> >> would then point at the null at the end of the string
> >>
> >> endTransaction() would do all the work. it would allocate a buffer
> >> needed (note that it would know the actual message sizes, so could
> >> allocate based on the actual amount of data involved), do the work
> >> that was planned for startTransaction() (probably putting
> >> boilerplate in the buffer), call formatMessage() for each message in the
> >> list, output the messages, free the buffers, and return
> >
> > That complicates the programming of all non-db type of output plugins
> > considerably. I don't think this is justified.
> 
> I am not seeing what's db specific here, but I'll think about it more

I hope I have addressed this with my explenations further above. If not, we
need to dig deeper into generic outputs.

Rainer



More information about the rsyslog mailing list