[rsyslog] output plugin calling interface
Rainer Gerhards
rgerhards at hq.adiscon.com
Sat May 2 10:03:32 CEST 2009
> -----Original Message-----
> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
> bounces at lists.adiscon.com] On Behalf Of david at lang.hm
> Sent: Saturday, May 02, 2009 12:00 AM
> To: rsyslog-users
> Subject: Re: [rsyslog] output plugin calling interface
>
> On Fri, 1 May 2009, Rainer Gerhards wrote:
>
> >> -----Original Message-----
> >> From: rsyslog-bounces at lists.adiscon.com
> >> [mailto:rsyslog-bounces at lists.adiscon.com] On Behalf Of david at lang.hm
> >>
> >> On Fri, 1 May 2009, Rainer Gerhards wrote:
> >>
> >>> Please let me know if you also find a math model useful (but I'll
> >>> probably need to do it in any case, because it helps me clean up my
> >>> mind...).
> >>
> >> I think it will help clarify things a lot. With a good model we won't
> >> have misunderstandings about what we are talking about.
> >
> > Yes - and I also think that with the model some complexities disappear. I
> > think (hope I am right) the solution will become obvious. I know I am
> > investing a lot of time in a tiny portion of the code, but this is one of
> > the core elements involving many complexities.
> >
> >> with my 'binary search' approach, handling permanently bad messages
> >> could be as simple as 'too many retries once we hit a batch size of 1'
> >> (with a possible option of the output module reporting back that it
> >> detected something that makes retries useless, but this is just an
> >> optimization)
> >
> > Yes, indeed. One quick thought: I see a batch as a set of (msg, state)
> > ordered pairs. Once we have processed it in one action (all of them have
> > entered one permanent state), we can then build a subset that we use as
> > the new (remaining) batch in the backup actions. So the "bad record
> > search" is "just" one facet of many that we need to handle with little
> > and hopefully simple code (doing it with 2000 LoC would be rather easy ;)).
>
> I agree with the definition of a batch. Let's see what different states
> you are thinking of.
>
> I am currently assuming that the messages stay in the queue (with the
> state attached) so that if rsyslog restarts (assuming disk queues), it
> will realize that the message hasn't been delivered and try again.
No, it is different: the batch is actually dequeued. So if at that point we
have a system power failure (for whatever reason), the messages are lost.
While the rsyslog engine intends to be very reliable, it is not a complete
transactional system. A slight risk remains. To see why, you need to
understand what happens when the batch is processed. I assume that we have no
sudden, untrappable process termination. Then, if a batch cannot be processed,
it is returned to the top of the queue. This is not yet implemented, but it is
how single messages (which you can think of as an abstraction of a batch in
the current code) are handled. If, for example, the engine shuts down, but an
action takes longer than the configured shutdown timeout, the action is
cancelled and the queue engine reclaims the unprocessed messages. They go
into a special area inside the .qi file and are placed on top of the queue
once the engine restarts.
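
To make the reclaim step concrete, here is a rough Python sketch of the idea
(not rsyslog code). It treats a batch as a list of (msg, state) pairs, as
discussed earlier in the thread. The state names, reclaim_unprocessed() and
restart_queue() are all made up for illustration, and the "special area" is
just a plain list here:

```python
from collections import deque
from enum import Enum


class State(Enum):
    # Hypothetical state names, chosen only for this sketch.
    READY = "ready"          # dequeued, not yet finished by the action
    COMMITTED = "committed"  # the action is done with this message
    DISCARDED = "discarded"  # permanently bad, will not be retried


PERMANENT = {State.COMMITTED, State.DISCARDED}


def reclaim_unprocessed(batch):
    """Return the messages that never reached a permanent state."""
    return [msg for msg, state in batch if state not in PERMANENT]


def restart_queue(reclaimed, persisted):
    """Rebuild the queue on restart: reclaimed messages go on top."""
    return deque(list(reclaimed) + list(persisted))


# The action was cancelled after it had finished only "m1".
batch = [("m1", State.COMMITTED), ("m2", State.READY), ("m3", State.READY)]
reclaimed = reclaim_unprocessed(batch)

# "m4" and "m5" stand for messages that were still in the queue at shutdown.
queue = restart_queue(reclaimed, ["m4", "m5"])
```

After the restart the reclaimed messages are processed before anything that
was still queued, so the original order is kept.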
The only case where this does not work is sudden process termination. I see
two cases:
a) a fatal software bug
We cannot really address this. Even if the messages were remaining in the
queue until finally processed, a software bug (maybe an invalid pointer) may
affect the queue structures at large, possibly even at the risk of total loss
of all data inside that queue. So this is an inevitable risk.
b) sudden power failure
... which can and should be mitigated at another level
One may argue that there also is
c) admin error
e.g., kill -9 rsyslogd
Here a fully transactional queue would probably help.
However, I do not think that the risk involved justifies a far more complex
fully transactional implementation of the queue object. Some risk always
remains (what about the disaster case, even with a fully transactional queue?).
Letting the messages stay in the queue is complex because it is hard to work
with such messages and disk queues. It would also cost a lot of performance,
especially when done reliably (need to sync). We would then need to touch each
element at least four times, twice as often as currently. Also, the hybrid
disk/memory queues become very, very complex. There are more complexities
around this; I just wanted to mention the most obvious ones.
So, all in all, the idea is that messages are dequeued, processed and put
back into the queue (think: ungetc()) when something goes wrong. Reasonable
(but not more) effort is made to prevent message loss while the messages are
in an unprocessed state outside of the queue.
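
As a minimal illustration of the dequeue / process / put-back cycle, assuming
an in-memory queue only (no disk queue, no .qi file), something like the
following Python sketch captures the ungetc()-style behaviour. BatchQueue,
unget_batch() and run_once() are hypothetical names, not rsyslog functions:

```python
from collections import deque


class BatchQueue:
    """Toy in-memory queue with an ungetc()-like operation for batches."""

    def __init__(self, messages):
        self._items = deque(messages)

    def dequeue_batch(self, size):
        """Remove up to `size` messages from the top of the queue."""
        count = min(size, len(self._items))
        return [self._items.popleft() for _ in range(count)]

    def unget_batch(self, batch):
        """Put a batch back on top of the queue, keeping its order."""
        # extendleft() pushes items one by one, so feed them in reverse.
        self._items.extendleft(reversed(batch))

    def snapshot(self):
        return list(self._items)


def run_once(queue, action, size):
    """Dequeue one batch, run the action, put the batch back on failure."""
    batch = queue.dequeue_batch(size)
    try:
        action(batch)
    except Exception:
        queue.unget_batch(batch)
        return False
    return True


def failing_action(batch):
    raise RuntimeError("output target not reachable")


delivered = []
q = BatchQueue(["m1", "m2", "m3", "m4"])
first_try = run_once(q, failing_action, 2)    # batch goes back on top
after_failure = q.snapshot()
second_try = run_once(q, delivered.extend, 2)  # batch is consumed
after_success = q.snapshot()
```

While a batch is between dequeue_batch() and unget_batch() it lives only in
the caller's memory, which is exactly the window in which a sudden process
termination would lose it.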
Hope that clarifies things, and I am glad you brought this up. It made me
think again, but I came to the conclusion I've written above ;)
Rainer