[rsyslog] output plugin calling interface
Rainer Gerhards
rgerhards at hq.adiscon.com
Sat May 2 20:28:48 CEST 2009
After a lot of thinking today, I believe we can have a "kind of" transactional
queue, but we need to accept potential message *duplication* in the event of
failures (but no loss). This would work without a two-phase commit. However,
it would still take considerable effort to implement. I wonder if the use case
actually justifies it. Please also consider what I wrote below on the
performance of any ultra-reliable version. And, yes, I know we have fast and
reliable controllers today, but even then the disk path is much, much slower
than any memory-based queue. I find it hard to believe you can build a very
high-performance syslog server on a disk queue, even with the best hardware
money can buy today.
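
To make the "duplication, but no loss" idea concrete, here is a minimal sketch
in Python. It is not rsyslog code: the class AtLeastOnceQueue and its methods
dequeue_batch(), commit() and recover() are hypothetical names, and the
in_flight list only stands in for an area that a real implementation would
have to persist on disk. The assumption is that a dequeued batch is remembered
until the action confirms it, and is pushed back on top of the queue otherwise.

```python
from collections import deque


class AtLeastOnceQueue:
    """Toy in-memory model; 'in_flight' stands in for a persisted on-disk area."""

    def __init__(self, messages=()):
        self.pending = deque(messages)  # messages not yet handed to an action
        self.in_flight = []             # dequeued but not yet committed

    def dequeue_batch(self, max_size):
        """Hand out up to max_size messages, remembering them until commit()."""
        if self.in_flight:
            raise RuntimeError("previous batch is still uncommitted")
        while self.pending and len(self.in_flight) < max_size:
            self.in_flight.append(self.pending.popleft())
        return list(self.in_flight)

    def commit(self):
        """The action finished the batch; only now are the messages forgotten."""
        self.in_flight.clear()

    def recover(self):
        """After a failure: put the uncommitted batch back on top, in order."""
        self.pending.extendleft(reversed(self.in_flight))
        self.in_flight.clear()


delivered = []
q = AtLeastOnceQueue(["m1", "m2", "m3"])

batch = q.dequeue_batch(2)
delivered.extend(batch)  # the output action succeeds ...
q.recover()              # ... but we "crash" before commit() is recorded

while q.pending:
    delivered.extend(q.dequeue_batch(2))
    q.commit()

print(delivered)
```

In this run m1 and m2 are delivered twice and nothing is lost, which is exactly
the trade-off: the receiver must tolerate duplicates, but no two-phase commit
is needed.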
Rainer
> -----Original Message-----
> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
> bounces at lists.adiscon.com] On Behalf Of Rainer Gerhards
> Sent: Saturday, May 02, 2009 10:33 AM
> To: rsyslog-users
> Subject: Re: [rsyslog] output plugin calling interface
>
> > -----Original Message-----
> > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
> > bounces at lists.adiscon.com] On Behalf Of david at lang.hm
> > Sent: Saturday, May 02, 2009 10:21 AM
> > To: rsyslog-users
> > Subject: Re: [rsyslog] output plugin calling interface
> >
> > On Sat, 2 May 2009, Rainer Gerhards wrote:
> >
> > >> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
> > >> bounces at lists.adiscon.com] On Behalf Of david at lang.hm
> > >>
> > >> On Fri, 1 May 2009, Rainer Gerhards wrote:
> > >>
> > >>>> -----Original Message-----
> > >>>> From: rsyslog-bounces at lists.adiscon.com
> > >>>> [mailto:rsyslog-bounces at lists.adiscon.com] On Behalf Of david at lang.hm
> > >>>>
> > >>>> On Fri, 1 May 2009, Rainer Gerhards wrote:
> > >>>>
> > >>>>> Please let me know if you also find a math model useful (but I'll
> > >>>>> probably need to do it in any case, because it helps me clean up my
> > >>>>> mind...).
> > >>>>
> > >>>> I think it will help clarify things a lot. with a good model we won't
> > >>>> have misunderstandings about what we are talking about.
> > >>>
> > >>> Yes - and I also think that with the model some complexities disappear.
> > >>> I think (hope I am right) the solution will become obvious. I know I am
> > >>> investing a lot of time in a tiny portion of the code, but this is one
> > >>> of the core elements involving many complexities.
> > >>>
> > >>>> with my 'binary search' approach, handling permanently bad messages
> > >>>> could be as simple as 'too many retries once we hit a batch size of 1'
> > >>>> (with a possible option of the output module reporting back that it
> > >>>> detected something that makes retries useless, but this is just an
> > >>>> optimization)
> > >>>
> > >>> Yes, indeed. One quick thought: I see a batch as a set of (msg, state)
> > >>> ordered pairs. Once we have processed it in one action (all of them
> > >>> have entered one permanent state), we can then build a subset that we
> > >>> use as the new (remaining) batch in the backup actions. So the "bad
> > >>> record search" is "just" one facet of many that we need to handle with
> > >>> little and hopefully simple code (doing it with 2000 LoC would be
> > >>> rather easy ;)).
> > >>
> > >> I agree with the definition of a batch. Let's see what different states
> > >> you are thinking of.
> > >>
> > >> I am currently assuming that the messages stay in the queue (with the
> > >> state attached) so that if rsyslog restarts (assuming disk queues), it
> > >> will realize that the message hasn't been delivered and try again.
> > >
> > > > No, it is different: the batch is actually dequeued. So if at that
> > > > point we have a system power failure (for whatever reason), the
> > > > messages are lost. While the rsyslog engine intends to be very
> > > > reliable, it is not a complete transactional system. A slight risk
> > > > remains. For this, you need to understand what happens when the batch
> > > > is processed. I assume that we have no sudden, untrappable process
> > > > termination. Then, if a batch cannot be processed, it is returned to
> > > > the top of the queue. This is not yet implemented, but is how single
> > > > messages (which you can think of as an abstraction of a batch in the
> > > > current code) are handled. If, for example, the engine shuts down, but
> > > > an action takes longer than the configured shutdown timeout, the action
> > > > is cancelled and the queue engine reclaims the unprocessed messages.
> > > > They go into a special area inside the .qi file and are placed on top
> > > > of the queue once the engine restarts.
> > >
> > > > The only case where this does not work is sudden process termination.
> > > > I see two cases:
> > >
> > > a) a fatal software bug
> > > > We cannot really address this. Even if the messages remained in the
> > > > queue until finally processed, a software bug (maybe an invalid
> > > > pointer) may affect the queue structures at large, possibly even at the
> > > > risk of total loss of all data inside that queue. So this is an
> > > > inevitable risk.
> > >
> > > b) sudden power fail
> > > ... which can and should be mitigated at another level
> > >
> > > One may argue that there also is
> > >
> > > c) admin error
> > > > e.g., kill -9 rsyslogd
> > > Here a fully transactional queue will probably help.
> > >
> > > > However, I do not think that the risk involved justifies a far more
> > > > complex fully transactional implementation of the queue object. Some
> > > > risk always remains (what about the disaster case, even with a fully
> > > > transactional queue?).
> > >
> > > > And letting the messages stay in the queue is so complex because it is
> > > > hard to work with such messages and disk queues. It would also cost a
> > > > lot of performance, especially when done reliably (need to sync). We
> > > > would then need to touch each element at least four times, twice as
> > > > much as currently. Also, the hybrid disk/memory queues become very,
> > > > very complex. There are more complexities around this, I just wanted to
> > > > mention the most obvious.
> > >
> > > > So, all in all, the idea is that messages are dequeued, processed and
> > > > put back to the queue (think: ungetc()) when something goes wrong.
> > > > Reasonable (but not more) effort is made to prevent message loss while
> > > > the messages are in unprocessed state outside of the queue.
> > >
> > > > Hope that clarifies and I am glad you brought this up. Made me think
> > > > again, but I came to the conclusion I've written above ;)
> >
> > this is definitely different from the way I thought things worked from our
> > prior discussions about reliability. from those I understood that rsyslog
> > could be used to make a fully reliable system, if you are willing to take
> > the performance hit to do so.
>
> You can, but then you need to use batch sizes of 1.
>
> > as batch size increases (to gain efficiency) the number of log messages
> > that can be lost also increases.
> >
> > unfortunately I have the belief that power outages cannot be avoided (I've
> > seen cases where millions have been spent on the power systems and still
> > ended up with a datacenter-wide blackout).
>
> Let me think about this, but I think to protect against this problem, you
> really need to have two-phase commit, which I am not sure belongs in a
> syslogd.
>
> > when you get the model of things together we will be in a much better
> > position to discuss this.
>
> Well, we'd probably restart discussing reliability requirements. If it turns
> out that you need 100% reliability, no matter what happens at all, I am not
> sure if we can implement this without adding considerable database-ish
> processing. "Under all circumstances" reliability is very hard to achieve,
> especially if you also would like to have high performance. Think about it:
> to guard against the data center full power loss scenario, you need to have
> a disk-only queue, being synced to disk for every single en- and dequeue
> operation. This is extremely costly. Does it then really matter if we have
> large batches or not? The system, I think, will be so slow that you cannot
> use it for any demanding real-life application, so some compromise between
> speed and reliability, I think, must be made in any case.
>
> > it's 1:20am here and I'm ready to collapse.
>
> I hadn't even expected this response at this time ;)
>
> Rainer
> >
> > David Lang
> > _______________________________________________
> > rsyslog mailing list
> > http://lists.adiscon.net/mailman/listinfo/rsyslog
> > http://www.rsyslog.com