[rsyslog] output plugin calling interface

Rainer Gerhards rgerhards at hq.adiscon.com
Thu May 7 22:16:16 CEST 2009


> > So, I'd appreciate if you could have a look at sections 3.2 and 3.3 of
> >
> > http://www.rsyslog.com/download/design.pdf
> 
> overall it looks good.
> 
> one suggestion I would make is that since message based failures cannot
> be reliably detected, I would consider using the same failure process for
> all failures, and declare a message as bad if it fails the max retry
> number of times by itself (once you hit n=1)

But then you either 

A) do not need the batch logic at all (because the action is configured for
infinite retries)

Or

B) you lose many messages if the action is not configured for infinite
retries and you have a longer-duration outage, e.g. on a database server.
Let's say it is offline for a couple of hours; then you lose almost
everything submitted in that period.

To prevent this, you need two different retry methods.
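
A minimal sketch of what two retry levels could look like, in Python. This
is not the algorithm from the design document. It assumes the availability
of the action can be probed separately from submitting a batch, and all
names (process_batch, submit, action_is_up, max_msg_retries) are
hypothetical. Waiting for the action never consumes the per-message retry
budget, so an outage of any length does not cause messages to be declared
bad.

```python
from typing import Callable, List, Sequence, Tuple


def process_batch(
    batch: Sequence[str],
    submit: Callable[[List[str]], bool],   # hypothetical: True if the whole chunk was accepted
    action_is_up: Callable[[], bool],      # hypothetical: probe, e.g. a reconnect attempt
    max_msg_retries: int = 3,
    wait: Callable[[], None] = lambda: None,
) -> Tuple[List[str], List[str]]:
    """Return (delivered, discarded) for one batch."""
    delivered: List[str] = []
    discarded: List[str] = []
    pending: List[List[str]] = [list(batch)]

    while pending:
        chunk = pending.pop()
        if not chunk:
            continue
        attempts = 0
        while True:
            # Retry mode 1 (action level): wait until the action is usable.
            # This loop is unbounded and does not count against any message.
            while not action_is_up():
                wait()

            if submit(chunk):
                delivered.extend(chunk)
                break

            # Retry mode 2 (message level): the action is up but the chunk
            # failed, so narrow the failure down by splitting the chunk.
            if len(chunk) > 1:
                mid = len(chunk) // 2
                pending.append(chunk[mid:])
                pending.append(chunk[:mid])  # first half is processed next
                break

            # A single message failed while the action was up.
            attempts += 1
            if attempts >= max_msg_retries:
                discarded.extend(chunk)
                break

    return delivered, discarded
```

Only a message that fails max_msg_retries times on its own, while the
action is reachable, ends up in the discarded list.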

> otherwise you end up resubmitting the entire batch a number of times
> before you try to narrow it down to the particular message. since the
> process of finding the bad message will take a number of retries, and then
> you will want to retry the suspect message several times (to make sure
> that it's really a message error, not an action error) this could result
> in a lot of retries.
> 
> also, the algorithm that you posted has a subtle difference from what I
> had listed.

It must, because it has two different levels of retries.

> yours is more straightforward and easier to understand (and requires no
> global knowledge), I think that mine is more efficient in the rare failure
> case. there is a potential (very subtle) race condition in this area that
> will need attention when we get down to lower level discussion (no matter
> which algorithm is used)
> 
> at this point I don't see this as critical (not even very important) as we
> are talking high-level concepts at this point, but I wanted to note this
> for a future conversation.

I agree that it is not critical at this point. I also have not even tried
to optimize it. The critical point is the discussion above on the two
different retry modes. It took me a lot of thinking to see the subtle issues,
but trying to do everything with just one mode was the root cause of the
problems, at least of those I faced.

I am not sure how you could solve the dilemma above with just a single retry
mode.
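
The dilemma can be illustrated with a small simulation, in Python. It is a
sketch under simplifying assumptions, not a model of rsyslog itself: time
is counted in submission attempts, the action is down for the first
outage_attempts attempts, every message is valid, and a message is declared
bad after max_retries failed attempts. The function name and parameters are
hypothetical.

```python
from typing import Tuple


def simulate_single_mode(
    num_messages: int, outage_attempts: int, max_retries: int
) -> Tuple[int, int]:
    """Return (delivered, lost) when one bounded retry mode handles all failures."""
    attempts_made = 0
    delivered = 0
    lost = 0
    for _ in range(num_messages):
        for _ in range(max_retries):
            attempts_made += 1
            if attempts_made > outage_attempts:
                # The outage is over, so the (valid) message goes through.
                delivered += 1
                break
        else:
            # The retry budget was used up during the outage: a valid
            # message is wrongly treated as a bad one and dropped.
            lost += 1
    return delivered, lost
```

With a bounded budget, every message whose retries fall inside the outage
is dropped even though nothing is wrong with it. Making the budget infinite
avoids the loss, but then no message is ever declared bad, which is case A
above.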

> 
> 
> two notes on the reliability section

That's why I did not mention this section - so far, it is just a copy of a
mailing list post (and all the comments it raised apply to it).

> 
> 1. I think we had figured out that reliability required touching each item
> 3 times instead of 2 (not 4 times as you note in the text)
>
> 2. I disagree with you on the idea that power issues should be handled at
> a different level. I'll try to track down some discussions on
> sysadmin/security mailing lists about this.

Keep in mind that my key point is that you cannot currently protect a busy
system against message loss. The issue is not whether a power failure may
happen. I agree it can. I just think that you cannot build a busy system
without using at least partial in-memory queuing, which by definition is not
safe from power failures. So it doesn't make sense to protect a handful of
messages when we lose many more of them anyway.

> 
> David Lang
> 


