[rsyslog] Development of failsafe disk based queue
david at lang.hm
Wed Oct 1 15:40:30 CEST 2008
On Wed, 1 Oct 2008, Rainer Gerhards wrote:
> On Wed, 2008-10-01 at 06:24 -0700, david at lang.hm wrote:
>
>> one possible thing is that if the write has not completed then the system
>> sending you the logs has not received confirmation that you have the log
>> yet, so they are the ones responsible for it.
>
> I concur. But as I understood the scenario here, the log messages are
> emitted from a process *inside* the failing machine. So that process
> fails, too, and we do not have any intermediary that has a copy of the data.
> So if it is not persisted to disk, it is lost. Anyhow, this requirement
> has been relaxed in a later posting.
>
> Also note that I was just thinking about the physical layer, considering
> a single physical write - far away from rsyslog.
>
>> it's only after you acknowledge the message (via relp or equivalent) that
>> you are required to not lose the log message.
>
> I concur and this is how RELP handles this. Well, actually I think there
> currently is a very slight (but still existing) window of exposure. I
> think RELP acks when the message is submitted to the queue engine. That
> does not necessarily mean the message is already present on disk.
right now it doesn't make a difference (since the queue doesn't sync the
data), but if/when syncing is added, we should check that the call to queue
the data doesn't return until after the sync has completed.
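
A rough sketch of that ordering, in Python for brevity. This is not rsyslog code: `DiskQueue`, `enqueue`, `handle_message` and `send_ack` are made-up names, and the length-prefixed record format is only an assumption for illustration. The point is simply that the enqueue call does not return until the fsync is done, so an ack sent afterwards implies the data was handed to stable storage.

```python
import os


class DiskQueue:
    """Hypothetical append-only queue file (not rsyslog's queue engine)."""

    def __init__(self, path):
        # Append in binary mode; the file is created if it does not exist.
        self._f = open(path, "ab")

    def enqueue(self, msg: bytes) -> None:
        # Length-prefix each record so a reader can find record boundaries.
        self._f.write(len(msg).to_bytes(4, "big") + msg)
        self._f.flush()              # push Python's buffer to the OS
        os.fsync(self._f.fileno())   # ask the OS to write it to stable storage
        # Only after the fsync does control return to the caller.

    def close(self) -> None:
        self._f.close()


def handle_message(queue: DiskQueue, msg: bytes, send_ack) -> None:
    """Ack only after enqueue() has returned, i.e. after the fsync."""
    queue.enqueue(msg)
    send_ack()
```

An fsync per message is expensive, so a real implementation would probably batch several messages per sync and delay the acks accordingly.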
> Also,
> I think some mild duplication of messages may happen with RELP in a
> power fail scenario. It is not doing a two-phase commit, though it
> tries very hard to get a perfect understanding of what is written and
> what not. I could check this with spec/code, but I think this is not
> justified at this point in time ;)
no matter what checks you do, it's always possible for things to fail
after the work has succeeded but before the acknowledgement gets back
to the sender. (with relp, if the receiving machine sends the
acknowledgement but the sending machine crashes before fully processing
it, the receiving machine has no way of detecting this.)
on the other hand, this is fairly easy to deal with by making messages
include a sequence number to guarantee that they are all unique, and then
have something filter out duplicates later.
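
A minimal sketch of such a filter, again in Python and with hypothetical names (`DuplicateFilter`, `accept`, `dedupe`). It assumes each message carries a sender id plus a per-sender sequence number, which is not something RELP or rsyslog is known to provide in this form.

```python
class DuplicateFilter:
    """Drops messages whose (sender, seq) pair has been seen before."""

    def __init__(self):
        self._seen = set()

    def accept(self, sender: str, seq: int) -> bool:
        # Return True the first time a (sender, seq) pair shows up,
        # False on every later repeat.
        key = (sender, seq)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


def dedupe(messages):
    """Yield each (sender, seq, text) tuple once, in arrival order."""
    flt = DuplicateFilter()
    for sender, seq, text in messages:
        if flt.accept(sender, seq):
            yield sender, seq, text
```

The set grows without bound here. A longer-running filter would need to cap it somehow, for example by remembering only the highest contiguous sequence number per sender.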
David Lang