[rsyslog] Development of failsafe disk based queue

david at lang.hm david at lang.hm
Wed Oct 1 15:40:30 CEST 2008


On Wed, 1 Oct 2008, Rainer Gerhards wrote:

> On Wed, 2008-10-01 at 06:24 -0700, david at lang.hm wrote:
>
>> one possible thing is that if the write has not completed then the system
>> sending you the logs has not received confirmation that you have the log
>> yet, so they are the ones responsible for it.
>
> I concur. But as I understood the scenario here, the log messages are
> emitted from a process *inside* the failing machine. So that process
> fails, too, and we do not have any interim that has a copy of the data.
> So if it is not persisted to disk, it is lost. Anyhow, this requirement
> has been relaxed in a later posting.
>
> Also note that I was just thinking about the physical layer, considering
> a single physical write - far away from rsyslog.
>
>> it's only after you acknowledge the message (via RELP or equivalent) that
>> you are required to not lose the log message.
>
> I concur and this is how RELP handles this. Well, actually I think there
> currently is a very slight (but still existing) window of exposure. I
> think RELP acks when the message is submitted to the queue engine. That
> does not necessarily mean the message is already present on disk.

right now it doesn't make a difference (since the queue doesn't sync the
data), but if/when syncing is added, we should check that the call to queue
the data doesn't return until after the sync has completed, so that the ack
is only sent once the message is on disk.
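
A minimal sketch of that ordering, in Python rather than rsyslog's C, and not based on the actual queue code. The function name `enqueue_durably` and the one-message-per-line file layout are assumptions made up for illustration. The point is only that the call does not return (and so the caller cannot ack) until `fsync` has completed.

```python
import os


def enqueue_durably(queue_path, message):
    """Append one message to a queue file; return only after fsync.

    Hypothetical helper, not rsyslog's queue API. Returns the number of
    bytes appended so the caller can track offsets if it wants to.
    """
    data = message.encode("utf-8") + b"\n"
    fd = os.open(queue_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        written = 0
        # os.write may write fewer bytes than requested, so loop.
        while written < len(data):
            written += os.write(fd, data[written:])
        # Ask the OS to flush the file's data to the storage device
        # before we report success to the caller.
        os.fsync(fd)
    finally:
        os.close(fd)
    return len(data)
```

A receiver would call this first and send the acknowledgement only after it returns. Note that this covers the file's data only; making a newly created file itself durable may also need an fsync on the containing directory, which the sketch leaves out.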

> Also,
> I think some mild duplication of messages may happen with RELP in a
> power fail scenario. It is not doing a two-phase commit, though it
> tries very hard to get a perfect understanding of what is written and
> what not. I could check this with spec/code, but I think this is not
> justified at this point in time ;)

no matter what checks you do, it's always possible for things to fail
after the work has succeeded but before the acknowledgement gets back
to the sender (with RELP, if the receiving machine sends the
acknowledgement, but the sending machine crashes before fully processing
it, the receiving machine has no way of detecting this, and the sender
will resend the message when it comes back up)

on the other hand, this is fairly easy to deal with by making messages
include a sequence number to guarantee that they are all unique, and then
having something filter out duplicates later.
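
As a rough sketch of that filtering step (in Python; the `(sender, seq, text)` tuple layout and the function name `filter_duplicates` are assumptions for illustration, not anything rsyslog or RELP defines), a later stage could key on the sender plus its sequence number and drop anything it has already seen:

```python
def filter_duplicates(messages):
    """Return messages with repeated (sender, seq) pairs removed.

    `messages` is an iterable of (sender, seq, text) tuples. The first
    occurrence of each (sender, seq) pair is kept, in arrival order.
    """
    seen = set()
    unique = []
    for sender, seq, text in messages:
        key = (sender, seq)
        if key in seen:
            # Already delivered once, e.g. resent after a lost ack.
            continue
        seen.add(key)
        unique.append((sender, seq, text))
    return unique
```

For example, if `hostA` resends message 2 after a crash, the second copy is dropped while the same sequence number from `hostB` is kept, since the sequence number only has to be unique per sender.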

David Lang


