[rsyslog] output plugin calling interface
Rainer Gerhards
rgerhards at hq.adiscon.com
Sun May 3 10:38:37 CEST 2009
Thanks for the good discussion, it is very inspiring, please keep the
thoughts flowing. Answers inline below...
> -----Original Message-----
> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
> bounces at lists.adiscon.com] On Behalf Of david at lang.hm
> Sent: Sunday, May 03, 2009 7:19 AM
> To: rsyslog-users
> Subject: Re: [rsyslog] output plugin calling interface
>
> On Sun, 3 May 2009, Tom Metro wrote:
>
> > david at lang.hm wrote:
> >> Rainer Gerhards wrote:
> >>> After a lot of thinking today, we can have a "kind of" transactional
> >>> queue, but we need to accept potential message *duplication* in the
> >>> event of failures (but no loss).
> >>
> >> this is the approach that you have taken for other things (relp for
> >> example), and when we were discussing reliability for direct mode vs
> >> disk queues you mentioned that rsyslog could duplicate messages in case
> >> of failures, but would not lose messages.
> >
> > I also noticed this side effect mentioned in the RELP documentation and
> > wondered why message duplication couldn't be avoided by something as
> > simple as assigning a serial number to each log record. A 32-bit
> > monotonically increasing counter that rolls over periodically.
> >
> > The receiving side would cache the serial numbers for the last N records
> > (something that could be done quite memory efficiently if the records
> > show up mostly in order) and discard records it had seen.
> >
> > A hash might work well too, providing you're using high-res time stamps
> > so you don't get false positive duplications.
> >
> > With a strictly in-memory cache of seen records, you could still get
> > duplication after the receiver gets restarted, but at least you'd have
> > greatly narrowed the potential. And the receiver could always pre-seed
> > its cache from the last N stored records on startup.
RELP is potentially able to do that, and mostly in the way Tom has described.
It cannot do it today, because I have had no time to implement it and there
are things with much higher priority. RELP uses sequence numbers, and I
think they are indeed mod 2^32.
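
To make Tom's idea concrete, here is a minimal sketch of a receiver-side cache
of the last N serial numbers. It is not something RELP or rsyslog implements
today; the class name `DuplicateFilter`, the `capacity` parameter and the
`accept` method are all hypothetical, and the 2^32 wrap-around is taken from
the discussion above as an assumption.

```python
from collections import OrderedDict

SEQ_MOD = 2 ** 32  # assumption: serial numbers wrap around at 2^32


class DuplicateFilter:
    """Hypothetical cache of the last `capacity` serial numbers of one stream."""

    def __init__(self, capacity=1024):
        self.capacity = capacity
        self._seen = OrderedDict()  # serial number -> None, oldest entry first

    def accept(self, seq):
        """Return True if the record is new, False if it was already seen."""
        seq %= SEQ_MOD
        if seq in self._seen:
            return False
        self._seen[seq] = None
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)  # forget the oldest serial number
        return True
```

The cache has to be much smaller than the serial number space, so that a
number reused after roll-over has long been forgotten. As Tom notes, a purely
in-memory cache like this one is empty again after a restart unless it is
pre-seeded from the last N stored records.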
>
> note that this would have to be a per-sender list of records, what if you
> are getting messages from lots of systems?
Not even per-sender but per-conversation. A single sender can open up
multiple conversations with a single receiver, by just specifying more than
one connection. There are ample use-cases for this. For example, one
conversation could carry emergency and another one bulk messages.
> since you can have two or more threads sending you messages you can't
> assume that you will get them in order (if you could assume this you could
> just store the last successful message processed)
Inside a RELP connection, they are in sequence, but we have a sliding window.
But there are two things mixed in here: one is the reliable transport, the
other one is end-to-end reliability. For example, RELP cannot check if "the
messages are already stored" because we have no universal predicate "is
stored" (what should that mean?). All RELP can know is that it submitted
things to the queue. So even if we put everything into a database, RELP
cannot rely on that information to decide which messages have already been
received and which have not. So RELP needs to keep its own state information.
That's not awfully hard, because we have just a sliding window, which also
acts as a "window of uncertainty". Assuming that we had "processed
messages" state information, then on connection re-establishment, during the
handshake, the sender can query the receiver about the state of potential
duplicates and remove them. This is "just" not done yet. Also note that this
requires that state is properly recorded under all circumstances, an area
with many subtle issues to look at.
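
A rough sketch of that handshake idea, under the assumption that the receiver
really does keep "processed messages" state. This is not the RELP wire
protocol and not existing code; `Sender`, `Receiver`, `report_processed` and
`resync` are hypothetical names used only for illustration.

```python
class Receiver:
    """Hypothetical receiver that records which sequence numbers it processed."""

    def __init__(self):
        self.processed = set()

    def receive(self, seq, msg):
        self.processed.add(seq)

    def report_processed(self, seqs):
        """Handshake step: tell the sender which of `seqs` were processed."""
        return self.processed.intersection(seqs)


class Sender:
    """Hypothetical sender; `unacked` is the window of uncertainty."""

    def __init__(self):
        self.unacked = {}  # seq -> msg, kept in send order

    def send(self, seq, msg, receiver, connection_ok=True):
        self.unacked[seq] = msg
        if connection_ok:
            receiver.receive(seq, msg)

    def ack(self, seq):
        self.unacked.pop(seq, None)

    def resync(self, receiver):
        """After reconnecting, drop messages the receiver already has.

        Returns the sequence numbers that still need to be resent.
        """
        for seq in receiver.report_processed(self.unacked.keys()):
            del self.unacked[seq]
        return list(self.unacked)


receiver, sender = Receiver(), Sender()
sender.send(1, "a", receiver)
sender.ack(1)                                      # normal case: delivered and acked
sender.send(2, "b", receiver)                      # delivered, but the ack was lost
sender.send(3, "c", receiver, connection_ok=False)  # never arrived
to_resend = sender.resync(receiver)                # only message 3 is left
```

Without the query, the sender would have to resend both 2 and 3, and message 2
would show up twice. The sketch sidesteps the hard part named above: the
receiver's state must survive crashes and restarts for the answer to be
trustworthy.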
> I agree that this is the basic approach that would need to be taken, but
> before we worry about filtering out duplicates for cases like this, we
> need to make sure we aren't losing any messages ;-)
What I would find useful is a unique message ID that is created at the
originator and carried forward to whatever the final destination is. The
approach here is to enable analysis tools to detect the duplicates. A UUID
would probably make a good identifier. But this also requires standards
work, otherwise it would be an rsyslog-only thing, which makes especially
little sense here, as the whole point is that external tools (log analyzers)
would need to understand it.
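
As an illustration only, this is roughly how an analysis tool could use such
an ID. No such field is standardized, which is exactly the problem described
above; the record layout and the field name `id` are assumptions, as are the
helper names `originate`, `relay` and `drop_duplicates`.

```python
import uuid


def originate(text):
    """Originator: attach a fresh UUID to the message (hypothetical layout)."""
    return {"id": str(uuid.uuid4()), "msg": text}


def relay(record):
    """Relay: forward a copy of the record, leaving the ID untouched."""
    return dict(record)


def drop_duplicates(records):
    """Analysis tool: keep only the first record seen for each ID."""
    seen = set()
    unique = []
    for rec in records:
        if rec["id"] not in seen:
            seen.add(rec["id"])
            unique.append(rec)
    return unique
```

Two messages with identical text but different IDs both survive, while a
message that was delivered twice after a failure is collapsed into one.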
Rainer