[rsyslog] output plugin calling interface

Rainer Gerhards rgerhards at hq.adiscon.com
Mon May 4 08:11:49 CEST 2009


One more ;)

> -----Original Message-----
> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
> bounces at lists.adiscon.com] On Behalf Of Tom Metro
> Sent: Sunday, May 03, 2009 11:24 PM
> To: rsyslog-users
> Subject: Re: [rsyslog] output plugin calling interface
> 
> Rainer Gerhards wrote:
> > But there are two things mixed in here: one is the reliable transport, the
> > other one is end-to-end reliability. For example, RELP cannot check if "the
> > messages are already stored" because we have no universal predicate "is
> > stored"...
> 
> I assumed it followed a model of conveyed responsibility, just like
> SMTP. Once the receiver has acknowledged receipt, it needs to take full
> responsibility for storage. If the receiver wants to cut corners on the
> reliability of its internals, then it should delay acks until it has
> confirmed successful storage.

It does; it is reliable in the sense described in the other posts.
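
To illustrate the "conveyed responsibility" model from the quoted paragraph, here is a minimal sketch of a receiver that only acknowledges once storage has been confirmed. This is not RELP code; `Receiver`, `handle`, `append_store` and `failing_store` are hypothetical names made up for the illustration, and the "store" is just an in-memory list.

```python
class Receiver:
    """Hypothetical receiver that acknowledges only after a confirmed store."""

    def __init__(self, store):
        # store: callable that returns True once the message is safely stored
        self.store = store

    def handle(self, txnr, msg):
        # Acknowledge only when the store call reports success; otherwise
        # the sender keeps responsibility for the message and may retry.
        if self.store(msg):
            return ("ack", txnr)
        return ("nack", txnr)


stored = []


def append_store(msg):
    # Stand-in for a real storage back-end (assumption: always succeeds).
    stored.append(msg)
    return True


def failing_store(msg):
    # Stand-in for a back-end that could not store the message.
    return False


ok = Receiver(append_store).handle(1, "msg one")
bad = Receiver(failing_store).handle(2, "msg two")
```

In this sketch the sender would drop "msg one" from its own queue after the ack and keep "msg two" for a later retry.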
> 
> 
> > So even if we put everything into a database, RELP cannot rely on
> > that information to decide which messages have already been received
> > and which have not.
> 
> I'm confused. On one side a receiver is talking RELP, and via RELP it
> receives a batch of messages, potentially containing duplicates. On the
> other side of that receiver is its storage back-end. 

No, that's the key point. There is no storage back-end with RELP yet. You
are thinking of a "foreign" storage back-end, which (in the case of a relay)
may not even exist - see the other posting.

> If the receiver
> chooses, it ought to be able to query that storage to see if any of the
> messages are duplicates, and if so, discard them. 

No - and if done on the output layer, it would put a lot of burden onto the
outputs. That is definitely the wrong place to do it.

> This doesn't involve
> RELP. (I described an in-memory cache for efficiency reasons, but the
> duplicate check could involve querying a database.)
> 
> 
> > Assuming that we had a "processed messages" state information, on
> > connection re-establish, during the handshake process, sender can
> > query receiver on the state of potential duplicates and remove them.
> 
> I assumed the de-dupe intelligence would be on the receiver side. Sender
> throws messages over the wall at the receiver, and it sorts things out.

That requires more bandwidth than necessary. Why do it if an exchange of
sequence numbers is sufficient?
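
A rough sketch of what I mean by exchanging sequence numbers, assuming the receiver could report the last sequence number it has processed (that is exactly the state information which is not persisted today). The names `Sender`, `send` and `resync` are invented for this example and are not part of librelp.

```python
class Sender:
    """Hypothetical sender that numbers messages and resyncs on reconnect."""

    def __init__(self):
        self.next_seq = 1
        self.unacked = {}  # sequence number -> message not yet confirmed

    def send(self, msg):
        # Assign the next sequence number and remember the message
        # until the receiver confirms it.
        seq = self.next_seq
        self.next_seq += 1
        self.unacked[seq] = msg
        return seq, msg

    def resync(self, last_processed):
        # On reconnect the receiver reports the last sequence number it
        # processed. Everything up to that number is dropped locally, so
        # only the remaining messages go over the wire again.
        for seq in [s for s in self.unacked if s <= last_processed]:
            del self.unacked[seq]
        return sorted(self.unacked.items())


sender = Sender()
for text in ("a", "b", "c"):
    sender.send(text)

# Connection breaks; on re-establish the receiver says it processed up to 2.
to_resend = sender.resync(2)
```

Only a single number crosses the wire during the handshake, and only message 3 is retransmitted, instead of resending all three and letting the receiver sort out the duplicates.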

> 
> 
> > What I would find useful is a unique message ID that is created at the
> > original originator and moved forward until whatever final destination. The
> > approach here is to enable analysis tools to detect the duplicates.
> 
> Sure, that could be a good approach. For the "cost" of a cryptographic
> hash - probably computed right after the timestamp is added to the
> message - you'd push the duplicate filtering problem to the
> post-processing code.
> 
> It would be interesting to do a benchmark comparison between the
> up-front hash computation vs. all the overhead of adding a serial
> number, caching seen record IDs, and dedupe logic.

RELP already does all of this; it just does not persist any state information
(plus some other things I don't know off the top of my head). There are a
number of subtle issues - which I simply cannot explain right now - that make
such a sequence number necessary. If you look at how other protocols are
implemented, you'll see that this is at least the mainstream approach (and I
don't think I am overstating it when I say it is probably the only one that
works reliably without violating abstraction layers).

Rainer

> 
>   -Tom
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com