[rsyslog] output plugin calling interface

Tom Metro tmetro+rsyslog at gmail.com
Sun May 3 23:24:00 CEST 2009


Rainer Gerhards wrote:
> But there are two things mixed in here: one is the reliable transport, the
> other one is end-to-end reliability. For example, RELP cannot check if "the
> messages are already stored" because we have no universal predicate "is
> stored"...

I assumed it followed a model of conveyed responsibility, just like 
SMTP. Once the receiver has acknowledged receipt, it needs to take full 
responsibility for storage. If the receiver wants to cut corners on the 
reliability of its internals, then it should delay acks until it has 
confirmed successful storage.
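
To make the conveyed-responsibility idea concrete, here is a minimal Python sketch. It is not RELP or rsyslog code: the `Receiver` class, the `store` callable, and the `txn_id` parameter are hypothetical names chosen for illustration. The only point is that the ack is returned strictly after storage has succeeded.

```python
class Receiver:
    """Hypothetical receiver that acks only after storage succeeds."""

    def __init__(self, store):
        # store: callable taking a message; assumed to raise on failure
        self.store = store

    def handle(self, txn_id, message):
        try:
            self.store(message)
        except Exception:
            # No ack: the sender keeps responsibility and may resend.
            return None
        # Ack: responsibility for the message now rests with the receiver.
        return txn_id


stored = []
receiver = Receiver(stored.append)
ack = receiver.handle(1, "first message")
```

A receiver that acked before calling `store` would be faster, but a crash between the ack and the write would lose a message the sender believes is safe.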


> So even if we put everything into a database, RELP cannot rely on
> that information to decide which message already have been received
> and which not.

I'm confused. On one side a receiver is talking RELP, and via RELP it 
receives a batch of messages, potentially containing duplicates. On the 
other side of that receiver is its storage back-end. If the receiver 
chooses, it ought to be able to query that storage to see if any of the 
messages are duplicates, and if so, discard them. This doesn't involve 
RELP. (I described an in-memory cache for efficiency reasons, but the 
duplicate check could involve querying a database.)
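
A rough sketch of that receiver-side check, in Python with SQLite standing in for the storage back-end. Everything here is an assumption made for illustration: the `DedupingReceiver` class, the `messages` table layout, and above all the premise that each message arrives with some unique ID. None of it is taken from rsyslog or RELP.

```python
import sqlite3
from collections import OrderedDict


class DedupingReceiver:
    """Hypothetical receiver: in-memory cache first, storage query second."""

    def __init__(self, db, cache_size=1000):
        self.db = db
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS messages (id TEXT PRIMARY KEY, body TEXT)"
        )
        self.cache = OrderedDict()  # recently seen IDs, oldest first
        self.cache_size = cache_size

    def _seen(self, msg_id):
        if msg_id in self.cache:
            return True
        # Cache miss: fall back to asking the storage back-end.
        row = self.db.execute(
            "SELECT 1 FROM messages WHERE id = ?", (msg_id,)
        ).fetchone()
        return row is not None

    def _remember(self, msg_id):
        self.cache[msg_id] = True
        self.cache.move_to_end(msg_id)
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict the oldest entry

    def receive_batch(self, batch):
        """Store the new messages in batch; return how many were stored."""
        stored = 0
        for msg_id, body in batch:
            if not self._seen(msg_id):
                self.db.execute(
                    "INSERT INTO messages (id, body) VALUES (?, ?)",
                    (msg_id, body),
                )
                stored += 1
            self._remember(msg_id)
        self.db.commit()
        return stored


receiver = DedupingReceiver(sqlite3.connect(":memory:"), cache_size=1)
count = receiver.receive_batch([("a", "x"), ("b", "y"), ("a", "x")])
```

With `cache_size=1` the second copy of `"a"` has already been evicted from the cache, so it is caught by the database query instead. The transport is not involved in either path.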


> Assuming that we had a "processed messages" state information, on
> connection re-establish, during the handshake process, sender can
> query receiver on the state of potential duplicates and remove them.

I assumed the de-dupe intelligence would be on the receiver side. Sender 
throws messages over the wall at the receiver, and it sorts things out.


> What I would find useful is a unique message ID that is created at the
> original originator and moved forward until whatever final destination. The
> approach here is to enable analysis tools to detect the duplicates.

Sure, that could be a good approach. For the "cost" of a cryptographic 
hash - probably computed right after the timestamp is added to the 
message - you'd push the duplicate filtering problem to the 
post-processing code.
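
As a sketch of what that could look like, the Python below derives an ID from the timestamped message and lets a post-processing pass flag repeats. The choice of SHA-256, the hashed fields (timestamp, host, text), and the function names `message_id` and `find_duplicates` are all assumptions for illustration, not anything rsyslog defines.

```python
import hashlib


def message_id(timestamp, host, text):
    """Hypothetical ID: SHA-256 over length-prefixed message fields."""
    h = hashlib.sha256()
    for field in (timestamp, host, text):
        data = field.encode("utf-8")
        # Length prefix keeps ("ab", "c") distinct from ("a", "bc").
        h.update(len(data).to_bytes(4, "big"))
        h.update(data)
    return h.hexdigest()


def find_duplicates(records):
    """Post-processing pass: return records whose ID was already seen."""
    seen = set()
    duplicates = []
    for record in records:
        mid = message_id(*record)
        if mid in seen:
            duplicates.append(record)
        else:
            seen.add(mid)
    return duplicates


first = ("2009-05-03T23:24:00+02:00", "host1", "disk full")
second = ("2009-05-03T23:24:01+02:00", "host1", "disk full")
dups = find_duplicates([first, second, first])
```

One caveat with a purely content-derived ID: two genuinely separate messages with identical text and the same timestamp (at whatever resolution is used) hash to the same value and would look like duplicates.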

It would be interesting to benchmark the up-front hash computation 
against the combined overhead of adding a serial number, caching seen 
record IDs, and running the dedupe logic.

  -Tom


