[rsyslog] output plugin calling interface
tmetro+rsyslog at gmail.com
Sun May 3 23:24:00 CEST 2009
Rainer Gerhards wrote:
> But there are two things mixed in here: one is the reliable transport, the
> other one is end-to-end reliability. For example, RELP cannot check if "the
> messages are already stored" because we have no universal predicate "is
I assumed it followed a model of conveyed responsibility, just like
SMTP. Once the receiver has acknowledged receipt, it needs to take full
responsibility for storage. If the receiver wants to cut corners on the
reliability of its internals, then it should delay acks until it has
confirmed successful storage.
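A minimal sketch of the conveyed-responsibility model, assuming a receiver that writes to a local file. The ack is returned only after the write has been flushed and fsync()ed, so a crash before that point leaves the message unacknowledged and the sender retries. `DurableReceiver`, `receive()` and the "ack N" string are hypothetical names for illustration, not RELP's actual wire format.

```python
import os


class DurableReceiver:
    """Hypothetical receiver that acks a message only after it is on disk."""

    def __init__(self, path: str) -> None:
        self.path = path

    def store(self, msg: str) -> None:
        # Append the message and force it to stable storage before returning.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(msg + "\n")
            f.flush()
            os.fsync(f.fileno())

    def receive(self, txnr: int, msg: str) -> str:
        # Responsibility passes to the receiver only once store() succeeds;
        # if store() raises, no ack is produced and the sender must resend.
        self.store(msg)
        return f"ack {txnr}"
```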
> So even if we put everything into a database, RELP cannot rely on
> that information to decide which message already have been received
> and which not.
I'm confused. On one side a receiver is talking RELP, and via RELP it
receives a batch of messages, potentially containing duplicates. On the
other side of that receiver is its storage back-end. If the receiver
chooses, it ought to be able to query that storage to see if any of the
messages are duplicates, and if so, discard them. This doesn't involve
RELP. (I described an in-memory cache for efficiency reasons, but the
duplicate check could involve querying a database.)
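Roughly what I have in mind for the receiver side, as a sketch only: a bounded in-memory cache of recently seen message IDs, with an optional fallback query against the storage back-end on a cache miss. It assumes each message already carries some unique ID. `DedupeFilter`, `filter_batch()` and the `in_storage` callback are hypothetical names; the callback stands in for whatever database lookup the receiver would actually do.

```python
from collections import OrderedDict
from typing import Callable, List, Optional, Tuple

Message = Tuple[str, str]  # (message ID, message text)


class DedupeFilter:
    """Hypothetical receiver-side duplicate filter (illustration only)."""

    def __init__(self, capacity: int,
                 in_storage: Optional[Callable[[str], bool]] = None) -> None:
        self.capacity = capacity
        # Optional fallback: ask the storage back-end whether an ID exists.
        self.in_storage = in_storage
        # Recently seen IDs, ordered from least to most recently used.
        self._seen: "OrderedDict[str, None]" = OrderedDict()

    def _remember(self, msg_id: str) -> None:
        self._seen[msg_id] = None
        self._seen.move_to_end(msg_id)
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)  # evict the least recently used ID

    def is_duplicate(self, msg_id: str) -> bool:
        if msg_id in self._seen:
            self._seen.move_to_end(msg_id)  # refresh its position in the cache
            return True
        # Cache miss: fall back to the (slower) storage query, if configured.
        if self.in_storage is not None and self.in_storage(msg_id):
            self._remember(msg_id)
            return True
        return False

    def filter_batch(self, batch: List[Message]) -> List[Message]:
        """Return only unseen messages, also dropping repeats within the batch."""
        fresh: List[Message] = []
        for msg_id, text in batch:
            if not self.is_duplicate(msg_id):
                self._remember(msg_id)
                fresh.append((msg_id, text))
        return fresh
```

The cache bound is the trade-off: a duplicate that arrives after its ID has been evicted slips through unless the storage query catches it.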
> Assuming that we had a "processed messages" state information, on
> connection re-establish, during the handshake process, sender can
> query receiver on the state of potential duplicates and remove them.
I assumed the de-dupe intelligence would be on the receiver side. Sender
throws messages over the wall at the receiver, and it sorts things out.
> What I would find useful is a unique message ID that is created at the
> original originator and moved forward until whatever final destination. The
> approach here is to enable analysis tools to detect the duplicates.
Sure, that could be a good approach. For the "cost" of a cryptographic
hash - probably computed right after the timestamp is added to the
message - you'd push the duplicate filtering problem to the
It would be interesting to benchmark the up-front hash computation
against all the overhead of adding a serial number, caching seen record
IDs, and running the dedupe logic.
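To make that comparison concrete, here is a sketch of the two ID schemes side by side, assuming Python's `hashlib` and `timeit`. `hash_id()` hashes the fields fixed at origination (timestamp, hostname, message text), so every relay sees the same ID; `SerialIds` is a hypothetical originator-side counter prefixed with the hostname and a restart epoch. It only times ID generation; the receiver-side caching and lookup cost would have to be measured separately, and no numbers are claimed here.

```python
import hashlib
import itertools
import timeit
from datetime import datetime, timezone


def hash_id(timestamp: str, hostname: str, msg: str) -> str:
    """Content-derived ID: SHA-256 over the fields fixed at origination."""
    h = hashlib.sha256()
    for part in (timestamp, hostname, msg):
        h.update(part.encode("utf-8"))
        # NUL separator keeps field boundaries unambiguous
        # (assuming the fields themselves contain no NUL bytes).
        h.update(b"\x00")
    return h.hexdigest()


class SerialIds:
    """Hypothetical scheme: hostname + restart epoch + per-process counter."""

    def __init__(self, hostname: str, epoch: int) -> None:
        self.prefix = f"{hostname}:{epoch}:"
        self._counter = itertools.count(1)

    def next_id(self) -> str:
        return self.prefix + str(next(self._counter))


if __name__ == "__main__":
    ts = datetime.now(timezone.utc).isoformat()
    serial = SerialIds("host1", epoch=1)
    # Times ID generation only; receiver-side caching is not measured here.
    t_hash = timeit.timeit(lambda: hash_id(ts, "host1", "test message"),
                           number=10_000)
    t_serial = timeit.timeit(serial.next_id, number=10_000)
    print(f"hash: {t_hash:.4f}s  serial: {t_serial:.4f}s")
```

One caveat with the hash approach: two genuinely distinct messages with the same timestamp, hostname and text get the same ID and would be filtered as duplicates, so timestamp resolution matters. The serial scheme avoids that but requires the originator to persist its epoch across restarts.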