[rsyslog] output plugin calling interface

david at lang.hm david at lang.hm
Mon May 4 04:32:28 CEST 2009


On Sun, 3 May 2009, Tom Metro wrote:

> Rainer Gerhards wrote:
>> But there are two things mixed in here: one is the reliable transport, the
>> other one is end-to-end reliability. For example, RELP cannot check if "the
>> messages are already stored" because we have no universal predicate "is
>> stored"...
>
> I assumed it followed a model of conveyed responsibility, just like
> SMTP. Once the receiver has acknowledged receipt, it needs to take full
> responsibility for storage. If the receiver wants to cut corners on the
> reliability of its internals, then it should delay acks until it has
> confirmed successful storage.
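
A minimal sketch of the "delay acks until stored" model described above, assuming a hypothetical `Receiver` class and a storage callback that raises on failure. The `ack`/`nack` strings are illustrative only and are not the RELP wire format.

```python
from typing import Callable


class Receiver:
    """Hypothetical receiver that acks only after storage has succeeded."""

    def __init__(self, store: Callable[[str], None]) -> None:
        # Assumption: store() raises an exception if the message was not saved.
        self._store = store

    def handle(self, txn_id: int, message: str) -> str:
        try:
            self._store(message)
        except Exception:
            # No ack is sent, so the sender still owns the message
            # and is expected to retransmit it.
            return f"nack {txn_id}"
        # Storage is confirmed; responsibility now passes to the receiver.
        return f"ack {txn_id}"


stored = []
ok_receiver = Receiver(stored.append)


def failing_store(message: str) -> None:
    raise OSError("disk full")


bad_receiver = Receiver(failing_store)
```

Here `ok_receiver.handle(1, "link down")` acks because the message reached `stored`, while `bad_receiver.handle(2, "link down")` refuses to ack, leaving the sender responsible for the retry.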
>
>
>> So even if we put everything into a database, RELP cannot rely on
>> that information to decide which messages have already been received
>> and which not.
>
> I'm confused. On one side a receiver is talking RELP, and via RELP it
> receives a batch of messages, potentially containing duplicates. On the
> other side of that receiver is its storage back-end. If the receiver
> chooses, it ought to be able to query that storage to see if any of the
> messages are duplicates, and if so, discard them. This doesn't involve
> RELP. (I described an in-memory cache for efficiency reasons, but the
> duplicate check could involve querying a database.)

it's not the right thing to just eliminate duplicate messages. you may 
legitimately get the same message multiple times (even with the same 
timestamp). the only way to know if you have seen _this copy_ of the 
message before is to have a unique identifier for the message.

this unique identifier may not be something that's appropriate to store 
(if it wasn't generated by the original sender, you may not want to pass 
it on to the software that would be analysing the logs)
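
To illustrate the difference, here is a rough sketch that compares filtering on message content with filtering on a per-message identifier. The `LogRecord` type, its `msg_id` field and both helper functions are made up for this example; nothing here is an existing rsyslog or RELP feature.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LogRecord:
    msg_id: str     # hypothetical unique ID attached to each message
    timestamp: str
    text: str


def dedupe_by_content(records: List[LogRecord]) -> List[LogRecord]:
    """Drop records whose timestamp and text were already seen."""
    seen = set()
    kept = []
    for rec in records:
        key = (rec.timestamp, rec.text)
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept


def dedupe_by_id(records: List[LogRecord]) -> List[LogRecord]:
    """Drop records only if this exact copy (same msg_id) was already seen."""
    seen = set()
    kept = []
    for rec in records:
        if rec.msg_id not in seen:
            seen.add(rec.msg_id)
            kept.append(rec)
    return kept


first = LogRecord("id-1", "2009-05-04T04:32:28", "link down")
second = LogRecord("id-2", "2009-05-04T04:32:28", "link down")  # a real second event
resent = LogRecord("id-1", "2009-05-04T04:32:28", "link down")  # retransmit of first

batch = [first, second, resent]
by_content = dedupe_by_content(batch)
by_id = dedupe_by_id(batch)
```

The content filter collapses all three records into one and so loses the genuine second event, while the ID filter drops only the retransmitted copy. The ID is needed only for this filtering step, so it could be stripped before the records are handed to whatever analyses the logs.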

David Lang

>
>> Assuming that we had a "processed messages" state information, on
>> connection re-establish, during the handshake process, sender can
>> query receiver on the state of potential duplicates and remove them.
>
> I assumed the de-dupe intelligence would be on the receiver side. Sender
> throws messages over the wall at the receiver, and it sorts things out.
>
>
>> What I would find useful is a unique message ID that is created at the
>> original originator and moved forward until whatever final destination. The
>> approach here is to enable analysis tools to detect the duplicates.
>
> Sure, that could be a good approach. For the "cost" of a cryptographic
> hash - probably computed right after the timestamp is added to the
> message - you'd push the duplicate filtering problem to the
> post-processing code.
>
> It would be interesting to do a benchmark comparison between the
> up-front hash computation vs. all the overhead of adding a serial
> number, caching seen record IDs, and dedupe logic.
>
>  -Tom


