[rsyslog] output plugin calling interface
Tom Metro
tmetro+rsyslog at gmail.com
Mon May 4 07:30:45 CEST 2009
david at lang.hm wrote:
> Tom Metro wrote:
>> Rainer Gerhards wrote:
>>> So even if we put everything into a database, RELP cannot rely on
>>> that information to decide which messages have already been received
>>> and which not.
>> I'm confused. On one side a receiver is talking RELP, and via RELP it
>> receives a batch of messages, potentially containing duplicates. On the
>> other side of that receiver is its storage back-end. If the receiver
>> chooses, it ought to be able to query that storage to see if any of the
>> messages are duplicates, and if so, discard them. This doesn't involve
>> RELP. (I described an in-memory cache for efficiency reasons, but the
>> duplicate check could involve querying a database.)
>
> it's not the right thing to just eliminate duplicate messages. you may get
> the same message multiple times (with the same timestamp even). the only
> way to know if you have seen _this copy_ of the message before is to have
> a unique identifier for the message.
Your point may be correct, but I'm not sure it has relevance to the
material you quoted. The context of the above comments included Rainer
saying, "RELP uses sequence numbers." So at least within the scope of a
limited time window, the individual messages can be uniquely distinguished.
> this unique identifier may not be something that's appropriate to store
> (if it wasn't generated by the original sender, you may not want to pass
> it on to the software that would be analysing the logs)
Right. So for example, there might not be much sense in persistently
storing a time-limited sequence number. But that didn't seem to be the
point Rainer was making with regard to using a database back-end. A key
comment he made was, "we have no universal predicate 'is stored'." And I
was wondering why such functionality is required in order to avoid
duplicates.
> you may get the same message multiple times (with the same timestamp
> even).
Is that true even with a high-res time stamp? I suppose that's relative
to the resolution of your time stamp and your message throughput.
To ensure a hash of a message is unique, you'd probably have to include
a sequence number in the data being hashed, in addition to the time
stamp. Actually, timestamp + sequence number ought to provide a
sufficiently unique ID for any message within a "conversation." The hash
is probably of value only for obtaining something smaller to store or
faster to look up (on the receiving side).
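
As a rough illustration of the idea, here is a minimal Python sketch of a
receiver-side duplicate check keyed on timestamp + sequence number. It is
not rsyslog or RELP code; the names message_id, DuplicateFilter, and
accept_batch, the (timestamp, seqno, text) message layout, and the choice
of SHA-1 are all assumptions made up for the example. The cache is bounded,
to reflect that sequence numbers are only meaningful within a limited window.

```python
import hashlib
from collections import OrderedDict


def message_id(timestamp: str, seqno: int) -> str:
    """Hash timestamp + sequence number into a short fixed-size key.

    The message text is deliberately left out: two distinct messages
    may carry identical text and an identical timestamp.
    """
    raw = f"{timestamp}|{seqno}".encode("utf-8")
    return hashlib.sha1(raw).hexdigest()


class DuplicateFilter:
    """Hypothetical in-memory cache of recently seen message IDs."""

    def __init__(self, max_entries: int = 10000):
        self.max_entries = max_entries
        self._seen = OrderedDict()  # insertion order = arrival order

    def is_duplicate(self, timestamp: str, seqno: int) -> bool:
        key = message_id(timestamp, seqno)
        if key in self._seen:
            return True
        self._seen[key] = None
        if len(self._seen) > self.max_entries:
            self._seen.popitem(last=False)  # forget the oldest ID
        return False


def accept_batch(dedup: DuplicateFilter, batch):
    """Return only the (timestamp, seqno, text) tuples not seen before."""
    return [m for m in batch if not dedup.is_duplicate(m[0], m[1])]
```

Note that two messages with the same text and the same timestamp but
different sequence numbers are both kept, while a retransmitted copy
(same timestamp and same sequence number) is dropped. Once an ID falls out
of the window, a late retransmission would be accepted again, so the window
size has to be chosen relative to message throughput.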
-Tom