[rsyslog] output plugin calling interface
david at lang.hm
david at lang.hm
Thu Apr 30 21:41:40 CEST 2009
On Thu, 30 Apr 2009, Rainer Gerhards wrote:
> I have just drawn a state diagram. It is not 100% perfect, but I think
> it conveys the idea better than the text. Anyhow, you should read it
> together with the text below, because the text is more precise. Note
> that the diagram looks only at message states, thus
> message-permanent-failure and action-permanent-failure lead to the some
> state in the diagram:
>
> http://www.rsyslog.com/modules/Static_Docs/data/action-call.png
>
> I hope it is useful.
>
> I drew it with graphvic, the dot control file is in multi-dequeue git
> branch and named action-call.dot - in case you'd like to modify ;)
this is definantly useful and I believe accuratly represents what you have
described.
what I was proposing would be slightly different
from commit pending you would have a third possible result
change the endTransaction _SUSPENDED to endTransaction _SUSPENDED, b=1
add another line endTransaction _SUSPENDED, b>1 that changes b=b/2 and
goes back to 'ready for processing'
then on the 'retry succeeds' line change b=Batch_Size
David Lang
> Rainer
>
> On Thu, 2009-04-30 at 19:00 +0200, Rainer Gerhards wrote:
>> I forgot to mention one other subtle, but important issue. I add it down
>> below in the right context:
>>
>>> -----Original Message-----
>>> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
>>> bounces at lists.adiscon.com] On Behalf Of Rainer Gerhards
>>> Sent: Thursday, April 30, 2009 5:47 PM
>>> To: rsyslog-users
>>> Subject: Re: [rsyslog] output plugin calling interface
>>>
>>> David,
>>>
>>> one more note. I think I have basically solved the initial discussion topic
>>> (with the *action* look and I am back at the overall picture). I will write
>>> up on the action lock later, I think I can save some time when we go down
>> the
>>> the other topic first.
>>>
>>> Rest inline below...
>>>
>>>
>>>> -----Original Message-----
>>>> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
>>>> bounces at lists.adiscon.com] On Behalf Of david at lang.hm
>>>> Sent: Thursday, April 30, 2009 1:32 AM
>>>> To: rsyslog-users
>>>> Subject: Re: [rsyslog] output plugin calling interface
>>>>
>>>> On Wed, 29 Apr 2009, Rainer Gerhards wrote:
>>>>
>>>>>>
>>>>>> the key to the locking is to try to both minimize the work
>>>>>> done inside the
>>>>>> lock, and minimize the time the lock is held.
>>>>>>
>>>>>> in general I think the locking should be something along the lines of
>>>>>>
>>>>>> doStartbatch()
>>>>>> lock action mutex
>>>>>> multiple doAction() calls
>>>>>> unlock action mutex
>>>>>> doEndbatch()
>>>>>
>>>>> I overlooked a very important point, and it now appears to me. I *must*
>>> lock
>>>>> the complete batch, including doEndBatch(). The reason is that a single
>>>>> app-level transaction otherwise has no definite start and end point. So
>>> we
>>>>> would not know what is committed and what not (if another thread puts
>>>>> messages into the transaction queue. That breaks the whole model. The
>>> only
>>>>> solution is to hold the lock during the whole tranaction. As you
>> outline,
>>>>> this should not be too long. Even if it took considerable time, there
>>> would
>>>>> be limited usefulness in interleaving other doAction calls, as this
>> would
>>>>> simply cause additional time. At this point, I think there is no real
>>>> benefit
>>>>> in running multiple threads concurrently.
>>>>
>>>> you really don't want to hold the lock through the doEndBatch() call,
>> that
>>>> can potentially take a _long_ time, and that is the time when you most
>>>> want the ability to have other things accessing the queue (note that I
>> may
>>>> be misunderstanding the definition of the lock here)
>>>>
>>>> the output module will be tied up this entire time, but you need other
>>>> things to be able to access the queue (definantly addding things to the
>>>> queue, but there is a win in having the ability for another reader thread
>>>> to be pushing things to a different copy of the output module at the same
>>>> time)
>>>>
>>>> so this means that when you are goign through and doing the doAction()
>>>> call, you are marking that you are working on that queue entry. then you
>>>> release the lock and the next reader that comes along will skip over the
>>>> entries that you have claimed and work to deliver the next N messages.
>>>>
>>>> then when you get the results of doEndBatch() you go back and mark some
>> or
>>>> all of those messages as completed (removing them from the queue). note
>>>> that with multiple worker threads you have the potential to have items
>>>> that aren't the oldest ones in the queue being completed before the
>> oldest
>>>> ones, batching may make bigger holes, but the potential for holes was
>>>> there all along.
>>>>
>>>>>> and the output module should be written to defer as much
>>>>>> processing as
>>>>>> possible to the doEndbatch() call to make the doAction() call
>>>>>> as fast as
>>>>>> possible.
>>>>>
>>>>> Sounds reasonable.
>>>>
>>>> note that the reason for doAction deferring as much work as possible is
>> to
>>>> allow that work to be done outside of any locking
>>>>
>>>>>> since most errors will not be detected during doAction (in
>>>>>> fact, the only
>>>>>> errors I can think of that will happen at this point are
>>>>>> rsyslog resource
>>>>>> contratints), the error handling will need to be done after
>>>>>> doEndbatc()
>>>>>> returns
>>>>>>
>>>>>> at that point the output module may not know which of the
>>>>>> messages caused
>>>>>> the error (if the module sends the messages as a transaction
>>>>>> it may just
>>>>>> know that the transaction failed, and have to do retries with
>>>>>> subsets to
>>>>>> narrow down which message caused the failure)
>>>>>>
>>>>>> as long as at least one message is sucessful, things are not
>>>>>> blocked and
>>>>>> should continue. it's only when doEndBatch() reports that no messages
>>>>>> could be delivered that you have a possible reason to drop
>>>>>> the message
>>>>>> (and even then, only the first message. all others must be retried)
>>>>>
>>>>> Well, I wouldn't conclude that it is the first message, but "one
>> message"
>>>>> inside the batch. So there may be some benefit in retrying the batch
>> with
>>>>> less records (as you suggested). Under the assumption that usually only
>>> one
>>>>> record casues the problem, I tend to think that it may be useful to run
>>>>> commit the batch one-by-one in this case - this may be more efficient
>>> than a
>>>>> binary search for the failing record.
>>>>
>>>> note that it's not _quite_ a binary search (same basic concept though),
>> as
>>>> you submit a subset of them they either go through or you need to try a
>>>> smaller batch
>>>
>>> Ack
>>>
>>>>
>>>> with the individual submissions you are O(N) (on average you will have to
>>>> commit 1/2 the batch individually before you hit the bad one, ~6
>>>> transactions for a batch size of 10, 51 for a batch size of 100)
>>>>
>>>> with the 'binary search' approach you are O(log(N)) ( ~4 transactions for
>>>> a batch size of 10, ~7 for a batch size of 100)
>>>>
>>>> the worst case is probably where the last message is the one that has the
>>>> problem.
>>>>
>>>> for the individual processing that is simple math (batch size of 100, you
>>>> will fail the first one, then submit 99 sucessfully, then fail on the
>> last
>>>> one)
>>>>
>>>> for the binary search it's more complicated (this is assuming the batch
>>>> size gets bumped up when it succeeds)
>>>>
>>>> assuming the batch completly fails (i.e. a database where the output
>>>> module doesn't know which one caused it to fail)
>>>>
>>>> bad message is message 100 and there are >>100 messages in the queue
>>>> fail 100
>>>> succeed 50 (bad message is now message 50)
>>>> fail 100
>>>> fail 50
>>>> succeed 25 (bad message is now message 25)
>>>> fail 100
>>>> fail 50
>>>> fail 25
>>>> suceed 12 (bad message is now message 13
>>>> fail 100
>>>> fail 50
>>>> fail 25
>>>> suceed 12 (bad message is now message 1
>>>> fail 100
>>>> fail 50
>>>> fail 25
>>>> fail 12
>>>> fail 6
>>>> fail 3
>>>> fail 1
>>>> retry 1
>>>> .
>>>> .
>>>> message 1 is bad (20 transactions + retries)
>>>>
>>>> best case would be
>>>>
>>>> bad message is message 1 and there are >>100 messages in the queue
>>>> fail 100
>>>> fail 50
>>>> fail 25
>>>> fail 12
>>>> fail 6
>>>> fail 3
>>>> fail 1
>>>> retry 1
>>>> .
>>>> .
>>>> message 1 is bad (7 transactions + retries)
>>>>
>>>>
>>>> if the output module is able to commit a partial transaction, then the
>>>> logic devolves to
>>>>
>>>> bad message is message 100 and there are >>100 messages in the queue
>>>> submit 100, succeed 99 bad message is message 1
>>>>
>>>>
>>>>> All in all, it looks like the algorithm needs to get a bit more
>>> complicated
>>>>> ;)
>>>>
>>>> unfortunantly, but not very much more complicated.
>>>
>>> We are actually working here on a very important piece of code. This is
>> also
>>> why I am thinking so hard on it. This algorithm, in hopefully
>>> as-simple-as-possible (but not simpler than that ;)) handles all the retry
>>> logic, the output transaction, action resumption and recovery and is also
>>> responsible that the upper-layer queue object is able to reliably persist
>> not
>>> fully processed batches of messages during a restart. This is why I (and
>>> obviously you, too) put so much design effort into it.
>>>
>>> This is primarily a note for those list followers that wonder why we work
>> so
>>> hard on these few lines of code.
>>>>
>>>> the algorithm I posted last week cut the batch size in half for each loop
>>>> and restored it when a commit succeded. you didn't want that to be in the
>>>> core (and it doesn't have to be, but that means that the output module
>> may
>>>> need to do the retries)
>>>
>>> It's not that I didn't like it to be in core, I questioned (at least I hope
>>> so) if it is useful at all. I just re-checked and this indeed is what I did
>>> ;)
>>>
>>> I try to sum up the reason for this. So far, all output modules return one
>> of
>>> three states:
>>>
>>> RS_RET_OK,
>>> which means everything went well
>>>
>>> RS_RET_SUSPENDED,
>>> which means we had a temporary failure and the core should retry the action
>>> some time later
>>>
>>> RS_RET_DISABLED,
>>> which means that we had a permanent failure *of the action* and there is no
>>> point in retrying the action any longer (this actually means the action has
>>> died for the rest of the lifetime of this rsyslogd instance)
>>>
>>> Note that none of the standard modules returns an "I have a permanent
>> failure
>>> with only this message".
>>>
>>> Of course, it may be an oversight, and I tend to agree that we need to add
>>> such a state (or better, a variety of states that provide back several
>> causes
>>> for message-permanent failures). But let's first consider why this state is
>>> not yet already present. The simple truth is that we so far did not need
>> it,
>>> because such a failure is very seldom (and I initially thought
>> non-existing).
>>>
>>> When can it happen? With the file output? No, why should one text string
>> not
>>> be able to be written, but the next one is. Syslog forwarding? No, same
>>> reason. Why one message but not the next one? Same story for user messages
>>> and so on. I think this covers all but the database outputs.
>>>
>>> Now let's look at the database outputs. It first looks like the same story,
>>> but if you look more closely, it may be different (and this subtle issue
>> did
>>> not go into the design of the original database outputs). I can see that
>> the
>>> situation happens if a unique index is defined on one field. Then, if for
>>> example we need to store a sequence of three records, and they have the
>>> following key sequence: (a,a,b). So the second record will fail, as the
>> first
>>> record already had the same key, but the third record will work (thanks to
>> a
>>> different key). The original design was made under the assumption that such
>> a
>>> unique index placed on a field who's values are not unique is a
>> configuration
>>> error (and I still tend to believe it is).
>>>
>>> One may now argue, that while this may be the case, rsyslog should be able
>> to
>>> recover from such user error. I tend to agree to that argument - but I am
>> not
>>> sure if some other folks would argue that this is indeed an user error and
>> if
>>> we permit losing this record, we actually have a data loss bug. There is
>> also
>>> some truth in that argument. It may be better to see this case as a
>> temporary
>>> failure (after all, the real cure is to remove the unique index or fix an
>>> invalid sql statement). But again, I tend to follow the argument that this
>> is
>>> a message-permanent processing failure and as such the message needs to be
>>> discarded so that further messages can be handled.
>>>
>>> HOWEVER, this is the *only* situation in which I can think of a
>>> message-permanent processing failure. No other case comes up my mind, and I
>>> think this was also the conclusion of our Sep 2008 discussion.
>>>
>>> But if there is only one case, and it is very remote, does that actually
>>> justify adding a (somewhat) complex algorithm if a simpler would also do?
>> I'd
>>> say that we can be fine with O(n) in case one of these very remote failures
>>> happens. That means we could simply resort to doing every insert in its own
>>> transaction (just like without batching). Granted, if endTransaction()
>>> dominates the cost function, we have O(n) vs. O(1) in that case, but if
>> that
>>> case happens infrequently enough (what I assume), there really is no
>>> difference between the two. And as you say, even the "binary search"
>> approach
>>> is not O(log(n)) but rather O(n/2) [yes, I know, but it makes sense
>> here...].
>>> So the gain in using that algorithm is even less (in essence, we are at
>> O(n),
>>> whatever we use for recovery).
>>>
>>> I may be overlooking some cases for message-permanent failures and, if I
>> do,
>>> I would be very grateful if you (or someone else) could point me to them.
>>>
>>> If you look at your algorithm (I've put it up here for easy reference:
>>> http://blog.gerhards.net/2009/04/batch-output-handling-algorithm.html ) and
>>> what I have crafted yesterday, you'll see that my writeup is strongly
>>> influenced by your proposal. However, as I wrote in my initial response to
>>> your message, the algorithm fails at some subtleties that you did not know
>> at
>>> the time of writing (most importantly the push-model of the queue object).
>>>
>>> There are also some fine details of the retry handling that needs to go
>> into
>>> the algorithm, as well as proper handling of the now four cases:
>>>
>>> * success
>>> * temporary failure
>>> * message-permanent failure
>>> * action-permanent failure
>>>
>>
>> What I forgot the mention is that we currently have a kind of "pseudo"
>> message-permanent failure state. That is related to temporary failure. My
>> description above was under the assumption that the action is configured for
>> eternal retry (retryCount = -1). If a upper bound for the number of retries n
>> is set, then and only then a message enters message-permanent failure state
>> after n unsuccessful retries. If that happens, the message is discarded, the
>> in-sequence retry counter is reset and the next message scheduled for
>> processing (sounds like it would be a good idea to draw a state diagram...).
>> The idea behind this is that it often is preferable to lose some messages if
>> they cannot be processed rather than re-trying. But I have to admit that the
>> functionality is rooted in rsyslog's past, where no capable queues existed. I
>> still think it needs to be preserved - many use it and I see lots of use
>> cases.
>>
>> But: If we have such an upper bound n, we need to think how to handle this
>> situation for batches. If we have a batch size b > 1, the old interface
>> retried n times before discarding the first message and 2n times before
>> discarding the second. For the b-th record, we have bn retries before it is
>> discarded (maybe one off each, not checked exactly). There is no point in
>> modeling this with batches. So we could use between n..bn retries. The
>> problem here is that b is not fixed - it is between one and the configured
>> upper bound of the batch size. So we do not get (fully) deterministic
>> behavior. Question is whether or not this is actually important, as the
>> unavailability of the action's target (when looked from the rsyslog POV) is
>> also not deterministic. So I tend to use a fixed number of retries for the
>> whole batch. That could simply be n, because the user can configure it. Or we
>> could derive a new n' from the configured one, e.g. by n' = n * b/10 (where b
>> is the *actual* batch size, as above, not the upper bound!). That way, we the
>> max number of retries would be related to the actual batch size and varies
>> with it. Somehow, just a feeling, I'd *not* go for n' = nb (though this may
>> just be an emotional position...).
>>
>> Comments appreciated.
>>
>> Rainer
>>
>>>>
>>>> the question is which side the retry logic needs to be in.
>>>>
>>>> it can be in the queue walkder that calls the output module
>>>>
>>>> it can be in doEndBatch() in the output module
>>>>
>>>> some output modules don't need partial retries (as they can output
>> partial
>>>> batches)
>>>>
>>>> some output modules do need partial retries.
>>>>
>>>> the more complicated retry logic will work for both situations, or it can
>>>> be implemented in each of how ever many output modules need it.
>>>>
>>>> it can go either way, I tend to lean towards only having the logic in one
>>>> place (even if it's more complicated logic than some modules need)
>>>
>>> I fully agree here with you. That functionality belongs into the core, we
>>> just need to craft it well. I think the probably most important question at
>>> this time is if I overlook some possible message-permanent failure cases.
>> So
>>> I would greatly appreciate feedback especially on that part of my reply.
>>>
>>> Thanks,
>>> Rainer
>>>
>>>>
>>>> David Lang
>>>> _______________________________________________
>>>> rsyslog mailing list
>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>> http://www.rsyslog.com
>>> _______________________________________________
>>> rsyslog mailing list
>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>> http://www.rsyslog.com
>> _______________________________________________
>> rsyslog mailing list
>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>> http://www.rsyslog.com
>
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com
>
More information about the rsyslog
mailing list