From tmetro+rsyslog at gmail.com Fri May 1 01:10:27 2009
From: tmetro+rsyslog at gmail.com (Tom Metro)
Date: Thu, 30 Apr 2009 19:10:27 -0400
Subject: [rsyslog] directing logs to a broadcast address fails
In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702AFF1@GRFEXC.intern.adiscon.com>
References: <49F7DF6B.9020208@gmail.com><9B6E2A8877C38245BFB15CC491A11DA702AFE0@GRFEXC.intern.adiscon.com> <49F8D879.7080101@gmail.com> <9B6E2A8877C38245BFB15CC491A11DA702AFF1@GRFEXC.intern.adiscon.com>
Message-ID: <49FA2FE3.5080207@gmail.com>

Rainer Gerhards wrote:
>> I'll try doing the backport. As long as there aren't any
>> interdependencies that can't be met (like reliance on a newer kernel or
>> shared library), it should just be a matter of grabbing the newer
>> package source and rebuilding. Then when the OS eventually gets
>> upgraded, it'll automatically get updated too.
>
> Ahhhh! I was thinking about backporting a patch. ... I think a
> backport to the older distro should be fairly painless. You may run into
> gnutls issues, if you do, let me know. They should be easy to address (at
> least I hope so).

The backport was straightforward. A few compile warnings. A bunch of packaging warnings (apparently one of the Debian tools checks for unresolved symbols within the packaged libraries). But it installed and ran OK (I'll post the procedure separately), and after I merged the 1.x and 3.x conf files, it didn't log any warnings on startup.

But it didn't fix the broadcast problem. As the apparently identical version works on Ubuntu 8.10, this suggests the problem is in one of the shared libraries. Although...

/usr/local/src/rsyslog-3.18.1# fgrep -r setsockopt . | fgrep -i broadcast

produces no output, which seems suspicious. "fgrep -ir broadcast ." also turns up nothing relevant. I'd expect SO_BROADCAST to appear somewhere. Maybe the rsyslog code hasn't changed, but the underlying libraries lifted the requirement of an SO_BROADCAST flag in order for a socket to permit broadcast packets?
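For context on the SO_BROADCAST question: on Linux, the ip(7) manual page lists EACCES for sending to a broadcast address without the SO_BROADCAST flag set on the socket, so something in the forwarding path has to set that flag. The following is a minimal sketch, not rsyslog code; the helpers `enable_broadcast()` and `enable_broadcast_all()` are hypothetical. It also encodes an assumption worth checking against net.c: that the array returned by net.create_udp_socket() stores the socket count in element 0 and the descriptors in elements 1..n. If that layout holds, `*pData->pSockArray` in a patch like the one that follows would be the count rather than a descriptor, and the flag would have to be set on each of elements 1..n instead.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>

/* Set SO_BROADCAST on one UDP socket; returns 0 on success, -1 on error. */
static int enable_broadcast(int fd)
{
    int on = 1;
    return setsockopt(fd, SOL_SOCKET, SO_BROADCAST, &on, sizeof(on));
}

/* Set SO_BROADCAST on every socket in a count-prefixed array.
 * ASSUMPTION: socks[0] is the number of sockets and socks[1..n] are the
 * descriptors, which is how we believe net.create_udp_socket() lays out
 * its result. Returns 0 on success, -1 on error or a NULL array. */
static int enable_broadcast_all(const int *socks)
{
    if (socks == NULL)
        return -1;
    assert(socks[0] >= 0);              /* a negative count would be a bug */
    for (int i = 1; i <= socks[0]; i++) {
        if (enable_broadcast(socks[i]) < 0)
            return -1;
    }
    return 0;
}
```

Under that assumption, a single-socket result looks like `{ 1, fd }`, and the loop touches only the real descriptor; dereferencing the array pointer directly would instead pass the count (1) to setsockopt().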
I tried the following patch to explicitly enable the SO_BROADCAST flag:

--- rsyslog-3.18.1.orig/omfwd.c
+++ rsyslog-3.18.1/omfwd.c
@@ -356,6 +356,11 @@
     if(pData->protocol == FORW_UDP) {
         if(pData->pSockArray == NULL) {
             pData->pSockArray = net.create_udp_socket((uchar*)pData->f_hname, NULL, 0);
+            int on = 1;
+            if (setsockopt(*pData->pSockArray, SOL_SOCKET, SO_BROADCAST,
+                           (char *) &on, sizeof(on)) < 0 ) {
+                errmsg.LogError(NO_ERRCODE, "setsockopt(SO_BROADCAST)");
+            }
         }
     }
     pData->ttSuspend = time(NULL);

But that didn't fix it either.

(On a side note, the above code uses net.create_udp_socket() to create the UDP socket for forwarding, yet that function contains a bunch of error checking for a listening socket that isn't applicable, and that would be quite misleading if it ever got triggered.)

-Tom

From YungWei.Chen at resolvity.com Fri May 1 05:30:34 2009
From: YungWei.Chen at resolvity.com (YungWei.Chen)
Date: Thu, 30 Apr 2009 23:30:34 -0400
Subject: [rsyslog] Building rsyslog from source code
In-Reply-To:
References: <795E60BBD9A86846BFEDF32029788B1BA7491B@MI8NYCMAIL14.Mi8.com>
Message-ID: <795E60BBD9A86846BFEDF32029788B1BA749AE@MI8NYCMAIL14.Mi8.com>

My goal is to build rsyslog 3.19 or above with the fewest upgrades to a vanilla CentOS 5.2. For those who have successfully done that,

* did you successfully build lmnsd_gtls.so from its source code too?
* where can I get the source code of lmnsd_gtls.so, if it's available?

Thanks.

From rgerhards at hq.adiscon.com Fri May 1 10:09:33 2009
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Fri, 1 May 2009 10:09:33 +0200
Subject: [rsyslog] Building rsyslog from source code
Message-ID: <000001c9ca34$670b84df$100013ac@intern.adiscon.com>

I can answer this one: the source is included in the regular release tarball, I think in the runtime directory.
rainer

----- Original Message -----
From: "YungWei.Chen"
To: "rsyslog at lists.adiscon.com"
Sent: 01.05.09 05:31
Subject: Re: [rsyslog] Building rsyslog from source code

My goal is to build rsyslog 3.19 or above with the fewest upgrades to a vanilla CentOS 5.2. For those who have successfully done that,

* did you successfully build lmnsd_gtls.so from its source code too?
* where can I get the source code of lmnsd_gtls.so, if it's available?

Thanks.

_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com

From rgerhards at hq.adiscon.com Fri May 1 21:56:18 2009
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Fri, 1 May 2009 21:56:18 +0200
Subject: [rsyslog] output plugin calling interface
References: <1241023853.25612.11.camel@rf10up.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AFE4@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B001@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B003@GRFEXC.intern.adiscon.com><1241114672.25612.14.camel@rf10up.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B006@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B008@GRFEXC.intern.adiscon.com>
Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702B00B@GRFEXC.intern.adiscon.com>

David,

> -----Original Message----- > From: rsyslog-bounces at lists.adiscon.com > [mailto:rsyslog-bounces at lists.adiscon.com] On Behalf Of david at lang.hm > Sent: Thursday, April 30, 2009 11:57 PM > To: rsyslog-users > Subject: Re: [rsyslog] output plugin calling interface > >> 1. something wrong with a message. > >> retries won't help (with batches we need to narrow down > >> to which message) > > > > But what could be wrong? For which outputs? > > > > Can you envision anything other than db outputs only, > duplicate identity in > > column erroneously declared with a unique index (which I > still consider an > > admin error)? I really fail to find any other example.
> > things like quoting errors (which I know you defend against) > > unicode text causing grief > > dynafile hits a read-only file > > basically data-driven things that trigger bugs in the message delivery > mechanism in some form. > > this is easiest to understand and provide examples with > databases, but you > could have problems with other methods. > > in an ideal world these would never happen, but for most > output types I > can think of some form of corrupt input that could cause that > message to > fail.

OK, thanks a lot. It doesn't matter that I can defend against a case or two; you provided good examples. I am now reassured that these cases are likely enough to deserve a serious look.

Resting sometimes helps, and so I also think I made a big step forward (while not actively working on it ;)) today. I think what makes the problem so hard to solve is the language we use. I thought about a purely mathematical model, and I have one in mind. The state diagram was a first step, but it did not go far enough. So I think I will try to write up that model, and then we can discuss it and finally derive the actual code from it. That's an extra step, but I think it will be a useful one.

As a side note, I have also identified a subtle issue that we have overlooked so far: backup actions. They need to work on the subset of the batch that had message-permanent failures. So the message state actually needs to be part of the message inside the batch. But now, I think, things really begin to come together and are far less complex than initially thought.

One problem with the state chart (that was why I said it is not 100% correct) is that it does not properly abstract batches vs. single messages. The two were entangled in a way that I thought [;)] to be very complex. But if you model that with processing states, then the batch processing state is simply a function of the individual message processing states.
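Rainer's point that the batch state is simply a function of the per-message states can be sketched as follows. This is an illustration under assumptions, not rsyslog's implementation: the enum names, the fixed `MAX_BATCH` capacity and the helpers `batch_state_of()` and `backup_subset()` are hypothetical. The message outcomes mirror the four cases listed further down in this thread (success, temporary failure, message-permanent failure, action-permanent failure); the batch state is derived by scanning the messages, and the subset handed to a backup action is the set of messages with a message-permanent failure.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BATCH 64    /* hypothetical fixed capacity for the sketch */

/* Per-message processing state, stored with each message in the batch. */
enum msg_state {
    MSG_READY,          /* not yet processed */
    MSG_OK,             /* success */
    MSG_TEMP_FAIL,      /* temporary failure: retry later */
    MSG_PERM_FAIL,      /* message-permanent failure */
    MSG_ACTION_DEAD     /* action-permanent failure */
};

enum batch_state { BATCH_DONE, BATCH_NEEDS_RETRY, BATCH_PENDING };

struct batch {
    size_t n;
    enum msg_state state[MAX_BATCH];
};

/* The batch state is derived purely from the message states. */
static enum batch_state batch_state_of(const struct batch *b)
{
    int pending = 0;
    assert(b->n <= MAX_BATCH);
    for (size_t i = 0; i < b->n; i++) {
        if (b->state[i] == MSG_TEMP_FAIL)
            return BATCH_NEEDS_RETRY;   /* any temporary failure forces a retry */
        if (b->state[i] == MSG_READY)
            pending = 1;
    }
    return pending ? BATCH_PENDING : BATCH_DONE;
}

/* Collect the indexes a backup action would receive: the messages that
 * suffered a message-permanent failure. Returns how many were found. */
static size_t backup_subset(const struct batch *b, size_t idx[MAX_BATCH])
{
    size_t k = 0;
    for (size_t i = 0; i < b->n; i++)
        if (b->state[i] == MSG_PERM_FAIL)
            idx[k++] = i;
    return k;
}
```

Keeping the state inside each batch entry, rather than one flag per batch, is what lets the backup action be handed exactly the failed subset.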
Please let me know if you also find a math model useful (but I'll probably need to do it in any case, because it helps me clear my mind...).

Rainer

> > >> > >> 2. something wrong with the destination. > >> in this case waiting and retrying makes sense, the > >> destination may > >> recover (either on its own, or through sysadmin assistance) > > > > Yes, potentially recoverable. > > > >> > >> 3. internal rsyslog issue (out of memory or similar) > > > > That's potentially recoverable, too. The most concerning > issue is running out > > of process address space, but even this can be recovered > from (at least in > > theory; in practice I doubt that rsyslog really will survive such a > > situation) > > agreed. > > >> > >> I can't think of things that don't fall into one of these > >> buckets, but > >> I'll add more later if I think of them. > > > > The bottom line is that I see only a single case, and that > is caused by > > config error. > > if you define failure to appropriately handle odd characters > (failing to > change control characters for example) to be a config error I > can agree > with you. > > David Lang > > > Rainer > > > >> > >>> Rainer > >>> PS: tomorrow is a public holiday over here; I may be out > >> with friends and > >>> family and don't know if I will get to my mail. > >> > >> no problem. > >> > >> David Lang > >> > >>>> -----Original Message----- > >>>> From: rsyslog-bounces at lists.adiscon.com > >>>> [mailto:rsyslog-bounces at lists.adiscon.com] On Behalf Of > >> david at lang.hm > >>>> Sent: Thursday, April 30, 2009 9:42 PM > >>>> To: rsyslog-users > >>>> Subject: Re: [rsyslog] output plugin calling interface > >>>> > >>>> On Thu, 30 Apr 2009, Rainer Gerhards wrote: > >>>> > >>>>> I have just drawn a state diagram. It is not 100% perfect, > >>>> but I think > >>>>> it conveys the idea better than the text. Anyhow, you > >> should read it > >>>>> together with the text below, because the text is more > >> precise.
Note > >>>>> that the diagram looks only at message states, thus > >>>>> message-permanent-failure and action-permanent-failure lead > >>>> to the same > >>>>> state in the diagram: > >>>>> > >>>>> http://www.rsyslog.com/modules/Static_Docs/data/action-call.png > >>>>> > >>>>> I hope it is useful. > >>>>> > >>>>> I drew it with graphviz; the dot control file is in the > >>>> multi-dequeue git > >>>>> branch and named action-call.dot - in case you'd like > to modify ;) > >>>> > >>>> this is definitely useful and I believe accurately represents > >>>> what you have > >>>> described. > >>>> > >>>> what I was proposing would be slightly different > >>>> > >>>> from commit pending you would have a third possible result > >>>> > >>>> change the endTransaction _SUSPENDED to endTransaction > >> _SUSPENDED, b=1 > >>>> > >>>> add another line endTransaction _SUSPENDED, b>1 that changes > >>>> b=b/2 and > >>>> goes back to 'ready for processing' > >>>> > >>>> then on the 'retry succeeds' line change b=Batch_Size > >>>> > >>>> David Lang > >>>> > >>>>> Rainer > >>>>> > >>>>> On Thu, 2009-04-30 at 19:00 +0200, Rainer Gerhards wrote: > >>>>>> I forgot to mention one other subtle but important issue. > >>>> I add it > >>>>>> below in the right context: > >>>>>> > >>>>>>> -----Original Message----- > >>>>>>> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > >>>>>>> bounces at lists.adiscon.com] On Behalf Of Rainer Gerhards > >>>>>>> Sent: Thursday, April 30, 2009 5:47 PM > >>>>>>> To: rsyslog-users > >>>>>>> Subject: Re: [rsyslog] output plugin calling interface > >>>>>>> > >>>>>>> David, > >>>>>>> > >>>>>>> one more note. I think I have basically solved the > >>>> initial discussion topic > >>>>>>> (with the *action* lock), and I am back at the overall > >>>> picture. I will write > >>>>>>> up the action lock later; I think I can save some time > >>>> if we go down > >>>>>> the > >>>>>>> other topic first. > >>>>>>> > >>>>>>> Rest inline below...
> >>>>>>> > >>>>>>> > >>>>>>>> -----Original Message----- > >>>>>>>> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > >>>>>>>> bounces at lists.adiscon.com] On Behalf Of david at lang.hm > >>>>>>>> Sent: Thursday, April 30, 2009 1:32 AM > >>>>>>>> To: rsyslog-users > >>>>>>>> Subject: Re: [rsyslog] output plugin calling interface > >>>>>>>> > >>>>>>>> On Wed, 29 Apr 2009, Rainer Gerhards wrote: > >>>>>>>> > >>>>>>>>>> > >>>>>>>>>> the key to the locking is to try to both minimize the work > >>>>>>>>>> done inside the > >>>>>>>>>> lock, and minimize the time the lock is held. > >>>>>>>>>> > >>>>>>>>>> in general I think the locking should be something > >>>> along the lines of > >>>>>>>>>> > >>>>>>>>>> doStartbatch() > >>>>>>>>>> lock action mutex > >>>>>>>>>> multiple doAction() calls > >>>>>>>>>> unlock action mutex > >>>>>>>>>> doEndbatch() > >>>>>>>>> > >>>>>>>>> I overlooked a very important point, and it has only now > >>>> occurred to me. I *must* > >>>>>>> lock > >>>>>>>>> the complete batch, including doEndBatch(). The reason > >>>> is that a single > >>>>>>>>> app-level transaction otherwise has no definite start > >>>> and end point. So > >>>>>>> we > >>>>>>>>> would not know what is committed and what not (if > >>>> another thread puts > >>>>>>>>> messages into the transaction queue). That breaks the > >>>> whole model. The > >>>>>>> only > >>>>>>>>> solution is to hold the lock during the whole > >> transaction. As you > >>>>>> outline, > >>>>>>>>> this should not be too long. Even if it took > >>>> considerable time, there > >>>>>>> would > >>>>>>>>> be limited usefulness in interleaving other doAction > >>>> calls, as this > >>>>>> would > >>>>>>>>> simply cost additional time. At this point, I think > >>>> there is no real > >>>>>>>> benefit > >>>>>>>>> in running multiple threads concurrently.
> >>>>>>>> > >>>>>>>> you really don't want to hold the lock through the > >>>> doEndBatch() call, > >>>>>> that > >>>>>>>> can potentially take a _long_ time, and that is the time > >>>> when you most > >>>>>>>> want the ability to have other things accessing the > >>>> queue (note that I > >>>>>> may > >>>>>>>> be misunderstanding the definition of the lock here) > >>>>>>>> > >>>>>>>> the output module will be tied up this entire time, but > >>>> you need other > >>>>>>>> things to be able to access the queue (definitely > >>>> adding things to the > >>>>>>>> queue, but there is a win in having the ability for > >>>> another reader thread > >>>>>>>> to be pushing things to a different copy of the output > >>>> module at the same > >>>>>>>> time) > >>>>>>>> > >>>>>>>> so this means that when you are going through and doing > >>>> the doAction() > >>>>>>>> call, you are marking that you are working on that queue > >>>> entry. then you > >>>>>>>> release the lock and the next reader that comes along > >>>> will skip over the > >>>>>>>> entries that you have claimed and work to deliver the > >>>> next N messages. > >>>>>>>> > >>>>>>>> then when you get the results of doEndBatch() you go > >>>> back and mark some > >>>>>> or > >>>>>>>> all of those messages as completed (removing them from > >>>> the queue). note > >>>>>>>> that with multiple worker threads you have the potential > >>>> to have items > >>>>>>>> that aren't the oldest ones in the queue being completed > >>>> before the > >>>>>> oldest > >>>>>>>> ones; batching may make bigger holes, but the potential > >>>> for holes was > >>>>>>>> there all along. > >>>>>>>> > >>>>>>>>>> and the output module should be written to defer as much > >>>>>>>>>> processing as > >>>>>>>>>> possible to the doEndbatch() call to make the > doAction() call > >>>>>>>>>> as fast as > >>>>>>>>>> possible. > >>>>>>>>> > >>>>>>>>> Sounds reasonable.
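David's alternative, claiming queue entries under a short lock and releasing it before the slow commit, can be sketched with a POSIX mutex. Everything here is hypothetical: the `queue` structure, the three entry states and the helpers `queue_init()`, `claim_batch()` and `finish_batch()` stand in for rsyslog's real queue, and the commit step that would run between the two calls (doEndBatch() in the thread) is left out. The sketch also assumes the commit reports how many leading entries of the batch went through. The lock is held only while scanning and marking, so a second worker can claim the next N entries while the first is still committing; this is exactly the interleaving Rainer argues against for app-level transactions in the exchange quoted just before.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define QSIZE 16

/* Hypothetical per-entry states for the sketch (not rsyslog types). */
enum entry_state { ENTRY_READY, ENTRY_CLAIMED, ENTRY_DONE };

struct queue {
    pthread_mutex_t mut;
    enum entry_state state[QSIZE];
    int len;
};

static void queue_init(struct queue *q, int len)
{
    assert(len >= 0 && len <= QSIZE);
    pthread_mutex_init(&q->mut, NULL);
    q->len = len;
    for (int i = 0; i < QSIZE; i++)
        q->state[i] = ENTRY_READY;
}

/* Claim up to max READY entries; the lock is held only while scanning.
 * Claimed indexes are written to idx[]; returns how many were claimed. */
static int claim_batch(struct queue *q, int *idx, int max)
{
    int n = 0;
    pthread_mutex_lock(&q->mut);
    for (int i = 0; i < q->len && n < max; i++) {
        if (q->state[i] == ENTRY_READY) {
            q->state[i] = ENTRY_CLAIMED;   /* other readers now skip it */
            idx[n++] = i;
        }
    }
    pthread_mutex_unlock(&q->mut);
    return n;
}

/* After the (unlocked) commit reports that the first ncommitted entries
 * went through, mark those DONE and hand the rest back as READY. */
static void finish_batch(struct queue *q, const int *idx, int n, int ncommitted)
{
    pthread_mutex_lock(&q->mut);
    for (int i = 0; i < n; i++)
        q->state[idx[i]] = (i < ncommitted) ? ENTRY_DONE : ENTRY_READY;
    pthread_mutex_unlock(&q->mut);
}
```

Entries marked DONE out of order are the "holes" David mentions; a real queue would still need to reclaim them.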
> >>>>>>>> > >>>>>>>> note that the reason for doAction deferring as much work > >>>> as possible is > >>>>>> to > >>>>>>>> allow that work to be done outside of any locking > >>>>>>>> > >>>>>>>>>> since most errors will not be detected during doAction (in > >>>>>>>>>> fact, the only > >>>>>>>>>> errors I can think of that will happen at this point are > >>>>>>>>>> rsyslog resource > >>>>>>>>>> constraints), the error handling will need to be done after > >>>>>>>>>> doEndbatch() > >>>>>>>>>> returns > >>>>>>>>>> > >>>>>>>>>> at that point the output module may not know which of the > >>>>>>>>>> messages caused > >>>>>>>>>> the error (if the module sends the messages as a > transaction > >>>>>>>>>> it may just > >>>>>>>>>> know that the transaction failed, and have to do > retries with > >>>>>>>>>> subsets to > >>>>>>>>>> narrow down which message caused the failure) > >>>>>>>>>> > >>>>>>>>>> as long as at least one message is successful, > things are not > >>>>>>>>>> blocked and > >>>>>>>>>> should continue. it's only when doEndBatch() reports > >>>> that no messages > >>>>>>>>>> could be delivered that you have a possible reason to drop > >>>>>>>>>> the message > >>>>>>>>>> (and even then, only the first message. all others > >>>> must be retried) > >>>>>>>>> > >>>>>>>>> Well, I wouldn't conclude that it is the first > >> message, but "one > >>>>>> message" > >>>>>>>>> inside the batch. So there may be some benefit in > >>>> retrying the batch > >>>>>> with > >>>>>>>>> fewer records (as you suggested). Under the assumption > >>>> that usually only > >>>>>>> one > >>>>>>>>> record causes the problem, I tend to think that it may > >>>> be useful to > >>>>>>>>> commit the batch one-by-one in this case - this may be > >>>> more efficient > >>>>>>> than a > >>>>>>>>> binary search for the failing record.
> >>>>>>>> > >>>>>>>> note that it's not _quite_ a binary search (same basic > >>>> concept though), > >>>>>> as > >>>>>>>> you submit a subset of them, they either go through or > >>>> you need to try a > >>>>>>>> smaller batch > >>>>>>> > >>>>>>> Ack > >>>>>>> > >>>>>>>> > >>>>>>>> with the individual submissions you are O(N) (on average > >>>> you will have to > >>>>>>>> commit 1/2 the batch individually before you hit the > >> bad one, ~6 > >>>>>>>> transactions for a batch size of 10, 51 for a batch > >> size of 100) > >>>>>>>> > >>>>>>>> with the 'binary search' approach you are O(log(N)) (~4 > >>>> transactions for > >>>>>>>> a batch size of 10, ~7 for a batch size of 100) > >>>>>>>> > >>>>>>>> the worst case is probably where the last message is the > >>>> one that has the > >>>>>>>> problem. > >>>>>>>> > >>>>>>>> for the individual processing that is simple math (batch > >>>> size of 100, you > >>>>>>>> will fail the first (full-batch) attempt, then submit 99 successfully, > >>>> then fail on the > >>>>>> last > >>>>>>>> one) > >>>>>>>> > >>>>>>>> for the binary search it's more complicated (this is > >>>> assuming the batch > >>>>>>>> size gets bumped up when it succeeds) > >>>>>>>> > >>>>>>>> assuming the batch completely fails (i.e. a database > >>>> where the output > >>>>>>>> module doesn't know which one caused it to fail) > >>>>>>>> > >>>>>>>> bad message is message 100 and there are >>100 messages > >>>> in the queue > >>>>>>>> fail 100 > >>>>>>>> succeed 50 (bad message is now message 50) > >>>>>>>> fail 100 > >>>>>>>> fail 50 > >>>>>>>> succeed 25 (bad message is now message 25) > >>>>>>>> fail 100 > >>>>>>>> fail 50 > >>>>>>>> fail 25 > >>>>>>>> succeed 12 (bad message is now message 13) > >>>>>>>> fail 100 > >>>>>>>> fail 50 > >>>>>>>> fail 25 > >>>>>>>> succeed 12 (bad message is now message 1) > >>>>>>>> fail 100 > >>>>>>>> fail 50 > >>>>>>>> fail 25 > >>>>>>>> fail 12 > >>>>>>>> fail 6 > >>>>>>>> fail 3 > >>>>>>>> fail 1 > >>>>>>>> retry 1 > >>>>>>>> .
> >>>>>>>> . > >>>>>>>> message 1 is bad (20 transactions + retries) > >>>>>>>> > >>>>>>>> best case would be > >>>>>>>> > >>>>>>>> bad message is message 1 and there are >>100 messages in > >>>> the queue > >>>>>>>> fail 100 > >>>>>>>> fail 50 > >>>>>>>> fail 25 > >>>>>>>> fail 12 > >>>>>>>> fail 6 > >>>>>>>> fail 3 > >>>>>>>> fail 1 > >>>>>>>> retry 1 > >>>>>>>> . > >>>>>>>> . > >>>>>>>> message 1 is bad (7 transactions + retries) > >>>>>>>> > >>>>>>>> > >>>>>>>> if the output module is able to commit a partial > >>>> transaction, then the > >>>>>>>> logic devolves to > >>>>>>>> > >>>>>>>> bad message is message 100 and there are >>100 messages > >>>> in the queue > >>>>>>>> submit 100, succeed 99, bad message is message 1 > >>>>>>>> > >>>>>>>> > >>>>>>>>> All in all, it looks like the algorithm needs to get > >> a bit more > >>>>>>> complicated > >>>>>>>>> ;) > >>>>>>>> > >>>>>>>> unfortunately, but not very much more complicated. > >>>>>>> > >>>>>>> We are actually working here on a very important piece of > >>>> code. This is > >>>>>> also > >>>>>>> why I am thinking so hard about it. This algorithm, in a hopefully > >>>>>>> as-simple-as-possible way (but not simpler than that ;)), > >>>> handles all the retry > >>>>>>> logic, the output transaction, action resumption and > >>>> recovery, and is also > >>>>>>> responsible for ensuring that the upper-layer queue object is able to > >>>> reliably persist > >>>>>> not > >>>>>>> fully processed batches of messages during a restart. > >>>> This is why I (and > >>>>>>> obviously you, too) put so much design effort into it. > >>>>>>> > >>>>>>> This is primarily a note for those list followers who > >>>> wonder why we work > >>>>>> so > >>>>>>> hard on these few lines of code. > >>>>>>>> > >>>>>>>> the algorithm I posted last week cut the batch size in > >>>> half for each loop > >>>>>>>> and restored it when a commit succeeded.
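The transaction counts in David's worked example can be checked with a small simulation. This is a sketch under the example's own assumptions: far more than `max_batch` messages are queued, so every batch is full; a batch fails if and only if it contains the single bad message; the batch size is halved (integer division) after a failure and restored to `max_batch` after a success; and the retries of the isolated message are not counted. The function names are hypothetical.

```c
#include <assert.h>

/* Commits needed by the halving strategy until the bad message sits alone
 * in a batch of size 1. bad_pos is its 1-based position in the queue. */
static int halving_transactions(int bad_pos, int max_batch)
{
    int b = max_batch;      /* current batch size */
    int commits = 0;

    assert(bad_pos >= 1 && max_batch >= 1);
    for (;;) {
        commits++;
        if (bad_pos > b) {
            bad_pos -= b;   /* batch succeeded: b messages leave the queue */
            b = max_batch;  /* restore the full batch size */
        } else if (b == 1) {
            return commits; /* bad message isolated; retries not counted */
        } else {
            b /= 2;         /* batch failed: halve and try again */
        }
    }
}

/* One-by-one fallback: one failed full-batch commit, then one commit per
 * message up to and including the bad one. */
static int one_by_one_transactions(int bad_pos)
{
    assert(bad_pos >= 1);
    return 1 + bad_pos;
}
```

With a batch size of 100, `halving_transactions(100, 100)` reproduces the 20 transactions of the worst case, `halving_transactions(1, 100)` the 7 of the best case, and `one_by_one_transactions(100)` gives the 101 commits (1 failed batch, 99 successes, 1 failure) described for individual processing.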
you didn't want > >>>> that to be in the > >>>>>>>> core (and it doesn't have to be, but that means that the > >>>> output module > >>>>>> may > >>>>>>>> need to do the retries) > >>>>>>> > >>>>>>> It's not that I didn't want it in the core; I > >>>> questioned (at least I hope > >>>>>>> so) whether it is useful at all. I just re-checked and this > >>>> indeed is what I did > >>>>>>> ;) > >>>>>>> > >>>>>>> Let me try to sum up the reason for this. So far, all output > >>>> modules return one > >>>>>> of > >>>>>>> three states: > >>>>>>> > >>>>>>> RS_RET_OK, > >>>>>>> which means everything went well > >>>>>>> > >>>>>>> RS_RET_SUSPENDED, > >>>>>>> which means we had a temporary failure and the core > >>>> should retry the action > >>>>>>> some time later > >>>>>>> > >>>>>>> RS_RET_DISABLED, > >>>>>>> which means that we had a permanent failure *of the > >>>> action* and there is no > >>>>>>> point in retrying the action any longer (this actually > >>>> means the action has > >>>>>>> died for the rest of the lifetime of this rsyslogd instance) > >>>>>>> > >>>>>>> Note that none of the standard modules returns an "I have > >>>> a permanent > >>>>>> failure > >>>>>>> with only this message". > >>>>>>> > >>>>>>> Of course, it may be an oversight, and I tend to agree > >>>> that we need to add > >>>>>>> such a state (or better, a variety of states that report > >>>> back several > >>>>>> causes > >>>>>>> for message-permanent failures). But let's first consider > >>>> why this state is > >>>>>>> not yet present. The simple truth is that we so > >>>> far did not need > >>>>>> it, > >>>>>>> because such a failure is very rare (and I initially thought > >>>>>> non-existent). > >>>>>>> > >>>>>>> When can it happen? With the file output? No: why should > >>>> one text string > >>>>>> fail > >>>>>>> to be written while the next one succeeds? Syslog > >>>> forwarding? No, same > >>>>>>> reason. Why one message but not the next one?
Same story > >>>> for user messages > >>>>>>> and so on. I think this covers all but the database outputs. > >>>>>>> > >>>>>>> Now let's look at the database outputs. At first it looks > >>>> like the same story, > >>>>>>> but if you look more closely, it may be different (and > >>>> this subtle issue > >>>>>> did > >>>>>>> not go into the design of the original database outputs). > >>>> I can see that > >>>>>> the > >>>>>>> situation arises if a unique index is defined on one > >>>> field. Then, if for > >>>>>>> example we need to store a sequence of three records > >>>> with the > >>>>>>> key sequence (a,a,b), the second record > >>>> will fail, as the > >>>>>> first > >>>>>>> record already had the same key, but the third record > >>>> will work (thanks to > >>>>>> a > >>>>>>> different key). The original design was made under the > >>>> assumption that such > >>>>>> a > >>>>>>> unique index placed on a field whose values are not > unique is a > >>>>>> configuration > >>>>>>> error (and I still tend to believe it is). > >>>>>>> > >>>>>>> One may now argue that, while this may be the case, > >>>> rsyslog should be able > >>>>>> to > >>>>>>> recover from such a user error. I tend to agree with that > >>>> argument - but I am > >>>>>> not > >>>>>>> sure whether some other folks would argue that this is indeed > >>>> a user error and > >>>>>> that if > >>>>>>> we permit losing this record, we actually have a data > >>>> loss bug. There is > >>>>>> also > >>>>>>> some truth in that argument. It may be better to see this > >>>> case as a > >>>>>> temporary > >>>>>>> failure (after all, the real cure is to remove the unique > >>>> index or fix an > >>>>>>> invalid sql statement). But again, I tend to follow the > >>>> argument that this > >>>>>> is > >>>>>>> a message-permanent processing failure and as such the > >>>> message needs to be > >>>>>>> discarded so that further messages can be handled.
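Rainer's (a,a,b) example, together with the one-by-one fallback proposed for it, can be illustrated with a toy in-memory table. This is a hypothetical sketch, not a database output module: `struct table`, `insert_row()` and `insert_one_by_one()` are invented names, and a duplicate key stands in for the unique-index violation a real database would report. Each record is committed on its own, so the duplicate is marked as a message-permanent failure and skipped while the third record still goes in.

```c
#include <assert.h>
#include <string.h>

#define MAX_ROWS 16

/* Toy stand-in for a table with a UNIQUE index on one column. */
struct table {
    const char *keys[MAX_ROWS];
    int n;
};

enum ins_result { INS_OK, INS_DUP_KEY };

static enum ins_result insert_row(struct table *t, const char *key)
{
    for (int i = 0; i < t->n; i++)
        if (strcmp(t->keys[i], key) == 0)
            return INS_DUP_KEY;     /* unique index violated */
    assert(t->n < MAX_ROWS);
    t->keys[t->n++] = key;
    return INS_OK;
}

/* Fallback: commit each record on its own. A duplicate key is treated as a
 * message-permanent failure and skipped, so later records still go in.
 * Per-record results go to res[]; returns the number of records stored. */
static int insert_one_by_one(struct table *t, const char **keys, int n,
                             enum ins_result *res)
{
    int stored = 0;
    for (int i = 0; i < n; i++) {
        res[i] = insert_row(t, keys[i]);
        if (res[i] == INS_OK)
            stored++;
    }
    return stored;
}
```

For the key sequence (a,a,b) this stores two records and flags only the second one, which matches the outcome described in the message.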
> >>>>>>> > >>>>>>> HOWEVER, this is the *only* situation in which I can > think of a > >>>>>>> message-permanent processing failure. No other case comes > >>>> to my mind, and I > >>>>>>> think this was also the conclusion of our Sep 2008 discussion. > >>>>>>> > >>>>>>> But if there is only one case, and it is very remote, > >>>> does that actually > >>>>>>> justify adding a (somewhat) complex algorithm if a > >>>> simpler one would also do? > >>>>>> I'd > >>>>>>> say that we can be fine with O(n) in case one of these > >>>> very remote failures > >>>>>>> happens. That means we could simply resort to doing every > >>>> insert in its own > >>>>>>> transaction (just like without batching). Granted, if > >>>> endTransaction() > >>>>>>> dominates the cost function, we have O(n) vs. O(1) in > >>>> that case, but if > >>>>>> that > >>>>>>> case happens infrequently enough (as I assume), there > >>>> really is no > >>>>>>> difference between the two. And as you say, even the > >>>> "binary search" > >>>>>> approach > >>>>>>> is not O(log(n)) but rather O(n/2) [yes, I know, but it > >>>> makes sense > >>>>>> here...]. > >>>>>>> So the gain in using that algorithm is even less (in > >>>> essence, we are at > >>>>>> O(n), > >>>>>>> whatever we use for recovery). > >>>>>>> > >>>>>>> I may be overlooking some cases for message-permanent > >>>> failures and, if I > >>>>>> do, > >>>>>>> I would be very grateful if you (or someone else) could > >>>> point me to them. > >>>>>>> > >>>>>>> If you look at your algorithm (I've put it up here for > >>>> easy reference: > >>>>>>> > >>>> http://blog.gerhards.net/2009/04/batch-output-handling-algorithm.html ) and > >>>>>>> what I crafted yesterday, you'll see that my writeup > >>>> is strongly > >>>>>>> influenced by your proposal.
However, as I wrote in my > >>>> initial response to > >>>>>>> your message, the algorithm fails on some subtleties that > >>>> you did not know > >>>>>> at > >>>>>>> the time of writing (most importantly the push-model of > >>>> the queue object). > >>>>>>> > >>>>>>> There are also some fine details of the retry handling > >>>> that need to go > >>>>>> into > >>>>>>> the algorithm, as well as proper handling of the now > four cases: > >>>>>>> > >>>>>>> * success > >>>>>>> * temporary failure > >>>>>>> * message-permanent failure > >>>>>>> * action-permanent failure > >>>>>>> > >>>>>> > >>>>>> What I forgot to mention is that we currently have a kind > >>>> of "pseudo" > >>>>>> message-permanent failure state. That is related to > >>>> temporary failure. My > >>>>>> description above was under the assumption that the action > >>>> is configured for > >>>>>> eternal retry (retryCount = -1). If an upper bound for the > >>>> number of retries n > >>>>>> is set, then and only then a message enters > >>>> message-permanent failure state > >>>>>> after n unsuccessful retries. If that happens, the message > >>>> is discarded, the > >>>>>> in-sequence retry counter is reset and the next message is > >>>> scheduled for > >>>>>> processing (sounds like it would be a good idea to draw a > >>>> state diagram...). > >>>>>> The idea behind this is that it is often preferable to > >>>> lose some messages if > >>>>>> they cannot be processed rather than to keep retrying. But I have > >>>> to admit that the > >>>>>> functionality is rooted in rsyslog's past, where no > >>>> capable queues existed. I > >>>>>> still think it needs to be preserved - many use it and I > >>>> see lots of use > >>>>>> cases. > >>>>>> > >>>>>> But: If we have such an upper bound n, we need to think > >>>> about how to handle this > >>>>>> situation for batches.
If we have a batch size b > 1, the > >>>> old interface > >>>>>> retried n times before discarding the first message and 2n > >>>> times before > >>>>>> discarding the second. For the b-th record, we have bn > >>>> retries before it is > >>>>>> discarded (maybe off by one each, not checked exactly). There > >>>> is no point in > >>>>>> modeling this with batches. So we could use between n and bn > >>>> retries. The > >>>>>> problem here is that b is not fixed - it is between one > >>>> and the configured > >>>>>> upper bound of the batch size. So we do not get (fully) > >>>> deterministic > >>>>>> behavior. The question is whether or not this is actually > >>>> important, as the > >>>>>> unavailability of the action's target (when looked at from > >>>> the rsyslog POV) is > >>>>>> also not deterministic. So I lean toward a fixed number of > >>>> retries for the > >>>>>> whole batch. That could simply be n, because the user can > >>>> configure it. Or we > >>>>>> could derive a new n' from the configured one, e.g. by n' > >>>> = n * b/10 (where b > >>>>>> is the *actual* batch size, as above, not the upper > >>>> bound!). That way, the > >>>>>> max number of retries would be related to the actual batch > >>>> size and would vary > >>>>>> with it. Somehow, just a feeling, I'd *not* go for n' = nb > >>>> (though this may > >>>>>> just be an emotional position...). > >>>>>> > >>>>>> Comments appreciated. > >>>>>> > >>>>>> Rainer > >>>>>> > >>>>>>>> > >>>>>>>> the question is which side the retry logic needs to be in. > >>>>>>>> > >>>>>>>> it can be in the queue walker that calls the output module > >>>>>>>> > >>>>>>>> it can be in doEndBatch() in the output module > >>>>>>>> > >>>>>>>> some output modules don't need partial retries (as they > >>>> can output > >>>>>> partial > >>>>>>>> batches) > >>>>>>>> > >>>>>>>> some output modules do need partial retries.
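The retry-bound options Rainer lists for batches earlier in this message can be written down as a small function for comparison. This is a hypothetical sketch, not rsyslog code: `n` is the configured retry count (with -1 meaning eternal retry, as for retryCount in the thread), `b` is the actual size of the current batch, and the policy names and `batch_retry_limit()` are invented. `RETRY_PER_MESSAGE` approximates the old interface's total of roughly bn retries before the b-th record is discarded.

```c
#include <assert.h>

/* Hypothetical names for the options discussed in the thread. */
enum retry_policy {
    RETRY_FIXED_N,        /* same n for the whole batch */
    RETRY_SCALED_TENTH,   /* n' = n * b / 10 */
    RETRY_PER_MESSAGE     /* n' = n * b, roughly the old per-message total */
};

/* n: configured retry count (-1 = retry forever).
 * b: actual size of the current batch, not the configured upper bound. */
static int batch_retry_limit(int n, int b, enum retry_policy p)
{
    assert(b >= 1);
    if (n < 0)
        return -1;                  /* eternal retry stays eternal */
    switch (p) {
    case RETRY_FIXED_N:      return n;
    case RETRY_SCALED_TENTH: return n * b / 10;   /* integer division */
    case RETRY_PER_MESSAGE:  return n * b;
    }
    return n;                       /* not reached for valid policies */
}
```

Because n' = n * b/10 uses integer arithmetic here, batches smaller than ten get fewer retries than n (for example n = 10, b = 1 gives 1, and n = 5, b = 1 gives 0), which may or may not be the intended behavior.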
> >>>>>>>> > >>>>>>>> the more complicated retry logic will work for both > >>>> situations, or it can > >>>>>>>> be implemented in each of however many output > modules need it. > >>>>>>>> > >>>>>>>> it can go either way, I tend to lean towards only having > >>>> the logic in one > >>>>>>>> place (even if it's more complicated logic than some > >>>> modules need) > >>>>>>> > >>>>>>> I fully agree with you here. That functionality belongs > >>>> in the core, we > >>>>>>> just need to craft it well. I think probably the most > >>>> important question at > >>>>>>> this time is whether I am overlooking some possible > >>>> message-permanent failure cases. > >>>>>> So > >>>>>>> I would greatly appreciate feedback especially on that > >>>> part of my reply. > >>>>>>> > >>>>>>> Thanks, > >>>>>>> Rainer > >>>>>>> > >>>>>>>> > >>>>>>>> David Lang

From david at lang.hm Fri May 1 22:14:16 2009
From: david at lang.hm (david at lang.hm)
Date: Fri, 1 May 2009 13:14:16 -0700 (PDT)
Subject: [rsyslog] output plugin calling interface
In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702B00B@GRFEXC.intern.adiscon.com>
References: <1241023853.25612.11.camel@rf10up.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AFE4@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B001@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B003@GRFEXC.intern.adiscon.com><1241114672.25612.14.camel@rf10up.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B006@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B008@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702B00B@GRFEXC.intern.adiscon.com>
Message-ID:

On Fri, 1 May 2009, Rainer Gerhards wrote:

>> -----Original Message----- >> From: rsyslog-bounces at lists.adiscon.com >> [mailto:rsyslog-bounces at lists.adiscon.com] On Behalf Of david at lang.hm >>>> 1. something wrong with a message. >>>> retries won't help (with batches we need to narrow down >>>> to which message) >>> >>> But what could be wrong? For which outputs? >>> >>> Can you envision anything other than db outputs only, >> duplicate identity in >>> column erroneously declared with a unique index (which I >> still consider an >>> admin error)? I really fail to find any other example.
>> >> things like quoting errors (which I know you defend against) >> >> Unicode text causing grief >> >> dynafile hits a read-only file >> >> basically data-driven things that trigger bugs in the message delivery >> mechanism in some form. >> >> this is easiest to understand and provide examples with >> databases, but you >> could have problems with other methods. >> >> in an ideal world these would never happen, but for most >> output types I >> can think of some form of corrupt input that could cause that >> message to >> fail. > > OK, thanks a lot. It doesn't matter if I can defend against a case or two, but you > provided good examples. I am now reassured that these cases are > likely enough to deserve a serious look. > > Resting sometimes helps, and so I also think I made a big step forward (while > not actively working on it ;)) today. I think what makes the problem so hard > to solve is the language we use. I thought about a purely mathematical model, > and I have one in mind. The state diagram was a first step, but it did > not go far enough. So I think I will try to write up that model and then we can > discuss based on it and finally derive the actual code from it. That's an > extra step, but I think it will be a useful one. > > As a side-note, I have also identified that we have overlooked a subtle issue > so far: backup actions - they need to work on the subset of the batch that > had message-permanent failures. So the message state actually needs to be > part of the message inside the batch. But now, I think, things really begin > to come together and are far less complex than initially thought. > > One problem with the state chart - that was why I said it is not 100% correct > - is that it does not properly abstract batches vs. single messages. Both of > them are entangled in a way that I thought [;)] to be very complex.
But if you > model that with processing states, then the batch processing state is simply > a function of the individual message processing states. > > Please let me know if you also find a math model useful (but I'll probably > need to do it in any case, because it helps me clean up my mind...). I think it will help clarify things a lot. with a good model we won't have misunderstandings about what we are talking about. with my 'binary search' approach, handling permanently bad messages could be as simple as 'too many retries once we hit a batch size of 1' (with a possible option of the output module reporting back that it detected something that makes retries useless, but this is just an optimization) David Lang > Rainer >> >>>> >>>> 2. something wrong with the destination. >>>> in this case waiting and retrying makes sense, the >>>> destination may >>>> recover (either on its own, or through sysadmin assistance) >>> >>> Yes, potentially recoverable. >>> >>>> >>>> 3. internal rsyslog issue (out of memory or similar) >>> >>> That's potentially recoverable, too. The most concerning >> issue is running out >>> of process address space, but even this can be recovered >> from (at least in >>> theory; in practice I doubt that rsyslog really will survive such a >>> situation) >> >> agreed. >> >>>> >>>> I can't think of things that don't fall into one of these >>>> buckets, but >>>> I'll add more later if I think of them. >>> >>> The bottom line is that I see only a single case, and that >> is caused by >>> config error. >> >> if you define failure to appropriately handle odd characters >> (failing to >> change control characters for example) to be a config error I >> can agree >> with you. >> >> David Lang >> >>> Rainer >>> >>>> >>>>> Rainer >>>>> PS: tomorrow is a public holiday over here, I may be out >>>> with friends and >>>>> family and don't know if I will get to my mail. >>>> >>>> no problem.
>>>> >>>> David Lang >>>> >>>>>> -----Original Message----- >>>>>> From: rsyslog-bounces at lists.adiscon.com >>>>>> [mailto:rsyslog-bounces at lists.adiscon.com] On Behalf Of >>>> david at lang.hm >>>>>> Sent: Thursday, April 30, 2009 9:42 PM >>>>>> To: rsyslog-users >>>>>> Subject: Re: [rsyslog] output plugin calling interface >>>>>> >>>>>> On Thu, 30 Apr 2009, Rainer Gerhards wrote: >>>>>> >>>>>>> I have just drawn a state diagram. It is not 100% perfect, >>>>>> but I think >>>>>>> it conveys the idea better than the text. Anyhow, you >>>> should read it >>>>>>> together with the text below, because the text is more >>>> precise. Note >>>>>>> that the diagram looks only at message states, thus >>>>>>> message-permanent-failure and action-permanent-failure lead >>>>>> to the same >>>>>>> state in the diagram: >>>>>>> >>>>>>> http://www.rsyslog.com/modules/Static_Docs/data/action-call.png >>>>>>> >>>>>>> I hope it is useful. >>>>>>> >>>>>>> I drew it with Graphviz; the dot control file is in the >>>>>> multi-dequeue git >>>>>>> branch and named action-call.dot - in case you'd like >> to modify it ;) >>>>>> >>>>>> this is definitely useful and I believe accurately represents >>>>>> what you have >>>>>> described. >>>>>> >>>>>> what I was proposing would be slightly different >>>>>> >>>>>> from commit pending you would have a third possible result >>>>>> >>>>>> change the endTransaction _SUSPENDED to endTransaction >>>> _SUSPENDED, b=1 >>>>>> >>>>>> add another line endTransaction _SUSPENDED, b>1 that changes >>>>>> b=b/2 and >>>>>> goes back to 'ready for processing' >>>>>> >>>>>> then on the 'retry succeeds' line change b=Batch_Size >>>>>> >>>>>> David Lang >>>>>> >>>>>>> Rainer >>>>>>> >>>>>>> On Thu, 2009-04-30 at 19:00 +0200, Rainer Gerhards wrote: >>>>>>>> I forgot to mention one other subtle but important issue.
>>>>>> I have added it down >>>>>>>> below in the right context: >>>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- >>>>>>>>> bounces at lists.adiscon.com] On Behalf Of Rainer Gerhards >>>>>>>>> Sent: Thursday, April 30, 2009 5:47 PM >>>>>>>>> To: rsyslog-users >>>>>>>>> Subject: Re: [rsyslog] output plugin calling interface >>>>>>>>> >>>>>>>>> David, >>>>>>>>> >>>>>>>>> one more note. I think I have basically solved the >>>>>> initial discussion topic >>>>>>>>> (with the *action* lock) and I am back at the overall >>>>>> picture. I will write >>>>>>>>> up on the action lock later, I think I can save some time >>>>>> when we go down >>>>>>>> the >>>>>>>>> other topic first. >>>>>>>>> >>>>>>>>> Rest inline below... >>>>>>>>> >>>>>>>>> >>>>>>>>>> -----Original Message----- >>>>>>>>>> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- >>>>>>>>>> bounces at lists.adiscon.com] On Behalf Of david at lang.hm >>>>>>>>>> Sent: Thursday, April 30, 2009 1:32 AM >>>>>>>>>> To: rsyslog-users >>>>>>>>>> Subject: Re: [rsyslog] output plugin calling interface >>>>>>>>>> >>>>>>>>>> On Wed, 29 Apr 2009, Rainer Gerhards wrote: >>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> the key to the locking is to try to both minimize the work >>>>>>>>>>>> done inside the >>>>>>>>>>>> lock, and minimize the time the lock is held. >>>>>>>>>>>> >>>>>>>>>>>> in general I think the locking should be something >>>>>> along the lines of >>>>>>>>>>>> >>>>>>>>>>>> doStartbatch() >>>>>>>>>>>> lock action mutex >>>>>>>>>>>> multiple doAction() calls >>>>>>>>>>>> unlock action mutex >>>>>>>>>>>> doEndbatch() >>>>>>>>>>> >>>>>>>>>>> I overlooked a very important point, and it only now occurs >>>>>> to me. I *must* >>>>>>>>> lock >>>>>>>>>>> the complete batch, including doEndBatch(). The reason >>>>>> is that a single >>>>>>>>>>> app-level transaction otherwise has no definite start >>>>>> and end point. So
So >>>>>>>>> we >>>>>>>>>>> would not know what is committed and what not (if >>>>>> another thread puts >>>>>>>>>>> messages into the transaction queue. That breaks the >>>>>> whole model. The >>>>>>>>> only >>>>>>>>>>> solution is to hold the lock during the whole >>>> tranaction. As you >>>>>>>> outline, >>>>>>>>>>> this should not be too long. Even if it took >>>>>> considerable time, there >>>>>>>>> would >>>>>>>>>>> be limited usefulness in interleaving other doAction >>>>>> calls, as this >>>>>>>> would >>>>>>>>>>> simply cause additional time. At this point, I think >>>>>> there is no real >>>>>>>>>> benefit >>>>>>>>>>> in running multiple threads concurrently. >>>>>>>>>> >>>>>>>>>> you really don't want to hold the lock through the >>>>>> doEndBatch() call, >>>>>>>> that >>>>>>>>>> can potentially take a _long_ time, and that is the time >>>>>> when you most >>>>>>>>>> want the ability to have other things accessing the >>>>>> queue (note that I >>>>>>>> may >>>>>>>>>> be misunderstanding the definition of the lock here) >>>>>>>>>> >>>>>>>>>> the output module will be tied up this entire time, but >>>>>> you need other >>>>>>>>>> things to be able to access the queue (definantly >>>>>> addding things to the >>>>>>>>>> queue, but there is a win in having the ability for >>>>>> another reader thread >>>>>>>>>> to be pushing things to a different copy of the output >>>>>> module at the same >>>>>>>>>> time) >>>>>>>>>> >>>>>>>>>> so this means that when you are goign through and doing >>>>>> the doAction() >>>>>>>>>> call, you are marking that you are working on that queue >>>>>> entry. then you >>>>>>>>>> release the lock and the next reader that comes along >>>>>> will skip over the >>>>>>>>>> entries that you have claimed and work to deliver the >>>>>> next N messages. 
>>>>>>>>>> >>>>>>>>>> then when you get the results of doEndBatch() you go >>>>>> back and mark some >>>>>>>> or >>>>>>>>>> all of those messages as completed (removing them from >>>>>> the queue). note >>>>>>>>>> that with multiple worker threads you have the potential >>>>>> to have items >>>>>>>>>> that aren't the oldest ones in the queue being completed >>>>>> before the >>>>>>>> oldest >>>>>>>>>> ones; batching may make bigger holes, but the potential >>>>>> for holes was >>>>>>>>>> there all along. >>>>>>>>>> >>>>>>>>>>>> and the output module should be written to defer as much >>>>>>>>>>>> processing as >>>>>>>>>>>> possible to the doEndbatch() call to make the >> doAction() call >>>>>>>>>>>> as fast as >>>>>>>>>>>> possible. >>>>>>>>>>> >>>>>>>>>>> Sounds reasonable. >>>>>>>>>> >>>>>>>>>> note that the reason for doAction deferring as much work >>>>>> as possible is >>>>>>>> to >>>>>>>>>> allow that work to be done outside of any locking >>>>>>>>>> >>>>>>>>>>>> since most errors will not be detected during doAction (in >>>>>>>>>>>> fact, the only >>>>>>>>>>>> errors I can think of that will happen at this point are >>>>>>>>>>>> rsyslog resource >>>>>>>>>>>> constraints), the error handling will need to be done after >>>>>>>>>>>> doEndbatch() >>>>>>>>>>>> returns >>>>>>>>>>>> >>>>>>>>>>>> at that point the output module may not know which of the >>>>>>>>>>>> messages caused >>>>>>>>>>>> the error (if the module sends the messages as a >> transaction >>>>>>>>>>>> it may just >>>>>>>>>>>> know that the transaction failed, and have to do >> retries with >>>>>>>>>>>> subsets to >>>>>>>>>>>> narrow down which message caused the failure) >>>>>>>>>>>> >>>>>>>>>>>> as long as at least one message is successful, >> things are not >>>>>>>>>>>> blocked and >>>>>>>>>>>> should continue.
it's only when doEndBatch() reports >>>>>> that no messages >>>>>>>>>>>> could be delivered that you have a possible reason to drop >>>>>>>>>>>> the message >>>>>>>>>>>> (and even then, only the first message. all others >>>>>> must be retried) >>>>>>>>>>> >>>>>>>>>>> Well, I wouldn't conclude that it is the first >>>> message, but "one >>>>>>>> message" >>>>>>>>>>> inside the batch. So there may be some benefit in >>>>>> retrying the batch >>>>>>>> with >>>>>>>>>>> fewer records (as you suggested). Under the assumption >>>>>> that usually only >>>>>>>>> one >>>>>>>>>>> record causes the problem, I tend to think that it may >>>>>> be useful to >>>>>>>>>>> commit the batch one-by-one in this case - this may be >>>>>> more efficient >>>>>>>>> than a >>>>>>>>>>> binary search for the failing record. >>>>>>>>>> >>>>>>>>>> note that it's not _quite_ a binary search (same basic >>>>>> concept though), >>>>>>>> as >>>>>>>>>> you submit a subset, it either goes through or >>>>>> you need to try a >>>>>>>>>> smaller batch >>>>>>>>> >>>>>>>>> Ack >>>>>>>>> >>>>>>>>>> >>>>>>>>>> with the individual submissions you are O(N) (on average >>>>>> you will have to >>>>>>>>>> commit 1/2 the batch individually before you hit the >>>> bad one, ~6 >>>>>>>>>> transactions for a batch size of 10, 51 for a batch >>>> size of 100) >>>>>>>>>> >>>>>>>>>> with the 'binary search' approach you are O(log(N)) (~4 >>>>>> transactions for >>>>>>>>>> a batch size of 10, ~7 for a batch size of 100) >>>>>>>>>> >>>>>>>>>> the worst case is probably where the last message is the >>>>>> one that has the >>>>>>>>>> problem.
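David compares two recovery strategies. The halving one cuts the batch in half after each failed all-or-nothing commit and restores the full size after a success. The other commits messages one by one. Both can be simulated to check his transaction counts. This C sketch is hypothetical. Like his example, it assumes that more messages than the maximum batch size are always queued and that exactly one message, at 1-based position bad_pos, always fails. It counts transactions until the bad message is alone in a batch of one, and does not count the retries of that last batch.

```c
#include <assert.h>

/* Halve-on-failure, restore-on-success strategy for an output that can
 * only report all-or-nothing results for a batch. */
int halving_transactions(int bad_pos, int max_batch)
{
    assert(bad_pos >= 1 && max_batch >= 1);
    int b = max_batch, count = 0;
    for (;;) {
        count++;
        if (bad_pos <= b) {        /* bad message is in this batch: fail */
            if (b == 1)
                return count;      /* isolated: now a retry/discard decision */
            b /= 2;                /* try a smaller prefix of the queue */
        } else {                   /* whole batch committed */
            bad_pos -= b;          /* queue head moves past b good messages */
            b = max_batch;         /* restore the full batch size */
        }
    }
}

/* One-by-one fallback: the full batch fails once, then each message is
 * committed individually until the bad one fails. */
int one_by_one_transactions(int bad_pos)
{
    assert(bad_pos >= 1);
    return 1 + bad_pos;  /* 1 failed batch + (bad_pos - 1) successes + 1 failure */
}
```

With a maximum batch of 100, halving_transactions(100, 100) is 20 and halving_transactions(1, 100) is 7, matching the worst-case and best-case traces in the thread. The one-by-one fallback needs 101 transactions when the bad message is the 100th.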
>>>>>>>>>> >>>>>>>>>> for the individual processing that is simple math (batch >>>>>> size of 100, you >>>>>>>>>> will fail the first one, then submit 99 successfully, >>>>>> then fail on the >>>>>>>> last >>>>>>>>>> one) >>>>>>>>>> >>>>>>>>>> for the binary search it's more complicated (this is >>>>>> assuming the batch >>>>>>>>>> size gets bumped up when it succeeds) >>>>>>>>>> >>>>>>>>>> assuming the batch completely fails (i.e. a database >>>>>> where the output >>>>>>>>>> module doesn't know which one caused it to fail) >>>>>>>>>> >>>>>>>>>> bad message is message 100 and there are >>100 messages >>>>>> in the queue >>>>>>>>>> fail 100 >>>>>>>>>> succeed 50 (bad message is now message 50) >>>>>>>>>> fail 100 >>>>>>>>>> fail 50 >>>>>>>>>> succeed 25 (bad message is now message 25) >>>>>>>>>> fail 100 >>>>>>>>>> fail 50 >>>>>>>>>> fail 25 >>>>>>>>>> succeed 12 (bad message is now message 13) >>>>>>>>>> fail 100 >>>>>>>>>> fail 50 >>>>>>>>>> fail 25 >>>>>>>>>> succeed 12 (bad message is now message 1) >>>>>>>>>> fail 100 >>>>>>>>>> fail 50 >>>>>>>>>> fail 25 >>>>>>>>>> fail 12 >>>>>>>>>> fail 6 >>>>>>>>>> fail 3 >>>>>>>>>> fail 1 >>>>>>>>>> retry 1 >>>>>>>>>> . >>>>>>>>>> . >>>>>>>>>> message 1 is bad (20 transactions + retries) >>>>>>>>>> >>>>>>>>>> best case would be >>>>>>>>>> >>>>>>>>>> bad message is message 1 and there are >>100 messages in >>>>>> the queue >>>>>>>>>> fail 100 >>>>>>>>>> fail 50 >>>>>>>>>> fail 25 >>>>>>>>>> fail 12 >>>>>>>>>> fail 6 >>>>>>>>>> fail 3 >>>>>>>>>> fail 1 >>>>>>>>>> retry 1 >>>>>>>>>> . >>>>>>>>>> .
>>>>>>>>>> message 1 is bad (7 transactions + retries) >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> if the output module is able to commit a partial >>>>>> transaction, then the >>>>>>>>>> logic devolves to >>>>>>>>>> >>>>>>>>>> bad message is message 100 and there are >>100 messages >>>>>> in the queue >>>>>>>>>> submit 100, succeed 99, bad message is message 1 >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> All in all, it looks like the algorithm needs to get >>>> a bit more >>>>>>>>> complicated >>>>>>>>>>> ;) >>>>>>>>>> >>>>>>>>>> unfortunately, but not very much more complicated. >>>>>>>>> >>>>>>>>> We are actually working here on a very important piece of >>>>>> code. This is >>>>>>>> also >>>>>>>>> why I am thinking so hard about it. This algorithm, in a hopefully >>>>>>>>> as-simple-as-possible way (but not simpler than that ;)), >>>>>> handles all the retry >>>>>>>>> logic, the output transaction, action resumption and >>>>>> recovery and is also >>>>>>>>> responsible for ensuring that the upper-layer queue object is able to >>>>>> reliably persist >>>>>>>> not >>>>>>>>> fully processed batches of messages during a restart. >>>>>> This is why I (and >>>>>>>>> obviously you, too) put so much design effort into it. >>>>>>>>> >>>>>>>>> This is primarily a note for those list followers who >>>>>> wonder why we work >>>>>>>> so >>>>>>>>> hard on these few lines of code. >>>>>>>>>> >>>>>>>>>> the algorithm I posted last week cut the batch size in >>>>>> half for each loop >>>>>>>>>> and restored it when a commit succeeded. you didn't want >>>>>> that to be in the >>>>>>>>>> core (and it doesn't have to be, but that means that the >>>>>> output module >>>>>>>> may >>>>>>>>>> need to do the retries) >>>>>>>>> >>>>>>>>> It's not that I didn't want it to be in core; I >>>>>> questioned (at least I hope >>>>>>>>> so) whether it is useful at all. I just re-checked and this >>>>>> indeed is what I did >>>>>>>>> ;) >>>>>>>>> >>>>>>>>> I'll try to sum up the reason for this.
So far, all output >>>>>> modules return one >>>>>>>> of >>>>>>>>> three states: >>>>>>>>> >>>>>>>>> RS_RET_OK, >>>>>>>>> which means everything went well >>>>>>>>> >>>>>>>>> RS_RET_SUSPENDED, >>>>>>>>> which means we had a temporary failure and the core >>>>>> should retry the action >>>>>>>>> some time later >>>>>>>>> >>>>>>>>> RS_RET_DISABLED, >>>>>>>>> which means that we had a permanent failure *of the >>>>>> action* and there is no >>>>>>>>> point in retrying the action any longer (this actually >>>>>> means the action has >>>>>>>>> died for the rest of the lifetime of this rsyslogd instance) >>>>>>>>> >>>>>>>>> Note that none of the standard modules returns an "I have >>>>>> a permanent >>>>>>>> failure >>>>>>>>> with only this message". >>>>>>>>> >>>>>>>>> Of course, it may be an oversight, and I tend to agree >>>>>> that we need to add >>>>>>>>> such a state (or better, a variety of states that provide >>>>>> back several >>>>>>>> causes >>>>>>>>> for message-permanent failures). But let's first consider >>>>>> why this state is >>>>>>>>> not yet already present. The simple truth is that we so >>>>>> far did not need >>>>>>>> it, >>>>>>>>> because such a failure is very seldom (and I initially thought >>>>>>>> non-existing). >>>>>>>>> >>>>>>>>> When can it happen? With the file output? No, why should >>>>>> one text string >>>>>>>> not >>>>>>>>> be able to be written, but the next one is. Syslog >>>>>> forwarding? No, same >>>>>>>>> reason. Why one message but not the next one? Same story >>>>>> for user messages >>>>>>>>> and so on. I think this covers all but the database outputs. >>>>>>>>> >>>>>>>>> Now let's look at the database outputs. It first looks >>>>>> like the same story, >>>>>>>>> but if you look more closely, it may be different (and >>>>>> this subtle issue >>>>>>>> did >>>>>>>>> not go into the design of the original database outputs). 
>>>>>> I can see that >>>>>>>> the >>>>>>>>> situation arises if a unique index is defined on one >>>>>> field. Then, if for >>>>>>>>> example we need to store a sequence of three records, and >>>>>> they have the >>>>>>>>> following key sequence: (a,a,b). So the second record >>>>>> will fail, as the >>>>>>>> first >>>>>>>>> record already had the same key, but the third record >>>>>> will work (thanks to >>>>>>>> a >>>>>>>>> different key). The original design was made under the >>>>>> assumption that such >>>>>>>> a >>>>>>>>> unique index placed on a field whose values are not >> unique is a >>>>>>>> configuration >>>>>>>>> error (and I still tend to believe it is). >>>>>>>>> >>>>>>>>> One may now argue that, while this may be the case, >>>>>> rsyslog should be able >>>>>>>> to >>>>>>>>> recover from such user error. I tend to agree with that >>>>>> argument - but I am >>>>>>>> not >>>>>>>>> sure whether some other folks would argue that this is indeed >>>>>> a user error and >>>>>>>> that if >>>>>>>>> we permit losing this record, we actually have a data >>>>>> loss bug. There is >>>>>>>> also >>>>>>>>> some truth in that argument. It may be better to see this >>>>>> case as a >>>>>>>> temporary >>>>>>>>> failure (after all, the real cure is to remove the unique >>>>>> index or fix an >>>>>>>>> invalid SQL statement). But again, I tend to follow the >>>>>> argument that this >>>>>>>> is >>>>>>>>> a message-permanent processing failure and as such the >>>>>> message needs to be >>>>>>>>> discarded so that further messages can be handled. >>>>>>>>> >>>>>>>>> HOWEVER, this is the *only* situation in which I can >> think of a >>>>>>>>> message-permanent processing failure. No other case comes >>>>>> to my mind, and I >>>>>>>>> think this was also the conclusion of our Sep 2008 discussion.
>>>>>>>>> >>>>>>>>> But if there is only one case, and it is very remote, >>>>>> does that actually >>>>>>>>> justify adding a (somewhat) complex algorithm if a >>>>>> simpler one would also do? >>>>>>>> I'd >>>>>>>>> say that we can be fine with O(n) in case one of these >>>>>> very remote failures >>>>>>>>> happens. That means we could simply resort to doing every >>>>>> insert in its own >>>>>>>>> transaction (just like without batching). Granted, if >>>>>> endTransaction() >>>>>>>>> dominates the cost function, we have O(n) vs. O(1) in >>>>>> that case, but if >>>>>>>> that >>>>>>>>> case happens infrequently enough (which I assume), there >>>>>> really is no >>>>>>>>> difference between the two. And as you say, even the >>>>>> "binary search" >>>>>>>> approach >>>>>>>>> is not O(log(n)) but rather O(n/2) [yes, I know, but it >>>>>> makes sense >>>>>>>> here...]. >>>>>>>>> So the gain in using that algorithm is even less (in >>>>>> essence, we are at >>>>>>>> O(n), >>>>>>>>> whatever we use for recovery). >>>>>>>>> >>>>>>>>> I may be overlooking some cases for message-permanent >>>>>> failures and, if I >>>>>>>> do, >>>>>>>>> I would be very grateful if you (or someone else) could >>>>>> point me to them. >>>>>>>>> >>>>>>>>> If you look at your algorithm (I've put it up here for >>>>>> easy reference: >>>>>>>>> http://blog.gerhards.net/2009/04/batch-output-handling-algorithm.html ) and >>>>>>>>> what I have crafted yesterday, you'll see that my writeup >>>>>> is strongly >>>>>>>>> influenced by your proposal. However, as I wrote in my >>>>>> initial response to >>>>>>>>> your message, the algorithm fails on some subtleties that >>>>>> you did not know about >>>>>>>> at >>>>>>>>> the time of writing (most importantly the push-model of >>>>>> the queue object).
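Rainer's fallback is to do every insert in its own transaction. Applied to the (a,a,b) unique-index example, it can be sketched in C. The toy table stands in for a database table with a unique index and is purely hypothetical, not an rsyslog or database API. commit_batch() models an all-or-nothing transaction that inserts nothing if any key would violate uniqueness.

```c
#include <assert.h>
#include <string.h>

#define MAX_ROWS 16

/* Toy table with a unique index on its single key column. */
struct table {
    const char *keys[MAX_ROWS];
    int nrows;
};

static int key_seen(const char *const *keys, int n, const char *k)
{
    for (int i = 0; i < n; i++)
        if (strcmp(keys[i], k) == 0)
            return 1;
    return 0;
}

/* All-or-nothing commit: returns 1 and inserts every key, or returns 0
 * and inserts nothing (duplicate in table or within the batch, or full). */
int commit_batch(struct table *t, const char *const *keys, int n)
{
    assert(n >= 0);
    if (t->nrows + n > MAX_ROWS)
        return 0;
    for (int i = 0; i < n; i++)
        if (key_seen(t->keys, t->nrows, keys[i]) || key_seen(keys, i, keys[i]))
            return 0;                       /* unique-index violation: roll back */
    for (int i = 0; i < n; i++)
        t->keys[t->nrows++] = keys[i];
    return 1;
}

/* Fallback: one transaction per record. failed[i] is set to 1 for a
 * message-permanent failure. Returns the number of transactions used. */
int commit_one_by_one(struct table *t, const char *const *keys, int n,
                      int *failed)
{
    for (int i = 0; i < n; i++)
        failed[i] = !commit_batch(t, &keys[i], 1);
    return n;
}
```

For keys (a,a,b) the batched commit fails and inserts nothing. The one-by-one fallback then stores a and b and flags only the second record as a message-permanent failure, using three transactions instead of one.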
>>>>>>>>> >>>>>>>>> There are also some fine details of the retry handling >>>>>> that need to go >>>>>>>> into >>>>>>>>> the algorithm, as well as proper handling of the now >> four cases: >>>>>>>>> >>>>>>>>> * success >>>>>>>>> * temporary failure >>>>>>>>> * message-permanent failure >>>>>>>>> * action-permanent failure >>>>>>>>> >>>>>>>> >>>>>>>> What I forgot to mention is that we currently have a kind >>>>>> of "pseudo" >>>>>>>> message-permanent failure state. That is related to >>>>>> temporary failure. My >>>>>>>> description above was under the assumption that the action >>>>>> is configured for >>>>>>>> eternal retry (retryCount = -1). If an upper bound for the >>>>>> number of retries n >>>>>>>> is set, then and only then a message enters >>>>>> message-permanent failure state >>>>>>>> after n unsuccessful retries. If that happens, the message >>>>>> is discarded, the >>>>>>>> in-sequence retry counter is reset and the next message is >>>>>> scheduled for >>>>>>>> processing (sounds like it would be a good idea to draw a >>>>>> state diagram...). >>>>>>>> The idea behind this is that it is often preferable to >>>>>> lose some messages if >>>>>>>> they cannot be processed rather than keep retrying. But I have >>>>>> to admit that the >>>>>>>> functionality is rooted in rsyslog's past, where no >>>>>> capable queues existed. I >>>>>>>> still think it needs to be preserved - many use it and I >>>>>> see lots of use >>>>>>>> cases. >>>>>>>> >>>>>>>> But: If we have such an upper bound n, we need to think >>>>>> about how to handle this >>>>>>>> situation for batches. If we have a batch size b > 1, the >>>>>> old interface >>>>>>>> retried n times before discarding the first message and 2n >>>>>> times before >>>>>>>> discarding the second. For the b-th record, we have bn >>>>>> retries before it is >>>>>>>> discarded (maybe off by one each, not checked exactly). There >>>>>> is no point in >>>>>>>> modeling this with batches. So we could use between n..bn >>>>>> retries.
The >>>>>>>> problem here is that b is not fixed - it is between one >>>>>> and the configured >>>>>>>> upper bound of the batch size. So we do not get (fully) >>>>>> deterministic >>>>>>>> behavior. The question is whether or not this is actually >>>>>> important, as the >>>>>>>> unavailability of the action's target (when looked at from >>>>>> the rsyslog POV) is >>>>>>>> also not deterministic. So I tend to use a fixed number of >>>>>> retries for the >>>>>>>> whole batch. That could simply be n, because the user can >>>>>> configure it. Or we >>>>>>>> could derive a new n' from the configured one, e.g. by n' >>>>>> = n * b/10 (where b >>>>>>>> is the *actual* batch size, as above, not the upper >>>>>> bound!). That way, the >>>>>>>> max number of retries would be related to the actual batch >>>>>> size and would vary >>>>>>>> with it. Somehow, just a feeling, I'd *not* go for n' = nb >>>>>> (though this may >>>>>>>> just be an emotional position...). >>>>>>>> >>>>>>>> Comments appreciated. >>>>>>>> >>>>>>>> Rainer >>>>>>>> >>>>>>>>>> >>>>>>>>>> the question is which side the retry logic needs to be in. >>>>>>>>>> >>>>>>>>>> it can be in the queue walker that calls the output module >>>>>>>>>> >>>>>>>>>> it can be in doEndBatch() in the output module >>>>>>>>>> >>>>>>>>>> some output modules don't need partial retries (as they >>>>>> can output >>>>>>>> partial >>>>>>>>>> batches) >>>>>>>>>> >>>>>>>>>> some output modules do need partial retries. >>>>>>>>>> >>>>>>>>>> the more complicated retry logic will work for both >>>>>> situations, or it can >>>>>>>>>> be implemented in each of however many output >> modules need it. >>>>>>>>>> >>>>>>>>>> it can go either way, I tend to lean towards only having >>>>>> the logic in one >>>>>>>>>> place (even if it's more complicated logic than some >>>>>> modules need) >>>>>>>>> >>>>>>>>> I fully agree with you here. That functionality belongs >>>>>> in the core, we >>>>>>>>> just need to craft it well.
I think probably the most >>>>>> important question at >>>>>>>>> this time is whether I am overlooking some possible >>>>>> message-permanent failure cases. >>>>>>>> So >>>>>>>>> I would greatly appreciate feedback especially on that >>>>>> part of my reply. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Rainer >>>>>>>>> >>>>>>>>>> >>>>>>>>>> David Lang From rgerhards at hq.adiscon.com Fri May 1 22:59:25 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Fri, 1 May 2009 22:59:25 +0200 Subject: [rsyslog] output plugin calling interface References: <1241023853.25612.11.camel@rf10up.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AFE4@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B001@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B003@GRFEXC.intern.adiscon.com><1241114672.25612.14.camel@rf10up.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B006@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B008@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B00B@GRFEXC.intern.adiscon.com> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702B00C@GRFEXC.intern.adiscon.com> > -----Original Message----- > From: rsyslog-bounces at lists.adiscon.com > [mailto:rsyslog-bounces at lists.adiscon.com] On Behalf Of david at lang.hm > Sent: Friday, May 01, 2009 10:14 PM > To: rsyslog-users > Subject: Re: [rsyslog] output plugin calling interface > > On Fri, 1 May 2009, Rainer Gerhards wrote: > > >> -----Original Message----- > >> From: rsyslog-bounces at lists.adiscon.com > >> [mailto:rsyslog-bounces at lists.adiscon.com] On Behalf Of > david at lang.hm > >>>> 1. something wrong with a message. > >>>> retries won't help (with batches we need to narrow down > >>>> to which message) > >>> > >>> But what could be wrong? For which outputs? > >>> > >>> Can you envision anything other than db outputs only, > >> duplicate identity in > >>> column erroneously declared with a unique index (which I > >> still consider an > >>> admin error)? I really fail to find any other sample.
> >> > >> things like quoting errors (which I know you defend against) > >> > >> Unicode text causing grief > >> > >> dynafile hits a read-only file > >> > >> basically data-driven things that trigger bugs in the > message delivery > >> mechanism in some form. > >> > >> this is easiest to understand and provide examples with > >> databases, but you > >> could have problems with other methods. > >> > >> in an ideal world these would never happen, but for most > >> output types I > >> can think of some form of corrupt input that could cause that > >> message to > >> fail. > > > > OK, thanks a lot. It doesn't matter if I can defend against a case > or two, but you > > provided good examples. I am now reassured that > these cases are > > likely enough to deserve a serious look. > > > > Resting sometimes helps, and so I also think I made a big > step forward (while > > not actively working on it ;)) today. I think what makes > the problem so hard > > to solve is the language we use. I thought about a purely > mathematical model, > > and I have one in mind. The state diagram was a first > step, but it did > > not go far enough. So I think I will try to write up that > model and then we can > > discuss based on it and finally derive the actual code from > it. That's an > > extra step, but I think it will be a useful one. > > > > As a side-note, I have also identified that we have > overlooked a subtle issue > > so far: backup actions - they need to work on the subset of > the batch that > > had message-permanent failures. So the message state > actually needs to be > > part of the message inside the batch. But now, I think, > things really begin > > to come together and are far less complex than initially thought. > > > > One problem with the state chart - that was why I said it > is not 100% correct > > - is that it does not properly abstract batches vs. single > messages. Both of > > them are entangled in a way that I thought [;)] to be very > complex.
But if you > > model that with processing states, then the batch > processing state is simply > > a function of the individual message processing states. > > > > Please let me know if you also find a math model useful > (but I'll probably > > need to do it in any case, because it helps me clean up my mind...). > > I think it will help clarify things a lot. with a good model > we won't have > misunderstandings about what we are talking about. Yes - and I also think that with the model some complexities disappear. I think (hope I am right) the solution will become obvious. I know I am investing a lot of time in a tiny portion of the code, but this is one of the core elements involving many complexities. > with my 'binary search' approach, handling permanently bad > messages could > be as simple as 'too many retries once we hit a batch size of > 1' (with a > possible option of the output module reporting back that it dectected > something that makes retries useless, but this is just an > optimization) Yes, indeed. One quick thought: I see a batch as a set of (msg, state) ordered pairs. Once we have procssed it in one action (all of them have entered one permanent state), we can than build a subset that we use as the new (remaining) batch in the backup actions. So the "bad record search" is "just" one facet of many that we need to handle with little and hopefully simple code (doing it with 2000 LoC would be rather easy ;)). 
Rainer

>
> David Lang

From david at lang.hm Fri May 1 23:59:58 2009
From: david at lang.hm (david at lang.hm)
Date: Fri, 1 May 2009 14:59:58 -0700 (PDT)
Subject: [rsyslog] output plugin calling interface
In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702B00C@GRFEXC.intern.adiscon.com>
References: <1241023853.25612.11.camel@rf10up.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AFE4@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B001@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B003@GRFEXC.intern.adiscon.com><1241114672.25612.14.camel@rf10up.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B006@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B008@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B00B@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702B00C@GRFEXC.intern.adiscon.com>
Message-ID:

On Fri, 1 May 2009, Rainer Gerhards wrote:

> Yes, indeed. One quick thought: I see a batch as a set of (msg, state)
> ordered pairs. Once we have processed it in one action (all of them have
> entered a permanent state), we can then build a subset that we use as the
> new (remaining) batch in the backup actions. So the "bad record search" is
> "just" one facet of many that we need to handle with little and hopefully
> simple code (doing it with 2000 LoC would be rather easy ;)).

I agree with the definition of a batch. Let's see what different states
you are thinking of.

I am currently assuming that the messages stay in the queue (with the
state attached) so that if rsyslog restarts (assuming disk queues), it
will realize that the message hasn't been delivered and try again.
David Lang

From rgerhards at hq.adiscon.com Sat May 2 10:03:32 2009
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Sat, 2 May 2009 10:03:32 +0200
Subject: [rsyslog] output plugin calling interface
References: <1241023853.25612.11.camel@rf10up.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AFE4@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B001@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B003@GRFEXC.intern.adiscon.com><1241114672.25612.14.camel@rf10up.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B006@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B008@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B00B@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B00C@GRFEXC.intern.adiscon.com>
Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702B00D@GRFEXC.intern.adiscon.com>

> -----Original Message-----
> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-bounces at lists.adiscon.com] On Behalf Of david at lang.hm
> Sent: Saturday, May 02, 2009 12:00 AM
> To: rsyslog-users
> Subject: Re: [rsyslog] output plugin calling interface
>
> On Fri, 1 May 2009, Rainer Gerhards wrote:
>
> I agree with the definition of a batch. Let's see what different states
> you are thinking of.
>
> I am currently assuming that the messages stay in the queue (with the
> state attached) so that if rsyslog restarts (assuming disk queues), it
> will realize that the message hasn't been delivered and try again.

No, it is different: the batch is actually dequeued. So if we have a system
power failure at that point (for whatever reason), the messages are lost.
While the rsyslog engine intends to be very reliable, it is not a complete
transactional system. A slight risk remains. To see why, you need to
understand what happens when the batch is processed. I assume that we have
no sudden, untrappable process termination. Then, if a batch cannot be
processed, it is returned to the top of the queue. This is not yet
implemented for batches, but it is how single messages (which you can think
of as an abstraction of a batch in the current code) are handled. If, for
example, the engine shuts down but an action takes longer than the
configured shutdown timeout, the action is cancelled and the queue engine
reclaims the unprocessed messages. They go into a special area inside the
.qi file and are placed on top of the queue once the engine restarts.

The only case where this does not work is sudden process termination. I see
two cases:

a) a fatal software bug
We cannot really address this. Even if the messages remained in the queue
until finally processed, a software bug (maybe an invalid pointer) may
affect the queue structures at large, possibly even at the risk of total
loss of all data inside that queue. So this is an inevitable risk.

b) sudden power failure
... which can and should be mitigated at another level

One may argue that there is also

c) admin error
e.g. kill -9 rsyslogd
Here a fully transactional queue would probably help.

However, I do not think that the risk involved justifies a far more complex,
fully transactional implementation of the queue object. Some risk always
remains (what about the disaster case, even with a fully transactional
queue?).

Letting the messages stay in the queue is complex because it is complex to
work with such messages and disk queues. It would also cost a lot of
performance, especially when done reliably (we need to sync). We would then
need to touch each element at least four times, twice as often as now.
Also, the hybrid disk/memory queues would become very, very complex. There
are more complexities around this; I just wanted to mention the most obvious.

So, all in all, the idea is that messages are dequeued, processed and put
back into the queue (think: ungetc()) when something goes wrong. Reasonable
(but not more) effort is made to prevent message loss while the messages are
in an unprocessed state outside of the queue.

Hope that clarifies and I am glad you brought this up.
Made me think again, but I came to the conclusion I've written above ;)

Rainer

From david at lang.hm Sat May 2 10:20:56 2009
From: david at lang.hm (david at lang.hm)
Date: Sat, 2 May 2009 01:20:56 -0700 (PDT)
Subject: [rsyslog] output plugin calling interface
In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702B00D@GRFEXC.intern.adiscon.com>
References: <1241023853.25612.11.camel@rf10up.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AFE4@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B001@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B003@GRFEXC.intern.adiscon.com><1241114672.25612.14.camel@rf10up.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B006@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B008@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B00B@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B00C@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702B00D@GRFEXC.intern.adiscon.com>
Message-ID:

On Sat, 2 May 2009, Rainer Gerhards wrote:

> So, all in all, the idea is that messages are dequeued, processed and put
> back into the queue (think: ungetc()) when something goes wrong. Reasonable
> (but not more) effort is made to prevent message loss while the messages are
> in an unprocessed state outside of the queue.

this is definitely different from the way I thought things worked from our
prior discussions about reliability. from those I understood that rsyslog
could be used to make a fully reliable system, if you are willing to take
the performance hit to do so.

as batch size increases (to gain efficiency) the number of log messages
that can be lost also increases.

unfortunately I believe that power outages cannot be avoided (I've seen
cases where millions have been spent on the power systems and still ended
up with a datacenter-wide blackout).

when you get the model of things together we will be in a much better
position to discuss this.

it's 1:20am here and I'm ready to collapse.

David Lang

From rgerhards at hq.adiscon.com Sat May 2 10:33:12 2009
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Sat, 2 May 2009 10:33:12 +0200
Subject: [rsyslog] output plugin calling interface
References: <1241023853.25612.11.camel@rf10up.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AFE4@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B001@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B003@GRFEXC.intern.adiscon.com><1241114672.25612.14.camel@rf10up.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B006@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B008@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B00B@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B00C@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B00D@GRFEXC.intern.adiscon.com>
Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702B010@GRFEXC.intern.adiscon.com>

> -----Original Message-----
> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-bounces at lists.adiscon.com] On Behalf Of david at lang.hm
> Sent: Saturday, May 02, 2009 10:21 AM
> To: rsyslog-users
> Subject: Re: [rsyslog] output plugin calling interface
>
> On Sat, 2 May 2009, Rainer Gerhards wrote:
>
> > So, all in all, the idea is that messages are dequeued, processed and put
> > back into the queue (think: ungetc()) when something goes wrong. Reasonable
> > (but not more) effort is made to prevent message loss while the messages are
> > in an unprocessed state outside of the queue.
> >
> > Hope that clarifies and I am glad you brought this up. Made me think again,
> > but I came to the conclusion I've written above ;)
>
> this is definitely different from the way I thought things worked from our
> prior discussions about reliability. from those I understood that rsyslog
> could be used to make a fully reliable system, if you are willing to take
> the performance hit to do so.

You can, but then you need to use batch sizes of 1.

> as batch size increases (to gain efficiency) the number of log messages
> that can be lost also increases.
>
> unfortunately I believe that power outages cannot be avoided (I've seen
> cases where millions have been spent on the power systems and still ended
> up with a datacenter-wide blackout).

Let me think about this, but I think that to protect against this problem
you really need a two-phase commit, which I am not sure belongs in a
syslogd.
> when you get the model of things together we will be in a much better
> position to discuss this.

Well, we'd probably need to reopen the discussion of reliability
requirements. If it turns out that you need 100% reliability, no matter
what happens, I am not sure we can implement this without adding
considerable database-ish processing. "Under all circumstances" reliability
is very hard to achieve, especially if you also want high performance.
Think about it: to guard against the full data center power loss scenario,
you need a disk-only queue that is synced to disk on every single enqueue
and dequeue operation. This is extremely costly. Does it then really matter
whether we have large batches or not? The system, I think, will be so slow
that you cannot use it for any demanding real-life application, so some
compromise between speed and reliability must be made in any case.

> it's 1:20am here and I'm ready to collapse.

I hadn't even expected this response at this time ;)

Rainer

>
> David Lang

From YungWei.Chen at resolvity.com Sat May 2 15:13:24 2009
From: YungWei.Chen at resolvity.com (YungWei.Chen)
Date: Sat, 2 May 2009 09:13:24 -0400
Subject: [rsyslog] Building rsyslog from source code
In-Reply-To: <000001c9ca34$670b84df$100013ac@intern.adiscon.com>
References: <000001c9ca34$670b84df$100013ac@intern.adiscon.com>
Message-ID: <795E60BBD9A86846BFEDF32029788B1BA74A6F@MI8NYCMAIL14.Mi8.com>

Suppose I have successfully built rsyslog from source code on one
machine, how do I install it on other CentOS machines? Do I have to
repeat the build process on each machine? Thanks.
From rgerhards at hq.adiscon.com Sat May 2 20:28:48 2009
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Sat, 2 May 2009 20:28:48 +0200
Subject: [rsyslog] output plugin calling interface
References: <1241023853.25612.11.camel@rf10up.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AFE4@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B001@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B003@GRFEXC.intern.adiscon.com><1241114672.25612.14.camel@rf10up.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B006@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B008@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B00B@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B00C@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B00D@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702B010@GRFEXC.intern.adiscon.com>
Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702B012@GRFEXC.intern.adiscon.com>

After a lot of thinking today: we can have a "kind of" transactional queue,
but we need to accept potential message *duplication* in the event of
failures (but no loss). This would work without a two-phase commit.
However, implementing it still takes considerable effort, and I wonder if
the use case actually justifies it.

Please also consider what I wrote below on the performance of any
ultra-reliable version. And, yes, I know we have fast and reliable
controllers today, but even then the disk path is much, much slower than
any memory-based queue. I don't believe you can build a very
high-performance syslog server on a disk queue, even with the best hardware
money can buy today.
Rainer > -----Original Message----- > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > bounces at lists.adiscon.com] On Behalf Of Rainer Gerhards > Sent: Saturday, May 02, 2009 10:33 AM > To: rsyslog-users > Subject: Re: [rsyslog] output plugin calling interface > > > -----Original Message----- > > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > > bounces at lists.adiscon.com] On Behalf Of david at lang.hm > > Sent: Saturday, May 02, 2009 10:21 AM > > To: rsyslog-users > > Subject: Re: [rsyslog] output plugin calling interface > > > > On Sat, 2 May 2009, Rainer Gerhards wrote: > > > > >> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > > >> bounces at lists.adiscon.com] On Behalf Of david at lang.hm > > >> > > >> On Fri, 1 May 2009, Rainer Gerhards wrote: > > >> > > >>>> -----Original Message----- > > >>>> From: rsyslog-bounces at lists.adiscon.com > > >>>> [mailto:rsyslog-bounces at lists.adiscon.com] On Behalf Of david at lang.hm > > >>>> > > >>>> On Fri, 1 May 2009, Rainer Gerhards wrote: > > >>>> > > >>>>> Please let me know if you also find a math model useful > > >>>> (but I'll probably > > >>>>> need to do it in any case, because it helps me clean up my mind...). > > >>>> > > >>>> I think it will help clarify things a lot. with a good model > > >>>> we won't have > > >>>> misunderstandings about what we are talking about. > > >>> > > >>> Yes - and I also think that with the model some complexities disappear. > I > > >>> think (hope I am right) the solution will become obvious. I know I am > > >>> investing a lot of time in a tiny portion of the code, but this is one > of > > >> the > > >>> core elements involving many complexities. 
> > >>> > > >>>> with my 'binary search' approach, handling permanently bad > > >>>> messages could > > >>>> be as simple as 'too many retries once we hit a batch size of > > >>>> 1' (with a > > >>>> possible option of the output module reporting back that it dectected > > >>>> something that makes retries useless, but this is just an > > >>>> optimization) > > >>> > > >>> Yes, indeed. One quick thought: I see a batch as a set of (msg, state) > > >>> ordered pairs. Once we have procssed it in one action (all of them have > > >>> entered one permanent state), we can than build a subset that we use as > > > the > > >>> new (remaining) batch in the backup actions. So the "bad record search" > > > is > > >>> "just" one facet of many that we need to handle with little and > hopefully > > >>> simple code (doing it with 2000 LoC would be rather easy ;)). > > >> > > >> I agree with the definition of a batch. Let's see what different states > > >> you are thinking of. > > >> > > >> I am currently assuming that the messages stay in the queue (with the > > >> state attached) so that if rsyslog restarts (assuming disk queues), it > > >> will realize that the message hasn't been delivered and try again. > > > > > > No, it is different: the batch is actually dequeued. So if at that point > we > > > have a system power failure (for whatever reason), the messages are lost. > > > While the rsyslog engine intends to be very reliable, it is not a > complete > > > transactional system. A slight risk remains. For this, you need to > > understand > > > what happens when the batch is processed. I assume that we have no > sudden, > > > untrappable process termination. Then, if a batch cannot be processed, it > is > > > returned back to the top of queue. This is not yet implemented, but is > how > > > single messages (which you can think of an abstraction of a batch in the > > > current code) are handled. 
If, for example, the engine shuts down, but an > > > action takes longer than the configured shutdown timeout, the action is > > > cancelled and the queue engine reclaims the unprocessed messages. They go > > > into a special area inside the .qi file and are placed on top of the > queue > > > once the engine restarts. > > > > > > The only case where this not work is sudden process termination. I see > two > > > cases: > > > > > > a) a fatal software bug > > > We cannot really address this. Even if the messages were remaining in the > > > queue until finally processed, a software bug (maybe an invalid pointer) > may > > > affect the queue structures at large, possibly even at the risk of total > > loss > > > of all data inside that queue. So this is an inevitable risk. > > > > > > b) sudden power fail > > > ... which can and should be mitigated at another level > > > > > > One may argue that there also is > > > > > > c) admin error > > > e.g, kill -9 rsyslogd > > > Here a fully transactional queue will probably help. > > > > > > However, I do not think that the risk involved justifies a far more > complex > > > fully transactional implementation of the queue object. Some risk always > > > remains (what in the disaster case, even with a fully transactional > queue?). > > > > > > And it is so complex to let the messages stay in queue because it is > complex > > > to work with such messages and disk queues. It would also cost a lot of > > > performance, especially when done reliably (need to sync). We would then > > need > > > to touch each element at least four times, twice as much as currently. > Also, > > > the hybrid disk/memory queues become very, very complex. There are more > > > complexities around this, I just wanted to tell the most obvious. > > > > > > So, all in all, the idea is that messages are dequeued, processed and put > > > back to the queue (think: ungetc()) when something goes wrong. 
Reasonable > > > (but not more) effort is made to prevent message loss while the messages > are > > > in unprocessed state outside of the queue. > > > > > > Hope that clarifies and I am glad you brought this up. Made me think > again, > > > but I concluded to what I've written above ;) > > > > this is definantly different from the way I thought things worked from our > > prior discussions about reliability. from those I understood that rsyslog > > could be used to make a fully reliable system, if you are willing to take > > the performance hit to do so. > > You can, but than you need to use batch sizes of 1. > > > as batch size increases (to gain efficiancy) the number of log messages > > that can be lost also increase. > > > > unfortunantly I have the belief that power outages cannot be avoided (I've > > seen cases where millions have been spent on the power systems and still > > ended up with a datacenter-wide blackout. > > Let me think about this, but I think to protect against this problem, you > really need to have two-phase commit, which I am not sure belongs into a > syslogd. > > > when you get the model of things togeather we will be in a much better > > position to discuss this. > > Well, we'd probably restart discussing reliability requirements. If it turns > out that you need 100% reliability, not matter what happens at all, I am not > sure if we can implement this without adding considerable database-ish > processing. "Under all circumstances" reliability is very hard to achive, > especially if you also would like to have high performance. Think about it: > to guard against the data center full power loss scenario, you need to have a > disk-only queue, being synced to disk for every single en- and dequeue > operation. This is extremely costly. Does it than really matter if we have > large batches or not? 
The system, I think, will be so slow, that you cannot > use it for any demanding real-life application, so some compromise between > speed and reliability, I think, must be made in any case. > > > it's 1:20am here and I'm ready to collapse. > > I hadn't even expected this response at this time ;) > > Rainer > > > > David Lang > > _______________________________________________ > > rsyslog mailing list > > http://lists.adiscon.net/mailman/listinfo/rsyslog > > http://www.rsyslog.com From david at lang.hm Sat May 2 22:22:03 2009 From: david at lang.hm (david at lang.hm) Date: Sat, 2 May 2009 13:22:03 -0700 (PDT) Subject: [rsyslog] Building rsyslog from source code In-Reply-To: <795E60BBD9A86846BFEDF32029788B1BA74A6F@MI8NYCMAIL14.Mi8.com> References: <000001c9ca34$670b84df$100013ac@intern.adiscon.com> <795E60BBD9A86846BFEDF32029788B1BA74A6F@MI8NYCMAIL14.Mi8.com> Message-ID: On Sat, 2 May 2009, YungWei.Chen wrote: > Suppose I have successfully built rsyslog from source code on one > machine, how do I install it on other CentOS machines? Do I have to > repeat the build process on each machine? Thanks. I use checkinstall to create packages (in my case .deb, but it also does .rpm) and then install that package on the other machines. I have run into a problem where this process doesn't always replace the modules, so you are better off uninstalling the old package and then installing the new one than just trying an upgrade.
David Lang From david at lang.hm Sun May 3 02:42:18 2009 From: david at lang.hm (david at lang.hm) Date: Sat, 2 May 2009 17:42:18 -0700 (PDT) Subject: [rsyslog] output plugin calling interface In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702B012@GRFEXC.intern.adiscon.com> References: <9B6E2A8877C38245BFB15CC491A11DA702B012@GRFEXC.intern.adiscon.com> Message-ID: On Sat, 2 May 2009, Rainer Gerhards wrote: > After a lot of thinking today, we can have a "kind of" transactional queue, > but we need to accept potential message *duplication* in the event of > failures (but no loss). this is the approach that you have taken for other things (relp for example), and when we were discussing reliability for direct mode vs disk queues you mentioned that rsyslog could duplicate messages in case of failures, but would not lose messages. > This would work without a two-phase commit. However, > there still is considerable effort to implement it. as I understand things, the current process is: thread A receives the message and puts it in the queue. worker thread B pulls the message from the queue, formats it and puts it in the action queue (if there is no action queue, this triggers the output module code as part of thread B). if there is an action queue, thread C is running, and does basically the same thing that thread B would do if there was no action queue. what I am envisioning is that the worker thread would touch the queue one additional time: instead of removing the message from the queue to perform the action, it would mark the message as being 'in process', then after the message is delivered it would delete it from the queue (touching the queue three times instead of two). > I wonder if the use case > actually justifies it. Please also consider what I wrote below on the > performance of any ultra-reliable version. And, yes, I know we have fast and > reliable controllers today, but even then the disk path is much, much slower > than any memory based queue. I fail to believe you can build a very
I fail to believe you can build a very > high-performance syslog server on a disk queue, even with the best hardware > money can buy today. I'm going to be testing this shortly ;-) I have a fusion IO drive to try and will be getting some boxes with the Intel X-25E SSD drives in a couple of weeks. the only thing I can't try is the ram-based drive. David Lang > Rainer > >> -----Original Message----- >> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- >> bounces at lists.adiscon.com] On Behalf Of Rainer Gerhards >> Sent: Saturday, May 02, 2009 10:33 AM >> To: rsyslog-users >> Subject: Re: [rsyslog] output plugin calling interface >> >>> -----Original Message----- >>> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- >>> bounces at lists.adiscon.com] On Behalf Of david at lang.hm >>> Sent: Saturday, May 02, 2009 10:21 AM >>> To: rsyslog-users >>> Subject: Re: [rsyslog] output plugin calling interface >>> >>> On Sat, 2 May 2009, Rainer Gerhards wrote: >>> >>>>> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- >>>>> bounces at lists.adiscon.com] On Behalf Of david at lang.hm >>>>> >>>>> On Fri, 1 May 2009, Rainer Gerhards wrote: >>>>> >>>>>>> -----Original Message----- >>>>>>> From: rsyslog-bounces at lists.adiscon.com >>>>>>> [mailto:rsyslog-bounces at lists.adiscon.com] On Behalf Of > david at lang.hm >>>>>>> >>>>>>> On Fri, 1 May 2009, Rainer Gerhards wrote: >>>>>>> >>>>>>>> Please let me know if you also find a math model useful >>>>>>> (but I'll probably >>>>>>>> need to do it in any case, because it helps me clean up my > mind...). >>>>>>> >>>>>>> I think it will help clarify things a lot. with a good model >>>>>>> we won't have >>>>>>> misunderstandings about what we are talking about. >>>>>> >>>>>> Yes - and I also think that with the model some complexities > disappear. >> I >>>>>> think (hope I am right) the solution will become obvious. 
I know I am >>>>>> investing a lot of time in a tiny portion of the code, but this is > one >> of >>>>> the >>>>>> core elements involving many complexities. >>>>>> >>>>>>> with my 'binary search' approach, handling permanently bad >>>>>>> messages could >>>>>>> be as simple as 'too many retries once we hit a batch size of >>>>>>> 1' (with a >>>>>>> possible option of the output module reporting back that it > dectected >>>>>>> something that makes retries useless, but this is just an >>>>>>> optimization) >>>>>> >>>>>> Yes, indeed. One quick thought: I see a batch as a set of (msg, > state) >>>>>> ordered pairs. Once we have procssed it in one action (all of them > have >>>>>> entered one permanent state), we can than build a subset that we use > as >>>> the >>>>>> new (remaining) batch in the backup actions. So the "bad record > search" >>>> is >>>>>> "just" one facet of many that we need to handle with little and >> hopefully >>>>>> simple code (doing it with 2000 LoC would be rather easy ;)). >>>>> >>>>> I agree with the definition of a batch. Let's see what different > states >>>>> you are thinking of. >>>>> >>>>> I am currently assuming that the messages stay in the queue (with the >>>>> state attached) so that if rsyslog restarts (assuming disk queues), it >>>>> will realize that the message hasn't been delivered and try again. >>>> >>>> No, it is different: the batch is actually dequeued. So if at that > point >> we >>>> have a system power failure (for whatever reason), the messages are > lost. >>>> While the rsyslog engine intends to be very reliable, it is not a >> complete >>>> transactional system. A slight risk remains. For this, you need to >>> understand >>>> what happens when the batch is processed. I assume that we have no >> sudden, >>>> untrappable process termination. Then, if a batch cannot be processed, > it >> is >>>> returned back to the top of queue. 
This is not yet implemented, but is >> how >>>> single messages (which you can think of an abstraction of a batch in > the >>>> current code) are handled. If, for example, the engine shuts down, but > an >>>> action takes longer than the configured shutdown timeout, the action is >>>> cancelled and the queue engine reclaims the unprocessed messages. They > go >>>> into a special area inside the .qi file and are placed on top of the >> queue >>>> once the engine restarts. >>>> >>>> The only case where this not work is sudden process termination. I see >> two >>>> cases: >>>> >>>> a) a fatal software bug >>>> We cannot really address this. Even if the messages were remaining in > the >>>> queue until finally processed, a software bug (maybe an invalid > pointer) >> may >>>> affect the queue structures at large, possibly even at the risk of > total >>> loss >>>> of all data inside that queue. So this is an inevitable risk. >>>> >>>> b) sudden power fail >>>> ... which can and should be mitigated at another level >>>> >>>> One may argue that there also is >>>> >>>> c) admin error >>>> e.g, kill -9 rsyslogd >>>> Here a fully transactional queue will probably help. >>>> >>>> However, I do not think that the risk involved justifies a far more >> complex >>>> fully transactional implementation of the queue object. Some risk > always >>>> remains (what in the disaster case, even with a fully transactional >> queue?). >>>> >>>> And it is so complex to let the messages stay in queue because it is >> complex >>>> to work with such messages and disk queues. It would also cost a lot of >>>> performance, especially when done reliably (need to sync). We would > then >>> need >>>> to touch each element at least four times, twice as much as currently. >> Also, >>>> the hybrid disk/memory queues become very, very complex. There are more >>>> complexities around this, I just wanted to tell the most obvious. 
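The 'binary search' handling of permanently bad messages and the view of a batch as (msg, state) pairs, both quoted above, can be illustrated with a short Python sketch. It assumes a hypothetical output callback `send()` that either accepts a whole sub-batch or raises an exception. The state names and the `max_retries` parameter are illustrative only and are not rsyslog's actual interface.

```python
from enum import Enum


class State(Enum):
    PENDING = "pending"
    COMMITTED = "committed"
    BAD = "bad"  # permanently failed: further retries are useless


def deliver(batch, send, max_retries=3):
    """Deliver a batch of [msg, State] pairs, updating their states in place.

    `send` is a hypothetical output callback: it takes a list of messages
    and raises an exception if that sub-batch could not be delivered.
    A multi-message batch is split in half after its first failure; a
    single message is retried up to `max_retries` times and then marked
    BAD, so one bad record cannot block the rest of the batch.
    """
    pending = [pair for pair in batch if pair[1] is State.PENDING]
    if not pending:
        return
    attempts = max_retries if len(pending) == 1 else 1
    for _attempt in range(attempts):
        try:
            send([msg for msg, _state in pending])
        except Exception:
            continue
        for pair in pending:
            pair[1] = State.COMMITTED
        return
    if len(pending) == 1:
        pending[0][1] = State.BAD  # too many retries at batch size 1
        return
    mid = len(pending) // 2
    deliver(pending[:mid], send, max_retries)
    deliver(pending[mid:], send, max_retries)
```

Each pair is a two-element list rather than a tuple. That way, the sub-batches created by slicing still update the caller's batch. A real output module might also report a permanent error directly, which would skip the retries at batch size 1.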
>>>> >>>> So, all in all, the idea is that messages are dequeued, processed and > put >>>> back to the queue (think: ungetc()) when something goes wrong. > Reasonable >>>> (but not more) effort is made to prevent message loss while the > messages >> are >>>> in unprocessed state outside of the queue. >>>> >>>> Hope that clarifies and I am glad you brought this up. Made me think >> again, >>>> but I concluded to what I've written above ;) >>> >>> this is definantly different from the way I thought things worked from > our >>> prior discussions about reliability. from those I understood that rsyslog >>> could be used to make a fully reliable system, if you are willing to take >>> the performance hit to do so. >> >> You can, but than you need to use batch sizes of 1. >> >>> as batch size increases (to gain efficiancy) the number of log messages >>> that can be lost also increase. >>> >>> unfortunantly I have the belief that power outages cannot be avoided > (I've >>> seen cases where millions have been spent on the power systems and still >>> ended up with a datacenter-wide blackout. >> >> Let me think about this, but I think to protect against this problem, you >> really need to have two-phase commit, which I am not sure belongs into a >> syslogd. >> >>> when you get the model of things togeather we will be in a much better >>> position to discuss this. >> >> Well, we'd probably restart discussing reliability requirements. If it > turns >> out that you need 100% reliability, not matter what happens at all, I am > not >> sure if we can implement this without adding considerable database-ish >> processing. "Under all circumstances" reliability is very hard to achive, >> especially if you also would like to have high performance. Think about it: >> to guard against the data center full power loss scenario, you need to have > a >> disk-only queue, being synced to disk for every single en- and dequeue >> operation. This is extremely costly. 
>> Does it then really matter if we have >> large batches or not? The system, I think, will be so slow, that you cannot >> use it for any demanding real-life application, so some compromise between >> speed and reliability, I think, must be made in any case. >> >>> it's 1:20am here and I'm ready to collapse. >> >> I hadn't even expected this response at this time ;) >> >> Rainer >>> >>> David Lang From tmetro+rsyslog at gmail.com Sun May 3 06:47:51 2009 From: tmetro+rsyslog at gmail.com (Tom Metro) Date: Sun, 03 May 2009 00:47:51 -0400 Subject: [rsyslog] output plugin calling interface In-Reply-To: References: <9B6E2A8877C38245BFB15CC491A11DA702B012@GRFEXC.intern.adiscon.com> Message-ID: <49FD21F7.50309@gmail.com> david at lang.hm wrote: > Rainer Gerhards wrote: >> After a lot of thinking today, we can have a "kind of" transactional queue, >> but we need to accept potential message *duplication* in the event of >> failures (but no loss). > > this is the approach that you have taken for other things (relp for > example), and when we were discussing reliability for direct mode vs disk > queues you mentioned that rsyslog could duplicate messages in case of > failures, but would not lose messages. I also noticed this side effect mentioned in the RELP documentation and wondered why message duplication couldn't be avoided by something as simple as assigning a serial number to each log record. A 32-bit monotonically increasing counter that rolls over periodically.
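A counter that "rolls over periodically" needs a wrap-aware comparison. Right after the wrap, a plain `<` would wrongly claim that serial 0 comes before 4294967295. Below is a minimal Python sketch in the style of RFC 1982 serial number arithmetic. The function names are made up for illustration.

```python
SERIAL_BITS = 32
MOD = 1 << SERIAL_BITS         # the counter wraps from MOD - 1 back to 0
HALF = 1 << (SERIAL_BITS - 1)


def next_serial(serial):
    """Advance a 32-bit serial number, wrapping around to 0."""
    return (serial + 1) % MOD


def serial_lt(a, b):
    """True if serial `a` comes before serial `b`, allowing for wrap-around.

    Following RFC 1982, `a` precedes `b` if `b` is less than half the
    number space ahead of it. When the distance is exactly HALF, the order
    is undefined, and this function returns False in both directions.
    """
    return a != b and (b - a) % MOD < HALF
```

This only works if the receiver never has to compare two serials that are more than about 2^31 records apart. That condition is easily met by a cache of the last N records.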
The receiving side would cache the serial numbers for the last N records (something that could be done quite memory efficiently if the records show up mostly in order) and discard records it had seen. A hash might work well too, providing you're using high-res time stamps so you don't get false positive duplications. With a strictly in-memory cache of seen records, you could still get duplication after the receiver gets restarted, but at least you'd have greatly narrowed the potential. And the receiver could always pre-seed its cache from the last N stored records on startup. -Tom From david at lang.hm Sun May 3 07:18:45 2009 From: david at lang.hm (david at lang.hm) Date: Sat, 2 May 2009 22:18:45 -0700 (PDT) Subject: [rsyslog] output plugin calling interface In-Reply-To: <49FD21F7.50309@gmail.com> References: <9B6E2A8877C38245BFB15CC491A11DA702B012@GRFEXC.intern.adiscon.com> <49FD21F7.50309@gmail.com> Message-ID: On Sun, 3 May 2009, Tom Metro wrote: > david at lang.hm wrote: >> Rainer Gerhards wrote: >>> After a lot of thinking today, we can have a "kind of" transactional queue, >>> but we need to accept potential message *duplication* in the event of >>> failures (but no loss). >> >> this is the approach that you have taken for other things (relp for >> example), and when we were discussing reliability for direct mode vs disk >> queues you mentioned that rsyslog could duplicate messages in case of >> failures, but would not loose messages. > > I also noticed this side effect mentioned in the RELP documentation and > wondered why message duplication couldn't be avoided by something as > simple as assigning a serial number to each log record. A 32-bit > monotonically increasing counter that rolls over periodically. > > The receiving side would cache the serial numbers for the last N records > (something that could be done quite memory efficiently if the records > show up mostly in order) and discard records it had seen. 
> > A hash might work well too, providing you're using high-res time stamps > so you don't get false positive duplications. > > With a strictly in-memory cache of seen records, you could still get > duplication after the receiver gets restarted, but at least you'd have > greatly narrowed the potential. And the receiver could always pre-seed > its cache from the last N stored records on startup. note that this would have to be a per-sender list of records, what if you are getting messages from lots of systems? since you can have two or more threads sending you messages you can't assume that you will get them in order (if you could assume this you could just store the last successful message processed) I agree that this is the basic approach that would need to be taken, but before we worry about filtering out duplicates for cases like this, we need to make sure we aren't losing any messages ;-) David Lang From rgerhards at hq.adiscon.com Sun May 3 10:38:37 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Sun, 3 May 2009 10:38:37 +0200 Subject: [rsyslog] output plugin calling interface References: <9B6E2A8877C38245BFB15CC491A11DA702B012@GRFEXC.intern.adiscon.com><49FD21F7.50309@gmail.com> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702B013@GRFEXC.intern.adiscon.com> Thanks for the good discussion; it is very inspiring. Please keep the thoughts flowing. Answers inline below... > -----Original Message----- > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > bounces at lists.adiscon.com] On Behalf Of david at lang.hm > Sent: Sunday, May 03, 2009 7:19 AM > To: rsyslog-users > Subject: Re: [rsyslog] output plugin calling interface > > On Sun, 3 May 2009, Tom Metro wrote: > > > david at lang.hm wrote: > >> Rainer Gerhards wrote: > >>> After a lot of thinking today, we can have a "kind of" transactional > queue, > >>> but we need to accept potential message *duplication* in the event of > >>> failures (but no loss).
> >> > >> this is the approach that you have taken for other things (relp for > >> example), and when we were discussing reliability for direct mode vs disk > >> queues you mentioned that rsyslog could duplicate messages in case of > >> failures, but would not lose messages. > > > > I also noticed this side effect mentioned in the RELP documentation and > > wondered why message duplication couldn't be avoided by something as > > simple as assigning a serial number to each log record. A 32-bit > > monotonically increasing counter that rolls over periodically. > > > > The receiving side would cache the serial numbers for the last N records > > (something that could be done quite memory efficiently if the records > > show up mostly in order) and discard records it had seen. > > > > A hash might work well too, providing you're using high-res time stamps > > so you don't get false positive duplications. > > > > With a strictly in-memory cache of seen records, you could still get > > duplication after the receiver gets restarted, but at least you'd have > > greatly narrowed the potential. And the receiver could always pre-seed > > its cache from the last N stored records on startup. RELP is potentially able to do that, mostly in the way Tom has described. It cannot do so today, because I have not had time to implement it and there are things with much higher priority. RELP uses sequence numbers and I think they are indeed mod 2^32. > > note that this would have to be a per-sender list of records, what if you > are getting messages from lots of systems? Not even per-sender, but per-conversation. A single sender can open up multiple conversations with a single receiver, just by specifying more than one connection. There are ample use cases for this. For example, one conversation could carry emergency messages and another one bulk messages.
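Tom's "last N serial numbers" cache, kept per conversation as Rainer suggests, might look roughly like the Python sketch below. The conversation key (for example, one key per RELP connection) and the class name are assumptions for illustration. This is not how RELP is implemented.

```python
from collections import deque


class DuplicateFilter:
    """Remember the last `capacity` serial numbers seen per conversation."""

    def __init__(self, capacity=1024):
        self.capacity = capacity
        # conversation key -> (deque in arrival order, set for O(1) lookup)
        self._seen = {}

    def accept(self, conversation, serial):
        """Return True for a new record, False for a duplicate to discard."""
        order, members = self._seen.setdefault(conversation, (deque(), set()))
        if serial in members:
            return False
        order.append(serial)
        members.add(serial)
        if len(order) > self.capacity:
            # Forget the oldest serial so memory stays bounded per conversation
            members.discard(order.popleft())
        return True
```

The same serial number can legitimately arrive on two different conversations, which is why the key is the conversation and not the sender. As Tom notes, the cache is empty after a receiver restart unless it is pre-seeded from stored records.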
> since you can have two or more threads sending you messages you can't > assume that you will get them in order (if you could assume this you could > just store the last successful message processed) Inside a RELP connection, messages are in sequence, but we have a sliding window. But there are two things mixed in here: one is the reliable transport, the other one is end-to-end reliability. For example, RELP cannot check if "the messages are already stored" because we have no universal predicate "is stored" (what should that mean?). All RELP can know is that it submitted things to the queue. So even if we put everything into a database, RELP cannot rely on that information to decide which messages have already been received and which have not. So RELP needs to keep its own state information. That's not awfully hard, because we have just a sliding window, which also acts as a "window of uncertainty". Assuming that we had "processed messages" state information, then on connection re-establishment, during the handshake, the sender could query the receiver about the state of potential duplicates and remove them. This "just" is not yet done. Also note that this requires that state is properly recorded under all circumstances, an issue with many subtle aspects to look at. > I agree that this is the basic approach that would need to be taken, but > before we worry about filtering out duplicates for cases like this, we > need to make sure we aren't losing any messages ;-) What I would find useful is a unique message ID that is created by the original sender and carried forward to whatever the final destination is. The idea here is to enable analysis tools to detect the duplicates. A UUID would probably make a good identifier. But this also requires standards work; otherwise it would be an rsyslog-only thing, which makes especially little sense here, as the whole point is that external tools (log analyzers) would need to understand it.
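The reconnect handshake described above can be modeled in a few lines of Python. The sender keeps its unacknowledged "window of uncertainty" and, after reconnecting, asks the receiver which of those records it has already processed. Everything here is hypothetical: Rainer notes this is not yet done in RELP. The receiver's `processed` set stands in for the "processed messages" state information, which would itself have to be recorded reliably.

```python
from collections import OrderedDict


class Sender:
    def __init__(self):
        self.next_seq = 1
        # seq -> msg for records sent but not yet acknowledged
        # (the sliding "window of uncertainty")
        self.unacked = OrderedDict()

    def send(self, msg):
        seq = self.next_seq
        self.next_seq += 1
        self.unacked[seq] = msg
        return seq, msg

    def on_ack(self, seq):
        self.unacked.pop(seq, None)

    def resync(self, query_processed):
        """After reconnecting, drop records the receiver already processed
        and return the remaining (seq, msg) pairs for retransmission."""
        for seq in query_processed(list(self.unacked)):
            self.unacked.pop(seq, None)
        return list(self.unacked.items())


class Receiver:
    def __init__(self):
        # Grows without bound in this toy; a real version would only need
        # to cover the sender's window, and must survive restarts.
        self.processed = set()
        self.stored = []

    def receive(self, seq, msg):
        if seq not in self.processed:  # silently ignore duplicates
            self.stored.append(msg)
            self.processed.add(seq)

    def query(self, seqs):
        """Handshake step: report which of the given seqs were processed."""
        return {seq for seq in seqs if seq in self.processed}
```

In a scenario where the receiver processed records 1 and 2 but only the ack for 1 reached the sender, `resync()` retransmits only record 3. Without the query, it would resend 2 and 3, and record 2 would be duplicated.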
Rainer From rgerhards at hq.adiscon.com Sun May 3 11:13:36 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Sun, 3 May 2009 11:13:36 +0200 Subject: [rsyslog] output plugin calling interface References: <9B6E2A8877C38245BFB15CC491A11DA702B012@GRFEXC.intern.adiscon.com> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702B014@GRFEXC.intern.adiscon.com> > -----Original Message----- > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > bounces at lists.adiscon.com] On Behalf Of david at lang.hm > Sent: Sunday, May 03, 2009 2:42 AM > To: rsyslog-users > Subject: Re: [rsyslog] output plugin calling interface > > On Sat, 2 May 2009, Rainer Gerhards wrote: > > > After a lot of thinking today, we can have a "kind of" transactional queue, > > but we need to accept potential message *duplication* in the event of > > failures (but no loss). > > this is the approach that you have taken for other things (relp for > example), and when we were discussing reliability for direct mode vs disk > queues you mentioned that rsyslog could duplicate messages in case of > failures, but would not lose messages. I think I always mentioned that the currently processed message is at risk. We are drawing a very fine line here, I think, because in case of a fatal failure, we always end up with some uncertainty. Just think about the fact that all sources besides RELP are much less reliable than the engine itself. So, compared to the environment in which rsyslog is intended to work, I think it is far more reliable than the rest of that system. So I am somewhat hesitant to put *a lot* of effort into a small part of reliability which, looking at the overall picture, cannot really be utilized. Let me reiterate that we are still talking about a total failure situation. So if the whole data center loses power, it is quite random what we receive. It is also quite random whether the receiver manages to enqueue the message, if it receives it at all.
It is uncertain whether the disk subsystem (at the controller level) manages to complete the disk transaction. Let's assume the data center's power subsystem emits a failure message while it dies. Obviously, this one would be especially important to save. To do so, we must be lucky enough that

a) the power system can emit a valid message
b) network components have power long enough to send the message to rsyslogd
c) the rsyslog input runs long enough to actually receive the message (think OS reception queue)
d) the rsyslog input can parse the message
e) the parsed message can be handed over to the queue subsystem
f) the queue subsystem can commit the message to disk

All of this needs to happen to ensure the message is saved. I'd say there is a lot of potential to lose a message along that path. But now let's assume this all somehow works. Then this happens:

g) the queue subsystem dequeues the message from disk (now it is in danger again)
h) the message is run through the filter engine
i) the action is carried out (assuming a direct queue)

The probability of message loss now depends mostly on i). If that is a quick action (like writing to disk), the probability is very low. If it is an action that connects to the network (e.g. database, forwarding), I'd say the probability tends to 1. So, yes, in this case the message will most probably be lost. However, the relative probability of loss depends on the probability that a) to f) succeed, which I consider to be very low. And it also depends on a number of other factors. E.g. the OS, if it is notified of the power failure, will try to ensure system consistency as much as possible and thus will probably not return to user space during emergency processing, reducing the probability that step c) succeeds towards 0. The question now, IMHO, is how important it is to address this very limited potential for message loss.
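The argument about steps a) through f) is essentially the chain rule of probability. Suppose each number is the probability that a step succeeds, given that all earlier steps succeeded. Then the chance that the power-failure message is saved is the product of those numbers, which is smaller than any single step. Here is a small Python sketch. The per-step probabilities are invented purely for illustration and are not measurements.

```python
from math import prod

# Hypothetical conditional success probabilities for steps a) to f)
# during a total power failure (illustrative values only).
steps = {
    "a) power system emits a valid message": 0.9,
    "b) network stays up long enough to send it": 0.7,
    "c) rsyslog input receives it": 0.5,
    "d) input parses it": 0.99,
    "e) message is handed to the queue": 0.95,
    "f) queue commits it to disk": 0.6,
}


def chain_success(probabilities):
    """Probability that every step succeeds (chain rule of probability)."""
    return prod(probabilities)


p_saved = chain_success(steps.values())
print(f"P(message saved) = {p_saved:.3f}")
```

Because the factors are conditional probabilities, no independence assumption is needed. The hard part is estimating them at all during a real outage.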
But I agree that with batches, the magnitude of the problem increases, and so the additional relative probability of message loss may increase enough to justify looking at the issue. > > > This would work without a two-phase commit. However, > > there still is considerable effort to implement it. > > as I understand things the current process is > > thread A receives the message and puts it in the Queue > > worker thread B pulls the message from the queue formats it and puts it in > the action queue (if there is no action queue, this triggers the output > module code as part of thread B.) > > if there is an action queue, thread C is running, and does basically the > same thing that thread B would do if there was no action queue > Right! > > what I am envisioning is that the worker thread would touch the queue one > additional time. > > instead of removing the message from the queue to perform the action it > would mark the message as being 'in process', then after the message is > delivered it would delete it from the queue (touching the queue three times > instead of two) > Yes, full ack. I am thinking along these lines, too, and it is good to hear that someone has arrived at the same idea independently. However, while this sounds very simple, there are a lot of subtle issues. Just think about the different queue modes. Especially the transition from memory-only to hybrid mode (and hybrid mode in general) for DA queues brings a lot of potential trouble spots. There is also a performance price to pay for the additional reliability we get. > > > I wonder if the use case > > actually justifies it. Please also consider what I wrote below on the > > performance of any ultra-reliable version. And, yes, I know we have fast and > > reliable controllers today, but even then the disk path is much, much slower > > than any memory based queue. I fail to believe you can build a very > > high-performance syslog server on a disk queue, even with the best hardware > > money can buy today.
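David's proposal is to mark a message 'in process' on dequeue and delete it only after delivery, so the queue is touched three times instead of two. This trades possible duplication for no loss, which is the property Rainer describes. Below is a toy, in-memory Python sketch. The class and method names are made up. A real disk queue would also have to persist and sync the mark, and that is where the performance cost comes from.

```python
class MarkingQueue:
    """Toy queue where dequeue marks an entry instead of removing it."""

    def __init__(self):
        # entry id -> [msg, in_process flag]; dicts keep insertion order
        self._entries = {}
        self._next_id = 0

    def enqueue(self, msg):
        """Touch 1: add the message."""
        self._entries[self._next_id] = [msg, False]
        self._next_id += 1

    def dequeue(self):
        """Touch 2: mark the oldest unmarked message 'in process'."""
        for entry_id, entry in self._entries.items():
            if not entry[1]:
                entry[1] = True
                return entry_id, entry[0]
        return None

    def commit(self, entry_id):
        """Touch 3: delete the message once it has been delivered."""
        del self._entries[entry_id]

    def recover(self):
        """On restart, anything still marked in process is made available
        again, so it may be delivered twice but is never lost."""
        for entry in self._entries.values():
            entry[1] = False
```

The linear scan in `dequeue()` keeps the sketch short. A real queue would track the next unmarked position instead.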
> > I'm going to be testing this shortly ;-) I have a fusion IO drive to try > and will be getting some boxes with the Intel X-25E SSD drives in a couple > of weeks. the only thing I can't try is the ram-based drive. That would be very good to know. So far, I have to admit, I fail to convince myself that a disk-only configuration can be used for a high-volume system. If that is indeed the case (let's assume so for a moment), we would need to run any such system with at least a DA queue, that is, one that relies on messages being held in memory for at least part of their lifetime. If that is true, all the discussion about relative loss probabilities is irrelevant, because if we have n messages exclusively in an in-memory queue and we have a sudden power loss, we surely lose all n of them. So I conclude that any motivation to try to prevent even the slightest loss (in case of a total power loss; then and only then does my opinion apply) depends on the ability to run a high-volume system in pure disk mode. If that's possible, I agree that preventing the loss is useful. If you cannot use pure disk mode, you cannot totally prevent loss, and there is no point in trying to minimize an effect in a situation that never occurs. I have to give a training course next Monday and Tuesday, so I may not be as responsive. But I'll continue to think about the whole issue today. Feedback on the "high-volume disk only" case would be most welcome. Actually, I'd really love to be proven wrong (not only by actual hardware results [which vary over the years], but by an error in my argument), so please do. If you can, we can probably build a much more reliable system than I envisioned. So far, my reliability picture does not include disaster cases, of which, of course, power failure is just the mildest facet.
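Rainer's point about disk-assisted (DA) queues can be made concrete with a deliberately simplified model. Messages stay in memory until a high watermark is reached, and only then go to disk. A sudden power loss therefore destroys exactly the in-memory part. This Python sketch is hypothetical and does not model rsyslog's actual DA implementation (low watermark, draining, worker threads). It only shows that loss drops to zero when nothing is held in memory, i.e. in pure disk mode.

```python
from collections import deque


class ToyDAQueue:
    """Memory queue that overflows into a simulated durable disk store."""

    def __init__(self, high_watermark):
        self.high_watermark = high_watermark
        self.memory = deque()  # volatile: lost on power failure
        self.disk = deque()    # assumed durable (synced on every write)

    def enqueue(self, msg):
        if len(self.memory) < self.high_watermark:
            self.memory.append(msg)
        else:
            self.disk.append(msg)

    def power_loss(self):
        """Simulate a sudden power failure; return how many messages were lost."""
        lost = len(self.memory)
        self.memory.clear()
        return lost
```

With `high_watermark=0`, every message goes straight to the (assumed synced) disk store and nothing is lost. That is the pure disk mode whose performance is the open question in this thread.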
Rainer > > David Lang > > > Rainer > > > >> -----Original Message----- > >> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > >> bounces at lists.adiscon.com] On Behalf Of Rainer Gerhards > >> Sent: Saturday, May 02, 2009 10:33 AM > >> To: rsyslog-users > >> Subject: Re: [rsyslog] output plugin calling interface > >> > >>> -----Original Message----- > >>> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > >>> bounces at lists.adiscon.com] On Behalf Of david at lang.hm > >>> Sent: Saturday, May 02, 2009 10:21 AM > >>> To: rsyslog-users > >>> Subject: Re: [rsyslog] output plugin calling interface > >>> > >>> On Sat, 2 May 2009, Rainer Gerhards wrote: > >>> > >>>>> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > >>>>> bounces at lists.adiscon.com] On Behalf Of david at lang.hm > >>>>> > >>>>> On Fri, 1 May 2009, Rainer Gerhards wrote: > >>>>> > >>>>>>> -----Original Message----- > >>>>>>> From: rsyslog-bounces at lists.adiscon.com > >>>>>>> [mailto:rsyslog-bounces at lists.adiscon.com] On Behalf Of > > david at lang.hm > >>>>>>> > >>>>>>> On Fri, 1 May 2009, Rainer Gerhards wrote: > >>>>>>> > >>>>>>>> Please let me know if you also find a math model useful > >>>>>>> (but I'll probably > >>>>>>>> need to do it in any case, because it helps me clean up my > > mind...). > >>>>>>> > >>>>>>> I think it will help clarify things a lot. with a good model > >>>>>>> we won't have > >>>>>>> misunderstandings about what we are talking about. > >>>>>> > >>>>>> Yes - and I also think that with the model some complexities > > disappear. > >> I > >>>>>> think (hope I am right) the solution will become obvious. I know I am > >>>>>> investing a lot of time in a tiny portion of the code, but this is > > one > >> of > >>>>> the > >>>>>> core elements involving many complexities. 
> >>>>>> > >>>>>>> with my 'binary search' approach, handling permanently bad > >>>>>>> messages could > >>>>>>> be as simple as 'too many retries once we hit a batch size of > >>>>>>> 1' (with a > >>>>>>> possible option of the output module reporting back that it > > detected > >>>>>>> something that makes retries useless, but this is just an > >>>>>>> optimization) > >>>>>> > >>>>>> Yes, indeed. One quick thought: I see a batch as a set of (msg, > > state) > >>>>>> ordered pairs. Once we have processed it in one action (all of them > > have > >>>>>> entered one permanent state), we can then build a subset that we use > > as > >>>> the > >>>>>> new (remaining) batch in the backup actions. So the "bad record > > search" > >>>> is > >>>>>> "just" one facet of many that we need to handle with little and > >> hopefully > >>>>>> simple code (doing it with 2000 LoC would be rather easy ;)). > >>>>> > >>>>> I agree with the definition of a batch. Let's see what different > > states > >>>>> you are thinking of. > >>>>> > >>>>> I am currently assuming that the messages stay in the queue (with the > >>>>> state attached) so that if rsyslog restarts (assuming disk queues), it > >>>>> will realize that the message hasn't been delivered and try again. > >>>> > >>>> No, it is different: the batch is actually dequeued. So if at that > > point > >> we > >>>> have a system power failure (for whatever reason), the messages are > > lost. > >>>> While the rsyslog engine intends to be very reliable, it is not a > >> complete > >>>> transactional system. A slight risk remains. For this, you need to > >>> understand > >>>> what happens when the batch is processed. I assume that we have no > >> sudden, > >>>> untrappable process termination. Then, if a batch cannot be processed, > > it > >> is > >>>> returned to the top of the queue.
This is not yet implemented, but is > >> how > >>>> single messages (which you can think of as an abstraction of a batch in > > the > >>>> current code) are handled. If, for example, the engine shuts down, but > > an > >>>> action takes longer than the configured shutdown timeout, the action is > >>>> cancelled and the queue engine reclaims the unprocessed messages. They > > go > >>>> into a special area inside the .qi file and are placed on top of the > >> queue > >>>> once the engine restarts. > >>>> > >>>> The only case where this does not work is sudden process termination. I see > >> two > >>>> cases: > >>>> > >>>> a) a fatal software bug > >>>> We cannot really address this. Even if the messages were remaining in > > the > >>>> queue until finally processed, a software bug (maybe an invalid > > pointer) > >> may > >>>> affect the queue structures at large, possibly even at the risk of > > total > >>> loss > >>>> of all data inside that queue. So this is an inevitable risk. > >>>> > >>>> b) sudden power failure > >>>> ... which can and should be mitigated at another level > >>>> > >>>> One may argue that there also is > >>>> > >>>> c) admin error > >>>> e.g., kill -9 rsyslogd > >>>> Here a fully transactional queue will probably help. > >>>> > >>>> However, I do not think that the risk involved justifies a far more > >> complex > >>>> fully transactional implementation of the queue object. Some risk > > always > >>>> remains (what about the disaster case, even with a fully transactional > >> queue?). > >>>> > >>>> And it is so complex to let the messages stay in the queue because it is > >> complex > >>>> to work with such messages and disk queues. It would also cost a lot of > >>>> performance, especially when done reliably (need to sync). We would > > then > >>> need > >>>> to touch each element at least four times, twice as much as currently. > >> Also, > >>>> the hybrid disk/memory queues become very, very complex.
There are more > >>>> complexities around this, I just wanted to mention the most obvious ones. > >>>> > >>>> So, all in all, the idea is that messages are dequeued, processed and > > put > >>>> back to the queue (think: ungetc()) when something goes wrong. > > Reasonable > >>>> (but not more) effort is made to prevent message loss while the > > messages > >> are > >>>> in unprocessed state outside of the queue. > >>>> > >>>> Hope that clarifies and I am glad you brought this up. Made me think > >> again, > >>>> but I arrived at what I've written above ;) > >>> > >>> this is definitely different from the way I thought things worked from > > our > >>> prior discussions about reliability. from those I understood that rsyslog > >>> could be used to make a fully reliable system, if you are willing to take > >>> the performance hit to do so. > >> > >> You can, but then you need to use batch sizes of 1. > >> > >>> as batch size increases (to gain efficiency) the number of log messages > >>> that can be lost also increases. > >>> > >>> unfortunately I have the belief that power outages cannot be avoided > > (I've > >>> seen cases where millions have been spent on the power systems and still > >>> ended up with a datacenter-wide blackout). > >> > >> Let me think about this, but I think to protect against this problem, you > >> really need to have two-phase commit, which I am not sure belongs in a > >> syslogd. > >> > >>> when you get the model of things together we will be in a much better > >>> position to discuss this. > >> > >> Well, we'd probably restart discussing reliability requirements. If it > > turns > >> out that you need 100% reliability, no matter what happens at all, I am > > not > >> sure if we can implement this without adding considerable database-ish > >> processing. "Under all circumstances" reliability is very hard to achieve, > >> especially if you also would like to have high performance.
Think about it: > >> to guard against the full data center power loss scenario, you need to have > > a > >> disk-only queue, being synced to disk for every single en- and dequeue > >> operation. This is extremely costly. Does it then really matter if we have > >> large batches or not? The system, I think, will be so slow that you cannot > >> use it for any demanding real-life application, so some compromise between > >> speed and reliability, I think, must be made in any case. > >> > >>> it's 1:20am here and I'm ready to collapse. > >> > >> I hadn't even expected this response at this time ;) > >> > >> Rainer > >>> > >>> David Lang > >>> _______________________________________________ > >>> rsyslog mailing list > >>> http://lists.adiscon.net/mailman/listinfo/rsyslog > >>> http://www.rsyslog.com From tmetro+rsyslog at gmail.com Sun May 3 23:05:50 2009 From: tmetro+rsyslog at gmail.com (Tom Metro) Date: Sun, 03 May 2009 17:05:50 -0400 Subject: [rsyslog] output plugin calling interface In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702B014@GRFEXC.intern.adiscon.com> References: <9B6E2A8877C38245BFB15CC491A11DA702B012@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702B014@GRFEXC.intern.adiscon.com> Message-ID: <49FE072E.1070705@gmail.com> Rainer Gerhards wrote: > Just think about the fact that all sources besides RELP are much more > unreliable than the engine itself.
So, compared to the environment in > which rsyslog is intended to work, I think it is far more reliable > than the whole rest of said system. Isn't the entry point into syslog also unreliable, such that if the daemon exits for any reason, messages get dropped on the floor? If so, all the signal handlers and other steps - short of, say, sticking a queue buffer into the kernel - will only get you so far. > So I am somewhat hesitant to put *a lot* of effort into a small part > of reliability which can, if looked at the overall picture, not > really be utilized. I think you're on the right track to take an agile, incremental approach of improving the reliability of the weakest links. -Tom From tmetro+rsyslog at gmail.com Sun May 3 23:24:00 2009 From: tmetro+rsyslog at gmail.com (Tom Metro) Date: Sun, 03 May 2009 17:24:00 -0400 Subject: [rsyslog] output plugin calling interface In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702B013@GRFEXC.intern.adiscon.com> References: <9B6E2A8877C38245BFB15CC491A11DA702B012@GRFEXC.intern.adiscon.com><49FD21F7.50309@gmail.com> <9B6E2A8877C38245BFB15CC491A11DA702B013@GRFEXC.intern.adiscon.com> Message-ID: <49FE0B70.9010009@gmail.com> Rainer Gerhards wrote: > But there are two things mixed in here: one is the reliable transport, the > other one is end-to-end reliability. For example, RELP cannot check if "the > messages are already stored" because we have no universal predicate "is > stored"... I assumed it followed a model of conveyed responsibility, just like SMTP. Once the receiver has acknowledged receipt, it needs to take full responsibility for storage. If the receiver wants to cut corners on the reliability of its internals, then it should delay acks until it has confirmed successful storage. > So even if we put everything into a database, RELP cannot rely on > that information to decide which messages have already been received > and which have not. I'm confused.
On one side a receiver is talking RELP, and via RELP it receives a batch of messages, potentially containing duplicates. On the other side of that receiver is its storage back-end. If the receiver chooses, it ought to be able to query that storage to see if any of the messages are duplicates, and if so, discard them. This doesn't involve RELP. (I described an in-memory cache for efficiency reasons, but the duplicate check could involve querying a database.) > Assuming that we had a "processed messages" state information, on > connection re-establish, during the handshake process, sender can > query receiver on the state of potential duplicates and remove them. I assumed the de-dupe intelligence would be on the receiver side. Sender throws messages over the wall at the receiver, and it sorts things out. > What I would find useful is a unique message ID that is created at the > original originator and moved forward until whatever final destination. The > approach here is to enable analysis tools to detect the duplicates. Sure, that could be a good approach. For the "cost" of a cryptographic hash - probably computed right after the timestamp is added to the message - you'd push the duplicate filtering problem to the post-processing code. It would be interesting to do a benchmark comparison between the up-front hash computation vs. all the overhead of adding a serial number, caching seen record IDs, and dedupe logic.
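A rough sketch of the comparison Tom suggests, in Python. This is purely illustrative and not rsyslog or RELP code: content_id() assumes the originator hashes the timestamp plus message text with SHA-256, while SerialDeduper assumes each sender attaches its own incrementing serial number; both receivers simply cache every ID they have seen. The bench() helper only gives a rough, machine-dependent hint of relative cost, not a real benchmark.

```python
import hashlib
import time


def content_id(timestamp: str, text: str) -> str:
    """Hypothetical ID: SHA-256 over timestamp and text, computed once at
    the originator right after the timestamp has been added."""
    data = f"{timestamp}\x00{text}".encode("utf-8")
    return hashlib.sha256(data).hexdigest()


class HashDeduper:
    """Receiver side: drop any message whose content ID was seen before."""

    def __init__(self) -> None:
        self.seen: set[str] = set()

    def accept(self, msg_id: str) -> bool:
        if msg_id in self.seen:
            return False
        self.seen.add(msg_id)
        return True


class SerialDeduper:
    """Receiver side: drop any (sender, serial) pair seen before.
    Assumes each sender numbers its own messages (hypothetical scheme)."""

    def __init__(self) -> None:
        self.seen: set[tuple[str, int]] = set()

    def accept(self, sender: str, serial: int) -> bool:
        key = (sender, serial)
        if key in self.seen:
            return False
        self.seen.add(key)
        return True


def bench(n: int = 100_000) -> dict[str, float]:
    """Rough wall-clock timing of both schemes; every message arrives twice.
    The hash cost is included here even though the sender would pay it."""
    msgs = [(f"2009-05-03T17:24:00.{i:06d}", f"msg {i}") for i in range(n)]

    start = time.perf_counter()
    hashes = HashDeduper()
    for ts, text in msgs + msgs:  # second pass simulates a retransmission
        hashes.accept(content_id(ts, text))
    hash_secs = time.perf_counter() - start

    start = time.perf_counter()
    serials = SerialDeduper()
    for serial in list(range(n)) * 2:
        serials.accept("host-a", serial)
    serial_secs = time.perf_counter() - start

    return {"hash": hash_secs, "serial": serial_secs}
```

Calling bench() returns the two timings in seconds. Both deduplicators keep every ID forever, so a real receiver would also need some way to expire old entries.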
-Tom From david at lang.hm Mon May 4 04:32:28 2009 From: david at lang.hm (david at lang.hm) Date: Sun, 3 May 2009 19:32:28 -0700 (PDT) Subject: [rsyslog] output plugin calling interface In-Reply-To: <49FE0B70.9010009@gmail.com> References: <9B6E2A8877C38245BFB15CC491A11DA702B012@GRFEXC.intern.adiscon.com><49FD21F7.50309@gmail.com> <9B6E2A8877C38245BFB15CC491A11DA702B013@GRFEXC.intern.adiscon.com> <49FE0B70.9010009@gmail.com> Message-ID: On Sun, 3 May 2009, Tom Metro wrote: > Rainer Gerhards wrote: >> But there are two things mixed in here: one is the reliable transport, the >> other one is end-to-end reliability. For example, RELP cannot check if "the >> messages are already stored" because we have no universal predicate "is >> stored"... > > I assumed it followed a model of conveyed responsibility, just like > SMTP. Once the receiver has acknowledged receipt, it needs to take full > responsibility for storage. If the receiver wants to cut corners on the > reliability of its internals, then it should delay acks until it has > confirmed successful storage. > > >> So even if we put everything into a database, RELP cannot rely on >> that information to decide which messages have already been received >> and which have not. > > I'm confused. On one side a receiver is talking RELP, and via RELP it > receives a batch of messages, potentially containing duplicates. On the > other side of that receiver is its storage back-end. If the receiver > chooses, it ought to be able to query that storage to see if any of the > messages are duplicates, and if so, discard them. This doesn't involve > RELP. (I described an in-memory cache for efficiency reasons, but the > duplicate check could involve querying a database.) it's not the right thing to just eliminate duplicate messages. you may get the same message multiple times (with the same timestamp even). the only way to know if you have seen _this copy_ of the message before is to have a unique identifier for the message.
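David's objection can be made concrete with a small, hypothetical Python example: two genuine events that share both text and a second-resolution timestamp produce the same content hash, so content-based de-duplication would silently discard one of them. A key that also covers a sender-assigned sequence number (an assumed field for illustration, not an existing syslog one) keeps both.

```python
import hashlib


def content_key(timestamp: str, text: str) -> str:
    # Content-only key: identical events collapse to the same value.
    return hashlib.sha256(f"{timestamp}\x00{text}".encode()).hexdigest()


def copy_key(sender: str, seq: int, timestamp: str, text: str) -> str:
    # Hypothetical per-copy key: the sender's own counter makes every
    # emitted message distinct, even when text and timestamp repeat.
    data = f"{sender}\x00{seq}\x00{timestamp}\x00{text}".encode()
    return hashlib.sha256(data).hexdigest()


def count_unique(keys: list[str]) -> int:
    return len(set(keys))


# Two genuine login failures logged within the same one-second tick
events = [
    ("host-a", 1, "May  3 19:32:28", "sshd: Failed password for root"),
    ("host-a", 2, "May  3 19:32:28", "sshd: Failed password for root"),
]
by_content = [content_key(ts, text) for _, _, ts, text in events]
by_copy = [copy_key(s, n, ts, text) for s, n, ts, text in events]
print(count_unique(by_content), count_unique(by_copy))  # prints: 1 2
```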
this unique identifier may not be something that's appropriate to store (if it wasn't generated by the original sender, you may not want to pass it on to the software that would be analysing the logs) David Lang > >> Assuming that we had a "processed messages" state information, on >> connection re-establish, during the handshake process, sender can >> query receiver on the state of potential duplicates and remove them. > > I assumed the de-dupe intelligence would be on the receiver side. Sender > throws messages over the wall at the receiver, and it sorts things out. > > >> What I would find useful is a unique message ID that is created at the >> original originator and moved forward until whatever final destination. The >> approach here is to enable analysis tools to detect the duplicates. > > Sure, that could be a good approach. For the "cost" of a cryptographic > hash - probably computed right after the timestamp is added to the > message - you'd push the duplicate filtering problem to the > post-processing code. > > It would be interesting to do a benchmark comparison between the > up-front hash computation vs. all the overhead of adding a serial > number, caching seen record IDs, and dedupe logic. > > -Tom From david at lang.hm Mon May 4 04:33:38 2009 From: david at lang.hm (david at lang.hm) Date: Sun, 3 May 2009 19:33:38 -0700 (PDT) Subject: [rsyslog] output plugin calling interface In-Reply-To: <49FE072E.1070705@gmail.com> References: <9B6E2A8877C38245BFB15CC491A11DA702B012@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702B014@GRFEXC.intern.adiscon.com> <49FE072E.1070705@gmail.com> Message-ID: On Sun, 3 May 2009, Tom Metro wrote: > Rainer Gerhards wrote: >> Just think about the fact that all sources besides RELP are much more >> unreliable than the engine itself.
So, compared to the environment in >> which rsyslog is intended to work, I think it is far more reliable >> than the whole rest of said system. > > Isn't the entry point into syslog also unreliable, such that if the > daemon exits for any reason, messages get dropped on the floor? If so, > all the signal handlers and other steps - short of, say, sticking a queue > buffer into the kernel - will only get you so far. it depends on what method you use to get the messages into rsyslog. if you make your app talk RELP you should be able to get complete end-to-end reliability. David Lang > >> So I am somewhat hesitant to put *a lot* of effort into a small part >> of reliability which can, if looked at the overall picture, not >> really be utilized. > > I think you're on the right track to take an agile, incremental approach > of improving the reliability of the weakest links. > > -Tom > From YungWei.Chen at resolvity.com Mon May 4 05:13:33 2009 From: YungWei.Chen at resolvity.com (YungWei.Chen) Date: Sun, 3 May 2009 23:13:33 -0400 Subject: [rsyslog] Building rsyslog from source code References: <000001c9ca34$670b84df$100013ac@intern.adiscon.com><795E60BBD9A86846BFEDF32029788B1BA74A6F@MI8NYCMAIL14.Mi8.com> Message-ID: <795E60BBD9A86846BFEDF32029788B1B082D93@MI8NYCMAIL14.Mi8.com> What I need from rsyslog is the following: * support TLS communication * support fail-over I am wondering if I have other options besides rsyslog. Thanks.
From tmetro+rsyslog at gmail.com Mon May 4 07:07:19 2009 From: tmetro+rsyslog at gmail.com (Tom Metro) Date: Mon, 04 May 2009 01:07:19 -0400 Subject: [rsyslog] output plugin calling interface In-Reply-To: References: <9B6E2A8877C38245BFB15CC491A11DA702B012@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702B014@GRFEXC.intern.adiscon.com> <49FE072E.1070705@gmail.com> Message-ID: <49FE7807.2080409@gmail.com> david at lang.hm wrote: > if you make your app talk RELP you should be able to get complete > end-to-end reliability. Makes sense. Is there a RELP client library that is a drop-in replacement for the equivalent syslog API? So presumably this RELP client library would incorporate a queue. Of course, unless you want the absence of logging to block your application, most of these reliability enhancements only buy you some time until the buffer fills up. -Tom From david at lang.hm Mon May 4 07:24:29 2009 From: david at lang.hm (david at lang.hm) Date: Sun, 3 May 2009 22:24:29 -0700 (PDT) Subject: [rsyslog] output plugin calling interface In-Reply-To: <49FE7807.2080409@gmail.com> References: <9B6E2A8877C38245BFB15CC491A11DA702B012@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702B014@GRFEXC.intern.adiscon.com> <49FE072E.1070705@gmail.com> <49FE7807.2080409@gmail.com> Message-ID: On Mon, 4 May 2009, Tom Metro wrote: > david at lang.hm wrote: >> if you make your app talk RELP you should be able to get complete >> end-to-end reliability. > > Makes sense. Is there a RELP client library that is a drop-in > replacement for the equivalent syslog API? not one that would be a drop-in replacement. > So presumably this RELP client library would incorporate a queue. Of > course, unless you want the absence of logging to block your application, > most of these reliability enhancements only buy you some time until the > buffer fills up.
not necessarily. with classic syslog, the application blocks until the syslog daemon completes the write of the log message to disk (including doing an fsync). if the syslog daemon can't handle the request, the application waits. the only syslogd that I know of that lets you avoid this blockage is the linux sysklogd (syslog-ng and rsyslog are replacements that add a _lot_ of other functionality as well as memory buffering). if you are using syslogd on any *nix platform other than linux, your apps have to wait for the disk for each log message. yes, this can cripple application performance (and frequently does). with rsyslog's modular structure it's also possible to write an input module that does application-level acknowledgements like relp does, that then feeds the message into the normal rsyslog mechanism. as long as the input module doesn't acknowledge receipt of the message before it's in the queue, this should result in a reliable end-to-end message delivery mechanism. I define this as: If the application successfully writes the log message, the message will not be lost, even in the face of infrastructure failures. If the application doesn't successfully write the log message (either can't contact rsyslog, or doesn't get an acknowledgement), the application should assume that the log message did not get through and respond accordingly. this could be to pause and retry, it could be to abort the action it was about to take, or something else (if the application needs to be _really_ paranoid, it will log that it's about to do the critical action, do the action, then log that it completed the action. that way if things go down in the middle you have a record that you were in that state and can do further investigation to see if that event actually occurred or not). In general I believe that duplicated messages are a far better situation than lost messages.
Applications can put unique IDs in a message if they are worried about duplicates (and filter them on the back-end), but they can't do anything about messages that disappear on them. David Lang From tmetro+rsyslog at gmail.com Mon May 4 07:30:45 2009 From: tmetro+rsyslog at gmail.com (Tom Metro) Date: Mon, 04 May 2009 01:30:45 -0400 Subject: [rsyslog] output plugin calling interface In-Reply-To: References: <9B6E2A8877C38245BFB15CC491A11DA702B012@GRFEXC.intern.adiscon.com><49FD21F7.50309@gmail.com> <9B6E2A8877C38245BFB15CC491A11DA702B013@GRFEXC.intern.adiscon.com> <49FE0B70.9010009@gmail.com> Message-ID: <49FE7D85.2080702@gmail.com> david at lang.hm wrote: > Tom Metro wrote: >> Rainer Gerhards wrote: >>> So even if we put everything into a database, RELP cannot rely on >>> that information to decide which messages have already been received >>> and which have not. >> I'm confused. On one side a receiver is talking RELP, and via RELP it >> receives a batch of messages, potentially containing duplicates. On the >> other side of that receiver is its storage back-end. If the receiver >> chooses, it ought to be able to query that storage to see if any of the >> messages are duplicates, and if so, discard them. This doesn't involve >> RELP. (I described an in-memory cache for efficiency reasons, but the >> duplicate check could involve querying a database.) > > it's not the right thing to just eliminate duplicate messages. you may get > the same message multiple times (with the same timestamp even). the only > way to know if you have seen _this copy_ of the message before is to have > a unique identifier for the message. Your point may be correct, but I'm not sure it has relevance to the material you quoted. The context of the above comments included Rainer saying, "RELP uses sequence numbers." So at least within the scope of a limited time window, the individual messages can be uniquely distinguished.
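The sequence-number idea can be sketched as a receiver-side sliding window, similar in spirit to TCP. This is a hypothetical Python illustration, not how RELP is actually implemented: SeqWindow assumes the sender numbers messages 1, 2, 3, ... within one session and that any retransmitted message is always among the most recent size sequence numbers, which keeps the receiver's memory bounded.

```python
class SeqWindow:
    """Hypothetical receiver-side duplicate filter for one session.

    Remembers only sequence numbers within `size` of the highest one seen,
    so memory stays bounded. Anything older than the window is treated as
    a duplicate, which is only safe if the sender never retransmits
    messages that far back (an assumption of this sketch).
    """

    def __init__(self, size: int = 128) -> None:
        self.size = size
        self.highest = 0  # highest sequence number accepted so far
        self.recent: set[int] = set()

    def accept(self, seq: int) -> bool:
        """Return True if `seq` is new and should be processed."""
        if seq <= self.highest - self.size:
            return False  # too old: assume it was already handled
        if seq in self.recent:
            return False  # retransmitted duplicate
        self.recent.add(seq)
        if seq > self.highest:
            self.highest = seq
            floor = self.highest - self.size
            # Slide the window: forget numbers that fell off the low end.
            self.recent = {s for s in self.recent if s > floor}
        return True


w = SeqWindow(size=4)
# The session breaks mid-batch, so 3 and 4 are resent after reconnecting.
arrivals = [1, 2, 3, 4, 3, 4, 5, 6]
print([s for s in arrivals if w.accept(s)])  # prints: [1, 2, 3, 4, 5, 6]
```

The trade-off is the window size: a larger window tolerates later retransmissions at the cost of remembering more numbers per session.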
> this unique identifier may not be something that's appropriate to store > (if it wasn't generated by the original sender, you may not want to pass > it on to the software that would be analysing the logs) Right. So for example, there might not be much sense in persistently storing a time-limited sequence number. But that didn't seem to be the point Rainer was making with regard to using a database back-end. A key comment he made was, "we have no universal predicate 'is stored'." And I was wondering why such functionality is required in order to avoid duplicates. > you may get the same message multiple times (with the same timestamp > even). Is that true even with a high-res time stamp? I suppose that's relative to the resolution of your time stamp and your message throughput. To ensure a hash of a message is unique, you'd probably have to include a sequence number in the data being hashed, in addition to the time stamp. Actually, timestamp + sequence number ought to provide a sufficiently unique ID for any message within a "conversation." The hash is probably of value only for obtaining something smaller to store or faster to look up (on the receiving side). -Tom From rgerhards at hq.adiscon.com Mon May 4 08:00:09 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Mon, 4 May 2009 08:00:09 +0200 Subject: [rsyslog] output plugin calling interface References: <9B6E2A8877C38245BFB15CC491A11DA702B012@GRFEXC.intern.adiscon.com><49FD21F7.50309@gmail.com> <9B6E2A8877C38245BFB15CC491A11DA702B013@GRFEXC.intern.adiscon.com> <49FE0B70.9010009@gmail.com> <49FE7D85.2080702@gmail.com> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702B015@GRFEXC.intern.adiscon.com> Just one quick response, as I am preparing for the training I am giving...
> -----Original Message----- > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > bounces at lists.adiscon.com] On Behalf Of Tom Metro > Sent: Monday, May 04, 2009 7:31 AM > To: rsyslog-users > Subject: Re: [rsyslog] output plugin calling interface > > david at lang.hm wrote: > > Tom Metro wrote: > >> Rainer Gerhards wrote: > >>> So even if we put everything into a database, RELP cannot rely on > >>> that information to decide which messages have already been received > >>> and which have not. > >> I'm confused. On one side a receiver is talking RELP, and via RELP it > >> receives a batch of messages, potentially containing duplicates. On the > >> other side of that receiver is its storage back-end. If the receiver > >> chooses, it ought to be able to query that storage to see if any of the > >> messages are duplicates, and if so, discard them. This doesn't involve > >> RELP. (I described an in-memory cache for efficiency reasons, but the > >> duplicate check could involve querying a database.) > > > > it's not the right thing to just eliminate duplicate messages. you may get > > the same message multiple times (with the same timestamp even). the only > > way to know if you have seen _this copy_ of the message before is to have > > a unique identifier for the message. > > Your point may be correct, but I'm not sure it has relevance to the > material you quoted. The context of the above comments included Rainer > saying, "RELP uses sequence numbers." So at least within the scope of a > limited time window, the individual messages can be uniquely distinguished. It's not a time window, it's a sliding window (much like, e.g., TCP uses) that reflects the flow of messages. But at this point in the discussion, there is not much difference between the two. The problem that surfaces in the discussion, I think, is that you are not properly considering the different layers.
While rsyslogd is a single application, it is internally store-and-forward and as such mimics the infrastructure syslog uses in general. So think of shuffling messages from the input to the main queue as one complete "transaction". Shuffling from the main to the action queue is another one. Executing the action is the next one, all within a single process space. However, you can easily extend that view to remote peers. Within relp, that is (omrelp -> network -> imrelp), we have sequence numbers. But they are valid (and even exist) only in that context. > > this unique identifier may not be something that's appropriate to store > > (if it wasn't generated by the original sender, you may not want to pass > > it on to the software that would be analysing the logs) > > Right. So for example, there might not be much sense in persistently > storing a time-limited sequence number. But that didn't seem to be the > point Rainer was making with regard to using a database back-end. A key > comment he made was, "we have no universal predicate 'is stored'." And I > was wondering why such functionality is required in order to avoid > duplicates. Think about the store-and-forward system. An analogy: can a mail client provide a reliable delivery notification? No, because it does not deliver the message. Another entity does that. So, the ultimate destination may generate a delivery report and it may send it back to you. But that's not part of the original mail transaction but rather a new one. So the original mail client does not have a predicate "is delivered". In the same sense, rsyslog does not have a predicate "is stored". An input, imrelp for example, does not even know if a message will eventually be written to a database. Much less does it know how to use such a database (assuming it exists) to obtain knowledge about what has been transmitted so far and what has not. What, for example, if the potentially-duplicate message is one that has been discarded by the rule engine?
So using any outcome of an action - two logical hops away - as state information for an input is unreliable and IMHO as such unacceptable. So imrelp, if it intends to filter out duplicates, must keep the state itself. That, indeed, is what it is designed to do, but this has not yet been implemented. My overall position here is that rsyslog today is much more reliable than the whole rest of the syslog infrastructure, so there is no point in getting a tiny bit more reliability here where it cannot really be of help (an answer to the "pure disk queue and high-volume systems" question may change my position; this is why I don't intend to explain that point any further until I get David's results). > > you may get the same message multiple times (with the same timestamp > > even). > > Is that true even with a high-res time stamp? I suppose that's relative > to the resolution of your time stamp and your message throughput. According to the relevant standards, it is microsecond resolution at best. But that depends on the resolution of the time source. I do not consider a timestamp to be necessarily unique. > To ensure a hash of a message is unique, you'd probably have to include > a sequence number in the data being hashed, in addition to the time > stamp. Actually, timestamp + sequence number ought to provide a > sufficiently unique ID for any message within a "conversation." The hash > is probably of value only for obtaining something smaller to store or > faster to look up (on the receiving side). I think this is an old discussion and the only real solution is a uuid. I don't see any point in re-inventing it (but generating uuids takes time and using them inside the syslog context requires standards). Hope that helps, please continue to let the thoughts flow...
Rainer > > -Tom > > > From rgerhards at hq.adiscon.com Mon May 4 08:11:49 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Mon, 4 May 2009 08:11:49 +0200 Subject: [rsyslog] output plugin calling interface References: <9B6E2A8877C38245BFB15CC491A11DA702B012@GRFEXC.intern.adiscon.com><49FD21F7.50309@gmail.com> <9B6E2A8877C38245BFB15CC491A11DA702B013@GRFEXC.intern.adiscon.com> <49FE0B70.9010009@gmail.com> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702B016@GRFEXC.intern.adiscon.com> One more ;) > -----Original Message----- > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > bounces at lists.adiscon.com] On Behalf Of Tom Metro > Sent: Sunday, May 03, 2009 11:24 PM > To: rsyslog-users > Subject: Re: [rsyslog] output plugin calling interface > > Rainer Gerhards wrote: > > But there are two things mixed in here: one is the reliable transport, the > > other one is end-to-end reliability. For example, RELP cannot check if "the > > messages are already stored" because we have no universal predicate "is > > stored"... > > I assumed it followed a model of conveyed responsibility, just like > SMTP. Once the receiver has acknowledged receipt, it needs to take full > responsibility for storage. If the receiver wants to cut corners on the > reliability of its internals, then it should delay acks until it has > confirmed successful storage. It does; it is reliable as described in the other posts. > > > > So even if we put everything into a database, RELP cannot rely on > > that information to decide which messages have already been received > > and which have not. > > I'm confused. On one side a receiver is talking RELP, and via RELP it > receives a batch of messages, potentially containing duplicates. On the > other side of that receiver is its storage back-end. No, that's the key point.
There is no storage back-end with RELP yet. You are thinking of a "foreign" storage backend, which (in case of a relay) may not even exist - see the other posting. > If the receiver > chooses, it ought to be able to query that storage to see if any of the > messages are duplicates, and if so, discard them. No - and if done on the output layer, it would put a lot of burden onto the outputs. Definitely the wrong place to do it. > This doesn't involve > RELP. (I described an in-memory cache for efficiency reasons, but the > duplicate check could involve querying a database.) > > > > Assuming that we had a "processed messages" state information, on > > connection re-establish, during the handshake process, sender can > > query receiver on the state of potential duplicates and remove them. > > I assumed the de-dupe intelligence would be on the receiver side. Sender > throws messages over the wall at the receiver, and it sorts things out. That requires more bandwidth than necessary. Why do it if an exchange of sequence numbers is sufficient? > > > > What I would find useful is a unique message ID that is created at the > > original originator and moved forward until whatever final destination. The > > approach here is to enable analysis tools to detect the duplicates. > > Sure, that could be a good approach. For the "cost" of a cryptographic > hash - probably computed right after the timestamp is added to the > message - you'd push the duplicate filtering problem to the > post-processing code. > > It would be interesting to do a benchmark comparison between the > up-front hash computation vs. all the overhead of adding a serial > number, caching seen record IDs, and dedupe logic. RELP already does all of this; it just does not persist any state information (plus some other things I can't recall offhand). There are a number of subtle issues - which I simply cannot explain right now - that make such a sequence number necessary.
If you look at how other protocols are implemented, you'll see that this is at least the mainstream approach (and I don't think I am overstating things if I say it is probably the only one that works reliably without violating abstraction layers). Rainer > > -Tom From tmetro+rsyslog at gmail.com Mon May 4 09:47:11 2009 From: tmetro+rsyslog at gmail.com (Tom Metro) Date: Mon, 04 May 2009 03:47:11 -0400 Subject: [rsyslog] desktop notifications from syslog In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702AFF2@GRFEXC.intern.adiscon.com> References: <49F8B2F7.9070201@gmail.com> <49F8C67A.7060103@lists.bod.org> <49F8FB18.8030000@gmail.com> <9B6E2A8877C38245BFB15CC491A11DA702AFF2@GRFEXC.intern.adiscon.com> Message-ID: <49FE9D7F.5060706@gmail.com> Rainer Gerhards wrote: > I really like the idea...and think that it is also the solution > to some other things I have on my mind (like a way to get rsyslogd internals > via the GUI, that would at least for debugging and probably for a couple of > other things be useful). ... > What I have on my mind is a kind of interactive interface, where, in > real-time, you can see things like queue saturation, modules loaded, maybe > generate test events and so on. That's my debug case. Of course, some > interactive features may be interesting for regular end user Are those kinds of internals accessible from an output plugin? If not, then either the "subscribe to events via DBus" functionality would need to be implemented in the core code, or these might be two independent projects. I see you have an SNMP interface. I wonder if that has some common ground with the functionality you're describing. Perhaps the SNMP interface could be layered on the DBus interface, or both could use a common middleware with two different front-ends.
> In any case, however, this sounds like a lot of work (that being the reason I > did not yet start the effort). It can always be approached incrementally. A first cut would provide only a single DBus operation - subscribe to events. Which events would be entirely controlled by where the output module was specified in the config file, and you'd probably be limited to only one occurrence of that module in the config. An incremental enhancement might be to permit the DBus client to specify a named channel it wishes to subscribe to. (I recall seeing something about named channels in the rsyslog documentation. Not sure if they'd be applicable here.) > So while I would consider this approach technically inferior, I'd > still take the route to create a libnotify output, where the users > has the burden of correct configuring the params, if that can be done > in a couple of hours. The quick hack might actually be to go back to my original approach of using a named pipe for the first leg of IPC, with the hope that rsyslog handles pipes more reliably (or I can figure out why they appeared not to work reliably with sysklogd; I assume xconsole users would have noticed if this mechanism was unreliable). Then use Michael Biebl's technique of running the client from the user's X session so you don't have to jump through hoops to get connected to the right desktop. (I already have a Perl script that implements this; I just need to launch it from within the X session.) This is basically the same setup as xconsole, just a different UI. > That, of course, would need to run on a large async queue, because it > has the potential of blocking rsyslog and in consequence the system > as whole (I assume that's not an issue with DBus). I'm curious to know if rsyslog will block if it fills the buffer going to the named pipe, in much the same way it can block if the shell execute process hangs?
I'm assuming not, as I've seen syslog.conf files that write to /dev/xconsole and that ran fine despite there being no xconsole on the system servicing the pipe. -Tom From tmetro+rsyslog at gmail.com Mon May 4 09:49:38 2009 From: tmetro+rsyslog at gmail.com (Tom Metro) Date: Mon, 04 May 2009 03:49:38 -0400 Subject: [rsyslog] desktop notifications from syslog In-Reply-To: References: <49F8B2F7.9070201@gmail.com> Message-ID: <49FE9E12.7040209@gmail.com> david at lang.hm wrote: > Tom Metro wrote: >> Is there a recommended technique for displaying notifications on a GUI >> desktop from syslog? > > unfortunately this is going to vary significantly based on which desktop > and distro you are using. I'm OK with that, as long as it can be broken into two pieces, with rsyslog providing some generic mechanism to provide access to the events, and a separate client that glues rsyslog to the particular GUI. It seems that libnotify already covers most of the bases within the Linux universe. According to: http://www.galago-project.org/specs/notification/ it implements a freedesktop.org standard. I know it is well supported by GNOME, and Googling shows at least some level of support for it in KDE, ROX and XFCE. > one of the changes in Ubuntu 9.04 (just released) was a drastic change in > the notification daemon and how it works. Thanks for the tip. I dug up this: https://wiki.ubuntu.com/JauntyJackalope/TechnicalOverview#New%20style%20for%20notifications%20and%20notification%20preferences which links to a video demonstrating the new notifications. What is unclear is whether this is just a new GUI layer over the same code, or at least something that is still protocol-compatible with libnotify. A forum posting led me to the notify-osd package: https://launchpad.net/ubuntu/+source/notify-osd which led me to the design specification: https://wiki.ubuntu.com/NotifyOSD According to that, the new system is compliant (or at least intended to be) with the freedesktop.org standard.
The most notable difference is that these new "bubble" notifications are shown for a limited duration, as chosen by notify-osd, but if the client app. specifies infinite duration, it goes into a fall-back mode and shows a traditional alert box. In that regard, it seems like a step backwards from notification-daemon, given that alert boxes are less visually appealing. Of course they might jazz up the alert boxes at a later time. (In my prototype I've been using an infinite display duration, with the rationale that the kinds of messages being shown require user attention and should not be missed if the user isn't at the computer when they occur.) The closest example they have in the spec is the kerneloops one: https://wiki.ubuntu.com/NotifyOSD#kerneloops and their plan is to migrate from using the (old style) notification bubble (looks like a system tray bubble, even though I don't think there is a corresponding tray icon) to a traditional alert box (again, this seems like a step backwards, but I guess they're trying to increase consistency and reduce visual variety in notifications). By using an infinite display duration via libnotify, according to their spec, it'll show an alert box, so this new usage will be consistent with the kerneloops use case. (Their spec addresses the issue of avoiding accidental key presses that might dismiss a pop-up notification, and this was given as a rationale for bubbles not having any clickable buttons. But I didn't see how they addressed this problem for alert boxes.) Having the notifications look nice would be good, but obviously the important thing is just getting them displayed. The bottom line: it looks like an app. that works with libnotify under Ubuntu 8.10 or earlier will continue to work with 9.04.
-Tom From rgerhards at hq.adiscon.com Mon May 4 09:58:49 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Mon, 4 May 2009 09:58:49 +0200 Subject: [rsyslog] desktop notifications from syslog References: <49F8B2F7.9070201@gmail.com> <49FE9E12.7040209@gmail.com> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702B019@GRFEXC.intern.adiscon.com> Hi Tom, (my training folks are in a traffic jam, so some more time... ;)) this sounds very reasonable, but as I have not yet programmed with DBus or libnotify, I fear I need to push this off to after queue batches are done. I'd hoped to squeeze this in as a short 3-hour job, but it is turning into more than that. Hint: if you can provide me with working example C code on how to connect to DBus and push a notification to it, I can integrate that into an output plugin, which I'd leave for you to test ;) Rainer > -----Original Message----- > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-bounces at lists.adiscon.com] On Behalf Of Tom Metro > Sent: Monday, May 04, 2009 9:50 AM > To: rsyslog-users > Subject: Re: [rsyslog] desktop notifications from syslog > > david at lang.hm wrote: > > Tom Metro wrote: > >> Is there a recommended technique for displaying notifications on a GUI > >> desktop from syslog? > > > > unfortunately this is going to vary significantly based on which desktop > > and distro you are using. > > I'm OK with that, as long as it can be broken into two pieces, with > rsyslog providing some generic mechanism to provide access to the > events, and a separate client that glues rsyslog to the particular GUI. > > It seems that libnotify already covers most of the bases within the > Linux universe. According to: > http://www.galago-project.org/specs/notification/ > > it implements a freedesktop.org standard. I know it is well supported by > GNOME, and Googling shows at least some level of support for it in KDE, > ROX and XFCE.
> > > > > one of the changes in Ubuntu 9.04 (just released) was a drastic change in > > the notification daemon and how it works. > > Thanks for the tip. I dug up this: > https://wiki.ubuntu.com/JauntyJackalope/TechnicalOverview#New%20style%20for%20notifications%20and%20notification%20preferences > > which links to a video demonstrating the new notifications. What is > unclear is whether this is just a new GUI layer over the same code, or > at least something that is still protocol compatible with libnotify. > > A forum posting led me to the notify-osd package: > https://launchpad.net/ubuntu/+source/notify-osd > > which led me to the design specification: > https://wiki.ubuntu.com/NotifyOSD > > According to that, the new system is compliant (or at least intended to > be) with the freedesktop.org standard. The most notable difference is > that these new "bubble" notifications are shown for a limited duration, > as chosen by notify-osd, but if the client app. specifies infinite > duration, it goes into a fall-back mode and shows a traditional alert > box. In that regards, seems like a step backwards from > notification-daemon, given that alert boxes are less visually appealing. > Of course they might jazz up the alert boxes at a later time. > > (In my prototype I've been using an infinite display duration, with the > rationale that the kinds of messages being shown require user attention > and should not be missed if the user isn't at the computer when they occur.) > > The closest example they have in the spec is the kerneloops one: > https://wiki.ubuntu.com/NotifyOSD#kerneloops > > and their plan is to migrate from using the (old style) notification > bubble (looks like a system tray bubble, even though I don't think there > is a corresponding tray icon) to a traditional alert box (again, seems > like a step backwards, but I guess they're trying to increase > consistency and reduce visual variety in notifications).
> > > > By using an infinite display duration via libnotify, according to their > spec, it'll show an alert box, so this new usage will be consistent with > the kerneloops use case. > > (Their spec addresses the issue of avoiding accidental key presses that > might dismiss a pop-up notification, and this was given as a rationale > for bubbles not having any clickable buttons. But I didn't see how they > addressed this problem for alert boxes.) > > Having the notifications look nice would be good, but obviously the > important thing is just getting them displayed. The bottom line looks > like an app. that works with libnotify under Ubuntu 8.10 or earlier will > continue to work with 9.04. > > -Tom From david at lang.hm Mon May 4 16:56:55 2009 From: david at lang.hm (david at lang.hm) Date: Mon, 4 May 2009 07:56:55 -0700 (PDT) Subject: [rsyslog] desktop notifications from syslog In-Reply-To: <49FE9D7F.5060706@gmail.com> References: <49F8B2F7.9070201@gmail.com> <49F8C67A.7060103@lists.bod.org> <49F8FB18.8030000@gmail.com> <9B6E2A8877C38245BFB15CC491A11DA702AFF2@GRFEXC.intern.adiscon.com> <49FE9D7F.5060706@gmail.com> Message-ID: On Mon, 4 May 2009, Tom Metro wrote: > Rainer Gerhards wrote: >> I really like the idea...and think that it is also the solution >> to some other things I have on my mind (like a way to get rsyslogd internals >> via the GUI, that would at least for debugging and probably for a couple of >> other things be useful). > ... >> What I have on my mind is a kind of interactive interface, where, in >> real-time, you can see things like queue saturation, modules loaded, maybe >> generate test events and so on. That's my debug case. Of course, some >> interactive features may be interesting for regular end user > > Are those kinds of internals accessible from an output plugin?
If not, > then either the "subscribe to events via DBus" functionality would need > to be implemented in the core code, or these might be two independent > projects. I think these would be two independent projects: 1. provide output to dbus 2. create a dbus management interface to query internal stats > I see you have an SNMP interface. I wonder if that has some common > ground with the functionality you're describing. Layering the SNMP > interface on the DBus interface, or just having both use a common > middleware with two different front-ends. I have not looked at the rsyslog snmp interface, but I assumed that it sent and/or received snmp trap messages rather than querying rsyslog internal stats. >> In any case, however, this sounds like a lot of work (that being the reason I >> did not yet start the effort). > > It can always be approached incrementally. A first cut would provide only > a single DBus operation - subscribe to events. What events would be > entirely controlled by where the output module was specified in the > config file, and you'd probably be limited to only one occurrence of > that module in the config. the term 'subscribe to events' sounds to me like it receives messages that others send. in this case it needs to be generating events that others would subscribe to. am I just confused by the terminology here? > An incremental enhancement might be to permit the DBus client to specify > a named channel it wishes to subscribe to. (I recall seeing something > about named channels in the rsyslog documentation. Not sure if they'd be > applicable here.) rsyslog does not have named channels internally. >> So while I would consider this approach technically inferior, I'd >> still take the route to create a libnotify output, where the users >> has the burden of correct configuring the params, if that can be done >> in a couple of hours.
> > The quick hack might actually be to go back to my original approach of > using a named pipe for the first leg of IPC, with the hope that rsyslog > handles pipes more reliably (or I can figure out why they appeared not > to work reliably with sysklogd; I assume xconsole users would have > noticed if this mechanism was unreliable). Then use Michael Biebl's > technique of running the client from the user's X session so you don't > have to go through hoops to get connected to the right desktop. (I > already have a Perl script that implements this, I just need to launch > it from within the X session.) I have been using named pipes extensively with both rsyslog and sysklogd; what problems were you having? > This is basically the same setup as xconsole, just a different UI. > > >> That, of course, would need to run on a large async queue, because it >> has the potential of blocking rsyslog and in consequence the system >> as whole (I assume that's not an issue with DBus). > > I'm curious to know if rsyslog will block if it fills the buffer going > to the named pipe, in much the same way it can block if the shell > execute process hangs? yes rsyslog would block, although you could create an action queue for that activity to keep that blockage from blocking all of rsyslog (and you may be able to use RainerScript to put logic in place to throw away messages if the queue is full; I don't know if the info needed to do this is available at this time) David Lang > I'm assuming not, as I've seen syslog.conf's that write to > /dev/xconsole, and ran fine despite there being no xconsole on the > system servicing the pipe.
> > -Tom From tmetro+rsyslog at gmail.com Tue May 5 00:39:34 2009 From: tmetro+rsyslog at gmail.com (Tom Metro) Date: Mon, 04 May 2009 18:39:34 -0400 Subject: [rsyslog] desktop notifications from syslog In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702B019@GRFEXC.intern.adiscon.com> References: <49F8B2F7.9070201@gmail.com> <49FE9E12.7040209@gmail.com> <9B6E2A8877C38245BFB15CC491A11DA702B019@GRFEXC.intern.adiscon.com> Message-ID: <49FF6EA6.7050406@gmail.com> Rainer Gerhards wrote: > this sounds very reasonable, but as I have not yet programmed with DBus or > libnotify... If we take the two-piece approach, the part coupled to rsyslog wouldn't use libnotify, just DBus. > if you can provide me working example C code on how to connect to > DBus, and push a notification to it, I can integrate that into an > output plugin... The notify-send source would provide that, but again, if we go the two-piece route, the output plugin would act as a DBus server, rather than client. So other applications that provide DBus services would be where to look for examples. I can dig those up. (Dnsmasq and gnome-screensaver are two random examples.) > ...I fear I need to push this off to after queue batches are done. Sure thing. There's nothing time-sensitive about this. I'd like to explore the named pipe approach first to see if I can get that working reliably. If so, there may be no need for a custom output plugin.
-Tom From tmetro+rsyslog at gmail.com Tue May 5 01:13:43 2009 From: tmetro+rsyslog at gmail.com (Tom Metro) Date: Mon, 04 May 2009 19:13:43 -0400 Subject: [rsyslog] desktop notifications from syslog In-Reply-To: References: <49F8B2F7.9070201@gmail.com> <49F8C67A.7060103@lists.bod.org> <49F8FB18.8030000@gmail.com> <9B6E2A8877C38245BFB15CC491A11DA702AFF2@GRFEXC.intern.adiscon.com> <49FE9D7F.5060706@gmail.com> Message-ID: <49FF76A7.3050002@gmail.com> david at lang.hm wrote: > Tom Metro wrote: >> A first cut would provide only >> a single DBus operation - subscribe to events. > > the term 'subscribe to events' sounds to me like it receives messages > others send. in this case it needs to be generating events that others > would subscribe to. am I just confused by the terminology here? The client-server relationship may be inverted from what you expect. The proposed architecture is something like this:

    rsyslog output plugin w/DBus interface
                       ^
                       | DBus
                       |
                 syslog-notify
                       |
                       | DBus session bus
                       V
              notification daemon

Here the 'syslog-notify' "glue" daemon runs under the user's X session, establishes a connection to rsyslog via DBus, and subscribes to events. That way the output plugin could support multiple subscribers, each eventually with their own topic/channel, and could be a no-op if there are no subscribers. >> An incremental enhancement might be to permit the DBus client to specify >> a named channel it wishes to subscribe to. (I recall seeing something >> about named channels in the rsyslog documentation. Not sure if they'd be >> applicable here.) > > rsyslog does not have named channels internally. This is what I was referring to: http://www.rsyslog.com/doc-rsyslog_conf_actions.html Output Channel Binds an output channel definition (see there for details) to this action. Output channel actions must start with a $-sign, e.g. if you would like to bind your output channel definition "mychannel" to the action, use "$mychannel".
Output channels support template definitions like all all other actions. And explained further here: http://www.rsyslog.com/doc-rsyslog_conf_output.html which clarifies that this isn't quite what I thought it was. I was thinking it was a way of assigning a name to a set of selector rules, which could then later be directed to one or more outputs. If such a thing existed, then conceivably it would be useful to permit a DBus subscriber to specify the particular filtered channel of data it wished to subscribe to. Of course it might also be useful to allow DBus subscribers to specify an arbitrary selector pattern. (Though this raises security complications, as you don't want all syslog data to be readable by any local user.) > I have been using named pipes extensively with both rsyslog and sysklogd, > what problems were you having? In many cases I observed sysklogd closing the pipe, which caused the reader process, whose loop exits on EOF, to terminate. Perhaps that's normal and expected. That's easy enough to deal with. In at least one case I observed that after such an exit, when the reader was restarted, it failed to receive any new messages until sysklogd was restarted, suggesting it had closed the pipe and failed to reopen it when the next applicable message was processed. But my testing with named pipes has been too limited to draw any firm conclusions. I need to get the syslog broadcast problem resolved (which I discussed in another thread), as that's part of the overall system I'm setting up that would make active and regular use of these desktop notifications, and permit longer-term testing. > I'm curious to know if rsyslog will block if it fills the buffer going > to the named pipe, in much the same way it can block if the shell > execute process hangs? > > yes rsyslog would block... Does it differ from sysklogd in that respect, or does it not block if the pipe is closed (no reader running)?
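Tom's last question can be checked against plain POSIX FIFO semantics, independent of rsyslog or sysklogd and of how either is configured. The C sketch below uses a hypothetical helper, `fifo_try_write`, that opens the FIFO with `O_NONBLOCK`: with no reader, `open()` fails with `ENXIO`; with a reader that has stopped draining, `write()` fails with `EAGAIN` once the pipe buffer is full. Without `O_NONBLOCK`, the first case blocks in `open()` and the second blocks in `write()`. (A writer that already has the FIFO open when the last reader goes away gets `EPIPE` instead, plus `SIGPIPE` unless that signal is ignored.)

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Possible outcomes of handing one message to a FIFO without blocking. */
enum fifo_result { FIFO_WRITTEN, FIFO_NO_READER, FIFO_FULL, FIFO_ERROR };

/* Hypothetical helper, not rsyslog's pipe action: try to write one message
 * to the FIFO at `path` without ever blocking. */
enum fifo_result fifo_try_write(const char *path, const char *msg)
{
    /* A blocking open() would wait here until a reader appears. With
     * O_NONBLOCK, a write-only open of a FIFO fails with ENXIO instead. */
    int fd = open(path, O_WRONLY | O_NONBLOCK);
    if (fd < 0)
        return errno == ENXIO ? FIFO_NO_READER : FIFO_ERROR;

    /* A blocking write() would wait here if the reader stopped draining the
     * pipe. With O_NONBLOCK it fails with EAGAIN; writes of at most PIPE_BUF
     * bytes are all-or-nothing, so short messages are never split. */
    size_t len = strlen(msg);
    ssize_t n = write(fd, msg, len);
    enum fifo_result res;
    if (n < 0)
        res = (errno == EAGAIN || errno == EWOULDBLOCK) ? FIFO_FULL : FIFO_ERROR;
    else if ((size_t)n < len)
        res = FIFO_FULL;  /* partial write; only possible above PIPE_BUF bytes */
    else
        res = FIFO_WRITTEN;

    /* Closing the descriptor means the reader sees EOF whenever no other
     * writer still holds the FIFO open. */
    close(fd);
    return res;
}
```

Which of these behaviors a given daemon shows depends on whether it opens the pipe non-blocking and on what it does with the error; the sketch only shows what the kernel reports. Because it closes the FIFO after each message, it is also one way a reader ends up seeing the repeated EOFs described above.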
-Tom From david at lang.hm Tue May 5 01:49:24 2009 From: david at lang.hm (david at lang.hm) Date: Mon, 4 May 2009 16:49:24 -0700 (PDT) Subject: [rsyslog] desktop notifications from syslog In-Reply-To: <49FF76A7.3050002@gmail.com> References: <49F8B2F7.9070201@gmail.com> <49F8C67A.7060103@lists.bod.org> <49F8FB18.8030000@gmail.com> <9B6E2A8877C38245BFB15CC491A11DA702AFF2@GRFEXC.intern.adiscon.com> <49FE9D7F.5060706@gmail.com> <49FF76A7.3050002@gmail.com> Message-ID: On Mon, 4 May 2009, Tom Metro wrote: > david at lang.hm wrote: >> Tom Metro wrote: >>> A first cut would provide only >>> a single DBus operation - subscribe to events. >> >> the term 'subscribe to events' sounds to me like it receives messages >> others send. in this case it needs to be generating events that others >> would subscribe to. am I just confused by the terminology here? > > The client-server relationship may be inverted from what you expect. > > The proposed architecture is something like this:
>
>     rsyslog output plugin w/DBus interface
>                        ^
>                        | DBus
>                        |
>                  syslog-notify
>                        |
>                        | DBus session bus
>                        V
>               notification daemon
>
> Here the 'syslog-notify' "glue" daemon runs under the user's X session, > establishes a connection to rsyslog via DBus, and subscribes to events. > > That way the output plugin could support multiple subscribers, each > eventually with their own topic/channel, and could be a no-op if there > are no subscribers. I think the best way to do this is for the rsyslog output plugin to not worry about who's listening. it just takes anything that the rsyslog rules tell it to forward and sends it to DBus. at that point it's up to the admin to deal with the security issue about what's going to be visible. >>> An incremental enhancement might be to permit the DBus client to specify >>> a named channel it wishes to subscribe to. (I recall seeing something >>> about named channels in the rsyslog documentation. Not sure if they'd be >>> applicable here.)
>> >> rsyslog does not have named channels internally. > > This is what I was referring to: > > http://www.rsyslog.com/doc-rsyslog_conf_actions.html > Output Channel > > Binds an output channel definition (see there for details) to this > action. Output channel actions must start with a $-sign, e.g. if you > would like to bind your output channel definition "mychannel" to the > action, use "$mychannel". Output channels support template definitions > like all all other actions. > > And explained further here: > http://www.rsyslog.com/doc-rsyslog_conf_output.html > I'll look into this a bit later tonight > which clarifies that this isn't quite what I thought it was. I was > thinking it was a way of assigning a name to a set of selector rules, > which could then later be directed to one or more outputs. > > If such a thing existed, then conceivably it would be useful to permit a > DBus subscriber to specify the particular filtered channel of data it > wished to subscribe to. > > Of course it might also be useful to allow DBus subscribers to specify > an arbitrary selector pattern. (Though this raises security > complications, as you don't want all syslog data to be readable by any > local user.) you don't want to try and have dbus users push rules through dbus to rsyslog. they can choose to ignore some of the stuff that they get, but you don't want to go further than that. >> I have been using named pipes extensively with both rsyslog and sysklogd, >> what problems were you having? > > In many cases I observed sysklogd closing the pipe resulting in a loop > in the reader process checking for EOF to exit. this is correct. applications reading from pipes like this can't assume that an EOF means that they should exit (unless you wrap them with something to restart them) > Perhaps that's normal > and expected. That's easy enough to deal with.
In at least one case I > observed that after such an exit, when the reader was restarted, it > failed to receive any new messages until sysklogd was restarted, > suggesting it had closed the pipe and failed to reopen it when the next > applicable message was processed. hmm, I haven't had that problem. I wonder if the process was down long enough for the pipe to fill up and sysklogd to decide there was a problem and stop writing to that pipe (some versions would do that instead of blocking) > But my testing with named pipes has been too limited to draw any firm > conclusions. I need to get the syslog broadcast problem resolved (which > I discussed in another thread), as that's part of the overall system I'm > setting up that would make active and regular use of these desktop > notifications, and permit longer term testing. what I'm doing for the syslog broadcast is defining a multicast MAC address for a specific IP, and then setting that IP address up on all the systems that need to see the message. (see http://www.linux-ha.org/ClusterIP for info on this and examples of how to set it up for testing) this lets me spread the load between multiple machines in one set while still having multiple sets of boxes receive the same message. >>> I'm curious to know if rsyslog will block if it fills the buffer going >>> to the named pipe, in much the same way it can block if the shell >>> execute process hangs? >> >> yes rsyslog would block... > > Does it differ from sysklogd in that respect, or does it not block if > the pipe is closed (no reader running)? I don't know.
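David's advice about EOF can be sketched in C as a reader that reopens the FIFO instead of exiting. `follow_fifo` and `line_handler` are hypothetical names, not code from rsyslog, sysklogd or Tom's Perl script, and the sketch assumes one message per newline-terminated line. It does not address the other case Tom describes, where the writer stops delivering after the reader restarts; that would have to be fixed on the writing side.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical callback invoked once per received message line. */
typedef void (*line_handler)(const char *line, void *ctx);

/* Read newline-terminated messages from the FIFO at `path`, treating EOF as
 * "the last writer closed the pipe" rather than "time to exit": the FIFO is
 * closed and reopened, and the blocking open() waits for the next writer.
 * `max_lines` only exists so this sketch terminates; a real reader would
 * loop forever. Returns the number of lines handled, or -1 on error. */
int follow_fifo(const char *path, int max_lines, line_handler handle, void *ctx)
{
    char line[1024];
    size_t used = 0;
    int lines = 0;

    while (lines < max_lines) {
        int fd = open(path, O_RDONLY);  /* blocks until a writer opens the FIFO */
        if (fd < 0)
            return -1;

        char buf[256];
        ssize_t n = 0;
        while (lines < max_lines && (n = read(fd, buf, sizeof buf)) > 0) {
            for (ssize_t i = 0; i < n && lines < max_lines; i++) {
                if (buf[i] == '\n') {
                    line[used] = '\0';
                    handle(line, ctx);
                    used = 0;
                    lines++;
                } else if (used < sizeof line - 1) {
                    line[used++] = buf[i];  /* over-long lines are truncated */
                }
            }
        }
        close(fd);
        if (n < 0)
            return -1;
        /* Either max_lines was reached, or read() returned 0 (EOF) because
         * every writer went away; in the EOF case, loop and reopen. */
    }
    return lines;
}
```

Here `read()` returning 0 only means that no writer currently has the FIFO open; the next blocking `open()` simply waits until the syslog daemon reopens its end.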
David Lang From rgerhards at hq.adiscon.com Tue May 5 07:46:21 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Tue, 5 May 2009 07:46:21 +0200 Subject: [rsyslog] desktop notifications from syslog References: <49F8B2F7.9070201@gmail.com><49F8C67A.7060103@lists.bod.org><49F8FB18.8030000@gmail.com><9B6E2A8877C38245BFB15CC491A11DA702AFF2@GRFEXC.intern.adiscon.com><49FE9D7F.5060706@gmail.com><49FF76A7.3050002@gmail.com> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702B01E@GRFEXC.intern.adiscon.com> > -----Original Message----- > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-bounces at lists.adiscon.com] On Behalf Of david at lang.hm > Sent: Tuesday, May 05, 2009 1:49 AM > To: rsyslog-users > Subject: Re: [rsyslog] desktop notifications from syslog > > On Mon, 4 May 2009, Tom Metro wrote: > > > david at lang.hm wrote: > >> Tom Metro wrote: > >>> A first cut would provide only > >>> a single DBus operation - subscribe to events. > >> > >> the term 'subscribe to events' sounds to me like it receives messages > >> others send. in this case it needs to be generating events that others > >> would subscribe to. am I just confused by the terminology here? > > > > The client-server relationship may be inverted from what you expect. > > > > The proposed architecture is something like this:
> >
> >     rsyslog output plugin w/DBus interface
> >                        ^
> >                        | DBus
> >                        |
> >                  syslog-notify
> >                        |
> >                        | DBus session bus
> >                        V
> >               notification daemon
> >
> > Here the 'syslog-notify' "glue" daemon runs under the user's X session, > > establishes a connection to rsyslog via DBus, and subscribes to events. > > > > That way the output plugin could support multiple subscribers, each > > eventually with their own topic/channel, and could be a no-op if there > > are no subscribers. > > I think the best way to do this is for the rsyslog output plugin to not > worry about who's listening. it just takes anything that the rsyslog rules > tell it to forward and sends it to DBUS.
> > at that point it's up to the admin to deal with the security issue about > what's going to be visible Exactly. But it is simple to define a kind of "channel" on the rsyslog side. I'd program the output plugin so that it takes a config parameter that tells which dbus name to use. I'd also allow multiple instances, each of them using its own name. So you could, for example, set up some for emergency and some for warning messages - or some for user Alice and some for user Bob. All via the regular rule engine (but static inside the config). > >>> An incremental enhancement might be to permit the DBus client to specify > >>> a named channel it wishes to subscribe to. (I recall seeing something > >>> about named channels in the rsyslog documentation. Not sure if they'd be > >>> applicable here.) > >> > >> rsyslog does not have named channels internally. > > > > This is what I was referring to: > > > > http://www.rsyslog.com/doc-rsyslog_conf_actions.html > > Output Channel > > > > Binds an output channel definition (see there for details) to this > > action. Output channel actions must start with a $-sign, e.g. if you > > would like to bind your output channel definition "mychannel" to the > > action, use "$mychannel". Output channels support template definitions > > like all all other actions. > > > > And explained further here: > > http://www.rsyslog.com/doc-rsyslog_conf_output.html > > > > I'll look into this a bit later tonight Don't bother - that's a very old idea that did not work out as originally thought. I still support it, and there are use cases (e.g. space-limited files), but it is considered legacy functionality. > > > which clarifies that this isn't quite what I thought it was. I was > > thinking it was a way of assigning a name to a set of selector rules, > > which could then later be directed to one or more outputs.
> > > > If such a thing existed, then conceivably it would be useful to permit a > > DBus subscriber to specify the particular filtered channel of data it > > wished to subscribe to. > > > > Of course it might also be useful to allow DBus subscribers to specify > > an arbitrary selector pattern. (Though this raises security > > complications, as you don't want all syslog data to be readable by any > > local user.) > > you don't want to try and have dbus users push rules through dbus to > rsyslog. Exactly. > they can choose to ignore some of the stuff that they get, but > you don't want to go further than that. Full ack > > >> I have been using named pipes extensively with both rsyslog and sysklogd, > >> what problems were you having? > > > > In many cases I observed sysklogd closing the pipe resulting in a loop > > in the reader process checking for EOF to exit. > > this is correct. applications reading from pipes like this can't assume > that a EOF means that they should exit (unless you wrap them with > something to restart them) > > > Perhaps that's normal > > and expected. That's easy enough to deal with. In at least one case I > > observed that after such an exit, when the reader was restarted, it > > failed to receive any new messages until sysklogd was restarted, > > suggesting it had closed the pipe and failed to reopen it when the next > > applicable message was processed. > > hmm, I haven't had that problem. I wonder if the process was down long > enough for the pipe to fill up and sysklogd to decide there was a problem > and stop writing to that pipe (some versions would do that instead of > blocking) Some versions of sysklogd had various problems with closed files. Even klogd could lose all messages if it was not restarted together with syslogd under some circumstances. I'd bet this was related to that bug. > > > But my testing with named pipes has been too limited to draw any firm > > conclusions.
I need to get the syslog broadcast problem resolved (which > > I discussed in another thread), as that's part of the overall system I'm > > setting up that would make active and regular use of these desktop > > notifications, and permit longer term testing. > > what I'm doing for the syslog broadcast is defining a multicast MAC > address for a specific IP, and then setting that IP address up on all the > systems that need to see the message. (see > http://www.linux-ha.org/ClusterIP for info on this and examples of how to > set it up for testing) this lets me spread the load between multiple > machines in one set while still having multiple sets of boxes receive the > same message. I think I'll add the broadcast patch soon; it makes sense to me. I'd add it unconditionally (if no one objects), as I think it doesn't hurt anything if present but not used. > > >>> I'm curious to know if rsyslog will block if it fills the buffer going > >>> to the named pipe, in much the same way it can block if the shell > >>> execute process hangs? > >> > >> yes rsyslog would block... > > > > Does it differ from sysklogd in that respect, or does it not block if > > the pipe is closed (no reader running)? > > I don't know. It depends on the config. With default values, I'd expect that it drops messages instead of blocking.
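Whether a pipe writer blocks or drops comes down to how it opens and writes the pipe. A minimal sketch of the "drop instead of block" behaviour, assuming the writer uses O_NONBLOCK: open() on a FIFO fails with ENXIO when no reader has it open, and write() fails with EAGAIN when the pipe buffer is full; in both cases the message is simply discarded. This is not rsyslog's pipe output code; fifo_ensure() and fifo_send() are hypothetical names.

```c
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Outcome of one attempt to hand a message to a named pipe. */
enum fifo_result { FIFO_SENT, FIFO_DROPPED_NO_READER, FIFO_DROPPED_FULL, FIFO_ERROR };

/* Hypothetical helper: create the FIFO if it does not exist yet. */
int fifo_ensure(const char *path)
{
    if (mkfifo(path, 0600) == 0 || errno == EEXIST)
        return 0;
    return -1;
}

/* Hypothetical helper: write one message without ever blocking.
 * The FIFO is opened per call to keep the sketch short; a real output
 * would keep the descriptor open and reopen it after EPIPE. */
enum fifo_result fifo_send(const char *path, const char *msg)
{
    /* A reader that goes away would otherwise kill the writer with SIGPIPE. */
    signal(SIGPIPE, SIG_IGN);

    int fd = open(path, O_WRONLY | O_NONBLOCK);
    if (fd < 0)
        return errno == ENXIO ? FIFO_DROPPED_NO_READER : FIFO_ERROR;

    size_t len = strlen(msg);
    ssize_t n = write(fd, msg, len);
    int saved = errno;
    close(fd);

    if (n == (ssize_t)len)
        return FIFO_SENT;
    if (n < 0 && saved == EAGAIN)
        return FIFO_DROPPED_FULL;   /* pipe buffer full: drop, don't wait */
    return FIFO_ERROR;              /* EPIPE, short write, ... */
}
```

Leaving O_NONBLOCK off gives the blocking behaviour instead: open() then waits for a reader to appear, and write() waits for buffer space.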
Rainer > > David Lang > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com From tmetro+rsyslog at gmail.com Tue May 5 08:14:19 2009 From: tmetro+rsyslog at gmail.com (Tom Metro) Date: Tue, 05 May 2009 02:14:19 -0400 Subject: [rsyslog] desktop notifications from syslog In-Reply-To: References: <49F8B2F7.9070201@gmail.com> <49F8C67A.7060103@lists.bod.org> <49F8FB18.8030000@gmail.com> <9B6E2A8877C38245BFB15CC491A11DA702AFF2@GRFEXC.intern.adiscon.com> <49FE9D7F.5060706@gmail.com> <49FF76A7.3050002@gmail.com> Message-ID: <49FFD93B.9080003@gmail.com> david at lang.hm wrote: > I think the best way to do this is for the rsyslog output plugin to not > worry about who's listening. it just takes anything that the rsyslog rules > tell it to forward and sends it to DBUS. > > at that point it's up to the admin to deal with the security issue about > what's going to be visable I agree. I was describing a potential future direction. > you don't want to try and have dbus users push rules through dbus to > rsyslog. they can choose to ignore some of the stuff that they get, but > you don't want to go further than that. Again, this was a hypothetical future capability. The use case would be allowing a third party listener to interact with rsyslog without having to have the user (or the installer for that app.) reconfigure rsyslog via its config file. That way the settings for that third party listener don't need to be put in two places. Of course the third party listener could always drop a file in /etc/rsyslog.d/, but consider a case where you have some GUI viewer of messages, and you want to let the user dynamically select the facilities or priorities that it monitors. An example of an application that permits 3rd party reconfiguration via DBus is Dnsmasq. 
An external process, like a resolvconf script, can inform Dnsmasq of a domain-specific DNS server to use dynamically after a VPN link is brought up. That's functionality that is beyond what can be achieved through the normal resolv.conf file, and it needs to happen dynamically. -Tom From tmetro+rsyslog at gmail.com Tue May 5 08:17:37 2009 From: tmetro+rsyslog at gmail.com (Tom Metro) Date: Tue, 05 May 2009 02:17:37 -0400 Subject: [rsyslog] named pipes In-Reply-To: References: <49F8B2F7.9070201@gmail.com> <49F8C67A.7060103@lists.bod.org> <49F8FB18.8030000@gmail.com> <9B6E2A8877C38245BFB15CC491A11DA702AFF2@GRFEXC.intern.adiscon.com> <49FE9D7F.5060706@gmail.com> <49FF76A7.3050002@gmail.com> Message-ID: <49FFDA01.2000204@gmail.com> david at lang.hm wrote: > Tom Metro wrote: >> In at least one case I observed that after such an exit, when the >> reader was restarted, it failed to receive any new messages until >> sysklogd was restarted, suggesting it had closed the pipe and >> failed to reopen it when the next applicable message was processed. > > I wonder if the process was down long enough for the pipe to fill up > and sysklog to decide there was a problem and stop writing to that > pipe (some versions would do that instead of blocking) Maybe. Though the buffer would have to have been quite small. I was testing with manually triggered messages, and pushed through half a dozen of 80 characters or less.
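To put "quite small" in perspective: according to the Linux pipe(7) manual page, pipe capacity has been 16 pages (65536 bytes with 4 KiB pages) since Linux 2.6.11. The sketch below measures it empirically by writing 80-byte lines into a pipe that nobody reads until a non-blocking write() reports EAGAIN. It is a standalone test program, not part of any syslog daemon, and pipe_fill_count() is a hypothetical name; the exact count depends on how the kernel packs small writes into pages.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Count how many msg_len-byte messages an unread pipe accepts before a
 * non-blocking write() reports EAGAIN. Returns -1 on unexpected errors. */
long pipe_fill_count(size_t msg_len)
{
    char msg[512];
    int fds[2];
    long count = 0;

    if (msg_len == 0 || msg_len > sizeof msg)
        return -1;
    memset(msg, 'x', msg_len);
    msg[msg_len - 1] = '\n';            /* one syslog-style line per write */

    if (pipe(fds) != 0)
        return -1;
    /* Non-blocking write end: a full pipe yields EAGAIN instead of waiting. */
    int flags = fcntl(fds[1], F_GETFL);
    if (flags < 0 || fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) != 0) {
        count = -1;
    } else {
        for (;;) {
            /* Writes of at most PIPE_BUF bytes are all-or-nothing. */
            ssize_t n = write(fds[1], msg, msg_len);
            if (n == (ssize_t)msg_len) {
                count++;
            } else {
                if (!(n < 0 && errno == EAGAIN))
                    count = -1;         /* anything but "buffer full" is unexpected */
                break;
            }
        }
    }
    close(fds[0]);
    close(fds[1]);
    return count;
}
```

On a default-configured Linux system, pipe_fill_count(80) should report hundreds of messages, so a handful of short test messages would not fill the buffer on its own.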
-Tom From tmetro+rsyslog at gmail.com Tue May 5 08:36:12 2009 From: tmetro+rsyslog at gmail.com (Tom Metro) Date: Tue, 05 May 2009 02:36:12 -0400 Subject: [rsyslog] directing logs to a broadcast address fails In-Reply-To: <49FA2FE3.5080207@gmail.com> References: <49F7DF6B.9020208@gmail.com><9B6E2A8877C38245BFB15CC491A11DA702AFE0@GRFEXC.intern.adiscon.com> <49F8D879.7080101@gmail.com> <9B6E2A8877C38245BFB15CC491A11DA702AFF1@GRFEXC.intern.adiscon.com> <49FA2FE3.5080207@gmail.com> Message-ID: <49FFDE5C.6010005@gmail.com> david at lang.hm wrote: > Tom Metro wrote: >> I need to get the syslog broadcast problem resolved... > > what I'm doing for the syslog broadcast is defining a multicast MAC > address for a specific IP, and then setting that IP address up on all the > systems that need to see the message. (see > http://www.linux-ha.org/ClusterIP for info on this and examples of how to > set it up for testing) this lets me spread the load between multiple > machines in one set while still having multiple sets of boxes receive the > same message. So to distribute the load you send some messages to multicast group A, and others to group B? Multicast in general makes sense if you are going to be sending a volume of messages to N > 1 log servers. (I noticed there was a multicast patch for sysklogd sitting in the Debian bug queue.) In my case I'm looking to distribute critical warning messages only, which will be rare, and it wouldn't benefit the network to use multicast, so I'd rather avoid the configuration overhead. See earlier messages in this thread for the details of the problem. The summary is that syslog messages directed at a broadcast address (x.x.x.255) fail to go anywhere on a Debian Etch box running sysklogd, or an Ubuntu 8.04 box running sysklogd or rsyslog, but rsyslog on Ubuntu 8.10 seems to work. I've backported that version of rsyslog to the 8.04 box, but it didn't resolve the problem.
Inspecting the source code for the working version of rsyslog shows a lack of code to enable the broadcast flag, so I'm not sure why it works on 8.10. Patching the code to enable the broadcast flag didn't seem to help on 8.04. I know broadcast UDP packets work on the 8.04 box, as it uses DHCP successfully. I've started looking at the source to a DHCP client to see how it configures its socket to permit broadcasting. One thing I haven't tried recently is using the interface defined broadcast address (255.255.255.255), which is what the DHCP client uses. I tried that early on, but once I confirmed that the subnet broadcast address worked correctly on other machines, I ceased trying the global address. My guess is that there is some socket flag that still needs to be enabled to get it to work on 8.04. -Tom From david at lang.hm Tue May 5 08:47:53 2009 From: david at lang.hm (david at lang.hm) Date: Mon, 4 May 2009 23:47:53 -0700 (PDT) Subject: [rsyslog] desktop notifications from syslog In-Reply-To: <49FFD93B.9080003@gmail.com> References: <49F8B2F7.9070201@gmail.com> <49F8C67A.7060103@lists.bod.org> <49F8FB18.8030000@gmail.com> <9B6E2A8877C38245BFB15CC491A11DA702AFF2@GRFEXC.intern.adiscon.com> <49FE9D7F.5060706@gmail.com> <49FF76A7.3050002@gmail.com> <49FFD93B.9080003@gmail.com> Message-ID: On Tue, 5 May 2009, Tom Metro wrote: > david at lang.hm wrote: > > >> you don't want to try and have dbus users push rules through dbus to >> rsyslog. they can choose to ignore some of the stuff that they get, but >> you don't want to go further than that. > > Again, this was a hypothetical future capability. > > The use case would be allowing a third party listener to interact with > rsyslog without having to have the user (or the installer for that app.) > reconfigure rsyslog via its config file. That way the settings for > that third party listener don't need to be put in two places. 
> > Of course the third party listener could always drop a file in > /etc/rsyslog.d/, but consider a case where you have some GUI viewer of > messages, and you want to let the user dynamically select the facilities > or priorities that it monitors. > > An example of an application that permits 3rd party reconfiguration via > DBus is Dnsmasq. An external process, like a resolvconf script, can > inform Dnsmasq of a domain-specific DNS server to use dynamically after > a VPN link is brought up. That's functionality that is beyond what can > be achieved through the normal resolv.conf file, and it needs to happen > dynamically. I really don't think that it's a good idea to try and send lots of messages via DBUS. if you want a GUI logwatcher, you really wouldn't want to tie it only to rsyslog. what you would want is something that receives the full log stream through a pipe from the system's syslog daemon (sysklogd, rsyslog, or syslog-ng) and then has its own rules about what to display. David Lang From david at lang.hm Tue May 5 09:29:16 2009 From: david at lang.hm (david at lang.hm) Date: Tue, 5 May 2009 00:29:16 -0700 (PDT) Subject: [rsyslog] directing logs to a broadcast address fails In-Reply-To: <49FFDE5C.6010005@gmail.com> References: <49F7DF6B.9020208@gmail.com><9B6E2A8877C38245BFB15CC491A11DA702AFE0@GRFEXC.intern.adiscon.com> <49F8D879.7080101@gmail.com> <9B6E2A8877C38245BFB15CC491A11DA702AFF1@GRFEXC.intern.adiscon.com> <49FA2FE3.5080207@gmail.com> <49FFDE5C.6010005@gmail.com> Message-ID: On Tue, 5 May 2009, Tom Metro wrote: > Date: Tue, 05 May 2009 02:36:12 -0400 > From: Tom Metro > > david at lang.hm wrote: >> Tom Metro wrote: >>> I need to get the syslog broadcast problem resolved... >> >> what I'm doing for the syslog broadcast is defining a multicast MAC >> address for a specific IP, and then setting that IP address up on all the >> systems that need to see the message.
(see >> http://www.linux-ha.org/ClusterIP for info on this and examples of how to >> set it up for testing) this lets me spread the load between multiple >> machines in one set while still having multiple sets of boxes receive the >> same message. > > So to distribute the load you send some messages to multicast group A, > and others to group B? no, much simpler. I'll describe more below. > Multicast in general makes sense if you are going to be sending a volume > of messages to N > 1 log servers. (I noticed there was a multicast patch > for sysklogd sitting in the Debian bug queue.) that's to send the message to a multicast IP address, something different from what I am talking about. it turns out that the ethernet spec says that a MAC address with the low bit of the first octet set to 1 (such as 01:00:00:00:00:00) is defined as multicast, and acts like a broadcast packet through a switch, no matter what IP address it has (say 192.168.1.100). then you can use the iptables command (see the ClusterIP link I sent you) to configure your linux box to say that it is node 1 of 1 and it will see all the packets. you can also say that you are node 1 of 2 (with another box being node 2 of 2) and both boxes will do a hash of the packet header info (dest IP, dest port, source IP, source port) and decide which node should handle that packet. that node will process the packet and other nodes will drop the packet. with UDP packets, you can have multiple sets of machines doing this with the same MAC address, at which point box 1 can send a packet to 192.168.1.100 and

box 2 (node 1 of 1) will process it
box 3 (node 1 of 2) will process it
box 4 (node 2 of 2) will ignore it
boxes 5-8 (nodes 1-4 of 10) will ignore it
box 9 (node 5 of 10) will process it
boxes 10-14 (nodes 6-10 of 10) will ignore it
etc.

so you are broadcasting to multiple clusters, but within each cluster it load balances and only one box in each cluster will fully process the packet.
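The per-cluster decision described above can be sketched as follows. Every node computes the same hash over the packet's addressing fields and accepts the packet only if the result maps to its own node number, so exactly one node per cluster processes it. The FNV-1a hash, struct flow, and node_accepts() are stand-ins chosen for illustration; the kernel's CLUSTERIP target uses its own hash function and selectable hash modes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Addressing fields every node sees identically in a received UDP packet. */
struct flow {
    uint32_t src_ip, dst_ip;      /* IPv4 addresses, host byte order */
    uint16_t src_port, dst_port;
};

/* FNV-1a over the fields; any hash works as long as all nodes use the same one. */
static uint32_t flow_hash(const struct flow *f)
{
    uint32_t words[3] = { f->src_ip, f->dst_ip,
                          ((uint32_t)f->src_port << 16) | f->dst_port };
    uint32_t h = 2166136261u;
    for (int w = 0; w < 3; w++)
        for (int b = 0; b < 4; b++) {
            h ^= (words[w] >> (8 * b)) & 0xffu;
            h *= 16777619u;
        }
    return h;
}

/* Node numbers are 1-based, as in "node 2 of 2".
 * Returns true if this node should process the packet, false if it drops it. */
bool node_accepts(const struct flow *f, unsigned node, unsigned total_nodes)
{
    return flow_hash(f) % total_nodes + 1 == node;
}
```

Because the decision depends only on data every node sees, no coordination between nodes is needed in this sketch; the flip side is that if a node disappears, its share of packets goes unprocessed until the remaining nodes are reconfigured.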
yes, UDP can be lost (this is _not_ the ultra-reliable option being discussed in other threads), but I have measured multiple test runs of a billion packets being sent with this mechanism over a switch at Gig-E wire speed to several separate clusters receiving the logs, with no packet loss (on an early 4.1 version), so it's pretty reliable. > In my case I'm looking to distribute critical warning messages only, > which will be rare, and it wouldn't benefit the network to use > multicast, so I'd rather avoid the configuration overhead. that is simpler for your purpose. > See earlier messages in this thread for the details of the problem. The > summary is that syslog messages directed at a broadcast address > (x.x.x.255) fail to go anywhere on a Debian Etch box running sysklogd, > or an Ubuntu 8.04 box running sysklogd or rsyslog, but rsyslog on Ubuntu > 8.10 seems to work. I've backported that version of rsyslog to the 8.04 > box, but it didn't resolve the problem. > > Inspecting the source code for the working version of rsyslog shows a > lack of code to enable the broadcast flag, so I'm not sure why it works > on 8.10. Patching the code to enable the broadcast flag didn't seem to > help on 8.04. > > I know broadcast UDP packets work on the 8.04 box, as it uses DHCP > successfully. > > I've started looking at the source to a DHCP client to see how it > configures its socket to permit broadcasting. One thing I haven't tried > recently is using the interface defined broadcast address > (255.255.255.255), which is what the DHCP client uses. I tried that > early on, but once I confirmed that the subnet broadcast address worked > correctly on other machines, I ceased trying the global address. that (255.255.255.255, the limited broadcast address) is a bit different from the broadcast address for your network. David Lang > My guess is that there is some socket flag that still needs to be > enabled to get it to work on 8.04.
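For reference, this is how a UDP sender is normally permitted to transmit to a broadcast address with the BSD sockets API. On Linux, sendto() to an address the kernel routes as broadcast fails with EACCES unless SO_BROADCAST has been set on the socket. The sketch below is not rsyslog's forwarding code; open_broadcast_socket() and send_syslog_udp() are hypothetical names, and port 514 is the conventional syslog/UDP port.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a UDP socket that is allowed to send to broadcast addresses. */
int open_broadcast_socket(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    int on = 1;
    /* Without this option, Linux rejects sendto() to a broadcast address with EACCES. */
    if (setsockopt(fd, SOL_SOCKET, SO_BROADCAST, &on, sizeof on) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Send one already formatted syslog line (e.g. "<12>host tag: text") to addr:514.
 * Returns the number of bytes sent, or -1 on a bad address or send error. */
ssize_t send_syslog_udp(int fd, const char *dotted_addr, const char *line)
{
    struct sockaddr_in dst;
    memset(&dst, 0, sizeof dst);
    dst.sin_family = AF_INET;
    dst.sin_port = htons(514);
    if (inet_pton(AF_INET, dotted_addr, &dst.sin_addr) != 1)
        return -1;
    return sendto(fd, line, strlen(line), 0, (struct sockaddr *)&dst, sizeof dst);
}
```

A call such as send_syslog_udp(fd, "192.168.1.255", "<12>myhost test: disk almost full") would then be accepted by the kernel. If it still fails, checking errno after sendto() distinguishes a missing SO_BROADCAST (EACCES) from a routing problem.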
From rgerhards at hq.adiscon.com Wed May 6 07:18:20 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Wed, 6 May 2009 07:18:20 +0200 Subject: [rsyslog] output plugin calling interface References: <1241023853.25612.11.camel@rf10up.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AFE4@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B001@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B003@GRFEXC.intern.adiscon.com><1241114672.25612.14.camel@rf10up.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B006@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B008@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B00B@GRFEXC.intern.adiscon.com> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702B028@GRFEXC.intern.adiscon.com> > -----Original Message----- > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > bounces at lists.adiscon.com] On Behalf Of david at lang.hm > Sent: Friday, May 01, 2009 10:14 PM > To: rsyslog-users > Subject: Re: [rsyslog] output plugin calling interface > > As a side-note, I have also identified that we have overlooked a subtle > issue > > so far: backup actions - they need to work on the subset of the batch that > > had message permanent failures. So the message state actually needs to be > > part of the message inside the batch. But now, I think, things really begin > > to come together and are far less complex than initially thought. > > > > One problem with the state chart - that was why I said it is not 100% > correct > > - is that it does not properly abstract batches vs. single messages. Both of > > them entangled in a way that I thought [;)] to be very complex. But if you > > model that with processing states, then the batch processing state is simply > > a function of the individual message processing states. > > > > Please let me know if you also find a math model useful (but I'll probably > > need to do it in any case, because it helps me clean up my mind...). 
> > I think it will help clarify things a lot. with a good model we won't have > misunderstandings about what we are talking about. I have put an interim version of my model online: http://www.rsyslog.com/download/design.pdf Note that it is far from being ready, and some things are not as clear (or even correct) as I would like to see them. Also, it currently contains definitions for the various objects, but not any processing information. My aim is to get a very clear (and dense!) picture of the objects and the relationships between them from which I am convinced we can find some simple processing functions (which may or may not be easily enough translated into code). Feedback is appreciated, but please keep in mind that the document is a moving target. I'll be working on it all day today so I hope to have a better version this afternoon. Rainer From david at lang.hm Wed May 6 13:57:39 2009 From: david at lang.hm (david at lang.hm) Date: Wed, 6 May 2009 04:57:39 -0700 (PDT) Subject: [rsyslog] output plugin calling interface In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702B028@GRFEXC.intern.adiscon.com> References: <1241023853.25612.11.camel@rf10up.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AFE4@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B001@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B003@GRFEXC.intern.adiscon.com><1241114672.25612.14.camel@rf10up.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B006@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B008@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B00B@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702B028@GRFEXC.intern.adiscon.com> Message-ID: On Wed, 6 May 2009, Rainer Gerhards wrote: > > I have put an interim version of my model online: > > http://www.rsyslog.com/download/design.pdf > > Note that it is far from being ready, and some things are not as clear (or > even correct) as I would like to see them.
Also, it currently contains > definitions for the various objects, but not any processing information. My > aim is to get a very clear (and dense!) picture of the objects and the > relationships between them from which I am convinced we can find some simple > processing functions (which may or may not be easily enough translated into > code). > > Feedback is appreciated, but please keep in mind that the document is a moving > target. I'll be working on it all day today so I hope to have a better > version this afternoon. thanks for sending this. unfortunately one question right away (and one that doesn't translate into text well :-( at the end of the first paragraph you say (cut-n-paste into text, symbols mangled) 'Often, objects Oi; i 2 N; i partition O, but this is not necessarily the case.' I'm not sure what is meant by this. I read that as "often, the set of objects identified by i, with i being a subset of N such that i <= |gothic O|, partition gothic O, but this is not necessarily the case". I am lost from the point N is introduced. I think that you are saying that a set of messages to be processed is frequently contiguous, but not always. in section 2 you say "be Sm={} the totally ordered set of message states". I think a better thing to say would be either "define Sm={} as the totally ordered set of message states" or "let Sm={} be the totally ordered set of message states". it may just be that I have been out of school for a long time and am forgetting definitions, but it seems to me that you are skipping the definitions of a bunch of stuff in section 2. I think I'm puzzling it out, but eventually it's something to clean up. you use the term 'tuple' a lot in this section, and i'm not sure it's always correct. to me a tuple is a set of two items, like coordinates (X,Y), but you use it in at least one case where you can have an arbitrary number of items in the list. I think that in many cases where you say tuple you really would be better off saying list.
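The quoted sentence appears to use "partition" in the usual set-theoretic sense. Assuming gothic O (written \mathfrak{O} below) stands for the set of all instances of one object type, and the O_i (i \in \mathbb{N}) for sets built from those instances, the O_i partition \mathfrak{O} exactly when they are non-empty, together cover \mathfrak{O}, and are pairwise disjoint:

```latex
O_i \neq \emptyset \ \text{for all } i, \qquad
\bigcup_{i \in \mathbb{N}} O_i = \mathfrak{O}, \qquad
O_i \cap O_j = \emptyset \ \text{for all } i \neq j
```

If, for example, the last condition fails, some object belongs to two of the O_i at the same time, which is one way the "not necessarily" case can arise.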
in section 3.11 you say "However, the action itself may also end the transaction and notify the caller." by this do you mean that the action may abort the transaction? or that the action could decide to commit (complete) the transaction? if you mean abort the transaction, this makes sense (essentially on any doAction() call the return code could be 'fatal error, transaction aborted' and the queue walker code would have to fail the entire batch and retry). if you mean allow it to decide to commit the transaction early if it chooses, this strikes me as a wrong thing to do. in the case of rsyslog (where we are committing a set of unrelated messages) it is not necessarily a fatal problem, but it just seems to complicate things with little benefit. also the fact that you use the term 'notify the caller' could be read to mean that it executes a callback function from the main system. in this case, I think it's simpler to just return an error code (which is the other way to read this). a bit further down you clarify this to mean that it may commit early. my initial reaction is that this is premature optimization. I see why you want to do this (the case of buffers filling up is a good example), but I think the complications that this produces (which you then touch on) are ugly enough that it may be worth the inefficiency of failing the transaction (and having to resubmit a partial batch) rather than dealing with the need to say things like 'the transaction succeeded, but this message failed' as part of a doAction() call. in defining what happens when a transaction is aborted you say 'However, not all outputs work on actually transactional destination.', I think the better way to say this (borrowing from the database terminology) is that not all outputs provide atomic transactions.
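The two readings of "end the transaction and notify the caller" can be made concrete as return codes that doAction() hands back to the queue walker. The names below (act_rc, walk_batch(), demo_do_action()) are hypothetical and not rsyslog's API. In this sketch ACT_ABORTED means every message not yet committed must be retried, while ACT_COMMITTED means the output committed early, so the current message and all earlier pending ones are done.

```c
#include <stddef.h>

/* Hypothetical results a doAction() call could report to its caller. */
enum act_rc {
    ACT_DEFERRED,   /* message buffered, not yet committed */
    ACT_COMMITTED,  /* output committed early: this and all earlier pending messages are done */
    ACT_ABORTED     /* transaction failed: every message not yet committed must be retried */
};

enum msg_state { MSG_PENDING, MSG_DONE, MSG_RETRY };

typedef enum act_rc (*do_action_fn)(size_t idx, void *ctx);

/* Walk one batch, deriving per-message states from the return codes.
 * Messages still pending at the end would be settled by endTransaction(),
 * which this sketch leaves out. Returns the number of messages marked done. */
size_t walk_batch(enum msg_state *st, size_t n, do_action_fn do_action, void *ctx)
{
    size_t first_uncommitted = 0, done = 0;

    for (size_t i = 0; i < n; i++) {
        st[i] = MSG_PENDING;
        switch (do_action(i, ctx)) {
        case ACT_DEFERRED:
            break;
        case ACT_COMMITTED:
            for (size_t j = first_uncommitted; j <= i; j++)
                st[j] = MSG_DONE;
            done += i + 1 - first_uncommitted;
            first_uncommitted = i + 1;
            break;
        case ACT_ABORTED:
            for (size_t j = first_uncommitted; j < n; j++)
                st[j] = MSG_RETRY;
            return done;
        }
    }
    return done;
}

/* Example output: commits after every `every` messages, fails at index `fail_at`. */
struct demo_out { size_t every, fail_at; };

enum act_rc demo_do_action(size_t idx, void *ctx)
{
    const struct demo_out *d = ctx;
    if (idx == d->fail_at)
        return ACT_ABORTED;
    return (idx + 1) % d->every == 0 ? ACT_COMMITTED : ACT_DEFERRED;
}
```

With an output that commits every third message and fails at index 7 of a 9-message batch, messages 0-5 end up done and 6-8 are marked for retry, which is the "the transaction succeeded, but this message failed" bookkeeping the caller would have to carry.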
you say "An output transaction is started by calling beginTransaction() either explicitly or implicitly by a call to doAction() without calling beginTransaction() before". I think it's a mistake to do both. pick one or the other and standardize on it. I see the value in letting the first doAction() call trigger the beginTransaction() in simplifying the calling code, but if the output module must handle this case, make that the standard way of doing things and eliminate the separate call of beginTransaction entirely. this says that either the entire transaction is submitted, or it all fails. I think that the optimization of allowing the endTransaction() call to return 'the first X succeeded, the rest failed' may be worth supporting. it's FAR simpler than the 'doAction() may trigger an endTransaction() transparently' that you were exploring earlier. in your transaction diagram, a successful retry should move to success, not ready (ready is only a state that the action will be in immediately after it starts, before it has processed anything). as such it's fairly meaningless and could probably be combined with success as 'all prior work is done and ready to start a new transaction'. you may also need a state 'not initialized' for startup, or you may just say that the initialization needs to be done before the queues are set up, so it's not relevant to this discussion. if you really do want doAction() to be able to finish a transaction and start another one, the state diagram will be far more complicated. one final note on locking: I expect that processing objects in the queue (marking them as pending, formatting them, and calling doAction() on them) is going to require some locking in the face of multiple worker threads (to prevent two threads from processing the same message). I see two ways of doing this.
processbatch(
    foreach message (up to limit or number in queue) {
        lock queue
        mark message pending
        unlock queue
        formatMessage()
        doAction()
    }
    endTransaction()
    lock queue
    foreach message {
        mark completed
    }
    unlock queue
)

processbatch(
    lock queue
    foreach message (up to limit or number in queue) {
        mark message pending
        formatMessage()
        doAction()
    }
    unlock queue
    endTransaction()
    lock queue
    foreach message {
        mark completed
    }
    unlock queue
)

I suspect that the overhead in manipulating the lock is high enough that the second approach will be a win (similar to the efficiencies that were gained in the UDP input module by letting it add multiple messages to the queue inside one lock). As such I am seeing significant value in making the doAction() call be lightweight under all conditions, which is an argument against having it do any more than necessary. I think that the beginTransaction() functionality is probably fast enough that adding that into doAction() is acceptable, but the endTransaction() functionality definitely has the potential to block for a substantial amount of time (and therefore must not happen while locks are held that could cause other threads to be waiting). except for your isolation of the output modules (which I have questioned elsewhere), it could even be a win to move the formatMessage() step out of the inner loop and have that called from the output module (doAction() would just pass the pointer to the queue object, the output module would remember them and do the formatting inside endTransaction()). this would still be thread-safe as the output module would only be reading the message queue. hmm, the more I think about this, the cleaner it seems to be. It would also delay the need to do any buffer allocation until the endTransaction() step.
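A minimal pthreads sketch of the second approach combined with deferred formatting. The queue lock is taken once per batch while messages are claimed and a cheap doAction() records their indices. Formatting and output then happen in endTransaction() with no lock held, and the lock is taken once more to mark the batch completed. The queue layout, QSIZE, MAX_BATCH, and all function names are assumptions for illustration; endTransaction() "outputs" here by filling a caller-supplied buffer.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define QSIZE 64      /* assumed queue capacity */
#define MAX_BATCH 8   /* assumed maximum batch size */

enum qstate { Q_READY, Q_PENDING, Q_DONE };

struct queue {
    pthread_mutex_t lock;
    const char *msg[QSIZE];
    enum qstate state[QSIZE];
    size_t count;
};

/* Fixed-size per-worker batch holding logical pointers (queue indices). */
struct batch {
    size_t idx[MAX_BATCH];
    size_t n;
};

void queue_init(struct queue *q, const char *const *msgs, size_t n)
{
    pthread_mutex_init(&q->lock, NULL);
    q->count = n < QSIZE ? n : QSIZE;
    for (size_t i = 0; i < q->count; i++) {
        q->msg[i] = msgs[i];
        q->state[i] = Q_READY;
    }
}

/* doAction(): cheap enough to call with the queue lock held. */
static void do_action(struct batch *b, size_t qidx)
{
    b->idx[b->n++] = qidx;
}

/* formatMessage(): writes into out (bufsize >= 1) and returns a pointer
 * to the terminating NUL so the caller can keep appending. */
static char *format_message(const char *fmt, const char *message, char *out, size_t bufsize)
{
    int n = snprintf(out, bufsize, fmt, message);
    if (n < 0) {
        *out = '\0';
        return out;
    }
    if ((size_t)n >= bufsize)
        n = (int)(bufsize - 1);   /* output was truncated */
    return out + n;
}

/* endTransaction(): all formatting and output happen here, with no lock held.
 * Output is simulated by filling out (outsize >= 1). */
static void end_transaction(const struct queue *q, const struct batch *b, char *out, size_t outsize)
{
    char *p = out;
    *p = '\0';
    for (size_t i = 0; i < b->n; i++) {
        size_t left = outsize - (size_t)(p - out);
        if (left <= 1)
            break;
        p = format_message("[%s]\n", q->msg[b->idx[i]], p, left);
    }
}

/* Second approach: one lock round-trip to claim the batch, none during output,
 * one more to mark the batch completed. Returns the number of messages handled. */
size_t process_batch(struct queue *q, char *out, size_t outsize)
{
    struct batch b = { .n = 0 };

    pthread_mutex_lock(&q->lock);
    for (size_t i = 0; i < q->count && b.n < MAX_BATCH; i++) {
        if (q->state[i] == Q_READY) {
            q->state[i] = Q_PENDING;   /* no other worker will claim it now */
            do_action(&b, i);
        }
    }
    pthread_mutex_unlock(&q->lock);

    end_transaction(q, &b, out, outsize);   /* may block; other workers keep going */

    pthread_mutex_lock(&q->lock);
    for (size_t i = 0; i < b.n; i++)
        q->state[b.idx[i]] = Q_DONE;
    pthread_mutex_unlock(&q->lock);
    return b.n;
}
```

The first approach from the pseudocode would instead lock and unlock around each claimed message. Like the text, this sketch relies on claimed messages not being modified while they are pending, which is what makes the unlocked reads in end_transaction() safe.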
this doesn't necessarily eliminate the peak memory usage (all threads could be in endTransaction at the same time), but it will significantly reduce the average memory usage (normally they _won't_ all be in endTransaction() with maximum size messages at the same time). there would be a fixed-size array (based on the max batch size) to track what messages are in the batch. I think this is already needed for the worker thread to know which messages to mark as completed when the transaction completes. doAction() would just put a pointer (logical pointer, not C memory pointer) to the message contents into the next slot in the array. there would be no need for a startTransaction. there would be a helper function, something like formatMessage(char *format, msg *message, char *output, int bufsize), that would format the message and write it to the output buffer; output would then point at the null at the end of the string. endTransaction() would do all the work. it would allocate the buffer needed (note that it would know the actual message sizes, so could allocate based on the actual amount of data involved), do the work that was planned for startTransaction() (probably putting boilerplate in the buffer), call formatMessage() for each message in the list, output the messages, free the buffers, and return. note that this assumes that when you have action queues, the action queues contain the same data that was in the main queue, and the formatting is done by the queue walker thread that called doAction(), not by the thread that walks the main queue and puts the messages in the action queue.
If that isn't the case (if instead, the action queues contain the formatted strings, not the raw messages), then this won't work, but we should have another discussion around this (including the potential advantages of single-instance-store if you have lots of action queues). David Lang From rgerhards at hq.adiscon.com Wed May 6 14:36:47 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Wed, 6 May 2009 14:36:47 +0200 Subject: [rsyslog] output plugin calling interface References: <1241023853.25612.11.camel@rf10up.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AFE4@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B001@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B003@GRFEXC.intern.adiscon.com><1241114672.25612.14.camel@rf10up.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B006@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B008@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B00B@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B028@GRFEXC.intern.adiscon.com> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702B031@GRFEXC.intern.adiscon.com> > -----Original Message----- > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > bounces at lists.adiscon.com] On Behalf Of david at lang.hm > Sent: Wednesday, May 06, 2009 1:58 PM > To: rsyslog-users > Subject: Re: [rsyslog] output plugin calling interface > > On Wed, 6 May 2009, Rainer Gerhards wrote: > > > > > I have put an interim version of my model online: > > > > http://www.rsyslog.com/download/design.pdf > > > > Note that it is far from being ready, and some things are not as clear (or > > even correct) as I would like to see them. > > Also, it currently contains > > definitions for the various objects, but not any processing information. My > > aim is to get a very clear (and dense!)
picture of the objects and the > > relationships between them from which I am convinced we can find some simple > > processing functions (which may or may not be easily enough translated into > > code). > > > > Feedback is appreciated, but please keep in mind that the document is a > moving > > target. I'll be working on it all day today so I hope to have a better > > version this afternoon. > > thanks for sending this. I have been updating it throughout the day, so right now a newer version than the one you downloaded may be available. > > unfortunately one question right away (and one that doesn't translate into > text well :-( > > at the end of the first paragraph you say (cut-n-paste into text, symbols > mangled) > > 'Often, objects Oi; i 2 N; i partition O, > but this is not necessarily the case.' > > I'm not sure what is meant by this > > I read that as > often > the set of objects identified by i, with i being a subset of N such that i <= > |gothic O| > partition gothic O, but this is not necessarily the case > I am lost from the point N is introduced It just describes i as an index. So, we have these "all-sets" gothic-o of all instances that exist in a running system. When you build sets O from these instances, they may or may not partition the all-sets. As I do not know how many of these O-sets exist, I use O_i to index them. So let's assume we have 5 instances of, e.g., a message inside rsyslog. Then we have 5 or fewer of the O-sets inside the system. With i, that simply means we have O_1, O_2, ..., O_5 sets. Often a message is in just one of these sets at a given time (the sets then partition the messages). Now replace "message" with any of the other objects. That means that while the all-set is often partitioned into sets, this need not be the case (one object may be in two O-sets at the same time). > > I think that you are saying that a set of messages to be processed are > frequently contiguous, but not always. What is important is that o is not "message", but "object".
Here, o can be anything, be it a message, an action, a queue, whatever. It's just a generic container. > > > > in section 2 you say > > be Sm={} the totally ordered set of message states > > I think a better thing to say would be either > > define Sm={} as the totally ordered set of message states > or > let Sm={} be the totally ordered set of message states > > > it may just be that I have been out of school for a long time and am > forgetting definitions, but it seems to me that you are skipping the > definitions of a bunch of stuff in section 2. I think I'm puzzling it > out, but eventually it's something to clean up. That's why I said it is preliminary ;) In fact, the current version has reworked the state sets considerably, but not yet in a consistent way. I am not even sure if the structure of section 2 will survive or be moved to some other parts. While moving it is tempting, it is nice to have everything condensed closely together; after all, that was the initial motivation. The sentences you quote above have also changed now. Besides that, I noticed that I am missing a couple of German <-> English translations, which add some extra problems ;) > you use the term 'tuple' a lot in this section, and i'm not sure it's > always correct. to me a tuple is a set of two items, that's an ordered pair... > like coordinates > (X,Y), but you use it in at least one case where you can have an arbitrary > number of items in the list. I think that in many cases where you say > tuple you really would be better off saying list. ... that's mostly the same term, but I have to admit that in the context I am using it, at least to me tuple sounds more intuitive. In code it isn't. At least Wikipedia seems to take the same position: http://en.wikipedia.org/wiki/Tuple > > > > in section 3.11 you say > "However, the action itself may also end the transaction and notify the > caller." by this do you mean that the action may abort the transaction?
> or that the action could decide to commit (complete) the transaction?

The latter case.

> if you mean abort the transaction, this makes sense (essentially on any doAction() call the return code could be 'fatal error, transaction aborted' and the queue walker code would have to fail the entire batch and retry). if you mean allow it to decide to commit the transaction early if it chooses, this strikes me as the wrong thing to do.

You need to think more broadly than databases. For example, the tcp forwarder NEEDS to commit every record, simply because it has no other way of doing things. Well, it may commit only after a given buffer size is reached, but it definitely cannot (or should not) wait until the caller is finished. Even if it waited until endTransaction() is called, it could only then move data to the actual output, where it then, right in the middle, may run into problems. It is far better if it can commit in between.

> in the case of rsyslog (where we are committing a set of unrelated messages) it is not necessarily a fatal problem, but it just seems to complicate things with little benefit.

It actually simplifies things, because we need not take different approaches to different types of output plugins.

> also the fact that you use the term 'notify the caller' could be read to mean that it executes a callback function from the main system. in this case, I think it's simpler to just return an error code (which is the other way to read this)

That's already in the newest draft [but other things are probably inconsistent in it].

> a bit further down you clarify this to mean that it may commit early. my initial reaction is that this is premature optimization.
> I see why you want to do this (the case of buffers filling up is a good example), but I think the complications that this produces (which you then touch on) are ugly enough that it may be worth the inefficiency of failing the transaction (and having to resubmit a partial batch) rather than dealing with the need to say things like 'the transaction succeeded, but this message failed' as part of a doAction() call.
>
> in defining what happens when a transaction is aborted you say 'However, not all outputs work on actually transactional destination.' I think the better way to say this (borrowing from the database terminology) is that not all outputs provide atomic transactions.
>
> you say
>
> An output transaction is started by calling beginTransaction() either explicitly or implicitly by a call to doAction() without calling beginTransaction() before
>
> I think it's a mistake to do both. pick one or the other and standardize on it. I see the value in letting the first doAction() call trigger the beginTransaction() in simplifying the calling code, but if the output module must handle this case, make that the standard way of doing things and eliminate the separate call of beginTransaction entirely.

Good point! Makes a lot of sense.

> this says that either the entire transaction is submitted, or it all fails. I think that the optimization of allowing the endTransaction() call to return 'the first X succeeded, the rest failed' may be worth supporting. it's FAR simpler than the 'doAction() may trigger an endTransaction() transparently' that you were exploring earlier.

I may be thinking wrong, but this sounds much more complex to me (again, do not think databases at this time). Just think about the buffering needs.

> in your transaction diagram, a successful retry should move to success, not ready (ready is only a state that the action will be in immediately after it starts, before it has processed anything.
> as such it's fairly meaningless and could probably be combined with success as 'all prior work is done and ready to start a new transaction'.

I just dumped the whole diagram. It still does not model action state correctly...

> you may also need a state 'not initialized' for startup, or you may just say that the initialization needs to be done before the queues are set up, so it's not relevant to this discussion.
>
> if you really do want doAction() to be able to finish a transaction and start another one, the state diagram will be far more complicated.
>
> one final note on locking: I expect that the process of processing objects in the queue (marking them as pending, formatting them, and calling doAction() on them) is going to require some locking in the face of multiple worker threads (to prevent two threads from processing the same message). I see two ways of doing this.
>
> processbatch(
>     foreach message (up to limit or number in queue){
>         lock queue
>         mark message pending
>         unlock queue
>         formatMessage()
>         doAction()
>     }
>     endTransaction()
>     lock queue
>     foreach message{
>         mark completed
>     }
>     unlock queue
> )
>
> processbatch(
>     lock queue
>     foreach message (up to limit or number in queue){
>         mark message pending
>         formatMessage()
>         doAction()
>     }
>     unlock queue
>     endTransaction()
>     lock queue
>     foreach message{
>         mark completed
>     }
>     unlock queue
> )

doAction cannot be called within the queue worker. A simple reason is that this does not support direct mode. Also, it would take far too long to complete. If we have infinite retries, it may sit for a day or more inside this call ;) [not precisely in that call, but in a loop surrounding it].

If we need to close the tiny potential message loss window (and you have not yet convinced me there is reason to do so; looking forward to your disk-queue results), we must still do a "dequeue pending", then process "as usual" and go over the batch again to actually remove messages from the queue.
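A minimal C sketch of the "dequeue pending, process as usual, then go over the batch again to remove" pattern described above, which also keeps the goal of taking the queue lock only twice per batch instead of once per message. Everything here (`queue_t`, `claim_batch()`, `complete_batch()`, `process_batch()`, the state names) is hypothetical and not rsyslog's actual queue code; failed actions and retries are left out, and message slots are assumed not to change while they are marked pending.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-entry states; not rsyslog's real queue layout. */
enum entry_state { ENTRY_READY, ENTRY_PENDING, ENTRY_DONE };

#define QUEUE_CAP 64

typedef struct {
    pthread_mutex_t lock;
    enum entry_state state[QUEUE_CAP];
    const char *msg[QUEUE_CAP];   /* assumed immutable while an entry is pending */
    size_t len;                   /* number of occupied slots */
} queue_t;

/* Stand-in for formatMessage() + doAction(); error/retry handling omitted. */
typedef void (*action_fn)(const char *msg, void *ctx);

/* First lock round trip: mark up to max READY entries PENDING so no other
 * worker can pick them up; their slot numbers are returned in idx[]. */
static size_t claim_batch(queue_t *q, size_t idx[], size_t max)
{
    size_t n = 0;
    pthread_mutex_lock(&q->lock);
    for (size_t i = 0; i < q->len && n < max; i++) {
        if (q->state[i] == ENTRY_READY) {
            q->state[i] = ENTRY_PENDING;
            idx[n++] = i;
        }
    }
    pthread_mutex_unlock(&q->lock);
    return n;
}

/* Second lock round trip: mark the whole batch completed. */
static void complete_batch(queue_t *q, const size_t idx[], size_t n)
{
    pthread_mutex_lock(&q->lock);
    for (size_t i = 0; i < n; i++) {
        assert(q->state[idx[i]] == ENTRY_PENDING);
        q->state[idx[i]] = ENTRY_DONE;
    }
    pthread_mutex_unlock(&q->lock);
}

/* Claim, run the (possibly slow) action with NO queue lock held, complete. */
static size_t process_batch(queue_t *q, size_t max, action_fn act, void *ctx)
{
    size_t idx[QUEUE_CAP];
    if (max > QUEUE_CAP)
        max = QUEUE_CAP;
    size_t n = claim_batch(q, idx, max);
    for (size_t i = 0; i < n; i++)
        act(q->msg[idx[i]], ctx);
    complete_batch(q, idx, n);
    return n;
}

/* Trivial demo action: counts the messages it is handed. */
static void count_action(const char *msg, void *ctx)
{
    (void)msg;
    ++*(size_t *)ctx;
}
```

The design choice is that the action runs between the two lock round trips, so a slow or retrying output never keeps other workers from claiming entries. The trade-off is that entries sit in the pending state while the action runs, so a persistent queue would have to decide what to do with pending entries after a crash.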
> I suspect that the overhead in manipulating the lock is high enough that the second approach will be a win (similar to the efficiencies that were gained in the UDP input module by letting it add multiple messages to the queue inside one lock).
>
> As such I am seeing significant value in making the doAction() call lightweight under all conditions, which is an argument against having it do any more than necessary.

We do not know what this may be. Again, don't think "database only".

> I think that the beginTransaction() functionality is probably fast enough that adding that into doAction() is acceptable, but the endTransaction() functionality definitely has the potential to block for a substantial amount of time (and therefore must not happen while locks are held that could cause other threads to be waiting).
>
> except for your isolation of the output modules (which I have questioned elsewhere), it could even be a win to move the formatMessage() step out of the inner loop and have it called from the output module (doAction() would just pass the pointer to the queue object, the output module would remember them and do the formatting inside endTransaction()). this would still be thread-safe, as the output module would only be reading the message queue.
>
> hmm, the more I think about this, the cleaner it seems to be.

The problem is that you are still looking from a different design ;) Why should the output do this if it can be provided with the already prepared information? At this stage in processing, no locking at all is involved (except for the action lock, which is required because the interface guarantees that actions are not called concurrently from multiple threads).

> It would also delay the need to do any buffer allocation until the endTransaction() step.
> this doesn't necessarily eliminate the peak memory usage (all threads could be in endTransaction at the same time), but it will significantly reduce the average memory usage (normally they _won't_ all be in endTransaction() with maximum-size messages at the same time).
>
> there would be a fixed-size array (based on the max batch size) to track what messages are in the batch. I think this is already needed for the worker thread to know which messages to mark as completed when the transaction completes.
>
> doAction() would just put a pointer (logical pointer, not C memory pointer) to the message contents into the next slot in the array.
>
> there would be no need for a startTransaction.
>
> there would be a helper function, something like
>
>     formatMessage(char *format, msg *message, char *output, int bufsize)
>
> that would format the message and write it to the output buffer; output would then point at the null at the end of the string.
>
> endTransaction() would do all the work. it would allocate the buffer needed (note that it would know the actual message sizes, so it could allocate based on the actual amount of data involved), do the work that was planned for startTransaction() (probably putting boilerplate in the buffer), call formatMessage() for each message in the list, output the messages, free the buffers, and return.

That complicates the programming of all non-db types of output plugins considerably. I don't think this is justified.

> note that this assumes that when you have action queues, the action queues contain the same data that was in the main queue, and the formatting is done by the queue walker thread that called doAction(), not by the thread that walks the main queue and puts the messages in the action queue.

That's correct.
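A minimal C sketch of the batch-array proposal quoted above, assuming a fixed-size batch array, a doAction() that only records a reference, and an endTransaction() that sizes one buffer from the actual message lengths, formats everything, hands it to the output and frees it. All names (`batch_t`, `batch_add()`, `batch_end()`, `format_message()`, `copy_sink()`, `MAX_BATCH`) are hypothetical; plain C strings stand in for the "logical pointers" to queue entries, and the formatter just appends a newline instead of applying a real template.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_BATCH 16                 /* hypothetical maximum batch size */

typedef struct {
    const char *msgs[MAX_BATCH];     /* references only; nothing copied yet */
    size_t count;
} batch_t;

/* Output step: receives the fully formatted batch (len excludes the NUL). */
typedef void (*emit_fn)(const char *buf, size_t len, void *ctx);

/* doAction() stand-in: record the message in the next free slot. */
static int batch_add(batch_t *b, const char *msg)
{
    if (b->count == MAX_BATCH)
        return -1;                   /* full: caller must end the transaction */
    b->msgs[b->count++] = msg;
    return 0;
}

/* Simplified formatMessage(): copy the text plus '\n' into out and return
 * a pointer just past what was written. */
static char *format_message(const char *msg, char *out)
{
    size_t n = strlen(msg);
    memcpy(out, msg, n);
    out[n] = '\n';
    return out + n + 1;
}

/* endTransaction() stand-in: allocate exactly what this batch needs,
 * format every message, emit, free, and reset the batch. */
static int batch_end(batch_t *b, emit_fn emit, void *ctx)
{
    size_t total = 0;
    for (size_t i = 0; i < b->count; i++)
        total += strlen(b->msgs[i]) + 1;

    char *buf = malloc(total + 1);
    if (buf == NULL)
        return -1;
    char *p = buf;
    for (size_t i = 0; i < b->count; i++)
        p = format_message(b->msgs[i], p);
    *p = '\0';
    assert((size_t)(p - buf) == total);

    emit(buf, total, ctx);
    free(buf);
    b->count = 0;
    return 0;
}

/* Demo output: copy the batch, including its NUL, into a caller buffer. */
static void copy_sink(const char *buf, size_t len, void *ctx)
{
    memcpy(ctx, buf, len + 1);
}
```

The sketch only shows where the allocation moves (to endTransaction(), sized from real data); it does not settle the objection in the reply that every non-database output would then have to be written this way.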
> If that isn't the case (if instead the action queues contain the formatted strings, not the raw messages), then this won't work, but we should have another discussion around this (including the potential advantages of single-instance-store if you have lots of action queues).

Rainer

From david at lang.hm Wed May 6 15:26:34 2009
From: david at lang.hm (david at lang.hm)
Date: Wed, 6 May 2009 06:26:34 -0700 (PDT)
Subject: [rsyslog] output plugin calling interface
In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702B031@GRFEXC.intern.adiscon.com>
References: <1241023853.25612.11.camel@rf10up.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AFE4@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B001@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B003@GRFEXC.intern.adiscon.com><1241114672.25612.14.camel@rf10up.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B006@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B008@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B00B@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B028@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702B031@GRFEXC.intern.adiscon.com>
Message-ID:

On Wed, 6 May 2009, Rainer Gerhards wrote:

>> -----Original Message-----
>> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-bounces at lists.adiscon.com] On Behalf Of david at lang.hm
>>
>> On Wed, 6 May 2009, Rainer Gerhards wrote:
>>
>>>
>>> I have put an interim version of my model online:
>>>
>>> http://www.rsyslog.com/download/design.pdf
>>>
>>> Note that it is far from being ready, and some things are not as clear (or even correct) as I would like to see them. Also, it currently contains definitions for the various objects, but not any processing information. My aim is to get a very clear (and dense!)
>>> picture of the objects and the relationships between them from where I am convinced we can find some simple processing functions (which may or may not be easily enough translated into code).
>>>
>>> Feedback is appreciated, but please keep in mind that the document is a moving target. I'll be working on it the full day today, so I hope to have a better version this afternoon.
>>
>> thanks for sending this.
>
> I have been updating it throughout the day, so right now another version than the one you downloaded may be available.
>>
>> unfortunately one question right away (and one that doesn't translate into text well :-(
>>
>> at the end of the first paragraph you say (cut-n-paste into text, symbols mangled)
>>
>> 'Often, objects Oi; i 2 N; i partition O, but this is not necessarily the case.'
>>
>> I'm not sure what is meant by this
>>
>> I read that as
>> often
>> the set of objects identified by i, with i being a subset of N such that i <= |gothic O|
>> partition gothic O, but this is not necessarily the case
>> I am lost from the point N is introduced
>
> It just describes i as an index. So, we have these "all-sets" gothic-O of all instances that exist in a running system. When you build sets O from these instances, they may or may not partition the all-sets. As I do not know how many of these O-sets exist, I use O_i to index them. So let's assume we have 5 instances of, e.g., a message inside rsyslog. Then we have 5 or fewer of the O-sets inside the system. With i, that simply means we have the sets O_1, O_2, ..., O_5. Often a message is in just one of these sets at a given time (the sets partition the messages). Now replace "message" with any of the other objects. That means that while the all-set is often partitioned into sets, this need not be the case (one object may be in two O-sets at the same time).
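Read as standard set theory, the explanation above amounts to the usual definition of a partition. A sketch in LaTeX, assuming $n$ index sets $O_1,\dots,O_n$ drawn from the all-set $\mathfrak{O}$; the notation actually used in design.pdf may differ:

```latex
\[
  O_i \subseteq \mathfrak{O}, \qquad i \in \{1, \dots, n\} \subset \mathbb{N}
\]
The $O_i$ partition $\mathfrak{O}$ if and only if
\[
  \bigcup_{i=1}^{n} O_i = \mathfrak{O},
  \qquad
  O_i \cap O_j = \emptyset \ \text{for all } i \neq j,
  \qquad
  O_i \neq \emptyset .
\]
With five messages $m_1, \dots, m_5$ and sets $O_1, \dots, O_5$, a message
$m_1 \in O_1 \cap O_2$ gives $O_1 \cap O_2 \neq \emptyset$, so the sets no
longer partition $\mathfrak{O}$: the ``not necessarily the case'' situation.
```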
Ok, makes sense.

>> in section 2 you say
>>
>> be Sm={} the totally ordered set of message states
>>
>> I think a better thing to say would be either
>>
>> define Sm={} as the totally ordered set of message states
>> or
>> let Sm={} be the totally ordered set of message states
>>
>> it may just be that I have been out of school for a long time and am forgetting definitions, but it seems to me that you are skipping the definitions of a bunch of stuff in section 2. I think I'm puzzling it out, but eventually it's something to clean up.
>
> That's why I said it is preliminary ;)
>
> In fact, the current version has changed the state sets a lot, but not yet in a consistent way. I am not even sure whether section 2 will survive in its current structure or be moved to other parts. While moving it is tempting, it is nice to have everything very condensed and close together; after all, that was the initial motivation.
>
> The sentences you quote above have also changed now. Besides that, I noticed that I am missing a couple of German <-> English translations, which add some extra problems ;)
>
>> you use the term 'tuple' a lot in this section, and I'm not sure it's always correct. to me a tuple is a set of two items,
>
> that's an ordered pair...
>
>> like coordinates (X,Y), but you use it in at least one case where you can have an arbitrary number of items in the list. I think that in many cases where you say tuple you really would be better off saying list.
>
> ... that's mostly the same term, but I have to admit that in the context I am using it, tuple sounds more intuitive, at least to me. In code it isn't. At least Wikipedia seems to take the same position:
>
> http://en.wikipedia.org/wiki/Tuple

now that I think about it, I've seen people use n-tuple to refer to arbitrarily wide sets. it just struck me as odd.

>> in section 3.11 you say "However, the action itself may also end the transaction and notify the caller."
>> by this do you mean that the action may abort the transaction? or that the action could decide to commit (complete) the transaction?
>
> The latter case.
>
>> if you mean abort the transaction, this makes sense (essentially on any doAction() call the return code could be 'fatal error, transaction aborted' and the queue walker code would have to fail the entire batch and retry). if you mean allow it to decide to commit the transaction early if it chooses, this strikes me as the wrong thing to do.
>
> You need to think more broadly than databases. For example, the tcp forwarder NEEDS to commit every record, simply because it has no other way of doing things. Well, it may commit only after a given buffer size is reached, but it definitely cannot (or should not) wait until the caller is finished. Even if it waited until endTransaction() is called, it could only then move data to the actual output, where it then, right in the middle, may run into problems. It is far better if it can commit in between.

I'm not sure I agree with you on this. if you have lots of small messages, there are advantages to sending them all to the stack at once. you don't _have_ to do so (tcp will combine the messages while it's waiting to send them out), but you could waste bandwidth with small packets if the data comes in slowly enough.

>> in the case of rsyslog (where we are committing a set of unrelated messages) it is not necessarily a fatal problem, but it just seems to complicate things with little benefit.
>
> It actually simplifies things, because we need not take different approaches to different types of output plugins.

I'm seeing it the other way around: this complicates things by creating different ways for the transaction to be committed.
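To make the disputed "the action may commit on its own" semantics concrete, here is a minimal C sketch of a buffering forwarder in the spirit of the tcp example: doAction() appends to a fixed buffer and, when the next record does not fit, flushes first and reports that everything submitted earlier is now committed; endTransaction() flushes the rest. The return codes, `fwd_t` and the function names are hypothetical, records are assumed to be already framed, and counters stand in for the actual send() on a socket.

```c
#include <assert.h>
#include <string.h>

#define OUTBUF_SIZE 16   /* deliberately tiny so early commits are visible */

typedef struct {
    char buf[OUTBUF_SIZE];
    size_t used;
    size_t sent_bytes;   /* stand-in for data actually written to the socket */
    int flushes;
} fwd_t;

enum act_rc {
    ACT_DEFERRED,        /* record buffered; not committed yet */
    ACT_PREV_COMMITTED,  /* everything submitted before this record is committed */
    ACT_TOO_BIG          /* record can never fit into the buffer */
};

/* A real forwarder would send(f->buf, f->used) here, and that could fail. */
static void fwd_flush(fwd_t *f)
{
    if (f->used == 0)
        return;
    f->sent_bytes += f->used;
    f->flushes++;
    f->used = 0;
}

/* doAction() stand-in: commit early instead of waiting for the caller. */
static enum act_rc fwd_do_action(fwd_t *f, const char *rec)
{
    size_t n = strlen(rec);
    enum act_rc rc = ACT_DEFERRED;
    if (n > OUTBUF_SIZE)
        return ACT_TOO_BIG;
    if (f->used + n > OUTBUF_SIZE) {
        fwd_flush(f);
        rc = ACT_PREV_COMMITTED;
    }
    memcpy(f->buf + f->used, rec, n);
    f->used += n;
    assert(f->used <= OUTBUF_SIZE);
    return rc;
}

/* endTransaction() stand-in: commit whatever is still buffered. */
static void fwd_end_transaction(fwd_t *f)
{
    fwd_flush(f);
}
```

This also shows the complication raised in the reply: the caller has to interpret ACT_PREV_COMMITTED as "mark everything before this record as done", so a single batch can end up committed in several pieces.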
>> this says that either the entire transaction is submitted, or it all fails. I think that the optimization of allowing the endTransaction() call to return 'the first X succeeded, the rest failed' may be worth supporting. it's FAR simpler than the 'doAction() may trigger an endTransaction() transparently' that you were exploring earlier.
>
> I may be thinking wrong, but this sounds much more complex to me (again, do not think databases at this time). Just think about the buffering needs.

I'm not seeing how this increases the buffering needed.

>> if you really do want doAction() to be able to finish a transaction and start another one, the state diagram will be far more complicated.
>>
>> one final note on locking: I expect that the process of processing objects in the queue (marking them as pending, formatting them, and calling doAction() on them) is going to require some locking in the face of multiple worker threads (to prevent two threads from processing the same message). I see two ways of doing this.
>>
>> processbatch(
>>     foreach message (up to limit or number in queue){
>>         lock queue
>>         mark message pending
>>         unlock queue
>>         formatMessage()
>>         doAction()
>>     }
>>     endTransaction()
>>     lock queue
>>     foreach message{
>>         mark completed
>>     }
>>     unlock queue
>> )
>>
>> processbatch(
>>     lock queue
>>     foreach message (up to limit or number in queue){
>>         mark message pending
>>         formatMessage()
>>         doAction()
>>     }
>>     unlock queue
>>     endTransaction()
>>     lock queue
>>     foreach message{
>>         mark completed
>>     }
>>     unlock queue
>> )
>
> doAction cannot be called within the queue worker. A simple reason is that this does not support direct mode. Also, it would take far too long to complete. If we have infinite retries, it may sit for a day or more inside this call ;) [not precisely in that call, but in a loop surrounding it].

Ok, I am missing things again.
I had understood that doAction() _was_ called by the queue worker; that was how it passed the item to the output module code, and it would block if the doAction() call stalled (if you need to avoid it blocking, define an action queue).

> If we need to close the tiny potential message loss window (and you have not yet convinced me there is reason to do so; looking forward to your disk-queue results), we must still do a "dequeue pending", then process "as usual" and go over the batch again to actually remove messages from the queue.

yep, still pending. I had an emergency come up that, among other things, had me in the office from 4pm Saturday to 2pm Sunday :-(

>> I suspect that the overhead in manipulating the lock is high enough that the second approach will be a win (similar to the efficiencies that were gained in the UDP input module by letting it add multiple messages to the queue inside one lock).
>>
>> As such I am seeing significant value in making the doAction() call lightweight under all conditions, which is an argument against having it do any more than necessary.
>
> We do not know what this may be. Again, don't think "database only".

I am not. even if the action is writing to disk (with fsync), to a pipe, or calling an external program, it can take a significant amount of time. you say above that doAction could block for hours, so I am a bit confused here.
(the new version of the document may clear this up)

>> I think that the beginTransaction() functionality is probably fast enough that adding that into doAction() is acceptable, but the endTransaction() functionality definitely has the potential to block for a substantial amount of time (and therefore must not happen while locks are held that could cause other threads to be waiting).
>>
>> except for your isolation of the output modules (which I have questioned elsewhere), it could even be a win to move the formatMessage() step out of the inner loop and have it called from the output module (doAction() would just pass the pointer to the queue object, the output module would remember them and do the formatting inside endTransaction()). this would still be thread-safe, as the output module would only be reading the message queue.
>>
>> hmm, the more I think about this, the cleaner it seems to be.
>
> The problem is that you are still looking from a different design ;) Why should the output do this if it can be provided with the already prepared information? At this stage in processing, no locking at all is involved (except for the action lock, which is required because the interface guarantees that actions are not called concurrently from multiple threads).

if you don't have locking, what stops two worker threads from trying to de-queue the same message at the same time?

reasons that I see to not format the message before putting it in the action queue:

1. performance bottleneck: if the queue walker for the main queue needs to format the message, it can't be moving messages or testing filters. for the default setup where you just use the entire string this isn't that big a deal, but if you have a more complicated format (escaping characters, doing substrings, etc.) it can take substantially more time, and some of that time is spent just parsing the format string to figure out what you need to do.

2. locking efficiency: this could be part of #1. if you are doing a lot, then you want to drop and later re-acquire any locks you have so that other threads can get at the data; if you are doing very little, you can avoid that.

3. single-instance-store: if you don't change the message between the main queue and the action queue, you have the ability to have just one copy of the message contents (which you dynamically allocate space for), instead of one copy for each queue. since memory is significantly slower than the CPU, avoiding the need to copy the data can speed things up as well.

>> It would also delay the need to do any buffer allocation until the endTransaction() step. this doesn't necessarily eliminate the peak memory usage (all threads could be in endTransaction at the same time), but it will significantly reduce the average memory usage (normally they _won't_ all be in endTransaction() with maximum-size messages at the same time).
>>
>> there would be a fixed-size array (based on the max batch size) to track what messages are in the batch. I think this is already needed for the worker thread to know which messages to mark as completed when the transaction completes.
>>
>> doAction() would just put a pointer (logical pointer, not C memory pointer) to the message contents into the next slot in the array.
>>
>> there would be no need for a startTransaction.
>>
>> there would be a helper function, something like
>>
>>     formatMessage(char *format, msg *message, char *output, int bufsize)
>>
>> that would format the message and write it to the output buffer; output would then point at the null at the end of the string.
>>
>> endTransaction() would do all the work.
>> it would allocate the buffer needed (note that it would know the actual message sizes, so it could allocate based on the actual amount of data involved), do the work that was planned for startTransaction() (probably putting boilerplate in the buffer), call formatMessage() for each message in the list, output the messages, free the buffers, and return.
>
> That complicates the programming of all non-db types of output plugins considerably. I don't think this is justified.

I am not seeing what's db-specific here, but I'll think about it more.

>> note that this assumes that when you have action queues, the action queues contain the same data that was in the main queue, and the formatting is done by the queue walker thread that called doAction(), not by the thread that walks the main queue and puts the messages in the action queue.
>
> That's correct.

this seems to contradict what you said above.

as I understand it (this is part of the diagram I didn't get created yet )-:

rsyslog basically breaks down into the following pieces from a process/thread point of view

master thread (housekeeping, handling signals, etc.)

N input threads
    receive messages
    parse each message into msg structure
    put the parsed msg structure in the main queue

N main queue worker threads
    process messages in the main queue
    foreach message
        foreach filters
            if filter applies
                perform action (which is usually)
                    format the message
                    put formatted message into the action queue
                    if action queue mode is DIRECT call doAction()

N action queue worker threads
    process messages in the action queue
    foreach message
        call doAction()

From david at lang.hm Wed May 6 15:36:24 2009
From: david at lang.hm (david at lang.hm)
Date: Wed, 6 May 2009 06:36:24 -0700 (PDT)
Subject: [rsyslog] output plugin calling interface
In-Reply-To:
References: <1241023853.25612.11.camel@rf10up.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AFE4@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B001@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B003@GRFEXC.intern.adiscon.com><1241114672.25612.14.camel@rf10up.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B006@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B008@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B00B@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B028@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702B031@GRFEXC.intern.adiscon.com>
Message-ID:

> On Wed, 6 May 2009, Rainer Gerhards wrote:
>
>>
>> I have been updating it throughout the day, so right now another version than the one you downloaded may be available.

I just tried to download it again and it looks like I got the same file (including the flow diagram that I thought you said you dumped for now)

David Lang

From rgerhards at hq.adiscon.com Wed May 6 15:40:20 2009
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Wed, 6 May 2009 15:40:20 +0200
Subject: [rsyslog] threading - was: output plugin calling interface
References: <1241023853.25612.11.camel@rf10up.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AFE4@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B001@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B003@GRFEXC.intern.adiscon.com><1241114672.25612.14.camel@rf10up.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B006@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B008@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B00B@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B028@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B031@GRFEXC.intern.adiscon.com>
Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702B032@GRFEXC.intern.adiscon.com>

> as I understand it (this is part of the diagram I didn't get created yet )-:
Actually, I don't know how you could envision that other than by drawing a line wherever a queue exists in the diagrams that already exist. Honestly, I have no clue how else I could show this graphically...

> rsyslog basically breaks down into the following pieces from a process/thread point of view

But with words it goes better ;) Note that the outline below reflects the current multi-dequeue git branch; older versions (especially v3) are somewhat different:

master thread (housekeeping, handling signals, etc.)

N input threads
    receive messages
    put the message in the main queue

N main queue worker threads
    process messages in the main queue
    foreach message
        parse message
        foreach filters
            if filter applies
                put message into the action queue
                [if action queue mode is DIRECT call processAction()]

N action queue worker threads
    process messages in the action queue
    foreach message
        processAction

def processAction:
    format the message
    call doAction()

That's the very rough picture. If you take it together with the message flow diagrams, you'll see that a threading boundary exists wherever there is a queue.

Does that help?
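A single-threaded C sketch of the control flow in the outline above, assuming a substring match as the "filter" and a fixed "<...>" wrapper as the "template": the main-queue step either calls processAction() directly (DIRECT mode) or hands the unformatted message to the action's queue, and processAction() formats and then calls doAction(). All type and function names are hypothetical; the real threads, locks and queue implementation are left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define AQ_CAP 8

typedef enum { QMODE_DIRECT, QMODE_QUEUED } qmode_t;

/* Hypothetical action: a filter, a queue mode and a tiny action queue that
 * holds UNFORMATTED messages. */
typedef struct {
    const char *filter;             /* stand-in filter: substring match */
    qmode_t mode;
    const char *pending[AQ_CAP];    /* the action queue */
    size_t npending;
    int calls;                      /* how often doAction() ran */
    char last[64];                  /* last string handed to doAction() */
} action_t;

/* Stand-in for the output module's doAction(). */
static void do_action(action_t *a, const char *formatted)
{
    snprintf(a->last, sizeof a->last, "%s", formatted);
    a->calls++;
}

/* processAction(): format the message (fixed stand-in template), then
 * call doAction(). */
static void process_action(action_t *a, const char *msg)
{
    char formatted[64];
    snprintf(formatted, sizeof formatted, "<%s>", msg);
    do_action(a, formatted);
}

/* Main-queue worker step for one already-parsed message. */
static void dispatch(action_t *acts, size_t nacts, const char *msg)
{
    for (size_t i = 0; i < nacts; i++) {
        action_t *a = &acts[i];
        if (strstr(msg, a->filter) == NULL)
            continue;                         /* filter does not apply */
        if (a->mode == QMODE_DIRECT) {
            process_action(a, msg);           /* runs on this worker */
        } else {
            assert(a->npending < AQ_CAP);
            a->pending[a->npending++] = msg;  /* left for the action worker */
        }
    }
}

/* Action-queue worker: drain the queue, one processAction() per message. */
static void action_worker_run(action_t *a)
{
    for (size_t i = 0; i < a->npending; i++)
        process_action(a, a->pending[i]);
    a->npending = 0;
}
```

In this picture the formatting cost lands on whichever thread runs process_action(): the main-queue worker in DIRECT mode, the action-queue worker otherwise, which is the distinction the "performance bottleneck" argument earlier in the thread turns on.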
Rainer

From rgerhards at hq.adiscon.com Wed May 6 15:41:03 2009
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Wed, 6 May 2009 15:41:03 +0200
Subject: [rsyslog] output plugin calling interface
References: <1241023853.25612.11.camel@rf10up.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AFE4@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B001@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B003@GRFEXC.intern.adiscon.com><1241114672.25612.14.camel@rf10up.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B006@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B008@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B00B@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B028@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B031@GRFEXC.intern.adiscon.com> <alpine.DEB.1.10.0905060633570.5928@asgard>
Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702B033@GRFEXC.intern.adiscon.com>

I dumped it, but I did not yet create (or upload) a new one. I am still working on it ;)

> -----Original Message-----
> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-bounces at lists.adiscon.com] On Behalf Of david at lang.hm
> Sent: Wednesday, May 06, 2009 3:36 PM
> To: rsyslog-users
> Subject: Re: [rsyslog] output plugin calling interface
>
> > On Wed, 6 May 2009, Rainer Gerhards wrote:
> >
> >>
> >> I have been updating it throughout the day, so right now another version than the one you downloaded may be available.
>
> I just tried to download it again and it looks like I got the same file (including the flow diagram that I thought you said you dumped for now)
>
> David Lang
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com

From david at lang.hm Wed May 6 15:53:47 2009
From: david at lang.hm (david at lang.hm)
Date: Wed, 6 May 2009 06:53:47 -0700 (PDT)
Subject: [rsyslog] threading - was: output plugin calling interface
In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702B032@GRFEXC.intern.adiscon.com>
References: <9B6E2A8877C38245BFB15CC491A11DA702B032@GRFEXC.intern.adiscon.com>
Message-ID:

On Wed, 6 May 2009, Rainer Gerhards wrote:

>> as I understand it (this is part of the diagram I didn't get created yet )-:
>
> Actually, I don't know how you could envision that other than by drawing a line wherever a queue exists in the diagrams that already exist. Honestly, I have no clue how else I could show this graphically...
>
>> rsyslog basically breaks down into the following pieces from a process/thread point of view
>
> But with words it goes better ;) Note that the outline below reflects the current multi-dequeue git branch; older versions (especially v3) are somewhat different:

at this point I'm going to be lazy and just concentrate on the current version ;-)

> master thread (housekeeping, handling signals, etc.)
>
> N input threads
>     receive messages
>     put the message in the main queue
>
> N main queue worker threads
>     process messages in the main queue
>     foreach message
>         parse message
>         foreach filters
>             if filter applies
>                 put message into the action queue
>                 [if action queue mode is DIRECT call processAction()]
>
> N action queue worker threads
>     process messages in the action queue
>     foreach message
>         processAction
>
> def processAction:
>     format the message
>     call doAction()
>
> That's the very rough picture.
> If you take it together with the message flow
> diagrams, you'll see that a threading boundary is wherever a queue is.

mostly, but not completely (just enough mismatch to cause confusion for me)

> Does that help?

yes, it helps a _lot_.

I think that in most cases where I have said doAction() in the last week,
a better fit for what I intended would have been processAction().

I'm about to head into the office, and will do more once I get there.

David Lang

From david at lang.hm Wed May 6 15:54:08 2009
From: david at lang.hm (david at lang.hm)
Date: Wed, 6 May 2009 06:54:08 -0700 (PDT)
Subject: [rsyslog] output plugin calling interface
In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702B033@GRFEXC.intern.adiscon.com>
References: <9B6E2A8877C38245BFB15CC491A11DA702B033@GRFEXC.intern.adiscon.com>
Message-ID:

On Wed, 6 May 2009, Rainer Gerhards wrote:

> I dumped it, but I did not yet create (or upload) a new one. I am still
> working on it ;)

no problem.

David Lang

From rgerhards at hq.adiscon.com Wed May 6 15:56:20 2009
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Wed, 6 May 2009 15:56:20 +0200
Subject: [rsyslog] output plugin calling interface
References: <9B6E2A8877C38245BFB15CC491A11DA702B033@GRFEXC.intern.adiscon.com>
Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702B034@GRFEXC.intern.adiscon.com>

Actually, I am right now writing answers to email ;), which is a good thing,
as this may save me some editing cycles. But it explains why the document
doesn't advance ;)

> -----Original Message-----
> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-bounces at lists.adiscon.com] On Behalf Of david at lang.hm
> Sent: Wednesday, May 06, 2009 3:54 PM
> To: rsyslog-users
> Subject: Re: [rsyslog] output plugin calling interface
>
> no problem.
>
> David Lang

From rgerhards at hq.adiscon.com Wed May 6 15:58:25 2009
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Wed, 6 May 2009 15:58:25 +0200
Subject: [rsyslog] threading - was: output plugin calling interface
References: <9B6E2A8877C38245BFB15CC491A11DA702B032@GRFEXC.intern.adiscon.com>
Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702B035@GRFEXC.intern.adiscon.com>

> -----Original Message-----
> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-bounces at lists.adiscon.com] On Behalf Of david at lang.hm
> Sent: Wednesday, May 06, 2009 3:54 PM
> To: rsyslog-users
> Subject: Re: [rsyslog] threading - was: output plugin calling interface
>
> I think that in most cases where I have said doAction() in the last week,
> a better fit for what I intended would have been processAction()

We must be very careful here (and I'll check if I wasn't careful enough...).
processAction() is generic core engine code. doAction() is provided by the
plugin.
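A minimal C sketch of the split Rainer describes: processAction() is core engine code that formats the message and then calls the plugin-supplied doAction() with the finished string. Every type and signature below (msg_t, action_t, the doAction function pointer, the in-memory example plugin) is a hypothetical simplification for illustration, not rsyslog's actual plugin interface.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, heavily simplified message object (not rsyslog's msg_t). */
typedef struct {
    const char *hostname;
    const char *text;
} msg_t;

/* Hypothetical plugin entry point: it only ever sees a formatted string. */
typedef int (*doAction_fn)(const char *formatted, void *pluginData);

typedef struct {
    doAction_fn doAction;   /* supplied by the output plugin */
    void *pluginData;       /* per-instance plugin state */
} action_t;

/* Core engine side: build the output string, then hand it to the plugin. */
static int processAction(action_t *action, const msg_t *msg)
{
    char buf[256];
    int n = snprintf(buf, sizeof(buf), "%s %s\n", msg->hostname, msg->text);
    assert(n >= 0);         /* snprintf reports encoding errors as negative */
    return action->doAction(buf, action->pluginData);
}

/* Example plugin: appends each formatted line to an in-memory buffer. */
typedef struct {
    char out[1024];
    size_t used;
} memOutput_t;

static int memDoAction(const char *formatted, void *pluginData)
{
    memOutput_t *o = pluginData;
    size_t len = strlen(formatted);
    if (o->used + len >= sizeof(o->out))
        return -1;          /* a real engine would suspend and retry */
    memcpy(o->out + o->used, formatted, len + 1);   /* copies the NUL too */
    o->used += len;
    return 0;
}
```

The plugin never touches the message object, so the engine is free to decide where the formatting happens (main queue worker, action queue worker, or the enqueuer in DIRECT mode).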
Rainer

From david at lang.hm Wed May 6 16:05:45 2009
From: david at lang.hm (david at lang.hm)
Date: Wed, 6 May 2009 07:05:45 -0700 (PDT)
Subject: [rsyslog] threading - was: output plugin calling interface
In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702B035@GRFEXC.intern.adiscon.com>
References: <9B6E2A8877C38245BFB15CC491A11DA702B032@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702B035@GRFEXC.intern.adiscon.com>
Message-ID:

On Wed, 6 May 2009, Rainer Gerhards wrote:

>> I think that in most cases where I have said doAction() in the last week,
>> a better fit for what I intended would have been processAction()
>
> We must be very careful here (and I'll check if I wasn't careful enough...).
> processAction() is generic core engine code. doAction() is provided by the
> plugin.

clarifying one point from earlier: this seems to say that the data stored
in an action queue is exactly the same as the data stored in the main
queue (specifically, it's the msg structure, not the formatted string)

when processAction() runs, it then does the output formatting and passes
the formatted message to doAction()

is this correct?

David Lang

From rgerhards at hq.adiscon.com Wed May 6 16:51:22 2009
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Wed, 6 May 2009 16:51:22 +0200
Subject: [rsyslog] threading - was: output plugin calling interface
References: <9B6E2A8877C38245BFB15CC491A11DA702B032@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B035@GRFEXC.intern.adiscon.com>
Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702B036@GRFEXC.intern.adiscon.com>

> -----Original Message-----
> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-bounces at lists.adiscon.com] On Behalf Of david at lang.hm
> Sent: Wednesday, May 06, 2009 4:06 PM
> To: rsyslog-users
> Subject: Re: [rsyslog] threading - was: output plugin calling interface
>
> clarifying one point from earlier: this seems to say that the data stored
> in an action queue is exactly the same as the data stored in the main
> queue (specifically, it's the msg structure, not the formatted string)
>
> when processAction() runs, it then does the output formatting and passes
> the formatted message to doAction()
>
> is this correct?

Mostly ;) - it is a deep copy of the structure from the main message queue. A
deep copy is made because the rule engine may potentially change a message
object (not yet implemented). The copy also helps a lot with the subtle issues
that occur due to different queue types. Granted, it costs some memory, but
it pays back with greatly reduced complexity (we could do a copy-on-change, but
that's another, larger thing that would need to be looked at).

Rainer

From david at lang.hm Wed May 6 17:03:02 2009
From: david at lang.hm (david at lang.hm)
Date: Wed, 6 May 2009 08:03:02 -0700 (PDT)
Subject: [rsyslog] threading - was: output plugin calling interface
In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702B036@GRFEXC.intern.adiscon.com>
References: <9B6E2A8877C38245BFB15CC491A11DA702B032@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B035@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702B036@GRFEXC.intern.adiscon.com>
Message-ID:

On Wed, 6 May 2009, Rainer Gerhards wrote:

> Mostly ;) - it is a deep copy of the structure from the main message queue. A
> deep copy is made because the rule engine may potentially change a message
> object (not yet implemented). The copy also helps a lot with the subtle issues
> that occur due to different queue types. Granted, it costs some memory, but
> it pays back with greatly reduced complexity (we could do a copy-on-change, but
> that's another, larger thing that would need to be looked at).

ok, this makes sense (and explains why you don't do single-instance-store,
so you can ignore that portion of my prior comments)

one other question: how do dynafiles work?

as I see it (in my probably oversimplified view of things), the output
module needs to be passed the filename to use for a particular message, or
the output module needs to have access to the message structure and the
dynafile template to do its own processing.
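The deep-copy trade-off Rainer describes above can be sketched in a few lines of C: the action queue entry gets its own copies of the message buffers, so a later change to the main-queue object (for example by the rule engine) cannot affect what the action sees. The msg_t layout and the helper names (dupStr, msgDeepCopy, msgFree) are hypothetical; rsyslog's real message object has many more fields.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical message object that owns its string buffers. */
typedef struct {
    char *hostname;
    char *text;
} msg_t;

/* Portable strdup() replacement (strdup is POSIX, not ISO C before C23). */
static char *dupStr(const char *s)
{
    size_t len = strlen(s) + 1;
    char *p = malloc(len);
    if (p != NULL)
        memcpy(p, s, len);
    return p;
}

/* Deep copy: the action queue entry gets its own buffers, so later changes
 * to the main-queue object cannot leak into it (and vice versa). */
static msg_t *msgDeepCopy(const msg_t *src)
{
    assert(src != NULL);
    msg_t *dst = malloc(sizeof(*dst));
    if (dst == NULL)
        return NULL;
    dst->hostname = dupStr(src->hostname);
    dst->text = dupStr(src->text);
    if (dst->hostname == NULL || dst->text == NULL) {
        free(dst->hostname);    /* free(NULL) is a no-op */
        free(dst->text);
        free(dst);
        return NULL;
    }
    return dst;
}

static void msgFree(msg_t *m)
{
    if (m == NULL)
        return;
    free(m->hostname);
    free(m->text);
    free(m);
}
```

A single-instance store would instead let both queues point at one buffer (for example with a reference count). That saves memory and copying, but any later modification then needs copy-on-change handling, which is the extra complexity Rainer says the deep copy avoids.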
David Lang

From rgerhards at hq.adiscon.com Wed May 6 17:17:56 2009
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Wed, 6 May 2009 17:17:56 +0200
Subject: [rsyslog] threading - was: output plugin calling interface
References: <9B6E2A8877C38245BFB15CC491A11DA702B032@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B035@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B036@GRFEXC.intern.adiscon.com>
Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702B037@GRFEXC.intern.adiscon.com>

> -----Original Message-----
> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-bounces at lists.adiscon.com] On Behalf Of david at lang.hm
> Sent: Wednesday, May 06, 2009 5:03 PM
> To: rsyslog-users
> Subject: Re: [rsyslog] threading - was: output plugin calling interface
>
> ok, this makes sense (and explains why you don't do single-instance-store,
> so you can ignore that portion of my prior comments)

I had them on my mind ;)

>
> one other question: how do dynafiles work?
>
> as I see it (in my probably oversimplified view of things), the output
> module needs to be passed the filename to use for a particular message,

Exactly. Any output can request as many strings as it likes. Please note that
there is much value in that. If the output received just the plain structure,
an operator could no longer manipulate the individual values, because that is
done during string building.
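Rainer's answer (an output asks the engine for as many template-built strings as it needs, and one of them can be the dynafile name) might look roughly like the following sketch. It rests on stated assumptions: the "templates" are plain C functions rather than rsyslog template strings, the /var/log/remote/ path is invented, and the example plugin only records the line and file name instead of opening files.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_STRINGS 4
#define STR_SIZE 256

typedef struct {
    const char *hostname;
    const char *text;
} msg_t;

/* Hypothetical "template": renders one string from a message. */
typedef void (*template_fn)(const msg_t *msg, char *buf, size_t size);

static void tplLine(const msg_t *m, char *buf, size_t size)
{
    snprintf(buf, size, "%s %s\n", m->hostname, m->text);
}

/* Dynafile template: the file name is just another generated string. */
static void tplDynaFile(const msg_t *m, char *buf, size_t size)
{
    snprintf(buf, size, "/var/log/remote/%s.log", m->hostname);
}

typedef int (*doAction_fn)(char strings[][STR_SIZE], int nStrings, void *pData);

typedef struct {
    template_fn templates[MAX_STRINGS];  /* strings requested by the output */
    int nTemplates;
    doAction_fn doAction;
    void *pData;
} action_t;

/* Engine: build every requested string, then call the plugin once. */
static int processAction(action_t *a, const msg_t *m)
{
    char strings[MAX_STRINGS][STR_SIZE];
    assert(a->nTemplates >= 0 && a->nTemplates <= MAX_STRINGS);
    for (int i = 0; i < a->nTemplates; i++)
        a->templates[i](m, strings[i], STR_SIZE);
    return a->doAction(strings, a->nTemplates, a->pData);
}

/* Hypothetical file output: strings[0] is the line, strings[1] the file
 * name. It only records them; a real output would open or cache the file. */
typedef struct {
    char lastFile[STR_SIZE];
    char lastLine[STR_SIZE];
} fileOut_t;

static int fileDoAction(char strings[][STR_SIZE], int nStrings, void *pData)
{
    fileOut_t *f = pData;
    if (nStrings < 2)
        return -1;
    snprintf(f->lastLine, sizeof(f->lastLine), "%s", strings[0]);
    snprintf(f->lastFile, sizeof(f->lastFile), "%s", strings[1]);
    return 0;
}
```

Because the operator controls the templates, the output never needs to know how a file name is derived. Changing the dynafile rule means changing a template, not the plugin, which is the value Rainer points to.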
Rainer

> or
> the output module needs to have access to the message structure and the
> dynafile template to do its own processing.
>
> David Lang

From rgerhards at hq.adiscon.com Wed May 6 17:23:30 2009
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Wed, 6 May 2009 17:23:30 +0200
Subject: [rsyslog] output plugin calling interface
References: <1241023853.25612.11.camel@rf10up.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AFE4@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B001@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B003@GRFEXC.intern.adiscon.com><1241114672.25612.14.camel@rf10up.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B006@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B008@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B00B@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B028@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B031@GRFEXC.intern.adiscon.com>
Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702B038@GRFEXC.intern.adiscon.com>

> -----Original Message-----
> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-bounces at lists.adiscon.com] On Behalf Of david at lang.hm
> Sent: Wednesday, May 06, 2009 3:27 PM
> To: rsyslog-users
> Subject: Re: [rsyslog] output plugin calling interface
>
> >> in section 3.11 you say
> >> "However, the action itself may also end the transaction and notify the
> >> caller." By this, do you mean that the action may abort the transaction, or
> >> that the action could decide to commit (complete) the transaction?
> >
> > The latter case.
> >
> >> if you mean abort the transaction, this makes sense (essentially on any
> >> doAction() call the return code could be 'fatal error, transaction
> >> aborted' and the queue walker code would have to fail the entire batch and
> >> retry). If you mean allowing it to decide to commit the transaction early if
> >> it chooses, this strikes me as the wrong thing to do.
> >
> > You need to think broader than databases. For example, the tcp forwarder
> > NEEDS to commit every record, simply because it has no other way of doing
> > things. Well, it may commit only after a given buffer size, but it definitely
> > cannot (or should not) wait until the caller is finished. Even if it waited
> > until endTransaction() is called, it could only then move data to the actual
> > output, where it then - right in the middle - may see problems. It is far
> > better if it can commit in between.
>
> I'm not sure I agree with you on this. If you have lots of small messages,
> you do have advantages in sending them all to the stack at once. You don't
> _have_ to do so (tcp will combine the messages as it's waiting to send
> them out, but you could waste bandwidth with small packets if the data
> comes in slowly enough)

I agree with you on the performance. I disagree that this means the output
transaction and the transaction from the upper layer must exactly match.
Think RELP. Would that mean that a RELP window must always be as large as the
largest batch? OK, in this case I control the protocol and so could change
it. But what about an SNMP trap? Or an RFC 3195 conversation? They *have* a
different notion of a transaction. So the best thing I can do IMHO is permit
the output plugin to tell the engine when its transaction has finished. That's
no problem for the upper layer; it just needs to mark these messages as
committed. But it is simply impossible for all outputs to have the same idea
of a transaction as the upper layer may have.
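One way to picture an output that ends its own transactions: doAction() returns a status telling the engine whether the records handed over so far are committed, and the engine only has to remember the last committed position in its batch. The status names (ACT_DEFERRED, ACT_COMMITTED, ACT_ERROR), the three-record output buffer and the helper functions are all hypothetical, chosen only to illustrate the idea in this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes returned by doAction(). */
typedef enum {
    ACT_DEFERRED,   /* accepted, but not yet committed by the output */
    ACT_COMMITTED,  /* this record and all earlier deferred ones are committed */
    ACT_ERROR
} actStatus_t;

#define OUT_BUF_RECORDS 3   /* the output's own notion of a transaction */

typedef struct {
    int pending;            /* records buffered, not yet committed */
    int committedTotal;     /* records the output has flushed */
} output_t;

static actStatus_t doAction(output_t *o, const char *record)
{
    (void)record;           /* a real output would copy it into its buffer */
    o->pending++;
    if (o->pending == OUT_BUF_RECORDS) {   /* buffer full: commit on its own */
        o->committedTotal += o->pending;
        o->pending = 0;
        return ACT_COMMITTED;
    }
    return ACT_DEFERRED;
}

static actStatus_t endTransaction(output_t *o)
{
    o->committedTotal += o->pending;       /* flush whatever is left */
    o->pending = 0;
    return ACT_COMMITTED;
}

/* Engine side: returns how many batch entries are known to be committed. */
static int processBatch(output_t *o, const char *batch[], int n)
{
    assert(n >= 0);
    int committedUpTo = 0;  /* entries [0, committedUpTo) are committed */
    for (int i = 0; i < n; i++) {
        actStatus_t st = doAction(o, batch[i]);
        if (st == ACT_ERROR)
            return committedUpTo;          /* the rest would be retried */
        if (st == ACT_COMMITTED)
            committedUpTo = i + 1;
    }
    if (endTransaction(o) == ACT_COMMITTED)
        committedUpTo = n;
    return committedUpTo;
}
```

The engine never needs to know why the output committed early (full buffer, RELP window, protocol rules); it just marks everything up to committedUpTo as done.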
>
> >> in the case of rsyslog
> >> (where we are committing a set of unrelated messages) it is not necessarily
> >> a fatal problem, but it just seems to complicate things with little
> >> benefit.
> >
> > It actually simplifies things - because we need not take different approaches
> > to different types of output plugins.
>
> I'm seeing it the other way around: this complicates things by making
> there be different ways for the transaction to be committed.

OK, but what is complicated by that? Also think about third-party (already
existing) output modules: for those I would need either to define a totally
different output interface or to use the extensible one I described. Even if I
force a totally new interface, I still need to support non-transactional
outputs in the upper layers. But then I need different code paths to do that.

> >> this says that either the entire transaction is submitted, or it all
> >> fails. I think that the optimization of allowing the endTransaction() call
> >> to return 'the first X succeeded, the rest failed' may be worth supporting.
> >> it's FAR simpler than the 'doAction() may trigger an endTransaction()
> >> transparently' that you were exploring earlier.
> >
> > I may be thinking wrong, but this sounds much more complex to me (again, do
> > not think databases at this time). Just think about the buffering needs.
>
> I'm not seeing how this increases the buffering needed

That was related to the need to buffer to-be-processed messages until we
finally get an endTransaction().

> >> if you really do want doAction() to be able to finish a transaction and
> >> start another one, the state diagram will be far more complicated.
> >>
> >> one final note on locking: I expect that processing objects
> >> in the queue (marking them as pending, formatting them, and calling
> >> doAction() on them) is going to require some locking in the face of
> >> multiple worker threads (to prevent two threads from processing the same
> >> message). I see two ways of doing this.
> >>
> >> processbatch(
> >>     foreach message (up to limit or number in queue){
> >>         lock queue
> >>         mark message pending
> >>         unlock queue
> >>         formatMessage()
> >>         doAction()
> >>     }
> >>     endTransaction()
> >>     lock queue
> >>     foreach message{
> >>         mark completed
> >>     }
> >>     unlock queue
> >> )
> >>
> >> processbatch(
> >>     lock queue
> >>     foreach message (up to limit or number in queue){
> >>         mark message pending
> >>         formatMessage()
> >>         doAction()
> >>     }
> >>     unlock queue
> >>     endTransaction()
> >>     lock queue
> >>     foreach message{
> >>         mark completed
> >>     }
> >>     unlock queue
> >> )
> >
> > doAction cannot be called within the queue worker. A simple reason is that
> > this does not support direct mode. Also, it would take far too long to
> > complete. If we have infinite retries, it may sit for a day or more inside
> > this call ;) [not precisely in that call, but in a loop surrounding it].
>
> Ok, I am missing things again. I had understood that doAction() _was_
> called by the queue worker,

It depends on whether or not there is a queue worker. A direct queue does not
have one, so there it is called by the enqueuer (but you are right, you may say
"queue worker" as an abstraction). Still, the pseudocode above is very far from
how things work.
It is much more like this:

queueworker{
    lock queue
    dequeue batch
    unlock queue
    process batch
}

If I assume it is useful to make this ultra-reliable (and I still doubt it
is), that would change to something like this:

queueworker{
    lock queue
    mark previous batch as done
    dequeue batch & mark messages as being processed
    unlock queue
    process batch
}

Please note that the ultra-reliability looks rather simple in that pseudocode;
it is much harder in reality.

> that was how it passed the item to the output
> module code, and it would block if the doAction() call stalled (if you
> need to avoid it blocking, define an action queue)
>
> > If we need to close the tiny potential message loss window (and you have not
> > yet convinced me there is reason to do so - looking forward to your
> > disk-queue results), we must still do a "dequeue pending", then process "as
> > usual" and go over the batch again to actually remove messages from the
> > queue.
>
> yep, still pending. I had an emergency come up that, among other things,
> had me in the office from 4pm Saturday to 2pm Sunday :-( :-(
>
> >> I suspect that the overhead in manipulating the lock is high enough that
> >> the second approach will be a win (similar to the efficiencies that were
> >> gained in the UDP input module by letting it add multiple messages to the
> >> queue inside one lock).
> >>
> >> As such I am seeing significant value in making the doAction() call be
> >> lightweight under all conditions, which is an argument against having it
> >> do any more than necessary.
> >
> > We do not know what this may be. Again, don't think "database only".
>
> I am not. Even if the action is writing to disk (with fsync), to a pipe,
> or calling an external program, it can take a significant amount of time.
> You say above that doAction could block for hours, so I am a bit confused
> here. (the new version of the document may clear this up)

My sentence would better be phrased "cannot be called where you described it".
See my chart and you'll see that the queue mutex is never locked for an
extended period of time - not even in ultra-reliable mode (part of why that
mode is so hard to do). My understanding is that you work under the assumption
that doAction() is called during dequeueing. As we don't know what doAction()
will do - it may be very lengthy - it is strictly decoupled from anything that
holds the queue mutex (this also applies to v3 and is the base design). You may
take the word "base design" as a hint that probably the whole engine would
need to be rewritten if that design were to be changed. I do not see any
reason for such a change ;)

I guess our discussion circles around the point that I have not yet conveyed
the full picture of how things work. At least I have the "feeling" that you
have a different architecture in mind.

> >> I think that the beginTransaction()
> >> functionality is probably fast enough that adding that into doAction() is
> >> acceptable, but the endTransaction() functionality definitely has the
> >> potential to block for a substantial amount of time (and therefore must not
> >> happen while locks are held that could cause other threads to be waiting)
> >>
> >> except for your isolation of the output modules (which I have questioned
> >> elsewhere), it could even be a win to move the formatMessage() step out of
> >> the inner loop and have that called from the output module (doAction()
> >> would just pass the pointer to the queue object, the output module would
> >> remember them and do the formatting inside endTransaction()), this would
> >> still be thread-safe as the output module would only be reading the
> >> message queue.
> >>
> >> hmm, the more I think about this, the cleaner it seems to be.
> >
> > The problem is that you are still looking at it from a different design ;) Why
> > should the output do this if it can be provided with the already prepared
> > information? At this stage in processing, no locking at all is involved
At this stage in processing, no looking at all is involved > > (except for the action lock, which is required because the interface > > guarantees that actions are not called concurrently from multiple threads). > > if you don't have locking, what stops two worker threads from trying to > de-queue the same message at the same time? > > reasons that I see to not format the message before putting it in the > action queue. > > 1. performance bottleneck, if the queue walker for the main queue needs to > format the message, it can't be moving messages, testing filters. for the > default setup where you just use the entire string this isn't that big a > deal, but if you have a more complicated format (escaping characters, > doing substrings, etc) it can take substantially more time, some of that > time is spent just parsing the format string to figure out what you need > to do. > > 2. locking efficiancy this could be part of #1, if you are doing a lot > then you want to drop and later re-aquire any locks you have so that other > threads can get at the data, if you are doing very little then you can > avoid doing. > > 3. single-instance-store, if you don't change the message between the main > queue and the action queue you have the ability to just have one copy of > the message contents (which you dynamicly allocate space for), instead of > one copy for each queue. since memory is significantly slower than the > CPU, avoiding the need to copy the data can speed things up as well > > >> It would also delay the need to do any buffer allocation until the > >> endTransaction() step. 
this doesn't necessarily eliminate the peak memory > >> usage (all threads could be in endTransaction at the same time), but it > >> will significantly reduce the average memory usage (normally they _won't_ > >> all be in endTransaction() with maximum size messages at the same time) > >> > >> there would be a fixed size array (based on the max batch size) to > >> track what messages are in the batch. I think this is already needed for > >> the worker thread to know which messages to mark as completed when the > >> transaction completes > >> > >> doAction() would just put a pointer (logical pointer, not C memory > >> pointer) to the message contents into the next slot in the array. > >> > >> there would be no need for a startTransaction. > >> > >> there would be a helper function, something like > >> formatMessage(char *format, msg *message, char *output, int bufsize) > >> that would format the message and write it to the output buffer; output > >> would then point at the null at the end of the string > >> > >> endTransaction() would do all the work. it would allocate the buffer > >> needed (note that it would know the actual message sizes, so could > >> allocate based on the actual amount of data involved), do the work > >> that was planned for startTransaction() (probably putting > >> boilerplate in the buffer), call formatMessage() for each message in the > >> list, output the messages, free the buffers, and return > > > > That complicates the programming of all non-db types of output plugins > > considerably. I don't think this is justified. > > I am not seeing what's db specific here, but I'll think about it more. I hope I have addressed this with my explanations further above. If not, we need to dig deeper into generic outputs.
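David's proposal quoted above (doAction() only records a reference in a fixed-size array; endTransaction() sizes one buffer from the actual message lengths, formats every message into it with a formatMessage() helper, outputs, and frees) can be sketched in C roughly as follows. This is a minimal, hypothetical sketch, not rsyslog's interface: msg_t, batch_t, MAX_BATCH and the "<text>\n" format are assumptions, formatMessage() returns the position of the terminating NUL instead of updating the caller's pointer, and endTransaction() returns the buffer instead of writing it to a real destination so the result can be checked.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_BATCH 64 /* hypothetical maximum batch size */

/* Hypothetical message type: only the field this sketch needs. */
typedef struct {
    const char *text;
} msg_t;

/* Fixed-size array tracking which messages belong to the open batch. */
typedef struct {
    const msg_t *msgs[MAX_BATCH];
    int n;
} batch_t;

/* doAction(): only records a reference to the message, no formatting yet. */
int doAction(batch_t *b, const msg_t *m) {
    assert(m != NULL);
    if (b->n >= MAX_BATCH)
        return -1; /* batch full: the caller must end the transaction first */
    b->msgs[b->n++] = m;
    return 0;
}

/* Hypothetical helper: formats one message as "<text>\n" into out and
 * returns a pointer to the terminating NUL, where the next one can go. */
char *formatMessage(const msg_t *m, char *out, size_t bufsize) {
    int len = snprintf(out, bufsize, "%s\n", m->text);
    if (len < 0 || (size_t)len >= bufsize)
        return NULL; /* would not fit */
    return out + len;
}

/* endTransaction(): sizes the buffer from the actual message lengths,
 * formats every recorded message into it and resets the batch.
 * The caller owns (and must free) the returned buffer. */
char *endTransaction(batch_t *b) {
    size_t need = 1; /* final NUL */
    for (int i = 0; i < b->n; i++)
        need += strlen(b->msgs[i]->text) + 1; /* text plus '\n' */
    char *buf = malloc(need);
    if (buf == NULL)
        return NULL;
    char *p = buf;
    *p = '\0';
    for (int i = 0; i < b->n; i++) {
        p = formatMessage(b->msgs[i], p, need - (size_t)(p - buf));
        if (p == NULL) {
            free(buf);
            return NULL;
        }
    }
    b->n = 0;
    return buf;
}
```

Because the buffer is allocated only inside endTransaction(), memory is held just for the duration of the commit, which is the average-memory argument made in the quoted text.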
Rainer From david at lang.hm Wed May 6 18:06:32 2009 From: david at lang.hm (david at lang.hm) Date: Wed, 6 May 2009 09:06:32 -0700 (PDT) Subject: [rsyslog] output plugin calling interface In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702B038@GRFEXC.intern.adiscon.com> References: <9B6E2A8877C38245BFB15CC491A11DA702B038@GRFEXC.intern.adiscon.com> Message-ID: On Wed, 6 May 2009, Rainer Gerhards wrote: >> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- >> bounces at lists.adiscon.com] On Behalf Of david at lang.hm > >>>> in section 3.11 you say >>>> "However, the action itself may also end the transaction and notify the >>>> caller." by this do you mean that the action may abort the transaction? or >>>> that the action could decide to commit (complete) the transaction? >>> >>> The latter case. >>> >>>> >>>> if you mean abort the transaction, this makes sense (essentially on any >>>> doAction() call the return code could be 'fatal error, transaction >>>> aborted' and the queue walker code would have to fail the entire batch and >>>> retry). if you mean allow it to decide to commit the transaction early if >>>> it chooses, this strikes me as the wrong thing to do. >>> >>> You need to think broader than databases. For example, the tcp forwarder >>> NEEDS to commit every record, simply because it has no other way of doing >>> things. Well, it may commit only after a given buffer size, but it definitely >>> can not (or should not) wait until the caller is finished. Even if it waited >>> until endTransaction() is called, it could only then move data to the actual >>> output, where it then - right in the middle - may see problems. It is far >>> better if it can commit in between. >> >> I'm not sure I agree with you on this. if you have lots of small messages, >> there are advantages to sending them all to the stack at once.
you don't >> _have_ to do so (tcp will combine the messages as it's waiting to send >> them out, but you could waste bandwidth with small packets if the data >> comes in slowly enough) > > I agree with you on the performance. I disagree that this means the output > transaction and the transaction from the upper layer must exactly match. > Think RELP. Would that mean that a relp window must always be as large as the > largest batch? OK, in this case I control the protocol and so could change > it. But what about an SNMP trap? Or an rfc3195 conversation? They *have* > different notions of a transaction. So the best thing I can do IMHO is > permit the output plugin to tell the engine when its transaction was > finished. That's no problem for the upper layer, it just needs to mark these > messages as committed. But it is simply impossible for all outputs to have > the same idea of a transaction as the upper layer may have. I guess I see the upper layer having a larger definition of a batch than the lower level as a configuration error. not one to refuse to boot with, but one that could result in wasted effort. in the situations you describe another way of handling it would be to wait until the endTransaction() call and then return that it only succeeded with the first N messages (N being the number that fit its limit). that is less efficient than what you are proposing (in that the messages after the first N need to be resubmitted), but it simplifies things by avoiding the need to handle the unexpected commit (and buffering errors for the next call, etc). it ends up combining this situation with the 'disk full, only could write X messages' issue, and makes it so that transaction results only happen when you tell it to do a transaction. >> >>>> in the case of rsyslog >>>> (where we are committing a set of unrelated messages) it is not necessarily >>>> a fatal problem, but it just seems to complicate things with little >>>> benefit.
>>> >>> It actually simplifies things - because we need not take different approaches >>> to different types of output plugins. >> >> I'm seeing it as the other way around, this complicates things by making >> there be different ways for the transaction to be committed. > > OK, but what is complicated by that? Also think about third-party (already > existing) output modules, for which I need to either define a totally different > output interface or use the extensible one I described. Even if I > force a totally new interface, I still need to support non-transactional > outputs in the upper layers. But then I need different code paths to do > that. I was thinking about this, and am wondering if it's really a bad thing to separate the code paths. I'm thinking along the lines that if the module doesn't support the doTransaction method you set batch_size=1 and use the old doAction() interface. if it does support the doTransaction method you use that exclusively (the module internally may share code between doTransaction and doAction, and most will). as it is you already will have code paths that you only follow in the case where the module supports doTransaction; is it really simpler to have the two partially combined instead of keeping them separate? >>>> if you really do want doAction() to be able to finish a transaction and >>>> start another one, the state diagram will be far more complicated. >>>> >>>> one final note on locking, I expect that the process of processing objects >>>> in the queue (marking them as pending, formatting them, and calling >>>> doAction() on them), is going to require some locking in the face of >>>> multiple worker threads (to prevent two threads from processing the same >>>> message). I see two ways of doing this.
>>>> >>>> processbatch( >>>> foreach message (up to limit or number in queue){ >>>> lock queue >>>> mark message pending >>>> unlock queue >>>> formatMessage() >>>> doAction() >>>> } >>>> endTransaction() >>>> lock queue >>>> foreach message{ >>>> mark completed >>>> } >>>> unlock queue >>>> ) >>>> >>>> processbatch( >>>> lock queue >>>> foreach message (up to limit or number in queue){ >>>> mark message pending >>>> formatMessage() >>>> doAction() >>>> } >>>> unlock queue >>>> endTransaction() >>>> lock queue >>>> foreach message{ >>>> mark completed >>>> } >>>> unlock queue >>>> ) >>> >>> doAction cannot be called within the queue worker. A simple reason is that >>> this does not support direct mode. Also, it would take far too long to >>> complete. If we have infinite retries, it may sit for a day or more inside >>> this call ;) [not precisely in that call, but in a loop surrounding it]. >> >> Ok, I am missing things again. I had understood that doAction() _was_ >> called by the queue worker, > > It depends on whether or not there is a queue worker. A direct queue does not > have one, so here it is called by the enqueuer (but you are right, you may > say "queue worker" as an abstraction). Still, the pseudocode above is very far > from how things work. It is much more like this: > > queueworker{ > lock queue > dequeue batch > unlock queue > process batch > } > > If I assume it is useful to make this ultra-reliable (and I still doubt it > is), that would change to something like this: > > queueworker{ > lock queue > mark previous batch as done > dequeue batch & mark messages as being processed > unlock queue > process batch > } > > Please note that the ultra-reliability looks rather simple in that > pseudocode, it is much harder in reality. Ok, makes sense. and by combining the marking of the prior batch as completed inside the same lock as starting the next batch you avoid one set of lock transactions compared to what I was thinking.
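The ultra-reliable queueworker loop Rainer outlines could look roughly like this in C. It is a single-threaded sketch under stated assumptions: the queue is a plain array without wrap-around, and lock_queue()/unlock_queue() are hypothetical stand-ins that only count lock acquisitions where real code would use a mutex such as pthread_mutex_lock(). What it shows is that one locked section both marks the previous batch as done and claims the next batch, while the batch is processed (where the action would run) with the lock released.

```c
#include <assert.h>
#include <stddef.h>

#define QSIZE 256 /* hypothetical capacity; no wrap-around in this sketch */

enum msg_state { READY, IN_PROCESS, DONE };

typedef struct {
    enum msg_state state[QSIZE];
    size_t head, tail;   /* [head, tail) holds messages not yet dequeued */
    unsigned locks;      /* number of lock acquisitions, for illustration */
} queue_t;

typedef struct { size_t first, count; } batch_t;

/* Stand-ins for a real mutex: they only count acquisitions. */
static void lock_queue(queue_t *q)   { q->locks++; }
static void unlock_queue(queue_t *q) { (void)q; }

/* One worker iteration: a single locked section marks the previous batch as
 * done and claims the next one; the batch is then processed unlocked. */
static batch_t worker_iteration(queue_t *q, batch_t prev, size_t max_batch,
                                void (*process)(size_t idx)) {
    batch_t next;
    lock_queue(q);
    assert(q->tail <= QSIZE);
    for (size_t i = 0; i < prev.count; i++)   /* mark previous batch as done */
        q->state[prev.first + i] = DONE;
    next.first = q->head;                     /* dequeue batch ... */
    next.count = q->tail - q->head < max_batch ? q->tail - q->head : max_batch;
    for (size_t i = 0; i < next.count; i++)   /* ... & mark as being processed */
        q->state[next.first + i] = IN_PROCESS;
    q->head += next.count;
    unlock_queue(q);
    for (size_t i = 0; i < next.count; i++)   /* process batch without the lock */
        if (process != NULL)
            process(next.first + i);
    return next;
}
```

With 64 queued messages and a batch size of 32, three iterations (two batches plus one that only finalizes the second batch) take the lock three times in total.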
>>>> I suspect that the overhead in manipulating the lock is high enough that >>>> the second approach will be a win (similar to the efficiencies that were >>>> gained in the UDP input module by letting it add multiple messages to the >>>> queue inside one lock). >>>> >>>> As such I am seeing significant value in making the doAction() call be >>>> lightweight under all conditions, which is an argument against having it >>>> do any more than necessary. >>> >>> We do not know what this may be. Again, don't think "database only". >> >> I am not. even if the action is writing to disk (with fsync), to a pipe, >> or calling an external program it can take a significant amount of time. >> you say above that doAction could block for hours, so I am confused a bit >> here. (the new version of the document may clear this up) > > My sentence should better be phrased "can not be called where you described > it". See my chart and you'll see that the queue mutex is never locked for an > extended period of time - not even in ultra-reliable mode (part of why it is > so hard to do it). > > My understanding is that you work under the assumption that doAction() is > called during dequeueing. As we don't know what doAction() will do - it may > be very lengthy - it is strictly de-coupled from anything that holds the > queue mutex (this also applies to v3 and is the base design). You may take > the word "base design" as a hint that probably the whole engine would need to > be rewritten if that design is to be changed. I do not see any reasons for > such a change ;) > > I guess our discussion circles around the point that I did not yet convey the > full picture of how things work. At least I have the "feeling" that you have > a different architecture in mind.
yes, from prior discussions I was thinking that you were planning to have the queue worker iterate through the messages on the queue, calling doAction() for each one as it got to it, and then when it hits the limit doing endTransaction(). if you are instead going to do what you just described (pull the entire list of messages, then unlock the queue and process them) then it doesn't matter if the work is done in doAction() or endTransaction(). the question of whether it makes sense to have the batch mode be a completely separate codepath, or to have the two logical paths combined, is still a good one. thinking out loud here, how hard would it be to detect that there is no processBatch() method and create one that is a wrapper around the doAction() interface when you load the module, and then let all the code after that use the new interface? David Lang From rgerhards at hq.adiscon.com Wed May 6 18:14:50 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Wed, 6 May 2009 18:14:50 +0200 Subject: [rsyslog] output plugin calling interface References: <9B6E2A8877C38245BFB15CC491A11DA702B038@GRFEXC.intern.adiscon.com> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702B039@GRFEXC.intern.adiscon.com> One quick question: > -----Original Message----- > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > bounces at lists.adiscon.com] On Behalf Of david at lang.hm > Sent: Wednesday, May 06, 2009 6:07 PM > To: rsyslog-users > Subject: Re: [rsyslog] output plugin calling interface > > > I agree with you on the performance. I disagree that this means the output > > transaction and the transaction from the upper layer must exactly match. > > Think RELP. Would that mean that a relp window must always be as large as the > > largest batch? OK, in this case I control the protocol and so could change > > it. But what about an SNMP trap? Or an rfc3195 conversation? They *have* > > different notions of a transaction.
So the best thing I can do IMHO is > > permit the output plugin to tell the engine when its transaction was > > finished. That's no problem for the upper layer, it just needs to mark these > > messages as committed. But it is simply impossible for all outputs to have > > the same idea of a transaction as the upper layer may have. > > I guess I see the upper layer having a larger definition of a batch than > the lower level as a configuration error. The lower level limit is not a config setting but rather something that "happens to be that way". RELP by design uses a sliding window, so the "lower-level" batch is a moving target. Same for many other outputs. > not one to refuse to boot with, but one that could result in wasted > effort. > > in the situations you describe another way of handling it would be to wait > until the endTransaction() call and then return that it only succeeded with > the first N messages (N being the number that fit its limit) > > that is less efficient than what you are proposing (in that the messages > after the first N need to be resubmitted), but it simplifies things by avoiding the need > to handle the unexpected commit (and buffering errors for the next > call, etc) And now my core question: where do you think anything could be simplified? It is very easy to mark messages as committed (inside the batch), but it is comparatively complex to do the backup processing. Why do that? Let's say we have a batch size of 1,000 and we run relp forwarding. So why always have the full batch submitted just to come back and tell "I've done 100 and discarded the rest" - and this nine times... The complexity is *not* marking messages as committed inside the batch, the complexity is keeping already dequeued messages in a "being processed" state inside the queue. But these are two totally different topics.
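To put numbers on Rainer's RELP example, here is a small model of the "accept the first N, resubmit the rest" scheme. It assumes, hypothetically, a fixed window of 100 messages per round; a real RELP window slides, so this only approximates the overhead. The function name is illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the resubmit scheme: each round the engine hands the
 * whole remaining batch to the output, the output accepts only the first
 * `window` messages and reports that count, and the engine resubmits the
 * rest. Returns the total number of message submissions; *rounds receives
 * the number of endTransaction() rounds. */
size_t submissions_with_resubmit(size_t batch, size_t window, size_t *rounds) {
    assert(window > 0);
    size_t total = 0, remaining = batch;
    *rounds = 0;
    while (remaining > 0) {
        total += remaining;                               /* whole remainder handed over */
        size_t done = remaining < window ? remaining : window;
        remaining -= done;                                /* output accepted `done` */
        (*rounds)++;
    }
    return total;
}
```

With a batch of 1,000 and a window of 100 this gives 10 rounds (nine partial ones plus a final one) and 1,000 + 900 + ... + 100 = 5,500 message submissions, against 1,000 if the output is allowed to commit on its own as it goes.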
Rainer From rgerhards at hq.adiscon.com Wed May 6 18:28:52 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Wed, 6 May 2009 18:28:52 +0200 Subject: [rsyslog] output plugin calling interface References: <9B6E2A8877C38245BFB15CC491A11DA702B038@GRFEXC.intern.adiscon.com> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702B03A@GRFEXC.intern.adiscon.com> > >>>> in the case of rsyslog > >>>> (where we are committing a set of unrelated messages) it is not necessarily > >>>> a fatal problem, but it just seems to complicate things with little > >>>> benefit. > >>> > >>> It actually simplifies things - because we need not take different approaches > >>> to different types of output plugins. > >> > >> I'm seeing it as the other way around, this complicates things by making > >> there be different ways for the transaction to be committed. > > > > OK, but what is complicated by that? Also think about third-party (already > > existing) output modules, for which I need to either define a totally different > > output interface or use the extensible one I described. Even if I > > force a totally new interface, I still need to support non-transactional > > outputs in the upper layers. But then I need different code paths to do > > that. > > I was thinking about this, and am wondering if it's really a bad thing to > separate the code paths. > > I'm thinking along the lines that if the module doesn't support the > doTransaction method you set batch_size=1 and use the old doAction() > interface. Oh, that's a *very serious* performance hit. Because batches not only support transactions but are the defining factor for the number of queue lock operations on a busy system. So with a batch size of 32, you do 1 lock. With a size of 1, you do 32 locks - both to process 32 messages. And now consider the usual config, where almost everything runs in direct mode. So if at least one output does not support batches, the main message queue must run at batches of one.
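Rainer's lock arithmetic and its consequence for direct mode fit in a few lines of C. Both helpers are hypothetical. effective_batch() assumes that in direct mode the main queue can use no larger batch than the smallest batch any attached action accepts, with a legacy doAction()-only action counting as 1.

```c
#include <assert.h>
#include <stddef.h>

/* Queue lock round-trips needed to dequeue n messages when each round-trip
 * takes up to `batch` messages (ceiling division). */
static size_t lock_ops(size_t n, size_t batch) {
    assert(batch > 0);
    return (n + batch - 1) / batch;
}

/* Batch size the main queue can actually use in direct mode: the smallest
 * of the configured size and what every attached action accepts. */
static size_t effective_batch(const size_t *action_batch, size_t nactions,
                              size_t configured) {
    size_t b = configured;
    for (size_t i = 0; i < nactions; i++)
        if (action_batch[i] < b)
            b = action_batch[i];
    return b;
}
```

For 32 messages, a batch size of 32 needs 1 lock and a batch size of 1 needs 32; a single legacy action among batch-capable ones pulls the effective batch size down to 1.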
Plus, I need complex logic to detect all that. A unified interface is simple and elegant and does not show any of these problems. > > if it does support the doTransaction method you use that exclusively (the > module internally may share code between doTransaction and doAction, and > most will). What would that simplify? > > as it is you already will have code paths that you only follow in the case > where the module supports doTransaction; is it really simpler to have the > two partially combined instead of keeping them separate? These separate code paths will probably be a single "if" statement that does the endTransaction() call at the end of the batch. So it is actually a single statement and three lines of code. The rest, I think, is absolutely the same. > > If I assume it is useful to make this ultra-reliable (and I still doubt it > > is), that would change to something like this: > > > > queueworker{ > > lock queue > > mark previous batch as done > > dequeue batch & mark messages as being processed > > unlock queue > > process batch > > } > > > > Please note that the ultra-reliability looks rather simple in that > > pseudocode, it is much harder in reality. > > Ok, makes sense. and by combining the marking of the prior batch as completed > inside the same lock as starting the next batch you avoid one set of lock > transactions compared to what I was thinking. Yep, that's the idea. > > >>>> I suspect that the overhead in manipulating the lock is high enough that > >>>> the second approach will be a win (similar to the efficiencies that were > >>>> gained in the UDP input module by letting it add multiple messages to the > >>>> queue inside one lock). > >>>> > >>>> As such I am seeing significant value in making the doAction() call be > >>>> lightweight under all conditions, which is an argument against having it > >>>> do any more than necessary. > >>> > >>> We do not know what this may be. Again, don't think "database only". > >> > >> I am not.
even if the action is writing to disk (with fsync), to a pipe, > >> or calling an external program it can take a significant amount of time. > >> you say above that doAction could block for hours, so I am confused a bit > >> here. (the new version of the document may clear this up) > > > > My sentence should better be phrased "can not be called where you described > > it". See my chart and you'll see that the queue mutex is never locked for an > > extended period of time - not even in ultra-reliable mode (part of why it is > > so hard to do it). > > > > My understanding is that you work under the assumption that doAction() is > > called during dequeueing. As we don't know what doAction() will do - it may > > be very lengthy - it is strictly de-coupled from anything that holds the > > queue mutex (this also applies to v3 and is the base design). You may take > > the word "base design" as a hint that probably the whole engine would need to > > be rewritten if that design is to be changed. I do not see any reasons for > > such a change ;) > > > > I guess our discussion circles around the point that I did not yet convey the > > full picture of how things work. At least I have the "feeling" that you have > > a different architecture in mind. > > yes, from prior discussions I was thinking that you were planning to have > the queue worker iterate through the messages on the queue, calling > doAction() for each one as it got to it, and then when it hits the limit > doing endTransaction(). I am not sure if you overlooked that message: the new queue worker already *exists*. So you can look at it in actual code. Same for parts of the action interface (without the "real" transaction support).
http://git.adiscon.com/?p=rsyslog.git;a=blob;f=runtime/queue.c;h=c2df928b630334449d117aa191ec1d502525346b;hb=multi-dequeue#l1396 All of this, of course, is experimental and will not work if even the slightest error occurs somewhere in the system (and it may crash even if everything else runs fine ;)) > > if you are instead going to do what you just described (pull the entire > list of messages, then unlock the queue and process them) then it doesn't > matter if the work is done in doAction() or endTransaction(). > > the question of whether it makes sense to have the batch mode be a completely > separate codepath, or to have the two logical paths combined, is still a good > one. > > thinking out loud here, how hard would it be to detect that there is no > processBatch() method and create one that is a wrapper around the > doAction() interface when you load the module, and then let all the code > after that use the new interface? That sounds pretty much like what I am proposing above ;) You can consider this to be the wrapper: http://git.adiscon.com/?p=rsyslog.git;a=blob;f=action.c;h=928b30dc3c5cbb2f6af41cdf2ea02a3f1c14b06a;hb=multi-dequeue#l544 This is the too-simplistic version, but I hope it conveys the idea. The bottom line is that it is more complicated to create a real wrapper than to implement the interface in the way I described. I fail to see the complexity that you are so concerned about...
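David's load-time wrapper idea can be sketched as follows. All names here (module_t, processBatch, batchViaDoAction, normalizeModule) are hypothetical and are not taken from rsyslog's action.c linked above. The convention assumed is that processBatch() returns the number of messages it handled successfully.

```c
#include <assert.h>
#include <stddef.h>

typedef struct module module_t;

/* Hypothetical output-module entry points. */
struct module {
    int (*doAction)(module_t *mod, const char *msg);                 /* legacy: one message, 0 = ok */
    int (*processBatch)(module_t *mod, const char **msgs, size_t n); /* optional batch entry point */
    size_t calls;                                                    /* demo bookkeeping only */
};

/* Generic batch entry point for modules that only implement doAction():
 * feeds the messages one by one and stops at the first failure. */
static int batchViaDoAction(module_t *mod, const char **msgs, size_t n) {
    size_t i;
    for (i = 0; i < n; i++)
        if (mod->doAction(mod, msgs[i]) != 0)
            break;
    return (int)i;
}

/* Run once when the module is loaded; afterwards the engine only ever
 * calls processBatch(). */
static void normalizeModule(module_t *mod) {
    assert(mod->doAction != NULL || mod->processBatch != NULL);
    if (mod->processBatch == NULL)
        mod->processBatch = batchViaDoAction;
}

/* A legacy module that only knows doAction(). */
static int legacyDoAction(module_t *mod, const char *msg) {
    (void)msg;
    mod->calls++;
    return 0;
}
```

Because the wrapper is installed once at load time, the per-batch code never has to ask which kind of module it is dealing with; the cost is that a legacy module still sees one call per message.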
Rainer > > David Lang > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com From david at lang.hm Wed May 6 18:29:27 2009 From: david at lang.hm (david at lang.hm) Date: Wed, 6 May 2009 09:29:27 -0700 (PDT) Subject: [rsyslog] output plugin calling interface In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702B039@GRFEXC.intern.adiscon.com> References: <9B6E2A8877C38245BFB15CC491A11DA702B038@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702B039@GRFEXC.intern.adiscon.com> Message-ID: On Wed, 6 May 2009, Rainer Gerhards wrote: > One quick question: > >> -----Original Message----- >> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- >> bounces at lists.adiscon.com] On Behalf Of david at lang.hm >> Sent: Wednesday, May 06, 2009 6:07 PM >> To: rsyslog-users >> Subject: Re: [rsyslog] output plugin calling interface >> >>> I agree with you on the performance. I disagree that this means the output >>> transaction and the transaction from the upper layer must exactly match. >>> Think RELP. Would that mean that a relp window must always be as large as the >>> largest batch? OK, in this case I control the protocol and so could change >>> it. But what about an SNMP trap? Or an rfc3195 conversation? They *have* >>> different notions of a transaction. So the best thing I can do IMHO is >>> permit the output plugin to tell the engine when its transaction was >>> finished. That's no problem for the upper layer, it just needs to mark these >>> messages as committed. But it is simply impossible for all outputs to have >>> the same idea of a transaction as the upper layer may have. >> >> I guess I see the upper layer having a larger definition of a batch than >> the lower level as a configuration error. > > The lower level limit is not a config setting but rather something that > "happens to be that way".
RELP by design uses a sliding window, so the > "lower-level" batch is a moving target. Same for many other outputs. right, and if the admin sets the batch size higher than this limit it's a mistake. >> not one to refuse to boot with, but one that could result in wasted >> effort. >> >> in the situations you describe another way of handling it would be to wait >> until the endTransaction() call and then return that it only succeeded with >> the first N messages (N being the number that fit its limit) >> >> that is less efficient than what you are proposing (in that the messages >> after the first N need to be resubmitted), but it simplifies things by avoiding the need >> to handle the unexpected commit (and buffering errors for the next >> call, etc) > > And now my core question: where do you think anything could be simplified? It > is very easy to mark messages as committed (inside the batch), but it is > comparatively complex to do the backup processing. Why do that? Let's say we > have a batch size of 1,000 and we run relp forwarding. So why always have the > full batch submitted just to come back and tell "I've done 100 and discarded > the rest" - and this nine times... it's overhead, it costs performance, but I don't see it as adding complexity. I think this is because I see a need to have some partial retry processing no matter what, so if we have partial retry capability and it can handle this mode, then it is simpler to do so than it would be to have both the partial retry processing _and_ the output module issuing commits whenever it wants to. David Lang > The complexity is *not* marking messages as committed inside the batch, the > complexity is keeping already dequeued messages in a "being processed" state > inside the queue. But these are two totally different topics.
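For comparison with the resubmit approach, Rainer's auto-commit model might look like this on the engine side: the output reports for each message whether everything up to that point is now committed, and the engine simply advances a commit marker. The return codes and the fixed-window example output are hypothetical. A real engine would also need to commit whatever is still pending at the end of the batch (for example via endTransaction()), which is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-message results an output could report. */
typedef enum {
    ACT_DEFERRED,   /* accepted, not yet committed */
    ACT_COMMITTED,  /* this and all earlier messages of the batch are committed */
    ACT_FAILED      /* not accepted; the rest of the batch must be retried */
} act_ret_t;

typedef act_ret_t (*do_action_fn)(void *ctx, int msg);

/* Engine side: hand each message to the output and move the commit marker
 * whenever the output reports a commit. Returns how many messages were
 * accepted; *committed tells how many of them are known to be committed. */
static size_t run_batch(const int *msgs, size_t n, do_action_fn act,
                        void *ctx, size_t *committed) {
    assert(act != NULL && committed != NULL);
    *committed = 0;
    for (size_t i = 0; i < n; i++) {
        act_ret_t r = act(ctx, msgs[i]);
        if (r == ACT_FAILED)
            return i;
        if (r == ACT_COMMITTED)
            *committed = i + 1;
    }
    return n;
}

/* Example output with a fixed window: it commits on every `window`-th
 * message, the way a forwarder might flush a full send buffer. */
typedef struct { size_t window, pending; } window_out_t;

static act_ret_t window_action(void *ctx, int msg) {
    window_out_t *w = ctx;
    (void)msg;
    if (++w->pending == w->window) {
        w->pending = 0;
        return ACT_COMMITTED;
    }
    return ACT_DEFERRED;
}
```

With a batch of 250 and a window of 100, each message is handed over exactly once; 200 are committed during the batch and 50 remain pending for the end-of-batch commit.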
From rgerhards at hq.adiscon.com Wed May 6 18:32:38 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Wed, 6 May 2009 18:32:38 +0200 Subject: [rsyslog] output plugin calling interface References: <9B6E2A8877C38245BFB15CC491A11DA702B038@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B039@GRFEXC.intern.adiscon.com> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702B03B@GRFEXC.intern.adiscon.com> > then it is simpler to do so than it would be to have > both the partial retry processing _and_ the output module issuing commits > whenever it wants to. Why? If I think about code, it is very hard to beat auto-commits in simplicity... But I'd better finish the new state diagram and at least part of the description; I think I finally found the right model for the lower layer. Rainer From david at lang.hm Wed May 6 18:51:17 2009 From: david at lang.hm (david at lang.hm) Date: Wed, 6 May 2009 09:51:17 -0700 (PDT) Subject: [rsyslog] output plugin calling interface In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702B03A@GRFEXC.intern.adiscon.com> References: <9B6E2A8877C38245BFB15CC491A11DA702B038@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702B03A@GRFEXC.intern.adiscon.com> Message-ID: On Wed, 6 May 2009, Rainer Gerhards wrote: >>>>>> in the case of rsyslog >>>>>> (where we are committing a set of unrelated messages) it is not necessarily >>>>>> a fatal problem, but it just seems to complicate things with little >>>>>> benefit. >>>>> >>>>> It actually simplifies things - because we need not take different approaches >>>>> to different types of output plugins. >>>> >>>> I'm seeing it as the other way around, this complicates things by making >>>> there be different ways for the transaction to be committed. >>> >>> OK, but what is complicated by that?
Also think about third-party (already >>> existing) output modules, for which I need to either define a totally different >>> output interface or use the extensible one I described. Even if I >>> force a totally new interface, I still need to support non-transactional >>> outputs in the upper layers. But then I need different code paths to do >>> that. >> >> I was thinking about this, and am wondering if it's really a bad thing to >> separate the code paths. >> >> I'm thinking along the lines that if the module doesn't support the >> doTransaction method you set batch_size=1 and use the old doAction() >> interface. > > Oh, that's a *very serious* performance hit. Because batches not only support > transactions but are the defining factor for the number of queue lock > operations on a busy system. So with a batch size of 32, you do 1 lock. With > a size of 1, you do 32 locks - both to process 32 messages. exactly as I expected. note that the performance hit is no worse than how things work today. > And now consider the usual config, where almost everything runs in direct > mode. So if at least one output does not support batches, the main message > queue must run at batches of one. > > Plus, I need complex logic to detect all that. A unified interface is simple > and elegant and does not show any of these problems. yes, you need to either limit your batch size to the minimum of the batch sizes defined for all the outputs that you are dealing with, or you need very complicated logic. for example: does a message that is filtered out count against the batch size? if you get 10,000 logs/sec, set a batch size of 100, and 1% of the log messages need to go to your database, does this mean that you will end up doing 100 database transactions/sec, or 1 database transaction/sec? there are a lot of interesting and nasty interactions that you can get in these situations.
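David's filtered-message question can be made concrete with a toy one-second simulation. The assumptions are hypothetical and for illustration only: every 100th message matches the database filter as a deterministic stand-in for "1%", a batch ends exactly when its counter reaches the batch size, and each batch that contains at least one database message costs one transaction.

```c
#include <assert.h>

/* Simulates one second of traffic with `rate` messages. Every `every`-th
 * message matches the database filter. If count_filtered is nonzero, the
 * batch limit counts every dequeued message; otherwise it counts only the
 * messages handed to the DB action. Returns the number of DB transactions
 * (one per batch that contains at least one DB message). */
static unsigned db_transactions(unsigned rate, unsigned every,
                                unsigned batch, int count_filtered) {
    assert(every > 0 && batch > 0);
    unsigned txns = 0, in_batch = 0, db_msgs = 0;
    for (unsigned i = 0; i < rate; i++) {
        int matches = (i % every) == 0;
        if (matches)
            db_msgs++;
        if (count_filtered || matches)
            in_batch++;
        if (in_batch == batch) {        /* batch limit reached: end it */
            if (db_msgs > 0)
                txns++;
            in_batch = db_msgs = 0;
        }
    }
    if (db_msgs > 0)                    /* flush a trailing partial batch */
        txns++;
    return txns;
}
```

Counting every dequeued message gives 100 transactions in the simulated second; counting only messages that reach the database action gives 1, at the price of database messages waiting up to a full second for their batch to fill.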
I was thinking that anyone wanting to get top performance would basically be required to define separate action queues (not necessarily large ones; I would say probably ~batch size * (# worker threads + 1) would be a good default) and the main queue walker would just filter the messages into the action queues. then the action queue walkers would do the batching. no need to create a new way to buffer things, we already have a buffer mechanism, it's called a separate action queue. >> if it does support the doTransaction method you use that exclusively (the >> module internally may share code between doTransaction and doAction, and >> most will). > > What would that simplify? I am seeing that this would complicate the setup code a bit, but simplify the logic for processing messages from the queue, as that wouldn't have to deal with batch and non-batch modes; it would only have the batch mode. >> as it is you already will have code paths that you only follow in the case >> where the module supports doTransaction; is it really simpler to have the >> two partially combined instead of keeping them separate? > > These separate code paths will probably be a single "if" statement that does > the endTransaction() call at the end of the batch. So it is actually a single > statement and three lines of code. The rest, I think, is absolutely the same. I'm not seeing that. it's not just the code at the end of the batch, it's also pulling the appropriate number of messages off the queue in the first place, so I would see the minimum as two 'if' blocks. >>>>>> I suspect that the overhead in manipulating the lock is high enough that >>>>>> the second approach will be a win (similar to the efficiencies that were >>>>>> gained in the UDP input module by letting it add multiple messages to the >>>>>> queue inside one lock).
>>>>>> >>>>>> As such I am seeing significant value in making the doAction() call be >>>>>> lightweight under all conditions, which is an argument against having it >>>>>> do any more than necessary. >>>>> >>>>> We do not know what this may be. Again, don't think "database only". >>>> >>>> I am not. even if the action is writing to disk (with fsync), to a pipe, >>>> or calling an external program it can take a significant amount of time. >>>> you say above that doAction could block for hours, so I am confused a bit >>>> here. (the new version of the document may clear this up) >>> >>> My sentence should better be phrased "can not be called where you described >>> it". See my chart and you'll see that the queue mutex is never locked for an >>> extended period of time - not even in ultra-reliable mode (part of why it is >>> so hard to do it). >>> >>> My understanding is that you work under the assumption that doAction() is >>> called during dequeueing. As we don't know what doAction() will do - it may >>> be very lengthy - it is strictly de-coupled from anything that holds the >>> queue mutex (this also applies to v3 and is the base design). You may take >>> the word "base design" as a hint that probably the whole engine would need to >>> be rewritten if that design is to be changed. I do not see any reasons for >>> such a change ;) >>> >>> I guess our discussion circles around the point that I did not yet convey the >>> full picture of how things work. At least I have the "feeling" that you have >>> a different architecture in mind. >> >> yes, from prior discussions I was thinking that you were planning to have >> the queue worker iterate through the messages on the queue, calling >> doAction() for each one as it got to it, and then when it hits the limit >> doing endTransaction(). > > I am not sure if you overlooked that message: the new queue worker already > *exists*. So you can look at it in actual code.
Same for parts of the action > interface (without the "real" transaction support). > > http://git.adiscon.com/?p=rsyslog.git;a=blob;f=runtime/queue.c;h=c2df928b6303 > 34449d117aa191ec1d502525346b;hb=multi-dequeue#l1396 > > All of this, of course, is experimental and will not work if only the > slightest error occurs somewhere in the system (and it may crash even if > everything else runs fine ;)) I'll try to look at this >> >> if you are instead going to do what you just described (pull the entire >> list of messages, then unlock the queue and process them) then it doesn't >> matter if the work is done in doAction() or endTransaction() >> >> the question of if it makes sense to have the batch mode be a completely >> separate codepath, or have the two logical paths combined is still a good >> one. >> >> thinking out loud here, how hard would it be to detect that there is no >> processBatch() method and create one that is a wrapper around the >> doAction() interface when you load the module, and then let all the code >> after that use the new interface? > > That sounds pretty much like what I am proposing above ;) > > You can consider this to be the wrapper: > > http://git.adiscon.com/?p=rsyslog.git;a=blob;f=action.c;h=928b30dc3c5cbb2f6af > 41cdf2ea02a3f1c14b06a;hb=multi-dequeue#l544 > > This is the too-simplistic version, but I hope it conveys the idea. > > The bottom line is that it is more complicated to create a real wrapper than > to implement the interface in the way I described. I fail to see the > complexity that you are so concerned about... Ok, I could be wrong here.
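David's "thinking out loud" suggestion, detecting at load time that a module has no processBatch() entry point and synthesizing one around doAction(), can be sketched as below. This is a minimal Python model, not rsyslog code: `OutputModule`, `do_action`, `process_batch` and `make_batch_entry` are hypothetical names, and the real engine (see the action.c link above) is written in C.

```python
from typing import Callable, List, Optional

Batch = List[str]


class OutputModule:
    """Hypothetical output module: do_action is mandatory, process_batch optional."""

    def __init__(self, do_action: Callable[[str], None],
                 process_batch: Optional[Callable[[Batch], None]] = None) -> None:
        self.do_action = do_action
        self.process_batch = process_batch


def make_batch_entry(module: OutputModule) -> Callable[[Batch], None]:
    """Resolve, once at module load time, the batch entry point the engine calls.

    Modules with a native batch call are used as-is; for legacy modules a
    wrapper is synthesized that feeds the batch to do_action() one message
    at a time, so all code after loading only has to deal with batches.
    """
    if module.process_batch is not None:
        return module.process_batch

    def wrapped(batch: Batch) -> None:
        for message in batch:
            module.do_action(message)

    return wrapped
```

Under this model a legacy module still receives its messages one at a time, while the engine itself only ever handles batches (a batch may hold a single message).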
David Lang From rgerhards at hq.adiscon.com Wed May 6 18:53:42 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Wed, 6 May 2009 18:53:42 +0200 Subject: [rsyslog] output plugin calling interface References: <9B6E2A8877C38245BFB15CC491A11DA702B038@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B039@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702B03B@GRFEXC.intern.adiscon.com> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702B03C@GRFEXC.intern.adiscon.com> OK, I uploaded a new document. It is not really clean yet, but much better than the version from around noon. I think I need at least another hour to make sure that terminology is used consistently. Most important, it may say "doAction()" where "processAction()" is more precise (but not always). That's because my thinking evolved today ;) Take care of the new state diagram and be sure to understand that it models an own *action state*, not a batch transaction state (that's different and for tomorrow ;)). Rainer > -----Original Message----- > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > bounces at lists.adiscon.com] On Behalf Of Rainer Gerhards > Sent: Wednesday, May 06, 2009 6:33 PM > To: rsyslog-users > Subject: Re: [rsyslog] output plugin calling interface > > > then it is simpler to do so than it would be to have > > both the partial retry processing _and_ the output module issuing commits > > whenever it wants to. > > > Why? If I think about code, it is very hard to beat auto-commits in > simplicity... > > But I'd better finish the new state diagram and at least part of the > description, I think I finally found the right model for the lower layer.
> > Rainer > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com From rgerhards at hq.adiscon.com Wed May 6 18:59:14 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Wed, 6 May 2009 18:59:14 +0200 Subject: [rsyslog] output plugin calling interface References: <9B6E2A8877C38245BFB15CC491A11DA702B038@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B03A@GRFEXC.intern.adiscon.com> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702B03D@GRFEXC.intern.adiscon.com> > -----Original Message----- > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > bounces at lists.adiscon.com] On Behalf Of david at lang.hm > Sent: Wednesday, May 06, 2009 6:51 PM > To: rsyslog-users > Subject: Re: [rsyslog] output plugin calling interface > > On Wed, 6 May 2009, Rainer Gerhards wrote: > > >>>>>> in the case of rsyslog > >>>>>> (where we are committing a set of unrelated messages) it is not > >>> necessarily > >>>>>> a fatal problem, but it just seems to complicate things with little > >>>>>> benefit. > >>>>> > >>>>> It actually simplifies things - because we need not take different > >>>> approaches > >>>>> to different types of output plugins. > >>>> > >>>> I'm seeing it as the other way around, this complicates things by making > >>>> there be different ways for the transaction to be committed. > >>> > >>> OK, but what is complicated by that? Also think about third-party > > (already > >>> existing) output modules, which I need to define either a totally > > different > >>> output interface for or use the extensible one I described. Even if I > >>> force a totally new interface, I still need to support non-transactional > >>> outputs in the upper layers. But then I need different code paths to do > >>> that. > >> > >> I was thinking about this, and am wondering if it's really a bad thing to > >> separate the code paths.
> >> > >> I'm thinking along the lines that if the module doesn't support the > >> doTransaction method you set batch_size=1 and use the old doAction() > >> interface. > > > > Oh, that's a *very serious* performance hit. Because batches not only > support > > transactions but are the defining factor for the number of queue lock > > operations on a busy system. So with a batch size of 32, you do 1 lock. With > > a size of 1, you do 32 locks - both to process 32 messages. > > exactly as I expected. note that the performance hit is no worse than how > things work today. Yes, I know. But why deliberately do worse than you can? > > > And now consider the usual config, where almost everything runs in direct > > mode. So if at least one output does not support batches, the main message > > queue must run at batches of one. > > > > Plus, I need complex logic to detect all that. A unified interface is simple > > and elegant and does not have any of these problems. > > yes, you need to either limit your batch size to the minimum of the batch > sizes defined for all the outputs that you are dealing with, or you need > very complicated logic ... but I need complicated logic to find that minimum batch size... > > for example: > > does a message that is filtered out count against the batch size? > > if you get 10,000 logs/sec, set a batch size of 100, and 1% of the log > messages need to go to your database, does this mean that you will end up > doing 100 database transactions/sec, or 1 database transaction/sec? > > there are a lot of interesting and nasty interactions that you can get in > these situations. ... and all of them I try to avoid by using a unified approach.
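The numbers in this exchange can be checked with a small model. It assumes one lock/unlock of the queue mutex per dequeue call, so draining N messages in batches of B costs ceil(N/B) lock operations. For David's filtering question it compares two hypothetical policies: batches cut on the main queue before filtering (at most one commit per batch), and a batch size counted only over messages that pass the filter. None of this is rsyslog code; the function names are illustrative.

```python
import math


def lock_operations(num_messages: int, batch_size: int) -> int:
    """Queue-mutex acquisitions needed to dequeue num_messages, one per batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return math.ceil(num_messages / batch_size)


def max_commits_per_sec(msgs_per_sec: int, batch_size: int,
                        selected_per_sec: int, count_filtered: bool) -> float:
    """Rough commits/sec for an output that only receives selected messages.

    count_filtered=True: batches are cut on the main queue before filtering,
    so every main-queue batch holding a selected message can end in a commit;
    there cannot be more commits than batches or selected messages.
    count_filtered=False: only selected messages count toward batch_size.
    """
    if count_filtered:
        return min(msgs_per_sec / batch_size, selected_per_sec)
    return selected_per_sec / batch_size
```

With 10,000 messages/sec, a batch size of 100 and 1% (100 messages/sec) going to the database, the first policy allows up to 100 commits/sec and the second about 1, which is the gap David is pointing at.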
> > > I was thinking that anyone wanting to get top performance would basically > be required to define separate action queues (not necessarily large ones, > I would say probably ~batch size * (# worker threads +1) would be a good > default) and the main queue walker would just filter the messages into the > action queues. then the action queue walkers would do the batching. no > need to create a new way to buffer things, we already have a buffer > mechanism, it's called a separate action queue. > > >> if it does support the doTransaction method you use that exclusively (the > >> module internally may share code between doTransaction and doAction, and > >> most will) > > > > What would that simplify? > > I am seeing that this would complicate the setup code a bit, but simplify > the logic for processing messages from the queue as that wouldn't have to > deal with batch and non-batch modes, it would only have the batch mode. That's exactly the point: I do NOT have to deal with non-batch mode. The current experimental code *always* dequeues batches. So the whole engine no longer supports anything that is not a batch (but, of course, a batch can be a single message). > > >> as it is you already will have code paths that you only follow in the case > >> where the module supports doTransaction, is it really simpler to have the > >> two partially combined instead of keeping them separate? > > > > These separate code paths will probably be a single "if" statement that > does > > the endTransaction() call at the end of the batch. So it is actually a > single > > statement and three lines of code. The rest, I think, is absolutely the > same. > > I'm not seeing that. it's not just the code at the end of the batch, it's > also pulling the appropriate number of messages off the queue in the first > place. so I would see the minimum as two 'if' blocks Nope, we always pull batches off the queue.
One if ;) > > >>>>>> I suspect that the overhead in manipulating the lock is high enough > > that > >>>>>> the second approach will be a win (similar to the efficiencies that > > were > >>>>>> gained in the UDP input module by letting it add multiple messages to > >>> the > >>>>>> queue inside one lock). > >>>>>> > >>>>>> As such I am seeing significant value in making the doAction() call be > >>>>>> lightweight under all conditions, which is an argument against having > > it > >>>>>> do any more than necessary. > >>>>> > >>>>> We do not know what this may be. Again, don't think "database only". > >>>> > >>>> I am not. even if the action is writing to disk (with fsync), to a pipe, > >>>> or calling an external program it can take a significant amount of time. > >>>> you say above that doAction could block for hours, so I am confused a > > bit > >>>> here. (the new version of the document may clear this up) > >>> > >>> My sentence should better be phrased "can not be called where you > > described > >>> it". See my chart and you'll see that the queue mutex is never locked for > > an > >>> extended period of time - not even in ultra-reliable mode (part of why it > > is > >>> so hard to do it). > >>> > >>> My understanding is that you work under the assumption that doAction() is > >>> called during dequeueing. As we don't know what doAction() will do - it > > may > >>> be very lengthy - it is strictly de-coupled from anything that holds the > >>> queue mutex (this also applies to v3 and is the base design). You may > > take > >>> the word "base design" as a hint that probably the whole engine would > > need > >> to > >>> be rewritten if that design is to be changed. I do not see any reasons > > for > >>> such a change ;) > >>> > >>> I guess our discussion circles around the point that I did not yet convey > >> the > >>> full picture on how things work. At least I have the "feeling" that you > > have > >>> a different architecture on your mind.
> >> > >> yes, from prior discussions I was thinking that you were planning to have > >> the queue worker iterate through the messages on the queue, calling > >> doAction() for each one as it got to it, and then when it hits the limit > >> doing endTransaction() > > > > I am not sure if you overlooked that message, the new queue worker already > > *exists*. So you can look at it in actual code. Same for parts of the action > > interface (without the "real" transaction support). > > > > > http://git.adiscon.com/?p=rsyslog.git;a=blob;f=runtime/queue.c;h=c2df928b6303 > > 34449d117aa191ec1d502525346b;hb=multi-dequeue#l1396 > > > > All of this, of course, is experimental and will not work if only the > > slightest error occurs somewhere in the system (and it may crash even if > > everything else runs fine ;)) > > I'll try to look at this > > >> > >> if you are instead going to do what you just described (pull the entire > >> list of messages, then unlock the queue and process them) then it doesn't > >> matter if the work is done in doAction() or endTransaction() > >> > >> the question of if it makes sense to have the batch mode be a completely > >> separate codepath, or have the two logical paths combined is still a good > >> one. > >> > >> thinking out loud here, how hard would it be to detect that there is no > >> processBatch() method and create one that is a wrapper around the > >> doAction() interface when you load the module, and then let all the code > >> after that use the new interface? > > > > That sounds pretty much like what I am proposing above ;) > > > > You can consider this to be the wrapper: > > > > > http://git.adiscon.com/?p=rsyslog.git;a=blob;f=action.c;h=928b30dc3c5cbb2f6af > > 41cdf2ea02a3f1c14b06a;hb=multi-dequeue#l544 > > > > This is the too-simplistic version, but I hope it conveys the idea. > > > > The bottom line is that it is more complicated to create a real wrapper than > > to implement the interface in the way I described.
I fail to see the > > complexity that you are so concerned about... > > Ok, I could be wrong here. Maybe I am - but I am arguing so hard because I do not see any complexity at all. But this may be a sign that I overlooked something. Better to notice now than later... Rainer > > David Lang From david at lang.hm Wed May 6 21:54:29 2009 From: david at lang.hm (david at lang.hm) Date: Wed, 6 May 2009 12:54:29 -0700 (PDT) Subject: [rsyslog] output plugin calling interface In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702B03C@GRFEXC.intern.adiscon.com> References: <9B6E2A8877C38245BFB15CC491A11DA702B038@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B039@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702B03B@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702B03C@GRFEXC.intern.adiscon.com> Message-ID: On Wed, 6 May 2009, Rainer Gerhards wrote: > OK, I uploaded a new document. It is not really clean yet, but much better > than the version from around noon. I think I need at least another hour to > make sure that terminology is used consistently. Most important, it may say > "doAction()" where "processAction()" is more precise (but not always). That's > because my thinking evolved today ;) I thought we had decided that there was no need to have a beginTransaction() call if doAction() would do it implicitly. if something is committed early, I don't see any reason why you should require that everything pending is committed. all you should require is that M1 -> Mn be committed (a contiguous set starting from the first one) > Take care of the new state diagram and be sure to understand that it models > an own *action state*, not a batch transaction state (that's different and > for tomorrow ;)). I'm not sure that there can be an error from the inTx stage that would be worth retrying.
errors there would not be related to outputting the message, but simply to processing it and preparing it to be output later. in fact, I'm not sure that retry belongs in the message state at all. I could see it argued that the commit may result in a temporary error that could be retried, but is that really something that the action (i.e. output module) should deal with? or should this be done at the transaction state? in reading the page after the diagram, it appears that you are thinking the same thing, in which case the retry and suspend nodes should be removed from the state diagram (or there may need to be a suspend node if you want the higher levels to be able to try again and the module to reject it) looking at your pseudocode, I started to re-write it, and I think things can be much simpler. if the retries are done above this level, then the only thing that we need to do is to not hammer the destination. except for the fact that doAction() can trigger an EndTransaction() internally, there is no reason why doAction() can't take place while suspended (the output module can be preparing the stuff to send out). the only place that needs to deal with the issue is that the EndTransaction() should sleep if the state is not itx. if doAction does beginTransaction() any time it's not in a transaction, there is no reason to have it as a separate call. so without retries or beginTransaction, is there any reason for prepareAction() to exist?
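To make the retry/suspend question concrete, here is a hypothetical action state machine in Python. The state names echo the "itx", retry and suspend nodes mentioned in this thread, but the transition table is an assumption for illustration, not the diagram from Rainer's document. As written it encodes the position Rainer takes later in the thread (do_action is refused while suspended); David's alternative would add a `(SUSPENDED, "do_action")` entry that keeps the action suspended.

```python
from enum import Enum
from typing import Dict, Iterable, Tuple


class ActionState(Enum):
    READY = "rdy"
    IN_TX = "itx"
    RETRY = "rtry"
    SUSPENDED = "susp"


# Hypothetical transition table: (current state, event) -> next state.
TRANSITIONS: Dict[Tuple[ActionState, str], ActionState] = {
    (ActionState.READY, "do_action"): ActionState.IN_TX,      # implicit begin
    (ActionState.IN_TX, "do_action"): ActionState.IN_TX,
    (ActionState.IN_TX, "commit_ok"): ActionState.READY,
    (ActionState.IN_TX, "commit_failed"): ActionState.RETRY,
    (ActionState.RETRY, "reconnect_ok"): ActionState.READY,   # e.g. link re-established
    (ActionState.RETRY, "retries_exhausted"): ActionState.SUSPENDED,
    (ActionState.SUSPENDED, "suspend_expired"): ActionState.RETRY,
}


def step(state: ActionState, event: str) -> ActionState:
    """Apply one event, rejecting events the table does not allow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} not allowed in state {state.value!r}") from None


def run(events: Iterable[str], start: ActionState = ActionState.READY) -> ActionState:
    """Feed a sequence of events through the machine and return the final state."""
    state = start
    for event in events:
        state = step(state, event)
    return state
```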
you also will need to detect that the doAction() did endTransaction() and that you don't need to issue an endTransaction() for this output module now (until you do the next doAction() ) David Lang > Rainer > >> -----Original Message----- >> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- >> bounces at lists.adiscon.com] On Behalf Of Rainer Gerhards >> Sent: Wednesday, May 06, 2009 6:33 PM >> To: rsyslog-users >> Subject: Re: [rsyslog] output plugin calling interface >> >>> then it is simpler to do so than it would be to have >>> both the partial retry processing _and_ the output module issuing > commits >>> whenever it wants to. >> >> >> Why? If I think about code, it is very hard to beat auto-commits in >> simplicity... >> >> But I'd better finish the new state diagram and at least part of the >> description, I think I finally found the right model for the lower layer. >> >> Rainer > From rgerhards at hq.adiscon.com Wed May 6 22:08:56 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Wed, 6 May 2009 22:08:56 +0200 Subject: [rsyslog] output plugin calling interface References: <9B6E2A8877C38245BFB15CC491A11DA702B038@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B039@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B03B@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B03C@GRFEXC.intern.adiscon.com> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702B03F@GRFEXC.intern.adiscon.com> > -----Original Message----- > From: rsyslog-bounces at lists.adiscon.com > [mailto:rsyslog-bounces at lists.adiscon.com] On Behalf Of david at lang.hm > Sent: Wednesday, May 06, 2009 9:54 PM > To: rsyslog-users > Subject: Re:
[rsyslog] output plugin calling interface > > On Wed, 6 May 2009, Rainer Gerhards wrote: > > > OK, I uploaded a new document. It is not really clean yet, > but much better > > than the version from around noon. I think I need at least > another hour to > > make sure that terminology is used consistently. Most > important, it may say > > "doAction()" where "processAction()" is more precise (but > not always). That's > > because my thinking evolved today ;) > > I thought we had decided that there was no need to have a > beginTransaction() call if doAction() would do it implicitly. OK, I think I'll stop posting unfinished work - I did not yet manage to edit that out. But it probably is a better idea to finish a consistent state, even if that takes a day or two. > > if something is committed early, I don't see any reason why you should > require that everything pending is committed. all you should > require is > that M1 -> Mn be committed (a contiguous set starting from the > first one) Yup, but that is what is described. It is not the batch that is committed, but everything that was uncommitted so far. > > > > Take care of the new state diagram and be sure to > understand that it models > > an own *action state*, not a batch transaction state > (that's different and > > for tomorrow ;)). > > I'm not sure that there can be an error from the inTx stage > that would be > worth retrying. errors there would not be related to outputting the > message, but simply to processing it and preparing it to be > output later. That's why I wrote *action state* and "in bold". This is not a message state. This is the state machine for the action logic. The message state is something different and not yet described. This is a very important distinction. The action is a state machine and state transitions tell when various things need to be called. > > in fact, I'm not sure that retry belongs in the message state > at all.
I > could see it argued that the commit may result in a temporary > error that > could be retried, but is that really something that the action (i.e. > output module) should deal with? or should this be done at > the transaction > state? It's not the retry of the message - that's the upper layer. Here, for example, we reset the connection, try to re-establish a broken link and all that. > > in reading the page after the diagram, it appears that you > are thinking > the same thing, in which case the retry and suspend nodes should be > removed from the state diagram (or there may need to be a > suspend node if > you want the higher levels to be able to try again and the module to > reject it) > > > looking at your pseudocode, I started to re-write it, and I > think things > can be much simpler. > > if the retries are done above this level, then the only thing > that we need > to do is to not hammer the destination. > > except for the fact that doAction() can trigger an EndTransaction() > internally, there is no reason why doAction() can't take place while > suspended (the output module can be preparing the stuff to > send out). How will you send e.g. a tcp message while the network link is down? How to talk to a mail server if it is down? How to write to the file system if it is full? doAction is not a "copy this to a buffer" kind of thing, but rather something that potentially does real work. It looks like you prefer the matrix-like action interface we talked about a while ago, but this interface causes compatibility issues to existing modules, causes more code inside each module and causes far more complicated code inside the engine. Rainer > the > only place that needs to deal with the issue is that the > EndTransaction() > should sleep if the state is not itx. > > if doAction does beginTransaction() any time it's not in a > transaction, > there is no reason to have it as a separate call.
> > so without retries or beginTransaction, is there any reason for > prepareAction() to exist? > > you also will need to detect that the doAction() did > endTransaction() and > that you don't need to issue an endTransaction() for this > output module now > (until you do the next doAction() ) > > David Lang > > > > Rainer > > > >> -----Original Message----- > >> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > >> bounces at lists.adiscon.com] On Behalf Of Rainer Gerhards > >> Sent: Wednesday, May 06, 2009 6:33 PM > >> To: rsyslog-users > >> Subject: Re: [rsyslog] output plugin calling interface > >> > >>> then it is simpler to do so than it would be to have > >>> both the partial retry processing _and_ the output module issuing > > commits > >>> whenever it wants to. > >> > >> > >> Why? If I think about code, it is very hard to beat auto-commits in > >> simplicity... > >> > >> But I'd better finish the new state diagram and at least > part of the > >> description, I think I finally found the right model for > the lower layer.
> >> > >> Rainer > From david at lang.hm Wed May 6 23:24:19 2009 From: david at lang.hm (david at lang.hm) Date: Wed, 6 May 2009 14:24:19 -0700 (PDT) Subject: [rsyslog] output plugin calling interface In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702B03F@GRFEXC.intern.adiscon.com> References: <9B6E2A8877C38245BFB15CC491A11DA702B038@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B039@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B03B@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B03C@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702B03F@GRFEXC.intern.adiscon.com> Message-ID: On Wed, 6 May 2009, Rainer Gerhards wrote: >> -----Original Message----- >> From: rsyslog-bounces at lists.adiscon.com >> [mailto:rsyslog-bounces at lists.adiscon.com] On Behalf Of david at lang.hm >> Sent: Wednesday, May 06, 2009 9:54 PM >> To: rsyslog-users >> Subject: Re: [rsyslog] output plugin calling interface >> >> On Wed, 6 May 2009, Rainer Gerhards wrote: >> >>> OK, I uploaded a new document. It is not really clean yet, >> but much better >>> than the version from around noon. I think I need at least >> another hour to >>> make sure that terminology is used consistently. Most >> important, it may say >>> "doAction()" where "processAction()" is more precise (but >> not always). That's >>> because my thinking evolved today ;) >> >> I thought we had decided that there was no need to have a >> beginTransaction() call if doAction() would do it implicitly.
> > OK, I think I'll stop posting unfinished work - I did not yet manage to edit > that out. But it probably is a better idea to finish a consistent state, even > if that takes a day or two. Ok, I didn't know if you just accidentally left it in, or had not redone that section. > >> >> if something is committed early, I don't see any reason why you should >> require that everything pending is committed. all you should >> require is >> that M1 -> Mn be committed (a contiguous set starting from the >> first one) > > Yup, but that is what is described. It is not the batch that is committed, but > everything that was uncommitted so far. Ok. >> >>> Take care of the new state diagram and be sure to >> understand that it models >>> an own *action state*, not a batch transaction state >> (that's different and >>> for tomorrow ;)). >> >> I'm not sure that there can be an error from the inTx stage >> that would be >> worth retrying. errors there would not be related to outputting the >> message, but simply to processing it and preparing it to be >> output later. > > That's why I wrote *action state* and "in bold". This is not a message state. > This is the state machine for the action logic. The message state is > something different and not yet described. This is a very important > distinction. The action is a state machine and state transitions tell when > various things need to be called. Ok, just below the diagram you say "Note well that the state diagram describes the action state. It does not describe the transaction state" if the action is not doing the retry, why would it have a retry state? >> >> in fact, I'm not sure that retry belongs in the message state >> at all. I >> could see it argued that the commit may result in a temporary >> error that >> could be retried, but is that really something that the action (i.e. >> output module) should deal with? or should this be done at >> the transaction >> state?
> > It's not the retry of the message - that's the upper layer. Here, for > example, we reset the connection, try to re-establish a broken link and all > that. why should this state be visible at all to the caller? this should be handled transparently inside the endTransaction code (even if that code is called by the doAction() call). everything else can operate without the link being up, so there's no reason for the caller to have to check if it is before everything that it does. >> in reading the page after the diagram, it appears that you >> are thinking >> the same thing, in which case the retry and suspend nodes should be >> removed from the state diagram (or there may need to be a >> suspend node if >> you want the higher levels to be able to try again and the module to >> reject it) >> >> >> looking at your pseudocode, I started to re-write it, and I >> think things >> can be much simpler. >> >> if the retries are done above this level, then the only thing >> that we need >> to do is to not hammer the destination. >> >> except for the fact that doAction() can trigger an EndTransaction() >> internally, there is no reason why doAction() can't take place while >> suspended (the output module can be preparing the stuff to >> send out). > > How will you send e.g. a tcp message while the network link is down? How to > talk to a mail server if it is down? How to write to the file system if it is > full? > > doAction is not a "copy this to a buffer" kind of thing, but rather something > that potentially does real work. here is where we disagree. doAction() is getting the message and putting it in the buffer to be sent when the endTransaction is called. it may also decide to _do_ endTransaction, but it's the endTransaction logic that has the real work to do.
and part of that real work is to have logic like 'if socket is dead, re-open it'. if the filesystem is full, the socket cannot be opened, the socket goes away in the middle, etc., the endTransaction() logic will need to return a batchFailed error. > It looks like you prefer the matrix-like action interface we talked about a > while ago, but this interface causes compatibility issues to existing > modules, causes more code inside each module and causes far more complicated > code inside the engine. yes, I do think that the matrix-like interface would be better, but I don't think I'm arguing that at this point (I have slipped into that at points, where I argued to move the format_message into the doAction, I'll try to watch that) at this point I understand the interface to be doAction(message) takes the string $message and adds it to the batch to be sent (initializing the batch if it isn't already initialized) optionally decides to call endTransaction() endTransaction() takes the batch that was prepared by one or more doAction() calls, finalizes the batch if needed, and sends it to the destination.
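David's reading of the interface can be sketched as follows. This is a hypothetical Python model: `BufferingOutput`, the `send` callback, `max_pending` and the "ok"/"batchFailed" return values are illustrative assumptions, not rsyslog's C API. do_action() only buffers (the first message implicitly begins the transaction) and may commit early; end_transaction() does the actual output and is a no-op when do_action() has already committed everything, which also covers the earlier point about not issuing a redundant endTransaction().

```python
from typing import Callable, List

OK = "ok"
BATCH_FAILED = "batchFailed"


class BufferingOutput:
    """Hypothetical output: do_action() buffers, end_transaction() does the I/O."""

    def __init__(self, send: Callable[[List[str]], bool], max_pending: int) -> None:
        self.send = send                # assumed transport; returns False on failure
        self.max_pending = max_pending  # module-chosen point for an early commit
        self.pending: List[str] = []    # messages of the currently open transaction
        self.commits = 0

    def do_action(self, message: str) -> str:
        # The first message implicitly begins a new transaction.
        self.pending.append(message)
        # The module may decide on its own to commit early.
        if len(self.pending) >= self.max_pending:
            return self.end_transaction()
        return OK

    def end_transaction(self) -> str:
        if not self.pending:
            return OK  # nothing open: do_action() already committed everything
        # In a real module this is where e.g. a dead socket would be re-opened;
        # here send() stands in for all of that I/O.
        if not self.send(self.pending):
            return BATCH_FAILED  # keep pending messages for the upper layer to retry
        self.pending = []
        self.commits += 1
        return OK
```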
David Lang From rgerhards at hq.adiscon.com Thu May 7 07:28:06 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Thu, 7 May 2009 07:28:06 +0200 Subject: [rsyslog] output plugin calling interface References: <9B6E2A8877C38245BFB15CC491A11DA702B038@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B039@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B03B@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B03C@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B03F@GRFEXC.intern.adiscon.com> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702B041@GRFEXC.intern.adiscon.com> > -----Original Message----- > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > bounces at lists.adiscon.com] On Behalf Of david at lang.hm > Sent: Wednesday, May 06, 2009 11:24 PM > To: rsyslog-users > Subject: Re: [rsyslog] output plugin calling interface > > On Wed, 6 May 2009, Rainer Gerhards wrote: > > >> -----Original Message----- > >> From: rsyslog-bounces at lists.adiscon.com > >> [mailto:rsyslog-bounces at lists.adiscon.com] On Behalf Of david at lang.hm > >> Sent: Wednesday, May 06, 2009 9:54 PM > >> To: rsyslog-users > >> Subject: Re: [rsyslog] output plugin calling interface > >> > >> On Wed, 6 May 2009, Rainer Gerhards wrote: > >> > >>> OK, I uploaded a new document. It is not really clean yet, > >> but much better > >>> than the version from around noon. I think I need at least > >> another hour to > >>> make sure that terminology is used consistently. Most > >> important, it may say > >>> "doAction()" where "processAction()" is more precise (but > >> not always). That's > >>> because my thinking evolved today ;) > >> > >> I thought we had decided that there was no need to have a > >> beginTransaction() call if doAction() would do it implicitly. > > > > OK, I think I'll stop posting unfinished work - I did not yet manage to edit > > that out. 
But it probably is a better idea to finish a consistent state, > even > > if that takes a day or two. > > Ok, I didn't know if you just accidentally left it in, or had not redone > that section. I have to admit that I did not think enough when I answered your mail last evening. On closer look, the output module interface still needs this entry point and this is why it is in the pseudocode. Please do NOT comment yet but read on first... > > How will you send e.g. a tcp message while the network link is down? How to > > talk to a mail server if it is down? How to write to the file system if it > is > > full? > > > > doAction is not a "copy this to a buffer" kind of thing, but rather > something > > that potentially does real work. > > here is where we disagree. > > doAction() is getting the message and putting it in the buffer to be sent > when the endTransaction is called. > > it may also decide to _do_ endTransaction, but it's the endTransaction > logic that has the real work to do. > > and part of that real work is to have logic like 'if socket is dead, > re-open it'. > > if the filesystem is full, the socket cannot be opened, the socket goes > away in the middle, etc., the endTransaction() logic will need to return a > batchFailed error. ... and this disagreement is what we are actually circling around in the whole discussion. You have quite a different design philosophy in mind than I do. Rsyslog's core philosophy is that outputs should be ultra-slim and very easy to write. That does not rule out writing a complex output, but it should be possible to write a very simplistic one. This is more a mini-driver design perspective, where a generic driver inside the core is just complemented by some specifics. Your design is that the core engine is slim and pushes off the complexity to the outputs. This is more a traditional driver concept, where the core requests rather broad functionality from its drivers.
From these different approaches we get different ways of doing things. You know that I value your opinion *very* much, and it is always great fun and educational to talk with you. But I have to admit that it will take a very strong argument to make me move away from the traditional rsyslog design perspective. My goals (in addition to things like performance and reliability, of course) are a) to identify as much common functionality as possible and put that into a single place, and b) not to break backward compatibility unless strictly necessary. For b), I would even be prepared to accept some mildly increased complexity (which we do not have in this case). I think almost all of our discussion topics circle around the design philosophy. And this is also why we both think we are correct: we *both are*, depending on which design philosophy you look from. I stick with the current rsyslog philosophy because it has served me very well over all those years. I consider it a prime reason why my software projects are often much more successful than competing ones. The beauty of trying hard to put functionality into a single place, and doing that consistently, is that at some point in time your software will grow exponentially with each feature being added (which is then automatically inherited by the rest of the system, as everything is inside the right layer). > > > It looks like you prefer the matrix-like action interface we talked about a > > while ago, but this interface causes compatibility issues to existing > > modules, causes more code inside each module and causes far more complicated > > code inside the engine.
> > yes, I do think that the matrix-like interface would be better, but I > don't think I'm arguing that at this point (I have slipped into that at > points, where I argued to move the format_message into the doAction, I'll > try to watch that) The core of the matrix-like interface is not how parameters are passed, but rather that there is only one entry point that does the work. You are currently modeling this design approach with two calls, but it basically remains the matrix type of interface. If we took that route, it would be cleaner to use such a matrix interface. > > at this point I understand the interface to be > > doAction(message) > takes the string $message and add it to the batch to be sent > (initializing the batch if it isn't already initialized) > > optionally decides to call endTransaction() > > > endTransaction() > takes the batch that was prepared by one or more doAction() calls, > finalizes the batch if needed, and sends it to the destination. > Just to reiterate the differences in our designs: doAction(message) takes the string $message and processes it. This may mean adding it to a batch, processing it immediately, or buffering it for some time and then processing it (initializing the batch if it isn't already initialized). endTransaction() finishes whatever doAction() has left open. This may commit something or may not do anything at all (depending on the action's state). It is up to the plugin to decide when to do what. Plugins need not even support endTransaction() if they commit every message. Important note: I am using doAction()/endTransaction() where other entry points should actually be used, because the former are mini-driver entry points while we are talking about the transaction handler (whose entry points would better be called processMessage() and, maybe, endBatch()). In any case, I'll verify some of my ideas in code today. If my design matches the rsyslog design, it should be fairly easy and straightforward to implement the transaction handler inside the action layer.
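The following C sketch shows the doAction()/endTransaction() split described above. It uses a hypothetical function-pointer table: `output_module`, `process_batch` and the `RET_*` codes are illustrative names, not rsyslog's actual module interface. In the sketch, `doAction()` is mandatory and `endTransaction()` may be NULL for a module that commits every message. The core calls `endTransaction()` only when the module provides it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes (not rsyslog's real rsRetVal values). */
typedef enum { RET_OK = 0, RET_FAIL = 1 } ret_t;

/* Hypothetical mini-driver table: doAction() is mandatory,
 * endTransaction() is optional (NULL = module commits every message). */
typedef struct {
    void *state;
    ret_t (*doAction)(void *state, const char *msg);
    ret_t (*endTransaction)(void *state);
} output_module;

/* Core-side loop: the same for every module. Hand each message to
 * doAction(), then let the module finish whatever it left open. */
static ret_t process_batch(output_module *om, const char **msgs, size_t n)
{
    assert(om != NULL && om->doAction != NULL);
    for (size_t i = 0; i < n; i++) {
        ret_t r = om->doAction(om->state, msgs[i]);
        if (r != RET_OK)
            return r;
    }
    if (om->endTransaction != NULL)
        return om->endTransaction(om->state);
    return RET_OK;
}

/* Module 1: commits every message inside doAction(). */
typedef struct { int commits; } direct_state;

static ret_t direct_doAction(void *st, const char *msg)
{
    (void)msg;                          /* a real module would write msg here */
    ((direct_state *)st)->commits++;
    return RET_OK;
}

/* Module 2: buffers in doAction(), commits in endTransaction(). */
typedef struct { int buffered; int commits; int last_commit_size; } batch_state;

static ret_t batch_doAction(void *st, const char *msg)
{
    (void)msg;
    ((batch_state *)st)->buffered++;    /* stand-in for appending msg to a buffer */
    return RET_OK;
}

static ret_t batch_endTransaction(void *st)
{
    batch_state *b = st;
    if (b->buffered > 0) {              /* nothing open means nothing to commit */
        b->last_commit_size = b->buffered;
        b->commits++;
        b->buffered = 0;
    }
    return RET_OK;
}
```

When both modules receive the same three messages, the direct module commits three times and the batching module commits once. The core loop is identical for both, which reflects the "put common functionality in one place" goal.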
If that turns out to be a major pain, I may be thinking in the wrong direction. So that will probably be my focus for today. Rainer From rgerhards at hq.adiscon.com Thu May 7 15:53:50 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Thu, 7 May 2009 15:53:50 +0200 Subject: [rsyslog] output plugin calling interface References: <9B6E2A8877C38245BFB15CC491A11DA702B038@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B039@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B03B@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B03C@GRFEXC.intern.adiscon.com> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702B04D@GRFEXC.intern.adiscon.com> David, I have now looked at the code and modified it, to get some "feeling" for how it looks and works (it doesn't matter if I need to dump or modify that code; it took a few hours, less than writing other things ;)). So I am still "only" biased, but no longer without an alternative. The code looks much cleaner than what is in v3, btw. But I think I have also come closer to where our opinions differ. I mentioned this morning that we have different design approaches. But that's not the full picture... let me quote you: > > Take care of the new state diagram and be sure to understand that it models > > an own *action state*, not a batch transaction state (that's different and > > for tomorrow ;)). > > I'm not sure that there can be an error from the inTx stage that would be > worth retrying. errors there would not be related to outputting the > message, but simply to processing it and preparing it to be output later. > > in fact, I'm not sure that retry belongs in the message state at all. I > could see it argued that the commit may result in a temporary error that > could be retried, but is that really something that the action (i.e. > output module) should deal with? or should this be done at the transaction > state?
> > in reading the page after the diagram, it appears that you are thinking > the same thing, in which case the retry and suspend nodes should be > removed from the state diagram (or there may need to be a suspend node if > you want the higher levels to be able to try again and the module to > reject it) > > > looking at your pseudocode, I started to re-write it, and I think things > can be much simpler. > > if the retries are done above this level, then the only thing that we need > to do is to not hammer the destination. > > except for the fact that doAction() can trigger an EndTransaction() > internally, there is no reason why doAction() can't take place while > suspended (the output module can be preparing the stuff to send out). the > only place that needs to deal with the issue is that the EndTransaction() > should sleep if the state is not itx > > if doAction does beginTransaction() any time it's not in a transaction, > there is no reason to have it as a separate call. > > so without retries or beginTransaction, is there any reason for > prepareAction() to exist? > > you also will need to detect that the doAction() did endTransaction() and > that you don't need to issue an endTransaction() for this output module now > (until you do the next doAction() ) I think we probably have different failure cases in mind. We touched on this, but probably did not make the issue clear enough. I now think that these different classes of failures require different handling, probably at different layers of the engine. Maybe this can help to combine both our views. I was first tempted to write the description right here in the mail, but instead I have added some text to the "internals document", hoping that the information may be useful in the future, too (and knowing that I need to edit it soon ;)). Note that I have NOT yet updated any other part of the document. It's probably also affected by thinking about failure cases.
So, I'd appreciate it if you could have a look at sections 3.2 and 3.3 of http://www.rsyslog.com/download/design.pdf Thanks, Rainer From rgerhards at hq.adiscon.com Thu May 7 16:13:11 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Thu, 7 May 2009 16:13:11 +0200 Subject: [rsyslog] output plugin calling interface References: <9B6E2A8877C38245BFB15CC491A11DA702B038@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B039@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B03B@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B03C@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702B04D@GRFEXC.intern.adiscon.com> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702B04E@GRFEXC.intern.adiscon.com> I've just done another update to the paper, with some pseudocode (hopefully) upcoming later today. Pseudocode is not essential. > -----Original Message----- > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > bounces at lists.adiscon.com] On Behalf Of Rainer Gerhards > Sent: Thursday, May 07, 2009 3:54 PM > To: rsyslog-users > Subject: Re: [rsyslog] output plugin calling interface > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com From rgerhards at hq.adiscon.com Thu May 7 17:43:48 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Thu, 7 May 2009 17:43:48 +0200 Subject: [rsyslog] output plugin calling interface References: <9B6E2A8877C38245BFB15CC491A11DA702B038@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B039@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B03B@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B03C@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B04D@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702B04E@GRFEXC.intern.adiscon.com> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702B050@GRFEXC.intern.adiscon.com> David, OK, I have uploaded a new version, this time with pseudocode for the batch processing layer. I have to say I am very optimistic that this is the right route; differentiating between two different causes of action failure really helped. I think this is something that I could implement right away, and it looks so simple that it seems hard to make a mistake in doing so. ...
but I guess you'll find something I have overlooked ;) This version should be safe to work with; I'll probably not update it any more today. Rainer > -----Original Message----- > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > bounces at lists.adiscon.com] On Behalf Of Rainer Gerhards > Sent: Thursday, May 07, 2009 4:13 PM > To: rsyslog-users > Subject: Re: [rsyslog] output plugin calling interface From david at lang.hm Thu May 7 20:49:34 2009 From: david at lang.hm (david at lang.hm) Date: Thu, 7 May 2009 11:49:34 -0700 (PDT) Subject: [rsyslog] output plugin calling interface In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702B04D@GRFEXC.intern.adiscon.com> References: <9B6E2A8877C38245BFB15CC491A11DA702B038@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B039@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B03B@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B03C@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702B04D@GRFEXC.intern.adiscon.com> Message-ID: On Thu, 7 May 2009, Rainer Gerhards wrote: > So, I'd appreciate if you could have a look at sections 3.2 and 3.3 of > > http://www.rsyslog.com/download/design.pdf overall it looks good. one suggestion I would make is that, since message-based failures cannot be reliably detected, I would consider using the same failure process for all failures, and declare a message as bad if it fails the max retry number of times by itself (once you hit n=1). otherwise you end up resubmitting the entire batch a number of times before you try to narrow it down to the particular message. since the process of finding the bad message will take a number of retries, and then you will want to retry the suspect message several times (to make sure that it's really a message error, not an action error), this could result in a lot of retries. also, the algorithm that you posted has a subtle difference from what I had listed. yours is more straightforward and easier to understand (and requires no global knowledge); I think that mine is more efficient in the rare failure case.
there is a potential (very subtle) race condition in this area that will need attention when we get down to lower-level discussion (no matter which algorithm is used). at this point I don't see this as critical (not even very important), as we are still talking about high-level concepts, but I wanted to note this for a future conversation. two notes on the reliability section: 1. I think we had figured out that reliability required touching each item 3 times instead of 2 (not 4 times as you note in the text). 2. I disagree with you on the idea that power issues should be handled at a different level. I'll try to track down some discussions on sysadmin/security mailing lists about this. David Lang From rgerhards at hq.adiscon.com Thu May 7 22:16:16 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Thu, 7 May 2009 22:16:16 +0200 Subject: [rsyslog] output plugin calling interface References: <9B6E2A8877C38245BFB15CC491A11DA702B038@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B039@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B03B@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B03C@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B04D@GRFEXC.intern.adiscon.com> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702B051@GRFEXC.intern.adiscon.com> > > So, I'd appreciate if you could have a look at sections 3.2 > and 3.3 of > > > > http://www.rsyslog.com/download/design.pdf > > overall it looks good. > > one suggestion I would make is that since message based > failures cannot be > reliably detected, I would consider using the same failure > process for all > failures, and declare a message as bad if it fails the max > retry number of > times by itself (once you hit n=1) But then you either A) do not need the batch logic at all (because the action is configured for infinite retries) Or B) you lose many messages if the action is not configured for infinite retries and you have a longer-duration outage, e.g.
on a database server. Let's say it is offline for a couple of hours; then you lose almost everything in that period. To prevent this, you need two different retry methods. > otherwise you end up resubmitting the entire batch a number of times > before you try to narrow it down to the particular message. since the > process of finding the bad message will take a number of > retries, and then > you will want to retry the suspect message several times (to > make sure > that it's really a message error, not a action error) this > could result in > a lot of retries. > > also, the algorithm that you posted has a subtle difference > from what I > had listed. It must, because it has two different levels of retries. > yours is more straightforward and easier to > understand (and > requires no global knowledge), I think that mine is more > efficiant in the > rare failure case. there is a potential (very subtle) race > condition in > this area that will need attention when we get down to lower level > discussion (no matter which algorithm is used) > > at this point I don't see this as critical (not even very > important) as we > are talking high-level concepts at this point, but I wanted > to note this > for a future conversation. I agree that it is not critical at this point. I also have not even tried to optimize it. The critical point is the discussion above on the two different retry modes. It took me a lot of thinking to see the subtle issues, but trying to do everything with just one mode was the root cause of the problems, at least those I faced. I am not sure how you could solve the dilemma above with just a single retry mode. > > > two notes on the reliability section That's why I did not mention this section - so far, it is just a copy of a mailing list post (and all the comments it raised apply to it) > > 1. I think we had figured out that reliability required > touching each item > 3 times instead of 2 (not 4 times as you note in the text) > > 2.
I disagree with you on the idea that power issues should > be handled at > a different level. I'll try to track down some discussions on > sysadmin/security mailing lists about this. Keep in mind that my key point is that you cannot currently protect a busy system against message loss. The issue is not whether a power failure may happen. I agree it can. I just think that you cannot build a busy system without using at least partial in-memory queuing, which by definition is not safe from power failures. So it doesn't make sense to protect a handful of messages when we lose many more of them anyway. > > David Lang From david at lang.hm Fri May 8 02:05:21 2009 From: david at lang.hm (david at lang.hm) Date: Thu, 7 May 2009 17:05:21 -0700 (PDT) Subject: [rsyslog] untra-reliable speed test Message-ID: I have put together a box for a first cut at a speed test of rsyslog in ultra-reliable mode. the outline below is intended to minimize the number of variables. the box is a dual quad-core opteron with 8G of ram, one SATA drive and a fusionIO SSD PCIE drive, currently running RHEL 5.3 kernel 2.6.18-53 (redhat stock kernel). I intend to format the SSD with ext2 (as the application is providing data integrity, and to avoid the known performance problems with ext3 and fsync). for the rsyslog test I am thinking of the following: using rsyslog 4.1.7; enable input file; set the main queue mode to disk; enable fsyncs everywhere; set the output to log *.* to a file; run a cron job that rolls the log file once a minute and sends a HUP to rsyslog; create a large file of log information; run this for a while and then count the number of logs in each rolled log file. hopefully the number will be reasonably consistent. does this sound like a reasonable approach? or is this not going to be representative for some reason?
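For the counting step of this plan, a small C helper could tally the records in each rolled file. It assumes one message per newline-terminated line, and the function names are hypothetical. Because cron only fires approximately on the minute, dividing by the nominal 60 s interval gives a rough rate, not an exact one.

```c
#include <assert.h>
#include <stdio.h>

/* Count newline-terminated records in f, assuming one syslog
 * message per line in the rolled output file. */
long count_messages(FILE *f)
{
    long n = 0;
    int c;

    assert(f != NULL);
    while ((c = getc(f)) != EOF)
        if (c == '\n')
            n++;
    return n;
}

/* Approximate messages per second for one rolled file, given the
 * nominal rotation interval (60 s for a once-a-minute cron job). */
double message_rate(long count, double interval_seconds)
{
    return interval_seconds > 0 ? (double)count / interval_seconds : 0.0;
}
```

If the per-file counts stay within a narrow band across many rotations, that is the "reasonably consistent" result the test is looking for.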
David Lang From david at lang.hm Fri May 8 09:06:37 2009 From: david at lang.hm (david at lang.hm) Date: Fri, 8 May 2009 00:06:37 -0700 (PDT) Subject: [rsyslog] output plugin calling interface In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702B051@GRFEXC.intern.adiscon.com> References: <9B6E2A8877C38245BFB15CC491A11DA702B038@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B039@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B03B@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B03C@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B04D@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702B051@GRFEXC.intern.adiscon.com> Message-ID: On Thu, 7 May 2009, Rainer Gerhards wrote: >>> So, I'd appreciate if you could have a look at sections 3.2 >> and 3.3 of >>> >>> http://www.rsyslog.com/download/design.pdf >> >> overall it looks good. >> >> one suggestion I would make is that since message based >> failures cannot be >> reliably detected, I would consider using the same failure >> process for all >> failures, and declare a message as bad if it fails the max >> retry number of >> times by itself (once you hit n=1) > > But then you either > > A) do not need the batch logic at all (because the action is configured for > infinite retries) > > Or > > B) you loose many messages if the action is not configured for infinite > retries and you have a longer-duration outage e.g. on a database server. > Let's say it is offline for a couple of hours, then you lose almost > everything in that period > > To prevent this, you need two different retry methods. good point. the problem is trying to figure out which type of failure you have. some failures can be identified by the output module as being data driven or infrastructure, but there are cases where it just can't tell (especially when talking to remote servers, database, relp, etc) how should these be handled? 
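The two retry levels discussed here can be illustrated with a hedged C sketch. It assumes that the commit step can tell an action failure (destination unavailable) from a data failure (a bad message), and that assumption is exactly what is being questioned. All names are hypothetical. An action failure retries the whole sub-batch without charging it to the messages. A data failure splits the batch in half until a single message is isolated; that message is discarded after `max_msg_retries` failed attempts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcome of trying to commit a (sub-)batch. */
typedef enum { COMMIT_OK, COMMIT_ACTION_FAIL, COMMIT_DATA_FAIL } commit_result;
typedef commit_result (*commit_fn)(void *ctx, const int *msgs, size_t n);

typedef struct {
    int action_failures;   /* failures caused by the destination being unavailable */
    int discarded;         /* messages declared bad */
} retry_stats;

/* Submit msgs[0..n-1]. Action failures retry the same sub-batch; the cap
 * max_action_retries exists only so this sketch terminates (a real engine
 * would suspend the action and may retry indefinitely). Data failures
 * bisect the batch; an isolated message is dropped after max_msg_retries. */
static void submit(commit_fn commit, void *ctx, const int *msgs, size_t n,
                   int max_action_retries, int max_msg_retries, retry_stats *st)
{
    int action_failures = 0, msg_failures = 0;

    assert(n > 0);
    for (;;) {
        commit_result r = commit(ctx, msgs, n);
        if (r == COMMIT_OK)
            return;
        if (r == COMMIT_ACTION_FAIL) {
            st->action_failures++;
            if (++action_failures > max_action_retries)
                return;          /* sketch only: a real engine would not give up here */
            continue;            /* a real engine would back off before retrying */
        }
        if (n > 1) {             /* data failure: narrow it down by halves */
            size_t half = n / 2;
            submit(commit, ctx, msgs, half, max_action_retries, max_msg_retries, st);
            submit(commit, ctx, msgs + half, n - half,
                   max_action_retries, max_msg_retries, st);
            return;
        }
        if (++msg_failures >= max_msg_retries) {
            st->discarded++;     /* single message failed repeatedly: declare it bad */
            return;
        }
    }
}

/* Toy destination for illustration: unavailable for the first
 * outage_left calls, and rejects any batch containing the value 13. */
typedef struct { int outage_left; int committed; } fake_dest;

static commit_result fake_commit(void *ctx, const int *msgs, size_t n)
{
    fake_dest *d = ctx;
    if (d->outage_left > 0) {
        d->outage_left--;
        return COMMIT_ACTION_FAIL;
    }
    for (size_t i = 0; i < n; i++)
        if (msgs[i] == 13)
            return COMMIT_DATA_FAIL;
    d->committed += (int)n;
    return COMMIT_OK;
}
```

In the test case, a two-call outage and one bad message among four messages produce two action failures and one discarded message, and the other three messages are committed. The single-message case shows the cost of the extra retries: the isolated message is attempted `max_msg_retries` times before it is dropped.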
David Lang >> otherwise you end up resubmitting the entire batch a number of times >> before you try to narrow it down to the particular message. since the >> process of finding the bad message will take a number of >> retries, and then >> you will want to retry the suspect message several times (to >> make sure >> that it's really a message error, not a action error) this >> could result in >> a lot of retries. >> >> also, the algorithm that you posted has a subtle difference >> from what I >> had listed. > > It must, because it has two different levels of retries. > >> yours is more straightforward and easier to >> understand (and >> requires no global knowledge), I think that mine is more >> efficiant in the >> rare failure case. there is a potential (very subtle) race >> condition in >> this area that will need attention when we get down to lower level >> discussion (no matter which algorithm is used) >> >> at this point I don't see this as critical (not even very >> important) as we >> are talking high-level concepts at this point, but I wanted >> to note this >> for a future conversation. > > I agree on that is is not critical at this point. I also have not even tried > to optimize it. The critical point is the discussion above on the two > different retry modes. It took me a lot of thinking to see the subtle issues, > but trying to do all with just one mode was the root cause of the problems at > least I faced. > > I am not sure how you could solve the dilemma above with just a single retry > mode. > >> >> >> two notes on the reliability section > > That's why I not mentioned this section - so far, it is just a copy of a > mailing list post (and all the comments it raised apply to it) > >> >> 1. I think we had figured out that reliability required >> touching each item >> 3 times instead of 2 (not 4 times as you note in the text) >> >> 2. I disagree with you on the idea that power issues should >> be handled at >> a different level. 
I'll try to track down some discussions on >> sysadmin/security mailing lists about this. > > Keep in mind that my key point is that you cannot currently protect a busy > system against message loss. The issue is not whether a power failure may happen. > I agree it can. I just think that you cannot build a busy system without > using at least partial in-memory queuing, which by definition is not safe > from power failures. So it doesn't make sense to protect a handful of > messages when we lose many more of them anyway. > >> >> David Lang >> >> _______________________________________________ >> rsyslog mailing list >> http://lists.adiscon.net/mailman/listinfo/rsyslog >> http://www.rsyslog.com >> From rgerhards at hq.adiscon.com Fri May 8 09:23:07 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Fri, 8 May 2009 09:23:07 +0200 Subject: [rsyslog] untra-reliable speed test References: Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702B053@GRFEXC.intern.adiscon.com> > -----Original Message----- > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > bounces at lists.adiscon.com] On Behalf Of david at lang.hm > Sent: Friday, May 08, 2009 2:05 AM > To: rsyslog-users > Subject: [rsyslog] untra-reliable speed test > > I have a box put together for a first cut at a speed test of rsyslog in > ultra-reliable mode. the outline below is intended to minimize the number > of variables.
> > the box is a dual quad-core opteron with 8G of ram, one SATA drive and a > fusionIO SSD PCIE drive, currently running RHEL 5.3 kernel 2.6.18-53 > (redhat stock kernel). I intend to format the SSD with ext2 (as the > application is providing data integrity, and to avoid the known > performance problems with ext3 and fsync) Just a question, because I do not know enough about ext2: does ext2 guarantee that when an application does fsync, all data, INCLUDING related file system control structures, are written to disk? Or, to phrase it the other way around, can ext2 guarantee that fsync'ed data can always be read after a power failure? I am thinking along the lines of some control structures not being written, so the fsynced app data may be present on the disk, but cannot be accessed any longer. In the worst case, would it be possible that a whole file is lost during a file system check after reboot? My *uneducated* understanding is that ext3 does guard against this (thus the performance problems) but ext2 does not. If my understanding is correct (and I don't claim it is), we would need to use ext3. > for the rsyslog test I am thinking the following > > using rsyslog 4.1.7 > enable input file Not sure if I got this bullet point right. Do you mean you intend to use imfile for input generation? In any case, I would suggest doing one test with UDP and one with TCP senders, both sending at maximum rate. With UDP, we would see a message loss rate, while with TCP we would see the actual number of messages that the system can process. So TCP is probably the more meaningful number, but packet loss rate for UDP - a common use case - would also be interesting, at least I think so. > set the main queue mode to disk > enable fsyncs everywhere Just as a reminder: this includes $MainMsgQueueCheckpointInterval 1 (which is a *real* performance eater and puts a lot of burden on the consistency of the file system's control structures, thus my question on ext2 vs. ext3 above).
> set the output to log *.* to a file > > run a cron job that rolls the log file once a minute and sends a HUP to > rsyslog > > create a large file of log information > > run this for a while and then count the number of logs in each rolled log > file. hopefully the number will be reasonably consistent. > > does this sound like a reasonable approach? or is this not going to be > representative for some reason? With the few comments above, I think this is a very reasonable approach and should provide very good insight. Actually, I hope that it can prove wrong my point that this setup is too slow... Rainer > > David Lang From rgerhards at hq.adiscon.com Fri May 8 10:04:22 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Fri, 8 May 2009 10:04:22 +0200 Subject: [rsyslog] output plugin calling interface References: <9B6E2A8877C38245BFB15CC491A11DA702B038@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B039@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B03B@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B03C@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B04D@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B051@GRFEXC.intern.adiscon.com> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702B057@GRFEXC.intern.adiscon.com> > >> one suggestion I would make is that since message-based > >> failures cannot be > >> reliably detected, I would consider using the same failure > >> process for all > >> failures, and declare a message as bad if it fails the max > >> retry number of > >> times by itself (once you hit n=1) > > > > But then you either > > > > A) do not need the batch logic at all (because the action is configured for > > infinite retries) > > > > Or > > > > B) you lose many messages if the action is not configured for infinite > > retries and you
have a longer-duration outage, e.g. on a database server. > > Let's say it is offline for a couple of hours, then you lose almost > > everything in that period > > > > To prevent this, you need two different retry methods. > > good point. > > the problem is trying to figure out which type of failure you have. I agree, but we face this problem in any case. For example, you can consider the v3 engine to be using A) logic. That, by the way, was why it took me so long to understand the other use case you validly described. I didn't see how the retry handling could make a difference because the end result seemingly was the same (but it is not if you have two different failure scenarios and handle them differently). The moral of the story, I think, is that we must try to differentiate between the two. > some failures can be identified by the output module as being data-driven > or infrastructure-related, but there are cases where it just can't tell > (especially when talking to remote servers, databases, relp, etc.) > > how should these be handled? I think this mostly depends on the quality of the output module. First of all, "mostly" implies that there may be some other cases where it really is impossible to differentiate between the two. In that case, I would treat the issue as an action-caused failure. There are two reasons for this: 1) rsyslog v3 currently always does this and not even a single person has complained about that so far. This is an empirical argument, and it does not prove that it never caused problems. But it carries the connotation that this seems not to be too bad. 2) If we treated it as a message-caused failure, we would no longer be able to handle extended outages of destination systems, which I consider a vitally important feature. When weighing the two, I know of lots of people who rely on 2), in sharp contrast to nobody having problems with 1). So my conclusion is that it is less problematic to treat an otherwise undeterminable failure reason as action-caused.
All the more so as I assume this problem only exists in a minority of cases. Now back to the quality of the output module: thinking about databases, their API is usually very good at reporting whether there was a SQL error or a connection abort. So while a SQL error may also be an indication of a configuration problem, I would strongly tend to treat it as being message-caused. This is under the assumption that any reasonably responsive admin will hopefully test his configuration at least once before putting it into production. And config SQL errors should manifest immediately, so I expect these to be fixed before a configuration runs in production. So it is the job of the output module to interpret the return code it received from its API and decide whether the failure is more likely action-caused or message-caused. For database outputs, I would assume that it is always easy to classify failures that can only be action-caused, especially in the dominating case of a failed network connection or a failed server. For other outputs it may not be as easy. But, for example, all stream network outputs can detect a broken connection, so this also is a sure fit. For dynafiles, it really depends on how hard one tries to differentiate between the two cases. But I think you can go to great lengths here, too, especially if you do not only look at the creat() return code but, iff a failure occurs, make more API calls to find out the cause. So I think the remaining problem is small enough not to cause too many issues (and if it does, they are unavoidable in any case).
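The rule proposed here can be sketched for the dynafile case as an errno classifier: the output module classifies what it can, and anything it cannot classify is treated as action-caused. The function name, and the choice of which errno values count as message-caused, are illustrative assumptions, not rsyslog code. Only `ENAMETOOLONG` is attributed to the message, on the reasoning that a dynafile path is built from message properties. Every other value, including ones the module does not recognize, falls back to action-caused.

```c
#include <errno.h>

/* Which retry path a failure should take. */
typedef enum { FAIL_ACTION, FAIL_MESSAGE } fail_kind;

/*
 * Hypothetical classifier for a dynafile-style output, given the errno left
 * by a failed creat()/open(). Only errors that can plausibly be traced to
 * the message itself are reported as message-caused; everything else,
 * including errors we cannot interpret, is treated as action-caused so that
 * an outage of the destination is retried rather than dropping messages.
 */
fail_kind classify_file_errno(int err)
{
    switch (err) {
    case ENAMETOOLONG:   /* file name built from message properties is too long */
        return FAIL_MESSAGE;
    case ENOSPC:         /* disk full */
    case EIO:            /* device error */
    case EROFS:          /* file system is read-only */
    case EMFILE:         /* process out of file descriptors */
    case ENFILE:         /* system out of file descriptors */
        return FAIL_ACTION;
    default:             /* cannot tell: fall back to action-caused */
        return FAIL_ACTION;
    }
}
```

Listing the infrastructure errors explicitly changes nothing at runtime, because the default already maps to action-caused. It does document which cases were considered and which ones were left to the fallback.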
Rainer From david at lang.hm Fri May 8 10:18:37 2009 From: david at lang.hm (david at lang.hm) Date: Fri, 8 May 2009 01:18:37 -0700 (PDT) Subject: [rsyslog] untra-reliable speed test In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702B053@GRFEXC.intern.adiscon.com> References: <9B6E2A8877C38245BFB15CC491A11DA702B053@GRFEXC.intern.adiscon.com> Message-ID: On Fri, 8 May 2009, Rainer Gerhards wrote: >> -----Original Message----- >> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- >> bounces at lists.adiscon.com] On Behalf Of david at lang.hm >> >> I have a box put together for a first cut at a speed test of rsyslog in >> ultra-reliable mode. the outline below is intended to minimize the number >> of variables. >> >> the box is a dual quad-core opteron with 8G of ram, one SATA drive and a >> fusionIO SSD PCIE drive, currently running RHEL 5.3 kernel 2.6.18-53 >> (redhat stock kernel). I intend to format the SSD with ext2 (as the >> application is providing data integrity, and to avoid the known >> performance problems with ext3 and fsync) > > Just a question, because I do not know enough about ext2: does ext2 guarantee > that when an application does fsync, all data, INCLUDING related file system > control structures, are written to disk? Or, to phrase it the other way > around, can ext2 guarantee that fsync'ed data can always be read after a > power failure? I am thinking along the lines of some control structures not being > written, so the fsynced app data may be present on the disk, but cannot be > accessed any longer. In the worst case, would it be possible that a whole > file is lost during a file system check after reboot? > > My *uneducated* understanding is that ext3 does guard against this (thus the > performance problems) but ext2 does not.
the performance problem with ext3 is that it forces ALL pending writes to disk when anything does a fsync. now that you mention it, I think that with all filesystems other than ext2 you need to do a fsync on the directory as well as on the file > If my understanding is correct (and I don't claim it is), we would need to > use ext3. I'll try both (and later on, when I use my own kernel rather than the redhat one, I'll also test XFS) I think that if no other disk activity is taking place ext3 may not be too bad (one other advantage that ext2 would have over ext3 and XFS is that journaling filesystems have to write whatever they journal twice (once to the journal and once to the final location)) >> for the rsyslog test I am thinking the following >> >> using rsyslog 4.1.7 >> enable input file > > Not sure if I got this bullet point right. Do you mean you intend to use > imfile for input generation? yes, that was my intent. just to simplify things by making the test completely self-contained on the one box. > In any case, I would suggest doing one test with UDP and one with TCP senders, > both sending at maximum rate. With UDP, we would see a message loss rate, > while with TCP we would see the actual number of messages that the system can > process. So TCP is probably the more meaningful number, but packet loss rate > for UDP - a common use case - would also be interesting, at least I think so. will do. I will be interested in seeing the UDP loss rate; I suspect that with appropriate OS tuning I can get it down to a zero loss rate at the data rates that the rest of the system maintains (the OS has a buffer prior to rsyslog's input process that can cover delays on the input threads) >> set the main queue mode to disk >> enable fsyncs everywhere > > > Just as a reminder: this includes $MainMsgQueueCheckpointInterval 1 (which is > a *real* performance eater and puts a lot of burden on the consistency of the > file system's control structures, thus my question on ext2 vs.
ext3 above). does this do a fsync on the directory? >> set the output to log *.* to a file >> >> run a cron job that rolls the log file once a minute and sends a HUP to >> rsyslog >> >> create a large file of log information >> >> run this for a while and then count the number of logs in each rolled log >> file. hopefully the number will be reasonably consistent. >> >> does this sound like a reasonable approach? or is this not going to be >> representative for some reason? > > With the few comments above, I think this is a very reasonable approach and > should provide very good insight. > > Actually, I hope that it can prove wrong my point that this setup is too > slow... there will definitely be a performance issue at some point here; the question is whether it's fast enough to be usable. the drive claims to be able to do >100,000 I/O ops/sec. if we can manage to get a few thousand logs/sec written on this, it will be extremely usable. David Lang From david at lang.hm Fri May 8 10:20:28 2009 From: david at lang.hm (david at lang.hm) Date: Fri, 8 May 2009 01:20:28 -0700 (PDT) Subject: [rsyslog] output plugin calling interface In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702B057@GRFEXC.intern.adiscon.com> References: <9B6E2A8877C38245BFB15CC491A11DA702B038@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B039@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B03B@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B03C@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B04D@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B051@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702B057@GRFEXC.intern.adiscon.com> Message-ID: On Fri, 8 May 2009, Rainer Gerhards wrote: >>>> one suggestion I would make is that since message-based >>>> failures cannot be >>>> reliably detected, I would consider using the same failure >>>> process for all >>>> failures, and declare a message as bad if it fails the max >>>> retry number
of >>>> times by itself (once you hit n=1) >>> >>> But then you either >>> >>> A) do not need the batch logic at all (because the action is configured > for >>> infinite retries) >>> >>> Or >>> >>> B) you lose many messages if the action is not configured for infinite >>> retries and you have a longer-duration outage, e.g. on a database server. >>> Let's say it is offline for a couple of hours, then you lose almost >>> everything in that period >>> >>> To prevent this, you need two different retry methods. >> >> good point. >> >> the problem is trying to figure out which type of failure you have. > > I agree, but we face this problem in any case. For example, you can consider > the v3 engine to be using A) logic. That, by the way, was why it took me so > long to understand the other use case you validly described. I didn't see how > the retry handling could make a difference because the end result seemingly > was the same (but it is not if you have two different failure scenarios and > handle them differently). The moral of the story, I think, is that we must try > to differentiate between the two. > > >> some failures can be identified by the output module as being data-driven >> or infrastructure-related, but there are cases where it just can't tell >> (especially when talking to remote servers, databases, relp, etc.) >> >> how should these be handled? > > I think this mostly depends on the quality of the output module. > > First of all, "mostly" implies that there may be some other cases where it > really is impossible to differentiate between the two. In that case, I would > treat the issue as an action-caused failure. There are two reasons for this: > > 1) rsyslog v3 currently always does this and not even a single person > has complained about that so far. This is an empirical argument, and it does not > prove that it never caused problems. But it carries the connotation that this seems not > to be too bad.
> > 2) If we treated it as a message-caused failure, we would no longer be able > to handle extended outages of destination systems, which I consider a vitally > important feature. > > When weighing the two, I know of lots of people who rely on 2), in sharp > contrast to nobody having problems with 1). So my conclusion is that it is > less problematic to treat an otherwise undeterminable failure reason as > action-caused. All the more so as I assume this problem only exists in a > minority of cases. > > Now back to the quality of the output module: thinking about databases, their > API is usually very good at reporting whether there was a SQL error or a > connection abort. So while a SQL error may also be an indication of a > configuration problem, I would strongly tend to treat it as being > message-caused. This is under the assumption that any reasonably responsive > admin will hopefully test his configuration at least once before putting it > into production. And config SQL errors should manifest immediately, so I > expect these to be fixed before a configuration runs in production. So it is > the job of the output module to interpret the return code it received from > its API and decide whether the failure is more likely action-caused or > message-caused. For database outputs, I would assume that it is always easy > to classify failures that can only be action-caused, especially in the > dominating case of a failed network connection or a failed server. > > For other outputs it may not be as easy. But, for example, all stream network > outputs can detect a broken connection, so this also is a sure fit. > > For dynafiles, it really depends on how hard one tries to differentiate > between the two cases. But I think you can go to great lengths here, too, > especially if you do not only look at the creat() return code but, iff a > failure occurs, make more API calls to find out the cause.
> > So I think the remaining problem is small enough not to cause too many issues > (and if it does, they are unavoidable in any case). sounds reasonable. David Lang From rgerhards at hq.adiscon.com Fri May 8 10:34:23 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Fri, 08 May 2009 10:34:23 +0200 Subject: [rsyslog] untra-reliable speed test In-Reply-To: References: <9B6E2A8877C38245BFB15CC491A11DA702B053@GRFEXC.intern.adiscon.com> Message-ID: <1241771663.25612.139.camel@rf10up.intern.adiscon.com> On Fri, 2009-05-08 at 01:18 -0700, david at lang.hm wrote: > On Fri, 8 May 2009, Rainer Gerhards wrote: > > >> -----Original Message----- > >> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > >> bounces at lists.adiscon.com] On Behalf Of david at lang.hm > >> > >> I have a box put together for a first cut at a speed test of rsyslog in > >> ultra-reliable mode. the outline below is intended to minimize the number > >> of variables. > >> > >> the box is a dual quad-core opteron with 8G of ram, one SATA drive and a > >> fusionIO SSD PCIE drive, currently running RHEL 5.3 kernel 2.6.18-53 > >> (redhat stock kernel). I intend to format the SSD with ext2 (as the > >> application is providing data integrity, and to avoid the known > >> performance problems with ext3 and fsync) > > > > Just a question, because I do not know enough about ext2: does ext2 guarantee > > that when an application does fsync, all data, INCLUDING related file system > > control structures, are written to disk? Or, to phrase it the other way > > around, can ext2 guarantee that fsync'ed data can always be read after a > > power failure? I am thinking along the lines of some control structures not being > > written, so the fsynced app data may be present on the disk, but cannot be > > accessed any longer. In the worst case, would it be possible that a whole > > file is lost during a file system check after reboot?
> > > > My *uneducated* understanding is that ext3 does guard against this (thus the > > performance problems) but ext2 does not. > > the performance problem with ext3 is that it forces ALL pending writes to > disk when anything does a fsync. > > now that you mention it, I think that with all filesystems other than ext2 I think you meant ext3 here? > you need to do a fsync on the directory as well as on the file > another uneducated question: does that ensure that all fs control structures are written? I mean things like the chain that links file parts together. My understanding is the answer is "yes", but I prefer to ask as I am not 100% sure. > > If my understanding is correct (and I don't claim it is), we would need to > > use ext3. > > I'll try both (and later on, when I use my own kernel rather than the > redhat one, I'll also test XFS) > > I think that if no other disk activity is taking place ext3 may not be too > bad (one other advantage that ext2 would have over ext3 and XFS is that > journaling filesystems have to write whatever they journal twice (once to > the journal and once to the final location)) ack > > >> for the rsyslog test I am thinking the following > >> > >> using rsyslog 4.1.7 > >> enable input file > > > > Not sure if I got this bullet point right. Do you mean you intend to use > > imfile for input generation? > > yes, that was my intent. just to simplify things by making the test > completely self-contained on the one box. there is a kind of interaction between imfile and the queue in that imfile flags its messages as "delayable", which was introduced to prevent imfile from unnecessarily putting data into the queue too fast. But on the other hand, this should tune the system to the actual max rate (at least in theory). > > > In any case, I would suggest doing one test with UDP and one with TCP senders, > > both sending at maximum rate.
With UDP, we would see a message loss rate, > > while with TCP we would see the actual number of messages that the system can > > process. So TCP is probably the more meaningful number, but packet loss rate > > for UDP - a common use case - would also be interesting, at least I think so. > > will do. > > I will be interested in seeing the UDP loss rate; I suspect that with > appropriate OS tuning I can get it down to a zero loss rate at the data > rates that the rest of the system maintains (the OS has a buffer prior to > rsyslog's input process that can cover delays on the input threads) Let's say you find out the max rate R via e.g. TCP, and then use R as an upper bound for the UDP traffic; that should work. But I would also find it interesting to see how many messages are dropped if you send at a rate >> R. I would not be surprised if the resulting commit rate were (even far) below R. > > >> set the main queue mode to disk > >> enable fsyncs everywhere > > > > > > Just as a reminder: this includes $MainMsgQueueCheckpointInterval 1 (which is > > a *real* performance eater and puts a lot of burden on the consistency of the > > file system's control structures, thus my question on ext2 vs. ext3 above). > > does this do a fsync on the directory? No! But I think it would be easy to add (though easy only in a non-optimized way; optimizing it would take more effort). > > >> set the output to log *.* to a file > >> > >> run a cron job that rolls the log file once a minute and sends a HUP to > >> rsyslog > >> > >> create a large file of log information > >> > >> run this for a while and then count the number of logs in each rolled log > >> file. hopefully the number will be reasonably consistent. > >> > >> does this sound like a reasonable approach? or is this not going to be > >> representative for some reason? > > > > With the few comments above, I think this is a very reasonable approach and > > should provide very good insight.
> > > > Actually, I hope that it can prove wrong my point that this setup is too > > slow... > > there will definitely be a performance issue at some point here; the > question is whether it's fast enough to be usable. > > the drive claims to be able to do >100,000 I/O ops/sec. if we can manage > to get a few thousand logs/sec written on this, it will be extremely > usable. OK, a "few thousand" is not what I have in mind for a high-performance system (a "few ten-thousand"), but I agree that it can be considered a busy system. So a "few thousand" (maybe more than 5,000?) should be sufficient to prove the original point - especially as hardware gets faster AND you can use solid state disks or similar mechanisms (assuming they qualify for the reliability criteria). One thing we need to think about is burst traffic rate, especially with UDP. I tend to think that such a system must be able to support UDP traffic, too (which is a questionable opinion) and, if so, we must look not only at the sustained rate but even more at the burst rate. As a side-note, you will probably see that the disk queue can be optimized. If sufficient effort is made, I think it can perform at least four to six times faster. The reason is that it was never really meant to be used on a busy box in this way. Knowing this, we should not start a new discussion about these optimizations, simply because they take considerable additional time and we cannot fit that part into anything we have in mind for the foreseeable future.
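The file-plus-directory fsync pattern discussed in this thread can be sketched in C with plain POSIX calls. `durable_write` is a hypothetical helper for illustration, not an rsyslog function. The directory fsync matters for a newly created file. The Linux fsync(2) manual page notes that syncing a file does not necessarily make its directory entry durable; that requires an explicit fsync() on a descriptor for the directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: append buf to dir/name and make it durable.
 * fsync(fd) flushes the file's data and inode; fsync(dfd) flushes the
 * directory, which holds the entry of a newly created file.
 * Returns 0 on success, -1 on any error.
 */
int durable_write(const char *dir, const char *name, const void *buf, size_t len)
{
    int dfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dfd < 0)
        return -1;
    int fd = openat(dfd, name, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) {
        close(dfd);
        return -1;
    }

    const char *p = buf;
    while (len > 0) {                    /* handle short writes and EINTR */
        ssize_t w = write(fd, p, len);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            goto fail;
        }
        p += w;
        len -= (size_t)w;
    }

    if (fsync(fd) != 0)                  /* file contents + metadata */
        goto fail;
    if (close(fd) != 0) {
        fd = -1;
        goto fail;
    }
    fd = -1;
    if (fsync(dfd) != 0)                 /* directory entry */
        goto fail;
    close(dfd);
    return 0;

fail:
    if (fd >= 0)
        close(fd);
    close(dfd);
    return -1;
}
```

A writer that only overwrites an existing, pre-allocated file (the case raised in this thread for a queue) creates no new directory entry and could skip the second fsync. A disk queue that keeps creating new segment files could not.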
Rainer From david at lang.hm Fri May 8 11:07:52 2009 From: david at lang.hm (david at lang.hm) Date: Fri, 8 May 2009 02:07:52 -0700 (PDT) Subject: [rsyslog] untra-reliable speed test In-Reply-To: <1241771663.25612.139.camel@rf10up.intern.adiscon.com> References: <9B6E2A8877C38245BFB15CC491A11DA702B053@GRFEXC.intern.adiscon.com> <1241771663.25612.139.camel@rf10up.intern.adiscon.com> Message-ID: On Fri, 8 May 2009, Rainer Gerhards wrote: > On Fri, 2009-05-08 at 01:18 -0700, david at lang.hm wrote: >> On Fri, 8 May 2009, Rainer Gerhards wrote: >> >>>> -----Original Message----- >>>> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- >>>> bounces at lists.adiscon.com] On Behalf Of david at lang.hm >>>> >>>> I have a box put together for a first cut at a speed test of rsyslog in >>>> ultra-reliable mode. the outline below is intended to minimize the number >>>> of variables. >>>> >>>> the box is a dual quad-core opteron with 8G of ram, one SATA drive and a >>>> fusionIO SSD PCIE drive, currently running RHEL 5.3 kernel 2.6.18-53 >>>> (redhat stock kernel). I intend to format the SSD with ext2 (as the >>>> application is providing data integrity, and to avoid the known >>>> performance problems with ext3 and fsync) >>> >>> Just a question, because I do not know enough about ext2: does ext2 guarantee >>> that when an application does fsync, all data, INCLUDING related file system >>> control structures, are written to disk? Or, to phrase it the other way >>> around, can ext2 guarantee that fsync'ed data can always be read after a >>> power failure? I am thinking along the lines of some control structures not being >>> written, so the fsynced app data may be present on the disk, but cannot be >>> accessed any longer. In the worst case, would it be possible that a whole >>> file is lost during a file system check after reboot? >>> >>> My *uneducated* understanding is that ext3 does guard against this (thus the >>> performance problems) but ext2 does not.
>> >> the performance problem with ext3 is that it forces ALL pending writes to >> disk when anything does a fsync. >> >> now that you mention it, I think that with all filesystems other than ext2 > I think you meant ext3 here? > >> you need to do a fsync on the directory as well as on the file >> > > another uneducated question: does that ensure that all fs control > structures are written? I mean things like the chain that links file > parts together. My understanding is the answer is "yes", but I prefer to > ask as I am not 100% sure. yes, if you do a fsync on the file and on the directory the file is in, you are absolutely safe. this is what the good mail servers do when receiving a message. if the file size does not change (say you pre-allocate the file, or are overwriting a file, like you could be doing for a queue) you don't have to do the fsync on the directory. >>> If my understanding is correct (and I don't claim it is), we would need to >>> use ext3. >> >> I'll try both (and later on, when I use my own kernel rather than the >> redhat one, I'll also test XFS) >> >> I think that if no other disk activity is taking place ext3 may not be too >> bad (one other advantage that ext2 would have over ext3 and XFS is that >> journaling filesystems have to write whatever they journal twice (once to >> the journal and once to the final location)) > ack > >> >>>> for the rsyslog test I am thinking the following >>>> >>>> using rsyslog 4.1.7 >>>> enable input file >>> >>> Not sure if I got this bullet point right. Do you mean you intend to use >>> imfile for input generation? >> >> yes, that was my intent. just to simplify things by making the test >> completely self-contained on the one box. > > there is a kind of interaction between imfile and the queue in that > imfile flags its messages as "delayable", which was introduced to > prevent imfile from unnecessarily putting data into the queue too fast.
But > on the other hand, this should tune the system to the actual max rate > (at least in theory). > >> >>> In any case, I would suggest doing one test with UDP and one with TCP senders, >>> both sending at maximum rate. With UDP, we would see a message loss rate, >>> while with TCP we would see the actual number of messages that the system can >>> process. So TCP is probably the more meaningful number, but packet loss rate >>> for UDP - a common use case - would also be interesting, at least I think so. >> >> will do. >> >> I will be interested in seeing the UDP loss rate; I suspect that with >> appropriate OS tuning I can get it down to a zero loss rate at the data >> rates that the rest of the system maintains (the OS has a buffer prior to >> rsyslog's input process that can cover delays on the input threads) > > Let's say you find out the max rate R via e.g. TCP, and then use R as an > upper bound for the UDP traffic; that should work. But I would also find > it interesting to see how many messages are dropped if you send at a > rate >> R. I would not be surprised if the resulting commit rate were > (even far) below R. it depends on where things get dropped. if I send enough UDP packets to flood the OS buffer, it will drop the packets and rsyslog will never know that they existed. below that, when rsyslog has a full queue and there is lock contention between the thread trying to insert messages into the queue and the thread pulling messages out of the queue, it does slow down. I don't know if that will be visible on the disk-based queue, but it was _very_ visible on the memory-based queue. >> >>>> set the main queue mode to disk >>>> enable fsyncs everywhere >>> >>> >>> Just as a reminder: this includes $MainMsgQueueCheckpointInterval 1 (which is >>> a *real* performance eater and puts a lot of burden on the consistency of the >>> file system's control structures, thus my question on ext2 vs. ext3 above). >> >> does this do a fsync on the directory? > > No!
But I think it would be easy to add (though easy only in a > non-optimized way; optimizing it would take more effort). I'll test as-is, and if the numbers are high enough to be interesting, we'll hack that in and see how badly it hurts us (to drive things in a worst-case way) >> >>>> set the output to log *.* to a file >>>> >>>> run a cron job that rolls the log file once a minute and sends a HUP to >>>> rsyslog >>>> >>>> create a large file of log information >>>> >>>> run this for a while and then count the number of logs in each rolled log >>>> file. hopefully the number will be reasonably consistent. >>>> >>>> does this sound like a reasonable approach? or is this not going to be >>>> representative for some reason? >>> >>> With the few comments above, I think this is a very reasonable approach and >>> should provide very good insight. >>> >>> Actually, I hope that it can prove wrong my point that this setup is too >>> slow... >> >> there will definitely be a performance issue at some point here; the >> question is whether it's fast enough to be usable. >> >> the drive claims to be able to do >100,000 I/O ops/sec. if we can manage >> to get a few thousand logs/sec written on this, it will be extremely >> usable. > > OK, a "few thousand" is not what I have in mind for a > high-performance system (a "few ten-thousand"), but I agree that it can > be considered a busy system. So a "few thousand" (maybe more than > 5,000?) should be sufficient to prove the original point - especially as > hardware gets faster AND you can use solid state disks or similar > mechanisms (assuming they qualify for the reliability criteria). I'm a bit amused by this criterion. IIRC, when I started playing with rsyslog before any of the performance improvements were done, wasn't this the best data rate that you could get out of rsyslog with a RAM-based queue? I know that with two outputs (disk + relay) I was only getting ~30,000 messages/sec.
(with disk-only output it could get up to ~80,000) also note that these tests are being done on the version _without_ batch processing. I need to think about it a bit more to be sure there aren't any holes in my thinking, but I believe that you would only need to do one set of fsyncs per batch that's processed. so setting a batch size of 100 should increase the messages/sec by a similar factor. this is only on the output side for now, but if this proves to be interesting, some inputs could batch as well (from your comments it sounds as if relp can send a batch of messages and then get acknowledgement of all of them at once; if so, that could serve as the input) > One thing we need to think about is burst traffic rate, especially with > UDP. I tend to think that such a system must be able to support UDP > traffic, too (which is a questionable opinion) and, if so, we must not > only look at the sustained but even more at the burst rate. yes and no. while I see the need to support UDP, it's not going to be reliable (the OS buffers them before they get to the system, ignoring the network's ability to drop them), and if you really need high UDP burst rates you could run two copies of rsyslog, one ultra-reliable (with reliable inputs), and a second one with a memory queue, feeding into the ultra-reliable one with a batched input method. but it will be good to see where the limits are. > As a side note, you will probably see that the disk queue can be > optimized. If sufficient effort is made, I think it can perform at least > four to six times faster. The reason is that it was > never really meant to be used on a busy box in this way. Knowing > this, we should not start a new discussion about these optimizations, > simply because they take considerable additional time and we cannot fit > that part into anything we have in mind for the foreseeable future.
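The one-set-of-fsyncs-per-batch idea can be illustrated with a small Python sketch. It is a toy under stated assumptions, not rsyslog code: `write_batched` is a hypothetical helper that appends messages to a single file and treats a batch as committed only after its one `os.fsync` call returns. It counts the sync calls, which makes the "batch size of 100" arithmetic visible; whether wall-clock throughput improves by the same factor depends on how expensive each fsync is on the particular device.

```python
import os
import tempfile


def _write_all(fd, data):
    """Loop until every byte is written (os.write may write fewer bytes)."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def write_batched(path, messages, batch_size):
    """Append messages (bytes) to path, one per line, calling fsync once per
    batch instead of once per message. Returns the number of fsync calls.

    Hypothetical helper for illustration only; not rsyslog's output code.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    fsyncs = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        for start in range(0, len(messages), batch_size):
            batch = messages[start:start + batch_size]
            _write_all(fd, b"".join(m + b"\n" for m in batch))
            os.fsync(fd)  # the whole batch becomes durable with one sync
            fsyncs += 1
    finally:
        os.close(fd)
    return fsyncs


if __name__ == "__main__":
    msgs = [b"test message %d" % i for i in range(1000)]
    with tempfile.TemporaryDirectory() as d:
        print(write_batched(os.path.join(d, "a.log"), msgs, 1))    # 1000
        print(write_batched(os.path.join(d, "b.log"), msgs, 100))  # 10
```

The trade-off is the usual one: a larger batch means fewer syncs, but up to one batch of accepted messages is not yet durable at any given moment.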
yeah, I've been thinking of various things that could be done here, but I won't ask about any of them for now ;-) David Lang From rgerhards at hq.adiscon.com Fri May 8 13:28:08 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Fri, 08 May 2009 13:28:08 +0200 Subject: [rsyslog] ultra-reliable speed test In-Reply-To: References: <9B6E2A8877C38245BFB15CC491A11DA702B053@GRFEXC.intern.adiscon.com> <1241771663.25612.139.camel@rf10up.intern.adiscon.com> Message-ID: <1241782088.25612.188.camel@rf10up.intern.adiscon.com> On Fri, 2009-05-08 at 02:07 -0700, david at lang.hm wrote: > > another uneducated question: does that ensure that all fs control > > structures are written? I mean things like the chain that links file > > parts together. My understanding is the answer is "yes", but I prefer to > > ask as I am not 100% sure. > > yes, if you do a fsync on the file and on the directory the file is in you > are absolutely safe. this is what the good mail servers do when receiving a > message. > > if the file size does not change (say you pre-allocate the file, or are > overwriting a file, like you could be doing for a queue) you don't have to > do the fsync on the directory. > thanks and very good to know > > > > Let's say you find out the max rate R via e.g. TCP, and then use R as an > > upper bound for the UDP traffic; that should work. But I would also find > > it interesting to see how many messages are dropped if you send at a > > rate >> R. I would not be surprised if the resulting commit rate would > > be (even far) below R. > > it depends on where things get dropped. if I send enough UDP packets to > flood the OS buffer, it will drop the packets and rsyslog will never know > that they existed. that's what I am thinking (and concerned) about > below that, when rsyslog has a full queue and there is lock contention > between the thread trying to insert messages into the queue and the thread > pulling messages out of the queue, it does slow down.
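The file-plus-directory fsync rule quoted above can be sketched in Python. Assumptions: a POSIX system (on Linux a directory can be opened read-only and passed to fsync), helper names invented for this illustration, and the reading that the directory fsync is what makes a newly created file's name durable, while an in-place overwrite of an existing file that does not grow needs only the file fsync. This is not the code of any particular mail server or of rsyslog.

```python
import os
import tempfile


def _fsync_dir(path):
    """fsync the directory containing path, persisting its directory entry."""
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


def durable_create(path, data):
    """Create a new file and make both its contents and its name durable."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    try:
        view = memoryview(data)
        while view:
            view = view[os.write(fd, view):]
        os.fsync(fd)   # file data and inode
    finally:
        os.close(fd)
    _fsync_dir(path)   # the new directory entry


def durable_overwrite(path, offset, data):
    """Overwrite bytes inside an existing file without growing it. The
    file's name is already durable, so only the file itself is fsync'ed."""
    fd = os.open(path, os.O_WRONLY)
    try:
        if offset + len(data) > os.fstat(fd).st_size:
            raise ValueError("write would extend the file")
        view, pos = memoryview(data), offset
        while view:
            n = os.pwrite(fd, view, pos)
            view, pos = view[n:], pos + n
        os.fsync(fd)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "queue.dat")
        durable_create(p, b"\0" * 16)     # new file: file + directory fsync
        durable_overwrite(p, 4, b"abcd")  # in place: file fsync only
```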
I don't know if that > will be visible on the disk-based queue, but it was _very_ visible on the > memory-based queue. > > >> > >>>> set the main queue mode to disk > >>>> enable fsyncs everywhere > >>> > >>> > >>> Just as a reminder: this includes $MainMsgQueueCheckpointInterval 1 (which is > >>> a *real* performance eater and puts a lot of burden on the consistency of the > >>> file system's control structures, thus my question on ext2 vs. ext3 above). > >> > >> does this do a fsync on the directory? > > > > No! But I think it would be easy to add (but easy only in a > > non-optimized way; optimization would take more effort). > > I'll test as-is, and if the numbers are high enough to be interesting, > we'll hack that in and see how badly it hurts us (to drive things in a > worst-case way) ack > > OK, a "few thousand" is not what I have in mind for a > > high-performance system (a "few ten-thousand"), but I agree that it can > > be considered a busy system. So a "few thousand" (maybe more than > > 5,000?) should be sufficient to prove the original point - especially as > > hardware gets faster AND you can use solid state disks or similar > > mechanisms (assuming they qualify for the reliability criteria). > > I'm a bit amused by this criterion. IIRC, when I started playing with > rsyslog before any of the performance improvements were done, wasn't this > the best data rate that you could get out of rsyslog with a RAM-based > queue? > > I know that with two outputs (disk + relay) I was only getting ~30,000 > messages/sec. (with disk-only output it could get up to ~80,000) > That's the price you have to pay for educating me ;) You convinced me that this data rate is too slow for a really busy server, and so I am now applying that knowledge ;) > also note that these tests are being done on the version _without_ batch > processing.
I need to think about it a bit more to be sure there aren't > any holes in my thinking, but I believe that you would only need to do one > set of fsyncs per batch that's processed. so setting a batch size of 100 > should increase the messages/sec by a similar factor. I hadn't thought about this, but now that you say it, I agree. Actually, an fsync per queue lock release would probably be the right criterion. I think that is almost equivalent to what you said, but the advantage of that definition is that I can simply watch out for these *already existing* places as a guideline. That can indeed make a considerable difference. > this is only on the output side for now, but if this proves to be > interesting, some inputs could batch as well (from your comments it sounds > as if relp can send a batch of messages and then get acknowledgement of > all of them at once; if so, that could serve as the input) That's a sliding window, but this is something that really does not belong in the app layer (and is not visible there). It is the same thing as the TCP sliding window, which you know to exist but do not know any specifics of. Even if we made the relp sliding window visible to the app layer, it wouldn't provide much benefit. The only one I can think of is lock contention, but with the queue workers acquiring the lock now only once per batch, the probability is greatly reduced. > > One thing we need to think about is burst traffic rate, especially with > > UDP. I tend to think that such a system must be able to support UDP > > traffic, too (which is a questionable opinion) and, if so, we must not > > only look at the sustained but even more at the burst rate. > > yes and no.
while I see the need to support UDP, it's not going to be > reliable (the OS buffers them before they get to the system, ignoring the > network's ability to drop them), and if you really need high UDP burst rates > you could run two copies of rsyslog, one ultra-reliable (with reliable > inputs), and a second one with a memory queue, feeding into the > ultra-reliable one with a batched input method. ack - as I said, the opinion is questionable... But what if you have important devices that simply speak nothing but UDP (they still seem to exist...)? However, think of it this way: You limit the max burst rate by using an ultra-reliable queue. You do so because you do not want to lose messages when a sudden power failure occurs. To support that configuration, you need to run the second instance. It queues in memory until the (slower) reliable rsyslogd can accept the message and put it into the reliable queue. Let's say that you have a burst of r messages and that from this burst only r/2 can be enqueued (because the ultra-reliable queue is so slow). So you lose r/2 messages. Now consider the case that you run rsyslog with just a reliable queue, one that is kept in memory but not able to cover the power failure scenario. Obviously, all messages in that queue are lost when power fails (or almost all, to be precise). However, that system has a much broader bandwidth. So with it, there would never have been r messages inside the queue, because that system has a much higher sustained message rate (and thus the burst causes much less trouble). Let's say the system is just twice as fast in this setup (I guess it usually would be *much* faster). Then, it would be able to process all r records. In that scenario, the ultra-reliable system loses r/2 messages, whereas the somewhat more "unreliable" system loses none - by virtue of being able to process messages as they arrive.
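Rainer's r/2 example can be reproduced with a toy model. This Python sketch is a hypothetical illustration: `simulate_burst`, the tick-based arrivals, the service rates and the buffer size are assumptions chosen to match the numbers in the text (a burst of r = 1000 messages, one system committing half as fast as the other). It is not a measurement or a model of rsyslog internals.

```python
def simulate_burst(arrivals, service_rate, buffer_size):
    """Toy discrete-time queue model (hypothetical, not rsyslog internals).

    Each tick, arrivals[t] messages arrive, up to service_rate of them are
    committed, and at most buffer_size may wait for the next tick. Anything
    beyond that is dropped, as the OS does with UDP datagrams when its
    receive buffer is full. Returns (committed, dropped).
    """
    backlog = committed = dropped = 0
    for arriving in arrivals:
        backlog += arriving
        done = min(backlog, service_rate)
        committed += done
        backlog -= done
        if backlog > buffer_size:
            dropped += backlog - buffer_size
            backlog = buffer_size
    committed += backlog  # after the burst, the buffered rest is drained
    return committed, dropped


if __name__ == "__main__":
    burst = [100] * 10  # r = 1000 messages spread over 10 ticks
    print(simulate_burst(burst, service_rate=50, buffer_size=0))   # (500, 500)
    print(simulate_burst(burst, service_rate=100, buffer_size=0))  # (1000, 0)
```

With `buffer_size=0` the slower system loses exactly r/2 and the twice-as-fast one loses nothing, as in the text; a nonzero buffer (the OS buffer or an in-memory front-end queue) shifts those numbers, which is the point of the two-instance setup discussed above.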
Now extend that picture to messages residing inside the OS buffers or even those that are still queued in their sources because a stream transport blocked sending them. I know that each detail of this picture can be argued about at length. However, my opinion is that there is no "ultra-reliable" system in life, only various probabilities of losing messages. These probabilities often depend on each other, which makes calculating them very hard to impossible. Still, the reliability of the system at large (the probability that a message survives) is just the product of the reliabilities of its components, and the probability of message loss is one minus that product. This is where *I* conclude that it can make sense to permit a system to lose some messages under certain circumstances, if that influences the overall probability calculation towards the desired end result. In that sense, I tend to think that a fast, memory-queuing rsyslogd instance can be much more reliable than one that is configured as being ultra-reliable, where the rest of the system at large is badly influenced by this (the scenario above). However, I also know that for regulatory requirements, you often seem to need to prove that a system will not lose messages once it has received them, even at the cost of an overall increased probability of message loss. My view of reliability is much the same as my view of security: there is no such thing as "being totally secure"; you can just reduce the probability that something bad happens. The worst thing in security is someone who thinks he is "totally secure" and as such is no longer actively looking at potential issues. I see reliability the same way. There is no such thing as "being totally reliable" and it is a really bad idea to think you could ever be. Knowing this, one may begin to think about how to decrease the overall probability of message loss AND think about what rate is acceptable (and what to do with these cases, e.g. "how can they hurt"). ... but ...
enough of philosophy, I am not sure if it helps this discussion ;) (but I thought it was useful to "see" what I have in mind when talking about these things). > > As a side note, you will probably see that the disk queue can be > > optimized. If sufficient effort is made, I think it can perform at least > > four to six times faster. The reason is that it was > > never really meant to be used on a busy box in this way. Knowing > > this, we should not start a new discussion about these optimizations, > > simply because they take considerable additional time and we cannot fit > > that part into anything we have in mind for the foreseeable future. > > yeah, I've been thinking of various things that could be done here, but I > won't ask about any of them for now ;-) Oh yes, a broad range. From simple things like zipping the data and keeping all handles always open to complex things like a dedicated, random-access, database-like disk queue store (perhaps even preformatted). If you look at the code, you'll possibly notice that the disk queue system uses stream drivers to persist the data. This would be the hook to extend. ... but: that's a story for another quarter ;) Thanks again for your carefully thought-out comments, they really help in getting things right.
Rainer > > David Lang > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com From ktm at rice.edu Fri May 8 15:04:20 2009 From: ktm at rice.edu (Kenneth Marshall) Date: Fri, 8 May 2009 08:04:20 -0500 Subject: [rsyslog] untra-reliable speed test In-Reply-To: References: <9B6E2A8877C38245BFB15CC491A11DA702B053@GRFEXC.intern.adiscon.com> Message-ID: <20090508130420.GD23405@it.is.rice.edu> On Fri, May 08, 2009 at 01:18:37AM -0700, david at lang.hm wrote: > On Fri, 8 May 2009, Rainer Gerhards wrote: > > >> -----Original Message----- > >> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > >> bounces at lists.adiscon.com] On Behalf Of david at lang.hm > >> > >> I have a box put together for a first cut at a speed test of rsyslog in > >> ultra-reliable mode. the outline below is intended to minimize the number > >> of variables. > >> > >> the box is a dual quad-core Opteron with 8G of RAM, one SATA drive and a > >> fusionIO SSD PCIe drive, currently running RHEL 5.3 kernel 2.6.18-53 > >> (Red Hat stock kernel). I intend to format the SSD with ext2 (as the > >> application is providing data integrity, and to avoid the known > >> performance problems with ext3 and fsync) > > > > Just a question, because I do not know enough about ext2: does ext2 guarantee > > that when an application does fsync, all data, INCLUDING related file system > > control structures, are written to disk? Or, to phrase it the other way > > around, can ext2 guarantee that fsync'ed data can always be read after a > > power failure? I think along the lines of some control structures not being > > written, thus the fsynced app data may be present on the disk, but cannot be > > accessed any longer. In the worst case, would it be possible that a whole > > file be lost during a file system check after reboot?
> > > > My *uneducated* understanding is that ext3 does guard against this (thus the > > performance problems) but ext2 does not. > > the performance problem with ext3 is that it forces ALL pending writes to > disk when anything does a fsync > > now that you mention it, I think that with all filesystems other than ext2 > you need to do a fsync on the directory as well as on the file > > > If my understanding were correct (and I don't say so), we would need to > > use ext3. > FYI, I think if you use ext3 with data=writeback, you will not have the flush-everything problem. Of course, you will need to precreate the files. Regards, Ken > I'll try both (and later on, when I use my own kernel rather than the > Red Hat one, I'll also test XFS) > > I think that if no other disk activity is taking place, ext3 may not be too > bad (one other advantage that ext2 would have over ext3 and XFS is that > journaling filesystems have to write whatever they journal twice (once to > the journal and once to the final location)) > > >> for the rsyslog test I am thinking the following > >> > >> using rsyslog 4.1.7 > >> enable input file > > > > Not sure if I got this bullet point right. Do you mean you intend to use > > imfile for input generation? > > yes, that was my intent. just to simplify things by making the test > completely self-contained on the one box. > > > In any case, I would suggest doing a test with UDP and one with TCP senders, > > both sending at maximum rate. With UDP, we would see a message loss rate, > > while with TCP we would see the actual number of messages that the system can > > process. So TCP is probably the more meaningful number, but packet loss rate > > for UDP - a common use case - would also be interesting, at least I think so. > > will do.
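Ken's "precreate the files" advice can be combined with a fixed-size queue store in a small sketch. Everything specific here is an assumption for illustration, not rsyslog's disk-queue format: 4 KiB slots, a 4-byte length prefix, helper names invented for this example, and Linux's `os.posix_fallocate`. Because a message's position is simply slot index times 4096 and no write grows the file, the file is synced together with its directory once at creation, and after that each commit needs only an fsync of the file itself.

```python
import os
import struct
import tempfile

SLOT_SIZE = 4096              # assumption: one fixed 4 KiB slot per message
LENGTH = struct.Struct("<I")  # assumption: 4-byte length prefix in each slot


def precreate(path, n_slots):
    """Create a queue file with n_slots slots reserved up front (Linux:
    os.posix_fallocate), then fsync the file and its directory once."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_EXCL, 0o644)
    try:
        os.posix_fallocate(fd, 0, n_slots * SLOT_SIZE)
        os.fsync(fd)
    finally:
        os.close(fd)
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


def write_slot(fd, index, payload):
    """Write payload into slot `index`. The offset is computed, so no
    separate index has to be updated. The caller decides when to fsync."""
    offset = index * SLOT_SIZE
    if index < 0 or offset + SLOT_SIZE > os.fstat(fd).st_size:
        raise ValueError("slot lies outside the preallocated area")
    if LENGTH.size + len(payload) > SLOT_SIZE:
        raise ValueError("message does not fit into one slot")
    record = LENGTH.pack(len(payload)) + payload
    record += b"\0" * (SLOT_SIZE - len(record))
    if os.pwrite(fd, record, offset) != SLOT_SIZE:
        raise OSError("short write")


def read_slot(fd, index):
    raw = os.pread(fd, SLOT_SIZE, index * SLOT_SIZE)
    (length,) = LENGTH.unpack_from(raw)
    return raw[LENGTH.size:LENGTH.size + length]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "queue.dat")
        precreate(path, 8)
        fd = os.open(path, os.O_RDWR)
        try:
            write_slot(fd, 3, b"<13>hello")
            # The name was made durable in precreate(); a file fsync suffices.
            os.fsync(fd)
            print(read_slot(fd, 3))  # b'<13>hello'
        finally:
            os.close(fd)
```

The cost of this layout is wasted space for short messages and a hard upper bound on message size; the benefit is that locating and committing a message never touches any structure other than its own slot.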
> > I will be interested in seeing the UDP loss rate; I suspect that with > appropriate OS tuning I can get it down to zero loss rate at the data > rates that the rest of the system maintains (the OS has a buffer prior to > rsyslog's input process that can cover delays on the input threads) > > >> set the main queue mode to disk > >> enable fsyncs everywhere > > > > > > Just as a reminder: this includes $MainMsgQueueCheckpointInterval 1 (which is > > a *real* performance eater and puts a lot of burden on the consistency of the > > file system's control structures, thus my question on ext2 vs. ext3 above). > > does this do a fsync on the directory? > > >> set the output to log *.* to a file > >> > >> run a cron job that rolls the log file once a min and sends a HUP to > >> rsyslog > >> > >> create a large file of log information > >> > >> run this for a while and then count the number of logs in each rolled log > >> file. hopefully the number will be reasonably consistent. > >> > >> does this sound like a reasonable approach? or is this not going to be > >> representative for some reason? > > > > With the few comments above, I think this is a very reasonable approach and > > should provide very good insight. > > > > Actually, I hope that it can prove my point that this setup is too slow > > wrong... > > there will definitely be a performance issue at some point here, the > question is whether it's fast enough to be usable. > > the drive claims to be able to do >100,000 I/O ops/sec. if we can manage > to get a few thousand logs/sec written on this, it will be extremely > usable.
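The OS-side buffer mentioned above, which holds UDP datagrams before rsyslog's input reads them, is the socket receive buffer, and "appropriate OS tuning" largely means making it bigger. A minimal Python sketch, assuming Linux semantics for `SO_RCVBUF`. The helper name, loopback address and 8 MiB size are arbitrary choices for illustration, and this does not show how rsyslog's own UDP input sizes its socket.

```python
import socket


def open_udp_listener(port, rcvbuf_bytes, host="127.0.0.1"):
    """Open a UDP socket with an enlarged kernel receive buffer.

    Datagrams wait in this buffer until the application reads them; once it
    is full, further datagrams are dropped by the kernel and the application
    never learns they existed. On Linux the request is capped by the
    net.core.rmem_max sysctl and the reported value is doubled, so the
    effective size is read back rather than assumed.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf_bytes)
    sock.bind((host, port))
    effective = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return sock, effective


if __name__ == "__main__":
    sock, size = open_udp_listener(0, 8 * 1024 * 1024)  # port 0: any free port
    print("listening on", sock.getsockname(), "receive buffer:", size)
    sock.close()
```

If the printed size is far below what was requested, the system-wide limit (net.core.rmem_max on Linux) has to be raised as well before the larger buffer can absorb bursts.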
> > David Lang > From rgerhards at hq.adiscon.com Fri May 8 18:01:24 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Fri, 8 May 2009 18:01:24 +0200 Subject: [rsyslog] output plugin calling interface References: <9B6E2A8877C38245BFB15CC491A11DA702B038@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B039@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B03B@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B03C@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B04D@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B051@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702B057@GRFEXC.intern.adiscon.com> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702B061@GRFEXC.intern.adiscon.com> David, I have updated http://www.rsyslog.com/download/design.pdf so the definitions should be clearer now. But I managed to edit only up to section 2.6. Note that there is a new state diagram in 2.5, but limited description so far. But I guess it is useful for you. In 2.4, you will find the definition of what I call the "at-risk-set" of messages. These are the ones that are potentially lost during e.g. a power failure. Note that that set is non-empty even if we have ultra-reliable queues, because there exist some messages which were not yet enqueued (those that the inputs are working on).
Rainer From david at lang.hm Fri May 8 19:37:41 2009 From: david at lang.hm (david at lang.hm) Date: Fri, 8 May 2009 10:37:41 -0700 (PDT) Subject: [rsyslog] ultra-reliable speed test In-Reply-To: <1241782088.25612.188.camel@rf10up.intern.adiscon.com> References: <9B6E2A8877C38245BFB15CC491A11DA702B053@GRFEXC.intern.adiscon.com> <1241771663.25612.139.camel@rf10up.intern.adiscon.com> <1241782088.25612.188.camel@rf10up.intern.adiscon.com> Message-ID: On Fri, 8 May 2009, Rainer Gerhards wrote: >> also note that these tests are being done on the version _without_ batch >> processing. I need to think about it a bit more to be sure there aren't >> any holes in my thinking, but I believe that you would only need to do one >> set of fsyncs per batch that's processed. so setting a batch size of 100 >> should increase the messages/sec by a similar factor. > > I hadn't thought about this, but now that you say it, I agree. Actually, > an fsync per queue lock release would probably be the right criterion. I > think that is almost equivalent to what you said, but the advantage of > that definition is that I can simply watch out for these *already > existing* places as a guideline. That can indeed make a considerable > difference. exactly. >> this is only on the output side for now, but if this proves to be >> interesting, some inputs could batch as well (from your comments it sounds >> as if relp can send a batch of messages and then get acknowledgement of >> all of them at once; if so, that could serve as the input) > > That's a sliding window, but this is something that really does not > belong in the app layer (and is not visible there). It is the same > thing as the TCP sliding window, which you know to exist but do not know > any specifics of. > > Even if we made the relp sliding window visible to the app layer, > it wouldn't provide much benefit.
The only one I can think of is lock > contention, but with the queue workers acquiring the lock now only once > per batch, the probability is greatly reduced. doing a fsync once per batch would also be a considerable saving (assuming the basic rate is high enough to be meaningful). this would mean a change to the relp definition: it would need to have each side pass the other its 'max batch size' when the systems connect for the first time (defaulting to 1 if the other side doesn't say anything) >>> One thing we need to think about is burst traffic rate, especially with >>> UDP. I tend to think that such a system must be able to support UDP >>> traffic, too (which is a questionable opinion) and, if so, we must not >>> only look at the sustained but even more at the burst rate. >> >> yes and no. while I see the need to support UDP, it's not going to be >> reliable (the OS buffers them before they get to the system, ignoring the >> network's ability to drop them), and if you really need high UDP burst rates >> you could run two copies of rsyslog, one ultra-reliable (with reliable >> inputs), and a second one with a memory queue, feeding into the >> ultra-reliable one with a batched input method. > > ack - as I said, the opinion is questionable... But what if you have > important devices that simply speak nothing but UDP (they > still seem to exist...)? > > However, think of it this way: > > You limit the max burst rate by using an ultra-reliable queue. You do > so because you do not want to lose messages when a sudden power failure > occurs. To support that configuration, you need to run the second > instance. It queues in memory until the (slower) reliable rsyslogd can > accept the message and put it into the reliable queue. Let's say > that you have a burst of r messages and that from this burst only r/2 > can be enqueued (because the ultra-reliable queue is so slow). So you > lose r/2 messages.
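The RELP change David proposes above (each side advertises a 'max batch size' when it connects, defaulting to 1) could look roughly like this in Python. Everything here is hypothetical: the function names, taking the minimum of both offers as the agreed size, and treating a missing or invalid offer as 1 are assumptions for illustration, not part of RELP as it exists.

```python
from typing import Iterator, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def negotiate_batch_size(local_max: int, peer_offer: Optional[int]) -> int:
    """Agree on a batch size at connection setup (hypothetical extension,
    not part of the RELP specification). A peer that advertises nothing,
    or an invalid value, is treated as 1: one acknowledgement per message,
    which keeps peers without the extension working."""
    if local_max < 1:
        raise ValueError("local_max must be >= 1")
    peer_max = peer_offer if peer_offer is not None and peer_offer >= 1 else 1
    return min(local_max, peer_max)


def batches(messages: Sequence[T], size: int) -> Iterator[List[T]]:
    """Split messages into batches; the receiver would commit (fsync) and
    acknowledge once per batch."""
    for start in range(0, len(messages), size):
        yield list(messages[start:start + size])


if __name__ == "__main__":
    print(negotiate_batch_size(100, None))  # 1
    print(negotiate_batch_size(100, 500))   # 100
    print(negotiate_batch_size(100, 20))    # 20
```

Taking the minimum means neither side ever has to hold more unacknowledged messages than it offered to, which fits the "not lost once acknowledged" requirement discussed in this thread.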
> > Now consider the case that you run rsyslog with just a reliable queue, > one that is kept in memory but not able to cover the power failure > scenario. Obviously, all messages in that queue are lost when power > fails (or almost all, to be precise). However, that system has a much > broader bandwidth. So with it, there would never have been r messages > inside the queue, because that system has a much higher sustained > message rate (and thus the burst causes much less trouble). Let's say > the system is just twice as fast in this setup (I guess it usually would > be *much* faster). Then, it would be able to process all r records. > > In that scenario, the ultra-reliable system loses r/2 messages, whereas > the somewhat more "unreliable" system loses none - by virtue of being > able to process messages as they arrive. > > > Now extend that picture to messages residing inside the OS buffers or > even those that are still queued in their sources because a stream > transport blocked sending them. > > I know that each detail of this picture can be argued about at length. > > However, my opinion is that there is no "ultra-reliable" system in life, > only various probabilities of losing messages. These probabilities > often depend on each other, which makes calculating them very hard to > impossible. Still, the reliability of the system at large (the probability > that a message survives) is just the product of the reliabilities of its > components, and the probability of message loss is one minus that product. > > This is where *I* conclude that it can make sense to permit a system to > lose some messages under certain circumstances, if that influences the > overall probability calculation towards the desired end result. In that > sense, I tend to think that a fast, memory-queuing rsyslogd instance can > be much more reliable than one that is configured as being > ultra-reliable, where the rest of the system at large is badly > influenced by this (the scenario above).
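The series argument can be written out explicitly, assuming (an idealization, since the probabilities are noted above to be often dependent) that a message must pass components 1 to n in turn and that component i loses it independently with probability p_i:

```latex
P(\text{delivered}) = \prod_{i=1}^{n} (1 - p_i), \qquad
P(\text{lost}) = 1 - \prod_{i=1}^{n} (1 - p_i) \;\approx\; \sum_{i=1}^{n} p_i
\quad \text{(for small } p_i\text{)}
```

For example, three stages that each lose 1% of messages deliver 0.99^3 ≈ 0.970 of them, so roughly 3% are lost end to end. Driving one p_i to zero only improves the total if doing so does not raise the loss probability of the stages in front of it, which is the trade-off in the burst example.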
> > However, I also know that for regulatory requirements, you often seem to > need to prove that a system will not lose messages once it has received > them, even at the cost of an overall increased probability of message > loss. it's a bit more than that. In my case I have two completely different use cases, and will almost certainly end up running two different sets of rsyslog (potentially on different sets of servers). case #1 'normal system syslogs': 99.9% reliability (easy to achieve with UDP) is easily good enough; the sender is normal software that knows nothing about rsyslog; high volume, mostly junk. case #2 'logs of record': the application is modified to do application-level acknowledgements (relp or similar), and the system must be architected to not lose logs once they are acknowledged, short of a disaster that physically destroys equipment (storage drives must be redundant so that a drive failure does not lose logs); low volume, every log is critical. > My view of reliability is much the same as my view of security: there is > no such thing as "being totally secure"; you can just reduce the > probability that something bad happens. The worst thing in security is > someone who thinks he is "totally secure" and as such is no longer > actively looking at potential issues. > > I see reliability the same way. There is no such thing as "being totally > reliable" and it is a really bad idea to think you could ever be. > Knowing this, one may begin to think about how to decrease the overall > probability of message loss AND think about what rate is acceptable (and > what to do with these cases, e.g. "how can they hurt"). > > ... but ... enough of philosophy, I am not sure if it helps this > discussion ;) (but I thought it was useful to "see" what I have in > mind when talking about these things). and like security, different solutions are appropriate for different situations.
there are some types of data and environments where you put lots of protections in place, even if they slow work down, but in other situations that level of protection would not benefit anyone. >>> As a side note, you will probably see that the disk queue can be >>> optimized. If sufficient effort is made, I think it can perform at least >>> four to six times faster. The reason is that it was >>> never really meant to be used on a busy box in this way. Knowing >>> this, we should not start a new discussion about these optimizations, >>> simply because they take considerable additional time and we cannot fit >>> that part into anything we have in mind for the foreseeable future. >> >> yeah, I've been thinking of various things that could be done here, but I >> won't ask about any of them for now ;-) > > Oh yes, a broad range. From simple things like zipping the data and keeping > all handles always open to complex things like a dedicated, > random-access, database-like disk queue store (perhaps even > preformatted). If you look at the code, you'll possibly notice that the > disk queue system uses stream drivers to persist the data. This would be > the hook to extend. yep, you can also do tricks like allocating 4k for each message, no matter what its size, to avoid the need to maintain a separate 'table of contents' that you have to look at and modify when processing a message. > ...
but: that's a story for another quarter ;) yep David Lang From david at lang.hm Fri May 8 19:38:55 2009 From: david at lang.hm (david at lang.hm) Date: Fri, 8 May 2009 10:38:55 -0700 (PDT) Subject: [rsyslog] untra-reliable speed test In-Reply-To: <20090508130420.GD23405@it.is.rice.edu> References: <9B6E2A8877C38245BFB15CC491A11DA702B053@GRFEXC.intern.adiscon.com> <20090508130420.GD23405@it.is.rice.edu> Message-ID: On Fri, 8 May 2009, Kenneth Marshall wrote: > On Fri, May 08, 2009 at 01:18:37AM -0700, david at lang.hm wrote: >> On Fri, 8 May 2009, Rainer Gerhards wrote: >> >>>> -----Original Message----- >>>> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- >>>> bounces at lists.adiscon.com] On Behalf Of david at lang.hm >>>> >>>> I have a box put together for a first cut at a speed test of rsyslog in >>>> ultra-reliable mode. the outline below is intended to minimize the number >>>> of variables. >>>> >>>> the box is a dual quad-core Opteron with 8G of RAM, one SATA drive and a >>>> fusionIO SSD PCIe drive, currently running RHEL 5.3 kernel 2.6.18-53 >>>> (Red Hat stock kernel). I intend to format the SSD with ext2 (as the >>>> application is providing data integrity, and to avoid the known >>>> performance problems with ext3 and fsync) >>> >>> Just a question, because I do not know enough about ext2: does ext2 guarantee >>> that when an application does fsync, all data, INCLUDING related file system >>> control structures, are written to disk? Or, to phrase it the other way >>> around, can ext2 guarantee that fsync'ed data can always be read after a >>> power failure? I think along the lines of some control structures not being >>> written, thus the fsynced app data may be present on the disk, but cannot be >>> accessed any longer. In the worst case, would it be possible that a whole >>> file be lost during a file system check after reboot?
>>> >>> My *uneducated* understanding is that ext3 does guard against this (thus the >>> performance problems) but ext2 does not. >> >> the performance problem with ext3 is that it forces ALL pending writes to >> disk when anything does a fsync >> >> now that you mention it, I think that with all filesystems other than ext2 >> you need to do a fsync on the directory as well as on the file >> >>> If my understanding were correct (and I don't say so), we would need to >>> use ext3. >> > FYI, > > I think if you use ext3 with data=writeback, you will not have the > flush-everything problem. Of course, you will need to precreate the > files. with an application that does fsync, at that point you have no reliability gain compared to ext2, but you still have the overhead of the journal (including the need to write the data twice, once to the journal, once to its final location) David Lang > Regards, > Ken > >> I'll try both (and later on, when I use my own kernel rather than the >> Red Hat one, I'll also test XFS) >> >> I think that if no other disk activity is taking place, ext3 may not be too >> bad (one other advantage that ext2 would have over ext3 and XFS is that >> journaling filesystems have to write whatever they journal twice (once to >> the journal and once to the final location)) >> >>>> for the rsyslog test I am thinking the following >>>> >>>> using rsyslog 4.1.7 >>>> enable input file >>> >>> Not sure if I got this bullet point right. Do you mean you intend to use >>> imfile for input generation? >> >> yes, that was my intent. just to simplify things by making the test >> completely self-contained on the one box. >> >>> In any case, I would suggest doing a test with UDP and one with TCP senders, >>> both sending at maximum rate. With UDP, we would see a message loss rate, >>> while with TCP we would see the actual number of messages that the system can >>> process.
>>> So TCP is probably the more meaningful number, but packet loss rate
>>> for UDP - a common use case - would also be interesting, at least I think so.
>>
>> will do.
>>
>> I will be interested in seeing the UDP loss rate, I suspect that with
>> appropriate OS tuning I can get it down to zero loss rate at the data
>> rates that the rest of the system maintains (the OS has a buffer prior to
>> rsyslog's input process that can cover delays on the input threads)
>>
>>>> set the main queue mode to disk
>>>> enable fsyncs everywhere
>>>
>>>
>>> Just as a reminder: this includes $MainMsgQueueCheckpointInterval 1 (which is
>>> a *real* performance eater and puts a lot of burden on the consistency of the
>>> file system's control structures, thus my question on ext2 vs. ext3 above).
>>
>> does this do a fsync on the directory?
>>
>>>> set the output to log *.* to a file
>>>>
>>>> run a cron job that rolls the log file once a min and sends a HUP to
>>>> rsyslog
>>>>
>>>> create a large file of log information
>>>>
>>>> run this for a while and then count the number of logs in each rolled log
>>>> file. hopefully the number will be reasonably consistent.
>>>>
>>>> does this sound like a reasonable approach? or is this going to not be
>>>> representative for some reason?
>>>
>>> With the few comments above, I think this is a very reasonable approach and
>>> should provide very good insight.
>>>
>>> Actually, I hope that it can prove my point that this setup is too slow
>>> wrong...
>>
>> there will definitely be a performance issue at some point here, the
>> question is if it's fast enough to be usable.
>>
>> the drive claims to be able to do >100,000 I/O ops/sec. if we can manage
>> to get a few thousand logs/sec written on this, it will be extremely
>> usable.
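David's remark that the directory needs an fsync too matches the Linux fsync(2) man page: syncing a file does not necessarily persist the directory entry that names it. A minimal Python sketch of what "fsync everywhere" means for a single appended record (this is not rsyslog's code; `durable_append` is a made-up helper, and opening a directory read-only so it can be fsync'ed is a Linux/POSIX technique that does not work on Windows):

```python
import os
import tempfile


def durable_append(path: str, data: bytes) -> None:
    """Append data to path, then fsync the file and its parent directory."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        view = memoryview(data)
        while view:                      # os.write may write only part of the buffer
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)                     # file data and inode to stable storage
    finally:
        os.close(fd)

    # fsync on the file does not guarantee that the directory entry naming it
    # (a newly created or renamed file) is on disk, so sync the directory too.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        log = os.path.join(workdir, "messages")
        durable_append(log, b"first message\n")
        durable_append(log, b"second message\n")
        with open(log, "rb") as f:
            print(f.read().decode(), end="")
```

Paying for synchronous writes on every single record is what makes per-message durability expensive, as Rainer notes for $MainMsgQueueCheckpointInterval 1.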
>>
>> David Lang
>> _______________________________________________
>> rsyslog mailing list
>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>> http://www.rsyslog.com
>>

From ktm at rice.edu Fri May 8 19:54:38 2009
From: ktm at rice.edu (Kenneth Marshall)
Date: Fri, 8 May 2009 12:54:38 -0500
Subject: [rsyslog] untra-reliable speed test
In-Reply-To:
References: <9B6E2A8877C38245BFB15CC491A11DA702B053@GRFEXC.intern.adiscon.com> <20090508130420.GD23405@it.is.rice.edu>
Message-ID: <20090508175438.GL23405@it.is.rice.edu>

On Fri, May 08, 2009 at 10:38:55AM -0700, david at lang.hm wrote:
> On Fri, 8 May 2009, Kenneth Marshall wrote:
>
> > On Fri, May 08, 2009 at 01:18:37AM -0700, david at lang.hm wrote:
> >> On Fri, 8 May 2009, Rainer Gerhards wrote:
> >>
> >>>> -----Original Message-----
> >>>> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
> >>>> bounces at lists.adiscon.com] On Behalf Of david at lang.hm
> >>>>
> >>>> I have a box put together for a first cut at a speed test of rsyslog in
> >>>> ultra-reliable mode. the outline below is intended to minimize the number
> >>>> of variables.
> >>>>
> >>>> the box is a dual quad-core opteron with 8G of ram, one SATA drive and a
> >>>> fusionIO SSD PCIE drive, currently running RHEL 5.3 kernel 2.6.18-53
> >>>> (redhat stock kernel) I intend to format the SSD with ext2 (as the
> >>>> application is providing data integrity, and to avoid the known
> >>>> performance problems with ext3 and fsync)
> >>>
> >>> Just a question, because I do not know enough about ext2: does ext2 guarantee
> >>> that when an application does fsync, all data, INCLUDING related file system
> >>> control structures are written to disk? Or, to phrase it the other way
> >>> around, can ext2 guarantee that fsync'ed data can always be read after a
> >>> power failure?
> >>> I think along the lines of some control structures not being
> >>> written, thus the fsynced app data may be present on the disk, but cannot be
> >>> accessed any longer. In the worst case, would it be possible that a whole
> >>> file be lost during a file system check after reboot?
> >>>
> >>> My *uneducated* understanding is that ext3 does guard against this (thus the
> >>> performance problems) but ext2 does not.
> >>
> >> the performance problem with ext3 is that it forces ALL pending writes to
> >> disk when anything does a fsync
> >>
> >> now that you mention it, I think that with all filesystems other than ext2
> >> you need to do a fsync on the directory as well as on the file
> >>
> >>> If my understanding would be correct (and I don't say so), we would need to
> >>> use ext3.
> >>
> > FYI,
> >
> > I think if you use ext3 with data=writeback, you will not have the
> > flush everything problem. Of course, you will need to precreate the
> > files.
>
> with an application that does fsync, at that point you have no reliability
> gain compared to ext2, but you still have the overhead of the journal
> (including the need to write the data twice, once to the journal, once to
> its final location)
>
> David Lang
>

I thought that with data=writeback only meta-data is committed to the journal, not the file data.

Ken

> > Regards,
> > Ken
> >
> >> I'll try both (and later on, when I use my own kernel rather than the
> >> redhat one I'll also test XFS)
> >>
> >> I think that if no other disk activity is taking place ext3 may not be too
> >> bad (one other advantage that ext2 would have over ext3 and XFS is that
> >> journaling filesystems have to write whatever they journal twice (once to
> >> the journal and once to the final location)
> >>
> >>>> for the rsyslog test I am thinking the following
> >>>>
> >>>> using rsyslog 4.1.7
> >>>> enable input file
> >>>
> >>> Not sure if I got this bullet point right.
> >>> Do you mean you intend to use
> >>> imfile for input generation?
> >>
> >> yes, that was my intent. just to simplify things by making the test
> >> completely self contained to the one box.
> >>
> >>> In any case, I would suggest to do a test with UDP and one with TCP senders,
> >>> both sending at maximum rate. With UDP, we would see a message loss rate,
> >>> while with TCP we would see the actual number of messages that the system can
> >>> process. So TCP is probably the more meaningful number, but packet loss rate
> >>> for UDP - a common use case - would also be interesting, at least I think so.
> >>
> >> will do.
> >>
> >> I will be interested in seeing the UDP loss rate, I suspect that with
> >> appropriate OS tuning I can get it down to zero loss rate at the data
> >> rates that the rest of the system maintains (the OS has a buffer prior to
> >> rsyslog's input process that can cover delays on the input threads)
> >>
> >>>> set the main queue mode to disk
> >>>> enable fsyncs everywhere
> >>>
> >>>
> >>> Just as a reminder: this includes $MainMsgQueueCheckpointInterval 1 (which is
> >>> a *real* performance eater and puts a lot of burden on the consistency of the
> >>> file system's control structures, thus my question on ext2 vs. ext3 above).
> >>
> >> does this do a fsync on the directory?
> >>
> >>>> set the output to log *.* to a file
> >>>>
> >>>> run a cron job that rolls the log file once a min and sends a HUP to
> >>>> rsyslog
> >>>>
> >>>> create a large file of log information
> >>>>
> >>>> run this for a while and then count the number of logs in each rolled log
> >>>> file. hopefully the number will be reasonably consistent.
> >>>>
> >>>> does this sound like a reasonable approach? or is this going to not be
> >>>> representative for some reason?
> >>>
> >>> With the few comments above, I think this is a very reasonable approach and
> >>> should provide very good insight.
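The measurement step of the plan (roll the file once a minute, then count the lines in each rolled file) takes only a few lines of scripting. A sketch in Python, assuming one message per line and rolled files with zero-padded suffixes matching a pattern such as /logs/messages.* (the naming is hypothetical and depends on the rotation script). The first and last files of a run usually cover partial minutes, so the summary skips them:

```python
import glob
import os
import statistics

ROLL_INTERVAL_S = 60  # assumption: cron rolls the log exactly once a minute


def per_file_rates(pattern: str) -> list[tuple[str, int, float]]:
    """Return (file name, line count, messages/sec) for each rolled file."""
    rows = []
    for path in sorted(glob.glob(pattern)):  # lexical order: zero-pad suffixes
        with open(path, "rb") as f:
            count = sum(1 for _ in f)         # one syslog message per line
        rows.append((os.path.basename(path), count, count / ROLL_INTERVAL_S))
    return rows


def summarize(rows: list[tuple[str, int, float]]) -> tuple[float, float, float]:
    """Mean, min and max rate, ignoring the partial first and last files."""
    rates = [rate for _, _, rate in rows[1:-1]]
    if not rates:
        raise ValueError("need at least three rolled files")
    return statistics.mean(rates), min(rates), max(rates)


if __name__ == "__main__":
    rows = per_file_rates("/logs/messages.*")
    for name, count, rate in rows:
        print(f"{name}: {count} msgs, {rate:.0f} msgs/sec")
    if len(rows) >= 3:
        print("mean/min/max msgs/sec: %.0f / %.0f / %.0f" % summarize(rows))
```

A narrow min/max spread is the "reasonably consistent" result the plan hopes for.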
> >>>
> >>> Actually, I hope that it can prove my point that this setup is too slow
> >>> wrong...
> >>
> >> there will definitely be a performance issue at some point here, the
> >> question is if it's fast enough to be usable.
> >>
> >> the drive claims to be able to do >100,000 I/O ops/sec. if we can manage
> >> to get a few thousand logs/sec written on this, it will be extremely
> >> usable.
> >>
> >> David Lang

From rgerhards at hq.adiscon.com Mon May 11 09:34:23 2009
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Mon, 11 May 2009 09:34:23 +0200
Subject: [rsyslog] ultra-reliable speed test
References: <9B6E2A8877C38245BFB15CC491A11DA702B053@GRFEXC.intern.adiscon.com><1241771663.25612.139.camel@rf10up.intern.adiscon.com><1241782088.25612.188.camel@rf10up.intern.adiscon.com>
Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702B065@GRFEXC.intern.adiscon.com>

> -----Original Message-----
> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
> bounces at lists.adiscon.com] On Behalf Of david at lang.hm
> Sent: Friday, May 08, 2009 7:38 PM
> To: rsyslog-users
> Subject: Re: [rsyslog] ultra-reliable speed test
>
> > However, I also know that for regulatory requirements, you often seem to
> > need to prove that a system may not lose messages once it has received
> > them, even at the cost of an overall increased probability of message
> > loss.
>
> it's a bit more than that.
>
> In my case I have two completely different use-cases, and will almost
> certainly end up running two different sets of rsyslog (potentially on
> different sets of servers)

OK, that is the ultimate explanation. Due to our lengthy discussions about performance, I was so preoccupied with performance that I did not realize that you talk about a very different use case. More importantly, I did not see that you talk about using only reliable transports (or, in other words: no standard syslog at all). From that perspective, everything makes perfect sense to me, too.

I'd still be interested in the performance numbers (though they are no longer needed to convince me this is a valid use case ;)).

Just to verify: this is a use case that you cannot build with e.g. syslog-ng, as it does not speak any truly reliable logging protocol. Actually, you need audit-grade protocols, and then an audit-grade core engine makes sense.

Did I get you right this time?

Rainer

>
> case #1
>
> 'normal system syslogs'
> 99.9% reliability (easy to achieve with UDP) is easily good enough.
> the sender is normal software that knows nothing about rsyslog
> high volume, mostly junk
>
> 'logs of record'
> the application is modified to do application level acknowledgements
> (relp or similar), and the system must be architected to not lose logs
> once they are acknowledged short of a disaster that physically destroys
> equipment (storage drives must be redundant so that a drive failure does
> not lose logs)
> low volume, every log is critical.
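Measuring the UDP loss rate Rainer suggested, or checking that case #1 really stays above 99.9%, only requires the sender to number its messages. A minimal Python sketch, assuming one message per datagram and one message per line in the receiver's log file; the message format and the `loadtest` tag are made up for illustration, and the actual UDP sending is left out:

```python
import re
from typing import Iterable, Optional

SEQ_RE = re.compile(r"seq=(\d+)")


def make_test_message(seq: int) -> str:
    """One test message carrying a sequence number. <134> is facility
    local0 (16) * 8 + severity info (6); the 'loadtest' tag is made up."""
    return f"<134>loadtest: seq={seq}"


def parse_seq(line: str) -> Optional[int]:
    """Sequence number found in a received log line, or None."""
    match = SEQ_RE.search(line)
    return int(match.group(1)) if match else None


def loss_rate(sent: int, received_lines: Iterable[str]) -> float:
    """Fraction of messages 0..sent-1 that never showed up at the receiver.
    Duplicates and unrelated lines are ignored."""
    seen = {seq for seq in map(parse_seq, received_lines)
            if seq is not None and 0 <= seq < sent}
    return 1.0 - len(seen) / sent


if __name__ == "__main__":
    # Simulated run: 1000 messages sent, message 7 lost in transit.
    received = [make_test_message(i) for i in range(1000) if i != 7]
    print(f"loss rate: {loss_rate(1000, received):.3%}")
```

Because the receiver only searches for `seq=`, the same parser works on lines rewritten by a file template that drops the PRI field.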
From david at lang.hm Mon May 11 12:17:17 2009
From: david at lang.hm (david at lang.hm)
Date: Mon, 11 May 2009 03:17:17 -0700 (PDT)
Subject: [rsyslog] ultra-reliable speed test
In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702B065@GRFEXC.intern.adiscon.com>
References: <9B6E2A8877C38245BFB15CC491A11DA702B053@GRFEXC.intern.adiscon.com><1241771663.25612.139.camel@rf10up.intern.adiscon.com><1241782088.25612.188.camel@rf10up.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702B065@GRFEXC.intern.adiscon.com>
Message-ID:

On Mon, 11 May 2009, Rainer Gerhards wrote:

>> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
>> bounces at lists.adiscon.com] On Behalf Of david at lang.hm
>>
>>> However, I also know that for regulatory requirements, you often seem to
>>> need to prove that a system may not lose messages once it has received
>>> them, even at the cost of an overall increased probability of message
>>> loss.
>>
>> it's a bit more than that.
>>
>> In my case I have two completely different use-cases, and will almost
>> certainly end up running two different sets of rsyslog (potentially on
>> different sets of servers)
>
> OK, that is the ultimate explanation. Due to our lengthy discussions about
> performance, I was so preoccupied with performance that I did not realize
> that you talk about a very different use case. More importantly, I did not
> see that you talk about using only reliable transports (or, in other words:
> no standard syslog at all). From that perspective, everything makes perfect
> sense to me, too.
>
> I'd still be interested in the performance numbers (though they are no longer
> needed to convince me this is a valid use case ;)).
>
> Just to verify: this is a use case that you cannot build with e.g. syslog-ng,
> as it does not speak any truly reliable logging protocol. Actually, you need
> audit-grade protocols, and then an audit-grade core engine makes sense.
>
> Did I get you right this time?

exactly.
David Lang

> Rainer
>
>>
>> case #1
>>
>> 'normal system syslogs'
>> 99.9% reliability (easy to achieve with UDP) is easily good enough.
>> the sender is normal software that knows nothing about rsyslog
>> high volume, mostly junk
>>
>> 'logs of record'
>> the application is modified to do application level acknowledgements
>> (relp or similar), and the system must be architected to not lose logs
>> once they are acknowledged short of a disaster that physically destroys
>> equipment (storage drives must be redundant so that a drive failure does
>> not lose logs)
>> low volume, every log is critical.

From rgerhards at hq.adiscon.com Mon May 11 18:03:07 2009
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Mon, 11 May 2009 18:03:07 +0200
Subject: [rsyslog] rsyslog configuration graphs
Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702B06C@GRFEXC.intern.adiscon.com>

Hi all,

I spent a bit more time than I expected today on the ability to generate configuration graphs. But I think it was well-invested time, hopefully for others, too. The full story plus some examples are in my blog:

http://blog.gerhards.net/2009/05/rsyslog-configuration-graphs.html

This will be available starting with 4.3.1, which I hope to release soon.

Rainer

From tmetro+rsyslog at gmail.com Mon May 11 23:52:19 2009
From: tmetro+rsyslog at gmail.com (Tom Metro)
Date: Mon, 11 May 2009 17:52:19 -0400
Subject: [rsyslog] rsyslog configuration graphs
In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702B06C@GRFEXC.intern.adiscon.com>
References: <9B6E2A8877C38245BFB15CC491A11DA702B06C@GRFEXC.intern.adiscon.com>
Message-ID: <4A089E13.5080808@gmail.com>

Rainer Gerhards wrote:
> ...the ability to generate configuration graphs.

This would be another use case for a DBus interface.
Instead of:

http://www.rsyslog.com/doc-rsconf1_generateconfiggraph.html

  If [$GenerateConfigGraph] is specified, a graph is created. This happens
  both during a regular startup as well as a config check run. It is
  recommended to include this directive only for documentation purposes and
  remove it from a production configuration.

you'd have another process connect to rsyslog's DBus interface and request a graph on demand, and eliminate the need to temporarily add a directive.

In addition:

  The drawback, of course, is that you need to run Graphviz once you have
  generated the control file...

this would permit having a script that grabbed the graph data and presented it with Graphviz with just one user command.

Just something to consider, seeing as you were already thinking of maybe adding a DBus interface some day.

-Tom

From david at lang.hm Tue May 12 22:23:00 2009
From: david at lang.hm (david at lang.hm)
Date: Tue, 12 May 2009 13:23:00 -0700 (PDT)
Subject: [rsyslog] untra-reliable speed test
In-Reply-To: <1241771663.25612.139.camel@rf10up.intern.adiscon.com>
References: <9B6E2A8877C38245BFB15CC491A11DA702B053@GRFEXC.intern.adiscon.com> <1241771663.25612.139.camel@rf10up.intern.adiscon.com>
Message-ID:

I've completed my first round of testing

this is a fusionio SSD card with an 8-core opteron system, 8G ram

running debian Lenny (debian 5) 2.6.26 kernel

rsyslog.conf

$ModLoad imuxsock # provides support for local system logging
$ModLoad imklog # provides kernel logging support (previously done by rklogd)
$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
$WorkDirectory /logs
$HUPisRestart off
$MainMsgQueueCheckpointInterval 1
$MainMsgQueueFilename mainq
$MainMsgQueueType disk
$OptimizeForUniprocessor off

#$ActionfileEnableSync on
#$ActionQueueCheckpointInterval 1
#$ActionQueueFileName queue1
#$ActionQueueType disk
*.* /logs/messages;RSYSLOG_TraditionalFileFormat

input provided by cat largefile | logger

I did tests with and without the action queue stuff enabled

the results were not quite what I expected, but interesting

xfs w/ actionqueue 1200/sec
xfs 2000/sec
ext3 w/ actionqueue 2000/sec
ext3 4600/sec
ext4 w/ actionqueue 2000/sec
ext4 4000/sec
ext2 w/ actionqueue 5300/sec
ext2 7400/sec

note that with ext2 I don't think the input could keep up (there were not multiple queue files the way there were for all the others), when I shifted to infile as input the ext2 rate increased to ~7800/sec, and the cpu utilization dropped by 50-70%

I have not yet tried anything with multiple worker threads.

I captured some strace files and have posted them at http://rsyslog.lang.hm/rsyslog

David Lang

From rgerhards at hq.adiscon.com Tue May 12 22:32:57 2009
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Tue, 12 May 2009 22:32:57 +0200
Subject: [rsyslog] untra-reliable speed test
Message-ID: <000a01c9d341$146819a7$100013ac@intern.adiscon.com>

Interesting, will look at details tomorrow...

With disk queues, there is always only a single queue worker (the disk queue is purely sequential).
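One way to read David's numbers is as time per message rather than rate. The Python below does nothing but arithmetic on the reported figures, assuming messages are handled one after another (so time per message is 1/rate); keep in mind David's caveat that the ext2 rows may be limited by the input rather than by rsyslog:

```python
# Reported throughput in messages/sec: (with action queue, without action queue)
RESULTS = {
    "xfs": (1200, 2000),
    "ext3": (2000, 4600),
    "ext4": (2000, 4000),
    "ext2": (5300, 7400),
}


def slowdown(with_aq: float, without_aq: float) -> float:
    """How many times slower the run with the action queue was."""
    return without_aq / with_aq


def added_cost_us(with_aq: float, without_aq: float) -> float:
    """Extra microseconds per message attributable to the action queue,
    assuming sequential processing (time per message = 1 / rate)."""
    return (1.0 / with_aq - 1.0 / without_aq) * 1e6


for fs, (with_aq, without_aq) in RESULTS.items():
    print(f"{fs}: {slowdown(with_aq, without_aq):.2f}x slower, "
          f"+{added_cost_us(with_aq, without_aq):.0f} us per message")
```

On these figures the action queue adds roughly 250 to 330 microseconds per message on xfs, ext3 and ext4, and around 50 on ext2.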
rainer

----- Original Message -----
From: "david at lang.hm"
To: "rsyslog-users"
Sent: 12.05.09 22:23
Subject: Re: [rsyslog] untra-reliable speed test

I've completed my first round of testing

this is a fusionio SSD card with an 8-core opteron system, 8G ram

running debian Lenny (debian 5) 2.6.26 kernel

rsyslog.conf

$ModLoad imuxsock # provides support for local system logging
$ModLoad imklog # provides kernel logging support (previously done by rklogd)
$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
$WorkDirectory /logs
$HUPisRestart off
$MainMsgQueueCheckpointInterval 1
$MainMsgQueueFilename mainq
$MainMsgQueueType disk
$OptimizeForUniprocessor off

#$ActionfileEnableSync on
#$ActionQueueCheckpointInterval 1
#$ActionQueueFileName queue1
#$ActionQueueType disk
*.* /logs/messages;RSYSLOG_TraditionalFileFormat

input provided by cat largefile | logger

I did tests with and without the action queue stuff enabled

the results were not quite what I expected, but interesting

xfs w/ actionqueue 1200/sec
xfs 2000/sec
ext3 w/ actionqueue 2000/sec
ext3 4600/sec
ext4 w/ actionqueue 2000/sec
ext4 4000/sec
ext2 w/ actionqueue 5300/sec
ext2 7400/sec

note that with ext2 I don't think the input could keep up (there were not multiple queue files the way there were for all the others), when I shifted to infile as input the ext2 rate increased to ~7800/sec, and the cpu utilization dropped by 50-70%

I have not yet tried anything with multiple worker threads.
I captured some strace files and have posted them at http://rsyslog.lang.hm/rsyslog

David Lang

From david at lang.hm Tue May 12 23:05:03 2009
From: david at lang.hm (david at lang.hm)
Date: Tue, 12 May 2009 14:05:03 -0700 (PDT)
Subject: [rsyslog] untra-reliable speed test
In-Reply-To: <000a01c9d341$146819a7$100013ac@intern.adiscon.com>
References: <000a01c9d341$146819a7$100013ac@intern.adiscon.com>
Message-ID:

On Tue, 12 May 2009, Rainer Gerhards wrote:

> Interesting, will look at details tomorrow...

the strace results show a lot of lock contention

> With disk queues, there is always only a single queue worker (the disk queue is purely sequential).

interesting, the OS can do enough in parallel that it may be worth looking into this if we ever go in the direction of optimizing this mode.

this was all with rsyslog 4.1.7

David Lang

> rainer
>
> ----- Original Message -----
> From: "david at lang.hm"
> To: "rsyslog-users"
> Sent: 12.05.09 22:23
> Subject: Re: [rsyslog] untra-reliable speed test
>
> I've completed my first round of testing
>
> this is a fusionio SSD card with an 8-core opteron system, 8G ram
>
> running debian Lenny (debian 5) 2.6.26 kernel
>
> rsyslog.conf
>
> $ModLoad imuxsock # provides support for local system logging
> $ModLoad imklog # provides kernel logging support (previously done by rklogd)
> $ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
> $WorkDirectory /logs
> $HUPisRestart off
> $MainMsgQueueCheckpointInterval 1
> $MainMsgQueueFilename mainq
> $MainMsgQueueType disk
> $OptimizeForUniprocessor off
>
> #$ActionfileEnableSync on
> #$ActionQueueCheckpointInterval 1
> #$ActionQueueFileName queue1
> #$ActionQueueType disk
> *.* /logs/messages;RSYSLOG_TraditionalFileFormat
>
>
> input provided by cat largefile | logger
>
>
> I did tests with and without the action queue stuff enabled
>
> the results were not quite what I expected, but interesting
>
> xfs w/ actionqueue 1200/sec
> xfs 2000/sec
> ext3 w/ actionqueue 2000/sec
> ext3 4600/sec
> ext4 w/ actionqueue 2000/sec
> ext4 4000/sec
> ext2 w/ actionqueue 5300/sec
> ext2 7400/sec
>
> note that with ext2 I don't think the input could keep up (there were not
> multiple queue files the way there were for all the others), when I
> shifted to infile as input the ext2 rate increased to ~7800/sec, and the
> cpu utilization dropped by 50-70%
>
> I have not yet tried anything with multiple worker threads.
>
> I captured some strace files and have posted them at
> http://rsyslog.lang.hm/rsyslog
>
> David Lang

From rgerhards at hq.adiscon.com Wed May 13 06:40:42 2009
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Wed, 13 May 2009 06:40:42 +0200
Subject: [rsyslog] untra-reliable speed test
Message-ID: <000b01c9d385$36f1783e$100013ac@intern.adiscon.com>

Could you give it a try without the checkpoint interval? That should make a difference. More answers when I am at a real machine ;)

----- Original Message -----
From: "david at lang.hm"
To: "rsyslog-users"
Sent: 12.05.09 23:05
Subject: Re: [rsyslog] untra-reliable speed test

On Tue, 12 May 2009, Rainer Gerhards wrote:

> Interesting, will look at details tomorrow...

the strace results show a lot of lock contention

> With disk queues, there is always only a single queue worker (the disk queue is purely sequential).

interesting, the OS can do enough in parallel that it may be worth looking into this if we ever go in the direction of optimizing this mode.
this was all with rsyslog 4.1.7

David Lang

> rainer
>
> ----- Original Message -----
> From: "david at lang.hm"
> To: "rsyslog-users"
> Sent: 12.05.09 22:23
> Subject: Re: [rsyslog] untra-reliable speed test
>
> I've completed my first round of testing
>
> this is a fusionio SSD card with an 8-core opteron system, 8G ram
>
> running debian Lenny (debian 5) 2.6.26 kernel
>
> rsyslog.conf
>
> $ModLoad imuxsock # provides support for local system logging
> $ModLoad imklog # provides kernel logging support (previously done by rklogd)
> $ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
> $WorkDirectory /logs
> $HUPisRestart off
> $MainMsgQueueCheckpointInterval 1
> $MainMsgQueueFilename mainq
> $MainMsgQueueType disk
> $OptimizeForUniprocessor off
>
> #$ActionfileEnableSync on
> #$ActionQueueCheckpointInterval 1
> #$ActionQueueFileName queue1
> #$ActionQueueType disk
> *.* /logs/messages;RSYSLOG_TraditionalFileFormat
>
>
> input provided by cat largefile | logger
>
>
> I did tests with and without the action queue stuff enabled
>
> the results were not quite what I expected, but interesting
>
> xfs w/ actionqueue 1200/sec
> xfs 2000/sec
> ext3 w/ actionqueue 2000/sec
> ext3 4600/sec
> ext4 w/ actionqueue 2000/sec
> ext4 4000/sec
> ext2 w/ actionqueue 5300/sec
> ext2 7400/sec
>
> note that with ext2 I don't think the input could keep up (there were not
> multiple queue files the way there were for all the others), when I
> shifted to infile as input the ext2 rate increased to ~7800/sec, and the
> cpu utilization dropped by 50-70%
>
> I have not yet tried anything with multiple worker threads.
>
> I captured some strace files and have posted them at
> http://rsyslog.lang.hm/rsyslog
>
> David Lang

From rgerhards at hq.adiscon.com Wed May 13 10:56:41 2009
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Wed, 13 May 2009 10:56:41 +0200
Subject: [rsyslog] untra-reliable speed test
In-Reply-To:
References: <000a01c9d341$146819a7$100013ac@intern.adiscon.com>
Message-ID: <1242205001.25612.254.camel@rf10up.intern.adiscon.com>

On Tue, 2009-05-12 at 14:05 -0700, david at lang.hm wrote:
> On Tue, 12 May 2009, Rainer Gerhards wrote:
>
> > Interesting, will look at details tomorrow...
>
> the strace results show a lot of lock contention

I've now taken a closer look and everything looks pretty much like what I expected. The lock contention in this situation is not a bad sign: it shows that the locks are actually utilized to synchronize producer and consumer. Think about this (simplified) scenario:

input -> queue -> output

Let's say the processing time is the cost we incur. If we look at it, the queue's cost by far dominates the combined cost of input and output. In most cases, it dominates input+output cost so much that you can express the total cost as just the cost of the queue operation, without looking at anything else. So the input needs to wait until the queue is ready to accept a new message. Once it has done so, the output is notified and immediately acquires the queue lock and begins the dequeue operation.
At the same time, the input has already finished input processing (as I said, this happens in virtually "no time" compared to the queue operation). So it needs to wait for the queue lock. Once the dequeue operation is finished, the output releases the lock, and processes the message in virtually no time, too. The input acquires the queue lock, and the whole story begins right from the start. A small queue may build up depending on the OS scheduler, but I think most often, input and output will just wait for the queue to complete. In that sense, this mode is similar to DIRECT mode, except that a queue can build up when the action needs to be retried.

>
> > With disk queues, there is always only a single queue worker (the disk queue is purely sequential).
>
> interesting, the OS can do enough in parallel that it may be worth looking
> into this if we ever go in the direction of optimizing this mode.
>

If we optimize it, the best thing to do is a totally new queue storage driver for such cases. Sequential files do not really work well if we have multiple producers running. This is a major effort and even then we need to think about the implications I raised in regard to processing cost above.

I have thought a bit more about the situation. First of all, rsyslog was never designed for this use case (preserve every message EVEN in case of sudden power failure). When I introduced purely disk-based queues, this was done to support disk-assisted mode. I needed a queue type to permit me to store things on disk if we run out of memory. As a "side-effect", a pure disk mode was available also (I'd never implemented it for the sake of itself). As it was there, I decided to expose this mode and made it user-configurable. I thought (probably correctly) that it could solve some need - a need that I'd consider "very exotic" (think about the reliance on an audit-grade protocol for this to really make sense).
And I added the checkpoint capability because it seemed useful, even with disk-based queues, which could be guarded from total loss of messages by using a reasonable checkpoint interval. Again, a checkpoint interval of one is permitted just because this capability came "for free" and could be handy in some use cases. The kiosk example we discussed last year (?) on the mailing list looked like a good match for such an exotic environment. Sudden power loss was a real possibility, and we had low traffic volume. Bingo, perfect match.

However, I'd never thought about a reasonable high-volume system using disk-only queues. Think about the cost functions: such a system boils down to a DIRECT mode queue which just takes an exceptionally long time to process messages. So probably the best approach for this situation would be to actually run the queue in direct mode. That removes the overwhelming cost of queue operations. Direct mode also ensures that the input receives an ack from the output [but there may be subtle issues which I need to check to make sure this is always the case, so do not take this for granted - but if it is not yet so, this should not be too complex to change].

With this approach, we have two issues left:

a) the output action may be so slow that it actually is the dominating cost factor and not disk queue operation
b) the output action may block for an extended period of time (e.g. during a retry)

In case a), a disk-queue makes sense, because its cost is irrelevant in this scenario. Indeed, it is irrelevant under all circumstances. As such, we can configure a disk-only action queue in that case. Note that this implies a *very* slow output.

Case b) is more complicated. We do NOT have any proper way to address it with current code.
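The cost argument above can be turned into a toy throughput model. This is an illustrative simplification, not rsyslog's internal accounting: costs are seconds per message, enqueue and dequeue both hold the single queue lock, and all other work overlaps with queue operations on other messages. It shows why a disk queue costs a fast output dearly but hardly matters in case a), where the output itself dominates:

```python
def disk_queue_rate(c_in: float, c_enq: float, c_deq: float, c_out: float) -> float:
    """Messages/sec for input -> queue -> output when enqueue and dequeue are
    serialized on one queue lock. Input and output work overlaps with queue
    work on other messages, so the slowest stage bounds throughput."""
    return 1.0 / max(c_in, c_enq + c_deq, c_out)


def direct_rate(c_in: float, c_out: float) -> float:
    """Messages/sec in DIRECT mode: the input runs the action itself."""
    return 1.0 / (c_in + c_out)


if __name__ == "__main__":
    # Illustrative costs in seconds per message (made up, not measured).
    fast = dict(c_in=1e-6, c_enq=2e-4, c_deq=2e-4, c_out=1e-6)
    slow = dict(fast, c_out=1e-2)  # case a): a very slow output action
    for name, c in (("fast output", fast), ("slow output", slow)):
        print(f"{name}: disk queue {disk_queue_rate(**c):,.0f}/sec, "
              f"direct {direct_rate(c['c_in'], c['c_out']):,.0f}/sec")
```

With the made-up costs shown (0.2 ms each for enqueue and dequeue, 1 microsecond each for input and output), the queued path tops out at 2,500 messages/sec against roughly 500,000 in DIRECT mode; with a 10 ms output both come out near 100 messages/sec.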
The solution IMHO is to introduce a new queue mode "Disk Queue on Delay" which starts an ultra-reliable disk queue (preferably with a faster queue store driver) if and only if the action indicates that it will need extended processing time. This requires some changes to action processing, but the action state machine should be capable of handling that with relatively slight modification [again, an educated guess, not a guarantee]. In that scenario, we run the action immediately whenever possible. Only if that is not possible do we take on the (considerable) extra effort of buffering messages into a much slower on-disk queue.

Note that such a mode only makes sense with audit-grade protocols and senders (which hold processing until the ACK has been received). As such, a busy system automatically slows down to the rate that the queue writer can handle. In this sense, the overall system (e.g. a financial trading system!) may be slowed down by the unavailability of a failing output (which in turn causes the extra and very high cost of disk queue operations). It needs to be considered if that is an acceptable price.

The faster an ultra-reliable queue disk store driver performs, the more cases we can handle in the spirit of a) above. In theory, this can lead to elimination of b) cases. Nevertheless, I hope I have shown that re-designing the queue (drivers) to support high throughput AND ultra-reliable operations AT THE SAME TIME is far from being a trivial task. To do it right, it involves some other changes too. I'll have that rough picture in mind when I work on the queue.

I hope this clarifies.
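The "Disk Queue on Delay" mode is only a proposal at this point, not an existing rsyslog feature. Below is a toy Python model of the behavior as described, with made-up names (`ActionDelayed`, `DiskQueueOnDelay`, `drain`) and a JSON-lines spill file standing in for a real queue store driver. Messages go straight to the action, and only once the action reports a delay are they fsync'ed to disk, with later messages queued behind them so that ordering is preserved:

```python
import json
import os


class ActionDelayed(Exception):
    """Hypothetical signal from an output action that it needs extended
    time (for example, it is in a retry loop)."""


class DiskQueueOnDelay:
    """Toy model: DIRECT-mode processing until the action reports a delay,
    then spill to an fsync'ed on-disk queue until it is drained."""

    def __init__(self, action, queue_path: str):
        self.action = action          # callable(msg); may raise ActionDelayed
        self.queue_path = queue_path  # hypothetical spill-file location
        self.spilled = 0              # messages currently held on disk

    def _spill(self, msg) -> None:
        with open(self.queue_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(msg) + "\n")
            f.flush()
            os.fsync(f.fileno())      # durable before submit() returns
        self.spilled += 1

    def submit(self, msg) -> None:
        """Return only once msg is delivered or durably queued; this is the
        point where an audit-grade sender could be acknowledged."""
        if self.spilled:              # keep ordering: queue behind older messages
            self._spill(msg)
            return
        try:
            self.action(msg)          # fast path: no queue cost at all
        except ActionDelayed:
            self._spill(msg)          # slow path: pay the disk-queue cost

    def drain(self) -> None:
        """Retry queued messages in order, stopping at the first delay."""
        if not self.spilled:
            return
        with open(self.queue_path, encoding="utf-8") as f:
            pending = [json.loads(line) for line in f]
        remaining = []
        for i, msg in enumerate(pending):
            try:
                self.action(msg)
            except ActionDelayed:
                remaining = pending[i:]
                break
        # Rewrite the spill file with what is left. os.replace is atomic on
        # POSIX, but a crash between delivery and this rewrite re-delivers
        # messages (at-least-once), and a directory fsync is omitted here.
        tmp = self.queue_path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            for msg in remaining:
                f.write(json.dumps(msg) + "\n")
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.queue_path)
        self.spilled = len(remaining)
```

While the action is delayed, every message pays the fsync cost, which is exactly the slowdown of the overall system Rainer warns about.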
Rainer

> this was all with rsyslog 4.1.7
>
> David Lang
>
> > rainer
> >
> > ----- Original Message -----
> > From: "david at lang.hm"
> > To: "rsyslog-users"
> > Sent: 12.05.09 22:23
> > Subject: Re: [rsyslog] untra-reliable speed test
> >
> > I've completed my first round of testing
> >
> > this is a fusionio SSD card with an 8-core Opteron system, 8G RAM
> >
> > running Debian Lenny (Debian 5), 2.6.26 kernel
> >
> > rsyslog.conf
> >
> > $ModLoad imuxsock # provides support for local system logging
> > $ModLoad imklog # provides kernel logging support (previously done by rklogd)
> > $ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
> > $WorkDirectory /logs
> > $HUPisRestart off
> > $MainMsgQueueCheckpointInterval 1
> > $MainMsgQueueFilename mainq
> > $MainMsgQueueType disk
> > $OptimizeForUniprocessor off
> >
> > #$ActionfileEnableSync on
> > #$ActionQueueCheckpointInterval 1
> > #$ActionQueueFileName queue1
> > #$ActionQueueType disk
> > *.* /logs/messages;RSYSLOG_TraditionalFileFormat
> >
> >
> > input provided by cat largefile | logger
> >
> >
> > I did tests with and without the action queue stuff enabled
> >
> > the results were not quite what I expected, but interesting
> >
> > xfs  w/ actionqueue 1200/sec
> > xfs                 2000/sec
> > ext3 w/ actionqueue 2000/sec
> > ext3                4600/sec
> > ext4 w/ actionqueue 2000/sec
> > ext4                4000/sec
> > ext2 w/ actionqueue 5300/sec
> > ext2                7400/sec
> >
> > note that with ext2 I don't think the input could keep up (there were not
> > multiple queue files the way there were for all the others); when I
> > shifted to infile as input, the ext2 rate increased to ~7800/sec, and the
> > cpu utilization dropped by 50-70%
> >
> > I have not yet tried anything with multiple worker threads.
> >
> > I captured some strace files and have posted them at
> > http://rsyslog.lang.hm/rsyslog
> >
> > David Lang
> >
> > _______________________________________________
> > rsyslog mailing list
> > http://lists.adiscon.net/mailman/listinfo/rsyslog
> > http://www.rsyslog.com

From liangjun at osslab.org  Wed May 13 12:23:51 2009
From: liangjun at osslab.org (liangjun)
Date: Wed, 13 May 2009 18:23:51 +0800
Subject: [rsyslog] about "Property Replacer"!!
Message-ID: <4A0A9FB7.1040406@osslab.org>

hello!
about The Property Replacer i have some problem:
*extraction can be done based on so-called "fields",*
this is a example about fields!
%msg% is " DROP_url_www.sina.com.cn:IN=eth1 OUT=eth0 SRC=192.168.10.78 DST=61.172.201.194 LEN=1182 TOS=0x00 PREC=0x00 TTL=63 ID=14368 DF PROTO=TCP SPT=33343 DPT=80 WINDOW=92 RES=0x00 ACK PSH URGP=0"
%msg:F,32:2% is "DROP_url_www.sina.co" ,is not "DROP_url_www.sina.com.cn:IN=eth1" ,so why?
and i do some another test .and the fields always is 20 characters!

From rgerhards at hq.adiscon.com  Wed May 13 12:43:10 2009
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Wed, 13 May 2009 12:43:10 +0200
Subject: [rsyslog] about "Property Replacer"!!
References: <4A0A9FB7.1040406@osslab.org>
Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702B08C@GRFEXC.intern.adiscon.com>

Hi,

let me explain. You tell rsyslog to use ASCII-SP " ", code 32, as a delimiter. Now looking at the message, it starts with a space (almost all RFC3164 messages do because of the definitions in RFC3164). So field 1 is an empty field, and field 2 is what you actually get. It is delimited by another space.
Hope this helps,
Rainer

> -----Original Message-----
> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
> bounces at lists.adiscon.com] On Behalf Of liangjun
> Sent: Wednesday, May 13, 2009 12:24 PM
> To: rsyslog at lists.adiscon.com
> Subject: [rsyslog] about "Property Replacer"!!
>
> hello!
> about The Property Replacer i have some problem:
> *extraction can be done based on so-called "fields",*
> this is a example about fields!
> %msg% is " DROP_url_www.sina.com.cn:IN=eth1 OUT=eth0 SRC=192.168.10.78
> DST=61.172.201.194 LEN=1182 TOS=0x00 PREC=0x00 TTL=63 ID=14368 DF
> PROTO=TCP SPT=33343 DPT=80 WINDOW=92 RES=0x00 ACK PSH URGP=0"
> %msg:F,32:2% is "DROP_url_www.sina.co" ,is not
> "DROP_url_www.sina.com.cn:IN=eth1" ,so why?
> and i do some another test .and the fields always is 20 characters!

From liangjun at osslab.org  Wed May 13 13:08:55 2009
From: liangjun at osslab.org (liangjun)
Date: Wed, 13 May 2009 19:08:55 +0800
Subject: [rsyslog] about "Property Replacer"!!
In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702B08C@GRFEXC.intern.adiscon.com>
References: <4A0A9FB7.1040406@osslab.org> <9B6E2A8877C38245BFB15CC491A11DA702B08C@GRFEXC.intern.adiscon.com>
Message-ID: <4A0AAA47.7010904@osslab.org>

thank you reply!
yes., you are right.
but %msg:F,32:2% is "DROP_url_www.sina.co" ,is not "DROP_url_www.sina.com.cn:IN=eth1" . why?

> Hi,
>
> let me explain. You tell rsyslog to use ASCII-SP " ", code 32, as a
> delimiter. Now looking at the message, it starts with a space (almost all
> RFC3164-messages do because of the definitions in RFC3164). So field 1 is an
> empty field, and field 2 is what you actually get. It is delimited by another
> space.
>
>
> Hope this helps,
> Rainer
>
>
>> -----Original Message-----
>> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
>> bounces at lists.adiscon.com] On Behalf Of liangjun
>> Sent: Wednesday, May 13, 2009 12:24 PM
>> To: rsyslog at lists.adiscon.com
>> Subject: [rsyslog] about "Property Replacer"!!
>>
>> hello!
>> about The Property Replacer i have some problem:
>> *extraction can be done based on so-called "fields",*
>> this is a example about fields!
>> %msg% is " DROP_url_www.sina.com.cn:IN=eth1 OUT=eth0 SRC=192.168.10.78
>> DST=61.172.201.194 LEN=1182 TOS=0x00 PREC=0x00 TTL=63 ID=14368 DF
>> PROTO=TCP SPT=33343 DPT=80 WINDOW=92 RES=0x00 ACK PSH URGP=0"
>> %msg:F,32:2% is "DROP_url_www.sina.co" ,is not
>> "DROP_url_www.sina.com.cn:IN=eth1" ,so why?
>> and i do some another test .and the fields always is 20 characters!

From rgerhards at hq.adiscon.com  Wed May 13 13:53:52 2009
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Wed, 13 May 2009 13:53:52 +0200
Subject: [rsyslog] about "Property Replacer"!!
References: <4A0A9FB7.1040406@osslab.org><9B6E2A8877C38245BFB15CC491A11DA702B08C@GRFEXC.intern.adiscon.com> <4A0AAA47.7010904@osslab.org>
Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702B08D@GRFEXC.intern.adiscon.com>

> -----Original Message-----
> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
> bounces at lists.adiscon.com] On Behalf Of liangjun
> Sent: Wednesday, May 13, 2009 1:09 PM
> To: rsyslog-users
> Subject: Re: [rsyslog] about "Property Replacer"!!
>
> thank you reply!
> yes., you are right.
> but %msg:F,32:2% is "DROP_url_www.sina.co" ,is not
> "DROP_url_www.sina.com.cn:IN=eth1" . why?

Oh, I had overlooked this.
Sounds like a bug, let me check...

From rgerhards at hq.adiscon.com  Wed May 13 14:11:30 2009
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Wed, 13 May 2009 14:11:30 +0200
Subject: [rsyslog] about "Property Replacer"!!
References: <4A0A9FB7.1040406@osslab.org><9B6E2A8877C38245BFB15CC491A11DA702B08C@GRFEXC.intern.adiscon.com><4A0AAA47.7010904@osslab.org> <9B6E2A8877C38245BFB15CC491A11DA702B08D@GRFEXC.intern.adiscon.com>
Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702B090@GRFEXC.intern.adiscon.com>

I have just checked, but I do not see this issue. Can you please post your complete configuration file? Also, please let me know which version of rsyslog you are using. If it is not the latest of the branch you are using, please upgrade to that.

Thanks,
Rainer

> -----Original Message-----
> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
> bounces at lists.adiscon.com] On Behalf Of Rainer Gerhards
> Sent: Wednesday, May 13, 2009 1:54 PM
> To: rsyslog-users
> Subject: Re: [rsyslog] about "Property Replacer"!!
>
>
>
> > -----Original Message-----
> > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
> > bounces at lists.adiscon.com] On Behalf Of liangjun
> > Sent: Wednesday, May 13, 2009 1:09 PM
> > To: rsyslog-users
> > Subject: Re: [rsyslog] about "Property Replacer"!!
> >
> > thank you reply!
> > yes., you are right.
> > but %msg:F,32:2% is "DROP_url_www.sina.co" ,is not
> > "DROP_url_www.sina.com.cn:IN=eth1" . why?
> >
>
> Oh, I had overlooked this. Sounds like a bug, let me check...

From liangjun at osslab.org  Wed May 13 14:24:45 2009
From: liangjun at osslab.org (liangjun)
Date: Wed, 13 May 2009 20:24:45 +0800
Subject: [rsyslog] about "Property Replacer"!!
In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702B090@GRFEXC.intern.adiscon.com>
References: <4A0A9FB7.1040406@osslab.org><9B6E2A8877C38245BFB15CC491A11DA702B08C@GRFEXC.intern.adiscon.com><4A0AAA47.7010904@osslab.org> <9B6E2A8877C38245BFB15CC491A11DA702B08D@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702B090@GRFEXC.intern.adiscon.com>
Message-ID: <4A0ABC0D.3090801@osslab.org>

%msg:F,32:2% is "DROP_url_www.sina.co" ,is not "DROP_url_www.sina.com.cn:IN=eth1" ,
and i do some another test and i find %msg:F,32:2% always is 20 characters!

# rsyslogd -v
rsyslogd 3.22.0, compiled with:
        FEATURE_REGEXP: Yes
        FEATURE_LARGEFILE: Yes
        FEATURE_NETZIP (message compression): Yes
        GSSAPI Kerberos 5 support: No
        FEATURE_DEBUG (debug build, slow code): No
        Atomic operations supported: Yes
        Runtime Instrumentation (slow code): No

/etc/rsyslog.conf
---------------------------------------------------------------
$ModLoad ommysql # To use the database functionality, MySQL must be enabled in the config file BEFORE the first database table action is used.
$ModLoad immark.so # provides --MARK-- message capability
$ModLoad imuxsock # provides support for local system logging
$ModLoad imklog # provides kernel logging support (previously done by rklogd)
#$ModLoad immark # provides --MARK-- message capability
$ModLoad imudp
$UDPServerRun 514
$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
#
# Set the default permissions for all log files.
#
$FileOwner root
$FileGroup adm
$FileCreateMode 0640
$DirCreateMode 0755
#
# Include all config files in /etc/rsyslog.d/
#
$IncludeConfig /etc/rsyslog.d/*.conf
###############
#### RULES ####
###############
#
# First some standard log files. Log by facility.
#
auth,authpriv.* /var/log/auth.log
*.*;auth,authpriv.none -/var/log/syslog
#cron.* /var/log/cron.log
daemon.* -/var/log/daemon.log
kern.* -/var/log/kern.log
lpr.* -/var/log/lpr.log
mail.* -/var/log/mail.log
user.* -/var/log/user.log
#
# Logging for the mail system. Split it up so that
# it is easy to write scripts to parse these files.
#
mail.info -/var/log/mail.info
mail.warn -/va