From Luis.Fernando.Munoz.Mejias at cern.ch Wed Apr 1 18:02:14 2009
From: Luis.Fernando.Munoz.Mejias at cern.ch (Luis Fernando Muñoz Mejías)
Date: Wed, 1 Apr 2009 18:02:14 +0200
Subject: [rsyslog] RFC: On rsyslog output modules and support for batch operations
Message-ID: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch>

Hello, world.

I discussed this in private with Rainer, and he suggested I bring the discussion here.

I'm already developing an output module for feeding an Oracle database with rsyslog input. Rainer already committed some patches to the "oracle" branch in git. Let me remind you that this is highly experimental, and I'm sending a big semantic change today. But, in principle, the module does what you'd expect from it: it connects to a DB, receives a SQL statement via doAction, prepares that statement, runs it, and commits.

It works, but it's way too slow for my needs. As I said when I started this project, I need it to be very fast: I want to prepare the statement at connection time, run it many times, and I definitely want batch operations. Say, I want to insert 1000 entries with a single call to the Oracle interface, then commit.

With what I know now of rsyslog, I can do it more or less like this:

$OmoracleStatementTemplate,"insert into foo(field1, field2, field3) values(:val1, :val2, :val3)"

which is the statement to be prepared by Oracle. This way, I can prepare the statement at createInstance() time. Then, I can specify the batch size with something like

$OmoracleBatchSize 1000

With this, also at createInstance() time, I can specify that doAction is called only if there are 1000 entries pending for this selector, like this:

CODE_STD_STRING_REQUESTparseSelectorAct(batch_size);

The bad part is that rsyslog will deliver to the output module a single string per entry. So, I'd have to split each entry into its fields as part of the doAction() code. I'd need some funny separator for each field, to avoid problems. So far, it can be done.
But the configuration would look like this:

$OmoracleDB logdb
$OmoracleDBUser dbuser
$OmoracleDBPassword dbpassword
$OmoracleStatement "insert into foo(col1, col2) values (:field1, :field2)"
$OmoracleBatchSize 1000
$OmoracleFieldSeparator ****

*.* :omoracle:;"%field1%****%field2%"

and make doAction split the fields appropriately.

I bet it works. But it's probably too ugly for users. Cleaner ways may need deeper changes to rsyslog's API so that the module gets direct access to each field. That's probably a lot of work and I can't wait for that.

So, my questions (at last!): Are there any other alternatives? Is this "ugly" way of working good for other users? Should I keep it for internal use?

Thanks a lot.
--
Luis Fernando Muñoz Mejías
Luis.Fernando.Munoz.Mejias at cern.ch

From rgerhards at hq.adiscon.com Wed Apr 1 18:54:05 2009
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Wed, 1 Apr 2009 18:54:05 +0200
Subject: [rsyslog] RFC: On rsyslog output modules and support for batch operations
References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch>
Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702ADF9@GRFEXC.intern.adiscon.com>

> -----Original Message-----
> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-bounces at lists.adiscon.com] On Behalf Of Luis Fernando Muñoz Mejías
> Sent: Wednesday, April 01, 2009 6:02 PM
> To: rsyslog-users
> Subject: [rsyslog] RFC: On rsyslog output modules and support for batch operations
>
> Hello, world.
>
> I discussed this in private with Rainer, and he suggested I bring the discussion here.
>
> I'm already developing an output module for feeding an Oracle database with rsyslog input. Rainer already committed some patches to the "oracle" branch in git. Let me remind you that this is highly experimental, and I'm sending a big semantic change today.
> But, in principle, the module does what you'd expect from it: it connects to a DB, receives a SQL statement via doAction, prepares that statement, runs it, commits.

If I didn't screw up, everything should be committed now.

> It works, but it's way too slow for my needs. As I said when I started this project, I need to be very fast, to prepare the statement at connection time, run it many times, and definitely want batch operations. Say, I want to insert 1000 entries with a single call to the Oracle interface, then commit.
>
> With what I know now of rsyslog, I can do it more or less like this:
>
> $OmoracleStatementTemplate,"insert into foo(field1, field2, field3) values(:val1, :val2, :val3)"
>
> which is the statement to prepare by Oracle. This way, I can prepare the statement at createInstance() time. Then, I can specify the batch size with something like
>
> $OmoracleBatchSize 1000
>
> With this, also at createInstance() time I can specify that doAction is called only if there are 1000 entries pending for this selector, like this:
>
> CODE_STD_STRING_REQUESTparseSelectorAct(batch_size);
>
> The bad part is that rsyslog will deliver to the output module a single string per entry. So, I'd have to split each entry into its fields as part of the doAction() code. I'd need some funny separator for each field, to avoid problems. So far, it can be done. But the configuration would look like this:
>
> $OmoracleDB logdb
> $OmoracleDBUser dbuser
> $OmoracleDBPassword dbpassword
> $OmoracleStatement "insert into foo(col1, col2) values (:field1, :field2)"
> $OmoracleBatchSize 1000
> $OmoracleFieldSeparator ****
>
> *.* :omoracle:;"%field1%****%field2%"
>
> and make doAction split the fields appropriately.

There are a couple of subtleties, but I think it can work. In essence, you need a template that feeds into the values via ($template!) and also a config string for the prepared statement. It's actually not even that hard to do.
It may be useful (and of course doable) to enable the property replacer to escape special characters, so that, for example, we could use CSV and replace commas by two of them.

> I bet it works. But it's probably too ugly for users. Cleaner ways may need deeper changes to rsyslog's API so that the module gets direct access to each field. That's probably a lot of work and I can't wait for that.

I need to check if there are actually larger changes required. The main reason for this interface initially was security (do not pass to the module the full object). Assuming that I have the object available at the time of the plugin call, I could use a different entry point to pass that data in. If so, that would not be too much effort. Security concerns could be (somewhat) addressed by a config statement which enables such object access for the next action, so one could specifically grant that privilege.

What is the overall opinion on this list? Should we look further into that direction?

Rainer

> So, my questions (at last!): Are there any other alternatives? Is this "ugly" way of working good for other users? Should I keep it for internal use?
>
> Thanks a lot.
> --
> Luis Fernando Muñoz Mejías
> Luis.Fernando.Munoz.Mejias at cern.ch
>
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com

From epiphani at gmail.com Wed Apr 1 20:03:12 2009
From: epiphani at gmail.com (Aaron Wiebe)
Date: Wed, 1 Apr 2009 14:03:12 -0400
Subject: [rsyslog] RFC: On rsyslog output modules and support for batch operations
In-Reply-To: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch>
References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch>
Message-ID:

On Wed, Apr 1, 2009 at 12:02 PM, Luis Fernando Muñoz Mejías wrote:
> It works, but it's way too slow for my needs.
> As I said when I started this project, I need to be very fast, to prepare the statement at connection time, run it many times, and definitely want batch operations. Say, I want to insert 1000 entries with a single call to the Oracle interface, then commit.
..
> which is the statement to prepare by Oracle. This way, I can prepare the statement at createInstance() time. Then, I can specify the batch size with something like
>
> $OmoracleBatchSize 1000

Another thing you might want to think about is the idea of using a callback timer, as was outlined for another prospective feature implementation here:

http://www.rsyslog.com/Article334.phtml

The general idea being, while having a batch size is important, if you don't have some functional timer callback to the output module, you will end up in the situation of not flushing regularly. On lower-traffic outputs, a timed flush would reduce the risk of losing a lot of data. So you could have two different mechanisms:

- A high-watermark batch commit
- A timed commit in the case that high-watermarks aren't met in a certain time period.

That way you could commit every.. say, 60 seconds, in the case you haven't hit your high watermark.

Just some food for thought.

-Aaron

From Luis.Fernando.Munoz.Mejias at cern.ch Wed Apr 1 21:20:55 2009
From: Luis.Fernando.Munoz.Mejias at cern.ch (Luis Fernando Muñoz Mejías)
Date: Wed, 1 Apr 2009 21:20:55 +0200
Subject: [rsyslog] RFC: On rsyslog output modules and support for batch operations
In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702ADF9@GRFEXC.intern.adiscon.com>
References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch> <9B6E2A8877C38245BFB15CC491A11DA702ADF9@GRFEXC.intern.adiscon.com>
Message-ID: <200904012120.56091.Luis.Fernando.Munoz.Mejias@cern.ch>

> > I'm already developing an output module for feeding an Oracle database with rsyslog input. [...] But, in principle, the module does what you'd expect from it: [...]
>
> If I didn't screw up, everything should be committed now.

I just checked. It is. I've also tested the changes you applied and they work perfectly. Thanks a lot for the reviews!! :)

> There are a couple of subtleties, but I think it can work. In essence, you need a template that feeds into the values via ($template!) and also a config string for the prepared statement. It's actually not even that hard to do. It may be useful (and of course doable) to enable the property replacer to escape special characters, so that, for example, we could use CSV and replace commas by two of them.

Making properties in CSV format is indeed a good idea.

> > I bet it works. But it's probably too ugly for users. Cleaner ways may need deeper changes to rsyslog's API so that the module gets direct access to each field. That's probably a lot of work and I can't wait for that.

> I need to check if there are actually larger changes required. The main reason for this interface initially was security (do not pass to the module the full object).

It's a good reason. If it's easy to generate and pass a deep copy of the object (and it's not a performance killer, which it shouldn't be), we can discuss it. Otherwise, I don't think this is worth the effort.

> Assuming that I have the object available at the time of the plugin call, I could use a different entry point to pass that data in. If so, that would not be too much effort. Security concerns could be (somewhat) addressed by a config statement which enables such object access for the next action, so one could specifically grant that privilege.

I'm not quite sure about this: if two entries request direct access to the same object and one is buggy and modifies it, then the second one can suffer unpredictable consequences. I think it's better to pass a deep copy, free it once the module call returns, and do it only for modules that actually need that new entry point.
If such deep copies are expensive, then we are just fine the way we are now.

Cheers.
--
Luis Fernando Muñoz Mejías
Luis.Fernando.Munoz.Mejias at cern.ch

From Luis.Fernando.Munoz.Mejias at cern.ch Wed Apr 1 21:23:43 2009
From: Luis.Fernando.Munoz.Mejias at cern.ch (Luis Fernando Muñoz Mejías)
Date: Wed, 1 Apr 2009 21:23:43 +0200
Subject: [rsyslog] RFC: On rsyslog output modules and support for batch operations
In-Reply-To:
References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch>
Message-ID: <200904012123.43374.Luis.Fernando.Munoz.Mejias@cern.ch>

Aaron,

Thanks for your feedback.

> Another thing you might want to think about is the idea of using a callback timer, as was outlined for another prospective feature implementation here: http://www.rsyslog.com/Article334.phtml

I'd like to, indeed. But it's lower priority to me. The volume of the data sources I'm pushing into Oracle is *really* high. Then, if I stick to a single big batch, I can let the core do the actual batch management. Otherwise, I need either to receive things in very small batches that I concatenate or to call the core to dump its internal buffer. Both are doable, but require some work I'd rather not do right now. O:)

> The general idea being, while having a batch size is important, if you don't have some functional timer callback to the output module, you will end up in the situation of not flushing regularly. On lower-traffic outputs, this would reduce the risk of losing a lot of data. So you could have two different mechanisms:
>

What I'd suggest is to have several batch sizes for different selectors:

$OmoracleBatchSize 1000
if(large_volume_expression) then :omoracle:;LargeTemplateName
$OmoracleBatchSize 1
if(really_small_volume_expression) then :omoracle:;SmallTemplateName

This way, I don't need a timer to communicate with the core, and simplify my code.
In the worst case scenario, SSH could suddenly stop working across all of CERN and I'd lose the last 999 messages. I admit the critical information on why SSH stopped working is in those 999 messages, but for the moment I accept that risk. ;)

Cheers.
--
Luis Fernando Muñoz Mejías
Luis.Fernando.Munoz.Mejias at cern.ch

From rgerhards at hq.adiscon.com Thu Apr 2 12:03:33 2009
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Thu, 2 Apr 2009 12:03:33 +0200
Subject: [rsyslog] RFC: On rsyslog output modules and support for batch operations
References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch> <9B6E2A8877C38245BFB15CC491A11DA702ADF9@GRFEXC.intern.adiscon.com> <200904012120.56091.Luis.Fernando.Munoz.Mejias@cern.ch>
Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702ADFF@GRFEXC.intern.adiscon.com>

> > There are a couple of subtleties, but I think it can work. In essence, you need a template that feeds into the values via ($template!) and also a config string for the prepared statement. It's actually not even that hard to do. It may be useful (and of course doable) to enable the property replacer to escape special characters, so that, for example, we could use CSV and replace commas by two of them.
>
> Making properties in CSV format is indeed a good idea.

I'll add a "csv" option to the property replacer. It will format a field according to RFC 4180:

http://tools.ietf.org/html/rfc4180

I will always enclose values in double quotes to keep the code simple (I think also on the parser side). Let me know if you think this approach is useful.

> > > I bet it works. But it's probably too ugly for users. Cleaner ways may need deeper changes to rsyslog's API so that the module gets direct access to each field. That's probably a lot of work and I can't wait for that.
>
> > I need to check if there are actually larger changes required.
> > The main reason for this interface initially was security (do not pass to the module the full object).
>
> It's a good reason. If it's easy to generate and pass a deep copy of the object (and it's not a performance killer, it shouldn't), we can discuss it. Otherwise, I don't think this is worth the effort.

I have done some review of the code. Not in-depth, but I think good enough. I do have the message object available (thankfully, the optimization to store just the strings did not yet take place ;)). Creating a deep copy is not problematic and I'd assume it typically bears roughly equivalent cost to creating the template string (which is kind of expensive).

There obviously must be a new interface declared for this and your plugin should support both the new and the old interface. It may be OK if you return RS_RET_DISABLED on the first call to the old interface, which means a downlevel engine cannot use your plugin (acceptable from my POV).

Feature-wise, however, I think you lose a couple of things. Most importantly, the template processor & property replacer allow you to rewrite parts of the message. If we pass in the plain message object, you lose this ability. So any modifications must be made directly from within the plugin. I'd say that's a big disadvantage. As the scripting engine evolves, we will probably be able to overcome this limitation by permitting modifications to message objects, but it's a long way until we are there...

> > Assuming that I have the object available at the time of the plugin call, I could use a different entry point to pass that data in. If so, that would not be too much effort. Security concerns could be (somewhat) addressed by a config statement which enables such object access for the next action, so one could specifically grant that privilege.
>
> I'm not quite sure about this: if two entries request direct access to the same object, one is buggy and modifies it, then the second one can suffer unpredictable consequences. I think it's better to pass a deep copy, free it once the module call returns, and do it only for modules that actually need that new entry point. If such deep copies are expensive, then we are just fine the way we are now.

Agreed, but I think a deep copy does not address anything. With the template system (in theory but not yet in practice), a plugin cannot access any information other than what the user has configured in the template. If the full object is passed, this cannot be prevented. In practice, today, plugins are loaded in-process and as such can access the whole process space. But there are ideas to create an out-of-process plugin interface for very security-sensitive environments. They would be hurt (or require additional configuration) by the "full object access" approach.

Feedback is appreciated.

Rainer

From rgerhards at hq.adiscon.com Thu Apr 2 12:15:56 2009
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Thu, 2 Apr 2009 12:15:56 +0200
Subject: [rsyslog] review request: omprog
Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AE01@GRFEXC.intern.adiscon.com>

Hi all,

a forum post made me (again) aware of functionality missing in rsyslog:

http://kb.monitorware.com/problem-to-migrate-from-syslog-ng-to-rsyslog-t8982.html

That is the execution of a program which receives all log messages passed in via stdin. I have now done a first, rough, implementation of "omprog" which shall provide this feature.
I would appreciate it if someone could quickly review the code, especially lines 97 to 135, where I clean up after fork and before I exec the program. The code can be found here:

http://git.adiscon.com/?p=rsyslog.git;a=blob;f=plugins/omprog/omprog.c;h=2a078a6d862c2230a8caa9a489e08a7b5ea4cb29;hb=refs/heads/omprog#l97

There are three questions that I have:

1. Is the method used sufficiently secure?
2. Is there a better way to close open file handles?
3. Am I resetting the sigaction() correctly?

Especially #3 puzzles me, because I cannot use SIGTERM to cancel a child (via a different bash). Also, waitpid() always returns -1 and tells me "there are no children".

So I am under the impression I am doing something wrong. As I also have limited experience in that area of executing external programs, I'd appreciate advice from those in the know.

Feel free to pass this along, if you are able to motivate someone else on this topic ;)

Thanks,
Rainer

From rgerhards at hq.adiscon.com Thu Apr 2 12:55:05 2009
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Thu, 2 Apr 2009 12:55:05 +0200
Subject: [rsyslog] RFC: On rsyslog output modules and support for batch operations
References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch> <9B6E2A8877C38245BFB15CC491A11DA702ADF9@GRFEXC.intern.adiscon.com> <200904012120.56091.Luis.Fernando.Munoz.Mejias@cern.ch> <9B6E2A8877C38245BFB15CC491A11DA702ADFF@GRFEXC.intern.adiscon.com>
Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AE02@GRFEXC.intern.adiscon.com>

> > Making properties in CSV format is indeed a good idea.
>
> I'll add a "csv" option to the property replacer. It will format a field according to RFC 4180:
>
> http://tools.ietf.org/html/rfc4180
>
> I will always enclose values in double quotes to keep the code simple (I think also on the parser side). Let me know if you think this approach is useful.

Actually, it was trivial and I have just done it. May be useful in other cases, too.
The new option is available via the master git branch. I have also merged it into the oracle branch, which I also updated with everything done to master so far.

Rainer

From rsyslog at lists.bod.org Thu Apr 2 13:44:59 2009
From: rsyslog at lists.bod.org (Paul Chambers)
Date: Thu, 02 Apr 2009 04:44:59 -0700
Subject: [rsyslog] review request: omprog
In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702AE01@GRFEXC.intern.adiscon.com>
References: <9B6E2A8877C38245BFB15CC491A11DA702AE01@GRFEXC.intern.adiscon.com>
Message-ID: <49D4A53B.5050501@lists.bod.org>

I only had a quick glance, but have a couple of questions:

a) is popen() not suitable for some reason? it's a little less efficient (since it starts a shell to interpret the command line passed in) but your code would be much simpler.

b) are you sure you need to close the file handles and reset signal handlers yourself? from the execve() man page:

"execve() does not return on success, and the text, data, bss, and stack of the calling process are overwritten by that of the program loaded. The program invoked inherits the calling process's PID, and any open file descriptors that are not set to close-on-exec. Signals pending on the calling process are cleared. Any signals set to be caught by the calling process are reset to their default behaviour. The SIGCHLD signal (when set to SIG_IGN) may or may not be reset to SIG_DFL."

Sounds like execve is doing both for you. It'd be easy to verify - write a little external program that dumps the open fds and signal handlers when it starts.

-- Paul
From rgerhards at hq.adiscon.com Thu Apr 2 14:08:07 2009
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Thu, 2 Apr 2009 14:08:07 +0200
Subject: [rsyslog] review request: omprog
References: <9B6E2A8877C38245BFB15CC491A11DA702AE01@GRFEXC.intern.adiscon.com> <49D4A53B.5050501@lists.bod.org>
Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AE03@GRFEXC.intern.adiscon.com>

> -----Original Message-----
> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-bounces at lists.adiscon.com] On Behalf Of Paul Chambers
> Sent: Thursday, April 02, 2009 1:45 PM
> To: rsyslog-users
> Subject: Re: [rsyslog] review request: omprog
>
> I only had a quick glance, but have a couple of questions:
>
> a) is popen() not suitable for some reason?
> it's a little less efficient (since it starts a shell to interpret the command line passed in) but your code would be much simpler.

I have two problems with popen() - the "real" one is that I cannot obtain the pid of the started program. Or better phrased, no search brought up how to do that. But (the second problem) the search turned up multiple cross-platform issues with popen(), and *a lot* of folks recommended against it.

> b) are you sure you need to close the file handles and reset signal handlers yourself? from the execve() man page:
>
> "execve() does not return on success, and the text, data, bss, and stack of the calling process are overwritten by that of the program loaded. The program invoked inherits the calling process's PID, and any open file descriptors that are not set to close-on-exec. Signals pending on the calling process are cleared. Any signals set to be caught by the calling process are reset to their default behaviour. The SIGCHLD signal (when set to SIG_IGN) may or may not be reset to SIG_DFL."

On my Fedora 10, the execve man page is much less specific on this (actually, open files are not mentioned at all in the "what is cleaned up" list). This leads me to believe that there is at least a portability problem. However, you are right about the signals, so I'd only need to clear SIGCHLD. But what if there is some platform that preserves something else? As resetting signals is very quick, it's probably better to do it for all of them.

Probably it would make sense to add these notes as comments into the function header. Guess others will have the same questions :)

> Sounds like execve is doing both for you. It'd be easy to verify - write a little external program that dumps the open fds and signal handlers when it starts.

How can I check which fd's are open? That would be the solution also to see what I need to close... Maybe I am overlooking the obvious.
Thanks for looking at the code, definitely helpful!

Rainer
From aoz.syn at gmail.com Thu Apr 2 16:22:28 2009
From: aoz.syn at gmail.com (RB)
Date: Thu, 2 Apr 2009 08:22:28 -0600
Subject: [rsyslog] RFC: On rsyslog output modules and support for batch operations
In-Reply-To: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch>
References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch>
Message-ID: <4255c2570904020722x6ca6a0few1ac1abfe941ee59b@mail.gmail.com>

On Wed, Apr 1, 2009 at 10:02, Luis Fernando Muñoz Mejías wrote:
> Hello, world.

Oi. :) Sorry I'm late to the game.

> It works, but it's way too slow for my needs. As I said when I started this project, I need to be very fast, to prepare the statement at connection time, run it many times, and definitely want batch operations. Say, I want to insert 1000 entries with a single call to the Oracle interface, then commit.

Forgive me - my database-performance-fu and oracle-fu are not terribly strong, I may make a fool of myself here. What is the performance gain of making a prepared statement over just executing raw statements?
IOW, why choose (please forgive my SQL):

CREATE PROCEDURE zazz AS
    insert into foo(field1, field2, field3) values(:val1, :val2, :val3);
SET TRANSACTION;
zazz("foo", "bar", "baz");
zazz("foo1", "bar1", "baz1");
zazz("foo2", "bar2", "baz2");
COMMIT;

-- over

SET TRANSACTION;
INSERT INTO foo(field1, field2, field3) values("foo", "bar", "baz");
INSERT INTO foo(field1, field2, field3) values("foo1", "bar1", "baz1");
INSERT INTO foo(field1, field2, field3) values("foo2", "bar2", "baz2");
COMMIT;

Perhaps that's not even what you're doing. I know there are other considerations and niceties with procedures, but the latter syntax would still allow for batched transactions while enabling rsyslog to do the dirty work of formatting the query and not necessitating exposure of internal structures.

I confess to being a bit confused as to why the existing output module interface wasn't readily extending to batching, since I've tended to see the output modules as more of thin, final-hop proxies. IMHO, database output modules should still pretty much blindly execute whatever SQL rsyslog hands them, be that wrapped in a transaction or not.

That said (and more a question for Rainer), do rsyslog templates have support for a null character? If so, it may be a more viable approach for delimiting simple fields than changing the output module API. Of course the CSV approach works too, but seems easier to break out of than null-delimiting.
From rgerhards at hq.adiscon.com Thu Apr 2 16:57:32 2009
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Thu, 2 Apr 2009 16:57:32 +0200
Subject: [rsyslog] RFC: On rsyslog output modules and support for batch operations
References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch> <4255c2570904020722x6ca6a0few1ac1abfe941ee59b@mail.gmail.com>
Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AE08@GRFEXC.intern.adiscon.com>

Just a partial response (but quick ;))

> I confess to being a bit confused as to why the existing output module interface wasn't readily extending to batching,

That would be the real solution (and David Lang suggested it long ago). The only problem is it takes quite some effort, as we need to make sure we do not lose messages along that way. It is still on my agenda, but without a sponsor I fear it'll stay there for quite a while. In most cases, rsyslog is simply too fast to see a bottleneck.

> since I've tended to see the output modules as more of thin, final-hop proxies. IMHO, database output modules should still pretty much blindly execute whatever SQL rsyslog hands them, be that wrapped in a transaction or not.
>
> That said (and more a question for Rainer), do rsyslog templates have support for a null character? If so, it may be a more viable approach for delimiting simple fields than changing the output module API. Of course the CSV approach works too, but seems easier to break out of than null-delimiting.

Nope, also on the agenda. Here sysklogd legacy bites. All C-strings internally, so while not complex, a *lot* of work is required to change that (basically all string operations must be touched, and that in a program that mostly does string operations...).
Rainer From Luis.Fernando.Munoz.Mejias at cern.ch Thu Apr 2 17:21:29 2009 From: Luis.Fernando.Munoz.Mejias at cern.ch (Luis Fernando =?utf-8?q?Mu=C3=B1oz_Mej=C3=ADas?=) Date: Thu, 2 Apr 2009 17:21:29 +0200 Subject: [rsyslog] RFC: On rsyslog output modules and support for batch operations In-Reply-To: <4255c2570904020722x6ca6a0few1ac1abfe941ee59b@mail.gmail.com> References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch> <4255c2570904020722x6ca6a0few1ac1abfe941ee59b@mail.gmail.com> Message-ID: <200904021721.29880.Luis.Fernando.Munoz.Mejias@cern.ch> RB, > Oi. :) Sorry I'm late to the game. Your contribution is appreciated. :) > Forgive me - my database-performance-fu and oracle-fu are not terribly > strong, I may make a fool of myself here. What is the performance > gain of making a prepared statement over just executing raw > statements? The statement is parsed only once, so you save the overhead of parsing and doing an execution plan for each execution, which will be identical. And I expect to insert hundreds of entries per second. :) All you have to do is pass the arguments. > CREATE PROCEDURE zazz AS > insert into foo(field1, field2, field3) values(:val1, :val2, > :val3); SET TRANSACTION; zazz("foo", "bar", "baz"); zazz("foo1", > "bar1", "baz1"); zazz("foo2", "bar2", "baz2"); COMMIT; > > -- over > > SET TRANSACTION; > INSERT INTO foo(field1, field2, field3) values("foo", > "bar", "baz"); > INSERT INTO foo(field1, field2, field3) values("foo1", > "bar1", "baz1"); > INSERT INTO foo(field1, field2, field3) > values("foo2", "bar2", "baz2"); COMMIT; > With this code, Oracle (any DB, actually) needs to parse each insert, and then choose the execution plan that looks best once. 
What you get by preparing the statement and using batches is that the client (rsyslog core) will store these triplets:

(foo, bar, baz)
(foo1, bar1, baz1)
(foo2, bar2, baz2)

and when you've hit a limit (say, you're on (foo1000, bar1000, baz1000)) send them all to the server at once (thus calling doAction only once, and calling the Oracle interface only once), which will blindly execute the statement without wasting a single cycle on parsing or evaluating execution plans: it's already done.

> Perhaps that's not even what you're doing.

For the moment I'm doing

BEGIN
INSERT INTO foo(field1, field2, field3) values("foo", "bar", "baz");
COMMIT;
BEGIN
INSERT INTO foo(field1, field2, field3) values("foo1", "bar1", "baz1");
COMMIT;

You can already imagine the overhead involved. Actually, all DB-based modules in rsyslog do the same.

> I know there are other considerations and niceties with procedures,

It's not even a stored procedure; it's about the client communicating with the DB many times versus only once.

> but the latter syntax would still allow for batched transactions while
> enabling rsyslog to do the dirty work of formatting the query and not
> necessitating exposure of internal structures.
>

Indeed, I want rsyslog doing most of the work for me. But the overhead involved in parsing and evaluating execution plans is unacceptable in my context. So I'm looking here for the balance between rsyslog doing work for me and rsyslog performing as good as I need it. Perhaps exposing the structures is not a good idea, either.

> IMHO, database output modules should still pretty much blindly execute
> whatever SQL rsyslog hands them, be that wrapped in a transaction or
> not.
>

Yes and no. Yes, rsyslog should be the one that specifies the statement to be executed. But there is no need for rsyslog to repeat that statement for each entry (millions per day). Doing it at initialization time is enough.
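The batching idea described above can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not the omoracle implementation: Python's standard sqlite3 module stands in for Oracle, and the BatchInserter class, its method names, and the batch size of 1000 are hypothetical names chosen for the example.

```python
import sqlite3

BATCH_SIZE = 1000  # hypothetical value, mirroring the batch size discussed in the thread


class BatchInserter:
    """Buffer rows and hand them to the database one batch at a time (illustrative helper)."""

    def __init__(self, conn, statement, batch_size=BATCH_SIZE):
        self.conn = conn
        self.statement = statement  # fixed once, at initialization time
        self.batch_size = batch_size
        self.pending = []
        self.flushes = 0

    def add(self, row):
        """Queue one row; flush automatically when the batch is full."""
        self.pending.append(row)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Send all queued rows in a single call, then commit once."""
        if not self.pending:
            return
        # The statement text never changes, only the bound values do.
        self.conn.executemany(self.statement, self.pending)
        self.conn.commit()
        self.pending.clear()
        self.flushes += 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (field1 TEXT, field2 TEXT, field3 TEXT)")
inserter = BatchInserter(
    conn, "INSERT INTO foo(field1, field2, field3) VALUES (?, ?, ?)"
)
for i in range(2500):
    inserter.add((f"foo{i}", f"bar{i}", f"baz{i}"))
inserter.flush()  # push the final, partial batch

row_count = conn.execute("SELECT COUNT(*) FROM foo").fetchone()[0]
print(row_count, inserter.flushes)
```

With 2500 rows and a batch size of 1000, the sketch issues three batched calls and three commits instead of 2500 of each. How much that saves against a real Oracle server depends on the driver and network, which this stand-in does not model.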
I made a small Python prototype to do something similar to what you propose, with no batches, but committing every 1000 entries. The speedup I got by introducing batches was about a factor of 50. And the statement was already prepared. Cheers. -- Luis Fernando Muñoz Mejías Luis.Fernando.Munoz.Mejias at cern.ch From Luis.Fernando.Munoz.Mejias at cern.ch Thu Apr 2 17:34:19 2009 From: Luis.Fernando.Munoz.Mejias at cern.ch (Luis Fernando =?utf-8?q?Mu=C3=B1oz_Mej=C3=ADas?=) Date: Thu, 2 Apr 2009 17:34:19 +0200 Subject: [rsyslog] RFC: On rsyslog output modules and support forbatchoperations In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702ADFF@GRFEXC.intern.adiscon.com> References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch> <200904012120.56091.Luis.Fernando.Munoz.Mejias@cern.ch> <9B6E2A8877C38245BFB15CC491A11DA702ADFF@GRFEXC.intern.adiscon.com> Message-ID: <200904021734.19882.Luis.Fernando.Munoz.Mejias@cern.ch> On Thursday, 2 April 2009 12:03, Rainer Gerhards wrote: > > Making properties in CSV format is indeed a good idea. > > I'll add a "csv" option to the property replacer. It will format a > field according to RFC 4180: > Thanks. I'll start working on it. For handling CSVs I'm planning to use libmba, which is distributed with Red Hat distros. Is it OK to have such a dependency on a module? > I will always enclose values in double quotes to keep the code simple > (I think also on the parser side). Let me know if you think this > approach is useful. Looks useful. I'm starting to play with it. > I have done some review of the code. Not in-depth, but I think good > enough. I do have the message object available (thankfully, the > optimization to store just the strings did not yet take place > ;)). Creating a deep copy is not problematic and I'd assume typically > bears roughly equivalent cost to creating the template string (which > is kind of expensive). 
There obviously must be a new interface > declared for this and your plugin should support both the new and the > old interface. It may be OK if you return RS_RET_DISABLED on the first > call to the old interface, what means a downlevel engine can not use > your plugin (acceptable from my POV). > Sounds acceptable. > Feature-Wise, however, I think you lose a couple of things. Most > importantly, the template processor & property replacer allows you to > rewrite parts of the message. If we pass in the plain message object, > you lose this ability. So any modifications must be made directly from > within the plugin. I'd say that's a big disadvantage. > A huge one. If, for instance, I want to extract user name and IP address from SSH log in messages, I don't want my plugin to be aware of SSH messages. I prefer to have it done by rsyslog, as it is done now. If we have to make the modules extract fields on their own, we are duplicating existing code and introducing many new bugs. That's not acceptable. I wonder if, however, we can get any access to the properties rsyslog processes, before they are concatenated into a single string. For instance, if I specify: %fromhost%,%timestamp:::date-rfc3339%,%msg% Can I have an array of pointers to each already processed property? If this can't be done, CSV parsing is already excellent for our needs. > > I think it's better to pass a deep copy, free it once the module > > call returns, and do it only for modules that actually need that new > > entry point. If such deep copies are expensive, then we are just > > fine the way we are now. > > Agreed, but I think a deep copy does not address anything. With the > template system (in theory but not yet in practice), a plugin can not > access any information other than what the users has configured in the > template. If the full object is passed, this can not prevented. Indeed. 
I was thinking of the set of things I suppose you have right before generating the string that is passed to doAction. Anything else is a security problem. Again, if it's difficult or overkill, discard this idea. Cheers. -- Luis Fernando Muñoz Mejías Luis.Fernando.Munoz.Mejias at cern.ch From aoz.syn at gmail.com Thu Apr 2 18:12:20 2009 From: aoz.syn at gmail.com (RB) Date: Thu, 2 Apr 2009 10:12:20 -0600 Subject: [rsyslog] RFC: On rsyslog output modules and support for batch operations In-Reply-To: <200904021721.29880.Luis.Fernando.Munoz.Mejias@cern.ch> References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch> <4255c2570904020722x6ca6a0few1ac1abfe941ee59b@mail.gmail.com> <200904021721.29880.Luis.Fernando.Munoz.Mejias@cern.ch> Message-ID: <4255c2570904020912x7cf931a5l762f060715334db2@mail.gmail.com> > The statement is parsed only once, so you save the overhead of parsing > and doing an execution plan for each execution, which will be > identical. And I expect to insert hundreds of entries per second. :) This is where more extensive database programming experience probably helps; I was unaware of (but understand the reason for) the additional overhead. Good to know. > context. So I'm looking here for the balance between rsyslog doing work > for me and rsyslog performing as good as I need it. Perhaps exposing the > structures is not a good idea, either. Perhaps you could [ab]use the fact that ppString is an array and do something like ommail does, using more than one string/template when using a custom subject. What I don't know off the top of my head is whether this would limit the number of different Oracle outputs you could connect to. 
RB From rgerhards at hq.adiscon.com Thu Apr 2 18:27:10 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Thu, 2 Apr 2009 18:27:10 +0200 Subject: [rsyslog] RFC: On rsyslog output modules and support for batchoperations References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch><4255c2570904020722x6ca6a0few1ac1abfe941ee59b@mail.gmail.com><200904021721.29880.Luis.Fernando.Munoz.Mejias@cern.ch> <4255c2570904020912x7cf931a5l762f060715334db2@mail.gmail.com> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AE0B@GRFEXC.intern.adiscon.com> > > context. So I'm looking here for the balance between rsyslog doing > work > > for me and rsyslog performing as good as I need it. Perhaps exposing > the > > structures is not a good idea, either. > > Perhaps you could [ab]use the fact that ppString is an array and do > something like ommail does, using more than one string/template when > using a custom subject. What I don't know off the top of my head is > whether this would limit the number of different Oracle outputs you > could connect to. The problem is that this number is expected to be fixed at compile time. I think it is possible that it is dynamically changed upon action creation, but it is a very "creative" use of this facility and I am not sure how well it will work. It may be worth considering a linked list of strings to pass, but on the other hand CSV parsing should involve not much overhead (just it is not as nice as a generic solution...). 
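As a rough illustration of the CSV route mentioned above, the following sketch shows always-quoted RFC 4180 style fields being split back apart on the receiving side, next to a plain separator string for contrast. It is an assumption-laden example rather than rsyslog or omoracle code: the helper names format_csv_field and split_csv_record are hypothetical, and Python's standard csv module stands in for whatever parser a module author would pick.

```python
import csv
import io


def format_csv_field(value):
    """Quote one field RFC 4180 style: always double-quoted, inner quotes doubled."""
    return '"' + value.replace('"', '""') + '"'


def split_csv_record(record):
    """Split one CSV record back into its fields using the standard csv reader."""
    return next(csv.reader(io.StringIO(record, newline="")))


# Sample property values (made up for the example); the last one contains
# both a comma and double quotes, which is where naive splitting goes wrong.
fields = ["host1", "2009-04-02T18:27:10+02:00", 'login "failed", user=root']

record = ",".join(format_csv_field(f) for f in fields)
parsed = split_csv_record(record)
print(parsed == fields)

# Contrast: a fixed separator string breaks as soon as a value contains it.
naive = "****".join(["a****b", "c"]).split("****")
print(len(naive))  # two fields went in, three come out
```

The quoted form round-trips because the separator can only be confused with data if the quoting itself is wrong, whereas a bare separator such as `****` relies on the log content never containing it.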
Rainer From rgerhards at hq.adiscon.com Thu Apr 2 18:36:07 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Thu, 2 Apr 2009 18:36:07 +0200 Subject: [rsyslog] RFC: On rsyslog output modules and supportforbatchoperations References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch><200904012120.56091.Luis.Fernando.Munoz.Mejias@cern.ch><9B6E2A8877C38245BFB15CC491A11DA702ADFF@GRFEXC.intern.adiscon.com> <200904021734.19882.Luis.Fernando.Munoz.Mejias@cern.ch> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AE0C@GRFEXC.intern.adiscon.com> > -----Original Message----- > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > bounces at lists.adiscon.com] On Behalf Of Luis Fernando Muñoz Mejías > Sent: Thursday, April 02, 2009 5:34 PM > To: rsyslog at lists.adiscon.com > Subject: Re: [rsyslog] RFC: On rsyslog output modules and > supportforbatchoperations > > On Thursday, 2 April 2009 12:03, Rainer Gerhards wrote: > > > Making properties in CSV format is indeed a good idea. > > > > I'll add a "csv" option to the property replacer. It will format a > > field according to RFC 4180: > > > Thanks. I'll start working on it. For handling CSVs I'm planning to use > libmba, which is distributed with Red Hat distros. Is it OK to have > such > a dependency on a module? That's the module author's decision. It just should be documented. I don't think it is any problem at all. I think it is available on all major platforms, isn't it? I am more strict with the core, core dependencies need to be loaded on every (actually!) system that uses rsyslog as default syslogd (think Fedora, Debian and variants). That would be much more of a problem. > > > I will always enclose values in double quotes to keep the code simple > > (I think also on the parser side). Let me know if you think this > > approach is useful. > > Looks useful. I'm starting to play with it. > > > I have done some review of the code. Not in-depth, but I think good > > enough. 
I do have the message object available (thankfully, the > > optimization to store just the strings did not yet take place > > ;)). Creating a deep copy is not problematic and I'd assume typically > > bears roughly equivalent cost to creating the template string (which > > is kind of expensive). There obviously must be a new interface > > declared for this and your plugin should support both the new and the > > old interface. It may be OK if you return RS_RET_DISABLED on the > first > > call to the old interface, what means a downlevel engine can not use > > your plugin (acceptable from my POV). > > > Sounds acceptable. > > > Feature-Wise, however, I think you lose a couple of things. Most > > importantly, the template processor & property replacer allows you to > > rewrite parts of the message. If we pass in the plain message object, > > you lose this ability. So any modifications must be made directly > from > > within the plugin. I'd say that's a big disadvantage. > > > A huge one. If, for instance, I want to extract user name and IP > address > from SSH log in messages, I don't want my plugin to be aware of SSH > messages. I prefer to have it done by rsyslog, as it is done now. > > If we have to make the modules extract fields on their own, we are > duplicating existing code and introducing many new bugs. That's not > acceptable. I wonder if, however, we can get any access to the > properties rsyslog processes, before they are concatenated into a > single > string. > > For instance, if I specify: > > %fromhost%,%timestamp:::date-rfc3339%,%msg% > > Can I have an array of pointers to each already processed property? I will investigate that. It just occured to me that I have the template compiled as a linked list. It should not be too much of an effort to turn values into a linked list rather than a string. Will check, but no promise. > > If this can't be done, CSV parsing is already excellent for our needs. 
> > > > I think it's better to pass a deep copy, free it once the module > > > call returns, and do it only for modules that actually need that > new > > > entry point. If such deep copies are expensive, then we are just > > > fine the way we are now. > > > > Agreed, but I think a deep copy does not address anything. With the > > template system (in theory but not yet in practice), a plugin can not > > access any information other than what the users has configured in > the > > template. If the full object is passed, this can not prevented. > > Indeed. I was thinking of the set of things I suppose you have right > before generating the string that is passed to doAction. Anything else > is a security problem. > > Again, if it's difficult or overkill, discard this idea. It is good to have this discussion, it was dangling for quite some while (and I miss some participants ;)). It may turn out to help us toward a more powerful interface for future development. Rainer > > Cheers. > -- > Luis Fernando Muñoz Mejías > Luis.Fernando.Munoz.Mejias at cern.ch > > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com From rgerhards at hq.adiscon.com Thu Apr 2 18:45:16 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Thu, 2 Apr 2009 18:45:16 +0200 Subject: [rsyslog] RFC: On rsyslog output modules andsupportforbatchoperations References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch><200904012120.56091.Luis.Fernando.Munoz.Mejias@cern.ch><9B6E2A8877C38245BFB15CC491A11DA702ADFF@GRFEXC.intern.adiscon.com><200904021734.19882.Luis.Fernando.Munoz.Mejias@cern.ch> <9B6E2A8877C38245BFB15CC491A11DA702AE0C@GRFEXC.intern.adiscon.com> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AE0D@GRFEXC.intern.adiscon.com> > > For instance, if I specify: > > > > %fromhost%,%timestamp:::date-rfc3339%,%msg% > > > > Can I have an array of pointers to each already processed property? 
> > I will investigate that. It just occured to me that I have the template > compiled as a linked list. It should not be too much of an effort to > turn > values into a linked list rather than a string. Will check, but no > promise. That was a good discussion result. I think it is trivial to generate such a linked list. However, it is not as trivial (but I think simple enough) to extend the plugin interface so that a plugin may request the linked list instead of the string. In order to preserve interfaces, I'll probably need to abuse the ppStrings[] array, so that in the linkedList case a cast is necessary. But I think this is clean enough. We just need to ensure that the plugin does not abort an older rsyslogd and an older rsyslogd does not abort the plugin ;) A lot of chore is on the plugin, I think, in checking that everything is actually available... From the user's perspective, the same template syntax is used, no matter how it is passed to the plugin. This can be considered nice (but some may not like it ;)). Will try to dig deeper into this. 
Rainer From mbiebl at gmail.com Thu Apr 2 18:52:11 2009 From: mbiebl at gmail.com (Michael Biebl) Date: Thu, 2 Apr 2009 18:52:11 +0200 Subject: [rsyslog] RFC: On rsyslog output modules and supportforbatchoperations In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702AE0C@GRFEXC.intern.adiscon.com> References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch> <200904012120.56091.Luis.Fernando.Munoz.Mejias@cern.ch> <9B6E2A8877C38245BFB15CC491A11DA702ADFF@GRFEXC.intern.adiscon.com> <200904021734.19882.Luis.Fernando.Munoz.Mejias@cern.ch> <9B6E2A8877C38245BFB15CC491A11DA702AE0C@GRFEXC.intern.adiscon.com> Message-ID: 2009/4/2 Rainer Gerhards : >> -----Original Message----- >> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- >> bounces at lists.adiscon.com] On Behalf Of Luis Fernando Muñoz Mejías >> Sent: Thursday, April 02, 2009 5:34 PM >> To: rsyslog at lists.adiscon.com >> Subject: Re: [rsyslog] RFC: On rsyslog output modules and >> supportforbatchoperations >> >> On Thursday, 2 April 2009 12:03, Rainer Gerhards wrote: >> > > Making properties in CSV format is indeed a good idea. >> > >> > I'll add a "csv" option to the property replacer. It will format a >> > field according to RFC 4180: >> > >> Thanks. I'll start working on it. For handling CSVs I'm planning to use >> libmba, which is distributed with Red Hat distros. Is it OK to have >> such >> a dependency on a module? > > That's the module author's decision. It just should be documented. I don't > think it is any problem at all. I think it is available on all major > platforms, isn't it? > I couldn't find libmba in the Debian/Ubuntu repositories fwiw. Cheers, Michael -- Why is it that all of the instruments seeking intelligent life in the universe are pointed away from Earth? 
From rgerhards at hq.adiscon.com Thu Apr 2 19:00:02 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Thu, 2 Apr 2009 19:00:02 +0200 Subject: [rsyslog] RFC: On rsyslog output modulesandsupportforbatchoperations References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch><200904012120.56091.Luis.Fernando.Munoz.Mejias@cern.ch><9B6E2A8877C38245BFB15CC491A11DA702ADFF@GRFEXC.intern.adiscon.com><200904021734.19882.Luis.Fernando.Munoz.Mejias@cern.ch><9B6E2A8877C38245BFB15CC491A11DA702AE0C@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702AE0D@GRFEXC.intern.adiscon.com> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AE0E@GRFEXC.intern.adiscon.com> It is probably clean that we simply define a new public entry point inside the rsyslog core that old versions do not have. The output plugin simply uses it. If it is tried to be loaded on old rsyslogd, the entry point is not found and the loader refuses to load the module. It is somewhat ugly in that the error message may be misleading (doc can solve that), but otherwise I think it works perfectly well - after all, the only thing we could do is disable the module on versions that do not support the functionality. That just disables the ability to use an alternate implementation in case of the one not available... well.. we can do it that way via an internal API, too, if I think correctly. OK, a solution begins to form ;) Rainer > -----Original Message----- > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > bounces at lists.adiscon.com] On Behalf Of Rainer Gerhards > Sent: Thursday, April 02, 2009 6:45 PM > To: rsyslog-users > Subject: Re: [rsyslog] RFC: On rsyslog output > modulesandsupportforbatchoperations > > > > For instance, if I specify: > > > > > > %fromhost%,%timestamp:::date-rfc3339%,%msg% > > > > > > Can I have an array of pointers to each already processed property? > > > > I will investigate that. 
It just occured to me that I have the > template > > compiled as a linked list. It should not be too much of an effort to > > turn > > values into a linked list rather than a string. Will check, but no > > promise. > > That was a good discussion result. I think it is trivial to generate > such a > linked list. However, it is not as trivial (but I think simple enough) > to > extend the plugin interface so that a plugin may request the linked > list > instead of the string. In order to preserve interfaces, I'll probably > need to > abuse the ppStrings[] array, so that in the linkedList case a cast is > necessary. But I think this is clean enough. We just need to ensure > that the > plugin does not abort an older rsyslogd and an older rsyslogd does not > abort > the plugin ;) A lot of chore is on the plugin, I think, in checking > that > everything is actually available... > > >From the user's perspective, the same template syntax is used, no > matter how > it is passed to the plugin. This can be considered nice (but some may > not > like it ;)). > > Will try to dig deeper into this. 
> > Rainer > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com From rsyslog at lists.bod.org Thu Apr 2 21:00:01 2009 From: rsyslog at lists.bod.org (rsyslog at lists.bod.org) Date: Thu, 02 Apr 2009 12:00:01 -0700 Subject: [rsyslog] review request: omprog In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702AE03@GRFEXC.intern.adiscon.com> References: <9B6E2A8877C38245BFB15CC491A11DA702AE01@GRFEXC.intern.adiscon.com> <49D4A53B.5050501@lists.bod.org> <9B6E2A8877C38245BFB15CC491A11DA702AE03@GRFEXC.intern.adiscon.com> Message-ID: <49D50B31.5080904@lists.bod.org> Rainer Gerhards wrote: >> -----Original Message----- >> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- >> bounces at lists.adiscon.com] On Behalf Of Paul Chambers >> Sent: Thursday, April 02, 2009 1:45 PM >> To: rsyslog-users >> Subject: Re: [rsyslog] review request: omprog >> >> I only had a quick glance, but have a couple of questions: >> >> a) is popen() not suitable for some reason? it's a little less >> efficient >> (since it starts a shell to interpret the command line passed in) but >> your code would be much simpler. >> > > I have two problems with popen() - the "real" one is that I cannot obtain the > pid of the started program. Or better phrased, no search brought up how to do > that. > I don't know offhand of a portable/clean way to do that. Though pclose() gets it from the FILE * somehow. So looking at the source for pclose() should tell you. > But (the second problem) the search turned out that there exist > multiple cross-platform issues with popen() and *a lot* of folks recommended > against it. > Fair enough. I can certainly believe there would be some implementation differences across platforms, though I don't have a feel for how significant they'd be. >> b) are you sure you need to close the file handles and reset signal >> handlers yourself? 
from the execve() man page: >> >> "execve() does not return on success, and the text, data, bss, and >> stack >> of the calling process are overwritten by that of the program loaded. >> The program invoked inherits the calling process's PID, and any open >> file descriptors that are not set to close-on-exec. Signals pending on >> the calling process are cleared. Any signals set to be caught by the >> calling process are reset to their default behaviour. The SIGCHLD >> signal >> (when set to SIG_IGN) may or may not be reset to SIG_DFL." >> > > On my Fedora 10, the execve man page is much less specific on this (actually, > open files are not mentioned at all in the "what is cleaned up" list. This > lets me believe that there at least is a portability problem. However, you > are right with the signals, so I'd only need to clear SIGCHLD. But what if > there is some platform who preserves something else? As resetting signals is > very quick, it's probably better to do it for all of them > The 'close on exec' flag (FD_CLOEXEC) is one of the few things posix specifies as part of the file descriptor flags. From the posix spec for execve: "File descriptors open in the calling process image shall remain open in the new process image, except for those whose close-on- /exec/ flag FD_CLOEXEC is set. For those file descriptors that remain open, all attributes of the open file description remain unchanged. For any file descriptor that is closed for this reason, file locks are removed as a result of the close as described in /close/() . Locks that are not removed by closing of file descriptors remain unchanged." So I seriously doubt it'd be a portability issue. I'd suggest looking at the glibc and posix documentation for execve to get the whole story on execve. > Probably it would make sense to add these notes as comments into the function > header. Guess other's will have the same questions :) > Quite probably :) >> Sounds like execve is doing both for you. 
It'd be easy to verify - >> write >> a little external program that dumps the open fds and signal handlers >> when it starts. >> > > How can I check which fd's are open? That would be the solution also to see > what I need to close... Maybe I am overlooking the obvious. > One method on linux is to look at the contents of /proc/<pid>/fd (directory of symlinks) and /proc/<pid>/fdinfo (directory of dynamic text files with current flags and position). Calling getrlimit(RLIMIT_NOFILE) will tell you the maximum number of files that the process is permitted to have open simultaneously. I think sysconf(_SC_OPEN_MAX) is another way to get the same info, if setrlimit hasn't been used. That'd avoid the 64k loop iterations, at least. fcntl(fd, F_GETFD, 0) will return the flags set on a given file descriptor, F_SETFD will set them. FD_CLOEXEC is the bit in the flags that controls if a file is closed when an exec happens. I've never looked at the source for 'lsof' or 'pfiles', those may also contain tricks for finding out what files are open. > Thanks for looking at the code, definitely helpful! > Glad to help where I can. Not that I consider myself an expert in this area... By the way, I may start writing a custom plugin myself soon. Got some ideas that I'd like to experiment with for embedded devices (my day job - Amazon Kindle, Palm Pre, TiVo, etc. 
:) -- Paul From david at lang.hm Fri Apr 3 06:02:23 2009 From: david at lang.hm (david at lang.hm) Date: Thu, 2 Apr 2009 21:02:23 -0700 (PDT) Subject: [rsyslog] RFC: On rsyslog output modules and support for batchoperations In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702ADF9@GRFEXC.intern.adiscon.com> References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch> <9B6E2A8877C38245BFB15CC491A11DA702ADF9@GRFEXC.intern.adiscon.com> Message-ID: On Wed, 1 Apr 2009, Rainer Gerhards wrote: >> -----Original Message----- >> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- >> bounces at lists.adiscon.com] On Behalf Of Luis Fernando Muñoz Mejías >> >> I bet it works. But it's probably too ugly for users. Cleaner ways may >> need deeper changes into rsyslog's API so that the module gets direct >> access to each field. That's probably a lot of work and I can't wait >> for >> that. > > I need to check if there are actually larger changes required. The main > reason for this interface initially was security (do not pass to the module > the full object). given that rsyslog is multi-threaded, not multi-process, any thread can get at the memory of any other thread. this significantly limits the amount of security that you can get by not passing a direct pointer to the full object. while I am a security person (it's my full time job), I'm not sure that it's worth it to limit the official module interface like this. 
David Lang From david at lang.hm Fri Apr 3 06:07:47 2009 From: david at lang.hm (david at lang.hm) Date: Thu, 2 Apr 2009 21:07:47 -0700 (PDT) Subject: [rsyslog] RFC: On rsyslog output modules and support forbatchoperations In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702ADFF@GRFEXC.intern.adiscon.com> References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch><9B6E2A8877C38245BFB15CC491A11DA702ADF9@GRFEXC.intern.adiscon.com> <200904012120.56091.Luis.Fernando.Munoz.Mejias@cern.ch> <9B6E2A8877C38245BFB15CC491A11DA702ADFF@GRFEXC.intern.adiscon.com> Message-ID: On Thu, 2 Apr 2009, Rainer Gerhards wrote: > Agreed, but I think a deep copy does not address anything. With the template > system (in theory but not yet in practice), a plugin can not access any > information other than what the users has configured in the template. If the > full object is passed, this can not prevented. In practice, today, plugins > are loaded in-process and as such can access the whole process space. But > there are ideas to create an out-of-process plugin interface for very > security sensitive environments. They would be hurt (or require additional > configuration) but the "full object access" approach. if you really want to have an output module that's separate from a security point of view, have a lightweight output module (that can have full access to everything) mediate all communication to the external module (that would only get what is sent to it and is a separate process) this would give you the security you are thinking of, but still allow in-process modules to have the increased access to data. 
this isn't the first case where it would have been helpful to have access to more of the properties (the UDP forgery module I sent in was another, but I was able to work around that by adding data to the message and having the plugin parse it out, inefficient, but possible) David Lang From david at lang.hm Fri Apr 3 06:36:03 2009 From: david at lang.hm (david at lang.hm) Date: Thu, 2 Apr 2009 21:36:03 -0700 (PDT) Subject: [rsyslog] RFC: On rsyslog output modules and support for batchoperations In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702AE08@GRFEXC.intern.adiscon.com> References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch> <4255c2570904020722x6ca6a0few1ac1abfe941ee59b@mail.gmail.com> <9B6E2A8877C38245BFB15CC491A11DA702AE08@GRFEXC.intern.adiscon.com> Message-ID: On Thu, 2 Apr 2009, Rainer Gerhards wrote: > Just a partial response (but quick ;)) > >> I confess to being a bit confused as to why the existing output module >> interface wasn't readily extending to batching, > > That would be the real solution (and David Lang suggested it long ago). The > only problem is it takes quite some effort, as we need to make sure we do not > lose messages along that way. for those who are interested, what I proposed was to shift completely away from the idea that the output module processes a fixed number of records, and instead have a loop something like the following. 
while (events) if (# events > N) grab first N events else grab all events create sql string insert to database mark the events grabbed as written with the create sql string being something like the following perlish code $sql=$header.join($mid, at events).$footer; so you could say $header='insert into table logs values (' $mid = '),(' $footer=); and if you pass it three events you get insert into table logs values (msg1),(msg2),(msg3); five values you would get insert into table logs values (msg1),(msg2),(msg3),(msg4),(msg5); I was not concerned about the command parsing time, due to the fact that if it takes a little longer, it just means that there are more events in the queue for the next pass to handle. there could reach a point where you have so many events that it matters, but since this process could easily insert hundreds or thousands of messages in one statement the overhead is pretty low > It is still on my agenda, but without a sponsor I fear it'll stay there > for quite a while. still hoping > In most cases, rsyslog is simply too fast to see a bottleneck. true David Lang From rgerhards at hq.adiscon.com Fri Apr 3 14:18:35 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Fri, 3 Apr 2009 14:18:35 +0200 Subject: [rsyslog] RFC: On rsyslog outputmodulesandsupportforbatchoperations References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch><200904012120.56091.Luis.Fernando.Munoz.Mejias@cern.ch><9B6E2A8877C38245BFB15CC491A11DA702ADFF@GRFEXC.intern.adiscon.com><200904021734.19882.Luis.Fernando.Munoz.Mejias@cern.ch><9B6E2A8877C38245BFB15CC491A11DA702AE0C@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AE0D@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702AE0E@GRFEXC.intern.adiscon.com> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AE1F@GRFEXC.intern.adiscon.com> I have worked on the new interface this morning. As always, there were a couple of subtleties, but I now have applied the patch to the current master. 
Now, an output plugin can receive the template in two ways: either as a string (the current way of doing things) or as an array of string pointers. This is transparent to the end user.

I have not yet created documentation on how to use it, but I have used omstdout during my testing and it shows very well how to work with the new method. It also shows all the necessary plumbing to be compatible with both current and previous rsyslogd versions (if that is of interest to someone). You can find it via gitweb:

http://git.adiscon.com/?p=rsyslog.git;a=blob;f=plugins/omstdout/omstdout.c;h=e491005cca064af2c40c339af18cead9ddaf363d;hb=HEAD

The full patch is here:

http://git.adiscon.com/?p=rsyslog.git;a=commitdiff;h=ec0e2c3e7df6addc02431628daddfeae49b92af7

I will release it as part of the upcoming 4.1.6 devel, due next week.

I hope this is a useful addition. Feedback is appreciated.

Rainer

> -----Original Message-----
> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-bounces at lists.adiscon.com] On Behalf Of Rainer Gerhards
> Sent: Thursday, April 02, 2009 7:00 PM
> To: rsyslog-users
> Subject: Re: [rsyslog] RFC: On rsyslog outputmodulesandsupportforbatchoperations
>
> It is probably clean that we simply define a new public entry point inside
> the rsyslog core that old versions do not have. The output plugin simply uses
> it. If it is tried to be loaded on old rsyslogd, the entry point is not found
> and the loader refuses to load the module. It is somewhat ugly in that the
> error message may be misleading (doc can solve that), but otherwise I think
> it works perfectly well - after all, the only thing we could do is disable
> the module on versions that do not support the functionality. That just
> disables the ability to use an alternate implementation in case of the one
> not available... well.. we can do it that way via an internal API, too, if I
> think correctly.
OK, a solution begins to form ;) > > Rainer > > > -----Original Message----- > > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > > bounces at lists.adiscon.com] On Behalf Of Rainer Gerhards > > Sent: Thursday, April 02, 2009 6:45 PM > > To: rsyslog-users > > Subject: Re: [rsyslog] RFC: On rsyslog output > > modulesandsupportforbatchoperations > > > > > > For instance, if I specify: > > > > > > > > %fromhost%,%timestamp:::date-rfc3339%,%msg% > > > > > > > > Can I have an array of pointers to each already processed > property? > > > > > > I will investigate that. It just occured to me that I have the > > template > > > compiled as a linked list. It should not be too much of an effort > to > > > turn > > > values into a linked list rather than a string. Will check, but no > > > promise. > > > > That was a good discussion result. I think it is trivial to generate > > such a > > linked list. However, it is not as trivial (but I think simple > enough) > > to > > extend the plugin interface so that a plugin may request the linked > > list > > instead of the string. In order to preserve interfaces, I'll probably > > need to > > abuse the ppStrings[] array, so that in the linkedList case a cast is > > necessary. But I think this is clean enough. We just need to ensure > > that the > > plugin does not abort an older rsyslogd and an older rsyslogd does > not > > abort > > the plugin ;) A lot of chore is on the plugin, I think, in checking > > that > > everything is actually available... > > > > >From the user's perspective, the same template syntax is used, no > > matter how > > it is passed to the plugin. This can be considered nice (but some may > > not > > like it ;)). > > > > Will try to dig deeper into this. 
> > > > Rainer > > _______________________________________________ > > rsyslog mailing list > > http://lists.adiscon.net/mailman/listinfo/rsyslog > > http://www.rsyslog.com > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com From friedl at hq.adiscon.com Fri Apr 3 15:06:16 2009 From: friedl at hq.adiscon.com (Florian Riedl) Date: Fri, 3 Apr 2009 15:06:16 +0200 Subject: [rsyslog] rsyslog 3.20.5 released (v3-stable) Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AE25@GRFEXC.intern.adiscon.com> Hi all, rsyslog 3.20.5, a member of the v3-stable branch, has been released. This is a bug-fixing released that also comes with slightly enhanced documentation. Most importantly, a bug in RainerScript number conversion and two potential segfaults have been fixed. Full details can be seen in the Changelog: http://www.rsyslog.com/Article356.phtml Download: http://www.rsyslog.com/Downloads-req-viewdownloaddetails-lid-151.phtml As always, feedback is appreciated. Florian Riedl -- Support ======= Improving rsyslog is costly, but you can help! We are looking for organizations that find rsyslog useful and wish to contribute back. You can contribute by reporting bugs, improve the software, or donate money or equipment. Commercial support contracts for rsyslog are available, and they help finance continued maintenance. Adiscon GmbH, a privately held German company, is currently funding rsyslog development. We are always looking for interesting development projects. For details on how to help, please see http://www.rsyslog.com/doc-how2help.html . 
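
As an aside to the batching loop David Lang described earlier in this thread, the idea can be sketched roughly as below. This is only an illustration in Python, not rsyslog code: `build_insert`, `drain_in_batches` and the `execute` callback are names made up for the example, the SQL header is copied verbatim from David's message, and a real module would presumably use bound parameters instead of pasting values into the statement.

```python
from collections import deque

# Pieces of the statement, taken from David Lang's example in this thread.
HEADER = "insert into table logs values ("
MID = "),("
FOOTER = ");"


def build_insert(events):
    """Join already-formatted event values into one multi-row statement."""
    return HEADER + MID.join(events) + FOOTER


def drain_in_batches(queue, max_batch, execute):
    """Send up to max_batch events per pass until the queue is empty.

    Events are removed from the queue only after execute() has returned,
    so a failing insert (an exception) leaves them queued for a retry.
    """
    while queue:
        count = min(max_batch, len(queue))
        batch = [queue[i] for i in range(count)]  # peek, do not remove yet
        execute(build_insert(batch))
        for _ in range(count):
            queue.popleft()  # "mark the events grabbed as written"


# Hypothetical usage: collect the statements instead of talking to a database.
statements = []
pending = deque(["msg1", "msg2", "msg3", "msg4", "msg5"])
drain_in_batches(pending, 3, statements.append)
for statement in statements:
    print(statement)
```

The batch size is an upper bound rather than a trigger: a pass never waits for N events, it simply takes whatever is queued, which is why a slow insert just means a larger next batch.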
From rgerhards at hq.adiscon.com Fri Apr 3 15:51:18 2009
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Fri, 3 Apr 2009 15:51:18 +0200
Subject: [rsyslog] RFC: On rsyslogoutputmodulesandsupportforbatchoperations
References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch><200904012120.56091.Luis.Fernando.Munoz.Mejias@cern.ch><9B6E2A8877C38245BFB15CC491A11DA702ADFF@GRFEXC.intern.adiscon.com><200904021734.19882.Luis.Fernando.Munoz.Mejias@cern.ch><9B6E2A8877C38245BFB15CC491A11DA702AE0C@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AE0D@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AE0E@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702AE1F@GRFEXC.intern.adiscon.com>
Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AE26@GRFEXC.intern.adiscon.com>

I have now also created at least a bit of developer documentation:

http://www.rsyslog.com/doc-dev_oplugins.html

Rainer

From tbergfeld at hq.adiscon.com Fri Apr 3 16:09:40 2009
From: tbergfeld at hq.adiscon.com (Tom Bergfeld)
Date: Fri, 3 Apr 2009 16:09:40 +0200
Subject: [rsyslog] rsyslog 3.21.11 (beta) released
Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AE2A@GRFEXC.intern.adiscon.com>

Hi all,

rsyslog 3.21.11, a member of the beta branch, has been released today. It is a bug-fixing release. Most importantly, it has the build system improvements contributed by Michael Biebl - thx! Furthermore, all patches from 3.20.5 are incorporated (see its ChangeLog entry). This is a recommended update for all users of the beta branch.

Change Log:
http://www.rsyslog.com/Article358.phtml

Download:
http://www.rsyslog.com/Downloads-req-viewdownloaddetails-lid-152.phtml

As always, feedback is appreciated.

Tom Bergfeld
From rgerhards at hq.adiscon.com Fri Apr 3 18:12:09 2009
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Fri, 3 Apr 2009 18:12:09 +0200
Subject: [rsyslog] Weird problems when combining rsyslog 3 and 4
References: <000d01c9af24$6fb6e7ff$100013ac@intern.adiscon.com>
Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AE2D@GRFEXC.intern.adiscon.com>

Sorry, this slipped my attention. However, I have just added this case to the parser test suite and I do not see any parsing error. Maybe a problem with the template (but I don't think so)? Could you retry and provide me with a debug log (I need both parsing and sending) from when this problem occurred?

Thanks,
Rainer

> -----Original Message-----
> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-bounces at lists.adiscon.com] On Behalf Of Rainer Gerhards
> Sent: Friday, March 27, 2009 10:38 PM
> To: rsyslog-users
> Subject: Re: [rsyslog] Weird problems when combining rsyslog 3 and 4
>
> These samples are enough, no need to disclose more. Single lines are
> sufficient, as long as they can repro the problem :)
>
> rainer
>
> ----- Original Message -----
> From: "Luis Fernando Muñoz Mejías"
> To: "rsyslog-users"
> Sent: 27.03.09 19:23
> Subject: Re: [rsyslog] Weird problems when combining rsyslog 3 and 4
>
> Rainer,
>
> > Can you send me an on-the-wire sample of those messages (I mean those that are
> > invalidly interpreted). I have now created the parser test suite and they
> > would make a good addition, especially as I need to troubleshoot them ;)
> >
> > Rainer
>
> Before disclosing enough data I have to ask for permission. I can tell
> you that the last hop in this relay chain is using rsyslog v3, and that
> the format I got (tcpdump dixit) for these messages is always like this:
>
> <38>Mar 27 19:06:53 source_server sshd(pam_unix)[12750]: session opened for user foo by (uid=0)
>
> And what gets actually logged for that is:
>
> 2009-03-27T19:06:53+01:00 last_hop_server source_server sshd(pam_unix)[12750]: session opened for user foo by (uid=0)
>
> Then, last_hop_server becomes %hostname% and source_server becomes
> %syslogtag%.
>
> This last hop server is using rsyslog v3, so it seems to me I have to
> instruct v4 that the input is coming in a non-default format.
>
> Cheers.
> --
> Luis Fernando Muñoz Mejías
> Luis.Fernando.Munoz.Mejias at cern.ch

From tbergfeld at hq.adiscon.com Tue Apr 7 15:36:04 2009
From: tbergfeld at hq.adiscon.com (Tom Bergfeld)
Date: Tue, 7 Apr 2009 15:36:04 +0200
Subject: [rsyslog] rsyslog 4.1.6 (devel) released
Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AE4F@GRFEXC.intern.adiscon.com>

Hi all,

we have just released rsyslog 4.1.6, a member of the development branch. The new version offers numerous enhancements and also many bug fixes. Most importantly, RainerScript has been improved (functions are now supported), native CSV support has been added, omfile now detects errors writing to files and can retry the operation, and much more. The testbench has been greatly enhanced, portability has been improved and performance has increased in some cases. The output plugin interface now supports an enhanced API and a number of bugs have been fixed.
Change Log: http://www.rsyslog.com/Article360.phtml Download: http://www.rsyslog.com/Downloads-req-viewdownloaddetails-lid-153.phtml As always, feedback is appreciated. Tom Bergfeld -- Support ======= Improving rsyslog is costly, but you can help! We are looking for organizations that find rsyslog useful and wish to contribute back. You can contribute by reporting bugs, improve the software, or donate money or equipment. Commercial support contracts for rsyslog are available, and they help finance continued maintenance. Adiscon GmbH, a privately held German company, is currently funding rsyslog development. We are always looking for interesting development projects. For details on how to help, please see http://www.rsyslog.com/doc-how2help.html. From rgerhards at hq.adiscon.com Wed Apr 8 14:06:31 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Wed, 8 Apr 2009 14:06:31 +0200 Subject: [rsyslog] nextmaster branch Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AE60@GRFEXC.intern.adiscon.com> Hi all, and especially those that follow my development. Please read this blogpost on the "nextmaster" branch: http://blog.gerhards.net/2009/04/what-is-nextmaster-good-for.html Thanks, Rainer From rgerhards at hq.adiscon.com Thu Apr 9 10:58:25 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Thu, 9 Apr 2009 10:58:25 +0200 Subject: [rsyslog] wrong permissons on directories References: <49B0EA3C.1060104@1und1.de><9B6E2A8877C38245BFB15CC491A11DA71F5F@GRFEXC.intern.adiscon.com><49B12FA3.2030202@1und1.de><9B6E2A8877C38245BFB15CC491A11DA71F63@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA71F68@GRFEXC.intern.adiscon.com> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AE72@GRFEXC.intern.adiscon.com> I am back at this issue and thought about changing the default down to v2-stable. However, it "feels" bad from a security perspective. I know that the current default does not work well, but it is extremely restrictive. 
So if I now change it to a "useful" default, I may expose some information on old systems that is not yet exposed. One could argue this is a security hole. I am very hesitant to doing this, so I thought I ask for feedback once again. The alternative way would be that only v4 (if running in v4-mode!) will have the new (correct) default, while all others have the old, wrong and thus extremely restrictive default. Quite honestly, it "feels" like this is the right route to take, even though "the other way around" sounds more natural. Has anyone an opinion on that? And I'll probably go for the v4-only change if nobody convinces me that there is no security risk... Thanks, Rainer > -----Original Message----- > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > bounces at lists.adiscon.com] On Behalf Of Rainer Gerhards > Sent: Friday, March 06, 2009 4:40 PM > To: rsyslog-users > Subject: Re: [rsyslog] wrong permissons on directories > > The more I think about it, the more it smells like a real bug. Has > anyone objections changing the default? > > Rainer > > > -----Original Message----- > > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > > bounces at lists.adiscon.com] On Behalf Of Michael Biebl > > Sent: Friday, March 06, 2009 3:54 PM > > To: rsyslog-users > > Subject: Re: [rsyslog] wrong permissons on directories > > > > FWIW, the Debian default rsyslog.conf ships with > > > > $DirCreateMode 0755 > > > > > > 2009/3/6 Rainer Gerhards : > > > Thomas, > > > > > > do I correctly understand that you propose the default be changed? > > > > > > If so, I am hesitant to do that - wouldn't that potentially break > > existing deployments? On the other hand... how could that work... > > Umm... 
> > > > > > Rainer > > > > > >> -----Original Message----- > > >> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > > >> bounces at lists.adiscon.com] On Behalf Of Thomas Mieslinger > > >> Sent: Friday, March 06, 2009 3:14 PM > > >> To: rsyslog-users > > >> Subject: Re: [rsyslog] wrong permissons on directories > > >> > > >> Thanks for the pointer to the documentation.. it is $DirCreateMode > > what > > >> I asked for... > > >> > > >> and now I ask for a change of the default > > >> documentation says: > > >> Default: 0644 > > >> > > >> Reality demands 0755. I changed it in my configuration. I'd be > happy > > to > > >> see that changed in rsyslog. > > >> > > >> Thomas > > >> > > >> > > >> > > >> Rainer Gerhards wrote: > > >> > Hi Thomas, > > >> > > > >> > can it be that your default umask gets into your way? In any > case, > > >> you > > >> > can set the permissions explicitely with > > >> > > > >> > $FileCreateMode > > >> > $FileGroup > > >> > $FileOwner > > >> > > > >> > And set the umask with > > >> > > > >> > $umask > > >> > > > >> > (see http://www.rsyslog.com/doc-rsyslog_conf_global.html) > > >> > > > >> > Does this help? 
> > >> > > > >> > Rainer > > >> > > > >> >> -----Original Message----- > > >> >> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > > >> >> bounces at lists.adiscon.com] On Behalf Of Thomas Mieslinger > > >> >> Sent: Friday, March 06, 2009 10:18 AM > > >> >> To: rsyslog-users > > >> >> Subject: [rsyslog] wrong permissons on directories > > >> >> > > >> >> Hi *, > > >> >> > > >> >> when creating directories through dynamic templates, the > > directory > > >> >> permissons are incomplete: > > >> >> > > >> >> rsyslog.conf: > > >> >> $template > > >> >> > > >> > ZeusMwAllLogFileService,"/data/log/zeusmw/%$YEAR%-%$MONTH%/all- > > >> %$YEAR%- > > >> >> %$MONTH%-%$DAY%.log" > > >> >> > > >> >> resulting directories: > > >> >> ls -al /data/log > > >> >> drw-r--r-- 3 root root 4096 Mar ?5 15:53 zeusmw/ > > >> >> > > >> >> ls -al /data/log/zeusmw > > >> >> drw-r--r-- 2 root root 4096 Mar ?6 10:11 2009-03/ > > >> >> > > >> >> # rsyslogd -version > > >> >> rsyslogd 3.21.3, compiled with: > > >> >> ? ?FEATURE_REGEXP: ? ? ? ? ? ? ? ? ? ? ? ? Yes > > >> >> ? ?FEATURE_LARGEFILE: ? ? ? ? ? ? ? ? ? ? ?Yes > > >> >> ? ?FEATURE_NETZIP (message compression): ? Yes > > >> >> ? ?GSSAPI Kerberos 5 support: ? ? ? ? ? ? ?Yes > > >> >> ? ?FEATURE_DEBUG (debug build, slow code): No > > >> >> ? ?Runtime Instrumentation (slow code): ? ?No > > >> >> > > >> >> (its the rsyslog-3.21.3-4 fedora 10 package compiled on rhel5) > > >> >> > > >> >> I'd be happy to know if thats a bug. 
> > >> >> > > >> >> Thanks > > >> >> Thomas > > >> >> > > >> >> _______________________________________________ > > >> >> rsyslog mailing list > > >> >> http://lists.adiscon.net/mailman/listinfo/rsyslog > > >> >> http://www.rsyslog.com > > >> > _______________________________________________ > > >> > rsyslog mailing list > > >> > http://lists.adiscon.net/mailman/listinfo/rsyslog > > >> > http://www.rsyslog.com > > >> > > >> -- > > >> Thomas Mieslinger > > >> IT Infrastructure Systems > > >> Telefon: +49-721-91374-4404 > > >> E-Mail: thomas.mieslinger at 1und1.de > > >> > > >> 1&1 Internet AG > > >> Brauerstra?e 48 > > >> 76135 Karlsruhe > > >> > > >> Amtsgericht Montabaur HRB 6484 > > >> Vorstand: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, > Thomas > > >> Gottschlich, Robert Hoffmann, Markus Huhn, Henning Kettler, Oliver > > >> Mauss, Jan Oetjen > > >> Aufsichtsratsvorsitzender: Michael Scheeren > > >> > > >> _______________________________________________ > > >> rsyslog mailing list > > >> http://lists.adiscon.net/mailman/listinfo/rsyslog > > >> http://www.rsyslog.com > > > _______________________________________________ > > > rsyslog mailing list > > > http://lists.adiscon.net/mailman/listinfo/rsyslog > > > http://www.rsyslog.com > > > > > > > > > > > -- > > Why is it that all of the instruments seeking intelligent life in the > > universe are pointed away from Earth? 
> > _______________________________________________ > > rsyslog mailing list > > http://lists.adiscon.net/mailman/listinfo/rsyslog > > http://www.rsyslog.com > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com From rgerhards at hq.adiscon.com Thu Apr 9 13:03:38 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Thu, 9 Apr 2009 13:03:38 +0200 Subject: [rsyslog] info request: using unicode inside rsyslog Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AE83@GRFEXC.intern.adiscon.com> Hi all, once again the issue was brought up of using Unicode inside rsyslog. As a reminder, rsyslog was forked from sysklogd and has inherited its 8 bit char representation. For several reasons, I would like to move the internal representation to Unicode (so at least 16 bit chars). Does anyone have any advice (or links) on what needs to be taken care of for such an endeavor. Opinions are also greatly appreciated. The current reason for my interest is questions regarding asian character sets. But of course, we should also support Unicode to implement all fine details of RFC5424. So this is becoming a more and more pressing issue. 
Thanks, Rainer From aoz.syn at gmail.com Thu Apr 9 14:14:52 2009 From: aoz.syn at gmail.com (RB) Date: Thu, 9 Apr 2009 06:14:52 -0600 Subject: [rsyslog] wrong permissons on directories In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702AE72@GRFEXC.intern.adiscon.com> References: <49B0EA3C.1060104@1und1.de> <9B6E2A8877C38245BFB15CC491A11DA71F5F@GRFEXC.intern.adiscon.com> <49B12FA3.2030202@1und1.de> <9B6E2A8877C38245BFB15CC491A11DA71F63@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA71F68@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702AE72@GRFEXC.intern.adiscon.com> Message-ID: <4255c2570904090514v2592c7efo85d3667f937074eb@mail.gmail.com> On Thu, Apr 9, 2009 at 02:58, Rainer Gerhards wrote: > the current default does not work well, but it is extremely restrictive. So It's not that it doesn't work well, it honestly doesn't work at all. A directory in UNIX without execute permissions is effectively inaccessible to any non-root user, encouraging less-knowledgeable admins to just run everything as root. > Has anyone an opinion on that? And I'll probably go for the v4-only change if > nobody convinces me that there is no security risk... The only risk is that users originally granted permission to use a directory may actually be allowed to do so. If a user's data is sufficiently sensitive that such a change would unacceptably expose it, my bet is that they have already changed the permissions to something even more restrictive. I wouldn't suggest making the change if it's the only one you need to make to v2, but if there are others pending it would be a wise addition IMHO. 
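
To make the point about the missing execute bit concrete, here is a small Python sketch (not part of rsyslog) that decodes a directory mode and reports whether "other" users could traverse the directory. The helper name `describe_dir_mode` is made up for this illustration; 0644 is the documented default discussed in this thread, 0755 is the value Debian ships, and 0700 is included only for comparison.

```python
import stat


def describe_dir_mode(mode):
    """Return (ls-style permission string, can 'other' users traverse it?)."""
    text = stat.filemode(stat.S_IFDIR | mode)
    # For a directory, the execute bit means "may enter / look up names in it".
    other_can_traverse = bool(mode & stat.S_IXOTH)
    return text, other_can_traverse


for mode in (0o644, 0o755, 0o700):
    text, ok = describe_dir_mode(mode)
    print(f"{mode:04o} {text} other-can-traverse={ok}")
```

With 0644 the string comes out as `drw-r--r--`, the same pattern seen in the `ls -al` output quoted in the original report: the read bit lets non-root users list names, but without the execute bit they cannot open anything inside.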
From rgerhards at hq.adiscon.com Thu Apr 9 14:19:56 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Thu, 9 Apr 2009 14:19:56 +0200 Subject: [rsyslog] wrong permissons on directories References: <49B0EA3C.1060104@1und1.de><9B6E2A8877C38245BFB15CC491A11DA71F5F@GRFEXC.intern.adiscon.com><49B12FA3.2030202@1und1.de><9B6E2A8877C38245BFB15CC491A11DA71F63@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA71F68@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AE72@GRFEXC.intern.adiscon.com> <4255c2570904090514v2592c7efo85d3667f937074eb@mail.gmail.com> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AE84@GRFEXC.intern.adiscon.com> > -----Original Message----- > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > bounces at lists.adiscon.com] On Behalf Of RB > Sent: Thursday, April 09, 2009 2:15 PM > To: rsyslog-users > Subject: Re: [rsyslog] wrong permissons on directories > > On Thu, Apr 9, 2009 at 02:58, Rainer Gerhards > wrote: > > the current default does not work well, but it is extremely > restrictive. So > > It's not that it doesn't work well, it honestly doesn't work at all. Well... that's the issue that I see. It works, as rsyslog usually runs as root. Granted, nobody but root can read the directories, but this is exactly what I meant with being restrictive. If we fix this issue, we permit access to these directories and as such are more open than before. I wouldn't be arguing so hard if it were not a potential security issue... In other words: I am not yet fully convinced (even not after reading the rest of your post ;)). But I am getting closer to being convinced ;) Rainer > A directory in UNIX without execute permissions is effectively > inaccessible to any non-root user, encouraging less-knowledgeable > admins to just run everything as root. > > > Has anyone an opinion on that? And I'll probably go for the v4-only > change if > > nobody convinces me that there is no security risk... 
> > The only risk is that users originally granted permission to use a > directory may actually be allowed to do so. If a user's data is > sufficiently sensitive that such a change would unacceptably expose > it, my bet is that they have already changed the permissions to > something even more restrictive. I wouldn't suggest making the change > if it's the only one you need to make to v2, but if there are others > pending it would be a wise addition IMHO. > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com From aoz.syn at gmail.com Thu Apr 9 14:34:12 2009 From: aoz.syn at gmail.com (RB) Date: Thu, 9 Apr 2009 06:34:12 -0600 Subject: [rsyslog] wrong permissons on directories In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702AE84@GRFEXC.intern.adiscon.com> References: <49B0EA3C.1060104@1und1.de> <9B6E2A8877C38245BFB15CC491A11DA71F5F@GRFEXC.intern.adiscon.com> <49B12FA3.2030202@1und1.de> <9B6E2A8877C38245BFB15CC491A11DA71F63@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA71F68@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702AE72@GRFEXC.intern.adiscon.com> <4255c2570904090514v2592c7efo85d3667f937074eb@mail.gmail.com> <9B6E2A8877C38245BFB15CC491A11DA702AE84@GRFEXC.intern.adiscon.com> Message-ID: <4255c2570904090534n68feb052j5bd2937009160208@mail.gmail.com> On Thu, Apr 9, 2009 at 06:19, Rainer Gerhards wrote: > In other words: I am not yet fully convinced (even not after reading the rest > of your post ;)). But I am getting closer to being convinced ;) :) I haven't any further arguments, so we may have to stop halfway. As a security "professional" (whatever that ends up meaning) I tend to prefer developers allow me to make that choice, but understand the balance you have to make between that and helping your users make wise (if erring on the side of cautious) decisions, particularly with "legacy" software. 
From rgerhards at hq.adiscon.com Thu Apr 9 18:50:50 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Thu, 9 Apr 2009 18:50:50 +0200 Subject: [rsyslog] wrong permissons on directories References: <49B0EA3C.1060104@1und1.de><9B6E2A8877C38245BFB15CC491A11DA71F5F@GRFEXC.intern.adiscon.com><49B12FA3.2030202@1und1.de><9B6E2A8877C38245BFB15CC491A11DA71F63@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA71F68@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AE72@GRFEXC.intern.adiscon.com><4255c2570904090514v2592c7efo85d3667f937074eb@mail.gmail.com><9B6E2A8877C38245BFB15CC491A11DA702AE84@GRFEXC.intern.adiscon.com> <4255c2570904090534n68feb052j5bd2937009160208@mail.gmail.com> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AE8A@GRFEXC.intern.adiscon.com> > -----Original Message----- > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > bounces at lists.adiscon.com] On Behalf Of RB > Sent: Thursday, April 09, 2009 2:34 PM > To: rsyslog-users > Subject: Re: [rsyslog] wrong permissons on directories > > On Thu, Apr 9, 2009 at 06:19, Rainer Gerhards > wrote: > > In other words: I am not yet fully convinced (even not after reading > the rest > > of your post ;)). But I am getting closer to being convinced ;) > > :) I haven't any further arguments, so we may have to stop halfway. Maybe some other folks cast their ballot - but it was probably not smart to send this mail directly before easter ;) > As a security "professional" (whatever that ends up meaning) I tend to > prefer developers allow me to make that choice, Actually, it is your choice. Let me explain, in case there is a misunderstanding. You have full control over the directory permissions, via the $DirCreateMode [1] directive. For example, Michael Biebl was so smart to include a "$DirCreateMode 0755" in the standard Debian configuration, so it almost is a no-issue there. What I am talking about is the default for this setting, the case when nothing was specified by the user. 
> but understand the > balance you have to make between that and helping your users make wise I am not talking about wise vs. unwise decisions. My concern is that in current releases, the default is off, but it also means it is somewhat strict. If I now change the default (which would be wise), it may result in relaxed access control permissions. And as this affects users who so far did not care at all about the permissions, those users may never know - that is what triggers some "bad feelings" inside me. As a side-note, I wonder if a default of 0700 might be even wiser than "755". Anyone who doesn't like that can override it. As the default is probably a "pain in the a..." for people, they would possibly begin thinking about that aspect (but on the other hand I already envision all those smart web sites that tell you just to use "$DirCreateMode 0777" to "fix the issue" - so this may even be less useful than starting with 755 in the first place). The more I think about it, this whole issue is much less about technical defaults than about human nature ;) > (if erring on the side of cautious) decisions, particularly with > "legacy" software. 
I hope this clarifies, Rainer [1] http://www.rsyslog.com/doc-rsconf1_dircreatemode.html From mbiebl at gmail.com Thu Apr 9 20:28:50 2009 From: mbiebl at gmail.com (Michael Biebl) Date: Thu, 9 Apr 2009 20:28:50 +0200 Subject: [rsyslog] wrong permissons on directories In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702AE8A@GRFEXC.intern.adiscon.com> References: <49B0EA3C.1060104@1und1.de> <49B12FA3.2030202@1und1.de> <9B6E2A8877C38245BFB15CC491A11DA71F63@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA71F68@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702AE72@GRFEXC.intern.adiscon.com> <4255c2570904090514v2592c7efo85d3667f937074eb@mail.gmail.com> <9B6E2A8877C38245BFB15CC491A11DA702AE84@GRFEXC.intern.adiscon.com> <4255c2570904090534n68feb052j5bd2937009160208@mail.gmail.com> <9B6E2A8877C38245BFB15CC491A11DA702AE8A@GRFEXC.intern.adiscon.com> Message-ID: 2009/4/9 Rainer Gerhards : >> -----Original Message----- >> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- >> bounces at lists.adiscon.com] On Behalf Of RB >> Sent: Thursday, April 09, 2009 2:34 PM >> To: rsyslog-users >> Subject: Re: [rsyslog] wrong permissons on directories >> >> On Thu, Apr 9, 2009 at 06:19, Rainer Gerhards >> wrote: >> > In other words: I am not yet fully convinced (even not after reading >> the rest >> > of your post ;)). But I am getting closer to being convinced ;) >> >> :) I haven't any further arguments, so we may have to stop halfway. > > Maybe some other folks cast their ballot - but it was probably not smart to > send this mail directly before easter ;) I'd vote for changing the default. The current one is simply buggy, and as such I'd treat the change as a bug fix. I wouldn't wait for 4.x, but fix it in the upcoming 3.22.x series. I wouldn't change 3.20.x. My 2¢, Michael -- Why is it that all of the instruments seeking intelligent life in the universe are pointed away from Earth? 
From david at lang.hm Sun Apr 12 04:53:47 2009 From: david at lang.hm (david at lang.hm) Date: Sat, 11 Apr 2009 19:53:47 -0700 (PDT) Subject: [rsyslog] wrong permissons on directories In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702AE8A@GRFEXC.intern.adiscon.com> References: <49B0EA3C.1060104@1und1.de><9B6E2A8877C38245BFB15CC491A11DA71F5F@GRFEXC.intern.adiscon.com><49B12FA3.2030202@1und1.de><9B6E2A8877C38245BFB15CC491A11DA71F63@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA71F68@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AE72@GRFEXC.intern.adiscon.com><4255c2570904090514v2592c7efo85d3667f937074eb@mail.gmail.com><9B6E2A8877C38245BFB15CC491A11DA702AE84@GRFEXC.intern.adiscon.com> <4255c2570904090534n68feb052j5bd2937009160208@mail.gmail.com> <9B6E2A8877C38245BFB15CC491A11DA702AE8A@GRFEXC.intern.adiscon.com> Message-ID: On Thu, 9 Apr 2009, Rainer Gerhards wrote: >> -----Original Message----- >> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- >> bounces at lists.adiscon.com] On Behalf Of RB >> >> On Thu, Apr 9, 2009 at 06:19, Rainer Gerhards >> wrote: >>> In other words: I am not yet fully convinced (even not after reading >> the rest >>> of your post ;)). But I am getting closer to being convinced ;) >> >> :) I haven't any further arguments, so we may have to stop halfway. > > Maybe some other folks cast their ballot - but it was probably not smart to > send this mail directly before easter ;) > >> As a security "professional" (whatever that ends up meaning) I tend to >> prefer developers allow me to make that choice, > > Actually, it is your choice. Let me explain, in case there is a > misunderstanding. You have full control over the directory permissions, via > the $DirCreateMode [1] directive. For example, Michael Biebl was so smart to > include a "$DirCreateMode 0755" in the standard Debian configuration, so it > almost is a no-issue there. 
What I am talking about is the default for this > setting, the case when nothing was specified by the user. > >> but understand the >> balance you have to make between that and helping your users make wise > > I am not talking about wise vs. unwise decisions. My concern is that in > current releases, the default is off, but it also means it is somewhat > strict. If I now change the default (which would be wise), it may result in > relaxed access control permissions. And as this affects users who so far did > not care at all about the permissions, those users may never know - that is > what triggers some "bad feelings" inside me. > > As a side-note, I wonder if a default of 0700 might be even wiser than "755". > Who doesn't like that can override it. As the default is probably "pain in > the a..." for people, they would possibly begin thinking about that aspect > (but on the other hand I already envision all those smart web sites that tell > you just to use "$DirCreateMode 0777" to "fix the issue" - so this may even > be less useful than starting with 755 in the first place. The current default doesn't work at all, so it's definitely wrong. Either 700 or 755 would be a better default. I can see arguments about system logs not being intended to be read by everyone, so if you want to run rsyslog as root, having the default be 700 is reasonable. David Lang > The more I think about it, this whole issue is much less about technical > defaults but more about human nature ;) > >> (if erring on the side of cautious) decisions, particularly with >> "legacy" software. 
> > I hope this clarifies, > Rainer > [1] http://www.rsyslog.com/doc-rsconf1_dircreatemode.html > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com > From rgerhards at hq.adiscon.com Tue Apr 14 15:57:33 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Tue, 14 Apr 2009 15:57:33 +0200 Subject: [rsyslog] wrong permissons on directories References: <49B0EA3C.1060104@1und1.de><9B6E2A8877C38245BFB15CC491A11DA71F5F@GRFEXC.intern.adiscon.com><49B12FA3.2030202@1und1.de><9B6E2A8877C38245BFB15CC491A11DA71F63@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA71F68@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AE72@GRFEXC.intern.adiscon.com><4255c2570904090514v2592c7efo85d3667f937074eb@mail.gmail.com><9B6E2A8877C38245BFB15CC491A11DA702AE84@GRFEXC.intern.adiscon.com><4255c2570904090534n68feb052j5bd2937009160208@mail.gmail.com><9B6E2A8877C38245BFB15CC491A11DA702AE8A@GRFEXC.intern.adiscon.com> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AE97@GRFEXC.intern.adiscon.com> Thanks to everyone who commented. I will now change the default to 700, which should not expose anything more than we already had (and also is a better default as I outlined). As we all have concluded that the previous default is buggy, I'll change it wherever the problem is, that means I start with v2-stable and will end up with a patch to all currently supported versions. You'll see announcements soon... 
Rainer > -----Original Message----- > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > bounces at lists.adiscon.com] On Behalf Of david at lang.hm > Sent: Sunday, April 12, 2009 4:54 AM > To: rsyslog-users > Subject: Re: [rsyslog] wrong permissons on directories > > On Thu, 9 Apr 2009, Rainer Gerhards wrote: > > >> -----Original Message----- > >> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > >> bounces at lists.adiscon.com] On Behalf Of RB > >> > >> On Thu, Apr 9, 2009 at 06:19, Rainer Gerhards > > >> wrote: > >>> In other words: I am not yet fully convinced (even not after reading > >> the rest > >>> of your post ;)). But I am getting closer to being convinced ;) > >> > >> :) I haven't any further arguments, so we may have to stop halfway. > > > > Maybe some other folks cast their ballot - but it was probably not > smart to > > send this mail directly before easter ;) > > > >> As a security "professional" (whatever that ends up meaning) I tend > to > >> prefer developers allow me to make that choice, > > > > Actually, it is your choice. Let me explain, in case there is a > > misunderstanding. You have full control over the directory > permissions, via > > the $DirCreateMode [1] directive. For example, Michael Biebl was so > smart to > > include a "$DirCreateMode 0755" in the standard Debian configuration, > so it > > almost is a no-issue there. What I am talking about is the default for > this > > setting, the case when nothing was specified by the user. > > > >> but understand the > >> balance you have to make between that and helping your users make > wise > > > > I am not talking about wise vs. unwise decisions. My concern is that > in > > current releases, the default is off, but it also means it is somewhat > > strict. If I now change the default (which would be wise), it may > result in > > relaxed access control permissions. 
And as this affects users who so > far did > > not care at all about the permissions, those users may never know - > that is > > what triggers some "bad feelings" inside me. > > > > As a side-note, I wonder if a default of 0700 might be even wiser than > "755". > > Who doesn't like that can override it. As the default is probably > "pain in > > the a..." for people, they would possibly begin thinking about that > aspect > > (but on the other hand I already envison all those smart web sites > that tell > > you just to use "$DirCreateMode 0777" to "fix the issue" - so this may > even > > be less useful than starting with 755 in the first place. > > the current default doesn't work at all, so it's definantly wrong. > > either 700 or 755 would be a better default. I can see arguments about > system logs not being intended to be read by everyone, so if you want to > run rsyslog as root having the default be 700 is reasonable. > > David Lang > > > The more I think about it, this whole issue is much less about > technical > > defaults but more about human nature ;) > > > >> (if erring on the side of cautious) decisions, particularly with > >> "legacy" software. > > > > I hope this clarifies, > > Rainer > > [1] http://www.rsyslog.com/doc-rsconf1_dircreatemode.html > > _______________________________________________ > > rsyslog mailing list > > http://lists.adiscon.net/mailman/listinfo/rsyslog > > http://www.rsyslog.com > > > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com From tbergfeld at hq.adiscon.com Wed Apr 15 08:33:08 2009 From: tbergfeld at hq.adiscon.com (Tom Bergfeld) Date: Wed, 15 Apr 2009 08:33:08 +0200 Subject: [rsyslog] rsyslog 2.0.7 (stable) released Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AEA0@GRFEXC.intern.adiscon.com> Hi all, We have just released rsyslog 2.0.7, a member of the v2-stable branch. 
This is a bug-fix release that resolves some minor bugs discovered over the past months. Most importantly, some issues with dynamically created files were fixed, as well as two memory leaks. While one of them was not expected to be seen in practice, there was one memory leak in the Postgres output module that could cause harm. This is a recommended update for all v2-stable users. Other than on RHEL, 2.0.7 seems to be no longer used, but we still support it. So if you experience issues with that version, please let us know. Changelog: http://www.rsyslog.com/Article362.phtml Download: http://www.rsyslog.com/Downloads-req-viewdownloaddetails-lid-154.phtml As always, feedback is appreciated. Tom Bergfeld -- Support ======= Improving rsyslog is costly, but you can help! We are looking for organizations that find rsyslog useful and wish to contribute back. You can contribute by reporting bugs, improving the software, or donating money or equipment. Commercial support contracts for rsyslog are available, and they help finance continued maintenance. Adiscon GmbH, a privately held German company, is currently funding rsyslog development. We are always looking for interesting development projects. For details on how to help, please see http://www.rsyslog.com/doc-how2help.html. _______________________________________________ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com From DGillies at fairfaxdigital.com.au Thu Apr 16 02:15:21 2009 From: DGillies at fairfaxdigital.com.au (David Gillies) Date: Thu, 16 Apr 2009 10:15:21 +1000 Subject: [rsyslog] rsyslog 3.21.11 and gnutls Message-ID: <4310250BC419AC46BB47F728902B0DD6046664E5@EXCHDP3.ffx.jfh.com.au> Hi All, I've been rolling my own rsyslog 3.21.x rpms for centos5/rhel5 for a while now. I noticed in the latest rsyslog version 3.21.11 that running ./configure says that the build requires at least gnutls 2.0.0. Is this a hard requirement? rhel5/centos5 only has gnutls 1.4.1. 
David Gillies Systems Engineer Digital Infrastructure Services Fairfax Digital The information contained in this e-mail message and any accompanying files is or may be confidential. If you are not the intended recipient, any use, dissemination, reliance, forwarding, printing or copying of this e-mail or any attached files is unauthorised. This e-mail is subject to copyright. No part of it should be reproduced, adapted or communicated without the written consent of the copyright owner. If you have received this e-mail in error please advise the sender immediately by return e-mail or telephone and delete all copies. Fairfax does not guarantee the accuracy or completeness of any information contained in this e-mail or attached files. Internet communications are not secure, therefore Fairfax does not accept legal responsibility for the contents of this message or attached files. From rgerhards at hq.adiscon.com Thu Apr 16 08:05:24 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Thu, 16 Apr 2009 08:05:24 +0200 Subject: [rsyslog] rsyslog 3.21.11 and gnutls References: <4310250BC419AC46BB47F728902B0DD6046664E5@EXCHDP3.ffx.jfh.com.au> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AEAA@GRFEXC.intern.adiscon.com> This is not a hard requirement as long as the older version links. The 2.0.0 minimum was a conservative choice we made when adding the check logic. I don't have a system with an older release to test on, but I think I initially developed it with 1.something. So if you can confirm that the compile/link succeeds with 1.4.1, I'll happily change the check condition. As a side-note, do you have the RPMs available publicly? Or would you like to contribute them? A lot of folks have recently asked for RHEL RPMs... 
Rainer > -----Original Message----- > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > bounces at lists.adiscon.com] On Behalf Of David Gillies > Sent: Thursday, April 16, 2009 2:15 AM > To: rsyslog at lists.adiscon.com > Subject: [rsyslog] rsyslog 3.21.11 and gnutls > > Hi All, > > I've been rolling my own rsyslog 3.21.x rpms for centos5/rhel5 for a > while now. I noticed in the latest rsyslog version 3.21.11 that running > ./configure says that the build requires atleast gnutls 2.0.0. Is this a > hard requirement, as rhel5/centos5 only has gnutls 1.4.1. > > David Gillies > Systems Engineer > Digital Infrastructure Services > Fairfax Digital > > The information contained in this e-mail message and any accompanying > files is or may be confidential. If you are not the intended recipient, > any use, dissemination, reliance, forwarding, printing or copying of > this e-mail or any attached files is unauthorised. This e-mail is > subject to copyright. No part of it should be reproduced, adapted or > communicated without the written consent of the copyright owner. If you > have received this e-mail in error please advise the sender immediately > by return e-mail or telephone and delete all copies. Fairfax does not > guarantee the accuracy or completeness of any information contained in > this e-mail or attached files. Internet communications are not secure, > therefore Fairfax does not accept legal responsibility for the contents > of this message or attached files. 
> _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com From DGillies at fairfaxdigital.com.au Thu Apr 16 08:40:54 2009 From: DGillies at fairfaxdigital.com.au (David Gillies) Date: Thu, 16 Apr 2009 16:40:54 +1000 Subject: [rsyslog] rsyslog 3.21.11 and gnutls In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702AEAA@GRFEXC.intern.adiscon.com> References: <4310250BC419AC46BB47F728902B0DD6046664E5@EXCHDP3.ffx.jfh.com.au> <9B6E2A8877C38245BFB15CC491A11DA702AEAA@GRFEXC.intern.adiscon.com> Message-ID: <4310250BC419AC46BB47F728902B0DD604666B14@EXCHDP3.ffx.jfh.com.au> Thanks Rainer, I haven't had a chance to build rsyslog against RHEL's gnutls but I'll give it a try tomorrow and report back. In regards to your side note, I actually just used the rsyslog spec file from the fedora project: http://cvs.fedoraproject.org/viewvc/devel/rsyslog/ In that CVS repo there's the spec file and a couple of other pieces which makes rsyslog fit into a RHEL/Fedora/CentOS box in the Red Hat way of doing things. I believe the fedora project are using that spec file for FC10 and FC11 but I've had no issues with it on RHEL5/CentOS5. David Gillies Systems Engineer Digital Infrastructure Services Fairfax Digital -----Original Message----- From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-bounces at lists.adiscon.com] On Behalf Of Rainer Gerhards Sent: Thursday, 16 April 2009 4:05 PM To: rsyslog-users Subject: Re: [rsyslog] rsyslog 3.21.11 and gnutls This is not a hard requirement as long as the older version links. The 2.0.0 was a conservative choice we made when adding the check logic. I don't have a version with an older release to test, but I think I initially developed it with 1.something. So if you can confirm that the compile/link successes with 1.4.1, I'll happily change the check condition. As a side-note, do you have the RPMs available publically? Or would like to contribute them? 
A lot of folks have recently asked for RHEL RPMs... Rainer > -----Original Message----- > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > bounces at lists.adiscon.com] On Behalf Of David Gillies > Sent: Thursday, April 16, 2009 2:15 AM > To: rsyslog at lists.adiscon.com > Subject: [rsyslog] rsyslog 3.21.11 and gnutls > > Hi All, > > I've been rolling my own rsyslog 3.21.x rpms for centos5/rhel5 for a > while now. I noticed in the latest rsyslog version 3.21.11 that > running ./configure says that the build requires atleast gnutls 2.0.0. > Is this a hard requirement, as rhel5/centos5 only has gnutls 1.4.1. > > David Gillies > Systems Engineer > Digital Infrastructure Services > Fairfax Digital > > The information contained in this e-mail message and any accompanying > files is or may be confidential. If you are not the intended > recipient, any use, dissemination, reliance, forwarding, printing or > copying of this e-mail or any attached files is unauthorised. This > e-mail is subject to copyright. No part of it should be reproduced, > adapted or communicated without the written consent of the copyright > owner. If you have received this e-mail in error please advise the > sender immediately by return e-mail or telephone and delete all > copies. Fairfax does not guarantee the accuracy or completeness of any > information contained in this e-mail or attached files. Internet > communications are not secure, therefore Fairfax does not accept legal > responsibility for the contents of this message or attached files. > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com _______________________________________________ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com The information contained in this e-mail message and any accompanying files is or may be confidential. 
If you are not the intended recipient, any use, dissemination, reliance, forwarding, printing or copying of this e-mail or any attached files is unauthorised. This e-mail is subject to copyright. No part of it should be reproduced, adapted or communicated without the written consent of the copyright owner. If you have received this e-mail in error please advise the sender immediately by return e-mail or telephone and delete all copies. Fairfax does not guarantee the accuracy or completeness of any information contained in this e-mail or attached files. Internet communications are not secure, therefore Fairfax does not accept legal responsibility for the contents of this message or attached files. From rgerhards at hq.adiscon.com Fri Apr 17 11:11:22 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Fri, 17 Apr 2009 11:11:22 +0200 Subject: [rsyslog] RFC: On rsyslog output modules and support for batchoperations References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch><4255c2570904020722x6ca6a0few1ac1abfe941ee59b@mail.gmail.com> <200904021721.29880.Luis.Fernando.Munoz.Mejias@cern.ch> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AEEB@GRFEXC.intern.adiscon.com> Jumping into that thread again... I am thinking about how to enhance the engine so that fastest-possible database writes (actually, any output) are possible. However, I have come across a couple of points. I would like to do so in the most generic way. Let me quote those message parts that I have specific questions on (out of sequence, thus I preserve the full message below - if you need more context). > I made a small Python prototype to do something similar to what you > propose, with no batches, but committing each 1000 entries. The speedup > I got by introducing batches was about a factor 50. And the statement > was already prepared. Could you check what actually brings most of the speedup - the batches or the prepared statement? 
I am thinking along the lines of using batches but not prepared statements, as in this sample begin insert ... insert ... insert ... insert ... end Does this offer dramatic improvement? How much more improvement do prepared statements offer? (My hope is that you can quickly modify the Python prototype to provide a rough idea.) This question stems from my wondering whether it is worth rewriting all existing DB plugins to use prepared statements. Batching, as described above, can be done with far less modification to the plugin. And a second question. Let's envision that the rsyslog core could provide you with multiple data records at once. For the case given above, I could still simply pass in a single - now longer - string (that makes it that attractive for the other db plugins). However, that does not work for the omoracle interface. Let's say the new interface we created is a "vector interface" as it provides each data item as part of a one-dimensional vector (or tuple). Then, it would look most natural to me if we extend this to a "matrix interface", where you receive a tuple of tuples (or a two-dimensional structure that "feels" much like a SQL result set). Would that be useful for you? Or, the other way around, what would you consider an optimal interface to your plugin if the rsyslog core would provide batching support? Feedback, from everyone interested, is highly appreciated and useful. Thanks, Rainer > -----Original Message----- > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > bounces at lists.adiscon.com] On Behalf Of Luis Fernando Muñoz Mejías > Sent: Thursday, April 02, 2009 5:21 PM > To: rsyslog at lists.adiscon.com > Subject: Re: [rsyslog] RFC: On rsyslog output modules and support for > batchoperations > > RB, > > > Oi. :) Sorry I'm late to the game. > > Your contribution is appreciated. :) > > > Forgive me - my database-performance-fu and oracle-fu are not > terribly > > strong, I may make a fool of myself here. 
What is the performance > > gain of making a prepared statement over just executing raw > > statements? > > The statement is parsed only once, so you save the overhead of parsing > and doing an execution plan for each execution, which will be > identical. And I expect to insert hundreds of entries per second. :) > > All you have to do is pass the arguments. > > > CREATE PROCEDURE zazz AS > > insert into foo(field1, field2, field3) values(:val1, :val2, > > :val3); SET TRANSACTION; zazz("foo", "bar", "baz"); zazz("foo1", > > "bar1", "baz1"); zazz("foo2", "bar2", "baz2"); COMMIT; > > > > -- over > > > > SET TRANSACTION; > > INSERT INTO foo(field1, field2, field3) values("foo", > > "bar", "baz"); > > INSERT INTO foo(field1, field2, field3) values("foo1", > > "bar1", "baz1"); > > INSERT INTO foo(field1, field2, field3) > > values("foo2", "bar2", "baz2"); COMMIT; > > > With this code, Oracle (any DB, actually) needs to parse each insert, > and then choose the execution plan that looks best once. > > What you get by preparing the statement and using batches is that the > client (rsyslog core) will store these triplets: > > (foo, bar, baz) (foo1, bar1, baz1) (foo2, bar2, baz2) > > and when you've hit a limit (say, you're on (foo1000, bar1000, > baz1000)) > send them all to the server at once (thus calling only once to > doAction, > calling only once to the Oracle interface), who will blindly execute > the > statement without wasting a single cycle on parsing or evaluating > execution plans: it's already done. > > > Perhaps that's not even what you're doing. > > For the moment I'm doing > > BEGIN > INSERT INTO foo(field1, field2, field3) values("foo", "bar", "baz"); > COMMIT; > BEGIN > INSERT INTO foo(field1, field2, field3) values("foo1", "bar1", "baz1"); > COMMIT; > > You can already imagine the overhead involved. Actually, all DB-based > modules on rsyslog do the same. 
> > > I know there are other considerations and niceties with procedures, > > It's not even a stored procedure, it's on the client doing > communicating > many times versus only one with the DB. > > > but the latter syntax would still allow for batched transactions > while > > enabling rsyslog to do the dirty work of formatting the query and not > > necessitating exposure of internal structures. > > > Indeed, I want rsyslog doing most of the work for me. But the overhead > involved in parsing and evaluating execution plans is unacceptable on > my > context. So I'm looking here for the balance between rsyslog doing work > for me and rsyslog performing as good as I need it. Perhaps exposing > the > structures is not a good idea, either. > > > IMHO, database output modules should still pretty much blindly > execute > > whatever SQL rsyslog hands them, be that wrapped in a transaction or > > not. > > > Yes and no. Yes, rsyslog should be the one who tells the statement to > be > executed. But there is no need for rsyslog to repeat that statement for > each entry (millions per day). Doing it at initialization time is > enough. > > I made a small Python prototype to do something similar to what you > propose, with no batches, but committing each 1000 entries. The speedup > I got by introducing batches was about a factor 50. And the statement > was already prepared. > > Cheers. > -- > Luis Fernando Mu?oz Mej?as > Luis.Fernando.Munoz.Mejias at cern.ch > > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com From tbergfeld at hq.adiscon.com Fri Apr 17 15:10:32 2009 From: tbergfeld at hq.adiscon.com (Tom Bergfeld) Date: Fri, 17 Apr 2009 15:10:32 +0200 Subject: [rsyslog] rsyslog 4.3.0 (devel) and rsyslog 3.20.6 (v3-stable) released Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AF01@GRFEXC.intern.adiscon.com> Hi all, today, we have two new releases to announce. 
Most importantly, we have released a brand-new 4.3.0 devel version, which starts the 4.3 series. It contains enhancements to RainerScript, an ever-improving testbench, and a new output plugin omprog. The latter enables rsyslog to spawn a process that will receive log messages via stdin. This module is experimental, as the original requester no longer responded once the module was implemented - so any testing would be appreciated ;). There are also some bug fixes. Changelog: http://www.rsyslog.com/Article366.phtml Download: http://www.rsyslog.com/Downloads-req-viewdownloaddetails-lid-156.phtml We also released 3.20.6, an update to the v3-stable branch. This is the last v3-stable release in the 3.20.x series. The release mainly consists of bug fixes, such as a fix for a bug in $InputTCPMaxSessions: the directive was accepted but not honored, which resulted in a fixed upper limit of 200 connections. Further, the default value of $DirCreateMode has been changed. For more information, review the discussion on the mailing list: http://lists.adiscon.net/pipermail/rsyslog/2009-April/001986.html Changelog: http://www.rsyslog.com/Article364.phtml Download: http://www.rsyslog.com/Downloads-req-viewdownloaddetails-lid-155.phtml As always, feedback is appreciated. Tom Bergfeld
_______________________________________________ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com From Luis.Fernando.Munoz.Mejias at cern.ch Fri Apr 17 17:13:13 2009 From: Luis.Fernando.Munoz.Mejias at cern.ch (Luis Fernando =?utf-8?q?Mu=C3=B1oz_Mej=C3=ADas?=) Date: Fri, 17 Apr 2009 17:13:13 +0200 Subject: [rsyslog] RFC: On rsyslog output modules and support for batchoperations In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702AEEB@GRFEXC.intern.adiscon.com> References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch> <200904021721.29880.Luis.Fernando.Munoz.Mejias@cern.ch> <9B6E2A8877C38245BFB15CC491A11DA702AEEB@GRFEXC.intern.adiscon.com> Message-ID: <200904171713.14284.Luis.Fernando.Munoz.Mejias@cern.ch> Hi, > I am thinking on how to enhance the engine so that fastest-possible > database writes (actually, any output) are possible. However, I come > across a couple of points. I would like to do so in the most generic > way. Let me quote those message parts that I have specific questions > on (out of sequence, thus I preserve the full message below - if you > need more context). > > > I made a small Python prototype to do something similar to what you > > propose, with no batches, but committing each 1000 entries. The > > speedup I got by introducing batches was about a factor 50. And the > > statement was already prepared. > > Could you check what actually brings most of the speedup - the batches > or the prepared statement. I am thinking along the lines of using > batches but not prepared statements, as in this sample > > begin insert ... insert ... insert ... insert ... end I'll do, but please note that begin execute(unprepared_insert_statement) execute(unprepared_insert_statement) execute(unprepared_insert_statement) execute(unprepared_insert_statement) commit Needs 4 message exchanges with the server. 
OTOH:

push (@batch, $item);
push (@batch, $item);
push (@batch, $item);
push (@batch, $item);

begin
execute_many (insert_statement, @batch)
commit

Requires only one, so the network overhead is *way* smaller. This is true not only of Oracle, but also of PostgreSQL, and I suppose MySQL provides a similar API. I'll try to verify where the hottest spot is, anyway.

> And second question. Let's envision that the rsyslog core could
> provide you with multiple data records at once.

That would be *great*.

> For the case given above, I could still simply pass in a single - now
> longer - string (that makes it that attractive for the other db
> plugins). However, that does not work for the omoracle interface.

For omoracle it's not good, indeed. Also, I don't think you want to maintain yet another way of passing messages to modules. IMHO, we have two orthogonal use cases:

a) the module wants all messages one by one and is happy with it (all modules but omoracle).
b) the module wants to handle the properties in big batches (omoracle).

IMHO, this is flexible enough for new developers to choose between easy and fast.

> Let's say the new interface we created is a "vector interface" as it
> provides each data item as part of a one-dimensional vector (or
> tuple). Then, it would look most natural to me if we extend this to a
> "matrix interface", where you receive a tuple of tuples (or a
> two-dimensional structure that "feels" much like a SQL result set).

Indeed, that's what I have to maintain in omoracle. If I could offload it to rsyslog's core it would be even better.

> Would that be useful for you? Or, the other way around, what
> would you consider an optimal interface to your plugin if the rsyslog
> core would provide batching support?

The matrix-like structure is the one I need, indeed. :)

Cheers.
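The two call patterns contrasted above can be sketched with Python's DB-API. This is only an illustration: it uses the standard-library sqlite3 module as a stand-in for a real server, and the table and column names are made up. An in-process database has no network round trips, so the sketch shows the shape of the calls, not the speedup.

```python
import sqlite3

# Hypothetical batch of (host, msg) rows; a list of tuples is the
# "matrix-like" structure discussed above.
rows = [("host%d" % i, "message %d" % i) for i in range(4)]

conn = sqlite3.connect(":memory:")
conn.execute("create table logs (host text, msg text)")

# Pattern 1: one execute() call per entry, committed together.
for row in rows:
    conn.execute("insert into logs (host, msg) values (?, ?)", row)
conn.commit()

# Pattern 2: hand the whole batch to the driver in a single call.
conn.executemany("insert into logs (host, msg) values (?, ?)", rows)
conn.commit()

row_count = conn.execute("select count(*) from logs").fetchone()[0]
```

Both patterns insert the same rows; with a networked database the second one lets the driver ship the batch at once, provided the driver supports it.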
-- Luis Fernando Mu?oz Mej?as Luis.Fernando.Munoz.Mejias at cern.ch From rgerhards at hq.adiscon.com Fri Apr 17 17:36:54 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Fri, 17 Apr 2009 17:36:54 +0200 Subject: [rsyslog] RFC: On rsyslog output modules and support forbatchoperations References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch><200904021721.29880.Luis.Fernando.Munoz.Mejias@cern.ch><9B6E2A8877C38245BFB15CC491A11DA702AEEB@GRFEXC.intern.adiscon.com> <200904171713.14284.Luis.Fernando.Munoz.Mejias@cern.ch> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AF0C@GRFEXC.intern.adiscon.com> Just one quick response... > -----Original Message----- > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > bounces at lists.adiscon.com] On Behalf Of Luis Fernando Mu?oz Mej?as > Sent: Friday, April 17, 2009 5:13 PM > To: rsyslog-users > Subject: Re: [rsyslog] RFC: On rsyslog output modules and support > forbatchoperations > > Hi, > > > I am thinking on how to enhance the engine so that fastest-possible > > database writes (actually, any output) are possible. However, I come > > across a couple of points. I would like to do so in the most generic > > way. Let me quote those message parts that I have specific questions > > on (out of sequence, thus I preserve the full message below - if you > > need more context). > > > > > I made a small Python prototype to do something similar to what you > > > propose, with no batches, but committing each 1000 entries. The > > > speedup I got by introducing batches was about a factor 50. And the > > > statement was already prepared. > > > > Could you check what actually brings most of the speedup - the > batches > > or the prepared statement. I am thinking along the lines of using > > batches but not prepared statements, as in this sample > > > > begin insert ... insert ... insert ... insert ... 
end > > I'll do, but please note that > > begin > execute(unprepared_insert_statement) > execute(unprepared_insert_statement) > execute(unprepared_insert_statement) > execute(unprepared_insert_statement) > commit > > Needs 4 message exchanges with the server. Mhhh... I don't agree here. My sequence was different ;) execute("begin; insert ... ; insert ... ; insert ... ; commit;"); The key is that all statements are passed in via a single execute call. I don't know about Oracle, but this is possible with MS SQL (out of past experience) and Postgresql (tested to some extent today). Note sure about MySQL either, but I think it supports it. I also mean that I have heard (really long ago) that Oracle should support it, too - but you know better than me. To pinpoint my question: I am specifically asking about multiple statements WITHIN a single SQL statement execution call. Rainer From david at lang.hm Sat Apr 18 00:28:40 2009 From: david at lang.hm (david at lang.hm) Date: Fri, 17 Apr 2009 15:28:40 -0700 (PDT) Subject: [rsyslog] multi-message handling and databases Message-ID: the company that I work for has decided to sponser multi-message queue output capability, they have chosen to remain anonomous (I am posting from my personal account) there are two parts to this. 1. the interaction between the output module and the queue 2. the configuration of the output module for it's interaction with the database for the first part (how the output module interacts with the queue), the criteria are that 1. it needs to be able to maintain guarenteed delivery (even in the face of crashes, assuming rsyslog is configured appropriately) 2. at low-volume times it must not wait for 'enough' messages to accumulate, messages should be processed with as little latency as possible to meet these criteria, what is being proposed is the following a configuration option to define the max number of messages to be processed at once. 
the output module goes through the following loop

X = max_messages
if (messages in queue)
    mark that it is going to process the next X messages
    grab the messages
    format them for output
    attempt to deliver the messages
    if (messages delivered successfully)
        mark messages in the queue as delivered
        X = max_messages (reset X in case it was reduced due to delivery errors)
    else (delivering this batch failed, reset and try to deliver the first half)
        unmark the messages that it tried to deliver (putting them back into the status where no delivery has been attempted)
        X = int(# messages attempted / 2)
        if (X == 0)
            unable to deliver a single message, do existing message error process

this approach is more complex than a simple 'wait for X messages, then insert them all', but it has some significant advantages

1. no waiting for 'enough' things to happen before something gets written
2. if you have one bad message, it will transmit all the good messages before the bad one, then error out only on the bad one before picking up with the ones after the bad one.
3. nothing is marked as delivered before delivery is confirmed.
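The loop above could be sketched roughly as follows. This is not rsyslog code: the names `drain`, `deliver` and `on_error` are hypothetical, the queue is a plain in-memory deque, and resetting to the full batch size after a single bad message is an assumption the description leaves open.

```python
from collections import deque
from itertools import islice


def drain(queue, deliver, max_messages, on_error):
    """Empty `queue` in order, halving the batch size after each failed delivery."""
    batch_size = max_messages
    while queue:
        n = min(batch_size, len(queue))
        batch = list(islice(queue, n))      # peek only; nothing is removed yet
        if deliver(batch):
            for _ in range(n):
                queue.popleft()             # mark as delivered after confirmation
            batch_size = max_messages       # reset in case it had been reduced
        elif n == 1:
            on_error(queue.popleft())       # a single message failed on its own
            batch_size = max_messages       # assumption: start over at full size
        else:
            batch_size = n // 2             # retry with the first half


# Demo: message 4 is undeliverable, everything else succeeds.
delivered, errors = [], []


def deliver(batch):
    if 4 in batch:
        return False
    delivered.extend(batch)
    return True


drain(deque(range(1, 11)), deliver, max_messages=8, on_error=errors.append)
```

In the demo the good messages on both sides of the bad one are delivered in order, and only message 4 ends up in the error path, which matches advantage 2.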
an example of how this would work

max_messages=15
messages arrive 1/sec
it takes 2+(# messages/2) seconds to process each batch (in reality the time to insert things into a database is more like 10 + (# messages / 100) or even more drastic)

with the traditional rsyslog output, this would require multiple output threads to keep up (processing a single message takes 2.5 seconds with messages arriving 1/sec)

with the new approach and a cold start you would see

message arrives (Q=1) at T=0
om starts processing message at T=0 (expected to take 2.5)
message arrives (Q=2) at T=1
message arrives (Q=3) at T=2
om finishes processing message (Q=2) at T=2.5
om starts processing 2 messages at T=2.5 (expected to take 3)
message arrives (Q=4) at T=3
message arrives (Q=5) at T=4
message arrives (Q=6) at T=5
om finishes processing 2 messages (Q=4) at T=5.5
om starts processing 4 messages at T=5.5 (expected to take 4)
message arrives (Q=5) at T=6
message arrives (Q=6) at T=7
message arrives (Q=7) at T=8
message arrives (Q=8) at T=9
om finishes processing 4 messages (Q=4) at T=9.5
om starts processing 4 messages at T=9.5 (expected to take 4)

the system is now in a steady state

message arrives (Q=5) at T=10
message arrives (Q=6) at T=11
message arrives (Q=7) at T=12
message arrives (Q=8) at T=13
om finishes processing 4 messages (Q=4) at T=13.5
om starts processing 4 messages at T=13.5 (expected to take 4)

if a burst of 10 extra messages arrived at time 13.5 this last item would become

10 extra messages arrive (Q=14) at T=13.5
om starts processing 14 messages at T=13.5 (expected to take 9)
message arrives (Q=15) at T=14
message arrives (Q=16) at T=15
message arrives (Q=17) at T=16
message arrives (Q=18) at T=17
message arrives (Q=19) at T=18
message arrives (Q=20) at T=19
message arrives (Q=21) at T=20
message arrives (Q=22) at T=21
message arrives (Q=23) at T=22
om finishes processing 14 messages (Q=9) at T=22.5
om starts processing 9 messages at T=22.5 (expected to take 6.5)

thoughts?
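The steady state in this example can be checked with a few lines of simulation. With one arrival per second and a batch of n messages costing 2 + n/2 seconds, the batch size stops changing when n = 2 + n/2, that is n = 4. The sketch below uses the same cost model; how it counts messages that arrive exactly when a batch finishes is an assumption, so intermediate values need not match a hand-written trace.

```python
def simulate(max_messages=15, batches=8):
    """Return the size of each batch when one message arrives every second."""
    t = 0.0            # current time in seconds
    processed = 0      # messages already delivered
    sizes = []
    for _ in range(batches):
        arrived = int(t) + 1                 # arrivals at T=0, 1, 2, ... up to t
        n = min(max_messages, arrived - processed)
        sizes.append(n)
        t += 2 + n / 2                       # cost model from the example
        processed += n
    return sizes


batch_sizes = simulate()
```

The batch size grows from a single message on a cold start and then settles at 4, without any thread ever waiting for a batch to fill.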
David Lang From david at lang.hm Sat Apr 18 00:34:01 2009 From: david at lang.hm (david at lang.hm) Date: Fri, 17 Apr 2009 15:34:01 -0700 (PDT) Subject: [rsyslog] RFC: On rsyslog output modules and support for batchoperations In-Reply-To: <200904171713.14284.Luis.Fernando.Munoz.Mejias@cern.ch> References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch> <200904021721.29880.Luis.Fernando.Munoz.Mejias@cern.ch> <9B6E2A8877C38245BFB15CC491A11DA702AEEB@GRFEXC.intern.adiscon.com> <200904171713.14284.Luis.Fernando.Munoz.Mejias@cern.ch> Message-ID: On Fri, 17 Apr 2009, Luis Fernando Mu?oz Mej?as wrote: > Hi, > >> I am thinking on how to enhance the engine so that fastest-possible >> database writes (actually, any output) are possible. However, I come >> across a couple of points. I would like to do so in the most generic >> way. Let me quote those message parts that I have specific questions >> on (out of sequence, thus I preserve the full message below - if you >> need more context). >> >> > I made a small Python prototype to do something similar to what you >> > propose, with no batches, but committing each 1000 entries. The >> > speedup I got by introducing batches was about a factor 50. And the >> > statement was already prepared. >> >> Could you check what actually brings most of the speedup - the batches >> or the prepared statement. I am thinking along the lines of using >> batches but not prepared statements, as in this sample >> >> begin insert ... insert ... insert ... insert ... end > > I'll do, but please note that > > begin > execute(unprepared_insert_statement) > execute(unprepared_insert_statement) > execute(unprepared_insert_statement) > execute(unprepared_insert_statement) > commit > > Needs 4 message exchanges with the server. 
OTOH: > > > push (@batch, $item); > push (@batch, $item); > push (@batch, $item); > push (@batch, $item); > > begin > execute_many (insert_statement, @batch) > commit > > Requires only one, so the network overhead is *way* smaller. This is > true not only of Oracle, but also of PostgreSQL, and I suppose MySQL > provides similar API. no disagreement that it's less network overhead, but in my experiance simple inserts aren't bottlenecked on the network overhead, they are bottlenecked on transaction overhead (including fsync overhead) > I'll try to verify where the hottest spot is, anyways. thanks. David Lang From david at lang.hm Sat Apr 18 01:28:40 2009 From: david at lang.hm (david at lang.hm) Date: Fri, 17 Apr 2009 16:28:40 -0700 (PDT) Subject: [rsyslog] RFC: On rsyslog output modules and support for batchoperations In-Reply-To: <200904171713.14284.Luis.Fernando.Munoz.Mejias@cern.ch> References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch> <200904021721.29880.Luis.Fernando.Munoz.Mejias@cern.ch> <9B6E2A8877C38245BFB15CC491A11DA702AEEB@GRFEXC.intern.adiscon.com> <200904171713.14284.Luis.Fernando.Munoz.Mejias@cern.ch> Message-ID: On Fri, 17 Apr 2009, Luis Fernando Mu?oz Mej?as wrote: > Hi, > >> I am thinking on how to enhance the engine so that fastest-possible >> database writes (actually, any output) are possible. However, I come >> across a couple of points. I would like to do so in the most generic >> way. Let me quote those message parts that I have specific questions >> on (out of sequence, thus I preserve the full message below - if you >> need more context). >> >> > I made a small Python prototype to do something similar to what you >> > propose, with no batches, but committing each 1000 entries. The >> > speedup I got by introducing batches was about a factor 50. And the >> > statement was already prepared. >> >> Could you check what actually brings most of the speedup - the batches >> or the prepared statement. 
I am thinking along the lines of using >> batches but not prepared statements, as in this sample >> >> begin insert ... insert ... insert ... insert ... end > > I'll do, but please note that > > begin > execute(unprepared_insert_statement) > execute(unprepared_insert_statement) > execute(unprepared_insert_statement) > execute(unprepared_insert_statement) > commit > > Needs 4 message exchanges with the server. OTOH: > > > push (@batch, $item); > push (@batch, $item); > push (@batch, $item); > push (@batch, $item); > > begin > execute_many (insert_statement, @batch) > commit > > Requires only one, so the network overhead is *way* smaller. This is > true not only of Oracle, but also of PostgreSQL, and I suppose MySQL > provides similar API. as a strawman, and thinking of databases in general (not any particular database), I see the needs for the database interface as being able to be generisized down to a set of config variables something like DBtype value: one of "oracle|postgres|mysql|libdbi" purpose: determine which low-level communication library is used to talk to the DB DBinit value: string purpose: any initialization that needs to be done when first connecting to the database (sanity checks to make sure the DB has the correct schema, initializing sql functions, authentication, etc) DBstart value: string purpose: fixed text ahead of any message content DBjoin value: string purpose: fixed text used to join two messages togeather DBend value: string purpose: fixed text used to end a message to the server DBmessage value: rsyslog template purpose: format an individual message for the database examples example 1 existing single-message handling DBinit="" DBstart="" DBjoin="" DBend="" DBmessage="insert into table logs values ('$server','$timestamp','$msg');" resulting statement insert into table logs values ('server1','$timestamp',$'msg'); example 2 prepared statement DBinit="" DBstart="" DBjoin="\n" DBend="begin; execute_many (insert_statement, @batch); commit" 
DBmessage="push (@batch, '$item');"

resulting statement

push (@batch, 'item1');
push (@batch, 'item2');
push (@batch, 'item3');
push (@batch, 'item4');
begin; execute_many (insert_statement, @batch); commit

example 3 multiple inserts in one statement

DBinit=""
DBstart="insert into table logs values "
DBjoin=", "
DBend=";"
DBmessage="('$server','$timestamp','$msg')"

resulting statement

insert into table logs values ('server1','time1','message1'), ('server2','time2','message2'), ('server3','time3','message3'), ('server4','time4','message4');

example 4 multiple inserts in one transaction

DBinit=""
DBstart="begin;\n"
DBjoin="\n"
DBend="\ncommit;"
DBmessage="insert into table logs values ('$server','$time','$message'); "

resulting statement

begin;
insert into table logs values ('server1','time1','message1');
insert into table logs values ('server2','time2','message2');
insert into table logs values ('server3','time3','message3');
insert into table logs values ('server4','time4','message4');
commit;

I don't happen to know the syntax to define a stored procedure off the top of my head or I would give you an example of that (which would use the DBinit to define the stored procedure)

postgres has a 'copy' command, where you tell it that you are going to follow with many lines of content to insert (which is significantly faster than insert statements, even batched up)

I believe that this 5-variable set can handle just about every variation in putting things in the database, and as such would allow the database drivers themselves to be greatly simplified.

thoughts?
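How the fixed strings would combine with the per-message template can be sketched in a few lines. The function name `build_statement` is hypothetical, and the messages are assumed to be already rendered (and escaped) by the template engine; the sketch only shows the string assembly for example 3.

```python
def build_statement(db_start, db_join, db_end, rendered_messages):
    """Wrap already-rendered DBmessage strings with the fixed start/join/end text."""
    return db_start + db_join.join(rendered_messages) + db_end


# Example 3 from above: several rows in one insert statement.
rendered = ["('server%d','time%d','message%d')" % (i, i, i) for i in range(1, 5)]
statement = build_statement("insert into table logs values ", ", ", ";", rendered)
```

Example 4 falls out of the same function by passing "begin;\n", "\n" and "\ncommit;" with a full insert statement as each rendered message.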
David Lang From aoz.syn at gmail.com Sat Apr 18 03:02:12 2009 From: aoz.syn at gmail.com (RB) Date: Fri, 17 Apr 2009 19:02:12 -0600 Subject: [rsyslog] RFC: On rsyslog output modules and support for batchoperations In-Reply-To: References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch> <200904021721.29880.Luis.Fernando.Munoz.Mejias@cern.ch> <9B6E2A8877C38245BFB15CC491A11DA702AEEB@GRFEXC.intern.adiscon.com> <200904171713.14284.Luis.Fernando.Munoz.Mejias@cern.ch> Message-ID: <4255c2570904171802y4d9126dex22e456006801fb25@mail.gmail.com> On Fri, Apr 17, 2009 at 17:28, wrote: > I believe that this 5-variable set can handle just about every variation > in putting things in the database, and as such would allow the database > drivers themselves to be greatly simplified. > > > thoughts? One, and probably none too bright: the less engine-specific, the better. I'll be the last to claim programming strictly against a designed-but-not-implemented API is wise, but some basic ground rules would be good. From david at lang.hm Sat Apr 18 05:06:59 2009 From: david at lang.hm (david at lang.hm) Date: Fri, 17 Apr 2009 20:06:59 -0700 (PDT) Subject: [rsyslog] RFC: On rsyslog output modules and support for batchoperations In-Reply-To: <4255c2570904171802y4d9126dex22e456006801fb25@mail.gmail.com> References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch> <200904021721.29880.Luis.Fernando.Munoz.Mejias@cern.ch> <9B6E2A8877C38245BFB15CC491A11DA702AEEB@GRFEXC.intern.adiscon.com> <200904171713.14284.Luis.Fernando.Munoz.Mejias@cern.ch> <4255c2570904171802y4d9126dex22e456006801fb25@mail.gmail.com> Message-ID: On Fri, 17 Apr 2009, RB wrote: > On Fri, Apr 17, 2009 at 17:28, wrote: >> I believe that this 5-variable set can handle just about every variation >> in putting things in the database, and as such would allow the database >> drivers themselves to be greatly simplified. >> >> >> thoughts? > > One, and probably none too bright: the less engine-specific, the > better. 
> I'll be the last to claim programming strictly against a
> designed-but-not-implemented API is wise, but some basic ground rules
> would be good.

one thing I really dislike about many products that use databases (including LDAP) is that they assume that they are the only thing in existence that needs to use the data, so they can insist on you doing it their way.

One thing I really like about how rsyslog does the database access is that it makes it really easy to put the data in whatever schema is easiest for your _other_ tools to access.

David Lang

From feikong0119 at 163.com Mon Apr 20 10:05:53 2009
From: feikong0119 at 163.com (feikong0119)
Date: Mon, 20 Apr 2009 16:05:53 +0800 (CST)
Subject: [rsyslog] problem-help
Message-ID: <11538818.534261240214753720.JavaMail.coremail@bj163app29.163.com>

Hello,

I am using rsyslog 2.0.6, which you developed, and I have a problem I need help with. I want to save a free-format log, for example:

[event time] [event id] [event type] [event source]
<30> Feb 12 17:05:15 dhclient: Feb 12 17:05:15.xxx(ms) 1152 0 radar

This data needs to be inserted into a database. Regarding the free-form part: rsyslog-2.0.6/doc/property_replacer.html mentions a property "STRUCTURED-DATA", but I don't know how to use it in rsyslog.conf. Could you give me an example?

Thanks.
Kyou-siryu.

From lists at luigirosa.com Mon Apr 20 12:13:54 2009
From: lists at luigirosa.com (Luigi Rosa)
Date: Mon, 20 Apr 2009 12:13:54 +0200
Subject: [rsyslog] problem-help
In-Reply-To: <11538818.534261240214753720.JavaMail.coremail@bj163app29.163.com>
References: <11538818.534261240214753720.JavaMail.coremail@bj163app29.163.com>
Message-ID: <49EC4AE2.2010507@luigirosa.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

feikong0119 said the following on 20/04/09 10:05:

> May you give me an example?
Don't know if it is what you are asking, but this is an example from a working server:

$ModLoad ommysql # load the output driver (use ompgsql for PostgreSQL)
$template dbFormat,"insert into SystemEvents (Message, Facility, FromHost, Priority, DeviceReportedTime, ReceivedAt, InfoUnitID, SysLogTag, EventSource, FacilityText, SeverityText, PriorityText) values ('%msg%', %syslogfacility%, '%source%', %syslogpriority%, '%timereported:::date-mysql%', '%timegenerated:::date-mysql%', %iut%, '%syslogtag%', '%programname%', '%syslogfacility-text%', '%syslogseverity-text%', '%syslogpriority-text%')",sql
*.* :ommysql:localhost,xxxxxxxxxx,yyyyyyyyyyy,pppppppppp;dbFormat

Put db, username and password instead of xxxxx yyyyyy and ppppp

Ciao,
luigi

- --
/
+--[Luigi Rosa]--
\

Only one human captain has ever survived battle with a Minbari fleet. He is behind me. You are in front of me. If you value your lives, be somewhere else.
--Delenn "Severed Dreams"
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAknsSt0ACgkQ3kWu7Tfl6ZQ+vwCgxQZ2v0bE6MGECdCxnYWpl/43
v1UAoJ0j5IA7f/VoPgXWI7GlVEk+rsk+
=bqxB
-----END PGP SIGNATURE-----

From rgerhards at hq.adiscon.com Mon Apr 20 17:44:43 2009
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Mon, 20 Apr 2009 17:44:43 +0200
Subject: [rsyslog] RFC: On rsyslog output modules and support for batchoperations
References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch><200904021721.29880.Luis.Fernando.Munoz.Mejias@cern.ch><9B6E2A8877C38245BFB15CC491A11DA702AEEB@GRFEXC.intern.adiscon.com><200904171713.14284.Luis.Fernando.Munoz.Mejias@cern.ch>
Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AF22@GRFEXC.intern.adiscon.com>

Sorry for the silence, been thinking quite a bit ;)

> -----Original Message-----
> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
> bounces at lists.adiscon.com] On Behalf Of david at lang.hm
> Sent: Saturday, April 18, 2009
1:29 AM > To: rsyslog-users > Subject: Re: [rsyslog] RFC: On rsyslog output modules and support for > batchoperations > ... > as a strawman, and thinking of databases in general (not any particular > database), I see the needs for the database interface as being able to > be > generisized down to a set of config variables something like > > DBtype > value: one of "oracle|postgres|mysql|libdbi" > purpose: determine which low-level communication library is used to > talk > to the DB > > > DBinit > value: string > purpose: any initialization that needs to be done when first > connecting > to the database (sanity checks to make sure the DB has the correct > schema, > initializing sql functions, authentication, etc) > > DBstart > value: string > purpose: fixed text ahead of any message content > > DBjoin > value: string > purpose: fixed text used to join two messages togeather > > DBend > value: string > purpose: fixed text used to end a message to the server > > DBmessage > value: rsyslog template > purpose: format an individual message for the database > > > examples > > example 1 existing single-message handling > > DBinit="" > DBstart="" > DBjoin="" > DBend="" > DBmessage="insert into table logs values > ('$server','$timestamp','$msg');" > > resulting statement > > insert into table logs values ('server1','$timestamp',$'msg'); > > example 2 prepared statement > > DBinit="" > DBstart="" > DBjoin="\n" > DBend="begin; execute_many (insert_statement, @batch); commit" > DBmessage="push (@batch, '$item');" > > resulting statement > > push (@batch, 'item1'); > push (@batch, 'item2'); > push (@batch, 'item3'); > push (@batch, 'item4'); > begin; execute_many (insert_statement, @batch); commit There is a problem with this example - and that is that each database provides its own API for prepared statements. Brief look tells the are quite similar (good!) but it also tells that you can not work with "just strings" (bad!). 
So the approach involves more than just crafting the right strings. Prepared statements, however, are quite useful (not only from a performance perspective), so it would definitely be a plus to have them. > > > example 3 multiple inserts in one statment > > DBinit="" > DBstart="insert into table logs values " > DBjoin=", " > DBend=";" > DBmessage="('$server','$timestamp','$msg')" > > resulting statement > > insert into table logs values ('server1','time1','message1'), > ('server2','time2','message2'), ('server3','time3','message3'), > ('server4','time4','message4'); > > > example 4 multiple inserts in one transaction > > DBinit="" > DBstart="begin;\n" > DBjoin="\n" > DBend="\ncommit;" > DBmessage="insert into table logs values > ('$server','$time','$message'); " > > resulting statement > > begin; > insert into table logs values ('server1','time1','message1'); > insert into table logs values ('server2','time2','message2'); > insert into table logs values ('server3','time3','message3'); > insert into table logs values ('server4','time4','message4'); > commit; > > > I don't happen to know the syntax to define a stored procedure off the > top > of my head or I would give you an example of that (which would use the > DBinit to define the stored procedure) I think this is quite similar to ordinary SQL, at least in those engines that I used in the past (not sure about MySQL, which IMHO is not really a full-blown SQL engine). > > postgres has a 'copy' command, where you tell it that you are going to > follow with many lines of content to insert (which is significantly > faster > than insert statements, even batched up) There are other optimizations possible. MySQL, for example, has a mode where you permit the database to do inserts via a lazy writer, obviously at the risk of either consistency or even reliability (don't get the details together yet). Other engines, I guess, have other/additional optimizations. 
> I believe that this 5-variable set can handle just about every > variation > in putting things in the database, and as such would allow the database > drivers themselves to be greatly simplified. I think what this really boils down is the design of a DB "superdriver" which provides core functionality we expect in many/most/at least the most relevant{Oracle, Postgres, MSSQL?, MySql) engines and that provides a minidriver layer, where the engine-specific functionality is actually linked in. This can greatly reduce the effort required to write a DB driver and it can also reduce the effort required to maintain currently existing drivers. But, still, there is some effort to do. Maybe we could achieve the same goal with a set of macros, that would boild down to an elegance vs. effort decision. Besides that, it is interesting to note that we can solve almost everything EXCEPT the prepared statements with the string-only method, and this is why I am so interested in the actual gain by prepared statements. I agree there is gain, I just wonder if it is sufficiently large to add the extra complexity (I'd expect that the far majority can be achieved by batching inside a single transaction, but even then, with 1000 insert statements you need to re-create the execution plan 1000 times...). It also boils down to what is wiser: start with a new DB abstraction or start with trying to make the queue support batches. Rainer > thoughts? 
> > David Lang > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com From david at lang.hm Mon Apr 20 17:54:47 2009 From: david at lang.hm (david at lang.hm) Date: Mon, 20 Apr 2009 08:54:47 -0700 (PDT) Subject: [rsyslog] RFC: On rsyslog output modules and support for batchoperations In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702AF22@GRFEXC.intern.adiscon.com> References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch><200904021721.29880.Luis.Fernando.Munoz.Mejias@cern.ch><9B6E2A8877C38245BFB15CC491A11DA702AEEB@GRFEXC.intern.adiscon.com><200904171713.14284.Luis.Fernando.Munoz.Mejias@cern.ch> <9B6E2A8877C38245BFB15CC491A11DA702AF22@GRFEXC.intern.adiscon.com> Message-ID: On Mon, 20 Apr 2009, Rainer Gerhards wrote: > Sorry for the silence, been thinking quite a bit ;) no problem >> -----Original Message----- >> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- >> bounces at lists.adiscon.com] On Behalf Of david at lang.hm >> Sent: Saturday, April 18, 2009 1:29 AM >> To: rsyslog-users >> Subject: Re: [rsyslog] RFC: On rsyslog output modules and support for >> batchoperations >> > > > ... 
> >> as a strawman, and thinking of databases in general (not any particular >> database), I see the needs for the database interface as being able to >> be >> generisized down to a set of config variables something like >> >> DBtype >> value: one of "oracle|postgres|mysql|libdbi" >> purpose: determine which low-level communication library is used to >> talk >> to the DB >> >> >> DBinit >> value: string >> purpose: any initialization that needs to be done when first >> connecting >> to the database (sanity checks to make sure the DB has the correct >> schema, >> initializing sql functions, authentication, etc) >> >> DBstart >> value: string >> purpose: fixed text ahead of any message content >> >> DBjoin >> value: string >> purpose: fixed text used to join two messages togeather >> >> DBend >> value: string >> purpose: fixed text used to end a message to the server >> >> DBmessage >> value: rsyslog template >> purpose: format an individual message for the database >> >> >> examples >> >> example 1 existing single-message handling >> >> DBinit="" >> DBstart="" >> DBjoin="" >> DBend="" >> DBmessage="insert into table logs values >> ('$server','$timestamp','$msg');" >> >> resulting statement >> >> insert into table logs values ('server1','$timestamp',$'msg'); >> >> example 2 prepared statement >> >> DBinit="" >> DBstart="" >> DBjoin="\n" >> DBend="begin; execute_many (insert_statement, @batch); commit" >> DBmessage="push (@batch, '$item');" >> >> resulting statement >> >> push (@batch, 'item1'); >> push (@batch, 'item2'); >> push (@batch, 'item3'); >> push (@batch, 'item4'); >> begin; execute_many (insert_statement, @batch); commit > > > There is a problem with this example - and that is that each database > provides its own API for prepared statements. Brief look tells the are quite > similar (good!) but it also tells that you can not work with "just strings" > (bad!). So the approach involves more than just crafting the right strings. 
> > Prepared statements, however, are quite useful (not only from a performance > perspective), so it would definitely be a plus to have them. every database that I have seen (including Oracle) has had the ability to create prepared statements and stored procedures from the text-based database tool, so I'm not understanding why working with 'just strings' isn't enough. could you explain more? also, where prepared statements are good, stored procedures are probably better. >> >> >> example 3 multiple inserts in one statment >> >> DBinit="" >> DBstart="insert into table logs values " >> DBjoin=", " >> DBend=";" >> DBmessage="('$server','$timestamp','$msg')" >> >> resulting statement >> >> insert into table logs values ('server1','time1','message1'), >> ('server2','time2','message2'), ('server3','time3','message3'), >> ('server4','time4','message4'); >> >> >> example 4 multiple inserts in one transaction >> >> DBinit="" >> DBstart="begin;\n" >> DBjoin="\n" >> DBend="\ncommit;" >> DBmessage="insert into table logs values >> ('$server','$time','$message'); " >> >> resulting statement >> >> begin; >> insert into table logs values ('server1','time1','message1'); >> insert into table logs values ('server2','time2','message2'); >> insert into table logs values ('server3','time3','message3'); >> insert into table logs values ('server4','time4','message4'); >> commit; >> >> >> I don't happen to know the syntax to define a stored procedure off the >> top >> of my head or I would give you an example of that (which would use the >> DBinit to define the stored procedure) > > I think this is quite similar to ordinary SQL, at least in those engines that > I used in the past (not sure about MySQL, which IMHO is not really a > full-blown SQL engine). agreed. 
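A minimal sketch of the string-only batching idea behind examples 3 and 4 above, written in Python purely for illustration. The function name `build_batch` and the record layout are assumptions, not part of rsyslog; the four parameters mirror the proposed DBstart, DBjoin, DBend and DBmessage settings. Python's `string.Template` happens to use the same `$name` placeholder style as the examples. No SQL escaping is done here, which a real output module would need.

```python
from string import Template


def build_batch(db_start, db_join, db_end, db_message, records):
    """Render every record with the per-message template, then wrap the batch.

    db_start / db_join / db_end are fixed strings, db_message is a template
    with $name placeholders (hypothetical stand-ins for rsyslog properties).
    """
    template = Template(db_message)
    rendered = [template.substitute(record) for record in records]
    return db_start + db_join.join(rendered) + db_end


records = [
    {"server": "server1", "timestamp": "time1", "msg": "message1"},
    {"server": "server2", "timestamp": "time2", "msg": "message2"},
]

# Example 3: several rows in one insert statement.
multi_row = build_batch(
    db_start="insert into table logs values ",
    db_join=", ",
    db_end=";",
    db_message="('$server','$timestamp','$msg')",
    records=records,
)

# Example 4: several insert statements in one transaction.
one_transaction = build_batch(
    db_start="begin;\n",
    db_join="\n",
    db_end="\ncommit;",
    db_message="insert into table logs values ('$server','$timestamp','$msg');",
    records=records,
)

print(multi_row)
print(one_transaction)
```

With a single record the join string is never emitted, so the same settings also cover the existing one-message-per-statement case.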
>> >> postgres has a 'copy' command, where you tell it that you are going to >> follow with many lines of content to insert (which is significantly >> faster >> than insert statements, even batched up) > > There are other optimizations possible. MySQL, for example, has a mode where > you permit the database to do inserts via a lazy writer, obviously at the > risk of either consistency or even reliability (don't get the details > together yet). Other engines, I guess, have other/additional optimizations. correct, but you could set options like that in the DBinit string. >> I believe that this 5-variable set can handle just about every >> variation >> in putting things in the database, and as such would allow the database >> drivers themselves to be greatly simplified. > > I think what this really boils down is the design of a DB "superdriver" which > provides core functionality we expect in many/most/at least the most > relevant{Oracle, Postgres, MSSQL?, MySql) engines and that provides a > minidriver layer, where the engine-specific functionality is actually linked > in. This can greatly reduce the effort required to write a DB driver and it > can also reduce the effort required to maintain currently existing drivers. > > But, still, there is some effort to do. Maybe we could achieve the same goal > with a set of macros, that would boild down to an elegance vs. effort > decision. > > Besides that, it is interesting to note that we can solve almost everything > EXCEPT the prepared statements with the string-only method, and this is why I > am so interested in the actual gain by prepared statements. I agree there is > gain, I just wonder if it is sufficiently large to add the extra complexity > (I'd expect that the far majority can be achieved by batching inside a single > transaction, but even then, with 1000 insert statements you need to re-create > the execution plan 1000 times...). 
>
> It also boils down to what is wiser: start with a new DB abstraction or start
> with trying to make the queue support batches.

definitely making the queue support batches ;-)

as I see it, that will benefit all output modules, not just the DB ones.
and you are the only person who can do the queue support while there are
others who can (and do) work on the DB modules themselves.

I would expect it to take a bit of 'discussion' between the different DB
folks for them to all agree on any new abstraction, so it's not something
that can be started immediately in any case.

David Lang

From rgerhards at hq.adiscon.com Mon Apr 20 18:10:52 2009
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Mon, 20 Apr 2009 18:10:52 +0200
Subject: [rsyslog] RFC: On rsyslog output modules and support for batchoperations
References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch><200904021721.29880.Luis.Fernando.Munoz.Mejias@cern.ch><9B6E2A8877C38245BFB15CC491A11DA702AEEB@GRFEXC.intern.adiscon.com><200904171713.14284.Luis.Fernando.Munoz.Mejias@cern.ch><9B6E2A8877C38245BFB15CC491A11DA702AF22@GRFEXC.intern.adiscon.com>
Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AF23@GRFEXC.intern.adiscon.com>

David,

> every database that I have seen (including Oracle) has had the ability to
> create prepared statements and stored procedures from the text-based
> database tool, so I'm not understanding why working with 'just strings'
> isn't enough. could you explain more?
>

While this is not really nice, let me ask a counter-question: how is this
done for example in Oracle? All I have seen while reviewing manuals was that
you need to call a series of APIs. Most importantly, usually one where you
specify buffer sizes - which is a real pain, given the fact that we do not
really want to have to use these jumbo buffers just because there is an
ultra-slim chance we may have one message per year that is that large (but
that's another issue, let's not get too distracted at this point).
[snip] > definantly making the queue support batches ;-) > > as I see it, that will benifit all output modules, not just the DB > ones. > and you are the only person who can do the queue support while there > are > others who can (and do) work on the DB modules themselves. > > I would expect it to take a bit of 'discussion' between the different > DB > folks for them to all agree on any new abstraction, no it's not > something > that can be started immediatly in any case. Unfortunately there are not that many *active* db folks. I guess n=2, me included ;) Anyhow, that doesn't mean it has priority. But the two issues, as I now see it, are entangled. For the queue optimization, I need a test environment and it better be a good one. File output is too fast to be a good one. Database output is perfect. So I would actually need to modify at least one db output to support the queue enhancements. Plus, that will actually tell me the fine print of enhancing the queue in the best possible way. I've started to look at the postgres module for that reason. Thinking over the situation, I then found out that what I am doing now is exactly the same thing, with exactly the same issues, that Luis Fernando does with Oracle. This cries for a generic approach, especially if it is not too much effort to generalize. The macro-approach goes into that direction: keep it simple, but don't go the full length of a minidriver model. 
Rainer

From david at lang.hm Mon Apr 20 18:27:34 2009
From: david at lang.hm (david at lang.hm)
Date: Mon, 20 Apr 2009 09:27:34 -0700 (PDT)
Subject: [rsyslog] RFC: On rsyslog output modules and support for batchoperations
In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702AF23@GRFEXC.intern.adiscon.com>
References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch><200904021721.29880.Luis.Fernando.Munoz.Mejias@cern.ch><9B6E2A8877C38245BFB15CC491A11DA702AEEB@GRFEXC.intern.adiscon.com><200904171713.14284.Luis.Fernando.Munoz.Mejias@cern.ch><9B6E2A8877C38245BFB15CC491A11DA702AF22@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702AF23@GRFEXC.intern.adiscon.com>
Message-ID:

On Mon, 20 Apr 2009, Rainer Gerhards wrote:

> David,
>> every database that I have seen (including Oracle) has had the ability to
>> create prepared statements and stored procedures from the text-based
>> database tool, so I'm not understanding why working with 'just strings'
>> isn't enough. could you explain more?
>>
>
> While this is not really nice, let me ask a counter-question: how is this
> done for example in Oracle? All I have seen while reviewing manuals was that
> you need to call a series of APIs. Most importantly, usually one where you
> specify buffer sizes - which is a real pain, given the fact that we do not
> really want to have to use these jumbo buffers just because there is an
> ultra-slim chance we may have one message per year that is that large (but
> that's another issue, let's not get too distracted at this point).

I'll ask the Oracle experts here at work.

David Lang

From rgerhards at hq.adiscon.com Mon Apr 20 18:31:56 2009
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Mon, 20 Apr 2009 18:31:56 +0200
Subject: [rsyslog] RFC: On rsyslog output modules and support forbatchoperations
References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch><200904021721.29880.Luis.Fernando.Munoz.Mejias@cern.ch><9B6E2A8877C38245BFB15CC491A11DA702AEEB@GRFEXC.intern.adiscon.com> <200904171713.14284.Luis.Fernando.Munoz.Mejias@cern.ch>
Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AF24@GRFEXC.intern.adiscon.com>

I just realized I never sent this thought...

> -----Original Message-----
> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-bounces at lists.adiscon.com] On Behalf Of Luis Fernando Muñoz Mejías
> Sent: Friday, April 17, 2009 5:13 PM
> To: rsyslog-users
> Subject: Re: [rsyslog] RFC: On rsyslog output modules and support forbatchoperations
>
> > For the case given above, I could still simply pass in a single - now
> > longer - string (that makes it so attractive for the other db
> > plugins). However, that does not work for the omoracle interface.
>
> For omoracle it's not good, indeed. Also, I don't think you want to
> maintain yet another way of passing messages to modules. IMHO, we have
> two orthogonal use cases:
>
> a) the module wants all messages one by one and is happy with it (all
> modules but omoracle).
>
> b) the module wants to handle the properties in big batches (omoracle).
>
> IMHO, this is flexible enough for new developers to choose between easy
> and fast.

Plus there is the question of compatibility. I don't like to change an
interface once it is introduced.
Granted, we have a small time frame now where we can model the new "vector interface" - because so far it is in devel only (and thus should not be considered immutable) and you are probably the only user. But on the other hand, having two different modes may also make sense: a) string IF, single entry b) string IF, multiple entry c) vector interface, single vector d) vector interface, multiple vectors If I'd start from scratch, a+c would obviously not be needed, as multiple includes n=1 (if well-crafte). But case a) is already in wide-spread use, no chance to undo that. b) would definitely be useful (just think about the file writer or TCP forwarding). So it probably is nice to have two options, well and consistent defined, rather than a set of three values that map {a,b,d}. At least this is my current school of thought... Rainer From rgerhards at hq.adiscon.com Mon Apr 20 18:33:15 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Mon, 20 Apr 2009 18:33:15 +0200 Subject: [rsyslog] RFC: On rsyslog output modules and support for batchoperations References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch><200904021721.29880.Luis.Fernando.Munoz.Mejias@cern.ch><9B6E2A8877C38245BFB15CC491A11DA702AEEB@GRFEXC.intern.adiscon.com><200904171713.14284.Luis.Fernando.Munoz.Mejias@cern.ch><9B6E2A8877C38245BFB15CC491A11DA702AF22@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AF23@GRFEXC.intern.adiscon.com> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AF25@GRFEXC.intern.adiscon.com> > -----Original Message----- > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > bounces at lists.adiscon.com] On Behalf Of david at lang.hm > Sent: Monday, April 20, 2009 6:28 PM > To: rsyslog-users > Subject: Re: [rsyslog] RFC: On rsyslog output modules and support for > batchoperations > > On Mon, 20 Apr 2009, Rainer Gerhards wrote: > > > David, > >> every database that I have seen (including Oracle) has had the > ability > >> to > >> create prepared 
statements and stored procedures from the text-based
> >> database tool, so I'm not understanding why working with 'just strings'
> >> isn't enough. could you explain more?
> >>
> >
> > While this is not really nice, let me ask a counter-question: how is this
> > done for example in Oracle? All I have seen while reviewing manuals was that
> > you need to call a series of APIs. Most importantly, usually one where you
> > specify buffer sizes - which is a real pain, given the fact that we do not
> > really want to have to use these jumbo buffers just because there is an
> > ultra-slim chance we may have one message per year that is that large (but
> > that's another issue, let's not get too distracted at this point).
>
> I'll ask the Oracle experts here at work.

Excellent, but let me re-phrase: if you have a PostgreSQL expert at hand,
that would be even more useful (I can do testing with PostgreSQL myself, but
do not have access to Oracle - I overlooked that tiny little restriction
when posting ;)).

Rainer

From rgerhards at hq.adiscon.com Mon Apr 20 18:57:02 2009
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Mon, 20 Apr 2009 18:57:02 +0200
Subject: [rsyslog] multi-message handling and databases
References:
Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AF28@GRFEXC.intern.adiscon.com>

David,

I start with some quick pointers. I think it makes sense to move the results
of this discussion into a document - or alternatively move it to the wiki, if
you (or others) find this useful. I have to admit that I am a bit skeptical
about the wiki, I guess mail is better for discussion here. But I wanted to
mention this option.
Now on to the meat:

> -----Original Message-----
> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-bounces at lists.adiscon.com] On Behalf Of david at lang.hm
> Sent: Saturday, April 18, 2009 12:29 AM
> To: rsyslog-users
> Subject: [rsyslog] multi-message handling and databases
>
> the company that I work for has decided to sponsor multi-message queue
> output capability, they have chosen to remain anonymous (I am posting from
> my personal account)
>
> there are two parts to this.
>
> 1. the interaction between the output module and the queue
>
> 2. the configuration of the output module for its interaction with the
> database
>
> for the first part (how the output module interacts with the queue), the
> criteria are that
>
> 1. it needs to be able to maintain guaranteed delivery (even in the face
> of crashes, assuming rsyslog is configured appropriately)
>
> 2. at low-volume times it must not wait for 'enough' messages to
> accumulate, messages should be processed with as little latency as
> possible
>
> to meet these criteria, what is being proposed is the following
>
> a configuration option to define the max number of messages to be
> processed at once.
>
> the output module goes through the following loop

This sentence covers much of the complexity of this change ;)

The "problem" is that it is the other way around. It is not the output module
that asks the queue engine for data, it is the queue engine that pushes data
to the output module. While this sounds like a simple change of positions, it
has greater implications.

... especially if you think about the data flow. At this point, it may make
sense to review the data flow. I have described it here:

http://www.rsyslog.com/Article350.phtml

Even if you don't listen to the presentation, the diagram is useful. In it,
you see there are n queues, with n being 1 + number of actions. The "1"-queue
is the main message queue.
So each message moves first into the main queue, is dequeued there (in the
push-way described above), run through the filter engine and then placed
into the relevant action queues.

So the new interface does not necessarily need to modify the main queue (but
there is much benefit in doing so). But it must change the way action queues
deliver messages. That, in turn, means that the new batch mode can only work
if the action is configured to use an actual queueing mode (not the default
"DIRECT" mode, where incoming messages are directly handed over to the action
processing without any actual in-memory buffering).

So the approach is probably to enhance the queue object (which drives both
the main and action queues) to support dequeueing of multiple messages at
once (which, as a side-effect, will also greatly reduce locking conflicts).
Under normal operations, this is relatively straightforward. It gets messy
when there is failure in the actions and it gets very complex if we think
about the various shutdown scenarios (not to mention disk assisted queues
actually running in DA mode). I have begun to look at these issues (part of
today's and over-the-weekend thinking ;)), but this will probably need some
more time to finally solve - plus some discussion, I guess...

>
> X=max_messages
>
> if (messages in queue)
>    mark that it is going to process the next X messages
>    grab the messages
>    format them for output
>    attempt to deliver the messages
>    if (messages delivered successfully)
>      mark messages in the queue as delivered
>      X=max_messages (reset X in case it was reduced due to delivery
>      errors)
>    else (delivering this batch failed, reset and try to deliver the
>    first half)

I think, in our previous discussion (mailing list archive), we concluded
that there is no value in re-trying with half of the batch.
> unmark the messages that it tried to deliver (putting them back > into the status where no delivery has been attempted) > X=int(# messages attempted / 2) > if (X=0) > unable to deliver a single message, do existing message error > process > > > > this approach is more complex than a simple 'wait for X messages, then > insert them all', but it has some significant advantages > > 1. no waiting for 'enough' things to happen before something gets > written > > 2. if you have one bad message, it will transmit all the good messages > before the bad one, then error out only on the bad one before picking > up > with the ones after the bad one. This needs to be specified. Again, I think our prior conclusion was that this would not make much sense. After all, if e.g. a SQL statement is invalid in the template, how should it recover? If the sql statement is correct, why should it eternally fail? Or should we drop a message if it fails after n attempts (OK, we can do that already ;)). Hard to do for non-transactional outputs. > > 3. nothing is marked as delivered before delivery is confirmed. 
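A minimal sketch of the quoted loop, in Python for illustration only. It shows the proposal as written, including the halving step that is questioned above, and is not how rsyslog's queue engine works (there the queue pushes data to the output). The names `deliver_with_halving` and `send_batch` are hypothetical, and resetting the batch size after a single undeliverable message is an assumption the pseudocode leaves open.

```python
def deliver_with_halving(queue, send_batch, max_messages):
    """Drain `queue` (a list) in batches, halving the batch size on failure.

    send_batch(batch) must return True on success and False on failure.
    Returns (delivered, failed) as two lists.
    """
    delivered, failed = [], []
    batch_size = max_messages
    while queue:
        batch = queue[:batch_size]      # peek only, nothing is dequeued yet
        if send_batch(batch):
            del queue[:len(batch)]      # marked as delivered only after success
            delivered.extend(batch)
            batch_size = max_messages   # reset in case it was reduced earlier
        elif len(batch) == 1:
            # A single message cannot be delivered: hand it to the error path.
            failed.append(queue.pop(0))
            batch_size = max_messages   # assumption: start over with full batches
        else:
            batch_size = len(batch) // 2  # retry with the first half
    return delivered, failed


# Message 7 is "bad": any batch containing it is rejected as a whole.
delivered, failed = deliver_with_halving(
    list(range(1, 11)), lambda batch: 7 not in batch, max_messages=4
)
print(delivered, failed)
```

In this run the good messages before and after the bad one are still delivered, which is the second advantage listed above; the cost is the extra failed attempts while the batch is being narrowed down.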
> > > > an example of how this would work > > max_messages=15 > > messages arrive 1/sec > > it takes 2+(# messages/2) seconds to process each message (in reality > the > time to insert things into a database is more like 10 + (# messages / > 100) > or even more drastic) > > with the traditional rsyslog output, this would require multiple output > threads to keep up (processing a single message takes 1.5 seconds with > messages arriving 1/sec) > > with the new approach and a cold start you would see > > message arrives (Q=1) at T=0 > om starts processing message a T=0 (expected to take 2.5) > message arrives (Q=2) at T=1 > message arrives (Q=3) at T=2 > om finishes processing message (Q=2) at T=2.5 > om starts processing 2 messages at T=2.5 (expected to take 3) > message arrives (Q=4) at T=3 > message arrives (Q=5) at T=4 > message arrives (Q=6) at T=5 > om finishes processing 2 messages (Q=4) at T=5.5 > om starts processing 4 messages at T=5.5 (expected to take 4) > message arrives (Q=5) at T=6 > message arrives (Q=6) at T=7 > message arrives (Q=7) at T=8 > message arrives (Q=8) at T=9 > om finishes processing 4 messages (Q=4) at T=9.5 > om starts processing 4 messages at T=9.5 (expected to take 4) > > the system is now in a steady state > > message arrives (Q=5) at T=10 > message arrives (Q=6) at T=11 > message arrives (Q=7) at T=12 > message arrives (Q=8) at T=13 > om finishes processing 4 messages (Q=4) at T=13.5 > om starts processing 4 messages at T=13.5 (expected to take 4) > > if a burst of 10 extra messages arrived at time 13.5 this last item > would > become > > 11 messages arrive at (Q=14) at T=13.5 > om starts processing 14 messages at T=13.5 (expected to take 9) > message arrives (Q=15) at T=14 > message arrives (Q=16) at T=15 > message arrives (Q=17) at T=16 > message arrives (Q=18) at T=17 > message arrives (Q=19) at T=18 > message arrives (Q=20) at T=19 > message arrives (Q=21) at T=20 > message arrives (Q=22) at T=21 > message arrives (Q=23) at T=22 
> om finishes processing 14 messages (Q=9) at T=22.5 > om starts processing 9 messages at T=22.5 (expected to take 6.5) > > > > thoughts? > > David Lang > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com From david at lang.hm Mon Apr 20 18:59:30 2009 From: david at lang.hm (david at lang.hm) Date: Mon, 20 Apr 2009 09:59:30 -0700 (PDT) Subject: [rsyslog] RFC: On rsyslog output modules and support forbatchoperations In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702AF24@GRFEXC.intern.adiscon.com> References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch><200904021721.29880.Luis.Fernando.Munoz.Mejias@cern.ch><9B6E2A8877C38245BFB15CC491A11DA702AEEB@GRFEXC.intern.adiscon.com> <200904171713.14284.Luis.Fernando.Munoz.Mejias@cern.ch> <9B6E2A8877C38245BFB15CC491A11DA702AF24@GRFEXC.intern.adiscon.com> Message-ID: On Mon, 20 Apr 2009, Rainer Gerhards wrote: > I just realize I never sent this thought... > >> -----Original Message----- >> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- >> bounces at lists.adiscon.com] On Behalf Of Luis Fernando Mu?oz Mej?as >> Sent: Friday, April 17, 2009 5:13 PM >> To: rsyslog-users >> Subject: Re: [rsyslog] RFC: On rsyslog output modules and support >> forbatchoperations > > >>> For the case given above, I could still simply pass in a single - now >>> longer - string (that makes it that attractive for the other db >>> plugins). However, that does not work for the omoracle interface. >> >> For omoracle it's not good, indeed. Also, I don't think you want to >> maintain yet another way of passing messages to modules. IMHO, we have >> two orthogonal use cases: >> >> a) the module wants all messages one by one and is happy with it (all >> modules but omoracle). >> >> b) the module wants to handle the properties in big batches (omoracle). >> >> IMHO, this is flexible enough for new developers to choose between easy >> and fast. 
>
> Plus there is the question of compatibility. I don't like to change an
> interface once it is introduced. Granted, we have a small time frame now
> where we can model the new "vector interface" - because so far it is in devel
> only (and thus should not be considered immutable) and you are probably the
> only user. But on the other hand, having two different modes may also make
> sense:
>
> a) string IF, single entry
> b) string IF, multiple entry
> c) vector interface, single vector
> d) vector interface, multiple vectors
>
> If I'd start from scratch, a+c would obviously not be needed, as multiple
> includes n=1 (if well-crafted). But case a) is already in wide-spread use, no
> chance to undo that. b) would definitely be useful (just think about the file
> writer or TCP forwarding). So it probably is nice to have two options, well
> and consistently defined, rather than a set of three values that map {a,b,d}.
> At least this is my current school of thought...

are there any known external output modules for rsyslog?

if not it may make sense to do a+c (or a+d) with a being deprecated (not
used for anything new, replaced where in use as time allows)

while b could be useful, I think it's probably simpler to define either c
or d and have everything use that than to define an additional interface.
David Lang From rgerhards at hq.adiscon.com Mon Apr 20 19:02:37 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Mon, 20 Apr 2009 19:02:37 +0200 Subject: [rsyslog] RFC: On rsyslog output modules and support forbatchoperations References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch><200904021721.29880.Luis.Fernando.Munoz.Mejias@cern.ch><9B6E2A8877C38245BFB15CC491A11DA702AEEB@GRFEXC.intern.adiscon.com><200904171713.14284.Luis.Fernando.Munoz.Mejias@cern.ch><9B6E2A8877C38245BFB15CC491A11DA702AF24@GRFEXC.intern.adiscon.com> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AF29@GRFEXC.intern.adiscon.com> > -----Original Message----- > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > bounces at lists.adiscon.com] On Behalf Of david at lang.hm > Sent: Monday, April 20, 2009 7:00 PM > To: rsyslog-users > Subject: Re: [rsyslog] RFC: On rsyslog output modules and support > forbatchoperations > > On Mon, 20 Apr 2009, Rainer Gerhards wrote: > > > I just realize I never sent this thought... > > > >> -----Original Message----- > >> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > >> bounces at lists.adiscon.com] On Behalf Of Luis Fernando Mu?oz Mej?as > >> Sent: Friday, April 17, 2009 5:13 PM > >> To: rsyslog-users > >> Subject: Re: [rsyslog] RFC: On rsyslog output modules and support > >> forbatchoperations > > > > > >>> For the case given above, I could still simply pass in a single - > now > >>> longer - string (that makes it that attractive for the other db > >>> plugins). However, that does not work for the omoracle interface. > >> > >> For omoracle it's not good, indeed. Also, I don't think you want to > >> maintain yet another way of passing messages to modules. IMHO, we > have > >> two orthogonal use cases: > >> > >> a) the module wants all messages one by one and is happy with it > (all > >> modules but omoracle). > >> > >> b) the module wants to handle the properties in big batches > (omoracle). 
> >>
> >> IMHO, this is flexible enough for new developers to choose between easy
> >> and fast.
> >
> > Plus there is the question of compatibility. I don't like to change an
> > interface once it is introduced. Granted, we have a small time frame now
> > where we can model the new "vector interface" - because so far it is in devel
> > only (and thus should not be considered immutable) and you are probably the
> > only user. But on the other hand, having two different modes may also make
> > sense:
> >
> > a) string IF, single entry
> > b) string IF, multiple entry
> > c) vector interface, single vector
> > d) vector interface, multiple vectors
> >
> > If I'd start from scratch, a+c would obviously not be needed, as multiple
> > includes n=1 (if well-crafted). But case a) is already in wide-spread use, no
> > chance to undo that. b) would definitely be useful (just think about the file
> > writer or TCP forwarding). So it probably is nice to have two options, well
> > and consistently defined, rather than a set of three values that map {a,b,d}.
> > At least this is my current school of thought...
>
> are there any known external output modules for rsyslog?

At least one, at Frankfurt stock exchange (according to forum posts),
probably at least another one. I'd say there is sufficient probability we
need to think about compatibility.

>
> if not it may make sense to do a+c (or a+d) with a being deprecated (not
> used for anything new, replaced where in use as time allows)

We could do that, but it doesn't change that much in the next 10 years...

>
> while b could be useful, I think it's probably simpler to define either c
> or d and have everything use that than to define an additional interface.

But that complicates outputs, the file writer will definitely want b - why
force it to work on arrays and combine all the strings itself?
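To make the difference between variants b) and d) concrete, here is a small Python sketch under stated assumptions: the data layout, the space-separated line format and the function names are invented for illustration and do not reflect rsyslog's actual module interface. It shows the same two-message batch as a list of property vectors (d) and as one ready-made string (b), plus the rendering-and-joining step a file writer would have to do itself if only the vector form were offered.

```python
# (d) vector interface, multiple vectors: one list of property values per message.
vector_batch = [
    ["server1", "time1", "message1"],
    ["server2", "time2", "message2"],
]


def render_line(fields):
    """Hypothetical per-message template: space-separated fields, one per line."""
    return " ".join(fields) + "\n"


def vectors_to_string(batch):
    """(b) string interface, multiple entries: the whole batch as one string."""
    return "".join(render_line(fields) for fields in batch)


# What a file writer wants to hand to a single write() call.
string_batch = vectors_to_string(vector_batch)
print(string_batch, end="")
```

A database module that binds values to a prepared statement can use `vector_batch` directly, while a file or TCP output only needs `string_batch`.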
> > David Lang > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com From david at lang.hm Mon Apr 20 19:21:04 2009 From: david at lang.hm (david at lang.hm) Date: Mon, 20 Apr 2009 10:21:04 -0700 (PDT) Subject: [rsyslog] multi-message handling and databases In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702AF28@GRFEXC.intern.adiscon.com> References: <9B6E2A8877C38245BFB15CC491A11DA702AF28@GRFEXC.intern.adiscon.com> Message-ID: On Mon, 20 Apr 2009, Rainer Gerhards wrote: > David, > > I start with some quick pointers. I think it makes sense to move the results > of this discussion into a document - or alternatively move it to the wiki, if > you (or others) find this useful. I have to admit that I am a bit skeptic > about the wiki, I guess mail is better for discussion here. But I wanted to > mention this option. > > Now on to the meat: > >> -----Original Message----- >> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- >> bounces at lists.adiscon.com] On Behalf Of david at lang.hm >> Sent: Saturday, April 18, 2009 12:29 AM >> To: rsyslog-users >> Subject: [rsyslog] multi-message handling and databases >> >> the company that I work for has decided to sponser multi-message queue >> output capability, they have chosen to remain anonomous (I am posting >> from >> my personal account) >> >> there are two parts to this. >> >> 1. the interaction between the output module and the queue >> >> 2. the configuration of the output module for it's interaction with the >> database >> >> for the first part (how the output module interacts with the queue), >> the >> criteria are that >> >> 1. it needs to be able to maintain guarenteed delivery (even in the >> face >> of crashes, assuming rsyslog is configured appropriately) >> >> 2. 
at low-volume times it must not wait for 'enough' messages to >> accumulate, messages should be processed with as little latency as >> possible >> >> >> >> to meet these criteria, what is being proposed is the following >> >> a configuration option to define the max number of messages to be >> processed at once. >> >> the output module goes through the following loop > > This sentence covers much of the complexity of this change ;) > > The "problem" is that is it the other way around. It is not the output module > that asks the queue engine for data, it is the queue engine that pushes data > to the output module. While this sounds like a simple change of positions, it > has greater implications. > > ... especially if you think about the data flow. At this point, it may make > sense to review the data flow. I have described it here: > > http://www.rsyslog.com/Article350.phtml I will do this later today. > Even if you don't listen to the presentation, the diagram is useful. In it, > you see there are n queues, with n being 1 + number of actions. The "1"-queue > is the main message queue. So each message moves first into the main queue, > is dequeued there (in the push-way described above), run through the filter > engine and then placed into the relevant action queues. > > So the new interface does not necessarily need to modify the main queue (but > there is much benefit in doing so). But it must change the way action queues > deliver messages. That, in turn, means that the new batch mode can only work > if the action is configured to use any actual queueing mode (not the default > "DIRECT" mode, where incoming messages are directly handed over to the action > processing without any actual in-memory buffering). hmm, I suspect that having the 'direct' mode able to do this IFF (if and only if) all output modules are able to do the multi-message handling would be a win. 
specificly I expect to find that the locking process to deliver a single message is expensive enough that it's a big win even for the simple default case of writing to a file. I also expect to see wins for moving events from the main queue to the action queues. > So the approach is probably to enhance the queue object (which drives both > the main and action queues) to support dequeueing of multiple messages at > once (what, as a side-effect, will also greatly reduce looking conflicts). > Under normal operations, this is relatively straightforward. so far so good. > It gets messy when there is failure in the actions and it gets very complex > if we think about the various shutdown scenarios (not to mention disk > assisted queues actually running in DA mode). I have begin to look at these > issues (part of today's and over-the-weekend thinking ;)), but this will > probably need some more time to finally solve - plus some discussion, I > guess... would it simplify things significantly to say that the multi-message output and having multiple worker threads are exclusive? >> >> X=max_messages >> >> if (messages in queue) >> mark that it is going to process the next X messages >> grab the messages >> format them for output >> attempt to deliver the messages >> if (message delived sucessfully) >> mark messages in the queue as delivered >> X=max_messages (reset X in case it was reduced due to delivery >> errors) >> else (delivering this batch failed, reset and try to deliver the >> first half) > > I think, in our previous discussion (mailing list archive), we concluded that > there is no value in re-trying with half of the batch. very possibly, I'm not remembering it. not doing so will simplify the code considerably, but the advantages of retrying with half the batch are: 1. you deliver as much as you can 2. 
when you finally get stuck, you can pinpoint directly what message you were stuck on (in case you have a failure based on the data, say quotes in something that then gets formatted into a database, or slashes in something that becomes a filename component) your call >> unmark the messages that it tried to deliver (putting them back >> into the status where no delivery has been attempted) >> X=int(# messages attempted / 2) >> if (X=0) >> unable to deliver a single message, do existing message error >> process >> >> >> >> this approach is more complex than a simple 'wait for X messages, then >> insert them all', but it has some significant advantages >> >> 1. no waiting for 'enough' things to happen before something gets >> written >> >> 2. if you have one bad message, it will transmit all the good messages >> before the bad one, then error out only on the bad one before picking >> up >> with the ones after the bad one. > > This needs to be specified. Again, I think our prior conclusion was that this > would not make much sense. After all, if e.g. a SQL statement is invalid in > the template, how should it recover? If the sql statement is correct, why > should it eternally fail? Or should we drop a message if it fails after n > attempts (OK, we can do that already ;)). Hard to do for non-transactional > outputs. as noted above, I'm thinking in terms of the data in the particular log message being something that it shouldn't be, that causes problems for the output module for databases this could be quotes for file output with dynamic files you could get a hostname or program that has a slash (or ../../../../../../etc/shadow) in it. in theory these should all be detected by the module and scrubbed before being submitted, in practice bugs happen (especially if/when rsyslog starts dealing with unicode messages), being able to pinpoint 'this is the message that I was unable to deal with' is very helpful. 
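As a rough illustration of the retry loop quoted above, here is a minimal Python sketch. It is not rsyslog code: the queue is a plain list, `deliver` is a hypothetical callback that returns True when a whole batch was accepted, and "marking" messages is reduced to slicing the list. The `failed` list stands in for whatever the existing per-message error process would do.

```python
def deliver_with_halving(queue, deliver, max_messages):
    """Drain `queue` (oldest message first); return (delivered, failed)."""
    delivered, failed = [], []
    x = max_messages
    while queue:
        # Take at most x messages; never wait for the queue to fill up.
        batch = queue[:x]
        if deliver(batch):
            delivered.extend(batch)
            del queue[:len(batch)]      # mark the batch as delivered
            x = max_messages            # reset after earlier reductions
        elif len(batch) == 1:
            # Halving would give X == 0: a single message cannot be delivered.
            failed.append(queue.pop(0))
            x = max_messages
        else:
            # Leave the messages in the queue and retry with the first half.
            x = len(batch) // 2
    return delivered, failed


def demo():
    messages = ["m%d" % i for i in range(10)]
    messages[6] = "BAD"
    # Hypothetical output: rejects any batch containing the bad message.
    accept = lambda batch: "BAD" not in batch
    return deliver_with_halving(list(messages), accept, max_messages=4)
```

With one bad message in the queue, the good messages before it are delivered, the bad one is isolated, and delivery resumes with the ones after it, which is the behaviour argued for above.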
with a vector interface, another option would be to allow the output module to report back how many of the submitted messages it successfully delivered. that way any 'retry half' type logic could be in the module, and only if it makes sense. for a file output module, if you ran out of disk space partway through the write, it could report on the number that it successfully wrote.

as I said before, your call.

David Lang

From david at lang.hm Mon Apr 20 19:24:22 2009
From: david at lang.hm (david at lang.hm)
Date: Mon, 20 Apr 2009 10:24:22 -0700 (PDT)
Subject: [rsyslog] RFC: On rsyslog output modules and support forbatchoperations
In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702AF29@GRFEXC.intern.adiscon.com>
References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch><200904021721.29880.Luis.Fernando.Munoz.Mejias@cern.ch><9B6E2A8877C38245BFB15CC491A11DA702AEEB@GRFEXC.intern.adiscon.com><200904171713.14284.Luis.Fernando.Munoz.Mejias@cern.ch><9B6E2A8877C38245BFB15CC491A11DA702AF24@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702AF29@GRFEXC.intern.adiscon.com>
Message-ID:

On Mon, 20 Apr 2009, Rainer Gerhards wrote:

>> -----Original Message-----
>> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-bounces at lists.adiscon.com] On Behalf Of david at lang.hm
>>
>> On Mon, 20 Apr 2009, Rainer Gerhards wrote:
>>
>>> I just realized I never sent this thought...
>>>
>>>> -----Original Message-----
>>>> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-bounces at lists.adiscon.com] On Behalf Of Luis Fernando Muñoz Mejías
>>>> Sent: Friday, April 17, 2009 5:13 PM
>>>> To: rsyslog-users
>>>> Subject: Re: [rsyslog] RFC: On rsyslog output modules and support forbatchoperations
>>>
>>>>> For the case given above, I could still simply pass in a single - now
>>>>> longer - string (that makes it that attractive for the other db
>>>>> plugins). However, that does not work for the omoracle interface.
>>>> >>>> For omoracle it's not good, indeed. Also, I don't think you want to >>>> maintain yet another way of passing messages to modules. IMHO, we >> have >>>> two orthogonal use cases: >>>> >>>> a) the module wants all messages one by one and is happy with it >> (all >>>> modules but omoracle). >>>> >>>> b) the module wants to handle the properties in big batches >> (omoracle). >>>> >>>> IMHO, this is flexible enough for new developers to choose between >> easy >>>> and fast. >>> >>> Plus there is the question of compatibility. I don't like to change >> an >>> interface once it is introduced. Granted, we have a small time frame >> now >>> where we can model the new "vector interface" - because so far it is >> in devel >>> only (and thus should not be considered immutable) and you are >> probably the >>> only user. But on the other hand, having two different modes may also >> make >>> sense: >>> >>> a) string IF, single entry >>> b) string IF, multiple entry >>> c) vector interface, single vector >>> d) vector interface, multiple vectors >>> >>> If I'd start from scratch, a+c would obviously not be needed, as >> multiple >>> includes n=1 (if well-crafte). But case a) is already in wide-spread >> use, no >>> chance to undo that. b) would definitely be useful (just think about >> the file >>> writer or TCP forwarding). So it probably is nice to have two >> options, well >>> and consistent defined, rather than a set of three values that map >> {a,b,d}. >>> At least this is my current school of thought... >> >> are there any known external output modules for rsyslog? > > At least one, at Frankfurt stock exchange (as of forum posts), probably at > least another one. I'd say there is sufficient probability we need to think > abot compatibility. > >> >> if not it may make sense to do a+c (or a+d) with a being depriciated >> (not >> used for anything new, replaced where in use as time allows) > > We could do that, but it doesn?t change that much in the next 10 years... 
> >> >> while b could be useful, I think it's probably simpler to define either >> c >> or d and have everything use that than to define an additional >> interface. > > But that complicates outputs, the file writer will definitely want b - why > force it to work on arrays and combine all the string itself? if it allows you to eliminate an entire class of interface, that may be worth it. something needs to combine the strings, and (not knowing the code intimatly) I don't see a huge difference between doing it in one place vs the other. David Lang From rgerhards at hq.adiscon.com Mon Apr 20 19:42:12 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Mon, 20 Apr 2009 19:42:12 +0200 Subject: [rsyslog] RFC: On rsyslog output modules and support forbatchoperations References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch><200904021721.29880.Luis.Fernando.Munoz.Mejias@cern.ch><9B6E2A8877C38245BFB15CC491A11DA702AEEB@GRFEXC.intern.adiscon.com><200904171713.14284.Luis.Fernando.Munoz.Mejias@cern.ch><9B6E2A8877C38245BFB15CC491A11DA702AF24@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AF29@GRFEXC.intern.adiscon.com> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AF2A@GRFEXC.intern.adiscon.com> > -----Original Message----- > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > bounces at lists.adiscon.com] On Behalf Of david at lang.hm > Sent: Monday, April 20, 2009 7:24 PM > To: rsyslog-users > Subject: Re: [rsyslog] RFC: On rsyslog output modules and support > forbatchoperations > > On Mon, 20 Apr 2009, Rainer Gerhards wrote: > > >> -----Original Message----- > >> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > >> bounces at lists.adiscon.com] On Behalf Of david at lang.hm > >> > >> On Mon, 20 Apr 2009, Rainer Gerhards wrote: > >> > >>> I just realize I never sent this thought... 
> >>> > >>>> -----Original Message----- > >>>> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > >>>> bounces at lists.adiscon.com] On Behalf Of Luis Fernando Mu?oz Mej?as > >>>> Sent: Friday, April 17, 2009 5:13 PM > >>>> To: rsyslog-users > >>>> Subject: Re: [rsyslog] RFC: On rsyslog output modules and support > >>>> forbatchoperations > >>> > >>> > >>>>> For the case given above, I could still simply pass in a single - > >> now > >>>>> longer - string (that makes it that attractive for the other db > >>>>> plugins). However, that does not work for the omoracle interface. > >>>> > >>>> For omoracle it's not good, indeed. Also, I don't think you want > to > >>>> maintain yet another way of passing messages to modules. IMHO, we > >> have > >>>> two orthogonal use cases: > >>>> > >>>> a) the module wants all messages one by one and is happy with it > >> (all > >>>> modules but omoracle). > >>>> > >>>> b) the module wants to handle the properties in big batches > >> (omoracle). > >>>> > >>>> IMHO, this is flexible enough for new developers to choose between > >> easy > >>>> and fast. > >>> > >>> Plus there is the question of compatibility. I don't like to change > >> an > >>> interface once it is introduced. Granted, we have a small time > frame > >> now > >>> where we can model the new "vector interface" - because so far it > is > >> in devel > >>> only (and thus should not be considered immutable) and you are > >> probably the > >>> only user. But on the other hand, having two different modes may > also > >> make > >>> sense: > >>> > >>> a) string IF, single entry > >>> b) string IF, multiple entry > >>> c) vector interface, single vector > >>> d) vector interface, multiple vectors > >>> > >>> If I'd start from scratch, a+c would obviously not be needed, as > >> multiple > >>> includes n=1 (if well-crafte). But case a) is already in wide- > spread > >> use, no > >>> chance to undo that. 
b) would definitely be useful (just think about the file
> >>> writer or TCP forwarding). So it probably is nice to have two options, well
> >>> and consistently defined, rather than a set of three values that map {a,b,d}.
> >>> At least this is my current school of thought...
> >>
> >> are there any known external output modules for rsyslog?
> >
> > At least one, at Frankfurt stock exchange (as of forum posts), probably at
> > least another one. I'd say there is sufficient probability we need to think
> > about compatibility.
> >
> >> if not it may make sense to do a+c (or a+d) with a being deprecated (not
> >> used for anything new, replaced where in use as time allows)
> >
> > We could do that, but it doesn't change that much in the next 10 years...
> >
> >> while b could be useful, I think it's probably simpler to define either c
> >> or d and have everything use that than to define an additional
> >> interface.
> >
> > But that complicates outputs, the file writer will definitely want b - why
> > force it to work on arrays and combine all the strings itself?
>
> if it allows you to eliminate an entire class of interface, that may be
> worth it. something needs to combine the strings, and (not knowing the
> code intimately) I don't see a huge difference between doing it in one
> place vs the other.

It's code reuse - why make each output plugin that potentially could use this implement it on its own? Also note that it is "*one* place vs. potentially *many* others."

Also, it is a performance issue - if we have this interface, the core can generate one big string and everything is done. If we do not have it, the core needs to generate an argument array first and then the plugin must convert that array into a string. I'd say that both operations are roughly equally costly, so you save half of the cost by supporting that "big string type" of interface.
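To make the two paths concrete, here is a small Python sketch of the argument. It is purely illustrative and not how the rsyslog core is written: the function names, the field-name "template" and the space/newline separators are all assumptions. Path (b) has the core hand over one finished string; path (d) has the core build per-message vectors, after which every string-oriented plugin has to do the join itself.

```python
def render_fields(template, msg):
    # One rendered value per template field (the "vector" for one message).
    return [str(msg[name]) for name in template]


def core_batch_as_string(template, batch, sep=" "):
    # Option (b): the core produces one big string for the whole batch.
    return "".join(sep.join(render_fields(template, m)) + "\n" for m in batch)


def core_batch_as_vectors(template, batch):
    # Option (d): the core produces one vector per message.
    return [render_fields(template, m) for m in batch]


def plugin_join_vectors(vectors, sep=" "):
    # Extra step each string-oriented plugin (file writer, TCP forwarder)
    # would need if only the vector interface existed.
    return "".join(sep.join(v) + "\n" for v in vectors)


TEMPLATE = ["timestamp", "host", "msg"]
BATCH = [
    {"timestamp": "t1", "host": "h1", "msg": "m1"},
    {"timestamp": "t2", "host": "h2", "msg": "m2"},
]
```

Both paths end with the same bytes; the difference is only whether the second pass over the data happens once in the core or again in every plugin that wants a string.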
Finally, you need to keep in mind that it is very easy to do that kind of interface handling inside the core - I already do it for the vector interface. In general, the interface provides a pointer. What this pointer means is selected by a function call. See omstdout for an example; it is my test driver that implements hybrid interface support:

http://git.adiscon.com/?p=rsyslog.git;a=blob;f=plugins/omstdout/omstdout.c;h=181895a418dbbe01ac1d65dbac60fd159535e8fa;hb=refs/heads/nextmaster#l94

Line 94ff is the actual consumer and in line 143 it tells the rsyslog core what format it wants. Extending this by one more bit is really not a big deal (but, granted, this is an architecture question). [lines 181 to 191 check which modes are supported].

From aoz.syn at gmail.com Mon Apr 20 19:42:55 2009
From: aoz.syn at gmail.com (RB)
Date: Mon, 20 Apr 2009 11:42:55 -0600
Subject: [rsyslog] RFC: On rsyslog output modules and support for batchoperations
In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702AF25@GRFEXC.intern.adiscon.com>
References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch> <200904021721.29880.Luis.Fernando.Munoz.Mejias@cern.ch> <9B6E2A8877C38245BFB15CC491A11DA702AEEB@GRFEXC.intern.adiscon.com> <200904171713.14284.Luis.Fernando.Munoz.Mejias@cern.ch> <9B6E2A8877C38245BFB15CC491A11DA702AF22@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702AF23@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702AF25@GRFEXC.intern.adiscon.com>
Message-ID: <4255c2570904201042u3490c6a6pddb5573840656fd6@mail.gmail.com>

On Mon, Apr 20, 2009 at 10:33, Rainer Gerhards wrote:
> Excellent, but let me re-phrase: if you have a PostgreSQL expert at hand,
> that would be even more useful (I can do testing with PostgreSQL myself, but
> do not have access to Oracle - I overlooked that tiny little restriction when
> posting ;)).
Perhaps more importantly for implementation-specific bits, perhaps we could clarify which we're discussing: procedures, functions, or prepared statements? The thread seems to jump back & forth between stored procedures and prepared statements, and although they are similar and often have a pure-SQL interface, they are not implemented everywhere.

MySQL: PREPARE, CREATE PROCEDURE, CREATE FUNCTION
PostgreSQL: PREPARE, CREATE FUNCTION
Oracle: CREATE PROCEDURE, CREATE FUNCTION

So, for PostgreSQL, you'd do something like Luis' earlier post:

PREPARE rsyslog_insert(date, text) AS
    INSERT INTO foo VALUES($1, $2);
EXECUTE rsyslog_insert('20090420-06:00', 'log1');
EXECUTE rsyslog_insert('20090420-06:00', 'log2');
EXECUTE rsyslog_insert('20090420-06:00', 'log3');

From rgerhards at hq.adiscon.com Mon Apr 20 19:47:07 2009
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Mon, 20 Apr 2009 19:47:07 +0200
Subject: [rsyslog] RFC: On rsyslog output modules and support forbatchoperations
References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch><200904021721.29880.Luis.Fernando.Munoz.Mejias@cern.ch><9B6E2A8877C38245BFB15CC491A11DA702AEEB@GRFEXC.intern.adiscon.com><200904171713.14284.Luis.Fernando.Munoz.Mejias@cern.ch><9B6E2A8877C38245BFB15CC491A11DA702AF22@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AF23@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AF25@GRFEXC.intern.adiscon.com> <4255c2570904201042u3490c6a6pddb5573840656fd6@mail.gmail.com>
Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AF2B@GRFEXC.intern.adiscon.com>

> -----Original Message-----
> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-bounces at lists.adiscon.com] On Behalf Of RB
> Sent: Monday, April 20, 2009 7:43 PM
> To: rsyslog-users
> Subject: Re: [rsyslog] RFC: On rsyslog output modules and support forbatchoperations
>
> On Mon, Apr 20, 2009 at 10:33, Rainer Gerhards wrote:
> > Excellent, but let me re-phrase: if you have a PostgreSQL expert at hand,
> > that would be even more useful (I can do testing with PostgreSQL myself, but
> > do not have access to Oracle - I overlooked that tiny little restriction when
> > posting ;)).
>
> Perhaps more importantly for implementation-specific bits, perhaps we
> could clarify which we're discussing: procedures, functions, or
> prepared statements? The thread seems to jump back & forth between
> stored procedures and prepared statements, and although they are
> similar and often have a pure-SQL interface, they are not implemented
> everywhere.
>
> MySQL: PREPARE, CREATE PROCEDURE, CREATE FUNCTION
> PostgreSQL: PREPARE, CREATE FUNCTION
> Oracle: CREATE PROCEDURE, CREATE FUNCTION
>
> So, for PostgreSQL, you'd do something like Luis' earlier post:
>
> PREPARE rsyslog_insert(date, text) AS
>     INSERT INTO foo VALUES($1, $2);
> EXECUTE rsyslog_insert('20090420-06:00', 'log1');
> EXECUTE rsyslog_insert('20090420-06:00', 'log2');
> EXECUTE rsyslog_insert('20090420-06:00', 'log3');

The real issue, as I see it, is "string vs. API call". Your sample above is API-call, and this requires different ways of doing things. David is suggesting doing EVERYTHING via a single exec() API call, and that API receives a single string. The string then describes the different modes. I doubt that the single string approach actually works across databases (but may be wrong ;)).

Thus it is not so important whether it is a prepared statement or a stored procedure or whatever - I pick whatever seems useful to prove that it can't be done via a single API call with different strings. The most problematic part seems to be prepared statements, so we are slowly converging in that direction.

Hope that clarifies (else *please* give me a wake-up call).
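For readers who want to see the "string vs. API call" distinction in runnable form, here is a hedged Python sketch. It uses the standard-library sqlite3 module purely as a stand-in, since neither Oracle nor PostgreSQL can be assumed here; the table `foo`, its columns, and the helper names are made up for the illustration. The first function binds values through the driver API against a fixed statement, the second sends everything, values included, as one SQL text.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE foo(ts TEXT, msg TEXT)")
    return conn


def insert_via_api(conn, rows):
    # "API call" style: the statement text is fixed, values are bound
    # separately through the driver's parameter interface.
    conn.executemany("INSERT INTO foo(ts, msg) VALUES (?, ?)", rows)
    conn.commit()


def sql_quote(value):
    # Naive string-literal quoting (doubles single quotes); illustration only.
    return "'" + value.replace("'", "''") + "'"


def insert_via_string(conn, rows):
    # "Single string" style: one SQL text carries the transaction and all data.
    stmts = ["BEGIN;"]
    for ts, msg in rows:
        stmts.append("INSERT INTO foo(ts, msg) VALUES (%s, %s);"
                     % (sql_quote(ts), sql_quote(msg)))
    stmts.append("COMMIT;")
    conn.executescript("\n".join(stmts))


def dump(conn):
    return conn.execute("SELECT ts, msg FROM foo ORDER BY rowid").fetchall()


ROWS = [("2009-04-20 06:00", "log1"), ("2009-04-20 06:00", "it's log2")]
```

In the string variant the caller is responsible for quoting every value, which is exactly where message data containing quotes becomes a problem; in the API variant that is the driver's job, but the calls are driver-specific.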
Rainer > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com From rgerhards at hq.adiscon.com Mon Apr 20 19:57:55 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Mon, 20 Apr 2009 19:57:55 +0200 Subject: [rsyslog] multi-message handling and databases References: <9B6E2A8877C38245BFB15CC491A11DA702AF28@GRFEXC.intern.adiscon.com> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AF2D@GRFEXC.intern.adiscon.com> > -----Original Message----- > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > bounces at lists.adiscon.com] On Behalf Of david at lang.hm > Sent: Monday, April 20, 2009 7:21 PM > To: rsyslog-users > Subject: Re: [rsyslog] multi-message handling and databases > > On Mon, 20 Apr 2009, Rainer Gerhards wrote: > > > David, > > > > I start with some quick pointers. I think it makes sense to move the > results > > of this discussion into a document - or alternatively move it to the > wiki, if > > you (or others) find this useful. I have to admit that I am a bit > skeptic > > about the wiki, I guess mail is better for discussion here. But I > wanted to > > mention this option. > > > > Now on to the meat: > > > >> -----Original Message----- > >> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > >> bounces at lists.adiscon.com] On Behalf Of david at lang.hm > >> Sent: Saturday, April 18, 2009 12:29 AM > >> To: rsyslog-users > >> Subject: [rsyslog] multi-message handling and databases > >> > >> the company that I work for has decided to sponser multi-message > queue > >> output capability, they have chosen to remain anonomous (I am > posting > >> from > >> my personal account) > >> > >> there are two parts to this. > >> > >> 1. the interaction between the output module and the queue > >> > >> 2. 
the configuration of the output module for it's interaction with > the > >> database > >> > >> for the first part (how the output module interacts with the queue), > >> the > >> criteria are that > >> > >> 1. it needs to be able to maintain guarenteed delivery (even in the > >> face > >> of crashes, assuming rsyslog is configured appropriately) > >> > >> 2. at low-volume times it must not wait for 'enough' messages to > >> accumulate, messages should be processed with as little latency as > >> possible > >> > >> > >> > >> to meet these criteria, what is being proposed is the following > >> > >> a configuration option to define the max number of messages to be > >> processed at once. > >> > >> the output module goes through the following loop > > > > This sentence covers much of the complexity of this change ;) > > > > The "problem" is that is it the other way around. It is not the > output module > > that asks the queue engine for data, it is the queue engine that > pushes data > > to the output module. While this sounds like a simple change of > positions, it > > has greater implications. > > > > ... especially if you think about the data flow. At this point, it > may make > > sense to review the data flow. I have described it here: > > > > http://www.rsyslog.com/Article350.phtml > > I will do this later today. > > > Even if you don't listen to the presentation, the diagram is useful. > In it, > > you see there are n queues, with n being 1 + number of actions. The > "1"-queue > > is the main message queue. So each message moves first into the main > queue, > > is dequeued there (in the push-way described above), run through the > filter > > engine and then placed into the relevant action queues. > > > > So the new interface does not necessarily need to modify the main > queue (but > > there is much benefit in doing so). But it must change the way action > queues > > deliver messages. 
That, in turn, means that the new batch mode can only work
> > if the action is configured to use any actual queueing mode (not the default
> > "DIRECT" mode, where incoming messages are directly handed over to the action
> > processing without any actual in-memory buffering).
>
> hmm, I suspect that having the 'direct' mode able to do this IFF (if
> and only if) all output modules are able to do the multi-message handling
> would be a win.

You can't do that, because if it is in direct mode, there always is at most one message inside the queue. You cannot operate on the main message queue "batch", as this is not yet filtered, so you do not know which message is for which action. So, from the action perspective, nothing is queued at this point. Thus, you need a queue running in a real queue mode. I hope it will become clearer once you have looked at the data flow (otherwise I need to write some big overview about it...).

>
> specifically I expect to find that the locking process to deliver a single
> message is expensive enough

This is handled by the main queue batch. So even in direct mode, we have the benefit from the locking code improvement (I agree, potentially a *very big* gain). I guess you currently think of a single big queue inside rsyslog, which is the wrong picture. We have chained queues and you always need to look at which part of the message processing works on which queues. Very important implications!

> that it's a big win even for the simple
> default case of writing to a file. I also expect to see wins for moving
> events from the main queue to the action queues.

Yup, thus the direct mode of the action queue does not affect the main queue at all (and in direct mode we have no locking in the action queues, why should we ... nothing needs to be synchronized if you just stick the message into the output...)
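The chained-queue picture can be sketched in a few lines of Python. This is a toy model under stated assumptions, not rsyslog's queue object: there are no threads or locks, `Action`, `process_main_queue` and the `facility` filter are invented names, and "calling the output module" is modelled by appending a batch to a list. It shows why a direct-mode action only ever sees one message at a time, while an action with a real queue can be handed batches, even though the main queue is dequeued in batches in both cases.

```python
from collections import deque


class Action:
    def __init__(self, name, matches, direct, max_batch=1):
        self.name = name
        self.matches = matches        # filter: does this message belong here?
        self.direct = direct          # True models the default DIRECT mode
        self.max_batch = max_batch
        self.queue = deque()          # the action queue (unused when direct)
        self.calls = []               # batches handed to the output module

    def enqueue(self, msg):
        if self.direct:
            # Direct mode: handed over immediately, nothing is buffered,
            # so there is never more than one message to batch.
            self.calls.append([msg])
        else:
            self.queue.append(msg)

    def run_worker(self):
        # Queued mode: dequeue up to max_batch messages per output call.
        while self.queue:
            n = min(self.max_batch, len(self.queue))
            self.calls.append([self.queue.popleft() for _ in range(n)])


def process_main_queue(main_queue, actions, main_batch):
    # The main queue is dequeued in batches (fewer dequeue operations),
    # but messages are unfiltered until they pass through this loop.
    while main_queue:
        n = min(main_batch, len(main_queue))
        batch = [main_queue.popleft() for _ in range(n)]
        for msg in batch:
            for action in actions:
                if action.matches(msg):
                    action.enqueue(msg)


def demo():
    facilities = ["auth", "mail", "auth", "auth", "cron", "auth"]
    main = deque({"id": i, "facility": f} for i, f in enumerate(facilities))
    to_file = Action("file", lambda m: True, direct=True)
    to_db = Action("db", lambda m: m["facility"] == "auth",
                   direct=False, max_batch=3)
    process_main_queue(main, [to_file, to_db], main_batch=4)
    to_db.run_worker()
    return to_file, to_db
```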
> > > So the approach is probably to enhance the queue object (which drives > both > > the main and action queues) to support dequeueing of multiple > messages at > > once (what, as a side-effect, will also greatly reduce looking > conflicts). > > Under normal operations, this is relatively straightforward. > > so far so good. > > > It gets messy when there is failure in the actions and it gets very > complex > > if we think about the various shutdown scenarios (not to mention disk > > assisted queues actually running in DA mode). I have begin to look at > these > > issues (part of today's and over-the-weekend thinking ;)), but this > will > > probably need some more time to finally solve - plus some discussion, > I > > guess... > > would it simplify things significantly to say that the multi-message > output and having multiple worker threads are exclusive? Unlikely (but I don't like to totally outrule it, probability less than 5%) > > >> > >> X=max_messages > >> > >> if (messages in queue) > >> mark that it is going to process the next X messages > >> grab the messages > >> format them for output > >> attempt to deliver the messages > >> if (message delived sucessfully) > >> mark messages in the queue as delivered > >> X=max_messages (reset X in case it was reduced due to delivery > >> errors) > >> else (delivering this batch failed, reset and try to deliver the > >> first half) > > > > I think, in our previous discussion (mailing list archive), we > concluded that > > there is no value in re-trying with half of the batch. > > very possibly, I'm not remembering it. > > not doing so will simplify the code considerably, but the advantages of > retrying with half the batch are: > > 1. you deliver as much as you can > > 2. 
when you finally get stuck, you can pinpoint directly what message > you > were stuck on (in case you have a failure based on the data, say quotes > in > something that then gets formatted into a database, or slashes in > something that becomes a filename component) > > your call I need to refer you back to our previous discussion. Unfortunately, it was private. I dug the link out and sent it via private mail. Sorry all others, please stand by a little moment. If I have not read it wrong, it boiled down to we have no non-transactional sources that were problematic and we had not identified cases where it would be useful to retry with fewer elements. I'd provide a more complete description, but that would probably take me another 2...4 hours, and I hope to get around (yes, it was a reeeaaaly long discussion). David, if you like to quote anything from me, feel free to do so. > > >> unmark the messages that it tried to deliver (putting them back > >> into the status where no delivery has been attempted) > >> X=int(# messages attempted / 2) > >> if (X=0) > >> unable to deliver a single message, do existing message error > >> process > >> > >> > >> > >> this approach is more complex than a simple 'wait for X messages, > then > >> insert them all', but it has some significant advantages > >> > >> 1. no waiting for 'enough' things to happen before something gets > >> written > >> > >> 2. if you have one bad message, it will transmit all the good > messages > >> before the bad one, then error out only on the bad one before > picking > >> up > >> with the ones after the bad one. > > > > This needs to be specified. Again, I think our prior conclusion was > that this > > would not make much sense. After all, if e.g. a SQL statement is > invalid in > > the template, how should it recover? If the sql statement is correct, > why > > should it eternally fail? Or should we drop a message if it fails > after n > > attempts (OK, we can do that already ;)). 
Hard to do for non- > transactional > > outputs. > > as noted above, I'm thinking in terms of the data in the particular log > message being something that it shouldn't be, that causes problems for > the > output module > > for databases this could be quotes > > for file output with dynamic files you could get a hostname or program > that has a slash (or ../../../../../../etc/shadow) in it. > > in theory these should all be detected by the module and scrubbed > before > being submitted, in practice bugs happen (especially if/when rsyslog > starts dealing with unicode messages), being able to pinpoint 'this is > the > message that I was unable to deal with' is very helpful. > > with a vector interface, another option would be to allow the output > module to report back how many of the submitted messages it sucessfully > delivered. that way any 'retry half' type logic could be in the module, > and only if it makes sense. for a file output module, if you ran out of > disk space partway through the write, it could report on the number > that > it sucessfully wrote. > > as I said before, your call. Let's go through previous argument, first. 
We are re-iterating ;) Rainer > > David Lang > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com From aoz.syn at gmail.com Mon Apr 20 20:31:57 2009 From: aoz.syn at gmail.com (RB) Date: Mon, 20 Apr 2009 12:31:57 -0600 Subject: [rsyslog] RFC: On rsyslog output modules and support forbatchoperations In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702AF2B@GRFEXC.intern.adiscon.com> References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch> <200904171713.14284.Luis.Fernando.Munoz.Mejias@cern.ch> <9B6E2A8877C38245BFB15CC491A11DA702AF22@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702AF23@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702AF25@GRFEXC.intern.adiscon.com> <4255c2570904201042u3490c6a6pddb5573840656fd6@mail.gmail.com> <9B6E2A8877C38245BFB15CC491A11DA702AF2B@GRFEXC.intern.adiscon.com> Message-ID: <4255c2570904201131g336479f2yef9ebcdbd8f5c8e4@mail.gmail.com> On Mon, Apr 20, 2009 at 11:47, Rainer Gerhards wrote: > The real issue, as I see it, is "string vs. API call". Your sample above is > API-call, and this requires different ways of doing things. David is > suggesting doing EVERYTHING via a single exec() API call and that API > receives a single string. The string then describes the different modes. I > doubt that the single string approach actually works across databases (but > may be wrong ;)). Perhaps I'm confused - by API do you refer to database-specific libraries (like libpq) or the rsyslog-internal API between the core & database output modules? If the latter, I've completely misunderstood your questions. I was most specifically responding to this statement by you, which seemed supported in your later Oracle/PosgreSQL-specific questions: > There is a problem with this example - and that is that each database > provides its own API for prepared statements. 
Although this is true, most databases also have a pure-SQL interface for defining and calling [stored procedures|prepared statements] that can be used with a simple exec(SQL) call instead of the language-specific [C|C++|Java|Lisp] API calls. The tradeoff is (as usual) efficiency versus flexibility. From rgerhards at hq.adiscon.com Mon Apr 20 20:34:43 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Mon, 20 Apr 2009 20:34:43 +0200 Subject: [rsyslog] RFC: On rsyslog output modules and supportforbatchoperations References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch><200904171713.14284.Luis.Fernando.Munoz.Mejias@cern.ch><9B6E2A8877C38245BFB15CC491A11DA702AF22@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AF23@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AF25@GRFEXC.intern.adiscon.com><4255c2570904201042u3490c6a6pddb5573840656fd6@mail.gmail.com><9B6E2A8877C38245BFB15CC491A11DA702AF2B@GRFEXC.intern.adiscon.com> <4255c2570904201131g336479f2yef9ebcdbd8f5c8e4@mail.gmail.com> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AF2E@GRFEXC.intern.adiscon.com> > -----Original Message----- > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > bounces at lists.adiscon.com] On Behalf Of RB > Sent: Monday, April 20, 2009 8:32 PM > To: rsyslog-users > Subject: Re: [rsyslog] RFC: On rsyslog output modules and > supportforbatchoperations > > On Mon, Apr 20, 2009 at 11:47, Rainer Gerhards > wrote: > > The real issue, as I see it, is "string vs. API call". Your sample > above is > > API-call, and this requires different ways of doing things. David is > > suggesting doing EVERYTHING via a single exec() API call and that API > > receives a single string. The string then describes the different > modes. I > > doubt that the single string approach actually works across databases > (but > > may be wrong ;)). > > Perhaps I'm confused - by API do you refer to database-specific > libraries (like libpq) Yes! 
> or the rsyslog-internal API between the core & > database output modules? If the latter, I've completely misunderstood > your questions. > > I was most specifically responding to this statement by you, which > seemed supported in your later Oracle/PosgreSQL-specific questions: > > There is a problem with this example - and that is that each database > > provides its own API for prepared statements. > > Although this is true, most databases also have a pure-SQL interface > for defining and calling [stored procedures|prepared statements] that > can be used with a simple exec(SQL) call instead of the > language-specific [C|C++|Java|Lisp] API calls. The tradeoff is (as > usual) efficiency versus flexibility. That's what David, too, said. So if it is the case, the question remains how much overhead it costs. Plus, it would be useful to have a sample for PostgreSQL or MySQL, so that I can do some testing myself ;) Rainer > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com From david at lang.hm Mon Apr 20 21:01:10 2009 From: david at lang.hm (david at lang.hm) Date: Mon, 20 Apr 2009 12:01:10 -0700 (PDT) Subject: [rsyslog] RFC: On rsyslog output modules and support forbatchoperations In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702AF2B@GRFEXC.intern.adiscon.com> References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch><200904021721.29880.Luis.Fernando.Munoz.Mejias@cern.ch><9B6E2A8877C38245BFB15CC491A11DA702AEEB@GRFEXC.intern.adiscon.com><200904171713.14284.Luis.Fernando.Munoz.Mejias@cern.ch><9B6E2A8877C38245BFB15CC491A11DA702AF22@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AF23@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AF25@GRFEXC.intern.adiscon.com> <4255c2570904201042u3490c6a6pddb5573840656fd6@mail.gmail.com> <9B6E2A8877C38245BFB15CC491A11DA702AF2B@GRFEXC.intern.adiscon.com> Message-ID: On Mon, 20 Apr 2009, Rainer Gerhards 
wrote:

>> -----Original Message-----
>> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-bounces at lists.adiscon.com] On Behalf Of RB
>>
>> On Mon, Apr 20, 2009 at 10:33, Rainer Gerhards wrote:
>>> Excellent, but let me re-phrase: if you have a PostgreSQL expert at hand,
>>> that would be even more useful (I can do testing with PostgreSQL myself, but
>>> do not have access to Oracle - I overlooked that tiny little restriction when
>>> posting ;)).
>>
>> Perhaps more importantly for implementation-specific bits, perhaps we
>> could clarify which we're discussing: procedures, functions, or
>> prepared statements? The thread seems to jump back & forth between
>> stored procedures and prepared statements, and although they are
>> similar and often have a pure-SQL interface, they are not implemented
>> everywhere.
>>
>> MySQL: PREPARE, CREATE PROCEDURE, CREATE FUNCTION
>> PostgreSQL: PREPARE, CREATE FUNCTION
>> Oracle: CREATE PROCEDURE, CREATE FUNCTION
>>
>> So, for PostgreSQL, you'd do something like Luis' earlier post:
>>
>> PREPARE rsyslog_insert(date, text) AS
>> INSERT INTO foo VALUES($1, $2);
>> EXECUTE rsyslog_insert('20090420-06:00', 'log1');
>> EXECUTE rsyslog_insert('20090420-06:00', 'log2');
>> EXECUTE rsyslog_insert('20090420-06:00', 'log3');

so this would be

DBInit="PREPARE rsyslog_insert(date, text) AS\nINSERT INTO foo VALUES(\$1, \$2);"
DBStart = "begin;\n"
DBMid = ""
DBEnd = "end;"
DBItem = "EXECUTE rsyslog_insert('$timestamp','$msg');\n"

note that in DBInit you have to be careful about the $: escape them, use
single quotes, or otherwise make sure they are in the resulting string

> The real issue, as I see it, is "string vs. API call". Your sample above is
> API-call, and this requires different ways of doing things. David is
> suggesting doing EVERYTHING via a single exec() API call and that API
> receives a single string. The string then describes the different modes.
> I doubt that the single string approach actually works across databases (but
> may be wrong ;)).
>
> Thus it is not so important if it is prepared statement or stored procedure
> or whatever - and I pick whatever seems to be useful to prove that it can't
> be done via a single API call with different strings. The most problematic
> part seems to be prepared statements, thus we slowly converge into that
> direction.
>
> Hope that clarifies (else *please* give me a wake-up call).

what the different databases send out over the wire can look exactly the
same (and it _is_ a string). sometimes your programming API provides the
strings to an exec() API call, sometimes you make other API calls to create
the string.

so you could have

exec('PREPARE rsyslog_insert(date, text) AS\n INSERT INTO foo VALUES($1, $2);')

or you could have

create_prepared_statement(rsyslog_insert(date,text), 'INSERT INTO foo VALUES($1, $2);')

but what goes out over the wire is the same thing.

David Lang

From david at lang.hm Mon Apr 20 21:09:06 2009
From: david at lang.hm (david at lang.hm)
Date: Mon, 20 Apr 2009 12:09:06 -0700 (PDT)
Subject: [rsyslog] RFC: On rsyslog output modules and support forbatchoperations
In-Reply-To: <4255c2570904201131g336479f2yef9ebcdbd8f5c8e4@mail.gmail.com>
References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch>
 <200904171713.14284.Luis.Fernando.Munoz.Mejias@cern.ch>
 <9B6E2A8877C38245BFB15CC491A11DA702AF22@GRFEXC.intern.adiscon.com>
 <9B6E2A8877C38245BFB15CC491A11DA702AF23@GRFEXC.intern.adiscon.com>
 <9B6E2A8877C38245BFB15CC491A11DA702AF25@GRFEXC.intern.adiscon.com>
 <4255c2570904201042u3490c6a6pddb5573840656fd6@mail.gmail.com>
 <9B6E2A8877C38245BFB15CC491A11DA702AF2B@GRFEXC.intern.adiscon.com>
 <4255c2570904201131g336479f2yef9ebcdbd8f5c8e4@mail.gmail.com>
Message-ID:

On Mon, 20 Apr 2009, RB wrote:

> On Mon, Apr 20, 2009 at 11:47, Rainer Gerhards wrote:
>> The real issue, as I see it, is "string vs. API call".
>> Your sample above is API-call, and this requires different ways of doing
>> things. David is suggesting doing EVERYTHING via a single exec() API call
>> and that API receives a single string. The string then describes the
>> different modes. I doubt that the single string approach actually works
>> across databases (but may be wrong ;)).
>
> Perhaps I'm confused - by API do you refer to database-specific
> libraries (like libpq) or the rsyslog-internal API between the core &
> database output modules? If the latter, I've completely misunderstood
> your questions.

I believe that we are talking about the rsyslog to database interface here.

I think that there is confusion between the actual database interface and
the software API. for libpq, there are different ways of doing the same
thing: you can make a library call specific to the function you want to
perform, or you can craft a SQL statement and send it via an exec call.
someone sniffing the network between the two machines would not be able to
tell the difference between the two (unless they know the specific library
and recognise that it's being done slightly differently)

it's possible for a database to define a binary API in addition to the text
SQL API, but I'm not aware of it being used for normal software. Postgres
does not have a binary API like this (it gets talked about every once in a
while as a way to speed up huge tasks, but the normal prepared
statements/stored procedures are able to eliminate so much of the overhead
for normal use that there's never been enough of a need to implement one)

> I was most specifically responding to this statement by you, which
> seemed supported in your later Oracle/PostgreSQL-specific questions:
>> There is a problem with this example - and that is that each database
>> provides its own API for prepared statements.
>
> Although this is true, most databases also have a pure-SQL interface
> for defining and calling [stored procedures|prepared statements] that
> can be used with a simple exec(SQL) call instead of the
> language-specific [C|C++|Java|Lisp] API calls. The tradeoff is (as
> usual) efficiency versus flexibility.

is it really any more efficient to define a stored procedure or prepared
statement through the API than through the exec() call?

and even if it is, is this something that is done once per startup or
every command? if it's once per startup the complexity cost may not be
worth the small time savings.

David Lang

From rgerhards at hq.adiscon.com Mon Apr 20 21:11:46 2009
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Mon, 20 Apr 2009 21:11:46 +0200
Subject: [rsyslog] RFC: On rsyslog output modules and support forbatchoperations
Message-ID: <002201c9c1ec$1764dc30$100013ac@intern.adiscon.com>

Ok, I will see that I craft a statement for postgres tomorrow that will work
with the actual schema. That would provide me with a sufficient testbed.

rainer

----- Original Message -----
From: "david at lang.hm"
To: "rsyslog-users"
Sent: 20.04.09 21:01
Subject: Re: [rsyslog] RFC: On rsyslog output modules and support forbatchoperations

On Mon, 20 Apr 2009, Rainer Gerhards wrote:

>> -----Original Message-----
>> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-bounces at lists.adiscon.com] On Behalf Of RB
>>
>> On Mon, Apr 20, 2009 at 10:33, Rainer Gerhards wrote:
>>> Excellent, but let me re-phrase: if you have a PostgreSQL expert at hand,
>>> that would be even more useful (I can do testing with PostgreSQL myself, but
>>> do not have access to Oracle - I overlooked that tiny little restriction when
>>> posting ;)).
>>
>> Perhaps more importantly for implementation-specific bits, perhaps we
>> could clarify which we're discussing: procedures, functions, or
>> prepared statements?
The thread seems to jump back & forth between >> stored procedures and prepared statements, and although they are >> similar and often have a pure-SQL interface, they are not implemented >> everywhere. >> >> MySQL: PREPARE, CREATE PROCEDURE, CREATE FUNCTION >> PostgreSQL: PREPARE, CREATE FUNCTION >> Oracle: CREATE PROCEDURE, CREATE FUNCTION >> >> So, for PosgreSQL, you'd do something like Luis' earlier post: >> >> PREPARE rsyslog_insert(date, text) AS >> INSERT INTO foo VALUES($1, $2); >> EXECUTE rsyslog_insert('20090420-06:00', "log1"); >> EXECUTE rsyslog_insert('20090420-06:00', "log2"); >> EXECUTE rsyslog_insert('20090420-06:00', "log3"); so this would be DBInit="PREPARE rsyslog_insert(date, text) AS\nINSERT INTO foo VALUES(\$1, \$2);" DBStart = "begin\n" DBMid = "" DBEnd = "end" DBItem = "EXECUTE rsyslog_insert('$timestamp','$msg');\n" note that in DBInit you have to be careful about the $, escape them, use single quotes, or otherwise make sure they are in the resulting string > The real issue, as I see it, is "string vs. API call". Your sample above is > API-call, and this requires different ways of doing things. David is > suggesting doing EVERYTHING via a single exec() API call and that API > receives a single string. The string then describes the different modes. I > doubt that the single string approach actually works across databases (but > may be wrong ;)). > > Thus it is not so important if it is prepared statement or stored procedure > or whatever - and I pick whatever seems to be useful to prove that it can't > be done via a single API call with different strings. The most problematic > part seems to be prepared statements, thus we slowly converge into that > direction. > > Hope that clarifies (else *please* give me a wakup-call). 
what the different databases send out over the wire can look exactly the same (and it _is_ a string), sometimes your programming API provides the strings to a exec() API call, sometimes you make other API calls to create the string. so you could have exec('PREPARE rsyslog_insert(date, text) AS\n INSERT INTO foo VALUES($1, $2);') or you could have create_prepared_statement(rsyslog_insert(date,text), 'INSERT INTO foo VALUES($1, $2);') but what goes out over the wire is the same thing. David Lang _______________________________________________ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com From david at lang.hm Mon Apr 20 21:22:12 2009 From: david at lang.hm (david at lang.hm) Date: Mon, 20 Apr 2009 12:22:12 -0700 (PDT) Subject: [rsyslog] RFC: On rsyslog output modules and support for batchoperations In-Reply-To: <4255c2570904201042u3490c6a6pddb5573840656fd6@mail.gmail.com> References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch> <200904021721.29880.Luis.Fernando.Munoz.Mejias@cern.ch> <9B6E2A8877C38245BFB15CC491A11DA702AEEB@GRFEXC.intern.adiscon.com> <200904171713.14284.Luis.Fernando.Munoz.Mejias@cern.ch> <9B6E2A8877C38245BFB15CC491A11DA702AF22@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702AF23@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702AF25@GRFEXC.intern.adiscon.com> <4255c2570904201042u3490c6a6pddb5573840656fd6@mail.gmail.com> Message-ID: On Mon, 20 Apr 2009, RB wrote: > On Mon, Apr 20, 2009 at 10:33, Rainer Gerhards wrote: >> Excellent, but let me re-phrase: if you have a PostgreSQL expert at hand, >> that would be even more useful (I can do testing with PostgreSQL myself, but >> do not have access to Oracle - I overlooked that tiny little restriction when >> posting ;)). > > Perhaps more importantly for implementation-specific bits, perhaps we > could clarify which we're discussing: procedures, functions, or > prepared statements? 
> The thread seems to jump back & forth between
> stored procedures and prepared statements, and although they are
> similar and often have a pure-SQL interface, they are not implemented
> everywhere.
>
> MySQL: PREPARE, CREATE PROCEDURE, CREATE FUNCTION
> PostgreSQL: PREPARE, CREATE FUNCTION
> Oracle: CREATE PROCEDURE, CREATE FUNCTION
>
> So, for PostgreSQL, you'd do something like Luis' earlier post:
>
> PREPARE rsyslog_insert(date, text) AS
> INSERT INTO foo VALUES($1, $2);
> EXECUTE rsyslog_insert('20090420-06:00', 'log1');
> EXECUTE rsyslog_insert('20090420-06:00', 'log2');
> EXECUTE rsyslog_insert('20090420-06:00', 'log3');

by the way, this is not necessarily the most efficient way to get data into
postgres (although it's far more efficient than independent insert
statements like we do today).

you can do begin;insert;insert;end

you can do insert values (),(),(),(),()

you can do prepare;execute;execute

you can do a procedure or function that will insert the data into a
different table based on the time (think of it as dynafiles for databases)

you can do copy (and can probably combine this with prepared statements,
procedures, and functions)

which one is the most efficient depends on a lot of things (what
permissions you give the rsyslog user, how the database is set up, etc.)

it would be best if we can avoid coding one specific option into rsyslog

David Lang

From aoz.syn at gmail.com Mon Apr 20 21:16:19 2009
From: aoz.syn at gmail.com (RB)
Date: Mon, 20 Apr 2009 13:16:19 -0600
Subject: [rsyslog] RFC: On rsyslog output modules and support forbatchoperations
In-Reply-To:
References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch>
 <9B6E2A8877C38245BFB15CC491A11DA702AF22@GRFEXC.intern.adiscon.com>
 <9B6E2A8877C38245BFB15CC491A11DA702AF23@GRFEXC.intern.adiscon.com>
 <9B6E2A8877C38245BFB15CC491A11DA702AF25@GRFEXC.intern.adiscon.com>
 <4255c2570904201042u3490c6a6pddb5573840656fd6@mail.gmail.com>
 <9B6E2A8877C38245BFB15CC491A11DA702AF2B@GRFEXC.intern.adiscon.com>
 <4255c2570904201131g336479f2yef9ebcdbd8f5c8e4@mail.gmail.com>
Message-ID: <4255c2570904201216n29045054he250c59793516e3@mail.gmail.com>

On Mon, Apr 20, 2009 at 13:09, wrote:
> is it really any more efficient to define a stored procedure or prepared
> statement through the API than through the exec() call?
>
> and even if it is, is this something that is done once per startup or
> every command? if it's once per startup the complexity cost may not be
> worth the small time savings.

I don't have numbers on the overhead bit. There are application notes
for MySQL
(http://dev.mysql.com/doc/refman/5.1/en/sql-syntax-prepared-statements.html,
paragraph 2) that note that the SQL interface is not as efficient as
their binary protocol, but they give no justification. I won't argue
whether binary protocols are faster, but I agree with the assertion that
the gain may not be sufficiently significant in this use case.

From david at lang.hm Mon Apr 20 21:43:01 2009
From: david at lang.hm (david at lang.hm)
Date: Mon, 20 Apr 2009 12:43:01 -0700 (PDT)
Subject: [rsyslog] multi-message handling and databases
In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702AF2D@GRFEXC.intern.adiscon.com>
References: <9B6E2A8877C38245BFB15CC491A11DA702AF28@GRFEXC.intern.adiscon.com>
 <9B6E2A8877C38245BFB15CC491A11DA702AF2D@GRFEXC.intern.adiscon.com>
Message-ID:

On Mon, 20 Apr 2009, Rainer Gerhards wrote:

>> -----Original Message-----
>> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-bounces at lists.adiscon.com] On Behalf Of david at lang.hm
>>
>> On Mon, 20 Apr 2009, Rainer Gerhards wrote:
>>
>>> David,
>>>
>>> I start with some quick pointers. I think it makes sense to move the results
>>> of this discussion into a document - or alternatively move it to the wiki, if
>>> you (or others) find this useful. I have to admit that I am a bit skeptical
>>> about the wiki, I guess mail is better for discussion here. But I wanted to
>>> mention this option.
>>>
>>> Now on to the meat:
>>>
>>>> -----Original Message-----
>>>> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-bounces at lists.adiscon.com] On Behalf Of david at lang.hm
>>
>>
>> hmm, I suspect that having the 'direct' mode able to do this IFF (if
>> and only if) all output modules are able to do the multi-message handling
>> would be a win.
>
> You can't do that, because if it is in direct mode, there always is at most
> one message inside the queue. You can not operate on the main message queue
> "batch", as this is not yet filtered, so you do not know which message is for
> which action. So, from the action perspective, nothing is queued at this
> point. Thus, you need a queue running in a real queue mode. I hope it will
> become more clear if you have looked at the data flow (otherwise I need to
> write some big overview about it...).

I had not thought about the filtering issue

>>
>> specifically I expect to find that the locking process to deliver a single
>> message is expensive enough
>
> This is handled by the main queue batch. So even in direct mode, we have the
> benefit from the locking code improvement (I agree, potentially a *very big*
> gain). I guess you currently think of a single big queue inside rsyslog,
> which is the wrong picture. We have chained queues and you always need to
> look which part of the message processing works on which queues. Very
> important implications!

this is a big difference. yes, I was thinking that there was one big queue
(unless you defined action queues explicitly). I'll pay very careful
attention to the tutorial and let you know if it explains this.

>> that it's a big win even for the simple
>> default case of writing to a file. I also expect to see wins for moving
>> events from the main queue to the action queues.
>
> Yup, thus the direct mode of the action queue does not affect the main queue
> at all (and in direct mode we have no locking in the action queues, why should
> we ...
> nothing needs to be synchronized if you just stick the message into
> the output...)

and if you have multiple output threads?

>>> It gets messy when there is failure in the actions and it gets very complex
>>> if we think about the various shutdown scenarios (not to mention disk
>>> assisted queues actually running in DA mode). I have begun to look at these
>>> issues (part of today's and over-the-weekend thinking ;)), but this will
>>> probably need some more time to finally solve - plus some discussion, I
>>> guess...
>>
>> would it simplify things significantly to say that the multi-message
>> output and having multiple worker threads are exclusive?
>
> Unlikely (but I don't like to totally rule it out, probability less than 5%)

Ok, not an issue then

>>
>>>>
>>>> X=max_messages
>>>>
>>>> if (messages in queue)
>>>>   mark that it is going to process the next X messages
>>>>   grab the messages
>>>>   format them for output
>>>>   attempt to deliver the messages
>>>>   if (messages delivered successfully)
>>>>     mark messages in the queue as delivered
>>>>     X=max_messages (reset X in case it was reduced due to delivery errors)
>>>>   else (delivering this batch failed, reset and try to deliver the first half)
>>>
>>> I think, in our previous discussion (mailing list archive), we concluded that
>>> there is no value in re-trying with half of the batch.
>>
>> very possibly, I'm not remembering it.
>>
>> not doing so will simplify the code considerably, but the advantages of
>> retrying with half the batch are:
>>
>> 1. you deliver as much as you can
>>
>> 2. when you finally get stuck, you can pinpoint directly what message you
>> were stuck on (in case you have a failure based on the data, say quotes in
>> something that then gets formatted into a database, or slashes in
>> something that becomes a filename component)
>>
>> your call
>
> I need to refer you back to our previous discussion. Unfortunately, it was
> private.
> I dug the link out and sent it via private mail. Sorry all others,
> please stand by a little moment. If I have not read it wrong, it boiled down
> to we have no non-transactional sources that were problematic and we had not
> identified cases where it would be useful to retry with fewer elements.
>
> I'd provide a more complete description, but that would probably take me
> another 2...4 hours, and I hope to get around (yes, it was a reeeaaaly long
> discussion). David, if you like to quote anything from me, feel free to do
> so.

I'll dig through this today and tonight and review this.

to be clear, I'm mostly concerned about the debugging/troubleshooting
issues (which one of these 1000 messages made the database complain?),
but I guess this can be addressed by stopping rsyslog and restarting it
with a smaller batch size until you track it down. it should be rare
enough to make that tolerable.

David Lang

From feikong0119 at 163.com Tue Apr 21 02:37:51 2009
From: feikong0119 at 163.com (feikong0119)
Date: Tue, 21 Apr 2009 08:37:51 +0800 (CST)
Subject: [rsyslog] problem-help
Message-ID: <22681448.34131240274271109.JavaMail.coremail@bj163app119.163.com>

Hello,

I am using rsyslog 2.0.6, which you developed, and I have a problem that I
need your help with.

I want to save a free-format log, for example:

[event time] [event id] [event type] [event source]
<30> Feb 12 17:05:15 dhclient: Feb 12 17:05:15.xxx(ms) 1152 0 radar

This data needs to be inserted into a database.

Regarding the free format: in [rsyslog-2.0.6/doc/property_replacer.html]
there is a string "STRUCTURED-DATA". I don't know how to use it in
rsyslog.conf. Could you give me an example?

Thanks,
Kyou-siryu.
From david at lang.hm Tue Apr 21 03:59:21 2009
From: david at lang.hm (david at lang.hm)
Date: Mon, 20 Apr 2009 18:59:21 -0700 (PDT)
Subject: [rsyslog] multi-message handling and databases
In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702AF2D@GRFEXC.intern.adiscon.com>
References: <9B6E2A8877C38245BFB15CC491A11DA702AF28@GRFEXC.intern.adiscon.com>
 <9B6E2A8877C38245BFB15CC491A11DA702AF2D@GRFEXC.intern.adiscon.com>
Message-ID:

On Mon, 20 Apr 2009, Rainer Gerhards wrote:

>>> ... especially if you think about the data flow. At this point, it may make
>>> sense to review the data flow. I have described it here:
>>>
>>> http://www.rsyslog.com/Article350.phtml

one problem with this diagram (and therefore the explanation that goes with
it) is that it confuses logical steps with objects with threads.

I would suggest redrawing the diagram with boxes to delineate thread
boundaries (if I understand this correctly, this would mean splitting the
pre-processing box up into three, each of which would be in the same
'thread' box as the input module).

It is also not as clear as it could be what happens when you don't define
action queues. I would suggest a separate diagram just showing that
situation (if I understand it correctly, you could show the single-queue
operation, and then have one of the 'actions' be to queue the item into an
action queue for async processing).

also show where the message formatting takes place (again, including which
thread is doing the work). it is especially unclear what happens where when
you have the action queues (is the message formatted before it's put into
the action queue, or is the format of a message in the action queue exactly
the same as in the main queue?)

some of this was covered (to at least some extent) by your comments, but
this sort of thing is actually covered better with text. I'm not a
powerpoint person, but that sort of slideshow (with the text explanation)
is ideal for this sort of explanation.
David Lang From david at lang.hm Tue Apr 21 04:38:36 2009 From: david at lang.hm (david at lang.hm) Date: Mon, 20 Apr 2009 19:38:36 -0700 (PDT) Subject: [rsyslog] multi-message handling and databases In-Reply-To: References: <9B6E2A8877C38245BFB15CC491A11DA702AF28@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702AF2D@GRFEXC.intern.adiscon.com> Message-ID: On Mon, 20 Apr 2009, david at lang.hm wrote: > On Mon, 20 Apr 2009, Rainer Gerhards wrote: > >>> -----Original Message----- >>> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- >>> bounces at lists.adiscon.com] On Behalf Of david at lang.hm >>> >>> On Mon, 20 Apr 2009, Rainer Gerhards wrote: >>> >>>>> >>>>> X=max_messages >>>>> >>>>> if (messages in queue) >>>>> mark that it is going to process the next X messages >>>>> grab the messages >>>>> format them for output >>>>> attempt to deliver the messages >>>>> if (message delived sucessfully) >>>>> mark messages in the queue as delivered >>>>> X=max_messages (reset X in case it was reduced due to delivery >>>>> errors) >>>>> else (delivering this batch failed, reset and try to deliver the >>>>> first half) >>>> >>>> I think, in our previous discussion (mailing list archive), we >>> concluded that >>>> there is no value in re-trying with half of the batch. >>> >>> very possibly, I'm not remembering it. >>> >>> not doing so will simplify the code considerably, but the advantages of >>> retrying with half the batch are: >>> >>> 1. you deliver as much as you can >>> >>> 2. when you finally get stuck, you can pinpoint directly what message >>> you >>> were stuck on (in case you have a failure based on the data, say quotes >>> in >>> something that then gets formatted into a database, or slashes in >>> something that becomes a filename component) >>> >>> your call >> >> I need to refer you back to our previous discussion. Unfortunately, it was >> private. I dug the link out and sent it via private mail. 
>> Sorry all others, please stand by a little moment. If I have not read it
>> wrong, it boiled down to we have no non-transactional sources that were
>> problematic and we had not identified cases where it would be useful to
>> retry with fewer elements.
>>
>> I'd provide a more complete description, but that would probably take me
>> another 2...4 hours, and I hope to get around (yes, it was a reeeaaaly long
>> discussion). David, if you like to quote anything from me, feel free to do
>> so.
>
> I'll dig through this today and tonight and review this
>
> to be clear, I'm mostly concerned about the debugging/troubleshooting
> issues (which one of these 1000 messages made the database complain..).
> but I guess this can be addressed by stopping rsyslog and restarting it
> with a smaller batch size until you track it down. it should be rare
> enough to make that tolerable.

looking back over the thread, I think a quick summary is that you believe that most output modules would not be able to do anything useful if they get an error in the middle of a set of messages.

I'm not sure that I agree, but I think it's a fairly easy thing to add later if you are wrong. so if you think it's a significant win to drop that for now, go ahead.

my initial proposal was for the output module to return the number of records successfully written, so that those would not be retried. that definitely isn't possible with a string-based interface, so we then moved to 'retry with half' before the thread wound down.
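
The 'retry with half' idea summarized above can be sketched roughly as follows. This is only an illustration under stated assumptions, not rsyslog code: `deliver` is a hypothetical callable that returns True when a whole batch was written, and the choice to record and skip a message that fails on its own (instead of suspending the action) is an assumption made for the example.

```python
from typing import Callable, List, Sequence


def deliver_with_halving(
    messages: Sequence[str],
    deliver: Callable[[Sequence[str]], bool],  # hypothetical: True if the whole batch was written
    max_batch: int,
) -> List[str]:
    """Deliver messages in batches and halve the batch after a failure.

    Returns the messages that failed even when delivered on their own.
    """
    undeliverable: List[str] = []
    pos = 0
    size = max_batch
    while pos < len(messages):
        batch = messages[pos:pos + size]
        if deliver(batch):
            pos += len(batch)               # mark these messages as delivered
            size = max_batch                # reset in case it was reduced earlier
        elif len(batch) > 1:
            size = max(1, len(batch) // 2)  # retry with the first half only
        else:
            undeliverable.append(batch[0])  # pinpointed the offending message
            pos += 1                        # assumption: skip it and carry on
            size = max_batch
    return undeliverable
```

With a batch size of 1000 and one bad message, the loop needs about ten failed attempts to isolate it, which is the debugging benefit mentioned above; the cost is the extra bookkeeping in the core.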
David Lang From rgerhards at hq.adiscon.com Tue Apr 21 07:28:09 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Tue, 21 Apr 2009 07:28:09 +0200 Subject: [rsyslog] RFC: On rsyslog output modules and support for batchoperations References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch><200904021721.29880.Luis.Fernando.Munoz.Mejias@cern.ch><9B6E2A8877C38245BFB15CC491A11DA702AEEB@GRFEXC.intern.adiscon.com><200904171713.14284.Luis.Fernando.Munoz.Mejias@cern.ch><9B6E2A8877C38245BFB15CC491A11DA702AF22@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AF23@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AF25@GRFEXC.intern.adiscon.com><4255c2570904201042u3490c6a6pddb5573840656fd6@mail.gmail.com> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AF30@GRFEXC.intern.adiscon.com> > -----Original Message----- > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > bounces at lists.adiscon.com] On Behalf Of david at lang.hm > Sent: Monday, April 20, 2009 9:22 PM > To: rsyslog-users > Subject: Re: [rsyslog] RFC: On rsyslog output modules and support for > batchoperations > > On Mon, 20 Apr 2009, RB wrote: > > > On Mon, Apr 20, 2009 at 10:33, Rainer Gerhards > wrote: > >> Excellent, but let me re-phrase: if you have a PostgreSQL expert at > hand, > >> that would be even more useful (I can do testing with PostgreSQL > myself, but > >> do not have access to Oracle - I overlooked that tiny little > restriction when > >> posting ;)). > > > > Perhaps more importantly for implementation-specific bits, perhaps we > > could clarify which we're discussing: procedures, functions, or > > prepared statements? The thread seems to jump back & forth between > > stored procedures and prepared statements, and although they are > > similar and often have a pure-SQL interface, they are not implemented > > everywhere. 
> > > > MySQL: PREPARE, CREATE PROCEDURE, CREATE FUNCTION > > PostgreSQL: PREPARE, CREATE FUNCTION > > Oracle: CREATE PROCEDURE, CREATE FUNCTION > > > > So, for PosgreSQL, you'd do something like Luis' earlier post: > > > > PREPARE rsyslog_insert(date, text) AS > > INSERT INTO foo VALUES($1, $2); > > EXECUTE rsyslog_insert('20090420-06:00', "log1"); > > EXECUTE rsyslog_insert('20090420-06:00', "log2"); > > EXECUTE rsyslog_insert('20090420-06:00', "log3"); > > by the way, this is not nessasrily the most efficiant way to get data > into > postgres (although it's far more efficiant than independant insert > statements like we do today. > > you can do begin;insert;insert;end > > you can do insert values (),(),(),(),() > > you can do prepare;execute;execute > > you can do a procedure or function that will insert the data into a > different table based on the time (think of it as dynafiles for > databases) > > you can do copy (and can probably combine this with prepared > statements, > procedures, and functions) > > which one is the most efficiant one depends on a lot of things (what > permissions you give the rsyslog user, how the database is setup, etc) > > it would be best if we can avoid coding one specific option into > rsyslog > Definitely - I came down the database library API path because this is how omoracle is done AND I remember from past work that the API provides a lot of speed. But I am not a real database guy ;) If we can stay at a largely DB independent level with a simple API, I am all for that! 
Actually, it proves that the original design was not that bad at all ;)

Rainer

From rgerhards at hq.adiscon.com Tue Apr 21 07:31:24 2009
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Tue, 21 Apr 2009 07:31:24 +0200
Subject: [rsyslog] RFC: On rsyslog output modules and supportforbatchoperations
References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch><9B6E2A8877C38245BFB15CC491A11DA702AF22@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AF23@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AF25@GRFEXC.intern.adiscon.com><4255c2570904201042u3490c6a6pddb5573840656fd6@mail.gmail.com><9B6E2A8877C38245BFB15CC491A11DA702AF2B@GRFEXC.intern.adiscon.com><4255c2570904201131g336479f2yef9ebcdbd8f5c8e4@mail.gmail.com> <4255c2570904201216n29045054he250c59793516e3@mail.gmail.com>
Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AF31@GRFEXC.intern.adiscon.com>

> -----Original Message-----
> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-bounces at lists.adiscon.com] On Behalf Of RB
> Sent: Monday, April 20, 2009 9:16 PM
> To: rsyslog-users
> Subject: Re: [rsyslog] RFC: On rsyslog output modules and supportforbatchoperations
>
> On Mon, Apr 20, 2009 at 13:09, wrote:
> > is it really any more efficient to define a stored procedure or prepared
> > statement through the API than through the exec() call?
> >
> > and even if it is, is this something that is done once per startup or
> > every command? if it's once per startup the complexity cost may not be
> > worth the small time savings.
>
> I don't have numbers on the overhead bit, there are application notes
> for MySQL (http://dev.mysql.com/doc/refman/5.1/en/sql-syntax-prepared-statements.html,
> paragraph 2) that note that the SQL interface is not as efficient as
> their binary protocol, but give no justification. I won't argue
> whether binary protocols are faster, but agree with the assertion that
> the gain may not be sufficiently significant in this use case.
My main concern was that we could not do those things with a "string-only" calling interface. As I now know we can, I don't see any performance problems for most cases (it may be different in those rare cases where every cycle counts, but they should be very, very rare).

I think I can even formally prove that the overhead is not significant if the batch size is sufficiently large (> 500). Let me check the priorities, probably I'll do the proof.

Rainer

From rgerhards at hq.adiscon.com Tue Apr 21 07:41:48 2009
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Tue, 21 Apr 2009 07:41:48 +0200
Subject: [rsyslog] multi-message handling and databases
References: <9B6E2A8877C38245BFB15CC491A11DA702AF28@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AF2D@GRFEXC.intern.adiscon.com>
Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AF32@GRFEXC.intern.adiscon.com>

> -----Original Message-----
> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-bounces at lists.adiscon.com] On Behalf Of david at lang.hm
> Sent: Tuesday, April 21, 2009 3:59 AM
> To: rsyslog-users
> Subject: Re: [rsyslog] multi-message handling and databases
>
> On Mon, 20 Apr 2009, Rainer Gerhards wrote:
>
> >>> ... especially if you think about the data flow. At this point, it may make
> >>> sense to review the data flow. I have described it here:
> >>>
> >>> http://www.rsyslog.com/Article350.phtml
>
> one problem with this diagram (and therefore the explanation that goes with
> it) is that it confuses logical steps with objects with threads.

It was not designed as an in-depth threading description. It is about the data flow, not which thread does what. Looks like I need to create such a thing. But you can easily do this. The queue is the threading boundary. Everything on the left side is one thread, everything on the right side is another thread (except if the queue runs in direct mode, in which case it is the same thread to the left and the right).
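
The point about the queue being the threading boundary can be illustrated with a small toy model. It is a sketch only: `ToyQueue` and `consumer_thread_ids` are hypothetical names invented for this example and say nothing about how rsyslog implements its queues. In direct mode the consumer runs on the producer's thread; in queued mode a separate worker thread runs it.

```python
import queue
import threading
from typing import Callable, List

_STOP = object()  # sentinel that tells the worker thread to exit


class ToyQueue:
    """Toy stand-in for a queue; not rsyslog's real implementation."""

    def __init__(self, consumer: Callable[[str], None], direct: bool) -> None:
        self.consumer = consumer
        self.direct = direct
        if not direct:
            self._items: queue.Queue = queue.Queue()
            self._worker = threading.Thread(target=self._run)
            self._worker.start()

    def enqueue(self, msg: str) -> None:
        if self.direct:
            self.consumer(msg)      # direct mode: the producer's thread does the work
        else:
            self._items.put(msg)    # queued mode: hand over to the worker thread

    def _run(self) -> None:
        while True:
            item = self._items.get()
            if item is _STOP:
                break
            self.consumer(item)

    def close(self) -> None:
        if not self.direct:
            self._items.put(_STOP)
            self._worker.join()


def consumer_thread_ids(direct: bool) -> List[int]:
    """Return the thread ids on which the consumer ran for two messages."""
    seen: List[int] = []
    q = ToyQueue(lambda msg: seen.append(threading.get_ident()), direct)
    q.enqueue("first message")
    q.enqueue("second message")
    q.close()
    return seen
```

Calling `consumer_thread_ids(True)` yields the caller's own thread id twice, while `consumer_thread_ids(False)` yields the id of the worker thread, which is the "left side / right side" split described above.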
> > I would suggest redrawing the diagram with boxes to delinate thread > boundries (if I understand this correctly, this would mean splitting > the > pre-processing box up into three, each of which would be in the same > 'thread' box as the input module) Ok... I see where you are coming from. But is this split really useful? Each input runs on its own thread, so if we have n input, we need to split the Preprocessor box into n parts... > > It is also not as clear as it could be what happens when you don't > define > action queues. I would suggest a sepeate diagram just showing that > situation (if I understand it correctly, you could show the single- > queue > operation, and then have one of the 'actions' be to queue the item into > an > action queue for async processing. This is the core misunderstanding. You ALWAYS have action queues! You can not "not define action queues". What you can do is not define the operation mode of the action queue, in which case the default operation mode is used. That happens to be the non-queueing direct mode. But that doesn't at all change the picture. Eexcept, as noted above, that the queue no longer is a threading boundary. That, btw, is one reason I think threads do not belong into that diagram: they are not related to the data flow, they are just helper entities to implement it. > > also show where the message formatting take place (again, including > which > thread is doing the work). In the Action Postprocessor > it is esecially unclear what happens where > when > you have the action queues (is the message formatted before it's put > into > the action queue, or is the format of a message in the action queue > exactly the same as in the main queue?) > > some of this was covered (to at least some extent) by your comments, > but > this sort of thing is actually covered better with text. I'm not a > powerpoint person, but that sort of slideshow (with the text > explination) > is ideal for this sort of explination. 
Full ack, the tutorial is a "better than nothing" approach. Consolidating all doc available (did you read the queue doc? If not, you really should) is a major writing task, I'd think a couple of days (and reviews) to fully get it right. Rainer > > David Lang > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com From rgerhards at hq.adiscon.com Tue Apr 21 07:45:55 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Tue, 21 Apr 2009 07:45:55 +0200 Subject: [rsyslog] multi-message handling and databases References: <9B6E2A8877C38245BFB15CC491A11DA702AF28@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AF2D@GRFEXC.intern.adiscon.com> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AF33@GRFEXC.intern.adiscon.com> > -----Original Message----- > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > bounces at lists.adiscon.com] On Behalf Of david at lang.hm > Sent: Tuesday, April 21, 2009 4:39 AM > To: rsyslog-users > Subject: Re: [rsyslog] multi-message handling and databases > > On Mon, 20 Apr 2009, david at lang.hm wrote: > > > On Mon, 20 Apr 2009, Rainer Gerhards wrote: > > > >>> -----Original Message----- > >>> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > >>> bounces at lists.adiscon.com] On Behalf Of david at lang.hm > >>> > >>> On Mon, 20 Apr 2009, Rainer Gerhards wrote: > >>> > >>>>> > >>>>> X=max_messages > >>>>> > >>>>> if (messages in queue) > >>>>> mark that it is going to process the next X messages > >>>>> grab the messages > >>>>> format them for output > >>>>> attempt to deliver the messages > >>>>> if (message delived sucessfully) > >>>>> mark messages in the queue as delivered > >>>>> X=max_messages (reset X in case it was reduced due to > delivery > >>>>> errors) > >>>>> else (delivering this batch failed, reset and try to deliver > the > >>>>> first half) > >>>> > >>>> I think, in our previous discussion 
(mailing list archive), we > >>> concluded that > >>>> there is no value in re-trying with half of the batch. > >>> > >>> very possibly, I'm not remembering it. > >>> > >>> not doing so will simplify the code considerably, but the > advantages of > >>> retrying with half the batch are: > >>> > >>> 1. you deliver as much as you can > >>> > >>> 2. when you finally get stuck, you can pinpoint directly what > message > >>> you > >>> were stuck on (in case you have a failure based on the data, say > quotes > >>> in > >>> something that then gets formatted into a database, or slashes in > >>> something that becomes a filename component) > >>> > >>> your call > >> > >> I need to refer you back to our previous discussion. Unfortunately, > it was > >> private. I dug the link out and sent it via private mail. Sorry all > others, > >> please stand by a little moment. If I have not read it wrong, it > boiled down > >> to we have no non-transactional sources that were problematic and we > had not > >> identified cases where it would be useful to retry with fewer > elements. > >> > >> I'd provide a more complete description, but that would probably > take me > >> another 2...4 hours, and I hope to get around (yes, it was a > reeeaaaly long > >> discussion). David, if you like to quote anything from me, feel free > to do > >> so. > > > > I'll dig through this today and tonight and review this > > > > to be clear, I'm mostly concerned about the debugging/troubleshooting > > issues (which one of these 1000 messages made the database > complain..). > > but I guess this can be addressed by stopping rsyslog and restarting > it > > with a smaller batch size until you track it down. it should be rare > > enough to make that tolerable. > > looking back over the thread, I think a quick summary is that you > believe > that most output modules would not be able to do anything useful if > they > get an error in the middle of a set of messages. 
Yes > I'm not sure that I > agree, but I think it's a fairly easy thing to add later if you are > wrong. > so if you think it's a significant win to drop that for now, go ahead Not yet sure - I think we need to do quite some more doc before actually coding anything ;) > > > my initial proposal was for the output module to return the number of > records sucessfully written, so that those would not be retried. that > definantly isn't possible with a string-based interface, so we then > moved > to 'retry with half' before the thread wound down. I see the picture, but I am not yet sure if this turns out to be a problem. It depends much on the not-yet-specified ways of error recovery. Having said this, I think it is time to do a couple of documents. Most importantly, I think we need to list the failure/error cases and see how we can/need to handle them. This, I think, can lead us to the road of actual implementation work. But you've probably noticed there is much to be written, so let me relax a bit and think about where to start ;) Rainer > > David Lang > > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com From david at lang.hm Tue Apr 21 08:18:35 2009 From: david at lang.hm (david at lang.hm) Date: Mon, 20 Apr 2009 23:18:35 -0700 (PDT) Subject: [rsyslog] RFC: On rsyslog output modules and support for batchoperations In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702AF25@GRFEXC.intern.adiscon.com> References: <200904011802.14727.Luis.Fernando.Munoz.Mejias@cern.ch><200904021721.29880.Luis.Fernando.Munoz.Mejias@cern.ch><9B6E2A8877C38245BFB15CC491A11DA702AEEB@GRFEXC.intern.adiscon.com><200904171713.14284.Luis.Fernando.Munoz.Mejias@cern.ch><9B6E2A8877C38245BFB15CC491A11DA702AF22@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AF23@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702AF25@GRFEXC.intern.adiscon.com> Message-ID: On Mon, 20 Apr 2009, Rainer 
Gerhards wrote:

> Excellent, but let me re-phrase: if you have a PostgreSQL expert at hand,
> that would be even more useful (I can do testing with PostgreSQL myself, but
> do not have access to Oracle - I overlooked that tiny little restriction when
> posting ;)).

I asked the Postgres experts on the postgres-performance mailing list; see the thread titled 'performance for high-volume log insertion'. archives are at http://archives.postgresql.org/pgsql-performance/

so far I managed to go down a blind alley, but in the process I learned that postgres has implemented a binary API, but it's really only useful if you are dealing with datatypes that are far more efficient to deal with in binary form (dates and numbers primarily). so for rsyslog it looks like there is little, if any, benefit from using binary mode.

the non-string API does have one significant benefit: it eliminates the need to escape strings. but countering that is the need to define what will be in each string (for an arbitrary number of strings), plus define what the SQL for the prepared statement is. I don't think this is a net win for complexity, and I agree with you that for large batches, I don't expect that the API mode will be noticeably better.

the thread left off (for tonight) with a question posted by the person who answered most of my questions

> Have you done any testing to compare COPY vs. INSERT using prepared
> statements? I'd be curious to know how those compare and against
> multi-value INSERTS, prepared and unprepared.

I'll let you know if anyone responds to this.
and if RB or anyone else has comments on this I would like to know (independently of whatever rsyslog ends up doing ;-)

David Lang

From Luis.Fernando.Munoz.Mejias at cern.ch Tue Apr 21 10:51:59 2009
From: Luis.Fernando.Munoz.Mejias at cern.ch (Luis Fernando Muñoz Mejías)
Date: Tue, 21 Apr 2009 10:51:59 +0200
Subject: [rsyslog] multi-message handling and databases
In-Reply-To:
References: <9B6E2A8877C38245BFB15CC491A11DA702AF28@GRFEXC.intern.adiscon.com>
Message-ID: <200904211051.59523.Luis.Fernando.Munoz.Mejias@cern.ch>

Hi,

I'm sorry for joining this party late. Busy times around here.

> > I think, in our previous discussion (mailing list archive), we concluded
> > that there is no value in re-trying with half of the batch.
>
> very possibly, I'm not remembering it.
>
> not doing so will simplify the code considerably, but the advantages of
> retrying with half the batch are:
>
> 1. you deliver as much as you can
>
> 2. when you finally get stuck, you can pinpoint directly what message you
> were stuck on (in case you have a failure based on the data, say quotes in
> something that then gets formatted into a database, or slashes in
> something that becomes a filename component)

FYI, Oracle's OCI is able to tell you how many entries were inserted, and thus you can skip the exact offending entry and retry with the next part of the batch. I suppose other DBs' interfaces have the same feature under different names.

If the output module *knows* which record is the bad one, I think it's better to offload the work to the OM than to force the core to binary search for it.

I'm planning to add this to omoracle, but not this week...

Cheers.
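
A rough sketch of the alternative described here, where the output module reports how many records it wrote so that the caller can skip exactly the offending one. Everything in it is an assumption for illustration: `write_batch` is a hypothetical callable that writes records in order, stops at the first failure, and returns the number of records written before it; this is not the OCI interface and not rsyslog's module API.

```python
from typing import Callable, List, Sequence


def deliver_skipping_bad(
    messages: Sequence[str],
    write_batch: Callable[[Sequence[str]], int],  # hypothetical: count of leading records written
    batch_size: int,
) -> List[str]:
    """Deliver in batches; skip only the record the module stopped on.

    Returns the rejected records in the order they were encountered.
    """
    rejected: List[str] = []
    pos = 0
    while pos < len(messages):
        batch = messages[pos:pos + batch_size]
        written = write_batch(batch)
        pos += written                      # these records are done, never retried
        if written < len(batch):
            rejected.append(messages[pos])  # the record right after the last good one
            pos += 1                        # continue with the rest of the batch
    return rejected
```

Compared with a binary search driven by the core, each bad record costs one extra call instead of roughly log2(batch size) calls, but it only works if the module can report a reliable count.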
-- Luis Fernando Mu?oz Mej?as Luis.Fernando.Munoz.Mejias at cern.ch From tbergfeld at hq.adiscon.com Tue Apr 21 16:53:44 2009 From: tbergfeld at hq.adiscon.com (Tom Bergfeld) Date: Tue, 21 Apr 2009 16:53:44 +0200 Subject: [rsyslog] rsyslog 3.22.0 (v3-stable) released Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AF3D@GRFEXC.intern.adiscon.com> Hi all, rsyslog 3.22.0, a member of the v3-stable branch, has been released today. This is the next and final iteration of v3-stable, bringing all the features of the previous beta branch. Among others, it provides enhanced performance, greater configurability and enhanced stability. This is the new v3-stable branch, retiring any previous versions. As such, it is a recommended update for all v3-stable users. Changelog: http://www.rsyslog.com/Article368.phtml Download: http://www.rsyslog.com/Downloads-req-viewdownloaddetails-lid-157.phtml As always, feedback is appreciated. Tom Bergfeld -- Support ======= Improving rsyslog is costly, but you can help! We are looking for organizations that find rsyslog useful and wish to contribute back. You can contribute by reporting bugs, improve the software, or donate money or equipment. Commercial support contracts for rsyslog are available, and they help finance continued maintenance. Adiscon GmbH, a privately held German company, is currently funding rsyslog development. We are always looking for interesting development projects. For details on how to help, please see http://www.rsyslog.com/doc-how2help.html. From rgerhards at hq.adiscon.com Tue Apr 21 16:59:10 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Tue, 21 Apr 2009 16:59:10 +0200 Subject: [rsyslog] git branches - user "master" again! Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AF3E@GRFEXC.intern.adiscon.com> Quick note for those that follow git updates: I've done the v3-stable/beta/master/nextmaster shift, so now nextmaster is gone (will be purged from repository in a few days). 
The devel branch is now back to "master", and will remain so until the next shuffle... (at least three months).

Rainer

From david at lang.hm Tue Apr 21 17:51:37 2009
From: david at lang.hm (david at lang.hm)
Date: Tue, 21 Apr 2009 08:51:37 -0700 (PDT)
Subject: [rsyslog] [PERFORM] performance for high-volume log insertion
In-Reply-To: <20090421154458.GD18845@it.is.rice.edu>
References: <20090421015515.GR8123@tamriel.snowman.net> <20090421064554.GW8123@tamriel.snowman.net> <49ED8A37.4030509@archonet.com> <20090421133330.GZ18845@it.is.rice.edu> <20090421154458.GD18845@it.is.rice.edu>
Message-ID:

On Tue, 21 Apr 2009, Kenneth Marshall wrote:

> On Tue, Apr 21, 2009 at 08:37:54AM -0700, david at lang.hm wrote:
>> Kenneth,
>> could you join the discussion on the rsyslog mailing list?
>> rsyslog-users
>>
>> I'm surprised to hear you say that rsyslog can already do batch inserts and
>> am interested in how you did that.
>>
>> what sort of insert rate did you manage to get?
>>
>> David Lang
>>
> David,
>
> I would be happy to join the discussion. I did not mean to say
> that rsyslog currently supported batch inserts, just that the
> pieces that provide "stand-by queuing" could be used to manage
> batching inserts.

I've changed the To: list to the rsyslog users list.

currently the stand-by queuing still handles messages one at a time. however, a sponsor has been found to pay for changing the rsyslog internals to allow for multiple messages to be handled at once, which is what triggered some of this discussion.

which version of rsyslog are you working with?

when you modified rsyslog to use prepared statements (to avoid the escaping and parsing), did you hard-code the prepared statement? what other changes did you make?
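
To make the "prepared statement, no escaping, batched" combination concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in database. The table layout and the `insert_batch` helper are assumptions for the example; PostgreSQL and Oracle use different placeholder syntax ($1 and :name style respectively) and different client libraries, so this shows the idea only, not what an rsyslog output module would do.

```python
import sqlite3
from typing import Iterable, Tuple

LogRow = Tuple[str, str, str]  # (timestamp, host, message); assumed layout


def insert_batch(conn: sqlite3.Connection, rows: Iterable[LogRow]) -> None:
    """Insert all rows with one parameterized statement in one transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.executemany(
            # Values are bound as parameters, so no string escaping is needed.
            "INSERT INTO logs (ts, host, msg) VALUES (?, ?, ?)",
            rows,
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (ts TEXT, host TEXT, msg TEXT)")
insert_batch(conn, [
    ("2009-04-21 06:00:00", "host1", "plain message"),
    ("2009-04-21 06:00:01", "host2", "it's got 'quotes' and a \\ backslash"),
])
```

The statement text is fixed and only the bound values change from row to row, which is what lets the database skip re-parsing and lets the client skip escaping.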
David Lang > Cheers, > Ken > >> On Tue, 21 Apr 2009, Kenneth Marshall wrote: >> >>> Date: Tue, 21 Apr 2009 08:33:30 -0500 >>> From: Kenneth Marshall >>> To: Richard Huxton >>> Cc: david at lang.hm, Stephen Frost , >>> Greg Smith , pgsql-performance at postgresql.org >>> Subject: Re: [PERFORM] performance for high-volume log insertion >>> Hi, >>> >>> I just finished reading this thread. We are currently working on >>> setting up a central log system using rsyslog and PostgreSQL. It >>> works well once we patched the memory leak. We also looked at what >>> could be done to improve the efficiency of the DB interface. On the >>> rsyslog side, moving to prepared queries allows you to remove the >>> escaping that needs to be done currently before attempting to >>> insert the data into the SQL backend as well as removing the parsing >>> and planning time from the insert. This is a big win for high insert >>> rates, which is what we are talking about. The escaping process is >>> also a big CPU user in rsyslog which then hands the escaped string >>> to the backend which then has to undo everything that had been done >>> and parse/plan the resulting query. This can use a surprising amount >>> of additional CPU. Even if you cannot support a general prepared >>> query interface, by specifying what the query should look like you >>> can handle much of the low-hanging fruit query-wise. >>> >>> We are currently using a date based trigger to use a new partition >>> each day and keep 2 months of logs currently. This can be usefully >>> managed on the backend database, but if rsyslog supported changing >>> the insert to the new table on a time basis, the CPU used by the >>> trigger to support this on the backend could be reclaimed. This >>> would be a win for any DB backend. As you move to the new partition, >>> issuing a truncate to clear the table would simplify the DB interfaces. 
>>> >>> Another performance enhancement already mentioned, would be to >>> allow certain extra fields in the DB to be automatically populated >>> as a function of the log messages. For example, logging the mail queue >>> id for messages from mail systems would make it much easier to locate >>> particular mail transactions in large amounts of data. >>> >>> To sum up, eliminating the escaping in rsyslog through the use of >>> prepared queries would reduce the CPU load on the DB backend. Batching >>> the inserts will also net you a big performance increase. Some DB-based >>> applications allow for the specification of several types of queries, >>> one for single inserts and then a second to support multiple inserts >>> (copy). Rsyslog already supports the queuing pieces to allow you to >>> batch inserts. Just some ideas. >>> >>> Regards, >>> Ken >>> >>> >>> On Tue, Apr 21, 2009 at 09:56:23AM +0100, Richard Huxton wrote: >>>> david at lang.hm wrote: >>>>> On Tue, 21 Apr 2009, Stephen Frost wrote: >>>>>> * david at lang.hm (david at lang.hm) wrote: >>>>>>> while I fully understand the 'benchmark your situation' need, this >>>>>>> isn't >>>>>>> that simple. >>>>>> >>>>>> It really is. You know your application, you know it's primary use >>>>>> cases, and probably have some data to play with. You're certainly in a >>>>>> much better situation to at least *try* and benchmark it than we are. >>>>> rsyslog is a syslog server. it replaces (or for debian and fedora, has >>>>> replaced) your standard syslog daemon. it recieves log messages from >>>>> every >>>>> app on your system (and possibly others), filters, maniulates them, and >>>>> then stores them somewhere. among the places that it can store the logs >>>>> are database servers (native support for MySQL, PostgreSQL, and Oracle. >>>>> plus libdbi for others) >>>> >>>> Well, from a performance standpoint the obvious things to do are: >>>> 1. Keep a connection open, do NOT reconnect for each log-statement >>>> 2. 
Batch log statements together where possible >>>> 3. Use prepared statements >>>> 4. Partition the tables by day/week/month/year (configurable I suppose) >>>> >>>> The first two are vital, the third takes you a step further. The fourth >>>> is >>>> a long-term admin thing. >>>> >>>> And possibly >>>> 5. Have two connections, one for fatal/error etc and one for info/debug >>>> level log statements (configurable split?). Then you can use the >>>> synchronous_commit setting on the less important ones. Might buy you some >>>> performance on a busy system. >>>> >>>> http://www.postgresql.org/docs/8.3/interactive/runtime-config-wal.html#RUNTIME-CONFIG-WAL-SETTINGS >>>> >>>>> other apps then search and report on the data after it is stored. what >>>>> apps?, I don't know either. pick your favorite reporting tool and you'll >>>>> be a step ahead of me (I don't know a really good reporting tool) >>>>> as for sample data, you have syslog messages, just like I do. so you >>>>> have >>>>> the same access to data that I have. >>>>> how would you want to query them? how would people far less experianced >>>>> that you want to query them? >>>>> I can speculate that some people would do two columns (time, everything >>>>> else), others will do three (time, server, everything else), and others >>>>> will go further (I know some who would like to extract IP addresses >>>>> embedded in a message into their own column). some people will index on >>>>> the time and host, others will want to do full-text searches of >>>>> everything. >>>> >>>> Well, assuming it looks much like traditional syslog, I would do >>>> something >>>> like: (timestamp, host, facility, priority, message). It's easy enough to >>>> stitch back together if people want that. >>>> >>>> PostgreSQL's full-text indexing is quite well suited to logfiles I'd have >>>> thought, since it knows about filenames, urls etc already. 
>>>> >>>> If you want to get fancy, add a msg_type column and one subsidiary table >>>> for each msg_type. So - you might have smtp_connect_from (hostname, >>>> ip_addr). A set of perl regexps can match and extract the fields for >>>> these >>>> extra tables, or you could do it with triggers inside the database. I >>>> think >>>> it makes sense to do it in the application. Easier for users to >>>> contribute >>>> new patterns/extractions. Meanwhile, the core table is untouched so you >>>> don't *need* to know about these extra tables. >>>> >>>> If you have subsidiary tables, you'll want to partition those too and >>>> perhaps stick them in their own schema (logs200901, logs200902 etc). >>>> >>>> -- >>>> Richard Huxton >>>> Archonet Ltd >>>> >>>> -- >>>> Sent via pgsql-performance mailing list >>>> (pgsql-performance at postgresql.org) >>>> To make changes to your subscription: >>>> http://www.postgresql.org/mailpref/pgsql-performance >>>> >>> >> >> -- >> Sent via pgsql-performance mailing list (pgsql-performance at postgresql.org) >> To make changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-performance >> > From rgerhards at hq.adiscon.com Tue Apr 21 18:31:29 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Tue, 21 Apr 2009 18:31:29 +0200 Subject: [rsyslog] rsyslog queue operation - an overview Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AF3F@GRFEXC.intern.adiscon.com> Hi all, as I feared, clarifying queue operations took time. A full day of doc work, I hope it is worth it - but I am positive it is, because good understanding of this topic is vital to seriously discuss all the other issues that base on these concepts. When I look at the resulting document, it doesn't look like it took a whole day. Finding a good analogy was not easy, but with the help of Tom Bergfeld and a long discussion we came up with one that I like more and more. 
It is based on road junctions in everyday life and surprisingly precisely describes how things work in rsyslog and why they work in that way. Still, a lot of subtleties are missing (another day or three of work), but the overall picture should be clear(er). So please give it a try:

http://www.rsyslog.com/doc-queues_analogy.html

Even though I will probably be disappointed in that case, please let me know if the document does not work for you (aka "you have no idea what I intend to say"). Otherwise, I cannot improve it. So honest feedback is appreciated.

Thanks,
Rainer

From david at lang.hm Tue Apr 21 19:01:43 2009
From: david at lang.hm (david at lang.hm)
Date: Tue, 21 Apr 2009 10:01:43 -0700 (PDT)
Subject: [rsyslog] rsyslog queue operation - an overview
In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702AF3F@GRFEXC.intern.adiscon.com>
References: <9B6E2A8877C38245BFB15CC491A11DA702AF3F@GRFEXC.intern.adiscon.com>
Message-ID:

On Tue, 21 Apr 2009, Rainer Gerhards wrote:

> Hi all,
>
> as I feared, clarifying queue operations took time. A full day of doc work, I
> hope it is worth it - but I am positive it is, because good understanding of
> this topic is vital to seriously discuss all the other issues that base on
> these concepts.
>
> When I look at the resulting document, it doesn't look like it took a whole
> day. Finding a good analogy was not easy, but with the help of Tom Bergfeld
> and a long discussion we came up with one that I like more and more. It is
> based on road junctions in everyday life and surprisingly precisely describes
> how things work in rsyslog and why they work in that way. Still, a lot of
> subtleties are missing (another day or three of work), but the overall
> picture should be clear(er). So please give it a try:
>
> http://www.rsyslog.com/doc-queues_analogy.html
>
> Even though I will probably be disappointed in that case, please let me know if
> the document does not work for you (aka "you have no idea what I intend to
> say").
> Otherwise, I cannot improve it. So honest feedback is appreciated.

this helps a lot. I'm still not sure where the template formatting takes place.

David Lang

From rgerhards at hq.adiscon.com Tue Apr 21 19:32:27 2009
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Tue, 21 Apr 2009 19:32:27 +0200
Subject: [rsyslog] rsyslog queue operation - an overview
Message-ID: <002301c9c2a7$63603c4f$100013ac@intern.adiscon.com>

Template formatting happens during action processing, before the output module is called (I think I added a sentence especially to clarify that part...)

----- Original Message -----
From: "david at lang.hm"
To: "rsyslog-users"
Sent: 21.04.09 19:02
Subject: Re: [rsyslog] rsyslog queue operation - an overview

On Tue, 21 Apr 2009, Rainer Gerhards wrote:

> Hi all,
>
> as I feared, clarifying queue operations took time. A full day of doc work, I
> hope it is worth it - but I am positive it is, because good understanding of
> this topic is vital to seriously discuss all the other issues that base on
> these concepts.
>
> When I look at the resulting document, it doesn't look like it took a whole
> day. Finding a good analogy was not easy, but with the help of Tom Bergfeld
> and a long discussion we came up with one that I like more and more. It is
> based on road junctions in everyday life and surprisingly precisely describes
> how things work in rsyslog and why they work in that way. Still, a lot of
> subtleties are missing (another day or three of work), but the overall
> picture should be clear(er). So please give it a try:
>
> http://www.rsyslog.com/doc-queues_analogy.html
>
> Even though I will probably be disappointed in that case, please let me know if
> the document does not work for you (aka "you have no idea what I intend to
> say"). Otherwise, I cannot improve it. So honest feedback is appreciated.

this helps a lot. I'm still not sure where the template formatting takes place.
David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com

From ktm at rice.edu Tue Apr 21 19:52:17 2009
From: ktm at rice.edu (Kenneth Marshall)
Date: Tue, 21 Apr 2009 12:52:17 -0500
Subject: [rsyslog] [PERFORM] performance for high-volume log insertion
In-Reply-To:
References: <20090421064554.GW8123@tamriel.snowman.net> <49ED8A37.4030509@archonet.com> <20090421133330.GZ18845@it.is.rice.edu> <20090421154458.GD18845@it.is.rice.edu>
Message-ID: <20090421175217.GH18845@it.is.rice.edu>

David,

Okay, I am now subscribed to the mailing list. We are currently using rsyslog-3.20.x. As for implementing a prototype of the prepared statement, I was sidetracked by other duties and have not had a chance to do anything but an initial evaluation. As for the rsyslog internal escaping, it looked simplest to create another template type, like the current SQL and STDSQL, that indicates that escaping is not needed and/or that prepared statements should be used.

Regards,
Ken

On Tue, Apr 21, 2009 at 08:51:37AM -0700, david at lang.hm wrote:
> On Tue, 21 Apr 2009, Kenneth Marshall wrote:
>
>> On Tue, Apr 21, 2009 at 08:37:54AM -0700, david at lang.hm wrote:
>>> Kenneth,
>>> could you join the discussion on the rsyslog mailing list?
>>> rsyslog-users
>>>
>>> I'm surprised to hear you say that rsyslog can already do batch inserts and
>>> am interested in how you did that.
>>>
>>> what sort of insert rate did you manage to get?
>>>
>>> David Lang
>>>
>> David,
>>
>> I would be happy to join the discussion. I did not mean to say
>> that rsyslog currently supported batch inserts, just that the
>> pieces that provide "stand-by queuing" could be used to manage
>> batching inserts.
>
> I've changed the "To" list to the rsyslog users list.
>
> currently the stand-by queuing still handles messages one at a time.
> however a sponsor has been found to pay for changing the rsyslog internals
> to allow for multiple messages to be handled at once, which is what
> triggered some of this discussion.
>
> which version of rsyslog are you working with?
>
> when you modified rsyslog to do prepared statements (to avoid the escaping
> and parsing) did you hard-code the prepared statement? what other changes
> did you make?
>
> David Lang
>
>> Cheers,
>> Ken
>>
>>> On Tue, 21 Apr 2009, Kenneth Marshall wrote:
>>>
>>>> Date: Tue, 21 Apr 2009 08:33:30 -0500
>>>> From: Kenneth Marshall
>>>> To: Richard Huxton
>>>> Cc: david at lang.hm, Stephen Frost ,
>>>> Greg Smith , pgsql-performance at postgresql.org
>>>> Subject: Re: [PERFORM] performance for high-volume log insertion
>>>> Hi,
>>>>
>>>> I just finished reading this thread. We are currently working on
>>>> setting up a central log system using rsyslog and PostgreSQL. It
>>>> works well once we patched the memory leak. We also looked at what
>>>> could be done to improve the efficiency of the DB interface. On the
>>>> rsyslog side, moving to prepared queries allows you to remove the
>>>> escaping that needs to be done currently before attempting to
>>>> insert the data into the SQL backend as well as removing the parsing
>>>> and planning time from the insert. This is a big win for high insert
>>>> rates, which is what we are talking about. The escaping process is
>>>> also a big CPU user in rsyslog which then hands the escaped string
>>>> to the backend which then has to undo everything that had been done
>>>> and parse/plan the resulting query. This can use a surprising amount
>>>> of additional CPU. Even if you cannot support a general prepared
>>>> query interface, by specifying what the query should look like you
>>>> can handle much of the low-hanging fruit query-wise.
>>>>
>>>> We are currently using a date based trigger to use a new partition
>>>> each day and keep 2 months of logs currently.
This can be usefully >>>> managed on the backend database, but if rsyslog supported changing >>>> the insert to the new table on a time basis, the CPU used by the >>>> trigger to support this on the backend could be reclaimed. This >>>> would be a win for any DB backend. As you move to the new partition, >>>> issuing a truncate to clear the table would simplify the DB interfaces. >>>> >>>> Another performance enhancement already mentioned, would be to >>>> allow certain extra fields in the DB to be automatically populated >>>> as a function of the log messages. For example, logging the mail queue >>>> id for messages from mail systems would make it much easier to locate >>>> particular mail transactions in large amounts of data. >>>> >>>> To sum up, eliminating the escaping in rsyslog through the use of >>>> prepared queries would reduce the CPU load on the DB backend. Batching >>>> the inserts will also net you a big performance increase. Some DB-based >>>> applications allow for the specification of several types of queries, >>>> one for single inserts and then a second to support multiple inserts >>>> (copy). Rsyslog already supports the queuing pieces to allow you to >>>> batch inserts. Just some ideas. >>>> >>>> Regards, >>>> Ken >>>> >>>> >>>> On Tue, Apr 21, 2009 at 09:56:23AM +0100, Richard Huxton wrote: >>>>> david at lang.hm wrote: >>>>>> On Tue, 21 Apr 2009, Stephen Frost wrote: >>>>>>> * david at lang.hm (david at lang.hm) wrote: >>>>>>>> while I fully understand the 'benchmark your situation' need, this >>>>>>>> isn't >>>>>>>> that simple. >>>>>>> >>>>>>> It really is. You know your application, you know it's primary use >>>>>>> cases, and probably have some data to play with. You're certainly in >>>>>>> a >>>>>>> much better situation to at least *try* and benchmark it than we are. >>>>>> rsyslog is a syslog server. it replaces (or for debian and fedora, has >>>>>> replaced) your standard syslog daemon. 
>>>>>> it receives log messages from every
>>>>>> app on your system (and possibly others), filters, manipulates them, and
>>>>>> then stores them somewhere. among the places that it can store the logs
>>>>>> are database servers (native support for MySQL, PostgreSQL, and Oracle.
>>>>>> plus libdbi for others)
>>>>>
>>>>> Well, from a performance standpoint the obvious things to do are:
>>>>> 1. Keep a connection open, do NOT reconnect for each log-statement
>>>>> 2. Batch log statements together where possible
>>>>> 3. Use prepared statements
>>>>> 4. Partition the tables by day/week/month/year (configurable I suppose)
>>>>>
>>>>> The first two are vital, the third takes you a step further. The fourth is
>>>>> a long-term admin thing.
>>>>>
>>>>> And possibly
>>>>> 5. Have two connections, one for fatal/error etc and one for info/debug
>>>>> level log statements (configurable split?). Then you can use the
>>>>> synchronous_commit setting on the less important ones. Might buy you some
>>>>> performance on a busy system.
>>>>>
>>>>> http://www.postgresql.org/docs/8.3/interactive/runtime-config-wal.html#RUNTIME-CONFIG-WAL-SETTINGS
>>>>>
>>>>>> other apps then search and report on the data after it is stored. what
>>>>>> apps?, I don't know either. pick your favorite reporting tool and you'll
>>>>>> be a step ahead of me (I don't know a really good reporting tool)
>>>>>> as for sample data, you have syslog messages, just like I do. so you have
>>>>>> the same access to data that I have.
>>>>>> how would you want to query them? how would people far less experienced
>>>>>> than you want to query them?
>>>>>> I can speculate that some people would do two columns (time, everything
>>>>>> else), others will do three (time, server, everything else), and others
>>>>>> will go further (I know some who would like to extract IP addresses
>>>>>> embedded in a message into their own column).
some people will index >>>>>> on >>>>>> the time and host, others will want to do full-text searches of >>>>>> everything. >>>>> >>>>> Well, assuming it looks much like traditional syslog, I would do >>>>> something >>>>> like: (timestamp, host, facility, priority, message). It's easy enough >>>>> to >>>>> stitch back together if people want that. >>>>> >>>>> PostgreSQL's full-text indexing is quite well suited to logfiles I'd >>>>> have >>>>> thought, since it knows about filenames, urls etc already. >>>>> >>>>> If you want to get fancy, add a msg_type column and one subsidiary >>>>> table >>>>> for each msg_type. So - you might have smtp_connect_from (hostname, >>>>> ip_addr). A set of perl regexps can match and extract the fields for >>>>> these >>>>> extra tables, or you could do it with triggers inside the database. I >>>>> think >>>>> it makes sense to do it in the application. Easier for users to >>>>> contribute >>>>> new patterns/extractions. Meanwhile, the core table is untouched so you >>>>> don't *need* to know about these extra tables. >>>>> >>>>> If you have subsidiary tables, you'll want to partition those too and >>>>> perhaps stick them in their own schema (logs200901, logs200902 etc). 
>>>>>
>>>>> --
>>>>> Richard Huxton
>>>>> Archonet Ltd
>>>>>
>>>>> --
>>>>> Sent via pgsql-performance mailing list
>>>>> (pgsql-performance at postgresql.org)
>>>>> To make changes to your subscription:
>>>>> http://www.postgresql.org/mailpref/pgsql-performance
>>>>>
>>>>
>>>
>>> --
>>> Sent via pgsql-performance mailing list
>>> (pgsql-performance at postgresql.org)
>>> To make changes to your subscription:
>>> http://www.postgresql.org/mailpref/pgsql-performance
>>>
>>
>

From rgerhards at hq.adiscon.com Wed Apr 22 07:17:19 2009
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Wed, 22 Apr 2009 07:17:19 +0200
Subject: [rsyslog] rsyslog queue operation - an overview
References: <9B6E2A8877C38245BFB15CC491A11DA702AF3F@GRFEXC.intern.adiscon.com>
Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AF41@GRFEXC.intern.adiscon.com>

David,

quick question, in order to help improve the document. Could you please look at the second paragraph below the "radio tower picture". It says:

"Now let's look at the action queues: here, the active part, the producer, is the Parser and Filter Engine. The passive part is the Action Processor. The latter does any processing that is necessary to call the output plugin, in particular it processes the template to create the plugin calling parameters (either a string or vector of arguments)."

I thought that addressed the question of who generates the template strings. I am not sure if you missed it, or if the wording is bad. If it is a problem with the wording, I'd appreciate it if you could suggest some text (it is always a bit harder for non-native English speakers, plus, I have to admit, I cannot really invest another day to do the full editorial work that would be necessary to make this a great document - anyone up for this task? ;)).
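A minimal sketch of the step described in the quoted paragraph, in which the Action Processor turns a template into the calling parameters for the output plugin, either as one string or as a vector of arguments. This is not rsyslog code: the Python function names, the message dictionary, and the simplified %property% expansion are assumptions made for illustration only.

```python
import re

# Hypothetical stand-in for a parsed syslog message; the property
# names are illustrative and not taken from rsyslog itself.
message = {
    "timestamp": "2009-04-22T07:17:19",
    "hostname": "host1",
    "msg": "it's alive",
}

# Simplified placeholder syntax: %name% with no options or modifiers.
PROPERTY = re.compile(r"%(\w+)%")


def render_string(template, msg):
    """Expand every %property% placeholder and return a single string."""
    return PROPERTY.sub(lambda m: str(msg[m.group(1)]), template)


def render_vector(template, msg):
    """Return the property values in template order, one element per field."""
    return [str(msg[name]) for name in PROPERTY.findall(template)]


template = "%timestamp% %hostname% %msg%"
as_string = render_string(template, message)
as_vector = render_vector(template, message)
```

In the string form an output module would have to split the fields apart again, whereas the vector form hands over each field separately, which is what a database output would presumably want for bound parameters.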
Rainer > -----Original Message----- > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > bounces at lists.adiscon.com] On Behalf Of david at lang.hm > Sent: Tuesday, April 21, 2009 7:02 PM > To: rsyslog-users > Subject: Re: [rsyslog] rsyslog queue operation - an overview > > On Tue, 21 Apr 2009, Rainer Gerhards wrote: > > > Hi all, > > > > as I feared, clarifying queue operations took time. A full day of doc > work, I > > hope it is worth it - but I am positive it is, because good > understanding of > > this topic is vital to seriously discuss all the other issues that > base on > > these concepts. > > > > When I look at the resulting document, it doesn't look like it took a > whole > > day. Finding a good analogy was not easy, but with the help of Tom > Bergfeld > > and a long discussion we came up with one that I like more and more. > It is > > based on road junctions in everyday life and surprisingly precisely > describes > > how things work in rsyslog and why they work in that way. Still, a > lot of > > subtleties are missing (another day or three of work), but the > overall > > picture should be clear(er). So please give it a try: > > > > http://www.rsyslog.com/doc-queues_analogy.html > > > > Even though I will probably disappointed in that case, please let me > know if > > the document does not work for you (aka "you have no idea what I > intend to > > say"). Otherwise, I cannot improve it. So honest feedback is > appreciated. > > this helps a lot. I'm still not sure where the template formatting > takes > place. 
> > David Lang > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com From david at lang.hm Wed Apr 22 07:34:08 2009 From: david at lang.hm (david at lang.hm) Date: Tue, 21 Apr 2009 22:34:08 -0700 (PDT) Subject: [rsyslog] rsyslog queue operation - an overview In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702AF41@GRFEXC.intern.adiscon.com> References: <9B6E2A8877C38245BFB15CC491A11DA702AF3F@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702AF41@GRFEXC.intern.adiscon.com> Message-ID: On Wed, 22 Apr 2009, Rainer Gerhards wrote: > David, > > quick question, in order to help improve the document. Could you please look > at the second paragraph below the "radio tower picture". It says: > > "Now let's look at the action queues: here, the active part, the producer, is > the Parser and Filter Engine. The passive part is the Action Processor. The > latter does any processing that is necessary to call the output plugin, in > particular it processes the template to create the plugin calling parameters > (either a string or vector of arguments)." > > I thought that addresses the question on who generates the template strings. > I am not sure if you missed it, or if the wording is bad. If it is a problem > with the wording, I'd appreciate if you could suggest some text (it is always > a bit harder for the non-native English speakers, plus, I have to admit, I > do not really invest another day to do the full editorial work that would be > necessary to make this a great document - anyone up for this task? ;)). I managed to miss this. 
in the next few days I'm going to try to write up my understanding of this (on the basis that if I can explain it, it means that I understand it) David Lang > Rainer > >> -----Original Message----- >> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- >> bounces at lists.adiscon.com] On Behalf Of david at lang.hm >> Sent: Tuesday, April 21, 2009 7:02 PM >> To: rsyslog-users >> Subject: Re: [rsyslog] rsyslog queue operation - an overview >> >> On Tue, 21 Apr 2009, Rainer Gerhards wrote: >> >>> Hi all, >>> >>> as I feared, clarifying queue operations took time. A full day of doc >> work, I >>> hope it is worth it - but I am positive it is, because good >> understanding of >>> this topic is vital to seriously discuss all the other issues that >> base on >>> these concepts. >>> >>> When I look at the resulting document, it doesn't look like it took a >> whole >>> day. Finding a good analogy was not easy, but with the help of Tom >> Bergfeld >>> and a long discussion we came up with one that I like more and more. >> It is >>> based on road junctions in everyday life and surprisingly precisely >> describes >>> how things work in rsyslog and why they work in that way. Still, a >> lot of >>> subtleties are missing (another day or three of work), but the >> overall >>> picture should be clear(er). So please give it a try: >>> >>> http://www.rsyslog.com/doc-queues_analogy.html >>> >>> Even though I will probably disappointed in that case, please let me >> know if >>> the document does not work for you (aka "you have no idea what I >> intend to >>> say"). Otherwise, I cannot improve it. So honest feedback is >> appreciated. >> >> this helps a lot. I'm still not sure where the template formatting >> takes >> place. 
>> >> David Lang >> _______________________________________________ >> rsyslog mailing list >> http://lists.adiscon.net/mailman/listinfo/rsyslog >> http://www.rsyslog.com > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com > From rgerhards at hq.adiscon.com Wed Apr 22 07:37:02 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Wed, 22 Apr 2009 07:37:02 +0200 Subject: [rsyslog] rsyslog queue operation - an overview References: <9B6E2A8877C38245BFB15CC491A11DA702AF3F@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AF41@GRFEXC.intern.adiscon.com> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AF43@GRFEXC.intern.adiscon.com> > -----Original Message----- > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > bounces at lists.adiscon.com] On Behalf Of david at lang.hm > Sent: Wednesday, April 22, 2009 7:34 AM > To: rsyslog-users > Subject: Re: [rsyslog] rsyslog queue operation - an overview > > On Wed, 22 Apr 2009, Rainer Gerhards wrote: > > > David, > > > > quick question, in order to help improve the document. Could you > please look > > at the second paragraph below the "radio tower picture". It says: > > > > "Now let's look at the action queues: here, the active part, the > producer, is > > the Parser and Filter Engine. The passive part is the Action > Processor. The > > latter does any processing that is necessary to call the output > plugin, in > > particular it processes the template to create the plugin calling > parameters > > (either a string or vector of arguments)." > > > > I thought that addresses the question on who generates the template > strings. > > I am not sure if you missed it, or if the wording is bad. 
If it is a > problem > > with the wording, I'd appreciate if you could suggest some text (it > is always > > a bit harder for the non-native English speakers, plus, I have to > admit, I > > do not really invest another day to do the full editorial work that > would be > > necessary to make this a great document - anyone up for this task? > ;)). > > I managed to miss this. > > in the next few days I'm going to try to write up my understanding of > this > (on the basis that if I can explain it, it means that I understand it) Excellent, I, too, like this method. Plus, it gives us a second description, finally by a different author, of the mechanism. Rainer From rgerhards at hq.adiscon.com Wed Apr 22 08:09:46 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Wed, 22 Apr 2009 08:09:46 +0200 Subject: [rsyslog] [PERFORM] performance for high-volume log insertion References: <20090421064554.GW8123@tamriel.snowman.net><49ED8A37.4030509@archonet.com><20090421133330.GZ18845@it.is.rice.edu><20090421154458.GD18845@it.is.rice.edu> <20090421175217.GH18845@it.is.rice.edu> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AF48@GRFEXC.intern.adiscon.com> Hi Ken, glad to have you here. I am a bit silent at the moment, because I am not a real database guy and so I am primarily listening to any information that is incoming. If you have a couple of minutes, it would be useful to review this thread here: http://lists.adiscon.net/pipermail/rsyslog/2009-April/002003.html ...one comment inline below... > -----Original Message----- > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > bounces at lists.adiscon.com] On Behalf Of Kenneth Marshall > Sent: Tuesday, April 21, 2009 7:52 PM > To: david at lang.hm > Cc: rsyslog-users > Subject: Re: [rsyslog] [PERFORM] performance for high-volume log > insertion > > David, > > Okay, I am now subscribed to the mailing list. We are currently using > rsyslog-3.20.x. 
As far as implementing a prototype of the prepared > statement, I was sidetracked by other duties and have not had a chance > to do anything but an initial evaluation. As far as the rsyslog > internal > escaping, it looked simplest to create another template type like the > current SQL and STDSQL that indicated that escaping was not needed > and/or that prepared statements should be used. To disable escaping, simply do not use SQL or STDSQL. However, the db outputs currently require this option (easy to disable), because I cannot see how it will work (with the existing code) without escaping. Any idea is most welcome. Rainer > > Regards, > Ken > > On Tue, Apr 21, 2009 at 08:51:37AM -0700, david at lang.hm wrote: > > On Tue, 21 Apr 2009, Kenneth Marshall wrote: > > > >> On Tue, Apr 21, 2009 at 08:37:54AM -0700, david at lang.hm wrote: > >>> Kenneth, > >>> could you join the discussion on the rsyslog mailing list? > >>> rsyslog-users > >>> > >>> I'm surprised to hear you say that rsyslog can already do batch > inserts > >>> and > >>> am interested in how you did that. > >>> > >>> what sort of insert rate did you mange to get? > >>> > >>> David Lang > >>> > >> David, > >> > >> I would be happy to join the discussion. I did not mean to say > >> that rsyslog currently supported batch inserts, just that the > >> pieces that provide "stand-by queuing" could be used to manage > >> batching inserts. > > > > I've changed the to list to the rsyslog users list. > > > > currently the stand-by queuing still handles messages one at a time. > > however a sponser has been found to pay to changing the rsyslog > internals > > to allow for multiple messages to be handled at once, which is what > > triggered some of this discussion. > > > > which version of rsyslog are you working with? > > > > when you modified rsyslog to do prepared statement (to avoid the > escaping > > and parsing) did you hard-code the prepared statement? what other > changes > > did you make? 
> > > > David Lang > > > >> Cheers, > >> Ken > >> > >>> On Tue, 21 Apr 2009, Kenneth Marshall wrote: > >>> > >>>> Date: Tue, 21 Apr 2009 08:33:30 -0500 > >>>> From: Kenneth Marshall > >>>> To: Richard Huxton > >>>> Cc: david at lang.hm, Stephen Frost , > >>>> Greg Smith , pgsql- > performance at postgresql.org > >>>> Subject: Re: [PERFORM] performance for high-volume log insertion > >>>> Hi, > >>>> > >>>> I just finished reading this thread. We are currently working on > >>>> setting up a central log system using rsyslog and PostgreSQL. It > >>>> works well once we patched the memory leak. We also looked at what > >>>> could be done to improve the efficiency of the DB interface. On > the > >>>> rsyslog side, moving to prepared queries allows you to remove the > >>>> escaping that needs to be done currently before attempting to > >>>> insert the data into the SQL backend as well as removing the > parsing > >>>> and planning time from the insert. This is a big win for high > insert > >>>> rates, which is what we are talking about. The escaping process is > >>>> also a big CPU user in rsyslog which then hands the escaped string > >>>> to the backend which then has to undo everything that had been > done > >>>> and parse/plan the resulting query. This can use a surprising > amount > >>>> of additional CPU. Even if you cannot support a general prepared > >>>> query interface, by specifying what the query should look like you > >>>> can handle much of the low-hanging fruit query-wise. > >>>> > >>>> We are currently using a date based trigger to use a new partition > >>>> each day and keep 2 months of logs currently. This can be usefully > >>>> managed on the backend database, but if rsyslog supported changing > >>>> the insert to the new table on a time basis, the CPU used by the > >>>> trigger to support this on the backend could be reclaimed. This > >>>> would be a win for any DB backend. 
As you move to the new > partition, > >>>> issuing a truncate to clear the table would simplify the DB > interfaces. > >>>> > >>>> Another performance enhancement already mentioned, would be to > >>>> allow certain extra fields in the DB to be automatically populated > >>>> as a function of the log messages. For example, logging the mail > queue > >>>> id for messages from mail systems would make it much easier to > locate > >>>> particular mail transactions in large amounts of data. > >>>> > >>>> To sum up, eliminating the escaping in rsyslog through the use of > >>>> prepared queries would reduce the CPU load on the DB backend. > Batching > >>>> the inserts will also net you a big performance increase. Some DB- > based > >>>> applications allow for the specification of several types of > queries, > >>>> one for single inserts and then a second to support multiple > inserts > >>>> (copy). Rsyslog already supports the queuing pieces to allow you > to > >>>> batch inserts. Just some ideas. > >>>> > >>>> Regards, > >>>> Ken > >>>> > >>>> > >>>> On Tue, Apr 21, 2009 at 09:56:23AM +0100, Richard Huxton wrote: > >>>>> david at lang.hm wrote: > >>>>>> On Tue, 21 Apr 2009, Stephen Frost wrote: > >>>>>>> * david at lang.hm (david at lang.hm) wrote: > >>>>>>>> while I fully understand the 'benchmark your situation' need, > this > >>>>>>>> isn't > >>>>>>>> that simple. > >>>>>>> > >>>>>>> It really is. You know your application, you know it's primary > use > >>>>>>> cases, and probably have some data to play with. You're > certainly in > >>>>>>> a > >>>>>>> much better situation to at least *try* and benchmark it than > we are. > >>>>>> rsyslog is a syslog server. it replaces (or for debian and > fedora, has > >>>>>> replaced) your standard syslog daemon. it recieves log messages > from > >>>>>> every > >>>>>> app on your system (and possibly others), filters, maniulates > them, > >>>>>> and > >>>>>> then stores them somewhere. 
among the places that it can store > the > >>>>>> logs > >>>>>> are database servers (native support for MySQL, PostgreSQL, and > >>>>>> Oracle. > >>>>>> plus libdbi for others) > >>>>> > >>>>> Well, from a performance standpoint the obvious things to do are: > >>>>> 1. Keep a connection open, do NOT reconnect for each log- > statement > >>>>> 2. Batch log statements together where possible > >>>>> 3. Use prepared statements > >>>>> 4. Partition the tables by day/week/month/year (configurable I > suppose) > >>>>> > >>>>> The first two are vital, the third takes you a step further. The > fourth > >>>>> is > >>>>> a long-term admin thing. > >>>>> > >>>>> And possibly > >>>>> 5. Have two connections, one for fatal/error etc and one for > info/debug > >>>>> level log statements (configurable split?). Then you can use the > >>>>> synchronous_commit setting on the less important ones. Might buy > you > >>>>> some > >>>>> performance on a busy system. > >>>>> > >>>>> http://www.postgresql.org/docs/8.3/interactive/runtime-config- > wal.html#RUNTIME-CONFIG-WAL-SETTINGS > >>>>> > >>>>>> other apps then search and report on the data after it is > stored. what > >>>>>> apps?, I don't know either. pick your favorite reporting tool > and > >>>>>> you'll > >>>>>> be a step ahead of me (I don't know a really good reporting > tool) > >>>>>> as for sample data, you have syslog messages, just like I do. so > you > >>>>>> have > >>>>>> the same access to data that I have. > >>>>>> how would you want to query them? how would people far less > >>>>>> experianced > >>>>>> that you want to query them? > >>>>>> I can speculate that some people would do two columns (time, > >>>>>> everything > >>>>>> else), others will do three (time, server, everything else), and > >>>>>> others > >>>>>> will go further (I know some who would like to extract IP > addresses > >>>>>> embedded in a message into their own column). 
some people will > index > >>>>>> on > >>>>>> the time and host, others will want to do full-text searches of > >>>>>> everything. > >>>>> > >>>>> Well, assuming it looks much like traditional syslog, I would do > >>>>> something > >>>>> like: (timestamp, host, facility, priority, message). It's easy > enough > >>>>> to > >>>>> stitch back together if people want that. > >>>>> > >>>>> PostgreSQL's full-text indexing is quite well suited to logfiles > I'd > >>>>> have > >>>>> thought, since it knows about filenames, urls etc already. > >>>>> > >>>>> If you want to get fancy, add a msg_type column and one > subsidiary > >>>>> table > >>>>> for each msg_type. So - you might have smtp_connect_from > (hostname, > >>>>> ip_addr). A set of perl regexps can match and extract the fields > for > >>>>> these > >>>>> extra tables, or you could do it with triggers inside the > database. I > >>>>> think > >>>>> it makes sense to do it in the application. Easier for users to > >>>>> contribute > >>>>> new patterns/extractions. Meanwhile, the core table is untouched > so you > >>>>> don't *need* to know about these extra tables. > >>>>> > >>>>> If you have subsidiary tables, you'll want to partition those too > and > >>>>> perhaps stick them in their own schema (logs200901, logs200902 > etc). 
> >>>>> > >>>>> -- > >>>>> Richard Huxton > >>>>> Archonet Ltd > >>>>> > >>>>> -- > >>>>> Sent via pgsql-performance mailing list > >>>>> (pgsql-performance at postgresql.org) > >>>>> To make changes to your subscription: > >>>>> http://www.postgresql.org/mailpref/pgsql-performance > >>>>> > >>>> > >>> > >>> -- > >>> Sent via pgsql-performance mailing list > >>> (pgsql-performance at postgresql.org) > >>> To make changes to your subscription: > >>> http://www.postgresql.org/mailpref/pgsql-performance > >>> > >> > > > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com From david at lang.hm Wed Apr 22 08:26:28 2009 From: david at lang.hm (david at lang.hm) Date: Tue, 21 Apr 2009 23:26:28 -0700 (PDT) Subject: [rsyslog] [PERFORM] performance for high-volume log insertion In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702AF48@GRFEXC.intern.adiscon.com> References: <20090421064554.GW8123@tamriel.snowman.net><49ED8A37.4030509@archonet.com><20090421133330.GZ18845@it.is.rice.edu><20090421154458.GD18845@it.is.rice.edu> <20090421175217.GH18845@it.is.rice.edu> <9B6E2A8877C38245BFB15CC491A11DA702AF48@GRFEXC.intern.adiscon.com> Message-ID: On Wed, 22 Apr 2009, Rainer Gerhards wrote: > Hi Ken, > > glad to have you here. I am a bit silent at the moment, because I am not a > real database guy and so I am primarily listening to any information that is > incoming. If you have a couple of minutes, it would be useful to review this > thread here: > > http://lists.adiscon.net/pipermail/rsyslog/2009-April/002003.html > > ...one comment inline below... > >> -----Original Message----- >> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- >> bounces at lists.adiscon.com] On Behalf Of Kenneth Marshall >> >> David, >> >> Okay, I am now subscribed to the mailing list. We are currently using >> rsyslog-3.20.x. 
As far as implementing a prototype of the prepared >> statement, I was sidetracked by other duties and have not had a chance >> to do anything but an initial evaluation. As far as the rsyslog >> internal >> escaping, it looked simplest to create another template type like the >> current SQL and STDSQL that indicated that escaping was not needed >> and/or that prepared statements should be used. > > To disable escaping, simply do not use SQL or STDSQL. However, the db outputs > currently require this option (easy to disable), because I cannot see how it > will work (with the existing code) without escaping. > > Any idea is most welcome. when using prepared statements, escaping is not needed. according to Ken's message below, he found that the overhead of doing the escaping was significant. I don't see why this should be the case, but if it requires making an extra copy of the string I guess it's possible. I plan to get a setup together in the next couple of days that will let me do some testing of the options on the database side. David Lang > Rainer >> >> Regards, >> Ken >> >> On Tue, Apr 21, 2009 at 08:51:37AM -0700, david at lang.hm wrote: >>> On Tue, 21 Apr 2009, Kenneth Marshall wrote: >>> >>>> On Tue, Apr 21, 2009 at 08:37:54AM -0700, david at lang.hm wrote: >>>>> Kenneth, >>>>> could you join the discussion on the rsyslog mailing list? >>>>> rsyslog-users >>>>> >>>>> I'm surprised to hear you say that rsyslog can already do batch >> inserts >>>>> and >>>>> am interested in how you did that. >>>>> >>>>> what sort of insert rate did you manage to get? >>>>> >>>>> David Lang >>>>> >>>> David, >>>> >>>> I would be happy to join the discussion. I did not mean to say >>>> that rsyslog currently supported batch inserts, just that the >>>> pieces that provide "stand-by queuing" could be used to manage >>>> batching inserts. >>> >>> I've changed the To: list to the rsyslog users list. >>> >>> currently the stand-by queuing still handles messages one at a time.
>>> however a sponser has been found to pay to changing the rsyslog >> internals >>> to allow for multiple messages to be handled at once, which is what >>> triggered some of this discussion. >>> >>> which version of rsyslog are you working with? >>> >>> when you modified rsyslog to do prepared statement (to avoid the >> escaping >>> and parsing) did you hard-code the prepared statement? what other >> changes >>> did you make? >>> >>> David Lang >>> >>>> Cheers, >>>> Ken >>>> >>>>> On Tue, 21 Apr 2009, Kenneth Marshall wrote: >>>>> >>>>>> Date: Tue, 21 Apr 2009 08:33:30 -0500 >>>>>> From: Kenneth Marshall >>>>>> To: Richard Huxton >>>>>> Cc: david at lang.hm, Stephen Frost , >>>>>> Greg Smith , pgsql- >> performance at postgresql.org >>>>>> Subject: Re: [PERFORM] performance for high-volume log insertion >>>>>> Hi, >>>>>> >>>>>> I just finished reading this thread. We are currently working on >>>>>> setting up a central log system using rsyslog and PostgreSQL. It >>>>>> works well once we patched the memory leak. We also looked at what >>>>>> could be done to improve the efficiency of the DB interface. On >> the >>>>>> rsyslog side, moving to prepared queries allows you to remove the >>>>>> escaping that needs to be done currently before attempting to >>>>>> insert the data into the SQL backend as well as removing the >> parsing >>>>>> and planning time from the insert. This is a big win for high >> insert >>>>>> rates, which is what we are talking about. The escaping process is >>>>>> also a big CPU user in rsyslog which then hands the escaped string >>>>>> to the backend which then has to undo everything that had been >> done >>>>>> and parse/plan the resulting query. This can use a surprising >> amount >>>>>> of additional CPU. Even if you cannot support a general prepared >>>>>> query interface, by specifying what the query should look like you >>>>>> can handle much of the low-hanging fruit query-wise. 
>>>>>> >>>>>> We are currently using a date based trigger to use a new partition >>>>>> each day and keep 2 months of logs currently. This can be usefully >>>>>> managed on the backend database, but if rsyslog supported changing >>>>>> the insert to the new table on a time basis, the CPU used by the >>>>>> trigger to support this on the backend could be reclaimed. This >>>>>> would be a win for any DB backend. As you move to the new >> partition, >>>>>> issuing a truncate to clear the table would simplify the DB >> interfaces. >>>>>> >>>>>> Another performance enhancement already mentioned, would be to >>>>>> allow certain extra fields in the DB to be automatically populated >>>>>> as a function of the log messages. For example, logging the mail >> queue >>>>>> id for messages from mail systems would make it much easier to >> locate >>>>>> particular mail transactions in large amounts of data. >>>>>> >>>>>> To sum up, eliminating the escaping in rsyslog through the use of >>>>>> prepared queries would reduce the CPU load on the DB backend. >> Batching >>>>>> the inserts will also net you a big performance increase. Some DB- >> based >>>>>> applications allow for the specification of several types of >> queries, >>>>>> one for single inserts and then a second to support multiple >> inserts >>>>>> (copy). Rsyslog already supports the queuing pieces to allow you >> to >>>>>> batch inserts. Just some ideas. >>>>>> >>>>>> Regards, >>>>>> Ken >>>>>> >>>>>> >>>>>> On Tue, Apr 21, 2009 at 09:56:23AM +0100, Richard Huxton wrote: >>>>>>> david at lang.hm wrote: >>>>>>>> On Tue, 21 Apr 2009, Stephen Frost wrote: >>>>>>>>> * david at lang.hm (david at lang.hm) wrote: >>>>>>>>>> while I fully understand the 'benchmark your situation' need, >> this >>>>>>>>>> isn't >>>>>>>>>> that simple. >>>>>>>>> >>>>>>>>> It really is. You know your application, you know it's primary >> use >>>>>>>>> cases, and probably have some data to play with. 
You're >> certainly in >>>>>>>>> a >>>>>>>>> much better situation to at least *try* and benchmark it than >> we are. >>>>>>>> rsyslog is a syslog server. it replaces (or for debian and >> fedora, has >>>>>>>> replaced) your standard syslog daemon. it recieves log messages >> from >>>>>>>> every >>>>>>>> app on your system (and possibly others), filters, maniulates >> them, >>>>>>>> and >>>>>>>> then stores them somewhere. among the places that it can store >> the >>>>>>>> logs >>>>>>>> are database servers (native support for MySQL, PostgreSQL, and >>>>>>>> Oracle. >>>>>>>> plus libdbi for others) >>>>>>> >>>>>>> Well, from a performance standpoint the obvious things to do are: >>>>>>> 1. Keep a connection open, do NOT reconnect for each log- >> statement >>>>>>> 2. Batch log statements together where possible >>>>>>> 3. Use prepared statements >>>>>>> 4. Partition the tables by day/week/month/year (configurable I >> suppose) >>>>>>> >>>>>>> The first two are vital, the third takes you a step further. The >> fourth >>>>>>> is >>>>>>> a long-term admin thing. >>>>>>> >>>>>>> And possibly >>>>>>> 5. Have two connections, one for fatal/error etc and one for >> info/debug >>>>>>> level log statements (configurable split?). Then you can use the >>>>>>> synchronous_commit setting on the less important ones. Might buy >> you >>>>>>> some >>>>>>> performance on a busy system. >>>>>>> >>>>>>> http://www.postgresql.org/docs/8.3/interactive/runtime-config- >> wal.html#RUNTIME-CONFIG-WAL-SETTINGS >>>>>>> >>>>>>>> other apps then search and report on the data after it is >> stored. what >>>>>>>> apps?, I don't know either. pick your favorite reporting tool >> and >>>>>>>> you'll >>>>>>>> be a step ahead of me (I don't know a really good reporting >> tool) >>>>>>>> as for sample data, you have syslog messages, just like I do. so >> you >>>>>>>> have >>>>>>>> the same access to data that I have. >>>>>>>> how would you want to query them? 
how would people far less >>>>>>>> experianced >>>>>>>> that you want to query them? >>>>>>>> I can speculate that some people would do two columns (time, >>>>>>>> everything >>>>>>>> else), others will do three (time, server, everything else), and >>>>>>>> others >>>>>>>> will go further (I know some who would like to extract IP >> addresses >>>>>>>> embedded in a message into their own column). some people will >> index >>>>>>>> on >>>>>>>> the time and host, others will want to do full-text searches of >>>>>>>> everything. >>>>>>> >>>>>>> Well, assuming it looks much like traditional syslog, I would do >>>>>>> something >>>>>>> like: (timestamp, host, facility, priority, message). It's easy >> enough >>>>>>> to >>>>>>> stitch back together if people want that. >>>>>>> >>>>>>> PostgreSQL's full-text indexing is quite well suited to logfiles >> I'd >>>>>>> have >>>>>>> thought, since it knows about filenames, urls etc already. >>>>>>> >>>>>>> If you want to get fancy, add a msg_type column and one >> subsidiary >>>>>>> table >>>>>>> for each msg_type. So - you might have smtp_connect_from >> (hostname, >>>>>>> ip_addr). A set of perl regexps can match and extract the fields >> for >>>>>>> these >>>>>>> extra tables, or you could do it with triggers inside the >> database. I >>>>>>> think >>>>>>> it makes sense to do it in the application. Easier for users to >>>>>>> contribute >>>>>>> new patterns/extractions. Meanwhile, the core table is untouched >> so you >>>>>>> don't *need* to know about these extra tables. >>>>>>> >>>>>>> If you have subsidiary tables, you'll want to partition those too >> and >>>>>>> perhaps stick them in their own schema (logs200901, logs200902 >> etc). 
>>>>>>> >>>>>>> -- >>>>>>> Richard Huxton >>>>>>> Archonet Ltd >>>>>>> >>>>>>> -- >>>>>>> Sent via pgsql-performance mailing list >>>>>>> (pgsql-performance at postgresql.org) >>>>>>> To make changes to your subscription: >>>>>>> http://www.postgresql.org/mailpref/pgsql-performance >>>>>>> >>>>>> >>>>> >>>>> -- >>>>> Sent via pgsql-performance mailing list >>>>> (pgsql-performance at postgresql.org) >>>>> To make changes to your subscription: >>>>> http://www.postgresql.org/mailpref/pgsql-performance >>>>> >>>> >>> >> _______________________________________________ >> rsyslog mailing list >> http://lists.adiscon.net/mailman/listinfo/rsyslog >> http://www.rsyslog.com > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com > From rgerhards at hq.adiscon.com Wed Apr 22 08:33:05 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Wed, 22 Apr 2009 08:33:05 +0200 Subject: [rsyslog] [PERFORM] performance for high-volume log insertion References: <20090421064554.GW8123@tamriel.snowman.net><49ED8A37.4030509@archonet.com><20090421133330.GZ18845@it.is.rice.edu><20090421154458.GD18845@it.is.rice.edu><20090421175217.GH18845@it.is.rice.edu><9B6E2A8877C38245BFB15CC491A11DA702AF48@GRFEXC.intern.adiscon.com> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AF49@GRFEXC.intern.adiscon.com> > -----Original Message----- > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > bounces at lists.adiscon.com] On Behalf Of david at lang.hm > Sent: Wednesday, April 22, 2009 8:26 AM > To: rsyslog-users > Subject: Re: [rsyslog] [PERFORM] performance for high-volume log > insertion > > On Wed, 22 Apr 2009, Rainer Gerhards wrote: > > > Hi Ken, > > > > glad to have you here. I am a bit silent at the moment, because I am > not a > > real database guy and so I am primarily listening to any information > that is > > incoming. 
If you have a couple of minutes, it would be useful to > review this > > thread here: > > > > http://lists.adiscon.net/pipermail/rsyslog/2009-April/002003.html > > > > ...one comment inline below... > > > >> -----Original Message----- > >> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > >> bounces at lists.adiscon.com] On Behalf Of Kenneth Marshall > >> > >> David, > >> > >> Okay, I am now subscribed to the mailing list. We are currently > using > >> rsyslog-3.20.x. As far as implementing a prototype of the prepared > >> statement, I was sidetracked by other duties and have not had a > chance > >> to do anything but an initial evaluation. As far as the rsyslog > >> internal > >> escaping, it looked simplest to create another template type like > the > >> current SQL and STDSQL that indicated that escaping was not needed > >> and/or that prepared statements should be used. > > > > To disable escaping, simply do not use SQL or STDSQL. However, the db > outputs > > currently require this option (easy to disable), because I cannot see > how it > > will work (with the existing code) without escaping. > > > > Any idea is most welcome. > > when using prepared statement escaping is not needed. ... but that can only be on the non-text API level (e.g. by using libpq). On the SQL text level, I have no idea how to tell the sql engine to insert ' - if I don't say '''' but ''' how does the engine know what I say? With the C-level API, I bind a parameter and specify a buffer and then put my character into that buffer - no escaping needed for sure. But, again, I do not see how this would work on the text level... > according to > Ken's > message below he found that the overhead of doing the escaping was > significant. I don't see why this should be the case, but if it > requires > making an extra copy of the string I guess it's possible > I am surprised, too. Even if a copy is made (I think it is), this is a quick in-memory operation. 
Given the rest of the picture, I would expect that to have very low impact on the overall cost (just think about the need to copy buffers between different contexts, down to different layers, etc - so I'd expect to see ample copy operations before the data finally hits the disk). Anyhow, I may be totally wrong... > I plan to get a setup together in the next couple of days that will > let > me do some testing of the options on the database side. > That would be great. I, for now, intend to look at the queue first. I have begun to think about the steps needed to tackle the beast. So I will probably not do much more on the database level than throw in some thoughts (but not do any testing or coding). Rainer > David Lang > > > Rainer > >> > >> Regards, > >> Ken > >> > >> On Tue, Apr 21, 2009 at 08:51:37AM -0700, david at lang.hm wrote: > >>> On Tue, 21 Apr 2009, Kenneth Marshall wrote: > >>> > >>>> On Tue, Apr 21, 2009 at 08:37:54AM -0700, david at lang.hm wrote: > >>>>> Kenneth, > >>>>> could you join the discussion on the rsyslog mailing list? > >>>>> rsyslog-users > >>>>> > >>>>> I'm surprised to hear you say that rsyslog can already do batch > >> inserts > >>>>> and > >>>>> am interested in how you did that. > >>>>> > >>>>> what sort of insert rate did you manage to get? > >>>>> > >>>>> David Lang > >>>>> > >>>> David, > >>>> > >>>> I would be happy to join the discussion. I did not mean to say > >>>> that rsyslog currently supported batch inserts, just that the > >>>> pieces that provide "stand-by queuing" could be used to manage > >>>> batching inserts. > >>> > >>> I've changed the To: list to the rsyslog users list. > >>> > >>> currently the stand-by queuing still handles messages one at a > time. > >>> however a sponsor has been found to pay for changing the rsyslog > >> internals > >>> to allow for multiple messages to be handled at once, which is what > >>> triggered some of this discussion. > >>> > >>> which version of rsyslog are you working with?
> >>> > >>> when you modified rsyslog to do prepared statement (to avoid the > >> escaping > >>> and parsing) did you hard-code the prepared statement? what other > >> changes > >>> did you make? > >>> > >>> David Lang > >>> > >>>> Cheers, > >>>> Ken > >>>> > >>>>> On Tue, 21 Apr 2009, Kenneth Marshall wrote: > >>>>> > >>>>>> Date: Tue, 21 Apr 2009 08:33:30 -0500 > >>>>>> From: Kenneth Marshall > >>>>>> To: Richard Huxton > >>>>>> Cc: david at lang.hm, Stephen Frost , > >>>>>> Greg Smith , pgsql- > >> performance at postgresql.org > >>>>>> Subject: Re: [PERFORM] performance for high-volume log insertion > >>>>>> Hi, > >>>>>> > >>>>>> I just finished reading this thread. We are currently working on > >>>>>> setting up a central log system using rsyslog and PostgreSQL. It > >>>>>> works well once we patched the memory leak. We also looked at > what > >>>>>> could be done to improve the efficiency of the DB interface. On > >> the > >>>>>> rsyslog side, moving to prepared queries allows you to remove > the > >>>>>> escaping that needs to be done currently before attempting to > >>>>>> insert the data into the SQL backend as well as removing the > >> parsing > >>>>>> and planning time from the insert. This is a big win for high > >> insert > >>>>>> rates, which is what we are talking about. The escaping process > is > >>>>>> also a big CPU user in rsyslog which then hands the escaped > string > >>>>>> to the backend which then has to undo everything that had been > >> done > >>>>>> and parse/plan the resulting query. This can use a surprising > >> amount > >>>>>> of additional CPU. Even if you cannot support a general prepared > >>>>>> query interface, by specifying what the query should look like > you > >>>>>> can handle much of the low-hanging fruit query-wise. > >>>>>> > >>>>>> We are currently using a date based trigger to use a new > partition > >>>>>> each day and keep 2 months of logs currently. 
This can be > usefully > >>>>>> managed on the backend database, but if rsyslog supported > changing > >>>>>> the insert to the new table on a time basis, the CPU used by the > >>>>>> trigger to support this on the backend could be reclaimed. This > >>>>>> would be a win for any DB backend. As you move to the new > >> partition, > >>>>>> issuing a truncate to clear the table would simplify the DB > >> interfaces. > >>>>>> > >>>>>> Another performance enhancement already mentioned, would be to > >>>>>> allow certain extra fields in the DB to be automatically > populated > >>>>>> as a function of the log messages. For example, logging the mail > >> queue > >>>>>> id for messages from mail systems would make it much easier to > >> locate > >>>>>> particular mail transactions in large amounts of data. > >>>>>> > >>>>>> To sum up, eliminating the escaping in rsyslog through the use > of > >>>>>> prepared queries would reduce the CPU load on the DB backend. > >> Batching > >>>>>> the inserts will also net you a big performance increase. Some > DB- > >> based > >>>>>> applications allow for the specification of several types of > >> queries, > >>>>>> one for single inserts and then a second to support multiple > >> inserts > >>>>>> (copy). Rsyslog already supports the queuing pieces to allow you > >> to > >>>>>> batch inserts. Just some ideas. > >>>>>> > >>>>>> Regards, > >>>>>> Ken > >>>>>> > >>>>>> > >>>>>> On Tue, Apr 21, 2009 at 09:56:23AM +0100, Richard Huxton wrote: > >>>>>>> david at lang.hm wrote: > >>>>>>>> On Tue, 21 Apr 2009, Stephen Frost wrote: > >>>>>>>>> * david at lang.hm (david at lang.hm) wrote: > >>>>>>>>>> while I fully understand the 'benchmark your situation' > need, > >> this > >>>>>>>>>> isn't > >>>>>>>>>> that simple. > >>>>>>>>> > >>>>>>>>> It really is. You know your application, you know it's > primary > >> use > >>>>>>>>> cases, and probably have some data to play with. 
You're > >> certainly in > >>>>>>>>> a > >>>>>>>>> much better situation to at least *try* and benchmark it than > >> we are. > >>>>>>>> rsyslog is a syslog server. it replaces (or for debian and > >> fedora, has > >>>>>>>> replaced) your standard syslog daemon. it recieves log > messages > >> from > >>>>>>>> every > >>>>>>>> app on your system (and possibly others), filters, maniulates > >> them, > >>>>>>>> and > >>>>>>>> then stores them somewhere. among the places that it can store > >> the > >>>>>>>> logs > >>>>>>>> are database servers (native support for MySQL, PostgreSQL, > and > >>>>>>>> Oracle. > >>>>>>>> plus libdbi for others) > >>>>>>> > >>>>>>> Well, from a performance standpoint the obvious things to do > are: > >>>>>>> 1. Keep a connection open, do NOT reconnect for each log- > >> statement > >>>>>>> 2. Batch log statements together where possible > >>>>>>> 3. Use prepared statements > >>>>>>> 4. Partition the tables by day/week/month/year (configurable I > >> suppose) > >>>>>>> > >>>>>>> The first two are vital, the third takes you a step further. > The > >> fourth > >>>>>>> is > >>>>>>> a long-term admin thing. > >>>>>>> > >>>>>>> And possibly > >>>>>>> 5. Have two connections, one for fatal/error etc and one for > >> info/debug > >>>>>>> level log statements (configurable split?). Then you can use > the > >>>>>>> synchronous_commit setting on the less important ones. Might > buy > >> you > >>>>>>> some > >>>>>>> performance on a busy system. > >>>>>>> > >>>>>>> http://www.postgresql.org/docs/8.3/interactive/runtime-config- > >> wal.html#RUNTIME-CONFIG-WAL-SETTINGS > >>>>>>> > >>>>>>>> other apps then search and report on the data after it is > >> stored. what > >>>>>>>> apps?, I don't know either. pick your favorite reporting tool > >> and > >>>>>>>> you'll > >>>>>>>> be a step ahead of me (I don't know a really good reporting > >> tool) > >>>>>>>> as for sample data, you have syslog messages, just like I do. 
> so > >> you > >>>>>>>> have > >>>>>>>> the same access to data that I have. > >>>>>>>> how would you want to query them? how would people far less > >>>>>>>> experianced > >>>>>>>> that you want to query them? > >>>>>>>> I can speculate that some people would do two columns (time, > >>>>>>>> everything > >>>>>>>> else), others will do three (time, server, everything else), > and > >>>>>>>> others > >>>>>>>> will go further (I know some who would like to extract IP > >> addresses > >>>>>>>> embedded in a message into their own column). some people will > >> index > >>>>>>>> on > >>>>>>>> the time and host, others will want to do full-text searches > of > >>>>>>>> everything. > >>>>>>> > >>>>>>> Well, assuming it looks much like traditional syslog, I would > do > >>>>>>> something > >>>>>>> like: (timestamp, host, facility, priority, message). It's easy > >> enough > >>>>>>> to > >>>>>>> stitch back together if people want that. > >>>>>>> > >>>>>>> PostgreSQL's full-text indexing is quite well suited to > logfiles > >> I'd > >>>>>>> have > >>>>>>> thought, since it knows about filenames, urls etc already. > >>>>>>> > >>>>>>> If you want to get fancy, add a msg_type column and one > >> subsidiary > >>>>>>> table > >>>>>>> for each msg_type. So - you might have smtp_connect_from > >> (hostname, > >>>>>>> ip_addr). A set of perl regexps can match and extract the > fields > >> for > >>>>>>> these > >>>>>>> extra tables, or you could do it with triggers inside the > >> database. I > >>>>>>> think > >>>>>>> it makes sense to do it in the application. Easier for users to > >>>>>>> contribute > >>>>>>> new patterns/extractions. Meanwhile, the core table is > untouched > >> so you > >>>>>>> don't *need* to know about these extra tables. > >>>>>>> > >>>>>>> If you have subsidiary tables, you'll want to partition those > too > >> and > >>>>>>> perhaps stick them in their own schema (logs200901, logs200902 > >> etc). 
> >>>>>>> > >>>>>>> -- > >>>>>>> Richard Huxton > >>>>>>> Archonet Ltd > >>>>>>> > >>>>>>> -- > >>>>>>> Sent via pgsql-performance mailing list > >>>>>>> (pgsql-performance at postgresql.org) > >>>>>>> To make changes to your subscription: > >>>>>>> http://www.postgresql.org/mailpref/pgsql-performance > >>>>>>> > >>>>>> > >>>>> > >>>>> -- > >>>>> Sent via pgsql-performance mailing list > >>>>> (pgsql-performance at postgresql.org) > >>>>> To make changes to your subscription: > >>>>> http://www.postgresql.org/mailpref/pgsql-performance > >>>>> > >>>> > >>> > >> _______________________________________________ > >> rsyslog mailing list > >> http://lists.adiscon.net/mailman/listinfo/rsyslog > >> http://www.rsyslog.com > > _______________________________________________ > > rsyslog mailing list > > http://lists.adiscon.net/mailman/listinfo/rsyslog > > http://www.rsyslog.com > > > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com From david at lang.hm Wed Apr 22 08:52:02 2009 From: david at lang.hm (david at lang.hm) Date: Tue, 21 Apr 2009 23:52:02 -0700 (PDT) Subject: [rsyslog] [PERFORM] performance for high-volume log insertion In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702AF49@GRFEXC.intern.adiscon.com> References: <20090421064554.GW8123@tamriel.snowman.net><49ED8A37.4030509@archonet.com><20090421133330.GZ18845@it.is.rice.edu><20090421154458.GD18845@it.is.rice.edu><20090421175217.GH18845@it.is.rice.edu><9B6E2A8877C38245BFB15CC491A11DA702AF48@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702AF49@GRFEXC.intern.adiscon.com> Message-ID: On Wed, 22 Apr 2009, Rainer Gerhards wrote: >> -----Original Message---- >> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- >> bounces at lists.adiscon.com] On Behalf Of david at lang.hm >> Sent: Wednesday, April 22, 2009 8:26 AM >> To: rsyslog-users >> Subject: Re: [rsyslog] [PERFORM] performance for high-volume 
log >> insertion >> >> On Wed, 22 Apr 2009, Rainer Gerhards wrote: >> >>> Hi Ken, >>> >>> glad to have you here. I am a bit silent at the moment, because I am >> not a >>> real database guy and so I am primarily listening to any information >> that is >>> incoming. If you have a couple of minutes, it would be useful to >> review this >>> thread here: >>> >>> http://lists.adiscon.net/pipermail/rsyslog/2009-April/002003.html >>> >>> ...one comment inline below... >>> >>>> -----Original Message----- >>>> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- >>>> bounces at lists.adiscon.com] On Behalf Of Kenneth Marshall >>>> >>>> David, >>>> >>>> Okay, I am now subscribed to the mailing list. We are currently >> using >>>> rsyslog-3.20.x. As far as implementing a prototype of the prepared >>>> statement, I was sidetracked by other duties and have not had a >> chance >>>> to do anything but an initial evaluation. As far as the rsyslog >>>> internal >>>> escaping, it looked simplest to create another template type like >> the >>>> current SQL and STDSQL that indicated that escaping was not needed >>>> and/or that prepared statements should be used. >>> >>> To disable escaping, simply do not use SQL or STDSQL. However, the db >> outputs >>> currently require this option (easy to disable), because I cannot see >> how it >>> will work (with the existing code) without escaping. >>> >>> Any idea is most welcome. >> >> when using prepared statement escaping is not needed. > > ... but that can only be on the non-text API level (e.g. by using libpq). On > the SQL text level, I have no idea how to tell the sql engine to insert ' - > if I don't say '''' but ''' how does the engine know what I say? With the > C-level API, I bind a parameter and specify a buffer and then put my > character into that buffer - no escaping needed for sure. But, again, I do > not see how this would work on the text level... correct. 
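[Editorial sketch, not part of the original messages.] The distinction Rainer draws above, between building SQL as text and binding a parameter through the client API, can be illustrated in a few lines. This is only a sketch under stated assumptions: SQLite via Python's standard sqlite3 module stands in for the real backend, and the table name `logs` and both helper functions are hypothetical. rsyslog's database outputs are written in C and would use the client library's own bind calls instead (for PostgreSQL, libpq offers PQprepare and PQexecParams for this purpose).

```python
import sqlite3

def insert_text_level(conn, msg):
    # Text-level SQL: the value is pasted into the statement text, so every
    # single quote in the message has to be doubled first.
    escaped = msg.replace("'", "''")
    conn.execute("INSERT INTO logs(message) VALUES ('" + escaped + "')")

def insert_bound(conn, msg):
    # Bound parameter: the statement text contains only a placeholder and the
    # value travels separately, so no escaping step is needed.
    conn.execute("INSERT INTO logs(message) VALUES (?)", (msg,))

conn = sqlite3.connect(":memory:")            # throwaway in-memory database
conn.execute("CREATE TABLE logs(message TEXT)")

raw = "can't open file 'x'"                   # message containing quotes
insert_text_level(conn, raw)
insert_bound(conn, raw)

rows = [r[0] for r in conn.execute("SELECT message FROM logs ORDER BY rowid")]
```

Both rows come back identical to the raw message. The difference is where the work happens: the text-level path has to copy and rewrite the string before sending it, while the bound path hands the buffer over unchanged.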
>> according to >> Ken's >> message below he found that the overhead of doing the escaping was >> significant. I don't see why this should be the case, but if it >> requires >> making an extra copy of the string I guess it's possible >> > > I am surprised, too. Even if a copy is made (I think it is), this is a quick > in-memory operation. Given the rest of the picture, I would expect that to > have very low impact on the overall cost (just think about the need to copy > buffers between different contexts, down to different layers, etc - so I'd > expect to see ample copy operations before the data finally hits the disk). > Anyhow, I may be totally wrong... > >> I plan to get a setup together in the next couple of days that will >> let >> me do some testing of the options on the database side. >> > > That would be great. I, for now, intend to look at the queue first. I have > begun to think about the steps needed to tackle the beast. So I will probably > not do much more on the database level than throw in some thoughts (but not > do any testing or coding). sounds good. David Lang > Rainer > >> David Lang >> >>> Rainer >>>> >>>> Regards, >>>> Ken >>>> >>>> On Tue, Apr 21, 2009 at 08:51:37AM -0700, david at lang.hm wrote: >>>>> On Tue, 21 Apr 2009, Kenneth Marshall wrote: >>>>> >>>>>> On Tue, Apr 21, 2009 at 08:37:54AM -0700, david at lang.hm wrote: >>>>>>> Kenneth, >>>>>>> could you join the discussion on the rsyslog mailing list? >>>>>>> rsyslog-users >>>>>>> >>>>>>> I'm surprised to hear you say that rsyslog can already do batch >>>> inserts >>>>>>> and >>>>>>> am interested in how you did that. >>>>>>> >>>>>>> what sort of insert rate did you manage to get? >>>>>>> >>>>>>> David Lang >>>>>>> >>>>>> David, >>>>>> >>>>>> I would be happy to join the discussion. I did not mean to say >>>>>> that rsyslog currently supported batch inserts, just that the >>>>>> pieces that provide "stand-by queuing" could be used to manage >>>>>> batching inserts.
>>>>> >>>>> I've changed the to list to the rsyslog users list. >>>>> >>>>> currently the stand-by queuing still handles messages one at a >> time. >>>>> however a sponser has been found to pay to changing the rsyslog >>>> internals >>>>> to allow for multiple messages to be handled at once, which is what >>>>> triggered some of this discussion. >>>>> >>>>> which version of rsyslog are you working with? >>>>> >>>>> when you modified rsyslog to do prepared statement (to avoid the >>>> escaping >>>>> and parsing) did you hard-code the prepared statement? what other >>>> changes >>>>> did you make? >>>>> >>>>> David Lang >>>>> >>>>>> Cheers, >>>>>> Ken >>>>>> >>>>>>> On Tue, 21 Apr 2009, Kenneth Marshall wrote: >>>>>>> >>>>>>>> Date: Tue, 21 Apr 2009 08:33:30 -0500 >>>>>>>> From: Kenneth Marshall >>>>>>>> To: Richard Huxton >>>>>>>> Cc: david at lang.hm, Stephen Frost , >>>>>>>> Greg Smith , pgsql- >>>> performance at postgresql.org >>>>>>>> Subject: Re: [PERFORM] performance for high-volume log insertion >>>>>>>> Hi, >>>>>>>> >>>>>>>> I just finished reading this thread. We are currently working on >>>>>>>> setting up a central log system using rsyslog and PostgreSQL. It >>>>>>>> works well once we patched the memory leak. We also looked at >> what >>>>>>>> could be done to improve the efficiency of the DB interface. On >>>> the >>>>>>>> rsyslog side, moving to prepared queries allows you to remove >> the >>>>>>>> escaping that needs to be done currently before attempting to >>>>>>>> insert the data into the SQL backend as well as removing the >>>> parsing >>>>>>>> and planning time from the insert. This is a big win for high >>>> insert >>>>>>>> rates, which is what we are talking about. The escaping process >> is >>>>>>>> also a big CPU user in rsyslog which then hands the escaped >> string >>>>>>>> to the backend which then has to undo everything that had been >>>> done >>>>>>>> and parse/plan the resulting query. 
This can use a surprising >>>> amount >>>>>>>> of additional CPU. Even if you cannot support a general prepared >>>>>>>> query interface, by specifying what the query should look like >> you >>>>>>>> can handle much of the low-hanging fruit query-wise. >>>>>>>> >>>>>>>> We are currently using a date based trigger to use a new >> partition >>>>>>>> each day and keep 2 months of logs currently. This can be >> usefully >>>>>>>> managed on the backend database, but if rsyslog supported >> changing >>>>>>>> the insert to the new table on a time basis, the CPU used by the >>>>>>>> trigger to support this on the backend could be reclaimed. This >>>>>>>> would be a win for any DB backend. As you move to the new >>>> partition, >>>>>>>> issuing a truncate to clear the table would simplify the DB >>>> interfaces. >>>>>>>> >>>>>>>> Another performance enhancement already mentioned, would be to >>>>>>>> allow certain extra fields in the DB to be automatically >> populated >>>>>>>> as a function of the log messages. For example, logging the mail >>>> queue >>>>>>>> id for messages from mail systems would make it much easier to >>>> locate >>>>>>>> particular mail transactions in large amounts of data. >>>>>>>> >>>>>>>> To sum up, eliminating the escaping in rsyslog through the use >> of >>>>>>>> prepared queries would reduce the CPU load on the DB backend. >>>> Batching >>>>>>>> the inserts will also net you a big performance increase. Some >> DB- >>>> based >>>>>>>> applications allow for the specification of several types of >>>> queries, >>>>>>>> one for single inserts and then a second to support multiple >>>> inserts >>>>>>>> (copy). Rsyslog already supports the queuing pieces to allow you >>>> to >>>>>>>> batch inserts. Just some ideas. 
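[Editorial sketch, not part of the original messages.] The batching idea summarised above (prepare one statement, run it for many queued messages, commit once) might look roughly like the following. It is an illustration only: SQLite through Python's sqlite3 module is used as a stand-in backend, and the `logs` table, its columns, and the `insert_batch` helper are all hypothetical. A real rsyslog output module would do the equivalent in C against its own database client library.

```python
import sqlite3

def insert_batch(conn, rows):
    # One parameterised statement, many parameter sets, a single commit:
    # the "with" block commits on success and rolls back on an exception.
    with conn:
        conn.executemany(
            "INSERT INTO logs(ts, host, message) VALUES (?, ?, ?)", rows
        )

conn = sqlite3.connect(":memory:")            # throwaway in-memory database
conn.execute("CREATE TABLE logs(ts TEXT, host TEXT, message TEXT)")

# Pretend the queue handed us 1000 pending messages at once.
batch = [
    ("2009-04-21T08:33:30", "host%d" % (i % 3), "message %d" % i)
    for i in range(1000)
]
insert_batch(conn, batch)

count = conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]
```

The point of the design is that the per-message cost shrinks to binding values, while statement parsing and the commit are paid once per batch rather than once per message.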
>>>>>>>> >>>>>>>> Regards, >>>>>>>> Ken >>>>>>>> >>>>>>>> >>>>>>>> On Tue, Apr 21, 2009 at 09:56:23AM +0100, Richard Huxton wrote: >>>>>>>>> david at lang.hm wrote: >>>>>>>>>> On Tue, 21 Apr 2009, Stephen Frost wrote: >>>>>>>>>>> * david at lang.hm (david at lang.hm) wrote: >>>>>>>>>>>> while I fully understand the 'benchmark your situation' >> need, >>>> this >>>>>>>>>>>> isn't >>>>>>>>>>>> that simple. >>>>>>>>>>> >>>>>>>>>>> It really is. You know your application, you know it's >> primary >>>> use >>>>>>>>>>> cases, and probably have some data to play with. You're >>>> certainly in >>>>>>>>>>> a >>>>>>>>>>> much better situation to at least *try* and benchmark it than >>>> we are. >>>>>>>>>> rsyslog is a syslog server. it replaces (or for debian and >>>> fedora, has >>>>>>>>>> replaced) your standard syslog daemon. it recieves log >> messages >>>> from >>>>>>>>>> every >>>>>>>>>> app on your system (and possibly others), filters, maniulates >>>> them, >>>>>>>>>> and >>>>>>>>>> then stores them somewhere. among the places that it can store >>>> the >>>>>>>>>> logs >>>>>>>>>> are database servers (native support for MySQL, PostgreSQL, >> and >>>>>>>>>> Oracle. >>>>>>>>>> plus libdbi for others) >>>>>>>>> >>>>>>>>> Well, from a performance standpoint the obvious things to do >> are: >>>>>>>>> 1. Keep a connection open, do NOT reconnect for each log- >>>> statement >>>>>>>>> 2. Batch log statements together where possible >>>>>>>>> 3. Use prepared statements >>>>>>>>> 4. Partition the tables by day/week/month/year (configurable I >>>> suppose) >>>>>>>>> >>>>>>>>> The first two are vital, the third takes you a step further. >> The >>>> fourth >>>>>>>>> is >>>>>>>>> a long-term admin thing. >>>>>>>>> >>>>>>>>> And possibly >>>>>>>>> 5. Have two connections, one for fatal/error etc and one for >>>> info/debug >>>>>>>>> level log statements (configurable split?). Then you can use >> the >>>>>>>>> synchronous_commit setting on the less important ones. 
Might >> buy >>>> you >>>>>>>>> some >>>>>>>>> performance on a busy system. >>>>>>>>> >>>>>>>>> http://www.postgresql.org/docs/8.3/interactive/runtime-config- >>>> wal.html#RUNTIME-CONFIG-WAL-SETTINGS >>>>>>>>> >>>>>>>>>> other apps then search and report on the data after it is >>>> stored. what >>>>>>>>>> apps?, I don't know either. pick your favorite reporting tool >>>> and >>>>>>>>>> you'll >>>>>>>>>> be a step ahead of me (I don't know a really good reporting >>>> tool) >>>>>>>>>> as for sample data, you have syslog messages, just like I do. >> so >>>> you >>>>>>>>>> have >>>>>>>>>> the same access to data that I have. >>>>>>>>>> how would you want to query them? how would people far less >>>>>>>>>> experianced >>>>>>>>>> that you want to query them? >>>>>>>>>> I can speculate that some people would do two columns (time, >>>>>>>>>> everything >>>>>>>>>> else), others will do three (time, server, everything else), >> and >>>>>>>>>> others >>>>>>>>>> will go further (I know some who would like to extract IP >>>> addresses >>>>>>>>>> embedded in a message into their own column). some people will >>>> index >>>>>>>>>> on >>>>>>>>>> the time and host, others will want to do full-text searches >> of >>>>>>>>>> everything. >>>>>>>>> >>>>>>>>> Well, assuming it looks much like traditional syslog, I would >> do >>>>>>>>> something >>>>>>>>> like: (timestamp, host, facility, priority, message). It's easy >>>> enough >>>>>>>>> to >>>>>>>>> stitch back together if people want that. >>>>>>>>> >>>>>>>>> PostgreSQL's full-text indexing is quite well suited to >> logfiles >>>> I'd >>>>>>>>> have >>>>>>>>> thought, since it knows about filenames, urls etc already. >>>>>>>>> >>>>>>>>> If you want to get fancy, add a msg_type column and one >>>> subsidiary >>>>>>>>> table >>>>>>>>> for each msg_type. So - you might have smtp_connect_from >>>> (hostname, >>>>>>>>> ip_addr). 
A set of perl regexps can match and extract the >> fields >>>> for >>>>>>>>> these >>>>>>>>> extra tables, or you could do it with triggers inside the >>>> database. I >>>>>>>>> think >>>>>>>>> it makes sense to do it in the application. Easier for users to >>>>>>>>> contribute >>>>>>>>> new patterns/extractions. Meanwhile, the core table is >> untouched >>>> so you >>>>>>>>> don't *need* to know about these extra tables. >>>>>>>>> >>>>>>>>> If you have subsidiary tables, you'll want to partition those >> too >>>> and >>>>>>>>> perhaps stick them in their own schema (logs200901, logs200902 >>>> etc). >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Richard Huxton >>>>>>>>> Archonet Ltd >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Sent via pgsql-performance mailing list >>>>>>>>> (pgsql-performance at postgresql.org) >>>>>>>>> To make changes to your subscription: >>>>>>>>> http://www.postgresql.org/mailpref/pgsql-performance >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Sent via pgsql-performance mailing list >>>>>>> (pgsql-performance at postgresql.org) >>>>>>> To make changes to your subscription: >>>>>>> http://www.postgresql.org/mailpref/pgsql-performance >>>>>>> >>>>>> >>>>> >>>> _______________________________________________ >>>> rsyslog mailing list >>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>> http://www.rsyslog.com >>> _______________________________________________ >>> rsyslog mailing list >>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>> http://www.rsyslog.com >>> >> _______________________________________________ >> rsyslog mailing list >> http://lists.adiscon.net/mailman/listinfo/rsyslog >> http://www.rsyslog.com > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com > From tbergfeld at hq.adiscon.com Wed Apr 22 09:39:35 2009 From: tbergfeld at hq.adiscon.com (Tom Bergfeld) Date: Wed, 22 Apr 2009 09:39:35 +0200 Subject: [rsyslog] rsyslog 4.1.7 (beta) released 
Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AF52@GRFEXC.intern.adiscon.com>

Hi all,

rsyslog 4.1.7, now a member of the beta branch, has been released. This is the first incarnation of v4 flagged as beta. It offers all enhancements of v4. There have also been a number of bug fixes over 4.1.6, for example the fix for an invalid error check inside the PostgreSQL output module.

Changelog: http://www.rsyslog.com/Article370.phtml
Download: http://www.rsyslog.com/Downloads-req-viewdownloaddetails-lid-158.phtml

As always, feedback is appreciated.

Tom Bergfeld

--
Support
=======
Improving rsyslog is costly, but you can help! We are looking for organizations that find rsyslog useful and wish to contribute back. You can contribute by reporting bugs, improving the software, or donating money or equipment. Commercial support contracts for rsyslog are available, and they help finance continued maintenance. Adiscon GmbH, a privately held German company, is currently funding rsyslog development. We are always looking for interesting development projects. For details on how to help, please see http://www.rsyslog.com/doc-how2help.html.

From rgerhards at hq.adiscon.com Wed Apr 22 13:52:47 2009
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Wed, 22 Apr 2009 13:52:47 +0200
Subject: [rsyslog] advise request: free(NULL)
Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AF56@GRFEXC.intern.adiscon.com>

Hi all,

as Luis Fernando pointed out in a mail, free(NULL) is defined to be valid in C. So far, I have still guarded this sequence as follows:

    if(p != NULL)
        free(p);

because I had "a bad feeling" about portability. I am beginning to think that this may be overly conservative. If the guard is actually not needed, a code cleanup may be useful (it is always a bad idea to make things more complex than necessary...).

Comments on the issue would be appreciated.
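For reference, a minimal sketch of what the cleaned-up pattern could look like, assuming a C89-or-later library in which free(NULL) is a no-op. The helpers release_buf and dup_str are made-up names for illustration and are not taken from the rsyslog sources.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not rsyslog code): frees the buffer and resets
 * the caller's pointer. No NULL guard is needed around free(), because
 * the C standard defines free(NULL) as doing nothing. */
void release_buf(char **pp)
{
    assert(pp != NULL);   /* the pointer-to-pointer itself must be valid */
    free(*pp);            /* fine even when *pp is NULL */
    *pp = NULL;           /* a repeated call then becomes a no-op too */
}

/* Hypothetical helper: copies a string into freshly allocated memory.
 * Returns NULL if the allocation fails. */
char *dup_str(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);

    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}
```

Calling release_buf() on a pointer that is still NULL, or twice in a row on the same pointer, is harmless with this shape, which is the property the unguarded free() relies on.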
Thanks,
Rainer

From david at lang.hm Wed Apr 22 14:35:56 2009
From: david at lang.hm (david at lang.hm)
Date: Wed, 22 Apr 2009 05:35:56 -0700 (PDT)
Subject: [rsyslog] advise request: free(NULL)
In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702AF56@GRFEXC.intern.adiscon.com>
References: <9B6E2A8877C38245BFB15CC491A11DA702AF56@GRFEXC.intern.adiscon.com>
Message-ID:

On Wed, 22 Apr 2009, Rainer Gerhards wrote:

> Hi all,
>
> as Luis Fernando pointed out in a mail, free(NULL) is defined to be valid in
> C. So far, I still guarded this sequence as follows:
>
>     if(p != NULL)
>         free(p);
>
> Because I had "a bad feeling" about portability. I begin to think that this
> may be overly conservative. If it is actually not needed, code cleanup may be
> useful (it is always a bad idea to do things more complex than necessary...).
>
> Comments on the issue would be appreciated.

if the standard says it's safe, I would count on it being safe.

the only question is which standard introduced this safety. if that standard has just been released I would carefully read the notes on it, but in this case I believe that it's been in the standards for a couple of decades.

David Lang

From rgerhards at hq.adiscon.com Wed Apr 22 14:39:25 2009
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Wed, 22 Apr 2009 14:39:25 +0200
Subject: [rsyslog] advise request: free(NULL)
References: <9B6E2A8877C38245BFB15CC491A11DA702AF56@GRFEXC.intern.adiscon.com>
Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AF58@GRFEXC.intern.adiscon.com>

> -----Original Message-----
> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-bounces at lists.adiscon.com] On Behalf Of david at lang.hm
> Sent: Wednesday, April 22, 2009 2:36 PM
> To: rsyslog-users
> Subject: Re: [rsyslog] advise request: free(NULL)
>
> if the standard says it's safe, I would count on it being safe.
>
> the only question is which standard introduced this safety. if that
> standard has just been released I would carefully read the notes on it,
> but in this case I believe that it's been in the standards for a couple
> of decades.

From what I have seen, it was part of the original ANSI C standard, but it looks like it was only about a decade ago that it reliably made its way into mainstream compilers. So it looks like we can drop the guard (get rid of an old habit ;)). But if anybody sees a potential issue, please speak up!

Rainer

From ktm at rice.edu Wed Apr 22 15:43:11 2009
From: ktm at rice.edu (Kenneth Marshall)
Date: Wed, 22 Apr 2009 08:43:11 -0500
Subject: [rsyslog] [PERFORM] performance for high-volume log insertion
In-Reply-To:
References: <9B6E2A8877C38245BFB15CC491A11DA702AF49@GRFEXC.intern.adiscon.com>
Message-ID: <20090422134310.GM18845@it.is.rice.edu>

Hi David and Rainer,

I have read the thread below and caught up on all of the list traffic. Regarding the performance implications of the escaping, it is certainly not a concern on the low end of messages per second. I mentioned it because if you use PQexecParams or PQexecPrepared you can avoid the tedious and error-prone quoting and escaping process on both the rsyslog side and the DB backend.
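To make the escaping step concrete, here is a minimal sketch of what a text-level SQL interface has to do to every field before it can build the statement: scan the field and double each single quote, which requires a second buffer. It assumes plain standard-SQL quote doubling only; sql_escape_quotes is a hypothetical name, not rsyslog's actual escaping routine, and a real backend may need to handle more than quotes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper for illustration: returns a newly allocated copy
 * of `in` with every single quote doubled (standard SQL string-literal
 * escaping). The caller frees the result. Returns NULL on allocation
 * failure. */
char *sql_escape_quotes(const char *in)
{
    size_t len = 0, quotes = 0;
    const char *s;
    char *out, *d;

    assert(in != NULL);

    /* first pass: measure the input and count the quotes */
    for (s = in; *s != '\0'; ++s) {
        ++len;
        if (*s == '\'')
            ++quotes;
    }

    out = malloc(len + quotes + 1);
    if (out == NULL)
        return NULL;

    /* second pass: copy, emitting an extra quote before each quote */
    for (s = in, d = out; *s != '\0'; ++s) {
        if (*s == '\'')
            *d++ = '\'';
        *d++ = *s;
    }
    *d = '\0';
    return out;
}
```

With bound parameters (the PQexecParams / PQexecPrepared route mentioned above) the raw message buffer is handed to the client library unchanged, so neither the two passes nor the extra allocation are needed.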
Obviously, as this thread shows, using multiple inserts per transaction is much, much faster than using a single insert per transaction:

http://archives.postgresql.org/pgsql-performance/2006-06/msg00381.php

It is also useful to remember that the number of round-trips from the application to the DB will also slow down multiple inserts per transaction, i.e. sending

    begin;insert xxx values yyy;insert xxx values zzz;...;commit; (send)

will be yet again faster than:

    begin; (send)
    insert xxx values yyy; (send)
    insert xxx values zzz; (send)
    ...
    commit; (send)

I also agree with your assessment that you do not need a lot of granularity in the grouping. If you have light traffic, the current 1 insert per transaction is fine, and if you have heavy logging, grouping to a single larger size is sufficient and would also reduce the complexity of the implementation.

Regards,
Ken

On Tue, Apr 21, 2009 at 11:52:02PM -0700, david at lang.hm wrote:
> On Wed, 22 Apr 2009, Rainer Gerhards wrote:
>
> >> -----Original Message----
> >> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-bounces at lists.adiscon.com] On Behalf Of david at lang.hm
> >> Sent: Wednesday, April 22, 2009 8:26 AM
> >> To: rsyslog-users
> >> Subject: Re: [rsyslog] [PERFORM] performance for high-volume log insertion
> >>
> >> On Wed, 22 Apr 2009, Rainer Gerhards wrote:
> >>
> >>> Hi Ken,
> >>>
> >>> glad to have you here. I am a bit silent at the moment, because I am not a
> >>> real database guy and so I am primarily listening to any information that is
> >>> incoming. If you have a couple of minutes, it would be useful to review this
> >>> thread here:
> >>>
> >>> http://lists.adiscon.net/pipermail/rsyslog/2009-April/002003.html
> >>>
> >>> ...one comment inline below...
> >>> > >>>> -----Original Message----- > >>>> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > >>>> bounces at lists.adiscon.com] On Behalf Of Kenneth Marshall > >>>> > >>>> David, > >>>> > >>>> Okay, I am now subscribed to the mailing list. We are currently > >> using > >>>> rsyslog-3.20.x. As far as implementing a prototype of the prepared > >>>> statement, I was sidetracked by other duties and have not had a > >> chance > >>>> to do anything but an initial evaluation. As far as the rsyslog > >>>> internal > >>>> escaping, it looked simplest to create another template type like > >> the > >>>> current SQL and STDSQL that indicated that escaping was not needed > >>>> and/or that prepared statements should be used. > >>> > >>> To disable escaping, simply do not use SQL or STDSQL. However, the db > >> outputs > >>> currently require this option (easy to disable), because I cannot see > >> how it > >>> will work (with the existing code) without escaping. > >>> > >>> Any idea is most welcome. > >> > >> when using prepared statement escaping is not needed. > > > > ... but that can only be on the non-text API level (e.g. by using libpq). On > > the SQL text level, I have no idea how to tell the sql engine to insert ' - > > if I don't say '''' but ''' how does the engine know what I say? With the > > C-level API, I bind a parameter and specify a buffer and then put my > > character into that buffer - no escaping needed for sure. But, again, I do > > not see how this would work on the text level... > > correct. > > >> according to > >> Ken's > >> message below he found that the overhead of doing the escaping was > >> significant. I don't see why this should be the case, but if it > >> requires > >> making an extra copy of the string I guess it's possible > >> > > > > I am surprised, too. Even if a copy is made (I think it is), this is a quick > > in-memory operation. 
Given the rest of the picture, I would expect that to > > have very low impact on the overall cost (just think about the need to copy > > buffers between different contexts, down to different layers, etc - so I'd > > expect to see ample copy operations before the data finally hits the disk). > > Anyhow, I may be totally wrong... > > > >> I plan to get a setup togeather in the next couple of days that will > >> let > >> me do some testing of the options on the database side. > >> > > > > That would be great. I, for now, intend to look at the queue first. I have > > begun to thought about steps on how to tackle the beast. So I will probably > > not do much more on the database level than throw in some thoughts (but not > > do any testing or coding). > > sounds good. > > David Lang > > > Rainer > > > >> David Lang > >> > >>> Rainer > >>>> > >>>> Regards, > >>>> Ken > >>>> > >>>> On Tue, Apr 21, 2009 at 08:51:37AM -0700, david at lang.hm wrote: > >>>>> On Tue, 21 Apr 2009, Kenneth Marshall wrote: > >>>>> > >>>>>> On Tue, Apr 21, 2009 at 08:37:54AM -0700, david at lang.hm wrote: > >>>>>>> Kenneth, > >>>>>>> could you join the discussion on the rsyslog mailing list? > >>>>>>> rsyslog-users > >>>>>>> > >>>>>>> I'm surprised to hear you say that rsyslog can already do batch > >>>> inserts > >>>>>>> and > >>>>>>> am interested in how you did that. > >>>>>>> > >>>>>>> what sort of insert rate did you mange to get? > >>>>>>> > >>>>>>> David Lang > >>>>>>> > >>>>>> David, > >>>>>> > >>>>>> I would be happy to join the discussion. I did not mean to say > >>>>>> that rsyslog currently supported batch inserts, just that the > >>>>>> pieces that provide "stand-by queuing" could be used to manage > >>>>>> batching inserts. > >>>>> > >>>>> I've changed the to list to the rsyslog users list. > >>>>> > >>>>> currently the stand-by queuing still handles messages one at a > >> time. 
From rgerhards at hq.adiscon.com Wed Apr 22 17:39:40 2009
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Wed, 22 Apr 2009 17:39:40 +0200
Subject: [rsyslog] multi-dequeue git branch
Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AF59@GRFEXC.intern.adiscon.com>

Hi all,

I spent today reviewing code and implementing a design approach that I have worked on over the past days. Good thinking seems to pay off, and so I ended up with a version that should have performance benefits over the previous versions.
Actually, I hope that the performance is much better, even though the version currently dequeues in batches of 8 (soon to be configurable). If someone would like to give it a try, please use the "multi-dequeue" git branch. I will be actively working on it.

The current version is a proof of concept. It does NOT work fully correctly. Most importantly, queue termination is not properly handled, nor are error conditions. If you use this version, you will probably lose messages on HUP, shutdown and when something goes wrong. I have also not yet done any performance testing myself.

My planned next steps are

a) extend the test bench to ensure that DA-mode works properly
b) add config statements
c) work on either the termination conditions or a new output plugin interface

Especially c) will require lots of additional doc work, so there will hopefully be some additional reading for you. I would also like to mention that feedback in the current phase is very valuable. Most importantly, once I have finalized the new algorithms, it will probably be very hard to convince me to change anything for the next couple of months. So if you are interested in performance and the output plugin interface, it would be useful to keep an eye on the list and provide comments where appropriate ;)

Thanks,
Rainer

From rgerhards at hq.adiscon.com Wed Apr 22 17:41:39 2009
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Wed, 22 Apr 2009 17:41:39 +0200
Subject: [rsyslog] multi-dequeue git branch
References: <9B6E2A8877C38245BFB15CC491A11DA702AF59@GRFEXC.intern.adiscon.com>
Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AF5A@GRFEXC.intern.adiscon.com>

Oh, and I forgot to mention: it dequeues in batches, but the output plugin interface is unchanged. So we do not yet have the ability to add batching to the outputs. That's a totally different story, and I'll look at implementation when we have done basic verification that the queue works (or earlier, as needs require ;)).
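The arrangement described here, where the queue hands out messages in batches while the output still sees one message per call, might look roughly like the sketch below. It rests on assumptions: the ring-buffer queue, the names msg_queue, enqueue, dequeue_batch, process_batch and count_action, and the callback type are all invented for illustration and are not the code in the multi-dequeue branch. Only the batch size of 8 comes from the message above.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAP  64   /* arbitrary capacity for the sketch */
#define BATCH_SIZE 8    /* batch size mentioned in the announcement */

/* Hypothetical fixed-size ring buffer of message pointers. */
typedef struct {
    const char *items[QUEUE_CAP];
    size_t head;    /* index of the next message to dequeue */
    size_t count;   /* number of messages currently queued */
} msg_queue;

/* Per-message callback, standing in for an unchanged one-message-per-call
 * output entry point. */
typedef void (*action_fn)(const char *msg, void *ctx);

/* Appends one message; returns -1 if the queue is full. */
int enqueue(msg_queue *q, const char *msg)
{
    if (q->count == QUEUE_CAP)
        return -1;
    q->items[(q->head + q->count) % QUEUE_CAP] = msg;
    q->count++;
    return 0;
}

/* Removes up to `max` messages in one go; returns how many were taken. */
size_t dequeue_batch(msg_queue *q, const char **out, size_t max)
{
    size_t n = 0;

    while (n < max && q->count > 0) {
        out[n++] = q->items[q->head];
        q->head = (q->head + 1) % QUEUE_CAP;
        q->count--;
    }
    return n;
}

/* Dequeues one batch, then feeds it to the action one message at a time. */
size_t process_batch(msg_queue *q, action_fn action, void *ctx)
{
    const char *batch[BATCH_SIZE];
    size_t i, n = dequeue_batch(q, batch, BATCH_SIZE);

    for (i = 0; i < n; ++i)
        action(batch[i], ctx);
    return n;
}

/* Example action: counts delivered messages through ctx (an int *). */
void count_action(const char *msg, void *ctx)
{
    (void)msg;
    ++*(int *)ctx;
}
```

The queue is touched once per batch rather than once per message, which is where a benefit could come from even while the action is still called per message.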
Rainer

From david at lang.hm Wed Apr 22 22:11:27 2009
From: david at lang.hm (david at lang.hm)
Date: Wed, 22 Apr 2009 13:11:27 -0700 (PDT)
Subject: [rsyslog] [PERFORM] performance for high-volume log insertion (fwd)
Message-ID:

from the postgres performance mailing list, relative speeds of different ways of inserting data. I've asked if the 'separate inserts' mode is separate round trips or many inserts in one round trip.

based on this it looks like prepared statements make a difference, but not so much that other techniques (either a single statement or a copy) aren't comparable (or better) options.

David Lang

---------- Forwarded message ----------
Date: Wed, 22 Apr 2009 15:33:21 -0400
From: Glenn Maynard
To: pgsql-performance at postgresql.org
Subject: Re: [PERFORM] performance for high-volume log insertion

On Wed, Apr 22, 2009 at 8:19 AM, Stephen Frost wrote:
> Yes, as I believe was mentioned already, planning time for inserts is
> really small. Parsing time for inserts when there's little parsing that
> has to happen also isn't all *that* expensive and the same goes for
> conversions from textual representations of data to binary.
>
> We're starting to re-hash things, in my view. The low-hanging fruit is
> doing multiple things in a single transaction, either by using COPY,
> multi-value INSERTs, or just multiple INSERTs in a single transaction.
> That's absolutely step one.

This is all well-known, covered information, but perhaps some numbers will help drive this home.
40000 inserts into a single-column, unindexed table; with predictable results:

separate inserts, no transaction: 21.21s
separate inserts, same transaction: 1.89s
40 inserts, 100 rows/insert: 0.18s
one 40000-value insert: 0.16s
40 prepared inserts, 100 rows/insert: 0.15s
COPY (text): 0.10s
COPY (binary): 0.10s

Of course, real workloads will change the weights, but this is more or less the magnitude of difference I always see--batch your inserts into single statements, and if that's not enough, skip to COPY.

--
Glenn Maynard

--
Sent via pgsql-performance mailing list (pgsql-performance at postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

From pete.philips at secerno.com Thu Apr 23 15:24:53 2009
From: pete.philips at secerno.com (Pete Philips)
Date: Thu, 23 Apr 2009 14:24:53 +0100
Subject: [rsyslog] Replace one character with another
Message-ID: <49F06C25.5080808@secerno.com>

Hi.

I would like to create a template such that all occurrences of string "xxx" are output as string "yyy".

For example, I would like this message:

Apr 23 10:24:02 pluto kernel: error in xxx module

to appear as:

Apr 23 10:24:02 pluto kernel: error in yyy module

Is this possible?

Many thanks,

Pete.
--
Pete Philips
Secerno Ltd
Email: pete.philips at secerno.com
PGP key: http://www.secerno.com/pgp/pete.gpg

From rgerhards at hq.adiscon.com Thu Apr 23 15:34:39 2009
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Thu, 23 Apr 2009 15:34:39 +0200
Subject: [rsyslog] Replace one character with another
References: <49F06C25.5080808@secerno.com>
Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AF6C@GRFEXC.intern.adiscon.com>

No, that is not possible at the moment. Once we have the full scripting engine, this can be done, but there is no date scheduled yet for when this will be...
Rainer

From rgerhards at hq.adiscon.com Thu Apr 23 16:38:12 2009
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Thu, 23 Apr 2009 16:38:12 +0200
Subject: [rsyslog] [PERFORM] performance for high-volume log insertion(fwd)
References:
Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AF6F@GRFEXC.intern.adiscon.com>

That's interesting. As a side-activity, I am thinking about a new output module interface. Especially given the discussion on the postgres list, but also some other thoughts about other modules (e.g. omtcp or the file output), I tend to use an approach that permits both string-based as well as API-based (API as in libpq) ways of doing things. I have not really designed anything, but the rough idea is that each plugin needs three entry points:

- start batch
- process single message
- end batch

Then the plugin can decide for itself what it wants to do and when. Most importantly, this calling interface works well for string-based transactions as well as API-based ones.
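
A minimal sketch of the three entry points described above, written in Python purely for illustration. The names `BatchOutput`, `begin_batch`, `push`, `end_batch`, `BufferedFileOutput` and `deliver` are hypothetical and are not part of rsyslog; the early-flush threshold is an assumed policy. The sketch only shows how a string-based output could buffer messages and still guarantee a write when the batch ends.

```python
import io
from typing import Protocol, TextIO


class BatchOutput(Protocol):
    """Hypothetical three-entry-point interface; names are illustrative only."""

    def begin_batch(self) -> None: ...
    def push(self, message: str) -> None: ...
    def end_batch(self) -> None: ...


class BufferedFileOutput:
    """String-based output that buffers messages and writes at end of batch."""

    def __init__(self, stream: TextIO, max_buffered: int = 4) -> None:
        self.stream = stream
        self.max_buffered = max_buffered  # assumed early-flush threshold
        self.buffer: list[str] = []
        self.writes = 0  # counts physical write calls, for illustration

    def begin_batch(self) -> None:
        self.buffer.clear()

    def push(self, message: str) -> None:
        self.buffer.append(message + "\n")
        if len(self.buffer) >= self.max_buffered:
            self._flush()  # write early only when the buffer is full

    def end_batch(self) -> None:
        self._flush()  # everything still buffered is written here

    def _flush(self) -> None:
        if self.buffer:
            self.stream.write("".join(self.buffer))
            self.writes += 1
            self.buffer.clear()


def deliver(output: BatchOutput, messages: list[str]) -> None:
    """What a caller might do: bracket one batch with begin/end calls."""
    output.begin_batch()
    for message in messages:
        output.push(message)
    output.end_batch()


sink = io.StringIO()
out = BufferedFileOutput(sink)
deliver(out, ["msg 1", "msg 2", "msg 3"])
```

A database output could implement the same three methods by opening a transaction in `begin_batch`, issuing one insert per `push`, and committing in `end_batch`; that variant is not shown here.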
For the output file writer, for example, I envision that over time it will have its own write buffer (for various reasons; for example, I am also discussing zipped writing with some folks). With this interface, I can put everything into the buffer and write it out only when needed, while still making sure that everything is written out when the "end batch" entry point is called.

As I said, it is not really thought out yet, but maybe a starting point. So feedback is appreciated.

Rainer

From pete.philips at secerno.com Thu Apr 23 17:01:11 2009
From: pete.philips at secerno.com (Pete Philips)
Date: Thu, 23 Apr 2009 16:01:11 +0100
Subject: [rsyslog] Replace one character with another
In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702AF6C@GRFEXC.intern.adiscon.com>
References: <49F06C25.5080808@secerno.com> <9B6E2A8877C38245BFB15CC491A11DA702AF6C@GRFEXC.intern.adiscon.com>
Message-ID: <49F082B7.3020807@secerno.com>

Rainer Gerhards wrote:
> No, that is not possible at the moment. Once we have the full scripting
> engine, this can be done, but there is no date scheduled yet for when this
> will be...

OK understood. (and thanks for the quick response BTW!)

On a slightly different tack, I see there are lots of good options for escaping control characters but I'd like to escape arbitrary characters.
For example the "=" character is usually perfectly normal but the application I am sending to requires that it be represented as "\=".

Is there any way to do this?

Cheers,

Pete.
--
Pete Philips
Secerno Ltd
Email: pete.philips at secerno.com
PGP key: http://www.secerno.com/pgp/pete.gpg

From rgerhards at hq.adiscon.com Fri Apr 24 07:34:09 2009
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Fri, 24 Apr 2009 07:34:09 +0200
Subject: [rsyslog] [PERFORM] performance for high-volume loginsertion(fwd)
References: <9B6E2A8877C38245BFB15CC491A11DA702AF6F@GRFEXC.intern.adiscon.com>
Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AF78@GRFEXC.intern.adiscon.com>

Another innocent question:

Let's say I used an exec() API exclusively. Now let me assume that I do, on the *same* database connection, this calling sequence:

exec("begin transaction")
exec("insert ...")
exec("insert ...")
exec("insert ...")
exec("insert ...")
exec("insert ...")
exec("insert ...") [Point A]
exec("commit")

Is it safe to assume that this will result in a performance benefit? (I know that it causes more network traffic than necessary, but that's not my point - I am only talking about speedup.) Will this speedup be considerable (along the magnitude of 20 vs. 3 seconds for a given sequence)?

Also, even more importantly, does this really mean they are all in one transaction? In particular, what happens if the connection breaks at [Point A], e.g. by the network connection going down for an extended period of time? Is it safe to assume that everything will then be rolled back?

Feedback is appreciated.

Rainer

From rgerhards at hq.adiscon.com Fri Apr 24 07:47:04 2009
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Fri, 24 Apr 2009 07:47:04 +0200
Subject: [rsyslog] multi-dequeue git branch
References: <9B6E2A8877C38245BFB15CC491A11DA702AF59@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702AF5A@GRFEXC.intern.adiscon.com>
Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AF79@GRFEXC.intern.adiscon.com>

An update:

a) and b) are more or less completed. I have now shifted my approach a bit. I think I need to concentrate on the output plugin interface *before* I look at queue termination. The reason is that there is a very intimate relationship between the two. The rsyslog interface API dictates part of the termination process, but proper termination also dictates some of the interface details. So to design and implement that, I'll be going down the full stack and then up again. That means

a) add basic batching support to queue object
b) create new batching output module interface
c) add transactional behavior to queue shutdown and error conditions

I have now arrived at step b). Today, I will need to think hard about the overall picture, but also about the plugin interface (plus do a couple of not-so-fun things ;)). So if you have any suggestions/opinions on the interface, please let me know.

I will also try to extend the testbench, so that we can check the more complex conditions (this task is far more complex than it sounds, a lot of timing conditions are involved).
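
How steps a) to c) above fit together can be sketched roughly as follows, in Python for illustration only. This is not the rsyslog queue code: `process_queue`, `deliver_batch` and `flaky_output` are hypothetical names, and the batch size of 8 is simply the value mentioned earlier in this thread. The point shown is that a batch is removed from the queue only after the output reports success, so a failed batch stays queued.

```python
from collections import deque
from typing import Callable

BATCH_SIZE = 8  # value mentioned in the thread; assumed fixed here


def process_queue(queue: deque, deliver_batch: Callable[[list], bool]) -> int:
    """Deliver queued messages in batches and return how many were delivered.

    A batch is removed from the queue only after deliver_batch reports
    success, so messages of a failed batch are not lost.
    """
    delivered = 0
    while queue:
        # Peek at the next batch without removing it from the queue.
        batch = [queue[i] for i in range(min(BATCH_SIZE, len(queue)))]
        if not deliver_batch(batch):
            break  # keep the failed batch queued for a later retry
        for _ in batch:
            queue.popleft()
        delivered += len(batch)
    return delivered


received = []


def flaky_output(batch: list) -> bool:
    """Hypothetical output that starts failing after 16 messages."""
    if len(received) >= 16:
        return False
    received.extend(batch)
    return True


pending = deque(f"msg {i}" for i in range(20))
count = process_queue(pending, flaky_output)
```

In this run the first two batches succeed and the third is refused, so the last four messages remain in `pending` instead of being dropped.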
Please also note that the multi-dequeue git branch should now work rather well (except, of course, when terminating the queue). The tcp input's performance has also improved. It would be very useful if someone could look at the new engine's speedup. It should be considerable. I expect that most of the potential speedup for non-database outputs is already gained in this version. Further speedup for e.g. the file writer or tcp forwarding output should be really marginal compared to what we gained with this step. So it is vital to confirm whether it is good enough and within our expectations.

I have not yet done performance tests, because they take a lot of time and I prefer not to be "disturbed" by such longer activity while I still work on the overall picture (even though I implement in parallel, think of this as proof of concept tasks - from my POV, I am looking at the code at a very high level and dig down only occasionally - longer testing tasks interrupt that process).

Thanks,
Rainer

From david at lang.hm Fri Apr 24 07:57:13 2009
From: david at lang.hm (david at lang.hm)
Date: Thu, 23 Apr 2009 22:57:13 -0700 (PDT)
Subject: [rsyslog] [PERFORM] performance for high-volume loginsertion(fwd)
In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702AF78@GRFEXC.intern.adiscon.com>
References: <9B6E2A8877C38245BFB15CC491A11DA702AF6F@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702AF78@GRFEXC.intern.adiscon.com>
Message-ID:

On Fri, 24 Apr 2009, Rainer Gerhards wrote:

> Another innocent question:
>
> Let's say I used an exec() API exclusively. Now let me assume that I do, on
> the *same* database connection, this calling sequence:
>
> exec("begin transaction")
> exec("insert ...")
> exec("insert ...")
> exec("insert ...")
> exec("insert ...")
> exec("insert ...")
> exec("insert ...") [Point A]
> exec("commit")
>
> Is it safe to assume that this will result in a performance benefit (I know
> that it causes more network traffic than necessary, but that's not my point -
> I just talk of speedup). Will this performance speedup be considerable (along
> the magnitude of 20 vs. 3 seconds for a given sequence?).
Yes, this speedup would be considerable from the message at the bottom it would be on the order of >>> separate inserts, no transaction: 21.21s >>> separate inserts, same transaction: 1.89s there is still another order of magnatude gain to be had by going to the copy (and eliminating the extra round trips) >>> COPY (text): 0.10s a copy looks something like copy to table X from STDIN data data data > Also, even more importantly, does this really many they are all in one > transaction? yes. > In particular, what happens if the connection breaks at [Point > A], e.g. by the network connection going down for an extended period of time. > Is it safe to assume that then everything will be rolled back? yes, every one of them would dissappear. David Lang > Feedback is appreciated. > > Rainer > >> -----Original Message----- >> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- >> bounces at lists.adiscon.com] On Behalf Of Rainer Gerhards >> Sent: Thursday, April 23, 2009 4:38 PM >> To: rsyslog-users >> Subject: Re: [rsyslog] [PERFORM] performance for high-volume >> loginsertion(fwd) >> >> That's interesting. As a side-activity, I am thinking about a new >> output >> module interface. Especially given the discussion on the postgres list, >> but >> also some other thoughts about other modules (e.g. omtcp or the file >> output), >> I tend to use an approach that permits both string-based as well as >> API-based >> (API as in libpq) ways of doing things. I have not really designed >> anything, >> but the rough idea is that each plugin needs three entry points: >> >> - start batch >> - process single message >> - end batch >> >> Then, the plugin can decide itself what it wants to do and when. Most >> importantly, this calling interface works well for string-based >> transactions >> as well as API-based ones. 
>> >> For the output file writer, for example, I envision that over time it >> will >> have its own write buffer (for various reasons, for example I am also >> discussing zipped writing with some folks). With this interface, I can >> put >> everything into the buffer, write out if needed but not if there is no >> immediate need but I can make sure that I write out when the "end >> batch" >> entry point is called. >> >> As I said, it is not really thought out yet, but maybe a starting >> point. So >> feedback is appreciated. >> >> Rainer >> >>> -----Original Message----- >>> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- >>> bounces at lists.adiscon.com] On Behalf Of david at lang.hm >>> Sent: Wednesday, April 22, 2009 10:11 PM >>> To: rsyslog-users >>> Subject: Re: [rsyslog] [PERFORM] performance for high-volume log >>> insertion(fwd) >>> >>> from the postgres performance mailing list, relative speeds of >>> different >>> ways of inserting data. >>> >>> I've asked if the 'seperate inserts' mode is seperate round trips or >>> many >>> inserts in one round trip. >>> >>> based on this it looks like prepared statements make a difference, >> but >>> not >>> so much that other techniques (either a single statement or a copy) >>> aren't >>> comparable (or better) options. >>> >>> David Lang >>> >>> ---------- Forwarded message ---------- >>> Date: Wed, 22 Apr 2009 15:33:21 -0400 >>> From: Glenn Maynard >>> To: pgsql-performance at postgresql.org >>> Subject: Re: [PERFORM] performance for high-volume log insertion >>> >>> On Wed, Apr 22, 2009 at 8:19 AM, Stephen Frost >>> wrote: >>>> Yes, as I beleive was mentioned already, planning time for inserts >> is >>>> really small. ?Parsing time for inserts when there's little parsing >>> that >>>> has to happen also isn't all *that* expensive and the same goes for >>>> conversions from textual representations of data to binary. >>>> >>>> We're starting to re-hash things, in my view. 
?The low-hanging >> fruit >>> is >>>> doing multiple things in a single transaction, either by using >> COPY, >>>> multi-value INSERTs, or just multiple INSERTs in a single >>> transaction. >>>> That's absolutely step one. >>> >>> This is all well-known, covered information, but perhaps some numbers >>> will help drive this home. 40000 inserts into a single-column, >>> unindexed table; with predictable results: >>> >>> separate inserts, no transaction: 21.21s >>> separate inserts, same transaction: 1.89s >>> 40 inserts, 100 rows/insert: 0.18s >>> one 40000-value insert: 0.16s >>> 40 prepared inserts, 100 rows/insert: 0.15s >>> COPY (text): 0.10s >>> COPY (binary): 0.10s >>> >>> Of course, real workloads will change the weights, but this is more >> or >>> less the magnitude of difference I always see--batch your inserts >> into >>> single statements, and if that's not enough, skip to COPY. >>> >>> -- >>> Glenn Maynard >>> >>> -- >>> Sent via pgsql-performance mailing list (pgsql- >>> performance at postgresql.org) >>> To make changes to your subscription: >>> http://www.postgresql.org/mailpref/pgsql-performance >> _______________________________________________ >> rsyslog mailing list >> http://lists.adiscon.net/mailman/listinfo/rsyslog >> http://www.rsyslog.com > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com From rgerhards at hq.adiscon.com Fri Apr 24 08:23:46 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Fri, 24 Apr 2009 08:23:46 +0200 Subject: [rsyslog] [PERFORM] performance for high-volume loginsertion(fwd) References: <9B6E2A8877C38245BFB15CC491A11DA702AF6F@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AF78@GRFEXC.intern.adiscon.com> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AF7A@GRFEXC.intern.adiscon.com> > -----Original Message----- > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > bounces at 
lists.adiscon.com] On Behalf Of david at lang.hm > Sent: Friday, April 24, 2009 7:57 AM > To: rsyslog-users > Subject: Re: [rsyslog] [PERFORM] performance for high-volume > loginsertion(fwd) > > On Fri, 24 Apr 2009, Rainer Gerhards wrote: > > > Another innocent question: > > > > Let's say I used an exec() API exclusively. Now let me assume that I > do, on > > the *same* database connection, this calling sequence: > > > > exec("begin transaction") > > exec("insert ...") > > exec("insert ...") > > exec("insert ...") > > exec("insert ...") > > exec("insert ...") > > exec("insert ...") [Point A] > > exec("commit") > > > > Is it safe to assume that this will result in a performance benefit > (I know > > that it causes more network traffic than necessary, but that's not my > point - > > I just talk of speedup). Will this performance speedup be > considerable (along > > the magnitude of 20 vs. 3 seconds for a given sequence?). > > Yes, this speedup would be considerable > > from the message at the bottom it would be on the order of > > >>> separate inserts, no transaction: 21.21s > >>> separate inserts, same transaction: 1.89s I read this, just wanted some reconfirmation. > > there is still another order of magnatude gain to be had by going to > the > copy (and eliminating the extra round trips) > > >>> COPY (text): 0.10s Definitely, but let's tackle the 90% issue first. > > a copy looks something like > > copy to table X from STDIN > data > data > data > > > > Also, even more importantly, does this really many they are all in > one > > transaction? > > yes. > > > In particular, what happens if the connection breaks at [Point > > A], e.g. by the network connection going down for an extended period > of time. > > Is it safe to assume that then everything will be rolled back? > > yes, every one of them would dissappear. > So it looks my three-call (beginBatch, pushData, EndBatch) calling interface can probably work. 
I need to work on how non-transactional outputs can convey what they have commited, but the basic interface looks rather good. Rainer > David Lang > > > Feedback is appreciated. > > > > Rainer > > > >> -----Original Message----- > >> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > >> bounces at lists.adiscon.com] On Behalf Of Rainer Gerhards > >> Sent: Thursday, April 23, 2009 4:38 PM > >> To: rsyslog-users > >> Subject: Re: [rsyslog] [PERFORM] performance for high-volume > >> loginsertion(fwd) > >> > >> That's interesting. As a side-activity, I am thinking about a new > >> output > >> module interface. Especially given the discussion on the postgres > list, > >> but > >> also some other thoughts about other modules (e.g. omtcp or the file > >> output), > >> I tend to use an approach that permits both string-based as well as > >> API-based > >> (API as in libpq) ways of doing things. I have not really designed > >> anything, > >> but the rough idea is that each plugin needs three entry points: > >> > >> - start batch > >> - process single message > >> - end batch > >> > >> Then, the plugin can decide itself what it wants to do and when. > Most > >> importantly, this calling interface works well for string-based > >> transactions > >> as well as API-based ones. > >> > >> For the output file writer, for example, I envision that over time > it > >> will > >> have its own write buffer (for various reasons, for example I am > also > >> discussing zipped writing with some folks). With this interface, I > can > >> put > >> everything into the buffer, write out if needed but not if there is > no > >> immediate need but I can make sure that I write out when the "end > >> batch" > >> entry point is called. > >> > >> As I said, it is not really thought out yet, but maybe a starting > >> point. So > >> feedback is appreciated. 
> >> > >> Rainer > >> > >>> -----Original Message----- > >>> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > >>> bounces at lists.adiscon.com] On Behalf Of david at lang.hm > >>> Sent: Wednesday, April 22, 2009 10:11 PM > >>> To: rsyslog-users > >>> Subject: Re: [rsyslog] [PERFORM] performance for high-volume log > >>> insertion(fwd) > >>> > >>> from the postgres performance mailing list, relative speeds of > >>> different > >>> ways of inserting data. > >>> > >>> I've asked if the 'seperate inserts' mode is seperate round trips > or > >>> many > >>> inserts in one round trip. > >>> > >>> based on this it looks like prepared statements make a difference, > >> but > >>> not > >>> so much that other techniques (either a single statement or a copy) > >>> aren't > >>> comparable (or better) options. > >>> > >>> David Lang > >>> > >>> ---------- Forwarded message ---------- > >>> Date: Wed, 22 Apr 2009 15:33:21 -0400 > >>> From: Glenn Maynard > >>> To: pgsql-performance at postgresql.org > >>> Subject: Re: [PERFORM] performance for high-volume log insertion > >>> > >>> On Wed, Apr 22, 2009 at 8:19 AM, Stephen Frost > >>> wrote: > >>>> Yes, as I beleive was mentioned already, planning time for inserts > >> is > >>>> really small. ?Parsing time for inserts when there's little > parsing > >>> that > >>>> has to happen also isn't all *that* expensive and the same goes > for > >>>> conversions from textual representations of data to binary. > >>>> > >>>> We're starting to re-hash things, in my view. ?The low-hanging > >> fruit > >>> is > >>>> doing multiple things in a single transaction, either by using > >> COPY, > >>>> multi-value INSERTs, or just multiple INSERTs in a single > >>> transaction. > >>>> That's absolutely step one. > >>> > >>> This is all well-known, covered information, but perhaps some > numbers > >>> will help drive this home. 
40000 inserts into a single-column, > >>> unindexed table; with predictable results: > >>> > >>> separate inserts, no transaction: 21.21s > >>> separate inserts, same transaction: 1.89s > >>> 40 inserts, 100 rows/insert: 0.18s > >>> one 40000-value insert: 0.16s > >>> 40 prepared inserts, 100 rows/insert: 0.15s > >>> COPY (text): 0.10s > >>> COPY (binary): 0.10s > >>> > >>> Of course, real workloads will change the weights, but this is more > >> or > >>> less the magnitude of difference I always see--batch your inserts > >> into > >>> single statements, and if that's not enough, skip to COPY. > >>> > >>> -- > >>> Glenn Maynard > >>> > >>> -- > >>> Sent via pgsql-performance mailing list (pgsql- > >>> performance at postgresql.org) > >>> To make changes to your subscription: > >>> http://www.postgresql.org/mailpref/pgsql-performance > >> _______________________________________________ > >> rsyslog mailing list > >> http://lists.adiscon.net/mailman/listinfo/rsyslog > >> http://www.rsyslog.com > > _______________________________________________ > > rsyslog mailing list > > http://lists.adiscon.net/mailman/listinfo/rsyslog > > http://www.rsyslog.com From david at lang.hm Fri Apr 24 08:45:00 2009 From: david at lang.hm (david at lang.hm) Date: Thu, 23 Apr 2009 23:45:00 -0700 (PDT) Subject: [rsyslog] [PERFORM] performance for high-volume loginsertion(fwd) In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702AF7A@GRFEXC.intern.adiscon.com> References: <9B6E2A8877C38245BFB15CC491A11DA702AF6F@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AF78@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702AF7A@GRFEXC.intern.adiscon.com> Message-ID: On Fri, 24 Apr 2009, Rainer Gerhards wrote: >> -----Original Message----- >> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- >> bounces at lists.adiscon.com] On Behalf Of david at lang.hm >> Sent: Friday, April 24, 2009 7:57 AM >> To: rsyslog-users >> Subject: Re: [rsyslog] [PERFORM] performance for 
high-volume
>> loginsertion(fwd)
>>
>> On Fri, 24 Apr 2009, Rainer Gerhards wrote:
>>
>>> Another innocent question:
>>>
>>> Let's say I used an exec() API exclusively. Now let me assume that I do,
>>> on the *same* database connection, this calling sequence:
>>>
>>> exec("begin transaction")
>>> exec("insert ...")
>>> exec("insert ...")
>>> exec("insert ...")
>>> exec("insert ...")
>>> exec("insert ...")
>>> exec("insert ...") [Point A]
>>> exec("commit")
>>>
>>> Is it safe to assume that this will result in a performance benefit (I
>>> know that it causes more network traffic than necessary, but that's not
>>> my point - I am only talking about speedup)? Will this speedup be
>>> considerable (on the order of 20 vs. 3 seconds for a given sequence)?
>>
>> Yes, this speedup would be considerable.
>>
>> From the message at the bottom, it would be on the order of
>>
>>>>> separate inserts, no transaction: 21.21s
>>>>> separate inserts, same transaction: 1.89s
>
> I read this, just wanted some reconfirmation.

Consider it confirmed. I've run my own tests in the past and you get a HUGE
benefit from this one step. Depending on the particular database, you may
continue to see benefits well beyond 100 inserts in a batch (I would start my
testing at 100 and plan on going up to 1000).

>> There is still another order of magnitude to be gained by going to COPY
>> (and eliminating the extra round trips):
>>
>>>>> COPY (text): 0.10s
>
> Definitely, but let's tackle the 90% issue first.
>
>> A copy looks something like
>>
>> copy to table X from STDIN
>> data
>> data
>> data
>>
>>> Also, even more importantly, does this really mean they are all in one
>>> transaction?
>>
>> Yes.
>>
>>> In particular, what happens if the connection breaks at [Point A], e.g.
>>> by the network connection going down for an extended period of time?
>>> Is it safe to assume that then everything will be rolled back?
>>
>> Yes, every one of them would disappear.
>>
>
> So it looks like my three-call (beginBatch, pushData, endBatch) calling
> interface can probably work. I need to work on how non-transactional
> outputs can convey what they have committed, but the basic interface looks
> rather good.

Yes, although there is a benefit in not making these separate exec statements
but instead sending them to the database as you go along (I don't know the
library well enough to know how to do a non-blocking call like this), or in
crafting one long string and sending it all at once. Even if the pieces are
generated by separate write calls on the network filehandle, with a TCP
datastream (and a fast sender) the number of round trips may be far fewer
than you think (fewer than what separate exec statements would create).

My earlier 4-part proposal (start, mid, stop, data) is _slightly_ more
flexible in that it has the mid/join variable, allowing for something to
appear between points of data, but not at the end, i.e.

insert into table X values (),();

Your 3-part version would end up with an extra "," at the end.

While this isn't critical, it is an easy way to gain about another factor of
10.

David Lang

> Rainer
>
>> David Lang
>>
>>> Feedback is appreciated.
>>>
>>> Rainer
>>>
>>>> -----Original Message-----
>>>> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
>>>> bounces at lists.adiscon.com] On Behalf Of Rainer Gerhards
>>>> Sent: Thursday, April 23, 2009 4:38 PM
>>>> To: rsyslog-users
>>>> Subject: Re: [rsyslog] [PERFORM] performance for high-volume
>>>> loginsertion(fwd)
>>>>
>>>> That's interesting. As a side-activity, I am thinking about a new
>>>> output module interface. Especially given the discussion on the
>>>> postgres list, but also some other thoughts about other modules (e.g.
>>>> omtcp or the file output), I tend to use an approach that permits both
>>>> string-based as well as API-based (API as in libpq) ways of doing things.
I have not really designed >>>> anything, >>>> but the rough idea is that each plugin needs three entry points: >>>> >>>> - start batch >>>> - process single message >>>> - end batch >>>> >>>> Then, the plugin can decide itself what it wants to do and when. >> Most >>>> importantly, this calling interface works well for string-based >>>> transactions >>>> as well as API-based ones. >>>> >>>> For the output file writer, for example, I envision that over time >> it >>>> will >>>> have its own write buffer (for various reasons, for example I am >> also >>>> discussing zipped writing with some folks). With this interface, I >> can >>>> put >>>> everything into the buffer, write out if needed but not if there is >> no >>>> immediate need but I can make sure that I write out when the "end >>>> batch" >>>> entry point is called. >>>> >>>> As I said, it is not really thought out yet, but maybe a starting >>>> point. So >>>> feedback is appreciated. >>>> >>>> Rainer >>>> >>>>> -----Original Message----- >>>>> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- >>>>> bounces at lists.adiscon.com] On Behalf Of david at lang.hm >>>>> Sent: Wednesday, April 22, 2009 10:11 PM >>>>> To: rsyslog-users >>>>> Subject: Re: [rsyslog] [PERFORM] performance for high-volume log >>>>> insertion(fwd) >>>>> >>>>> from the postgres performance mailing list, relative speeds of >>>>> different >>>>> ways of inserting data. >>>>> >>>>> I've asked if the 'seperate inserts' mode is seperate round trips >> or >>>>> many >>>>> inserts in one round trip. >>>>> >>>>> based on this it looks like prepared statements make a difference, >>>> but >>>>> not >>>>> so much that other techniques (either a single statement or a copy) >>>>> aren't >>>>> comparable (or better) options. 
>>>>> >>>>> David Lang >>>>> >>>>> ---------- Forwarded message ---------- >>>>> Date: Wed, 22 Apr 2009 15:33:21 -0400 >>>>> From: Glenn Maynard >>>>> To: pgsql-performance at postgresql.org >>>>> Subject: Re: [PERFORM] performance for high-volume log insertion >>>>> >>>>> On Wed, Apr 22, 2009 at 8:19 AM, Stephen Frost >>>>> wrote: >>>>>> Yes, as I beleive was mentioned already, planning time for inserts >>>> is >>>>>> really small. ?Parsing time for inserts when there's little >> parsing >>>>> that >>>>>> has to happen also isn't all *that* expensive and the same goes >> for >>>>>> conversions from textual representations of data to binary. >>>>>> >>>>>> We're starting to re-hash things, in my view. ?The low-hanging >>>> fruit >>>>> is >>>>>> doing multiple things in a single transaction, either by using >>>> COPY, >>>>>> multi-value INSERTs, or just multiple INSERTs in a single >>>>> transaction. >>>>>> That's absolutely step one. >>>>> >>>>> This is all well-known, covered information, but perhaps some >> numbers >>>>> will help drive this home. 40000 inserts into a single-column, >>>>> unindexed table; with predictable results: >>>>> >>>>> separate inserts, no transaction: 21.21s >>>>> separate inserts, same transaction: 1.89s >>>>> 40 inserts, 100 rows/insert: 0.18s >>>>> one 40000-value insert: 0.16s >>>>> 40 prepared inserts, 100 rows/insert: 0.15s >>>>> COPY (text): 0.10s >>>>> COPY (binary): 0.10s >>>>> >>>>> Of course, real workloads will change the weights, but this is more >>>> or >>>>> less the magnitude of difference I always see--batch your inserts >>>> into >>>>> single statements, and if that's not enough, skip to COPY. 
>>>>> >>>>> -- >>>>> Glenn Maynard >>>>> >>>>> -- >>>>> Sent via pgsql-performance mailing list (pgsql- >>>>> performance at postgresql.org) >>>>> To make changes to your subscription: >>>>> http://www.postgresql.org/mailpref/pgsql-performance >>>> _______________________________________________ >>>> rsyslog mailing list >>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>> http://www.rsyslog.com >>> _______________________________________________ >>> rsyslog mailing list >>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>> http://www.rsyslog.com > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com From rgerhards at hq.adiscon.com Fri Apr 24 08:54:30 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Fri, 24 Apr 2009 08:54:30 +0200 Subject: [rsyslog] [PERFORM] performance for high-volume loginsertion(fwd) References: <9B6E2A8877C38245BFB15CC491A11DA702AF6F@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AF78@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AF7A@GRFEXC.intern.adiscon.com> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AF7B@GRFEXC.intern.adiscon.com> > -----Original Message----- > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > bounces at lists.adiscon.com] On Behalf Of david at lang.hm > Sent: Friday, April 24, 2009 8:45 AM > To: rsyslog-users > Subject: Re: [rsyslog] [PERFORM] performance for high-volume > loginsertion(fwd) > > So it looks my three-call (beginBatch, pushData, EndBatch) calling > interface > > can probably work. I need to work on how non-transactional outputs > can convey > > what they have commited, but the basic interface looks rather good. > > yes, although there is benifit in making these not be seperate exec > statements but instead sending them to the database as you go along (I Definitely, but I'd consider this an implementation detail. 
If it is worth it, every plugin in question may implement this mode. I'd also
say it is not too much work, depending on what "too much" means to you ;)

> don't know the library well enough to know how to do a non-blocking call
> like this) or crafting one long string and sending it all at once. even if
> the pieces are generated by separate write calls on the network
> filehandle, with a TCP datastream (and a fast sender), the number of
> round-trips may be far fewer than you think (what you create as separate
> exec statements
>
> my earlier 4-part proposal (start, mid, stop, data) is _slightly_ more
> flexible in that it has the mid/join variable, allowing for something to
> appear between points of data, but not at the end.
>
> i.e.
>
> insert into table X values (),();
>
> your 3-part version would end up with an extra , at the end.
>
> while this isn't critical it is an easy way to gain about another factor
> of 10

I'd draw a subtle line here. I think what you propose is valid and right, but
it is not something that belongs in the output plugin interface.

Let's use my triplet (beginBatch, pushData, endBatch) for a while. On top of
that calling interface, the plugin can add strings in its configuration (NOT
an interface issue!). So it could use the calling interface as follows:

beginBatch:
    emit start

pushData:
    if not first element in batch
        emit mid
    emit data

endBatch:
    emit stop

The question now is whether there should be support in the core engine for the

    if not first element in batch
        emit mid

functionality. I am not sure whether any plugins other than databases could
use it. So far, I doubt it (the file writer not, forwarding not, snmp not,
email? Not sure, but I don't think so). If it is just a db thing, it does not
belong in the core.

Rainer

>
> David Lang
>
> > Rainer
> >
> >> David Lang
> >>
> >>> Feedback is appreciated.
> >>> > >>> Rainer > >>> > >>>> -----Original Message----- > >>>> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > >>>> bounces at lists.adiscon.com] On Behalf Of Rainer Gerhards > >>>> Sent: Thursday, April 23, 2009 4:38 PM > >>>> To: rsyslog-users > >>>> Subject: Re: [rsyslog] [PERFORM] performance for high-volume > >>>> loginsertion(fwd) > >>>> > >>>> That's interesting. As a side-activity, I am thinking about a new > >>>> output > >>>> module interface. Especially given the discussion on the postgres > >> list, > >>>> but > >>>> also some other thoughts about other modules (e.g. omtcp or the > file > >>>> output), > >>>> I tend to use an approach that permits both string-based as well > as > >>>> API-based > >>>> (API as in libpq) ways of doing things. I have not really designed > >>>> anything, > >>>> but the rough idea is that each plugin needs three entry points: > >>>> > >>>> - start batch > >>>> - process single message > >>>> - end batch > >>>> > >>>> Then, the plugin can decide itself what it wants to do and when. > >> Most > >>>> importantly, this calling interface works well for string-based > >>>> transactions > >>>> as well as API-based ones. > >>>> > >>>> For the output file writer, for example, I envision that over time > >> it > >>>> will > >>>> have its own write buffer (for various reasons, for example I am > >> also > >>>> discussing zipped writing with some folks). With this interface, I > >> can > >>>> put > >>>> everything into the buffer, write out if needed but not if there > is > >> no > >>>> immediate need but I can make sure that I write out when the "end > >>>> batch" > >>>> entry point is called. > >>>> > >>>> As I said, it is not really thought out yet, but maybe a starting > >>>> point. So > >>>> feedback is appreciated. 
> >>>> > >>>> Rainer > >>>> > >>>>> -----Original Message----- > >>>>> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > >>>>> bounces at lists.adiscon.com] On Behalf Of david at lang.hm > >>>>> Sent: Wednesday, April 22, 2009 10:11 PM > >>>>> To: rsyslog-users > >>>>> Subject: Re: [rsyslog] [PERFORM] performance for high-volume log > >>>>> insertion(fwd) > >>>>> > >>>>> from the postgres performance mailing list, relative speeds of > >>>>> different > >>>>> ways of inserting data. > >>>>> > >>>>> I've asked if the 'seperate inserts' mode is seperate round trips > >> or > >>>>> many > >>>>> inserts in one round trip. > >>>>> > >>>>> based on this it looks like prepared statements make a > difference, > >>>> but > >>>>> not > >>>>> so much that other techniques (either a single statement or a > copy) > >>>>> aren't > >>>>> comparable (or better) options. > >>>>> > >>>>> David Lang > >>>>> > >>>>> ---------- Forwarded message ---------- > >>>>> Date: Wed, 22 Apr 2009 15:33:21 -0400 > >>>>> From: Glenn Maynard > >>>>> To: pgsql-performance at postgresql.org > >>>>> Subject: Re: [PERFORM] performance for high-volume log insertion > >>>>> > >>>>> On Wed, Apr 22, 2009 at 8:19 AM, Stephen Frost > > >>>>> wrote: > >>>>>> Yes, as I beleive was mentioned already, planning time for > inserts > >>>> is > >>>>>> really small. ?Parsing time for inserts when there's little > >> parsing > >>>>> that > >>>>>> has to happen also isn't all *that* expensive and the same goes > >> for > >>>>>> conversions from textual representations of data to binary. > >>>>>> > >>>>>> We're starting to re-hash things, in my view. ?The low-hanging > >>>> fruit > >>>>> is > >>>>>> doing multiple things in a single transaction, either by using > >>>> COPY, > >>>>>> multi-value INSERTs, or just multiple INSERTs in a single > >>>>> transaction. > >>>>>> That's absolutely step one. 
> >>>>> > >>>>> This is all well-known, covered information, but perhaps some > >> numbers > >>>>> will help drive this home. 40000 inserts into a single-column, > >>>>> unindexed table; with predictable results: > >>>>> > >>>>> separate inserts, no transaction: 21.21s > >>>>> separate inserts, same transaction: 1.89s > >>>>> 40 inserts, 100 rows/insert: 0.18s > >>>>> one 40000-value insert: 0.16s > >>>>> 40 prepared inserts, 100 rows/insert: 0.15s > >>>>> COPY (text): 0.10s > >>>>> COPY (binary): 0.10s > >>>>> > >>>>> Of course, real workloads will change the weights, but this is > more > >>>> or > >>>>> less the magnitude of difference I always see--batch your inserts > >>>> into > >>>>> single statements, and if that's not enough, skip to COPY. > >>>>> > >>>>> -- > >>>>> Glenn Maynard > >>>>> > >>>>> -- > >>>>> Sent via pgsql-performance mailing list (pgsql- > >>>>> performance at postgresql.org) > >>>>> To make changes to your subscription: > >>>>> http://www.postgresql.org/mailpref/pgsql-performance > >>>> _______________________________________________ > >>>> rsyslog mailing list > >>>> http://lists.adiscon.net/mailman/listinfo/rsyslog > >>>> http://www.rsyslog.com > >>> _______________________________________________ > >>> rsyslog mailing list > >>> http://lists.adiscon.net/mailman/listinfo/rsyslog > >>> http://www.rsyslog.com > > _______________________________________________ > > rsyslog mailing list > > http://lists.adiscon.net/mailman/listinfo/rsyslog > > http://www.rsyslog.com From david at lang.hm Fri Apr 24 09:00:36 2009 From: david at lang.hm (david at lang.hm) Date: Fri, 24 Apr 2009 00:00:36 -0700 (PDT) Subject: [rsyslog] [PERFORM] performance for high-volume loginsertion(fwd) In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702AF7B@GRFEXC.intern.adiscon.com> References: 
<9B6E2A8877C38245BFB15CC491A11DA702AF6F@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AF78@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AF7A@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702AF7B@GRFEXC.intern.adiscon.com> Message-ID: On Fri, 24 Apr 2009, Rainer Gerhards wrote: >> bounces at lists.adiscon.com] On Behalf Of david at lang.hm > >> loginsertion(fwd) >>> So it looks my three-call (beginBatch, pushData, EndBatch) calling >> interface >>> can probably work. I need to work on how non-transactional outputs >> can convey >>> what they have commited, but the basic interface looks rather good. >> >> yes, although there is benifit in making these not be seperate exec >> statements but instead sending them to the database as you go along (I > > Definitely, but I'd consider this an implementation detail. If it is worth > it, every plugin in question may implement this mode. I'd also say it is not > too much work, depending on what "too much" means to you ;) > >> don't know the library well enough to know how to do a non-blocking >> call >> like this) or crafting one long string and sending it all at once. even >> if >> the pieces are generated by seperate write calls on the network >> filehandle, with a TCP datastream (and a fast sender), the number of >> round-trips may be far fewer than you think (what you create as >> seperate >> exec statements >> >> my earlier 4-part proposal (start, mid, stop, data) is _slightly_ more >> flexible in that it has the mid/joiv variable, allowing for something >> to >> appear between points of data, but not at the end. >> >> i.e. >> >> insert into table X values (),(); >> >> your 3-part version would end up with an extra , at the end. >> >> while this isn't critical it is an easy way to gain about another >> factor >> of 10 > > I'd draw a subtle line here. I think what you propose is valid and right, but > it is not something that belongs into the output plugin interface. 
>
> Let's use my triplet (beginBatch, pushData, endBatch) for a while. On top of
> that calling interface, the plugin can add strings in its configuration (NOT
> an interface issue!). So it could use the calling interface as follows:
>
> beginBatch:
>     emit start
>
> pushData:
>     if not first element in batch
>         emit mid
>     emit data
>
> endBatch:
>     emit stop
>
> The question now is if there should be support in the core engine for the
>
>     If not first element in batch
>         Add mid
>
> functionality. I am not sure if there are other plugins but databases that
> could use it. So far, I doubt this (the file writer not, forwarding not, snmp
> not, email? Not sure, but don't think so). If it is just a db thing, it does
> not belong into the core.

That logic can work for every module that needs it, so this shouldn't be an
issue. I was mixing up the API with the need for a config variable.

David Lang

> Rainer
>
>>
>> David Lang
>>
>>> Rainer
>>>
>>>> David Lang
>>>>
>>>>> Feedback is appreciated.
>>>>>
>>>>> Rainer
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
>>>>>> bounces at lists.adiscon.com] On Behalf Of Rainer Gerhards
>>>>>> Sent: Thursday, April 23, 2009 4:38 PM
>>>>>> To: rsyslog-users
>>>>>> Subject: Re: [rsyslog] [PERFORM] performance for high-volume
>>>>>> loginsertion(fwd)
>>>>>>
>>>>>> That's interesting. As a side-activity, I am thinking about a new
>>>>>> output module interface. Especially given the discussion on the
>>>>>> postgres list, but also some other thoughts about other modules (e.g.
>>>>>> omtcp or the file output), I tend to use an approach that permits both
>>>>>> string-based as well as API-based (API as in libpq) ways of doing things.
I have not really designed >>>>>> anything, >>>>>> but the rough idea is that each plugin needs three entry points: >>>>>> >>>>>> - start batch >>>>>> - process single message >>>>>> - end batch >>>>>> >>>>>> Then, the plugin can decide itself what it wants to do and when. >>>> Most >>>>>> importantly, this calling interface works well for string-based >>>>>> transactions >>>>>> as well as API-based ones. >>>>>> >>>>>> For the output file writer, for example, I envision that over time >>>> it >>>>>> will >>>>>> have its own write buffer (for various reasons, for example I am >>>> also >>>>>> discussing zipped writing with some folks). With this interface, I >>>> can >>>>>> put >>>>>> everything into the buffer, write out if needed but not if there >> is >>>> no >>>>>> immediate need but I can make sure that I write out when the "end >>>>>> batch" >>>>>> entry point is called. >>>>>> >>>>>> As I said, it is not really thought out yet, but maybe a starting >>>>>> point. So >>>>>> feedback is appreciated. >>>>>> >>>>>> Rainer >>>>>> >>>>>>> -----Original Message----- >>>>>>> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- >>>>>>> bounces at lists.adiscon.com] On Behalf Of david at lang.hm >>>>>>> Sent: Wednesday, April 22, 2009 10:11 PM >>>>>>> To: rsyslog-users >>>>>>> Subject: Re: [rsyslog] [PERFORM] performance for high-volume log >>>>>>> insertion(fwd) >>>>>>> >>>>>>> from the postgres performance mailing list, relative speeds of >>>>>>> different >>>>>>> ways of inserting data. >>>>>>> >>>>>>> I've asked if the 'seperate inserts' mode is seperate round trips >>>> or >>>>>>> many >>>>>>> inserts in one round trip. >>>>>>> >>>>>>> based on this it looks like prepared statements make a >> difference, >>>>>> but >>>>>>> not >>>>>>> so much that other techniques (either a single statement or a >> copy) >>>>>>> aren't >>>>>>> comparable (or better) options. 
>>>>>>> >>>>>>> David Lang >>>>>>> >>>>>>> ---------- Forwarded message ---------- >>>>>>> Date: Wed, 22 Apr 2009 15:33:21 -0400 >>>>>>> From: Glenn Maynard >>>>>>> To: pgsql-performance at postgresql.org >>>>>>> Subject: Re: [PERFORM] performance for high-volume log insertion >>>>>>> >>>>>>> On Wed, Apr 22, 2009 at 8:19 AM, Stephen Frost >> >>>>>>> wrote: >>>>>>>> Yes, as I beleive was mentioned already, planning time for >> inserts >>>>>> is >>>>>>>> really small. ?Parsing time for inserts when there's little >>>> parsing >>>>>>> that >>>>>>>> has to happen also isn't all *that* expensive and the same goes >>>> for >>>>>>>> conversions from textual representations of data to binary. >>>>>>>> >>>>>>>> We're starting to re-hash things, in my view. ?The low-hanging >>>>>> fruit >>>>>>> is >>>>>>>> doing multiple things in a single transaction, either by using >>>>>> COPY, >>>>>>>> multi-value INSERTs, or just multiple INSERTs in a single >>>>>>> transaction. >>>>>>>> That's absolutely step one. >>>>>>> >>>>>>> This is all well-known, covered information, but perhaps some >>>> numbers >>>>>>> will help drive this home. 40000 inserts into a single-column, >>>>>>> unindexed table; with predictable results: >>>>>>> >>>>>>> separate inserts, no transaction: 21.21s >>>>>>> separate inserts, same transaction: 1.89s >>>>>>> 40 inserts, 100 rows/insert: 0.18s >>>>>>> one 40000-value insert: 0.16s >>>>>>> 40 prepared inserts, 100 rows/insert: 0.15s >>>>>>> COPY (text): 0.10s >>>>>>> COPY (binary): 0.10s >>>>>>> >>>>>>> Of course, real workloads will change the weights, but this is >> more >>>>>> or >>>>>>> less the magnitude of difference I always see--batch your inserts >>>>>> into >>>>>>> single statements, and if that's not enough, skip to COPY. 
>>>>>>> >>>>>>> -- >>>>>>> Glenn Maynard >>>>>>> >>>>>>> -- >>>>>>> Sent via pgsql-performance mailing list (pgsql- >>>>>>> performance at postgresql.org) >>>>>>> To make changes to your subscription: >>>>>>> http://www.postgresql.org/mailpref/pgsql-performance >>>>>> _______________________________________________ >>>>>> rsyslog mailing list >>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>>>> http://www.rsyslog.com >>>>> _______________________________________________ >>>>> rsyslog mailing list >>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>>> http://www.rsyslog.com >>> _______________________________________________ >>> rsyslog mailing list >>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>> http://www.rsyslog.com > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com From rgerhards at hq.adiscon.com Fri Apr 24 09:01:52 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Fri, 24 Apr 2009 09:01:52 +0200 Subject: [rsyslog] [PERFORM] performance for high-volumeloginsertion(fwd) References: <9B6E2A8877C38245BFB15CC491A11DA702AF6F@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AF78@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AF7A@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702AF7B@GRFEXC.intern.adiscon.com> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AF7C@GRFEXC.intern.adiscon.com> > -----Original Message----- > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > bounces at lists.adiscon.com] On Behalf Of Rainer Gerhards > Sent: Friday, April 24, 2009 8:55 AM > To: rsyslog-users > Subject: Re: [rsyslog] [PERFORM] performance for high- > volumeloginsertion(fwd) > > > > -----Original Message----- > > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > > bounces at lists.adiscon.com] On Behalf Of david at lang.hm > > Sent: Friday, April 24, 2009 8:45 AM > > To: 
rsyslog-users > > Subject: Re: [rsyslog] [PERFORM] performance for high-volume > > > > loginsertion(fwd) > > > So it looks my three-call (beginBatch, pushData, EndBatch) calling > > interface > > > can probably work. I need to work on how non-transactional outputs > > can convey > > > what they have commited, but the basic interface looks rather good. > > > > yes, although there is benifit in making these not be seperate exec > > statements but instead sending them to the database as you go along > (I > > Definitely, but I'd consider this an implementation detail. If it is > worth > it, every plugin in question may implement this mode. I'd also say it > is not > too much work, depending on what "too much" means to you ;) > > > don't know the library well enough to know how to do a non-blocking > > call > > like this) or crafting one long string and sending it all at once. > even > > if > > the pieces are generated by seperate write calls on the network > > filehandle, with a TCP datastream (and a fast sender), the number of > > round-trips may be far fewer than you think (what you create as > > seperate > > exec statements > > > > my earlier 4-part proposal (start, mid, stop, data) is _slightly_ > more > > flexible in that it has the mid/joiv variable, allowing for something > > to > > appear between points of data, but not at the end. > > > > i.e. > > > > insert into table X values (),(); > > > > your 3-part version would end up with an extra , at the end. > > > > while this isn't critical it is an easy way to gain about another > > factor > > of 10 > > I'd draw a subtle line here. I think what you propose is valid and > right, but > it is not something that belongs into the output plugin interface. An additional clarification: you talk about string building, I talk about callbacks. Both things need to go together, but I think we are talking about separate entities. I currently think we need a triplet for the callbacks, but a quadruple for the string builder. 
I am not yet convinced that we need to put the string builder (using the quadruple) into the core. > > Let's use my triplet (beginBatch, pushData, endBatch) for a while. On > top of > that calling interface, the plugin can add strings in its configuration > (NOT > an interface issue!). So it could use the calling interface as follows: > > beginBatch: > emit start > > pushData: > if not first element in batch > emit mid > emit data > > endBatch > emit stop > > The question now is if there should be support in the core engine for > the > > If not first element in batch > Add mid > > functionality. I am not sure if there are other plugins but databases > that > could use it. So far, I doubt this (the file writer not, forwarding > not, snmp > not, email? Not sure, but don't think so). If it is just a db thing, it > does > not belong into the core. > > Rainer > > > > > David Lang > > > > > Rainer > > > > > >> David Lang > > >> > > >>> Feedback is appreciated. > > >>> > > >>> Rainer > > >>> > > >>>> -----Original Message----- > > >>>> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > > >>>> bounces at lists.adiscon.com] On Behalf Of Rainer Gerhards > > >>>> Sent: Thursday, April 23, 2009 4:38 PM > > >>>> To: rsyslog-users > > >>>> Subject: Re: [rsyslog] [PERFORM] performance for high-volume > > >>>> loginsertion(fwd) > > >>>> > > >>>> That's interesting. As a side-activity, I am thinking about a > new > > >>>> output > > >>>> module interface. Especially given the discussion on the > postgres > > >> list, > > >>>> but > > >>>> also some other thoughts about other modules (e.g. omtcp or the > > file > > >>>> output), > > >>>> I tend to use an approach that permits both string-based as well > > as > > >>>> API-based > > >>>> (API as in libpq) ways of doing things. 
I have not really > designed > > >>>> anything, > > >>>> but the rough idea is that each plugin needs three entry points: > > >>>> > > >>>> - start batch > > >>>> - process single message > > >>>> - end batch > > >>>> > > >>>> Then, the plugin can decide itself what it wants to do and when. > > >> Most > > >>>> importantly, this calling interface works well for string-based > > >>>> transactions > > >>>> as well as API-based ones. > > >>>> > > >>>> For the output file writer, for example, I envision that over > time > > >> it > > >>>> will > > >>>> have its own write buffer (for various reasons, for example I am > > >> also > > >>>> discussing zipped writing with some folks). With this interface, > I > > >> can > > >>>> put > > >>>> everything into the buffer, write out if needed but not if there > > is > > >> no > > >>>> immediate need but I can make sure that I write out when the > "end > > >>>> batch" > > >>>> entry point is called. > > >>>> > > >>>> As I said, it is not really thought out yet, but maybe a > starting > > >>>> point. So > > >>>> feedback is appreciated. > > >>>> > > >>>> Rainer > > >>>> > > >>>>> -----Original Message----- > > >>>>> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > > >>>>> bounces at lists.adiscon.com] On Behalf Of david at lang.hm > > >>>>> Sent: Wednesday, April 22, 2009 10:11 PM > > >>>>> To: rsyslog-users > > >>>>> Subject: Re: [rsyslog] [PERFORM] performance for high-volume > log > > >>>>> insertion(fwd) > > >>>>> > > >>>>> from the postgres performance mailing list, relative speeds of > > >>>>> different > > >>>>> ways of inserting data. > > >>>>> > > >>>>> I've asked if the 'seperate inserts' mode is seperate round > trips > > >> or > > >>>>> many > > >>>>> inserts in one round trip. 
> > >>>>> > > >>>>> based on this it looks like prepared statements make a > > difference, > > >>>> but > > >>>>> not > > >>>>> so much that other techniques (either a single statement or a > > copy) > > >>>>> aren't > > >>>>> comparable (or better) options. > > >>>>> > > >>>>> David Lang > > >>>>> > > >>>>> ---------- Forwarded message ---------- > > >>>>> Date: Wed, 22 Apr 2009 15:33:21 -0400 > > >>>>> From: Glenn Maynard > > >>>>> To: pgsql-performance at postgresql.org > > >>>>> Subject: Re: [PERFORM] performance for high-volume log > insertion > > >>>>> > > >>>>> On Wed, Apr 22, 2009 at 8:19 AM, Stephen Frost > > > > >>>>> wrote: > > >>>>>> Yes, as I beleive was mentioned already, planning time for > > inserts > > >>>> is > > >>>>>> really small. ?Parsing time for inserts when there's little > > >> parsing > > >>>>> that > > >>>>>> has to happen also isn't all *that* expensive and the same > goes > > >> for > > >>>>>> conversions from textual representations of data to binary. > > >>>>>> > > >>>>>> We're starting to re-hash things, in my view. ?The low-hanging > > >>>> fruit > > >>>>> is > > >>>>>> doing multiple things in a single transaction, either by using > > >>>> COPY, > > >>>>>> multi-value INSERTs, or just multiple INSERTs in a single > > >>>>> transaction. > > >>>>>> That's absolutely step one. > > >>>>> > > >>>>> This is all well-known, covered information, but perhaps some > > >> numbers > > >>>>> will help drive this home. 
> > >>>>> 40000 inserts into a single-column, unindexed table; with predictable results:
> > >>>>>
> > >>>>>   separate inserts, no transaction:       21.21s
> > >>>>>   separate inserts, same transaction:      1.89s
> > >>>>>   40 inserts, 100 rows/insert:             0.18s
> > >>>>>   one 40000-value insert:                  0.16s
> > >>>>>   40 prepared inserts, 100 rows/insert:    0.15s
> > >>>>>   COPY (text):                             0.10s
> > >>>>>   COPY (binary):                           0.10s
> > >>>>>
> > >>>>> Of course, real workloads will change the weights, but this is more or
> > >>>>> less the magnitude of difference I always see--batch your inserts into
> > >>>>> single statements, and if that's not enough, skip to COPY.
> > >>>>>
> > >>>>> --
> > >>>>> Glenn Maynard
> > >>>>>
> > >>>>> --
> > >>>>> Sent via pgsql-performance mailing list (pgsql-performance at postgresql.org)
> > >>>>> To make changes to your subscription:
> > >>>>> http://www.postgresql.org/mailpref/pgsql-performance
> > >>>> _______________________________________________
> > >>>> rsyslog mailing list
> > >>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
> > >>>> http://www.rsyslog.com

From rgerhards at hq.adiscon.com Fri Apr 24 09:03:43 2009
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Fri, 24 Apr 2009 09:03:43 +0200
Subject: [rsyslog] [PERFORM] performance for high-volume loginsertion(fwd)
References:
<9B6E2A8877C38245BFB15CC491A11DA702AF6F@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AF78@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AF7A@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AF7B@GRFEXC.intern.adiscon.com> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AF7D@GRFEXC.intern.adiscon.com> > -----Original Message----- > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > bounces at lists.adiscon.com] On Behalf Of david at lang.hm > Sent: Friday, April 24, 2009 9:01 AM > To: rsyslog-users > Subject: Re: [rsyslog] [PERFORM] performance for high-volume > loginsertion(fwd) > > that logic can work for every module that needs it, so this shouldn't > be > an issue.. I was mixing up the API with the need for a config variable. > > David Lang Lol, our messages crossed. Still the question is if the stringbuilder should support this mode. If so, we need to make deep changes to the way the property replacer works, thus I am hesitant to do this without real needs (not just because of the work to be done, but also because of the extra complexity [read: bugs, performance] it introduces). 
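For readers following the thread, the three entry points proposed earlier (start batch, process single message, end batch) can be sketched roughly as below. This is only an illustration in Python, not rsyslog's actual C plugin interface: the class names, method names, and the `max_buffered` parameter are all hypothetical. It shows the file-writer case from the proposal, where the plugin may write early if its buffer fills up but is guaranteed to write when the batch ends.

```python
from abc import ABC, abstractmethod


class OutputPlugin(ABC):
    """Hypothetical plugin contract with the three entry points from the thread."""

    @abstractmethod
    def begin_batch(self):
        """Called once before the first message of a batch."""

    @abstractmethod
    def process_message(self, msg):
        """Called once per message inside a batch."""

    @abstractmethod
    def end_batch(self):
        """Called once after the last message of a batch."""


class BufferedFileWriter(OutputPlugin):
    """Buffers messages and makes sure they are written out by end_batch()."""

    def __init__(self, sink, max_buffered=1000):
        self.sink = sink                  # any object with a write(str) method
        self.max_buffered = max_buffered  # hypothetical tuning knob
        self.buffer = []

    def begin_batch(self):
        pass  # a plain file writer has nothing to set up

    def process_message(self, msg):
        self.buffer.append(msg)
        if len(self.buffer) >= self.max_buffered:
            self._flush()  # write early only when the buffer is full

    def end_batch(self):
        self._flush()  # the batch boundary always forces a write

    def _flush(self):
        if self.buffer:
            self.sink.write("".join(line + "\n" for line in self.buffer))
            self.buffer.clear()


def deliver(plugin, messages):
    """Core-side driver: wrap one batch of messages in begin/end calls."""
    plugin.begin_batch()
    for msg in messages:
        plugin.process_message(msg)
    plugin.end_batch()
```

A database plugin could implement the same three methods by opening a transaction in `begin_batch()`, collecting rows in `process_message()`, and committing in `end_batch()`, which is why the same calling interface suits both string-based and API-based outputs.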
Rainer

From david at lang.hm Fri Apr 24 09:19:34 2009
From: david at lang.hm (david at lang.hm)
Date: Fri, 24 Apr 2009 00:19:34 -0700 (PDT)
Subject: [rsyslog] [PERFORM] performance for high-volume loginsertion(fwd)
In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702AF7D@GRFEXC.intern.adiscon.com>
References: <9B6E2A8877C38245BFB15CC491A11DA702AF6F@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AF78@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AF7A@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AF7B@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702AF7D@GRFEXC.intern.adiscon.com>
Message-ID:

On Fri, 24 Apr 2009, Rainer Gerhards wrote:

>> -----Original Message-----
>> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
>> bounces at lists.adiscon.com] On Behalf Of david at lang.hm
>
>> that logic can work for every module that needs it, so this shouldn't be
>> an issue.. I was mixing up the API with the need for a config variable.
>>
>> David Lang
>
> Lol, our messages crossed. Still the question is if the stringbuilder should
> support this mode. If so, we need to make deep changes to the way the
> property replacer works, thus I am hesitant to do this without real needs
> (not just because of the work to be done, but also because of the extra
> complexity [read: bugs, performance] it introduces).

if the string builder is not in the core, the output module can do the
work for 'mid' as/if needed.

table it for now.

food for thought (which may affect the decision when it happens), there is
benefit in doing the database equivalent of dynafiles (inserting into
different tables depending on the contents of the message) for pretty much
the same reasons it's useful to do for flat files.

the part that does the string building needs to know about
dynafiles/'dynatables' and craft the strings to be written accordingly.

since syslog does not guarantee the order of events, selecting 'like'
items from the set that have been provided to it and grouping accordingly
is well within the right of the output side of things.

David Lang

From rgerhards at hq.adiscon.com Fri Apr 24 09:23:11 2009
From: rgerhards at hq.adiscon.com (Rainer Gerhards)
Date: Fri, 24 Apr 2009 09:23:11 +0200
Subject: [rsyslog] [PERFORM] performance for high-volume loginsertion(fwd)
References: <9B6E2A8877C38245BFB15CC491A11DA702AF6F@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AF78@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AF7A@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AF7B@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AF7D@GRFEXC.intern.adiscon.com>
Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AF7E@GRFEXC.intern.adiscon.com>

> -----Original Message-----
> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
> bounces at lists.adiscon.com] On Behalf Of david at lang.hm
> Sent: Friday, April 24, 2009 9:20 AM
> To: rsyslog-users
> Subject: Re: [rsyslog] [PERFORM] performance for high-volume
> loginsertion(fwd)
>
> On Fri, 24 Apr 2009, Rainer Gerhards wrote:
>
> >> -----Original Message-----
> >> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
> >> bounces at lists.adiscon.com] On Behalf Of david at lang.hm
> >
> >> that logic can work for every module that needs it, so this shouldn't be
> >> an issue.. I was mixing up the API with the need for a config variable.
> >>
> >> David Lang
> >
> > Lol, our messages crossed. Still the question is if the stringbuilder should
> > support this mode. If so, we need to make deep changes to the way the
> > property replacer works, thus I am hesitant to do this without real needs
> > (not just because of the work to be done, but also because of the extra
> > complexity [read: bugs, performance] it introduces).
>
> if the string builder is not in the core, the output module can do the
> work for 'mid' as/if needed.
>
> table it for now.
>
> food for thought (which may affect the decision when it happens), there is
> benefit in doing the database equivalent of dynafiles (inserting into

Just to avoid misunderstanding: today, this is easy to accomplish - you just
need to use a property replacer expression inside the template string like
this "insert into syslog%hostname% ...".

That, of course, will not work any longer if we go for prepared statements
(indeed another subtlety). I'd still expect that it works with the begin...
insert* ... end exec calls.

Rainer

> different tables depending on the contents of the message) for pretty much
> the same reasons it's useful to do for flat files.
>
> the part that does the string building needs to know about
> dynafiles/'dynatables' and craft the strings to be written accordingly.
>
> since syslog does not guarantee the order of events, selecting 'like'
> items from the set that have been provided to it and grouping accordingly
> is well within the right of the output side of things.
> > David Lang > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com From david at lang.hm Fri Apr 24 09:30:15 2009 From: david at lang.hm (david at lang.hm) Date: Fri, 24 Apr 2009 00:30:15 -0700 (PDT) Subject: [rsyslog] [PERFORM] performance for high-volume loginsertion(fwd) In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702AF7E@GRFEXC.intern.adiscon.com> References: <9B6E2A8877C38245BFB15CC491A11DA702AF6F@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AF78@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AF7A@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AF7B@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AF7D@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702AF7E@GRFEXC.intern.adiscon.com> Message-ID: On Fri, 24 Apr 2009, Rainer Gerhards wrote: >> -----Original Message----- >> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- >> bounces at lists.adiscon.com] On Behalf Of david at lang.hm >> >> On Fri, 24 Apr 2009, Rainer Gerhards wrote: >> >>>> -----Original Message----- >>>> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- >>>> bounces at lists.adiscon.com] On Behalf Of david at lang.hm >>> >>>> that logic can work for every module that needs it, so this >> shouldn't >>>> be >>>> an issue.. I was mixing up the API with the need for a config >> variable. >>>> >>>> David Lang >>> >>> Lol, our messages crossed. Still the question is if the stringbuilder >> should >>> support this mode. If so, we need to make deep changes to the way the >>> property replacer works, thus I am hesitant to do this without real >> needs >>> (not just because of the work to be done, but also because of the >> extra >>> complexity [read: bugs, performance] it introduces). >> >> if the string builder is not in the core, the output module can do the >> work for 'mid' as/if needed. >> >> table it for now. 
>>
>> food for thought (which may affect the decision when it happens), there is
>> benefit in doing the database equivalent of dynafiles (inserting into
>
> Just to avoid misunderstanding: today, this is easy to accomplish - you just
> need to use a property replacer expression inside the template string like
> this "insert into syslog%hostname% ...".

correct (as long as you didn't need to create the tables)

> That, of course, will not work any longer if we go for prepared statements
> (indeed another subtlety). I'd still expect that it works with the begin...
> insert* ... end exec calls.

it will work with the begin;insert;end approach

for prepared statements you would need to prepare one for each destination
(similar to creating/opening files), and if you ended up doing copy or
multi-value inserts you would need to do separate ones for different
destinations.

but in any case, issues for another day.

David Lang

> Rainer
>
>> different tables depending on the contents of the message) for pretty much
>> the same reasons it's useful to do for flat files.
>>
>> the part that does the string building needs to know about
>> dynafiles/'dynatables' and craft the strings to be written accordingly.
>>
>> since syslog does not guarantee the order of events, selecting 'like'
>> items from the set that have been provided to it and grouping accordingly
>> is well within the right of the output side of things.
>> >> David Lang >> _______________________________________________ >> rsyslog mailing list >> http://lists.adiscon.net/mailman/listinfo/rsyslog >> http://www.rsyslog.com > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com > From rgerhards at hq.adiscon.com Fri Apr 24 09:58:15 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Fri, 24 Apr 2009 09:58:15 +0200 Subject: [rsyslog] [PERFORM] performance for high-volume loginsertion(fwd) References: <9B6E2A8877C38245BFB15CC491A11DA702AF6F@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AF78@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AF7A@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AF7B@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AF7D@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AF7E@GRFEXC.intern.adiscon.com> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AF80@GRFEXC.intern.adiscon.com> > -----Original Message----- > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > bounces at lists.adiscon.com] On Behalf Of david at lang.hm > Sent: Friday, April 24, 2009 9:30 AM > To: rsyslog-users > Subject: Re: [rsyslog] [PERFORM] performance for high-volume > loginsertion(fwd) > > On Fri, 24 Apr 2009, Rainer Gerhards wrote: > > >> -----Original Message----- > >> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > >> bounces at lists.adiscon.com] On Behalf Of david at lang.hm > >> > >> On Fri, 24 Apr 2009, Rainer Gerhards wrote: > >> > >>>> -----Original Message----- > >>>> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > >>>> bounces at lists.adiscon.com] On Behalf Of david at lang.hm > >>> > >>>> that logic can work for every module that needs it, so this > >> shouldn't > >>>> be > >>>> an issue.. I was mixing up the API with the need for a config > >> variable. 
> >>>>
> >>>> David Lang
> >>>
> >>> Lol, our messages crossed. Still the question is if the stringbuilder should
> >>> support this mode. If so, we need to make deep changes to the way the
> >>> property replacer works, thus I am hesitant to do this without real needs
> >>> (not just because of the work to be done, but also because of the extra
> >>> complexity [read: bugs, performance] it introduces).
> >>
> >> if the string builder is not in the core, the output module can do the
> >> work for 'mid' as/if needed.
> >>
> >> table it for now.
> >>
> >> food for thought (which may affect the decision when it happens), there is
> >> benefit in doing the database equivalent of dynafiles (inserting into
> >
> > Just to avoid misunderstanding: today, this is easy to accomplish - you just
> > need to use a property replacer expression inside the template string like
> > this "insert into syslog%hostname% ...".
>
> correct (as long as you didn't need to create the tables)
>
> > That, of course, will not work any longer if we go for prepared statements
> > (indeed another subtlety). I'd still expect that it works with the begin...
> > insert* ... end exec calls.
>
> it will work with the begin;insert;end approach
>
> for prepared statements you would need to prepare one for each destination
> (similar to creating/opening files), and if you ended up doing copy or
> multi-value inserts you would need to do separate ones for different
> destinations.
>
> but in any case, issues for another day.

Yep - and not only because of time zones ;)

I think it is vital to get the calling interface ready (we have made quite some
progress already :)). The layers below it are important (to help craft the
interface), but some issues, I think, do not need to be explored in full detail
at this level.

The bottom line, I think, is that this is a database-specific problem and
other plugins probably do not have any such issue.
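
As an illustration of the database-specific part, the "one statement per destination" idea quoted above can be sketched as follows. This is a rough Python example using the standard sqlite3 module as a stand-in database, not anything from rsyslog or its Oracle and PostgreSQL modules; the function names and the "syslog_" table prefix are assumptions made for the example. Messages in a batch are grouped by destination table, and each group is inserted with one parameterized statement inside a single transaction. Creating the tables on the fly is a convenience for the demo only.

```python
import re
import sqlite3
from collections import defaultdict


def table_for(hostname):
    """Map a hostname to a table name (hypothetical naming scheme).

    Table names cannot be bound as statement parameters, so the name is
    restricted to a safe character set before it is placed in the SQL text.
    """
    return "syslog_" + re.sub(r"[^A-Za-z0-9_]", "_", hostname)


def insert_batch(conn, batch):
    """Insert (hostname, message) pairs, one statement per destination table."""
    groups = defaultdict(list)
    for hostname, message in batch:
        groups[table_for(hostname)].append((message,))

    with conn:  # one transaction for the whole batch: commit or roll back
        for table, rows in groups.items():
            # Demo convenience: make sure the destination exists.
            conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" (msg TEXT)')
            # The same parameterized statement is reused for every row.
            conn.executemany(f'INSERT INTO "{table}" (msg) VALUES (?)', rows)
    return {table: len(rows) for table, rows in groups.items()}
```

Grouping changes the order in which rows from different hosts are written, which matches the point made above that syslog does not guarantee event order anyway.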
So, we could handle that in a database abstraction, but not the general calling interface (assuming that the calling interface provides the necessary plumbing). Please all let me know if I am overlooking something. Thanks, Rainer From david at lang.hm Fri Apr 24 10:06:14 2009 From: david at lang.hm (david at lang.hm) Date: Fri, 24 Apr 2009 01:06:14 -0700 (PDT) Subject: [rsyslog] [PERFORM] performance for high-volume loginsertion(fwd) In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702AF80@GRFEXC.intern.adiscon.com> References: <9B6E2A8877C38245BFB15CC491A11DA702AF6F@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AF78@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AF7A@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AF7B@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AF7D@GRFEXC.intern.adiscon.com><9B6E2A8877C38245BFB15CC491A11DA702AF7E@GRFEXC.intern.adiscon.com> <9B6E2A8877C38245BFB15CC491A11DA702AF80@GRFEXC.intern.adiscon.com> Message-ID: On Fri, 24 Apr 2009, Rainer Gerhards wrote: >> -----Original Message----- >> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- >> bounces at lists.adiscon.com] On Behalf Of david at lang.hm >> >> On Fri, 24 Apr 2009, Rainer Gerhards wrote: >> >>>> -----Original Message----- >>>> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- >>>> bounces at lists.adiscon.com] On Behalf Of david at lang.hm >>>> >>>> On Fri, 24 Apr 2009, Rainer Gerhards wrote: >>>> >>>>>> -----Original Message----- >>>>>> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- >>>>>> bounces at lists.adiscon.com] On Behalf Of david at lang.hm >>>>> >>>>>> that logic can work for every module that needs it, so this >>>> shouldn't >>>>>> be >>>>>> an issue.. I was mixing up the API with the need for a config >>>> variable. >>>>>> >>>>>> David Lang >>>>> >>>>> Lol, our messages crossed. Still the question is if the >> stringbuilder >>>> should >>>>> support this mode. 
If so, we need to make deep changes to the way >> the >>>>> property replacer works, thus I am hesitant to do this without real >>>> needs >>>>> (not just because of the work to be done, but also because of the >>>> extra >>>>> complexity [read: bugs, performance] it introduces). >>>> >>>> if the string builder is not in the core, the output module can do >> the >>>> work for 'mid' as/if needed. >>>> >>>> table it for now. >>>> >>>> food for thought (which may affect the decision when it happens), >> there >>>> is >>>> benifit in doing the database equivalent of dynafiles (inserting >> into >>> >>> Just to avoid misunderstanding: today, this is easy to acomplish - >> you just >>> need to use a property replacer expression inside the template string >> like >>> this "insert into syslog%hostname% ...". >> >> correct (as long as you didn't need to create the tables) >> >>> That, of course, will not work any longer if we go for prepared >> statements >>> (indeed another subtlety). I'd still expect that it works with the >> begin... >>> insert* ... end exec calls. >> >> it will work with the begin;insert;end approach >> >> for prepared statements you would need to prepare one for each >> destination >> (similar to creating/opening files), and if you ended up doing copy or >> multi-value inserts you would need to do seperate ones for different >> destinations. >> >> but in any case, issues for another day. > > Yep - and not only because of time zones ;) > > I think it is vital to get the calling interface ready (we had quite some > progress already :)). The layers below it are important (to help crafting he > interface), but some issues I think do not need to be explored in full detail > at this level. > > The bottom line, I think, is that this is a database-specific problem and > other plugins probably do not have any such issue. 
So, we could handle that > in a database abstraction, but not the general calling interface (assuming > that the calling interface provides the necessary plumbing). > > Please all let me know if I am overlooking something. sounds good to me. as you can tell I will speak up if I think you missed something :-) David Lang From jfs.world at gmail.com Mon Apr 27 06:47:58 2009 From: jfs.world at gmail.com (Jeffrey 'jf' Lim) Date: Mon, 27 Apr 2009 12:47:58 +0800 Subject: [rsyslog] rsyslog and holding on to dyna (daily) files Message-ID: <4b3125cc0904262147o5d5c1c77oba49e6e471755328@mail.gmail.com> I'm using rsyslog now to log to dynafiles that are marked with a date. What i'm discovering is that rsyslog is holding on to the log files longer than I would want them to. Ideally, of course, rsyslog should release the file after the date is over - but given that rsyslog has no way of knowing (unless this intelligence is built in to the dynafile/template parsing!) that this is a file that will never be written to again, is there a way to get rsyslog to "release" the files? "$HUPIsRestart" looks like the solution - but unfortunately, it only works for v4... I would prefer not to do a "hard restart", cos I am under the impression that I will lose some log entries this way... -jf -- In the meantime, here is your PSA: "It's so hard to write a graphics driver that open-sourcing it would not help." -- Andrew Fear, Software Product Manager, NVIDIA Corporation http://kerneltrap.org/node/7228 From david at lang.hm Mon Apr 27 07:00:48 2009 From: david at lang.hm (david at lang.hm) Date: Sun, 26 Apr 2009 22:00:48 -0700 (PDT) Subject: [rsyslog] rsyslog and holding on to dyna (daily) files In-Reply-To: <4b3125cc0904262147o5d5c1c77oba49e6e471755328@mail.gmail.com> References: <4b3125cc0904262147o5d5c1c77oba49e6e471755328@mail.gmail.com> Message-ID: On Mon, 27 Apr 2009, Jeffrey 'jf' Lim wrote: > I'm using rsyslog now to log to dynafiles that are marked with a date. 
> What i'm discovering is that rsyslog is holding on to the log files
> longer than I would want them to. Ideally, of course, rsyslog should
> release the file after the date is over - but given that rsyslog has
> no way of knowing (unless this intelligence is built in to the
> dynafile/template parsing!) that this is a file that will never be
> written to again, is there a way to get rsyslog to "release" the
> files?

remember that logs can arrive out of order (especially with relays that
could queue messages), so there is no way to know for sure that there
won't be 'old' logs.

> "$HUPIsRestart" looks like the solution - but unfortunately, it only
> works for v4... I would prefer not to do a "hard restart", cos I am
> under the impression that I will lose some log entries this way...

prior to V4 HUPIsRestart is always on, but it will cause you to lose any
logs that are in the queue at the time of the HUP. on these versions you
do not have any good option.

if you really believe that you will no longer get any old messages to go
into the files, you can go ahead and issue a mv, gzip, etc to deal with
the old logs, but the disk space will not be freed until rsyslog restarts.

if HUPIsRestart is off (on V4), does that release the files appropriately?

David Lang

From jfs.world at gmail.com Mon Apr 27 07:06:08 2009
From: jfs.world at gmail.com (Jeffrey 'jf' Lim)
Date: Mon, 27 Apr 2009 13:06:08 +0800
Subject: [rsyslog] rsyslog and holding on to dyna (daily) files
In-Reply-To:
References: <4b3125cc0904262147o5d5c1c77oba49e6e471755328@mail.gmail.com>
Message-ID: <4b3125cc0904262206y2cf3ab6g52dbcd80296e1b00@mail.gmail.com>

On Mon, Apr 27, 2009 at 1:00 PM, wrote:
> On Mon, 27 Apr 2009, Jeffrey 'jf' Lim wrote:
>
>> I'm using rsyslog now to log to dynafiles that are marked with a date.
>> What i'm discovering is that rsyslog is holding on to the log files
>> longer than I would want them to.
Ideally, of course, rsyslog should >> release the file after the date is over - but given that rsyslog has >> no way of knowing (unless this intelligence is built in to the >> dynafile/template parsing!) that this is a file that will never be >> written to again, is there a way to get rsyslog to "release" the >> files? > > remember that logs can arrive out of order (especially with relays that > could queue messages), so there is no way to know for sure that there > won't be 'old' logs. > :) well i can be pretty sure a few hours (or at least a few days) past the date! >> "$HUPIsRestart" looks like the solution - but unfortunately, it only >> works for v4... I would prefer not to do a "hard restart", cos I am >> under the impression that I will lose some log entries this way... > > prior to V4 HUPIsRestart is always on, but it will cause you to loose any > logs that are in the queue at the time of the HUP. on these versions you > do not have any good option. right. Thanks for the confirmation. > if you really believe that you will no longer > get any old messages to go into the files, you can go ahead and issue a > mv, gzip, etc to deal with the old logs, but the disk space will not be > freed until rsyslog restarts. > yeah, that's the thing. It's the disk space that's still hung on to that bothers me. Else I would just let rsyslog hang on to those files. > if HUPIsRestart is off (on V4), does that release the files appropriately? > I dont have v4, so cant say. Is v4 ready for production? -jf -- In the meantime, here is your PSA: "It's so hard to write a graphics driver that open-sourcing it would not help." 
-- Andrew Fear, Software Product Manager, NVIDIA Corporation http://kerneltrap.org/node/7228 From david at lang.hm Mon Apr 27 07:53:08 2009 From: david at lang.hm (david at lang.hm) Date: Sun, 26 Apr 2009 22:53:08 -0700 (PDT) Subject: [rsyslog] rsyslog and holding on to dyna (daily) files In-Reply-To: <4b3125cc0904262206y2cf3ab6g52dbcd80296e1b00@mail.gmail.com> References: <4b3125cc0904262147o5d5c1c77oba49e6e471755328@mail.gmail.com> <4b3125cc0904262206y2cf3ab6g52dbcd80296e1b00@mail.gmail.com> Message-ID: On Mon, 27 Apr 2009, Jeffrey 'jf' Lim wrote: > On Mon, Apr 27, 2009 at 1:00 PM, wrote: >> On Mon, 27 Apr 2009, Jeffrey 'jf' Lim wrote: >> >>> I'm using rsyslog now to log to dynafiles that are marked with a date. >>> What i'm discovering is that rsyslog is holding on to the log files >>> longer than I would want them to. Ideally, of course, rsyslog should >>> release the file after the date is over - but given that rsyslog has >>> no way of knowing (unless this intelligence is built in to the >>> dynafile/template parsing!) that this is a file that will never be >>> written to again, is there a way to get rsyslog to "release" the >>> files? >> >> remember that logs can arrive out of order (especially with relays that >> could queue messages), so there is no way to know for sure that there >> won't be 'old' logs. >> > > :) well i can be pretty sure a few hours (or at least a few days) past the date! that depends on your environment. if you use a (relativly) reliable delivery mechanism and allow systems to queue their messages, a machine/interface being down for several days could break your expectations. >>> "$HUPIsRestart" looks like the solution - but unfortunately, it only >>> works for v4... I would prefer not to do a "hard restart", cos I am >>> under the impression that I will lose some log entries this way... >> >> prior to V4 HUPIsRestart is always on, but it will cause you to loose any >> logs that are in the queue at the time of the HUP. 
on these versions you >> do not have any good option. > > right. Thanks for the confirmation. > > >> if you really believe that you will no longer >> get any old messages to go into the files, you can go ahead and issue a >> mv, gzip, etc to deal with the old logs, but the disk space will not be >> freed until rsyslog restarts. >> > > yeah, that's the thing. It's the disk space that's still hung on to > that bothers me. Else I would just let rsyslog hang on to those files. > > >> if HUPIsRestart is off (on V4), does that release the files appropriately? >> > > I dont have v4, so cant say. Is v4 ready for production? that depends on your criteria. it's the 'beta' branch of rsyslog now, but it includes a lot of features that can significantly reduce known sources of data loss (like the HUP restart issue) David Lang From jfs.world at gmail.com Mon Apr 27 08:17:44 2009 From: jfs.world at gmail.com (Jeffrey 'jf' Lim) Date: Mon, 27 Apr 2009 14:17:44 +0800 Subject: [rsyslog] rsyslog and holding on to dyna (daily) files In-Reply-To: References: <4b3125cc0904262147o5d5c1c77oba49e6e471755328@mail.gmail.com> <4b3125cc0904262206y2cf3ab6g52dbcd80296e1b00@mail.gmail.com> Message-ID: <4b3125cc0904262317u4b93469ahbcee44c2b9aa92f6@mail.gmail.com> On Mon, Apr 27, 2009 at 1:53 PM, wrote: > On Mon, 27 Apr 2009, Jeffrey 'jf' Lim wrote: > >> On Mon, Apr 27, 2009 at 1:00 PM, ? wrote: >>> >>> remember that logs can arrive out of order (especially with relays that >>> could queue messages), so there is no way to know for sure that there >>> won't be 'old' logs. >>> >> >> :) well i can be pretty sure a few hours (or at least a few days) past the date! > > that depends on your environment. if you use a (relativly) reliable > delivery mechanism and allow systems to queue their messages, a > machine/interface being down for several days could break your > expectations. > Dunno. 
I've got relp set up for transfer - but apparently I discovered that relp doesnt take care of a "disk full" situation on the receiver end? I would have expected my old entries to come in once I had cleared the disk space, but no... I'm not complaining btw - just remarking that this was an unexpected behaviour for me. >> that bothers me. Else I would just let rsyslog hang on to those files. >> >> >>> if HUPIsRestart is off (on V4), does that release the files appropriately? >>> >> >> I dont have v4, so cant say. Is v4 ready for production? > > that depends on your criteria. > > it's the 'beta' branch of rsyslog now, but it includes a lot of features > that can significantly reduce known sources of data loss (like the HUP > restart issue) > right. Do you know whether it's possible to have relp transfer between a v3 and a v4? Would that work? any problems? -jf -- In the meantime, here is your PSA: "It's so hard to write a graphics driver that open-sourcing it would not help." -- Andrew Fear, Software Product Manager, NVIDIA Corporation http://kerneltrap.org/node/7228 From rgerhards at hq.adiscon.com Mon Apr 27 08:32:57 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Mon, 27 Apr 2009 08:32:57 +0200 Subject: [rsyslog] rsyslog and holding on to dyna (daily) files References: <4b3125cc0904262147o5d5c1c77oba49e6e471755328@mail.gmail.com><4b3125cc0904262206y2cf3ab6g52dbcd80296e1b00@mail.gmail.com> <4b3125cc0904262317u4b93469ahbcee44c2b9aa92f6@mail.gmail.com> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AF98@GRFEXC.intern.adiscon.com> > -----Original Message----- > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > bounces at lists.adiscon.com] On Behalf Of Jeffrey 'jf' Lim > Sent: Monday, April 27, 2009 8:18 AM > To: rsyslog-users > Subject: Re: [rsyslog] rsyslog and holding on to dyna (daily) files > > On Mon, Apr 27, 2009 at 1:53 PM, wrote: > > On Mon, 27 Apr 2009, Jeffrey 'jf' Lim wrote: > > > >> On Mon, Apr 27, 2009 at 1:00 PM, ? 
wrote:
> >>>
> >>> remember that logs can arrive out of order (especially with relays that
> >>> could queue messages), so there is no way to know for sure that there
> >>> won't be 'old' logs.
> >>>
> >>
> >> :) well i can be pretty sure a few hours (or at least a few days) past the date!
> >
> > that depends on your environment. if you use a (relatively) reliable
> > delivery mechanism and allow systems to queue their messages, a
> > machine/interface being down for several days could break your
> > expectations.
> >
>
> Dunno. I've got relp set up for transfer - but apparently I discovered
> that relp doesn't take care of a "disk full" situation on the receiver
> end?
> I would have expected my old entries to come in once I had
> cleared the disk space, but no... I'm not complaining btw - just
> remarking that this was an unexpected behaviour for me.

That has nothing to do with RELP. The issue here is that the file output
writer (in v3) uses the sysklogd concept of "if I can't write it, I'll throw
it away". This is another issue that was "fixed" in v4 (not really a fix, but
a conceptual change).

If RELP gets an ack from the receiver, the message is delivered from the RELP
POV. The receiving end acks, so everything is done for RELP. The same thing
applies if you queue at the receiver and for some reason lose the queue. RELP
is a reliable transport, but not more than that.

However, if you need reliable end-to-end delivery, you can get it by running
the receiver totally synchronously, that is, with all queues (including the
main message queue!) in direct mode. You'll have awful performance and will
lose messages if you use anything other than RELP for message reception (well,
plain tcp works mostly correctly, too), but you'll have synchronous end-to-end
delivery. Usually, reliable queuing is sufficient, but then the sender does
NOT know when the message was actually processed (just that the receiver
enqueued it, think about the difference!).

>
> >> that bothers me.
Else I would just let rsyslog hang on to those > files. > >> > >> > >>> if HUPIsRestart is off (on V4), does that release the files > appropriately? > >>> > >> > >> I dont have v4, so cant say. Is v4 ready for production? > > > > that depends on your criteria. > > > > it's the 'beta' branch of rsyslog now, but it includes a lot of > features > > that can significantly reduce known sources of data loss (like the > HUP > > restart issue) > > > > right. Do you know whether it's possible to have relp transfer between > a v3 and a v4? Would that work? any problems? Not tested, but it works. It should not matter at all which version sent a message. If the receiver - any version - supports a protocol, it must correctly process it (except, of course, if there is a bug). To the root question: The issue with the dynafile cache is that rsyslog does not yet have a timeout mechanism. If you look at the source, you'll see comments that recommend one. Unfortunately, I have not yet managed to get to that. One approach (which I could implement, and have nearly done a few times before something else distracted me) is the ability to configure the dynafile cache size. Let's say you know you have 5 hosts and write daily files; you could then set the cache size to 5, which means that the "older" files will be closed as soon as a message from each host is received the next day. Not a real timeout, but probably a useful workaround. Note, however, that v3 is "dead" when it comes to new features. New features only ever go into the current devel; otherwise we risk destabilizing a stable version. So v3 is well alive with regard to bug fixes, but anything new requires you to either backport the patch (we can do that on a paid-work basis) or upgrade to the current devel. I know this is a harsh measure, but without it, it would not make sense to declare a version as stable.
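The cache-size workaround described above (5 hosts, daily files, cache size 5) can be illustrated with a small Python sketch. This is not rsyslog code: the class name `DynaFileCache`, the `max_open` parameter, the file naming scheme and the least-recently-used eviction order are all assumptions made for illustration. It only shows why a cache capped at the number of hosts ends up closing the previous day's files once every host has logged on the new day.

```python
from collections import OrderedDict
import os
import tempfile


class DynaFileCache:
    """Hypothetical bounded cache of open log files (not the rsyslog implementation)."""

    def __init__(self, max_open):
        if max_open < 1:
            raise ValueError("max_open must be at least 1")
        self.max_open = max_open
        self._handles = OrderedDict()  # path -> open file object, oldest use first

    def write(self, path, line):
        handle = self._handles.get(path)
        if handle is None:
            # Cache is full: close the least recently used file to make room.
            if len(self._handles) >= self.max_open:
                _, oldest = self._handles.popitem(last=False)
                oldest.close()
            handle = open(path, "a", encoding="utf-8")
            self._handles[path] = handle
        else:
            # Mark this file as the most recently used one.
            self._handles.move_to_end(path)
        handle.write(line + "\n")
        handle.flush()

    def open_paths(self):
        return list(self._handles)

    def close_all(self):
        while self._handles:
            _, handle = self._handles.popitem()
            handle.close()


def demo(directory):
    """Five hosts log on two consecutive days through a cache of size five."""
    hosts = ["host%d" % i for i in range(1, 6)]
    cache = DynaFileCache(max_open=5)
    for day in ("2009-04-26", "2009-04-27"):
        for host in hosts:
            path = os.path.join(directory, "%s-%s.log" % (host, day))
            cache.write(path, "message from %s" % host)
    still_open = [os.path.basename(p) for p in cache.open_paths()]
    cache.close_all()
    return still_open


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(demo(tmp))
```

In this sketch, once all five hosts have written on the second day, only the second day's files remain open. As noted above, this is not a timeout: a host that stays silent on the new day keeps its old file open.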
I invite you to read this blog post for the philosophy behind this policy on stable versions: http://blog.gerhards.net/2009/03/how-software-gets-stable.html HTH Rainer > > -jf > > -- > In the meantime, here is your PSA: > "It's so hard to write a graphics driver that open-sourcing it would > not help." > -- Andrew Fear, Software Product Manager, NVIDIA Corporation > http://kerneltrap.org/node/7228 > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com From rgerhards at hq.adiscon.com Mon Apr 27 08:34:53 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Mon, 27 Apr 2009 08:34:53 +0200 Subject: [rsyslog] rsyslog and holding on to dyna (daily) files References: <4b3125cc0904262147o5d5c1c77oba49e6e471755328@mail.gmail.com> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AF99@GRFEXC.intern.adiscon.com> > if HUPIsRestart is off (on V4), does that release the files > appropriately? Definitely - that's almost the only action that HUP does ;) Rainer From jfs.world at gmail.com Mon Apr 27 08:44:39 2009 From: jfs.world at gmail.com (Jeffrey 'jf' Lim) Date: Mon, 27 Apr 2009 14:44:39 +0800 Subject: [rsyslog] rsyslog and holding on to dyna (daily) files In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA702AF98@GRFEXC.intern.adiscon.com> References: <4b3125cc0904262147o5d5c1c77oba49e6e471755328@mail.gmail.com> <4b3125cc0904262206y2cf3ab6g52dbcd80296e1b00@mail.gmail.com> <4b3125cc0904262317u4b93469ahbcee44c2b9aa92f6@mail.gmail.com> <9B6E2A8877C38245BFB15CC491A11DA702AF98@GRFEXC.intern.adiscon.com> Message-ID: <4b3125cc0904262344t5363cebdmb87a0d04b1c72a07@mail.gmail.com> On Mon, Apr 27, 2009 at 2:32 PM, Rainer Gerhards wrote: >> Jeffrey 'jf' Lim wrote: >> >> Dunno. I've got relp set up for transfer - but apparently I discovered >> that relp doesnt take care of a "disk full" situation on the receiver >> end? >> I would have expected my old entries to come in once I had >> cleared the disk space, but no...
I'm not complaining btw - just >> remarking that this was an unexpected behaviour for me. > > That has nothing to do with RELP. The issue here is that the file output > writer (in v3) uses the sysklogd concept of "if I can't write it, I'll throw > it away". This is another issue that was "fixed" in v4 (not really a fix, but > a conceptual change). > that's good to know. So how does it deal with this situation now? > If RELP gets an ack from the receiver, the message is delivered from the RELP > POV. The receiving end acks, so everything is done for RELP. Some thing if > you queue at the receiver and for some reason lose the queue. > > RELP is reliable transport, but not more than that. However, if you need > reliable end-to-end, you can do that by running the receiver totally > synchronous, that is all queues (including the main message queue!) in direct > mode. You'll have awful performance and will lose messages if you use > anything other than RELP for message reception (well, plain tcp works mostly > correct, too), but you'll have synchronous end-to-end. Usually, reliable > queuing is sufficient, but then the sender does NOT know when the message was > actually processed (just that the received enqueued it, think about the > difference!). > right!!! you just opened my eyes ("think about the difference"). Thanks. >> >> right. Do you know whether it's possible to have relp transfer between >> a v3 and a v4? Would that work? any problems? > > Not tested, but works. It should not matter at all what version sent a > messages. If the receiver - any version - supports a protocol, it must > correctly process it (except, of course, if there is a bug). > yeah, I was thinking that. But nice to get a confirmation from you too. > To the root question: > > The issue with the dynafile cache is that rsyslog does not yet have a timeout > mechanism. If you look at the source, you'll see comments that recommend one. > Unfortunately, I did not yet manage to go for that. 
> > One approach (which I could implement and have nearly done a few times when > something else distracted me) is the ability to configure the dynafile cache > size. Let's say you know you have 5 hosts and write daily files, so you then > could set the cache size to 5 what means that the "older" files will be > closed as soon as a message from each host is received the next days. Not a > real timeout, but probably a useful work-around. > the timeout is preferable (otherwise you'd have to keep a proper count (?) of the number of dynafiles that you need). But sure, as a workaround, it definitely is useful. > Note, however, that v3 is "dead" when it comes to new features. New features > always only go into the current devel, otherwise we risk to destabilize a > stable version. So v3 is well alive in regard to bug fixes, but anything new > requires you to either backport the patch (we can do that on a paid work > basis) or upgrade to the current devel. I know this is a harsh measure, but > without it, it would not make sense to declare a version as stable. I invite > you to read this blog post for the philosophy behind it: > > http://blog.gerhards.net/2009/03/how-software-gets-stable.html > heh, I just happened to read it today before posting actually.. :) But it's good to hear this summary again. thanks, -jf -- In the meantime, here is your PSA: "It's so hard to write a graphics driver that open-sourcing it would not help." 
-- Andrew Fear, Software Product Manager, NVIDIA Corporation http://kerneltrap.org/node/7228 From rgerhards at hq.adiscon.com Mon Apr 27 08:49:14 2009 From: rgerhards at hq.adiscon.com (Rainer Gerhards) Date: Mon, 27 Apr 2009 08:49:14 +0200 Subject: [rsyslog] rsyslog and holding on to dyna (daily) files References: <4b3125cc0904262147o5d5c1c77oba49e6e471755328@mail.gmail.com><4b3125cc0904262206y2cf3ab6g52dbcd80296e1b00@mail.gmail.com><4b3125cc0904262317u4b93469ahbcee44c2b9aa92f6@mail.gmail.com><9B6E2A8877C38245BFB15CC491A11DA702AF98@GRFEXC.intern.adiscon.com> <4b3125cc0904262344t5363cebdmb87a0d04b1c72a07@mail.gmail.com> Message-ID: <9B6E2A8877C38245BFB15CC491A11DA702AF9A@GRFEXC.intern.adiscon.com> > -----Original Message----- > From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog- > bounces at lists.adiscon.com] On Behalf Of Jeffrey 'jf' Lim > Sent: Monday, April 27, 2009 8:45 AM > To: rsyslog-users > Subject: Re: [rsyslog] rsyslog and holding on to dyna (daily) files > > On Mon, Apr 27, 2009 at 2:32 PM, Rainer Gerhards > wrote: > >> Jeffrey 'jf' Lim wrote: > >> > >> Dunno. I've got relp set up for transfer - but apparently I > discovered > >> that relp doesnt take care of a "disk full" situation on the > receiver > >> end? > >> I would have expected my old entries to come in once I had > >> cleared the disk space, but no... I'm not complaining btw - just > >> remarking that this was an unexpected behaviour for me. > > > > That has nothing to do with RELP. The issue here is that the file > output > > writer (in v3) uses the sysklogd concept of "if I can't write it, > I'll throw > > it away". This is another issue that was "fixed" in v4 (not really a > fix, but > > a conceptual change). > > > > that's good to know. So how does it deal with this situation now? In v3 ... that's the way it is... > > > If RELP gets an ack from the receiver, the message is delivered from > the RELP > > POV. The receiving end acks, so everything is done for RELP.
Some > thing if > > you queue at the receiver and for some reason lose the queue. > > > > RELP is reliable transport, but not more than that. However, if you > need > > reliable end-to-end, you can do that by running the receiver totally > > synchronous, that is all queues (including the main message queue!) > in direct > > mode. You'll have awful performance and will lose messages if you use > > anything other than RELP for message reception (well, plain tcp works > mostly > > correct, too), but you'll have synchronous end-to-end. Usually, > reliable > > queuing is sufficient, but then the sender does NOT know when the > message was > > actually processed (just that the received enqueued it, think about > the > > difference!). > > > > right!!! you just opened my eyes ("think about the difference"). > Thanks. > > > >> > >> right. Do you know whether it's possible to have relp transfer > between > >> a v3 and a v4? Would that work? any problems? > > > > Not tested, but works. It should not matter at all what version sent > a > > messages. If the receiver - any version - supports a protocol, it > must > > correctly process it (except, of course, if there is a bug). > > > > yeah, I was thinking that. But nice to get a confirmation from you too. > > > > To the root question: > > > > The issue with the dynafile cache is that rsyslog does not yet have a > timeout > > mechanism. If you look at the source, you'll see comments that > recommend one. > > Unfortunately, I did not yet manage to go for that. > > > > One approach (which I could implement and have nearly done a few > times when > > something else distracted me) is the ability to configure the > dynafile cache > > size. Let's say you know you have 5 hosts and write daily files, so > you then > > could set the cache size to 5 what means that the "older" files will > be > > closed as soon as a message from each host is received the next days. > Not a > > real timeout, but probably a useful work-around. 
> > > > the timeout is preferable (otherwise you'd have to keep a proper count > (?) of the number of dynafiles that you need). But sure, as a > workaround, it definitely is useful. > I agree, but someone must implement it. If I look at my current activity schedule, I'd say in 2 months at the earliest ;) If we do timeouts, we should do them "right", that is, provide core functionality for them. Quite some work, but with benefits for much more than just dynafiles. Rainer > > > Note, however, that v3 is "dead" when it comes to new features. New > features > > always only go into the current devel, otherwise we risk to > destabilize a > > stable version. So v3 is well alive in regard to bug fixes, but > anything new > > requires you to either backport the patch (we can do that on a paid > work > > basis) or upgrade to the current devel. I know this is a harsh > measure, but > > without it, it would not make sense to declare a version as stable. I > invite > > you to read this blog post for the philosophy behind it: > > > > http://blog.gerhards.net/2009/03/how-software-gets-stable.html > > > > heh, I just happened to read it today before posting actually.. :) But > it's good to hear this summary again. > > thanks, > -jf > > -- > In the meantime, here is your PSA: > "It's so hard to write a graphics driver that open-sourcing it would > not help."
> -- Andrew Fear, Software Product Manager, NVIDIA Corporation > http://kerneltrap.org/node/7228 > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com From david at lang.hm Mon Apr 27 09:23:01 2009 From: david at lang.hm (david at lang.hm) Date: Mon, 27 Apr 2009 00:23:01 -0700 (PDT) Subject: [rsyslog] rsyslog and holding on to dyna (daily) files In-Reply-To: <4b3125cc0904262344t5363cebdmb87a0d04b1c72a07@mail.gmail.com> References: <4b3125cc0904262147o5d5c1c77oba49e6e471755328@mail.gmail.com> <4b3125cc0904262206y2cf3ab6g52dbcd80296e1b00@mail.gmail.com> <4b3125cc0904262317u4b93469ahbcee44c2b9aa92f6@mail.gmail.com> <9B6E2A8877C38245BFB15CC491A11DA702AF98@GRFEXC.intern.adiscon.com> <4b3125cc0904262344t5363cebdmb87a0d04b1c72a07@mail.gmail.com> Message-ID: On Mon, 27 Apr 2009, Jeffrey 'jf' Lim wrote: > On Mon, Apr 27, 2009 at 2:32 PM, Rainer Gerhards > wrote: >>> Jeffrey 'jf' Lim wrote: >>> >>> Dunno. I've got relp set up for transfer - but apparently I discovered >>> that relp doesnt take care of a "disk full" situation on the receiver >>> end? >>> I would have expected my old entries to come in once I had >>> cleared the disk space, but no... I'm not complaining btw - just >>> remarking that this was an unexpected behaviour for me. >> >> That has nothing to do with RELP. The issue here is that the file output >> writer (in v3) uses the sysklogd concept of "if I can't write it, I'll throw >> it away". This is another issue that was "fixed" in v4 (not really a fix, but >> a conceptual change). >> > > that's good to know. So how does it deal with this situation now? in v3 it doesn't > >> If RELP gets an ack from the receiver, the message is delivered from the RELP >> POV. The receiving end acks, so everything is done for RELP. Some thing if >> you queue at the receiver and for some reason lose the queue. >> >> RELP is reliable transport, but not more than that. 
However, if you need >> reliable end-to-end, you can do that by running the receiver totally >> synchronous, that is all queues (including the main message queue!) in direct >> mode. You'll have awful performance and will lose messages if you use >> anything other than RELP for message reception (well, plain tcp works mostly >> correct, too), but you'll have synchronous end-to-end. Usually, reliable >> queuing is sufficient, but then the sender does NOT know when the message was >> actually processed (just that the received enqueued it, think about the >> difference!). >> > > right!!! you just opened my eyes ("think about the difference"). Thanks. note that if you can set up a fast enough storage system, the fully synchronous mode may end up being acceptable (and get better as the multi-message processing gets implemented). there is a RAM-based, battery-backed drive available (see http://ezcopysmart.com/ans-9010.html ) that can give you _very_ high performance for writes. in addition, there are some high-end flash drives (look at Fusion-io for example) that claim to be able to do >100,000 IO operations per second. it all depends on how much you need the speed and what you are willing to do to get it. David Lang From aoz.syn at gmail.com Mon Apr 27 09:24:44 2009 From: aoz.syn at gmail.com (RB) Date: Mon, 27 Apr 2009 01:24:44 -0600 Subject: [rsyslog] rsyslog and holding on to dyna (daily) files In-Reply-To: <9B6E2A8877C38245BFB15