[rsyslog] ultra-reliable speed test
Rainer Gerhards
rgerhards at hq.adiscon.com
Fri May 8 10:34:23 CEST 2009
On Fri, 2009-05-08 at 01:18 -0700, david at lang.hm wrote:
> On Fri, 8 May 2009, Rainer Gerhards wrote:
>
> >> -----Original Message-----
> >> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
> >> bounces at lists.adiscon.com] On Behalf Of david at lang.hm
> >>
> >> I have a box put together for a first cut at a speed test of rsyslog in
> >> ultra-reliable mode. the outline below is intended to minimize the number
> >> of variables.
> >>
> >> the box is a dual quad-core opteron with 8G of ram, one SATA drive and a
> >> fusionIO SSD PCIE drive, currently running RHEL 5.3 kernel 2.6.18-53
> >> (redhat stock kernel) I intend to format the SSD with ext2 (as the
> >> application is providing data integrity, and to avoid the known
> >> performance problems with ext3 and fsync)
> >
> > Just a question, because I do not know enough about ext2: does ext2 guarantee
> > that when an application does fsync, all data, INCLUDING related file system
> > control structures are written to disk? Or, to phrase it the other way
> > around, can ext2 guarantee that fsync'ed data can always be read after a
> > power failure. I think along the lines of some control structures not being
> > written, thus the fsynced app data may be present on the disk, but cannot be
> > accessed any longer. In the worst case, would it be possible that a whole
> > file be lost during a file system check after reboot?
> >
> > My *uneducated* understanding is that ext3 does guard against this (thus the
> > performance problems) but ext2 does not.
>
> the performance problem with ext3 is that it forces ALL pending writes to
> disk when anything does a fsync
>
> now that you mention it, I think that with all filesystems other than ext2
I think you meant ext3 here?
> you need to do a fsync on the directory as well as on the file
>
another uneducated question: does that ensure that all fs control
structures are written? I mean things like the chain that links file
parts together. My understanding is that the answer is "yes", but I
prefer to ask as I am not 100% sure.
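To make the "fsync the file and the directory" idea concrete, here is a minimal sketch in Python. The function name durable_append is hypothetical and this is not how rsyslog implements its disk queue; it only shows the common POSIX idiom. Whether this is sufficient on ext2 is exactly the open question above, so the sketch does not answer it.

```python
import os


def durable_append(path: str, data: bytes) -> None:
    """Append data to path, then fsync the file and its parent directory.

    Hypothetical helper for illustration only.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
        # Ask the kernel to flush the file's data and metadata
        os.fsync(fd)
    finally:
        os.close(fd)

    # Sync the directory so the entry naming the file is flushed as well
    parent = os.path.dirname(os.path.abspath(path))
    dir_fd = os.open(parent, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The directory sync matters mostly when the file has just been created or renamed; for appends to an existing file it is an extra cost that may not buy anything.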
> > If my understanding would be correct (and I don't say so), we would need to
> > use ext3.
>
> I'll try both (and later on, when I use my own kernel rather than the
> redhat one I'll also test XFS)
>
> I think that if no other disk activity is taking place ext3 may not be too
> bad (one other advantage that ext2 would have over ext3 and XFS is that
> journaling filesystems have to write whatever they journal twice: once to
> the journal and once to the final location)
ack
>
> >> for the rsyslog test I am thinking the following
> >>
> >> using rsyslog 4.1.7
> >> enable input file
> >
> > Not sure if I got this bullet point right. Do you mean you intend to use
> > imfile for input generation?
>
> yes, that was my intent. just to simplify things by making the test
> completely self contained to the one box.
there is a kind of interaction between imfile and the queue in that
imfile flags its messages as "delayable", which was introduced to
prevent imfile from putting data into the queue faster than necessary.
But on the other hand, this should tune the system to the actual max
rate (at least in theory).
>
> > In any case, I would suggest to do a test with UDP and one with TCP senders,
> > both sending at maximum rate. With UDP, we would see a message loss rate,
> > while with TCP we would see the actual number of messages that the system can
> > process. So TCP is probably the more meaningful number, but packet loss rate
> > for UDP - a common use case - would also be interesting, at least I think so.
>
> will do.
>
> I will be interested in seeing the UDP loss rate, I suspect that with
> appropriate OS tuning I can get it down to zero loss rate at the data
> rates that the rest of the system maintains (the OS has a buffer prior to
> rsyslog's input process that can cover delays on the input threads)
Let's say you find out the max rate R via e.g. TCP and then use R as an
upper bound for the UDP traffic: that should work. But I would also find
it interesting to see how many messages are dropped if you send at a
rate >> R. I would not be surprised if the resulting commit rate were
(even far) below R.
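The arithmetic for such a UDP run can be sketched as follows. This is only an illustration: the function name udp_run_stats and the numbers in the usage note are made up, not measurements.

```python
from typing import Tuple


def udp_run_stats(sent: int, committed: int, seconds: float) -> Tuple[float, float]:
    """Return (loss_fraction, commit_rate) for one test run.

    sent      -- messages the sender emitted
    committed -- messages found in the output files afterwards
    seconds   -- duration of the run
    """
    if sent <= 0 or seconds <= 0:
        raise ValueError("sent and seconds must be positive")
    loss_fraction = 1.0 - committed / sent
    commit_rate = committed / seconds  # messages per second actually written
    return loss_fraction, commit_rate
```

With hypothetical numbers, udp_run_stats(600000, 150000, 60.0) gives a loss fraction of 0.75 and a commit rate of 2500.0 messages per second. Comparing that commit rate against R from the TCP run would show whether overload pushes throughput below R.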
>
> >> set the main queue mode to disk
> >> enable fsyncs everywhere
> >
> >
> > Just as a reminder: this includes $MainMsgQueueCheckpointInterval 1 (which is
> > a *real* performance eater and puts a lot of burden on the consistency of the
> > file system's control structures, thus my question on ext2 vs. ext3 above).
>
> does this do a fsync on the directory?
No! But I think it would be easy to add (but easy only in a
non-optimized way, optimization would take more effort).
>
> >> set the output to log *.* to a file
> >>
> >> run a cron job that rolls the log file once a min and sends a HUP to
> >> rsyslog
> >>
> >> create a large file of log information
> >>
> >> run this for a while and then count the number of logs in each rolled log
> >> file. hopefully the number will be reasonably consistent.
> >>
> >> does this sound like a reasonable approach? or is this going to not be
> >> representative for some reason?
> >
> > With the few comments above, I think this is a very reasonable approach and
> > should provide very good insight.
> >
> > Actually, I hope that it can prove my point that this setup is too slow
> > wrong...
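The counting step of the plan quoted above (lines per rolled file, and how consistent they are) could be sketched like this. The function names and the file pattern in the usage note are hypothetical, as the naming of the rolled files is not specified in this thread.

```python
import glob
import statistics


def count_lines(path):
    """Count newline-terminated records in one rolled log file."""
    with open(path, "rb") as f:
        return sum(1 for _ in f)


def summarize(pattern):
    """Return (per-file counts, mean count, relative spread) for matching files.

    Relative spread is (max - min) / mean; a value near 0 means the
    per-interval throughput was consistent.
    """
    counts = {p: count_lines(p) for p in sorted(glob.glob(pattern))}
    values = list(counts.values())
    if not values:
        raise ValueError("no files match " + pattern)
    mean = statistics.mean(values)
    spread = (max(values) - min(values)) / mean if mean else 0.0
    return counts, mean, spread
```

Usage would be something like summarize("/var/log/test/messages.*") (path assumed). With a roll once a minute, dividing the mean by 60 gives messages per second. The first and last files are probably best excluded, as they may cover partial intervals.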
>
> there will definitely be a performance issue at some point here, the
> question is if it's fast enough to be usable.
>
> the drive claims to be able to do >100,000 I/O ops/sec. if we can manage
> to get a few thousand logs/sec written on this, it will be extremely
> usable.
OK, a "few thousand" is not what I have in mind for a
high-performance system (that would be a "few ten-thousand"), but I
agree that it can be considered a busy system. So a "few thousand"
(maybe more than 5,000?) should be sufficient to prove the original
point - especially as hardware gets faster AND you can use solid state
disks or similar mechanisms (assuming they meet the reliability criteria).
One thing we need to think about is burst traffic rate, especially with
UDP. I tend to think that such a system must be able to support UDP
traffic, too (which is a questionable opinion) and, if so, we must not
only look at the sustained but even more at the burst rate.
As a side-note, you will probably see that the disk queue can be
optimized. If sufficient effort is made, I think it can perform faster
by at least a factor of four to six. The reason is that it was
never really meant to be used on a busy box in this way. While knowing
this, we should not start a new discussion about these optimizations,
simply because they take considerable additional time and we cannot fit
that part into anything we have in mind for the foreseeable future.
Rainer