[rsyslog] untra-reliable speed test
Rainer Gerhards
rgerhards at hq.adiscon.com
Wed May 13 10:56:41 CEST 2009
On Tue, 2009-05-12 at 14:05 -0700, david at lang.hm wrote:
> On Tue, 12 May 2009, Rainer Gerhards wrote:
>
> > Interesting, will look at details tomorrow...
>
> the strace results show a lot of lock contention
I've now taken a closer look and everything looks pretty much like what
I expected. The lock contention in this situation is not a bad sign:
they show that the locks are actually utilized to synchronize producer
and consumer.
Think about this (simplified) scenario:
input -> queue -> output
Let's say the processing time is the cost we incur. If we look at it,
the queue's cost dominates by far the combined cost of input and output.
In most cases, it dominates input+output cost so much, that you can
express the total cost as just the cost of the queue operation, without
looking at anything else.
So the input needs to wait until the queue is ready to accept a new
message. Once it has done so, the output is notified and immediately
acquires the queue lock and begins the dequeue operation. At the same
time, the input has already finished input processing (as I said, this
happens in virtually "no time" compared to the queue operation). So it
needs to wait for the queue lock. Once the dequeue operation is
finished, the output releases the lock, and processes the message in
virtually no time, too. The input acquired the queue lock, and the whole
story begins right from the start.
A small queue may build up depending on the OS scheduler, but I think
most often, input and output will just wait for the queue to complete.
In that sense, this mode is similar to DIRECT mode, except that a queue
can build up when the action needs to be retried.
>
> > With disk queues, there is always only a single queue worker (the disk queue is purely sequential).
>
> interesting, the OS can do enough in parallel that it may be worth looking
> into this if we ever go in the direction of optimizing this mode.
>
If we optimize it, the best thing to do is a totally new queue storage
driver for such cases. Sequential files do not really work well if we
have multiple producers running.
This is a major effort and even then we need to think about the
implications I raised in regard to processing cost above.
I have thought a bit more about the situation. First of all, rsyslog was
never designed for this use case (preserve every message EVEN in case of
sudden power fail). When I introduced purely disk-based queues, this was
done to support disk-assisted mode. I needed a queue type to permit me
store things on disk, if we run out of memory. As a "side-effect", a
pure disk mode was available also (I'd never implemented it for the sake
of itself). As it was there, I decided to expose this mode and made it
user-configurable. I thought (probably correct) that it could solve some
need - a need that I'd consider "very exotic" (think about the reliance
on a audit-grade protocol for this to really make sense). And I added
the checkpoint capability because it seemed useful, even with disk-based
queues, which could be guarded from total loss of messages by using a
reasonable checkpoint interval. Again, a checkpoint interval of one is
permitted just because this capability came "for free" and could be
handy in some use cases.
The kiosk example we discussed last year (?) on the mailing list looked
like a good match for such an exotic environment. Sudden power loss was
an option, and we had low traffic volume. Bingo, perfect match.
However, I'd never thought about a reasonable high-volume system using
disk-only queues. Think about the cost functions, such a system boils
down to a DIRECT mode queue which just takes an exceptional lot of time
for processing messages.
So probably the best approach for this situation would be to run the
queue actually in direct mode. That removes the overwhelming cost of
queue operations. Direct mode also ensures that the input receives an
ack from the output [but there may be subtle issues which I need to
check to make sure this is always the case, so do not take this for
granted - but if it is not yet so, this should not be too complex to
change]. With this approach, we have two issues left:
a) the output action may be so slow, that it actually is the dominating
cost factor and not disk queue operation
b) the output action may block for an extended period of time (e.g.
during a retry)
In case a), a disk-queue makes sense, because it's cost is irrelevant in
this scenario. Indeed, it is irrelevant under all circumstances. As
such, we can configure a disk-only action queue in that case. Note that
this implies a *very* slow output.
Case b) is more complicated. We do NOT have any proper way to address it
with current code. The solution IMHO is to introduce a new queue mode
"Disk Queue on Delay" which starts an ultra-reliable disk queue
(preferably with a faster queue store driver) if and only if the action
indicates that it will need extended processing time. This requires some
changes to action processing, but the action state machine should be
capable to handle that with relatively slight modification [again, an
educated guess, not a guarantee]).
In that scenario, we run the action immediately whenever possible. Only
if that take the (considerable) extra effort of buffering messages into
a much-slower on disk queue. Note that such a mode makes only sense with
audit-grade protocols and senders (which hold processing until the ACK
has been received). As such, a busy system automatically slows down to
the rate that the queue writer can handle. In this sense, the overall
system (e.g. a financial trading system!) may be slowed down by the
unavailability of a failing output (which in turn causes the extra and
very high cost of disk queue operations). It needs to be considered if
that is an acceptable price.
The faster an ultra-reliable queue disk store driver performs, the more
cases we can handle in the spirit of a) above. In theory, this can lead
to elimination of b) cases.
Nevertheless, I hope I have shown that re-designing the queue (drivers)
to support high throughput AND ultra-reliable operations AT THE SAME
TIME is far from being a trivial task. To do it right, it involves some
other changes too.
I'll have that rough picture on my mind when I work on the queue.
I hope this clarifies.
Rainer
> this was all with rsyslog 4.1.7
>
> David Lang
>
> > rainer
> >
> > ----- Urspr?ngliche Nachricht -----
> > Von: "david at lang.hm" <david at lang.hm>
> > An: "rsyslog-users" <rsyslog at lists.adiscon.com>
> > Gesendet: 12.05.09 22:23
> > Betreff: Re: [rsyslog] untra-reliable speed test
> >
> > I've completed my first round of testing
> >
> > this is a a fusionio SSD card with a 8-core opteron system, 8G ram
> >
> > running debian Lenny (debian 5) 2.6.26 kernel
> >
> > rsyslog.conf
> >
> > $ModLoad imuxsock # provides support for local system logging
> > $ModLoad imklog # provides kernel logging support (previously done by rklogd)
> > $ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
> > $WorkDirectory /logs
> > $HUPisRestart off
> > $MainMsgQueueCheckpointInterval 1
> > $MainMsgQueueFilename mainq
> > $MainMsgQueueType disk
> > $OptimizeForUniprocessor off
> >
> > #$ActionfileEnableSync on
> > #$ActionQueueCheckpointInterval 1
> > #$ActionQueueFileName queue1
> > #$ActionQueueType disk
> > *.* /logs/messages;RSYSLOG_TraditionalFileFormat
> >
> >
> > input provided by cat largefile | logger
> >
> >
> > I did tests with and without the action queue stuff enabled
> >
> > the results were not quite what I expected, but interesting
> >
> > xfs w/ actionqueue 1200/sec
> > xfs 2000/sec
> > ext3 w/actionqueue 2000/sec
> > ext3 4600/sec
> > ext4 w/actionqueue 2000/sec
> > ext4 4000/sec
> > ext2 w/actionqueue 5300/sec
> > ext2 7400/sec
> >
> > note that with ext2 I don't think the input could keep up (there were not
> > multiple queue files the way there were for all the others), when I
> > shifted to infile as input the ext2 rate increased to ~7800/sec, and the
> > cpu utilization dropped by 50-70%
> >
> > I have not yet tried anything with multiple worker threads.
> >
> > I captured some strace files and have posted them at
> > http://rsyslog.lang.hm/rsyslog
> >
> > David Lang
> >
> > _______________________________________________
> > rsyslog mailing list
> > http://lists.adiscon.net/mailman/listinfo/rsyslog
> > http://www.rsyslog.com
> > _______________________________________________
> > rsyslog mailing list
> > http://lists.adiscon.net/mailman/listinfo/rsyslog
> > http://www.rsyslog.com
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com
More information about the rsyslog
mailing list