[rsyslog] ultra-reliable speed test
david at lang.hm
Fri May 8 19:37:41 CEST 2009
On Fri, 8 May 2009, Rainer Gerhards wrote:
>> also note that these tests are being done on the version _without_ batch
>> processing. I need to think about it a bit more to be sure there aren't
>> any holes in my thinking, but I believe that you would only need to do one
>> set of fsyncs per batch that's processed. so setting a batch size of 100
>> should increase the messages/sec by a similar factor.
>
> I hadn't thought about this, but now that you say it, I agree. Actually,
> an fsync per queue lock release would probably be the right criterion. I
> think that is almost equivalent to what you said, but the advantage of
> that definition is that I can simply watch out for these *already
> existing* places as a guideline. That can indeed make a considerable
> difference.
exactly.
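A minimal Python sketch of the fsync-per-batch idea. This is not rsyslog's queue code: the hypothetical helper `write_batched` appends messages to a plain file that stands in for the disk queue, and it calls `os.fsync` once per batch instead of once per message. With 250 messages, a batch size of 1 costs 250 syncs, while a batch size of 100 costs 3.

```python
import os
import tempfile


def write_batched(path: str, messages: list, batch_size: int) -> int:
    """Append messages to `path`, forcing them to disk once per batch.

    Hypothetical helper; returns the number of fsync calls made, to show
    how batching turns one sync per message into one sync per batch.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    syncs = 0
    with open(path, "ab") as f:
        for start in range(0, len(messages), batch_size):
            for msg in messages[start:start + batch_size]:
                f.write(msg.encode("utf-8") + b"\n")
            f.flush()             # hand Python's buffered data to the OS
            os.fsync(f.fileno())  # ask the OS to commit it to stable storage
            syncs += 1
    return syncs


if __name__ == "__main__":
    msgs = [f"msg {i}" for i in range(250)]
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "queue.dat")
        print(write_batched(path, msgs, 1))    # 250 syncs
        print(write_batched(path, msgs, 100))  # 3 syncs
```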
>> this is only on the output side for now, but if this proves to be
>> interesting, some inputs could batch as well (from your comments it sounds
>> as if relp can send a batch of messages and then get acknowledgement of
>> all of them at once; if so, that could serve as the input)
>
> That's a sliding window, but this is something that really does not
> belong in the app layer (and is not visible there). It is the same
> thing as the tcp sliding window, which you know exists but whose
> specifics you do not know.
>
> Even if we made the relp sliding window visible to the app layer,
> it wouldn't provide much benefit. The only one I can think of is reduced
> lock contention, but with the queue workers now acquiring the lock only
> once per batch, contention is already much less likely.
doing an fsync once per batch would also be a considerable saving
(assuming the basic rate is high enough to be meaningful)
this would mean a change to the relp definition: it would need to have
each side pass the other its 'max batch size' when the systems connect
for the first time (defaulting to 1 if the other side doesn't say
anything)
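The proposed handshake can be sketched in a few lines of Python. This models the change suggested above, not an existing relp feature; `negotiate_batch_size` is a hypothetical helper, and having both sides settle on the smaller of the two advertised sizes is an assumption about how they would agree.

```python
from typing import Optional

# Fallback when a peer sends no 'max batch size' (per the proposal above).
DEFAULT_BATCH_SIZE = 1


def negotiate_batch_size(local_max: int, peer_max: Optional[int]) -> int:
    """Pick a batch size both ends of the session can handle.

    Hypothetical helper. `peer_max` is None when the peer did not advertise
    a size, e.g. an implementation without the extension; that peer then
    gets batches of 1, i.e. no batching.
    """
    peer = DEFAULT_BATCH_SIZE if peer_max is None else peer_max
    if local_max < 1 or peer < 1:
        raise ValueError("batch sizes must be >= 1")
    return min(local_max, peer)


if __name__ == "__main__":
    print(negotiate_batch_size(100, 50))    # 50
    print(negotiate_batch_size(100, None))  # 1: peer did not advertise
```

Taking the minimum means neither side is ever asked to hold more unacknowledged messages than it said it could, and an old peer degrades gracefully to today's one-at-a-time behavior.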
>>> One thing we need to think about is the burst traffic rate, especially
>>> with UDP. I tend to think that such a system must be able to support UDP
>>> traffic, too (which is a questionable opinion) and, if so, we must look
>>> not only at the sustained rate but even more at the burst rate.
>>
>> yes and no. while I see the need to support UDP, it's not going to be
>> reliable (the OS buffers the messages before they get to the system, not
>> to mention the network's ability to drop them), and if you really need
>> high UDP burst rates you could run two copies of rsyslog: one
>> ultra-reliable (with reliable inputs), and a second one with a memory
>> queue, feeding into the ultra-reliable one with a batched input method.
>
> ack - as I said, the opinion is questionable... But what if you have
> important devices that simply speak nothing but UDP (they still seem
> to exist...)?
>
> However, think of it this way:
>
> You limit the max burst rate by using an ultra-reliable queue. You do
> so because you do not want to lose messages when a sudden power failure
> occurs. To support that configuration, you need to run the second
> instance. It queues in memory until the (slower) reliable rsyslogd can
> accept the message and put it into the reliable queue. Let's say that
> you have a burst of r messages and that of this burst only r/2 can be
> enqueued (because the ultra-reliable queue is so slow). So you lose r/2
> messages.
>
> Now consider the case where you run rsyslog with just a reliable queue,
> one that is kept in memory and cannot cover the power failure
> scenario. Obviously, all messages in that queue are lost when power
> fails (or almost all, to be precise). However, that system has much
> more bandwidth. With it, there would never have been r messages
> inside the queue, because that system has a much higher sustained
> message rate (and thus the burst causes much less trouble). Let's say
> the system is just twice as fast in this setup (I guess it usually would
> be *much* faster). Then it would be able to process all r records.
>
> In that scenario, the ultra-reliable system loses r/2 messages, whereas
> the somewhat more "unreliable" system loses none - by virtue of being
> able to process messages as they arrive.
>
>
> Now extend that picture to messages residing inside the OS buffers or
> even those that are still queued in their sources because a stream
> transport blocked sending them.
>
> I know that each detail of this picture can be argued about at length.
>
> However, my opinion is that there is no "ultra-reliable" system in real
> life, only various probabilities of losing messages. These probabilities
> often depend on each other, which makes calculating them very hard, if
> not impossible. Still, if we treat the components as independent, the
> reliability of the system at large (the probability that a message is
> not lost) is the product of the reliabilities of its components, and the
> probability of message loss is one minus that product.
>
> This is where *I* conclude that it can make sense to permit a system to
> lose some messages under certain circumstances, if that shifts the
> overall probability calculation towards the desired end result. In that
> sense, I tend to think that a fast, memory-queuing rsyslogd instance can
> be much more reliable than one that is configured as ultra-reliable,
> when that configuration badly affects the rest of the system at large
> (as in the scenario above).
>
> However, I also know that for regulatory requirements, you often seem to
> need to prove that a system does not lose messages once it has received
> them, even at the cost of an overall increased probability of message
> loss.
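The arithmetic of the burst scenario and the product-of-component-reliabilities rule can be sketched in Python. The burst size and the per-component survival probabilities below are made-up illustration values, the helper names are hypothetical, and the product rule only holds if components fail independently, which, as noted above, they often do not.

```python
from math import prod
from typing import Sequence


def burst_loss(burst: int, capacity: int) -> int:
    """Messages lost when a burst arrives faster than they can be enqueued."""
    return max(0, burst - capacity)


def loss_probability(component_survival: Sequence[float]) -> float:
    """Probability that a message is lost somewhere in the pipeline.

    Assumes each component drops messages independently, so the chance of
    surviving all of them is the product of the per-component survival
    probabilities; loss is one minus that product.
    """
    return 1.0 - prod(component_survival)


if __name__ == "__main__":
    r = 1000                            # burst size (illustrative)
    slow_capacity = r // 2              # ultra-reliable queue accepts only r/2
    fast_capacity = 2 * slow_capacity   # memory queue assumed twice as fast
    print(burst_loss(r, slow_capacity))  # 500
    print(burst_loss(r, fast_capacity))  # 0
    # Hypothetical survival probabilities for two components in a chain.
    print(loss_probability([0.999, 0.99]))
```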
it's a bit more than that.
In my case I have two completely different use cases, and will almost
certainly end up running two different sets of rsyslog (potentially on
different sets of servers).
case #1
'normal system syslogs'
99.9% reliability (easy to achieve with UDP) is easily good enough.
the sender is normal software that knows nothing about rsyslog
high volume, mostly junk
case #2
'logs of record'
the application is modified to do application-level acknowledgements
(relp or similar), and the system must be architected not to lose logs
once they are acknowledged, short of a disaster that physically destroys
equipment (storage drives must be redundant so that a drive failure does
not lose logs)
low volume, every log is critical.
> My view of reliability is much the same as my view of security: there is
> no such thing as "being totally secure"; you can only reduce the
> probability that something bad happens. The worst thing in security is
> someone who thinks he is "totally secure" and as such is no longer
> actively looking at potential issues.
>
> I see reliability the same way. There is no such thing as "being totally
> reliable", and it is a really bad idea to think you could ever be.
> Knowing this, one may begin to think about how to decrease the overall
> probability of message loss AND think about what rate is acceptable (and
> what to do with these cases, e.g. "how can they hurt").
>
> ... but ... enough of philosophy, I am not sure if it helps this
> discussion ;) (but I thought it was useful to "see" what I have on my
> mind when talking about these things).
and like security, different solutions are appropriate for different
situations. there are some types of data and environments where you put
lots of protections in place, even if they slow work down, but in other
situations that level of protection would not benefit anyone.
>>> As a side note, you will probably see that the disk queue can be
>>> optimized. If sufficient effort is made, I think it can perform at least
>>> four to six times faster. The reason is that it was never really meant
>>> to be used on a busy box in this way. Even so, we should not start a new
>>> discussion about these optimizations, simply because they take
>>> considerable additional time and we cannot fit that work into anything
>>> we have planned for the foreseeable future.
>>
>> yeah, I've been thinking of various things that could be done here, but I
>> won't ask about any of them for now ;-)
>
> Oh yes, a broad range: from simple things like zipping the data and
> keeping all file handles open, to complex things like a dedicated,
> random-access, database-like disk queue store (even a preformatted
> one). If you look at the code, you'll probably notice that the disk
> queue system uses stream drivers to persist the data. That would be
> the hook to extend.
yep, you can also do tricks like allocating 4k for each message, no matter
what its size, to avoid the need to maintain a separate 'table of
contents' that you have to look at and modify when processing a message.
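A sketch of the fixed-slot layout in Python, assuming a 4-byte length header at the start of each 4096-byte slot. The header format and the helper names `write_slot`/`read_slot` are hypothetical and do not describe rsyslog's actual disk queue format. Because message i always lives at offset i * 4096, finding it needs no separate index; the cost is wasted space for short messages and a hard upper limit on message size.

```python
import os
import struct
import tempfile

SLOT_SIZE = 4096              # one fixed-size slot per message ("4k each")
HEADER = struct.Struct("<I")  # assumed: 4-byte little-endian payload length


def write_slot(f, index: int, payload: bytes) -> None:
    """Store payload in slot `index` of a file opened in binary r/w mode.

    The slot's position is simply index * SLOT_SIZE, so no separate
    table of contents has to be read or updated to locate a message.
    """
    if len(payload) > SLOT_SIZE - HEADER.size:
        raise ValueError("message does not fit in one slot")
    record = HEADER.pack(len(payload)) + payload
    f.seek(index * SLOT_SIZE)
    f.write(record.ljust(SLOT_SIZE, b"\0"))  # zero-pad the rest of the slot


def read_slot(f, index: int) -> bytes:
    """Return the payload in slot `index` (b"" for a skipped-over slot)."""
    f.seek(index * SLOT_SIZE)
    (length,) = HEADER.unpack(f.read(HEADER.size))
    return f.read(length)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "slots.q"), "w+b") as f:
            write_slot(f, 0, b"first message")
            write_slot(f, 2, b"third message")  # slots need not be in order
            print(read_slot(f, 2))  # b'third message'
```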
> ... but: that's a story for another quarter ;)
yep
David Lang