[rsyslog] untra-reliable speed test

Rainer Gerhards rgerhards at hq.adiscon.com
Thu May 14 10:46:49 CEST 2009


> -----Original Message-----
> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
> bounces at lists.adiscon.com] On Behalf Of david at lang.hm
> Sent: Wednesday, May 13, 2009 11:20 PM
> To: rsyslog-users
> Subject: Re: [rsyslog] untra-reliable speed test
> On Wed, 13 May 2009, Rainer Gerhards wrote:
> >>> A small queue may build up depending on the OS scheduler,
> >> but I think
> >>> most often, input and output will just wait for the queue
> >> to complete.
> >>> In that sense, this mode is similar to DIRECT mode, except
> >> that a queue
> >>> can build up when the action needs to be retried.
> >>
> >> interestingly, except for ext2 the queue fills up to its limit.
> >
> > I overlooked that fact this morning. I have to admit that *this* puzzles
> > me. Some queue buildup makes sense but filling up to the limit... That
> > really sounds strange.
> 
> I suspect that new messages arriving (especially when they are arriving at
> high rates) manage to get in and get the lock multiple times while the
> output is formatting, writing, and syncing the message to the output.
> 
> for the non-ext2 cases, the output takes long enough that the queue fills
> up, for ext2 the output is fast enough that this doesn't happen (which
> indicates that increased efficiency in the input could possibly drive the
> data rates higher)

As far as I can tell, this sounds like the only reasonable explanation.
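
To make the argument concrete, here is a toy model of the effect. It is only a sketch and not rsyslog code: the function name, the tick-based timing and all numbers are made up for illustration. It assumes the input can enqueue a fixed number of messages per tick, while the output completes one message (format, write, sync) every few ticks.

```python
def simulate(limit, arrivals_per_tick, ticks_per_write, ticks):
    """Return the queue depth recorded after each tick of a bounded queue.

    Hypothetical model: names and parameters are illustrative only.
    """
    depth = 0
    history = []
    for tick in range(1, ticks + 1):
        # Input side: enqueue until the queue is full. In a real system
        # the input would block here instead of dropping messages.
        depth = min(limit, depth + arrivals_per_tick)
        # Output side: one message finishes its format/write/sync cycle
        # every `ticks_per_write` ticks.
        if tick % ticks_per_write == 0 and depth > 0:
            depth -= 1
        history.append(depth)
    return history


# Slow output (stands in for the non-ext2 case) versus fast output
# (stands in for the ext2 case), with the same input rate.
slow = simulate(limit=10, arrivals_per_tick=1, ticks_per_write=5, ticks=100)
fast = simulate(limit=10, arrivals_per_tick=1, ticks_per_write=1, ticks=100)
print("slow output, max depth:", max(slow))
print("fast output, max depth:", max(fast))
```

With these made-up numbers the slow-output run climbs to the limit and stays near it, while the fast-output run never builds up a queue at all. That matches the observed pattern, but it says nothing about the real lock behavior inside the queue.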

> 
> >>>>> With disk queues, there is always only a single queue
> >> worker (the disk queue is purely sequential).
> >>>>
> >>>> interesting, the OS can do enough in parallel that it may
> >> be worth looking
> >>>> into this if we ever go in the direction of optimizing this mode.
> >>>
> >>> If we optimize it, the best thing to do is a totally new
> >> queue storage
> >>> driver for such cases. Sequential files do not really work
> >> well if we
> >>> have multiple producers running.
> >>>
> >>> This is a major effort and even then we need to think about the
> >>> implications I raised in regard to processing cost above.
> >>>
> >>
> >> <snip large discussion on optimization and history of disk queue>
> >>
> >> the current speed is high enough to be useful, even without
> >> batching (and
> >> as noted before, batching should improve things significantly)
> >
> > That's good news.
> >
> >> so at the moment I would like to avoid discussion of how to
> >> optimize the
> >> disk queue (as much as I am itching to do so). for stability
> >> reasons I
> >> agree the best approach is probably to create a new queue
> >> type rather than
> >> doing major changes to the existing queue.
> >
> > Just one final word for this quarter: it's not just about stability, a
> > new driver also can be designed for optimal performance from the ground
> > up.
> >
> >>
> >> instead I would like to see how close we are to having it be
> >> audit grade,
> >> and how difficult it is to close that gap.
> >
> > That's what I am currently looking at (and thus am a bit silent). I need
> > to review some of the queue code, thus it makes limited sense to do a
> > writeup now. The queue is rather complex and what I do right now is make
> > some micro-changes while I do the review.
> >
> > The ultra-reliable mode adds complexity to the dequeue operation.
> > However, the current mode has complexity during the queue shutdown,
> > especially when it needs to cancel worker threads after their timeout
> > expired. The queue shutdown needs to be modified for batch operation,
> > especially the areas that are most complex. So my idea is to do both
> > together. That takes a bit longer (mostly thinking), but I can envision
> > that there is a good chance the resulting code will be less complex. In
> > the long term, that could also result in easier maintenance.
> >
> > This is also the reason why I think about the queue storage drivers:
> > even by thinking on a non-detail level, I hope to be able to see the
> > hooks that we need so that I can influence design decisions into a
> > direction that facilitates the later addition (at least if the choices
> > are otherwise more or less equal). If things continue to work out like
> > they did today, I hope to have workable experimental code (and
> > sufficient understanding) by early next week. It depends a bit on
> > whether I see code inside the queue that - after 15 months of practical
> > experience with it - should be optimized/shuffled/whatever. I think this
> > is an opportunity to apply knowledge gained.
> 
> sounds good.
> 
> at some point in the future I would like to have a brainstorming session
> on what could be done for a more efficient disk queue.

I agree on that, but would like to postpone it until we have gone further
with the task that we are currently doing. While I believe in thinking a bit
ahead, I fear that my thoughts would otherwise get distracted too much from
the complex batching and audit-grade queue case. I have to admit I am not
sure I can successfully handle all three issues at the same time...

Rainer


