[rsyslog] Development of failsafe disk based queue

Rainer Gerhards rgerhards at hq.adiscon.com
Wed Oct 1 14:57:19 CEST 2008


David,

going back to the higher layer: are you saying that an immediate power
failure is a case that needs to be addressed in an enterprise logging
system?

Anybody else with an opinion?
Rainer

On Wed, 2008-10-01 at 05:39 -0700, david at lang.hm wrote:
> On Wed, 1 Oct 2008, Rainer Gerhards wrote:
> 
> > On Wed, 2008-10-01 at 05:25 -0700, david at lang.hm wrote:
> >> On Wed, 1 Oct 2008, Rainer Gerhards wrote:
> >>
> >>> David,
> >>>
> >>> the file syncing mentioned in the compatibility doc applies to the
> >>> output action, only.
> >>
> >> ouch.
> >>
> >>> The queue never does synchronous writes - I always assumed that a
> >>> critical system would have a UPS, and so far I could not think of a
> >>> valid reason for not having one. So the queue would need an extra
> >>> option to do sync writes. Obviously, that's not a big deal.
> >>
> >> good
> >>
> >>> Performance, of course, will be extremely terrible with such a setup...
> >>
> >> only if you have to wait for a spinning disk to do the write.
> >
> > I agree with the rest of your argument below. But the question raised here
> > was in regard to a system without any battery backup. So I would need to
> > wait.
> 
> no UPS is not necessarily the same as no battery backup.
> 
> you could use a compact flash drive and probably get better 
> performance/reliability than spinning disks with no battery at all.
> 
> > Even then, in the worst case, I think it would be possible that the disk
> > does only a partial write. I am not sure if that's really the case with
> > today's disk drives (which I think have capacitors to prevent this
> > scenario), but with past drives this could happen (I know all too well -
> > a few years ago that cost me a weekend ;)).
> 
> current disks do not have capacitors to prevent partial writes or to flush
> their caches. but options like linux ext3 in data-journaled mode make it so
> that you have your data in the journal safely, and the various solid-state
> options solve that problem.
> 
> David Lang
> 
> > Rainer
> >
> >>
> >> this is the same problem that databases have. they need to guarantee that
> >> once the database tells the writing program that the data is written it
> >> will be there even if the system loses power immediately.
> >>
> >> if you run a database on standard desktop hardware (and it doesn't have
> >> this safety disabled) you cannot do more than about 80 writes/second. If
> >> you upgrade to the super speedy 15K rpm drives you can do ~160
> >> writes/second.
> >>
> >> given that you need to write the data + metadata it gets even uglier, so
> >> what the databases do (and some journaling filesystems) is to write a log
> >> that says what they are going to do, sync that, and then later write the
> >> data to the actual files (updating the journal when they complete the
> >> write)
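
The journal-then-data ordering described above can be sketched as follows. This is only a minimal illustration of the idea, not rsyslog's or any database's actual code; the function names, file layout and the BEGIN/DONE markers are hypothetical, and it assumes the OS and drive honor fsync.

```python
import os


def durable_append(path: str, payload: bytes) -> None:
    """Append payload to path and return only after fsync has completed."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        os.write(fd, payload)  # small payloads; a full version would loop on short writes
        os.fsync(fd)           # ask the OS to push the data to stable storage
    finally:
        os.close(fd)


def journaled_write(journal_path: str, data_path: str, record: bytes) -> None:
    """Write one record using a log-the-intent-first ordering."""
    # 1. Say what we are going to do, and sync that.
    durable_append(journal_path, b"BEGIN " + record + b"\n")
    # 2. Write the data to the actual file.
    durable_append(data_path, record + b"\n")
    # 3. Update the journal once the data write has completed.
    durable_append(journal_path, b"DONE " + record + b"\n")
```

After a crash, a record that has a BEGIN line but no DONE line in the journal is one whose data write may be incomplete and can be replayed from the journal.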
> >>
> >> it sounds like you order your writes correctly for a disk-based queue, but
> >> you would need the option of issuing the syncs (probably when you do the
> >> checkpoints)
> >>
> >> if you do this on the wrong hardware (say a laptop 5200 rpm drive or the
> >> wrong flash drive), the fact that you need to do four writes per log entry
> >> (data to queue, metadata to queue, data to output, update metadata for
> >> queue) could drop you to below 15 logs/sec (60/4 but then you lose time
> >> to seeking as well)
> >>
> >> however, with the correct drive to write to (say a $2,400 80G fusion-io
> >> flash card that can do ~100k IO ops/sec) you should be able to sustain
> >> 20,000 logs/sec.
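
The arithmetic behind these figures can be written down as a small sketch. It assumes, as the message does, four synced writes per log entry and ignores seek time, so the results are upper bounds; the function name is made up for illustration, and the 60 writes/sec for the laptop drive is taken from the "60/4" above.

```python
def max_logs_per_second(sync_writes_per_second: float, writes_per_entry: int = 4) -> float:
    """Upper bound on log entries per second when every write must be synced."""
    if writes_per_entry <= 0:
        raise ValueError("writes_per_entry must be positive")
    return sync_writes_per_second / writes_per_entry


# Synced-write rates quoted in the thread (writes/second).
for label, writes in [
    ("laptop drive", 60),
    ("desktop drive", 80),
    ("15K rpm drive", 160),
    ("flash card", 100_000),
]:
    print(f"{label}: at most {max_logs_per_second(writes):,.0f} logs/sec")
```

For the flash card this bound is 25,000 logs/sec, so the 20,000 logs/sec quoted above leaves some headroom below the theoretical maximum.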
> >>
> >> realistically very few people need the sustained write capacity that you
> >> would get from such a setup. but if you go with a $500-$700 raid card with
> >> a battery-backed cache you get very similar performance, but with some
> >> possibility that you can't sustain it forever.
> >>
> >> David Lang
> >>
> >>> Rainer
> >>>
> >>> On Wed, 2008-10-01 at 04:55 -0700, david at lang.hm wrote:
> >>>> On Wed, 1 Oct 2008, David Ecker wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> I am looking for a failsafe solution to store syslog messages locally
> >>>>> until they can be sent later. I already looked at the disk based
> >>>>> memory queue and the disk based queue. Neither queue works if you
> >>>>> just power down the system immediately; the whole queue is lost.
> >>>>
> >>>> are you sure about the disk based queue?
> >>>>
> >>>> per file:///usr/src/rsyslog-3.21.4/doc/queues.html the disk based queue
> >>>> can be set to do a commit of the metadata after each message.
> >>>>
> >>>> Disk Queues
> >>>>
> >>>> Disk queues use disk drives for buffering. The important fact is that they
> >>>> always use the disk and do not buffer anything in memory. Thus, the queue
> >>>> is ultra-reliable, but by far the slowest mode. For regular use cases,
> >>>> this queue mode is not recommended. It is useful if log data is so
> >>>> important that it must not be lost, even in extreme cases.
> >>>>
> >>>> When a disk queue is written, it is done in chunks. Each chunk receives
> >>>> its individual file. Files are named with a prefix (set via the
> >>>> "$<object>QueueFilename" config directive) and followed by a 7-digit
> >>>> number (starting at one and incremented for each file). Chunks are 10mb by
> >>>> default; a different size can be set via the "$<object>QueueMaxFileSize"
> >>>> config directive. Note that the size limit is not a sharp one: rsyslog
> >>>> always writes one complete queue entry, even if it violates the size
> >>>> limit. So chunks are actually a little bit (usually less than 1k) larger
> >>>> than the configured size. Each chunk also has a different size for the
> >>>> same reason. If you observe different chunk sizes, you can relax: this is
> >>>> not a problem.
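
Why the size limit is "not sharp" can be seen in a small simulation. This is a sketch of the behavior as the documentation describes it (whole entries are never split, and a new chunk starts once the current one has reached the limit); it is not rsyslog's code, and the function name is hypothetical.

```python
def chunk_sizes(entry_sizes, max_chunk_size):
    """Return the resulting size of each chunk file, in write order."""
    chunks = [0]
    for size in entry_sizes:
        # Start a new chunk only once the current one has reached the limit.
        if chunks[-1] >= max_chunk_size:
            chunks.append(0)
        # An entry is always written completely, even if it crosses the limit.
        chunks[-1] += size
    return chunks


# Seven 400-byte entries with a 1000-byte limit.
print(chunk_sizes([400] * 7, 1000))
```

In this model each full chunk overshoots the limit by less than one entry, and chunks differ in size whenever entries do.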
> >>>>
> >>>> Writing in chunks is used so that processed data can quickly be deleted
> >>>> and is free for other uses - while at the same time keeping no artificial
> >>>> upper limit on disk space used. If a disk quota is set (instructions
> >>>> further below), be sure that the quota/chunk size allows at least two
> >>>> chunks to be written. Rsyslog currently does not check that and will fail
> >>>> miserably if a single chunk is over the quota.
> >>>>
> >>>> Creating new chunks costs performance but provides quicker ability to free
> >>>> disk space. The 10mb default is considered a good compromise between these
> >>>> two. However, it may make sense to adapt these settings to local policies.
> >>>> For example, if a disk queue is written on a dedicated 200gb disk, it may
> >>>> make sense to use a 2gb (or even larger) chunk size.
> >>>>
> >>>> Please note, however, that the disk queue by default does not update its
> >>>> housekeeping structures every time it writes to disk. This is for
> >>>> performance reasons. In the event of failure, data will still be lost
> >>>> (unless the file structures are repaired manually). However, disk
> >>>> queues can be set to write bookkeeping information on checkpoints (every n
> >>>> records), so that this can be made ultra-reliable, too. If the checkpoint
> >>>> interval is set to one, no data can be lost, but the queue is
> >>>> exceptionally slow.
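
The effect of the checkpoint interval can be approximated with a deliberately simplified model: records written since the last checkpoint are not yet reflected in the bookkeeping information and are therefore at risk on a sudden failure. The function below is a hypothetical illustration of that model, not a description of rsyslog internals, and it assumes the checkpoint itself actually reaches the disk.

```python
def records_at_risk(records_written: int, checkpoint_interval: int) -> int:
    """Records written since the last checkpoint (0 or less means no checkpoints)."""
    if checkpoint_interval <= 0:
        # No checkpoints: nothing written so far is covered by bookkeeping data.
        return records_written
    return records_written % checkpoint_interval


for interval in (0, 1, 4, 100):
    print(f"interval {interval}: {records_at_risk(10, interval)} of 10 records at risk")
```

With an interval of one, nothing is at risk in this model, which matches the "no data can be lost" statement above; larger intervals trade that safety for fewer bookkeeping writes.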
> >>>>
> >>>> Each queue can be placed on a different disk for best performance and/or
> >>>> isolation. This is currently selected by specifying different
> >>>> $WorkDirectory config directives before the queue creation statement.
> >>>>
> >>>> To create a disk queue, use the "$<object>QueueType Disk" config
> >>>> directive. Checkpoint intervals can be specified via
> >>>> "$<object>QueueCheckpointInterval", with 0 meaning no checkpoints.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> you also need to specifically enable syncing (from
> >>>> http://www.rsyslog.com/doc-v3compatibility.html )
> >>>>
> >>>> Output File Syncing
> >>>> Rsyslogd tries to keep as compatible to stock syslogd as possible. As
> >>>> such, it retained stock syslogd's default of syncing every file write if
> >>>> not specified otherwise (by placing a dash in front of the output file
> >>>> name). While this was a useful feature in past days where hardware was
> >>>> much less reliable and UPSes were rare, this is no longer useful in today's
> >>>> world. Instead, the syncing is a major performance hit. With it, rsyslogd
> >>>> writes files around 50 *times* slower than without it. It also affects
> >>>> overall system performance due to the high IO activity. In rsyslog v3,
> >>>> syncing has been turned off by default. This is done via a specific
> >>>> configuration directive "$ActionFileEnableSync on/off" which is off by
> >>>> default. So even if rsyslogd finds sync selector lines, it ignores them by
> >>>> default. In order to enable file syncing, the administrator must specify
> >>>> "$ActionFileEnableSync on" at the top of rsyslog.conf. This ensures that
> >>>> syncing only happens in some installations where the administrator
> >>>> actually wanted that (performance-intense) feature. In the vast majority
> >>>> of cases (if not all), this dramatically increases rsyslogd performance
> >>>> without any negative effects.
> >>>>
> >>>>
> >>>>
> >>>>> I already looked at queue.c and it seemed to me that both queues were
> >>>>> not designed for that kind of failure, but I could be wrong there. Since
> >>>>> an immediate power down of the system is the major failure which will
> >>>>> occur pretty often I need to create a solution there.
> >>>>
> >>>> with checkpoint interval set to 1 and syncing enabled the data should be
> >>>> on the disk safely (assuming you have hardware that supports this) and
> >>>> a power-off won't affect it.
> >>>>
> >>>> David Lang
> >>>>
> >>>>
> >>>>
> >>>>> Did you already start to develop something addressing that problem?
> >>>>> Could you help me extend rsyslog (3.18.4) so that I can develop a new
> >>>>> queue myself? I would contribute the code to the rsyslog project if you
> >>>>> would like afterwards.
> >>>>>
> >>>>> bye
> >>>>> David Ecker
> >>>>>
> >>>> _______________________________________________ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com



