[rsyslog] Development of failsafe disk based queue

David Ecker david at ecker-software.de
Wed Oct 1 14:45:06 CEST 2008


Rainer Gerhards wrote:
> On Wed, 2008-10-01 at 05:25 -0700, david at lang.hm wrote:
>   
>> On Wed, 1 Oct 2008, Rainer Gerhards wrote:
>>
>>     
>>> David,
>>>
>>> the file syncing mentioned in the compatibility doc applies to the
>>> output action, only.
>>>       
>> ouch.
>>
>>     
>>> The queue never does synchronous writes - I always assumed that a
>>> critical system would have a UPS and could never think (so far) about a
>>> valid reason for not having it. So the queue would need to have an extra
>>> option to do sync writes. Obviously, that's not a big deal.
>>>       
>> good
>>
>>     
>>> Performance, of course, will be extremely terrible with such a setup...
>>>       
>> only if you have to wait for a spinning disk to do the write.
>>     
>
> I agree with the rest of your argument below. But the question raised here
> was in regard to a system without any battery backup. So I would need to
> wait.
>
> Even then, in the worst case, I think it would be possible that the disk
> does only a partial write. I am not sure if that's really the case with
> today's disk drives (which I think have capacitors to prevent this
> scenario), but with past drives this could happen (I know all too well -
> a few years ago that cost me a weekend ;)).
>
> Rainer
>   
Hi,

as long as you do sector-based writes (usually 512 bytes per sector), you
can be sure that the write wasn't partial. Writing more than one sector,
or not starting at a sector-aligned offset (n*512, n = 0, 1, 2, ...),
might result in a partial write. I have already tested that with my
development client here. So fencing each sector with a CRC32 value would
help to detect errors during a write operation. This is actually only a
problem if you are writing directly to a block device, as any filesystem
does, and yes, reordering is definitely a problem. So validating the
content written to the disk afterwards is important.
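
To illustrate the CRC32 fencing idea, here is a minimal sketch in Python.
The sector layout (2-byte length, zero-padded payload, trailing 4-byte
CRC32) and the names pack_sector/unpack_sector are assumptions made up
for this example; they are not taken from rsyslog or from my client.

```python
import struct
import zlib

SECTOR_SIZE = 512                                   # assumed sector size
CRC_SIZE = 4                                        # trailing CRC32
LEN_SIZE = 2                                        # leading payload length
PAYLOAD_MAX = SECTOR_SIZE - CRC_SIZE - LEN_SIZE     # 506 bytes of payload


def pack_sector(payload: bytes) -> bytes:
    """Build one 512-byte sector: length, padded payload, CRC32 fence."""
    if len(payload) > PAYLOAD_MAX:
        raise ValueError("payload does not fit into one sector")
    body = struct.pack("<H", len(payload)) + payload.ljust(PAYLOAD_MAX, b"\0")
    crc = zlib.crc32(body) & 0xFFFFFFFF
    return body + struct.pack("<I", crc)


def unpack_sector(sector: bytes):
    """Return the payload, or None if the sector fails validation."""
    if len(sector) != SECTOR_SIZE:
        return None
    body = sector[:-CRC_SIZE]
    (stored_crc,) = struct.unpack("<I", sector[-CRC_SIZE:])
    if zlib.crc32(body) & 0xFFFFFFFF != stored_crc:
        return None                                 # damaged or torn sector
    (length,) = struct.unpack("<H", body[:LEN_SIZE])
    if length > PAYLOAD_MAX:
        return None
    return body[LEN_SIZE:LEN_SIZE + length]
```

A reader that validates the queue after a power loss would call
unpack_sector() on every sector and treat a None result as "this sector
was not written completely".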

If you write through a filesystem, reserving space in the destination
file beforehand actually minimizes errors, since the file system table
doesn't have to be updated on each write (you should also use the
O_NOATIME flag in that case). See, for example, how VMware ESX handles
VMDK files.
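
A rough sketch of the preallocation approach in Python, assuming a
Linux-like system. The helper names open_preallocated/write_sector and
the 512-byte record size are made up for illustration. O_NOATIME is
Linux-specific and only permitted for the file's owner (or a privileged
process), so the sketch falls back to no flag where it is missing.

```python
import os

SECTOR_SIZE = 512  # assumed record size, one write per sector


def open_preallocated(path: str, size: int) -> int:
    """Open (or create) a queue file and reserve `size` bytes up front."""
    # O_NOATIME is not available everywhere; use 0 if it is missing.
    flags = os.O_RDWR | os.O_CREAT | getattr(os, "O_NOATIME", 0)
    fd = os.open(path, flags, 0o600)
    try:
        if os.fstat(fd).st_size < size:
            if hasattr(os, "posix_fallocate"):
                os.posix_fallocate(fd, 0, size)   # really allocates blocks
            else:
                os.ftruncate(fd, size)            # fallback: size only, may be sparse
        os.fsync(fd)                              # persist the new file size
    except OSError:
        os.close(fd)
        raise
    return fd


def write_sector(fd: int, index: int, sector: bytes) -> None:
    """Overwrite one sector in place at a sector-aligned offset, then sync."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("sector must be exactly SECTOR_SIZE bytes")
    written = os.pwrite(fd, sector, index * SECTOR_SIZE)
    if written != SECTOR_SIZE:
        raise OSError("short write")
    os.fsync(fd)
```

Because the file never grows after open_preallocated(), every later
write_sector() call only touches data blocks that already exist.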

David

>   
>> this is the same problem that databases have. they need to guarantee that 
>> once the database tells the writing program that the data is written it 
>> will be there even if the system loses power immediately.
>>
>> if you run a database on standard desktop hardware (and it doesn't have 
>> this safety disabled) you cannot do more than about 80 writes/second. If 
>> you upgrade to the super speedy 15K rpm drives you can do ~160 
>> writes/second.
>>
>> given that you need to write the data + metadata it gets even uglier, so 
>> what the databases do (and some journaling filesystems) is to write a log 
>> that says what they are going to do, sync that, and then later write the 
>> data to the actual files (updating the journal when they complete the 
>> write)
>>
>> it sounds like you order your writes correctly for a disk-based queue, but 
>> you would need the option of issuing the syncs (probably when you do the 
>> checkpoints)
>>
>> if you do this on the wrong hardware (say a laptop 5200 rpm drive or the 
>> wrong flash drive), the fact that you need to do four writes per log entry 
>> (data to queue, metadata to queue, data to output, update metadata for 
>> queue) could drop you to below 15 logs/sec (60/4 but then you lose time 
>> to seeking as well)
>>
>> however, with the correct drive to write to (say a $2,400 80G fusion-io 
>> flash card that can do ~100k IO ops/sec) you should be able to sustain 
>> 20,000 logs/sec.
>>
>> realistically very few people need the sustained write capacity that you 
>> would get from such a setup. but if you go with a $500-$700 raid card with 
>> a battery-backed cache you get very similar performance, but with some 
>> possibility that you can't sustain it forever.
>>
>> David Lang
>>
>>     
>>> Rainer
>>>
>>> On Wed, 2008-10-01 at 04:55 -0700, david at lang.hm wrote:
>>>       
>>>> On Wed, 1 Oct 2008, David Ecker wrote:
>>>>
>>>>         
>>>>> Hi,
>>>>>
>>>>> I am looking for a failsafe solution to store syslog messages locally
>>>>> until they can be sent later. I already looked at the disk-based
>>>>> memory queue and the disk-based queue. Neither queue works if you
>>>>> just power down the system immediately; the whole queue is actually lost.
>>>>>           
>>>> are you sure about the disk based queue?
>>>>
>>>> per file:///usr/src/rsyslog-3.21.4/doc/queues.html the disk based queue
>>>> can be set to do a commit of the metadata after each message.
>>>>
>>>> Disk Queues
>>>>
>>>> Disk queues use disk drives for buffering. The important fact is that they
>>>> always use the disk and do not buffer anything in memory. Thus, the queue
>>>> is ultra-reliable, but by far the slowest mode. For regular use cases,
>>>> this queue mode is not recommended. It is useful if log data is so
>>>> important that it must not be lost, even in extreme cases.
>>>>
>>>> When a disk queue is written, it is done in chunks. Each chunk receives
>>>> its individual file. Files are named with a prefix (set via the
>>>> "$<object>QueueFilename" config directive) and followed by a 7-digit
>>>> number (starting at one and incremented for each file). Chunks are 10mb by
>>>> default, a different size can be set via the "$<object>QueueMaxFileSize"
>>>> config directive. Note that the size limit is not a sharp one: rsyslog
>>>> always writes one complete queue entry, even if it violates the size
>>>> limit. So chunks are actually a little bit (usually less than 1k) larger
>>>> than the configured size. Each chunk also has a different size for the
>>>> same reason. If you observe different chunk sizes, you can relax: this is
>>>> not a problem.
>>>>
>>>> Writing in chunks is used so that processed data can quickly be deleted
>>>> and is free for other uses - while at the same time keeping no artificial
>>>> upper limit on disk space used. If a disk quota is set (instructions
>>>> further below), be sure that the quota/chunk size allows at least two
>>>> chunks to be written. Rsyslog currently does not check that and will fail
>>>> miserably if a single chunk is over the quota.
>>>>
>>>> Creating new chunks costs performance but provides quicker ability to free
>>>> disk space. The 10mb default is considered a good compromise between these
>>>> two. However, it may make sense to adapt these settings to local policies.
>>>> For example, if a disk queue is written on a dedicated 200gb disk, it may
>>>> make sense to use a 2gb (or even larger) chunk size.
>>>>
>>>> Please note, however, that the disk queue by default does not update its
>>>> housekeeping structures every time it writes to disk. This is for
>>>> performance reasons. In the event of failure, data will still be lost
>>>> (except when manually is mangled with the file structures). However, disk
>>>> queues can be set to write bookkeeping information on checkpoints (every n
>>>> records), so that this can be made ultra-reliable, too. If the checkpoint
>>>> interval is set to one, no data can be lost, but the queue is
>>>> exceptionally slow.
>>>>
>>>> Each queue can be placed on a different disk for best performance and/or
>>>> isolation. This is currently selected by specifying different
>>>> $WorkDirectory config directives before the queue creation statement.
>>>>
>>>> To create a disk queue, use the "$<object>QueueType Disk" config
>>>> directive. Checkpoint intervals can be specified via
>>>> "$<object>QueueCheckpointInterval", with 0 meaning no checkpoints.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> you also need to specifically enable syncing (from
>>>> http://www.rsyslog.com/doc-v3compatibility.html )
>>>>
>>>> Output File Syncing
>>>> Rsyslogd tries to keep as compatible to stock syslogd as possible. As
>>>> such, it retained stock syslogd's default of syncing every file write if
>>>> not specified otherwise (by placing a dash in front of the output file
>>>> name). While this was a useful feature in past days where hardware was
>>>> much less reliable and UPSes were rare, this no longer is useful in today's
>>>> world. Instead, the syncing is a high performance hit. With it, rsyslogd
>>>> writes files around 50 *times* slower than without it. It also affects
>>>> overall system performance due to the high IO activity. In rsyslog v3,
>>>> syncing has been turned off by default. This is done via a specific
>>>> configuration directive "$ActionFileEnableSync on/off" which is off by
>>>> default. So even if rsyslogd finds sync selector lines, it ignores them by
>>>> default. In order to enable file syncing, the administrator must specify
>>>> "$ActionFileEnableSync on" at the top of rsyslog.conf. This ensures that
>>>> syncing only happens in some installations where the administrator
>>>> actually wanted that (performance-intense) feature. In the vast majority
>>>> of cases (if not all), this dramatically increases rsyslogd performance
>>>> without any negative effects.
>>>>
>>>>
>>>>
>>>>         
>>>>> I already looked at queue.c and it seemed to me that both queues were
>>>>> not designed for that kind of failure, but I could be wrong there. Since
>>>>> an immediate power down of the system is the major failure which will
>>>>> occur pretty often, I need to create a solution there.
>>>>>           
>>>> with checkpoint interval set to 1 and syncing enabled the data should be
>>>> safely on the disk (assuming you have hardware that supports this) and
>>>> a power-off won't affect it.
>>>>
>>>> David Lang
>>>>
>>>>
>>>>
>>>>         
>>>>> Did you already start to develop something addressing that problem?
>>>>> Could you help me extend rsyslog (3.18.4) so that I can develop a new
>>>>> queue myself? I would contribute the code to the rsyslog project if you
>>>>> would like afterwards.
>>>>>
>>>>> bye
>>>>> David Ecker
>>>>>
>>>>>           
>>>> _______________________________________________ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com
>>>>         
>>>
>>>       
>>     
>
>   


