[rsyslog] Development of failsafe disk based queue
David Ecker
david at ecker-software.de
Wed Oct 1 15:20:29 CEST 2008
Hi,
this is already the second version of this kind of system that we are
developing. Not being able to do error analysis because of missing log
data was one of the big problems, filesystem crashes included. Having all
logs and being able to prove the cause also helps a lot afterwards, not
only for creating a workaround for the incident but also for proving that
the Service Level Agreement wasn't violated.
bye
David Ecker
Rainer Gerhards wrote:
> Sorry, I overlooked this mail in the big bunch of messages. That's good
> reasoning.
>
> To cover these scenarios, we need to do everything with syncing. This
> also means that you can not use any of the disk-assisted modes, because
> in these modes we always try to keep things in memory in order to save
> writes.
>
> So while you have convinced me things can go wrong, I'd still say that
> it is very unusual (at least very costly) to care for all these things.
> But, of course, there are situations where it is needed. I'll probably
> see that I provide a facility to open files in "always sync" mode, but
> that for sure will not be the default setting ;)
>
> But even with the fast solid state disks (and similar methods) you
> mention, I think there will be a severe impact on performance because
> everything now needs to go through two write (data+metadata) and two
> read (again, data+metadata) OS calls where we currently simply update an
> in-memory structure.
>
> Just out of curiosity: do you expect the majority of your rollouts to be
> using such methods?
>
> Rainer
>
> On Wed, 2008-10-01 at 05:35 -0700, david at lang.hm wrote:
>
>>> ... And I have never heard of anybody doing serious datacenter work
>>> without a proper UPS. Is this *really* an issue?
>>>
>> Yes.
>>
>> UPSs fail.
>> generators fail
>> power cords come loose.
>> power cords get unplugged by someone who thinks they are unplugging a
>> different system
>> people bump power switches on power strips.
>> power supplies are defective
>>
>> I had one production outage where a visiting tech pulled a power cord from
>> an overhead plug and dropped it on the ground, where it happened to hit
>> the power switch on a power strip.
>>
>> I've had high-end systems with redundant power supplies go down because of
>> faulty hardware that decided to disable both power supplies at once (it
>> turned out that there was a defect in the whole batch of servers, but it
>> took IBM several weeks to figure out what was going on)
>>
>> I've had UPS systems blow up (literally)
>>
>> I've had a datacenter go down because it was running on generator
>> power (due to other issues), and the refueling guy filled the tank
>> incorrectly and got air bubbles into the fuel system. A few minutes later
>> the 500 kW diesel generator couldn't maintain constant speed, and the
>> safety triggers kicked in and disabled it.
>>
>> it's amazing the things that happen in real life
>>
>
>
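
The "always sync" mode discussed in the quoted message can be illustrated
with plain POSIX calls. The following is only a minimal Python sketch, not
rsyslog's implementation: the helper names open_always_sync, write_record
and demo are hypothetical, and it assumes a POSIX system where O_SYNC is
available (fsync() is called as well, so it still syncs where it is not).

```python
import os
import tempfile


def open_always_sync(path):
    """Open a file for appending so that every write is synchronous.

    O_SYNC asks the kernel to complete each write() only after the data and
    the file metadata needed to read it back have reached the device.
    getattr() falls back to 0 on platforms that lack the flag (assumption:
    the explicit fsync() in write_record then provides the durability).
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND | getattr(os, "O_SYNC", 0)
    return os.open(path, flags, 0o600)


def write_record(fd, record):
    """Write one record completely, then force it to stable storage."""
    while record:
        written = os.write(fd, record)  # may write fewer bytes than asked
        record = record[written:]
    os.fsync(fd)  # redundant with O_SYNC, required without it


def demo():
    """Write two records to a scratch file and return the file contents."""
    directory = tempfile.mkdtemp()
    path = os.path.join(directory, "queue.dat")  # hypothetical queue file
    fd = open_always_sync(path)
    try:
        write_record(fd, b"msg 1\n")
        write_record(fd, b"msg 2\n")
    finally:
        os.close(fd)
    with open(path, "rb") as handle:
        return handle.read()
```

Every call to write_record waits for the device, which is the performance
cost raised in the thread: an in-memory queue update is replaced by at
least one synchronous disk write per message.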