[rsyslog] rsyslog hangs at startup
a.piesk at gmx.net
Wed Nov 2 22:31:42 CET 2011
On 02.11.2011 08:32, Rainer Gerhards wrote:
> we had a public holiday over here yesterday. Thanks for your analysis, this
same here :)
> sounds very interesting (and probably explains why I haven't seen it so far).
> I need to review the code, it's quite a while since I looked at that place
> the last time, seems it has "worked" for a couple of years. I'll do so later
> today and let you know the findings. We may need to further instrument/try
> out some patches...
so save you some time: for my tests i already instrumented the code, basically dbgprintf() around
the critical pthread_mutex_* calls.
the problem here is, the debug statements change the timing and make it a lot harder to get a dead
lock. after a lot of runs i managed to get a dead lock with the instrumented rsyslogd (patch and
complete log attached). the interesting part:
7592.695797000:2b073f035320: mutLoadUnload locked <- thread A got mutLoadUnload
7592.695806000:2b073f035320: loading module '/lib64/rsyslog/imfile.so'
7592.695840000:420bc940: mutLoadUnload released <- this a strange
7592.695862000:420bc940: source file netstrms.c requested reference for module 'lmnsd_ptcp',
reference count now 1
7592.695868000:420bc940: mutObjGlobalOp released
7592.695885000:420bc940: locking mutObjGlobalOp ...
7592.695890000:420bc940: mutObjGlobalOp locked
7592.695901000:420bc940: mutObjGlobalOp released
7592.696989000:420bc940: TCP sent 181 bytes, requested 181
7592.696999000:420bc940: Action 0x2b0757efe5a0 transitioned to state: rdy
7592.697004000:420bc940: action 0x2b0757efe5a0 call returned 0
7592.697015000:420bc940: Action 0x2b0757efe5a0 transitioned to state: itx
7592.697143000:420bc940: entering actionCalldoAction(), state: itx
7592.697159000:420bc940: CheckConnection detected broken connection - closing it
7592.697173000:420bc940: locking mutObjGlobalOp ...
7592.697177000:420bc940: mutObjGlobalOp locked <- thread B got mutObjGlobalOp
7592.697184000:420bc940: file netstrms.c released module 'lmnsd_ptcp', reference count now 0
7592.697188000:420bc940: module 'lmnsd_ptcp' has zero reference count, unloading...
7592.697192000:420bc940: locking mutLoadUnload ... <- thread B wants mutLoadUnload held by A
7592.703794000:2b073f035320: locking mutObjGlobalOp ... <- thread A wants mutObjGlobalOp held by B
7608.246465000:2b073f035320: DoDie called. <- i did it :)
getting mutObjGlobalOp before mutLoadUnload and releasing it afterwards might by a solution but
there may be better ones.
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
More information about the rsyslog