[rsyslog-notify] Forum Thread: Sending of log entries via TCP hangs in Drvr.Send under load - (Mode 'post')
noreply at adiscon.com
noreply at adiscon.com
Sun Feb 28 18:08:42 CET 2016
User: mgerhardt
Forumlink: http://kb.monitorware.com/viewtopic.php?p=26379#p26379
Message:
----------
After hours of analysing and debugging I nailed down a current issue that
we have with rsyslogd 5.8.6-1ubuntu8.9 (running under ubuntu 12.04) to
the catched debug output, that tells me that client rsyslogd sending hangs
in the Drvr.Send (runtime/netstrm.c) call. When client rsyslog hangs,
tomcat still sends directly its logs to the server, so connectivity in
general is there.
Here is the story behind. We have a cloud application in OpenStack
environment running for 2 years now. Setup is a bunch of apache/tomcat
servers which logging frequently to an admin VM. All works fine with
thousands of log entries every day. But now we like to move to another
datacenter and we made load tests to verify the new plattform. There it
came out that under load the sending of logs to the server stucks after
some minutes and first come back when load is reduced again. One client
should send ~40 messages a second. Network should be sufficient and the
application function itself does not suffer from the load at all.
So i took a 5.8.6 rsyslog source package, compiled it and add some debug
statements to find the reason for the haning. It came out the it hangs in
the Drvr.Send call. I verified this with debug statements before and after
and the after one first appears after the pause, when the load is reduced
again. In the pause time the messages are queued, which works, but fills up
local discs and produces an bust of load when it come back.
[code:nix0ofi0]static rsRetVal
Send(netstrm_t *pThis, uchar *pBuf, ssize_t *pLenBuf)
{
DEFiRet;
ISOBJ_TYPE_assert(pThis, netstrm);
dbgprintf("TCPDEBUG netstr before Send buffer: %p \n", pBuf);
iRet = pThis->Drvr.Send(pThis->pDrvrData, pBuf, pLenBuf);
dbgprintf("TCPDEBUG after Send buffer: %p \n", pBuf);
RETiRet;
}[/code:nix0ofi0]
[code:nix0ofi0]> tail -f rsyslog.log | grep TCPDEBUG
3160.784898157:7f4e20a35700: TCPDEBUG sent 454 bytes, requested
454
3160.789896012:7f4e20a35700: TCPDEBUG CheckConnection
3160.789918492:7f4e20a35700: TCPDEBUG TCP send 453 bytes
buffer: 0x7f4e14001be0
3160.789932910:7f4e20a35700: TCPDEBUG netstr before Send
buffer: 0x7f4e14001be0
<hanging>
3696.004162400:7f4e20a35700: TCPDEBUG after Send buffer:
0x7f4e14001be0
3696.004262513:7f4e20a35700: TCPDEBUG sent 453 bytes, requested
453
3696.004763511:7f4e137fe700: TCPDEBUG CheckConnection
3696.004789914:7f4e137fe700: TCPDEBUG TCP send 432 bytes
buffer: 0x7f4e040042e0
3696.004808967:7f4e137fe700: TCPDEBUG netstr before Send
buffer: 0x7f4e040042e0[/code:nix0ofi0]
Because I saw, that only one action has the property FilterOK set to 1, the
"hanging" action is the only one that even comes to the place where
something is tried to be sent.
Is there an explication for this?
Thanks for any help and hints
Michael
More information about the rsyslog-notify
mailing list