[rsyslog] Possible bug? 100% cpu on a server thread and the client stops sending data
Nathan March
nathan at gossamer-threads.com
Thu May 21 20:27:53 CEST 2009
Whoops, that's probably the most critical thing =) It's on 3.22.0.
- Nathan
david at lang.hm wrote:
> one thing that looks like it's missing from this report is what version of
> rsyslog you are running.
>
> David Lang
>
> On Thu, 21 May 2009, Nathan March wrote:
>
>
>> Date: Thu, 21 May 2009 11:12:45 -0700
>> From: Nathan March <nathan at gossamer-threads.com>
>> Reply-To: rsyslog-users <rsyslog at lists.adiscon.com>
>> To: rsyslog at lists.adiscon.com
>> Subject: [rsyslog] Possible bug? 100% cpu on a server thread and the client
>> stops sending data
>>
>> Having a weird issue here..... Testing out a new rsyslog deployment and
>> I've got around 30 servers logging to one machine. On one of the clients
>> we had an issue where ssh was mysteriously very slow to login and
>> tracked it down to rsyslog causing issues. Simultaneously, one of the
>> threads on the logging server spiked up to 100%. New log data from the
>> client would stop showing up on the server at this poitn.
>>
>> On the client after attempting to gracefully stop rsyslog:
>>
>> vanguard etc # ps aux | grep -i rsyslog
>> root 3764 0.0 0.1 5092 2564 ? S 15:06 0:00
>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f /etc/rsyslog.conf
>> root 3766 0.0 0.1 5092 2564 ? S 15:06 0:00
>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f /etc/rsyslog.conf
>> root 6203 0.0 0.0 1512 524 pts/3 S 15:09 0:00 grep -i
>> rsyslog
>> vanguard etc # strace -p 3764
>> Process 3764 attached - interrupt to quit
>> send(11, "\25\3\2\1\0uE\254\304\n\311\377\204}p\237O\211\322\211"...,
>> 261, 0 <unfinished ...>
>> Process 3764 detached
>>
>> FD 11 being the socket to the logging server, it never does anything
>> asides from that send.
>>
>> On the server:
>>
>> ldap ~ # ps aux | grep rsyslog
>> root 7370 0.0 0.0 83936 3380 ? S 14:54 0:00
>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f /etc/rsyslog/rsyslog.conf
>> root 7379 0.0 0.0 83936 3380 ? S 14:54 0:00
>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f /etc/rsyslog/rsyslog.conf
>> root 7380 0.0 0.0 83936 3380 ? S 14:54 0:00
>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f /etc/rsyslog/rsyslog.conf
>> root 7381 0.0 0.0 83936 3380 ? S 14:54 0:00
>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f /etc/rsyslog/rsyslog.conf
>> root 7382 0.0 0.0 83936 3380 ? S 14:54 0:00
>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f /etc/rsyslog/rsyslog.conf
>> root 7383 0.0 0.0 83936 3380 ? S 14:54 0:00
>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f /etc/rsyslog/rsyslog.conf
>> root 7384 0.0 0.0 83936 3380 ? S 14:54 0:00
>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f /etc/rsyslog/rsyslog.conf
>> root 7385 28.3 0.0 83936 3380 ? R 14:54 5:07
>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f /etc/rsyslog/rsyslog.conf
>>
>> Stracing the 7385 pid just shows this repeating over and over (FD 4
>> being the network socket again):
>> accept(4, 0x2b0f67658ca0, [7450515777377009792]) = -1 EAGAIN (Resource
>> temporarily unavailable)
>> accept(4, 0x2b0f67658ca0, [7450515777377009792]) = -1 EAGAIN (Resource
>> temporarily unavailable)
>>
>> It happened 3 times within a relatively short period of time, sometimes
>> within minutes of me restarting rsyslog on both machines. I enabled
>> debugging and it took around 8 hours to pop up again.
>>
>> With debugging, I pulled this from the rsyslog server:
>>
>> 6640.408108000:imtcp.c: New connect on NSD 0x59ddc0.
>> 6640.408129000:imtcp.c: hasRcvInBuffer on nsd 0x5a0550: pszRcvBuf (nil),
>> lenRcvBuf 0
>> 6640.408144000:imtcp.c: hasRcvInBuffer on nsd 0x59dcd0: pszRcvBuf
>> 0x5c11c0, lenRcvBuf -1
>> 6640.408158000:imtcp.c: hasRcvInBuffer on nsd 0x59dc10: pszRcvBuf
>> 0x5cb510, lenRcvBuf -1
>> 6640.408171000:imtcp.c: hasRcvInBuffer on nsd 0x5c3d70: pszRcvBuf
>> 0x5d45c0, lenRcvBuf -1
>> 6640.408185000:imtcp.c: hasRcvInBuffer on nsd 0x5cf460: pszRcvBuf
>> 0x5dd010, lenRcvBuf -1
>> 6640.408199000:imtcp.c: hasRcvInBuffer on nsd 0x5d8600: pszRcvBuf
>> 0x5e58b0, lenRcvBuf -1
>> 6640.408213000:imtcp.c: hasRcvInBuffer on nsd 0x59d690: pszRcvBuf
>> 0x5ee150, lenRcvBuf -1
>> 6640.408227000:imtcp.c: hasRcvInBuffer on nsd 0x59d2f0: pszRcvBuf
>> 0x5f7320, lenRcvBuf -1
>> 6640.408241000:imtcp.c: hasRcvInBuffer on nsd 0x5e8840: pszRcvBuf
>> 0x5fe0a0, lenRcvBuf -1
>> 6640.408255000:imtcp.c: hasRcvInBuffer on nsd 0x6027b0: pszRcvBuf
>> 0x608350, lenRcvBuf -1
>> 6640.408269000:imtcp.c: hasRcvInBuffer on nsd 0x5c3660: pszRcvBuf
>> 0x612110, lenRcvBuf -1
>> 6640.408283000:imtcp.c: hasRcvInBuffer on nsd 0x602690: pszRcvBuf
>> 0x615040, lenRcvBuf -1
>> 6640.408296000:imtcp.c: hasRcvInBuffer on nsd 0x5c3b60: pszRcvBuf
>> 0x6218f0, lenRcvBuf -1
>> 6640.408310000:imtcp.c: hasRcvInBuffer on nsd 0x60c150: pszRcvBuf
>> 0x62a190, lenRcvBuf -1
>> 6640.408324000:imtcp.c: hasRcvInBuffer on nsd 0x625420: pszRcvBuf
>> 0x634fe0, lenRcvBuf -1
>> 6640.408338000:imtcp.c: hasRcvInBuffer on nsd 0x6117b0: pszRcvBuf
>> 0x63b000, lenRcvBuf -1
>> 6640.408352000:imtcp.c: hasRcvInBuffer on nsd 0x603990: pszRcvBuf
>> 0x643c70, lenRcvBuf -1
>> 6640.408365000:imtcp.c: hasRcvInBuffer on nsd 0x633c00: pszRcvBuf
>> 0x64dbb0, lenRcvBuf -1
>> 6640.408379000:imtcp.c: hasRcvInBuffer on nsd 0x62cfa0: pszRcvBuf
>> 0x650960, lenRcvBuf -1
>> 6640.408393000:imtcp.c: hasRcvInBuffer on nsd 0x633d10: pszRcvBuf
>> 0x65dbc0, lenRcvBuf -1
>> 6640.408407000:imtcp.c: hasRcvInBuffer on nsd 0x5f2860: pszRcvBuf
>> 0x666020, lenRcvBuf -1
>> 6640.408421000:imtcp.c: hasRcvInBuffer on nsd 0x64d250: pszRcvBuf
>> 0x66e480, lenRcvBuf -1
>> 6640.408435000:imtcp.c: hasRcvInBuffer on nsd 0x659b30: pszRcvBuf
>> 0x676d20, lenRcvBuf 78
>> 6640.408448000:imtcp.c: hasRcvInBuffer on nsd 0x669fb0: pszRcvBuf
>> 0x67fdd0, lenRcvBuf -1
>> 6640.408462000:imtcp.c: hasRcvInBuffer on nsd 0x59a630: pszRcvBuf
>> 0x687f10, lenRcvBuf -1
>> 6640.408476000:imtcp.c: hasRcvInBuffer on nsd 0x6614c0: pszRcvBuf
>> 0x6907b0, lenRcvBuf -1
>> 6640.408490000:imtcp.c: hasRcvInBuffer on nsd 0x60c2d0: pszRcvBuf
>> 0x699050, lenRcvBuf -1
>> 6640.408504000:imtcp.c: hasRcvInBuffer on nsd 0x5a0550: pszRcvBuf (nil),
>> lenRcvBuf 0
>>
>> This repeats enough to generate around 29 million lines in the debug log
>> file.
>>
>> In the client log, there's plenty of these which seems somewhat normal:
>>
>> 6509.743164000:imuxsock.c: --------imuxsock calling select, active file
>> descriptors (max 11): 11
>> 6509.743231000:imuxsock.c: Message from UNIX socket: #11
>> 6509.743262000:imuxsock.c: logmsg: flags 4, from 'vanguard', msg May 20
>> 23:15:09 bin/qmail-local[23198]: using .qmail .qmail-cfsc-forum
>> 6509.743281000:imuxsock.c: Message has legacy syslog format.
>> 6509.743302000:imuxsock.c: main queue: entry added, size now 579 entries
>> 6509.743325000:imuxsock.c: wtpAdviseMaxWorkers signals busy
>> 6509.743343000:imuxsock.c: main queue: EnqueueMsg advised worker start
>>
>> However comparing it to an earlier point in the log there's none of the
>> associated tcp queue entries like this:
>>
>> 3587.234178000:main queue:Reg/w0: TCP sent 107 bytes, requested 107
>> 3587.234204000:main queue:Reg/w0: main queue: entering rate limiter
>> 3587.234229000:main queue:Reg/w0: main queue: entry deleted, state 0,
>> size now 0 entries
>> 3587.234255000:main queue:Reg/w0: Called action, logging to builtin-file
>> 3587.234285000:main queue:Reg/w0: (/var/log/maillog)
>> 3587.234314000:main queue:Reg/w0: Called action, logging to builtin-fwd
>> 3587.234340000:main queue:Reg/w0: ldap.nmsrv.com
>> 3587.234359000:main queue:Reg/w0: ldap.nmsrv.com:10514/tcp
>> 3587.234425000:main queue:Reg/w0: TCP sent 78 bytes, requested 78
>> 3587.234446000:main queue:Reg/w0: main queue: entering rate limiter
>> 3587.234471000:main queue:Reg/w0: main queue:Reg/w0: worker IDLE,
>> waiting for work.
>>
>> Anyone able to shed some light on this situation? I've got the full
>> debug logs if it's useful to anyone to take a closer look.
>>
>> Thanks!
>>
>> - Nathan
>>
>>
>> ------------------------------------------------------------------------------------
>> Server config file
>> ------------------------------------------------------------------------------------
>>
>> $ModLoad immark.so # provides --MARK-- message capability
>> $ModLoad imuxsock.so # provides support for local system logging (e.g.
>> via logger command)
>> $ModLoad imklog.so # kernel logging (formerly provided by rklogd)
>>
>> $umask 0137
>> $DirCreateMode 0640
>> $FileCreateMode 0640
>> $FileOwner root
>> $FileGroup admin
>>
>> $template DynFile,"/var/log/rsyslog/%HOSTNAME%/messages"
>> $template DynFileCron,"/var/log/rsyslog/%HOSTNAME%/cron"
>> $template DynFileSecure,"/var/log/rsyslog/%HOSTNAME%/secure"
>> $template DynFileMail,"/var/log/rsyslog/%HOSTNAME%/maillog"
>> $template DynFileSpool,"/var/log/rsyslog/%HOSTNAME%/spooler"
>> $template DynFileBoot,"/var/log/rsyslog/%HOSTNAME%/boot"
>> $template DynFileSyslog,"/var/log/rsyslog/%HOSTNAME%/syslog"
>>
>> :msg, contains, "no keys found for" ~
>> :msg, contains, "session opened for user" ~
>>
>> # Log all kernel messages to the console.
>> # Logging much else clutters up the screen.
>> kern.* /dev/console
>>
>> # Log anything (except mail) of level info or higher.
>> # Don't log private authentication messages!
>> *.info;mail.none;cron.none ?DynFile
>>
>> *.warn;\
>> authpriv.none;cron.none;mail.none;news.none ?DynFileSyslog
>>
>> # Log all the mail messages in one place.
>> mail.* ?DynFileMail
>>
>> # Log cron stuff
>> cron.* ?DynFileCron
>>
>> # Everybody gets emergency messages
>> *.emerg *
>>
>> # Save news errors of level crit and higher in a special file
>> uucp,news.crit ?DynFileSpool
>>
>> # Save boot messages also to boot.log
>> local7.* ?DynFileBoot
>>
>> # Relp config for PCI
>> $ModLoad imrelp
>> $InputRELPServerRun 2515
>>
>> # make gtls driver the default
>> $DefaultNetstreamDriver gtls
>>
>> # certificate files
>> $DefaultNetstreamDriverCAFile /etc/ssl/ca.pem
>> $DefaultNetstreamDriverCertFile /etc/ssl/servercrt.pem
>> $DefaultNetstreamDriverKeyFile /etc/ssl/serverkey.pem
>>
>> $ModLoad imtcp.so
>>
>> $InputTCPServerStreamDriverMode 1 # run driver in TLS-only mode
>> $InputTCPServerStreamDriverAuthMode anon # client is NOT authenticated
>> $InputTCPServerRun 10514 # start up listener at port 10514
>> $InputTCPMaxSessions 500
>>
>> ------------------------------------------------------------------------------------
>> Client config file
>> ------------------------------------------------------------------------------------
>> $ModLoad imuxsock.so # provides support for local system logging (e.g.
>> via logger command)
>> $ModLoad imklog.so # kernel logging (formerly provided by rklogd)
>>
>> $umask 0137
>> $DirCreateMode 0640
>> $FileCreateMode 0640
>> $FileOwner root
>> $FileGroup admin
>>
>>
>> # Log anything 'info' or higher, but lower than 'warn'.
>> # Exclude authpriv, cron, mail, and news. These are logged elsewhere.
>> *.info;*.!warn;\
>> authpriv.none;cron.none;mail.none;news.none /var/log/messages
>>
>> # Log anything 'warn' or higher.
>> # Exclude authpriv, cron, mail, and news. These are logged elsewhere.
>> *.warn;\
>> authpriv.none;cron.none;mail.none;news.none /var/log/syslog
>>
>> # Debugging information is logged here.
>> *.=debug /var/log/debug
>>
>> # Private authentication message logging:
>> authpriv.* /var/log/secure
>>
>> # Cron related logs:
>> cron.* /var/log/cron
>>
>> # Mail related logs:
>> mail.* /var/log/maillog
>>
>> # Emergency level messages go to all users:
>> *.emerg *
>>
>>
>> ## Uncomment these lines to use RELP instead for PCI compliance (stunnel
>> required)
>> #$ModLoad omrelp
>> #*.*;mail.none :omrelp:localhost:2515;RSYSLOG_ForwardFormat
>>
>> # certificate files - just CA for a client
>> $DefaultNetstreamDriverCAFile /etc/ssl/ca.pem
>>
>> # set up the action
>> $DefaultNetstreamDriver gtls # use gtls netstream driver
>> $ActionSendStreamDriverMode 1 # require TLS for the connection
>> $ActionSendStreamDriverAuthMode anon # server is NOT authenticated
>> *.* @@(o)ldap.nmsrv.com:10514 # send (all) messages
>>
>> _______________________________________________
>> rsyslog mailing list
>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>> http://www.rsyslog.com
>>
>>
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com
>
More information about the rsyslog
mailing list