[rsyslog] Possible bug? 100% cpu on a server thread and the client stops sending data
Rainer Gerhards
rgerhards at hq.adiscon.com
Sun Jun 7 20:46:54 CEST 2009
Please give v4 a try, the bug is potentially fixed there. If it actually is,
this gives me a clue of where to look at.
Rainer
> -----Original Message-----
> From: rsyslog-bounces at lists.adiscon.com
> [mailto:rsyslog-bounces at lists.adiscon.com] On Behalf Of Nathan March
> Sent: Saturday, June 06, 2009 12:09 AM
> To: rsyslog-users
> Subject: Re: [rsyslog] Possible bug? 100% cpu on a server
> thread and the client stops sending data
>
> Any hope of this bug getting some attention soon? =)
>
> Thanks,
> Nathan
>
>
> Patrick Shen wrote:
> > Hi Rainer,
> >
> > Thanks for the quick reply.
> >
> > My environment is like below:
> >
> > CLIENT ---
> > |
> > | (TCP)
> > --------> SERVER ----
> > |
> > | (RELP Relay)
> > --------------> SERVER Standby
> >
> > I've tested, even if the standby server is under high load,
> one of our
> > clients won't accept any new network connections.
> >
> > Just for your information.
> >
> > Best regards,
> > Patrick
> >
> > Rainer Gerhards wrote:
> >
> >> Hi all,
> >>
> >> thanks for the detailed info. Will look at it asap, but I
> am currently tied
> >> up with some other work...
> >>
> >> Please let me know anything else that may be relevant.
> >>
> >> Thanks,
> >> Rainer
> >>
> >>
> >>> -----Original Message-----
> >>> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
> >>> bounces at lists.adiscon.com] On Behalf Of Patrick Shen
> >>> Sent: Monday, May 25, 2009 8:17 AM
> >>> To: rsyslog-users
> >>> Subject: Re: [rsyslog] Possible bug? 100% cpu on a server
> thread and the
> >>> client stops sending data
> >>>
> >>> Hi Nathan,
> >>>
> >>> In my company, we have the similar symptom. We have 50+
> servers which
> >>> send logs to the central logging servers. Both clients
> and server use
> >>> rsyslog v3.20.5 right now. In the past when we use
> v3.20.0, the symptom
> >>> is more common when server is under high load average.
> >>>
> >>> It's also weird for us, not all of clients were lost of
> connections. We
> >>> have 2-3 clients which is running the same application
> will looks like
> >>> hanging up when server is high-load. New ssh connection
> is very slow at
> >>> that time. But the connected ssh session works normal (We
> have some
> >>> prepared screen session on other clients which connect to
> them via ssh).
> >>>
> >>> When the clients are lost of connections. So far our
> solution is to
> >>> restart rsyslog both on affected clients and server. Then
> the clients
> >>> get back and look normal.
> >>>
> >>> We've suffered it for a while, but I didn't take chance
> to analyze it
> >>> like you.
> >>>
> >>> Best regards,
> >>> Patrick
> >>>
> >>> Nathan March wrote:
> >>>
> >>>> Having a weird issue here..... Testing out a new rsyslog
> deployment and
> >>>> I've got around 30 servers logging to one machine. On
> one of the clients
> >>>> we had an issue where ssh was mysteriously very slow to login and
> >>>> tracked it down to rsyslog causing issues.
> Simultaneously, one of the
> >>>> threads on the logging server spiked up to 100%. New log
> data from the
> >>>> client would stop showing up on the server at this poitn.
> >>>>
> >>>> On the client after attempting to gracefully stop rsyslog:
> >>>>
> >>>> vanguard etc # ps aux | grep -i rsyslog
> >>>> root 3764 0.0 0.1 5092 2564 ? S 15:06 0:00
> >>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f
> /etc/rsyslog.conf
> >>>> root 3766 0.0 0.1 5092 2564 ? S 15:06 0:00
> >>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f
> /etc/rsyslog.conf
> >>>> root 6203 0.0 0.0 1512 524 pts/3 S 15:09
> 0:00 grep -i
> >>>> rsyslog
> >>>> vanguard etc # strace -p 3764
> >>>> Process 3764 attached - interrupt to quit
> >>>> send(11,
> "\25\3\2\1\0uE\254\304\n\311\377\204}p\237O\211\322\211"...,
> >>>> 261, 0 <unfinished ...>
> >>>> Process 3764 detached
> >>>>
> >>>> FD 11 being the socket to the logging server, it never
> does anything
> >>>> asides from that send.
> >>>>
> >>>> On the server:
> >>>>
> >>>> ldap ~ # ps aux | grep rsyslog
> >>>> root 7370 0.0 0.0 83936 3380 ? S 14:54 0:00
> >>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f
> >>>>
> >> /etc/rsyslog/rsyslog.conf
> >>
> >>>> root 7379 0.0 0.0 83936 3380 ? S 14:54 0:00
> >>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f
> >>>>
> >> /etc/rsyslog/rsyslog.conf
> >>
> >>>> root 7380 0.0 0.0 83936 3380 ? S 14:54 0:00
> >>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f
> >>>>
> >> /etc/rsyslog/rsyslog.conf
> >>
> >>>> root 7381 0.0 0.0 83936 3380 ? S 14:54 0:00
> >>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f
> >>>>
> >> /etc/rsyslog/rsyslog.conf
> >>
> >>>> root 7382 0.0 0.0 83936 3380 ? S 14:54 0:00
> >>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f
> >>>>
> >> /etc/rsyslog/rsyslog.conf
> >>
> >>>> root 7383 0.0 0.0 83936 3380 ? S 14:54 0:00
> >>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f
> >>>>
> >> /etc/rsyslog/rsyslog.conf
> >>
> >>>> root 7384 0.0 0.0 83936 3380 ? S 14:54 0:00
> >>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f
> >>>>
> >> /etc/rsyslog/rsyslog.conf
> >>
> >>>> root 7385 28.3 0.0 83936 3380 ? R 14:54 5:07
> >>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f
> >>>>
> >> /etc/rsyslog/rsyslog.conf
> >>
> >>>> Stracing the 7385 pid just shows this repeating over and
> over (FD 4
> >>>> being the network socket again):
> >>>> accept(4, 0x2b0f67658ca0, [7450515777377009792]) = -1
> EAGAIN (Resource
> >>>> temporarily unavailable)
> >>>> accept(4, 0x2b0f67658ca0, [7450515777377009792]) = -1
> EAGAIN (Resource
> >>>> temporarily unavailable)
> >>>>
> >>>> It happened 3 times within a relatively short period of
> time, sometimes
> >>>> within minutes of me restarting rsyslog on both
> machines. I enabled
> >>>> debugging and it took around 8 hours to pop up again.
> >>>>
> >>>> With debugging, I pulled this from the rsyslog server:
> >>>>
> >>>> 6640.408108000:imtcp.c: New connect on NSD 0x59ddc0.
> >>>> 6640.408129000:imtcp.c: hasRcvInBuffer on nsd 0x5a0550:
> pszRcvBuf (nil),
> >>>> lenRcvBuf 0
> >>>> 6640.408144000:imtcp.c: hasRcvInBuffer on nsd 0x59dcd0: pszRcvBuf
> >>>> 0x5c11c0, lenRcvBuf -1
> >>>> 6640.408158000:imtcp.c: hasRcvInBuffer on nsd 0x59dc10: pszRcvBuf
> >>>> 0x5cb510, lenRcvBuf -1
> >>>> 6640.408171000:imtcp.c: hasRcvInBuffer on nsd 0x5c3d70: pszRcvBuf
> >>>> 0x5d45c0, lenRcvBuf -1
> >>>> 6640.408185000:imtcp.c: hasRcvInBuffer on nsd 0x5cf460: pszRcvBuf
> >>>> 0x5dd010, lenRcvBuf -1
> >>>> 6640.408199000:imtcp.c: hasRcvInBuffer on nsd 0x5d8600: pszRcvBuf
> >>>> 0x5e58b0, lenRcvBuf -1
> >>>> 6640.408213000:imtcp.c: hasRcvInBuffer on nsd 0x59d690: pszRcvBuf
> >>>> 0x5ee150, lenRcvBuf -1
> >>>> 6640.408227000:imtcp.c: hasRcvInBuffer on nsd 0x59d2f0: pszRcvBuf
> >>>> 0x5f7320, lenRcvBuf -1
> >>>> 6640.408241000:imtcp.c: hasRcvInBuffer on nsd 0x5e8840: pszRcvBuf
> >>>> 0x5fe0a0, lenRcvBuf -1
> >>>> 6640.408255000:imtcp.c: hasRcvInBuffer on nsd 0x6027b0: pszRcvBuf
> >>>> 0x608350, lenRcvBuf -1
> >>>> 6640.408269000:imtcp.c: hasRcvInBuffer on nsd 0x5c3660: pszRcvBuf
> >>>> 0x612110, lenRcvBuf -1
> >>>> 6640.408283000:imtcp.c: hasRcvInBuffer on nsd 0x602690: pszRcvBuf
> >>>> 0x615040, lenRcvBuf -1
> >>>> 6640.408296000:imtcp.c: hasRcvInBuffer on nsd 0x5c3b60: pszRcvBuf
> >>>> 0x6218f0, lenRcvBuf -1
> >>>> 6640.408310000:imtcp.c: hasRcvInBuffer on nsd 0x60c150: pszRcvBuf
> >>>> 0x62a190, lenRcvBuf -1
> >>>> 6640.408324000:imtcp.c: hasRcvInBuffer on nsd 0x625420: pszRcvBuf
> >>>> 0x634fe0, lenRcvBuf -1
> >>>> 6640.408338000:imtcp.c: hasRcvInBuffer on nsd 0x6117b0: pszRcvBuf
> >>>> 0x63b000, lenRcvBuf -1
> >>>> 6640.408352000:imtcp.c: hasRcvInBuffer on nsd 0x603990: pszRcvBuf
> >>>> 0x643c70, lenRcvBuf -1
> >>>> 6640.408365000:imtcp.c: hasRcvInBuffer on nsd 0x633c00: pszRcvBuf
> >>>> 0x64dbb0, lenRcvBuf -1
> >>>> 6640.408379000:imtcp.c: hasRcvInBuffer on nsd 0x62cfa0: pszRcvBuf
> >>>> 0x650960, lenRcvBuf -1
> >>>> 6640.408393000:imtcp.c: hasRcvInBuffer on nsd 0x633d10: pszRcvBuf
> >>>> 0x65dbc0, lenRcvBuf -1
> >>>> 6640.408407000:imtcp.c: hasRcvInBuffer on nsd 0x5f2860: pszRcvBuf
> >>>> 0x666020, lenRcvBuf -1
> >>>> 6640.408421000:imtcp.c: hasRcvInBuffer on nsd 0x64d250: pszRcvBuf
> >>>> 0x66e480, lenRcvBuf -1
> >>>> 6640.408435000:imtcp.c: hasRcvInBuffer on nsd 0x659b30: pszRcvBuf
> >>>> 0x676d20, lenRcvBuf 78
> >>>> 6640.408448000:imtcp.c: hasRcvInBuffer on nsd 0x669fb0: pszRcvBuf
> >>>> 0x67fdd0, lenRcvBuf -1
> >>>> 6640.408462000:imtcp.c: hasRcvInBuffer on nsd 0x59a630: pszRcvBuf
> >>>> 0x687f10, lenRcvBuf -1
> >>>> 6640.408476000:imtcp.c: hasRcvInBuffer on nsd 0x6614c0: pszRcvBuf
> >>>> 0x6907b0, lenRcvBuf -1
> >>>> 6640.408490000:imtcp.c: hasRcvInBuffer on nsd 0x60c2d0: pszRcvBuf
> >>>> 0x699050, lenRcvBuf -1
> >>>> 6640.408504000:imtcp.c: hasRcvInBuffer on nsd 0x5a0550:
> pszRcvBuf (nil),
> >>>> lenRcvBuf 0
> >>>>
> >>>> This repeats enough to generate around 29 million lines
> in the debug log
> >>>> file.
> >>>>
> >>>> In the client log, there's plenty of these which seems
> somewhat normal:
> >>>>
> >>>> 6509.743164000:imuxsock.c: --------imuxsock calling
> select, active file
> >>>> descriptors (max 11): 11
> >>>> 6509.743231000:imuxsock.c: Message from UNIX socket: #11
> >>>> 6509.743262000:imuxsock.c: logmsg: flags 4, from
> 'vanguard', msg May 20
> >>>> 23:15:09 bin/qmail-local[23198]: using .qmail .qmail-cfsc-forum
> >>>> 6509.743281000:imuxsock.c: Message has legacy syslog format.
> >>>> 6509.743302000:imuxsock.c: main queue: entry added, size
> now 579 entries
> >>>> 6509.743325000:imuxsock.c: wtpAdviseMaxWorkers signals busy
> >>>> 6509.743343000:imuxsock.c: main queue: EnqueueMsg
> advised worker start
> >>>>
> >>>> However comparing it to an earlier point in the log
> there's none of the
> >>>> associated tcp queue entries like this:
> >>>>
> >>>> 3587.234178000:main queue:Reg/w0: TCP sent 107 bytes,
> requested 107
> >>>> 3587.234204000:main queue:Reg/w0: main queue: entering
> rate limiter
> >>>> 3587.234229000:main queue:Reg/w0: main queue: entry
> deleted, state 0,
> >>>> size now 0 entries
> >>>> 3587.234255000:main queue:Reg/w0: Called action, logging
> to builtin-file
> >>>> 3587.234285000:main queue:Reg/w0: (/var/log/maillog)
> >>>> 3587.234314000:main queue:Reg/w0: Called action, logging
> to builtin-fwd
> >>>> 3587.234340000:main queue:Reg/w0: ldap.nmsrv.com
> >>>> 3587.234359000:main queue:Reg/w0: ldap.nmsrv.com:10514/tcp
> >>>> 3587.234425000:main queue:Reg/w0: TCP sent 78 bytes, requested 78
> >>>> 3587.234446000:main queue:Reg/w0: main queue: entering
> rate limiter
> >>>> 3587.234471000:main queue:Reg/w0: main queue:Reg/w0: worker IDLE,
> >>>> waiting for work.
> >>>>
> >>>> Anyone able to shed some light on this situation? I've
> got the full
> >>>> debug logs if it's useful to anyone to take a closer look.
> >>>>
> >>>> Thanks!
> >>>>
> >>>> - Nathan
> >>>>
> >>>>
> >>>>
> >>>>
> >>
> --------------------------------------------------------------
> --------------
> >>
> >>> --------
> >>>
> >>>> Server config file
> >>>>
> >>>>
> >>
> --------------------------------------------------------------
> --------------
> >>
> >>> --------
> >>>
> >>>> $ModLoad immark.so # provides --MARK-- message capability
> >>>> $ModLoad imuxsock.so # provides support for local system
> logging (e.g.
> >>>> via logger command)
> >>>> $ModLoad imklog.so # kernel logging (formerly provided by rklogd)
> >>>>
> >>>> $umask 0137
> >>>> $DirCreateMode 0640
> >>>> $FileCreateMode 0640
> >>>> $FileOwner root
> >>>> $FileGroup admin
> >>>>
> >>>> $template DynFile,"/var/log/rsyslog/%HOSTNAME%/messages"
> >>>> $template DynFileCron,"/var/log/rsyslog/%HOSTNAME%/cron"
> >>>> $template DynFileSecure,"/var/log/rsyslog/%HOSTNAME%/secure"
> >>>> $template DynFileMail,"/var/log/rsyslog/%HOSTNAME%/maillog"
> >>>> $template DynFileSpool,"/var/log/rsyslog/%HOSTNAME%/spooler"
> >>>> $template DynFileBoot,"/var/log/rsyslog/%HOSTNAME%/boot"
> >>>> $template DynFileSyslog,"/var/log/rsyslog/%HOSTNAME%/syslog"
> >>>>
> >>>> :msg, contains, "no keys found for" ~
> >>>> :msg, contains, "session opened for user" ~
> >>>>
> >>>> # Log all kernel messages to the console.
> >>>> # Logging much else clutters up the screen.
> >>>> kern.*
> /dev/console
> >>>>
> >>>> # Log anything (except mail) of level info or higher.
> >>>> # Don't log private authentication messages!
> >>>> *.info;mail.none;cron.none ?DynFile
> >>>>
> >>>> *.warn;\
> >>>> authpriv.none;cron.none;mail.none;news.none
> ?DynFileSyslog
> >>>>
> >>>> # Log all the mail messages in one place.
> >>>> mail.*
> ?DynFileMail
> >>>>
> >>>> # Log cron stuff
> >>>> cron.*
> ?DynFileCron
> >>>>
> >>>> # Everybody gets emergency messages
> >>>> *.emerg *
> >>>>
> >>>> # Save news errors of level crit and higher in a special file
> >>>> uucp,news.crit
> ?DynFileSpool
> >>>>
> >>>> # Save boot messages also to boot.log
> >>>> local7.*
> ?DynFileBoot
> >>>>
> >>>> # Relp config for PCI
> >>>> $ModLoad imrelp
> >>>> $InputRELPServerRun 2515
> >>>>
> >>>> # make gtls driver the default
> >>>> $DefaultNetstreamDriver gtls
> >>>>
> >>>> # certificate files
> >>>> $DefaultNetstreamDriverCAFile /etc/ssl/ca.pem
> >>>> $DefaultNetstreamDriverCertFile /etc/ssl/servercrt.pem
> >>>> $DefaultNetstreamDriverKeyFile /etc/ssl/serverkey.pem
> >>>>
> >>>> $ModLoad imtcp.so
> >>>>
> >>>> $InputTCPServerStreamDriverMode 1 # run driver in TLS-only mode
> >>>> $InputTCPServerStreamDriverAuthMode anon # client is NOT
> authenticated
> >>>> $InputTCPServerRun 10514 # start up listener at port 10514
> >>>> $InputTCPMaxSessions 500
> >>>>
> >>>>
> >>>>
> >>
> --------------------------------------------------------------
> --------------
> >>
> >>> --------
> >>>
> >>>> Client config file
> >>>>
> >>>>
> >>
> --------------------------------------------------------------
> --------------
> >>
> >>> --------
> >>>
> >>>> $ModLoad imuxsock.so # provides support for local system
> logging (e.g.
> >>>> via logger command)
> >>>> $ModLoad imklog.so # kernel logging (formerly provided by rklogd)
> >>>>
> >>>> $umask 0137
> >>>> $DirCreateMode 0640
> >>>> $FileCreateMode 0640
> >>>> $FileOwner root
> >>>> $FileGroup admin
> >>>>
> >>>>
> >>>> # Log anything 'info' or higher, but lower than 'warn'.
> >>>> # Exclude authpriv, cron, mail, and news. These are
> logged elsewhere.
> >>>> *.info;*.!warn;\
> >>>> authpriv.none;cron.none;mail.none;news.none
> /var/log/messages
> >>>>
> >>>> # Log anything 'warn' or higher.
> >>>> # Exclude authpriv, cron, mail, and news. These are
> logged elsewhere.
> >>>> *.warn;\
> >>>> authpriv.none;cron.none;mail.none;news.none
> /var/log/syslog
> >>>>
> >>>> # Debugging information is logged here.
> >>>> *.=debug
> /var/log/debug
> >>>>
> >>>> # Private authentication message logging:
> >>>> authpriv.*
> /var/log/secure
> >>>>
> >>>> # Cron related logs:
> >>>> cron.*
> /var/log/cron
> >>>>
> >>>> # Mail related logs:
> >>>> mail.*
> /var/log/maillog
> >>>>
> >>>> # Emergency level messages go to all users:
> >>>> *.emerg *
> >>>>
> >>>>
> >>>> ## Uncomment these lines to use RELP instead for PCI
> compliance (stunnel
> >>>> required)
> >>>> #$ModLoad omrelp
> >>>> #*.*;mail.none :omrelp:localhost:2515;RSYSLOG_ForwardFormat
> >>>>
> >>>> # certificate files - just CA for a client
> >>>> $DefaultNetstreamDriverCAFile /etc/ssl/ca.pem
> >>>>
> >>>> # set up the action
> >>>> $DefaultNetstreamDriver gtls # use gtls netstream driver
> >>>> $ActionSendStreamDriverMode 1 # require TLS for the connection
> >>>> $ActionSendStreamDriverAuthMode anon # server is NOT
> authenticated
> >>>> *.* @@(o)ldap.nmsrv.com:10514 # send (all) messages
> >>>>
> >>>> _______________________________________________
> >>>> rsyslog mailing list
> >>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
> >>>> http://www.rsyslog.comoften
> >>>>
> >>> --
> >>> Patrick Shen
> >>> Operations Engineer
> >>>
> >>> net mobile AG Shanghai office
> >>>
> >>> T: +86 21 61350900 - 222
> >>> F: +86 21 61350906
> >>> M: +86 13122245730
> >>>
> >>> FE55 6A7F 3192 F359 C1DF 12F6 9078 58B4 57AF 0BE3
> >>>
> >>> _______________________________________________
> >>> rsyslog mailing list
> >>> http://lists.adiscon.net/mailman/listinfo/rsyslog
> >>> http://www.rsyslog.com
> >>>
> >> _______________________________________________
> >> rsyslog mailing list
> >> http://lists.adiscon.net/mailman/listinfo/rsyslog
> >> http://www.rsyslog.com
> >>
> >
> >
> >
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com
>
More information about the rsyslog
mailing list