[rsyslog-notify] Forum Thread: rsyslog 7.6.1/7.6.2 stability - (Mode 'post')

Wed Mar 19 18:04:31 CET 2014

User: jeff 
Forumlink: http://kb.monitorware.com/viewtopic.php?p=24407#p24407

Message: 
----------
Follow-up to a discussion started by e-mail (original below). In summary:
rsyslog 7.6.2 installed on RHEL 6 from the RPM repo, 45GB of data daily
spread across four rsyslog servers.

Following the troubleshooting guidelines, I set 

[code:29eu7vij]export MALLOC_CHECK_=2[/code:29eu7vij]
and enabled debugging

[code:29eu7vij]rsyslogd -i /var/run/syslogd.pid -4 -s mydomain.net:mydomain.org -dn[/code:29eu7vij]
With MALLOC_CHECK_ set, rsyslogd stops almost right away (within
seconds). It doesn't generate a core dump, but the debug log does end with
a SIGABRT:

[code:29eu7vij]3364.093535894:7f2a982f0700: wti 0x7f2a9fb245b0: worker awoke from idle processing
3364.093544433:7f2a982f0700: DeleteProcessedBatch: we deleted 0 objects and enqueued 0 objects
3364.093551150:7f2a982f0700: doDeleteBatch: delete batch from store, new sizes: log 1, phys 1
3364.093566920:7f2a982f0700: processBatch: batch of 1 elements must be processed
3364.093574379:7f2a982f0700: scriptExec: batch of 1 elements, active (nil), active[0]:1
3364.093580522:7f2a982f0700:     ACTION 0x7f2a9fb1a120 [builtin:omfile:-?DYNnetwork]
3364.093594688:7f2a982f0700: RRRR: execAct [builtin:omfile]: batch of 1 elements, active (nil)
3364.093602707:7f2a982f0700: Called action(NotAllMark), processing batch[0] via 'builtin:omfile'
3364.093608927:7f2a982f0700: Called action(Batch), logging to builtin:omfile
3364.093625978:7f2a982f0700: dnscache: entry (nil) found
3364.093967380:7f2a996f2700: imudp: epoll_wait() returned with 1 fds
3364.093989113:7f2a996f2700: imudp: recvmmsg returned 1
3364.094001229:7f2a996f2700: recv(7,182),acl:1,msg:<190>Mar 19 2014 11:36:04 psufw : %ASA-6-302014: Teardown TCP connection 1365933321 for outside:204.77.213.190/3005 to inside:17
3364.094011981:7f2a996f2700: msg parser: flags 70, from '~NOTRESOLVED~', msg '<190>Mar 19 2014 11:36:04 psufw : %ASA-6-302014: Teardown TC'
3364.094019754:7f2a996f2700: parse using parser list 0x7f2a9fafeda0 (the default list).
3364.094026365:7f2a996f2700: dropped LF at very end of message (DropTrailingLF is set)
3364.094033789:7f2a996f2700: Parser 'rsyslog.rfc5424' returned -2160
3364.094040344:7f2a996f2700: Message will now be parsed by the legacy syslog parser (one size fits all... ;)).
3364.094047876:7f2a996f2700: Parser 'rsyslog.rfc3164' returned 0
3364.094056106:7f2a996f2700: imudp: recvmmsg returned -1
3364.094064297:7f2a996f2700: main Q: qqueueAdd: entry added, size now log 1, phys 2 entries
3364.094071944:7f2a996f2700: main Q: MultiEnqObj advised worker start
3364.094248518:7f2a982f0700: 

Signal 6 (SIGABRT) occured, execution must be terminated.

3364.094263190:7f2a982f0700: Mutex log for all known mutex operations:
3364.094269820:7f2a982f0700: If the call trace is empty, you may want to ./configure --enable-rtinst
3364.094275780:7f2a982f0700: 

To submit bug reports, visit http://www.rsyslog.com/bugs

3364.094281580:7f2a982f0700: 

To submit bug reports, visit http://www.rsyslog.com/bugs[/code:29eu7vij]
If I unset MALLOC_CHECK_, rsyslogd will run for hours before it stops
working. Next steps?

[b:29eu7vij][u:29eu7vij][i:29eu7vij]Original e-mail[/i:29eu7vij][/u:29eu7vij][/b:29eu7vij]
Hi… hoping for some quick guidance:

I have four servers dedicated to rsyslog as a gateway for syslog data into
our log aggregator. The systems are more or less identically configured,
running Red Hat Enterprise Linux 6, and are sitting behind a load balancer.
I’ve been running v5.x (from the RHEL repos) and they’ve been pretty rock
solid over the past couple of years, really. The four servers send >45GB of
data daily with seemingly no major issues identified. All of the data I’m
accepting is over UDP (three separate listeners), with a TCP listener
available for the load balancer service monitor.

I needed/wanted to take advantage of the enhanced DNS caching (I had
disabled DNS lookups under 5.x since it appeared essentially every
connection was resulting in a lookup), so this past Saturday morning I
upgraded to rsyslog v7.6.1 (then today to v7.6.2) using the Adiscon RHEL
repo. Everything seemed to be working as expected, but a couple of things
happened:

 - at about 4:30pm Saturday, rsyslogd abended on three of the four servers
without me noticing
 - Tuesday morning just after 5:00am, rsyslogd abended on the fourth
server… it took us a few hours to notice the outage.

When I say abend, I really mean “stops running”, since it doesn’t seem to
log any data with the rsyslog service down… service rsyslog status reports
service is not running but pid exists…

We’ve since put monitoring in place, and I’ve upgraded all of them to the
just-released v7.6.2 of rsyslog. The service on rsyslog3 stopped
again yesterday afternoon, and rsyslog1 stopped overnight.
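
For reference, the core of such a check can be small. A minimal sketch, assuming Linux's /proc and that "pid exists" in the status message refers to a leftover pid file; the function name pidfile_state is made up for illustration:

```shell
# Hypothetical helper: classify the process named in a pid file.
# Prints "running", "stale" (pid file present but no such process),
# or "missing" (no readable pid file).
pidfile_state() {
    pidfile=$1
    if [ ! -r "$pidfile" ]; then
        echo missing
        return
    fi
    # Take the first line and strip whitespace.
    pid=$(head -n 1 "$pidfile" | tr -d '[:space:]')
    # Anything that is not a plain number is treated as stale.
    case $pid in
        ''|*[!0-9]*) echo stale; return ;;
    esac
    # On Linux, /proc/<pid> exists exactly while the process does.
    if [ -d "/proc/$pid" ]; then
        echo running
    else
        echo stale
    fi
}

# Example (pid file path taken from the rsyslogd command line):
#   pidfile_state /var/run/syslogd.pid
```

Run from cron against the pid file passed with -i, a result of "stale" would correspond to the stopped-but-pid-exists state and could trigger an alert.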

I was running debug logging on one system for a little while, but it doesn’t
seem like a good idea to leave it on until the service abends (the
debug log grew to 1.5GB in under 30 minutes), so I’m not sure of the best way
to get additional information...
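
One option that might fit here, going by the rsyslog debugging documentation, is debug-on-demand: rsyslogd starts with debug output switched off, and SIGUSR1 toggles it, so the log only grows while it is wanted. A minimal sketch, not verified against 7.6.2 here; the log path is a placeholder:

```shell
# Environment read by rsyslogd at startup; set it before starting the daemon.
# DebugOnDemand: debug mode is armed, but output stays off until SIGUSR1.
# NoStdOut: do not write debug output to stdout.
export RSYSLOG_DEBUG="DebugOnDemand NoStdOut"
# Placeholder path; choose a filesystem with room for a large log.
export RSYSLOG_DEBUGLOG="/var/tmp/rsyslog-debug.log"

# Toggle debug output on or off in the running daemon
# (pid file path taken from the rsyslogd command line):
#   kill -USR1 "$(cat /var/run/syslogd.pid)"
```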

Not sure if this is normal, but the rsyslogd service may have a memory
leak. I can see its memory utilization going up over time, but even on the
one system that stayed up from Sat - Tue morning, utilization peaked under
40MB (perhaps due to the load?), and the system has 2GB, so I don’t think
that’s a critical load….

By comparison, the v5 rsyslogd was keeping well under this mark on the days
leading up to the upgrade (rsyslog1 hovered under 15MB, rsyslog2 under
10MB, and the others under 7.5MB)… though again I had disabled DNS lookups
so perhaps this explains the relatively lower memory usage?
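
To put numbers on the growth, resident memory can be sampled from /proc at intervals. A minimal sketch, assuming Linux's /proc/<pid>/status format; the function name rss_kb is illustrative:

```shell
# Print the resident set size (in kB) found in a /proc/<pid>/status file.
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "$1"
}

# Example: emit one timestamped sample; run it from cron to build a trend.
#   echo "$(date '+%F %T') $(rss_kb "/proc/$(cat /var/run/syslogd.pid)/status") kB"
```

Comparing samples taken under similar load would help separate a steady leak from memory that simply tracks traffic or the DNS cache.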

Any guidance you can provide would be appreciated. I really wanted to keep up
with the current stable channel, especially to take advantage of the DNS
lookups for the hosts that aren’t sending a hostname along with their data…