[rsyslog-notify] Forum Thread: Re: Dequeue Perfomance - (Mode 'reply')

Mon May 4 01:55:28 CEST 2015

User: kamermans 
Forumlink: http://kb.monitorware.com/viewtopic.php?p=25467#p25467

Message: 
----------
Hi Rainer, first of all, let me quickly thank you for your contribution to
the open-source community.  I am an open-source author myself and truly
respect your work!

Interesting idea with oom_killer, I should have thought of that, in fact,
I've even
[url=http://www.stevekamerman.com/2011/01/keep-oom_killer-from-killing-your-server/:16eqdre0]written
a script[/url:16eqdre0] to keep things like sshd from getting blasted by it
:P .

There is no record in /var/log/kern.log, but I suppose if the kernel is
logging to /dev/log that it had to kill rsyslog, that event itself might
not be logged before rsyslog is killed.  I do doubt that, however, since
oom_killer outputs the results of its vote for a victim process before any
action is taken.
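
For reference, here is a minimal sketch of the kind of check I mean.  It
assumes the kernel messages land in a plain-text file such as
/var/log/kern.log and that OOM events contain wording like "invoked
oom-killer" or "Out of memory".  The exact text varies by kernel version, so
the patterns (and the function names) are my own assumptions:

```python
import re

# Assumed patterns; the exact wording differs between kernel versions.
OOM_PATTERNS = re.compile(
    r"invoked oom-killer|Out of memory|Killed process", re.IGNORECASE
)

def find_oom_lines(lines):
    """Return the log lines that look like oom_killer activity."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERNS.search(line)]

def scan_log(path="/var/log/kern.log"):
    """Scan a log file (the default path is an assumption) for OOM lines."""
    with open(path, errors="replace") as fh:
        return find_oom_lines(fh)

if __name__ == "__main__":
    try:
        hits = scan_log()
    except OSError as exc:
        print(f"could not read log: {exc}")
    else:
        for hit in hits:
            print(hit)
```

An empty result only means nothing matching was written to that file, not
that oom_killer was never involved.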

Assuming it was oom_killer, something must have restarted the parent
process.  The counters for all queues were reset and the process got a new
PID.  I can't think of any way that rsyslog could do that autonomously, so
it would have to be the Debian-based init script, which launches rsyslogd
with this command line (in Ubuntu Server 14.04 x64):

[code:16eqdre0]
start-stop-daemon --start --quiet --pidfile /var/run/rsyslogd.pid \
    --exec /usr/sbin/rsyslogd -- -c5
[/code:16eqdre0]

I don't see any evidence that start-stop-daemon has a process supervisor /
auto-restart capability, so my impression is that rsyslogd, at least the
parent process, was not killed by oom_killer.
(the "-c5" is possibly an artifact from the v7 init script or something,
but it doesn't seem to be affecting v8.7.0, which I'm using from your repo)

In any case, you're right that the memory graph suggests that the process
ran out of memory and crashed, then restarted, or something similar.

[quote:16eqdre0]Check if you have sized the in-memory queues
reasonable[/quote:16eqdre0]

I'm honestly not sure how to size the in-memory queue with the LinkedList
option in Disk Assisted mode.  My goal was to use all the RAM (15GB), then
shove things down to disk (70GB) when all the memory is gone.

Here's the current config for the queue that is overflowing:

[code:16eqdre0]
ruleset(name="forwardRelp"
        queue.type="LinkedList"
        queue.fileName="relp_queue"
        queue.spoolDirectory="/var/spool/rsyslog/relp_queue"
        queue.maxDiskSpace="70g"
        queue.maxFileSize="1g"
        queue.size="172800000"  # two days at 1000/sec
        queue.dequeueBatchSize="750"
        queue.saveOnShutdown="on"
        queue.workerThreads="6"
        queue.workerThreadMinimumMessages="100"
) {
        action( type="omrelp"
                target="10.1.2.3"
                port="15140"
#               action.retryCount="-1"
        )
}
[/code:16eqdre0]

I suppose this line is what's killing me:

[code:16eqdre0]
queue.size="172800000"  # two days at 1000/sec
[/code:16eqdre0]

If all of that is supposed to fit in RAM, it's definitely going to run out,
but then how do I specify the size of the in-memory queue and the size of
the disk-assisted queue separately?
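
To put a rough number on it, here is a back-of-the-envelope sketch.  The 512
bytes per queued message is purely my assumption (the real in-memory cost
depends on message length and properties), and I'm treating the 15GB as GiB.
The 1000 msgs/sec rate and the two-day window are the figures from the config
comment:

```python
SECONDS_PER_DAY = 86_400

def queue_entries(rate_per_sec, days):
    """Number of messages produced at a steady rate over the given days."""
    return rate_per_sec * days * SECONDS_PER_DAY

def estimated_ram_gib(entries, bytes_per_msg):
    """RAM (GiB) needed to hold that many messages entirely in memory."""
    return entries * bytes_per_msg / 2**30

def entries_fitting(ram_gib, bytes_per_msg):
    """How many messages of the assumed size fit in the given RAM."""
    return int(ram_gib * 2**30 // bytes_per_msg)

BYTES_PER_MSG = 512  # assumption, not a measured rsyslog figure
RAM_GIB = 15         # the box's RAM, treated as GiB

entries = queue_entries(1000, 2)
print(entries)                                              # 172800000
print(round(estimated_ram_gib(entries, BYTES_PER_MSG), 1))  # 82.4
print(entries_fitting(RAM_GIB, BYTES_PER_MSG))              # 31457280
```

Under that assumption the configured queue.size would need roughly five times
the RAM I have, which is consistent with the memory graph.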