[rsyslog-notify] Forum Thread: Re: Dequeue Performance - (Mode 'edit_last_post')

noreply at adiscon.com
Thu May 7 05:49:21 CEST 2015


User: kamermans 
Forumlink: http://kb.monitorware.com/viewtopic.php?p=25518#p25518

Message: 
----------
[quote:1txavngd]what do the processed stats look like on the relp output
action?[/quote:1txavngd]

If that information is not in my previous post, then I'm not sure where it
would be, since that is the complete pstats output for one interval.

I assume the RELP actions must be [b:1txavngd]actions 1 & 2[/b:1txavngd]
since they're the only actions with a sizeable number of messages.

The logging host sends a 5 Mbps (bits, not bytes) RELP stream down a 50 Mbps
pipe, from AWS us-east-1 to our collection server about 10 miles away, which
is running rsyslog 8.7.  That server is basically idle at all times and its
queue never rises above 0.  It is a physical server with dual quad-core
Xeons that writes the data directly to a dedicated disk array.

It looks like my partitioning helped for a while, but in the end things
still went down for no apparent reason with no logged information.

Here's the pstats data at the time that I suspect it crashed:
[code:1txavngd]
Thu May  7 01:42:19 2015: imuxsock: origin=imuxsock
submitted=1382 ratelimit.discarded=0 ratelimit.numratelimiters=0
Thu May  7 01:42:19 2015: action 1: origin=core.action
processed=70395570 failed=0 suspended=0 suspended.duration=0 resumed=0
Thu May  7 01:42:19 2015: action 2: origin=core.action
processed=86313400 failed=0 suspended=0 suspended.duration=0 resumed=0
Thu May  7 01:42:19 2015: action 3: origin=core.action
processed=1202 failed=0 suspended=0 suspended.duration=0 resumed=0
Thu May  7 01:42:19 2015: action 4: origin=core.action
processed=184 failed=0 suspended=0 suspended.duration=0 resumed=0
Thu May  7 01:42:19 2015: action 5: origin=core.action
processed=0 failed=0 suspended=0 suspended.duration=0 resumed=0
Thu May  7 01:42:19 2015: action 6: origin=core.action
processed=0 failed=0 suspended=0 suspended.duration=0 resumed=0
Thu May  7 01:42:19 2015: action 7: origin=core.action
processed=0 failed=0 suspended=0 suspended.duration=0 resumed=0
Thu May  7 01:42:19 2015: imudp(*:514): origin=imudp
submitted=78004089
Thu May  7 01:42:19 2015: imudp(*:514): origin=imudp
submitted=0
Thu May  7 01:42:19 2015: imudp(*:515): origin=imudp
submitted=89166253
Thu May  7 01:42:19 2015: imudp(*:515): origin=imudp
submitted=0
Thu May  7 01:42:19 2015: resource-usage: origin=impstats
utime=6058046265 stime=4895587760 maxrss=14944560 minflt=1819466 majflt=0
inblock=296 oublock=5560 nvcsw=465720072 nivcsw=45595
Thu May  7 01:42:19 2015: forwardPartition1[DA]:
origin=core.queue size=0 enqueued=7162500 full=0 discarded.full=0
discarded.nf=0 maxqsize=0
Thu May  7 01:42:19 2015: forwardPartition1:
origin=core.queue size=446769 enqueued=78004089 full=0
discarded.full=0 discarded.nf=0 maxqsize=450809
Thu May  7 01:42:19 2015: forwardPartition2[DA]:
origin=core.queue size=0 enqueued=2450250 full=0 discarded.full=0
discarded.nf=0 maxqsize=0
Thu May  7 01:42:19 2015: forwardPartition2:
origin=core.queue size=403353 enqueued=89166253 full=0
discarded.full=0 discarded.nf=0 maxqsize=450800
Thu May  7 01:42:19 2015: main Q: origin=core.queue
size=0 enqueued=1386 full=0 discarded.full=0 discarded.nf=0
maxqsize=2
Thu May  7 01:42:19 2015: imudp(w1): origin=imudp
called.recvmmsg=162022705 called.recvmsg=0
msgs.received=48005814
Thu May  7 01:42:19 2015: imudp(w0): origin=imudp
called.recvmmsg=276510773 called.recvmsg=0
msgs.received=119164528
 ... Counters reset at about 1:45 UTC / 9:45 PM EST
(see graphs) ...
Thu May  7 01:49:09 2015: imuxsock: origin=imuxsock
submitted=0 ratelimit.discarded=0 ratelimit.numratelimiters=0
Thu May  7 01:49:09 2015: action 1: origin=core.action
processed=25761 failed=0 suspended=0 suspended.duration=0 resumed=0
Thu May  7 01:49:09 2015: action 2: origin=core.action
processed=23159 failed=0 suspended=0 suspended.duration=0 resumed=0
Thu May  7 01:49:09 2015: action 3: origin=core.action
processed=0 failed=0 suspended=0 suspended.duration=0 resumed=0
Thu May  7 01:49:09 2015: action 4: origin=core.action
processed=3 failed=0 suspended=0 suspended.duration=0 resumed=0
Thu May  7 01:49:09 2015: action 5: origin=core.action
processed=0 failed=0 suspended=0 suspended.duration=0 resumed=0
Thu May  7 01:49:09 2015: action 6: origin=core.action
processed=0 failed=0 suspended=0 suspended.duration=0 resumed=0
Thu May  7 01:49:09 2015: action 7: origin=core.action
processed=0 failed=0 suspended=0 suspended.duration=0 resumed=0
Thu May  7 01:49:09 2015: imudp(*:514): origin=imudp
submitted=247286
Thu May  7 01:49:09 2015: imudp(*:514): origin=imudp
submitted=0
Thu May  7 01:49:09 2015: imudp(*:515): origin=imudp
submitted=58670
Thu May  7 01:49:09 2015: imudp(*:515): origin=imudp
submitted=0
Thu May  7 01:49:09 2015: resource-usage: origin=impstats
utime=6382729 stime=5512357 maxrss=437196 minflt=536 majflt=1 inblock=328
oublock=8 nvcsw=606826 nivcsw=14
Thu May  7 01:49:09 2015: forwardPartition1[DA]:
origin=core.queue size=0 enqueued=0 full=0 discarded.full=0
discarded.nf=0 maxqsize=0
Thu May  7 01:49:09 2015: forwardPartition1:
origin=core.queue size=222275 enqueued=247286 full=0
discarded.full=0 discarded.nf=0 maxqsize=222275
Thu May  7 01:49:09 2015: forwardPartition2[DA]:
origin=core.queue size=0 enqueued=0 full=0 discarded.full=0
discarded.nf=0 maxqsize=0
Thu May  7 01:49:09 2015: forwardPartition2:
origin=core.queue size=36261 enqueued=58670 full=0 discarded.full=0
discarded.nf=0 maxqsize=36261
Thu May  7 01:49:09 2015: main Q: origin=core.queue
size=0 enqueued=3 full=0 discarded.full=0 discarded.nf=0 maxqsize=2
Thu May  7 01:49:09 2015: imudp(w1): origin=imudp
called.recvmmsg=286268 called.recvmsg=0 msgs.received=31425
Thu May  7 01:49:09 2015: imudp(w0): origin=imudp
called.recvmmsg=571362 called.recvmsg=0 msgs.received=274531
[/code:1txavngd]
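As an aside, the queue counters in dumps like the one above are easier to track over time if you scrape them. Here's a minimal sketch in Python (my own throwaway, not anything shipped with rsyslog; it assumes impstats' default single-line legacy format, whereas the dump above is line-wrapped by the forum):

```python
import re

# Matches impstats core.queue lines, e.g.:
#   Thu May  7 01:42:19 2015: forwardPartition1: origin=core.queue size=446769 ... maxqsize=450809
QUEUE_RE = re.compile(
    r": ([\w \[\]]+): origin=core\.queue"
    r" size=(\d+) enqueued=(\d+).*?maxqsize=(\d+)"
)

def queue_stats(lines):
    """Return {queue_name: (size, enqueued, maxqsize)} for one pstats interval."""
    stats = {}
    for line in lines:
        m = QUEUE_RE.search(line)
        if m:
            name, size, enq, maxq = m.groups()
            stats[name] = (int(size), int(enq), int(maxq))
    return stats

sample = [
    "Thu May  7 01:42:19 2015: forwardPartition1: origin=core.queue "
    "size=446769 enqueued=78004089 full=0 discarded.full=0 discarded.nf=0 maxqsize=450809",
    "Thu May  7 01:42:19 2015: forwardPartition1[DA]: origin=core.queue "
    "size=0 enqueued=7162500 full=0 discarded.full=0 discarded.nf=0 maxqsize=0",
]
print(queue_stats(sample))
```

Graphing size against maxqsize per interval is how I spotted the queues piling up in the first place.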

And again we see the classic OOM/reset pattern in this memory graph from the
host:

[img:1txavngd]http://s24.postimg.org/93ho7mmid/stats7.png[/img:1txavngd]

Here is my nice new shiny queue partitioning performance graph, showing the
share of traffic going to each memory queue and each of their disk-assisted
(DA) queues.  The stacked graph doesn't make much sense once the DA queues
kick in, but at least you can see how things are piling up.

[img:1txavngd]http://s2.postimg.org/6itdpxpu1/stats8.png[/img:1txavngd]

This graph is a total mess, but it shows the queue stats for the timeframe. 
Note that the left and right axes do not use the same scale; the raw numbers
are probably easier to read than the graph.

[img:1txavngd]http://s3.postimg.org/d4ovzfx0z/stats9.png[/img:1txavngd]

I'm going to decrease my memory queues from 500k to 200k messages each, so a
total of 400k messages would be held in RAM; that gives each message a
staggering 37KB of RAM.  Let's see if we crash again tomorrow.
I've also set the main_queue's queue.dequeueslowdown to 500us and the
windowSize on both RELP actions to 1024.
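In rsyslog v8's RainerScript, those changes look roughly like the sketch below (a hedged example, not my actual config: the target host and queue name are placeholders, but windowSize, queue.size, and queue.dequeueslowdown are the v8 parameter names):

```
# one of the two partitioned RELP forward actions (placeholder target)
action(type="omrelp" target="collector.example.com" port="2514"
       windowSize="1024"                   # RELP window raised to 1024
       queue.type="LinkedList"
       queue.size="200000"                 # was 500k; 2 x 200k = 400k msgs in RAM
       queue.filename="forwardPartition1") # queue.filename enables the DA queue

# throttle the main queue's dequeue rate (value is in microseconds)
main_queue(queue.dequeueslowdown="500")
```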

