[rsyslog-notify] Forum Thread: Dequeue Performance - (Mode 'post')
noreply at adiscon.com
Sat May 2 05:13:27 CEST 2015
User: kamermans
Forumlink: http://kb.monitorware.com/viewtopic.php?p=25461#p25461
Message:
----------
Hi,
I am using rsyslog on several frontend webservers (aka 'webservers') that
forward their logs via UDP to an rsyslog log aggregator (aka 'log-agg'). This
aggregator then forwards all the logs via RELP to our central rsyslog log
collector (aka 'collector').
At peak, log-agg is receiving 3000 req/sec on UDP and forwarding them out
over RELP at the same rate. Most of the time we receive the log entries
within 1 second of them being emitted by the webservers, so all is well.
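For reference, here is a rough sketch of the two hops described above in RainerScript. The hostname log-agg.example.internal is a hypothetical placeholder; port 514/UDP and port 15140/RELP are taken from the log-agg config further down, and the collector side is an assumption, not a copy of its real config.
[code:2nfqfcgd]
# --- On each webserver (hostname below is a hypothetical placeholder) ---
# Send everything to log-agg over plain UDP (fire-and-forget, no delivery guarantee)
*.* action(type="omfwd"
           target="log-agg.example.internal"
           port="514"
           protocol="udp")

# --- On the collector (assumed; its real config is not shown here) ---
# Accept the RELP stream coming from log-agg
module(load="imrelp")
input(type="imrelp" port="15140")
[/code:2nfqfcgd]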
When there is an interruption between log-agg and the collector, however,
log-agg is forced to spool messages into its 30GB of RAM and 70GB of
dedicated spool disk space (using a LinkedList queue). After a short network
outage, log-agg starts spooling messages like crazy into RAM until it's full,
then down to disk.
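As far as I understand the rsyslog docs, a LinkedList queue with queue.fileName set is a disk-assisted queue: it stays purely in memory until it reaches queue.highWatermark (by default a large fraction of queue.size), and only then starts writing to the spool directory. With queue.size="172800000" that threshold is very high, which would fit the "RAM first, then disk" behaviour. A sketch that makes the threshold explicit; the numbers are illustrative guesses, not tested or tuned values.
[code:2nfqfcgd]
# Illustrative values only: cap the in-memory part at ~1M messages and let
# the disk-assisted queue take over much earlier than with the current size
ruleset(name="forwardRelp"
        queue.type="LinkedList"
        queue.fileName="relp_queue"
        queue.spoolDirectory="/var/spool/rsyslog/relp_queue"
        queue.maxDiskSpace="70g"
        queue.size="1000000"           # in-memory capacity, in messages
        queue.highWatermark="800000"   # start spilling to disk at this fill level
        queue.lowWatermark="200000"    # disk assistance winds down below this
        ) {
    action(type="omrelp" target="10.2.3.4" port="15140")
}
[/code:2nfqfcgd]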
The problem is that today's outage lasted only a minute or so, but it took
5.5 hours to dequeue the backlog, and a cursory look at my downstream stats
suggests that about 30 minutes' worth of data is completely gone.
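I can't tell yet where that data went. One thing that may be worth checking: the retry line in the config below is commented out, and I believe the parameter rsyslog actually recognizes is action.resumeRetryCount (where -1 means retry forever) rather than action.retryCount. A sketch of the action with that set; whether it has anything to do with the loss is an assumption, not something the stats show.
[code:2nfqfcgd]
action(type="omrelp"
       target="10.2.3.4"
       port="15140"
       action.resumeRetryCount="-1"   # -1 = keep retrying instead of giving up
       action.resumeInterval="10")    # base wait in seconds between retries (illustrative)
[/code:2nfqfcgd]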
The dequeuing issue is my main concern, so I would like some advice as to
what may be causing this, or how I can tweak the settings to make dequeuing
happen as rapidly as possible (by the way, my connection is 50 Mbps up/down,
so bandwidth is not the bottleneck).
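A back-of-the-envelope check of the bandwidth claim and of the backlog size, assuming an average message size of 500 bytes (a guess, not measured) and the peak rate of 3000 msg/s for the whole one-minute outage:

$$
3000\ \tfrac{\text{msg}}{\text{s}} \times 500\ \tfrac{\text{B}}{\text{msg}} \times 8\ \tfrac{\text{bit}}{\text{B}} = 12\ \text{Mbit/s} < 50\ \text{Mbit/s}
$$

$$
60\ \text{s} \times 3000\ \tfrac{\text{msg}}{\text{s}} = 180{,}000\ \text{msg}
$$

Under those assumptions the link would have roughly 38 Mbit/s of headroom at peak for draining the backlog.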
Here are some performance graphs from the rsyslog stats module (impstats).
This one shows log-agg on the left and our collector on the right. Note that
the values are messages per 5 minutes, so 600K is actually 2,000/second. The
brief outage was at 11:00am.
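Spelled out, using the impstats interval="300" (5 minutes = 300 s) from the config:

$$
\frac{600{,}000\ \text{msg}}{300\ \text{s}} = 2{,}000\ \text{msg/s}
$$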
[img:2nfqfcgd]http://s24.postimg.org/9oe5qbtc5/stats1.png[/img:2nfqfcgd]
Here you can see the following graphs:
1. Memory usage reported by the system
2. CPU usage reported by the system
3. Memory vs CPU usage reported by rsyslog
[img:2nfqfcgd]http://s14.postimg.org/sv46ejkrl/stats2.png[/img:2nfqfcgd]
And here is the anonymized config:
[code:2nfqfcgd]
# Stats logging must be first!
module( load="impstats"
        interval="300"
        severity="7"
        log.syslog="off"
        log.file="/var/log/rsyslog-stats.log")
# End stats logging
# Can't figure out how to set these in RainerScript format
$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
$RepeatedMsgReduction off
$FileOwner syslog
$FileGroup adm
$FileCreateMode 0640
$DirCreateMode 0755
$Umask 0022
$PrivDropToUser syslog
$PrivDropToGroup syslog
$WorkDirectory /var/spool/rsyslog
# Super-verbose, be careful!
# $DebugFile /var/log/rsyslog-debug.log
# $DebugLevel 2
$Ruleset RSYSLOG_DefaultRuleset
auth,authpriv.* /var/log/auth.log
*.*;auth,authpriv.none -/var/log/syslog
kern.* -/var/log/kern.log
mail.* -/var/log/mail.log
*.emerg :omusrmsg:*
# We are doing everything in this one file
# $IncludeConfig /etc/rsyslog.d/*.conf
module( load="imklog"
        permitNonKernelFacility="on"
)
module( load="imuxsock")
module( load="imudp"
        threads="2"
        timeRequery="8"
        batchSize="256"
)
module( load="imrelp" )
module( load="omrelp" )
ruleset(name="forwardRelp"
        queue.type="LinkedList"
        queue.fileName="relp_queue"
        queue.spoolDirectory="/var/spool/rsyslog/relp_queue"
        queue.maxDiskSpace="70g"
        queue.maxFileSize="1g"
        queue.size="172800000" # two days at 1000/sec
        queue.dequeueBatchSize="512"
        queue.saveOnShutdown="on"
        queue.workerThreads="6"
        queue.workerThreadMinimumMessages="512"
) {
    action( type="omrelp"
            target="10.2.3.4"
            port="15140"
            # action.retryCount="-1"
    )
    # Use this instead of the above to log locally
    # action( type="omfile"
    #         file="/mnt/log_backup/local-backup.log"
    #         ioBufferSize="64k"
    #         flushOnTXEnd="off"
    #         asyncWriting="on"
    # )
}
input( type="imuxsock"
       socket="/dev/log"
)
# UDP Logging
input( type="imudp"
       port="514"
       ruleset="forwardRelp"
)
[/code:2nfqfcgd]
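Regarding the "Can't figure out how to set these in RainerScript format" comment in the config: here is a partial sketch of equivalents I'm reasonably confident about. global(workDirectory=...) and the template parameter of the builtin:omfile module stand in for $WorkDirectory and $ActionFileDefaultTemplate, and the file ownership/mode directives can be given as per-action omfile parameters (shown for auth.log only). $RepeatedMsgReduction, $Umask and the $PrivDrop* lines are left in legacy form because I haven't verified their RainerScript names for this rsyslog version.
[code:2nfqfcgd]
# Replaces $WorkDirectory
global(workDirectory="/var/spool/rsyslog")

# Replaces $ActionFileDefaultTemplate
module(load="builtin:omfile" template="RSYSLOG_TraditionalFileFormat")

# $FileOwner/$FileGroup/$FileCreateMode/$DirCreateMode as per-action parameters
auth,authpriv.* action(type="omfile"
                       file="/var/log/auth.log"
                       fileOwner="syslog"
                       fileGroup="adm"
                       fileCreateMode="0640"
                       dirCreateMode="0755")

# Kept as legacy directives (RainerScript equivalents not verified)
$RepeatedMsgReduction off
$Umask 0022
$PrivDropToUser syslog
$PrivDropToGroup syslog
[/code:2nfqfcgd]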
Thanks in advance for your help!