[rsyslog] Losing UDP packages
dave at fly.srk.fer.hr
Thu Dec 16 11:15:17 CET 2010
david at lang.hm wrote:
> I haven't done tests on a recent version, butabout ayearago I
> didextensive testing with UDP (early 5.x days)
> make sure you disable name lookups.
> I found that I was able to keep up with ~gig-E wire speed (380K logs/sec
> with ~250 byte log messages) as long as the peak did not exceed the
> memory queue size. At the time, rsyslog would process messages from the
> queue at between 30K and 80K messages/sec (depending on the
> configuration). I was testing this by using tcpreplay to send the
> packets, 1,000,000 packets per test. In about 100 such tests I did not
> experiance _any_ lost messages.
I believe it can be done without message loss. However, in my setup there
are other programs on the machine which can use a lot of CPU time, so
rsyslog might not be scheduled on a CPU fast enough. There is one CPU with
four cores, one of them will be taken by a SCHED_FIFO process belonging to
clusterware whenever it needs it and I'd like to do the same arrangement
for rsyslog's UDP receiver thread. That leaves two cores for everything
else, which should be perfectly fine. (There is a potential issue with
memory in case the machine starts swapping, but I'm not trying to solve
that right now.)
I also don't have any control over the sending side, meaning it's very
hard for me to figure out what the peak is (or will be, once all senders
Is there any way to get some statistics from rsyslog? It would be nice if
it could write some counters in a file every hour or so. For example: the
number of messages received by each of the input plugins and the number of
messages written by each of the output plugins.
Full SNMP support would be even nicer, but that's reasonably easy to
implement if all the counters exist. (There might be some issues with
Net-SNMP and multithreaded programs, though.)
> I do have network receive buffers set to larger than normal values.
Yes, that would seem necessary.
> So I don't think that the problem is on the receiver side.
Well, my receiver side has other processes competing for the CPU. I think
that can make a difference.
> Rainer has been doing some tests recently on the sending side that may
> indicate some issues. I haven't had time to get my test systems setup
> again to try more tests.
> David Lang
> On Wed, 15 Dec 2010, Dra?en Ka?ar wrote:
>> I have rsyslog 5.6.2 which is a destination for a lot of traffic which is
>> currently using UDP as a transport protocol. We intend to switch some
>> sources to TCP, but UDP will still be used for a large chunk of data and I
>> can't do anything about that.
>> Current tests show that we lose 100-400 lines in every 100000 sent. That
>> isn't much and we could live with it, but I'd like to do something about
>> it, nevertheless.
>> My guess is that the kernel buffer fills up, rsyslog doesn't manage to
>> make room fast enough and then the kernel drops packages. I'm on CentOS
>> 5.5. Kernel log shows lines like:
>> Dec 8 14:44:22 ssm1 kernel: Initalizing network drop monitor service
>> Dec 13 15:09:26 ssm1 kernel: Initalizing network drop monitor service
>> Curiously, the times do not match the ones for which I'm certain that the
>> package loss occured. But even so, the log shows that packet drop is
>> occuring on the system.
>> As far as I know, two things can be done.
>> 1. Increase UDP receive buffer size. That should allow the system to
>> handle the bigger spike load than it's handling now, but that's about it.
>> There will be a spike load which is too large for any given buffer size.
>> I didn't manage to find a configuration option for this. I can increase
>> the system default, but I dislike doing that because it affects all
>> sockets on the system. If rsyslog really doesn't have this configuration
>> option, I'm willing to write a patch.
>> 2. Put a thread that handles UDP socket in real-time class (or the closest
>> equivalent that a given OS has). The POSIX way for this is to use
>> sched_setscheduler() with SCHED_FIFO. There could be other non-portable
>> interfaces of interest (like priocntl() on Solaris), but I won't discuss
>> them right now.
>> The patch should add two configuration parameters:
>> * one to define scheduling policy (SCHED_FIFO, SCHED_RR, etc.), let's
>> call it $<object>QueueThreadSchedulingPolicy (although I'd like a
>> shorter name)
>> * and another to define sched_priority, let's call it
>> In case of SCHED_SPORADIC (and maybe other non-portable policies on some
>> operating systems) there are additional parameters that can be set, so
>> maybe the second parameter would turn out to be the string with several
>> numbers that needs to be parsed. Or there could be additional
>> configuration options introduced. I don't have a position about that and
>> would like to follow the existing rsyslog policy regarding configuration
>> options. Except that I don't know what that policy is, so I'd appreciate
>> if someone can enlighten me.
>> Anyway, the idea is to have a high priority thread (or a small number of
>> threads) sitting on a UDP socket whose sole purpose in life is to
>> transfer the messages from UDP socket to another queue(s), as fast as
>> possible. By default (if $<object>QueueThreadSchedulingPolicy is defined),
>> the number of threads put in that scheduling policy can be
>> $<object>QueueWorkerThreads, ie. the minimum number of worker threads.
>> There could be additional parameter to define the number of such threads,
>> but I'm not sure how to go about it.
>> There could be no imposed limit, so if an administator puts 128 in the
>> config file, 128 threads it is. But this is potentialy dangerous. Or there
>> could be rsyslog imposed restriction that the number of such threads cannot
>> exceed the number defined in $<object>QueueWorkerThreads, which is less
>> dangerous. I'm personally leaning to the second option.
>> Now, this whole thing is dangerous and must be implemented and used with
>> care and understanding. Currently I don't understand rsyslog queues well
>> enough, which is why I'm writing this mail. The UDP thread in FIFO class
>> should be used just to pick the packet from the socket and place it on
>> another action queue. Let's say we have the following configuration:
>> $Ruleset indata
>> $RulesetCreateMainQueue on
>> $UDPServerRun 514
>> $InputUDPServerBindRuleset indata
>> $MainMsgQueueThreadSchedulingPolicy fifo
>> $MainMsgQueueThreadSchedulingPriority 10
>> $ActionName local1
>> $ActionQueueType LinkedList
>> $ActionQueueFileName local1
>> $ActionResumeRetryCount -1
>> $ActionQueueSaveOnShutdown on
>> $ActionQueueDiscardSeverity 8
>> local1.* |/path/to/fifo
>> & /path/to/file
>> If I understand things correctly, there are two queues here. The first one
>> is ruleset indata queue with the UDP input module and the second one is
>> the action queue for the destinations (FIFO and file). I'm not sure about
>> the following:
>> 1. After a $Ruleset directive followed by "$RulesetCreateMainQueue on",
>> are $MainMsgQueue* directives applied to that ruleset queue? It's not very
>> 2. Say we add a TCP listener (or any other input module) to the above
>> configuration and we want UDP processing thread(s) to have SCHED_FIFO
>> priority, but other input handling threads to stay at the default
>> priority. I'm not sure if that can be specified with one ruleset. If two
>> rulesets must be specified, then I suppose the action directives must be
>> duplicated for each ruleset, which isn't very readable and
>> maintenance-friendly. (This is without taking omruleset plugin into
>> 3. The action queue is disk assisted. Which thread writes to disk when the
>> need arises? The one from the ruleset indata queue or the one from the
>> action queue?
> rsyslog mailing list
.-. .-. Yes, I am an agent of Satan, but my duties are largely
(_ \ / _) ceremonial.
| dave at fly.srk.fer.hr
More information about the rsyslog