[rsyslog] Losing UDP packages
dave at fly.srk.fer.hr
Thu Dec 16 17:48:38 CET 2010
david at lang.hm wrote:
> On Thu, 16 Dec 2010, Dražen Kačar wrote:
>> I believe it can be done without message loss. However, in my setup there
>> are other programs on the machine which can use a lot of CPU time, so
>> rsyslog might not be scheduled on a CPU fast enough. There is one CPU with
four cores; one of them will be taken by a SCHED_FIFO process belonging to
>> clusterware whenever it needs it and I'd like to do the same arrangement
>> for rsyslog's UDP receiver thread. That leaves two cores for everything
>> else, which should be perfectly fine. (There is a potential issue with
>> memory in case the machine starts swapping, but I'm not trying to solve
>> that right now.)
> this is where increasing the kernel buffers can save you by giving you
> more leeway in when rsyslog _needs_ cpu to avoid loss.
> the other thing that you can do is to boost the priority of rsyslog with nice.
Using the SCHED_FIFO scheduling class should be better than boosting the nice
value. SCHED_FIFO is basically a real-time scheduling class. If the thread
is runnable, it gets the CPU immediately (unless there are other RT
threads with higher priority) and stays on it as long as it wants. I'm not
sure if the Linux kernel handles priority inversion properly, but even if it
doesn't, I hope that won't be a problem.
> ideally you would do this only to the thread that is handling the
> UDP reception, but I don't know a good way to find the pid of that thread.
I suppose it can be done, but pthread_setschedparam(pthread_self(),
SCHED_FIFO, ...) is portable, simpler, and better for my purposes. It is also
slightly dangerous if one doesn't have enough processors for all RT threads.
>> Is there any way to get some statistics from rsyslog? It would be nice if
>> it could write some counters in a file every hour or so. For example: the
>> number of messages received by each of the input plugins and the number of
>> messages written by each of the output plugins.
> there is a stats module that was created fairly recently.
Thanks, I'll take a look.
>> Well, my receiver side has other processes competing for the CPU. I think
>> that can make a difference.
> one other performance tweak, there is an option to tell rsyslog not to
> lookup the time for every message, but only do it every N messages when
> there are a lot of messages to process. Setting this can help (the biggest
> win is at the beginning, 2 saves 1/2 the lookups of 1, 4 saves 3/4 the
> lookups, 10 saves 9/10 the lookups, etc)
I know, I've done that already.
> David Lang
>>> Rainer has been doing some tests recently on the sending side that may
>>> indicate some issues. I haven't had time to get my test systems setup
>>> again to try more tests.
>>> David Lang
>>> On Wed, 15 Dec 2010, Dražen Kačar wrote:
>>>> I have rsyslog 5.6.2 which is a destination for a lot of traffic which is
>>>> currently using UDP as a transport protocol. We intend to switch some
>>>> sources to TCP, but UDP will still be used for a large chunk of data and I
>>>> can't do anything about that.
>>>> Current tests show that we lose 100-400 lines in every 100000 sent. That
>>>> isn't much and we could live with it, but I'd like to do something about
>>>> it, nevertheless.
>>>> My guess is that the kernel buffer fills up, rsyslog doesn't manage to
>>>> make room fast enough and then the kernel drops packets. I'm on CentOS
>>>> 5.5. Kernel log shows lines like:
>>>> Dec 8 14:44:22 ssm1 kernel: Initalizing network drop monitor service
>>>> Dec 13 15:09:26 ssm1 kernel: Initalizing network drop monitor service
>>>> Curiously, the times do not match the ones for which I'm certain that the
>>>> packet loss occurred. But even so, the log shows that packet drop is
>>>> occurring on the system.
>>>> As far as I know, two things can be done.
>>>> 1. Increase UDP receive buffer size. That should allow the system to
>>>> handle a bigger spike load than it does now, but that's about it.
>>>> There will be a spike load which is too large for any given buffer size.
>>>> I didn't manage to find a configuration option for this. I can increase
>>>> the system default, but I dislike doing that because it affects all
>>>> sockets on the system. If rsyslog really doesn't have this configuration
>>>> option, I'm willing to write a patch.
>>>> 2. Put a thread that handles UDP socket in real-time class (or the closest
>>>> equivalent that a given OS has). The POSIX way for this is to use
>>>> sched_setscheduler() with SCHED_FIFO. There could be other non-portable
>>>> interfaces of interest (like priocntl() on Solaris), but I won't discuss
>>>> them right now.
>>>> The patch should add two configuration parameters:
>>>> * one to define scheduling policy (SCHED_FIFO, SCHED_RR, etc.), let's
>>>> call it $<object>QueueThreadSchedulingPolicy (although I'd like a
>>>> shorter name)
>>>> * and another to define sched_priority, let's call it
>>>> $<object>QueueThreadSchedulingPriority.
>>>> In case of SCHED_SPORADIC (and maybe other non-portable policies on some
>>>> operating systems) there are additional parameters that can be set, so
>>>> maybe the second parameter would turn out to be the string with several
>>>> numbers that needs to be parsed. Or there could be additional
>>>> configuration options introduced. I don't have a position about that and
>>>> would like to follow the existing rsyslog policy regarding configuration
>>>> options. Except that I don't know what that policy is, so I'd appreciate
>>>> if someone can enlighten me.
>>>> Anyway, the idea is to have a high priority thread (or a small number of
>>>> threads) sitting on a UDP socket whose sole purpose in life is to
>>>> transfer the messages from UDP socket to another queue(s), as fast as
>>>> possible. By default (if $<object>QueueThreadSchedulingPolicy is defined),
>>>> the number of threads put in that scheduling policy can be
>>>> $<object>QueueWorkerThreads, i.e. the minimum number of worker threads.
>>>> There could be an additional parameter to define the number of such threads,
>>>> but I'm not sure how to go about it.
>>>> There could be no imposed limit, so if an administrator puts 128 in the
>>>> config file, 128 threads it is. But this is potentially dangerous. Or there
>>>> could be an rsyslog-imposed restriction that the number of such threads cannot
>>>> exceed the number defined in $<object>QueueWorkerThreads, which is less
>>>> dangerous. I'm personally leaning toward the second option.
>>>> Now, this whole thing is dangerous and must be implemented and used with
>>>> care and understanding. Currently I don't understand rsyslog queues well
>>>> enough, which is why I'm writing this mail. The UDP thread in FIFO class
>>>> should be used just to pick the packet from the socket and place it on
>>>> another action queue. Let's say we have the following configuration:
>>>> $Ruleset indata
>>>> $RulesetCreateMainQueue on
>>>> $UDPServerRun 514
>>>> $InputUDPServerBindRuleset indata
>>>> $MainMsgQueueThreadSchedulingPolicy fifo
>>>> $MainMsgQueueThreadSchedulingPriority 10
>>>> $ActionName local1
>>>> $ActionQueueType LinkedList
>>>> $ActionQueueFileName local1
>>>> $ActionResumeRetryCount -1
>>>> $ActionQueueSaveOnShutdown on
>>>> $ActionQueueDiscardSeverity 8
>>>> local1.* |/path/to/fifo
>>>> & /path/to/file
>>>> If I understand things correctly, there are two queues here. The first one
>>>> is ruleset indata queue with the UDP input module and the second one is
>>>> the action queue for the destinations (FIFO and file). I'm not sure about
>>>> the following:
>>>> 1. After a $Ruleset directive followed by "$RulesetCreateMainQueue on",
>>>> are $MainMsgQueue* directives applied to that ruleset queue? It's not very
>>>> clear to me.
>>>> 2. Say we add a TCP listener (or any other input module) to the above
>>>> configuration and we want UDP processing thread(s) to have SCHED_FIFO
>>>> priority, but other input handling threads to stay at the default
>>>> priority. I'm not sure if that can be specified with one ruleset. If two
>>>> rulesets must be specified, then I suppose the action directives must be
>>>> duplicated for each ruleset, which isn't very readable and
>>>> maintenance-friendly. (This is without taking the omruleset plugin into
>>>> account.)
>>>> 3. The action queue is disk assisted. Which thread writes to disk when the
>>>> need arises? The one from the ruleset indata queue or the one from the
>>>> action queue?
>>> rsyslog mailing list
.-. .-. Yes, I am an agent of Satan, but my duties are largely
(_ \ / _) ceremonial.
| dave at fly.srk.fer.hr