[rsyslog-notify] Forum Thread: Re: RSyslog not sending messages - (Mode 'reply')

noreply at adiscon.com noreply at adiscon.com
Sat Jan 24 07:23:38 CET 2015


User: dlang 
Forumlink: http://kb.monitorware.com/viewtopic.php?p=25192#p25192

Message: 
----------
[quote="lethalduck":2tk2htvv]
[quote:2tk2htvv]if it's a failover pair of routers, they can fail
oer.[/quote:2tk2htvv]

Sure, but that would mean that if the sender sends messages and some are
dropped (no ACK received) then the sender would send again until it
received an ACK, so I don't see how messages could be lost here?
[/quote:2tk2htvv]

when the resent packet hits the new router, it doesn't have the state
needed to know about it and so it ignores the packet (causing a timeout)

This is the normal situation when routers have ACLs on them controlling
what can go through them

[quote="lethalduck":2tk2htvv]Looking at the impstats doc, I put the
following into the rsyslog.conf at the bottom:

[code:2tk2htvv]
# Turn on some internal counters to trouble-shoot missing messages
module(load="impstats"
       interval="600"
       severity="7"
       log.syslog="off"

       # need to turn log stream logging off
       log.file="/var/log/rsyslog-stats.log")
[/code:2tk2htvv]
[/quote:2tk2htvv]
I think it needs to be up at the top not at the end.
[quote="lethalduck":2tk2htvv]Now when I restart rsyslog, I get the
following:

[code:2tk2htvv]rsyslogd-2039:  Could not open output pipe
'/dev/xconsole': No such file or directory [try
http://www.rsyslog.com/e/2039 ][/code:2tk2htvv]

I've never seen this before and it appeared just by adding the snippet
above. Now from what I gather this is because I don't have a /dev/xconsol
but rather a /dev/console.
Any idea why the following is in the rsyslog.conf by default rather than
/dev/console?
[/quote:2tk2htvv]
bad defaults in the distro that have been there forever and never updated.

This error is normally generated, but usually is absorbed by the startup
scripts. It is an error, but it's not related to the problem you are
having.

[quote="lethalduck":2tk2htvv]I change the /dev/xconsole in the rsyslog.conf
to /dev/console and the error goes away. Should it be left as /dev/console
or /dev/xconsole? I'm assuming the former?[/quote:2tk2htvv]

well, it only matters if you have a console up. Historically (20 years ago,
in the era of multi-user systems with serial terminals), there was a system
console that was directly connected to the system. This entry in
syslog.conf was to spit significant messages out to that console so when
the admin came in later for their shift, the message would be on their
display.

I just delete it, and for all the other things that are in there by
default, look at what they are outputting and decide if that's how you want
the data to appear.

I start off with putting everything in /var/log/messages (if something goes
wrong, I want to see all the other logs around it, and grep will let me
ignore what I want to), I then create my own output files based on how I'm
going to process the logs afterwords. I usually do this filtering by
programname, not facility/severity

[quote="lethalduck":2tk2htvv]
[quote:2tk2htvv]2. error, no space in the sending system TCP stack sending
queue[/quote:2tk2htvv]

I can't see why the sending queue would be full with my current volume.
[/quote:2tk2htvv]
It would only be full if the machine receiving the logs isn't processing
them. I agree that it seems odd for your low volume, but I'm listing the
possible things that can happen.
[quote="lethalduck":2tk2htvv]
[quote:2tk2htvv]3. no error, the message is now in the system TCP stack and
the application can get no more feedback, it considers the message
sent[/quote:2tk2htvv]

Last time I was programming with TCP messaging was with WCF. Applications
written with WCF would have insight into the TCP stack, so that when errors
occurred in the TCP stack, the sending application would know about it.
When the sending application sent messages, it didn't just fire and forget,
unless you coded it that way. We had baked in instrumentation that would
tell us what was going on at the transport layer. Does RSyslog not have
anything like this baked in? If not, seems like quite a large over-sight,
yeah?
[/quote:2tk2htvv]
You have some information, but I really doubt that you have as much as you
think you do.

If you have sent 2000 bytes to the socket and the send() calls have all
succeded, how do you tell how many bytes have been sent? how do you tell
how many have been acked? how do you tell if those bytes that have been
acked have been read by the application on the remote machine? When a
connection is closed, how do you tell how many bytes haven't been sent,
haven't been acked, or haven't been read by the application on the far end?

If it is possible to fix this with better insight into the TCP stack,
Rainer would love to do so. but please follow the links in the post he
gave, including those going back to the RFCs and those showing that other
people have also 'discovered' this problem.

[quote="lethalduck":2tk2htvv]
[quote:2tk2htvv]2. remote machine hasn't responded for a long time (2 min
usually), system TCP stack considers the connection 'broken' and closes it,
throwing away all the packets in the queue. It has no way of telling the
application about this.[/quote:2tk2htvv]

So the socket API that RSyslog uses (I have no idea which one it is or how
it works) doesn't provide error messages back to the consuming application
(RSyslog)?
In C, if you write to a socket that has failed with send(), you will get
back the number of bytes that were sent. If this does not match the number
of bytes you meant to send, then you have a problem. Is RSyslog not
checking this?
[/quote:2tk2htvv]
The problem isn't when the send() fails (that happens if the queue is
full), it's when the send() succeeds and the bytes are now in the OS TCP
queues. When send() returns, that doesn't mean that the bytes have left the
machine, it means that they have been accepted by the TCP stack and are
queued for delivery. If you do another send() call fast enough, the TCP
stack will combine the data from both send() calls into the packet that
gets sent over the wire.

I think that this is the core misunderstanding that you have.
[quote="lethalduck":2tk2htvv]
[quote:2tk2htvv]Rsyslog reads the data from the system TCP stack and adds
it to it's main queue for processing. If this is a memory or disk assisted
queue, the data can still be lost if rsyslog dies before outputting the
message[/quote:2tk2htvv]
Right, because it's still in volatile memory right? If it was a disk queue
rather than a disk assisted queue it'd be safe. Correct me if I'm wrong
here.
[/quote:2tk2htvv]
If it's in a disk queue tuned to maximum reliability (which is cripplingly
slow performance wise). Also, don't confuse a pure Disk Queue with a Disk
Assisted Queue.
[quote="lethalduck":2tk2htvv]
[quote:2tk2htvv]action 18 resumed implies that it's blocking. This can be
that the connection is getting closed, or that the machine that you are
sending it to has gotten behind enough to fill up the receiving machines
TCP buffers, and the sending machines TCP buffers, so that the sending TCP
stack refuses to accept the message from the application. This is where
impstats info will really help.[/quote:2tk2htvv]

Now I'm not sure how to read the impstats log. I wasn't sure of where the
definitive documentation of all the output values where?
[/quote:2tk2htvv]
<!-- m --><a class="postlink"
href="http://www.rsyslog.com/doc/master/configuration/modules/impstats.html">http://www.rsyslog.com/doc/master/confi
... stats.html</a><!-- m -->

the following are shortened URLs (trying to get around the forum limit of 2
URLs per message :-))
rsyslog.com/impstats-analyzer/
rsyslog.com/rsyslog-statistic-counter/
rsyslog.com/rsyslog-statistic-counter-actions/
rsyslog.com/rsyslog-statistic-counter-queues/
rsyslog.com/rsyslog-statistic-counter-plugin-impstats/

[quote="lethalduck":2tk2htvv]
In saying that, I'm seeing some quite interesting patterns around the time
we get the two missing messages and the [code:2tk2htvv]action 'action 18'
resumed[/code:2tk2htvv]
I'm not sure what the action numbers refer to in the output of impstats.
[/quote:2tk2htvv]
They are numbered in the order that they appear in the file. If you use the
new action() syntax, you can add an entry "name"="something" that changes
the labels in the stats from actionX to "something"

In this case, you know there is something going wrong with action 18, so
that is what I would be most interested in.
[quote="lethalduck":2tk2htvv]
action 2 and 15 at 06:18 has 3 more processed than the previous impstats
runs. Any idea what this mean?
imuxsock: submitted also has 3 more than the previous impstats runs. Any
idea what this means?
action 17, 18 and enqueued at 06:18 has 5 more processed than the previous
impstats runs. Any idea what this means?
The main Q also has 5 more than previous impstats runs. Any idea what this
means?
[/quote:2tk2htvv]
since you didn't configure impstats to reset the counters each time it's
run, these numbers are cumulative since rsyslog started.
[quote="lethalduck":2tk2htvv]
As I'm unfamiliar with this output... and I'm not sure if there are any
other relevant numbers here? What is this telling us?
I would have thought that at the impstats 06:08 run which was after the
first missed message (which should have been logged to papertrail at
06:06:15 would have showed us some change in the pattern, but we don't see
the pattern change until the subsequent impstats run?

[quote:2tk2htvv]You should also look at the stats on the receiving machine,
there could be problems there.[/quote:2tk2htvv]

I don't have access to papertrail's servers. Just to the web UI that they
expose. I've asked them for some feedback on this though.
[/quote:2tk2htvv]
Ok, I have been misunderstanding things here. I thought you were sending
logs from rsyslog to rsyslog. If you are sending from rsyslog to someone
else, and have no way to look at their servers, it makes things MUCH more
complicated. You are going to end up having to do a tcpdump of the
connection and look at the data that was sent to see if it was sent from
your system, acked by their system, or what.

Given how widely rsyslog is used, I would strongly suspect a problem on
their end receiving the data before I would expect a problem in rsyslog
sending the data (with the acknowlegement of sources of loss with any TCP
sender that we've been talking about)
[quote="lethalduck":2tk2htvv]
[quote:2tk2htvv]the impstats output will also show how many errors rsyslog
had in trying to output the data.[/quote:2tk2htvv]

I didn't see any errors. Am I misinterpreting the output or is there simply
not enough evidence of errors?
[/quote:2tk2htvv]
I don't see any errors in the snippet you attached. Are you sure that this
includes time after the logs talking about action 18 being resumed? I would
expect to see the action18 line with non-zero values in the time after that
log entry was made.

In the timeframe provided, the queue size hit a max of 2, both on the main
queue and on the seprate queue for action 18, but it had no errors and was
never syspended or resumed (and a total of 141 messages processed)
[quote="lethalduck":2tk2htvv]
[quote:2tk2htvv]This is one of the places where it would be really helpful
to convert to the new syntax (the action()), it would make everything
that's going on much easier to read and any error messages from rsyslog
would be clearer about what's going wrong with this action[/quote:2tk2htvv]

Any more detail around this "new syntax"?
1) What is it?
2) How does it work?
3) What does it give us?
4) What do you mean by convert to it?
5) How do we convert to it?
[/quote:2tk2htvv]
Starting with v6, rsyslog introduced a new way of specifying outputs.
Instead of setting a bunch of variables, some of which persist for the rest
of the config, and some of which only apply to the next action, and some of
which override any previous definition of the same variable, you instead
define an action() where all the parameters for the action are defined
inside the action() call

for example, per <!-- m --><a class="postlink"
href="http://www.rsyslog.com/doc/master/configuration/modules/omfile.html">http://www.rsyslog.com/doc/master/confi
... mfile.html</a><!-- m -->
[code:2tk2htvv]
$DirCreateMode 0700
$FileCreateMode 0644
*.* /var/log/messages
[/code:2tk2htvv]
vs
[code:2tk2htvv]
action(type="omfile" dirCreateMode="0700" FileCreateMode="0644"
File="/var/log/messages")
[/code:2tk2htvv]
with your config for action 18 there is are a lot more parameters obviously
;-)
look at <!-- m --><a class="postlink"
href="http://www.rsyslog.com/doc/omfwd.html">http://www.rsyslog.com/doc/omfwd.html</a><!--
m --> for the available parameters

to convert the config, change the outputs to action() calls. In this case
you won't win much clarity by changing the inputs and module load calls,
but you could do that as well. the nice thing is that you con't have to
convert everything. Use the new syntax when it makes things clearer, and
stick with the old syntax when it makes it clearer.

[quote="lethalduck":2tk2htvv]There doesn't appear to be any errors in the
syslog. See previously attached log file. I just checked this mornings one
where we have the missing messages again. Same deal, no errors. rsyslog
just says it sent the two missing messages. Papertrail never got them, or
at least that's what the papertrail web UI says.
Nothing unexpected in /var/log/messages or
/var/log/daemon.log.[/quote:2tk2htvv]
As i note above, without being able to see what's going on with the
receiving system, it gets a lot uglier.
When you do the tcpdump, you will want to do -s 0 so that you capture all
the data, not just the default of the first 200 bytes of each packet.


More information about the rsyslog-notify mailing list