[rsyslog-notify] Forum Thread: Re: RSyslog not sending messages - (Mode 'reply')

noreply at adiscon.com noreply at adiscon.com
Sat Jan 24 06:11:07 CET 2015


User: lethalduck 
Forumlink: http://kb.monitorware.com/viewtopic.php?p=25191#p25191

Message: 
----------
First of all, thanks a lot for your time and explanations.

@rgerhards Thanks for that and yeah, I've read that before. Unfortunately
papertrail doesn't support RELP and they were the easiest and must feature
full service to ship event messages to real-time.
I've had and still am having lengthy discussions with them about why they
don't. Pretty much the answer is that no one else is experiencing any
message loss. My reply to that is "well they're probably not actually doing
comparisons on what is sent and what is received". The fact that my system
is sending such a low volume of messages makes it easier to see.

[quote:34y831r4]if it's a failover pair of routers, they can fail
oer.[/quote:34y831r4]

Sure, but that would mean that if the sender sends messages and some are
dropped (no ACK received) then the sender would send again until it
received an ACK, so I don't see how messages could be lost here?

Looking at the impstats doc, I put the following into the rsyslog.conf at
the bottom:

[code:34y831r4]
# Turn on some internal counters to trouble-shoot missing messages
module(load="impstats"
       interval="600"
       severity="7"
       log.syslog="off"

       # need to turn log stream logging off
       log.file="/var/log/rsyslog-stats.log")
[/code:34y831r4]

Now when I restart rsyslog, I get the following:

[code:34y831r4]rsyslogd-2039:  Could not open output pipe
'/dev/xconsole': No such file or directory [try
http://www.rsyslog.com/e/2039 ][/code:34y831r4]

I've never seen this before and it appeared just by adding the snippet
above. Now from what I gather this is because I don't have a /dev/xconsol
but rather a /dev/console.
Any idea why the following is in the rsyslog.conf by default rather than
/dev/console?

[code:34y831r4]
daemon.*;mail.*;\
        news.err;\
        *.=debug;*.=info;\
        *.=notice;*.=warn       |/dev/xconsole

[/code:34y831r4]

I see the /etc/init.d/rsyslog has a create_xconsole function. The system
that rsyslog is running on is a server, any idea why we have xconsole all
over the place? Inside this function it's dealing with /dev/xconsole which
again doesn't exist?
I change the /dev/xconsole in the rsyslog.conf to /dev/console and the
error goes away. Should it be left as /dev/console or /dev/xconsole? I'm
assuming the former?

[quote:34y831r4]2. error, no space in the sending system TCP stack sending
queue[/quote:34y831r4]

I can't see why the sending queue would be full with my current volume.

[quote:34y831r4]3. no error, the message is now in the system TCP stack and
the application can get no more feedback, it considers the message
sent[/quote:34y831r4]

Last time I was programming with TCP messaging was with WCF. Applications
written with WCF would have insight into the TCP stack, so that when errors
occurred in the TCP stack, the sending application would know about it.
When the sending application sent messages, it didn't just fire and forget,
unless you coded it that way. We had baked in instrumentation that would
tell us what was going on at the transport layer. Does RSyslog not have
anything like this baked in? If not, seems like quite a large over-sight,
yeah?

[quote:34y831r4]2. remote machine hasn't responded for a long time (2 min
usually), system TCP stack considers the connection 'broken' and closes it,
throwing away all the packets in the queue. It has no way of telling the
application about this.[/quote:34y831r4]

So the socket API that RSyslog uses (I have no idea which one it is or how
it works) doesn't provide error messages back to the consuming application
(RSyslog)?
In C, if you write to a socket that has failed with send(), you will get
back the number of bytes that were sent. If this does not match the number
of bytes you meant to send, then you have a problem. Is RSyslog not
checking this?

[quote:34y831r4]On the remote machine, the application is supposed to be
checking the system TCP stack receive queue to see if there are messages
for it periodically. If it dies, all the messages in the receive queue are
lost and the system TCP queue will stop sending acks for packets and
attempt to tell the sending system to close the connection because it can't
do anything with the data. There is no way to tell the sending system what
data was acked that wasn't delivered to the application.[/quote:34y831r4]

Gotcha. This makes sense.

[quote:34y831r4]Rsyslog reads the data from the system TCP stack and adds
it to it's main queue for processing. If this is a memory or disk assisted
queue, the data can still be lost if rsyslog dies before outputting the
message[/quote:34y831r4]
Right, because it's still in volatile memory right? If it was a disk queue
rather than a disk assisted queue it'd be safe. Correct me if I'm wrong
here.

[quote:34y831r4]action 18 resumed implies that it's blocking. This can be
that the connection is getting closed, or that the machine that you are
sending it to has gotten behind enough to fill up the receiving machines
TCP buffers, and the sending machines TCP buffers, so that the sending TCP
stack refuses to accept the message from the application. This is where
impstats info will really help.[/quote:34y831r4]

Now I'm not sure how to read the impstats log. I wasn't sure of where the
definitive documentation of all the output values where?
In saying that, I'm seeing some quite interesting patterns around the time
we get the two missing messages and the [code:34y831r4]action 'action 18'
resumed[/code:34y831r4]
I'm not sure what the action numbers refer to in the output of impstats.
action 2 and 15 at 06:18 has 3 more processed than the previous impstats
runs. Any idea what this mean?
imuxsock: submitted also has 3 more than the previous impstats runs. Any
idea what this means?
action 17, 18 and enqueued at 06:18 has 5 more processed than the previous
impstats runs. Any idea what this means?
The main Q also has 5 more than previous impstats runs. Any idea what this
means?
As I'm unfamiliar with this output... and I'm not sure if there are any
other relevant numbers here? What is this telling us?
I would have thought that at the impstats 06:08 run which was after the
first missed message (which should have been logged to papertrail at
06:06:15 would have showed us some change in the pattern, but we don't see
the pattern change until the subsequent impstats run?

[quote:34y831r4]You should also look at the stats on the receiving machine,
there could be problems there.[/quote:34y831r4]

I don't have access to papertrail's servers. Just to the web UI that they
expose. I've asked them for some feedback on this though.

[quote:34y831r4]the impstats output will also show how many errors rsyslog
had in trying to output the data.[/quote:34y831r4]

I didn't see any errors. Am I misinterpreting the output or is there simply
not enough evidence of errors?

[quote:34y831r4]You do have the watermark settings so that if there are too
many messages in the queue, some will get thrown away to keep from running
out of space. You didn't post the configuration of the receiving system, so
i don't know how that is configured.[/quote:34y831r4]

>From what I'm seeing, it doesn't look like there are messages sitting in
the queue when impstats runs. I doubt very much that messages would get
thrown away in this case as the ActionQueueDiscardMark is so high (97500)?
I don't have access to the receiving system (papertrail) config, although I
have expressed the interest in knowing how they're config looks. My guess
is that they don't use RSyslog, so it may not mean much to us, but I'll
keep pushing.

[quote:34y831r4]what we don't know here is why it's suspending. If you
could run it in debug mode (-dn) and capture the huge amount of output that
this will produce, you will probably see the reason for the action being
syspended.[/quote:34y831r4]
[quote:34y831r4]startup scripts, you can see if it's writing anything out
to stderr (most system startup scripts throw this away)[/quote:34y831r4]

Will do. Will let you know what we find.

[quote:34y831r4]This is one of the places where it would be really helpful
to convert to the new syntax (the action()), it would make everything
that's going on much easier to read and any error messages from rsyslog
would be clearer about what's going wrong with this action[/quote:34y831r4]

Any more detail around this "new syntax"?
1) What is it?
2) How does it work?
3) What does it give us?
4) What do you mean by convert to it?
5) How do we convert to it?

[quote:34y831r4]Also, check the sending machine's local logs for any error
messages that rsyslog may be writing.[/quote:34y831r4]

There doesn't appear to be any errors in the syslog. See previously
attached log file. I just checked this mornings one where we have the
missing messages again. Same deal, no errors. rsyslog just says it sent the
two missing messages. Papertrail never got them, or at least that's what
the papertrail web UI says.
Nothing unexpected in /var/log/messages or /var/log/daemon.log.


More information about the rsyslog-notify mailing list