[rsyslog-notify] Forum Thread: Re: RSyslog not sending messages - (Mode 'reply')

noreply at adiscon.com
Fri Jan 23 03:54:16 CET 2015


User: dlang 
Forumlink: http://kb.monitorware.com/viewtopic.php?p=25187#p25187

Message: 
----------
[quote]Any idea why a router would cut the
connection?[/quote]
If it's a failover pair of routers, they can fail over. If it's a stateful
packet filter (which most are), there is a timeout: if there is no traffic
for X amount of time, it considers the connection closed. If the number of
connections through a router exceeds its capacity to keep track of
connections, it can 'forget' about some of them (which closes them). All
sorts of things.
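A toy model of the stateful-filter behavior described above, assuming a
simple idle timeout and a fixed-size table that evicts the least recently
active entry when it is full (real connection trackers, e.g. Linux conntrack,
are considerably more involved). ConnTrackTable and its methods are
hypothetical names used only for illustration:

```python
from collections import OrderedDict


class ConnTrackTable:
    """Toy stateful-filter connection table (hypothetical, for illustration)."""

    def __init__(self, idle_timeout, max_entries):
        self.idle_timeout = idle_timeout  # seconds of silence before state is dropped
        self.max_entries = max_entries    # how many connections the box can track
        self._last_seen = OrderedDict()   # connection -> time of last packet, oldest first

    def _expire(self, now):
        # Drop every entry that has been idle longer than the timeout.
        while self._last_seen:
            conn, seen = next(iter(self._last_seen.items()))
            if now - seen <= self.idle_timeout:
                break
            del self._last_seen[conn]

    def open(self, conn, now):
        # A new connection (e.g. a TCP SYN) creates state, evicting the least
        # recently active entry when the table is already full.
        self._expire(now)
        if conn not in self._last_seen and len(self._last_seen) >= self.max_entries:
            self._last_seen.popitem(last=False)
        self._last_seen[conn] = now
        self._last_seen.move_to_end(conn)

    def packet(self, conn, now):
        # Mid-stream packets are forwarded only if state still exists.
        self._expire(now)
        if conn not in self._last_seen:
            return False  # state was forgotten: the connection is effectively dead
        self._last_seen[conn] = now
        self._last_seen.move_to_end(conn)
        return True


table = ConnTrackTable(idle_timeout=300, max_entries=2)
table.open("syslog-tcp", now=0)
print(table.packet("syslog-tcp", now=100))  # True: seen within the timeout
print(table.packet("syslog-tcp", now=500))  # False: idle for 400 s, state dropped
```

Note that neither endpoint is told when the entry disappears; the sender
only finds out later, when its packets stop getting through.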

[quote]I've heard this quite a bit, but don't really understand
it. TCP is supposed to be reliable. If a message is sent and an ACK isn't
received, then the sender will just keep sending. [/quote]

The full data flow is:
The application attempts to deliver the message to the sending system's TCP
stack:
  1. Error, connection closed: re-open the connection and try again.
  2. Error, no space in the sending system's TCP send queue: suspend the
output, wait a bit and try again (when it then works, you get the "action
resumed" message you are seeing).
  3. No error: the message is now in the system TCP stack, the application
can get no more feedback, and it considers the message sent.
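Cases 2 and 3 can be seen from any program, not just rsyslog. This minimal
sketch (a hypothetical helper, using a loopback TCP connection whose
receiving side never reads) keeps writing until the kernel refuses more
data; every write before that point reported success even though no
application ever received a byte:

```python
import socket


def fill_tcp_buffers(chunk_size=65536):
    """Write into a TCP connection whose peer never reads; return bytes accepted."""
    listener = socket.create_server(("127.0.0.1", 0))
    sender = socket.create_connection(listener.getsockname())
    receiver, _ = listener.accept()  # the "remote application" never calls recv()
    sender.setblocking(False)        # report "no space" instead of blocking
    chunk = b"x" * chunk_size
    accepted = 0
    try:
        while True:
            try:
                # Case 3: success only means the kernel copied the bytes
                # into its buffers; nothing has been delivered yet.
                accepted += sender.send(chunk)
            except BlockingIOError:
                # Case 2: local send queue (and the peer's receive queue)
                # are full. A blocking sender would stall here.
                return accepted
    finally:
        for s in (sender, receiver, listener):
            s.close()


print(fill_tcp_buffers() > 0)  # True: data "sent" that nobody has read
```

How many bytes are accepted depends on the operating system's buffer
sizes, so only the fact that the number is non-zero is meaningful.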

The system TCP stack works through its queue and sends packets to the remote
machine:
  1. The remote machine doesn't respond: the sending machine re-sends the
packet after a retransmission timeout that starts short and doubles with
every retry.
  2. The remote machine hasn't responded for a long time (on Linux this is
governed by net.ipv4.tcp_retries2, which by default works out to roughly 15
minutes): the system TCP stack considers the connection 'broken', closes it
and throws away all the data still in its queue. It has no way of telling
the application which of the messages it had already accepted were never
delivered.
  3. The remote machine responds with an ACK and puts the message in its TCP
receive queue.
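A back-of-the-envelope sketch of how long a sender keeps retransmitting
before giving up, assuming plain exponential backoff with a 200 ms minimum
and 120 s maximum retransmission timeout (the real RTO is derived from
measured round-trip times, so treat this as an approximation).
retransmission_window is a hypothetical helper:

```python
def retransmission_window(rto_initial=0.2, rto_max=120.0, retries=15):
    """Approximate seconds a sender keeps retrying unacknowledged data.

    Assumes exponential backoff: the retransmission timeout (RTO) starts at
    rto_initial, doubles after every expiry and is capped at rto_max. The
    connection is given up once the wait after the last retransmission ends.
    """
    waits = [min(rto_initial * 2 ** i, rto_max) for i in range(retries + 1)]
    return sum(waits)


# With the assumed defaults: 0.2 + 0.4 + ... + 102.4 s, then six waits of 120 s.
print(round(retransmission_window(), 1))  # 924.6
```

The result matches the figure the Linux ip-sysctl documentation gives for
the default tcp_retries2 = 15. During that whole window the sending
application has already been told its data was "sent".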

On the remote machine, the application is supposed to check the system TCP
stack's receive queue periodically to see if there are messages for it. If
the application dies, all the messages in its receive queue are lost, and
the receiving TCP stack stops acknowledging packets and tells the sending
system to close the connection, because it can't do anything with the data.
There is no way to tell the sending system which data was ACKed but never
delivered to the application. Rsyslog reads the data from the system TCP
stack and adds it to its main queue for processing. If this is a memory or
disk-assisted queue, the data can still be lost if rsyslog dies before
outputting the message.
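A minimal loopback sketch of the receiving side dying (send_then_receiver_dies
is a hypothetical helper): the sender's write succeeds, the receiving
application exits without reading, and the only thing the sender can ever
observe is that the connection went away, not which data was lost:

```python
import socket


def send_then_receiver_dies(message=b"<13>test message\n"):
    """Return what the sender observes after its peer exits without reading."""
    listener = socket.create_server(("127.0.0.1", 0))
    sender = socket.create_connection(listener.getsockname())
    receiver, _ = listener.accept()
    try:
        sender.sendall(message)  # no error: the kernel has accepted the bytes
        receiver.close()         # receiving application "dies" without recv()
        sender.settimeout(5.0)   # don't hang forever if nothing arrives
        try:
            data = sender.recv(1)
        except ConnectionResetError:
            return "reset"       # Linux answers unread data on close with an RST
        except socket.timeout:
            return "timeout"
        return "eof" if data == b"" else "unexpected data"
    finally:
        for s in (sender, receiver, listener):
            s.close()


print(send_then_receiver_dies())  # typically "reset" on Linux
```

Either way the sender learns only that the connection is gone; the message
it handed to sendall() has silently disappeared.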

TCP only provides reliable delivery if the connection doesn't get closed
and the applications on each end don't exit.

With the application-level ACK that rsyslog sends once it has added the
message to its main queue (this is what RELP provides), the sender now has a
way of knowing that the message got through all the hops in between.
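The idea behind an application-level ACK can be sketched as a sender that
keeps every message until the receiver confirms it and resends whatever is
still unconfirmed after a reconnect. This is a hypothetical illustration of
the principle only; it is not RELP's wire format or rsyslog's code, and
AckTrackingSender and its methods are made-up names:

```python
from itertools import count


class AckTrackingSender:
    """Keep each message until the receiving application acknowledges it."""

    def __init__(self):
        self._next_txnr = count(1)  # transaction numbers 1, 2, 3, ...
        self._unacked = {}          # transaction number -> message, in send order

    def send(self, message):
        txnr = next(self._next_txnr)
        self._unacked[txnr] = message
        # A real sender would now write (txnr, message) to the connection.
        return txnr

    def on_ack(self, txnr):
        # The receiver confirmed the message reached its queue: safe to forget.
        self._unacked.pop(txnr, None)

    def on_connection_lost(self):
        # Anything not yet acked may have died in a buffer somewhere; resend it.
        return list(self._unacked.values())


s = AckTrackingSender()
first = s.send("msg 1")
s.send("msg 2")
s.send("msg 3")
s.on_ack(first)
print(s.on_connection_lost())  # ['msg 2', 'msg 3']
```

The trade-off is at-least-once delivery: if an ACK itself is lost, the
message is sent again, so the receiver may see duplicates.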

The "action 18 resumed" message implies that the action had been suspended,
i.e. it was blocking. This can happen because the connection is getting
closed, or because the machine you are sending to has fallen far enough
behind to fill up the receiving machine's TCP buffers and then the sending
machine's TCP buffers, so that the sending TCP stack refuses to accept the
message from the application. This is where impstats info will really help.
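The suspend/resume cycle can be modelled as a small state machine. This is
a simplified, hypothetical sketch, not rsyslog's implementation: the
transport callable and retry_interval are illustrative stand-ins. A failed
send suspends the action, attempts during the retry interval are skipped,
and the first success afterwards produces a "resumed" event:

```python
class SuspendableAction:
    """Hypothetical output action that suspends itself after a failed send."""

    def __init__(self, transport, retry_interval=30.0):
        self.transport = transport            # callable(msg); raises OSError on failure
        self.retry_interval = retry_interval  # seconds to wait before retrying
        self.suspended_until = None           # None while the action is active
        self.events = []                      # "suspended" / "resumed" notices

    def submit(self, msg, now):
        """Try to deliver msg; return False if the caller must keep it queued."""
        if self.suspended_until is not None and now < self.suspended_until:
            return False                      # still suspended: don't even try
        try:
            self.transport(msg)
        except OSError:                       # e.g. BlockingIOError, ConnectionResetError
            if self.suspended_until is None:
                self.events.append("suspended")
            self.suspended_until = now + self.retry_interval
            return False
        if self.suspended_until is not None:  # first success after a suspension
            self.suspended_until = None
            self.events.append("resumed")
        return True
```

For example, with a transport that raises BlockingIOError until the remote
side catches up, submit() returns False and records "suspended"; once the
retry interval has passed and a send succeeds, it records "resumed", which
corresponds to the message seen in the log.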

You should also look at the stats on the receiving machine, there could be
problems there.

I agree that, from the printouts you are showing, it doesn't look like
there are enough messages to indicate that queues are building up, but the
impstats output will also show how many errors rsyslog had when trying to
output the data.
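If the impstats output is in a plain-text "name: key=value key=value ..."
layout (an assumption here; impstats can also emit other formats such as
JSON), the counters for one action can be pulled out with a few lines. The
sample line below is made up for illustration, and parse_stats_line is a
hypothetical helper:

```python
import re

# key=value pair; the key itself may not contain whitespace or "=".
_PAIR = re.compile(r"([^\s=]+)=(\S+)")


def parse_stats_line(line):
    """Split a 'name: key=value ...' stats line into (name, counters)."""
    first = _PAIR.search(line)
    if first is None:
        return None
    # Everything before the first key=value is the object name, possibly
    # preceded by a syslog tag such as "rsyslogd-pstats:".
    name = line[:first.start()].strip().rstrip(":").split(": ")[-1]
    counters = {}
    for key, value in _PAIR.findall(line[first.start():]):
        counters[key] = int(value) if value.isdigit() else value
    return name, counters


sample = "rsyslogd-pstats: action 18: processed=120 failed=3 suspended=2 resumed=2"
print(parse_stats_line(sample))
# ('action 18', {'processed': 120, 'failed': 3, 'suspended': 2, 'resumed': 2})
```

Comparing such counters between two stats intervals shows whether errors
and suspensions are increasing on the sending or the receiving machine.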

You do have watermark settings, so if there are too many messages in the
queue, some will get thrown away to keep from running out of space. You
didn't post the configuration of the receiving system, so I don't know how
that is configured.

What we don't know here is why it's suspending. If you can run it in debug
mode (rsyslogd -dn) and capture the huge amount of output this produces, you
will probably see the reason the action is being suspended.

This is one of the places where it would be really helpful to convert to
the new syntax (action()): it would make everything that's going on much
easier to read, and any error messages from rsyslog would be clearer about
what's going wrong with this action.

Also, check the sending machine's local logs for any error messages that
rsyslog may be writing. If you can run rsyslog manually instead of through
the system startup scripts, you can see whether it's writing anything to
stderr (most system startup scripts throw this away).

I hope this helps you understand why we say that TCP isn't as reliable as
people think.

