[rsyslog-notify] Forum Thread: Re: RSyslog not sending messages - (Mode 'reply')

noreply at adiscon.com
Sun Jan 25 03:57:36 CET 2015


User: dlang 
Forumlink: http://kb.monitorware.com/viewtopic.php?p=25197#p25197

Message: 
----------
[quote="lethalduck"][quote]when the resent packet hits
the new router, it doesn't have the state needed to know about it and so it
ignores the packet (causing a timeout)[/quote]

Ah, of course, unless the fresh router shares the state also, which from
memory is quite common?
[/quote]
Yes and no. You can set routers to share state, but it's not as common as
you would think, and it's also not as reliable as you would think.

If you think about what the router has to do to handle a packet and update
its state, you will realize that it doesn't want to hold each packet while
it sends a state update to the other router and waits for an ack that the
secondary router received it. Instead, the router updates its own state
and queues an update to be sent to the other router in the near future, and
that update gets combined with many other updates for efficiency.

So the replicated state is always going to lag behind. If the connection
isn't very busy, you have good odds that the state was replicated before
the router failed, but it's not as reliable as the marketing blurbs make it
sound.
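To make the lag concrete, here is a minimal Python sketch, assuming a hypothetical active/standby router pair where the active router records each new connection in its own table immediately but only pushes queued updates to the standby every `flush_interval` ticks. The class names, the tick model, and `connections_lost_on_failover` are made up for illustration; real routers use vendor-specific state-sync protocols.

```python
from dataclasses import dataclass, field


@dataclass
class StandbyRouter:
    # Connection table as last replicated from the active router.
    known: set = field(default_factory=set)


@dataclass
class ActiveRouter:
    standby: StandbyRouter
    flush_interval: int  # ticks between batched state updates (hypothetical parameter)
    known: set = field(default_factory=set)
    pending: list = field(default_factory=list)

    def new_connection(self, conn_id: str) -> None:
        # The packet is forwarded right away; the state change is only queued.
        self.known.add(conn_id)
        self.pending.append(conn_id)

    def tick(self, now: int) -> None:
        # Queued updates are shipped together in one batch for efficiency.
        if now % self.flush_interval == 0 and self.pending:
            self.standby.known.update(self.pending)
            self.pending.clear()


def connections_lost_on_failover(events, fail_at, flush_interval):
    """Return connections the standby has never heard of when the active router fails.

    events: list of (tick, conn_id) pairs, one new connection per entry.
    """
    standby = StandbyRouter()
    active = ActiveRouter(standby, flush_interval)
    by_tick = {}
    for tick, conn_id in events:
        by_tick.setdefault(tick, []).append(conn_id)
    for now in range(fail_at):
        for conn_id in by_tick.get(now, []):
            active.new_connection(conn_id)
        active.tick(now)
    # Anything still queued dies with the active router.
    return sorted(active.known - standby.known)
```

For example, `connections_lost_on_failover([(1, "a"), (3, "b"), (6, "c"), (7, "d")], fail_at=8, flush_interval=5)` returns `["c", "d"]`: both connections were opened after the last batched flush at tick 5, so the standby would ignore their next packets. Connections that were set up long before the failure are already replicated, which is why quiet connections usually survive.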

[quote="lethalduck"]
[quote]but please follow the links in the post he gave, including
those going back to the RFCs and those showing that other people have also
'discovered' this problem.[/quote]

At this stage, until it's determined that papertrail is at fault, I can't do
much about using RELP. If they are not at fault, then it's something on my
end, which I still need to find out.
[/quote]
I meant that you should read the links to understand the nuances of how TCP
can still lose data.
[quote="lethalduck"]
I've attached the relevant impstatsOutput1.txt with some comments.
I've attached the relevant rsyslog-debug1.log. What stands out to me in
this is the [code]TCPSendBuf error -2078, destruct TCP Connection![/code]
just after Jan 25 07:46:24, or maybe the
[code]unexpected GnuTLS error [/code] just above it.
[/quote]
I think you just found the smoking gun.

Something in GnuTLS is not happy, so the connection is getting closed, and
that's when data loss is going to happen.
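A rough Python model of why the teardown matters, assuming a sender that, like a plain TCP socket, reports success as soon as a message is copied into its local send buffer. It is a simplification (it ignores TLS framing and the case where the peer acknowledged bytes its application never processed), and `BufferedSender` is a hypothetical name, not rsyslog's implementation.

```python
class BufferedSender:
    """Toy model of a stream sender: send() succeeds once data is buffered locally."""

    def __init__(self):
        # Accepted by send() but not yet acknowledged by the peer
        # (this lumps together "still queued" and "in flight").
        self.unacked = []
        # Acknowledged by the peer.
        self.delivered = []

    def send(self, msg: str) -> bool:
        # Like a successful TCP write: the caller only learns the data was queued.
        self.unacked.append(msg)
        return True

    def peer_acks(self, count: int) -> None:
        # The peer acknowledges the oldest `count` buffered messages.
        self.delivered.extend(self.unacked[:count])
        del self.unacked[:count]

    def destroy_connection(self) -> list:
        # Tearing the connection down discards whatever is still unacknowledged.
        # The caller already saw send() succeed, so nothing resends these messages.
        lost, self.unacked = self.unacked, []
        return lost
```

Sending `"msg1"`, `"msg2"`, and `"msg3"`, letting the peer acknowledge one, and then destroying the connection returns `["msg2", "msg3"]` from `destroy_connection()`, even though every `send()` returned `True`. That gap between "accepted locally" and "received by the other side" is what an application-level acknowledgement protocol is meant to close.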

What versions of rsyslog and GnuTLS are you using? I know that there has
been work in that area not too long ago. Unless you are using a fairly
current version (7.6 may be current enough, or it may need to be 8.x), you
are missing fixes for some nasty bugs in GnuTLS as well as some fixes in
rsyslog.

[quote="lethalduck"]Now as far as I can see from wireshark, the
event I'm expecting to see in the papertrail web UI for 07:36:24 is sent
and acknowledged by papertrail. The packets look exactly the same as the
successful 07:26:24 event apart from the obvious sequence numbers.
For the event I'm expecting to see in papertrail at 07:46:24 a new
connection is being established (TCP handshake) preceded by a DNS query.
My server then sends 7 TCP Dup ACKs to papertrail.[/quote]
And the trouble is that since it's encrypted, you can't actually see what
data got sent and acked vs. what data got sent and not acked. The GnuTLS
handshake adds a lot of extra data over the wire.

I've seen a lot of people struggling with GnuTLS headaches over the last
year, but it's not something I've had to deal with (I work over either
internal networks or WAN links that are already encrypted).

Since it starts off working and then fails, I would guess that there is
some subtle incompatibility between the version of GnuTLS on your machine
and what's on the other end. Things work at first, then their system
sends a message that your GnuTLS doesn't understand, so your system kills
the connection and reconnects.

