Red Hat Bugzilla – Bug 164863
OpenSSH keepalives do not work correctly when using IPVS
Last modified: 2007-11-30 17:07:08 EST
Description of problem:
I'm using a few RHEL3 machines clustered together with IPVS to load balance SSH connections. Users are frequently disconnected from the RHEL3 machines. Sometimes the disconnects occur during idle times, other times they occur while the users are actively typing on the command line or in an editor, etc. I get the following messages (repeatedly) in the /var/log/secure log file when I turn the log level to DEBUG:
sshd: debug1: Got CHANNEL_FAILURE for keepalive
I compiled a newer version of OpenSSH (v4.1) and started that instead. Now I see the following lines in the /var/log/secure log file:
sshd: debug1: Got 100/105 for keepalive
This seems to indicate that the keepalives are working, and I haven't seen a disconnect yet. The problem may only occur when the machine is accessed via the IPVS IP address rather than directly at its normally assigned address.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
It is hard to describe actual steps or results as the behavior doesn't seem to have a pattern of occurrence.
Could you try setting ClientAliveInterval to some reasonable value in
/etc/ssh/sshd_config? This makes the server send a keepalive request through the
encrypted channel whenever no data has been received from the client for the
configured number of seconds.
You must use protocol 2 for this setting to have the desired effect.
This should be more reliable than TCP keepalives.
I had the following keepalive configuration directives set:
I searched the web for information on these directives, and what I found
indicated that the above configuration was reasonable, giving 45 seconds of
no response from the client before the server closed the connection. The
Protocol directive was also at its default 2,1 setting, and the clients are
able to communicate using protocol version 2.
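For reference, the arithmetic behind a figure like 45 seconds can be sketched as follows. The directives actually set are not shown in this report, so the values 15 and 3 below are assumptions chosen only because they multiply to 45, and the function name is hypothetical.

```python
def unresponsive_timeout(client_alive_interval, client_alive_count_max):
    """Approximate seconds of client silence before sshd closes the session.

    sshd sends one keepalive request every `client_alive_interval` seconds
    and gives up after `client_alive_count_max` requests go unanswered.
    Returns None when the interval is 0, which disables these requests.
    """
    if client_alive_interval <= 0:
        return None
    return client_alive_interval * client_alive_count_max


# Assumed values, not taken from the report: ClientAliveInterval 15,
# ClientAliveCountMax 3.
print(unresponsive_timeout(15, 3))
```

With those assumed values the result is 45, which matches the window described above.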
Hmm, I overlooked this part of your report: "Sometimes the disconnects occur
during idle times, other times they occur while the users are actively typing on
the command line or in an editor, etc."
This means the problem is not caused by the keepalives. The "Got CHANNEL_FAILURE
for keepalive" message actually means that they work. The wording of this
message was changed in newer OpenSSH versions.
Could you verify that the problem is fixed in the 4.1p1 version, for example by
running it experimentally for a longer period? Could you also attach debugging
logs from both the client and the server for sessions in which the connection
drops? What exact failure occurs?
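One way to check over a longer run that keepalive replies keep arriving is to count them in the server's debug log. The sketch below is only an illustration: it assumes the two message formats quoted earlier in this report are the only ones of interest, and the function name and sample lines are hypothetical.

```python
import re
from collections import Counter

# Matches both keepalive reply formats quoted in this report:
#   "debug1: Got CHANNEL_FAILURE for keepalive"  (older sshd)
#   "debug1: Got 100/105 for keepalive"          (4.1 build)
KEEPALIVE_RE = re.compile(r"debug1: Got (CHANNEL_FAILURE|\d+/\d+) for keepalive")


def count_keepalive_replies(lines):
    """Count keepalive reply messages in an iterable of log lines, by reply type."""
    counts = Counter()
    for line in lines:
        match = KEEPALIVE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Sample lines in the style of the report; a real run would iterate
    # over the lines of /var/log/secure instead.
    sample = [
        "sshd: debug1: Got CHANNEL_FAILURE for keepalive",
        "sshd: debug1: Got 100/105 for keepalive",
        "sshd: some unrelated message",
    ]
    for reply, n in count_keepalive_replies(sample).items():
        print(reply, n)
```

A steady count of replies right up to the moment a session drops would support the view that the keepalives themselves are not at fault.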
This problem will be resolved in a future major release of Red Hat Enterprise
Linux. Red Hat does not currently plan to provide a resolution for this in a Red
Hat Enterprise Linux update for currently deployed systems.
With the goal of minimizing risk of change for deployed systems, and in response
to customer and partner requirements, Red Hat takes a conservative approach when
evaluating changes for inclusion in maintenance updates for currently deployed
products. The primary objectives of update releases are to enable new hardware
platform support and to resolve critical defects.