From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.10) Gecko/20050715 Firefox/1.0.6 NLD/1.0.6-4.2

Description of problem:
I'm using a few RHEL3 machines clustered together with IPVS to load balance SSH connections. Users are frequently disconnected from the RHEL3 machines. Sometimes the disconnects occur during idle times; other times they occur while the users are actively typing on the command line, in an editor, etc.

I get the following message (repeatedly) in the /var/log/secure log file when I turn the log level to DEBUG:

sshd[3219]: debug1: Got CHANNEL_FAILURE for keepalive

I compiled a new version of ssh (v4.1) and started that. Now I see the following lines in the /var/log/secure log file:

sshd[23241]: debug1: Got 100/105 for keepalive

This seems to indicate that the keepalives are working okay, and I haven't seen a disconnect yet.

It seems that the problem may only occur when the machine is accessed via the IPVS IP address and not directly at its normally assigned address.

Version-Release number of selected component (if applicable):
3.6.1p2-33.30.4

How reproducible:
Sometimes

Steps to Reproduce:
It is hard to describe actual steps or results, as the behavior doesn't seem to have a pattern of occurrence.

Additional info:
Could you try setting ClientAliveInterval in /etc/ssh/sshd_config to some reasonable value? This makes the server send a keepalive request through the encrypted channel whenever no data has been received from the client for the number of seconds given by the option. You must use protocol 2 for this setting to have the desired effect. This should be more reliable than TCP keepalives.
I had the following keepalive configuration directives set:

ClientAliveInterval 15
ClientAliveCountMax 3

I looked on the web for information on these directives, and the information I found seemed to indicate that the above configuration was fairly decent, giving 45 seconds (15 seconds x 3 unanswered keepalives) of no response from the client before the server closed the connection. The protocol was also at the default 2,1 setting, and the clients are able to communicate with protocol version 2.
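As a quick check of the 45-second figure, here is a minimal Python sketch that reads the two directives from sshd_config-style text and multiplies them. The function name `client_alive_timeout` is made up for illustration, and the sketch assumes plain integer values in seconds, whitespace-separated arguments, and no Match blocks; it is not how sshd itself parses its configuration.

```python
def client_alive_timeout(config_text):
    """Return seconds of client silence before sshd gives up, or None if disabled.

    Hypothetical helper: assumes integer values and ignores Match blocks.
    """
    # Defaults documented for sshd: interval 0 (disabled), count max 3.
    settings = {"clientaliveinterval": 0, "clientalivecountmax": 3}
    seen = set()
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        parts = line.split()
        if len(parts) < 2:
            continue
        key = parts[0].lower()  # keywords are case-insensitive
        if key in settings and key not in seen:
            settings[key] = int(parts[1])  # first occurrence wins
            seen.add(key)
    interval = settings["clientaliveinterval"]
    if interval == 0:
        return None  # keepalive requests are not sent at all
    return interval * settings["clientalivecountmax"]


example = """
ClientAliveInterval 15
ClientAliveCountMax 3
"""
print(client_alive_timeout(example))  # 45
```

Note that this timeout only covers an unresponsive client; it says nothing about connections dropped by something in between, such as a load balancer.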
Hmm, I overlooked this part of your report: "Sometimes the disconnects occur during idle times, other times they occur while the users are actively typing on the command line or in an editor, etc." This means the problem isn't about the keepalives. The "Got CHANNEL_FAILURE for keepalive" message actually means that they work; the wording of this message was changed in newer openssh versions. Could you somehow verify that the problem is fixed in the 4.1p1 version (for example, by running it experimentally for a longer time)? Also, could you attach debugging logs from both the client and the server for sessions where the connection drops? What exact failure happens?
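When collecting the server-side debug log, it may help to tally keepalive replies per sshd process, so it is clear whether replies kept arriving up to the moment a session dropped. The following is only a rough Python sketch: the regular expression is based solely on the two message shapes quoted in this report, and the function name `keepalive_replies` is hypothetical.

```python
import re
from collections import Counter

# Matches the two message forms quoted in this report, e.g.
#   sshd[3219]: debug1: Got CHANNEL_FAILURE for keepalive
#   sshd[23241]: debug1: Got 100/105 for keepalive
KEEPALIVE = re.compile(r"sshd\[(\d+)\]: debug1: Got (\S+) for keepalive")


def keepalive_replies(lines):
    """Count keepalive reply messages, keyed by (sshd pid, reply text)."""
    counts = Counter()
    for line in lines:
        match = KEEPALIVE.search(line)
        if match:
            counts[(int(match.group(1)), match.group(2))] += 1
    return counts


sample = [
    "sshd[3219]: debug1: Got CHANNEL_FAILURE for keepalive",
    "sshd[23241]: debug1: Got 100/105 for keepalive",
    "sshd[23241]: debug1: Got 100/105 for keepalive",
]
for (pid, reply), count in sorted(keepalive_replies(sample).items()):
    print(pid, reply, count)
```

To run it on a real log, pass an open file object (for example, the /var/log/secure file mentioned above) instead of the sample list. Either reply form counts as an answered keepalive.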
This problem will be resolved in a future major release of Red Hat Enterprise Linux. Red Hat does not currently plan to provide a resolution for this in a Red Hat Enterprise Linux update for currently deployed systems. With the goal of minimizing risk of change for deployed systems, and in response to customer and partner requirements, Red Hat takes a conservative approach when evaluating changes for inclusion in maintenance updates for currently deployed products. The primary objectives of update releases are to enable new hardware platform support and to resolve critical defects.