Bug 164863 - OpenSSH keepalives do not work correctly when using IPVS
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: openssh
Version: 3.0
Hardware: i386
OS: Linux
Priority: medium
Severity: low
Target Milestone: ---
Assignee: Tomas Mraz
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2005-08-01 23:28 UTC by Nick Couchman
Modified: 2007-11-30 22:07 UTC (History)
CC List: 0 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-11-23 15:18:26 UTC
Target Upstream Version:
Embargoed:



Description Nick Couchman 2005-08-01 23:28:56 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.10) Gecko/20050715 Firefox/1.0.6 NLD/1.0.6-4.2

Description of problem:
I'm using a few RHEL3 machines clustered together with IPVS to load balance SSH connections.  Users are frequently disconnected from the RHEL3 machines.  Sometimes the disconnects occur during idle times, other times they occur while the users are actively typing on the command line or in an editor, etc.  I get the following messages (repeatedly) in the /var/log/secure log file when I turn the log level to DEBUG:
sshd[3219]: debug1: Got CHANNEL_FAILURE for keepalive

I compiled a new version of ssh (v4.1) and started that.  Now I see the following lines in the /var/log/secure log file:
sshd[23241]: debug1: Got 100/105 for keepalive

This seems to indicate that the keepalives are working, and I have not seen a disconnect yet. The problem may occur only when the machine is accessed via the IPVS IP address and not directly at its normally assigned address.

Version-Release number of selected component (if applicable):
3.6.1p2-33.30.4

How reproducible:
Sometimes

Steps to Reproduce:
It is hard to describe actual steps or results as the behavior doesn't seem to have a pattern of occurrence.

Additional info:

Comment 1 Tomas Mraz 2005-08-02 07:31:33 UTC
Could you try setting ClientAliveInterval to some reasonable value in
/etc/ssh/sshd_config? This should make the server send a message over the
channel if no normal data has been sent for the interval given by the option.

You must use protocol 2 for this setting to have the desired effect.

This should be more reliable than TCP keepalives.
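
A quick way to check a configuration for the two conditions above is sketched
below. It is only an illustration: the helper names are hypothetical, the parser
assumes one "Keyword value" pair per line with the interval given in plain
seconds, and it assumes the first occurrence of a keyword wins and that an
unset Protocol means the "2,1" default mentioned in this report.

```python
def parse_sshd_config(text):
    """Return {lowercased keyword: value}, keeping the first occurrence of each keyword."""
    options = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # simplification: drop anything after '#'
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # blank line, comment, or keyword without a value
        key, value = parts
        options.setdefault(key.lower(), value.strip())
    return options


def keepalive_problems(text):
    """List reasons why application-level keepalives would not take effect."""
    opts = parse_sshd_config(text)
    problems = []
    # Assumption: the interval is written as a plain number of seconds.
    if int(opts.get("clientaliveinterval", "0")) <= 0:
        problems.append("ClientAliveInterval is unset or 0, so no probes are sent")
    # Assumption: an unset Protocol means the "2,1" default described in this report.
    protocols = [p.strip() for p in opts.get("protocol", "2,1").split(",")]
    if "2" not in protocols:
        problems.append("Protocol does not include 2, so the setting has no effect")
    return problems


if __name__ == "__main__":
    sample = "Protocol 2,1\nClientAliveInterval 15\nClientAliveCountMax 3\n"
    print(keepalive_problems(sample))
```

An empty list means neither condition is violated; it says nothing about
whether the clients actually negotiate protocol 2.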


Comment 2 Nick Couchman 2005-08-02 13:37:04 UTC
I had the following keepalive configuration directives set:
ClientAliveInterval 15
ClientAliveCountMax 3

I searched the web for information on these directives, and what I found
indicated that the above configuration is reasonable: the server closes the
connection after 45 seconds (3 probes at 15-second intervals) with no response
from the client. The Protocol setting was also at the default, 2,1, and the
clients are able to communicate using protocol version 2.
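
The 45-second figure follows from multiplying the two directives. A minimal
sketch of that arithmetic, with a hypothetical function name and the values
taken from the configuration above:

```python
def unresponsive_timeout(client_alive_interval, client_alive_count_max):
    """Approximate seconds of client silence before the server disconnects.

    One probe is sent per interval; the connection is dropped once
    count_max probes in a row have gone unanswered.
    """
    return client_alive_interval * client_alive_count_max


if __name__ == "__main__":
    # ClientAliveInterval 15, ClientAliveCountMax 3
    print(unresponsive_timeout(15, 3))
```

The result is approximate, since it ignores network delay and the exact moment
at which the idle timer starts.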

Comment 3 Tomas Mraz 2005-08-02 14:50:32 UTC
Hmm, I overlooked this part of your report: "Sometimes the disconnects occur
during idle times, other times they occur while the users are actively typing on
the command line or in an editor, etc."

This means the problem isn't about the keepalives. The "Got CHANNEL_FAILURE for
keepalive" message actually means that they work. This message was changed in
the newer openssh versions.
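
To confirm from the logs that replies keep arriving, the two message formats
quoted in this report can be counted per sshd process. This is only a sketch:
the regular expression is derived from the two sample lines in the description,
the function name is hypothetical, and other OpenSSH versions may word the
message differently. The 100 in "Got 100/105" is presumably the message number
of SSH_MSG_CHANNEL_FAILURE, which would make both lines report the same reply.

```python
import re

# Matches both wordings seen in this report:
#   sshd[3219]: debug1: Got CHANNEL_FAILURE for keepalive   (3.6.1p2)
#   sshd[23241]: debug1: Got 100/105 for keepalive          (4.1)
KEEPALIVE_RE = re.compile(
    r"sshd\[(?P<pid>\d+)\]: debug1: Got (?:CHANNEL_FAILURE|\d+/\d+) for keepalive"
)


def count_keepalive_replies(lines):
    """Return {sshd pid: number of keepalive replies logged for that process}."""
    counts = {}
    for line in lines:
        match = KEEPALIVE_RE.search(line)
        if match:
            pid = int(match.group("pid"))
            counts[pid] = counts.get(pid, 0) + 1
    return counts


if __name__ == "__main__":
    sample = [
        "sshd[3219]: debug1: Got CHANNEL_FAILURE for keepalive",
        "sshd[23241]: debug1: Got 100/105 for keepalive",
    ]
    print(count_keepalive_replies(sample))
```

A session whose count keeps growing right up to the disconnect would support
the reading that the keepalives themselves are not what fails.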

Could you verify that the problem is fixed in version 4.1p1, for example by
running it experimentally for a longer period? Also, could you attach debugging
logs from both client and server for sessions in which the connection drops?
What exactly fails?


Comment 4 Tomas Mraz 2005-11-23 15:18:26 UTC
This problem will be resolved in a future major release of Red Hat Enterprise
Linux. Red Hat does not currently plan to provide a resolution for this in a Red
Hat Enterprise Linux update for currently deployed systems.

With the goal of minimizing risk of change for deployed systems, and in response
to customer and partner requirements, Red Hat takes a conservative approach when
evaluating changes for inclusion in maintenance updates for currently deployed
products. The primary objectives of update releases are to enable new hardware
platform support and to resolve critical defects. 


