Bug 1701050

Summary: SSH connection hangs on Azure
Product: OpenShift Container Platform
Reporter: Alex Crawford <crawford>
Component: RHCOS
Assignee: Steve Milner <smilner>
Status: CLOSED ERRATA
QA Contact: Micah Abbott <miabbott>
Severity: unspecified
Priority: unspecified
Version: 4.1.0
CC: bbreard, dustymabe, imcleod, jligon, nstielau
Target Release: 4.2.0
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Doc Text:
Cause: Idle SSH connections on Azure stop responding unless sshd sends periodic keepalives, and the RHCOS sshd configuration left ClientAliveInterval disabled.
Consequence: Idle SSH connections to RHCOS hosts on Azure hang.
Fix: The default sshd configuration now sets ClientAliveInterval 180.
Result: SSH connections no longer hang on Azure.
Type: Bug
Last Closed: 2019-10-16 06:28:06 UTC

Description Alex Crawford 2019-04-17 21:30:58 UTC
Description of problem:

An SSH connection to an RHCOS host on Azure hangs if it is left idle for about five minutes.

Version-Release number of selected component (if applicable):

How reproducible:


Steps to Reproduce:
1. Create an RHCOS host on Azure.
2. SSH to the host.
3. Leave the session idle for five minutes.

Actual results:

Connection hangs and must be terminated.

Expected results:

Connection is kept alive.

Additional info:

The RHCOS sshd configuration leaves ClientAliveInterval disabled. It should be set to 180 (seconds). Steven Zarkos, an engineer at Microsoft, gave me that value years ago, and Container Linux uses it. Making the same change in RHCOS fixes the issue.
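A minimal sketch of making that change by hand, assuming a POSIX shell and an sshd_config laid out like the stock file (a commented-out ClientAliveInterval default and no trailing Match block, since a line appended after a Match line only applies inside that block). The helper name set_client_alive_interval is hypothetical and illustrative only; it is not how the RHCOS build applies the setting.

```shell
# Hypothetical helper: set_client_alive_interval FILE [SECONDS]
# Rewrites any existing ClientAliveInterval line in FILE (commented out or not)
# to "ClientAliveInterval SECONDS", or appends one if the keyword is absent.
# Assumes FILE has no trailing Match block (an appended line would land inside it).
set_client_alive_interval() {
    file=$1
    secs=${2:-180}   # 180 s is the value recommended in this bug
    if grep -Eq '^[#[:space:]]*ClientAliveInterval([[:space:]]|$)' "$file"; then
        tmp=$(mktemp) || return 1
        # Edit via a temp file (portable, unlike sed -i), then copy the result
        # back with cat so FILE keeps its original ownership and permissions.
        sed -E 's/^[#[:space:]]*ClientAliveInterval([[:space:]].*)?$/ClientAliveInterval '"$secs"'/' \
            "$file" > "$tmp" && cat "$tmp" > "$file"
        rc=$?
        rm -f "$tmp"
        return "$rc"
    else
        printf 'ClientAliveInterval %s\n' "$secs" >> "$file"
    fi
}
```

sshd must be restarted or reloaded before the new value applies to new connections.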

Comment 5 Micah Abbott 2019-07-01 18:39:03 UTC
Verified that the change is present in RHCOS in 4.2.0-0.nightly-2019-06-30-221852:

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.nightly-2019-06-30-221852   True        False         3m14s   Cluster version is 4.2.0-0.nightly-2019-06-30-221852
$ oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-139-116.us-west-2.compute.internal   Ready    worker   12m   v1.14.0+04ae0f405
ip-10-0-141-177.us-west-2.compute.internal   Ready    master   21m   v1.14.0+04ae0f405
ip-10-0-150-236.us-west-2.compute.internal   Ready    worker   12m   v1.14.0+04ae0f405
ip-10-0-151-141.us-west-2.compute.internal   Ready    master   21m   v1.14.0+04ae0f405
ip-10-0-163-151.us-west-2.compute.internal   Ready    worker   12m   v1.14.0+04ae0f405
ip-10-0-167-205.us-west-2.compute.internal   Ready    master   21m   v1.14.0+04ae0f405
$ oc debug node/ip-10-0-139-116.us-west-2.compute.internal
Starting pod/ip-10-0-139-116us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# rpm-ostree status
State: idle
AutomaticUpdates: disabled
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:66e8bca50ff16c7082425b8f42578b69f1d28b08fe62d359451132a3e837735d
              CustomOrigin: Managed by pivot tool
                   Version: 420.8.20190630.0 (2019-06-30T20:53:07Z)

              CustomOrigin: Provisioned from oscontainer
                   Version: 420.8.20190624.0 (2019-06-24T00:25:32Z)
sh-4.4# grep ClientAlive /etc/ssh/sshd_config 
ClientAliveInterval 180
#ClientAliveCountMax 3
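For reading the grep output above: per sshd_config(5), sshd sends a keepalive probe after ClientAliveInterval seconds without data from the client and disconnects after ClientAliveCountMax unanswered probes. With ClientAliveInterval 180 and ClientAliveCountMax left at its default of 3 (it is commented out here), an unresponsive client is dropped after roughly 180 × 3 = 540 seconds, while a live idle session sees probe traffic every 180 seconds. The small sketch below computes those effective values from a config file; the function name client_alive_timeout is hypothetical, and it assumes plain-second values (no "3m"-style suffixes) and ignores Match blocks. On a live host, `sshd -T` run as root reports the effective settings authoritatively.

```shell
# Hypothetical helper: client_alive_timeout FILE
# Prints "INTERVAL COUNTMAX SECONDS", where SECONDS = INTERVAL * COUNTMAX is roughly
# how long sshd waits before disconnecting an unresponsive client (0 = probes disabled).
# Keywords match case-insensitively and the first value wins, as in sshd;
# Match blocks and time suffixes such as "3m" are not handled.
client_alive_timeout() {
    awk '
        tolower($1) == "clientaliveinterval" && !seen_i { interval = $2; seen_i = 1 }
        tolower($1) == "clientalivecountmax" && !seen_c { count = $2; seen_c = 1 }
        END {
            if (!seen_i) interval = 0   # default: no keepalive probes
            if (!seen_c) count = 3      # default from sshd_config(5)
            print interval, count, interval * count
        }
    ' "$1"
}
```

Run against the node's /etc/ssh/sshd_config, this should print "180 3 540", given the two ClientAlive lines shown above.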

Comment 6 errata-xmlrpc 2019-10-16 06:28:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

If the solution does not work for you, open a new bug report.