Bug 2008339
| Summary: | In openssh 8.0p1-6.el8_4.2 SSH session disconnects after setting Server Alive and Client Alive parameters recommended in open-scap rules | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Apurbita Mukherjee <apmukher> |
| Component: | openssh | Assignee: | Dmitry Belyavskiy <dbelyavs> |
| Status: | CLOSED DUPLICATE | QA Contact: | BaseOS QE Security Team <qe-baseos-security> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 8.4 | CC: | dbelyavs, ggasparb, jjelen, mhaicman, nido1234, rmetrich, wsato |
| Target Milestone: | rc | Keywords: | Bugfix, Triaged |
| Target Release: | 8.5 | Flags: | pm-rhel: mirror+ |
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2021-12-09 11:27:08 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Comment 2
Jakub Jelen
2021-09-29 13:59:22 UTC
Looks like a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1952411 to me
Please have a look at the following support case: https://access.redhat.com/support/cases/#/case/03038430
I'll try to summarize quickly what we experience:
===
Problem on RHEL 8 Servers: sshd sessions are closed if clients execute scripts / commands that regularly produce output and run for a long time.
===
On Server side use openssh-server-8.0p1-6.el8_4.2.x86_64 and the following 2 options in /etc/ssh/sshd_config
---
clientaliveinterval 600
clientalivecountmax 0
---
On Client side use openssh-clients-8.0p1-6.el8_4.2.x86_64 with the following 2 options in /etc/ssh/ssh_config
---
ServerAliveInterval 60
ServerAliveCountMax 5
---
We use "clientalivecountmax 0" because this is also recommended by https://static.open-scap.org/ssg-guides/ssg-rhel8-guide-rht-ccp.html#xccdf_org.ssgproject.content_group_ssh
What we experience is the following: We connect with the above config / setup from the ssh client (loginserver) to the target server and execute the following commands:
---
l_date=$(date)
while true
do
echo "${l_date} $(date)"
sleep 20
done
---
This produces output like:
===
Mon Sep 20 08:15:30 CEST 2021 Mon Sep 20 08:15:30 CEST 2021
Mon Sep 20 08:15:30 CEST 2021 Mon Sep 20 08:15:50 CEST 2021
Mon Sep 20 08:15:30 CEST 2021 Mon Sep 20 08:16:10 CEST 2021
Mon Sep 20 08:15:30 CEST 2021 Mon Sep 20 08:16:30 CEST 2021
Mon Sep 20 08:15:30 CEST 2021 Mon Sep 20 08:16:50 CEST 2021
Mon Sep 20 08:15:30 CEST 2021 Mon Sep 20 08:17:10 CEST 2021
Mon Sep 20 08:15:30 CEST 2021 Mon Sep 20 08:17:30 CEST 2021
Mon Sep 20 08:15:30 CEST 2021 Mon Sep 20 08:17:50 CEST 2021
Mon Sep 20 08:15:30 CEST 2021 Mon Sep 20 08:18:10 CEST 2021
Mon Sep 20 08:15:30 CEST 2021 Mon Sep 20 08:18:30 CEST 2021
Mon Sep 20 08:15:30 CEST 2021 Mon Sep 20 08:18:50 CEST 2021
Mon Sep 20 08:15:30 CEST 2021 Mon Sep 20 08:19:10 CEST 2021
Mon Sep 20 08:15:30 CEST 2021 Mon Sep 20 08:19:30 CEST 2021
Mon Sep 20 08:15:30 CEST 2021 Mon Sep 20 08:19:50 CEST 2021
Mon Sep 20 08:15:30 CEST 2021 Mon Sep 20 08:20:10 CEST 2021
Mon Sep 20 08:15:30 CEST 2021 Mon Sep 20 08:20:30 CEST 2021
Mon Sep 20 08:15:30 CEST 2021 Mon Sep 20 08:20:50 CEST 2021
Mon Sep 20 08:15:30 CEST 2021 Mon Sep 20 08:21:10 CEST 2021
Mon Sep 20 08:15:30 CEST 2021 Mon Sep 20 08:21:30 CEST 2021
Mon Sep 20 08:15:30 CEST 2021 Mon Sep 20 08:21:50 CEST 2021
Mon Sep 20 08:15:30 CEST 2021 Mon Sep 20 08:22:10 CEST 2021
Mon Sep 20 08:15:30 CEST 2021 Mon Sep 20 08:22:30 CEST 2021
Mon Sep 20 08:15:30 CEST 2021 Mon Sep 20 08:22:50 CEST 2021
Mon Sep 20 08:15:30 CEST 2021 Mon Sep 20 08:23:10 CEST 2021
Mon Sep 20 08:15:30 CEST 2021 Mon Sep 20 08:23:30 CEST 2021
Mon Sep 20 08:15:30 CEST 2021 Mon Sep 20 08:23:50 CEST 2021
Mon Sep 20 08:15:30 CEST 2021 Mon Sep 20 08:24:10 CEST 2021
Mon Sep 20 08:15:30 CEST 2021 Mon Sep 20 08:24:30 CEST 2021
Mon Sep 20 08:15:30 CEST 2021 Mon Sep 20 08:24:50 CEST 2021
Mon Sep 20 08:15:30 CEST 2021 Mon Sep 20 08:25:10 CEST 2021
Connection to XXXXXXXX closed by remote host.
Connection to XXXXXXXX closed.
===
We get disconnected from the sshd server after 600 seconds even though we have a process running that produces regular output. From our point of view this behaviour is wrong.
Interestingly, the ssh client is not sending any "alive messages", although it should send one at least every 60 seconds according to the option "serveraliveinterval 60". We believe this is the case because we are regularly receiving output (every 20 seconds a date line is sent) from the ssh server.
If we connect from the same client (openssh-clients-8.0p1-6.el8_4.2.x86_64) to older ssh servers (RHEL 7, openssh-server-7.4p1-21.el7.x86_64) or (RHEL-6, openssh-server-5.3p1-124.el6_10.x86_64) with the same 2 options in sshd_config
---
clientaliveinterval 600
clientalivecountmax 0
---
-> We are NOT disconnected when running the above commands / test!
(In reply to Dmitry Belyavskiy from comment #3)
> Looks like a duplicate of
> https://bugzilla.redhat.com/show_bug.cgi?id=1952411 to me
You are probably right. Can you provide your full configuration file to check if there might be a clash with the time-based rekey limit?
> clientalivecountmax 0
Ok, I was probably too fast. The following change is only in OpenSSH 8.2:
https://github.com/openssh/openssh-portable/commit/69334996ae203c51c70bf01d414c918a44618f8e
But the zero value is problematic also in older versions, so I would suggest not to use it, both in your configuration and in the ssg rules. I think we talked about this in the past.
Below are the 2 sshd options which include 'rekey' in their name and seem to be active on the ssh server.
sshd -T | grep -i rekey
---
gssapistorecredentialsonrekey no
rekeylimit 0 0
---
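The keepalive settings under discussion can be inspected from the same "sshd -T" output. The following is only a sketch: the helper name check_client_alive is hypothetical, and it assumes the lowercase keyword format that "sshd -T" prints (as in the rekey output above). It reads the dump on stdin and flags the combination this bug is about.

```shell
# Hypothetical helper (not part of OpenSSH): reads `sshd -T` style output
# on stdin and reports whether a non-zero clientaliveinterval is combined
# with clientalivecountmax 0.
check_client_alive() {
    awk '
        tolower($1) == "clientaliveinterval" { interval = $2 }
        tolower($1) == "clientalivecountmax" { countmax = $2 }
        END {
            if (interval == "" || countmax == "") {
                print "unknown: client alive settings not found"
            } else if (interval + 0 > 0 && countmax + 0 == 0) {
                print "warning: clientaliveinterval " interval " with clientalivecountmax 0"
            } else {
                print "ok: clientaliveinterval " interval ", clientalivecountmax " countmax
            }
        }'
}
```

On a server it would be used as: sshd -T | check_client_alive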
Please reproduce the issue with the information I provided above.
Hint: If you run the test with longer sleep times (for example 90 instead of 20), you'll never be disconnected because after 60 seconds of silence the ssh client will send a packet to keep the session open due to the ssh option "ServerAliveInterval 60".
Use verbose mode for the 2 tests: "ssh -vvv <ssh_server>"
# After login execute (first test with 20 second sleep):
---
l_date=$(date)
while true
do
echo "${l_date} $(date)"
sleep 20
done
---
Thu Sep 30 13:08:48 CEST 2021 Thu Sep 30 13:08:49 CEST 2021
Thu Sep 30 13:08:48 CEST 2021 Thu Sep 30 13:09:09 CEST 2021
Thu Sep 30 13:08:48 CEST 2021 Thu Sep 30 13:09:29 CEST 2021
Thu Sep 30 13:08:48 CEST 2021 Thu Sep 30 13:09:49 CEST 2021
Thu Sep 30 13:08:48 CEST 2021 Thu Sep 30 13:10:09 CEST 2021
Thu Sep 30 13:08:48 CEST 2021 Thu Sep 30 13:10:29 CEST 2021
Thu Sep 30 13:08:48 CEST 2021 Thu Sep 30 13:10:49 CEST 2021
Thu Sep 30 13:08:48 CEST 2021 Thu Sep 30 13:11:09 CEST 2021
Thu Sep 30 13:08:48 CEST 2021 Thu Sep 30 13:11:29 CEST 2021
Thu Sep 30 13:08:48 CEST 2021 Thu Sep 30 13:11:49 CEST 2021
Thu Sep 30 13:08:48 CEST 2021 Thu Sep 30 13:12:09 CEST 2021
Thu Sep 30 13:08:48 CEST 2021 Thu Sep 30 13:12:29 CEST 2021
Thu Sep 30 13:08:48 CEST 2021 Thu Sep 30 13:12:50 CEST 2021
Thu Sep 30 13:08:48 CEST 2021 Thu Sep 30 13:13:10 CEST 2021
Thu Sep 30 13:08:48 CEST 2021 Thu Sep 30 13:13:30 CEST 2021
Thu Sep 30 13:08:48 CEST 2021 Thu Sep 30 13:13:50 CEST 2021
Thu Sep 30 13:08:48 CEST 2021 Thu Sep 30 13:14:10 CEST 2021
Thu Sep 30 13:08:48 CEST 2021 Thu Sep 30 13:14:30 CEST 2021
Thu Sep 30 13:08:48 CEST 2021 Thu Sep 30 13:14:50 CEST 2021
Thu Sep 30 13:08:48 CEST 2021 Thu Sep 30 13:15:10 CEST 2021
Thu Sep 30 13:08:48 CEST 2021 Thu Sep 30 13:15:30 CEST 2021
Thu Sep 30 13:08:48 CEST 2021 Thu Sep 30 13:15:50 CEST 2021
Thu Sep 30 13:08:48 CEST 2021 Thu Sep 30 13:16:10 CEST 2021
Thu Sep 30 13:08:48 CEST 2021 Thu Sep 30 13:16:30 CEST 2021
Thu Sep 30 13:08:48 CEST 2021 Thu Sep 30 13:16:50 CEST 2021
Thu Sep 30 13:08:48 CEST 2021 Thu Sep 30 13:17:10 CEST 2021
Thu Sep 30 13:08:48 CEST 2021 Thu Sep 30 13:17:30 CEST 2021
Thu Sep 30 13:08:48 CEST 2021 Thu Sep 30 13:17:50 CEST 2021
Thu Sep 30 13:08:48 CEST 2021 Thu Sep 30 13:18:10 CEST 2021
Thu Sep 30 13:08:48 CEST 2021 Thu Sep 30 13:18:30 CEST 2021
debug3: send packet: type 1
debug1: channel 0: free: client-session, nchannels 1
debug3: channel 0: status: The following connections are open:
#0 client-session (t4 r0 i0/0 o0/0 e[write]/0 fd 4/5/6 sock -1 cc -1)
debug3: fd 1 is not O_NONBLOCK
Connection to XXXXXXXX closed by remote host.
Connection to XXXXXXXX closed.
Transferred: sent 10456, received 10540 bytes, in 624.9 seconds
Bytes per second: sent 16.7, received 16.9
debug1: Exit status -1
---
# After login execute (second test with 90 second sleep):
---
l_date=$(date)
while true
do
echo "${l_date} $(date)"
sleep 90
done
---
Thu Sep 30 13:28:19 CEST 2021 Thu Sep 30 13:28:20 CEST 2021
debug3: send packet: type 80
debug3: receive packet: type 82
Thu Sep 30 13:28:19 CEST 2021 Thu Sep 30 13:29:50 CEST 2021
debug3: send packet: type 80
debug3: receive packet: type 82
Thu Sep 30 13:28:19 CEST 2021 Thu Sep 30 13:31:20 CEST 2021
debug3: send packet: type 80
debug3: receive packet: type 82
Thu Sep 30 13:28:19 CEST 2021 Thu Sep 30 13:32:50 CEST 2021
debug3: send packet: type 80
debug3: receive packet: type 82
Thu Sep 30 13:28:19 CEST 2021 Thu Sep 30 13:34:20 CEST 2021
debug3: send packet: type 80
debug3: receive packet: type 82
Thu Sep 30 13:28:19 CEST 2021 Thu Sep 30 13:35:50 CEST 2021
debug3: send packet: type 80
debug3: receive packet: type 82
Thu Sep 30 13:28:19 CEST 2021 Thu Sep 30 13:37:20 CEST 2021
debug3: send packet: type 80
debug3: receive packet: type 82
Thu Sep 30 13:28:19 CEST 2021 Thu Sep 30 13:38:50 CEST 2021
debug3: send packet: type 80
debug3: receive packet: type 82
Thu Sep 30 13:28:19 CEST 2021 Thu Sep 30 13:40:20 CEST 2021
debug3: send packet: type 80
debug3: receive packet: type 82
Thu Sep 30 13:28:19 CEST 2021 Thu Sep 30 13:41:50 CEST 2021
debug3: send packet: type 80
debug3: receive packet: type 82
Thu Sep 30 13:28:19 CEST 2021 Thu Sep 30 13:43:20 CEST 2021
debug3: send packet: type 80
debug3: receive packet: type 82
Thu Sep 30 13:28:19 CEST 2021 Thu Sep 30 13:44:50 CEST 2021
debug3: send packet: type 80
debug3: receive packet: type 82
Thu Sep 30 13:28:19 CEST 2021 Thu Sep 30 13:46:20 CEST 2021
debug3: send packet: type 80
debug3: receive packet: type 82
...
---
-> In this example we never get disconnected, even though we use "clientalivecountmax 0" on ssh server side. The client keeps the session alive.
# So still ... why is the server disconnecting the client with "clientalivecountmax 0" after 600 seconds while a job is running that regularly generates output? Information is being exchanged between server and client, so there is no reason to disconnect the client. I would only understand it if no data flowed between ssh server and client for more than 600 seconds (clientaliveinterval 600).
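The difference between the two runs above can be illustrated with a toy model. This is a sketch under stated assumptions, not OpenSSH code: it assumes that the client's ServerAlive timer is reset by any data arriving from the server, that the server's ClientAlive timer is reset only by data arriving from the client, and that with "clientalivecountmax 0" the first server-side timeout closes the session. The function name simulate_session and its arguments are hypothetical.

```shell
# simulate_session SLEEP SERVER_ALIVE_INTERVAL CLIENT_ALIVE_INTERVAL DURATION
# All values are in seconds. Toy model only; timers are assumptions.
simulate_session() {
    sleep_s=$1; server_alive=$2; client_alive=$3; duration=$4
    last_from_server=0   # last time the client received anything
    last_from_client=0   # last time the server received anything
    t=1
    while [ "$t" -le "$duration" ]; do
        # Script output travels server -> client every $sleep_s seconds.
        if [ $((t % sleep_s)) -eq 0 ]; then
            last_from_server=$t
        fi
        # The client probes only after $server_alive seconds of silence.
        if [ $((t - last_from_server)) -ge "$server_alive" ]; then
            last_from_client=$t   # probe reaches the server
            last_from_server=$t   # reply reaches the client
        fi
        # The server gives up after $client_alive seconds without client data.
        if [ $((t - last_from_client)) -ge "$client_alive" ]; then
            echo "disconnected after ${t}s"
            return 0
        fi
        t=$((t + 1))
    done
    echo "still connected after ${duration}s"
}
```

Under these assumptions, "simulate_session 20 60 600 1200" prints "disconnected after 600s" (the output every 20 seconds keeps the client from ever probing), while "simulate_session 90 60 600 1200" prints "still connected after 1200s" (the client probes during each 90 second pause, which resets the server's timer).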
After discussions with Jakub and Dmitry, I am switching this BZ back to OpenSSH. SCAP rules are not the cause of the problem here.
I can reproduce this as well:
SSH Server:
# strace -fttTvyy -s 128 -o sshd.strace -- /usr/sbin/sshd -p 8022 -o ClientAliveInterval=60 -o ClientAliveCountMax=0 -ddd
SSH Client (local):
# strace -fttTvyy -s 128 -o ssh.strace -- ssh -p 8022 -o ServerAliveInterval=6 -o ServerAliveCountMax=5 localhost "date; while sleep 2; do date; done"
After 1 minute sshd closed the connection. However the script continues executing on the server side ...
Debug log shows:
debug2: channel 0: rwin 2096857 elen 57 euse 1
debug2: channel 0: sent ext data 57
debug2: channel 0: read 164 from efd 19
debug2: channel 0: rwin 2096800 elen 164 euse 1
debug2: channel 0: sent ext data 164
Timeout, client not responding from user root ::1 port 46070
debug1: do_cleanup
debug3: PAM: sshpam_thread_cleanup entering
debug3: mm_request_send entering: type 124
debug3: mm_request_receive entering
debug3: monitor_read: checking request 124
debug3: mm_request_send entering: type 122
debug3: mm_request_receive_expect entering: type 123
debug3: mm_request_receive entering
debug3: mm_request_receive entering
debug3: monitor_read: checking request 122
debug3: mm_request_send entering: type 123
debug3: mm_request_receive entering
debug1: do_cleanup
debug1: PAM: cleanup
debug1: PAM: closing session
debug1: PAM: deleting credentials
debug3: PAM: sshpam_thread_cleanup entering
I went back down to openssh-7.8p1-4 (RHEL8.0) and this one also has the issue.
This odd behaviour was fixed "recently" by the following commit:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
commit 69334996ae203c51c70bf01d414c918a44618f8e
Author: djm <djm>
Date: Sat Jan 25 22:41:01 2020 +0000
upstream: make sshd_config:ClientAliveCountMax=0 disable the
connection killing behaviour, rather than killing the connection after
sending the first liveness test probe (regardless of whether the client was
responsive) bz2627; ok markus
OpenBSD-Commit-ID: 5af79c35f4c9fa280643b6852f524bfcd9bccdaf
diff --git a/serverloop.c b/serverloop.c
index e16eabe2..a8c99e2e 100644
--- a/serverloop.c
+++ b/serverloop.c
@@ -1,4 +1,4 @@
-/* $OpenBSD: serverloop.c,v 1.220 2020/01/25 04:48:26 djm Exp $ */
+/* $OpenBSD: serverloop.c,v 1.221 2020/01/25 22:41:01 djm Exp $ */
 /*
  * Author: Tatu Ylonen <ylo.fi>
  * Copyright (c) 1995 Tatu Ylonen <ylo.fi>, Espoo, Finland
@@ -184,7 +184,8 @@ client_alive_check(struct ssh *ssh)
 	int r, channel_id;
 
 	/* timeout, check to see how many we have had */
-	if (ssh_packet_inc_alive_timeouts(ssh) >
+	if (options.client_alive_count_max > 0 &&
+	    ssh_packet_inc_alive_timeouts(ssh) >
 	    options.client_alive_count_max) {
 		sshpkt_fmt_connection_id(ssh, remote_id, sizeof(remote_id));
 		logit("Timeout, client not responding from %s", remote_id);
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
Please backport to RHEL8 ASAP, since all DISA customers will potentially hit this issue.
With the above fix, the check is not performed when ClientAliveCountMax=0, which seems sane.
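The effect of the changed condition can be spelled out with a small model of the "if" in the diff above. This is only an illustration: the function name kill_decision and its arguments are hypothetical, and it assumes that ssh_packet_inc_alive_timeouts() increments the timeout counter and returns the new value.

```shell
# kill_decision MODE COUNT_MAX TIMEOUTS
# MODE is "old" (condition before the commit) or "new" (after it).
# TIMEOUTS is the counter value before this check; the model increments
# it first, as ssh_packet_inc_alive_timeouts() is assumed to do.
kill_decision() {
    mode=$1; count_max=$2; timeouts=$(( $3 + 1 ))
    if [ "$mode" = "new" ] && [ "$count_max" -le 0 ]; then
        echo "keep"    # new condition: check skipped when the max is 0
    elif [ "$timeouts" -gt "$count_max" ]; then
        echo "kill"    # too many unanswered intervals
    else
        echo "keep"
    fi
}
```

With ClientAliveCountMax=0, "kill_decision old 0 0" prints "kill" on the very first timeout, whereas "kill_decision new 0 0" prints "keep". For a positive value both variants agree, for example "kill_decision old 3 3" and "kill_decision new 3 3" both print "kill".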
Yes, and this behavior was recently backported to RHEL 8.6: https://bugzilla.redhat.com/show_bug.cgi?id=2015828
*** This bug has been marked as a duplicate of bug 2015828 *** |