Bug 2044342
| Summary: | ssh client stays running even after the remote process has terminated | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Lukáš Doktor <ldoktor> | ||||
| Component: | openssh | Assignee: | Dmitry Belyavskiy <dbelyavs> | ||||
| Status: | CLOSED WONTFIX | QA Contact: | BaseOS QE Security Team <qe-baseos-security> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | medium | ||||||
| Version: | 9.0 | CC: | jjelen, stefanha | ||||
| Target Milestone: | rc | Keywords: | Triaged | ||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2023-07-24 07:28:28 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
|
||||||
|
Description
Lukáš Doktor
2022-01-24 12:28:18 UTC
Additional details collected by Stefan:
The ssh client process stays running even after the remove fio process
has terminated. The remote sshd process is still present.
The openssh-8.6p1-7.el9.1.x86_64 client blocks trying to read from a
pty:
(gdb) bt
#0 0x00007feb53484992 in read () from /lib64/libc.so.6
#1 0x000055e4b31bac46 in read (__nbytes=16384, __buf=0x7fffa3ee8b40, __fd=<optimized out>) at /usr/include/bits/unistd.h:47
#2 channel_handle_rfd.constprop.0.isra.0 (ssh=0x55e4b4a344b0, c=0x55e4b4a4a040, readset=<optimized out>, writeset=<optimized out>)
at /usr/src/debug/openssh-8.6p1-7.el9.1.x86_64/channels.c:1946
#3 0x000055e4b3176819 in channel_post_open (ssh=0x55e4b4a344b0, c=0x55e4b4a4a040, readset=0x55e4b4a609b0, writeset=0x55e4b4a60a70)
at /usr/src/debug/openssh-8.6p1-7.el9.1.x86_64/channels.c:2175
#4 0x000055e4b317eb0a in channel_handler (ssh=0x55e4b4a344b0, table=1, readset=0x55e4b4a609b0, writeset=0x55e4b4a60a70, unpause_secs=0x0)
at /usr/src/debug/openssh-8.6p1-7.el9.1.x86_64/channels.c:2416
#5 0x000055e4b3158ff6 in channel_after_select (writeset=<optimized out>, readset=<optimized out>, ssh=0x55e4b4a344b0) at /usr/src/debug/openssh-8.6p1-7.el9.1.x86_64/channels.c:2500
#6 client_loop (ssh=<optimized out>, have_pty=<optimized out>, escape_char_arg=<optimized out>, ssh2_chan_id=<optimized out>)
at /usr/src/debug/openssh-8.6p1-7.el9.1.x86_64/clientloop.c:1358
#7 0x000055e4b31480b9 in main (ac=<optimized out>, av=<optimized out>) at /usr/src/debug/openssh-8.6p1-7.el9.1.x86_64/ssh.c:1714
(gdb) p *c
$2 = {type = 4, self = 0, remote_id = 0, have_remote_id = 1, istate = 0, ostate = 0, flags = 0, rfd = 4, wfd = 5, efd = 6, sock = -1, ctl_chan = -1, isatty = 0, client_tty = 0,
force_drain = 0, notbefore = 0, delayed = 0, restore_block = 0, input = 0x55e4b4a4acc0, output = 0x55e4b4a49900, extended = 0x55e4b4a36b90, path = 0x0, listening_port = 0,
listening_addr = 0x0, host_port = 0, remote_name = 0x55e4b4a61580 "client-session", remote_window = 2097152, remote_maxpacket = 32768, local_window = 2097006, local_window_max = 2097152,
local_consumed = 146, local_maxpacket = 32768, extended_usage = 2, single_connection = 0, ctype = 0x55e4b31c19ac "session", open_confirm = 0x55e4b314adb0 <ssh_session2_setup>,
open_confirm_ctx = 0x0, detach_user = 0x55e4b3152260 <client_channel_closed>, detach_close = 0, status_confirms = {tqh_first = 0x0, tqh_last = 0x55e4b4a4a118}, input_filter = 0x0,
output_filter = 0x0, filter_ctx = 0x0, filter_cleanup = 0x0, datagram = 0, connect_ctx = {host = 0x0, port = 0, ai = 0x0, aitop = 0x0}, mux_rcb = 0x0, mux_ctx = 0x0, mux_pause = 0,
mux_downstream_id = 0}
The file descriptor is in blocking mode:
# ls -alF /proc/26338/fd
total 0
dr-x------. 2 root root 0 Jan 24 06:40 ./
dr-xr-xr-x. 9 root root 0 Jan 24 05:36 ../
lrwx------. 1 root root 64 Jan 24 06:40 0 -> /dev/pts/2
lrwx------. 1 root root 64 Jan 24 06:57 1 -> /dev/null
lrwx------. 1 root root 64 Jan 24 06:57 2 -> /dev/pts/2
lrwx------. 1 root root 64 Jan 24 06:57 3 -> 'socket:[155926]'
lrwx------. 1 root root 64 Jan 24 06:57 4 -> /dev/pts/2
lrwx------. 1 root root 64 Jan 24 06:57 5 -> /dev/pts/2
lrwx------. 1 root root 64 Jan 24 06:57 6 -> /dev/pts/2
# cat /proc/26338/fdinfo/4
pos: 0
flags: 02100002
mnt_id: 27
ino: 5
As a result it seems that the ssh client does not process messages from
the remote sshd. It does not notice that the remote command has
terminated.
Tried with the RHEL-9.0.0-20220125.3 with openssh-8.7p1-6.el9.x86_64 and failed on the first attempt: debug2: channel_input_open_confirmation: channel 0: callback start debug2: fd 3 setting TCP_NODELAY debug3: set_sock_tos: set socket 3 IP_TOS 0x20 debug2: client_session2_setup: id 0 debug1: Sending command: fio /fio.job debug2: channel 0: request exec confirm 1 debug3: send packet: type 98 debug2: channel_input_open_confirmation: channel 0: callback done debug2: channel 0: open confirm rwindow 0 rmax 32768 debug2: channel 0: rcvd adjust 2097152 debug3: receive packet: type 99 debug2: channel_input_status_confirm: type 99 id 0 debug2: exec request accepted on channel 0 job-/dev/vdb: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=8 ... fio-3.27 Starting 10 processes ^C^C^C^C^C^C ----------------------------------------------------------------- After killing the bash process it produced additional output: debug2: channel 0: read failed debug2: chan_shutdown_read: channel 0: (i0 o0 sock -1 wfd 4 efd 6 [write]) debug2: channel 0: input open -> drain debug3: send packet: type 1 debug1: channel 0: free: client-session, nchannels 1 debug3: channel 0: status: The following connections are open: #0 client-session (t4 r0 i1/0 o0/0 e[write]/0 fd -1/5/6 sock -1 cc -1) Killed by signal 1. [1]+ Terminated bash check.sh I don't know the ssh code but here are my thoughts when I collected the information from gdb. The ssh client code is that the decision to put fds into non-blocking mode is a little complicated: https://github.com/openssh/openssh-portable/blob/master/channels.c#L327 I wonder if the bash "timeout 350 ssh -T -vvv 192.168.122.211 fio /fio.job" command line takes a code path that wasn't really intended? I didn't investigate why channel_handle_rfd() does a blocking read expecting more data than currently available on the fd. If the fd is blocking it probably should have figured out how much data to read? https://github.com/openssh/openssh-portable/blob/master/channels.c#L1917 After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened. |