Bug 1940067
Summary: | With dbus 1.12.8-12.el8_3 ssh session executing single command hang | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | James Hartsock <hartsjc> |
Component: | dbus | Assignee: | David King <dking> |
Status: | CLOSED ERRATA | QA Contact: | Petr Schindler <pschindl> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 8.3 | CC: | amike, ayadav, bwelterl, casantos, dbodnarc, djast, dking, jeharris, mark, mclasen, miha.tomsic, mpoole, mvanderw, pandrade, pasteur, pschindl, qguo, rmetrich, rstrode, sbarcomb, schanzle, simon.matter, swa, sysop, tpelka, tpopela, vpakolu |
Target Milestone: | rc | Keywords: | Regression, Triaged |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | dbus-1.12.8-18.el8 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2022-05-10 15:27:27 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Deadline: | 2021-12-06 |
Description
James Hartsock
2021-03-17 14:35:27 UTC
With an ssh session hung $ ssh localhost 'echo $DISPLAY ; tail -n1 /etc/profile.d/ssh-x-forwarding.sh' localhost:10.0 eval $(dbus-launch --sh-syntax --exit-with-session) Go to another window and can see the dbus-launch lingering. And if when I kill it, it does free the ssh session # ps -flu jhartsoc | grep -e PID -e dbus-launch F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD 1 S jhartsoc 15714 1 0 80 0 - 15768 core_s 07:35 ? 00:00:00 dbus-launch --sh-syntax --exit-with-session $ ssh -v localhost 'echo $DISPLAY ; tail -n1 /etc/profile.d/ssh-x-forwarding.sh' OpenSSH_8.0p1, OpenSSL 1.1.1g FIPS 21 Apr 2020 ... debug1: Requesting X11 forwarding with authentication spoofing. debug1: Requesting authentication agent forwarding. debug1: Sending environment. debug1: Sending env LANG = en_US.UTF-8 debug1: Sending env XMODIFIERS = @im=ibus debug1: Sending command: echo $DISPLAY ; tail -n1 /etc/profile.d/ssh-x-forwarding.sh debug1: client_input_channel_open: ctype x11 rchan 3 win 65536 max 16384 debug1: client_request_x11: request from 127.0.0.1 59000 debug1: channel 1: new [x11] debug1: confirm x11 debug1: client_input_channel_open: ctype x11 rchan 4 win 65536 max 16384 debug1: client_request_x11: request from 127.0.0.1 59002 debug1: channel 2: new [x11] debug1: confirm x11 debug1: client_input_channel_open: ctype auth-agent rchan 5 win 65536 max 16384 debug1: channel 3: new [authentication agent connection] debug1: confirm auth-agent debug1: channel 3: FORCE input drain debug1: channel 3: free: authentication agent connection, nchannels 4 localhost:10.0 eval $(dbus-launch --sh-syntax --exit-with-session) debug1: client_input_channel_req: channel 0 rtype exit-status reply 0 debug1: client_input_channel_req: channel 0 rtype eow reply 0 debug1: channel 0: free: client-session, nchannels 3 # kill 16037 <------ in other window debug1: channel 1: FORCE input drain debug1: channel 2: FORCE input drain debug1: channel 1: free: x11, nchannels 2 debug1: channel 2: free: x11, nchannels 1 Transferred: sent 15560, received 5240 bytes, in 41.8 seconds Bytes per second: sent 372.4, received 125.4 debug1: Exit status 0 This is extremely irritating bug as causes <tab> completion on scp's to hang too. I have worked-around this with: # tail -n 1 /etc/profile.d/ssh-x-forwarding.sh tty -s && eval $(dbus-launch --sh-syntax --exit-with-session) (In reply to James Hartsock from comment #6) > I have worked-around this with: > # tail -n 1 /etc/profile.d/ssh-x-forwarding.sh > tty -s && eval $(dbus-launch --sh-syntax --exit-with-session) Thanks for this suggestion. Unsure if GDK_BACKEND should be set, I decided to add before the GDK_BACKEND assignment+export: tty -s || return With the tty check preventing dbus-launch from running, requesting a tty with '-t' as in time ssh -Xt somehost uptime will exit, albeit with a +3 second delay. The delay is caused by the dbus-launch, easily verified by adding a 'return' before dbus-launch and repeating the ssh. I am surprised this bug went undetected before release. I hope there will be improved ssh testing for changes to dbus-x11, which is required by gnome-session and others. Note that the "tty -s || return" workaround removes the fix for Bug 1874282 for invocations like "ssh -X host xwininfo -root" (where the command being run requires X forwarding, but no terminal is allocated). This is not pretty, but an alternative workaround may be to add trap 'sleep 1; kill `pgrep -g $$ dbus-launch`' 0 before the "eval $(dbus-launch --sh-syntax --exit-with-session)" invocation in /etc/profile.d/ssh-x-forwarding.sh. This causes the shell to kill the dbus-launch process belonging to the shell's process group when exiting (unless the shell has reset the EXIT trap). The "sleep 1" is needed to avoid a race condition where the shell exits before dbus-launch has created the watchdog process (e.g., "ssh -X host /bin/true" still hangs if the sleep is not included). This workaround is not effective for csh/tcsh (which use /etc/profile.d/ssh-x-forwarding.csh), and I have not tested this workaround with any shell other than bash. If there's an alternative method of running a command after the shell exits, it would probably be preferable to using the shell's trap builtin, but in general this problem really needs a more elegant approach. The attempted workaround in my previous comment is flawed: it causes the dbus session to be killed when the shell exits, which breaks applications launched in the background (e.g., when doing "ssh -X host gnome-terminal", the terminal will be killed immediately). My apologies. Since this issue also affects rsync via ssh, I've proposed small improvements to make rsync more robust: https://github.com/WayneD/rsync/issues/176 Clearly the problem is not in rsync code but it would still be nice if ssh couldn't hurt rsync that way. So why not improve it? Regards, Simon see bug 1874282 comment 56 for a possible way forward. I guess this is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1940348 which also includes a solution with --exit-with-x11 and this works for me. Is there any special reason why not to use the same option as in .csh version of the ssh-x-forwarding script? Just made this comment for a customer experiencing this issue, and using dbus-1.12.8-12.el8_4.2: """ This issue is still not fully resolved by engineering. Sorry for the suggestion of removing execute bit of /etc/profile.d/ssh-x-forwarding.sh it does not correct the issue. Root cause of the problem is that a command like: """ $ ssh -X user@host gnome-terminal """ will not work, because gnome-terminal needs a dbus backend. But the correction to the above caused other issues: """ $ ssh -X user@host terminal-command """ the above will cause a hang, because a dbus backend will be created and never finished. A workaround for it is to use the pseudo patches: """ === /etc/profile.d/ssh-x-forwarding.csh if ( $?SSH_CONNECTION == 0 ) exit if ( $?DISPLAY == 0 ) exit if ( $SHLVL > 1 ) exit +tty -s || exit setenv GDK_BACKEND x11 eval `dbus-launch --auto-syntax --exit-with-x11` === /etc/profile.d/ssh-x-forwarding.sh [ -z "$SSH_CONNECTION" ] && return [ -z "$DISPLAY" ] && return [ "$SHLVL" -gt 1 ] && return +tty -s || return GDK_BACKEND=x11; export GDK_BACKEND eval $(dbus-launch --sh-syntax --exit-with-session) """ A different kind of problem happens if one runs just: """ $ ssh user@host """ in this case, when dbus-launch is started, it will race with the shell to read stdin, attempting to detect when stdin is closed. This should be fixed by pressing ^C, or just killing the dbus-launch process. Also, it might not be easy to start graphical applications if running: """ $ ssh -X user@host """ due to the dbus-launch race for stdin. Still, applying the suggested patch, to use 'tty -s || return' might be best temporary workaround, as one still can run: """ $ ssh -X user@host dbus-launch gnome-terminal """ Please note that this issue is still being investigated by engineering, and a patch should be available at some point. Likely using inotify to check for stdin, as initially commented at https://bugzilla.redhat.com/show_bug.cgi?id=1874282#c55 """ Hopefully the above kind of summarize the problem. BTW, there is still the issue of: --exit-with-x11 and --exit-with-session and likely, need to patch /etc/profile.d/ssh-x-forwarding.sh to match the .csh one (use --exit-with-x11). *** Bug 1994964 has been marked as a duplicate of this bug. *** I had a very quick look, the issue is 100% reproducible when creating a X11-forwarding session and specifying an invalid DISPLAY, e.g.: -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- $ DISPLAY=localhost:10 ssh -X vm-gui8 [...] connect localhost port 6010: Connection refused connect localhost port 6010: Connection refused [root@vm-gui8 ~]# --> keyboard "dead", no key entered is printed -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- Once ^C is entered, the keyboard works again: -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- [root@vm-gui8 ~]# ^C [root@vm-gui8 ~]# azerty -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- Stracing sshd, I can see that the characters sent through ssh are received on backend side, then written to /dev/ptmx (as expected). Unfortunately it's **dbus-launch** (started by /etc/profile.d/ssh-x-forwarding.sh) that "eats" them, as shown in example below with character "z": -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- 3088 08:28:40.119043 write(16</dev/ptmx<char 5:2>>, "z", 1) = 1 <0.000095> 3118 08:28:40.119732 <... read resumed>"z", 512) = 1 <0.000281> -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- 3088 is sshd reading on the network. 3118 is a "double-child" of 3115 which is dbus-launch: -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- # grep " = 3118" /tmp/sshd.strace | grep clone 3116 08:28:34.176801 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f2f950dac10) = 3118 <0.000100> # grep " = 3116" /tmp/sshd.strace | grep clone 3115 08:28:34.176129 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f2f950dac10) = 3116 <0.000139> # grep "^3115" /tmp/sshd.strace | grep execve 3115 08:28:34.162167 execve("/usr/bin/dbus-launch", ["dbus-launch", "--sh-syntax", "--exit-with-session"], ["LS_COLORS=rs=0:di=38;5;33:ln=38;5;51:mh=00:pi=40;38;5;11:so=38;5;13:do=38;5;5:bd=48;5;232;38;5;11:cd=48;5;232;38;5;3:or=48;5;232"..., "SSH_CONNECTION=192.168.122.1 46016 192.168.122.75 8022", "LANG=en_US.UTF-8", "HISTCONTROL=ignoredups", "DISPLAY=localhost:10.0", "HOSTNAME=vm-gui8", "SSH_AUTH_SOCK=/tmp/ssh-vRw2NhfYi4/agent.3088", "USER=root", "SELINUX_ROLE_REQUESTED=", "PWD=/root", "SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass", "HOME=/root", "SSH_CLIENT=192.168.122.1 46016 8022", "SELINUX_LEVEL_REQUESTED=", "XDG_DATA_DIRS=/root/.local/share/flatpak/exports/share:/var/lib/flatpak/exports/share:/usr/local/share:/usr/share", "SSH_TTY=/dev/pts/0", "MAIL=/var/spool/mail/root", "TERM=xterm-256color", "SHELL=/bin/bash", "XMODIFIERS=@im=none", "SELINUX_USE_CURRENT_RANGE=", "SHLVL=1", "GDK_BACKEND=x11", "LOGNAME=root", "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin", "HISTSIZE=1000", "LESSOPEN=||/usr/bin/lesspipe.sh %s", "_=/usr/bin/dbus-launch"]) = 0 <0.000178> -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- A more robust workaround is to make sure **dbus-launch** never reads anything on stdin: /etc/profile.d/ssh-x-forwarding.sh: -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- [...] eval $(dbus-launch --sh-syntax --exit-with-session </dev/null) -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- I think the issue is with **dbus-launch** which seems "persistent" until a Ctrl-C is hit in the terminal (whatever there is a valid value for DISPLAY or not). Additionally, not related directly to this but still to be taken into account, the current mechanism for **dbus-launch** is NOT WORKING: if you hit Ctrl-C in the terminal, dbus-launch dies shortly after and it's not possible to spawn an X application anymore, e.g.: -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- [root@vm-gui8 ~]# gnome-terminal # watch_fast: "/org/gnome/terminal/legacy/" (establishing: 0, active: 0) # unwatch_fast: "/org/gnome/terminal/legacy/" (active: 0, establishing: 1) # watch_established: "/org/gnome/terminal/legacy/" (establishing: 0) --> works fine, I got a terminal locally [root@vm-gui8 ~]# ^C [root@vm-gui8 ~]# gnome-terminal # Error constructing proxy for org.gnome.Terminal:/org/gnome/Terminal/Factory0: Could not connect: Connection refused [root@vm-gui8 ~]# --> fails, gnome-terminal doesn't spawn -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- dbus-1.12.8-14.el8 and above have the fix for dbus-launch consuming characters from stdin (changing --exit-with-session to --exit-with-x11). There are a few different ways to attempt to solve the scope of dbus-laucnh, as described towards the end of bug 1874282, and the solution will likely be to make use of the logind session. (In reply to David King from comment #20) > There are a few different ways to attempt to solve the scope of dbus-laucnh, > as described towards the end of bug 1874282, and the solution will likely be > to make use of the logind session. On closer inspection, inotifywait does not seem to be present in RHEL8, so this will need a rethink (or using something slightly more complicated than a simple shell script). (In reply to David King from comment #25) > (In reply to David King from comment #20) > > There are a few different ways to attempt to solve the scope of dbus-laucnh, > > as described towards the end of bug 1874282, and the solution will likely be > > to make use of the logind session. > > On closer inspection, inotifywait does not seem to be present in RHEL8, so > this will need a rethink (or using something slightly more complicated than > a simple shell script). inotifywait is present in RHEL-8. It is provided by inotify-tools package. can probably just use "gio monitor -f " ? just for completeness, I gave gio monitor a try this afternoon and it seems to work as well as the inotifywait version of dbus-start-ssh-session-bus, though it's gotta get called a little differently since it doesn't exit after receiving an event: #!/bin/bash exec >& /dev/null export GVFS_DISABLE_FUSE=1 eval `dbus-launch --sh-syntax` trap "kill -TERM $DBUS_SESSION_BUS_PID" EXIT while loginctl show-session $XDG_SESSION_ID | grep -q ^State=active do read < <(gio monitor -f "/run/systemd/sessions/${XDG_SESSION_ID}") done So while I was cleaning up my browser tabs I got another glance at comment 32 and noticed there's a race: If the login screen changes to closing before the monitor runs then the monitor will miss the event. I think we should actually start the monitor first, so we don't miss out on early events... maybe like so: #!/bin/bash exec >& /dev/null export GVFS_DISABLE_FUSE=1 eval `dbus-launch --sh-syntax` trap "kill -TERM $DBUS_SESSION_BUS_PID" EXIT coproc SESSION_MONITOR (gio monitor -f "/run/systemd/sessions/${XDG_SESSION_ID}") while grep -q ^State=active <(loginctl show-session $XDG_SESSION_ID) do read -u ${SESSION_MONITOR[0]} done s/login screen/logind session/ ssh quits correctly with the version dbus-1.12.8-17.el8. All tests from our test suite passes correctly. Version dbus-1.12.8-18.el8 fixes the bug and also all our tests are passing. No regression found. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (dbus bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:2071 I found dbus-1.12.8-18.el8 to work better - but it's still not fully fixed, or perhaps this is one of the many variants covered under different issues... Before 1.12.8-18.el8, an interactive SSH to the machine that exit would hang for a a few seconds. After 1.12.8-18.el8, the exit is quick. This is an improvement. If I start a gnome-terminal from within the SSH, it comes up right away, which I believe was the original reason for this code in the first place. However, if I close that gnome-terminal, and then try to exit the SSH session, it still hangs for a much longer time (more than a few seconds), even with dbus-1.12.8-18.el8. I can hit control-C, but this shouldn't be necessary. I haven't thought through the why here, but it seems like a lot of work to make gnome-terminal work through SSH, and it still doesn't fully work. For us, we are still deleting the ssh-x-forwarding.sh and ssh-x-forwarding.csh from all of our installs. I was testing the above to see if we could drop this work-around, and I don't think we can yet. As long as the session can still cause SSH to hang during shutdown, I consider this feature to still be broken, and we are still better off without it. I'll try to do more research as time permits, but just sharing my experiences here. |