Bug 1940067 - With dbus 1.12.8-12.el8_3 ssh session executing single command hang
Summary: With dbus 1.12.8-12.el8_3 ssh session executing single command hang
Keywords:
Status: CLOSED ERRATA
Alias: None
Deadline: 2021-12-06
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: dbus
Version: 8.3
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: rc
: ---
Assignee: David King
QA Contact: Petr Schindler
URL:
Whiteboard:
: 1994964 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-03-17 14:35 UTC by James Hartsock
Modified: 2022-05-10 16:48 UTC (History)
27 users (show)

Fixed In Version: dbus-1.12.8-18.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-05-10 15:27:27 UTC
Type: Bug
Target Upstream Version:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 5477871 0 None None None 2022-04-26 18:45:12 UTC
Red Hat Knowledge Base (Solution) 6267551 0 None None None 2021-08-30 06:44:25 UTC
Red Hat Product Errata RHBA-2022:2071 0 None None None 2022-05-10 15:27:29 UTC

Description James Hartsock 2021-03-17 14:35:27 UTC
Description of problem:
Executing an ssh session to run command and exit hangs when dbus-x11-1.12.8-12.el8_3 executes /etc/profile.d/ssh-x-forwarding.sh following command:

  eval $(dbus-launch --sh-syntax --exit-with-session)

Version-Release number of selected component (if applicable):
dbus-x11-1.12.8-12.el8_3

How reproducible:
Very

Steps to Reproduce:
1. Install dbus-x11-1.12.8-12.el8_3 & xorg-x11-xauth.
1. ssh rhel8 command (with X11 forward enabled)

Actual results:
ssh session hangs
  $ ssh -X -l hartsjc rhel8-3 rpm -q dbus-x11 xorg-x11-xauth
  dbus-x11-1.12.8-12.el8_3.x86_64
  xorg-x11-xauth-1.0.9-12.el8.x86_64
  ^C <-------------------------- have to control-c, doesn't exit on own


Expected results:
Command shouldn't hang, which it does NOT with dbus-x11-1.12.8-11.el8.
  $ ssh -X -l hartsjc rhel8-3 rpm -q dbus-x11 xorg-x11-xauth
  dbus-x11-1.12.8-11.el8.x86_64
  xorg-x11-xauth-1.0.9-12.el8.x86_64


Additional info:

Issue is with /etc/profile.d/ssh-x-forwarding.sh
  eval $(dbus-launch --sh-syntax --exit-with-session)

With above commented out, everything works as expected
  $ ssh -X -l hartsjc rhel8-3 tail -n1 /etc/profile.d/ssh-x-forwarding.sh
  #eval $(dbus-launch --sh-syntax --exit-with-session)

  $ ssh -X -l hartsjc rhel8-3 rpm -q dbus-x11 xorg-x11-xauth
  dbus-x11-1.12.8-12.el8_3.x86_64
  xorg-x11-xauth-1.0.9-12.el8.x86_64

Comment 4 James Hartsock 2021-03-18 12:40:03 UTC
With an ssh session hung
  $ ssh localhost 'echo $DISPLAY ; tail -n1 /etc/profile.d/ssh-x-forwarding.sh'
  localhost:10.0
  eval $(dbus-launch --sh-syntax --exit-with-session)


Go to another window and can see the dbus-launch lingering.  And if when I kill it, it does free the ssh session
  # ps -flu jhartsoc | grep -e PID -e dbus-launch
  F S UID          PID    PPID  C PRI  NI ADDR SZ WCHAN  STIME TTY          TIME CMD
  1 S jhartsoc   15714       1  0  80   0 - 15768 core_s 07:35 ?        00:00:00 dbus-launch --sh-syntax --exit-with-session


$ ssh -v localhost 'echo $DISPLAY ; tail -n1 /etc/profile.d/ssh-x-forwarding.sh'
OpenSSH_8.0p1, OpenSSL 1.1.1g FIPS  21 Apr 2020
...
debug1: Requesting X11 forwarding with authentication spoofing.
debug1: Requesting authentication agent forwarding.
debug1: Sending environment.
debug1: Sending env LANG = en_US.UTF-8
debug1: Sending env XMODIFIERS = @im=ibus
debug1: Sending command: echo $DISPLAY ; tail -n1 /etc/profile.d/ssh-x-forwarding.sh
debug1: client_input_channel_open: ctype x11 rchan 3 win 65536 max 16384
debug1: client_request_x11: request from 127.0.0.1 59000
debug1: channel 1: new [x11]
debug1: confirm x11
debug1: client_input_channel_open: ctype x11 rchan 4 win 65536 max 16384
debug1: client_request_x11: request from 127.0.0.1 59002
debug1: channel 2: new [x11]
debug1: confirm x11
debug1: client_input_channel_open: ctype auth-agent@openssh.com rchan 5 win 65536 max 16384
debug1: channel 3: new [authentication agent connection]
debug1: confirm auth-agent@openssh.com
debug1: channel 3: FORCE input drain
debug1: channel 3: free: authentication agent connection, nchannels 4
localhost:10.0
eval $(dbus-launch --sh-syntax --exit-with-session)
debug1: client_input_channel_req: channel 0 rtype exit-status reply 0
debug1: client_input_channel_req: channel 0 rtype eow@openssh.com reply 0
debug1: channel 0: free: client-session, nchannels 3


# kill 16037 <------ in other window

debug1: channel 1: FORCE input drain
debug1: channel 2: FORCE input drain
debug1: channel 1: free: x11, nchannels 2
debug1: channel 2: free: x11, nchannels 1
Transferred: sent 15560, received 5240 bytes, in 41.8 seconds
Bytes per second: sent 372.4, received 125.4
debug1: Exit status 0

Comment 6 James Hartsock 2021-03-19 21:26:21 UTC
This is extremely irritating bug as causes <tab> completion on scp's to hang too.

I have worked-around this with:
  # tail -n 1 /etc/profile.d/ssh-x-forwarding.sh 
  tty -s && eval $(dbus-launch --sh-syntax --exit-with-session)

Comment 7 schanzle 2021-04-14 17:00:13 UTC
(In reply to James Hartsock from comment #6)
> I have worked-around this with:
>   # tail -n 1 /etc/profile.d/ssh-x-forwarding.sh 
>   tty -s && eval $(dbus-launch --sh-syntax --exit-with-session)

Thanks for this suggestion.  Unsure if GDK_BACKEND should be set, I decided to add before the GDK_BACKEND assignment+export:

tty -s || return

With the tty check preventing dbus-launch from running, requesting a tty with '-t' as in

time ssh -Xt somehost uptime

will exit, albeit with a +3 second delay.  The delay is caused by the dbus-launch, easily verified by adding a 'return' before dbus-launch and repeating the ssh.

I am surprised this bug went undetected before release.  I hope there will be improved ssh testing for changes to dbus-x11, which is required by gnome-session and others.

Comment 8 Dan Astoorian 2021-04-15 14:03:56 UTC
Note that the "tty -s || return" workaround removes the fix for Bug 1874282 for invocations like "ssh -X host xwininfo -root" (where the command being run requires X forwarding, but no terminal is allocated).

Comment 9 Dan Astoorian 2021-04-15 14:55:34 UTC
This is not pretty, but an alternative workaround may be to add

trap 'sleep 1; kill `pgrep -g $$ dbus-launch`' 0

before the "eval $(dbus-launch --sh-syntax --exit-with-session)" invocation in /etc/profile.d/ssh-x-forwarding.sh.

This causes the shell to kill the dbus-launch process belonging to the shell's process group when exiting (unless the shell has reset the EXIT trap).  The "sleep 1" is needed to avoid a race condition where the shell exits before dbus-launch has created the watchdog process (e.g., "ssh -X host /bin/true" still hangs if the sleep is not included).

This workaround is not effective for csh/tcsh (which use /etc/profile.d/ssh-x-forwarding.csh), and I have not tested this workaround with any shell other than bash.  If there's an alternative method of running a command after the shell exits, it would probably be preferable to using the shell's trap builtin, but in general this problem really needs a more elegant approach.

Comment 10 Dan Astoorian 2021-04-15 15:04:42 UTC
The attempted workaround in my previous comment is flawed: it causes the dbus session to be killed when the shell exits, which breaks applications launched in the background (e.g., when doing "ssh -X host gnome-terminal", the terminal will be killed immediately).  My apologies.

Comment 11 Simon Matter 2021-04-16 14:04:25 UTC
Since this issue also affects rsync via ssh, I've proposed small improvements to make rsync more robust:

https://github.com/WayneD/rsync/issues/176

Clearly the problem is not in rsync code but it would still be nice if ssh couldn't hurt rsync that way. So why not improve it?

Regards,
Simon

Comment 13 Ray Strode [halfline] 2021-04-22 18:32:50 UTC
see bug 1874282 comment 56 for a possible way forward.

Comment 15 Miha Tomšič 2021-05-31 08:44:31 UTC
I guess this is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1940348 which also includes a solution with --exit-with-x11 and this works for me. 
Is there any special reason why not to use the same option as in .csh version of the ssh-x-forwarding script?

Comment 16 Paulo Andrade 2021-08-09 17:08:05 UTC
  Just made this comment for a customer experiencing this issue,
and using dbus-1.12.8-12.el8_4.2:

"""
  This issue is still not fully resolved by engineering.

  Sorry for the suggestion of removing execute bit of
/etc/profile.d/ssh-x-forwarding.sh it does not correct the issue.

  Root cause of the problem is that a command like:
"""
$ ssh -X user@host gnome-terminal
"""
will not work, because gnome-terminal needs a dbus backend.

  But the correction to the above caused other issues:
"""
$ ssh -X user@host terminal-command
"""
the above will cause a hang, because a dbus backend will be
created and never finished. A workaround for it is to use the
pseudo patches:
"""
=== /etc/profile.d/ssh-x-forwarding.csh
 if ( $?SSH_CONNECTION == 0 ) exit
 if ( $?DISPLAY == 0 ) exit
 if ( $SHLVL > 1 ) exit
+tty -s || exit
 setenv GDK_BACKEND x11
eval `dbus-launch --auto-syntax --exit-with-x11`

=== /etc/profile.d/ssh-x-forwarding.sh
 [ -z "$SSH_CONNECTION" ] && return
 [ -z "$DISPLAY" ] && return
 [ "$SHLVL" -gt 1 ] && return
+tty -s || return
 GDK_BACKEND=x11; export GDK_BACKEND
 eval $(dbus-launch --sh-syntax --exit-with-session)
"""

  A different kind of problem happens if one runs just:
"""
$ ssh user@host
"""
in this case, when dbus-launch is started, it will race with the shell
to read stdin, attempting to detect when stdin is closed. This should
be fixed by pressing ^C, or just killing the dbus-launch process.
  Also, it might not be easy to start graphical applications if
running:
"""
$ ssh -X user@host
"""
due to the dbus-launch race for stdin.

  Still, applying the suggested patch, to use 'tty -s || return' might be
best temporary workaround, as one still can run:
"""
$ ssh -X user@host dbus-launch gnome-terminal
"""

  Please note that this issue is still being investigated by engineering,
and a patch should be available at some point. Likely using inotify to
check for stdin, as initially commented at 
https://bugzilla.redhat.com/show_bug.cgi?id=1874282#c55
"""

  Hopefully the above kind of summarize the problem.

  BTW, there is still the issue of:

--exit-with-x11 and --exit-with-session

and likely, need to patch /etc/profile.d/ssh-x-forwarding.sh to match
the .csh one (use --exit-with-x11).

Comment 17 Jakub Jelen 2021-08-19 12:50:58 UTC
*** Bug 1994964 has been marked as a duplicate of this bug. ***

Comment 18 Renaud Métrich 2021-09-02 06:51:58 UTC
I had a very quick look, the issue is 100% reproducible when creating a X11-forwarding session and specifying an invalid DISPLAY, e.g.:

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
$ DISPLAY=localhost:10 ssh -X vm-gui8 
[...]
connect localhost port 6010: Connection refused
connect localhost port 6010: Connection refused
[root@vm-gui8 ~]# 
--> keyboard "dead", no key entered is printed
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Once ^C is entered, the keyboard works again:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
[root@vm-gui8 ~]# ^C
[root@vm-gui8 ~]# azerty
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------


Stracing sshd, I can see that the characters sent through ssh are received on backend side, then written to /dev/ptmx (as expected).
Unfortunately it's **dbus-launch** (started by /etc/profile.d/ssh-x-forwarding.sh) that "eats" them, as shown in example below with character "z":

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
3088  08:28:40.119043 write(16</dev/ptmx<char 5:2>>, "z", 1) = 1 <0.000095>
3118  08:28:40.119732 <... read resumed>"z", 512) = 1 <0.000281>
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

3088 is sshd reading on the network.
3118 is a "double-child" of 3115 which is dbus-launch:

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
# grep " = 3118" /tmp/sshd.strace | grep clone
3116  08:28:34.176801 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f2f950dac10) = 3118 <0.000100>

# grep " = 3116" /tmp/sshd.strace | grep clone
3115  08:28:34.176129 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f2f950dac10) = 3116 <0.000139>

# grep "^3115" /tmp/sshd.strace | grep execve
3115  08:28:34.162167 execve("/usr/bin/dbus-launch", ["dbus-launch", "--sh-syntax", "--exit-with-session"], ["LS_COLORS=rs=0:di=38;5;33:ln=38;5;51:mh=00:pi=40;38;5;11:so=38;5;13:do=38;5;5:bd=48;5;232;38;5;11:cd=48;5;232;38;5;3:or=48;5;232"..., "SSH_CONNECTION=192.168.122.1 46016 192.168.122.75 8022", "LANG=en_US.UTF-8", "HISTCONTROL=ignoredups", "DISPLAY=localhost:10.0", "HOSTNAME=vm-gui8", "SSH_AUTH_SOCK=/tmp/ssh-vRw2NhfYi4/agent.3088", "USER=root", "SELINUX_ROLE_REQUESTED=", "PWD=/root", "SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass", "HOME=/root", "SSH_CLIENT=192.168.122.1 46016 8022", "SELINUX_LEVEL_REQUESTED=", "XDG_DATA_DIRS=/root/.local/share/flatpak/exports/share:/var/lib/flatpak/exports/share:/usr/local/share:/usr/share", "SSH_TTY=/dev/pts/0", "MAIL=/var/spool/mail/root", "TERM=xterm-256color", "SHELL=/bin/bash", "XMODIFIERS=@im=none", "SELINUX_USE_CURRENT_RANGE=", "SHLVL=1", "GDK_BACKEND=x11", "LOGNAME=root", "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin", "HISTSIZE=1000", "LESSOPEN=||/usr/bin/lesspipe.sh %s", "_=/usr/bin/dbus-launch"]) = 0 <0.000178>
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

A more robust workaround is to make sure **dbus-launch** never reads anything on stdin:

/etc/profile.d/ssh-x-forwarding.sh:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
[...]
eval $(dbus-launch --sh-syntax --exit-with-session </dev/null)
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

I think the issue is with **dbus-launch** which seems "persistent" until a Ctrl-C is hit in the terminal (whatever there is a valid value for DISPLAY or not).


Additionally, not related directly to this but still to be taken into account, the current mechanism for **dbus-launch** is NOT WORKING:
if you hit Ctrl-C in the terminal, dbus-launch dies shortly after and it's not possible to spawn an X application anymore, e.g.:

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
[root@vm-gui8 ~]# gnome-terminal 
# watch_fast: "/org/gnome/terminal/legacy/" (establishing: 0, active: 0)
# unwatch_fast: "/org/gnome/terminal/legacy/" (active: 0, establishing: 1)
# watch_established: "/org/gnome/terminal/legacy/" (establishing: 0)

--> works fine, I got a terminal locally

[root@vm-gui8 ~]# ^C
[root@vm-gui8 ~]# gnome-terminal 
# Error constructing proxy for org.gnome.Terminal:/org/gnome/Terminal/Factory0: Could not connect: Connection refused
[root@vm-gui8 ~]# 

--> fails, gnome-terminal doesn't spawn
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Comment 20 David King 2021-10-04 15:00:24 UTC
dbus-1.12.8-14.el8 and above have the fix for dbus-launch consuming characters from stdin (changing --exit-with-session to --exit-with-x11). There are a few different ways to attempt to solve the scope of dbus-laucnh, as described towards the end of bug 1874282, and the solution will likely be to make use of the logind session.

Comment 25 David King 2021-11-08 10:11:58 UTC
(In reply to David King from comment #20)
> There are a few different ways to attempt to solve the scope of dbus-laucnh,
> as described towards the end of bug 1874282, and the solution will likely be
> to make use of the logind session.

On closer inspection, inotifywait does not seem to be present in RHEL8, so this will need a rethink (or using something slightly more complicated than a simple shell script).

Comment 26 Carlos Santos 2021-11-08 11:47:33 UTC
(In reply to David King from comment #25)
> (In reply to David King from comment #20)
> > There are a few different ways to attempt to solve the scope of dbus-laucnh,
> > as described towards the end of bug 1874282, and the solution will likely be
> > to make use of the logind session.
> 
> On closer inspection, inotifywait does not seem to be present in RHEL8, so
> this will need a rethink (or using something slightly more complicated than
> a simple shell script).

inotifywait is present in RHEL-8. It is provided by inotify-tools package.

Comment 31 Ray Strode [halfline] 2021-11-08 14:58:39 UTC
can probably just use "gio monitor -f " ?

Comment 32 Ray Strode [halfline] 2021-11-08 19:21:37 UTC
just for completeness, I gave gio monitor a try this afternoon and it seems to work as well as the inotifywait version of dbus-start-ssh-session-bus, though it's gotta get called a little differently since it doesn't exit after receiving an event:

#!/bin/bash
exec >& /dev/null

export GVFS_DISABLE_FUSE=1

eval `dbus-launch --sh-syntax`
trap "kill -TERM $DBUS_SESSION_BUS_PID" EXIT

while loginctl show-session $XDG_SESSION_ID | grep -q ^State=active
do
        read < <(gio monitor -f "/run/systemd/sessions/${XDG_SESSION_ID}")
done

Comment 33 Ray Strode [halfline] 2021-11-08 20:24:57 UTC
So while I was cleaning up my browser tabs I got another glance at comment 32 and noticed there's a race:

If the login screen changes to closing before the monitor runs then the monitor will miss the event.

I think we should actually start the monitor first, so we don't miss out on early events... maybe like so:

#!/bin/bash
exec >& /dev/null

export GVFS_DISABLE_FUSE=1

eval `dbus-launch --sh-syntax`
trap "kill -TERM $DBUS_SESSION_BUS_PID" EXIT

coproc SESSION_MONITOR (gio monitor -f "/run/systemd/sessions/${XDG_SESSION_ID}")

while grep -q ^State=active <(loginctl show-session $XDG_SESSION_ID)
do
    read -u ${SESSION_MONITOR[0]}
done

Comment 34 Ray Strode [halfline] 2021-11-08 20:38:09 UTC
s/login screen/logind session/

Comment 37 Petr Schindler 2021-12-06 11:25:20 UTC
ssh quits correctly with the version dbus-1.12.8-17.el8. All tests from our test suite passes correctly.

Comment 44 Petr Schindler 2021-12-13 22:38:37 UTC
Version dbus-1.12.8-18.el8 fixes the bug and also all our tests are passing. No regression found.

Comment 53 errata-xmlrpc 2022-05-10 15:27:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (dbus bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:2071


Note You need to log in before you can comment on or make changes to this bug.