Bug 2089362
| Summary: | gio process invoked from /usr/libexec/dbus-1/dbus-kill-process-with-session never quits | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | GV <rhel> |
| Component: | dbus | Assignee: | Ray Strode [halfline] <rstrode> |
| Status: | CLOSED ERRATA | QA Contact: | Petr Schindler <pschindl> |
| Severity: | high | Priority: | medium |
| Version: | 8.6 | CC: | dereks, jwright, mkolbas, pschindl, rstrode, sbarcomb, tpelka, tpopela |
| Target Milestone: | rc | Keywords: | Triaged, ZStream |
| Target Release: | --- | Flags: | pm-rhel: mirror+ |
| Hardware: | Unspecified | OS: | Linux |
| Fixed In Version: | dbus-1.12.8-23.el8 | Doc Type: | If docs needed, set a value |
| Clones: | 2097784 2136054 | Bug Blocks: | 2097784, 2136054 |
| Last Closed: | 2022-11-08 10:53:59 UTC | Type: | Bug |
Description
GV
2022-05-23 13:57:10 UTC
Can https://issues.redhat.com/browse/RHELPLAN-122999 be opened for public viewing? I get "You can't view this issue" when I go to that URL.

Based on the comment at https://bugzilla.redhat.com/show_bug.cgi?id=1940067#c33, which approximates the script /usr/libexec/dbus-1/dbus-kill-process-with-session as

...
trap "kill -TERM $DBUS_SESSION_BUS_PID" EXIT
coproc SESSION_MONITOR (gio monitor -f "/run/systemd/sessions/${XDG_SESSION_ID}")
...

should the trap instead come after the coproc and include "$!", to ensure the gio process is terminated as well?

...
coproc SESSION_MONITOR (gio monitor -f "/run/systemd/sessions/${XDG_SESSION_ID}")
trap "kill -TERM $DBUS_SESSION_BUS_PID $!" EXIT
...

That would terminate both the dbus session and the gio process when the script exits.

(In reply to Derek Schrock from comment #1)
> Can https://issues.redhat.com/browse/RHELPLAN-122999 be opened for public
> viewing? "You can't view this issue" when I goto that URL.

That's just a bug mirrored internally into JIRA; the content is identical. I'm adding Ray, as he made the comment you referenced. It's unfortunate that we're still having issues.
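The placement proposed in comment #1 can be checked in isolation. The bash sketch below is not the shipped script: it assumes a Linux /proc filesystem, and `exec sleep 300` is a hypothetical stand-in for the long-running `gio monitor` coproc. Because the trap is set with double quotes right after the `coproc` line, `$!` is expanded at that moment to the coproc's PID, so the EXIT trap names the right process.

```bash
# Sketch: an EXIT trap installed *after* the coproc, using $! as proposed.
# "exec sleep 300" is a hypothetical stand-in for `gio monitor`.

# Succeeds if the PID exists and is not a zombie (assumes Linux /proc).
is_running() {
  local state
  { read -r _ _ state _ < "/proc/$1/stat"; } 2>/dev/null || return 1
  [ "$state" != "Z" ]
}

pidfile=$(mktemp)

# Child script: start the coproc, then install the trap that names it.
bash -c '
  coproc MONITOR (exec sleep 300)
  echo "$!" > "$1"                  # record the coproc PID for the check below
  trap "kill -TERM $!" EXIT         # double quotes: $! is expanded right here
' _ "$pidfile"

monitor_pid=$(cat "$pidfile")
sleep 0.2                           # give the signal a moment to take effect
if is_running "$monitor_pid"; then
  result="still running"
  kill "$monitor_pid"               # clean up if the trap did not work
else
  result="terminated"
fi
echo "coproc $monitor_pid after the script exited: $result"
rm -f "$pidfile"
```

The drawback of this ordering is that the trap only exists once the coproc has been started, so an exit between the two lines would skip it.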
Thanks, Derek, for investigating. The actual script we shipped was:
---
╎❯ cat dbus-kill-process-with-session
#!/bin/bash
# This script ensures the dbus-daemon is killed when the session closes.
# It's used by SSH sessions that have X forwarding (since the X display
# may outlive the session in those cases)
[ $# != 1 ] && exit 1
exec >& /dev/null
trap "kill -TERM $1" EXIT
export GVFS_DISABLE_FUSE=1
coproc SESSION_MONITOR (gio monitor -f "/run/systemd/sessions/${XDG_SESSION_ID}")
while grep -q ^State=active <(loginctl show-session $XDG_SESSION_ID)
do
read -u ${SESSION_MONITOR[0]}
done
---
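But you have indeed figured out the problem. I think it's slightly better to keep the trap where it is, but change it to:

trap 'kill -TERM $1 $(jobs -p)' EXIT

That way the trap is in effect from the beginning, and because the command is in single quotes, $(jobs -p) is only expanded when the trap fires, so it picks up the gio coproc's PID.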
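Why the single quotes matter can be seen with a small bash sketch (again assuming Linux /proc, with `exec sleep 300` as a hypothetical stand-in for `gio monitor`). It installs the trap before starting the coproc, once with single quotes and once with double quotes. With double quotes, `$(jobs -p)` runs when the trap is set, before any job exists, so the trap degenerates to a bare `kill -TERM` and the coproc is left behind.

```bash
# Sketch: when is $(jobs -p) expanded in an EXIT trap set *before* the coproc?
# "exec sleep 300" is a hypothetical stand-in for `gio monitor`.

# Succeeds if the PID exists and is not a zombie (assumes Linux /proc).
is_running() {
  local state
  { read -r _ _ state _ < "/proc/$1/stat"; } 2>/dev/null || return 1
  [ "$state" != "Z" ]
}

child=$(mktemp)
pidfile=$(mktemp)

# Child script: $1 selects the quoting style, $2 is where the coproc PID goes.
cat > "$child" <<'EOF'
if [ "$1" = deferred ]; then
  trap 'kill -TERM $(jobs -p)' EXIT   # single quotes: expanded when the trap fires
else
  trap "kill -TERM $(jobs -p)" EXIT   # double quotes: expanded now, no jobs yet
fi
coproc MONITOR (exec sleep 300)
echo "$!" > "$2"
EOF

# Runs the child in the given mode and sets $result to "killed" or "survived".
check() {
  bash "$child" "$1" "$pidfile" 2>/dev/null   # hides kill's usage error in eager mode
  local pid
  pid=$(cat "$pidfile")
  sleep 0.2
  if is_running "$pid"; then
    result="survived"
    kill "$pid"                                # clean up the leftover stand-in
  else
    result="killed"
  fi
}

check deferred; deferred_result=$result
check eager;    eager_result=$result
echo "single-quoted trap: coproc $deferred_result"
echo "double-quoted trap: coproc $eager_result"
rm -f "$child" "$pidfile"
```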
LGTM. Thanks for the feedback.

Created attachment 1890654: Proposed fix
I couldn't reproduce the bug with version dbus-1.12.8-19.el8. No regression found.

The bug is still there if the shell is tcsh.

$ rpm -q dbus-x11
dbus-x11-1.12.8-18.el8_6.1.x86_64
[ssh several times to remote-host...]
$ ps auxww | grep gio
remote-user 3432354 0.0 0.0 143972 7476 ? Sl 12:28 0:00 gio monitor -f /run/systemd/sessions/249
remote-user 3432423 0.0 0.0 143972 7488 ? Sl 12:28 0:00 gio monitor -f /run/systemd/sessions/250
remote-user 3432496 0.0 0.0 143972 7436 ? Sl 12:28 0:00 gio monitor -f /run/systemd/sessions/251
remote-user 3432565 0.0 0.0 143972 7416 ? Sl 12:28 0:00 gio monitor -f /run/systemd/sessions/252
remote-user 3432642 0.0 0.0 143972 7416 ? Sl 12:28 0:00 gio monitor -f /run/systemd/sessions/253
remote-user 3432658 0.0 0.0 12140 1188 pts/4 S+ 12:28 0:00 grep --color=auto gio

Interesting bug. tcsh apparently, oddly, instructs all child processes in startup scripts to ignore requests to be killed. See: https://access.redhat.com/solutions/74293

Some programs, like dbus-daemon, explicitly override that instruction, but others, like the gio program, just go with the flow. The two ways to fix this that I can think of would be to either:

1. Change the gio program to explicitly die on termination signals (like dbus-daemon does), or
2. Use a hang-up signal instead of a termination signal, since tcsh instructs child processes in startup scripts to treat those as fatal.

I think 2 is the path of least resistance... Something like:

--- /usr/libexec/dbus-1/dbus-kill-process-with-session
+++ /usr/libexec/dbus-1/dbus-kill-process-with-session
@@ -8,3 +8,3 @@
-trap 'kill -TERM $1 $(jobs -p)' EXIT
+trap 'kill -TERM $1; kill -HUP $(jobs -p)' EXIT

Our tests pass. No regression found. I wasn't able to reproduce the original bug, not even with the original version of dbus, so it looks good to me.

So this reproducer highlights a race condition.
1. trap 'kill -TERM $1; kill -HUP $(jobs -p)' EXIT
2.
3. export GVFS_DISABLE_FUSE=1
4. coproc SESSION_MONITOR (gio monitor -f "/run/systemd/sessions/${XDG_SESSION_ID}")
5.
6. while grep -q ^State=active <(loginctl show-session $XDG_SESSION_ID)
7. do
8. read -u ${SESSION_MONITOR[0]}
9. done
Line 4 is asynchronous, and the file monitor might not actually get set up until after the session closes. Line 6 is supposed to catch that, but it's possible the session closes right after line 6 runs. If it does, the file monitor may get set up after the file has changed, and so won't see the change, causing the read call to hang indefinitely.
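The race can be reproduced without systemd or gio. In the bash sketch below, a temporary file is a hypothetical stand-in for the session state reported by `loginctl show-session`, and a brace-group coproc is a hypothetical stand-in for `gio monitor`: it only takes its baseline after a 0.5 s setup delay and then reports changes made after that point. GNU `stat` is assumed. Because the session "closes" right after the state check, before the monitor is ready, the change is never reported. A 2 s `read -t` replaces the script's unbounded `read -u` so the demo terminates.

```bash
# Sketch of the lost-event race; all names here are hypothetical stand-ins.
state_file=$(mktemp)                     # stands in for `loginctl show-session`
echo "State=active" > "$state_file"

# Stand-in for `gio monitor`: reports only changes made after its setup delay.
coproc MONITOR {
  sleep 0.5                              # simulated asynchronous monitor setup
  last=$(stat -c %y "$state_file")       # GNU stat: modification time
  while sleep 0.05; do
    now=$(stat -c %y "$state_file")
    if [ "$now" != "$last" ]; then echo changed; last=$now; fi
  done
}
mon_pid=$MONITOR_PID

outcome="loop exited"
while grep -q '^State=active' "$state_file"; do
  echo "State=closing" > "$state_file"   # the session closes right after the check
  # The real script blocks forever here; a 2 s timeout stands in for "forever".
  if ! read -t 2 -u "${MONITOR[0]}" _; then
    outcome="read would hang"
    break
  fi
done
echo "$outcome"

{ kill "$mon_pid"; wait "$mon_pid"; } 2>/dev/null
rm -f "$state_file"
```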
Possible ideas:
a) Put a "sleep $TIMEOUT" on line 5. That would give the file monitor enough time to get set up, so it would practically eliminate the race, though technically the race would still exist in pathological situations.
b) Add a loop on line 5 that explicitly waits for the file monitor. This would require looking in the fdinfo of the gio process for an inotify watch. This relies on implementation details of how gio works, so it's not a great idea.
c) Put a "-t $TIMEOUT" on the read call to make the loop wake up every $TIMEOUT seconds. If we hit the above race, the read call will time out after a bit, and the loop will then see that the session is closed. The downside is that the loop keeps waking up every $TIMEOUT seconds even when the race doesn't happen. We'd probably want a pretty big timeout, maybe 60 seconds or so, to minimize needless wake-ups.
d) Put "-t $TIMEOUT" only on the first pass through the loop, and do a fully blocking read on every subsequent pass. This is pretty much equivalent to idea a), just written a little differently.
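Idea c) can be sketched the same way in bash. Here the coproc (`exec sleep 300`, a hypothetical stand-in) never reports anything, which is the worst case after a missed event, and a temporary file again stands in for the session state. With a timeout on `read`, the loop wakes up once, re-checks the state, and exits. TIMEOUT is 1 s only to keep the demo short.

```bash
# Sketch of idea c): a bounded read turns a missed event into a short delay.
TIMEOUT=1                               # the comment suggests something like 60
state_file=$(mktemp)                    # hypothetical stand-in for loginctl output
echo "State=active" > "$state_file"

coproc MONITOR (exec sleep 300)         # stand-in monitor that never reports
mon_pid=$MONITOR_PID

wakeups=0
while grep -q '^State=active' "$state_file"; do
  echo "State=closing" > "$state_file"  # the session closes right after the check
  # On timeout read fails (status > 128); the loop then re-checks the state.
  read -t "$TIMEOUT" -u "${MONITOR[0]}" _ || wakeups=$((wakeups + 1))
done
echo "loop exited after $wakeups timeout wake-up(s)"

{ kill "$mon_pid"; wait "$mon_pid"; } 2>/dev/null
rm -f "$state_file"
```

Passing `-t` only on the first iteration would turn this into idea d).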
So I thought about this at lunch and had another idea... We could do something like this:
```
MONITOR_READY_FILE=$(mktemp dbus-session-monitor.XXXXXX --tmpdir)
trap 'rm -f "${MONITOR_READY_FILE}"; kill -TERM $1; kill -HUP $(jobs -p)' EXIT
export GVFS_DISABLE_FUSE=1
coproc SESSION_MONITOR (gio monitor -f "/run/systemd/sessions/${XDG_SESSION_ID}" "${MONITOR_READY_FILE}")
# Poll until the gio monitor command is actively monitoring
until
touch "${MONITOR_READY_FILE}"
read -t 0.25 -u ${SESSION_MONITOR[0]}
do
continue
done
# Wait until the session is closed
while
grep -q ^State=active <(loginctl show-session $XDG_SESSION_ID)
do
read -u ${SESSION_MONITOR[0]}
done
```
In other words, at line 5 in the previous comment, poll the monitor pipe until it detects a change event (using a brief timeout that should normally need no more than 1 iteration). Ensure the change event will always happen by using a separate file that the script has full control over, and make the script explicitly touch that file.
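The ready-file handshake can be exercised without gio as well. In this bash sketch a brace-group coproc is a hypothetical stand-in for `gio monitor -f SESSION_FILE READY_FILE`: it begins watching only after a 0.6 s setup delay and prints a line whenever the ready file's modification time changes (GNU `stat` and `mktemp` are assumed). The `until` loop is the one from the comment above, with an attempt counter added.

```bash
# Sketch of the ready-file handshake; the monitor coproc is a hypothetical stand-in.
ready=$(mktemp --tmpdir dbus-session-monitor.XXXXXX)

coproc MONITOR {
  sleep 0.6                              # simulated asynchronous setup
  last=$(stat -c %y "$ready")            # baseline taken once "ready"
  while sleep 0.05; do
    now=$(stat -c %y "$ready")
    if [ "$now" != "$last" ]; then echo "changed"; last=$now; fi
  done
}
mon_pid=$MONITOR_PID

# Poll until the monitor is live: touch the file, then wait briefly for an event.
attempts=0
until
  attempts=$((attempts + 1))
  touch "$ready"
  read -t 0.25 -u "${MONITOR[0]}" _
do
  continue
done
echo "monitor reported the touch after $attempts attempt(s)"

{ kill "$mon_pid"; wait "$mon_pid"; } 2>/dev/null
rm -f "$ready"
```

Since the stand-in cannot report anything during its setup delay, the first attempt always times out; the loop only proceeds once a touch is actually observed, which is the guarantee the session loop needs.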
I tested versions 22 and 23 with the code from comment 24. I was finally able to reproduce the bug, though at a lower frequency (a 0.001 s sleep seemed to work best for me, and I got 71 stuck sessions out of 1000). With version dbus-1.12.8-23.el8 I got 0/1000, so it looks (hopefully) fixed this time. No regression found with our test suite.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (dbus bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:7769