Bug 1753328

Summary: Stop NOTIFY_SOCKET from leaking into the GNOME environment
Product: [Fedora] Fedora
Reporter: Nathaniel McCallum <npmccallum>
Component: gnome-session
Assignee: Debarshi Ray <debarshir>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Docs Contact:
Priority: unspecified
Version: 31
CC: bberg, caillon+fedoraproject, david, debarshir, giallu, gmarr, gnome-sig, gscrivan, john.j5live, lsm5, mclasen, pasik, rhughes, rstrode, sandmann, sanjay.ankur, santiago, sgraf, splinux25, stefano, taaem
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard: AcceptedFreezeException
Fixed In Version: gnome-session-3.34.0-2.fc31
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-09-26 00:02:05 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1644940

Description Nathaniel McCallum 2019-09-18 15:27:17 UTC
Preparation:
1. Install Fedora 31 Silverblue.
2. Ensure that you have subuid and subgid ranges set up appropriately (/etc/subuid and /etc/subgid).


$ podman run --rm -it fedora:latest
Trying to pull docker.io/library/fedora:latest...
Getting image source signatures
Copying blob 5a915a173fbc done
Copying config e9ed59d2ba done
Writing manifest to image destination
Storing signatures

After this point podman hangs.

$ strace -f -p `pidof /usr/bin/crun`
strace: Process 9511 attached
select(4, [3], NULL, NULL, {tv_sec=0, tv_usec=1434}) = 0 (Timeout)
kill(9505, 0)                           = 0
select(4, [3], NULL, NULL, {tv_sec=0, tv_usec=10000}) = 0 (Timeout)
kill(9505, 0)                           = 0
select(4, [3], NULL, NULL, {tv_sec=0, tv_usec=10000}) = 0 (Timeout)
kill(9505, 0)                           = 0
select(4, [3], NULL, NULL, {tv_sec=0, tv_usec=10000}) = 0 (Timeout)
kill(9505, 0)                           = 0

The above repeats forever. Relevant processes are:

$ pstree -s 9505
systemd───systemd───conmon───bash

$ ps aux | egrep '(crun|podman|conmon)'
nmccallu    9448  0.0  0.3 1000924 57424 pts/0   Sl+  10:59   0:00 podman run --rm -it fedora:latest
nmccallu    9458  0.5  0.6 1148552 103376 pts/0  Sl+  10:59   0:08 podman run --rm -it fedora:latest
nmccallu    9461  0.0  0.0      0     0 ?        Zs   10:59   0:00 [podman] <defunct>
nmccallu    9462  0.0  0.2  78248 35652 ?        S    10:59   0:00 podman
nmccallu    9499  0.0  0.0  80400  2092 ?        Ssl  10:59   0:00 /usr/libexec/podman/conmon --api-version 1 -s -c 5e4e43c68cca9024133e9e161cae0a3e193d655c1579627c5849ec80a863f241 -u 5e4e43c68cca9024133e9e161cae0a3e193d655c1579627c5849ec80a863f241 -r /usr/bin/crun -b /var/home/nmccallu/.local/share/containers/storage/overlay-containers/5e4e43c68cca9024133e9e161cae0a3e193d655c1579627c5849ec80a863f241/userdata -p /run/user/16827/overlay-containers/5e4e43c68cca9024133e9e161cae0a3e193d655c1579627c5849ec80a863f241/userdata/pidfile -l k8s-file:/var/home/nmccallu/.local/share/containers/storage/overlay-containers/5e4e43c68cca9024133e9e161cae0a3e193d655c1579627c5849ec80a863f241/userdata/ctr.log --exit-dir /run/user/16827/libpod/tmp/exits --socket-dir-path /run/user/16827/libpod/tmp/socket --log-level error -t --conmon-pidfile /run/user/16827/overlay-containers/5e4e43c68cca9024133e9e161cae0a3e193d655c1579627c5849ec80a863f241/userdata/conmon.pid --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /var/home/nmccallu/.local/share/containers/storage --exit-command-arg --runroot --exit-command-arg /run/user/16827 --exit-command-arg --log-level --exit-command-arg error --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /run/user/16827/libpod/tmp --exit-command-arg --runtime --exit-command-arg crun --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg --storage-opt --exit-command-arg overlay.mount_program=/usr/bin/fuse-overlayfs --exit-command-arg --events-backend --exit-command-arg journald --exit-command-arg container --exit-command-arg cleanup --exit-command-arg --rm --exit-command-arg 5e4e43c68cca9024133e9e161cae0a3e193d655c1579627c5849ec80a863f241
nmccallu    9511  0.3  0.0   6308  1556 pts/0    S+   10:59   0:05 /usr/bin/crun start 5e4e43c68cca9024133e9e161cae0a3e193d655c1579627c5849ec80a863f241
nmccallu   10159  0.0  0.0 216116   892 pts/1    S+   11:26   0:00 grep -E --color=auto (crun|podman|conmon)

Comment 1 Nathaniel McCallum 2019-09-18 15:32:53 UTC
I believe this is a high-priority issue given the importance of podman to Silverblue.

Comment 2 taaem 2019-09-18 16:11:23 UTC
*** Bug 1752851 has been marked as a duplicate of this bug. ***

Comment 3 Giuseppe Scrivano 2019-09-18 17:10:48 UTC
The issue seems to be that the NOTIFY_SOCKET environment variable is always set.

I'm not sure why it is set in a GNOME Terminal session. This breaks runc/crun: they create another socket that is passed down to the container, and then they wait for the notification.

On F30, NOTIFY_SOCKET is not set in a terminal session.

Moving to gnome-terminal for further triaging.
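
A quick way to confirm this diagnosis from an affected terminal is to check whether the variable is present at all. The following is only a minimal sketch: the helper name notify_socket_status is made up for illustration and is not part of podman, crun, or systemd.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this sketch): report whether
# NOTIFY_SOCKET is set in the current shell. Variables inherited from the
# parent process show up as shell variables, so this also covers a value
# leaked in from the session.
notify_socket_status() {
    # ${VAR+x} expands to "x" only when VAR is set, even if it is empty.
    if [ -n "${NOTIFY_SOCKET+x}" ]; then
        printf 'NOTIFY_SOCKET is set: %s\n' "$NOTIFY_SOCKET"
    else
        printf 'NOTIFY_SOCKET is not set\n'
    fi
}

notify_socket_status
```

Based on the observations above, a Fedora 30 terminal would be expected to report the variable as not set, and an affected Fedora 31 terminal to report it as set.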

Comment 4 Debarshi Ray 2019-09-18 17:51:34 UTC
On Fedora 31, the GNOME session is managed by 'systemd --user'. The presence of NOTIFY_SOCKET is very likely fallout from that. GNOME Terminal itself isn't to blame.
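
To see which processes in the session actually carry the variable, one option is to read /proc/<pid>/environ, which on Linux holds the NUL-separated environment a process was started with. This is a rough sketch under that assumption; find_notify_socket is a hypothetical helper name, and it only filters whatever is fed to it on standard input.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this sketch): read a NUL-separated
# environment block on stdin, as found in /proc/<pid>/environ, and print
# only the NOTIFY_SOCKET entry. The exit status is non-zero when the
# variable is absent.
find_notify_socket() {
    tr '\0' '\n' | grep '^NOTIFY_SOCKET='
}

# Intended use on a Linux system (not run here):
#   find_notify_socket < /proc/$$/environ      # the current shell
#   find_notify_socket < /proc/<pid>/environ   # any other process you own
```

Walking up the parent chain with this check should show where the variable first appears, which helps separate the terminal from the session components that started it.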

Comment 5 Nathaniel McCallum 2019-09-18 18:15:52 UTC
The good news is that `unset NOTIFY_SOCKET` appears to make everything work.
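
A scoped variant of this workaround is to clear the variable only for the command being run, so the rest of the shell session is left alone. The sketch below assumes nothing beyond a POSIX shell; without_notify_socket is a hypothetical wrapper name, not an existing tool.

```shell
#!/bin/sh
# Hypothetical wrapper (name invented for this sketch): run a command with
# NOTIFY_SOCKET removed from its environment. The subshell keeps the unset
# from affecting the calling shell.
without_notify_socket() {
    (
        unset NOTIFY_SOCKET
        "$@"
    )
}

# Intended use (not run here):
#   without_notify_socket podman run --rm -it fedora:latest
```

The wrapper returns the exit status of the wrapped command, so it can be dropped into scripts that check for failures.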

Comment 6 Debarshi Ray 2019-09-19 12:27:22 UTC
I submitted a merge request against gnome-session:
https://gitlab.gnome.org/GNOME/gnome-session/merge_requests/22

Comment 7 Fedora Update System 2019-09-19 17:05:23 UTC
FEDORA-2019-c129dc7174 has been submitted as an update to Fedora 31. https://bodhi.fedoraproject.org/updates/FEDORA-2019-c129dc7174

Comment 8 Fedora Update System 2019-09-20 02:56:59 UTC
gnome-session-3.34.0-2.fc31 has been pushed to the Fedora 31 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-c129dc7174

Comment 9 Fedora Update System 2019-09-21 00:02:58 UTC
gnome-session-3.34.0-2.fc31 has been pushed to the Fedora 31 stable repository. If problems still persist, please make note of it in this bug report.

Comment 10 taaem 2019-09-22 16:25:58 UTC
After that update, I still have NOTIFY_SOCKET set in my terminal session, so most things are still broken.

Comment 11 Jens Petersen 2019-09-23 04:59:01 UTC
Re-opening since the update didn't seem to fix this.

Comment 12 Geoffrey Marr 2019-09-23 18:15:32 UTC
Discussed during the 2019-09-23 blocker review meeting: [0]

This bug was classified as an "AcceptedFreezeException" because broken podman affects the out-of-the-box (OOTB) experience and also Silverblue, which is composed from stable packages.

[0] https://meetbot.fedoraproject.org/fedora-blocker-review/2019-09-23/f31-blocker-review.2019-09-23-16.03.txt

Comment 13 Benjamin Berg 2019-09-23 18:56:32 UTC
The gnome-session part is a partial fix. There is also a gnome-shell fix that has been merged upstream:

  https://gitlab.gnome.org/GNOME/gnome-shell/merge_requests/741

Quite likely, that fixes the issue completely.

Comment 14 Fedora Update System 2019-09-24 17:50:30 UTC
FEDORA-2019-a6017bfdd9 has been submitted as an update to Fedora 31. https://bodhi.fedoraproject.org/updates/FEDORA-2019-a6017bfdd9

Comment 15 Debarshi Ray 2019-09-24 17:53:43 UTC
(In reply to Benjamin Berg from comment #13)
> The gnome-session part is a partial fix. There is also a gnome-shell fix
> that has been merged upstream:
> 
>   https://gitlab.gnome.org/GNOME/gnome-shell/merge_requests/741
> 
> Quite likely, that fixes the issue completely.

Here's a gnome-shell build with that patch:
https://bodhi.fedoraproject.org/updates/FEDORA-2019-a6017bfdd9

Comment 16 Stefano Figura 2019-09-24 20:32:51 UTC
(In reply to Debarshi Ray from comment #15)
> Here's a gnome-shell build with that patch:
> https://bodhi.fedoraproject.org/updates/FEDORA-2019-a6017bfdd9

Thanks, it is working for me now. No need to unset NOTIFY_SOCKET anymore.

Comment 17 Fedora Update System 2019-09-25 01:38:28 UTC
gnome-shell-3.34.0-3.fc31 has been pushed to the Fedora 31 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-a6017bfdd9

Comment 18 Debarshi Ray 2019-09-25 14:16:51 UTC
Thanks for all the testing and feedback, everybody!

Comment 19 Fedora Update System 2019-09-26 00:02:05 UTC
gnome-shell-3.34.0-3.fc31 has been pushed to the Fedora 31 stable repository. If problems still persist, please make note of it in this bug report.

Comment 20 Giuseppe Scrivano 2019-10-01 12:35:20 UTC
*** Bug 1756059 has been marked as a duplicate of this bug. ***