Bug 2109145 - FD leak in newly introduced pidfd_getfd() causes polkit to fail to load auth rules
Summary: FD leak in newly introduced pidfd_getfd() causes polkit to fail to load auth rules
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: glib2
Version: 37
Hardware: x86_64
OS: Linux
Priority: high
Severity: urgent
Target Milestone: ---
Assignee: Kalev Lember
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: openqa
Duplicates: 2107936 2117512
Depends On:
Blocks: BetaBlocker, F37BetaBlocker BetaFreezeException, F37BetaFreezeException
Reported: 2022-07-20 14:05 UTC by Lukas Ruzicka
Modified: 2022-08-22 00:23 UTC

Fixed In Version: glib2-2.73.2-8.fc37 glib2-2.73.2-8.fc38
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-15 13:15:57 UTC
Type: Bug


Attachments
The menu without the power off button. (591.04 KB, image/png)
2022-07-20 14:05 UTC, Lukas Ruzicka
Journalctl showing polkit errors. (568.40 KB, text/plain)
2022-07-20 14:07 UTC, Lukas Ruzicka
Patch from glib2 to solve this problem (1.21 KB, patch)
2022-08-12 06:24 UTC, Anton Guda

Description Lukas Ruzicka 2022-07-20 14:05:01 UTC
Created attachment 1898293
The menu without the power off button.

Description of problem:

A problem has shown up on several of the latest nightly builds of Fedora Workstation:

When there is more than one user and several user switches are performed, the system ends up in a state where polkitd reports an error and the system cannot be powered off, because the power off button disappears from the GNOME menu. At this point, you can only switch users or log out.

When the user logs out and logs in again, the situation stays the same.

Opening a terminal and trying to power off with the CLI command `systemctl poweroff -i` does not work either; it fails with an Access Denied error.

The only reliable way to power off the computer is `sudo systemctl poweroff -i`. That works.

Another problem spotted after this happens, though I am not sure whether it is connected, is that the Activities overview can no longer find applications and shows no application icons, so no applications can be started this way.

Trying the "Alt-F2" and providing the exact CLI command works.


Version-Release number of selected component (if applicable):
Fedora Workstation 20220717 and 20220719 nightlies.

How reproducible:

Always

Steps to Reproduce:
1. Create at least two users on the system. 
2. Log in, switch to another user and back. 
3. Repeat several times - 5 times should be enough to cause the problem.

Actual results:
See above.

Expected results:
User switching works normally, no matter how often, or at what interval, I switch users.

Additional info:
The situation is visible in this openQA test:
https://openqa.fedoraproject.org/tests/1331718#step/desktop_login/113

The journalctl from the affected machine is attached, as well as two screenshots depicting the above noted problem.

Comment 1 Lukas Ruzicka 2022-07-20 14:06:02 UTC
Created attachment 1898294
Attempt to look for an application gets you eternal searching.

Comment 2 Lukas Ruzicka 2022-07-20 14:07:12 UTC
Created attachment 1898295
Journalctl showing polkit errors.

Comment 3 Fedora Blocker Bugs Application 2022-07-20 14:10:20 UTC
Proposed as a Blocker and Freeze Exception for 37-beta by Fedora user lruzicka using the blocker tracking app because:

 Proposing this as a blocker as it violates the following Shutdown criterion:

https://fedoraproject.org/wiki/Fedora_37_Beta_Release_Criteria#Shutdown,_reboot,_login,_logout

Comment 4 Adam Williamson 2022-07-20 15:23:36 UTC
Thanks for investigating this, Lukas, I noticed the 'power button went away after user switch' part yesterday but didn't get around to investigating in detail.

Comment 5 Anton Guda 2022-07-24 16:46:48 UTC
I noticed this behavior after the glib2 update to glib2-2.73.2.
After falling back to glib2-2.73.1-2, all is fine.

Comment 6 Jan Rybar 2022-07-25 06:42:35 UTC
Yeah, looks like another duktape issue with rules not conforming to ECMA Ed.5.
I'll look into it.

Probably polkit featuring duktape instead of mozjs isn't mature enough to go into F37.
Switching back to mozjs shouldn't be hard.

Comment 7 Jan Rybar 2022-08-02 15:26:25 UTC
I switched back to mozjs91 as JS engine, which seems to fix polkit (and related things) for now.  
The issue looks considerably bigger (and harder to debug) than just invalid .rules files. Polkit will be switched to duktape once the fix is ready, but not earlier than the F37 branch.
This needs more time in Rawhide, and we don't want to break important things right before F37 beta.

Anyway, thanks for catching this one! This is something the test kit or quick manual test wouldn't have revealed.

Comment 8 Anton Guda 2022-08-02 20:32:45 UTC
Sorry, but it seems that polkit-121-3.fc37.x86_64 shows similar behavior.
With glib2-2.73.1-2.fc37.x86_64 it works; with glib2-2.73.2-7.fc37.x86_64 it fails.
The failure does not occur at once, but after 30-60 min.
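
A failure that only appears after 30-60 minutes is consistent with a resource that accumulates slowly, such as open file descriptors. The following is a minimal sketch for watching a process's descriptor count over time. It assumes a Linux system with /proc mounted; the helper name `count_open_fds` is hypothetical, and inspecting a process owned by another user (polkitd, for example) normally requires root.

```python
import os


def count_open_fds(pid: int) -> int:
    """Return the number of file descriptors currently open in process `pid`.

    Linux-only assumption: each open descriptor appears as one entry
    in /proc/<pid>/fd.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


# Example: report the count for the current process.
print(f"pid {os.getpid()}: {count_open_fds(os.getpid())} open file descriptors")
```

Running this periodically against the suspect process and seeing a count that only grows would point towards a leak rather than a one-off failure.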

Comment 9 Tomasz Kłoczko 2022-08-04 16:22:55 UTC
Has anyone opened an issue against duktape?

Comment 10 Jan Rybar 2022-08-05 09:11:22 UTC
(In reply to Tomasz Kłoczko from comment #9)
> Has anyone opened an issue against duktape?

I don't think it's a bug in duktape. From preliminary investigation of the reproducer, it looks like something kills the thread in which the JS engine binding runs.

Comment 11 Tomasz Kłoczko 2022-08-05 10:15:32 UTC
IMO the more important question is why things like polkit need a JS engine at all :)

Comment 12 Jan Rybar 2022-08-05 10:26:57 UTC
(In reply to Tomasz Kłoczko from comment #11)
> IMO the more important question is why things like polkit need a JS engine
> at all :)

This question has been asked and answered MANY times upstream. Please don't bring the topic up here as well.

Comment 13 Adam Williamson 2022-08-05 16:32:04 UTC
So I've seen this twice now on my laptop without switching users, and I'm using polkit-121-3.fc37.x86_64 . I'm not sure exactly what triggers it. Obvious suspects are locking/unlocking the screen and suspending/resuming, but it's definitely survived *some* cycles of both without the problem happening - but right now, it's happened again. My options for changing network settings in the system menu are greyed out (can't change wifi network or enable a VPN connection from there), and all power options apart from "Log Out" are gone.

Comment 14 Ben Cotton 2022-08-09 13:40:19 UTC
This bug appears to have been reported against 'rawhide' during the Fedora Linux 37 development cycle.
Changing version to 37.

Comment 15 Yanko Kaneti 2022-08-11 11:39:41 UTC
*** Bug 2117512 has been marked as a duplicate of this bug. ***

Comment 16 Anton Guda 2022-08-12 06:24:38 UTC
Created attachment 1905079
Patch from glib2 to solve this problem

It seems that glib2 git commit b62745fe8e1699473f87caff328ac2c6ce394c55 can help us solve this problem. It works for me ;-)

Comment 17 Vincent Mihalkovič 2022-08-12 19:14:29 UTC
> With glib2-2.73.1-2.fc37.x86_64 it works, with glib2-2.73.2-7.fc37.x86_64 - fails.
I (git) bisected between those tags and found that the polkit problem starts with the following glib2 commit: https://gitlab.gnome.org/GNOME/glib/-/commit/f615eef4bafaa2fbe11530c0f66f7d28a28a58a9

Here's the glib2 scratch build from the mentioned commit: https://koji.fedoraproject.org/koji/taskinfo?taskID=90644651
and also (for the sake of completeness) a scratch build from the parent commit (https://gitlab.gnome.org/GNOME/glib/-/commit/7b93693ab3007670a3d95d6ac3cb9260c5643493): https://koji.fedoraproject.org/koji/taskinfo?taskID=90645553

> It seems that glib2 git commit b62745fe8e1699473f87caff328ac2c6ce394c55 can help us solve this problem
Here's the glib2 scratch build (https://koji.fedoraproject.org/koji/taskinfo?taskID=90734680) with b62745fe8e1699473f87caff328ac2c6ce394c55.patch - I confirm that this patch solves the mentioned polkit problem.
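
For readers unfamiliar with the bisection used above, here is a minimal sketch of the underlying binary search. It is not git's implementation; the function name `first_bad`, the commit labels, and the `is_bad` callback are hypothetical. It assumes the commits are ordered oldest to newest, the first one is known good, the last one is known bad, and every commit after the first bad one is also bad.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_bad() is true.

    Assumes commits[0] is good and commits[-1] is bad.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the first bad commit is at mid or earlier
        else:
            lo = mid  # the first bad commit is after mid
    return commits[hi]


# Hypothetical history of ten commits where the regression lands in "c6".
history = [f"c{i}" for i in range(10)]
print(first_bad(history, lambda c: int(c[1:]) >= 6))
```

Each step halves the remaining range, which is why a build-and-test cycle per step stays manageable even across many commits.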

Comment 18 Geraldo Simião 2022-08-13 22:18:49 UTC
Still present at last F37 compose: Live-x86_64-37-20220813.n.0.iso

Comment 19 Jan Rybar 2022-08-15 06:40:16 UTC
The new change in gmain.c contains an FD leak.
This leak in glib2 renders polkit unable to load auth rules, and therefore locks users out of basic desktop usage in GNOME Shell.
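
The general shape of such a leak can be sketched as follows. This is an illustration of the pattern only, not glib's code: the `ChildWatch` class and its methods are hypothetical, and a descriptor on /dev/null stands in for the per-child descriptor. It assumes Linux with /proc mounted for counting descriptors.

```python
import os


class ChildWatch:
    """Hypothetical watch object that owns exactly one file descriptor."""

    def __init__(self) -> None:
        # Stand-in for the descriptor a real child watch would hold.
        self.fd = os.open(os.devnull, os.O_RDONLY)

    def finalize_leaky(self) -> None:
        # Bug pattern: the reference is dropped but the descriptor stays open.
        self.fd = -1

    def finalize_fixed(self) -> None:
        # Fix pattern: close the descriptor before dropping the reference.
        if self.fd >= 0:
            os.close(self.fd)
            self.fd = -1


def open_fds() -> int:
    """Count this process's open descriptors (Linux /proc assumption)."""
    return len(os.listdir("/proc/self/fd"))


def fds_leaked_after(cycles: int, fixed: bool) -> int:
    """Create and finalize `cycles` watches; return the net descriptor growth."""
    before = open_fds()
    for _ in range(cycles):
        watch = ChildWatch()
        if fixed:
            watch.finalize_fixed()
        else:
            watch.finalize_leaky()
    return open_fds() - before


print("leaky:", fds_leaked_after(20, fixed=False))
print("fixed:", fds_leaked_after(20, fixed=True))
```

With the leaky finalizer, every watch that is created and destroyed leaves one descriptor behind, so a long-running daemon keeps accumulating them.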

Please backport https://gitlab.gnome.org/GNOME/glib/-/commit/b62745fe8e1699473f87caff328ac2c6ce394c55 asap.

Thank you.

Comment 20 Kalev Lember 2022-08-15 12:53:56 UTC
I went ahead and backported that in https://src.fedoraproject.org/rpms/glib2/c/aba250710a90254819cc8b86d6e5b912570bd9fb?branch=rawhide

Comment 21 Fedora Update System 2022-08-15 13:12:46 UTC
FEDORA-2022-4596d19c57 has been submitted as an update to Fedora 37. https://bodhi.fedoraproject.org/updates/FEDORA-2022-4596d19c57

Comment 22 Fedora Update System 2022-08-15 13:13:47 UTC
FEDORA-2022-3c3217ab9e has been submitted as an update to Fedora 38. https://bodhi.fedoraproject.org/updates/FEDORA-2022-3c3217ab9e

Comment 23 Fedora Update System 2022-08-15 13:15:57 UTC
FEDORA-2022-3c3217ab9e has been pushed to the Fedora 38 stable repository.
If the problem still persists, please make note of it in this bug report.

Comment 24 Fedora Update System 2022-08-15 15:40:01 UTC
FEDORA-2022-4596d19c57 has been pushed to the Fedora 37 stable repository.
If the problem still persists, please make note of it in this bug report.

Comment 25 Michael Catanzaro 2022-08-22 00:23:06 UTC
*** Bug 2107936 has been marked as a duplicate of this bug. ***

