Bug 2280840 - Live installer sometimes fails to run from KDE, after a polkit-kde-authentication-agent-1 crash, anaconda reports not running as root
Keywords:
Status: ASSIGNED
Alias: None
Product: Fedora
Classification: Fedora
Component: polkit-kde
Version: 42
Hardware: All
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: marcdeop
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2024-05-16 15:42 UTC by Adam Williamson
Modified: 2025-02-26 13:02 UTC
CC List: 10 users

Fixed In Version:
Clone Of:
Environment:
Last Closed:
Type: Bug
Embargoed:


Attachments
backtrace of the crashed polkit agent (18.15 KB, text/plain)
2024-06-03 21:54 UTC, Adam Williamson
no flags


Links
System | ID | Private | Priority | Status | Summary | Last Updated
KDE GitLab | plasma polkit-kde-agent-1 merge_requests 41 | 0 | None | merged | Fix crash on hyprland | 2024-05-16 17:08:33 UTC
KDE Software Compilation | 485407 | 0 | NOR | VERIFIED | polkit-kde-agent crashes with nullptr in PolicyKitListener::finishObtainPrivilege() when run in Hyprland | 2024-05-16 17:08:20 UTC

Description Adam Williamson 2024-05-16 15:42:18 UTC
Description of problem:
Since about 2024-05-09, in openQA testing of Rawhide updates, the KDE live installer sometimes fails to run when launched by double-clicking the desktop icon.

Version-Release number of selected component (if applicable):
polkit-kde-6.0.4-1.fc41

How reproducible:
12 out of the last 100 attempts seem to have hit this.

Steps to Reproduce:
1. Boot a freshly built KDE live image and wait until the icon to launch the installer appears
2. Double-click it

Actual results:
A polkit authentication window appears very briefly and then disappears; the installer does not launch

Expected results:
The installer should launch

Additional info:
In the journal, we see this:

May 16 08:25:47 localhost-live systemd[1370]: Started app-liveinst-b10389250096410faa5aeba07aa49133.scope - Install to Hard Drive - Install.
May 16 08:25:48 localhost-live plasmashell[2638]: localuser:root being added to access control list
May 16 08:25:48 localhost-live polkit-agent-helper-1[2653]: pam_unix(polkit-1:auth): user [liveuser] has blank password; authenticated without it
May 16 08:25:48 localhost-live polkit-kde-authentication-agent-1[2429]: KCrash: appFilePath points to nullptr!
May 16 08:25:48 localhost-live polkit-kde-authentication-agent-1[2429]: KCrash: Application '<unknown>' crashing... crashRecursionCounter = 2
May 16 08:25:48 localhost-live pkexec[2639]: liveuser: Error executing command as another user: Not authorized [USER=root] [TTY=unknown] [CWD=/home/liveuser] [COMMAND=/usr/bin/liveinst]
May 16 08:25:48 localhost-live plasmashell[2639]: Error executing command as another user: Not authorized
May 16 08:25:48 localhost-live plasmashell[2639]: This incident has been reported.
May 16 08:25:48 localhost-live systemd[1370]: plasma-polkit-agent.service: Main process exited, code=killed, status=11/SEGV
May 16 08:25:48 localhost-live systemd[1370]: plasma-polkit-agent.service: Failed with result 'signal'.

There's some subsequent stuff about anaconda's exit handler crashing and abrt not being there, but I think that's all ultimately a consequence of this failure. Unfortunately it doesn't seem like openQA found a coredump to upload. I might see if I can reproduce this manually and find one.

Comment 1 Adam Williamson 2024-05-16 17:08:21 UTC
aleasto points out that this is likely https://bugs.kde.org/show_bug.cgi?id=485407 - thanks! I've backported the fix for that; we'll see if that makes the failures go away.

Comment 2 Adam Williamson 2024-05-22 23:23:22 UTC
Unfortunately I think this may still be happening :(

Comment 3 Adam Williamson 2024-06-03 18:24:24 UTC
Yes, this is definitely still happening with polkit-kde-6.0.4-2.fc41 (which was the build where I backported the patch), and with polkit-kde-6.0.90-1.fc41 (the current build). Current log:

Jun 03 06:32:05 localhost-live systemd[1438]: Started app-liveinst - Install to Hard Drive - Install.
Jun 03 06:32:05 localhost-live liveinst[2722]: localuser:root being added to access control list
Jun 03 06:32:06 localhost-live polkit-agent-helper-1[2735]: pam_unix(polkit-1:auth): user [liveuser] has blank password; authenticated without it
Jun 03 06:32:06 localhost-live audit[2735]: USER_AUTH pid=2735 uid=1000 auid=1000 ses=2 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 msg='op=PAM:authentication grantors=pam_unix acct="liveuser" exe="/usr/lib/polkit-1/polkit-agent-helper-1" hostname=? addr=? terminal=? res=success'
Jun 03 06:32:06 localhost-live audit[2735]: USER_ACCT pid=2735 uid=1000 auid=1000 ses=2 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 msg='op=PAM:accounting grantors=pam_unix acct="liveuser" exe="/usr/lib/polkit-1/polkit-agent-helper-1" hostname=? addr=? terminal=? res=success'
Jun 03 06:32:06 localhost-live polkit-kde-authentication-agent-1[2492]: KCrash: appFilePath points to nullptr!
Jun 03 06:32:06 localhost-live polkit-kde-authentication-agent-1[2492]: KCrash: Application '<unknown>' crashing... crashRecursionCounter = 2
Jun 03 06:32:06 localhost-live audit[2492]: ANOM_ABEND auid=1000 uid=1000 gid=1000 ses=2 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 pid=2492 comm="polkit-kde-auth" exe="/usr/libexec/kf6/polkit-kde-authentication-agent-1" sig=11 res=1
Jun 03 06:32:06 localhost-live polkitd[1127]: Unregistered Authentication Agent for unix-session:1 (system bus name :1.75, object path /org/kde/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8) (disconnected from bus)
Jun 03 06:32:06 localhost-live polkitd[1127]: Operator of unix-session:1 FAILED to authenticate to gain authorization for action org.fedoraproject.pkexec.liveinst for unix-process:2720:24515 [/usr/bin/bash /usr/bin/liveinst] (owned by unix-user:liveuser)
Jun 03 06:32:06 localhost-live pkexec[2723]: liveuser: Error executing command as another user: Not authorized [USER=root] [TTY=unknown] [CWD=/home/liveuser] [COMMAND=/usr/bin/liveinst]
Jun 03 06:32:06 localhost-live liveinst[2723]: Error executing command as another user: Not authorized
Jun 03 06:32:06 localhost-live liveinst[2723]: This incident has been reported.
Jun 03 06:32:06 localhost-live systemd[1438]: plasma-polkit-agent.service: Main process exited, code=killed, status=11/SEGV
Jun 03 06:32:06 localhost-live systemd[1438]: plasma-polkit-agent.service: Failed with result 'signal'.

Comment 4 Adam Williamson 2024-06-03 18:53:58 UTC
Weirdly, we don't seem to get the actual crash dump anywhere. It's not in `coredumpctl list`, it's not in abrt, it's not in KDE's crash viewer; it's just nowhere. The above is all we get, unfortunately.

Comment 5 Adam Williamson 2024-06-03 21:16:36 UTC
So I've been experimenting with reproducing this manually, and found some interesting stuff...

For me, the polkit agent crash seems to *always* happen; even when anaconda runs OK, I see that crash in the journal. But when anaconda does not run, I see this error:

Jun 03 16:40:18 localhost-live kded6[2782]: anaconda must be run as root.

That's a fatal error for anaconda (it calls `sys.exit(1)` immediately after printing that message). It happens when anaconda believes it is not running as root; the check is `if os.geteuid() != 0:`. So this does look like a privilege/polkit problem: somehow, anaconda winds up not running as root when the problem happens.
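A minimal Python sketch of the kind of check described above, assuming the message goes to stderr. `require_root` and its `euid` parameter are hypothetical names for illustration, not anaconda's actual code; the parameter exists only so the logic can be exercised without being root.

```python
import os
import sys
from typing import Optional


def require_root(euid: Optional[int] = None) -> None:
    """Exit with status 1 unless the effective UID is 0 (root).

    Hypothetical helper mirroring the check described in this comment;
    not anaconda's real source.
    """
    if euid is None:
        euid = os.geteuid()  # effective UID of the current process
    if euid != 0:
        print("anaconda must be run as root.", file=sys.stderr)
        sys.exit(1)  # fatal: nothing after this point runs
```

If pkexec fails to elevate, the process keeps the live user's UID (1000 in the audit lines above), so a check like this exits immediately.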

I can reproduce this both by launching anaconda from the desktop icon and from the "Welcome Center" window that appears on boot of the live image. I haven't yet reproduced it by running a console and then running `liveinst` from the console, but will try that a few more times.

Comment 6 Adam Williamson 2024-06-03 21:18:14 UTC
Oh, and: if I retry launching anaconda after the first try hits the bug, it often succeeds; the failure state isn't somehow "locked in". I will probably use this as a workaround for openQA.
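One way such a retry workaround could look, as a hypothetical Python sketch: `launch_installer` stands in for whatever actually starts liveinst and reports whether anaconda came up. It is not openQA's real test code, and the default of 3 attempts is an arbitrary assumption.

```python
from typing import Callable, Optional


def launch_with_retries(launch_installer: Callable[[], bool],
                        attempts: int = 3) -> Optional[int]:
    """Call launch_installer until it reports success.

    Returns the 1-based number of the attempt that succeeded, or None if
    every attempt failed. This only helps because the failure is not
    "locked in", so a later attempt can succeed.
    """
    for attempt in range(1, attempts + 1):
        if launch_installer():
            return attempt
    return None
```

For example, a launcher that fails once and then succeeds returns 2; one that always fails returns None after three tries.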

Comment 7 Nicolas Fella 2024-06-03 21:24:59 UTC
"appFilePath points to nullptr" comes from KCrash itself; it has nothing to do with the actual cause of the crash.

Comment 8 Adam Williamson 2024-06-03 21:28:35 UTC
Aha, managed to reproduce it from a console; it shows:

```
localuser:root being added to access control list
Error executing command as another user: Not authorized

This incident has been reported.
```

Then a bunch of other errors, but I think those are all things failing because it's not root and so doesn't have the expected power to do stuff, e.g. can't run setenforce, can't write to /sys/devices, and so on.

Also, found another journal message that only appears when anaconda does not run:

Jun 03 16:40:15 localhost-live polkitd[1213]: Operator of unix-session:1 FAILED to authenticate to gain authorization for action 

When anaconda runs OK, we get the crash messages but not that message. That message is logged in the same second as the crash messages, so there may be some kind of race here: maybe the crash can happen either juuuuuust late enough that anaconda gets authorization first, or juuuuuust early enough that it doesn't, and that's the difference?
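Since the agent crash shows up on every run but the polkitd "FAILED to authenticate" message only on runs where anaconda does not start, a triage script could sort journal captures into those two cases. A minimal Python sketch, matching on message text copied from the journal excerpts in this bug; `classify_run` is a hypothetical helper, and plain substring matching assumes the message wording stays stable.

```python
from typing import Iterable

# Message fragments copied from the journal excerpts in this bug.
AGENT_SEGV = ("plasma-polkit-agent.service: Main process exited, "
              "code=killed, status=11/SEGV")
AUTH_FAILED = "FAILED to authenticate to gain authorization"


def classify_run(journal_lines: Iterable[str]) -> str:
    """Heuristically sort one journal capture into the observed cases."""
    crashed = False
    auth_failed = False
    for line in journal_lines:
        crashed = crashed or AGENT_SEGV in line
        auth_failed = auth_failed or AUTH_FAILED in line
    if auth_failed:
        # Only observed on runs where anaconda did not start.
        return "auth-failed"
    if crashed:
        # Agent crashed, but the request was authorized anyway.
        return "crash-only"
    return "clean"
```

Counting "crash-only" against "auth-failed" over many runs would show how often the crash wins the suspected race.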

Comment 9 Adam Williamson 2024-06-03 21:54:55 UTC
Created attachment 2036180 [details]
backtrace of the crashed polkit agent

I managed to get a coredump by attaching gdb to the polkit agent before running liveinst, and backtraced it in mock. Here is the trace.

Comment 10 Adam Williamson 2024-06-03 22:17:19 UTC
So, we're in polkit code when we crash; specifically, https://github.com/polkit-org/polkit/blob/7b3c9c85980f2f6a521aac97089c99647b4cf4ce/src/polkitagent/polkitagentsession.c#L381 . polkit is trying to kill a session's "helper" (not quite sure what that is), and calls g_source_destroy on `session->child_stdout_watch_source`. The address it points to in our trace is suspicious: 0xaaaaaaaaaaaaaaaa. Per https://opensuse-factory.opensuse.narkive.com/UOHWrreA/corrupted-pointer-0xaaaaaaaaaaaaaaaa it sounds like maybe `session->child_stdout_watch_source` got freed unexpectedly at some point?
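An address like 0xaaaaaaaaaaaaaaaa stands out because every byte is the same; repeated-byte values of this sort are commonly used as fill patterns for freed or uninitialized memory, so finding one where a pointer should be suggests the pointer was read from memory that had been overwritten. A small hypothetical Python helper for spotting such values in a backtrace; treating all-same-byte values other than 0x00 and 0xff as suspicious is a heuristic assumption, not a rule from polkit or GLib.

```python
def looks_like_fill_pattern(addr: int, width: int = 8) -> bool:
    """Return True if every byte of a pointer-sized value is the same
    non-trivial byte, e.g. 0xaaaaaaaaaaaaaaaa (heuristic only)."""
    if not 0 <= addr < 1 << (8 * width):
        return False  # does not fit in a pointer of this width
    raw = addr.to_bytes(width, "little")
    # 0x00 (NULL) and 0xff (-1) are also all-same-byte, but common for
    # unrelated reasons, so they are not flagged.
    return len(set(raw)) == 1 and raw[0] not in (0x00, 0xFF)
```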

Comment 11 Adam Williamson 2024-06-03 22:20:30 UTC
Or... maybe the *session* got freed?

Comment 12 Adam Williamson 2024-06-04 00:14:04 UTC
I tried a guess for this, on the theory of extending the fix Nate Graham recently did:

```
diff --git a/policykitlistener.cpp b/policykitlistener.cpp
index bcf1bd2..d5500fb 100644
--- a/policykitlistener.cpp
+++ b/policykitlistener.cpp
@@ -190,7 +190,9 @@ void PolicyKitListener::finishObtainPrivilege()
         m_dialog.data()->authenticationFailure();
 
         if (m_numTries < 3) {
-            m_session.data()->deleteLater();
+            if (!m_session.isNull()) {
+                m_session.data()->deleteLater();
+            }
 
             tryAgain();
             return;
```

...but that doesn't seem to help. Hoping someone else has an idea.

Comment 13 Adam Williamson 2024-06-04 15:23:57 UTC
In https://openqa.fedoraproject.org/tests/2664429 , this bug happened five times in a row!

Comment 14 Alessandro Astone 2024-06-15 20:13:52 UTC
I just hit this in KVM with a random polkit privilege escalation request on a non-live, non-empty-password user.

Comment 15 Aoife Moloney 2025-02-26 13:02:44 UTC
This bug appears to have been reported against 'rawhide' during the Fedora Linux 42 development cycle.
Changing version to 42.

