Description of problem:
Since about 2024-05-09, in openQA testing of Rawhide updates, the KDE live installer sometimes fails to run when launched by double-clicking the desktop icon.

Version-Release number of selected component (if applicable):
polkit-kde-6.0.4-1.fc41

How reproducible:
12 out of the last 100 attempts seem to have hit this.

Steps to Reproduce:
1. Boot a freshly-built KDE live image, wait till you see the icon to launch the installer
2. Double-click it

Actual results:
A polkit auth window very briefly appears then disappears, the installer does not launch

Expected results:
The installer should launch

Additional info:
In the journal, we see this:

May 16 08:25:47 localhost-live systemd[1370]: Started app-liveinst-b10389250096410faa5aeba07aa49133.scope - Install to Hard Drive - Install.
May 16 08:25:48 localhost-live plasmashell[2638]: localuser:root being added to access control list
May 16 08:25:48 localhost-live polkit-agent-helper-1[2653]: pam_unix(polkit-1:auth): user [liveuser] has blank password; authenticated without it
May 16 08:25:48 localhost-live polkit-kde-authentication-agent-1[2429]: KCrash: appFilePath points to nullptr!
May 16 08:25:48 localhost-live polkit-kde-authentication-agent-1[2429]: KCrash: Application '<unknown>' crashing... crashRecursionCounter = 2
May 16 08:25:48 localhost-live pkexec[2639]: liveuser: Error executing command as another user: Not authorized [USER=root] [TTY=unknown] [CWD=/home/liveuser] [COMMAND=/usr/bin/liveinst]
May 16 08:25:48 localhost-live plasmashell[2639]: Error executing command as another user: Not authorized
May 16 08:25:48 localhost-live plasmashell[2639]: This incident has been reported.
May 16 08:25:48 localhost-live systemd[1370]: plasma-polkit-agent.service: Main process exited, code=killed, status=11/SEGV
May 16 08:25:48 localhost-live systemd[1370]: plasma-polkit-agent.service: Failed with result 'signal'.
There's some subsequent stuff about anaconda's exit handler crashing and abrt not being there, but I think that's all ultimately a consequence of this failure. Unfortunately it doesn't seem like openQA found a coredump to upload. I might see if I can reproduce this manually and find one.
aleasto points out that this is likely https://bugs.kde.org/show_bug.cgi?id=485407 - thanks! I've backported the fix for that, we'll see if that makes the failures go away.
Unfortunately I think this may still be happening :(
Yes, this is definitely still happening with polkit-kde-6.0.4-2.fc41 (which was the build where I backported the patch), and with polkit-kde-6.0.90-1.fc41 (the current build). Current log:

Jun 03 06:32:05 localhost-live systemd[1438]: Started app-liveinst - Install to Hard Drive - Install.
Jun 03 06:32:05 localhost-live liveinst[2722]: localuser:root being added to access control list
Jun 03 06:32:06 localhost-live polkit-agent-helper-1[2735]: pam_unix(polkit-1:auth): user [liveuser] has blank password; authenticated without it
Jun 03 06:32:06 localhost-live audit[2735]: USER_AUTH pid=2735 uid=1000 auid=1000 ses=2 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 msg='op=PAM:authentication grantors=pam_unix acct="liveuser" exe="/usr/lib/polkit-1/polkit-agent-helper-1" hostname=? addr=? terminal=? res=success'
Jun 03 06:32:06 localhost-live audit[2735]: USER_ACCT pid=2735 uid=1000 auid=1000 ses=2 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 msg='op=PAM:accounting grantors=pam_unix acct="liveuser" exe="/usr/lib/polkit-1/polkit-agent-helper-1" hostname=? addr=? terminal=? res=success'
Jun 03 06:32:06 localhost-live polkit-kde-authentication-agent-1[2492]: KCrash: appFilePath points to nullptr!
Jun 03 06:32:06 localhost-live polkit-kde-authentication-agent-1[2492]: KCrash: Application '<unknown>' crashing... crashRecursionCounter = 2
Jun 03 06:32:06 localhost-live audit[2492]: ANOM_ABEND auid=1000 uid=1000 gid=1000 ses=2 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 pid=2492 comm="polkit-kde-auth" exe="/usr/libexec/kf6/polkit-kde-authentication-agent-1" sig=11 res=1
Jun 03 06:32:06 localhost-live polkitd[1127]: Unregistered Authentication Agent for unix-session:1 (system bus name :1.75, object path /org/kde/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8) (disconnected from bus)
Jun 03 06:32:06 localhost-live polkitd[1127]: Operator of unix-session:1 FAILED to authenticate to gain authorization for action org.fedoraproject.pkexec.liveinst for unix-process:2720:24515 [/usr/bin/bash /usr/bin/liveinst] (owned by unix-user:liveuser)
Jun 03 06:32:06 localhost-live pkexec[2723]: liveuser: Error executing command as another user: Not authorized [USER=root] [TTY=unknown] [CWD=/home/liveuser] [COMMAND=/usr/bin/liveinst]
Jun 03 06:32:06 localhost-live liveinst[2723]: Error executing command as another user: Not authorized
Jun 03 06:32:06 localhost-live liveinst[2723]: This incident has been reported.
Jun 03 06:32:06 localhost-live systemd[1438]: plasma-polkit-agent.service: Main process exited, code=killed, status=11/SEGV
Jun 03 06:32:06 localhost-live systemd[1438]: plasma-polkit-agent.service: Failed with result 'signal'.
Weirdly, we don't seem to get the actual crash dump anywhere. It's not in coredumpctl list, it's not in abrt, it's not in KDE's crash viewer thing, it's just nowhere. The above is all we get, unfortunately.
So I've been experimenting with reproducing this manually, and found some interesting stuff...

For me, the polkit agent crash seems to *always* happen; even when anaconda runs OK, I see that crash in the journal. But when anaconda does not run, I see this error:

Jun 03 16:40:18 localhost-live kded6[2782]: anaconda must be run as root.

That's a fatal error for anaconda (it does `sys.exit(1)` immediately after printing that message). It happens when anaconda believes it is not running as root; the check is `if os.geteuid() != 0:`. So this does seem like privilege/polkit stuff: somehow, anaconda winds up not running as root when the problem happens.

I can reproduce this both by launching anaconda from the desktop icon and from the "Welcome Center" window that appears on boot of the live image. I haven't yet reproduced it by opening a console and running `liveinst` from there, but will try that a few more times.
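For anyone following along, the root check described above can be sketched roughly like this. It is a minimal illustration, not anaconda's actual source: only the `os.geteuid() != 0` condition, the message text, and the `sys.exit(1)` come from the comment above; the function names and the injectable `geteuid` parameter are assumptions made so the check can be exercised without being root.

```python
import os
import sys


def is_root(geteuid=os.geteuid):
    """Return True when the effective UID is 0 (root).

    The `geteuid` parameter is hypothetical; it exists only so the check
    can be tested with a fake UID.
    """
    return geteuid() == 0


def require_root(geteuid=os.geteuid):
    """Print the message seen in the journal and exit with status 1 if not root."""
    if not is_root(geteuid):
        print("anaconda must be run as root.")
        sys.exit(1)
```

If pkexec never grants authorization, the process keeps the invoking user's effective UID (1000 for liveuser in the audit lines), so a check of this shape fails right away.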
Oh, and: if I retry launching anaconda after the first try hits the bug, it often succeeds, the failure state isn't "locked in" somehow. I will probably use this as a workaround for openQA.
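A retry workaround along those lines might look like the sketch below. This is only an illustration of the idea, not the openQA test code (openQA tests are not written in Python); `launch_with_retry`, its parameters, and the `launch` callable are all hypothetical names.

```python
import time


def launch_with_retry(launch, attempts=3, delay=2.0, sleep=time.sleep):
    """Call launch() until it returns True or the attempts run out.

    `launch` is a hypothetical callable that tries to start the installer
    and reports whether it actually came up. Returns the 1-based attempt
    number that succeeded, or None if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        if launch():
            return attempt
        if attempt < attempts:
            # Give the restarted polkit agent a moment before retrying.
            sleep(delay)
    return None
```

Returning the attempt number rather than a plain boolean keeps the failure rate visible, so the workaround does not hide how often the bug is still being hit.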
"appFilePath points to nullptr" comes from KCrash, it has nothing to do with the actual cause of the crash
Aha, managed to reproduce from a console. It shows:

```
localuser:root being added to access control list
Error executing command as another user: Not authorized
This incident has been reported.
```

Then a bunch of other errors, but I think those are all things failing because it's not root and so doesn't have the expected power to do stuff, e.g. can't run setenforce, can't write to /sys/devices, and so on.

Also, found another journal message that only appears when anaconda does not run:

Jun 03 16:40:15 localhost-live polkitd[1213]: Operator of unix-session:1 FAILED to authenticate to gain authorization for action

When anaconda runs OK, we get the crash messages, but not that message. That message is logged in the same second as the crash messages, so there may be some kind of race here - maybe the crash can either happen juuuuuust late enough that anaconda gets authorization first, or juuuuuust early enough that it doesn't, and that's the difference?
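To sort saved journal excerpts into the two cases described above, something like this sketch could be used. The two marker strings are copied from the log lines in this report; the function name and the outcome labels are made up here, and the labels encode the race hypothesis rather than anything confirmed.

```python
# Marker substrings taken from the journal lines quoted in this report.
CRASH_MARKER = "plasma-polkit-agent.service: Main process exited, code=killed, status=11/SEGV"
AUTH_FAIL_MARKER = "FAILED to authenticate to gain authorization"


def classify_journal(journal_text):
    """Classify a journal excerpt by which of the two markers it contains.

    The returned labels are hypothetical names for the suspected outcomes.
    """
    crashed = CRASH_MARKER in journal_text
    auth_failed = AUTH_FAIL_MARKER in journal_text
    if crashed and auth_failed:
        return "agent crashed, authorization lost"
    if crashed:
        return "agent crashed, authorization granted first"
    if auth_failed:
        return "authorization failed without agent crash"
    return "no crash"
```

Running this over many openQA journals would show whether the "FAILED to authenticate" line really only ever appears together with the crash, which is what the race theory predicts.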
Created attachment 2036180: backtrace of the crashed polkit agent

I managed to get a coredump by attaching gdb to the polkit agent before running liveinst, and backtraced it in mock, here is the trace.
So, we're in polkit code when we crash; specifically, https://github.com/polkit-org/polkit/blob/7b3c9c85980f2f6a521aac97089c99647b4cf4ce/src/polkitagent/polkitagentsession.c#L381 . polkit is trying to kill a session's "helper" (not quite sure what that is), and calls g_source_destroy on `session->child_stdout_watch_source`. The address that points to in our trace is suspicious: 0xaaaaaaaaaaaaaaaa . Per https://opensuse-factory.opensuse.narkive.com/UOHWrreA/corrupted-pointer-0xaaaaaaaaaaaaaaaa it sounds like maybe `session->child_stdout_watch_source` got free'd unexpectedly at some point?
or...maybe the *session* got freed?
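What makes 0xaaaaaaaaaaaaaaaa look suspicious is that every byte of it is the same value, which is what memory filled with a single byte looks like when read back as a pointer. A small helper for spotting that in backtrace addresses might look like this; it is only a sketch with a made-up function name, and it says nothing about what wrote the fill byte.

```python
def repeated_fill_byte(pointer, width=8):
    """Return the byte value if all `width` bytes of `pointer` are identical, else None.

    Note that a NULL pointer also matches (fill byte 0), so callers should
    treat a result of 0 separately.
    """
    raw = pointer.to_bytes(width, "little")
    return raw[0] if len(set(raw)) == 1 else None


suspicious = repeated_fill_byte(0xAAAAAAAAAAAAAAAA)
if suspicious is not None:
    print(f"pointer is a repeated fill byte: 0x{suspicious:02x}")
```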
I tried a guess for this, on the theory of extending the fix Nate Graham recently did:

diff --git a/policykitlistener.cpp b/policykitlistener.cpp
index bcf1bd2..d5500fb 100644
--- a/policykitlistener.cpp
+++ b/policykitlistener.cpp
@@ -190,7 +190,9 @@ void PolicyKitListener::finishObtainPrivilege()
         m_dialog.data()->authenticationFailure();
         if (m_numTries < 3) {
-            m_session.data()->deleteLater();
+            if (!m_session.isNull()) {
+                m_session.data()->deleteLater();
+            }
             tryAgain();
             return;

...but that doesn't seem to help. Hoping someone else has an idea.
In https://openqa.fedoraproject.org/tests/2664429 , this bug happened five times in a row!
I just hit this in KVM with a random polkit privilege escalation request on a non-live, non-empty-password user.
This bug appears to have been reported against 'rawhide' during the Fedora Linux 42 development cycle. Changing version to 42.