Bug 2280840
| Summary: | Live installer sometimes fails to run from KDE, after a polkit-kde-authentication-agent-1 crash, anaconda reports not running as root | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Adam Williamson <awilliam> |
| Component: | polkit-kde | Assignee: | marcdeop |
| Status: | ASSIGNED --- | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 42 | CC: | aleixpol, ales.astone, jreznik, kde-sig, marcdeop, nate, ngompa13, nicolas.fella, rdieter, than |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description
Adam Williamson
2024-05-16 15:42:18 UTC
aleasto points out that this is likely https://bugs.kde.org/show_bug.cgi?id=485407 - thanks! I've backported the fix for that, we'll see if that makes the failures go away.

Unfortunately I think this may still be happening :(

Yes, this is definitely still happening with polkit-kde-6.0.4-2.fc41 (which was the build where I backported the patch), and with polkit-kde-6.0.90-1.fc41 (the current build). Current log:

Jun 03 06:32:05 localhost-live systemd[1438]: Started app-liveinst - Install to Hard Drive - Install.
Jun 03 06:32:05 localhost-live liveinst[2722]: localuser:root being added to access control list
Jun 03 06:32:06 localhost-live polkit-agent-helper-1[2735]: pam_unix(polkit-1:auth): user [liveuser] has blank password; authenticated without it
Jun 03 06:32:06 localhost-live audit[2735]: USER_AUTH pid=2735 uid=1000 auid=1000 ses=2 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 msg='op=PAM:authentication grantors=pam_unix acct="liveuser" exe="/usr/lib/polkit-1/polkit-agent-helper-1" hostname=? addr=? terminal=? res=success'
Jun 03 06:32:06 localhost-live audit[2735]: USER_ACCT pid=2735 uid=1000 auid=1000 ses=2 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 msg='op=PAM:accounting grantors=pam_unix acct="liveuser" exe="/usr/lib/polkit-1/polkit-agent-helper-1" hostname=? addr=? terminal=? res=success'
Jun 03 06:32:06 localhost-live polkit-kde-authentication-agent-1[2492]: KCrash: appFilePath points to nullptr!
Jun 03 06:32:06 localhost-live polkit-kde-authentication-agent-1[2492]: KCrash: Application '<unknown>' crashing... crashRecursionCounter = 2
Jun 03 06:32:06 localhost-live audit[2492]: ANOM_ABEND auid=1000 uid=1000 gid=1000 ses=2 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 pid=2492 comm="polkit-kde-auth" exe="/usr/libexec/kf6/polkit-kde-authentication-agent-1" sig=11 res=1
Jun 03 06:32:06 localhost-live polkitd[1127]: Unregistered Authentication Agent for unix-session:1 (system bus name :1.75, object path /org/kde/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8) (disconnected from bus)
Jun 03 06:32:06 localhost-live polkitd[1127]: Operator of unix-session:1 FAILED to authenticate to gain authorization for action org.fedoraproject.pkexec.liveinst for unix-process:2720:24515 [/usr/bin/bash /usr/bin/liveinst] (owned by unix-user:liveuser)
Jun 03 06:32:06 localhost-live pkexec[2723]: liveuser: Error executing command as another user: Not authorized [USER=root] [TTY=unknown] [CWD=/home/liveuser] [COMMAND=/usr/bin/liveinst]
Jun 03 06:32:06 localhost-live liveinst[2723]: Error executing command as another user: Not authorized
Jun 03 06:32:06 localhost-live liveinst[2723]: This incident has been reported.
Jun 03 06:32:06 localhost-live systemd[1438]: plasma-polkit-agent.service: Main process exited, code=killed, status=11/SEGV
Jun 03 06:32:06 localhost-live systemd[1438]: plasma-polkit-agent.service: Failed with result 'signal'.

Weirdly, we don't seem to get the actual crash dump anywhere. It's not in coredumpctl list, it's not in abrt, it's not in KDE's crash viewer thing, it's just nowhere. The above is all we get, unfortunately.

So I've been experimenting with reproducing this manually, and found some interesting stuff... For me, the polkit agent crash seems to *always* happen; even when anaconda runs OK, I see that crash in the journal. But when anaconda does not run, I see this error:

Jun 03 16:40:18 localhost-live kded6[2782]: anaconda must be run as root.

That's a fatal error for anaconda (it does `sys.exit(1)` immediately after printing that message).
It happens when anaconda believes it is not running as root; the check is `if os.geteuid() != 0:`. So, this does seem like privilege/polkit stuff, somehow. Somehow, anaconda winds up not running as root when the problem happens.

I can reproduce this both by launching anaconda from the desktop icon and from the "Welcome Center" window that appears on boot of the live image. I haven't yet reproduced it by running a console and then running `liveinst` from the console, but will try that a few more times.

Oh, and: if I retry launching anaconda after the first try hits the bug, it often succeeds, the failure state isn't "locked in" somehow. I will probably use this as a workaround for openQA.

"appFilePath points to nullptr" comes from KCrash, it has nothing to do with the actual cause of the crash.

Aha, managed to reproduce from a console, it shows:

```
localuser:root being added to access control list
Error executing command as another user: Not authorized
This incident has been reported.
```

Then a bunch of other errors, but I think those are all things failing because it's not root and so doesn't have the expected power to do stuff, e.g. can't run setenforce, can't write to /sys/devices, and so on.

Also, found another journal message that only appears when anaconda does not run:

Jun 03 16:40:15 localhost-live polkitd[1213]: Operator of unix-session:1 FAILED to authenticate to gain authorization for action

When anaconda runs OK, we get the crash messages, but not that message. That message is logged in the same second as the crash messages, so there may be some kind of race here - maybe the crash can either happen juuuuuust late enough that anaconda gets authorization first, or juuuuuust early enough that it doesn't, and that's the difference?

Created attachment 2036180 [details]
backtrace of the crashed polkit agent
I managed to get a coredump by attaching gdb to the polkit agent before running liveinst, and backtraced it in mock, here is the trace.
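The journal signatures described in the comments above can be checked mechanically: the agent SEGV line shows up in every run, while the "FAILED to authenticate" and "anaconda must be run as root." lines only show up when the installer does not start. The following is a minimal sketch, assuming the journal is available as a list of text lines in the same format as the excerpts quoted in this report. The function names and result keys are hypothetical and not part of any existing tool.

```python
import re

# Substrings copied from the journal excerpts quoted in this bug report.
CRASH_MARKER = "plasma-polkit-agent.service: Main process exited, code=killed, status=11/SEGV"
AUTH_FAILED_MARKER = "FAILED to authenticate to gain authorization"
NOT_ROOT_MARKER = "anaconda must be run as root."

# Matches the short journal timestamp prefix, e.g. "Jun 03 06:32:06 ".
TIMESTAMP_RE = re.compile(r"^(\w{3} \d{2} \d{2}:\d{2}:\d{2}) ")


def timestamps_for(lines, marker):
    """Collect the timestamps (to the second) of lines containing marker."""
    found = set()
    for line in lines:
        if marker in line:
            match = TIMESTAMP_RE.match(line)
            if match:
                found.add(match.group(1))
    return found


def classify_run(lines):
    """Report which of the signatures from this bug appear in the journal lines."""
    crash_times = timestamps_for(lines, CRASH_MARKER)
    auth_failed_times = timestamps_for(lines, AUTH_FAILED_MARKER)
    return {
        "agent_crashed": any(CRASH_MARKER in line for line in lines),
        "auth_failed": any(AUTH_FAILED_MARKER in line for line in lines),
        "not_root": any(NOT_ROOT_MARKER in line for line in lines),
        # True when a crash line and an auth failure line share a timestamp,
        # which is the pattern that suggested a race.
        "crash_and_auth_failure_same_second": bool(crash_times & auth_failed_times),
    }
```

A run where only `agent_crashed` is true would correspond to the "anaconda runs OK" case described above. Any run with `auth_failed` or `not_root` set corresponds to the failure case.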
So, we're in polkit code when we crash; specifically, https://github.com/polkit-org/polkit/blob/7b3c9c85980f2f6a521aac97089c99647b4cf4ce/src/polkitagent/polkitagentsession.c#L381 . polkit is trying to kill a session's "helper" (not quite sure what that is), and calls g_source_destroy on `session->child_stdout_watch_source`. The address that points to in our trace is suspicious: 0xaaaaaaaaaaaaaaaa . Per https://opensuse-factory.opensuse.narkive.com/UOHWrreA/corrupted-pointer-0xaaaaaaaaaaaaaaaa it sounds like maybe `session->child_stdout_watch_source` got free'd unexpectedly at some point? or...maybe the *session* got freed? I tried a guess for this, on the theory of extending the fix Nate Graham recently did:
diff --git a/policykitlistener.cpp b/policykitlistener.cpp
index bcf1bd2..d5500fb 100644
--- a/policykitlistener.cpp
+++ b/policykitlistener.cpp
@@ -190,7 +190,9 @@ void PolicyKitListener::finishObtainPrivilege()
         m_dialog.data()->authenticationFailure();
         if (m_numTries < 3) {
-            m_session.data()->deleteLater();
+            if (!m_session.isNull()) {
+                m_session.data()->deleteLater();
+            }
             tryAgain();
             return;
...but that doesn't seem to help. Hoping someone else has an idea.
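The reason 0xaaaaaaaaaaaaaaaa looks suspicious in the backtrace is that it is a single byte (0xaa) repeated across the whole pointer, which is unlikely for a real heap address. A small helper for spotting such values when reading traces might look like the sketch below. The function name is hypothetical. What wrote the pattern is not established in this report, so the helper only flags the shape of the value and says nothing about its cause.

```python
def repeated_byte(value, width_bytes=8):
    """Return the repeated byte if value is one byte repeated width_bytes times.

    Returns None when the bytes of value are not all identical.
    Note that 0 (a NULL pointer) also matches, with repeated byte 0,
    so callers will usually want to treat NULL separately.
    """
    if not 0 <= value < 1 << (8 * width_bytes):
        raise ValueError("value does not fit in the given width")
    raw = value.to_bytes(width_bytes, "big")
    first = raw[0]
    return first if all(b == first for b in raw) else None
```

For the address seen in the trace, `repeated_byte(0xaaaaaaaaaaaaaaaa)` returns `0xaa`. An ordinary-looking pointer with mixed bytes returns `None`.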
In https://openqa.fedoraproject.org/tests/2664429 , this bug happened five times in a row!

I just hit this in KVM with a random polkit privilege escalation request on a non-live, non-empty-password user.

This bug appears to have been reported against 'rawhide' during the Fedora Linux 42 development cycle. Changing version to 42.
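The retry workaround mentioned earlier for openQA (relaunching the installer after a failed first attempt) could be sketched roughly as below. This is not the actual openQA test code. `launch` is a hypothetical callable that starts the installer and returns True if it came up, and the attempt count and delay are arbitrary assumptions. Since the bug was seen five times in a row in one test run, a small attempt count may still not be enough.

```python
import time


def launch_with_retry(launch, attempts=3, delay_seconds=2.0, sleep=time.sleep):
    """Call launch() until it returns True or the attempts run out.

    launch is a hypothetical callable that tries to start the installer
    and returns True on success. Returns the 1-based number of the
    attempt that succeeded, or None if every attempt failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        if launch():
            return attempt
        # Wait between attempts, but not after the final failure.
        if attempt < attempts:
            sleep(delay_seconds)
    return None
```

Passing `sleep` as a parameter keeps the helper easy to exercise without real delays. A caller that gets `None` back should treat the run as a genuine failure and not keep retrying forever.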