Hide Forgot
Description of problem: On F33 KDE, I can't switch repeatedly between several users. Once I try to log in with a user which has been logged in before (and now is logged out), the screen shows a very long spinning circle and then a black screen. No action is possible at that point. If I manually switch to VT2 and reboot, the reboot is waiting 90 seconds for pending user sessions. There is something stuck when starting those user sessions. See attached video. Version-Release number of selected component (if applicable): sddm-0.18.1-6.fc33.x86_64 kf5-plasma-5.72.0-1.fc33.x86_64 plasma-user-manager-5.19.3-1.fc33.x86_64 Fedora-KDE-Live-x86_64-Rawhide-20200721.n.0.iso How reproducible: seems always Steps to Reproduce: 1. install KDE from Fedora-KDE-Live-x86_64-Rawhide-20200721.n.0.iso 2. create two users - user1 and user2. (I marked user1 as admin and not user2, but that shouldn't be important). 3. reboot 4. log in as user1 5. log out 6. log in as user2 7. log out 8. log in as user1 - see a very log spinning circle and then an endless black screen (with a functioning cursor) Additional info: I tried the same procedure without user2, i.e. just log in user1, log out, log in user1 - and that works, even repeatedly. This is somehow related to the session that user2 starts and stops between those steps. Also please note that I see "Process 1518 (kglobalaccel5) of user 1001 dumped core." and "Process 2215 (kglobalaccel5) of user 1002 dumped core" in the journal. I'm not sure if it is related to the login issue or not. I'll try to report the crash in a separate bug, but abrt currently claims "Error: File './coredump' is not a coredump", so I can't easily report it.
Created attachment 1702779 [details] bug demonstration video
Created attachment 1702780 [details] journal
Created attachment 1702781 [details] rpm -qa
(In reply to Kamil Páral from comment #0) > Also please note that I see "Process 1518 (kglobalaccel5) of user 1001 > dumped core." and "Process 2215 (kglobalaccel5) of user 1002 dumped core" in > the journal. I'm not sure if it is related to the login issue or not. I'll > try to report the crash in a separate bug, but abrt currently claims "Error: > File './coredump' is not a coredump", so I can't easily report it. That is bug 1860616, it seems abrt doesn't support zstd-compressed coredumps currently.
Proposing as an F33 blocker. The most fitting criterion might be this: "A system installed with a release-blocking desktop must boot to a log in screen where it is possible to log in to a working desktop using a user account created during installation or a 'first boot' utility." https://fedoraproject.org/wiki/Basic_Release_Criteria#Expected_installed_system_boot_behavior The login process works initially, but stops working after following the sequence of actions described in comment 0. If people feel this criterion is not a good candidate, then it's time to refine it or create a new one. We have precise criteria about everything else (log out, user switching, reboot, shutdown, etc), but we don't have a criterion which would say "login must work (every time)". We can add it, if needed.
Looks like another variant of (largely) well-known bug #1817708 , where processes from previous sessions persist causing subsequent logins to fail. -1 blocker
(In reply to Rex Dieter from comment #6) > Looks like another variant of (largely) well-known bug #1817708 , where > processes from previous sessions persist causing subsequent logins to fail. > > -1 blocker I intend to propose all user switching-related bugs as blockers, now that we have a specific criterion for it in F33. This bug might or might not be related, but I'm not sure if it's relevant for the release blocking decision. Bug 1817708 was rejected as a blocker, because we didn't have a user switching criterion at that time. Now we have. If user switching is definitely covered by criteria now, it seems silly not to hold basic user login to at least the same standards. As I said, we can clarify the criteria as needed, if stakeholders agree. What is your rationale for -1 blocker? Thanks.
(In reply to Kamil Páral from comment #7) > I intend to propose all user switching-related bugs as blockers, now that we > have a specific criterion for it in F33. Actually, I need to clarify this. I just found out that user switching in no longer available in F33 KDE (not sure whether it is a bug or intentional). As long as it is not offered to the user, I can't of course propose them as blockers. So I'll keep to just user login/logout testing for now. I'll individually report the user switching issues and propose them as blockers, once/if user switching is offered again in KDE.
My objection was mostly about how to deal with the blocker. I feel not enough is known about the root cause(s), much less how to diagnose and fix them. In this specific case in particular, so a user has processes lingering after log out. What's to blame? Who's job is it to fix that? *This* bug was filed against sddm, a component I have large doubts has little to do with the root problem.
Thanks for reply, Rex. I chose sddm because it's at least partially related, I don't know the root cause and the actual culprit either. I hope that's something that developers can figure out! :-) I'll be happy to provide as many details as I can, and test different approaches to help pinpoint the root cause better. Sddm is just a temporary station here, if this turns out to be systemd-related (or we suspect it could), we can reassign there. We can also move this to "distribution", if you prefer. If this turns out to be a hard nut to crack, we have options how to deal with it, e.g. with "Difficult to fix blocker bugs": https://fedoraproject.org/wiki/QA:SOP_blocker_bug_process#Exceptional_cases Or there can be other ways to handle this, but basically the unknown factor (what caused it? how to fix it?) shouldn't affect the blocker decision.
Discussed during the 2020-08-03 blocker review meeting: [0] The decision to delay the classification of this as a blocker bug was made as we acknowledge that this is a bad bug and perhaps intellectually violates the criteria, however we don’t believe we have a specific criterion to use against this bug at the moment. We will punt for now and send out mail to discuss implementing a new criterion or redefining an existing one. [0] https://meetbot.fedoraproject.org/fedora-blocker-review/2020-08-03/f33-blocker-review.2020-08-03-16.02.txt
I've spent more time debugging this. I can reproduce this problem every time, but sometimes the problem occurs on the third login (user1 -> user2 -> user1), and sometimes on the fourth login (user1 -> user2 -> user1 -> user2). So there seems to be some timing or a race condition in play. I found that I can also reproduce this 'black screen instead of a login' problem with just one user, if I log in and out a sufficient number of times (sometimes 3x, sometimes 10x). It might be the same or a different bug, but it has the same consequence and it's also clearly a race condition. Then I experimented with KillUserProcesses=Yes in /etc/systemd/logind.conf. After setting this, I can log in three times while alternating users just fine (user1 -> user2 -> user1), but I get a black screen when logging *out* the fourth time (i.e. user1 -> user2 -> user1 -> user2 -> black screen). This is mostly consistent, but I also saw it during the second logout (user1 -> user2 -> black screen). Again, a race condition. On top of that, in this configuration, I also saw a login black screen very rarely when logging in just with user1 repeatedly. So another (same or different) race condition. Sigh, this is really broken. Whichever systemd configuration and whichever number of users, the login/logout process gets broken if you relogin a sufficient number of times to get a bad dice roll. Today I tested with: kernel-5.8.0-1.fc33.x86_64 kf5-plasma-5.72.0-2.fc33.x86_64 plasma-user-manager-5.19.4-2.fc33.x86_64 sddm-0.18.1-7.fc33.x86_64 systemd-246-1.fc33.x86_64 xorg-x11-server-Xorg-1.20.8-3.fc33.x86_64
Discussed during the 2020-08-10 blocker review meeting: [0] The decision to classify this bug as an "AcceptedBlocker" was made as a violation of the in-progress but accepted-in-principle logout criterion, see thread https://lists.fedoraproject.org/archives/list/test@lists.fedoraproject.org/message/PTRPHWTMVEKLTCMVDBGIWYYMYVIBAJYB/ for more information. [0] https://meetbot.fedoraproject.org/fedora-blocker-review/2020-08-10/f33-blocker-review.2020-08-10-16.17.txt
This bug appears to have been reported against 'rawhide' during the Fedora 33 development cycle. Changing version to 33.
Created attachment 1711349 [details] stuck processes during reboot If I perform the reproducer from comment 0, then switch to TTY3 using Ctrl+Alt+F3 and try to reboot the machine, the shutdown phase waits for "stopping user session <number>" (something along those lines), which lasts for 90 seconds. Then the session is killed and the reboot proceeds. If I get impatient, though, and hit Ctrl+Alt+Del 7 times in a quick succession, I believe it should convince systemd to kill the remaining processes. Instead, it prints a list of processes it's waiting for (and doesn't kill, for some reason). See the screenshot. One of those processes is likely the one hanging and the reason why even a regular reboot doesn't work until the session termination times out.
Rex, Martin: anything from your side in response to kparal's new info? Is there sufficient info to debug this now? Thanks!
(In reply to Geoffrey Marr from comment #13) > The decision to classify this bug as an "AcceptedBlocker" was made as a > violation of the in-progress but accepted-in-principle logout criterion, see > thread > https://lists.fedoraproject.org/archives/list/test@lists.fedoraproject.org/ > message/PTRPHWTMVEKLTCMVDBGIWYYMYVIBAJYB/ for more information. The new criterion is now live: https://fedoraproject.org/wiki/Fedora_33_Beta_Release_Criteria#Shutdown.2C_reboot.2C_login.2C_logout
From rdieter in #fedora-kde: no time, nor aptitude to really debug that properly. I think we'll have to do with the logind KillUserProcesses=yes hammer Martin, do you have anything you can add?
Could the reporter or someone with reproducibility share the process list while logout is delaying? Maybe it's quite obvious with knowlegde of active processes to see ande guess what to blame, I tend to agree with Rex about SDDM as propably _not_ the root cause in first place. What about systemd and its user session handling (wild guess because not aware about what's really running in a reproducible F33 system)?
Hello, I did some digging about this issue some time ago. The problem is completely unrelated to switching users. The problem is that whenever you log out, some processes might linger and then you are not able to log back in. loginctl list-sessions loginctl session-status $SESSION_NUMBER Will help you find out which processes are lingering. If you kill manually those processes, the session will terminate properly and everything will get back to normal :-) I have the impression that the problem here is the fact that fedora uses dbus-broker. Other distros which don't use dbus-broker do not have this issue we are facing here.
I have this list from when I was facing a similar issue a week ago. The cause was different (I had a bad shell script that Plasma got hung up on), but the behavior on subsequent login is the same. I had seen similar behavior a while ago and the process list below matches what I recall seeing: bcotton 13986 0.0 0.0 21412 11392 ? Ss 10:03 0:00 /usr/lib/systemd/systemd --user bcotton 13988 0.0 0.0 211932 6224 ? S 10:03 0:00 (sd-pam) bcotton 13997 0.0 0.0 667240 15268 ? S<sl 10:03 0:00 /usr/bin/pulseaudio --daemonize=no bcotton 14007 0.0 0.0 277560 3820 ? Ss 10:03 0:00 /usr/bin/dbus-broker-launch --scope user bcotton 14008 0.3 0.0 7056 4588 ? S 10:03 0:00 dbus-broker --log 4 --controller 11 --machine-id 7a5e67f4bf324273952fb570d0122c43 --max-bytes 100000000000000 --max-fds 25000000000000 --max-matches 5000000000 bcotton 14052 0.0 0.0 17228 5564 ? S 10:03 0:00 /usr/libexec/pulse/gconf-helper bcotton 14053 0.0 0.0 233836 6268 ? Ss 10:03 0:00 /usr/libexec/gconfd-2 bcotton 14095 0.0 0.0 156108 4972 ? Ssl 10:03 0:00 /usr/libexec/dconf-service bcotton 14135 0.0 0.1 268871540 23932 ? SNl 10:03 0:00 /usr/bin/baloo_file bcotton 14170 0.0 0.0 444380 4832 ? Sl 10:03 0:00 /usr/libexec/geoclue-2.0/demos/agent bcotton 14190 0.0 0.0 305884 5260 ? Sl 10:03 0:00 /home/bcotton/.config/bluejeans-v2/BluejeansHelper --hidden bcotton 14198 0.0 0.1 485928 21420 ? Sl 10:03 0:00 /usr/bin/abrt-applet --gapplication-service bcotton 14241 0.0 0.0 305336 6548 ? Ssl 10:03 0:00 /usr/libexec/at-spi-bus-launcher bcotton 14250 0.0 0.0 268468 3508 ? S 10:03 0:00 /usr/bin/dbus-broker-launch --config-file=/usr/share/defaults/at-spi2/accessibility.conf --scope user bcotton 14252 0.0 0.0 5244 2868 ? S 10:03 0:00 dbus-broker --log 4 --controller 9 --machine-id 7a5e67f4bf324273952fb570d0122c43 --max-bytes 100000000000000 --max-fds 6400000 --max-matches 5000000000 bcotton 14305 0.0 0.0 446888 7116 ? Ssl 10:03 0:00 /usr/libexec/gvfsd
Indeed, besides systemd (loginctl) my experiences tell me to look for debugging time delays into another good suspiction that's anything DBUS related.
dbus-broker should write its own log or some entries to journalctl. Can you share that snippets for relevance as well?
> Manager responder socket. Jul 29 11:46:18 localhost.localdomain dbus-broker[699]: Dispatched 9509 messages @ 14(±21)μs / message. > Jul 29 11:46:18 localhost.localdomain systemd[1]: Stopping D-Bus System Message Wow, that's a lot about 9509. ±21 indicates a high fluctuation time of message processing. We may need to know what messages are those.
(In reply to Ben Cotton from comment #18) > From rdieter in #fedora-kde: no time, nor aptitude to really debug that > properly. I think we'll have to do with the logind KillUserProcesses=yes > hammer Please note that I already tested KillUserProcesses=Yes and the results were slightly different, but ultimately the same. See comment 12.
Created attachment 1714449 [details] Systemctl logs with comments I've been doing some testing in my laptop with KDE & SDDM. Interestingly enough, I didn't have any issues logging out and back in when using Wayland. The attached file shows Systemctl with some notes for context. Hope it's useful to somebody. I would like to point out there are two other bug reports that are related to this one, both have been sitting for several release cycles and show the same issue: SDDM gets stuck, fails or hangs in log out.
Related bugs: F32 KDE can't log out https://bugzilla.redhat.com/show_bug.cgi?id=1725933 Relogin doesn't work https://bugzilla.redhat.com/show_bug.cgi?id=1742893 Can't logout from KDE https://bugzilla.redhat.com/show_bug.cgi?id=1847746
Reassigning to dbus-broker. The consensus in #fedora-kde is that this isn't directly involved in this. The "gut feeling" is that it's related to dbus-broker, but that component's maintainers may have helpful insight either way.
(In reply to Raphael Groner from comment #24) > > Manager responder socket. > Jul 29 11:46:18 localhost.localdomain dbus-broker[699]: Dispatched 9509 > messages > @ 14(±21)μs / message. > > Jul 29 11:46:18 localhost.localdomain systemd[1]: Stopping D-Bus System Message > > Wow, that's a lot about 9509. ±21 indicates a high fluctuation time of > message processing. We may need to know what messages are those. Both of these look normal. Message transmission is usually quick enough that a single task preemption or flushed caches cause the transmission time to take twice as long. > Reassigning to dbus-broker. The consensus in #fedora-kde is that this isn't directly involved in this. The "gut feeling" is that it's related to dbus-broker, but that component's maintainers may have helpful insight either way. If you want to know whether this is an issue with the integration of dbus-broker, you can easily verify with: ``` dnf install -y dbus-daemon systemctl disable dbus-broker systemctl enable dbus-daemon systemctl --global disable dbus-broker systemctl --global enable dbus-daemon ``` This will make your system use dbus-daemon, rather than dbus-broker. Note that after those commands, you might be unable to reboot via normal means, since the dbus-units will point to the wrong service. `reboot -ff` should do the trick.
(In reply to David Rheinsberg from comment #29) > If you want to know whether this is an issue with the integration of > dbus-broker, you can easily verify with: > > ``` > dnf install -y dbus-daemon > systemctl disable dbus-broker > systemctl enable dbus-daemon > systemctl --global disable dbus-broker > systemctl --global enable dbus-daemon > ``` Thanks for the tip. I have tried this and, unfortunately, I still see the same pattern Login, wait 10 secs, logout: my session doesn't terminate properly and I have baloo_file, geoclue-2.0 and abrt-applet running. Notice that I have also tried with GDM instead of SDDM and see the same. Back to square one...
In bug #1817708 it is notable that logind does not show any message that the the session has completely died (https://bugzilla.redhat.com/attachment.cgi?id=1673873), i.e. Mar 26 20:05:34 localhost.localdomain systemd-logind[789]: Session 2 logged out. Waiting for processes to exit. Mar 26 20:05:34 localhost.localdomain systemd-logind[789]: New session c2 of user sddm. Mar 26 20:05:44 localhost.localdomain systemd-logind[789]: New session 5 of user jim. Mar 26 20:05:44 localhost.localdomain systemd-logind[789]: Session c2 logged out. Waiting for processes to exit. Mar 26 20:05:44 localhost.localdomain systemd-logind[789]: Removed session c2. i.e. there is no "Removed session 2" in that log. Which suggests that some process is still lingering in the session scope. Has anyone checked using systemd-cgls to see if the old session quits properly or whether there are processes left? I am kind of guessing that a process that does not need X11/wayland may be surviving for longer. And this *might* keep the session open, also preventing the systemd user instance (and with it dbus) to shut down 10s after logout. All speculation though :) GNOME has this weird thing were we restart dbus at logout time. That is more of a hack, but it does get rid of dbus services surviving the session (we only do that to kill evolution-data-server which may take up a lot of memory). As for switching to dbus-daemon, you would probably want to do the same for the user manager, i.e. you probably need to repeat the commands with --user added to the systemctl calls.
(In reply to Benjamin Berg from comment #31) > As for switching to dbus-daemon, you would probably want to do the same for > the user manager, i.e. you probably need to repeat the commands with --user > added to the systemctl calls. No need. The `--global` switch does exactly that but for all users (a bit of a misnomer, but we are stuck with it).
Oh! Thanks for correcting me. Didn't realise that there is a difference between --user and --global, which makes a lot of sense, even if the naming might not be that intuitive.
FEDORA-2020-5b486e191b has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2020-5b486e191b
Submitted kf5-baloo update, https://bodhi.fedoraproject.org/updates/FEDORA-2020-5b486e191b This one should stop at least one of the errant processes (baloo_file) persisting after session logout.
I have been having this problem forever too, very annoying. I think kf5-baloo helped. When I select logout from KDE plasma menu nothing happens. However, now if I restart the X-server via ctrl-alt-backspace I can login again via sddm. In the past that would hang requiring a restart of the computer. After issuing the logout I switched the console, logged in as root and look at processes running under my username. There are quiet a few. I had no apps running. PID TTY TIME CMD 1572 ? 00:00:00 systemd 1582 ? 00:00:00 (sd-pam) 1594 ? 00:00:00 pulseaudio 1875 ? 00:00:00 dbus-broker-lau 1876 ? 00:00:00 dbus-broker 1877 ? 00:00:00 gconf-helper 1878 ? 00:00:00 gconfd-2 2140 ? 00:00:00 imsettings-daem 2143 ? 00:00:00 gvfsd 2148 ? 00:00:00 gvfsd-fuse 2349 ? 00:00:00 kded5 2355 ? 00:00:00 kwin_x11 2356 ? 00:00:00 kactivitymanage 2365 ? 00:00:00 kglobalaccel5 2373 ? 00:00:00 dconf-service 2429 ? 00:00:00 xembedsniproxy 2431 ? 00:00:00 kaccess 2448 ? 00:00:00 gmenudbusmenupr 2530 ? 00:00:00 at-spi-bus-laun 2538 ? 00:00:00 dbus-broker-lau 2550 ? 00:00:00 dbus-broker 2552 ? 00:00:00 kscreen_backend 2842 ? 00:00:00 agent 2850 ? 00:00:00 zeitgeist-datah 2855 ? 00:00:00 org_kde_powerde 2868 ? 00:00:00 tracker-miner-r 2874 ? 00:00:00 zeitgeist-daemo 2876 ? 00:00:04 tracker-miner-f 2920 ? 00:00:00 zeitgeist-fts 2951 ? 00:00:00 gvfs-udisks2-vo 3004 ? 00:00:00 gvfs-afc-volume 3346 ? 00:00:00 at-spi2-registr 3459 ? 00:00:00 kio_http_cache_ 3692 ? 00:00:00 kdeinit5 3693 ? 00:00:00 klauncher 3696 ? 00:00:00 file.so
Now that we have dbus maintainer ears, in the past, xinit ran the snippet on login: eval `dbus-launch --sh-syntax --exit-with-session` and I figure when the "parent" xinit process ended, so did dbus. Now that dbus user session (effectively) starts on demand, how is dbus expected to end/stop on session logout?
FEDORA-2020-5b486e191b has been pushed to the Fedora 33 testing repository. In short time you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-5b486e191b` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-5b486e191b See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
> Now that dbus user session (effectively) starts on demand, how is dbus expected to end/stop on session logout? Not a dbus maintainer. dbus will be stopped together with the systemd user instance. Which generally means 10s after the last user session quits (UserStopDelaySec= in logind.conf). So it should be expected that dbus does not quit between two logins. The real problem are probably processes left over in the session scope. This is why I suggested running $ systemd-cgls and checking session-X.scope (of the old session). Those are the processes that prevent logind from cleaning up the session completely. It is much more likely they are causing issues than anything that is launched by DBus (visible somewhere underneath user). I think most of the processes listed in the above ps output fall into that category.
Fedora KDE fresh installation on a VM, login as a user, wait a few seconds, logout, wait a few seconds and trigger the `systemd-cgls` and `loginctl session-status` commands Output of `systemd-cgls -u session-2.scope`: Unit session-2.scope (/user.slice/user-1000.slice/session-2.scope): ├─1211 /usr/bin/baloo_file ├─1241 /usr/bin/abrt-applet --gapplication-service └─1243 /usr/libexec/geoclue-2.0/demos/agent Equivalent to `loginctl session-status 2`: 2 - marc.deop (1000) Since: Tue 2020-09-15 22:47:03 CEST; 6min ago Leader: 1047 Seat: seat0; vc1 Display: :0 Service: sddm; type x11; class user Desktop: KDE State: closing Unit: session-2.scope ├─1211 /usr/bin/baloo_file ├─1241 /usr/bin/abrt-applet --gapplication-service └─1243 /usr/libexec/geoclue-2.0/demos/agent Sep 15 22:50:06 localhost.localdomain kded5[1137]: QDBusAbstractAdaptor: Cannot relay signal KDEDModule::moduleDeleted(KDEDModule*): Pointers are not supported: KDEDModule* Sep 15 22:50:06 localhost.localdomain kded5[1137]: QDBusAbstractAdaptor: Cannot relay signal KDEDModule::moduleDeleted(KDEDModule*): Pointers are not supported: KDEDModule* Sep 15 22:50:06 localhost.localdomain kded5[1137]: QDBusAbstractAdaptor: Cannot relay signal KDEDModule::moduleDeleted(KDEDModule*): Pointers are not supported: KDEDModule* Sep 15 22:50:06 localhost.localdomain kded5[1137]: QDBusAbstractAdaptor: Cannot relay signal KDEDModule::moduleDeleted(KDEDModule*): Pointers are not supported: KDEDModule* Sep 15 22:50:06 localhost.localdomain kded5[1137]: QDBusAbstractAdaptor: Cannot relay signal KDEDModule::moduleDeleted(KDEDModule*): Pointers are not supported: KDEDModule* Sep 15 22:50:06 localhost.localdomain kded5[1137]: QDBusAbstractAdaptor: Cannot relay signal KDEDModule::moduleDeleted(KDEDModule*): Pointers are not supported: KDEDModule* Sep 15 22:50:06 localhost.localdomain startplasma-x11[1072]: QProcess: Destroyed while process ("ksplashqml") is still running. Sep 15 22:50:06 localhost.localdomain sddm-helper[1047]: pam_unix(sddm:session): session closed for user marc.deop Sep 15 22:50:06 localhost.localdomain sddm-helper[1047]: pam_kwallet5(sddm:session): pam_kwallet5: pam_sm_close_session Sep 15 22:50:06 localhost.localdomain sddm-helper[1047]: pam_kwallet5(sddm:setcred): pam_kwallet5: pam_sm_setcred
Kamil, can you see if the baloo update helps your case at all?
Did not help me. Logout ends with a black screen with active cursor (waiting for somehing). After ctrl-alt-backspace can login again.
My first try (kf5-baloo-5.73.0-2) was full of fail, please try kf5-baloo-5.73.0-3.fc33
In my case, the change in dbus didn't help. Using Ctrl+Alt+Backspace doesn't help either, it logs out and then it's stuck in a black screen with active cursor and nothing else. IT's unresponsive and the only way to get out is by pressing the power button (force shutdown). How can I try the update on Baloo? Maybe that works.
Created attachment 1715130 [details] Output of systemd-cgls
I set up a VM with Fedora 33 and added the kf5-baloo -3 update. After 4 or 5 logout/login sequences, I wound up with a black screen. I attached the output of systemd-cgls as attachment 1715130 [details].
Created attachment 1715131 [details] New version with -l flag Noticed that the first version of the attachment had its width truncated, so here is one with -l to show the full text.
(In reply to Steven A. Falco from comment #48) > I set up a VM with Fedora 33 and added the kf5-baloo -3 update. After 4 or > 5 logout/login sequences, I wound up with a black screen. I attached the > output of systemd-cgls as attachment 1715130 [details]. I got the same and this is still reproducible
Created attachment 1715462 [details] Output for systemd-cgls
To avoid confusion. As I understand it, this bug is about a subsequent *login* failing. If *logout* is failing sometimes, then it might be a different issue altogether.
Benjamin - I agree with and understand your description of the bug. As I reported, after 4 or 5 repetitions of logging out, _then_ logging in, I wind up at a black screen. In other words, that last login attempt does not complete properly, and I don't see the desktop - I just see a black screen.
In my case it never works, it gets stuck in the first attempt of logging out.
Assigning back to sddm as testing seemed to indicate quite strongly that dbus-broker isn't the issue here.
FEDORA-2020-5b486e191b has been pushed to the Fedora 33 stable repository. If problem still persists, please make note of it in this bug report.
(In reply to Adam Williamson from comment #55) > Assigning back to sddm as testing seemed to indicate quite strongly that > dbus-broker isn't the issue here. I don't think it's an SDDM problem as I see the same behaveour with GDM :'(
(In reply to marcdeop from comment #57) > I don't think it's an SDDM problem as I see the same behaveour with GDM :'( I tested GDM on F33 and saw nothing like that happening. If you can reproduce it on F33 (not F32), please file a separate bug report and link it here, I'll look at it. So far, I believe this a problem somewhere in the KDE stack and there's no such universal and trivially reproducible issue happening with GNOME.
Setting back to ASSIGNED due to comment 48 and comment 50.
The user sessions are not being killed off, it must surely be related... $ loginctl session-status 10 10 - user2 (1001) Since: Wed 2020-09-23 20:52:50 MDT; 3min 33s ago Leader: 2680 Seat: seat0; vc1 Display: :0 Service: sddm; type x11; class user Desktop: KDE State: closing Unit: session-10.scope ├─2909 /usr/libexec/geoclue-2.0/demos/agent └─2922 /usr/bin/abrt-applet --gapplication-service When I login user1, logout, login user2, logout, login user1 and get the hang and black screen, I see the following red text items in the journal [ 806.445412] sddm-helper[3727]: gkr-pam: unable to locate daemon control file ... [ 833.013871] kded5[3834]: Couldn't start kglobalaccel from org.kde.kglobalaccel.service: QDBusError("org.freedesktop.DBus.Error.NoReply", "Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.") [ 833.021354] kactivitymanagerd[3859]: Couldn't start kglobalaccel from org.kde.kglobalaccel.service: QDBusError("org.freedesktop.DBus.Error.NoReply", "Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.") [ 833.035365] kwin_x11[3838]: Couldn't start kglobalaccel from org.kde.kglobalaccel.service: QDBusError("org.freedesktop.DBus.Error.NoReply", "Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.") [ 833.058151] ksmserver[3866]: Couldn't start kglobalaccel from org.kde.kglobalaccel.service: QDBusError("org.freedesktop.DBus.Error.NoReply", "Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.") ... [ 883.025155] kaccess[3898]: Couldn't start kglobalaccel from org.kde.kglobalaccel.service: QDBusError("org.freedesktop.DBus.Error.NoReply", "Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.") [ 883.026404] plasmashell[3900]: Couldn't start kglobalaccel from org.kde.kglobalaccel.service: QDBusError("org.freedesktop.DBus.Error.NoReply", "Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.") ... [ 908.028328] org_kde_powerdevil[3971]: Couldn't start kglobalaccel from org.kde.kglobalaccel.service: QDBusError("org.freedesktop.DBus.Error.NoReply", "Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.") The user sessions keep accumulating... $ loginctl SESSION UID USER SEAT TTY 10 1001 user2 seat0 13 1000 chris seat0 2 1000 chris seat0 9 1000 chris pts/0 This session isn't even responding to 'loginctl kill-session 13' nor 'loginctl terminate-session 13' $ loginctl session-status 13 13 - chris (1000) Since: Wed 2020-09-23 20:57:26 MDT; 7min ago Leader: 3727 Seat: seat0; vc1 Display: :0 Service: sddm; type x11; class user Desktop: KDE State: closing Unit: session-13.scope └─3838 /usr/bin/kwin_x11 'kill -9 3838' does finally kill the session, but doesn't bring back the login window Next tried: systemctl isolate multi-user.target systemctl graphical.target This does bring back the login window, but login in as either user fails, with the same messages in red as before. So something in the environment is sufficiently confused that it needs a reboot to get back to a login state.
Created attachment 1716198 [details] full journal, comment 60
The problem is reproducible if I flip users: first login to user 2, logout, login user1, logout, then login user2 fails with hang.
├─2909 /usr/libexec/geoclue-2.0/demos/agent └─2922 /usr/bin/abrt-applet --gapplication-service I think those two processes do not open a connection to the display server. Which means that they will survive unless the desktop environment kills them at logout time (something that should be fixed by the systemd work that got merged upstream last week \o/). [ 437.246844] systemd[1196]: dbus-:1.2-org.kde.kglobalaccel: Succeeded. ... [ 806.472598] systemd-logind[838]: New session 13 of user chris. [ 806.473290] systemd[1]: Started Session 13 of user chris. ... [ 807.068215] systemd[1196]: Started dbus-:1.2-org.kde.KSplash. [ 807.081646] ksplashqml[3792]: QCoreApplication::arguments: Please instantiate the QApplication object first [ 807.127226] systemd[1196]: dbus-:1.2-org.kde.KSplash: Succeeded. [ 807.206229] klauncher[3810]: Connecting to deprecated signal QDBusConnectionInterface::serviceOwnerChanged(QString,QString,QString) [ 807.375863] plasma_session[3801]: org.kde.plasma.session: process job "kcminit_startup" finished with exit code 0 [ 807.789431] systemd[1196]: Started dbus-:1.2-org.kde.ActivityManager. [ 808.056183] ksmserver[3866]: Qt: Session management error: networkIdsList argument is NULL ... [ 833.013871] kded5[3834]: Couldn't start kglobalaccel from org.kde.kglobalaccel.service: QDBusError("org.freedesktop.DBus.Error.NoReply", "Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.") So, that seems odd. This is a timeout from the DBus server (see src/kglobalaccel.cpp). But I don't understand how that could happen. If the service was running, DBus should reply immediately. If not, it should start the corresponding transient unit and we should see that in the log (i.e. there should be "systemd[1196]: Started dbus-:1.2-org.kde.kglobalaccel.", the @2 as it is the second time it gets launched by that dbus server). But neither appears to happen, which suggests that dbus-broker is stuck trying to activate the service but never does so.
Talked a bit with David Edmundson, and he pointed out that a kglobalaccel crash might make it wait for a coredump to be collected. But it is not clear whether that is relevant. So, there are two things that might help pinpointing this further: 1. Get a ps/systemd-cgls output after logout and/or at the time that the next login hangs. 2. Get a "dbus-monitor --session" (can be grabbed from a different VT or via SSH) Maybe that gives some more hints as to why service activation is simply not happening. Other than that, there *might* be the following race condition in dbus-broker: 1. First login logs out 2. kglobalaccel crashes [ 437.050657] kglobalaccel5[1346]: The X11 connection broke (error 1). Did the X11 server die? [ 437.054968] systemd[1196]: dbus-:1.2-org.kde.kglobalaccel: Succeeded. 3. Something requests kglobalaccel (maybe just automatic dbus activation?) [ 437.092048] systemd[1196]: Started dbus-:1.2-org.kde.kglobalaccel. 4. It fails immediately, as there is no X server anymore. Assumption: It does *not* register its name on the bus [ 437.113714] kglobalaccel5[2524]: qt.qpa.xcb: could not connect to display :0 [ 437.246844] systemd[1196]: dbus-:1.2-org.kde.kglobalaccel: Succeeded. [ 437.294378] systemd-coredump[2530]: Process 2524 (kglobalaccel5) of user 1000 dumped core. 5. This *could* mean that dbus-broker gets confused. The service *has* started from systemds point of view, but the name never appeared on the bus. 6. Next login, dbus-broker sees its own old activation request (activation->pending == TRUE), and figures it just needs to wait for the existing activation to complete.
OK, easier test. Install https://koji.fedoraproject.org/koji/taskinfo?taskID=52153402 then post the log after triggering the bug. I have added a log message for activation, which means for each request you'll see: dbus-broker[X]: Activation request for $SERVICE queuing or dbus-broker[X]: Activation request for $SERVICE already pending If the theory that I floated in the last comment is correct, then you will see a number of dbus-broker[X]: Activation request for org.kde.kglobalaccel already pending messages in the log, but no message with "queuing". Which would prove that dbus-broker really is still waiting on the kglobalaccel5 process that had already crashed during the last logout.
Created attachment 1716354 [details] Logs as requested I ran the requested experiment, and I've attached a tar of the log files. Here is how they are named: files with "before_any_login" in their name are from before any graphical login files with "in_1" in their name are from after the first login files with "out_1" in their name are from after the first logout The naming pattern repeats. On the third attempt to log in, I got the black screen. Those logs have "in_3" in their names. I do see a bunch of "Activation request ... already pending" messages in the messages_in_3.log file. Most are from kglobalaccel, but one is from bluez. If you need any different data or experiments run, please let me know.
Hmm, I think that confirms it, fishing out relevant stuff from the log: Sep 24 11:44:17 localhost systemd-logind[808]: Session 7 logged out. Waiting for processes to exit. Sep 24 11:44:17 localhost kglobalaccel5[1581]: The X11 connection broke (error 1). Did the X11 server die? Sep 24 11:44:17 localhost systemd[918]: dbus-:1.2-org.kde.kglobalaccel: Succeeded. Sep 24 11:44:17 localhost dbus-broker[961]: Activation request for org.kde.kglobalaccel queuing So, session is closing already, kglobalaccel5 dies, then something re-activates it immediately. And it tries to start up: Sep 24 11:44:17 localhost systemd[918]: Started dbus-:1.2-org.kde.kglobalaccel. Sep 24 11:44:17 localhost kglobalaccel5[2207]: qt.qpa.xcb: could not connect to display :0 Sep 24 11:44:17 localhost kglobalaccel5[2207]: qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "" even though it was found. Sep 24 11:44:17 localhost kglobalaccel5[2207]: This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.#012#012Available platform plugins are: eglfs, linuxfb, minimal, minimalegl, offscreen, vnc, xcb. Sep 24 11:44:18 localhost systemd[918]: dbus-:1.2-org.kde.kglobalaccel: Succeeded. But fails. Then, later: Sep 24 11:44:43 localhost systemd-logind[808]: New session 9 of user sfalco. Sep 24 11:44:43 localhost systemd[1]: Started Session 9 of user sfalco. ... Sep 24 11:44:45 localhost dbus-broker[961]: Activation request for org.kde.kglobalaccel already pending And no "queuing" message anywhere after the one that failed. So dbus-broker still thinks (after 28 seconds) that kglobalaccel is starting up. ---- Now, the question is how to solve it. I think there is a dbus-broker bug, it should notice that services are failing without ever registering on the bus[1]. I personally suspect the best short term solution is to adopt the hack that GNOME has. i.e. restart the dbus systemd service at logout time. That might even get rid of those two processes that are holding on in the session scope. [1] Adding BusName= to the transient unit would be the most elegant solution. However, template units are used, so if an old failed unit is still loaded then systemd might start complaining. So one might need to watch the transient service until the name is registered on the bus.
I'm pretty sure there multiple overlapping bugs that make this so hard to diagnose. We need to fix them one by one to chip away at the problem. I tried to reproduce this by creating a pristine new user on F33 and 'dnf group install "KDE (K Desktop Environment)" --enablerepo=updates-testing', logging in and logging out. The right time to look at the process list is *after* logging out, and not when the subsequent login fails. Once we try to log in again, new processes are hopelessly mixed with the old ones. I get a failure to log out cleanly (lingering processes) in maybe 1/3 of attempts. Interestingly, the hang is not indefinite, but seems semi-random. Sometimes the session terminates on its own after a few seconds, sometimes a few minutes. -- (In reply to Benjamin Berg from comment #39) > The real problem are probably processes left over in the session scope. This > is why I suggested running > > $ systemd-cgls > > and checking session-X.scope (of the old session). Those are the processes > that prevent logind from cleaning up the session completely. It is much more > likely they are causing issues than anything that is launched by DBus > (visible somewhere underneath user). I think most of the processes > listed in the above ps output fall into that category. Yep. Any processes that are left over in the session scope session-NN.scope are already a problem. Anything under user would only need to be considered once we eliminate the session. $ loginctl session-status 40 40 - kde (1004) Since: Thu 2020-09-24 17:56:40 CEST; 1min 23s ago Leader: 288889 (gdm-session-wor) Seat: seat0; vc7 TTY: tty7 Service: gdm-password; type x11; class user State: online Unit: session-40.scope ├─288889 gdm-session-worker [pam/gdm-password] ├─288927 /usr/bin/gnome-keyring-daemon --daemonize --login ├─288949 /usr/libexec/gdm-x-session --register-session --run-script /usr/bin/startplasma-x11 ├─288951 /usr/libexec/Xorg vt7 -displayfd 3 -auth /run/user/1004/gdm/Xauthority -nolisten tcp -background none -noreset -keeptty -novtswitch -verbose 3 ├─289046 /usr/bin/startplasma-x11 ├─289085 /usr/bin/ssh-agent /bin/sh -c exec -l /bin/bash -c "/usr/bin/startplasma-x11" ├─289146 /usr/libexec/kf5/start_kdeinit ├─289147 kdeinit5: Running... ├─289148 /usr/libexec/kf5/klauncher --fd=9 ├─289168 /usr/bin/kded5 ├─289172 /usr/bin/kwin_x11 -- (In reply to Kamil Páral from comment #12) > Then I experimented with KillUserProcesses=Yes in /etc/systemd/logind.conf. (In reply to Benjamin Berg from comment #31) > In bug #1817708 it is notable that logind does not show any message that the > the session has completely died > (https://bugzilla.redhat.com/attachment.cgi?id=1673873), i.e. > > Mar 26 20:05:34 localhost.localdomain systemd-logind[789]: Session 2 logged > out. Waiting for processes to exit. > Mar 26 20:05:34 localhost.localdomain systemd-logind[789]: New session c2 of > user sddm. > Mar 26 20:05:44 localhost.localdomain systemd-logind[789]: New session 5 of > user jim. > Mar 26 20:05:44 localhost.localdomain systemd-logind[789]: Session c2 logged > out. Waiting for processes to exit. > Mar 26 20:05:44 localhost.localdomain systemd-logind[789]: Removed session > c2. > > i.e. there is no "Removed session 2" in that log. Right. At least for me, after I click "exit", logind thinks the session is still active: "State: online". This means that KillUserProcesses= does not matter: systemd would only kill processes *after* the session has finished. Ultimately, I think that this is a problem with the kde session: the process controlling the session does not exit. Things go south from there. -- (In reply to Chris Murphy from comment #60) > $ loginctl > SESSION UID USER SEAT TTY > 10 1001 user2 seat0 > 13 1000 chris seat0 > 2 1000 chris seat0 > 9 1000 chris pts/0 > > > This session isn't even responding to 'loginctl kill-session 13' nor > 'loginctl terminate-session 13' > > $ loginctl session-status 13 > 13 - chris (1000) > Since: Wed 2020-09-23 20:57:26 MDT; 7min ago > Leader: 3727 > Seat: seat0; vc1 > Display: :0 > Service: sddm; type x11; class user > Desktop: KDE > State: closing > Unit: session-13.scope > └─3838 /usr/bin/kwin_x11 The first clear bug that I see is that /usr/bin/kwin_x11 does not respond to SIGTERM. If processes don't exit on SIGTERM but just hang around, we have problems. Chris's trace above has this, and I also saw kwin_x11 as the lone remaining process in the session scope. strace while doing 'kill 289172' three times with a few seconds inbetween and than finally SIGKILL: restart_syscall(<... resuming interrupted read ...>) = ? ERESTART_RESTARTBLOCK (Interrupted by signal) --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=290235, si_uid=0} --- rt_sigreturn({mask=[]}) = -1 EINTR (Interrupted system call) poll([{fd=3, events=POLLIN}], 1, -1) = ? ERESTART_RESTARTBLOCK (Interrupted by signal) --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=290247, si_uid=0} --- rt_sigreturn({mask=[]}) = -1 EINTR (Interrupted system call) poll([{fd=3, events=POLLIN}], 1, -1) = ? ERESTART_RESTARTBLOCK (Interrupted by signal) --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=290258, si_uid=0} --- rt_sigreturn({mask=[]}) = -1 EINTR (Interrupted system call) poll([{fd=3, events=POLLIN}], 1, -1 <unfinished ...>) = ? +++ killed by SIGKILL +++ I didn't capture the stack for this process. But in another hang which looked similar: $ cat pstack.282115 Thread 2 (Thread 0x7fb93ffff640 (LWP 282128)): #0 0x00007fb94b46db0e in ppoll () at /lib64/libc.so.6 #1 0x00007fb94b9f7ef0 in qt_safe_poll(pollfd*, unsigned long, timespec const*) () at /lib64/libQt5Core.so.5 #2 0x00007fb94b9f92b3 in QEventDispatcherUNIX::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () at /lib64/libQt5Core.so.5 #3 0x00007fb94b9af74b in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () at /lib64/libQt5Core.so.5 #4 0x00007fb94b81e2f7 in QThread::exec() () at /lib64/libQt5Core.so.5 #5 0x00007fb94c977a47 in QDBusConnectionManager::run() () at /lib64/libQt5DBus.so.5 #6 0x00007fb94b81f4cc in QThreadPrivate::start(void*) () at /lib64/libQt5Core.so.5 #7 0x00007fb94acd53f9 in start_thread () at /lib64/libpthread.so.0 #8 0x00007fb94b478b03 in clone () at /lib64/libc.so.6 Thread 1 (Thread 0x7fb945b3e0c0 (LWP 282115)): #0 0x00007fb94acdb6c2 in pthread_cond_wait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0 #1 0x00007fb94b824e6b in QWaitCondition::wait(QMutex*, QDeadlineTimer) () at /lib64/libQt5Core.so.5 #2 0x00007fb94c9c42a2 in QDBusPendingCallPrivate::waitForFinished() () at /lib64/libQt5DBus.so.5 #3 0x00007fb94c9c53c2 in QDBusPendingReplyData::argumentAt(int) const () at /lib64/libQt5DBus.so.5 #4 0x00007fb949eb8e2e in QDBusPendingReply<QList<int>, void, void, void, void, void, void, void>::Select<0>::Type const QDBusPendingReply<QList<int>, void, void, void, void, void, void, void>::argumentAt<0>() const () at /lib64/libKF5GlobalAccel.so.5 #5 0x00007fb949ebddd1 in KGlobalAccelPrivate::updateGlobalShortcut(QAction*, QFlags<KGlobalAccelPrivate::ShortcutType>, KGlobalAccel::GlobalShortcutLoading) () at /lib64/libKF5GlobalAccel.so.5 #6 0x00007fb949ebe200 in KGlobalAccel::setShortcut(QAction*, QList<QKeySequence> const&, KGlobalAccel::GlobalShortcutLoading) () at /lib64/libKF5GlobalAccel.so.5 #7 0x00007fb94cd29109 in void KWin::Workspace::initShortcut<KWin::Workspace, void (KWin::Workspace::*)()>(QString const&, QString const&, QKeySequence const&, KWin::Workspace*, void (KWin::Workspace::*)(), QVariant const&) [clone .constprop.0] () at /lib64/libkwin.so.5 #8 0x00007fb94ccb0ddb in KWin::Workspace::initShortcuts() () at /lib64/libkwin.so.5 #9 0x00007fb94cccfac0 in KWin::Workspace::Workspace() () at /lib64/libkwin.so.5 #10 0x00007fb94cc48ad0 in KWin::Application::createWorkspace() () at /lib64/libkwin.so.5 #11 0x0000561595197db5 in QtPrivate::QFunctorSlotObject<KWin::ApplicationX11::performStartup()::{lambda()#2}::operator()() const::{lambda()#1}, 0, QtPrivate::List<>, void>::impl(int, QtPrivate::QSlotObjectBase*, QObject*, void**, bool*) () #12 0x00007fb94b9df356 in void doActivate<false>(QObject*, int, void**) () at /lib64/libQt5Core.so.5 #13 0x00005615951985b6 in QtPrivate::QFunctorSlotObject<KWin::ApplicationX11::performStartup()::{lambda()#2}, 0, QtPrivate::List<>, void>::impl(int, QtPrivate::QSlotObjectBase*, QObject*, void**, bool*) () #14 0x00007fb94b9df356 in void doActivate<false>(QObject*, int, void**) () at /lib64/libQt5Core.so.5 #15 0x00007fb94c8a1013 in KSelectionOwner::Private::claimSucceeded() () at /lib64/libKF5WindowSystem.so.5 #16 0x00007fb94c8a6d01 in KSelectionOwner::filterEvent(void*) () at /lib64/libKF5WindowSystem.so.5 #17 0x00007fb94b9adfec in QAbstractEventDispatcher::filterNativeEvent(QByteArray const&, void*, long*) () at /lib64/libQt5Core.so.5 #18 0x00007fb9452fd001 in QXcbConnection::handleXcbEvent(xcb_generic_event_t*) () at /lib64/libQt5XcbQpa.so.5 #19 0x00007fb9452fe516 in QXcbConnection::processXcbEvents(QFlags<QEventLoop::ProcessEventsFlag>) () at /lib64/libQt5XcbQpa.so.5 #20 0x00007fb94532102c in QXcbUnixEventDispatcher::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () at /lib64/libQt5XcbQpa.so.5 #21 0x00007fb94b9af74b in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () at /lib64/libQt5Core.so.5 #22 0x00007fb94b9b7110 in QCoreApplication::exec() () at /lib64/libQt5Core.so.5 #23 0x0000561595196f83 in main () I don't know this framework, but my guess is that it installs a signal handler that sets some flag to exit, but it's not integrated with the event loop (QEventLoop?). Various people are seeing kwin_x11 as the culprit process. (I think there might be other issues, but this one seems common and obvious.) I think this should be reassigned to kwin, though the issue may ultimately lie in the framework. kwin-5.19.5-2.fc33.x86_64 qt5-qtbase-5.15.1-1.fc33.x86_64
Interesting. I think you are right that kwin_x11 isn't properly shutting down. In the case of your stack trace kwin_x11 seems to be processing some last X11 messages before the X server was shutting down for good. The stack trace looks like a sync dbus call to me (running a nested mainloop, but only serving the DBus socket). So kwin_x11 will likely quit 25s later when the call times out. #5 0x00007fb949ebddd1 in KGlobalAccelPrivate::updateGlobalShortcut(QAction*, QFlags<KGlobalAccelPrivate::ShortcutType>, KGlobalAccel::GlobalShortcutLoading) () at /lib64/libKF5GlobalAccel.so.5 And, the dbus call seems related to the kglobalaccel service. So that would explain why that services is being restarted.
So... 25s wouldn't be too bad, iff things were "blocked". It would be an annoying timeout, nothing more. But instead we try to start a concurrent session, and then the remnants of the old session prevent the new one from starting. So we really need to fix this both ways: first figure out why it's not quitting in a timely fashion (your direction with kglobalaccel seems relevant here), but second we need to make sure the new session is not started before the old one exits, so the whole thing is robust is something blocks exit.
Note though, that I expect KDE in F34 to switch to using systemd, and systemd is pretty good at making sure things shut down cleanly. David Edmundson mentioned to me that he probably forgot to install a service file for kglobalaccel. Doing that is trivial and it would prevent dbus-broker from getting confused as we would have a proper BusName= in the unit file then.
Discussed during the 2020-09-24 Fedora 33 Go/No-Go meeting: [0] The decision to waive this bug's existing classification as an "AcceptedBlocker (Beta)" in the Go/No-Go decision was made under the "Difficult to fix blocker bugs" exception due to the difficulty in isolating the cause. [0] https://meetbot-raw.fedoraproject.org/fedora-meeting-1/2020-09-24/f33-beta-go_no_go-meeting.2020-09-24-17.00.txt
(In reply to Benjamin Berg from comment #67) > Now, the question is how to solve it. I think there is a dbus-broker bug, it > should notice that services are failing without ever registering on the > bus[1]. > > I personally suspect the best short term solution is to adopt the hack that > GNOME has. i.e. restart the dbus systemd service at logout time. That might > even get rid of those two processes that are holding on in the session scope. > > > [1] Adding BusName= to the transient unit would be the most elegant > solution. However, template units are used, so if an old failed unit is > still loaded then systemd might start complaining. So one might need to > watch the transient service until the name is registered on the bus. DBus messages have no timeouts. Any actual timeouts you see in client libraries are implemented in the client library. The broker never sees them (which, btw., leaks "reply-windows" in the broker). Hence, activation-requests from a client will never "time-out". They will simply remain pending until the activation fails. This does not preclude the broker from maintaining its own timeouts, but there is no standardized way to communicate these timeouts in a non-racy manner. When you activate a service, dbus-broker will start the service specified by the dbus-service-file, or create a transient unit to execute the specified binary. In both cases, the only way to cause the activation to fail is to make this binary or service fail. If they return success, there is no way for dbus-broker to know what other things are going on, or what to track (just imagine one-shot units, you cannot know what side-effects they have). If you want dbus-broker to be notified of a failed service activation, I highly suggest returning an error to the broker. A very simple solution is to provide your own service file, rather than relying on the transient unit, and then using `Type=dbus` as the systemd service type (we cannot do this for transient units, since they do not necessarily run on a bus that is tracked by systemd; think at-spi and friends). I am open to suggestions, but I doubt another layer of timeouts would improve on anything :-/
(In reply to Geoffrey Marr from comment #72) > Discussed during the 2020-09-24 Fedora 33 Go/No-Go meeting: [0] > > The decision to waive this bug's existing classification as an > "AcceptedBlocker (Beta)" in the Go/No-Go decision was made under the > "Difficult to fix blocker bugs" exception due to the difficulty in isolating > the cause. > > [0] > https://meetbot-raw.fedoraproject.org/fedora-meeting-1/2020-09-24/f33-beta- > go_no_go-meeting.2020-09-24-17.00.txt If a blocker is waived according to an exception rule, it's moved to the very next milestone per policy. Setting as a Final blocker.
> I am open to suggestions, but I doubt another layer of timeouts would improve on anything :-/ Hmm, yeah, it is ugly situation. You are probably right and there is no existing mechanism to solve the issue reliably other than timeouts (e.g. fd passing to the spawned process). In this case I was first imagining to watch the transient unit for failures (until it owns the name), but even that makes assumptions that may not be true. Trying to summarize the situation: * I think the proper fix will happen with the systemd change, but that is for F34 only * The simplest short term fix is to ship systemd service units for the DBus activatable services (kglobalaccel specifically, but ideally all of them) Shipping those units (and updating the dbus service files), should improve the situation a lot: * It will fix detection of the failed dbus service startup for kglobalaccel * This in turn works around kwin_x11 hanging during the dbus call * Which should fix kwin_x11 quitting, even if it is not graceful Things will still be mess during shutdown (services randomly failing), but at least the state afterwards should become a lot cleaner. And, there may still be other issues, but it seems like it should fix a scenario that has been coming up quite frequently in this bug.
Last time I tested in a VM, Rawhide 33, it worked just fine. Log out and log in. Right now I tested it again, with KDE Spin beta and... well, kind of works. Log out is fine, but log in leaves me in a blank screen with only a moving cursor. Keyboard unresponsive and no other way out but forcing power off.
Can someone please try implementing Benjamin's suggestion from #c75? It seems like a sensible idea to me. Rex, can you do that?
I have the same issue here, exactly behavior as described on Comment #76 By Sylvia on my F33 kde (instaled form the Fedora-KDE-Live-x86_64-33_Beta-1.3.iso).
I tried this on my setup without user2; just log in user1, log out, log in user1 - and the login stuck, everytime. OS: Fedora 33 (KDE Plasma Prerelease) x86_64 Host: Aspire V3-571 V2.11 DE: Plasma 5.19.5 KDE Frameworks: 5.73.0 Qt: 5.15.1 WM: KWin WM Theme: Breeze Theme: Breeze Dark [Plasma], Adwaita [GTK2] Extract of the journalctl -r command right after I tried to login again (full journalctl-r attached to this) - host tamgamandapio/ user (only user on this setup) geraldo out 13 11:32:35 tangamandapio systemd[1]: systemd-hostnamed.service: Succeeded. out 13 11:32:31 tangamandapio ksmserver[2088]: Couldn't start kglobalaccel from org.kde.kglobalaccel.service: QDBusError("org.freedesktop.DBus.Error.NoReply", "Did no> out 13 11:32:31 tangamandapio kwin_x11[2065]: Couldn't start kglobalaccel from org.kde.kglobalaccel.service: QDBusError("org.freedesktop.DBus.Error.NoReply", "Did not> out 13 11:32:31 tangamandapio kded5[2061]: Couldn't start kglobalaccel from org.kde.kglobalaccel.service: QDBusError("org.freedesktop.DBus.Error.NoReply", "Did not re> out 13 11:32:31 tangamandapio kactivitymanagerd[2081]: Couldn't start kglobalaccel from org.kde.kglobalaccel.service: QDBusError("org.freedesktop.DBus.Error.NoReply",> out 13 11:32:15 tangamandapio systemd[1]: user-990.slice: Consumed 1.331s CPU time. out 13 11:32:15 tangamandapio systemd[1]: Removed slice User Slice of UID 990.
Created attachment 1721211 [details] Reverse journalctl form my attempt to logout and login again on the same user Reverse journalctl from the time I logout from the first session until the moment I needed to enter a tty3 session for reboot the system because the new login get stuck (only one user - geraldo).
When you say you "tried this", what is "this"? I don't think you should have cancelled my needinfo for Rex, I was asking him to implement a packaging change for testing.
Created attachment 1721259 [details] Coredumpctl gdb after the login stuck
(In reply to Adam Williamson from comment #81) > When you say you "tried this", what is "this"? I don't think you should have > cancelled my needinfo for Rex, I was asking him to implement a packaging > change for testing. Sorry Adam, I cancelled your needinfo by mistake, I didn't wanted to do that. :( I hope I do it right this time. The "this" was just login, logout and new login on the same user.
So since no-one else seems willing to try and implement Ben's idea, I did. Though I'm no dbus expert so I don't really know if I did it right. Here's the scratch build for testing: https://koji.fedoraproject.org/koji/taskinfo?taskID=53469223 it adds a /usr/lib/systemd/system/kf5-kglobalaccel.service with this content: === [Unit] Description=KGlobalaccel [Service] Type=dbus BusName=org.kde.kglobalaccel ExecStart=/usr/bin/kglobalaccel5 === and adds a single line to /usr/share/dbus-1/services/org.kde.kglobalaccel.service: SystemdService=kf5-kglobalaccel that's all it does. Benjamin, if it needs more than that, please let me know. Thanks.
Thanks Adam, with that much to start with, I can test and refine things. As is, looks like that's not enough or right, I'm ending up with a session with no kglobalaccel5 running. I think it has to be user/kglobalacces.service instead. Testing theory now.
oh yes, that sounds logical. I'll try it too.
Confirmed, that worked to get kglobalaccel running properly in my session, but logout/login continues to get stuck. loginctl session-status (still) reports some stuff lingering including State: closing Unit: session-2.scope: ... /usr/libexec/geoclue-2.0-demos/agent ... /usr/sbin/abrt-applet --gapplication-service progress?
Yeah, I think we need to give up the idea of getting rid of those lingering processes for F33. It isn't nice, but at least those two shouldn't be inhibiting the next login. So hard to tell. You at least shouldn't see the messages that something is waiting for kglobalaccel at next login anymore. i.e.: out 13 11:32:31 tangamandapio ksmserver[2088]: Couldn't start kglobalaccel from org.kde.kglobalaccel.service: QDBusError("org.freedesktop.DBus.Error.NoReply", "Did no> Maybe check if you see similar messages for another service?
Unfortunately, I'm seeing the same stuff in journalctl --user output after a stuck login: kaccess: Couldn't start kglobalaccess from org.kde.kglobalaccess.service ... plasmashell: Couldn't start kglobalaccess from org.kde.kglobalaccess.service ... org_kde_powerdevil : Couldn't start kglobalaccess from org.kde.kglobalaccess.service ...
Well, I'll get an official kf5-kglobalaccel build with this in , I don't think it will hurt.
Unless I'm getting really lucky, the fix does seem to be helping me. I've just done 10 straight login / logout cycles without getting stuck. Did you reboot after updating, Rex?
that is, with https://koji.fedoraproject.org/koji/taskinfo?taskID=53475309 , which puts the unit in /user not /system as Rex suggested.
Hmm, oh, Rex may have run into the systemd restart limit. i.e. if you were quicker than 10s, then systemd might be refusing to try starting the service again. The simple (and reasonable) fix for that would be to run "systemctl --user reset-failed" at login time before starting everything else.
rebooted many times, I'll make sure to wait at least 10 seconds for subsquent tests, build going here, https://koji.fedoraproject.org/koji/taskinfo?taskID=53477863 Only minor change from what Adam had done was to use kglobalaccel.service name (instead of kf5-kglobalaccel)
Still no luck, (sorry for the spam), login stuck even after waiting 30+ seconds and issuing systemctl --user reset-failed Maybe what I committed is wrong, please review for errors: systemd kglobalaccel.service (#1861700) https://src.fedoraproject.org/rpms/kf5-kglobalaccel/c/75acba304693c7252b3f9cbe9fad255d0f1b6d7f?branch=f33 import kglobalaccel.service https://src.fedoraproject.org/rpms/kf5-kglobalaccel/c/0030ad7eed9b5e9013e81cd8154afbb1bed9b7b0?branch=f33
Hmm, seems like those commits failed to build possibly due to a missing build-dependency on systemd-rpm-macros? Not sure what you actually tested now. btw. you can always run "systemctl --user" or "systemctl --user status kglobalaccel.service" to check. If it worked, you'll see the service up and running there.
Sorry for the build error, new one(s) going, https://koji.fedoraproject.org/koji/buildinfo?buildID=1625458 Happily for my local build, systemctl --user status kglobalaccel.service reports Active: active (running)
Rex: I think the .service in: echo 'SystemdService=kglobalaccel.service' >> %{buildroot}%{_datadir}/dbus-1/services/org.kde.kglobalaccel.service may be wrong. Or at least, I didn't have it in my build. My line is: echo "SystemdService=kf5-%{framework}" >> %{buildroot}%{_datadir}/dbus-1/services/org.kde.kglobalaccel.service note, no .service there.
oh, I do see lots of existing files in that directory use the .service, though. So it's probably fine. :/
(In reply to Rex Dieter from comment #97) > Sorry for the build error, new one(s) going, > https://koji.fedoraproject.org/koji/buildinfo?buildID=1625458 > > Happily for my local build, > systemctl --user status kglobalaccel.service > reports > Active: active (running) Hello Rex, I tried now with the new build you made (kf5-kglobalaccel-5.73.0-2.fc33) and the problem persists on my setup. I will attach the journalctl -r file to the bug, the coredumpctl dbg and the systemctl user status too. Here's a small part of the journalctl: out 15 10:21:31 tangamandapio plasma_session[2210]: org.kde.plasma.session: Starting autostart service "/etc/xdg/autostart/polkit-kde-authentication-agent-1.desktop"> out 15 10:21:31 tangamandapio plasma_session[2210]: org.kde.plasma.session: Starting autostart service "/etc/xdg/autostart/xembedsniproxy.desktop" ("/usr/bin/xembeds> out 15 10:21:06 tangamandapio kactivitymanagerd[2257]: Couldn't start kglobalaccel from org.kde.kglobalaccel.service: QDBusError("org.freedesktop.DBus.Error.NoReply",> out 15 10:21:06 tangamandapio kwin_x11[2241]: Couldn't start kglobalaccel from org.kde.kglobalaccel.service: QDBusError("org.freedesktop.DBus.Error.NoReply", "Did not> out 15 10:21:06 tangamandapio kded5[2237]: Couldn't start kglobalaccel from org.kde.kglobalaccel.service: QDBusError("org.freedesktop.DBus.Error.NoReply", "Did not re> out 15 10:21:06 tangamandapio ksmserver[2264]: Couldn't start kglobalaccel from org.kde.kglobalaccel.service: QDBusError("org.freedesktop.DBus.Error.NoReply", "Did no> out 15 10:20:50 tangamandapio systemd[1104]: run-user-990.mount: Succeeded. out 15 10:20:41 tangamandapio ksmserver[2264]: Qt: Session management error: networkIdsList argument is NULL out 15 10:20:41 tangamandapio systemd[1104]: Started dbus-:1.2-org.kde.ActivityManager.
Created attachment 1721848 [details] Reverse journalctl with packages kf5-kglobalaccel-5.73.0-2.fc33
Created attachment 1721849 [details] Coredumpctl gdb after the login stuck (new build kf5-kglobalaccel-5.73.0-2.fc33)
Created attachment 1721850 [details] User service status (from the gui panel- after installation from build kf5-kglobalaccel-5.73.0-2.fc33)
Created attachment 1721855 [details] systemctl user status (comand line) using kf5-kglobalaccel-5.73.0-2.fc33
So baring miracles, if we want this fixed for GA, I think we'll still have to go with the logind KillUserProcesses=yes workaround for kde spin.
Just out of curiosity, Geraldo, can you try with my https://koji.fedoraproject.org/koji/taskinfo?taskID=53475309 instead? You'll probably need to use dnf --downgrade as it'll be lower versioned than Rex's. Rex, didn't we already establish above that KillUserProcesses didn't help?
(In reply to Rex Dieter from comment #105) > So baring miracles, if we want this fixed for GA, I think we'll still have > to go with the logind KillUserProcesses=yes workaround for kde spin. I'd rather ship with this bug than introduce surprising behavior this late in the process.
So, seems like the dbus servers (both dbus-broker *and* dbus-daemon) will not try to activate the service again. dbus-daemon seems to do so after 120s, dbus-broker probably never. I don't get that really, there is no problem with bombarding systemd with activation requests … Anyway, I now propose the following fix: /etc/xdg/plasma-workspace/shutdown/99-restart-dbus.sh """ #!/bin/sh # Restart DBus as shutdown may not be clean leaving failed activation requests that prevent the next login systemctl --user restart dbus.service """
I think I tried the "logind KillUserProcesses=yes" at some point, it works BUT it also killed my nohup jobs so it was unacceptable.
Comment #108 Good idea, I'll try that. restart rather than stop ?
the initial suggestion led to logout getting stuck, but I suspect that's because dbus got pulled out from under it. This seems better though, --------------------- #!/bin/sh # Restart DBus as shutdown may not be clean leaving failed activation requests that prevent the next login (sleep 15 && systemctl --user stop dbus.service) & --------------------- waiting a bit first, then doing the deed. With this, dbus does get stopped. The old session and some processes still linger (see below), but at least it's no longer preventing subsequent logins. Benjamin (or anyone), can you clarify if 'stop' or 'restart' is preferable here, and if this looks agreeable, I'll stuff this into packaging somewhere (probably kde-settings-plasma) -------------------------------- these processes remain after logout: rdieter 1014 0.3 0.3 25048 14884 ? Ss 12:16 0:00 /usr/lib/systemd/systemd --user rdieter 1016 0.0 0.1 50364 5716 ? S 12:16 0:00 (sd-pam) rdieter 1266 0.0 0.1 523124 5180 ? Sl 12:17 0:00 /usr/libexec/geoclue-2.0/demos/agent and loginctl session-status output: 2 - rdieter (1000) Since: Thu 2020-10-15 12:16:57 CDT; 2min 46s ago Leader: 1009 Seat: seat0; vc1 Display: :0 Service: sddm; type x11; class user Desktop: KDE State: closing Unit: session-2.scope └─1266 /usr/libexec/geoclue-2.0/demos/agent
You really should not be stopping dbus, so "restart" is the way to go! Not sure about waiting 15s, I see the point of trying to delay it till everything else is finished, but 15s seems longer than reasonable to me.
(In reply to Adam Williamson from comment #106) > Just out of curiosity, Geraldo, can you try with my > https://koji.fedoraproject.org/koji/taskinfo?taskID=53475309 instead? You'll > probably need to use dnf --downgrade as it'll be lower versioned than Rex's. > > Rex, didn't we already establish above that KillUserProcesses didn't help? Hi Adam, I tried your build right now and it works!!! I don't get stucked anymore on the login. I tried four times logoff and login and it worked fine. But instead, now the keyboard shortcuts doesn't work anymore. :( [geraldo@tangamandapio ~]$ systemctl --user status kf5-kglobalaccel.service ● kf5-kglobalaccel.service - KGlobalaccel Loaded: loaded (/usr/lib/systemd/user/kf5-kglobalaccel.service; static) Active: inactive (dead) [geraldo@tangamandapio ~]$ journalctl -r (if you want it I can attach the full journalctl from this last boot) out 15 14:09:49 tangamandapio systemd[1104]: Started dbus-:1.2-ca.desrt.dconf. out 15 14:09:49 tangamandapio systemd[1104]: Created slice dbus\x2d:1.2\x2dca.desrt.dconf.slice. out 15 14:09:49 tangamandapio systemd[1104]: Started dbus-:1.2-org.kde.ActivityManager. out 15 14:09:49 tangamandapio systemd[1104]: Created slice dbus\x2d:1.2\x2dorg.kde.ActivityManager.slice. out 15 14:09:49 tangamandapio dbus-broker-launch[1146]: Activation request for 'org.kde.kglobalaccel' failed. out 15 14:09:49 tangamandapio dbus-broker-launch[1146]: Activation request for 'org.kde.kglobalaccel' failed. out 15 14:09:49 tangamandapio dbus-broker-launch[1146]: Activation request for 'org.kde.kglobalaccel' failed. out 15 14:09:49 tangamandapio kded5[1207]: Couldn't start kglobalaccel from org.kde.kglobalaccel.service: QDBusError("org.freedesktop.DBus.Error.ServiceUnknown", "Cou> out 15 14:09:49 tangamandapio dbus-broker-launch[1146]: Activation request for 'org.kde.kglobalaccel' failed. [geraldo@tangamandapio ~]$ rpm -qa| grep kf5-kglobalaccel kf5-kglobalaccel-libs-5.73.0-1.2aw.fc33.x86_64 kf5-kglobalaccel-5.73.0-1.2aw.fc33.x86_64 kf5-kglobalaccel-debuginfo-5.73.0-1.2aw.fc33.x86_64 kf5-kglobalaccel-libs-debuginfo-5.73.0-1.2aw.fc33.x86_64 [geraldo@tangamandapio ~]$
Well, to be fair, dbus is socket activated. So "stop" is also fine, as the service will be restarted automatically when anything tries to use it. GNOME is doing a restart, which is why I proposed that.
Created attachment 1721884 [details] Reverse journalctl with build from Adam (kf5-kglobalaccel-5.73.0-1.2aw.fc33.x86_64) With this build - kf5-kglobalaccel-5.73.0-1.2aw.fc33.x86_64 I managed to logout and login again many times, sucessfully. But now keybord shortcuts doesn't work anymore.
Ah. So that sounds like my build doesn't so much "work" as it just...doesn't run kglobalaccel at all any more, for some reason? Which would tie in with the results we both saw, but isn't the right way to fix the bug...
So I think what we can do here is this: * Create an update with Benjamin's proposed "restart dbus on logout" hammer * Do a candidate compose with it to see how it goes * Don't push it stable unless that looks good Could you do that, Rex? As long as we don't push the update stable, the change is trivially reversible.
I'll work on it today.
FEDORA-2020-f5be904258 has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2020-f5be904258
Details about what was committed, https://src.fedoraproject.org/rpms/kde-settings/c/34266bc9b362b3db9f0ef5c7e4a3d0154a02f8a9
Thanks Rex! Kamil, Geraldo - can you test?
FYI, the shutdown script (still) allows some time, 8 seconds, before executing the user dbus restart hack. So you'll need to wait at least that long during logout/login tests.
Why not let systemd kill dbus service automatically on user logout?
Re: comment #123 How is that supposed to work? Isn't that the point of this bug, that is apparently not happening by itself, or is there something else that needs to happen to trigger that?
(In reply to Adam Williamson from comment #121) > Thanks Rex! > > Kamil, Geraldo - can you test? Adam, I just have tested these new builds from Rex: kde-settings-33.0-2.fc33.noarch.rpm kde-settings-plasma-33.0-2.fc33.noarch.rpm kde-settings-pulseaudio-33.0-2.fc33.noarch.rpm qt-settings-33.0-2.fc33.noarch.rpm I made five rounds logout and login again. It only get stucked one time, the other four attempts have been successfully completed!! :D I have two coredumps from the kglobalaccel5 (debugg attached), but they occurred on two successfull logins. After the complete logout I have waited at least 10 seconds for each new login attempt. I wil try to continue that, to see if get stucked again.
Created attachment 1722149 [details] Reverse journalctl with build kde-settings-33.0-2.fc33 Reverse Journalctl from a successful logout-login attempt.
Created attachment 1722150 [details] Two coredumps debugged from two different logout-login rounds After update from the packages kde-settings-33.0-2.fc33. Two coredumpctl gdb outputs: both from kglobalaccel5 (succesful login)
Created attachment 1722152 [details] Kglobalaccel service running
Created attachment 1722165 [details] systemctl--user-status
Created attachment 1722166 [details] systemctl--user-status after successful logout-login
Done 10 more logout-login rounds again. All successful here. All ended on a new working session. But for each one it causes a coredump on /usr/bin/kglobalaccel5
FEDORA-2020-f5be904258 has been pushed to the Fedora 33 testing repository. In short time you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-f5be904258` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-f5be904258 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
The coredump for kglobalaccel5 is expected with the current workaround (i.e. we don't prevent the crash, we prevent it from breaking the next login). So you'll see the report at the next login. Trying to fix/work around that might be possible, but maybe it is better to move that into a different issue to discuss it there. It only causes an annoying message that is safe to ignore.
(In reply to Rex Dieter from comment #120) > Details about what was committed, > https://src.fedoraproject.org/rpms/kde-settings/c/ > 34266bc9b362b3db9f0ef5c7e4a3d0154a02f8a9 I've only tested this on a single-user system so far. And I believe this is a wrong way to go. Before this patch, I can do 5 successful and immediate logouts&logins of the *same user* in a row without any issues. After this patch, I haven't seen a single case where it would work. In my 5 attempts, I got 3 black screens during a subsequent login (no user action possible) and 2 desktops with a "plasma crashed" icon in the systray during which system actions were nonfunctional (sleep, reboot and poweroff were completely missing from the system menu, and lock and logout were present but did nothing when clicked on). Reproducer: 1. A freshly installed KDE system with only stable updates and the update from comment 119, only a single user account created. 2. Reboot. 3. Log in. Do some work, like opening a terminal, etc. 4. Log out. 5. Log in immediately again (don't wait 8 seconds). Note that I intentionally didn't wait for the 8 seconds timeout, because the login screen doesn't show that timer anywhere and another login is not prevented, i.e. the user is not aware of any need to wait. This is a use case e.g. when you add yourself to a libvirt or mock group and are told to logout and login again. Or when something breaks in your desktop and you want it fixed. I don't think it makes sense to try to fix a multi-user scenario and break a single-user scenario in the process.
Kamil, this problem has nothing to do with single/multi user scenarios (from a theoretical poitn of view, but I also reproduced it with only one user). So, either you run into the race condition where kglobalaccel crashes, or you do not. Other people regularly hit the situation and logging back may *always* break for them unless rebooted, which is a lot worse than breaking an immediate re-login within a few seconds.
(In reply to Benjamin Berg from comment #135) > Kamil, this problem has nothing to do with single/multi user scenarios (from > a theoretical poitn of view, but I also reproduced it with only one user). Right, I guess I already established that in comment 12 paragraph 2, but I forgot. When I tested the single user login/logout scenario, and I honored the secret 8 seconds delay in sddm, I didn't see any issues and I could consistently log in and log out repeatedly. But then I followed up with multi-user tests. With the patch applied, I'm hitting a black screen on the 2nd or 4th logout, every time I test this (I tested it 5 times, with full reboots in between, and I did honor the 8 seconds delay in sddm). So the symptoms are the same as mentioned in comment 12 paragraph 3 (that time I tested KillUserProcesses=Yes). Judging the likelihood of hitting a race condition is something I gladly leave to others :-) But at least in my experience, the patch doesn't fix the multi-user problem at all (the black screen just moves from login to logout) and it consistently breaks a single user logout+login if you're faster than 8 seconds in sddm. I understand the patch can improve certain cases, I just don't hit those often enough to be able to test them.
Discussed during the 2020-10-19 blocker review meeting: [0] The decision to waive this bugs classification as an "AcceptedBlocker (F33 Final)" and instead classify it as an "AcceptedBlocker (F34 Beta)" and an "AcceptedFreezeException (F33 Final)" was made under the "Difficult to fix blocker bugs" proviso at https://fedoraproject.org/wiki/QA:SOP_blocker_bug_process#Exceptional_cases. We have tried multiple approaches to fix this bug and it's not going away, we are not confident that we'll be able to produce a clear fix any time soon. Tt will be made an F34 Beta blocker instead. [0] https://meetbot.fedoraproject.org/fedora-blocker-review/2020-10-19/f33-blocker-review.2020-10-19-16.01.txt
FEDORA-2020-f5be904258 has been pushed to the Fedora 33 stable repository. If problem still persists, please make note of it in this bug report.
I adjusted fields after F33 release. We're now targeting F34 Beta in order to fully fix this. There's some glimpse of hope in comment 71 that KDE using systemd in F34 could solve this for us.
This bug is definitely related to dbus-broker, not sddm. It can be fixed by switching to another D-Bus implementation (from dbus-broker to dbus-daemon). Disable dbus-broker: systemctl disable dbus-broker.service systemctl --global disable dbus-broker.service Enable dnf-daemon: systemctl enable dbus-daemon.service systemctl --global enable dbus-daemon.service Remove dbus-broker from default presets: sudo sed -e 's/dbus-broker/dbus-daemon/g' -i /usr/lib/systemd/user-preset/90-default-user.preset Restart system: sudo systemctl reboot
Can confirm advice in #c140 helps. Just checked in the test VM - switching back to dbus-daemon fixes this issue. No more hanging dbus-broker processes waiting for something after logout.
But we already discussed that, months ago, and tried it, and it didn't work. See comment #29 onwards.
Tested on a clean installation in VM. It works for me. We need to do more tests.
The bug is already partly fixed by the kglobalaccel change. If you're logging in and out with only one user and leaving 7 seconds between attempts it may well be fine. Remaining issues are if you cycle too fast for the timeout, and apparently switching between users (according to kparal).
My apologies, looks like the kglobalaccel change never made it to bodhi. I'll submit it now.
FEDORA-2020-054b3b620d has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2020-054b3b620d
FEDORA-2020-054b3b620d has been pushed to the Fedora 33 testing repository. In short time you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-054b3b620d` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-054b3b620d See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
(In reply to Adam Williamson from comment #144) > The bug is already partly fixed by the kglobalaccel change. This fix doesn't work either. > If you're logging in and out with only one user and leaving 7 seconds between attempts > it may well be fine. Waiting for 8 seconds is not a fix. This is a dirty hack. The real solution is to get rid of the buggy dbus-broker. I suggest patching the systemd package SPEC to allow multiple Dbus implementations using rich dependencies.
(In reply to Fedora Update System from comment #146) > FEDORA-2020-054b3b620d has been submitted as an update to Fedora 33. > https://bodhi.fedoraproject.org/updates/FEDORA-2020-054b3b620d With this update applied, I'm still hitting issues mentioned in comment 134, but I no longer see the issues mentioned in comment 136.
(In reply to Vitaly Zaitsev from comment #140) > This bug is definitely related to dbus-broker, not sddm. I applied the steps from comment 140, and then I followed the reproducer from comment 134, and I still see the same issues. I didn't do extensive testing, but it doesn't magically fix the issues, at least for me.
> I applied the steps from comment 140, and then I followed the reproducer from comment 134, and I still see the same issues. I didn't do extensive testing, but it doesn't magically fix the issues, at least for me. Check what processes are still running after switching to dbus-daemon & doing logout. And paste it there. It could be another bug. Current bug is caused by dbus-broker processes not exiting after logout.
Reported this issue to upstream: https://github.com/bus1/dbus-broker/issues/244
for me, the results are the same whether I use dbus-daemon or dbus-broker, testing twice each with our workarounds removed, I saw lingering processes: /usr/libexec/geoclue-2.0/demos/agent /usr/bin/abrt-applet and session status perpetually with "State: closing"
So, the thing with this bug is that there are multiple related issue, which makes it really hard to understand what is happening, where things are going on and what can be fixed easily. I would like to propose to move any remaining problems to a new bug, trying to focus on a specific issue. From what I can tell: 1. geoclue and abrt-applet remaining after the session quits is bad. But so far there is been no prove that this causes any issues. 2. dbus-broker and dbus-daemon does not make a fundamental difference. Failures are random though, so it might look like that during testing. 3. We have an ugly workaround for dbus in place. With that workaround dbus-broker/dbus-daemon cannot really be at fault anymore! And the workaround will hide an number of KDE issues. So, there are at least 5 inter-related problems here: * KDE not cleaning up autostarted applications * KDE not waiting for kwin and other processes to shutdown before quitting the session leader * DBus (both dbus-daemon *and* dbus-broker) not being able to handle activation properly if a previous activation has failed * kwin poking kglobaccel during shutdown and not setting the flag to prevent dbus activation * something getting stuck during logout (probably a KDE issue) The workaround just "fixes" one of the above, preventing some others from causing issues. Again, most of the KDE side bugs will become a non-issue with KDE moving to systemd (earliest for F34), which would also hide any dbus problem. Anyway, I am sure that the "logout stuck" problem is real. But I think it would be a good idea to move that to a new bug and debug it there.
I upgraded from F32 to F33 (KDE Spin) and now it's working. (Shoot me if I know what happened) I'll test again after the first important batch of updates, but so far so good.
FEDORA-2020-4057c26351 has been pushed to the Fedora 33 testing repository. In short time you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-4057c26351` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-4057c26351 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
After applying FEDORA-2020-4057c26351, I was not able to trigger the bug after a half dozen logout/login to the same user. However, I do notice that there are a lot of geoclue processes that are not cleaned up: fedora33# ps -fe | grep sfalco sfalco 911 1 0 11:11 ? 00:00:00 /usr/lib/systemd/systemd --user sfalco 912 911 0 11:11 ? 00:00:00 (sd-pam) sfalco 1130 1 0 11:11 ? 00:00:00 /usr/libexec/geoclue-2.0/demos/agent sfalco 1749 1 0 11:12 ? 00:00:00 /usr/libexec/geoclue-2.0/demos/agent sfalco 2357 1 0 11:13 ? 00:00:00 /usr/libexec/geoclue-2.0/demos/agent sfalco 2966 1 0 11:13 ? 00:00:00 /usr/libexec/geoclue-2.0/demos/agent sfalco 3560 1 0 11:14 ? 00:00:00 /usr/libexec/geoclue-2.0/demos/agent sfalco 4187 1 0 11:15 ? 00:00:00 /usr/libexec/geoclue-2.0/demos/agent sfalco 4794 1 0 11:16 ? 00:00:00 /usr/libexec/geoclue-2.0/demos/agent sfalco 5228 911 0 11:17 ? 00:00:00 /usr/bin/dbus-broker-launch --scope user sfalco 5229 5228 0 11:17 ? 00:00:00 dbus-broker --log 4 --controller 11 --machine-id 453c53fe1e0e40538136bd60780043d7 --max-bytes 100000000000000 --max-fds 25000000000000 --max-matches 5000000000
Just reinstalled Fedora 33 KDE Spin in the VM and updated it. Can't confirm that update FEDORA-2020-4057c26351 (kf5-5.75) helps in any way - still black screen + mouse cursor after login.
FEDORA-2020-4057c26351 has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2020-4057c26351
I tried a couple of times and logging out and in seemed to work. Now I tried again with kernels 5.18, 5.17 and 5.16... It doesn't work. This is F33 KDE Spin (upgraded from F32).
(In reply to Sylvia Sánchez from comment #161) > I tried a couple of times and logging out and in seemed to work. Now I tried > again with kernels 5.18, 5.17 and 5.16... It doesn't work. > This is F33 KDE Spin (upgraded from F32). This still prevents me from a valid system upgrade for our users. Can anyoune confirm to blame the kernel?
FEDORA-2020-4057c26351 has been pushed to the Fedora 33 stable repository. If problem still persists, please make note of it in this bug report.
Could this bug can be related - https://bugs.kde.org/show_bug.cgi?id=424408 ?
Yes, kwin incorrectly activates kglobalaccel again which then aborts.
With the new KDE Plasma 5.20.3 update, this bug reappeared on my F33 setup. This update ==> https://bodhi.fedoraproject.org/updates/FEDORA-2020-d700fa11cf
After the update https://bodhi.fedoraproject.org/updates/FEDORA-2020-b6bf57746c now I can do logout and login many times again, without being stuck. Nice :) I tested both wayland and X sessions, both works fine. A lot of coredumps on the journal however, but it's working. I'm using kernel 5.9.8-200.fc33.x86_64. Just to register the coredumps showing after the command coredumpctl: TIME PID UID GID SIG COREFILE EXE Sun 2020-11-15 08:40:42 -03 4542 1000 1000 6 present /usr/bin/kwalletd5 Sun 2020-11-15 08:40:42 -03 5207 1000 1000 6 present /usr/libexec/drkonqi Sun 2020-11-15 08:40:42 -03 5206 1000 1000 6 present /usr/libexec/drkonqi Sun 2020-11-15 08:40:42 -03 5270 1000 1000 6 present /usr/bin/kglobalaccel5 Sun 2020-11-15 08:40:42 -03 5222 1000 1000 6 present /usr/libexec/drkonqi Sun 2020-11-15 08:40:43 -03 5343 1000 1000 6 present /usr/bin/kded5 Sun 2020-11-15 08:40:43 -03 5346 1000 1000 6 present /usr/libexec/drkonqi Sun 2020-11-15 08:40:43 -03 5345 1000 1000 6 present /usr/libexec/drkonqi Sun 2020-11-15 08:40:43 -03 5347 1000 1000 6 present /usr/libexec/drkonqi Sun 2020-11-15 08:40:43 -03 5344 1000 1000 6 present /usr/libexec/org_kde_powerdevil Sun 2020-11-15 08:40:43 -03 5216 1000 1000 6 present /usr/libexec/drkonqi Sun 2020-11-15 08:40:44 -03 5233 1000 1000 6 present /usr/libexec/drkonqi Sun 2020-11-15 08:40:44 -03 5217 1000 1000 6 present /usr/libexec/drkonqi Sun 2020-11-15 08:40:44 -03 5244 1000 1000 6 present /usr/libexec/drkonqi Sun 2020-11-15 08:40:44 -03 4939 1000 1000 6 present /usr/bin/akonadi_birthdays_resource Sun 2020-11-15 08:40:44 -03 4940 1000 1000 6 present /usr/bin/akonadi_contacts_resource Sun 2020-11-15 08:40:44 -03 4947 1000 1000 6 present /usr/bin/akonadi_migration_agent Sun 2020-11-15 08:40:44 -03 5221 1000 1000 6 present /usr/libexec/drkonqi Sun 2020-11-15 08:40:44 -03 5218 1000 1000 6 present /usr/libexec/drkonqi Sun 2020-11-15 08:40:44 -03 5238 1000 1000 6 present /usr/libexec/drkonqi
Great news! Crashes after logout could be fixed by patch for this issue: https://bugs.kde.org/show_bug.cgi?id=424408.
Backported upstream fix: https://src.fedoraproject.org/rpms/plasma-workspace/pull-request/9 Please test, merge this PR and rebuild for F33 and Rawhide.
After new plasma-workspace-5.20.3-2.fc33 build: No more coredumps registered with logout/login in Plasma X session, not even kglobalaccel5. :) However, with Plasma Wayland it continues showing a lot of crashes registered on the journal after logout/login procedure (but system is working fine anyway). Here are the coredumps showing after logout/login, only in the Plasma WAYLAND Session: TIME PID UID GID SIG COREFILE EXE Tue 2020-11-17 08:04:43 -03 4859 1000 1000 6 present /usr/bin/kwalletd5 Tue 2020-11-17 08:04:43 -03 6476 1000 1000 6 present /usr/libexec/drkonqi Tue 2020-11-17 08:04:43 -03 6482 1000 1000 6 present /usr/libexec/drkonqi Tue 2020-11-17 08:04:43 -03 6471 1000 1000 6 present /usr/libexec/drkonqi Tue 2020-11-17 08:04:43 -03 6553 1000 1000 6 present /usr/libexec/kdeconnectd Tue 2020-11-17 08:04:43 -03 6563 1000 1000 6 present /usr/bin/kglobalaccel5 Tue 2020-11-17 08:04:43 -03 6480 1000 1000 6 present /usr/libexec/drkonqi Tue 2020-11-17 08:04:43 -03 6472 1000 1000 6 present /usr/libexec/drkonqi Tue 2020-11-17 08:04:44 -03 6649 1000 1000 6 present /usr/libexec/drkonqi Tue 2020-11-17 08:04:44 -03 6647 1000 1000 6 present /usr/libexec/org_kde_powerdevil Tue 2020-11-17 08:04:44 -03 6651 1000 1000 6 present /usr/libexec/drkonqi Tue 2020-11-17 08:04:44 -03 6648 1000 1000 6 present /usr/bin/kded5 Tue 2020-11-17 08:04:44 -03 6650 1000 1000 6 present /usr/libexec/drkonqi Tue 2020-11-17 08:04:44 -03 5284 1000 1000 6 present /usr/bin/akonadi_contacts_resource Tue 2020-11-17 08:04:44 -03 5291 1000 1000 6 present /usr/bin/akonadi_migration_agent Tue 2020-11-17 08:04:44 -03 6485 1000 1000 6 present /usr/libexec/drkonqi Tue 2020-11-17 08:04:44 -03 5282 1000 1000 6 present /usr/bin/akonadi_birthdays_resource Tue 2020-11-17 08:04:45 -03 6538 1000 1000 6 present /usr/libexec/drkonqi Tue 2020-11-17 08:04:45 -03 6498 1000 1000 6 present /usr/libexec/drkonqi Tue 2020-11-17 08:04:45 -03 6483 1000 1000 6 present /usr/libexec/drkonqi Tue 2020-11-17 08:04:45 -03 6473 1000 1000 6 present /usr/libexec/drkonqi Tue 2020-11-17 08:04:45 -03 6495 1000 1000 6 present /usr/libexec/drkonqi Tue 2020-11-17 08:04:45 -03 6499 1000 1000 6 present /usr/libexec/drkonqi Tue 2020-11-17 08:04:45 -03 6536 1000 1000 6 present /usr/libexec/drkonqi Tue 2020-11-17 08:04:45 -03 6486 1000 1000 6 present /usr/libexec/drkonqi Tue 2020-11-17 08:04:45 -03 6481 1000 1000 6 present /usr/libexec/drkonqi Tue 2020-11-17 08:04:46 -03 6805 1000 1000 6 present /usr/libexec/drkonqi
So can we call this bug fixed at this point? Thanks!
Plasma-5.21 support for systemd user sessions resolves this, yes. Currently only in rawhide (f34), but it's working well enough we will likely do for f33 too when the time comes.
well, comments suggested f33 updates improved the situation too?
Taking off the F34 blocker list given Rex's comment.
I'll make sure to test this in F34 and report back.
Hello, I may be out of line for commenting on this thread but I am having issues in F33 and KDE that seem to be fitting this issue. I have not tried some of the suggestions for logging but in both F32 and F33 I have had issues logging out of extra sessions where processes do not stop and in F33 they respawn if I try to just kill them. The only way that I can get the session to end is to kill the sddm-helper process which starts at the login. I use multiple sessions on a daily basis and will open and close extra sessions as required on one computer.
I tested with a default installation of Fedora-KDE-Live-x86_64-34-20210215.n.0.iso, containing: dbus-1.12.20-3.fc34.x86_64 dbus-broker-26-2.fc34.x86_64 kernel-5.11.0-0.rc7.149.fc34.x86_64 kf5-plasma-5.79.0-2.fc34.x86_64 plasma-workspace-5.21.0-1.fc34.x86_64 sddm-0.19.0-7.fc34.x86_64 systemd-247.3-1.fc34.x86_64 When logging in and out using the same user or two different users, KDE processes are crashing right and left, however the desktop seems to stay always usable (sometimes I see 10 seconds of a black screen after invoking the KDE menu, when it crashes, but it always resurrects itself after a while). Overall this problem seems to be resolved in F34. Rex, should we close this or do you want to keep it open for F33?
Here is a different data point: I built all of the plasma 5.79.0 and desktop 5.21.0 rpms using source rpms from koji on FC33 (using mock). Everything is working well except the logout does not work. It freezes the screen and never goes back to sddm. If I do ctrl-alt-bckspace I get the sddm and can login again, which means that there are lingering processes to prevent me from login back in. So, the problem is that sddm is not restarted during the logout process. Any ideas? Thanks!
PS: I did update the sddm rpm as well as well as kde-settings.
The switch to systemd user sessions in F34/Rawhide is a big one, I'm not sure just rebuilding the plasma packages on F33 would work - it might need something else you didn't backport.
I was hoping that someone with a deeper understanding would point it out to me. I did removed and reinstalled sddm, at which time it created a systemd link. Also, is it working under FC34? It does not seem to restart sddm or X (running X11 not Wayland).
My logins all started working after kf5-kglobalaccel-5.80.0-2.
This message is a reminder that Fedora 33 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora 33 on 2021-11-30. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '33'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 33 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged change the 'version' to a later Fedora version prior this bug is closed as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Fedora 33 changed to end-of-life (EOL) status on 2021-11-30. Fedora 33 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.