Bug 1861700 - login stuck when changing users repeatedly (log out, log in a different one)
Summary: login stuck when changing users repeatedly (log out, log in a different one)
Keywords:
Status: VERIFIED
Alias: None
Product: Fedora
Classification: Fedora
Component: sddm
Version: 33
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Rex Dieter
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: https://fedoraproject.org/wiki/Common...
Depends On: 1884696
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-07-29 10:09 UTC by Kamil Páral
Modified: 2021-03-30 21:27 UTC (History)
36 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug


Attachments (Terms of Use)
bug demonstration video (10.05 MB, video/x-matroska)
2020-07-29 10:12 UTC, Kamil Páral
no flags Details
journal (532.46 KB, text/plain)
2020-07-29 10:12 UTC, Kamil Páral
no flags Details
rpm -qa (58.05 KB, text/plain)
2020-07-29 10:13 UTC, Kamil Páral
no flags Details
stuck processes during reboot (10.62 KB, image/png)
2020-08-13 17:17 UTC, Kamil Páral
no flags Details
Systemctl logs with comments (6.80 KB, text/plain)
2020-09-10 15:16 UTC, Sylvia Sánchez
no flags Details
Output of systemd-cgls (8.84 KB, text/plain)
2020-09-16 20:59 UTC, Steven A. Falco
no flags Details
New version with -l flag (10.25 KB, text/plain)
2020-09-16 21:07 UTC, Steven A. Falco
no flags Details
Output for systemd-cgls (11.42 KB, text/plain)
2020-09-20 13:37 UTC, sumantro
no flags Details
full journal, comment 60 (691.53 KB, text/plain)
2020-09-24 03:15 UTC, Chris Murphy
no flags Details
Logs as requested (1.81 MB, application/octet-stream)
2020-09-24 15:58 UTC, Steven A. Falco
no flags Details
Reverse journalctl form my attempt to logout and login again on the same user (78.14 KB, text/plain)
2020-10-13 15:01 UTC, Geraldo Simião
no flags Details
Coredumpctl gdb after the login stuck (11.70 KB, text/plain)
2020-10-13 17:24 UTC, Geraldo Simião
no flags Details
Reverse journalctl with packages kf5-kglobalaccel-5.73.0-2.fc33 (57.12 KB, text/plain)
2020-10-15 14:14 UTC, Geraldo Simião
no flags Details
Coredumpctl gdb after the login stuck (new build kf5-kglobalaccel-5.73.0-2.fc33) (11.54 KB, text/plain)
2020-10-15 14:16 UTC, Geraldo Simião
no flags Details
User service status (from the gui panel- after installation from build kf5-kglobalaccel-5.73.0-2.fc33) (107.85 KB, image/png)
2020-10-15 14:20 UTC, Geraldo Simião
no flags Details
systemctl user status (comand line) using kf5-kglobalaccel-5.73.0-2.fc33 (3.23 KB, text/plain)
2020-10-15 14:22 UTC, Geraldo Simião
no flags Details
Reverse journalctl with build from Adam (kf5-kglobalaccel-5.73.0-1.2aw.fc33.x86_64) (131.68 KB, text/plain)
2020-10-15 17:55 UTC, Geraldo Simião
no flags Details
Reverse journalctl with build kde-settings-33.0-2.fc33 (63.83 KB, text/plain)
2020-10-16 19:45 UTC, Geraldo Simião
no flags Details
Two coredumps debugged from two different logout-login rounds (23.50 KB, text/plain)
2020-10-16 19:48 UTC, Geraldo Simião
no flags Details
Kglobalaccel service running (102.22 KB, image/png)
2020-10-16 19:52 UTC, Geraldo Simião
no flags Details
systemctl--user-status (1.96 KB, text/plain)
2020-10-16 20:15 UTC, Geraldo Simião
no flags Details
systemctl--user-status after successful logout-login (4.24 KB, text/plain)
2020-10-16 20:21 UTC, Geraldo Simião
no flags Details

Description Kamil Páral 2020-07-29 10:09:55 UTC
Description of problem:
On F33 KDE, I can't switch repeatedly between several users. Once I try to log in with a user which has been logged in before (and now is logged out), the screen shows a very long spinning circle and then a black screen. No action is possible at that point. If I manually switch to VT2 and reboot, the reboot is waiting 90 seconds for pending user sessions. There is something stuck when starting those user sessions.

See attached video.

Version-Release number of selected component (if applicable):
sddm-0.18.1-6.fc33.x86_64
kf5-plasma-5.72.0-1.fc33.x86_64
plasma-user-manager-5.19.3-1.fc33.x86_64
Fedora-KDE-Live-x86_64-Rawhide-20200721.n.0.iso

How reproducible:
seems always

Steps to Reproduce:
1. install KDE from Fedora-KDE-Live-x86_64-Rawhide-20200721.n.0.iso
2. create two users - user1 and user2. (I marked user1 as admin and not user2, but that shouldn't be important).
3. reboot
4. log in as user1
5. log out
6. log in as user2
7. log out
8. log in as user1 - see a very log spinning circle and then an endless black screen (with a functioning cursor)

Additional info:
I tried the same procedure without user2, i.e. just log in user1, log out, log in user1 - and that works, even repeatedly. This is somehow related to the session that user2 starts and stops between those steps.

Also please note that I see "Process 1518 (kglobalaccel5) of user 1001 dumped core." and "Process 2215 (kglobalaccel5) of user 1002 dumped core" in the journal. I'm not sure if it is related to the login issue or not. I'll try to report the crash in a separate bug, but abrt currently claims "Error: File './coredump' is not a coredump", so I can't easily report it.

Comment 1 Kamil Páral 2020-07-29 10:12:16 UTC
Created attachment 1702779 [details]
bug demonstration video

Comment 2 Kamil Páral 2020-07-29 10:12:52 UTC
Created attachment 1702780 [details]
journal

Comment 3 Kamil Páral 2020-07-29 10:13:00 UTC
Created attachment 1702781 [details]
rpm -qa

Comment 4 Kamil Páral 2020-07-29 11:19:02 UTC
(In reply to Kamil Páral from comment #0)
> Also please note that I see "Process 1518 (kglobalaccel5) of user 1001
> dumped core." and "Process 2215 (kglobalaccel5) of user 1002 dumped core" in
> the journal. I'm not sure if it is related to the login issue or not. I'll
> try to report the crash in a separate bug, but abrt currently claims "Error:
> File './coredump' is not a coredump", so I can't easily report it.

That is bug 1860616, it seems abrt doesn't support zstd-compressed coredumps currently.

Comment 5 Kamil Páral 2020-07-29 17:16:16 UTC
Proposing as an F33 blocker. The most fitting criterion might be this:
"A system installed with a release-blocking desktop must boot to a log in screen where it is possible to log in to a working desktop using a user account created during installation or a 'first boot' utility."
https://fedoraproject.org/wiki/Basic_Release_Criteria#Expected_installed_system_boot_behavior

The login process works initially, but stops working after following the sequence of actions described in comment 0. If people feel this criterion is not a good candidate, then it's time to refine it or create a new one. We have precise criteria about everything else (log out, user switching, reboot, shutdown, etc), but we don't have a criterion which would say "login must work (every time)". We can add it, if needed.

Comment 6 Rex Dieter 2020-07-29 20:51:51 UTC
Looks like another variant of (largely) well-known bug #1817708 , where processes from previous sessions persist causing subsequent logins to fail.

-1 blocker

Comment 7 Kamil Páral 2020-07-30 11:29:29 UTC
(In reply to Rex Dieter from comment #6)
> Looks like another variant of (largely) well-known bug #1817708 , where
> processes from previous sessions persist causing subsequent logins to fail.
> 
> -1 blocker

I intend to propose all user switching-related bugs as blockers, now that we have a specific criterion for it in F33. This bug might or might not be related, but I'm not sure if it's relevant for the release blocking decision. Bug 1817708 was rejected as a blocker, because we didn't have a user switching criterion at that time. Now we have. If user switching is definitely covered by criteria now, it seems silly not to hold basic user login to at least the same standards. As I said, we can clarify the criteria as needed, if stakeholders agree. What is your rationale for -1 blocker? Thanks.

Comment 8 Kamil Páral 2020-07-30 14:08:03 UTC
(In reply to Kamil Páral from comment #7)
> I intend to propose all user switching-related bugs as blockers, now that we
> have a specific criterion for it in F33. 

Actually, I need to clarify this. I just found out that user switching in no longer available in F33 KDE (not sure whether it is a bug or intentional). As long as it is not offered to the user, I can't of course propose them as blockers. So I'll keep to just user login/logout testing for now. I'll individually report the user switching issues and propose them as blockers, once/if user switching is offered again in KDE.

Comment 9 Rex Dieter 2020-07-30 14:29:23 UTC
My objection was mostly about how to deal with the blocker.  I feel not enough is known about the root cause(s), much less how to diagnose and fix them.  In this specific case in particular, so a user has processes lingering after log out.  What's to blame?  Who's job is it to fix that?  *This* bug was filed against sddm, a component I have large doubts has little to do with the root problem.

Comment 10 Kamil Páral 2020-07-30 14:49:53 UTC
Thanks for reply, Rex. I chose sddm because it's at least partially related, I don't know the root cause and the actual culprit either. I hope that's something that developers can figure out! :-) I'll be happy to provide as many details as I can, and test different approaches to help pinpoint the root cause better. Sddm is just a temporary station here, if this turns out to be systemd-related (or we suspect it could), we can reassign there. We can also move this to "distribution", if you prefer. If this turns out to be a hard nut to crack, we have options how to deal with it, e.g. with "Difficult to fix blocker bugs":
https://fedoraproject.org/wiki/QA:SOP_blocker_bug_process#Exceptional_cases
Or there can be other ways to handle this, but basically the unknown factor (what caused it? how to fix it?) shouldn't affect the blocker decision.

Comment 11 Geoffrey Marr 2020-08-03 17:08:19 UTC
Discussed during the 2020-08-03 blocker review meeting: [0]

The decision to delay the classification of this as a blocker bug was made as we acknowledge that this is a bad bug and perhaps intellectually violates the criteria, however we don’t believe we have a specific criterion to use against this bug at the moment. We will punt for now and send out mail to discuss implementing a new criterion or redefining an existing one.

[0] https://meetbot.fedoraproject.org/fedora-blocker-review/2020-08-03/f33-blocker-review.2020-08-03-16.02.txt

Comment 12 Kamil Páral 2020-08-06 12:32:06 UTC
I've spent more time debugging this. I can reproduce this problem every time, but sometimes the problem occurs on the third login (user1 -> user2 -> user1), and sometimes on the fourth login (user1 -> user2 -> user1 -> user2). So there seems to be some timing or a race condition in play.

I found that I can also reproduce this 'black screen instead of a login' problem with just one user, if I log in and out a sufficient number of times (sometimes 3x, sometimes 10x). It might be the same or a different bug, but it has the same consequence and it's also clearly a race condition.

Then I experimented with KillUserProcesses=Yes in /etc/systemd/logind.conf. After setting this, I can log in three times while alternating users just fine (user1 -> user2 -> user1), but I get a black screen when logging *out* the fourth time (i.e. user1 -> user2 -> user1 -> user2 -> black screen). This is mostly consistent, but I also saw it during the second logout (user1 -> user2 -> black screen). Again, a race condition. On top of that, in this configuration, I also saw a login black screen very rarely when logging in just with user1 repeatedly. So another (same or different) race condition.

Sigh, this is really broken. Whichever systemd configuration and whichever number of users, the login/logout process gets broken if you relogin a sufficient number of times to get a bad dice roll.

Today I tested with:
kernel-5.8.0-1.fc33.x86_64
kf5-plasma-5.72.0-2.fc33.x86_64
plasma-user-manager-5.19.4-2.fc33.x86_64
sddm-0.18.1-7.fc33.x86_64
systemd-246-1.fc33.x86_64
xorg-x11-server-Xorg-1.20.8-3.fc33.x86_64

Comment 13 Geoffrey Marr 2020-08-10 22:18:37 UTC
Discussed during the 2020-08-10 blocker review meeting: [0]

The decision to classify this bug as an "AcceptedBlocker" was made as a violation of the in-progress but accepted-in-principle logout criterion, see thread https://lists.fedoraproject.org/archives/list/test@lists.fedoraproject.org/message/PTRPHWTMVEKLTCMVDBGIWYYMYVIBAJYB/ for more information.

[0] https://meetbot.fedoraproject.org/fedora-blocker-review/2020-08-10/f33-blocker-review.2020-08-10-16.17.txt

Comment 14 Ben Cotton 2020-08-11 13:50:36 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 33 development cycle.
Changing version to 33.

Comment 15 Kamil Páral 2020-08-13 17:17:14 UTC
Created attachment 1711349 [details]
stuck processes during reboot

If I perform the reproducer from comment 0, then switch to TTY3 using Ctrl+Alt+F3 and try to reboot the machine, the shutdown phase waits for "stopping user session <number>" (something along those lines), which lasts for 90 seconds. Then the session is killed and the reboot proceeds. If I get impatient, though, and hit Ctrl+Alt+Del 7 times in a quick succession, I believe it should convince systemd to kill the remaining processes. Instead, it prints a list of processes it's waiting for (and doesn't kill, for some reason). See the screenshot. One of those processes is likely the one hanging and the reason why even a regular reboot doesn't work until the session termination times out.

Comment 16 Adam Williamson 2020-08-24 17:14:25 UTC
Rex, Martin: anything from your side in response to kparal's new info? Is there sufficient info to debug this now? Thanks!

Comment 17 Kamil Páral 2020-08-26 15:38:30 UTC
(In reply to Geoffrey Marr from comment #13)
> The decision to classify this bug as an "AcceptedBlocker" was made as a
> violation of the in-progress but accepted-in-principle logout criterion, see
> thread
> https://lists.fedoraproject.org/archives/list/test@lists.fedoraproject.org/
> message/PTRPHWTMVEKLTCMVDBGIWYYMYVIBAJYB/ for more information.

The new criterion is now live:
https://fedoraproject.org/wiki/Fedora_33_Beta_Release_Criteria#Shutdown.2C_reboot.2C_login.2C_logout

Comment 18 Ben Cotton 2020-09-09 17:13:56 UTC
From rdieter in #fedora-kde: no time, nor aptitude to really debug that properly. I  think we'll have to do with the logind KillUserProcesses=yes hammer

Martin, do you have anything you can add?

Comment 19 Raphael Groner 2020-09-10 05:55:59 UTC
Could the reporter or someone with reproducibility share the process list while logout is delaying? Maybe it's quite obvious with knowlegde of active processes to see ande guess what to blame, I tend to agree with Rex about SDDM as propably _not_ the root cause in first place. What about systemd and its user session handling (wild guess because not aware about what's really running in a reproducible F33 system)?

Comment 20 marcdeop 2020-09-10 08:48:35 UTC
Hello,

I did some digging about this issue some time ago. The problem is completely unrelated to switching users. The problem is that whenever you log out, some processes might linger and then you are not able to log back in.

loginctl list-sessions
loginctl session-status $SESSION_NUMBER

Will help you find out which processes are lingering. If you kill manually those processes, the session will terminate properly and everything will get back to normal :-)

I have the impression that the problem here is the fact that fedora uses dbus-broker. Other distros which don't use dbus-broker do not have this issue we are facing here.

Comment 21 Ben Cotton 2020-09-10 13:23:13 UTC
I have this list from when I was facing a similar issue a week ago. The cause was different (I had a bad shell script that Plasma got hung up on), but the behavior on subsequent login is the same. I had seen similar behavior a while ago and the process list below matches what I recall seeing:

bcotton    13986  0.0  0.0  21412 11392 ?        Ss   10:03   0:00 /usr/lib/systemd/systemd --user
bcotton    13988  0.0  0.0 211932  6224 ?        S    10:03   0:00 (sd-pam)
bcotton    13997  0.0  0.0 667240 15268 ?        S<sl 10:03   0:00 /usr/bin/pulseaudio --daemonize=no
bcotton    14007  0.0  0.0 277560  3820 ?        Ss   10:03   0:00 /usr/bin/dbus-broker-launch --scope user
bcotton    14008  0.3  0.0   7056  4588 ?        S    10:03   0:00 dbus-broker --log 4 --controller 11 --machine-id 7a5e67f4bf324273952fb570d0122c43 --max-bytes 100000000000000 --max-fds 25000000000000 --max-matches 5000000000
bcotton    14052  0.0  0.0  17228  5564 ?        S    10:03   0:00 /usr/libexec/pulse/gconf-helper
bcotton    14053  0.0  0.0 233836  6268 ?        Ss   10:03   0:00 /usr/libexec/gconfd-2
bcotton    14095  0.0  0.0 156108  4972 ?        Ssl  10:03   0:00 /usr/libexec/dconf-service
bcotton    14135  0.0  0.1 268871540 23932 ?     SNl  10:03   0:00 /usr/bin/baloo_file
bcotton    14170  0.0  0.0 444380  4832 ?        Sl   10:03   0:00 /usr/libexec/geoclue-2.0/demos/agent
bcotton    14190  0.0  0.0 305884  5260 ?        Sl   10:03   0:00 /home/bcotton/.config/bluejeans-v2/BluejeansHelper --hidden
bcotton    14198  0.0  0.1 485928 21420 ?        Sl   10:03   0:00 /usr/bin/abrt-applet --gapplication-service
bcotton    14241  0.0  0.0 305336  6548 ?        Ssl  10:03   0:00 /usr/libexec/at-spi-bus-launcher
bcotton    14250  0.0  0.0 268468  3508 ?        S    10:03   0:00 /usr/bin/dbus-broker-launch --config-file=/usr/share/defaults/at-spi2/accessibility.conf --scope user
bcotton    14252  0.0  0.0   5244  2868 ?        S    10:03   0:00 dbus-broker --log 4 --controller 9 --machine-id 7a5e67f4bf324273952fb570d0122c43 --max-bytes 100000000000000 --max-fds 6400000 --max-matches 5000000000
bcotton    14305  0.0  0.0 446888  7116 ?        Ssl  10:03   0:00 /usr/libexec/gvfsd

Comment 22 Raphael Groner 2020-09-10 13:49:11 UTC
Indeed, besides systemd (loginctl) my experiences tell me to look for debugging time delays into another good suspiction that's anything DBUS related.

Comment 23 Raphael Groner 2020-09-10 13:51:07 UTC
dbus-broker should write its own log or some entries to journalctl. Can you share that snippets for relevance as well?

Comment 24 Raphael Groner 2020-09-10 13:55:25 UTC
> Manager responder socket.
Jul 29 11:46:18 localhost.localdomain dbus-broker[699]: Dispatched 9509 messages
@ 14(±21)μs / message.
> Jul 29 11:46:18 localhost.localdomain systemd[1]: Stopping D-Bus System Message

Wow, that's a lot about 9509. ±21 indicates a high fluctuation time of message processing. We may need to know what messages are those.

Comment 25 Kamil Páral 2020-09-10 14:27:12 UTC
(In reply to Ben Cotton from comment #18)
> From rdieter in #fedora-kde: no time, nor aptitude to really debug that
> properly. I  think we'll have to do with the logind KillUserProcesses=yes
> hammer

Please note that I already tested KillUserProcesses=Yes and the results were slightly different, but ultimately the same. See comment 12.

Comment 26 Sylvia Sánchez 2020-09-10 15:16:17 UTC
Created attachment 1714449 [details]
Systemctl logs with comments

I've been doing some testing in my laptop with KDE & SDDM.
Interestingly enough, I didn't have any issues logging out and back in when using Wayland.
The attached file shows Systemctl with some notes for context. Hope it's useful to somebody.
I would like to point out there are two other bug reports that are related to this one, both have been sitting for several release cycles and show the same issue:  SDDM gets stuck, fails or hangs in log out.

Comment 27 Sylvia Sánchez 2020-09-10 15:19:38 UTC
Related bugs: 

F32 KDE can't log out  https://bugzilla.redhat.com/show_bug.cgi?id=1725933

Relogin doesn't work  https://bugzilla.redhat.com/show_bug.cgi?id=1742893

Can't logout from KDE https://bugzilla.redhat.com/show_bug.cgi?id=1847746

Comment 28 Ben Cotton 2020-09-10 19:29:52 UTC
Reassigning to dbus-broker. The consensus in #fedora-kde is that this isn't directly involved in this. The "gut feeling" is that it's related to dbus-broker, but that component's maintainers may have helpful insight either way.

Comment 29 David Rheinsberg 2020-09-11 05:35:10 UTC
(In reply to Raphael Groner from comment #24)
> > Manager responder socket.
> Jul 29 11:46:18 localhost.localdomain dbus-broker[699]: Dispatched 9509
> messages
> @ 14(±21)μs / message.
> > Jul 29 11:46:18 localhost.localdomain systemd[1]: Stopping D-Bus System Message
> 
> Wow, that's a lot about 9509. ±21 indicates a high fluctuation time of
> message processing. We may need to know what messages are those.

Both of these look normal. Message transmission is usually quick enough that a single task preemption or flushed caches cause the transmission time to take twice as long.

> Reassigning to dbus-broker. The consensus in #fedora-kde is that this isn't directly involved in this. The "gut feeling" is that it's related to dbus-broker, but that component's maintainers may have helpful insight either way.

If you want to know whether this is an issue with the integration of dbus-broker, you can easily verify with:

```
dnf install -y dbus-daemon
systemctl disable dbus-broker
systemctl enable dbus-daemon
systemctl --global disable dbus-broker
systemctl --global enable dbus-daemon
```

This will make your system use dbus-daemon, rather than dbus-broker. Note that after those commands, you might be unable to reboot via normal means, since the dbus-units will point to the wrong service. `reboot -ff` should do the trick.

Comment 30 marcdeop 2020-09-11 09:47:39 UTC
(In reply to David Rheinsberg from comment #29)
> If you want to know whether this is an issue with the integration of
> dbus-broker, you can easily verify with:
> 
> ```
> dnf install -y dbus-daemon
> systemctl disable dbus-broker
> systemctl enable dbus-daemon
> systemctl --global disable dbus-broker
> systemctl --global enable dbus-daemon
> ```

Thanks for the tip.

I have tried this and, unfortunately, I still see the same pattern

Login, wait 10 secs, logout: my session doesn't terminate properly and I have baloo_file, geoclue-2.0 and abrt-applet running.

Notice that I have also tried with GDM instead of SDDM and see the same.

Back to square one...

Comment 31 Benjamin Berg 2020-09-12 21:00:19 UTC
In bug #1817708 it is notable that logind does not show any message that the the session has completely died (https://bugzilla.redhat.com/attachment.cgi?id=1673873), i.e.

Mar 26 20:05:34 localhost.localdomain systemd-logind[789]: Session 2 logged out. Waiting for processes to exit.
Mar 26 20:05:34 localhost.localdomain systemd-logind[789]: New session c2 of user sddm.
Mar 26 20:05:44 localhost.localdomain systemd-logind[789]: New session 5 of user jim.
Mar 26 20:05:44 localhost.localdomain systemd-logind[789]: Session c2 logged out. Waiting for processes to exit.
Mar 26 20:05:44 localhost.localdomain systemd-logind[789]: Removed session c2.

i.e. there is no "Removed session 2" in that log.

Which suggests that some process is still lingering in the session scope. Has anyone checked using systemd-cgls to see if the old session quits properly or whether there are processes left? I am kind of guessing that a process that does not need X11/wayland may be surviving for longer. And this *might* keep the session open, also preventing the systemd user instance (and with it dbus) to shut down 10s after logout. All speculation though :)

GNOME has this weird thing were we restart dbus at logout time. That is more of a hack, but it does get rid of dbus services surviving the session (we only do that to kill evolution-data-server which may take up a lot of memory).


As for switching to dbus-daemon, you would probably want to do the same for the user manager, i.e. you probably need to repeat the commands with --user added to the systemctl calls.

Comment 32 David Rheinsberg 2020-09-14 07:18:08 UTC
(In reply to Benjamin Berg from comment #31)
> As for switching to dbus-daemon, you would probably want to do the same for
> the user manager, i.e. you probably need to repeat the commands with --user
> added to the systemctl calls.

No need. The `--global` switch does exactly that but for all users (a bit of a misnomer, but we are stuck with it).

Comment 33 Benjamin Berg 2020-09-14 07:34:34 UTC
Oh! Thanks for correcting me. Didn't realise that there is a difference between --user and --global, which makes a lot of sense, even if the naming might not be that intuitive.

Comment 34 Fedora Update System 2020-09-14 19:47:23 UTC
FEDORA-2020-5b486e191b has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2020-5b486e191b

Comment 35 Rex Dieter 2020-09-14 19:49:08 UTC
Submitted kf5-baloo update,
https://bodhi.fedoraproject.org/updates/FEDORA-2020-5b486e191b

This one should stop at least one of the errant processes (baloo_file) persisting after session logout.

Comment 36 Sammy 2020-09-15 12:54:19 UTC
I have been having this problem forever too, very annoying. I think kf5-baloo helped. When I select logout from KDE plasma menu nothing happens. However, now if I restart the X-server via ctrl-alt-backspace I can login again via sddm. In the past that would hang requiring a restart of the computer. After issuing the logout I switched the console, logged in as root
and look at processes running under my username. There are quiet a few. I had no apps running.
   PID TTY          TIME CMD
   1572 ?        00:00:00 systemd
   1582 ?        00:00:00 (sd-pam)
   1594 ?        00:00:00 pulseaudio
   1875 ?        00:00:00 dbus-broker-lau
   1876 ?        00:00:00 dbus-broker
   1877 ?        00:00:00 gconf-helper
   1878 ?        00:00:00 gconfd-2
   2140 ?        00:00:00 imsettings-daem
   2143 ?        00:00:00 gvfsd
   2148 ?        00:00:00 gvfsd-fuse
   2349 ?        00:00:00 kded5
   2355 ?        00:00:00 kwin_x11
   2356 ?        00:00:00 kactivitymanage
   2365 ?        00:00:00 kglobalaccel5
   2373 ?        00:00:00 dconf-service
   2429 ?        00:00:00 xembedsniproxy
   2431 ?        00:00:00 kaccess
   2448 ?        00:00:00 gmenudbusmenupr
   2530 ?        00:00:00 at-spi-bus-laun
   2538 ?        00:00:00 dbus-broker-lau
   2550 ?        00:00:00 dbus-broker
   2552 ?        00:00:00 kscreen_backend
   2842 ?        00:00:00 agent
   2850 ?        00:00:00 zeitgeist-datah
   2855 ?        00:00:00 org_kde_powerde
   2868 ?        00:00:00 tracker-miner-r
   2874 ?        00:00:00 zeitgeist-daemo
   2876 ?        00:00:04 tracker-miner-f
   2920 ?        00:00:00 zeitgeist-fts
   2951 ?        00:00:00 gvfs-udisks2-vo
   3004 ?        00:00:00 gvfs-afc-volume
   3346 ?        00:00:00 at-spi2-registr
   3459 ?        00:00:00 kio_http_cache_
   3692 ?        00:00:00 kdeinit5
   3693 ?        00:00:00 klauncher
   3696 ?        00:00:00 file.so

Comment 37 Rex Dieter 2020-09-15 13:50:38 UTC
Now that we have dbus maintainer ears, in the past, xinit ran the snippet on login:
eval `dbus-launch --sh-syntax --exit-with-session`

and I figure when the "parent" xinit process ended, so did dbus.  

Now that dbus user session (effectively) starts on demand, how is dbus expected to end/stop on session logout?

Comment 38 Fedora Update System 2020-09-15 14:53:34 UTC
FEDORA-2020-5b486e191b has been pushed to the Fedora 33 testing repository.
In short time you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-5b486e191b`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-5b486e191b

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 39 Benjamin Berg 2020-09-15 16:23:43 UTC
> Now that dbus user session (effectively) starts on demand, how is dbus expected to end/stop on session logout?

Not a dbus maintainer. dbus will be stopped together with the systemd user instance. Which generally means 10s after the last user session quits (UserStopDelaySec= in logind.conf). So it should be expected that dbus does not quit between two logins.

The real problem are probably processes left over in the session scope. This is why I suggested running

$ systemd-cgls

and checking session-X.scope (of the old session). Those are the processes that prevent logind from cleaning up the session completely. It is much more likely they are causing issues than anything that is launched by DBus (visible somewhere underneath user@X.service). I think most of the processes listed in the above ps output fall into that category.

Comment 40 marcdeop 2020-09-15 20:58:42 UTC
Fedora KDE fresh installation on a VM, login as a user, wait a few seconds, logout, wait a few seconds and trigger the `systemd-cgls` and `loginctl session-status` commands

Output of `systemd-cgls -u session-2.scope`:

Unit session-2.scope (/user.slice/user-1000.slice/session-2.scope):
├─1211 /usr/bin/baloo_file
├─1241 /usr/bin/abrt-applet --gapplication-service
└─1243 /usr/libexec/geoclue-2.0/demos/agent

Equivalent to `loginctl session-status 2`:

2 - marc.deop (1000)
           Since: Tue 2020-09-15 22:47:03 CEST; 6min ago
          Leader: 1047
            Seat: seat0; vc1
         Display: :0
         Service: sddm; type x11; class user
         Desktop: KDE
           State: closing
            Unit: session-2.scope
                  ├─1211 /usr/bin/baloo_file
                  ├─1241 /usr/bin/abrt-applet --gapplication-service
                  └─1243 /usr/libexec/geoclue-2.0/demos/agent

Sep 15 22:50:06 localhost.localdomain kded5[1137]: QDBusAbstractAdaptor: Cannot relay signal KDEDModule::moduleDeleted(KDEDModule*): Pointers are not supported: KDEDModule*
Sep 15 22:50:06 localhost.localdomain kded5[1137]: QDBusAbstractAdaptor: Cannot relay signal KDEDModule::moduleDeleted(KDEDModule*): Pointers are not supported: KDEDModule*
Sep 15 22:50:06 localhost.localdomain kded5[1137]: QDBusAbstractAdaptor: Cannot relay signal KDEDModule::moduleDeleted(KDEDModule*): Pointers are not supported: KDEDModule*
Sep 15 22:50:06 localhost.localdomain kded5[1137]: QDBusAbstractAdaptor: Cannot relay signal KDEDModule::moduleDeleted(KDEDModule*): Pointers are not supported: KDEDModule*
Sep 15 22:50:06 localhost.localdomain kded5[1137]: QDBusAbstractAdaptor: Cannot relay signal KDEDModule::moduleDeleted(KDEDModule*): Pointers are not supported: KDEDModule*
Sep 15 22:50:06 localhost.localdomain kded5[1137]: QDBusAbstractAdaptor: Cannot relay signal KDEDModule::moduleDeleted(KDEDModule*): Pointers are not supported: KDEDModule*
Sep 15 22:50:06 localhost.localdomain startplasma-x11[1072]: QProcess: Destroyed while process ("ksplashqml") is still running.
Sep 15 22:50:06 localhost.localdomain sddm-helper[1047]: pam_unix(sddm:session): session closed for user marc.deop
Sep 15 22:50:06 localhost.localdomain sddm-helper[1047]: pam_kwallet5(sddm:session): pam_kwallet5: pam_sm_close_session
Sep 15 22:50:06 localhost.localdomain sddm-helper[1047]: pam_kwallet5(sddm:setcred): pam_kwallet5: pam_sm_setcred

Comment 41 Adam Williamson 2020-09-15 22:29:25 UTC
Kamil, can you see if the baloo update helps your case at all?

Comment 42 Sammy 2020-09-15 23:12:03 UTC
Did not help me. Logout ends with a black screen with active cursor (waiting for somehing). After ctrl-alt-backspace can login again.

Comment 43 Fedora Update System 2020-09-16 13:11:54 UTC
FEDORA-2020-5b486e191b has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2020-5b486e191b

Comment 44 Rex Dieter 2020-09-16 14:06:06 UTC
My first try (kf5-baloo-5.73.0-2) was full of fail, please try kf5-baloo-5.73.0-3.fc33

Comment 45 Sylvia Sánchez 2020-09-16 14:34:50 UTC
In my case, the change in dbus didn't help.  Using Ctrl+Alt+Backspace doesn't help either, it logs out and then it's stuck in a black screen with active cursor and nothing else.  IT's unresponsive and the only way to get out is by pressing the power button (force shutdown).  
How can I try the update on Baloo?  Maybe that works.

Comment 46 Fedora Update System 2020-09-16 18:56:00 UTC
FEDORA-2020-5b486e191b has been pushed to the Fedora 33 testing repository.
In short time you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-5b486e191b`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-5b486e191b

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 47 Steven A. Falco 2020-09-16 20:59:47 UTC
Created attachment 1715130 [details]
Output of systemd-cgls

Comment 48 Steven A. Falco 2020-09-16 21:02:29 UTC
I set up a VM with Fedora 33 and added the kf5-baloo -3 update.  After 4 or 5 logout/login sequences, I wound up with a black screen.  I attached the output of systemd-cgls as attachment 1715130 [details].

Comment 49 Steven A. Falco 2020-09-16 21:07:48 UTC
Created attachment 1715131 [details]
New version with -l flag

Noticed that the first version of the attachment had its width truncated, so here is one with -l to show the full text.

Comment 50 sumantro 2020-09-20 13:32:47 UTC
(In reply to Steven A. Falco from comment #48)
> I set up a VM with Fedora 33 and added the kf5-baloo -3 update.  After 4 or
> 5 logout/login sequences, I wound up with a black screen.  I attached the
> output of systemd-cgls as attachment 1715130 [details].

I got the same and this is still reproducible

Comment 51 sumantro 2020-09-20 13:37:17 UTC
Created attachment 1715462 [details]
Output for systemd-cgls

Comment 52 Benjamin Berg 2020-09-20 14:25:39 UTC
To avoid confusion. As I understand it, this bug is about a subsequent *login* failing. If *logout* is failing sometimes, then it might be a different issue altogether.

Comment 53 Steven A. Falco 2020-09-20 15:12:15 UTC
Benjamin - I agree with and understand your description of the bug.

As I reported, after 4 or 5 repetitions of logging out, _then_ logging in, I wind up at a black screen.  In other words, that last login attempt does not complete properly, and I don't see the desktop - I just see a black screen.

Comment 54 Sylvia Sánchez 2020-09-21 09:37:29 UTC
In my case it never works, it gets stuck in the first attempt of logging out.

Comment 55 Adam Williamson 2020-09-21 19:41:12 UTC
Assigning back to sddm as testing seemed to indicate quite strongly that dbus-broker isn't the issue here.

Comment 56 Fedora Update System 2020-09-21 20:48:47 UTC
FEDORA-2020-5b486e191b has been pushed to the Fedora 33 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 57 marcdeop 2020-09-22 12:40:00 UTC
(In reply to Adam Williamson from comment #55)
> Assigning back to sddm as testing seemed to indicate quite strongly that
> dbus-broker isn't the issue here.

I don't think it's an SDDM problem as I see the same behaveour with GDM :'(

Comment 58 Kamil Páral 2020-09-22 12:50:36 UTC
(In reply to marcdeop from comment #57)
> I don't think it's an SDDM problem as I see the same behaveour with GDM :'(

I tested GDM on F33 and saw nothing like that happening. If you can reproduce it on F33 (not F32), please file a separate bug report and link it here, I'll look at it. So far, I believe this a problem somewhere in the KDE stack and there's no such universal and trivially reproducible issue happening with GNOME.

Comment 59 Kamil Páral 2020-09-22 12:51:40 UTC
Setting back to ASSIGNED due to comment 48 and comment 50.

Comment 60 Chris Murphy 2020-09-24 03:12:55 UTC
The user sessions are not being killed off, it must surely be related...

$ loginctl session-status 10
10 - user2 (1001)
           Since: Wed 2020-09-23 20:52:50 MDT; 3min 33s ago
          Leader: 2680
            Seat: seat0; vc1
         Display: :0
         Service: sddm; type x11; class user
         Desktop: KDE
           State: closing
            Unit: session-10.scope
                  ├─2909 /usr/libexec/geoclue-2.0/demos/agent
                  └─2922 /usr/bin/abrt-applet --gapplication-service

When I login user1, logout, login user2, logout, login user1 and get the hang and black screen, I see the following red text items in the journal


[  806.445412] sddm-helper[3727]: gkr-pam: unable to locate daemon control file
...
[  833.013871] kded5[3834]: Couldn't start kglobalaccel from org.kde.kglobalaccel.service: QDBusError("org.freedesktop.DBus.Error.NoReply", "Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.")
[  833.021354] kactivitymanagerd[3859]: Couldn't start kglobalaccel from org.kde.kglobalaccel.service: QDBusError("org.freedesktop.DBus.Error.NoReply", "Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.")
[  833.035365] kwin_x11[3838]: Couldn't start kglobalaccel from org.kde.kglobalaccel.service: QDBusError("org.freedesktop.DBus.Error.NoReply", "Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.")
[  833.058151] ksmserver[3866]: Couldn't start kglobalaccel from org.kde.kglobalaccel.service: QDBusError("org.freedesktop.DBus.Error.NoReply", "Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.")

...
[  883.025155] kaccess[3898]: Couldn't start kglobalaccel from org.kde.kglobalaccel.service: QDBusError("org.freedesktop.DBus.Error.NoReply", "Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.")
[  883.026404] plasmashell[3900]: Couldn't start kglobalaccel from org.kde.kglobalaccel.service: QDBusError("org.freedesktop.DBus.Error.NoReply", "Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.")
...
[  908.028328] org_kde_powerdevil[3971]: Couldn't start kglobalaccel from org.kde.kglobalaccel.service: QDBusError("org.freedesktop.DBus.Error.NoReply", "Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.")

The user sessions keep accumulating...

$ loginctl
SESSION  UID USER  SEAT  TTY  
     10 1001 user2 seat0      
     13 1000 chris seat0      
      2 1000 chris seat0      
      9 1000 chris       pts/0


This session isn't even responding to 'loginctl kill-session 13' nor 'loginctl terminate-session 13'

$ loginctl session-status 13
13 - chris (1000)
           Since: Wed 2020-09-23 20:57:26 MDT; 7min ago
          Leader: 3727
            Seat: seat0; vc1
         Display: :0
         Service: sddm; type x11; class user
         Desktop: KDE
           State: closing
            Unit: session-13.scope
                  └─3838 /usr/bin/kwin_x11

'kill -9 3838' does finally kill the session, but doesn't bring back the login window

Next tried:
systemctl isolate multi-user.target
systemctl graphical.target

This does bring back the login window, but login in as either user fails, with the same messages in red as before. So something in the environment is sufficiently confused that it needs a reboot to get back to a login state.

Comment 61 Chris Murphy 2020-09-24 03:15:12 UTC
Created attachment 1716198 [details]
full journal, comment 60

Comment 62 Chris Murphy 2020-09-24 03:50:17 UTC
The problem is reproducible if I flip users:  first login to user 2, logout, login user1, logout, then login user2 fails with hang.

Comment 63 Benjamin Berg 2020-09-24 07:39:24 UTC
                  ├─2909 /usr/libexec/geoclue-2.0/demos/agent
                  └─2922 /usr/bin/abrt-applet --gapplication-service

I think those two processes do not open a connection to the display server. Which means that they will survive unless the desktop environment kills them at logout time (something that should be fixed by the systemd work that got merged upstream last week \o/).


[  437.246844] systemd[1196]: dbus-:1.2-org.kde.kglobalaccel@1.service: Succeeded.
...
[  806.472598] systemd-logind[838]: New session 13 of user chris.
[  806.473290] systemd[1]: Started Session 13 of user chris.
...
[  807.068215] systemd[1196]: Started dbus-:1.2-org.kde.KSplash@1.service.
[  807.081646] ksplashqml[3792]: QCoreApplication::arguments: Please instantiate the QApplication object first
[  807.127226] systemd[1196]: dbus-:1.2-org.kde.KSplash@1.service: Succeeded.
[  807.206229] klauncher[3810]: Connecting to deprecated signal QDBusConnectionInterface::serviceOwnerChanged(QString,QString,QString)
[  807.375863] plasma_session[3801]: org.kde.plasma.session: process job  "kcminit_startup" finished with exit code  0
[  807.789431] systemd[1196]: Started dbus-:1.2-org.kde.ActivityManager@1.service.
[  808.056183] ksmserver[3866]: Qt: Session management error: networkIdsList argument is NULL
...
[  833.013871] kded5[3834]: Couldn't start kglobalaccel from org.kde.kglobalaccel.service: QDBusError("org.freedesktop.DBus.Error.NoReply", "Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.")


So, that seems odd. This is a timeout from the DBus server (see src/kglobalaccel.cpp). But I don't understand how that could happen. If the service was running, DBus should reply immediately. If not, it should start the corresponding transient unit and we should see that in the log (i.e. there should be "systemd[1196]: Started dbus-:1.2-org.kde.kglobalaccel@2.service.", the @2 as it is the second time it gets launched by that dbus server).

But neither appears to happen, which suggests that dbus-broker is stuck trying to activate the service but never does so.

Comment 64 Benjamin Berg 2020-09-24 09:41:45 UTC
Talked a bit with David Edmundson, and he pointed out that a kglobalaccel crash might make it wait for a coredump to be collected. But it is not clear whether that is relevant.

So, there are two things that might help pinpointing this further:
 1. Get a ps/systemd-cgls output after logout and/or at the time that the next login hangs.
 2. Get a "dbus-monitor --session" (can be grabbed from a different VT or via SSH)

Maybe that gives some more hints as to why service activation is simply not happening.


Other than that, there *might* be the following race condition in dbus-broker:
 1. First login logs out
 2. kglobalaccel crashes

[  437.050657] kglobalaccel5[1346]: The X11 connection broke (error 1). Did the X11 server die?
[  437.054968] systemd[1196]: dbus-:1.2-org.kde.kglobalaccel@0.service: Succeeded.

 3. Something requests kglobalaccel (maybe just automatic dbus activation?)

[  437.092048] systemd[1196]: Started dbus-:1.2-org.kde.kglobalaccel@1.service.

 4. It fails immediately, as there is no X server anymore.
    Assumption: It does *not* register its name on the bus

[  437.113714] kglobalaccel5[2524]: qt.qpa.xcb: could not connect to display :0
[  437.246844] systemd[1196]: dbus-:1.2-org.kde.kglobalaccel@1.service: Succeeded.
[  437.294378] systemd-coredump[2530]: Process 2524 (kglobalaccel5) of user 1000 dumped core.

 5. This *could* mean that dbus-broker gets confused. The service *has* started from systemds point of view, but the name never appeared on the bus.
 6. Next login, dbus-broker sees its own old activation request (activation->pending == TRUE), and figures it just needs to wait for the existing activation to complete.

Comment 65 Benjamin Berg 2020-09-24 13:08:06 UTC
OK, easier test. Install

  https://koji.fedoraproject.org/koji/taskinfo?taskID=52153402

then post the log after triggering the bug. I have added a log message for activation, which means for each request you'll see:

  dbus-broker[X]: Activation request for $SERVICE queuing
or
  dbus-broker[X]: Activation request for $SERVICE already pending


If the theory that I floated in the last comment is correct, then you will see a number of

  dbus-broker[X]: Activation request for org.kde.kglobalaccel already pending

messages in the log, but no message with "queuing". Which would prove that dbus-broker really is still waiting on the kglobalaccel5 process that had already crashed during the last logout.

Comment 66 Steven A. Falco 2020-09-24 15:58:53 UTC
Created attachment 1716354 [details]
Logs as requested

I ran the requested experiment, and I've attached a tar of the log files.

Here is how they are named:

files with "before_any_login" in their name are from before any graphical login

files with "in_1" in their name are from after the first login

files with "out_1" in their name are from after the first logout

The naming pattern repeats.  On the third attempt to log in, I got the black screen.  Those logs have "in_3" in their names.

I do see a bunch of "Activation request ... already pending" messages in the messages_in_3.log file.  Most are from kglobalaccel, but one is from bluez.

If you need any different data or experiments run, please let me know.

Comment 67 Benjamin Berg 2020-09-24 16:24:59 UTC
Hmm, I think that confirms it, fishing out relevant stuff from the log:

Sep 24 11:44:17 localhost systemd-logind[808]: Session 7 logged out. Waiting for processes to exit.
Sep 24 11:44:17 localhost kglobalaccel5[1581]: The X11 connection broke (error 1). Did the X11 server die?
Sep 24 11:44:17 localhost systemd[918]: dbus-:1.2-org.kde.kglobalaccel@1.service: Succeeded.
Sep 24 11:44:17 localhost dbus-broker[961]: Activation request for org.kde.kglobalaccel queuing

So, session is closing already, kglobalaccel5 dies, then something re-activates it immediately. And it tries to start up:

Sep 24 11:44:17 localhost systemd[918]: Started dbus-:1.2-org.kde.kglobalaccel@2.service.
Sep 24 11:44:17 localhost kglobalaccel5[2207]: qt.qpa.xcb: could not connect to display :0
Sep 24 11:44:17 localhost kglobalaccel5[2207]: qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "" even though it was found.
Sep 24 11:44:17 localhost kglobalaccel5[2207]: This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.#012#012Available platform plugins are: eglfs, linuxfb, minimal, minimalegl, offscreen, vnc, xcb.
Sep 24 11:44:18 localhost systemd[918]: dbus-:1.2-org.kde.kglobalaccel@2.service: Succeeded.

But fails. Then, later:

Sep 24 11:44:43 localhost systemd-logind[808]: New session 9 of user sfalco.
Sep 24 11:44:43 localhost systemd[1]: Started Session 9 of user sfalco.
...
Sep 24 11:44:45 localhost dbus-broker[961]: Activation request for org.kde.kglobalaccel already pending

And no "queuing" message anywhere after the one that failed. So dbus-broker still thinks (after 28 seconds) that kglobalaccel is starting up.

----

Now, the question is how to solve it. I think there is a dbus-broker bug, it should notice that services are failing without ever registering on the bus[1].

I personally suspect the best short term solution is to adopt the hack that GNOME has. i.e. restart the dbus systemd service at logout time. That might even get rid of those two processes that are holding on in the session scope.


[1] Adding BusName= to the transient unit would be the most elegant solution. However, template units are used, so if an old failed unit is still loaded then systemd might start complaining. So one might need to watch the transient service until the name is registered on the bus.

Comment 68 Zbigniew Jędrzejewski-Szmek 2020-09-24 16:56:21 UTC
I'm pretty sure there multiple overlapping bugs that make this so hard to diagnose. We need to fix them
one by one to chip away at the problem.

I tried to reproduce this by creating a pristine new user on F33 and
'dnf group install "KDE (K Desktop Environment)" --enablerepo=updates-testing',
logging in and logging out.

The right time to look at the process list is *after* logging out, and not when the subsequent
login fails. Once we try to log in again, new processes are hopelessly mixed with the old ones.

I get a failure to log out cleanly (lingering processes) in maybe 1/3 of attempts.
Interestingly, the hang is not indefinite, but seems semi-random. Sometimes the session
terminates on its own after a few seconds, sometimes a few minutes.

--

(In reply to Benjamin Berg from comment #39)
> The real problem are probably processes left over in the session scope. This
> is why I suggested running
> 
> $ systemd-cgls
> 
> and checking session-X.scope (of the old session). Those are the processes
> that prevent logind from cleaning up the session completely. It is much more
> likely they are causing issues than anything that is launched by DBus
> (visible somewhere underneath user@X.service). I think most of the processes
> listed in the above ps output fall into that category.

Yep. Any processes that are left over in the session scope session-NN.scope are already a problem.
Anything under user@X.service would only need to be considered once we eliminate the session.

$ loginctl session-status 40
40 - kde (1004)
	   Since: Thu 2020-09-24 17:56:40 CEST; 1min 23s ago
	  Leader: 288889 (gdm-session-wor)
	    Seat: seat0; vc7
	     TTY: tty7
	 Service: gdm-password; type x11; class user
	   State: online
	    Unit: session-40.scope
		  ├─288889 gdm-session-worker [pam/gdm-password]
		  ├─288927 /usr/bin/gnome-keyring-daemon --daemonize --login
		  ├─288949 /usr/libexec/gdm-x-session --register-session --run-script /usr/bin/startplasma-x11
		  ├─288951 /usr/libexec/Xorg vt7 -displayfd 3 -auth /run/user/1004/gdm/Xauthority -nolisten tcp -background none -noreset -keeptty -novtswitch -verbose 3
		  ├─289046 /usr/bin/startplasma-x11
		  ├─289085 /usr/bin/ssh-agent /bin/sh -c exec -l /bin/bash -c "/usr/bin/startplasma-x11"
		  ├─289146 /usr/libexec/kf5/start_kdeinit
		  ├─289147 kdeinit5: Running...
		  ├─289148 /usr/libexec/kf5/klauncher --fd=9
		  ├─289168 /usr/bin/kded5
		  ├─289172 /usr/bin/kwin_x11

--

(In reply to Kamil Páral from comment #12)
> Then I experimented with KillUserProcesses=Yes in /etc/systemd/logind.conf.
(In reply to Benjamin Berg from comment #31)
> In bug #1817708 it is notable that logind does not show any message that the
> the session has completely died
> (https://bugzilla.redhat.com/attachment.cgi?id=1673873), i.e.
> 
> Mar 26 20:05:34 localhost.localdomain systemd-logind[789]: Session 2 logged
> out. Waiting for processes to exit.
> Mar 26 20:05:34 localhost.localdomain systemd-logind[789]: New session c2 of
> user sddm.
> Mar 26 20:05:44 localhost.localdomain systemd-logind[789]: New session 5 of
> user jim.
> Mar 26 20:05:44 localhost.localdomain systemd-logind[789]: Session c2 logged
> out. Waiting for processes to exit.
> Mar 26 20:05:44 localhost.localdomain systemd-logind[789]: Removed session
> c2.
> 
> i.e. there is no "Removed session 2" in that log.

Right. At least for me, after I click "exit", logind thinks the session is still active: "State: online".
This means that KillUserProcesses= does not matter: systemd would only kill processes *after*
the session has finished.

Ultimately, I think that this is a problem with the kde session: the process
controlling the session does not exit. Things go south from there.

--

(In reply to Chris Murphy from comment #60)
> $ loginctl
> SESSION  UID USER  SEAT  TTY  
>      10 1001 user2 seat0      
>      13 1000 chris seat0      
>       2 1000 chris seat0      
>       9 1000 chris       pts/0
> 
> 
> This session isn't even responding to 'loginctl kill-session 13' nor
> 'loginctl terminate-session 13'
> 
> $ loginctl session-status 13
> 13 - chris (1000)
>            Since: Wed 2020-09-23 20:57:26 MDT; 7min ago
>           Leader: 3727
>             Seat: seat0; vc1
>          Display: :0
>          Service: sddm; type x11; class user
>          Desktop: KDE
>            State: closing
>             Unit: session-13.scope
>                   └─3838 /usr/bin/kwin_x11

The first clear bug that I see is that /usr/bin/kwin_x11 does not respond to SIGTERM.
If processes don't exit on SIGTERM but just hang around, we have problems.
Chris's trace above has this, and I also saw kwin_x11 as the lone remaining process in the session scope.

strace while doing 'kill 289172' three times with a few seconds inbetween and than finally SIGKILL:

restart_syscall(<... resuming interrupted read ...>) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=290235, si_uid=0} ---
rt_sigreturn({mask=[]})                 = -1 EINTR (Interrupted system call)
poll([{fd=3, events=POLLIN}], 1, -1)    = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=290247, si_uid=0} ---
rt_sigreturn({mask=[]})                 = -1 EINTR (Interrupted system call)
poll([{fd=3, events=POLLIN}], 1, -1)    = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=290258, si_uid=0} ---
rt_sigreturn({mask=[]})                 = -1 EINTR (Interrupted system call)
poll([{fd=3, events=POLLIN}], 1, -1 <unfinished ...>) = ?
+++ killed by SIGKILL +++

I didn't capture the stack for this process. But in another hang which looked similar:
$ cat pstack.282115
Thread 2 (Thread 0x7fb93ffff640 (LWP 282128)):
#0  0x00007fb94b46db0e in ppoll () at /lib64/libc.so.6
#1  0x00007fb94b9f7ef0 in qt_safe_poll(pollfd*, unsigned long, timespec const*) () at /lib64/libQt5Core.so.5
#2  0x00007fb94b9f92b3 in QEventDispatcherUNIX::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () at /lib64/libQt5Core.so.5
#3  0x00007fb94b9af74b in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () at /lib64/libQt5Core.so.5
#4  0x00007fb94b81e2f7 in QThread::exec() () at /lib64/libQt5Core.so.5
#5  0x00007fb94c977a47 in QDBusConnectionManager::run() () at /lib64/libQt5DBus.so.5
#6  0x00007fb94b81f4cc in QThreadPrivate::start(void*) () at /lib64/libQt5Core.so.5
#7  0x00007fb94acd53f9 in start_thread () at /lib64/libpthread.so.0
#8  0x00007fb94b478b03 in clone () at /lib64/libc.so.6
Thread 1 (Thread 0x7fb945b3e0c0 (LWP 282115)):
#0  0x00007fb94acdb6c2 in pthread_cond_wait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
#1  0x00007fb94b824e6b in QWaitCondition::wait(QMutex*, QDeadlineTimer) () at /lib64/libQt5Core.so.5
#2  0x00007fb94c9c42a2 in QDBusPendingCallPrivate::waitForFinished() () at /lib64/libQt5DBus.so.5
#3  0x00007fb94c9c53c2 in QDBusPendingReplyData::argumentAt(int) const () at /lib64/libQt5DBus.so.5
#4  0x00007fb949eb8e2e in QDBusPendingReply<QList<int>, void, void, void, void, void, void, void>::Select<0>::Type const QDBusPendingReply<QList<int>, void, void, void, void, void, void, void>::argumentAt<0>() const () at /lib64/libKF5GlobalAccel.so.5
#5  0x00007fb949ebddd1 in KGlobalAccelPrivate::updateGlobalShortcut(QAction*, QFlags<KGlobalAccelPrivate::ShortcutType>, KGlobalAccel::GlobalShortcutLoading) () at /lib64/libKF5GlobalAccel.so.5
#6  0x00007fb949ebe200 in KGlobalAccel::setShortcut(QAction*, QList<QKeySequence> const&, KGlobalAccel::GlobalShortcutLoading) () at /lib64/libKF5GlobalAccel.so.5
#7  0x00007fb94cd29109 in void KWin::Workspace::initShortcut<KWin::Workspace, void (KWin::Workspace::*)()>(QString const&, QString const&, QKeySequence const&, KWin::Workspace*, void (KWin::Workspace::*)(), QVariant const&) [clone .constprop.0] () at /lib64/libkwin.so.5
#8  0x00007fb94ccb0ddb in KWin::Workspace::initShortcuts() () at /lib64/libkwin.so.5
#9  0x00007fb94cccfac0 in KWin::Workspace::Workspace() () at /lib64/libkwin.so.5
#10 0x00007fb94cc48ad0 in KWin::Application::createWorkspace() () at /lib64/libkwin.so.5
#11 0x0000561595197db5 in QtPrivate::QFunctorSlotObject<KWin::ApplicationX11::performStartup()::{lambda()#2}::operator()() const::{lambda()#1}, 0, QtPrivate::List<>, void>::impl(int, QtPrivate::QSlotObjectBase*, QObject*, void**, bool*) ()
#12 0x00007fb94b9df356 in void doActivate<false>(QObject*, int, void**) () at /lib64/libQt5Core.so.5
#13 0x00005615951985b6 in QtPrivate::QFunctorSlotObject<KWin::ApplicationX11::performStartup()::{lambda()#2}, 0, QtPrivate::List<>, void>::impl(int, QtPrivate::QSlotObjectBase*, QObject*, void**, bool*) ()
#14 0x00007fb94b9df356 in void doActivate<false>(QObject*, int, void**) () at /lib64/libQt5Core.so.5
#15 0x00007fb94c8a1013 in KSelectionOwner::Private::claimSucceeded() () at /lib64/libKF5WindowSystem.so.5
#16 0x00007fb94c8a6d01 in KSelectionOwner::filterEvent(void*) () at /lib64/libKF5WindowSystem.so.5
#17 0x00007fb94b9adfec in QAbstractEventDispatcher::filterNativeEvent(QByteArray const&, void*, long*) () at /lib64/libQt5Core.so.5
#18 0x00007fb9452fd001 in QXcbConnection::handleXcbEvent(xcb_generic_event_t*) () at /lib64/libQt5XcbQpa.so.5
#19 0x00007fb9452fe516 in QXcbConnection::processXcbEvents(QFlags<QEventLoop::ProcessEventsFlag>) () at /lib64/libQt5XcbQpa.so.5
#20 0x00007fb94532102c in QXcbUnixEventDispatcher::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () at /lib64/libQt5XcbQpa.so.5
#21 0x00007fb94b9af74b in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () at /lib64/libQt5Core.so.5
#22 0x00007fb94b9b7110 in QCoreApplication::exec() () at /lib64/libQt5Core.so.5
#23 0x0000561595196f83 in main ()

I don't know this framework, but my guess is that it installs a signal handler that sets some flag
to exit, but it's not integrated with the event loop (QEventLoop?).

Various people are seeing kwin_x11 as the culprit process. (I think there might be other issues,
but this one seems common and obvious.) I think this should be reassigned to kwin, though the
issue may ultimately lie in the framework.

kwin-5.19.5-2.fc33.x86_64
qt5-qtbase-5.15.1-1.fc33.x86_64

Comment 69 Benjamin Berg 2020-09-24 18:24:52 UTC
Interesting. I think you are right that kwin_x11 isn't properly shutting down. In the case of your stack trace kwin_x11 seems to be processing some last X11 messages before the X server was shutting down for good.

The stack trace looks like a sync dbus call to me (running a nested mainloop, but only serving the DBus socket). So kwin_x11 will likely quit 25s later when the call times out.

#5  0x00007fb949ebddd1 in KGlobalAccelPrivate::updateGlobalShortcut(QAction*, QFlags<KGlobalAccelPrivate::ShortcutType>, KGlobalAccel::GlobalShortcutLoading) () at /lib64/libKF5GlobalAccel.so.5

And, the dbus call seems related to the kglobalaccel service. So that would explain why that services is being restarted.

Comment 70 Zbigniew Jędrzejewski-Szmek 2020-09-24 18:41:01 UTC
So... 25s wouldn't be too bad, iff things were "blocked". It would be an annoying timeout, nothing more. But instead we try to start
a concurrent session, and then the remnants of the old session prevent the new one from starting. So we really need to fix
this both ways: first figure out why it's not quitting in a timely fashion (your direction with kglobalaccel seems relevant here),
but second we need to make sure the new session is not started before the old one exits, so the whole thing is robust is
something blocks exit.

Comment 71 Benjamin Berg 2020-09-24 19:23:26 UTC
Note though, that I expect KDE in F34 to switch to using systemd, and systemd is pretty good at making sure things shut down cleanly.

David Edmundson mentioned to me that he probably forgot to install a service file for kglobalaccel. Doing that is trivial and it would prevent dbus-broker from getting confused as we would have a proper BusName= in the unit file then.

Comment 72 Geoffrey Marr 2020-09-24 19:33:58 UTC
Discussed during the 2020-09-24 Fedora 33 Go/No-Go meeting: [0]

The decision to waive this bug's existing classification as an "AcceptedBlocker (Beta)" in the Go/No-Go decision was made under the "Difficult to fix blocker bugs" exception due to the difficulty in isolating the cause.

[0] https://meetbot-raw.fedoraproject.org/fedora-meeting-1/2020-09-24/f33-beta-go_no_go-meeting.2020-09-24-17.00.txt

Comment 73 David Rheinsberg 2020-09-25 06:18:55 UTC
(In reply to Benjamin Berg from comment #67)
> Now, the question is how to solve it. I think there is a dbus-broker bug, it
> should notice that services are failing without ever registering on the
> bus[1].
> 
> I personally suspect the best short term solution is to adopt the hack that
> GNOME has. i.e. restart the dbus systemd service at logout time. That might
> even get rid of those two processes that are holding on in the session scope.
> 
> 
> [1] Adding BusName= to the transient unit would be the most elegant
> solution. However, template units are used, so if an old failed unit is
> still loaded then systemd might start complaining. So one might need to
> watch the transient service until the name is registered on the bus.

DBus messages have no timeouts. Any actual timeouts you see in client libraries are implemented in the client library. The broker never sees them (which, btw., leaks "reply-windows" in the broker). Hence, activation-requests from a client will never "time-out". They will simply remain pending until the activation fails. This does not preclude the broker from maintaining its own timeouts, but there is no standardized way to communicate these timeouts in a non-racy manner.

When you activate a service, dbus-broker will start the service specified by the dbus-service-file, or create a transient unit to execute the specified binary. In both cases, the only way to cause the activation to fail is to make this binary or service fail. If they return success, there is no way for dbus-broker to know what other things are going on, or what to track (just imagine one-shot units, you cannot know what side-effects they have).

If you want dbus-broker to be notified of a failed service activation, I highly suggest returning an error to the broker. A very simple solution is to provide your own service file, rather than relying on the transient unit, and then using `Type=dbus` as the systemd service type (we cannot do this for transient units, since they do not necessarily run on a bus that is tracked by systemd; think at-spi and friends).

I am open to suggestions, but I doubt another layer of timeouts would improve on anything :-/

Comment 74 Kamil Páral 2020-09-25 10:13:26 UTC
(In reply to Geoffrey Marr from comment #72)
> Discussed during the 2020-09-24 Fedora 33 Go/No-Go meeting: [0]
> 
> The decision to waive this bug's existing classification as an
> "AcceptedBlocker (Beta)" in the Go/No-Go decision was made under the
> "Difficult to fix blocker bugs" exception due to the difficulty in isolating
> the cause.
> 
> [0]
> https://meetbot-raw.fedoraproject.org/fedora-meeting-1/2020-09-24/f33-beta-
> go_no_go-meeting.2020-09-24-17.00.txt

If a blocker is waived according to an exception rule, it's moved to the very next milestone per policy. Setting as a Final blocker.

Comment 75 Benjamin Berg 2020-09-25 10:46:54 UTC
> I am open to suggestions, but I doubt another layer of timeouts would improve on anything :-/

Hmm, yeah, it is ugly situation. You are probably right and there is no existing mechanism to solve the issue reliably other than timeouts (e.g. fd passing to the spawned process). In this case I was first imagining to watch the transient unit for failures (until it owns the name), but even that makes assumptions that may not be true.


Trying to summarize the situation:
 * I think the proper fix will happen with the systemd change, but that is for F34 only
 * The simplest short term fix is to ship systemd service units for the DBus activatable services (kglobalaccel specifically, but ideally all of them)

Shipping those units (and updating the dbus service files), should improve the situation a lot:

 * It will fix detection of the failed dbus service startup for kglobalaccel
 * This in turn works around kwin_x11 hanging during the dbus call
 * Which should fix kwin_x11 quitting, even if it is not graceful

Things will still be mess during shutdown (services randomly failing), but at least the state afterwards should become a lot cleaner. And, there may still be other issues, but it seems like it should fix a scenario that has been coming up quite frequently in this bug.

Comment 76 Sylvia Sánchez 2020-09-27 09:24:43 UTC
Last time I tested in a VM, Rawhide 33, it worked just fine. Log out and log in. Right now I tested it again, with KDE Spin beta and... well, kind of works. Log out is fine, but log in leaves me in a blank screen with only a moving cursor. Keyboard unresponsive and no other way out but forcing power off.

Comment 77 Adam Williamson 2020-10-04 17:13:49 UTC
Can someone please try implementing Benjamin's suggestion from #c75? It seems like a sensible idea to me. Rex, can you do that?

Comment 78 Geraldo Simião 2020-10-08 13:28:02 UTC
I have the same issue here, exactly behavior as described on Comment #76 By Sylvia on my F33 kde (instaled form the Fedora-KDE-Live-x86_64-33_Beta-1.3.iso).

Comment 79 Geraldo Simião 2020-10-13 14:56:22 UTC
I tried this on my setup without user2; just log in user1, log out, log in user1 - and the login stuck, everytime.

OS: Fedora 33 (KDE Plasma Prerelease) x86_64
Host: Aspire V3-571 V2.11
DE: Plasma 5.19.5
KDE Frameworks: 5.73.0
Qt: 5.15.1
WM: KWin 
WM Theme: Breeze 
Theme: Breeze Dark [Plasma], Adwaita [GTK2] 

Extract of the journalctl -r command right after I tried to login again (full journalctl-r attached to this) - host tamgamandapio/ user (only user on this setup) geraldo

out 13 11:32:35 tangamandapio systemd[1]: systemd-hostnamed.service: Succeeded.
out 13 11:32:31 tangamandapio ksmserver[2088]: Couldn't start kglobalaccel from org.kde.kglobalaccel.service: QDBusError("org.freedesktop.DBus.Error.NoReply", "Did no>
out 13 11:32:31 tangamandapio kwin_x11[2065]: Couldn't start kglobalaccel from org.kde.kglobalaccel.service: QDBusError("org.freedesktop.DBus.Error.NoReply", "Did not>
out 13 11:32:31 tangamandapio kded5[2061]: Couldn't start kglobalaccel from org.kde.kglobalaccel.service: QDBusError("org.freedesktop.DBus.Error.NoReply", "Did not re>
out 13 11:32:31 tangamandapio kactivitymanagerd[2081]: Couldn't start kglobalaccel from org.kde.kglobalaccel.service: QDBusError("org.freedesktop.DBus.Error.NoReply",>
out 13 11:32:15 tangamandapio systemd[1]: user-990.slice: Consumed 1.331s CPU time.
out 13 11:32:15 tangamandapio systemd[1]: Removed slice User Slice of UID 990.

Comment 80 Geraldo Simião 2020-10-13 15:01:55 UTC
Created attachment 1721211 [details]
Reverse journalctl form my attempt to logout and login again on the same user

Reverse journalctl from the time I logout from the first session until the moment I needed to enter a tty3 session for reboot the system because the new login get stuck (only one user - geraldo).

Comment 81 Adam Williamson 2020-10-13 16:56:30 UTC
When you say you "tried this", what is "this"? I don't think you should have cancelled my needinfo for Rex, I was asking him to implement a packaging change for testing.

Comment 82 Geraldo Simião 2020-10-13 17:24:48 UTC
Created attachment 1721259 [details]
Coredumpctl gdb after the login stuck

Comment 83 Geraldo Simião 2020-10-13 17:31:38 UTC
(In reply to Adam Williamson from comment #81)
> When you say you "tried this", what is "this"? I don't think you should have
> cancelled my needinfo for Rex, I was asking him to implement a packaging
> change for testing.

Sorry Adam, I cancelled your needinfo by mistake, I didn't wanted to do that. :(
I hope I do it right this time.
 
The "this" was just login, logout and new login on the same user.

Comment 84 Adam Williamson 2020-10-14 20:01:35 UTC
So since no-one else seems willing to try and implement Ben's idea, I did. Though I'm no dbus expert so I don't really know if I did it right.

Here's the scratch build for testing:

https://koji.fedoraproject.org/koji/taskinfo?taskID=53469223

it adds a /usr/lib/systemd/system/kf5-kglobalaccel.service with this content:

===

[Unit]
Description=KGlobalaccel

[Service]
Type=dbus
BusName=org.kde.kglobalaccel
ExecStart=/usr/bin/kglobalaccel5

===

and adds a single line to /usr/share/dbus-1/services/org.kde.kglobalaccel.service:

SystemdService=kf5-kglobalaccel

that's all it does. Benjamin, if it needs more than that, please let me know. Thanks.

Comment 85 Rex Dieter 2020-10-14 21:07:47 UTC
Thanks Adam, with that much to start with, I can test and refine things.

As is, looks like that's not enough or right, I'm ending up with a session with no kglobalaccel5 running.  I think it has to be user/kglobalacces.service instead.  Testing theory now.

Comment 86 Adam Williamson 2020-10-14 21:12:12 UTC
oh yes, that sounds logical. I'll try it too.

Comment 87 Rex Dieter 2020-10-14 21:16:47 UTC
Confirmed, that worked to get kglobalaccel running properly in my session, but logout/login continues to get stuck.

loginctl session-status (still) reports some stuff lingering including
State: closing
Unit: session-2.scope:
... /usr/libexec/geoclue-2.0-demos/agent
... /usr/sbin/abrt-applet --gapplication-service 


progress?

Comment 88 Benjamin Berg 2020-10-14 21:22:56 UTC
Yeah, I think we need to give up the idea of getting rid of those lingering processes for F33. It isn't nice, but at least those two shouldn't be inhibiting the next login.

So hard to tell. You at least shouldn't see the messages that something is waiting for kglobalaccel at next login anymore. i.e.:

out 13 11:32:31 tangamandapio ksmserver[2088]: Couldn't start kglobalaccel from org.kde.kglobalaccel.service: QDBusError("org.freedesktop.DBus.Error.NoReply", "Did no>

Maybe check if you see similar messages for another service?

Comment 89 Rex Dieter 2020-10-14 21:30:30 UTC
Unfortunately, I'm seeing the same stuff in journalctl --user output after a stuck login:

kaccess: Couldn't start kglobalaccess from org.kde.kglobalaccess.service ...
plasmashell: Couldn't start kglobalaccess from org.kde.kglobalaccess.service ...
org_kde_powerdevil : Couldn't start kglobalaccess from org.kde.kglobalaccess.service ...

Comment 90 Rex Dieter 2020-10-14 21:31:30 UTC
Well, I'll get an official kf5-kglobalaccel build with this in , I don't think it will hurt.

Comment 91 Adam Williamson 2020-10-14 21:40:46 UTC
Unless I'm getting really lucky, the fix does seem to be helping me. I've just done 10 straight login / logout cycles without getting stuck.

Did you reboot after updating, Rex?

Comment 92 Adam Williamson 2020-10-14 21:41:24 UTC
that is, with https://koji.fedoraproject.org/koji/taskinfo?taskID=53475309 , which puts the unit in /user not /system as Rex suggested.

Comment 93 Benjamin Berg 2020-10-14 21:47:44 UTC
Hmm, oh, Rex may have run into the systemd restart limit.

i.e. if you were quicker than 10s, then systemd might be refusing to try starting the service again. The simple (and reasonable) fix for that would be to run "systemctl --user reset-failed" at login time before starting everything else.

Comment 94 Rex Dieter 2020-10-14 21:56:34 UTC
rebooted many times, I'll make sure to wait at least 10 seconds for subsquent tests,  build going here,
https://koji.fedoraproject.org/koji/taskinfo?taskID=53477863

Only minor change from what Adam had done was to use kglobalaccel.service name (instead of kf5-kglobalaccel)

Comment 95 Rex Dieter 2020-10-14 22:03:14 UTC
Still no luck, (sorry for the spam), login stuck even after waiting 30+ seconds and issuing systemctl --user reset-failed

Maybe what I committed is wrong, please review for errors:
systemd kglobalaccel.service (#1861700)
https://src.fedoraproject.org/rpms/kf5-kglobalaccel/c/75acba304693c7252b3f9cbe9fad255d0f1b6d7f?branch=f33

import kglobalaccel.service
https://src.fedoraproject.org/rpms/kf5-kglobalaccel/c/0030ad7eed9b5e9013e81cd8154afbb1bed9b7b0?branch=f33

Comment 96 Benjamin Berg 2020-10-14 22:17:41 UTC
Hmm, seems like those commits failed to build possibly due to a missing build-dependency on systemd-rpm-macros? Not sure what you actually tested now.

btw. you can always run "systemctl --user" or "systemctl --user status kglobalaccel.service" to check. If it worked, you'll see the service up and running there.

Comment 97 Rex Dieter 2020-10-14 22:25:40 UTC
Sorry for the build error, new one(s) going,
https://koji.fedoraproject.org/koji/buildinfo?buildID=1625458

Happily for my local build, 
systemctl --user status kglobalaccel.service
reports
Active: active (running)

Comment 98 Adam Williamson 2020-10-14 22:44:41 UTC
Rex: I think the .service in:

echo 'SystemdService=kglobalaccel.service' >> %{buildroot}%{_datadir}/dbus-1/services/org.kde.kglobalaccel.service

may be wrong. Or at least, I didn't have it in my build. My line is:

echo "SystemdService=kf5-%{framework}" >> %{buildroot}%{_datadir}/dbus-1/services/org.kde.kglobalaccel.service

note, no .service there.

Comment 99 Adam Williamson 2020-10-14 22:51:20 UTC
oh, I do see lots of existing files in that directory use the .service, though. So it's probably fine. :/

Comment 100 Geraldo Simião 2020-10-15 14:12:15 UTC
(In reply to Rex Dieter from comment #97)
> Sorry for the build error, new one(s) going,
> https://koji.fedoraproject.org/koji/buildinfo?buildID=1625458
> 
> Happily for my local build, 
> systemctl --user status kglobalaccel.service
> reports
> Active: active (running)

Hello Rex,

I tried now with the new build you made (kf5-kglobalaccel-5.73.0-2.fc33) and the problem persists on my setup.
I will attach the journalctl -r file to the bug, the coredumpctl dbg and the systemctl user status too.

Here's a small part of the journalctl:

   out 15 10:21:31 tangamandapio plasma_session[2210]: org.kde.plasma.session: Starting autostart service  "/etc/xdg/autostart/polkit-kde-authentication-agent-1.desktop">
   out 15 10:21:31 tangamandapio plasma_session[2210]: org.kde.plasma.session: Starting autostart service  "/etc/xdg/autostart/xembedsniproxy.desktop" ("/usr/bin/xembeds>
   out 15 10:21:06 tangamandapio kactivitymanagerd[2257]: Couldn't start kglobalaccel from org.kde.kglobalaccel.service: QDBusError("org.freedesktop.DBus.Error.NoReply",>
   out 15 10:21:06 tangamandapio kwin_x11[2241]: Couldn't start kglobalaccel from org.kde.kglobalaccel.service: QDBusError("org.freedesktop.DBus.Error.NoReply", "Did not>
   out 15 10:21:06 tangamandapio kded5[2237]: Couldn't start kglobalaccel from org.kde.kglobalaccel.service: QDBusError("org.freedesktop.DBus.Error.NoReply", "Did not re>
   out 15 10:21:06 tangamandapio ksmserver[2264]: Couldn't start kglobalaccel from org.kde.kglobalaccel.service: QDBusError("org.freedesktop.DBus.Error.NoReply", "Did no>
   out 15 10:20:50 tangamandapio systemd[1104]: run-user-990.mount: Succeeded.
   out 15 10:20:41 tangamandapio ksmserver[2264]: Qt: Session management error: networkIdsList argument is NULL
   out 15 10:20:41 tangamandapio systemd[1104]: Started dbus-:1.2-org.kde.ActivityManager@1.service.

Comment 101 Geraldo Simião 2020-10-15 14:14:47 UTC
Created attachment 1721848 [details]
Reverse journalctl with packages kf5-kglobalaccel-5.73.0-2.fc33

Comment 102 Geraldo Simião 2020-10-15 14:16:10 UTC
Created attachment 1721849 [details]
Coredumpctl gdb after the login stuck (new build kf5-kglobalaccel-5.73.0-2.fc33)

Comment 103 Geraldo Simião 2020-10-15 14:20:11 UTC
Created attachment 1721850 [details]
User service status (from the gui panel- after installation from build kf5-kglobalaccel-5.73.0-2.fc33)

Comment 104 Geraldo Simião 2020-10-15 14:22:02 UTC
Created attachment 1721855 [details]
systemctl user status (comand line) using kf5-kglobalaccel-5.73.0-2.fc33

Comment 105 Rex Dieter 2020-10-15 15:23:37 UTC
So baring miracles, if we want this fixed for GA, I think we'll still have to go with the logind KillUserProcesses=yes workaround for kde spin.

Comment 106 Adam Williamson 2020-10-15 15:45:04 UTC
Just out of curiosity, Geraldo, can you try with my https://koji.fedoraproject.org/koji/taskinfo?taskID=53475309 instead? You'll probably need to use dnf --downgrade as it'll be lower versioned than Rex's.

Rex, didn't we already establish above that KillUserProcesses didn't help?

Comment 107 Ben Cotton 2020-10-15 16:18:14 UTC
(In reply to Rex Dieter from comment #105)
> So baring miracles, if we want this fixed for GA, I think we'll still have
> to go with the logind KillUserProcesses=yes workaround for kde spin.

I'd rather ship with this bug than introduce surprising behavior this late in the process.

Comment 108 Benjamin Berg 2020-10-15 16:22:26 UTC
So, seems like the dbus servers (both dbus-broker *and* dbus-daemon) will not try to activate the service again. dbus-daemon seems to do so after 120s, dbus-broker probably never. I don't get that really, there is no problem with bombarding systemd with activation requests …

Anyway, I now propose the following fix:

/etc/xdg/plasma-workspace/shutdown/99-restart-dbus.sh
"""
#!/bin/sh

# Restart DBus as shutdown may not be clean leaving failed activation requests that prevent the next login
systemctl --user restart dbus.service
"""

Comment 109 Sammy 2020-10-15 16:37:39 UTC
I think I tried the "logind KillUserProcesses=yes" at some point, it works BUT it also killed my nohup jobs so it was unacceptable.

Comment 110 Rex Dieter 2020-10-15 17:11:52 UTC
Comment #108

Good idea, I'll try that.  restart rather than stop ?

Comment 111 Rex Dieter 2020-10-15 17:29:59 UTC
the initial suggestion led to logout getting stuck, but I suspect that's because dbus got pulled out from under it.  This seems better though,

---------------------
#!/bin/sh

# Restart DBus as shutdown may not be clean leaving failed activation requests that prevent the next login
(sleep 15 && systemctl --user stop dbus.service) &
---------------------

waiting a bit first, then doing the deed.  With this, dbus does get stopped.
The old session and some processes still linger (see below), but at least it's no longer preventing subsequent logins.

Benjamin (or anyone), can you clarify if 'stop' or 'restart' is preferable here, and if this looks agreeable, I'll stuff this into packaging somewhere (probably kde-settings-plasma)


--------------------------------

these processes remain after logout:

rdieter     1014  0.3  0.3  25048 14884 ?        Ss   12:16   0:00 /usr/lib/systemd/systemd --user
rdieter     1016  0.0  0.1  50364  5716 ?        S    12:16   0:00 (sd-pam)
rdieter     1266  0.0  0.1 523124  5180 ?        Sl   12:17   0:00 /usr/libexec/geoclue-2.0/demos/agent


and loginctl session-status output:
 2 - rdieter (1000)
           Since: Thu 2020-10-15 12:16:57 CDT; 2min 46s ago
          Leader: 1009
            Seat: seat0; vc1
         Display: :0
         Service: sddm; type x11; class user
         Desktop: KDE
           State: closing
            Unit: session-2.scope
                  └─1266 /usr/libexec/geoclue-2.0/demos/agent

Comment 112 Benjamin Berg 2020-10-15 17:34:11 UTC
You really should not be stopping dbus, so "restart" is the way to go!

Not sure about waiting 15s, I see the point of trying to delay it till everything else is finished, but 15s seems longer than reasonable to me.

Comment 113 Geraldo Simião 2020-10-15 17:38:58 UTC
(In reply to Adam Williamson from comment #106)
> Just out of curiosity, Geraldo, can you try with my
> https://koji.fedoraproject.org/koji/taskinfo?taskID=53475309 instead? You'll
> probably need to use dnf --downgrade as it'll be lower versioned than Rex's.
> 
> Rex, didn't we already establish above that KillUserProcesses didn't help?


Hi Adam,

I tried your build right now and it works!!!
I don't get stucked anymore on the login. I tried four times logoff and login and it worked fine. 

But instead, now the keyboard shortcuts doesn't work anymore. :(


[geraldo@tangamandapio ~]$ systemctl --user status kf5-kglobalaccel.service
● kf5-kglobalaccel.service - KGlobalaccel
     Loaded: loaded (/usr/lib/systemd/user/kf5-kglobalaccel.service; static)
     Active: inactive (dead)
[geraldo@tangamandapio ~]$ 


journalctl -r (if you want it I can attach the full journalctl from this last boot)

out 15 14:09:49 tangamandapio systemd[1104]: Started dbus-:1.2-ca.desrt.dconf@0.service.
out 15 14:09:49 tangamandapio systemd[1104]: Created slice dbus\x2d:1.2\x2dca.desrt.dconf.slice.
out 15 14:09:49 tangamandapio systemd[1104]: Started dbus-:1.2-org.kde.ActivityManager@0.service.
out 15 14:09:49 tangamandapio systemd[1104]: Created slice dbus\x2d:1.2\x2dorg.kde.ActivityManager.slice.
out 15 14:09:49 tangamandapio dbus-broker-launch[1146]: Activation request for 'org.kde.kglobalaccel' failed.
out 15 14:09:49 tangamandapio dbus-broker-launch[1146]: Activation request for 'org.kde.kglobalaccel' failed.
out 15 14:09:49 tangamandapio dbus-broker-launch[1146]: Activation request for 'org.kde.kglobalaccel' failed.
out 15 14:09:49 tangamandapio kded5[1207]: Couldn't start kglobalaccel from org.kde.kglobalaccel.service: QDBusError("org.freedesktop.DBus.Error.ServiceUnknown", "Cou>
out 15 14:09:49 tangamandapio dbus-broker-launch[1146]: Activation request for 'org.kde.kglobalaccel' failed.


[geraldo@tangamandapio ~]$ rpm -qa| grep  kf5-kglobalaccel
kf5-kglobalaccel-libs-5.73.0-1.2aw.fc33.x86_64
kf5-kglobalaccel-5.73.0-1.2aw.fc33.x86_64
kf5-kglobalaccel-debuginfo-5.73.0-1.2aw.fc33.x86_64
kf5-kglobalaccel-libs-debuginfo-5.73.0-1.2aw.fc33.x86_64
[geraldo@tangamandapio ~]$

Comment 114 Benjamin Berg 2020-10-15 17:47:18 UTC
Well, to be fair, dbus is socket activated. So "stop" is also fine, as the service will be restarted automatically when anything tries to use it. GNOME is doing a restart, which is why I proposed that.

Comment 115 Geraldo Simião 2020-10-15 17:55:54 UTC
Created attachment 1721884 [details]
Reverse journalctl with build from Adam (kf5-kglobalaccel-5.73.0-1.2aw.fc33.x86_64)

With this build - kf5-kglobalaccel-5.73.0-1.2aw.fc33.x86_64 I managed to logout and login again many times, sucessfully.
 
But now keybord shortcuts doesn't work anymore.

Comment 116 Adam Williamson 2020-10-15 18:22:48 UTC
Ah. So that sounds like my build doesn't so much "work" as it just...doesn't run kglobalaccel at all any more, for some reason? Which would tie in with the results we both saw, but isn't the right way to fix the bug...

Comment 117 Adam Williamson 2020-10-15 23:35:08 UTC
So I think what we can do here is this:

* Create an update with Benjamin's proposed "restart dbus on logout" hammer
* Do a candidate compose with it to see how it goes
* Don't push it stable unless that looks good

Could you do that, Rex? As long as we don't push the update stable, the change is trivially reversible.

Comment 118 Rex Dieter 2020-10-16 15:31:20 UTC
I'll work on it today.

Comment 119 Fedora Update System 2020-10-16 16:06:04 UTC
FEDORA-2020-f5be904258 has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2020-f5be904258

Comment 120 Rex Dieter 2020-10-16 16:09:34 UTC
Details about what was committed,
https://src.fedoraproject.org/rpms/kde-settings/c/34266bc9b362b3db9f0ef5c7e4a3d0154a02f8a9

Comment 121 Adam Williamson 2020-10-16 16:12:52 UTC
Thanks Rex!

Kamil, Geraldo - can you test?

Comment 122 Rex Dieter 2020-10-16 16:14:48 UTC
FYI, the shutdown script (still) allows some time, 8 seconds, before executing the user dbus restart hack.  So you'll need to wait at least that long during logout/login tests.

Comment 123 Vitaly Zaitsev 2020-10-16 17:01:07 UTC
Why not let systemd kill dbus service automatically on user logout?

Comment 124 Rex Dieter 2020-10-16 17:59:40 UTC
Re: comment #123

How is that supposed to work?  Isn't that the point of this bug, that is apparently not happening by itself, or is there something else that needs to happen to trigger that?

Comment 125 Geraldo Simião 2020-10-16 19:42:47 UTC
(In reply to Adam Williamson from comment #121)
> Thanks Rex!
> 
> Kamil, Geraldo - can you test?

Adam, I just have tested these new builds from Rex:
kde-settings-33.0-2.fc33.noarch.rpm  
kde-settings-plasma-33.0-2.fc33.noarch.rpm  
kde-settings-pulseaudio-33.0-2.fc33.noarch.rpm  
qt-settings-33.0-2.fc33.noarch.rpm

I made five rounds logout and login again. It only get stucked one time, the other four attempts have been successfully completed!! :D

I have two coredumps from the kglobalaccel5 (debugg attached), but they occurred on two successfull logins.
After the complete logout I have waited at least 10 seconds for each new login attempt.

I wil try to continue that, to see if get stucked again.

Comment 126 Geraldo Simião 2020-10-16 19:45:19 UTC
Created attachment 1722149 [details]
Reverse journalctl with build  kde-settings-33.0-2.fc33

Reverse Journalctl from a successful logout-login attempt.

Comment 127 Geraldo Simião 2020-10-16 19:48:04 UTC
Created attachment 1722150 [details]
Two coredumps debugged from two different logout-login rounds

After update from the packages kde-settings-33.0-2.fc33.
Two coredumpctl gdb outputs: both from kglobalaccel5
(succesful login)

Comment 128 Geraldo Simião 2020-10-16 19:52:17 UTC
Created attachment 1722152 [details]
Kglobalaccel service running

Comment 129 Geraldo Simião 2020-10-16 20:15:10 UTC
Created attachment 1722165 [details]
systemctl--user-status

Comment 130 Geraldo Simião 2020-10-16 20:21:00 UTC
Created attachment 1722166 [details]
systemctl--user-status after successful logout-login

Comment 131 Geraldo Simião 2020-10-16 20:40:30 UTC
Done 10 more logout-login rounds again. All successful here. All ended on a new working session.
But for each one it causes a coredump on /usr/bin/kglobalaccel5

Comment 132 Fedora Update System 2020-10-16 22:18:08 UTC
FEDORA-2020-f5be904258 has been pushed to the Fedora 33 testing repository.
In short time you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-f5be904258`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-f5be904258

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 133 Benjamin Berg 2020-10-17 09:59:03 UTC
The coredump for kglobalaccel5 is expected with the current workaround (i.e. we don't prevent the crash, we prevent it from breaking the next login). So you'll see the report at the next login.

Trying to fix/work around that might be possible, but maybe it is better to move that into a different issue to discuss it there. It only causes an annoying message that is safe to ignore.

Comment 134 Kamil Páral 2020-10-19 10:00:57 UTC
(In reply to Rex Dieter from comment #120)
> Details about what was committed,
> https://src.fedoraproject.org/rpms/kde-settings/c/
> 34266bc9b362b3db9f0ef5c7e4a3d0154a02f8a9

I've only tested this on a single-user system so far. And I believe this is a wrong way to go. Before this patch, I can do 5 successful and immediate logouts&logins of the *same user* in a row without any issues. After this patch, I haven't seen a single case where it would work. In my 5 attempts, I got 3 black screens during a subsequent login (no user action possible) and 2 desktops with a "plasma crashed" icon in the systray during which system actions were nonfunctional (sleep, reboot and poweroff were completely missing from the system menu, and lock and logout were present but did nothing when clicked on). Reproducer:

1. A freshly installed KDE system with only stable updates and the update from comment 119, only a single user account created.
2. Reboot.
3. Log in. Do some work, like opening a terminal, etc.
4. Log out.
5. Log in immediately again (don't wait 8 seconds).

Note that I intentionally didn't wait for the 8 seconds timeout, because the login screen doesn't show that timer anywhere and another login is not prevented, i.e. the user is not aware of any need to wait. This is a use case e.g. when you add yourself to a libvirt or mock group and are told to logout and login again. Or when something breaks in your desktop and you want it fixed.

I don't think it makes sense to try to fix a multi-user scenario and break a single-user scenario in the process.

Comment 135 Benjamin Berg 2020-10-19 10:13:19 UTC
Kamil, this problem has nothing to do with single/multi user scenarios (from a theoretical poitn of view, but I also reproduced it with only one user).

So, either you run into the race condition where kglobalaccel crashes, or you do not. Other people regularly hit the situation and logging back may *always* break for them unless rebooted, which is a lot worse than breaking an immediate re-login within a few seconds.

Comment 136 Kamil Páral 2020-10-19 12:04:06 UTC
(In reply to Benjamin Berg from comment #135)
> Kamil, this problem has nothing to do with single/multi user scenarios (from
> a theoretical poitn of view, but I also reproduced it with only one user).

Right, I guess I already established that in comment 12 paragraph 2, but I forgot.

When I tested the single user login/logout scenario, and I honored the secret 8 seconds delay in sddm, I didn't see any issues and I could consistently log in and log out repeatedly.

But then I followed up with multi-user tests. With the patch applied, I'm hitting a black screen on the 2nd or 4th logout, every time I test this (I tested it 5 times, with full reboots in between, and I did honor the 8 seconds delay in sddm). So the symptoms are the same as mentioned in comment 12 paragraph 3 (that time I tested KillUserProcesses=Yes).

Judging the likelihood of hitting a race condition is something I gladly leave to others :-) But at least in my experience, the patch doesn't fix the multi-user problem at all (the black screen just moves from login to logout) and it consistently breaks a single user logout+login if you're faster than 8 seconds in sddm. I understand the patch can improve certain cases, I just don't hit those often enough to be able to test them.

Comment 137 Geoffrey Marr 2020-10-19 17:49:23 UTC
Discussed during the 2020-10-19 blocker review meeting: [0]

The decision to waive this bugs classification as an "AcceptedBlocker (F33 Final)" and instead classify it as an "AcceptedBlocker (F34 Beta)" and an "AcceptedFreezeException (F33 Final)" was made under the "Difficult to fix blocker bugs" proviso at https://fedoraproject.org/wiki/QA:SOP_blocker_bug_process#Exceptional_cases. We have tried multiple approaches to fix this bug and it's not going away, we are not confident that we'll be able to produce a clear fix any time soon. Tt will be made an F34 Beta blocker instead. 

[0] https://meetbot.fedoraproject.org/fedora-blocker-review/2020-10-19/f33-blocker-review.2020-10-19-16.01.txt

Comment 138 Fedora Update System 2020-10-22 18:24:54 UTC
FEDORA-2020-f5be904258 has been pushed to the Fedora 33 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 139 Kamil Páral 2020-10-26 10:32:38 UTC
I adjusted fields after F33 release. We're now targeting F34 Beta in order to fully fix this. There's some glimpse of hope in comment 71 that KDE using systemd in F34 could solve this for us.

Comment 140 Vitaly Zaitsev 2020-10-28 08:05:34 UTC
This bug is definitely related to dbus-broker, not sddm.

It can be fixed by switching to another D-Bus implementation (from dbus-broker to dbus-daemon).

Disable dbus-broker:

systemctl disable dbus-broker.service
systemctl --global disable dbus-broker.service

Enable dnf-daemon:

systemctl enable dbus-daemon.service
systemctl --global enable dbus-daemon.service

Remove dbus-broker from default presets:

sudo sed -e 's/dbus-broker/dbus-daemon/g' -i /usr/lib/systemd/user-preset/90-default-user.preset

Restart system:

sudo systemctl reboot

Comment 141 Yaroslav Sidlovsky 2020-10-28 13:58:04 UTC
Can confirm advice in #c140 helps.
Just checked in the test VM - switching back to dbus-daemon fixes this issue.
No more hanging dbus-broker processes waiting for something after logout.

Comment 142 Adam Williamson 2020-10-28 15:40:28 UTC
But we already discussed that, months ago, and tried it, and it didn't work. See comment #29 onwards.

Comment 143 Vitaly Zaitsev 2020-10-28 16:15:31 UTC
Tested on a clean installation in VM. It works for me. We need to do more tests.

Comment 144 Adam Williamson 2020-10-28 21:13:59 UTC
The bug is already partly fixed by the kglobalaccel change. If you're logging in and out with only one user and leaving 7 seconds between attempts it may well be fine. Remaining issues are if you cycle too fast for the timeout, and apparently switching between users (according to kparal).

Comment 145 Rex Dieter 2020-10-28 21:28:13 UTC
My apologies, looks like the kglobalaccel change never made it to bodhi.  I'll submit it now.

Comment 146 Fedora Update System 2020-10-28 21:31:50 UTC
FEDORA-2020-054b3b620d has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2020-054b3b620d

Comment 147 Fedora Update System 2020-10-29 01:52:35 UTC
FEDORA-2020-054b3b620d has been pushed to the Fedora 33 testing repository.
In short time you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-054b3b620d`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-054b3b620d

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 148 Vitaly Zaitsev 2020-10-29 09:28:26 UTC
(In reply to Adam Williamson from comment #144)
> The bug is already partly fixed by the kglobalaccel change.

This fix doesn't work either.

> If you're logging in and out with only one user and leaving 7 seconds between attempts
> it may well be fine.

Waiting for 8 seconds is not a fix. This is a dirty hack.

The real solution is to get rid of the buggy dbus-broker. I suggest patching the systemd package SPEC to allow multiple Dbus implementations using rich dependencies.

Comment 149 Kamil Páral 2020-10-29 12:28:53 UTC
(In reply to Fedora Update System from comment #146)
> FEDORA-2020-054b3b620d has been submitted as an update to Fedora 33.
> https://bodhi.fedoraproject.org/updates/FEDORA-2020-054b3b620d

With this update applied, I'm still hitting issues mentioned in comment 134, but I no longer see the issues mentioned in comment 136.

Comment 150 Kamil Páral 2020-10-29 12:41:47 UTC
(In reply to Vitaly Zaitsev from comment #140)
> This bug is definitely related to dbus-broker, not sddm.

I applied the steps from comment 140, and then I followed the reproducer from comment 134, and I still see the same issues. I didn't do extensive testing, but it doesn't magically fix the issues, at least for me.

Comment 151 Yaroslav Sidlovsky 2020-10-29 12:50:48 UTC
> I applied the steps from comment 140, and then I followed the reproducer from comment 134, and I still see the same issues. I didn't do extensive testing, but it doesn't magically fix the issues, at least for me.

Check what processes are still running after switching to dbus-daemon & doing logout.
And paste it there. It could be another bug.
Current bug is caused by dbus-broker processes not exiting after logout.

Comment 152 Vitaly Zaitsev 2020-10-29 13:33:09 UTC
Reported this issue to upstream: https://github.com/bus1/dbus-broker/issues/244

Comment 153 Rex Dieter 2020-10-29 18:02:27 UTC
for me, the results are the same whether I use dbus-daemon or dbus-broker, testing twice each with our workarounds removed, I saw lingering processes:
/usr/libexec/geoclue-2.0/demos/agent
/usr/bin/abrt-applet

and session status perpetually with "State: closing"

Comment 154 Benjamin Berg 2020-10-30 09:34:23 UTC
So, the thing with this bug is that there are multiple related issue, which makes it really hard to understand what is happening, where things are going on and what can be fixed easily.

I would like to propose to move any remaining problems to a new bug, trying to focus on a specific issue. From what I can tell:

 1. geoclue and abrt-applet remaining after the session quits is bad. But so far there is been no prove that this causes any issues.
 2. dbus-broker and dbus-daemon does not make a fundamental difference. Failures are random though, so it might look like that during testing.
 3. We have an ugly workaround for dbus in place. With that workaround dbus-broker/dbus-daemon cannot really be at fault anymore! And the workaround will hide an number of KDE issues.

So, there are at least 5 inter-related problems here:
 * KDE not cleaning up autostarted applications
 * KDE not waiting for kwin and other processes to shutdown before quitting the session leader
 * DBus (both dbus-daemon *and* dbus-broker) not being able to handle activation properly if a previous activation has failed
 * kwin poking kglobaccel during shutdown and not setting the flag to prevent dbus activation
 * something getting stuck during logout (probably a KDE issue)

The workaround just "fixes" one of the above, preventing some others from causing issues. Again, most of the KDE side bugs will become a non-issue with KDE moving to systemd (earliest for F34), which would also hide any dbus problem.


Anyway, I am sure that the "logout stuck" problem is real. But I think it would be a good idea to move that to a new bug and debug it there.

Comment 155 Sylvia Sánchez 2020-10-30 10:18:36 UTC
I upgraded from F32 to F33 (KDE Spin) and now it's working.
(Shoot me if I know what happened)
I'll test again after the first important batch of updates, but so far so good.

Comment 156 Fedora Update System 2020-10-31 02:55:52 UTC
FEDORA-2020-4057c26351 has been pushed to the Fedora 33 testing repository.
In short time you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-4057c26351`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-4057c26351

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 157 Steven A. Falco 2020-10-31 15:26:38 UTC
After applying FEDORA-2020-4057c26351, I was not able to trigger the bug after a half dozen logout/login to the same user. However, I do notice that there are a lot of geoclue processes that are not cleaned up:

 fedora33# ps -fe | grep sfalco
 sfalco       911       1  0 11:11 ?        00:00:00 /usr/lib/systemd/systemd --user
 sfalco       912     911  0 11:11 ?        00:00:00 (sd-pam)
 sfalco      1130       1  0 11:11 ?        00:00:00 /usr/libexec/geoclue-2.0/demos/agent
 sfalco      1749       1  0 11:12 ?        00:00:00 /usr/libexec/geoclue-2.0/demos/agent
 sfalco      2357       1  0 11:13 ?        00:00:00 /usr/libexec/geoclue-2.0/demos/agent
 sfalco      2966       1  0 11:13 ?        00:00:00 /usr/libexec/geoclue-2.0/demos/agent
 sfalco      3560       1  0 11:14 ?        00:00:00 /usr/libexec/geoclue-2.0/demos/agent
 sfalco      4187       1  0 11:15 ?        00:00:00 /usr/libexec/geoclue-2.0/demos/agent
 sfalco      4794       1  0 11:16 ?        00:00:00 /usr/libexec/geoclue-2.0/demos/agent
 sfalco      5228     911  0 11:17 ?        00:00:00 /usr/bin/dbus-broker-launch --scope user
 sfalco      5229    5228  0 11:17 ?        00:00:00 dbus-broker --log 4 --controller 11 --machine-id 453c53fe1e0e40538136bd60780043d7 --max-bytes 100000000000000 --max-fds 25000000000000 --max-matches 5000000000

Comment 158 Yaroslav Sidlovsky 2020-11-01 10:57:55 UTC
Just reinstalled Fedora 33 KDE Spin in the VM and updated it.
Can't confirm that update FEDORA-2020-4057c26351 (kf5-5.75) helps in any way - still black screen + mouse cursor after login.

Comment 159 Fedora Update System 2020-11-01 13:35:33 UTC
FEDORA-2020-4057c26351 has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2020-4057c26351

Comment 160 Fedora Update System 2020-11-02 01:10:33 UTC
FEDORA-2020-4057c26351 has been pushed to the Fedora 33 testing repository.
In short time you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-4057c26351`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-4057c26351

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 161 Sylvia Sánchez 2020-11-06 12:08:22 UTC
I tried a couple of times and logging out and in seemed to work. Now I tried again with kernels 5.18, 5.17 and 5.16... It doesn't work.
This is F33 KDE Spin (upgraded from F32).

Comment 162 Raphael Groner 2020-11-09 06:02:57 UTC
(In reply to Sylvia Sánchez from comment #161)
> I tried a couple of times and logging out and in seemed to work. Now I tried
> again with kernels 5.18, 5.17 and 5.16... It doesn't work.
> This is F33 KDE Spin (upgraded from F32).

This still prevents me from a valid system upgrade for our users. Can anyoune confirm to blame the kernel?

Comment 163 Fedora Update System 2020-11-12 03:05:46 UTC
FEDORA-2020-4057c26351 has been pushed to the Fedora 33 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 164 Yaroslav Sidlovsky 2020-11-13 10:12:48 UTC
Could this bug can be related - https://bugs.kde.org/show_bug.cgi?id=424408 ?

Comment 165 Benjamin Berg 2020-11-13 10:36:55 UTC
Yes, kwin incorrectly activates kglobalaccel again which then aborts.

Comment 166 Geraldo Simião 2020-11-14 12:26:55 UTC
With the new KDE Plasma 5.20.3 update, this bug reappeared on my F33 setup.
This update ==> https://bodhi.fedoraproject.org/updates/FEDORA-2020-d700fa11cf

Comment 167 Geraldo Simião 2020-11-15 11:59:10 UTC
After the update https://bodhi.fedoraproject.org/updates/FEDORA-2020-b6bf57746c now I can do logout and login many times again, without being stuck. Nice :) I tested both wayland and X sessions, both works fine. A lot of coredumps on the journal however, but it's working. I'm using kernel 5.9.8-200.fc33.x86_64.

Just to register the coredumps showing after the command coredumpctl:
TIME                            PID   UID   GID SIG COREFILE  EXE
Sun 2020-11-15 08:40:42 -03    4542  1000  1000   6 present   /usr/bin/kwalletd5
Sun 2020-11-15 08:40:42 -03    5207  1000  1000   6 present   /usr/libexec/drkonqi
Sun 2020-11-15 08:40:42 -03    5206  1000  1000   6 present   /usr/libexec/drkonqi
Sun 2020-11-15 08:40:42 -03    5270  1000  1000   6 present   /usr/bin/kglobalaccel5
Sun 2020-11-15 08:40:42 -03    5222  1000  1000   6 present   /usr/libexec/drkonqi
Sun 2020-11-15 08:40:43 -03    5343  1000  1000   6 present   /usr/bin/kded5
Sun 2020-11-15 08:40:43 -03    5346  1000  1000   6 present   /usr/libexec/drkonqi
Sun 2020-11-15 08:40:43 -03    5345  1000  1000   6 present   /usr/libexec/drkonqi
Sun 2020-11-15 08:40:43 -03    5347  1000  1000   6 present   /usr/libexec/drkonqi
Sun 2020-11-15 08:40:43 -03    5344  1000  1000   6 present   /usr/libexec/org_kde_powerdevil
Sun 2020-11-15 08:40:43 -03    5216  1000  1000   6 present   /usr/libexec/drkonqi
Sun 2020-11-15 08:40:44 -03    5233  1000  1000   6 present   /usr/libexec/drkonqi
Sun 2020-11-15 08:40:44 -03    5217  1000  1000   6 present   /usr/libexec/drkonqi
Sun 2020-11-15 08:40:44 -03    5244  1000  1000   6 present   /usr/libexec/drkonqi
Sun 2020-11-15 08:40:44 -03    4939  1000  1000   6 present   /usr/bin/akonadi_birthdays_resource
Sun 2020-11-15 08:40:44 -03    4940  1000  1000   6 present   /usr/bin/akonadi_contacts_resource
Sun 2020-11-15 08:40:44 -03    4947  1000  1000   6 present   /usr/bin/akonadi_migration_agent
Sun 2020-11-15 08:40:44 -03    5221  1000  1000   6 present   /usr/libexec/drkonqi
Sun 2020-11-15 08:40:44 -03    5218  1000  1000   6 present   /usr/libexec/drkonqi
Sun 2020-11-15 08:40:44 -03    5238  1000  1000   6 present   /usr/libexec/drkonqi

Comment 168 Yaroslav Sidlovsky 2020-11-15 12:14:30 UTC
Great news!
Crashes after logout could be fixed by patch for this issue: https://bugs.kde.org/show_bug.cgi?id=424408.

Comment 169 Vitaly Zaitsev 2020-11-15 14:30:59 UTC
Backported upstream fix: https://src.fedoraproject.org/rpms/plasma-workspace/pull-request/9

Please test, merge this PR and rebuild for F33 and Rawhide.

Comment 170 Geraldo Simião 2020-11-17 11:44:53 UTC
After new plasma-workspace-5.20.3-2.fc33 build: No more coredumps registered with logout/login in Plasma X session, not even kglobalaccel5. :)

However, with Plasma Wayland it continues showing a lot of crashes registered on the journal after logout/login procedure (but system is working fine anyway). 
Here are the coredumps showing after logout/login, only in the Plasma WAYLAND Session: 
TIME                            PID   UID   GID SIG COREFILE  EXE
Tue 2020-11-17 08:04:43 -03    4859  1000  1000   6 present   /usr/bin/kwalletd5
Tue 2020-11-17 08:04:43 -03    6476  1000  1000   6 present   /usr/libexec/drkonqi
Tue 2020-11-17 08:04:43 -03    6482  1000  1000   6 present   /usr/libexec/drkonqi
Tue 2020-11-17 08:04:43 -03    6471  1000  1000   6 present   /usr/libexec/drkonqi
Tue 2020-11-17 08:04:43 -03    6553  1000  1000   6 present   /usr/libexec/kdeconnectd
Tue 2020-11-17 08:04:43 -03    6563  1000  1000   6 present   /usr/bin/kglobalaccel5
Tue 2020-11-17 08:04:43 -03    6480  1000  1000   6 present   /usr/libexec/drkonqi
Tue 2020-11-17 08:04:43 -03    6472  1000  1000   6 present   /usr/libexec/drkonqi
Tue 2020-11-17 08:04:44 -03    6649  1000  1000   6 present   /usr/libexec/drkonqi
Tue 2020-11-17 08:04:44 -03    6647  1000  1000   6 present   /usr/libexec/org_kde_powerdevil
Tue 2020-11-17 08:04:44 -03    6651  1000  1000   6 present   /usr/libexec/drkonqi
Tue 2020-11-17 08:04:44 -03    6648  1000  1000   6 present   /usr/bin/kded5
Tue 2020-11-17 08:04:44 -03    6650  1000  1000   6 present   /usr/libexec/drkonqi
Tue 2020-11-17 08:04:44 -03    5284  1000  1000   6 present   /usr/bin/akonadi_contacts_resource
Tue 2020-11-17 08:04:44 -03    5291  1000  1000   6 present   /usr/bin/akonadi_migration_agent
Tue 2020-11-17 08:04:44 -03    6485  1000  1000   6 present   /usr/libexec/drkonqi
Tue 2020-11-17 08:04:44 -03    5282  1000  1000   6 present   /usr/bin/akonadi_birthdays_resource
Tue 2020-11-17 08:04:45 -03    6538  1000  1000   6 present   /usr/libexec/drkonqi
Tue 2020-11-17 08:04:45 -03    6498  1000  1000   6 present   /usr/libexec/drkonqi
Tue 2020-11-17 08:04:45 -03    6483  1000  1000   6 present   /usr/libexec/drkonqi
Tue 2020-11-17 08:04:45 -03    6473  1000  1000   6 present   /usr/libexec/drkonqi
Tue 2020-11-17 08:04:45 -03    6495  1000  1000   6 present   /usr/libexec/drkonqi
Tue 2020-11-17 08:04:45 -03    6499  1000  1000   6 present   /usr/libexec/drkonqi
Tue 2020-11-17 08:04:45 -03    6536  1000  1000   6 present   /usr/libexec/drkonqi
Tue 2020-11-17 08:04:45 -03    6486  1000  1000   6 present   /usr/libexec/drkonqi
Tue 2020-11-17 08:04:45 -03    6481  1000  1000   6 present   /usr/libexec/drkonqi
Tue 2020-11-17 08:04:46 -03    6805  1000  1000   6 present   /usr/libexec/drkonqi

Comment 171 Adam Williamson 2021-02-05 17:13:36 UTC
So can we call this bug fixed at this point? Thanks!

Comment 172 Rex Dieter 2021-02-05 19:09:16 UTC
Plasma-5.21 support for systemd user sessions resolves this, yes.

Currently only in rawhide (f34), but it's working well enough we will likely do for f33 too when the time comes.

Comment 173 Adam Williamson 2021-02-05 21:36:40 UTC
well, comments suggested f33 updates improved the situation too?

Comment 174 Adam Williamson 2021-02-05 21:37:22 UTC
Taking off the F34 blocker list given Rex's comment.

Comment 175 Kamil Páral 2021-02-08 10:28:34 UTC
I'll make sure to test this in F34 and report back.

Comment 176 Robin Laing 2021-02-11 19:09:38 UTC
Hello,

I may be out of line for commenting on this thread but I am having issues in F33 and KDE that seem to be fitting this issue.

I have not tried some of the suggestions for logging but in both F32 and F33 I have had issues logging out of extra sessions where processes do not stop and in F33 they respawn if I try to just kill them.  The only way that I can get the session to end is to kill the sddm-helper process which starts at the login.

I use multiple sessions on a daily basis and will open and close extra sessions as required on one computer.

Comment 177 Kamil Páral 2021-02-16 09:05:49 UTC
I tested with a default installation of Fedora-KDE-Live-x86_64-34-20210215.n.0.iso, containing:
dbus-1.12.20-3.fc34.x86_64
dbus-broker-26-2.fc34.x86_64
kernel-5.11.0-0.rc7.149.fc34.x86_64
kf5-plasma-5.79.0-2.fc34.x86_64
plasma-workspace-5.21.0-1.fc34.x86_64
sddm-0.19.0-7.fc34.x86_64
systemd-247.3-1.fc34.x86_64

When logging in and out using the same user or two different users, KDE processes are crashing right and left, however the desktop seems to stay always usable (sometimes I see 10 seconds of a black screen after invoking the KDE menu, when it crashes, but it always resurrects itself after a while). Overall this problem seems to be resolved in F34.

Rex, should we close this or do you want to keep it open for F33?

Comment 178 Sammy 2021-02-16 14:51:01 UTC
Here is a different data point: I built all of the plasma 5.79.0 and desktop 5.21.0 rpms using source rpms from koji on FC33 (using mock). Everything is working well except the logout does not work. It freezes the screen and never goes back to sddm. If I do ctrl-alt-bckspace I get the sddm and can login again, which means that there are lingering processes to prevent me from login back in. So, the problem is that sddm is not restarted during the logout process. Any ideas? Thanks!

Comment 179 Sammy 2021-02-16 14:53:24 UTC
PS: I did update the sddm rpm as well as well as kde-settings.

Comment 180 Adam Williamson 2021-02-17 17:31:16 UTC
The switch to systemd user sessions in F34/Rawhide is a big one, I'm not sure just rebuilding the plasma packages on F33 would work - it might need something else you didn't backport.

Comment 181 Sammy 2021-02-17 20:32:19 UTC
I was hoping that someone with a deeper understanding would point it out to me. I did removed and reinstalled sddm, at which time it created a systemd link. Also, is it working under FC34? It does not seem to restart sddm or X (running X11 not Wayland).

Comment 182 Sammy 2021-03-30 21:27:38 UTC
My logins all started working after kf5-kglobalaccel-5.80.0-2.


Note You need to log in before you can comment on or make changes to this bug.