Bug 2082719 - kernel-4.18.0-358.el8.x86_64 doing strange things to sddm on centos stream 8
Summary: kernel-4.18.0-358.el8.x86_64 doing strange things to sddm on centos stream 8
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora EPEL
Classification: Fedora
Component: sddm
Version: epel8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Troy Dawson
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-05-06 21:44 UTC by Troy Dawson
Modified: 2022-10-03 08:42 UTC (History)

Fixed In Version:
Clone Of: 2043771
Environment:
Last Closed: 2022-09-16 14:43:57 UTC
Type: Bug
Embargoed:


Description Troy Dawson 2022-05-06 21:44:37 UTC
+++ This bug was initially created as a clone of Bug #2043771 +++

Description of problem:
Thus far, we are seeing two problems with KDE that only show up on kernel 4.18.0-358.el8 on CentOS Stream 8

1 - Unable to unlock your screen. - https://bugzilla.redhat.com/show_bug.cgi?id=2043322
2 - sddm login screen is completely blank

Both of these problems only show up on kernel 4.18.0-358.el8 and go away when you boot into an older kernel.
This happens on real hardware, as well as virtual machines.

Version-Release number of selected component (if applicable):
kernel-4.18.0-358.el8.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Start with CentOS Stream 8 machine.
2. Update to the latest update, with kernel 4.18.0-358.el8
3. Install KDE
dnf install epel-release -y
dnf config-manager --enable powertools
dnf group install kde-desktop -y
systemctl set-default graphical.target

4. To check sddm, enable it, and reboot
systemctl enable sddm -f
reboot

5. To check screen lock problem, log in via gdm, then lock screen
dnf install gdm -y
systemctl enable gdm -f
reboot
* login to Plasma (X11) or Plasma (Wayland) *
* lock screen *

Actual results:
sddm gives you just a black screen.  The mouse icon is there and moves with the mouse.
With the screen lock problem, you see the clock but nothing else, so you cannot log in.

Expected results:
sddm should have a graphical screen allowing you to log in.
The screen unlock should have a place to type in your password when you move your mouse, or type something on the keyboard.

Additional info:
We've tried turning selinux off, and on.
I've looked through several different logs, but I'm not finding anything different between booting the two kernels.

--- Additional comment from Michel Dänzer on 2022-01-31 09:39:24 UTC ---

(In reply to Troy Dawson from comment #0)
> Both of these problems only show up on kernel 4.18.0-358.el8 and go away
> when you boot into an older kernel.

Which older version(s) have you tested, specifically?

--- Additional comment from Turing on 2022-01-31 13:03:33 UTC ---

As someone who has encountered this bug, I can tell you that 4.18.0-348.el8 (the immediate predecessor version) works fine.

--- Additional comment from Troy Dawson on 2022-01-31 14:14:36 UTC ---

Both 4.18.0-348.2.1.el8_5.x86_64 and 4.18.0-348.el8.x86_64 work correctly.

--- Additional comment from George on 2022-02-11 05:32:43 UTC ---

I raised a duplicate of this as Bug 2040536.

Issue also in kernel 4.18.0-365.el8.x86_64

--- Additional comment from Lyude on 2022-03-16 22:51:06 UTC ---

Yeah, I can reproduce this on a VM using virgl just fine. Will start looking into this

--- Additional comment from Troy Dawson on 2022-03-23 15:53:40 UTC ---

I just found something similar in the KDE bug tracker.
https://bugs.kde.org/show_bug.cgi?id=445449
https://bugs.kde.org/show_bug.cgi?id=445385

Their bug is dealing with long sddm shutdown time.  It started with the 5.15.2 kernel, and goes away if they go back to the 5.14.16 kernel.
I am seeing these long sddm shutdowns as well, but since they don't stop you from logging in, I didn't worry about them.

Still investigating, but I wanted to put this information here in case anyone else wants to look as well.

--- Additional comment from Turing on 2022-03-23 17:02:11 UTC ---

I did want to note one thing I noticed when diagnosing this issue on my end. Switching to another TTY and using `loginctl unlock-session` did not unlock the session. In addition, there was some indication from the logs that the session might not have been registering properly with systemd.

--- Additional comment from Jonathan Sattelberger on 2022-04-14 16:23:43 UTC ---

Issue also in kernel 4.18.0-373.el8.x86_64.

--- Additional comment from Orion Poplawski on 2022-04-22 14:30:48 UTC ---

Still present with kernel-4.18.0-383.el8.x86_64 as well.

--- Additional comment from Troy Dawson on 2022-04-28 22:48:56 UTC ---

I've done some more debugging.
I'm not at an answer yet, but this is what I've found thus far.
It seems there are several things besides sddm that fail on the newer kernels (kscreenlocker_greet, akonadi, sddm-greeter) but I'll stick to sddm-greeter.

sddm-greeter crashes each time it starts, thus the blank/black screen.
This is the core_backtrace that I got from it.

{   "signal": 6
,   "executable": "/usr/bin/sddm-greeter"
,   "stacktrace":
      [ {   "crash_thread": true
        ,   "frames":
              [ {   "address": 140698901846607
                ,   "build_id": "d525c5b11962d363f9e9d3c119d2d46c92a0e3f5"
                ,   "build_id_offset": 322127
                ,   "function_name": "raise"
                ,   "file_name": "/lib64/libc.so.6"
                }
              , {   "address": 140698901663157
                ,   "build_id": "d525c5b11962d363f9e9d3c119d2d46c92a0e3f5"
                ,   "build_id_offset": 138677
                ,   "function_name": "abort"
                ,   "file_name": "/lib64/libc.so.6"
                }
              , {   "address": 140698915831879
                ,   "build_id": "6699bdd7da7c9ed74fd752ae36caf4a04a42ddc2"
                ,   "build_id_offset": 725063
                ,   "function_name": "qt_message_output(QtMsgType, QMessageLogContext const&, QString const&) [clone .cold.117]"
                ,   "file_name": "/lib64/libQt5Core.so.5"
                }
              , {   "address": 140698924449279
                ,   "build_id": "db93a4bda87e9b90d8dd083b00d903b020509acc"
                ,   "build_id_offset": 1424895
                ,   "function_name": "QGuiApplicationPrivate::createPlatformIntegration()"
                ,   "file_name": "/lib64/libQt5Gui.so.5"
                }
              , {   "address": 140698924450677
                ,   "build_id": "db93a4bda87e9b90d8dd083b00d903b020509acc"
                ,   "build_id_offset": 1426293
                ,   "function_name": "QGuiApplicationPrivate::createEventDispatcher()"
                ,   "file_name": "/lib64/libQt5Gui.so.5"
                }
              , {   "address": 140698917900745
                ,   "build_id": "6699bdd7da7c9ed74fd752ae36caf4a04a42ddc2"
                ,   "build_id_offset": 2793929
                ,   "function_name": "QCoreApplicationPrivate::init()"
                ,   "file_name": "/lib64/libQt5Core.so.5"
                }
              , {   "address": 140698924458115
                ,   "build_id": "db93a4bda87e9b90d8dd083b00d903b020509acc"
                ,   "build_id_offset": 1433731
                ,   "function_name": "QGuiApplicationPrivate::init()"
                ,   "file_name": "/lib64/libQt5Gui.so.5"
                }
              , {   "address": 140698924461720
                ,   "build_id": "db93a4bda87e9b90d8dd083b00d903b020509acc"
                ,   "build_id_offset": 1437336
                ,   "function_name": "QGuiApplication::QGuiApplication(int&, char**, int)"
                ,   "file_name": "/lib64/libQt5Gui.so.5"
                }
              , {   "address": 93952641413531
                ,   "build_id": "598bb947ec337a74313071c2c3e93758253446eb"
                ,   "build_id_offset": 111003
                ,   "function_name": "main"
                ,   "file_name": "/usr/bin/sddm-greeter"
                } ]
        } ]
}

I don't see anything that really points to the kernel, so I don't know why going back to an older kernel fixes things.
But I see a lot of qt5 stuff.  There is an updated qt5 released for CentOS Stream 8, I'm going to see if that does anything.
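
For anyone comparing several of these reports, the crashing thread's call chain can be pulled out of a core_backtrace with a few lines of Python. This is only a sketch: `crash_frames` is a made-up helper name, and the embedded sample is an abbreviated copy of the report above (the first four of the nine frames, with addresses and build IDs left out). Signal 6 is SIGABRT, and the chain shows abort() being reached from qt_message_output() while QGuiApplicationPrivate::createPlatformIntegration() is running, which is consistent with Qt aborting on a fatal message during startup.

```python
import json

# Abbreviated copy of the core_backtrace above: first four frames only,
# addresses and build IDs omitted for brevity.
CORE_BACKTRACE = """
{ "signal": 6,
  "executable": "/usr/bin/sddm-greeter",
  "stacktrace": [
    { "crash_thread": true,
      "frames": [
        {"function_name": "raise", "file_name": "/lib64/libc.so.6"},
        {"function_name": "abort", "file_name": "/lib64/libc.so.6"},
        {"function_name": "qt_message_output(QtMsgType, QMessageLogContext const&, QString const&) [clone .cold.117]",
         "file_name": "/lib64/libQt5Core.so.5"},
        {"function_name": "QGuiApplicationPrivate::createPlatformIntegration()",
         "file_name": "/lib64/libQt5Gui.so.5"}
      ] } ] }
"""


def crash_frames(report_text):
    """Return (function_name, file_name) pairs for the crashing thread, innermost first."""
    report = json.loads(report_text)
    for thread in report.get("stacktrace", []):
        if thread.get("crash_thread"):
            return [
                (frame.get("function_name", "??"), frame.get("file_name", "??"))
                for frame in thread.get("frames", [])
            ]
    return []  # no thread was flagged as the crashing one


for function_name, file_name in crash_frames(CORE_BACKTRACE):
    print(f"{file_name}: {function_name}")
```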

--- Additional comment from Piotr Golonka on 2022-05-06 10:13:57 UTC ---

FYI, just checked on CentOS Stream 9, and SDDM works correctly, whereas on the latest CentOS Stream 8 it fails (black screen as described above).
Both of the OSes installed today in the minimal "Custom Operating System" setup, enabled EPEL, installed openbox for a minimal session.

--- Additional comment from Piotr Golonka on 2022-05-06 10:39:21 UTC ---

Could it be related to https://bugzilla.redhat.com/show_bug.cgi?id=2057419 ?
If so, already pushed to F36... Could it be backported to EPEL?

--- Additional comment from Troy Dawson on 2022-05-06 13:36:12 UTC ---

(In reply to Piotr Golonka from comment #15)
> FYI, just checked on CentOS Stream 9, and SDDM works correctly, whereas on the
> latest CentOS Stream 8 it fails (black screen as described above).
> Both of the OSes installed today in the minimal "Custom Operating System"
> setup, enabled EPEL, installed openbox for a minimal session.

Correct, this only affects CentOS Stream 8 / epel8.
CentOS Stream 9 / epel9 is unaffected.
But it would be good to figure out what the problem is in case it comes up in 9.

--- Additional comment from Troy Dawson on 2022-05-06 13:40:23 UTC ---

(In reply to Piotr Golonka from comment #16)
> Could it be related to https://bugzilla.redhat.com/show_bug.cgi?id=2057419 ?
> If so, already pushed to F36... Could it be backported to EPEL?

That is the most promising thing yet.  Thank you very much.
I especially like that this line is removed
https://github.com/sddm/sddm/pull/1522/files#diff-d7898f8a43d0f15c2f009d1c0ac2efd089ee8b297f38e8f12c1d93a78bfa4b5eL57

I'll give that a try and see what happens.

--- Additional comment from Neal Gompa on 2022-05-06 14:47:26 UTC ---

It should be reasonably safe to backport sddm from Fedora to EPEL, and indeed I'd recommend doing so, since that code is much closer to the latest upstream code.

Comment 1 Troy Dawson 2022-05-06 21:49:34 UTC
I have moved this bug over to sddm on Fedora/EPEL 8.
Although the kernel triggers the bug, it isn't specific to the RHEL kernel. I believe this can be fixed in sddm.  And it is nice to not spam the RHEL kernel people, who always have a lot of bugs anyway.

Comment 2 Troy Dawson 2022-05-06 21:50:34 UTC
*** Bug 2043771 has been marked as a duplicate of this bug. ***

Comment 3 Troy Dawson 2022-05-06 22:05:59 UTC
It appears that something similar has happened in Fedora.
https://bugzilla.redhat.com/show_bug.cgi?id=2011991 - Black screen with cursor in the corner.
https://bugzilla.redhat.com/show_bug.cgi?id=2057419 - Total black screen.

The problem is that both of these have been patched and fixed in sddm.
I have rebuilt every Fedora sddm from the version in epel8 up to the latest in rawhide.  None of them totally fix the problem.
But, the latest ones in F36 and Rawhide have changed the screen from solid black, to Black Screen with a cursor in the corner.  So ... progress?

One thing I was wondering about was a udev change mentioned in this comment
https://bugzilla.redhat.com/show_bug.cgi?id=2011991#c4
I couldn't find what they were talking about anywhere.

Comment 4 Troy Dawson 2022-05-06 22:10:28 UTC
Other things I've tried that have had no impact:

Rebuilt the entire KDE Plasma Desktop stack on 5.23.5 - no help
Rebuilt the entire KDE Plasma Desktop stack on last week's F36 builds (Plasma 5.24.4) - nothing changed in a good way.  This required updating a few RHEL packages and it didn't fix the sddm issue, and made other things less stable.
Rebuilt all the sddm packages in Fedora (as mentioned above), even the older ones. - no change other than what was mentioned above.
Changed the pausing times for sddm, as it waits for logind to come up. - nothing.

Comment 5 Troy Dawson 2022-05-10 14:10:15 UTC
Turns out this was a RHEL 8.6 kernel bug after all, but it was with process management.

"qt (and perhaps something else) rightly assumes that if the kernel supports P_PIDFD (qt
does the runtime check), then poll(pidfd) should work too, because
("pidfd: add P_PIDFD to waitid()") was the last one of those that became
available in kernel releases."

The bug is here, but it is a private bug.  
https://bugzilla.redhat.com/show_bug.cgi?id=2044587
I will keep this bug updated with the major updates, that I can make public.

The fix for the kernel will not be ready when RHEL 8.6 is released.  It's not even ready for QA.
I have tested the proposed solution and it does work.  But I don't know what side effects it has.

Comment 6 Troy Dawson 2022-05-23 16:22:25 UTC
Update to this bug.
The fixed kernel is in QA status.
That doesn't give us a timeline yet, there still might be regressions that need to be fixed.  But at least now it's in intense testing.

Comment 7 Timothée Ravier 2022-05-27 15:48:30 UTC
Now verified.

Comment 8 Christian Dersch 2022-06-07 08:47:53 UTC
Kernel 4.18.0-394.el8.x86_64 seems to fix the issue for me. Thanks!

Comment 9 Troy Dawson 2022-06-07 13:31:35 UTC
Thanks for checking Christian.

Verified for CentOS Stream 8.
kernel-4.18.0-394.el8 works.

Still waiting for this to make it to RHEL 8.

Comment 10 Jonathan Sattelberger 2022-06-07 20:20:47 UTC
Issue appears to be resolved in CentOS Stream 8 with kernel-4.18.0-394.el8.x86_64.

Comment 11 Piotr Golonka 2022-06-08 16:41:36 UTC
Indeed seems to be resolved by the newest kernel. Tested with SDDM on CentOS Stream 8 with kernel as above, using VirtualBox.

Comment 12 Troy Dawson 2022-06-30 14:31:58 UTC
Updating to say that there is no update on a new kernel for RHEL 8.6.
It is possible that this bug will stick around until RHEL 8.7.

Again, this is fixed for CentOS Stream 8, but not for RHEL 8, or its clones.

Comment 13 Jonathan Sattelberger 2022-06-30 15:54:22 UTC
Before this was fixed in Stream 8, I had recompiled qt5-qtbase with the following changes. It's dirty, but did the job to make KDE and X2Go function properly.

diff --git a/SOURCES/qt5-qtbase-p_pidfd-4-rhel.patch b/SOURCES/qt5-qtbase-p_pidfd-4-rhel.patch
new file mode 100644
index 0000000..4171055
--- /dev/null
+++ b/SOURCES/qt5-qtbase-p_pidfd-4-rhel.patch
@@ -0,0 +1,11 @@
+--- a/src/3rdparty/forkfd/forkfd_linux.c	2022-05-12 11:14:07.897191864 -0400
++++ b/src/3rdparty/forkfd/forkfd_linux.c	2022-05-12 12:18:04.129615756 -0400
+@@ -138,7 +138,7 @@
+ 
+     int state = ffd_atomic_load(&system_forkfd_state, FFD_ATOMIC_RELAXED);
+     if (state == 0) {
+-        state = detect_clone_pidfd_support();
++        state = -1;
+         ffd_atomic_store(&system_forkfd_state, state, FFD_ATOMIC_RELAXED);
+     }
+     if (state < 0) {
diff --git a/SPECS/qt5-qtbase.spec b/SPECS/qt5-qtbase.spec
index 77101f0..d763af9 100644
--- a/SPECS/qt5-qtbase.spec
+++ b/SPECS/qt5-qtbase.spec
@@ -41,7 +41,7 @@ BuildRequires: pkgconfig(libsystemd)
 Name:    qt5-qtbase
 Summary: Qt5 - QtBase components
 Version: 5.15.3
-Release: 1%{?dist}
+Release: 1.0.2%{?dist}
 
 # See LGPL_EXCEPTIONS.txt, for exception details
 License: LGPLv2 with exceptions or GPLv3 with exceptions
@@ -125,6 +125,9 @@ Patch100: kde-5.15-rollup-20220324.patch.gz
 # HACK to make 'fedpkg sources' consider it 'used"
 Source100: kde-5.15-rollup-20220324.patch.gz
 
+# HACK to ignore P_PIDFD support on RHEL 8.6, 8.7/c8s
+Patch1000: qt5-qtbase-p_pidfd-4-rhel.patch
+
 # Do not check any files in %%{_qt5_plugindir}/platformthemes/ for requires.
 # Those themes are there for platform integration. If the required libraries are
 # not there, the platform to integrate with isn't either. Then Qt will just
@@ -382,6 +385,9 @@ Qt5 libraries used for drawing widgets and OpenGL items.
 ## upstream patches
 %patch100 -p1
 
+## local patches
+%patch1000 -p1 -b .p-pidfd
+
 # move some bundled libs to ensure they're not accidentally used
 pushd src/3rdparty
 mkdir UNUSED

Comment 14 may29apr 2022-07-13 13:35:23 UTC
This explains how Qt is detecting PIDFD:

https://code.woboq.org/qt6/qtbase/src/3rdparty/forkfd/forkfd_linux.c.html
static int detect_clone_pidfd_support()
{
    /*
     * Detect support for CLONE_PIDFD and P_PIDFD. Support was added in steps:
     * - Linux 5.2 added CLONE_PIDFD support in clone(2) system call
     * - Linux 5.2 added pidfd_send_signal(2)
     * - Linux 5.3 added support for poll(2) on pidfds
     * - Linux 5.3 added clone3(2)
     * - Linux 5.4 added P_PIDFD support in waitid(2)
     *
     * We need CLONE_PIDFD and the poll(2) support. We could emulate the
     * P_PIDFD support by reading the PID from /proc/self/fdinfo/n, which works
     * in Linux 5.2, but without poll(2), we can't guarantee the functionality
     * anyway.
     *
     * So we detect by trying to waitid(2) on a positive file descriptor that
     * is definitely closed (INT_MAX). If P_PIDFD is supported, waitid(2) will
     * return EBADF. If it isn't supported, it returns EINVAL (as it would for
     * a negative file descriptor). This will succeed on Linux 5.4.
     *
     * We could have instead detected by the existence of the clone3(2) system
     * call, but for that we would have needed to wait for __NR_clone3 to show
     * up on the libcs. We choose to go via the waitid(2) route, which requires
     * platform-independent constants only. It would have simplified the
     * sys_clone() mess above...
     */
    sys_waitid(P_PIDFD, INT_MAX, NULL, WEXITED|WNOHANG, NULL);
    return errno == EBADF ? 1 : -1;
}

And RHEL-8.x applies the upstream kernel patches in the wrong order, so it breaks Qt's assumption:
* Mon Oct 18 2021 Augusto Caringi acaringi [4.18.0-348.3.el8]
- pidfd: add P_PIDFD to waitid() (Chris von Recklinghausen) [1993665]
--> upstream 5.4 applied first (causes Qt breakage when applied alone)

https://rpmfind.net/linux/RPM/centos/8-stream/baseos/x86_64/Packages/kernel-4.18.0-394.el8.x86_64.html
  - pidfd: add polling support (Oleg Nesterov) [2044587]
--> upstream 5.3 applied later (fixes Qt breakage)

However 4.18.0-348.3 is RHEL-8.5 kernel which does not show the problem; the problem was created by 4.18.0-372 which is RHEL-8.6 kernel.
So there is still a missing factor.
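
A rough Python port of the probe quoted above may help when checking a particular kernel by hand. It is a sketch, not Qt's code: `waitid_accepts_pidfd` is a made-up name, and the fallback value 3 for P_PIDFD is the Linux constant, assumed for Python builds that don't expose `os.P_PIDFD`. As the quoted comment says, a positive result only shows that waitid(2) accepts P_PIDFD. It says nothing about whether poll(2) works on a pidfd, which is the assumption that the out-of-order backport breaks.

```python
import errno
import os

INT_MAX = 2**31 - 1  # a positive descriptor number that is certainly not open


def waitid_accepts_pidfd():
    """Return True if waitid(2) recognises P_PIDFD, mirroring the quoted check."""
    if not hasattr(os, "waitid"):
        return False  # platform without waitid(2) exposed to Python
    p_pidfd = getattr(os, "P_PIDFD", 3)  # assumption: Linux value of P_PIDFD
    try:
        os.waitid(p_pidfd, INT_MAX, os.WEXITED | os.WNOHANG)
    except OSError as exc:
        # EBADF: the kernel understood P_PIDFD and rejected the closed descriptor.
        # EINVAL: the kernel does not know P_PIDFD at all.
        return exc.errno == errno.EBADF
    return False  # no error is not expected for a closed descriptor


print("waitid(P_PIDFD) supported:", waitid_accepts_pidfd())
```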

Comment 15 kartlee 2022-07-13 14:23:50 UTC
Is this issue fixed in RHEL 9.0 release?

-Karthik

Comment 16 Troy Dawson 2022-07-13 14:26:03 UTC
(In reply to kartlee from comment #15)
> Is this issue fixed in RHEL 9.0 release?
This never affected RHEL 9.0, it is a RHEL 8.6 only bug.

Comment 17 Troy Dawson 2022-07-13 14:26:47 UTC
There is progress on getting this fixed in RHEL 8.6, instead of just waiting for RHEL 8.7.
No expected date yet, but the work has started.

Comment 18 Troy Dawson 2022-07-13 14:31:19 UTC
(In reply to may29apr from comment #14)
> However 4.18.0-348.3 is RHEL-8.5 kernel which does not show the problem; the
> problem was created by 4.18.0-372 which is RHEL-8.6 kernel.
> So there is still a missing factor.

Have you tried using kernel-4.18.0-394.el8 from CentOS Stream 8?
https://koji.mbox.centos.org/koji/buildinfo?buildID=21936
That is the current kernel that will go into RHEL 8.7.
It has fixed the problem for me and others.  If it doesn't fix the problem for you, it would be good to know now versus later.

Comment 19 kartlee 2022-07-13 14:45:35 UTC
Thanks Troy for the details.

Comment 20 may29apr 2022-07-14 14:18:19 UTC
What is the "QaTocalypse": 

"Refers to a bug caused by a certain kernel version which inadvertently introduced all kinds of malfunctions for a multitude of applications relying on the Qt framework"

Comment 21 Troy Dawson 2022-08-01 14:29:30 UTC
Beginning of August update on this.
A test kernel has been built for affected customers to test.
I have tested that kernel and it does fix the KDE / SDDM issues.
Since this kernel is still in testing, I have no estimate for its release.

Comment 22 Troy Volin 2022-08-08 10:07:40 UTC
I'm disappointed with the pace of resolving this RHEL 8.6 kernel bug, but obviously the EPEL bugzilla isn't the place for me to whine about it.
Is there a kernel build available that is a release candidate on the 8.6 path (rather than the 8.7 candidate kernel listed in comment 18)?

Thanks

Comment 23 Troy Dawson 2022-08-08 16:13:30 UTC
The largest part of the delay has been miscommunication internally.

The private bug was marked as a kernel bug for RHEL 8.7, which is why the fix is in CentOS Stream 8 right now.
What I, and several others, didn't know was that we needed to set the bug to RHEL 8.6 to get the fix in RHEL 8.6.
So the bug just sat there, for several months.
We wondered why the progress was slow, while the kernel folk thought their work was done.

At one point, the right people got together and the miscommunication was discovered.
At that point, the kernel people started backporting the fix to RHEL 8.6.  But kernel bug fixes take time, mainly for testing.

I wish I had known about the bug settings.  This would have been fixed months ago.
But I didn't, and that's where we are now.

Comment 24 areis 2022-08-22 15:02:40 UTC
We're also experiencing this bug. We confirmed that this bug wasn't present in the last kernel from RHEL 8.5, but is present in the latest kernel from RHEL 8.6. I tested a new version of the kernel from CentOS Stream 8 and confirmed that it is fixed in that version.

What's the latest status of this issue? Can we expect a fix for RHEL 8.6 or will we have to wait for RHEL 8.7?

Comment 25 Josh Boyer 2022-08-22 15:15:35 UTC
For RHEL customers with active accounts, we highly encourage you to report issues in the Customer Portal.  Our support team will be able to link your cases to the proper issues and provide more timely updates.

Comment 26 Troy Dawson 2022-08-24 14:48:56 UTC
Just to update the progress on this.

The fix is now in one of the proposed RHEL 8.6 update kernels, working its way through the testing process before a release.

I'm not going to guess how soon it will be released (I've been wrong each time I've tried to guess), and it's still possible it will get pulled out.
But, this means that it should get released in the RHEL 8.6 timeframe.  I'm hoping sooner, rather than later.

Comment 27 Troy Dawson 2022-09-13 13:36:55 UTC
kernel-4.18.0-372.26.1.el8_6 (the kernel that fixes this problem) has been released in RHEL 8.6 today.

I suspect it will be in Alma and Rocky by the end of this week, or beginning of next week.
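
For anyone scripting a check across machines, the kernel builds reported in this bug can be summarised in a small helper. This is only a sketch: `build_tuple` and `reported_status` are made-up names, and the range boundaries (358 as the first affected build, every 372.x build before 372.26.1 treated as affected, 394 and later treated as fixed) are assumptions read off the comments here, not an official list.

```python
import re


def build_tuple(release):
    """Extract the numeric build fields after '4.18.0-', e.g. (372, 26, 1)."""
    match = re.search(r"4\.18\.0-(\d+(?:\.\d+)*)", release)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def reported_status(release):
    """Classify an el8 kernel release string using the builds named in this bug."""
    build = build_tuple(release)
    if build is None:
        return "unknown"  # not a 4.18.0 (el8) kernel string
    if build[0] == 372:
        # RHEL 8.6 stream: comment 27 names 372.26.1 as the fixed build.
        return "fixed" if build >= (372, 26, 1) else "affected"
    if 358 <= build[0] < 394:
        # Reported broken: 358, 365, 373, 383. Assumption: so is everything between.
        return "affected"
    if build[0] >= 394:
        # Reported fixed in CentOS Stream 8 with 394.
        return "fixed"
    return "not affected"  # e.g. 348, reported as working


for name in ("4.18.0-348.2.1.el8_5.x86_64", "4.18.0-358.el8.x86_64",
             "4.18.0-372.26.1.el8_6", "4.18.0-394.el8.x86_64"):
    print(name, "->", reported_status(name))
```

On a live system the input would typically come from `uname -r`.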

Comment 28 Christoph Karl 2022-09-13 14:40:38 UTC
"Unable to unlock your screen" works now for me (RHEL8)

Comment 29 Troy Dawson 2022-09-16 14:43:57 UTC
It's been a long time coming.  The fix is out and available.
Thank you all for your patience.

Comment 30 Troy Volin 2022-10-03 08:42:44 UTC
Thanks for your persistence.

