1. Please describe the problem:
After upgrading to kernel 5.12.6 / 5.12.7, my system hangs after a random amount of time, but always within minutes.
2. What is the Version-Release number of the kernel:
5.12.7
3. Did it work previously in Fedora? If so, in what kernel version did the issue *first* appear? Old kernels are available for download at https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :
All is fine on 5.11.20; this happens since 5.12.6.
4. Can you reproduce this issue? If so, please provide the steps to reproduce the issue below:
- Boot the system into kernel 5.12.7
- Log in to X
- Within minutes the system is frozen
5. Does this problem occur with the latest Rawhide kernel? To install the Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by ``sudo dnf update --enablerepo=rawhide kernel``:
Not tested yet; will do.
6. Are you running any modules that are not shipped directly with Fedora's kernel?:
nvidia
7. Please attach the kernel logs. You can get the complete kernel log for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the issue occurred on a previous boot, use the journalctl ``-b`` flag.
The system freezes hard; no logs are available. However, at one point I noticed nvme-module-related errors on the console. No details are available because they scrolled out of sight. I tried kdump and SysRq, but so far nothing could be extracted.
I know this report lacks information, so I consider it a placeholder for a) more info to be added later and b) other people who have similar issues.
Hardware:
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers (rev 05)
00:01.0 PCI bridge: Intel Corporation 6th-10th Gen Core Processor PCIe Controller (x16) (rev 05)
00:14.0 USB controller: Intel Corporation 200 Series/Z370 Chipset Family USB 3.0 xHCI Controller
00:16.0 Communication controller: Intel Corporation 200 Series PCH CSME HECI #1
00:17.0 SATA controller: Intel Corporation 200 Series PCH SATA controller [AHCI mode]
00:1b.0 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #21 (rev f0)
00:1c.0 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #5 (rev f0)
00:1c.7 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #8 (rev f0)
00:1d.0 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #9 (rev f0)
00:1f.0 ISA bridge: Intel Corporation 200 Series PCH LPC Controller (B250)
00:1f.2 Memory controller: Intel Corporation 200 Series/Z370 Chipset Family Power Management Controller
00:1f.3 Audio device: Intel Corporation 200 Series PCH HD Audio
00:1f.4 SMBus: Intel Corporation 200 Series/Z370 Chipset Family SMBus Controller
01:00.0 VGA compatible controller: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GP107GL High Definition Audio Controller (rev a1)
02:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
Checked Rawhide kernel 5.13.0-0.rc3.20210527gitad9f25d33860.28.fc35: the system froze as well, possibly faster. After removing the proprietary nvidia drivers, the system still froze.
This might be a case for using netconsole with a second computer that receives the kernel messages over the network. https://wiki.archlinux.org/title/General_troubleshooting#netconsole
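A minimal sketch of such a setup, based on the parameter format described in the kernel's netconsole documentation. The helper name netconsole_param is hypothetical, and every port, IP address, the interface name eth0 and the MAC address below are placeholders; substitute the real values of the local network.

```shell
#!/usr/bin/env bash
# Compose the netconsole module parameter. Documented format:
#   netconsole=[+][src-port]@[src-ip]/[<dev>],[tgt-port]@<tgt-ip>/[tgt-macaddr]
# netconsole_param is a hypothetical helper; all values passed to it are placeholders.
netconsole_param() {
    local src_port=$1 src_ip=$2 dev=$3 tgt_port=$4 tgt_ip=$5 tgt_mac=$6
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' \
        "$src_port" "$src_ip" "$dev" "$tgt_port" "$tgt_ip" "$tgt_mac"
}

# On the machine that freezes (as root):
#   dmesg -n 8    # raise the console loglevel so all kernel messages are sent
#   modprobe netconsole "$(netconsole_param 6665 192.168.1.10 eth0 6666 192.168.1.20 00:11:22:33:44:55)"
# On the second computer, listen on the target UDP port and keep a copy:
#   nc -u -l 6666 | tee netconsole.log
```

Flags for listening on UDP differ between netcat variants, so the receiver command may need adjusting. Passing an empty target MAC leaves the field blank, in which case netconsole falls back to broadcast.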
An alternative is to independently confirm that 5.12.5 works and 5.12.6 fails by compiling the upstream source and, if confirmed, to run a git bisect to find the commit that broke 5.12.6.
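A minimal sketch of that bisect, assuming a clone of the linux-stable tree (which carries the v5.12.5 and v5.12.6 tags) in the current directory and that reusing the running Fedora kernel's config from /boot is acceptable. The function names are hypothetical helpers, not part of git or the kernel build system.

```shell
#!/usr/bin/env bash
# Assumed setup (not run here):
#   git clone https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
#   cd linux

# Start bisecting: git expects the bad revision first, then the good one.
bisect_begin() {
    git bisect start "${1:-v5.12.6}" "${2:-v5.12.5}"
}

# Build and install the revision git has checked out; stops at the first failure.
build_and_install() {
    cp "/boot/config-$(uname -r)" .config &&
        make olddefconfig &&
        make -j"$(nproc)" &&
        sudo make modules_install install
}

# Record the outcome of a test boot: "good" if the system stayed up, "bad" if it froze.
bisect_mark() {
    case "$1" in
        good|bad) git bisect "$1" ;;
        *) echo "usage: bisect_mark good|bad" >&2; return 2 ;;
    esac
}
```

The loop is: run bisect_begin once, then repeatedly build_and_install, reboot into the new kernel, and call bisect_mark bad or bisect_mark good until git reports the first bad commit; git bisect reset returns the tree to its original state. Because the hang appears after a random delay, each test boot needs a generous observation period, since marking a kernel good too early sends the search in the wrong direction.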
Created attachment 1788187: crash1 oops
Created attachment 1788188: crash2 oops
netconsole proved really useful. I attached two oopses, both related to bcache. I'll reach out to the bcache developers.
https://www.spinics.net/lists/linux-bcache/msg10224.html
https://www.spinics.net/lists/linux-bcache/msg10127.html: "This is caused by a hidden issue which is triggered by the bio code change in v5.12. The attached patch can help to avoid the panic, and the finally fixes are under testing and will be posted very soon."
Fixed in kernel 5.12.11-200