Bug 2008529
Summary: | System freeze with kernel 5.14.x | |
---|---|---|---
Product: | [Fedora] Fedora | Reporter: | Sammy <umar>
Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint>
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa>
Severity: | high | Docs Contact: |
Priority: | unspecified | |
Version: | 34 | CC: | acaringi, adscvr, airlied, alciregi, bskeggs, ego.cordatus, essin, geraldo.simiao.kutz, hdegoede, jarodwilson, jeremy, jforbes, jglisse, jonathan, josef, kernel-maint, kowaleskij, lgoncalv, linville, masami256, mchehab, nberrehouc, projects.rg, ptalbert, rkudyba, samar.vaishampayan, sanjay.ankur, steved
Target Milestone: | --- | |
Target Release: | --- | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | kernel-5.14.9-200.fc34 kernel-5.14.9-300.fc35 kernel-5.14.9-100.fc33 | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2021-10-02 01:28:23 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: |
|
Created attachment 1827003 [details]
lspci
Looks like this bug: https://bugzilla.kernel.org/show_bug.cgi?id=214503

Indeed, for me same problem only with kernel 5.14.7 on F34 and F35. I will try https://bugzilla.kernel.org/show_bug.cgi?id=214503#c14 .

I just built https://koji.fedoraproject.org/koji/taskinfo?taskID=76424019 with the revert from that bug. Want to give it a spin and see if that solves your issue? It is a scratch build, so not secure boot signed. Trying to get a reasonable solution before I push a new build.

Will do and report....I had been using the koji test kernels up to 5.14.5-300 for fc34 without the problem; the next release kernel, 5.14.7-200, is the one that started the problem. Considering that the patch was applied in 5.14.6, it fits the profile too. By the way, the new kernels are not cleanly removed anymore, leaving a residual file called modules.builtin.alias.bin which prevents the removal of the kernel directory.

Working OK for 3 hours.....let's wait till morning here to be certain.

Same problem with kernel-5.14.8-300.fc35.x86_64.

    # cat /sys/block/*/queue/scheduler
    mq-deadline kyber [bfq] none
    mq-deadline kyber [bfq] none
    mq-deadline kyber [bfq] none
    mq-deadline kyber [bfq] none
    mq-deadline kyber [bfq] none
    mq-deadline kyber [bfq] none
    none

If after starting kernel-5.14.8-300.fc35.x86_64 I apply the command below, then I have no more freezes; it seems OK for the moment.

    # echo mq-deadline | tee /sys/block/*/queue/scheduler
    # cat /sys/block/*/queue/scheduler
    [mq-deadline] kyber bfq none
    [mq-deadline] kyber bfq none
    [mq-deadline] kyber bfq none
    [mq-deadline] kyber bfq none
    [mq-deadline] kyber bfq none
    [mq-deadline] kyber bfq none
    none

I am aware that 5.14.8-300.fc35 is still broken, which is why it is not in an update. I specifically asked about the 5.14.8-200.fc34 scratch build linked here, which is not the same as the 5.14.8 official builds in koji.

Yes, I know. Using your unofficial build the system has been stable for almost 17 hours now....the problem seems to be resolved! Thanks.
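For anyone who wants to check which I/O scheduler is active per device before or after trying the mq-deadline workaround above, here is a minimal shell sketch. The function names `active_scheduler` and `list_schedulers` are made up for illustration; the only thing assumed about the system is the sysfs format shown in the comment above, where the active scheduler is the one in square brackets and a device with a single choice may show a bare `none`.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any Fedora tool.

# Print the active scheduler from one line of /sys/block/<dev>/queue/scheduler.
# The active entry is the bracketed one, e.g. "mq-deadline kyber [bfq] none".
# If nothing is bracketed (e.g. a bare "none"), print the line unchanged.
active_scheduler() {
    case "$1" in
        *\[*\]*)
            s=${1#*\[}                 # drop everything up to the first "["
            printf '%s\n' "${s%%\]*}"  # drop everything from the first "]"
            ;;
        *)
            printf '%s\n' "$1"
            ;;
    esac
}

# Print "<device>: <active scheduler>" for every block device.
# The optional argument overrides the sysfs root (handy for testing).
list_schedulers() {
    sysfs_root=${1:-/sys/block}
    for f in "$sysfs_root"/*/queue/scheduler; do
        [ -r "$f" ] || continue
        dev=${f#"$sysfs_root"/}
        dev=${dev%%/*}
        printf '%s: %s\n' "$dev" "$(active_scheduler "$(cat "$f")")"
    done
}
```

Running `list_schedulers` with no argument reads the real `/sys/block`. Note that a scheduler set by writing to sysfs does not survive a reboot, so this only helps to confirm the state of the current boot.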
*** Bug 2008844 has been marked as a duplicate of this bug. ***

Sorry, it’s my fault: I confused the -200 and -300 versions of kernel 5.14.8 when I downloaded the packages. It seems to be fine with the special 5.14.8-200.fc34.x86_64 from jforbes. Scheduler is [bfq]. 5 hours uptime without a freeze.

Another victim found. Here for me the problem is with Kernel: 5.14.8-300.fc35.x86_64 on F35 KDE (upgraded from a 34 install) with BTRFS. Today I experienced a crash when a KVM guest was running on my Aspire V3-571 V2.11 Intel i7-3632QM (8) @ 3.200GHz. The system ran fine for more than three hours before I started the VM; then, after some 20 minutes of testing something on the guest machine, the host OS hard locked up, requiring me to hold the power button down. I can't get anything from dmesg or from journalctl. I don't have secure boot enabled:
mokutil --sb-state =>SecureBoot disabled

FEDORA-2021-07f46cd951 has been submitted as an update to Fedora 35. https://bodhi.fedoraproject.org/updates/FEDORA-2021-07f46cd951

FEDORA-2021-884d245ef8 has been submitted as an update to Fedora 34. https://bodhi.fedoraproject.org/updates/FEDORA-2021-884d245ef8

FEDORA-2021-e0d6215753 has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2021-e0d6215753

FEDORA-2021-07f46cd951 has been pushed to the Fedora 35 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-07f46cd951` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-07f46cd951 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

FEDORA-2021-884d245ef8 has been pushed to the Fedora 34 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-884d245ef8` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-884d245ef8 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

FEDORA-2021-e0d6215753 has been pushed to the Fedora 33 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-e0d6215753` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-e0d6215753 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Yesterday I experienced slow internet and as a consequence could not authenticate with my Fedora ID to upload relval results. I had to shut down and reboot my system to get good internet speed. This issue might be attributed to the new kernels of the pre-release F35.

By the way....there seem to be 2 Sammys, the original bug opener and the one from comment #21. Please be aware. Comment #21 is not related to this bug.

Justin, it seems that with kernel 5.14.9 (https://bodhi.fedoraproject.org/updates/FEDORA-2021-07f46cd951) the problems are gone. All working fine here since yesterday: running VMs, doing tests, working as usual, and no freezes. Suspend is working fine too. Here's my setup:
Operating System: Fedora Linux 35
KDE Plasma Version: 5.22.5
KDE Frameworks Version: 5.85.0
Qt Version: 5.15.2
Kernel Version: 5.14.9-300.fc35.x86_64 (64-bit)
Graphics Platform: X11
Processors: 8 × Intel® Core™ i7-3632QM CPU @ 2.20GHz
Memory: 15.4 GiB of RAM
Graphics Processor: Mesa Intel® HD Graphics 4000

No more bug here for me. Indeed, no more freezes with kernel-5.14.9-300.fc35.x86_64. Good job!

FEDORA-2021-884d245ef8 has been pushed to the Fedora 34 stable repository.
If problem still persists, please make note of it in this bug report.

FEDORA-2021-07f46cd951 has been pushed to the Fedora 35 stable repository. If problem still persists, please make note of it in this bug report.

FEDORA-2021-e0d6215753 has been pushed to the Fedora 33 stable repository. If problem still persists, please make note of it in this bug report.

I am experiencing this exact issue on 5.14.14-300.fc35.x86_64. System Info:
Operating System: Fedora Linux 35
KDE Plasma Version: 5.22.5
KDE Frameworks Version: 5.87.0
Qt Version: 5.15.2
Kernel Version: 5.14.14-300.fc35.x86_64 (64-bit)
Graphics Platform: Wayland
Processors: 8 × 11th Gen Intel® Core™ i7-1165G7 @ 2.80GHz
Memory: 31.1 GiB of RAM
Graphics Processor: Mesa Intel® Xe Graphics

I have what may be the same problem with Linux fedora 5.14.18-300.fc35.x86_64. This is the scenario:
1 - Wake system from suspend
2 - dnf update
3 - Apply updates. The problem happens when the kernel or sometimes other modules are replaced. The update never prompts to reboot (good or bad?)
4 - Then attempt some other operation such as: open Firefox, run df, etc...
5 - System freezes, requiring a forced shutdown and reboot.

This has been happening for about the past month. This is the output of lshw -short:

H/W path | Device | Class | Description
---|---|---|---
 | | system | System Product Name (SKU)
/0 | | bus | PRIME Z590-A
/0/0 | | memory | 64KiB BIOS
/0/4d | | memory | 64GiB System Memory
/0/4d/0 | | memory | [empty]
/0/4d/1 | | memory | 32GiB DIMM DDR4 Synchronous 2133 MHz (0.5 n
/0/4d/2 | | memory | [empty]
/0/4d/3 | | memory | 32GiB DIMM DDR4 Synchronous 2133 MHz (0.5 n
/0/5e | | memory | 640KiB L1 cache
/0/5f | | memory | 2560KiB L2 cache
/0/60 | | memory | 20MiB L3 cache
/0/61 | | processor | Intel(R) Core(TM) i9-10850K CPU @ 3.60GHz
/0/100 | | bridge | Intel Corporation
/0/100/2 | /dev/fb0 | display | CometLake-S GT2 [UHD Graphics 630]
/0/100/14 | | bus | Tiger Lake-H USB 3.2 Gen 2x1 xHCI Host Cont
/0/100/14/0 | usb1 | bus | xHCI Host Controller
/0/100/14/0/2 | | bus | USB2.0 Hub
/0/100/14/0/2/3 | scsi6 | storage | USB 2.0 FD
/0/100/14/0/2/3/0.0.0 | /dev/sde | disk | 16GB USB 2.0 FD
/0/100/14/0/2/3/0.0.0/0 | /dev/sde | disk | 16GB
/0/100/14/0/2/3/0.0.0/0/2 | /dev/sde2 | volume | 15EiB Windows FAT volume
/0/100/14/0/4 | | input | AURA LED Controller
/0/100/14/0/7 | | bus | 4-Port USB 2.1 Hub
/0/100/14/0/7/4 | | bus | 4-Port USB 2.1 Hub
/0/100/14/0/7/4/4 | | bus | 4-Port USB 2.1 Hub
/0/100/14/0/8 | | bus | USB2.0 Hub
/0/100/14/0/8/1 | | bus | USB2.0 Hub
/0/100/14/0/c | | bus | USB2.0 Hub
/0/100/14/0/c/1 | | input | Yubico Yubikey II
/0/100/14/0/c/2 | | input | USB Optical Mouse
/0/100/14/0/c/4 | | input | Das Keyboard
/0/100/14/1 | usb2 | bus | xHCI Host Controller
/0/100/14/1/6 | | bus | 4-Port USB 3.1 Hub
/0/100/14/1/6/4 | | bus | 4-Port USB 3.1 Hub
/0/100/14/1/6/4/4 | | bus | 4-Port USB 3.1 Hub
/0/100/14/1/7 | | bus | USB3.1 Hub
/0/100/14/1/7/1 | | bus | USB-C Dual Drive Dock
/0/100/14/1/7/1/1 | scsi8 | storage | Dual Drive Dock 2
/0/100/14/1/7/1/1/0.0.0 | /dev/sdd | disk | 750GB Drive Dock 2
/0/100/14/1/7/1/1/0.0.0/1 | /dev/sdd1 | volume | 698GiB EXT4 volume
/0/100/14/1/7/2 | scsi7 | storage | Mobius Pro 2C
/0/100/14/1/7/2/0.0.0 | /dev/sdb | disk | 3TB Pro 2C Disk 1
/0/100/14/1/7/2/0.0.0/1 | /dev/sdb1 | volume | 2794GiB EXT4 volume
/0/100/14/1/7/2/0.0.1 | /dev/sdc | disk | 4TB Pro 2C Disk 2
/0/100/14/1/7/2/0.0.1/1 | /dev/sdc1 | volume | 3726GiB EXT4 volume
/0/100/14.2 | | memory | RAM memory
/0/100/15 | | bus | Tiger Lake-H Serial IO I2C Controller #0
/0/100/15.1 | | bus | Intel Corporation
/0/100/16 | | communication | Tiger Lake-H Management Engine Interface
/0/100/17 | scsi0 | storage | Intel Corporation
/0/100/17/0 | /dev/sda | disk | 1TB Samsung SSD 860
/0/100/17/0/1 | /dev/sda1 | volume | 99MiB Windows FAT volume
/0/100/17/0/2 | /dev/sda2 | volume | 15MiB reserved partition
/0/100/17/0/3 | /dev/sda3 | volume | 149GiB Windows NTFS volume
/0/100/17/0/5 | /dev/sda5 | volume | 781GiB EXT4 volume
/0/100/17/1 | /dev/cdrom | disk | DVD+-RW DVD8881
/0/100/1b | | bridge | Intel Corporation
/0/100/1b.2 | | bridge | Intel Corporation
/0/100/1b.2/0 | enp2s0 | network | Ethernet Controller I225-V
/0/100/1c | | bridge | Intel Corporation
/0/100/1d | | bridge | Tiger Lake-H PCI Express Root Port #9
/0/100/1f | | bridge | Intel Corporation
/0/100/1f.3 | | multimedia | Intel Corporation
/0/100/1f.4 | | bus | Tiger Lake-H SMBus Controller
/0/100/1f.5 | | bus | Tiger Lake-H SPI Controller

I have another system with the same OS and application software but with a Ryzen 5 5600G processor. That configuration has yet to exhibit this behavior. What does anyone suppose is causing this? Video? Or something else?
|
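Since the reports above span several kernel versions, a quick way to tell whether a running kernel is at or past the 5.14.9 builds listed under "Fixed In Version" may be useful. The sketch below is only an illustration: `kernel_at_least` is a made-up helper name, and it relies on GNU `sort -V` being available. Passing the check does not prove the freeze is gone, as the later reports on 5.14.14 and 5.14.18 show.

```shell
#!/bin/sh
# Hypothetical helper, not part of any Fedora tool.

# Succeed if kernel release $1 (e.g. the output of `uname -r`) is at least
# upstream version $2. Uses GNU `sort -V` (version sort) for the comparison.
kernel_at_least() {
    have=${1%%-*}   # strip the release suffix, e.g. "-300.fc35.x86_64"
    want=$2
    lowest=$(printf '%s\n%s\n' "$have" "$want" | sort -V | head -n 1)
    [ "$lowest" = "$want" ]
}

# Example on a live system:
#   kernel_at_least "$(uname -r)" 5.14.9 && echo "at or past the fixed build"
```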
Created attachment 1827002 [details]
journalctl -b

I am having complete system freezes with kernels 5.14.x on FC34. The system runs fine on 5.13.x kernels.

The progression is weird. After rebooting I can work for a long time on the system, but a few hours after I leave it the freeze starts and progresses to a full hard freeze requiring a reboot. The unusual thing is that the freeze is somewhat gradual: first I notice a major slowdown of http and ssh connections, then a full stop while ping still responds for a while, then complete silence. When I come back to the office the workstation requires a power off/on (power is still on) to restart. It almost resembles sleep/hibernation...however this does not happen with 5.13 kernels. This happens whether I leave the system logged in or logged out. Running sddm.

I have done a full system diagnostics and found no problems. The system is a DELL PRECISION 7920 (bought this year) with the latest BIOS.

I have seen some posts that suggest using pcie_aspm=off on the kernel command line. I am trying this today, but this is my server system so I don't want to leave it idle very long. It runs sshd, httpd, and postfix servers.

I am attaching journalctl -b and lspci outputs (logs show nothing when the system freezes).
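To confirm that a parameter such as pcie_aspm=off actually reached the booted kernel, one can look for it in /proc/cmdline. A minimal sketch follows; `has_kernel_arg` is a made-up helper name and it assumes the parameters are separated by single spaces, which is the usual form of the kernel command line.

```shell
#!/bin/sh
# Hypothetical helper, not part of any Fedora tool.

# Succeed if the kernel command line in $1 contains the parameter $2 as a
# whole word, so "pcie_aspm=off" does not match "pcie_aspm=offx".
has_kernel_arg() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example on a live system:
#   has_kernel_arg "$(cat /proc/cmdline)" pcie_aspm=off && echo "pcie_aspm=off is set"
# On Fedora the parameter is commonly added persistently with grubby, e.g.:
#   sudo grubby --update-kernel=ALL --args="pcie_aspm=off"
```

The check only shows that the parameter was passed at boot; whether it changes the freeze behaviour is a separate question that this report does not answer.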