Bug 2008529

Summary: System freeze with kernel 5.14.x
Product: [Fedora] Fedora
Reporter: Sammy <umar>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Docs Contact:
Priority: unspecified
Version: 34
CC: acaringi, adscvr, airlied, alciregi, bskeggs, ego.cordatus, essin, geraldo.simiao.kutz, hdegoede, jarodwilson, jeremy, jforbes, jglisse, jonathan, josef, kernel-maint, kowaleskij, lgoncalv, linville, masami256, mchehab, nberrehouc, projects.rg, ptalbert, rkudyba, samar.vaishampayan, sanjay.ankur, steved
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: kernel-5.14.9-200.fc34 kernel-5.14.9-300.fc35 kernel-5.14.9-100.fc33
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-10-02 01:28:23 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description    Flags
journalctl -b  none
lspci          none

Description Sammy 2021-09-28 13:14:44 UTC
Created attachment 1827002 [details]
journalctl -b

I am getting complete system freezes with 5.14.x kernels on Fedora 34. The system runs fine on 5.13.x kernels.

The progression is weird. After rebooting I can work on the system for a long time, but a few hours after I leave it the freeze starts and ends in a full hard freeze requiring a reboot. The unusual thing is that the freeze is somewhat gradual: first I notice a major slowdown of http and ssh connections, then a full stop while ping still responds for a while, then complete silence. When I come back to the office, the workstation needs a power off/on (power is still on) to restart.
It almost resembles sleep/hibernation; however, this does not happen with 5.13 kernels.

This happens whether I leave the system logged in or logged out. I am running sddm.

I have run full system diagnostics and found no problems. The system is a DELL PRECISION 7920 (bought this year) with the latest BIOS.

I have seen some posts suggesting pcie_aspm=off on the kernel command line. I am trying this today, but this is my server system, so I don't want to leave it idle for very long. It runs sshd, httpd, and postfix servers.
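
For anyone trying the same workaround, here is a minimal sketch for confirming that the parameter actually reached the running kernel. It assumes the parameter was added to the boot entry beforehand (on Fedora this is typically done with `grubby --update-kernel=ALL --args="pcie_aspm=off"` followed by a reboot); `has_kernel_arg` is a hypothetical helper name, not an existing tool.

```sh
# Sketch: report whether a kernel parameter is present on a command line.
# has_kernel_arg is a hypothetical helper, not part of any package.
has_kernel_arg() {
    # $1 = parameter to look for, e.g. pcie_aspm=off
    # $2 = command line text; defaults to the running kernel's /proc/cmdline
    arg=$1
    cmdline=${2-$(cat /proc/cmdline 2>/dev/null)}
    # Compare whole words only, so pcie_aspm=off does not match pcie_aspm=offline
    for word in $cmdline; do
        [ "$word" = "$arg" ] && return 0
    done
    return 1
}

if has_kernel_arg pcie_aspm=off; then
    echo "pcie_aspm=off is active on the running kernel"
else
    echo "pcie_aspm=off is not on the running kernel's command line"
fi
```

The check only shows that the option was passed at boot; it says nothing about whether ASPM is related to the freeze.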

I am attaching journalctl -b and lspci outputs (logs show nothing when system freezes).

Comment 1 Sammy 2021-09-28 13:16:38 UTC
Created attachment 1827003 [details]
lspci

Comment 2 Sammy 2021-09-28 15:33:13 UTC
Looks like this bug:

https://bugzilla.kernel.org/show_bug.cgi?id=214503

Comment 3 Nicolas Berrehouc 2021-09-28 18:34:03 UTC
Indeed, same problem for me, but only with kernel 5.14.7, on F34 and F35.
I will try https://bugzilla.kernel.org/show_bug.cgi?id=214503#c14 .

Comment 4 Justin M. Forbes 2021-09-28 19:10:38 UTC
I just built https://koji.fedoraproject.org/koji/taskinfo?taskID=76424019 with the revert from that bug. Want to give it a spin and see if that solves your issue? It is a scratch build, so not secure boot signed.  Trying to get a reasonable solution before I push a new build.

Comment 5 Sammy 2021-09-28 19:52:34 UTC
Will do and report. I had been using the koji test kernels up to 5.14.5-300 for fc34 without the problem; the next release kernel, 5.14.7-200, is where the problem started. Considering that the patch was applied in 5.14.6, it fits the profile too.
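
The version reasoning above can be sketched as a small check of whether a given kernel release falls in the suspect range. The lower bound (5.14.6) comes from this comment; the upper bound (5.14.8) is an assumption taken from this bug's Fixed In Version field, which lists 5.14.9. `version_le` and `in_suspect_range` are hypothetical helper names, and the comparison relies on GNU `sort -V`.

```sh
# version_le A B: true when version A sorts at or before version B.
version_le() {
    [ "$1" = "$2" ] && return 0
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# in_suspect_range RELEASE: true for upstream versions 5.14.6 through 5.14.8.
# Assumption: the range is inferred from this bug report, not from upstream.
in_suspect_range() {
    # Drop the package release suffix: 5.14.7-200.fc34.x86_64 -> 5.14.7
    base=${1%%-*}
    version_le 5.14.6 "$base" && version_le "$base" 5.14.8
}

release=$(uname -r)
if in_suspect_range "$release"; then
    echo "$release is in the suspect 5.14.6 to 5.14.8 range"
else
    echo "$release is outside the suspect range"
fi
```

A version check like this cannot tell patched builds apart from unpatched ones with the same version number, so a scratch build carrying the revert would still be reported as suspect.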

Comment 6 Sammy 2021-09-28 19:59:58 UTC
By the way, the new kernels are no longer removed cleanly. They leave behind a residual file called

modules.builtin.alias.bin

which prevents the removal of the kernel directory.
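
A rough sketch for spotting such leftovers follows. It assumes the "kernel directory" is the per-release directory under /lib/modules and that a leftover directory contains nothing but modules.builtin.alias.bin; neither detail is stated above. `list_leftover_module_dirs` is a hypothetical helper name, and the script only lists directories without deleting anything.

```sh
# List module directories that contain only the residual file.
# Assumption: leftovers live under /lib/modules/<release>/.
list_leftover_module_dirs() {
    root=${1:-/lib/modules}
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue
        # Leftover = the residual file exists and is the only entry
        if [ -f "${dir}modules.builtin.alias.bin" ] &&
           [ $(ls -A "$dir" | wc -l) -eq 1 ]; then
            printf '%s\n' "${dir%/}"
        fi
    done
}

list_leftover_module_dirs
```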

Comment 7 Sammy 2021-09-28 22:54:43 UTC
Working OK for 3 hours.....let's wait till morning here to be certain.

Comment 8 Nicolas Berrehouc 2021-09-29 05:23:55 UTC
Same problem with kernel-5.14.8-300.fc35.x86_64.

# cat /sys/block/*/queue/scheduler
mq-deadline kyber [bfq] none
mq-deadline kyber [bfq] none
mq-deadline kyber [bfq] none
mq-deadline kyber [bfq] none
mq-deadline kyber [bfq] none
mq-deadline kyber [bfq] none
none

If I apply the command below after booting kernel-5.14.8-300.fc35.x86_64, I have no more freezes; it seems OK for the moment.

# echo mq-deadline | tee /sys/block/*/queue/scheduler

# cat /sys/block/*/queue/scheduler
[mq-deadline] kyber bfq none
[mq-deadline] kyber bfq none
[mq-deadline] kyber bfq none
[mq-deadline] kyber bfq none
[mq-deadline] kyber bfq none
[mq-deadline] kyber bfq none
none
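
The listings above mark the active scheduler with square brackets. As a minimal sketch, the loop below prints one line per block device with just the active scheduler, which makes it easier to confirm the switch took effect. `active_scheduler` is a hypothetical helper name; devices whose file contains only `none` are printed as-is.

```sh
# Extract the bracketed (active) scheduler from one line of
# /sys/block/<dev>/queue/scheduler, e.g. "mq-deadline kyber [bfq] none" -> bfq.
# Prints nothing when the line has no bracketed entry.
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

for f in /sys/block/*/queue/scheduler; do
    [ -r "$f" ] || continue
    dev=${f#/sys/block/}
    dev=${dev%%/*}
    line=$(cat "$f")
    sched=$(active_scheduler "$line")
    # Fall back to the raw line for devices that only report "none"
    printf '%s: %s\n' "$dev" "${sched:-$line}"
done
```

Writing to the sysfs file as shown in the comment is not persistent, so the setting has to be reapplied after every boot.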

Comment 9 Justin M. Forbes 2021-09-29 11:53:32 UTC
I am aware that 5.14.8-300.fc35 is still broken, which is why it is not in an update. I specifically asked about the 5.14.8-200.fc34 scratch build linked here, which is not the same as the official 5.14.8 builds in koji.

Comment 10 Sammy 2021-09-29 12:25:45 UTC
Yes, I know. Using your unofficial build, the system has been stable for almost 17 hours now. The problem seems to be resolved! Thanks.

Comment 11 Justin M. Forbes 2021-09-29 13:06:23 UTC
*** Bug 2008844 has been marked as a duplicate of this bug. ***

Comment 12 Nicolas Berrehouc 2021-09-29 18:12:32 UTC
Sorry it’s my fault, I confused the -200 and -300 versions of kernel 5.14.8 when I downloaded the packages.

It seems to be fine with the special 5.14.8-200.fc34.x86_64 build from jforbes. Scheduler is [bfq]. 5 hours of uptime without a freeze.

Comment 13 Raphael Groner 2021-09-30 13:58:22 UTC
Another victim found.

Comment 14 Geraldo Simião 2021-09-30 17:32:01 UTC
For me the problem occurs with kernel 5.14.8-300.fc35.x86_64 on F35 KDE (upgraded from a 34 install) with BTRFS.

Today I experienced a crash while a KVM guest was running on my Aspire V3-571 V2.11 Intel i7-3632QM (8) @ 3.200GHz.
It had been running fine for more than three hours before I started the VM. Then, after some 20 minutes of testing something on the guest machine, the host OS hard locked up, requiring me to hold the power button down. I can't get anything from dmesg or from journalctl.
I don't have secure boot enabled: mokutil --sb-state =>SecureBoot disabled

Comment 15 Fedora Update System 2021-09-30 21:45:16 UTC
FEDORA-2021-07f46cd951 has been submitted as an update to Fedora 35. https://bodhi.fedoraproject.org/updates/FEDORA-2021-07f46cd951

Comment 16 Fedora Update System 2021-09-30 21:46:19 UTC
FEDORA-2021-884d245ef8 has been submitted as an update to Fedora 34. https://bodhi.fedoraproject.org/updates/FEDORA-2021-884d245ef8

Comment 17 Fedora Update System 2021-09-30 21:46:22 UTC
FEDORA-2021-e0d6215753 has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2021-e0d6215753

Comment 18 Fedora Update System 2021-10-01 01:39:57 UTC
FEDORA-2021-07f46cd951 has been pushed to the Fedora 35 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-07f46cd951`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-07f46cd951

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 19 Fedora Update System 2021-10-01 02:20:11 UTC
FEDORA-2021-884d245ef8 has been pushed to the Fedora 34 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-884d245ef8`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-884d245ef8

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 20 Fedora Update System 2021-10-01 02:21:47 UTC
FEDORA-2021-e0d6215753 has been pushed to the Fedora 33 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-e0d6215753`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-e0d6215753

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 21 sammy 2021-10-01 04:24:51 UTC
Yesterday I experienced slow internet and, as a consequence, could not authenticate my Fedora ID to upload relval results. I had to shut down and reboot my system to get good internet speed. This issue might be attributable to the new kernels of the pre-release F35.

Comment 22 Sammy 2021-10-01 13:28:47 UTC
By the way, there seem to be two Sammys: the original bug opener and the one from comment #21. Please be aware.

Comment 23 Justin M. Forbes 2021-10-01 16:47:37 UTC
Comment #21 is not related to this bug.

Comment 24 Geraldo Simião 2021-10-01 17:26:32 UTC
Justin, it seems that with kernel 5.14.9 (https://bodhi.fedoraproject.org/updates/FEDORA-2021-07f46cd951) the problems are gone.
All working fine here, since yesterday, running VMs, doing tests, working as usual and no freezes. Suspend is working fine too.

Here's my setup:

Operating System: Fedora Linux 35
KDE Plasma Version: 5.22.5
KDE Frameworks Version: 5.85.0
Qt Version: 5.15.2
Kernel Version: 5.14.9-300.fc35.x86_64 (64-bit)
Graphics Platform: X11
Processors: 8 × Intel® Core™ i7-3632QM CPU @ 2.20GHz
Memory: 15.4 GiB of RAM
Graphics Processor: Mesa Intel® HD Graphics 4000

No more bug here for me.

Comment 25 Nicolas Berrehouc 2021-10-01 19:24:39 UTC
Indeed, no more freezes with kernel-5.14.9-300.fc35.x86_64. Good job!

Comment 26 Fedora Update System 2021-10-02 01:28:23 UTC
FEDORA-2021-884d245ef8 has been pushed to the Fedora 34 stable repository.
If the problem still persists, please make note of it in this bug report.

Comment 27 Fedora Update System 2021-10-02 01:30:47 UTC
FEDORA-2021-07f46cd951 has been pushed to the Fedora 35 stable repository.
If the problem still persists, please make note of it in this bug report.

Comment 28 Fedora Update System 2021-10-03 01:06:04 UTC
FEDORA-2021-e0d6215753 has been pushed to the Fedora 33 stable repository.
If the problem still persists, please make note of it in this bug report.

Comment 29 Jason 2021-11-03 05:50:41 UTC
I am experiencing this exact issue on 5.14.14-300.fc35.x86_64.

System Info:

Operating System: Fedora Linux 35
KDE Plasma Version: 5.22.5
KDE Frameworks Version: 5.87.0
Qt Version: 5.15.2
Kernel Version: 5.14.14-300.fc35.x86_64 (64-bit)
Graphics Platform: Wayland
Processors: 8 × 11th Gen Intel® Core™ i7-1165G7 @ 2.80GHz
Memory: 31.1 GiB of RAM
Graphics Processor: Mesa Intel® Xe Graphics

Comment 30 essin 2021-11-23 00:23:23 UTC
I have what may be the same problem with Linux fedora 5.14.18-300.fc35.x86_64

This is the scenario:
1 - Wake system from suspend
2 - dnf update
3 - apply updates - the problem happens when the kernel or sometimes other modules are replaced.
    update never prompts to reboot (good or bad?)
4 - Then attempt some other operation such as:
        open Firefox,
        run df,
        etc...
5 - System freezes requiring force shutdown and reboot.
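
Since the update does not prompt for a reboot, a rough sketch like the one below can flag when a newer kernel has been installed than the one currently running. It assumes installed kernels appear as directories under /lib/modules and that GNU `sort -V` orders the release strings correctly; `newest_installed_kernel` and `reboot_pending` are hypothetical helper names. It only covers kernel updates, not the "other modules" case mentioned in step 3.

```sh
# Print the highest-versioned directory name under the modules root.
newest_installed_kernel() {
    root=${1:-/lib/modules}
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue
        dir=${dir%/}
        printf '%s\n' "${dir##*/}"
    done | sort -V | tail -n 1
}

# reboot_pending RUNNING [ROOT]: true when the newest installed kernel
# differs from the running release.
reboot_pending() {
    newest=$(newest_installed_kernel "$2")
    [ -n "$newest" ] && [ "$newest" != "$1" ]
}

if reboot_pending "$(uname -r)"; then
    echo "A different kernel is installed; a reboot is probably due"
else
    echo "Running the newest installed kernel"
fi
```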

This has been happening for about the past month

This is the output of lshw -short:
H/W path                   Device      Class          Description
=================================================================
                                       system         System Product Name (SKU)
/0                                     bus            PRIME Z590-A
/0/0                                   memory         64KiB BIOS
/0/4d                                  memory         64GiB System Memory
/0/4d/0                                memory         [empty]
/0/4d/1                                memory         32GiB DIMM DDR4 Synchronous 2133 MHz (0.5 n
/0/4d/2                                memory         [empty]
/0/4d/3                                memory         32GiB DIMM DDR4 Synchronous 2133 MHz (0.5 n
/0/5e                                  memory         640KiB L1 cache
/0/5f                                  memory         2560KiB L2 cache
/0/60                                  memory         20MiB L3 cache
/0/61                                  processor      Intel(R) Core(TM) i9-10850K CPU @ 3.60GHz
/0/100                                 bridge         Intel Corporation
/0/100/2                   /dev/fb0    display        CometLake-S GT2 [UHD Graphics 630]
/0/100/14                              bus            Tiger Lake-H USB 3.2 Gen 2x1 xHCI Host Cont
/0/100/14/0                usb1        bus            xHCI Host Controller
/0/100/14/0/2                          bus            USB2.0 Hub
/0/100/14/0/2/3            scsi6       storage        USB 2.0 FD
/0/100/14/0/2/3/0.0.0      /dev/sde    disk           16GB USB 2.0 FD
/0/100/14/0/2/3/0.0.0/0    /dev/sde    disk           16GB 
/0/100/14/0/2/3/0.0.0/0/2  /dev/sde2   volume         15EiB Windows FAT volume
/0/100/14/0/4                          input          AURA LED Controller
/0/100/14/0/7                          bus            4-Port USB 2.1 Hub
/0/100/14/0/7/4                        bus            4-Port USB 2.1 Hub
/0/100/14/0/7/4/4                      bus            4-Port USB 2.1 Hub
/0/100/14/0/8                          bus            USB2.0 Hub
/0/100/14/0/8/1                        bus            USB2.0 Hub
/0/100/14/0/c                          bus            USB2.0 Hub
/0/100/14/0/c/1                        input          Yubico Yubikey II
/0/100/14/0/c/2                        input          USB Optical Mouse
/0/100/14/0/c/4                        input          Das Keyboard
/0/100/14/1                usb2        bus            xHCI Host Controller
/0/100/14/1/6                          bus            4-Port USB 3.1 Hub
/0/100/14/1/6/4                        bus            4-Port USB 3.1 Hub
/0/100/14/1/6/4/4                      bus            4-Port USB 3.1 Hub
/0/100/14/1/7                          bus            USB3.1 Hub
/0/100/14/1/7/1                        bus            USB-C Dual Drive Dock
/0/100/14/1/7/1/1          scsi8       storage        Dual Drive Dock 2
/0/100/14/1/7/1/1/0.0.0    /dev/sdd    disk           750GB Drive Dock 2
/0/100/14/1/7/1/1/0.0.0/1  /dev/sdd1   volume         698GiB EXT4 volume
/0/100/14/1/7/2            scsi7       storage        Mobius Pro 2C
/0/100/14/1/7/2/0.0.0      /dev/sdb    disk           3TB Pro 2C Disk 1
/0/100/14/1/7/2/0.0.0/1    /dev/sdb1   volume         2794GiB EXT4 volume
/0/100/14/1/7/2/0.0.1      /dev/sdc    disk           4TB Pro 2C Disk 2
/0/100/14/1/7/2/0.0.1/1    /dev/sdc1   volume         3726GiB EXT4 volume
/0/100/14.2                            memory         RAM memory
/0/100/15                              bus            Tiger Lake-H Serial IO I2C Controller #0
/0/100/15.1                            bus            Intel Corporation
/0/100/16                              communication  Tiger Lake-H Management Engine Interface
/0/100/17                  scsi0       storage        Intel Corporation
/0/100/17/0                /dev/sda    disk           1TB Samsung SSD 860
/0/100/17/0/1              /dev/sda1   volume         99MiB Windows FAT volume
/0/100/17/0/2              /dev/sda2   volume         15MiB reserved partition
/0/100/17/0/3              /dev/sda3   volume         149GiB Windows NTFS volume
/0/100/17/0/5              /dev/sda5   volume         781GiB EXT4 volume
/0/100/17/1                /dev/cdrom  disk           DVD+-RW DVD8881
/0/100/1b                              bridge         Intel Corporation
/0/100/1b.2                            bridge         Intel Corporation
/0/100/1b.2/0              enp2s0      network        Ethernet Controller I225-V
/0/100/1c                              bridge         Intel Corporation
/0/100/1d                              bridge         Tiger Lake-H PCI Express Root Port #9
/0/100/1f                              bridge         Intel Corporation
/0/100/1f.3                            multimedia     Intel Corporation
/0/100/1f.4                            bus            Tiger Lake-H SMBus Controller
/0/100/1f.5                            bus            Tiger Lake-H SPI Controller

I have another system with the same os and app software but with a Ryzen 5 5600G processor. That configuration has yet to exhibit this behavior.

What does anyone suppose is causing this? Video? or something else?