Bug 2008529 - System freeze with kernel 5.14.x
Summary: System freeze with kernel 5.14.x
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 34
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 2008844
Depends On:
Blocks:
 
Reported: 2021-09-28 13:14 UTC by Sammy
Modified: 2021-11-23 00:23 UTC (History)
CC List: 28 users

Fixed In Version: kernel-5.14.9-200.fc34 kernel-5.14.9-300.fc35 kernel-5.14.9-100.fc33
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-10-02 01:28:23 UTC
Type: Bug
Embargoed:


Attachments
journalctl -b (51.83 KB, application/gzip), 2021-09-28 13:14 UTC, Sammy
lspci (1.84 KB, application/gzip), 2021-09-28 13:16 UTC, Sammy

Description Sammy 2021-09-28 13:14:44 UTC
Created attachment 1827002 [details]
journalctl -b

I am experiencing complete system freezes with 5.14.x kernels on FC34. The system runs fine on 5.13.x kernels.

The progression is odd. After rebooting I can work on the system for a long time, but a few hours after I leave it the freeze begins and ends in a full hard freeze requiring a reboot. Unusually, the freeze is somewhat gradual: first I notice a major slowdown of http and ssh connections, then they stop entirely while ping still responds for a while, then complete silence. When I come back to the office, the workstation needs a power off/on (the power is still on) to restart.
It almost resembles sleep/hibernation; however, this does not happen with 5.13 kernels.

This happens whether I leave the system logged in or logged out. It runs sddm.

I have run full system diagnostics and found no problems. The system is a DELL PRECISION 7920 (bought this year) with the latest BIOS.

I have seen some posts suggesting pcie_aspm=off on the kernel command line. I am trying this today, but this is my server system, so I don't want to leave it idle for very long. It runs sshd, httpd, and postfix servers.
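
To confirm whether a boot parameter such as pcie_aspm=off is actually active on the running system, a minimal shell sketch follows. The function name has_kernel_arg is hypothetical and made up for this illustration; it only reads the kernel command line and changes nothing. Nothing in this report confirms that pcie_aspm=off helps with this freeze.

```shell
# has_kernel_arg ARG [CMDLINE]
# Succeeds when ARG appears as a whole word on the kernel command line.
# CMDLINE defaults to the contents of /proc/cmdline; passing it explicitly
# is mainly useful for testing. (Hypothetical helper, not a Fedora tool.)
has_kernel_arg() {
    want=$1
    cmdline=${2:-$(cat /proc/cmdline)}
    set -f                      # disable globbing while splitting into words
    for word in $cmdline; do
        if [ "$word" = "$want" ]; then
            set +f
            return 0
        fi
    done
    set +f
    return 1
}
```

Usage: `has_kernel_arg pcie_aspm=off && echo "pcie_aspm=off is on the command line"`. On Fedora the parameter is typically added with `sudo grubby --update-kernel=ALL --args="pcie_aspm=off"` followed by a reboot.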

I am attaching journalctl -b and lspci outputs (logs show nothing when system freezes).

Comment 1 Sammy 2021-09-28 13:16:38 UTC
Created attachment 1827003 [details]
lspci

Comment 2 Sammy 2021-09-28 15:33:13 UTC
Looks like this bug:

https://bugzilla.kernel.org/show_bug.cgi?id=214503

Comment 3 Nicolas Berrehouc 2021-09-28 18:34:03 UTC
Indeed, for me same problem only with kernel 5.14.7 on F34 and F35.
I will try https://bugzilla.kernel.org/show_bug.cgi?id=214503#c14 .

Comment 4 Justin M. Forbes 2021-09-28 19:10:38 UTC
I just built https://koji.fedoraproject.org/koji/taskinfo?taskID=76424019 with the revert from that bug. Want to give it a spin and see if that solves your issue? It is a scratch build, so not secure boot signed.  Trying to get a reasonable solution before I push a new build.

Comment 5 Sammy 2021-09-28 19:52:34 UTC
Will do and report. I had been using the koji test kernels up to 5.14.5-300 for fc34 without the problem; the next released kernel, 5.14.7-200, is where the problem started. Considering that the patch was applied in 5.14.6, that fits the profile too.
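
The version reasoning above can be sketched as a small check. The range 5.14.6 through 5.14.8 is an assumption pieced together from this report (patch said to land in 5.14.6, fix listed in 5.14.9 under Fixed In Version), not taken from an upstream changelog. The function names are hypothetical, and `sort -V` assumes GNU coreutils.

```shell
# version_le A B: succeeds when A <= B in version order (uses GNU sort -V).
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# in_suspect_range KERNEL: succeeds when the upstream part of KERNEL
# (for example 5.14.7 from 5.14.7-200.fc34.x86_64) lies in 5.14.6..5.14.8.
# The bounds are an assumption derived from the comments in this report.
in_suspect_range() {
    v=${1%%-*}                  # drop the package release suffix
    version_le 5.14.6 "$v" && version_le "$v" 5.14.8
}
```

Usage: `in_suspect_range "$(uname -r)" && echo "running a kernel in the suspect range"`.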

Comment 6 Sammy 2021-09-28 19:59:58 UTC
By the way, the new kernels are no longer removed cleanly. They leave behind a residual file called:

modules.builtin.alias.bin

which prevents the removal of the kernel directory.
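
One way to spot such leftovers is sketched below. It assumes a leftover directory is one under /lib/modules whose only remaining regular file is modules.builtin.alias.bin; that criterion and the function name list_residual_dirs are assumptions for illustration, not something stated in this report. The sketch only lists directories and deletes nothing.

```shell
# list_residual_dirs [ROOT]
# Prints each directory directly under ROOT (default /lib/modules) whose
# only regular file is modules.builtin.alias.bin. Read-only: it removes
# nothing. (Hypothetical helper; the criterion is an assumption.)
list_residual_dirs() {
    root=${1:-/lib/modules}
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue
        total=$(find "$dir" -type f | wc -l)
        if [ "$total" -eq 1 ] && [ -f "${dir}modules.builtin.alias.bin" ]; then
            printf '%s\n' "${dir%/}"
        fi
    done
}
```

Usage: run `list_residual_dirs` and review the output by hand before removing anything.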

Comment 7 Sammy 2021-09-28 22:54:43 UTC
Working OK for 3 hours.....let's wait till morning here to be certain.

Comment 8 Nicolas Berrehouc 2021-09-29 05:23:55 UTC
Same problem with kernel-5.14.8-300.fc35.x86_64.

# cat /sys/block/*/queue/scheduler
mq-deadline kyber [bfq] none
mq-deadline kyber [bfq] none
mq-deadline kyber [bfq] none
mq-deadline kyber [bfq] none
mq-deadline kyber [bfq] none
mq-deadline kyber [bfq] none
none

If, after booting kernel-5.14.8-300.fc35.x86_64, I apply the command below, I get no more freezes. It seems OK for the moment.

# echo mq-deadline | tee /sys/block/*/queue/scheduler

# cat /sys/block/*/queue/scheduler
[mq-deadline] kyber bfq none
[mq-deadline] kyber bfq none
[mq-deadline] kyber bfq none
[mq-deadline] kyber bfq none
[mq-deadline] kyber bfq none
[mq-deadline] kyber bfq none
none
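
The before/after listings above are easier to read with device names attached. The sketch below extracts the active (bracketed) scheduler from each line; the function names are hypothetical. Note that a scheduler written through sysfs as in the command above does not persist across reboots.

```shell
# active_scheduler LINE
# Prints the bracketed entry of a scheduler line, e.g. "bfq" for
# "mq-deadline kyber [bfq] none". A line without brackets (such as
# "none") is printed unchanged. (Hypothetical helper.)
active_scheduler() {
    line=$1
    case $line in
        *\[*\]*)
            line=${line#*\[}    # drop everything up to the first "["
            line=${line%%\]*}   # drop everything from the first "]"
            ;;
    esac
    printf '%s\n' "$line"
}

# show_schedulers: prints "device: active-scheduler" for every block device.
show_schedulers() {
    for f in /sys/block/*/queue/scheduler; do
        [ -r "$f" ] || continue
        dev=${f#/sys/block/}
        dev=${dev%%/*}
        printf '%s: %s\n' "$dev" "$(active_scheduler "$(cat "$f")")"
    done
}
```

Usage: run `show_schedulers` before and after switching to compare the active scheduler per device.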

Comment 9 Justin M. Forbes 2021-09-29 11:53:32 UTC
I am aware that 5.14.8-300.fc35 is still broken, which is why it is not in an update. I specifically asked about the 5.14.8-200.fc34 scratch build linked here, which is not the same as the official 5.14.8 builds in koji.

Comment 10 Sammy 2021-09-29 12:25:45 UTC
Yes, I know. Using your unofficial build, the system has been stable for almost 17 hours now. The problem seems to be resolved! Thanks.

Comment 11 Justin M. Forbes 2021-09-29 13:06:23 UTC
*** Bug 2008844 has been marked as a duplicate of this bug. ***

Comment 12 Nicolas Berrehouc 2021-09-29 18:12:32 UTC
Sorry it’s my fault, I confused the -200 and -300 versions of kernel 5.14.8 when I downloaded the packages.

It seems to be fine with the special 5.14.8-200.fc34.x86_64 build from jforbes. The scheduler is [bfq]. 5 hours of uptime without a freeze.

Comment 13 Raphael Groner 2021-09-30 13:58:22 UTC
Another victim found.

Comment 14 Geraldo Simião 2021-09-30 17:32:01 UTC
For me the problem occurs with kernel 5.14.8-300.fc35.x86_64 on F35 KDE (upgraded from an F34 install) with BTRFS.

Today I experienced a crash while a KVM guest was running on my Aspire V3-571 V2.11 Intel i7-3632QM (8) @ 3.200GHz.
The system had been running fine for more than three hours before I started the VM. Then, after some 20 minutes of testing on the guest machine, the host OS hard locked up, requiring me to hold the power button down. I can't get anything from dmesg or from journalctl.
I don't have secure boot enabled: mokutil --sb-state =>SecureBoot disabled
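
Since the scratch build in this bug is not Secure Boot signed, the Secure Boot state matters when testing it. A minimal sketch for turning the mokutil output shown above into a simple value follows. The function name secure_boot_state is hypothetical, and only the two output strings "SecureBoot enabled" and "SecureBoot disabled" are handled.

```shell
# secure_boot_state [TEXT]
# Prints "enabled", "disabled", or "unknown" based on TEXT, which defaults
# to the live output of `mokutil --sb-state`. Passing TEXT explicitly is
# mainly useful for testing. (Hypothetical helper.)
secure_boot_state() {
    text=${1-$(mokutil --sb-state 2>/dev/null)}
    case $text in
        *"SecureBoot enabled"*)  echo enabled ;;
        *"SecureBoot disabled"*) echo disabled ;;
        *)                       echo unknown ;;
    esac
}
```

Usage: `[ "$(secure_boot_state)" = enabled ] && echo "an unsigned scratch kernel will likely not boot here"`.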

Comment 15 Fedora Update System 2021-09-30 21:45:16 UTC
FEDORA-2021-07f46cd951 has been submitted as an update to Fedora 35. https://bodhi.fedoraproject.org/updates/FEDORA-2021-07f46cd951

Comment 16 Fedora Update System 2021-09-30 21:46:19 UTC
FEDORA-2021-884d245ef8 has been submitted as an update to Fedora 34. https://bodhi.fedoraproject.org/updates/FEDORA-2021-884d245ef8

Comment 17 Fedora Update System 2021-09-30 21:46:22 UTC
FEDORA-2021-e0d6215753 has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2021-e0d6215753

Comment 18 Fedora Update System 2021-10-01 01:39:57 UTC
FEDORA-2021-07f46cd951 has been pushed to the Fedora 35 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-07f46cd951`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-07f46cd951

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 19 Fedora Update System 2021-10-01 02:20:11 UTC
FEDORA-2021-884d245ef8 has been pushed to the Fedora 34 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-884d245ef8`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-884d245ef8

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 20 Fedora Update System 2021-10-01 02:21:47 UTC
FEDORA-2021-e0d6215753 has been pushed to the Fedora 33 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-e0d6215753`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-e0d6215753

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
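
The three advisories above each apply to one Fedora release. A small sketch that picks the matching one is shown below; the advisory IDs are copied from the comments above, and the function names are hypothetical. It only prints the command and does not run dnf.

```shell
# advisory_for_release RELEASE
# Maps a Fedora release number to the advisory ID given in this report.
advisory_for_release() {
    case $1 in
        33) echo FEDORA-2021-e0d6215753 ;;
        34) echo FEDORA-2021-884d245ef8 ;;
        35) echo FEDORA-2021-07f46cd951 ;;
        *)  return 1 ;;
    esac
}

# testing_upgrade_command RELEASE
# Prints (does not execute) the dnf command for that release's advisory.
testing_upgrade_command() {
    advisory=$(advisory_for_release "$1") || return 1
    printf 'sudo dnf upgrade --enablerepo=updates-testing --advisory=%s\n' "$advisory"
}
```

Usage: `testing_upgrade_command "$(rpm -E %fedora)"` prints the command for the installed release, or fails for releases not covered by this bug.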

Comment 21 sammy 2021-10-01 04:24:51 UTC
Yesterday I experienced slow internet and, as a consequence, could not authenticate with my Fedora ID to upload relval results. I had to shut down and reboot my system to get good internet speed back. This issue might be attributable to the new kernels of pre-release F35.

Comment 22 Sammy 2021-10-01 13:28:47 UTC
By the way, there seem to be two Sammys: the original bug opener and the one from comment #21. Please be aware.

Comment 23 Justin M. Forbes 2021-10-01 16:47:37 UTC
Comment #21 is not related to this bug.

Comment 24 Geraldo Simião 2021-10-01 17:26:32 UTC
Justin, it seems that with kernel 5.14.9 (https://bodhi.fedoraproject.org/updates/FEDORA-2021-07f46cd951) the problems are gone.
All working fine here, since yesterday, running VMs, doing tests, working as usual and no freezes. Suspend is working fine too.

Here's my setup:

Operating System: Fedora Linux 35
KDE Plasma Version: 5.22.5
KDE Frameworks Version: 5.85.0
Qt Version: 5.15.2
Kernel Version: 5.14.9-300.fc35.x86_64 (64-bit)
Graphics Platform: X11
Processors: 8 × Intel® Core™ i7-3632QM CPU @ 2.20GHz
Memory: 15.4 GiB of RAM
Graphics Processor: Mesa Intel® HD Graphics 4000

No more bug here for me.

Comment 25 Nicolas Berrehouc 2021-10-01 19:24:39 UTC
Indeed, no more freezes with kernel-5.14.9-300.fc35.x86_64. Good job!

Comment 26 Fedora Update System 2021-10-02 01:28:23 UTC
FEDORA-2021-884d245ef8 has been pushed to the Fedora 34 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 27 Fedora Update System 2021-10-02 01:30:47 UTC
FEDORA-2021-07f46cd951 has been pushed to the Fedora 35 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 28 Fedora Update System 2021-10-03 01:06:04 UTC
FEDORA-2021-e0d6215753 has been pushed to the Fedora 33 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 29 Jason 2021-11-03 05:50:41 UTC
I am experiencing this exact issue on 5.14.14-300.fc35.x86_64.

System Info:

Operating System: Fedora Linux 35
KDE Plasma Version: 5.22.5
KDE Frameworks Version: 5.87.0
Qt Version: 5.15.2
Kernel Version: 5.14.14-300.fc35.x86_64 (64-bit)
Graphics Platform: Wayland
Processors: 8 × 11th Gen Intel® Core™ i7-1165G7 @ 2.80GHz
Memory: 31.1 GiB of RAM
Graphics Processor: Mesa Intel® Xe Graphics

Comment 30 essin 2021-11-23 00:23:23 UTC
I have what may be the same problem with Linux fedora 5.14.18-300.fc35.x86_64

This is the scenario:
1 - Wake system from suspend
2 - dnf update
3 - apply updates - the problem happens when the kernel or sometimes other modules are replaced.
    update never prompts to reboot (good or bad?)
4 - Then attempt some other operation such as:
        open Firefox,
        run df,
        etc...
5 - System freezes requiring force shutdown and reboot.

This has been happening for about the past month

This is the output of lshw -short:
H/W path                   Device      Class          Description
=================================================================
                                       system         System Product Name (SKU)
/0                                     bus            PRIME Z590-A
/0/0                                   memory         64KiB BIOS
/0/4d                                  memory         64GiB System Memory
/0/4d/0                                memory         [empty]
/0/4d/1                                memory         32GiB DIMM DDR4 Synchronous 2133 MHz (0.5 n
/0/4d/2                                memory         [empty]
/0/4d/3                                memory         32GiB DIMM DDR4 Synchronous 2133 MHz (0.5 n
/0/5e                                  memory         640KiB L1 cache
/0/5f                                  memory         2560KiB L2 cache
/0/60                                  memory         20MiB L3 cache
/0/61                                  processor      Intel(R) Core(TM) i9-10850K CPU @ 3.60GHz
/0/100                                 bridge         Intel Corporation
/0/100/2                   /dev/fb0    display        CometLake-S GT2 [UHD Graphics 630]
/0/100/14                              bus            Tiger Lake-H USB 3.2 Gen 2x1 xHCI Host Cont
/0/100/14/0                usb1        bus            xHCI Host Controller
/0/100/14/0/2                          bus            USB2.0 Hub
/0/100/14/0/2/3            scsi6       storage        USB 2.0 FD
/0/100/14/0/2/3/0.0.0      /dev/sde    disk           16GB USB 2.0 FD
/0/100/14/0/2/3/0.0.0/0    /dev/sde    disk           16GB 
/0/100/14/0/2/3/0.0.0/0/2  /dev/sde2   volume         15EiB Windows FAT volume
/0/100/14/0/4                          input          AURA LED Controller
/0/100/14/0/7                          bus            4-Port USB 2.1 Hub
/0/100/14/0/7/4                        bus            4-Port USB 2.1 Hub
/0/100/14/0/7/4/4                      bus            4-Port USB 2.1 Hub
/0/100/14/0/8                          bus            USB2.0 Hub
/0/100/14/0/8/1                        bus            USB2.0 Hub
/0/100/14/0/c                          bus            USB2.0 Hub
/0/100/14/0/c/1                        input          Yubico Yubikey II
/0/100/14/0/c/2                        input          USB Optical Mouse
/0/100/14/0/c/4                        input          Das Keyboard
/0/100/14/1                usb2        bus            xHCI Host Controller
/0/100/14/1/6                          bus            4-Port USB 3.1 Hub
/0/100/14/1/6/4                        bus            4-Port USB 3.1 Hub
/0/100/14/1/6/4/4                      bus            4-Port USB 3.1 Hub
/0/100/14/1/7                          bus            USB3.1 Hub
/0/100/14/1/7/1                        bus            USB-C Dual Drive Dock
/0/100/14/1/7/1/1          scsi8       storage        Dual Drive Dock 2
/0/100/14/1/7/1/1/0.0.0    /dev/sdd    disk           750GB Drive Dock 2
/0/100/14/1/7/1/1/0.0.0/1  /dev/sdd1   volume         698GiB EXT4 volume
/0/100/14/1/7/2            scsi7       storage        Mobius Pro 2C
/0/100/14/1/7/2/0.0.0      /dev/sdb    disk           3TB Pro 2C Disk 1
/0/100/14/1/7/2/0.0.0/1    /dev/sdb1   volume         2794GiB EXT4 volume
/0/100/14/1/7/2/0.0.1      /dev/sdc    disk           4TB Pro 2C Disk 2
/0/100/14/1/7/2/0.0.1/1    /dev/sdc1   volume         3726GiB EXT4 volume
/0/100/14.2                            memory         RAM memory
/0/100/15                              bus            Tiger Lake-H Serial IO I2C Controller #0
/0/100/15.1                            bus            Intel Corporation
/0/100/16                              communication  Tiger Lake-H Management Engine Interface
/0/100/17                  scsi0       storage        Intel Corporation
/0/100/17/0                /dev/sda    disk           1TB Samsung SSD 860
/0/100/17/0/1              /dev/sda1   volume         99MiB Windows FAT volume
/0/100/17/0/2              /dev/sda2   volume         15MiB reserved partition
/0/100/17/0/3              /dev/sda3   volume         149GiB Windows NTFS volume
/0/100/17/0/5              /dev/sda5   volume         781GiB EXT4 volume
/0/100/17/1                /dev/cdrom  disk           DVD+-RW DVD8881
/0/100/1b                              bridge         Intel Corporation
/0/100/1b.2                            bridge         Intel Corporation
/0/100/1b.2/0              enp2s0      network        Ethernet Controller I225-V
/0/100/1c                              bridge         Intel Corporation
/0/100/1d                              bridge         Tiger Lake-H PCI Express Root Port #9
/0/100/1f                              bridge         Intel Corporation
/0/100/1f.3                            multimedia     Intel Corporation
/0/100/1f.4                            bus            Tiger Lake-H SMBus Controller
/0/100/1f.5                            bus            Tiger Lake-H SPI Controller

I have another system with the same os and app software but with a Ryzen 5 5600G processor. That configuration has yet to exhibit this behavior.

Does anyone have an idea what is causing this? Video, or something else?

