Bug 1943866 - watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [migration/0:15]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 34
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-03-28 00:07 UTC by Andrew Price
Modified: 2021-05-24 01:34 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-22 17:49:04 UTC
Type: Bug
Embargoed:


Attachments
Kernel messages (126.93 KB, text/plain)
2021-03-28 00:07 UTC, Andrew Price
softlockup trace from journalctl (54.41 KB, text/plain)
2021-05-13 04:08 UTC, John Apple II

Description Andrew Price 2021-03-28 00:07:03 UTC
Created attachment 1767004 [details]
Kernel messages

1. Please describe the problem:

Booting Fedora 34 beta on a Lenovo ThinkPad T590 pauses for a while, and then soft lockup complaints appear for several CPUs.

2. What is the Version-Release number of the kernel:

kernel-5.11.10-300.fc34.x86_64

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

It has occurred on all three kernels I've used since upgrading to Fedora 34:

kernel-5.11.9-200.fc33.x86_64
kernel-5.11.9-300.fc34.x86_64
kernel-5.11.10-300.fc34.x86_64

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:

Happens on every boot.

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:

(I will try this after submitting the bug report.)

6. Are you running any modules that are not shipped directly with Fedora's kernel?:

No

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

Done.

BTW there's also a TPM-related complaint in the dmesg. I believe that's a separate issue, as it has been occurring since Fedora 33 without any noticeable impact, so that part is probably OK to ignore.

Comment 1 Andrew Price 2021-03-28 00:21:17 UTC
It seems to be fixed in the rawhide kernel (kernel-5.12.0-0.rc4.20210325gite138138003eb.177.fc35.x86_64), but back on the F34 beta kernel it recurs.

Comment 2 Andrew Price 2021-04-09 20:05:39 UTC
I updated my firmware today to

LENOVO 20N4CTO1WW/20N4CTO1WW, BIOS N2IET94W (1.72 ) 02/18/2021

and it still happens:

[    2.659800] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x396d4bf570c, max_idle_ns: 881590425443 ns
[   28.277447] watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [migration/0:15]
[   28.278447] watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [migration/1:18]
[   28.279457] watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [swapper/2:0]
etc.

Currently running kernel 5.11.12-300.fc34.x86_64
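
For anyone comparing their own logs against the lines above, here is a minimal sketch of a filter that pulls the soft lockup messages out of a saved kernel log. It assumes the message format shown in this comment; the function name `find_soft_lockups` and the sample list are made up for illustration, and the leading `[timestamp]` is treated as optional because journalctl output may not include it.

```python
import re

# Matches lines such as:
#   [   28.277447] watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [migration/0:15]
# The bracketed timestamp is optional (assumption: some log formats omit it).
LOCKUP_RE = re.compile(
    r"(?:\[\s*(?P<ts>\d+\.\d+)\]\s+)?"
    r"watchdog: BUG: soft lockup - "
    r"CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! \[(?P<task>[^\]]+)\]"
)


def find_soft_lockups(lines):
    """Return (timestamp or None, cpu, stuck_seconds, task) per soft lockup line."""
    hits = []
    for line in lines:
        m = LOCKUP_RE.search(line)
        if m:
            ts = float(m["ts"]) if m["ts"] else None
            hits.append((ts, int(m["cpu"]), int(m["secs"]), m["task"]))
    return hits


# Sample lines copied from this comment.
sample = [
    "[    2.659800] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x396d4bf570c, max_idle_ns: 881590425443 ns",
    "[   28.277447] watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [migration/0:15]",
    "[   28.278447] watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [migration/1:18]",
    "[   28.279457] watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [swapper/2:0]",
]

for ts, cpu, secs, task in find_soft_lockups(sample):
    print(f"t={ts} cpu={cpu} stuck={secs}s task={task}")
```

To run it on a real log, pass the lines of the saved file instead of `sample`, for example `find_soft_lockups(open("dmesg.txt"))`.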

Comment 3 John Apple II 2021-05-13 04:08:02 UTC
Created attachment 1782605 [details]
softlockup trace from journalctl

This has my kernel traces from my T590 as well, with the softlockup bug shown.

Not sure why my kernel is listed as Tainted, as I'm not loading any modules.

Comment 4 John Apple II 2021-05-13 04:13:30 UTC
Is this possibly the same issue as bug #1912167?

Comment 5 Andrew Price 2021-05-13 08:52:31 UTC
(In reply to John Apple II from comment #4)
> Is this possibly the same issue as bug #1912167?

I don't believe so. I've had TPM issues since before the soft lockups started, and after updating to the 5.12 kernel the soft lockups are gone but the TPM complaints are still there.

Comment 6 Jonathon Turel 2021-05-19 15:37:16 UTC
I also have a Lenovo T590 and encountered this exact issue even on Fedora 32. Earlier this year I was affected after a dnf update and ended up reverting to kernel-5.10.13, which got me going again. Besides the watchdog warnings at system start, the issue made the VMs I run on my system unusable because they would stop responding for minutes at a time.

I upgraded to Fedora 33 yesterday and hit this again because the kernel was upgraded. Fortunately, I found a kernel-5.10.13 build for f33 in Koji, and downgrading to it once again fixed the problem. I locked it in with `dnf versionlock`. I'm not sure which kernel version this regressed in, but hopefully knowing that 5.10.13 was OK helps isolate it.
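
To make the isolation concrete, the sketch below sorts the kernels reported so far in this bug by version and finds the boundary between the last known-good and the first known-bad one. The good/bad labels come only from the comments above. The helper names `kernel_key` and `regression_window` are made up for illustration, and the comparison looks only at the first three numeric version components, so it ignores release-candidate and build suffixes.

```python
import re


def kernel_key(nvr):
    """Extract (major, minor, patch) from e.g. 'kernel-5.11.10-300.fc34.x86_64'."""
    m = re.search(r"(\d+)\.(\d+)\.(\d+)", nvr)
    if m is None:
        raise ValueError(f"no x.y.z version found in {nvr!r}")
    return tuple(int(part) for part in m.groups())


# (kernel, worked_ok) pairs as reported in the comments above.
reports = [
    ("kernel-5.10.13", True),                   # comment 6: no lockups
    ("kernel-5.11.9-300.fc34.x86_64", False),   # description: lockups
    ("kernel-5.11.10-300.fc34.x86_64", False),  # description: lockups
    ("5.11.12-300.fc34.x86_64", False),         # comment 2: lockups
    ("kernel-5.12.0-0.rc4.20210325gite138138003eb.177.fc35.x86_64", True),  # comment 1
]


def regression_window(reports):
    """Return (last_good, first_bad) version tuples around the first bad kernel."""
    ordered = sorted((kernel_key(name), ok) for name, ok in reports)
    last_good = None
    for key, ok in ordered:
        if ok:
            last_good = key
        else:
            return last_good, key
    return last_good, None


last_good, first_bad = regression_window(reports)
print("last known good:", last_good)
print("first known bad:", first_bad)
```

With these reports, the regression falls somewhere after 5.10.13 and at or before 5.11.9. Testing Koji builds between those two versions would narrow it further.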

Comment 7 Andrew Price 2021-05-22 17:49:04 UTC
5.12.5-300.fc34.x86_64 is now in updates-testing and it's working fine for me, with no soft lockups and no TPM-related backtrace. If others still see problems with that kernel, it would be best to file a new bug report, as it will be a different issue.

Comment 8 John Apple II 2021-05-24 01:34:03 UTC
I upgraded to the latest packages, including 5.12.5-300.fc34.x86_64, and most system services fail to start. I'm booting from the older kernel for now, but on my T590 networking won't start with the 5.12.5-300 update from about an hour ago.

