Bug 1154286 - kernel:NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [qemu-system-arm:13689]
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 22
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-10-18 11:51 UTC by Paulo Andrade
Modified: 2016-07-19 12:14 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-07-19 12:14:30 UTC
Type: Bug
Embargoed:


Attachments
kernel log from journald (22.35 KB, text/plain)
2016-04-25 15:09 UTC, zimon

Description Paulo Andrade 2014-10-18 11:51:45 UTC
$ uname -a
Linux localhost.localdomain 3.18.0-0.rc0.git8.1.fc22.x86_64 #1 SMP Tue Oct 14 15:02:02 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

I started having this issue when loading qemu to test
some arm builds.

Not all the time, but far too frequently, I see a lot of
messages like this:

 kernel:[  788.977247] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-arm:13689]

Message from syslogd@localhost at Oct 18 08:41:41 ...
 kernel:NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-arm:13689]

Message from syslogd@localhost at Oct 18 08:42:09 ...
 kernel:[  817.005255] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-arm:13689]

Message from syslogd@localhost at Oct 18 08:42:09 ...
 kernel:NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-arm:13689]

Message from syslogd@localhost at Oct 18 08:42:37 ...
 kernel:[  845.033266] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-arm:13689]


The abrt window says:

A kernel problem occurred, but your kernel has been tainted (flags:GL). Kernel maintainers are unable to diagnose tainted reports.

$ dmesg | grep -i taint
[  724.913309] CPU: 0 PID: 13689 Comm: qemu-system-arm Not tainted 3.18.0-0.rc0.git8.1.fc22.x86_64 #1
[  752.941348] CPU: 0 PID: 13689 Comm: qemu-system-arm Tainted: G             L 3.18.0-0.rc0.git8.1.fc22.x86_64 #1
[  788.977332] CPU: 0 PID: 13689 Comm: qemu-system-arm Tainted: G             L 3.18.0-0.rc0.git8.1.fc22.x86_64 #1
[  817.005340] CPU: 0 PID: 13689 Comm: qemu-system-arm Tainted: G             L 3.18.0-0.rc0.git8.1.fc22.x86_64 #1
[  845.033349] CPU: 0 PID: 13689 Comm: qemu-system-arm Tainted: G             L 3.18.0-0.rc0.git8.1.fc22.x86_64 #1
[  873.061400] CPU: 0 PID: 13689 Comm: qemu-system-arm Tainted: G             L 3.18.0-0.rc0.git8.1.fc22.x86_64 #1
[  901.089408] CPU: 0 PID: 13689 Comm: qemu-system-arm Tainted: G             L 3.18.0-0.rc0.git8.1.fc22.x86_64 #1
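
For readers puzzled by the "flags:GL" in the abrt message: the letters come from the "Tainted:" field in the lines above. The following is a minimal Python sketch for pulling them out and describing them. The function names are hypothetical, and the table is only a subset of the taint letters as I understand them from the kernel's tainted-kernels documentation. Note that L is the flag set by the soft lockup itself, which would explain why the first line still says "Not tainted".

```python
import re
from typing import List, Optional

# Subset of the kernel's taint letters (assumption: meanings as given in the
# kernel's tainted-kernels documentation; the full table has more entries).
TAINT_MEANINGS = {
    "G": "no proprietary module loaded",
    "P": "proprietary module loaded",
    "D": "kernel oops or BUG occurred",
    "W": "kernel warning occurred",
    "O": "out-of-tree module loaded",
    "L": "soft lockup occurred",
}


def taint_letters(line: str) -> Optional[str]:
    """Return the taint letters from a dmesg 'CPU: ...' line, or None if untainted."""
    match = re.search(r"Tainted: ([A-Z ]+?)\s+\d", line)
    if match is None:
        return None
    # The field is space-padded, e.g. 'G             L'; drop the padding.
    return "".join(match.group(1).split())


def describe(letters: str) -> List[str]:
    """Map each taint letter to a short description."""
    return [f"{c}: {TAINT_MEANINGS.get(c, 'not in this subset')}" for c in letters]
```

Calling describe(taint_letters(line)) on any of the tainted lines above gives one entry for G and one for L.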

Comment 1 Laurent Rineau 2015-02-16 11:06:56 UTC
I have seen the same bug on Fedora 21, x86_64:

Message from syslogd@warhol at Feb 16 12:03:07 ...
 kernel:[ 8924.995787] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [rtkit-daemon:795]

The machine was locked. We had to hard-reboot it.

[lrineau@warhol ~]$ uname -a
Linux warhol.geometryfactory.com 3.18.6-200.fc21.x86_64 #1 SMP Fri Feb 6 22:59:42 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Comment 2 Jaroslav Reznik 2015-03-03 16:22:51 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 22 development cycle.
Changing version to '22'.

More information and reason for this action is here:
https://fedoraproject.org/wiki/Fedora_Program_Management/HouseKeeping/Fedora22

Comment 3 Bugzy 2015-08-20 17:17:47 UTC
I am experiencing the same issues on both kernel 4.1.4 and 4.1.5 on Fedora 22, except in my case, it seems that this bug may be preventing my machine from booting up. 

After an update via dnf to kernel 4.1.4, attempting to boot into the new kernel hangs at "Starting Network Manager Wait Online..." 
followed by

[  68.060000] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swaper/0:0]
[  74.842220] INFO: rcu_sched detected stalls on CPUs/tasks: { 0} (detected by 3, t=60003 jiffies, g=1457, c=1456, q=0)
[  96.044388] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swaper/0:0]
[  124.028767] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swaper/0:0]
[  152.013129] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swaper/0:0]
[  179.997531] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swaper/0:0]
[  207.981890] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swaper/0:0]
[  235.966285] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swaper/0:0]
~
[  1154.378467] rcu_sched kthread starved for 2295 jiffies!

The laptop continues to loop through these messages, but boot-up does not complete. I had to revert to kernel 4.1.3 to be able to boot normally again.
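
The soft lockup lines above repeat at a steady interval, which can be checked mechanically. Below is a minimal Python sketch that parses such lines and computes the gap between consecutive reports. The names (Lockup, parse_lockups, intervals) are hypothetical, and the regular expression assumes the exact message format quoted in this report.

```python
import re
from typing import List, NamedTuple


class Lockup(NamedTuple):
    timestamp: float      # seconds since boot, from the [ ... ] prefix
    cpu: int              # CPU number reported as stuck
    stuck_seconds: int    # the "stuck for Ns" value
    task: str             # task name as printed in the message
    pid: int


# Assumes the message format shown in this report.
LOCKUP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+NMI watchdog: BUG: soft lockup - "
    r"CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! \[(?P<task>.+):(?P<pid>\d+)\]"
)


def parse_lockups(text: str) -> List[Lockup]:
    """Extract every timestamped soft lockup report from a block of log text."""
    return [
        Lockup(float(m["ts"]), int(m["cpu"]), int(m["secs"]), m["task"], int(m["pid"]))
        for m in LOCKUP_RE.finditer(text)
    ]


def intervals(lockups: List[Lockup]) -> List[float]:
    """Seconds elapsed between consecutive reports."""
    return [b.timestamp - a.timestamp for a, b in zip(lockups, lockups[1:])]
```

On the lines quoted above, the gaps come out at just under 28 seconds each, all on CPU#0.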

I have an ASUS TX201LA laptop with an Intel Core i5 4200U Processor 

Full specs on the laptop are available at http://www.linlap.com/asus_transformer_book_trio_tx201la

Comment 4 GMS 2015-09-27 15:54:19 UTC
I have similar problems to Bugzy. On a new machine (ASUS VivoPC VM40B with
2-core Intel Celeron) I installed Fedora 22 KDE spin. After a couple of updates
I have kernels 4.0.4-301, 4.1.6-201 and 4.1.7-200. Both 4.1 kernels hang in boot
with repeated "NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s!" messages.
The hang now occurs at NetworkManager; before the last update it occurred at
(I think) iptables. Only the 4.0 kernel works OK.

The priority for me is "bloody high" otherwise I will have to juggle the next
kernel updates.

BTW Bugzy, how do you keep the boot.log messages from a failed boot? Mine
disappear in a black hole.

Comment 5 GMS 2015-09-30 21:11:16 UTC
(In reply to GMS from comment #4)
> I have similar problems to Bugzy. On a new machine (ASUS VivoPC VM40B with
> 2-core Intel Celeron) ...

Right. After disabling the WiFi (I will be using it only on Ethernet)
the machine boots just fine.

> BTW Bugzy, how do you keep the boot.log messages from a failed boot? Mine
> disappear in a black hole.

And yes, I know how you did it - from an alternative console. Am getting slow...
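
Besides an alternative console, kernel messages from a failed boot can sometimes be recovered afterwards from the journal. This is a minimal Python sketch, assuming journald is configured for persistent storage (for example, /var/log/journal exists) and that the failed boot got far enough to write to disk, which a hard hang may prevent. The function names are hypothetical; the journalctl options used (--dmesg, --boot, --no-pager) are standard.

```python
import subprocess
from typing import List


def kernel_log_command(boot_offset: int = -1) -> List[str]:
    """Build a journalctl command for the kernel messages of a given boot.

    boot_offset=0 is the current boot, -1 the previous one, and so on.
    """
    return ["journalctl", "--dmesg", f"--boot={boot_offset}", "--no-pager"]


def read_kernel_log(boot_offset: int = -1) -> str:
    """Run journalctl and return its output (may need root or the right group)."""
    result = subprocess.run(
        kernel_log_command(boot_offset),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if journalctl fails
    )
    return result.stdout
```

After rebooting into a working kernel, read_kernel_log(-1) would return whatever the failed boot managed to log.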

Comment 6 Bugzy 2015-10-02 04:05:01 UTC
(In reply to GMS from comment #5)

> Right. After disabling the WiFi (I will be using it only on Ethernet)
> the machine boots just fine.

This is good to know. Out of curiosity, can you let me know what wireless card/chipset your device uses, and what module is loaded in the kernel for it to function? I am wondering if the problem is a result of patches to the rtl-wifi drivers in the kernel. If so it should be easy to track down given that we can diff the driver at issue between kernel 4.1.3 and 4.1.4.
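
The comparison suggested here can be sketched as a git query. The Python below is a rough sketch under two assumptions: a local clone of the linux-stable tree that carries the v4.1.3 and v4.1.4 tags, and the driver living under drivers/net/wireless/rtlwifi in those releases. The function names are hypothetical.

```python
import subprocess
from typing import List


def driver_log_command(old_tag: str, new_tag: str, path: str) -> List[str]:
    """Build a 'git log' command listing commits that touched path between two tags."""
    return ["git", "log", "--oneline", f"{old_tag}..{new_tag}", "--", path]


def driver_commits(repo_dir: str, old_tag: str, new_tag: str, path: str) -> List[str]:
    """Run the command inside repo_dir and return one line per commit."""
    result = subprocess.run(
        driver_log_command(old_tag, new_tag, path),
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

For example, driver_commits("linux-stable", "v4.1.3", "v4.1.4", "drivers/net/wireless/rtlwifi") would list the candidate commits, where "linux-stable" stands for wherever the clone lives.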

Comment 7 Bugzy 2015-10-02 04:26:44 UTC
As far as the rtlwifi drivers are concerned, the only change between kernels 4.1.3 and 4.1.4 is "rtlwifi: Remove the clear interrupt routine from all drivers" by Vincent Fann [1]. The patch just before it, which is already in the working 4.1.3 kernel, was intended to fix a deadlock. In any case, [1] is the patch set to investigate if GMS's wireless card uses the rtlwifi driver. Hopefully this is enough information to get the Fedora kernel folks involved in sorting out this issue.


[1] https://kernel.googlesource.com/pub/scm/linux/kernel/git/khilman/linux-stable/+/33e1432c291b72339b03ea3450d7840cdba1dbe1

Comment 8 Justin M. Forbes 2015-10-20 19:30:20 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 22 kernel bugs.

Fedora 22 has now been rebased to 4.2.3-200.fc22.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 23, and are still experiencing this issue, please change the version to Fedora 23.

If you experience different issues, please open a new bug report for those.

Comment 9 Bugzy 2015-10-21 05:52:23 UTC
@Justin

The problem still exists in the 4.2.3 series kernel

Comment 10 Bugzy 2015-11-05 18:25:59 UTC
Kernel 4.2.5 has the same problem.

NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [systemd-udevd:608]
INFO: rcu_sched self-detected stall on CPU { 0} (t=60000 jiffies, g=1026, c=1025, q=0)

Comment 11 Bugzy 2015-11-23 20:01:16 UTC
Kernel 4.2.6 has the same problem.

Comment 12 Bugzy 2015-11-23 20:10:12 UTC
Note that, as GMS mentioned in comment #5 above, disabling the wireless on the system allows the kernel to boot completely. However, once the wireless device is re-enabled after boot-up, the system hangs. I think this pretty much narrows the bug to the set of changes linked above in comment #7. Is there a way to get that driver recompiled without those changes? (Note: I am closer to a power user than a dev of any sort. Any help with how to do this would be appreciated.)

Comment 13 Bugzy 2015-11-23 20:15:08 UTC
It looks like the bug was caught in kernel 4.3. The commit log for the rtlwifi driver [1] suggests that it is fixed there. Now to get a 4.3 kernel for Fedora 22.

[1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/net/wireless/rtlwifi/rtl8821ae/hw.c?id=54328e64047a54b8fc2362c2e1f0fa16c90f739f

Comment 14 James 2016-02-27 19:50:23 UTC
I can confirm this bug still exists in Fedora 23. Kernel 4.3.5-300.fc23.x86_64

It's a complete show stopper when trying to install a new system. WiFi had to be disabled in the BIOS to complete an install.

ASUS VivoPC VM42-S075V

Comment 15 Bugzy 2016-02-28 05:41:29 UTC
Hi James,
The fix in kernel 4.3 did not fix the problem. However, the one in kernel 4.4 did. This bug report [1] contains a link to a Fedora-built 4.3 kernel with a back-ported patch from 4.4 that fixes the issue, and this other bug report [2] contains a workaround to get you up and running with wireless on a 4.3 kernel (i.e., adding pci=nomsi to the GRUB command line when booting 4.3.x kernels). I guess these are the only measures you can take until kernel 4.4.x.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1297554
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1269335
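
When trying the pci=nomsi workaround, it helps to confirm that the running kernel was really booted with the argument. A minimal Python sketch follows, assuming the usual /proc/cmdline interface; the function names and the sample command line in the usage note are made up for illustration.

```python
def has_kernel_arg(cmdline: str, arg: str) -> bool:
    """True if arg appears as a whole, space-separated token in the command line."""
    return arg in cmdline.split()


def running_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read().strip()
```

For instance, has_kernel_arg("root=/dev/sda2 ro rhgb quiet pci=nomsi", "pci=nomsi") is true, and has_kernel_arg(running_cmdline(), "pci=nomsi") checks the live system. Matching whole tokens avoids false hits on longer arguments that merely contain the same text.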

Comment 16 James 2016-03-08 02:03:29 UTC
Thanks Bugzy, huge help!

Unfortunately the pci=nomsi crashes the system, even with wifi disabled. It locks up just after GDM loads. I haven't had time to investigate this one.

Also the kernel build has expired. I haven't had a chance to patch/rebuild my own kernel yet, either.

Comment 17 Bugzy 2016-03-08 02:31:46 UTC
Well, not all hope is lost, as the 4.3.6-201 kernel for Fedora 22 fixes the issue. Also, kernel 4.4.3, which also has the fix, is now available for Fedora 22.
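
To check quickly whether a machine is already on a kernel at or above 4.4.3, the release string from uname -r can be compared numerically. This is a rough Python sketch with hypothetical function names. It looks only at the upstream version number, so it will not recognise Fedora builds that carry a back-ported fix, such as the 4.3.6-201 kernel mentioned above.

```python
import re
from typing import Tuple


def kernel_version(release: str) -> Tuple[int, ...]:
    """Parse the leading 'major.minor.patch' of a release string such as uname -r prints."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def at_least(release: str, minimum: Tuple[int, ...] = (4, 4, 3)) -> bool:
    """True if the release is at or above the minimum upstream version."""
    return kernel_version(release) >= minimum
```

On a running system, at_least(platform.release()) performs the check, given an import of the standard platform module.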

Comment 18 James 2016-03-11 09:28:06 UTC
Awesome, it's fixed and working now after an update to 4.4.3!

Comment 19 zimon 2016-04-25 15:09:40 UTC
Created attachment 1150531 [details]
kernel log from journald

I started to get these last Friday with kernel 4.4.6-301.fc23.x86_64. The machine has had to be hard-reset 5 times already. Kernels 4.4.7 and 4.4.8 did not help. I checked the CPU temperatures (sensors.log), and they have been normal these past couple of days. I never noticed these with the older 4.3 kernels, nor with kernel 4.4.3, which I used before 4.4.6 (seen with lastlog(8)).

With kernels 4.4.7 and 4.4.8, I don't see those "BUG: soft lockup - CPU#*" messages in the journal, but the system still hangs suddenly, and nothing helps but pushing the reset button or switching the power off and on.

Does a broken CPU show up like this, or is this always a software bug? The i7-2600 CPU is already 4 years old and the PC is on 24/7, but always under rather light load.

The log shows "Reboots" and those "soft lockup"s. Note that with kernels 4.4.7 and 4.4.8, journald either does not get those kernel messages or fails to write them to disk.

Comment 20 Fedora End Of Life 2016-07-19 12:14:30 UTC
Fedora 22 changed to end-of-life (EOL) status on 2016-07-19. Fedora 22 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

