Bug 1154286
Summary: kernel:NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [qemu-system-arm:13689]
Product: [Fedora] Fedora
Component: kernel
Version: 22
Status: CLOSED EOL
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Type: Bug
Doc Type: Bug Fix
Last Closed: 2016-07-19 12:14:30 UTC
Reporter: Paulo Andrade <paulo.cesar.pereira.de.andrade>
Assignee: Kernel Maintainer List <kernel-maint>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: a9016009, gansalmon, george.sigut, igeorgex, itamar, jamessubmitsbugs, jonathan, kernel-maint, laurent.rineau__fedora, madhu.chinakonda, mchehab, thebenj88, w90p710, zimon
Description
Paulo Andrade
2014-10-18 11:51:45 UTC
I have seen the same bug on Fedora 21, x86_64:

Message from syslogd@warhol at Feb 16 12:03:07 ...
kernel:[ 8924.995787] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [rtkit-daemon:795]

The machine was locked. We had to hard-reboot it.

[lrineau@warhol ~]$ uname -a
Linux warhol.geometryfactory.com 3.18.6-200.fc21.x86_64 #1 SMP Fri Feb 6 22:59:42 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

This bug appears to have been reported against 'rawhide' during the Fedora 22 development cycle. Changing version to '22'. More information and reason for this action is here: https://fedoraproject.org/wiki/Fedora_Program_Management/HouseKeeping/Fedora22

I am experiencing the same issues on both kernel 4.1.4 and 4.1.5 on Fedora 22, except that in my case this bug seems to prevent my machine from booting. After an update via dnf to kernel 4.1.4, booting into the new kernel hangs at "Starting Network Manager Wait Online..." followed by

[ 68.060000] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swaper/0:0]
[ 74.842220] INFO: rcu_sched detected stalls on CPUs/tasks: { 0} (detected by 3, t=60003 jiffies, g=1457, c=1456, q=0)
[ 96.044388] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swaper/0:0]
[ 124.028767] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swaper/0:0]
[ 152.013129] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swaper/0:0]
[ 179.997531] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swaper/0:0]
[ 207.981890] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swaper/0:0]
[ 235.966285] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swaper/0:0]
~
[ 1154.378467] rcu_sched kthread starved for 2295 jiffies!

The laptop keeps looping through these messages and boot-up never completes. I had to revert to kernel 4.1.3 to be able to boot normally again.
I have an ASUS TX201LA laptop with an Intel Core i5 4200U processor. Full specs on the laptop are available at http://www.linlap.com/asus_transformer_book_trio_tx201la

I have similar problems to Bugzy. On a new machine (ASUS VivoPC VM40B with 2-core Intel Celeron) I installed the Fedora 22 KDE spin. After a couple of updates I have kernels 4.0.4-301, 4.1.6-201 and 4.1.7-200. Both 4.1 kernels hang during boot with repeated "NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s!" messages. Now it hangs in NetManager; before the last update it was (I think) with iptables. Only the 4.0 kernel works OK. The priority for me is "bloody high", otherwise I will have to juggle the next kernel updates. BTW Bugzy, how do you keep the boot.log messages from a failed boot? Mine disappear in a black hole.

(In reply to GMS from comment #4)
> I have similar problems to Bugzy. On a new machine (ASUS VivoPC VM40B with
> 2-core Intel Celeron) ...

Right. After disabling the WiFi (I will be using it only on Ethernet) the machine boots just fine.

> BTW Bugzy, how do you keep the boot.log messages from a failed boot? Mine
> disappear in a black hole.

And yes, I know how you did it - from an alternative console. Am getting slow...

(In reply to GMS from comment #5)
> Right. After disabling the WiFi (I will be using it only on Ethernet)
> the machine boots just fine.

This is good to know. Out of curiosity, can you let me know what wireless card/chipset your device uses, and which kernel module is loaded for it?

I am wondering if the problem is a result of patches to the rtlwifi drivers in the kernel. If so, it should be easy to track down, given that we can diff the driver at issue between kernels 4.1.3 and 4.1.4. The only change between kernels 4.1.3 and 4.1.4 as far as the rtlwifi drivers are concerned is "rtlwifi: Remove the clear interrupt routine from all drivers" by Vincent Fann [1]. The patch immediately before it, which is present in the working kernel 4.1.3, was intended to fix a deadlock.
In any case, if GMS's wireless card uses rtlwifi, the patch set to investigate is [1]. Hopefully this is enough information to get the Fedora kernel folks involved in sorting out this issue.

[1] https://kernel.googlesource.com/pub/scm/linux/kernel/git/khilman/linux-stable/+/33e1432c291b72339b03ea3450d7840cdba1dbe1

*********** MASS BUG UPDATE **************

We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 22 kernel bugs.

Fedora 22 has now been rebased to 4.2.3-200.fc22. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 23, and are still experiencing this issue, please change the version to Fedora 23.

If you experience different issues, please open a new bug report for those.

@Justin The problem still exists in the 4.2.3 series kernel.

Kernel 4.2.5 has the same problem.

NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [systemd-udevd:608]
INFO: rcu_sched self-detected stall on CPU { 0} (t=60000 jiffies, g=1026, c=1025, q=0)

Kernel 4.2.6 has the same problem. Note: as Justin mentioned in comment #4 above, disabling the wireless on the system allows the kernel to boot completely. However, once the wireless device is re-enabled after boot-up, the system hangs. I think this pretty much narrows the bug to the set of changes linked above in comment #7. Is there a way to get that driver recompiled without those changes? (Note: I am closer to a power user than a dev of any sort. Any help with how to do what I have described above will be appreciated.)

It looks like the bug was caught in kernel 4.3. The commit logs for the rtlwifi driver [1] suggest that it is fixed there.
Now to get a 4.3 kernel for Fedora 22.

[1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/net/wireless/rtlwifi/rtl8821ae/hw.c?id=54328e64047a54b8fc2362c2e1f0fa16c90f739f

I can confirm this bug still exists in Fedora 23, kernel 4.3.5-300.fc23.x86_64. It's a complete show stopper when trying to install a new system. Wifi had to be disabled in the BIOS to complete an install. ASUS VivoPC VM42-S075V

Hi James,

The fix in kernel 4.3 did not fix the problem. However, the one in kernel 4.4 did. This bug report [1] contains a link to a Fedora-built kernel 4.3 with a back-ported patch from 4.4 that fixes the issue, and this other bug report [2] contains a workaround to get you up and running with wireless on a 4.3 kernel (i.e. adding pci=nomsi to the grub command line when booting 4.3.x kernels). I guess these are the only measures you can take until kernel 4.4.x.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1297554
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1269335

Thanks Bugzy, huge help! Unfortunately pci=nomsi crashes the system, even with wifi disabled. It locks up just after GDM loads. I haven't had time to investigate this one. Also, the kernel build has expired. I haven't had a chance to patch/rebuild my own kernel yet, either.

Well, not all hope is lost, as the 4.3.6-201 kernel for Fedora 22 fixes the issue.

Also, kernel 4.4.3 is now available for Fedora 22, which also has the fix.

Awesome, it's fixed and working now after an update to 4.4.3!

Created attachment 1150531
kernel log from journald
I started to get these last Friday with kernel 4.4.6-301.fc23.x86_64. The machine has had to be hard-reset 5 times already. Kernels 4.4.7 and 4.4.8 did not help. I checked the CPU temperatures (sensors.log), and they have been normal over the last couple of days. I never noticed these with the older 4.3 kernels, nor with kernel 4.4.3, which I used before 4.4.6 (seen with lastlog(8)).
With kernels 4.4.7 and 4.4.8, I don't see those "BUG: soft lockup - CPU#*" messages in the journal, but the system still hangs suddenly and nothing helps except pushing the reset button or power-cycling the machine.
Does a failing CPU show up like this, or is this always a software bug? The i7-2600 CPU is already 4 years old and the PC is on 24/7, but always under rather light load.
The log shows the "Reboots" and those "soft lockup"s. Notice that with kernels 4.4.7 and 4.4.8, journald either does not get those kernel messages or fails to write them to disk.
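For anyone going through a log like the attached one, here is a minimal sketch of how the "soft lockup" lines could be pulled out of saved kernel messages. It assumes the message format quoted in this report; the names `LOCKUP_RE`, `Lockup` and `find_lockups` are made up for illustration and are not part of any existing tool.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# Pattern based on the messages quoted in this report, e.g.
# "[ 68.060000] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swaper/0:0]"
# The leading "[ seconds ]" uptime stamp is optional.
LOCKUP_RE = re.compile(
    r"(?:\[\s*(?P<ts>\d+\.\d+)\]\s*)?"
    r"NMI watchdog: BUG: soft lockup - CPU#(?P<cpu>\d+) "
    r"stuck for (?P<secs>\d+)s! \[(?P<task>[^\]]+):(?P<pid>\d+)\]"
)


class Lockup(NamedTuple):
    uptime: Optional[float]  # seconds since boot, if the line carried a stamp
    cpu: int                 # CPU number reported as stuck
    seconds: int             # how long the watchdog says it was stuck
    task: str                # task name shown in the brackets
    pid: int                 # PID shown in the brackets


def find_lockups(lines: Iterable[str]) -> Iterator[Lockup]:
    """Yield one Lockup per line that contains a soft lockup message."""
    for line in lines:
        m = LOCKUP_RE.search(line)
        if m is None:
            continue
        ts = m.group("ts")
        yield Lockup(
            uptime=float(ts) if ts is not None else None,
            cpu=int(m.group("cpu")),
            seconds=int(m.group("secs")),
            task=m.group("task"),
            pid=int(m.group("pid")),
        )


if __name__ == "__main__":
    import sys

    # Read kernel messages from standard input and list the lockups found.
    for hit in find_lockups(sys.stdin):
        print(hit)
```

If the journal is persistent, the messages from the previous boot can be piped in with something like `journalctl -k -b -1 | python3 lockups.py` (the script name is only an example). If journald never wrote the messages to disk, as seems to be the case with 4.4.7 and 4.4.8, this will of course find nothing.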
Fedora 22 changed to end-of-life (EOL) status on 2016-07-19. Fedora 22 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug.

Thank you for reporting this bug and we are sorry it could not be fixed.