Bug 714521

Summary: [abrt] kernel: WARNING: at kernel/watchdog.c:226 watchdog_overflow_callback+0x9b/0xa6(): TAINTED Warning Issued
Product: [Fedora] Fedora Reporter: staimeer <staimeer>
Component: kernel Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED WORKSFORME QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 15 CC: dzickus, gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: All   
Whiteboard: abrt_hash:fccdac61a915caf898bfdb52f36793f05af502f3
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-09-26 20:43:08 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---

Description staimeer 2011-06-19 21:29:23 UTC
abrt version: 2.0.1
architecture:   x86_64
cmdline:        ro root=UUID=6aa3d5f1-78a2-4a6a-8271-92c9c0d6c55b rd_NO_LUKS rd_NO_LVM rd_NO_MD rd_NO_DM LANG=pt_BR.UTF-8 SYSFONT=latarcyrheb-sun16 KEYTABLE=br-abnt2 rhgb quiet
component:      kernel
kernel:         2.6.38.8-32.fc15.x86_64
kernel_tainted: 512
os_release:     Fedora release 15 (Lovelock)
package:        kernel
reason:         WARNING: at kernel/watchdog.c:226 watchdog_overflow_callback+0x9b/0xa6()
reported_to:    kerneloops: URL=http://submit.kerneloops.org/submitoops.php
time:           Sun Jun 19 17:39:56 2011
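The kernel_tainted value is a bitmask of taint flags. 512 is 1 << 9, which in kernels of this era is TAINT_WARN (printed as 'W'). warn_slowpath_common() sets that flag only after dumping the stack, which would explain why the backtrace below still says "Not tainted". A minimal Python sketch for decoding such a value, assuming the bit layout of include/linux/kernel.h around 2.6.38 (only bits 0 to 10 are listed; any other bit is reported by number):

```python
from __future__ import annotations

# Assumed taint bit layout, modeled on include/linux/kernel.h around 2.6.38
# (bit index -> (flag name, letter used in "Tainted:" lines)).
# Only bits 0-10 are listed; newer kernels define more.
TAINT_FLAGS = {
    0: ("TAINT_PROPRIETARY_MODULE", "P"),
    1: ("TAINT_FORCED_MODULE", "F"),
    2: ("TAINT_UNSAFE_SMP", "S"),
    3: ("TAINT_FORCED_RMMOD", "R"),
    4: ("TAINT_MACHINE_CHECK", "M"),
    5: ("TAINT_BAD_PAGE", "B"),
    6: ("TAINT_USER", "U"),
    7: ("TAINT_DIE", "D"),
    8: ("TAINT_OVERRIDDEN_ACPI_TABLE", "A"),
    9: ("TAINT_WARN", "W"),
    10: ("TAINT_CRAP", "C"),
}


def decode_taint(mask: int) -> list[str]:
    """Return the names of the taint flags set in a kernel_tainted value."""
    if mask < 0:
        raise ValueError("taint mask must be non-negative")
    names = []
    bit = 0
    while mask >> bit:  # stop once no higher bits remain
        if (mask >> bit) & 1:
            name, _letter = TAINT_FLAGS.get(bit, (f"bit {bit}", "?"))
            names.append(name)
        bit += 1
    return names


print(decode_taint(512))  # ['TAINT_WARN']
```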

backtrace:
:WARNING: at kernel/watchdog.c:226 watchdog_overflow_callback+0x9b/0xa6()
:Hardware name: System Product Name
:Watchdog detected hard LOCKUP on cpu 1
:Modules linked in: fuse tpm_infineon sunrpc cpufreq_ondemand powernow_k8 freq_table mperf snd_hda_codec_realtek snd_hda_intel ppdev snd_hda_codec snd_hwdep snd_seq snd_seq_device microcode edac_core r8169 snd_pcm snd_timer sp5100_tco snd edac_mce_amd soundcore i2c_piix4 snd_page_alloc k8temp mii serio_raw asus_atk0110 parport_pc parport ipv6 btrfs zlib_deflate libcrc32c ata_generic pata_acpi pata_atiixp nouveau ttm drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
:Pid: 0, comm: kworker/0:0 Not tainted 2.6.38.8-32.fc15.x86_64 #1
:Call Trace:
: <NMI>  [<ffffffff8105511a>] warn_slowpath_common+0x83/0x9b
: [<ffffffff810551d5>] warn_slowpath_fmt+0x46/0x48
: [<ffffffff810ac2e3>] watchdog_overflow_callback+0x9b/0xa6
: [<ffffffff810d3ca1>] __perf_event_overflow+0x135/0x191
: [<ffffffff810162f2>] ? paravirt_write_msr+0xf/0x13
: [<ffffffff810d42fa>] perf_event_overflow+0x14/0x16
: [<ffffffff8101836f>] x86_pmu_handle_irq+0xaf/0xea
: [<ffffffff814772e0>] perf_event_nmi_handler+0x67/0xb3
: [<ffffffff81478f89>] notifier_call_chain+0x37/0x63
: [<ffffffff81478fe1>] atomic_notifier_call_chain+0x18/0x1a
: [<ffffffff81479011>] notify_die+0x2e/0x30
: [<ffffffff81476774>] do_nmi+0x6d/0x217
: [<ffffffff81476490>] nmi+0x20/0x30
: [<ffffffff8102a145>] ? native_safe_halt+0xb/0xd
: <<EOE>>  [<ffffffff81010d36>] default_idle+0x4e/0x86
: [<ffffffff81010e31>] c1e_idle+0xc3/0xe4
: [<ffffffff81008321>] cpu_idle+0xa5/0xdf
: [<ffffffff81464dba>] start_secondary+0x20c/0x20e
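Each frame reads as function+offset/size: watchdog_overflow_callback+0x9b/0xa6 means 0x9b bytes into a function that is 0xa6 bytes long. Frames between <NMI> and <<EOE>> ran on the NMI stack; the frames after <<EOE>> show what the CPU was doing when the NMI arrived, and a leading "?" marks an address the unwinder found on the stack but could not confirm as a return address. A rough Python sketch that splits a trace along those lines (the regex, the Frame type and parse_call_trace are illustrative helpers, not part of abrt or the kernel):

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Matches e.g. "[<ffffffff810ac2e3>] watchdog_overflow_callback+0x9b/0xa6",
# with an optional "? " before the symbol for unconfirmed frames.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


@dataclass
class Frame:
    func: str
    offset: int     # bytes into the function
    size: int       # total function size in bytes
    reliable: bool  # False for frames printed with "?"
    stack: str      # "nmi" on the NMI stack, "task" after <<EOE>>


def parse_call_trace(text: str) -> list[Frame]:
    frames = []
    stack = "task"
    for line in text.splitlines():
        if "<NMI>" in line:
            stack = "nmi"
        if "<<EOE>>" in line:  # end of the exception (here NMI) stack
            stack = "task"
        m = FRAME_RE.search(line)
        if m:
            frames.append(Frame(m["func"], int(m["off"], 16), int(m["size"], 16),
                                m["unreliable"] is None, stack))
    return frames


TRACE = """\
 <NMI>  [<ffffffff8105511a>] warn_slowpath_common+0x83/0x9b
 [<ffffffff810ac2e3>] watchdog_overflow_callback+0x9b/0xa6
 [<ffffffff8102a145>] ? native_safe_halt+0xb/0xd
 <<EOE>>  [<ffffffff81010d36>] default_idle+0x4e/0x86
 [<ffffffff81008321>] cpu_idle+0xa5/0xdf
"""

frames = parse_call_trace(TRACE)
interrupted = [f.func for f in frames if f.stack == "task" and f.reliable]
print(interrupted)  # ['default_idle', 'cpu_idle']
```

Applied to this report, the interrupted context is the idle loop, which is the oddity raised in comment 2.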

event_log:
:2011-06-19-18:29:16> Submitting oops report to http://submit.kerneloops.org/submitoops.php
:2011-06-19-18:29:17  Kernel oops report was uploaded

Comment 1 Chuck Ebbert 2011-06-24 08:31:57 UTC
*** Bug 714837 has been marked as a duplicate of this bug. ***

Comment 2 Don Zickus 2011-06-24 13:44:46 UTC
When does this WARNING happen? On bootup? Shutdown? Randomly?  Also, what type of machine is this?  Is it easy to reproduce?

The stack trace doesn't make sense for some reason.  It says the kernel detected a cpu lockup while idling, which is bogus obviously.

Thanks,
Don

Comment 3 staimeer 2011-06-24 18:02:24 UTC
@Don Zickus 
The warning happens randomly. The machine is an AMD X2 with an ASUS motherboard.
It is hard to reproduce, since it happens randomly.

But apparently the problem stopped happening after an update; I believe it was the update of

Comment 4 staimeer 2011-06-24 18:08:30 UTC
@Don Zickus 
The warning happens randomly. The machine is an AMD X2 with an ASUS motherboard.
It is hard to reproduce, since it happens randomly.

But apparently the problem stopped happening after an update; I believe it was
the systemd update.

Comment 5 Don Zickus 2011-06-24 18:24:33 UTC
(In reply to comment #4)
> @Don Zickus 
> The warning happens randomly. The machine is an AMD X2 with an ASUS motherboard.
> It is hard to reproduce, since it happens randomly.

Hmm. That sucks.  If you happen to see it again, can you paste another stack trace here?  I am wondering if they are consistent, though, as I said earlier, it is odd that cpu_idle would be running with interrupts disabled and spinning the cpu.

> 
> But apparently the problem stopped happening after an update; I believe it was
> the systemd update.

It is even odder that a userspace app could cause a cpu lockup like this.  I find it hard to believe, but then again I do not know all the pieces of systemd.  It could be a cgroup issue, like one I saw in another upstream bug.  I know systemd makes extensive use of cgroups.

Let me know if you see it again.  

Cheers,
Don

Comment 6 Chuck Ebbert 2011-06-26 08:08:36 UTC
(In reply to comment #2)
> The stack trace doesn't make sense for some reason.  It says the kernel
> detected a cpu lockup while idling, which is bogus obviously.

I wonder if tickless mode has gotten so good that the CPU really was legitimately idle for that long? Does the watchdog even check for something like that?

Comment 7 Don Zickus 2011-06-27 13:41:30 UTC
(In reply to comment #6)
> (In reply to comment #2)
> > The stack trace doesn't make sense for some reason.  It says the kernel
> > detected a cpu lockup while idling, which is bogus obviously.
> 
> I wonder if tickless mode has gotten so good that the CPU really was
> legitimately idle for that long? Does the watchdog even check for something
> like that?

The watchdog checks to see if a cpu has had its interrupts disabled for 60 seconds.  If so, then when the NMI fires it will print out the backtrace.

That means in this case the cpu was idling with interrupts disabled.  If that is really the case, then I doubt it would ever get scheduled and the cpu would sit there in an idle state forever.  Which is just as bad as a lockup. :-)
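The check described above amounts to comparing a counter between two NMIs. Below is a minimal Python model of it, not the kernel code: a per-CPU counter is bumped by the watchdog hrtimer interrupt, and each NMI compares it with the value saved at the previous NMI; if it has not moved, the CPU has not taken an interrupt for a whole NMI period. The counter names and the warn-once behaviour are modeled on kernel/watchdog.c of that era; the 12 s timer period, the one-second time step, and dropping (rather than delaying) timer interrupts that expire while interrupts are off are assumptions of the sketch. It also illustrates the point behind comment 6: in this model, a CPU that is idle but still taking interrupts keeps servicing the hrtimer, so idleness alone never triggers the warning.

```python
# Assumed periods for this sketch (not read from a running kernel):
HRTIMER_PERIOD_S = 12  # watchdog hrtimer fires every 12 s
NMI_PERIOD_S = 60      # NMI check roughly every 60 s, per the comment above


class CpuWatchdog:
    """Per-CPU state, modeled on the counters in kernel/watchdog.c."""

    def __init__(self) -> None:
        self.hrtimer_interrupts = 0        # bumped by the hrtimer interrupt
        self.hrtimer_interrupts_saved = 0  # value seen at the previous NMI
        self.warned = False                # report each lockup only once

    def hrtimer_tick(self) -> None:
        self.hrtimer_interrupts += 1

    def is_hardlockup(self) -> bool:
        # No hrtimer interrupt since the last NMI => interrupts were off.
        if self.hrtimer_interrupts == self.hrtimer_interrupts_saved:
            return True
        self.hrtimer_interrupts_saved = self.hrtimer_interrupts
        return False

    def nmi_check(self) -> bool:
        """Return True if this NMI would print 'Watchdog detected hard LOCKUP'."""
        if self.is_hardlockup():
            if self.warned:
                return False
            self.warned = True
            return True
        self.warned = False
        return False


def simulate(irqs_enabled, seconds):
    """Step one second at a time; irqs_enabled(t) says whether the CPU takes
    interrupts at second t. Returns the times at which a warning is printed.
    Simplification: a timer expiring with interrupts off is simply dropped."""
    cpu = CpuWatchdog()
    warnings = []
    for t in range(1, seconds + 1):
        if t % HRTIMER_PERIOD_S == 0 and irqs_enabled(t):
            cpu.hrtimer_tick()
        if t % NMI_PERIOD_S == 0 and cpu.nmi_check():
            warnings.append(t)
    return warnings


# CPU idle but taking interrupts: the hrtimer keeps firing, no warning.
print(simulate(lambda t: True, 300))     # []
# Interrupts switched off after t = 60 s and never re-enabled.
print(simulate(lambda t: t <= 60, 300))  # [120]
```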

Considering that most machines spend much of their time in the cpu idle state, I find it hard to believe that idle itself is the problem.  Though it could be a corner case where some process forgot to re-enable interrupts and then forced the cpu to idle.

Cheers,
Don

Comment 8 Josh Boyer 2011-09-26 18:21:23 UTC
Don, can we close this bug out?  Seems a recreate isn't in the cards...

Comment 9 Don Zickus 2011-09-26 20:02:26 UTC
Doesn't bother me. I have enough bugs to work on! :-)

Cheers,
Don

Comment 10 Josh Boyer 2011-09-26 20:43:08 UTC
Thanks.