Red Hat Bugzilla – Bug 677532
[kdump] WARNING: at kernel/watchdog.c:229 watchdog_overflow_callback+0xa9/0xd0() (Not tainted)
Last modified: 2016-01-20 03:09:47 EST
Description of problem:
When running the test /kernel/kdump/crash-crasher, the system hung and kdump was not executed. Part of the console log follows:

========================================================
------------[ cut here ]------------
WARNING: at kernel/watchdog.c:229 watchdog_overflow_callback+0xa9/0xd0() (Not tainted)
Hardware name: IBM 3850 M2 / x3950 M2 -[71414RZ]-
Watchdog detected hard LOCKUP on cpu 7
Modules linked in: crasher(U) sunrpc cpufreq_ondemand acpi_cpufreq freq_table ipv6 dm_mirror dm_region_hash dm_log ses enclosure microcode ibmaem ipmi_msghandler sg iTCO_wdt iTCO_vendor_support bnx2 ext4 mbcache jbd2 sd_mod crc_t10dif megaraid_sas sr_mod cdrom pata_acpi ata_generic ata_piix radeon ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core dm_mod [last unloaded: scsi_wait_scan]
Pid: 14704, comm: runtest.sh Not tainted 2.6.32-114.0.1.el6.x86_64 #1
Call Trace:
 <NMI> [<ffffffff81066d27>] ? warn_slowpath_common+0x87/0xc0
 [<ffffffff81066e16>] ? warn_slowpath_fmt+0x46/0x50
 [<ffffffff810d5859>] ? watchdog_overflow_callback+0xa9/0xd0
 [<ffffffff81107e56>] ? __perf_event_overflow+0x116/0x290
 [<ffffffff8101b8a1>] ? intel_pmu_save_and_restart+0xa1/0xc0
 [<ffffffff81108449>] ? perf_event_overflow+0x19/0x20
 [<ffffffff8101d6fa>] ? intel_pmu_handle_irq+0x26a/0x4e0
 [<ffffffff814dc4f6>] ? kprobe_exceptions_notify+0x16/0x430
 [<ffffffff814dafe8>] ? perf_event_nmi_handler+0x58/0xe0
 [<ffffffff814dcb25>] ? notifier_call_chain+0x55/0x80
 [<ffffffff814dcb8a>] ? atomic_notifier_call_chain+0x1a/0x20
 [<ffffffff8109377e>] ? notify_die+0x2e/0x30
 [<ffffffff814da793>] ? do_nmi+0x173/0x2b0
 [<ffffffff814da0a0>] ? nmi+0x20/0x30
 [<ffffffff811d8280>] ? proc_file_write+0x0/0xc0
 [<ffffffff814d9835>] ? _spin_lock_irq+0x25/0x40
 <<EOE>> [<ffffffffa00320db>] ? crasher_write+0x5b/0xa0 [crasher]
 [<ffffffff811d82f4>] ? proc_file_write+0x74/0xc0
 [<ffffffff811d294e>] ? proc_reg_write+0x7e/0xc0
 [<ffffffff81170018>] ? vfs_write+0xb8/0x1a0
 [<ffffffff810d0fd2>] ? audit_syscall_entry+0x272/0x2a0
 [<ffffffff81170a51>] ? sys_write+0x51/0x90
 [<ffffffff8100b172>] ? system_call_fastpath+0x16/0x1b
---[ end trace ee603741dcc789f6 ]---
BUG: soft lockup - CPU#1 stuck for 67s! [parted:15073]
Modules linked in: crasher(U) sunrpc cpufreq_ondemand acpi_cpufreq freq_table ipv6 dm_mirror dm_region_hash dm_log ses enclosure microcode ibmaem ipmi_msghandler sg iTCO_wdt iTCO_vendor_support bnx2 ext4 mbcache jbd2 sd_mod crc_t10dif megaraid_sas sr_mod cdrom pata_acpi ata_generic ata_piix radeon ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core dm_mod [last unloaded: scsi_wait_scan]
CPU 1:
Modules linked in: crasher(U) sunrpc cpufreq_ondemand acpi_cpufreq freq_table ipv6 dm_mirror dm_region_hash dm_log ses enclosure microcode ibmaem ipmi_msghandler sg iTCO_wdt iTCO_vendor_support bnx2 ext4 mbcache jbd2 sd_mod crc_t10dif megaraid_sas sr_mod cdrom pata_acpi ata_generic ata_piix radeon ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core dm_mod [last unloaded: scsi_wait_scan]
Pid: 15073, comm: parted Tainted: G W ---------------- 2.6.32-114.0.1.el6.x86_64 #1 IBM 3850 M2 / x3950 M2 -[71414RZ]-
RIP: 0010:[<ffffffff810a3b72>] [<ffffffff810a3b72>] smp_call_function_many+0x1b2/0x210
RSP: 0018:ffff88041bd79d48 EFLAGS: 00000202
RAX: 0000000000000010 RBX: ffff88041bd79d88 RCX: 00000000000000fc
RDX: 000000000000000f RSI: 0000000000000010 RDI: 0000000000000286
RBP: ffffffff8100bc8e R08: ffffffff81b9d840 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff81260320
R13: ffff88041bd79cf8 R14: 0000000000000292 R15: ffff88041bd79cf8
FS: 00007fbae21707e0(0000) GS:ffff880028220000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fcfbf08a610 CR3: 000000041c646000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
 [<ffffffff810a3b5a>] ? smp_call_function_many+0x19a/0x210
 [<ffffffff811a0bd0>] ? invalidate_bh_lru+0x0/0x50
 [<ffffffff810a3bf2>] ? smp_call_function+0x22/0x30
 [<ffffffff8106eec4>] ? on_each_cpu+0x24/0x50
 [<ffffffff811a084c>] ? invalidate_bh_lrus+0x1c/0x20
 [<ffffffff811a16f5>] ? invalidate_bdev+0x25/0x40
 [<ffffffff8124e054>] ? blkdev_ioctl+0x424/0x720
 [<ffffffff811a6cec>] ? block_ioctl+0x3c/0x40
 [<ffffffff81182b22>] ? vfs_ioctl+0x22/0xa0
 [<ffffffff81182cc4>] ? do_vfs_ioctl+0x84/0x580
 [<ffffffff81183241>] ? sys_ioctl+0x81/0xa0
 [<ffffffff8100b172>] ? system_call_fastpath+0x16/0x1b
BUG: soft lockup - CPU#1 stuck for 67s! [parted:15073]

Version-Release number of selected component (if applicable):
kernel-firmware-2.6.32-114.0.1.el6.noarch
kernel-2.6.32-114.0.1.el6.x86_64
kexec-tools-2.0.0-165.el6.x86_64

How reproducible:
Found this issue on ibm-x3950m2-01.rhts.eng.bos.redhat.com

Steps to Reproduce:
1. Set up kdump
2. Run /kernel/kdump/crash-crasher with TESTARGS=3
3.

Actual results:
The system hung and printed a call trace.

Expected results:
Kdump executed as expected.

Additional info:
https://beaker.engineering.redhat.com/recipes/108037
http://beaker-archive.app.eng.bos.redhat.com/beaker-logs/2011/02/530/53008/108037///console.log
with this guiltyfunc: bug 527824 bug 528295 bug 528296 bug 528521 bug 528768 bug 530380 bug 536855 bug 536985 bug 542694 bug 562008
Hmm, that was dumb of me when I added the new nmi_watchdog. I told it to display a WARNING instead of panic when it detected a lockup. If you boot with 'nmi_watchdog=panic' on the kernel command line it will probably get kdump going. That should be good enough for beta. I'll fix it properly with a config option or something. Cheers, Don
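The workaround in the comment above depends on the kernel having been booted with 'nmi_watchdog=panic'. As a minimal sketch of how a test script might confirm that, the Python helper below inspects a kernel command-line string. The function name has_nmi_watchdog_panic and the sample command line are hypothetical, and the comma-separated handling of the option value is an assumption rather than something stated in this report.

```python
def has_nmi_watchdog_panic(cmdline: str) -> bool:
    """Return True if the command line sets nmi_watchdog with the 'panic' value."""
    # Boot parameters are whitespace-separated key=value tokens.
    for token in cmdline.split():
        key, _, value = token.partition("=")
        # Assumption: the value may be a comma-separated list, e.g. "panic,1".
        if key == "nmi_watchdog" and "panic" in value.split(","):
            return True
    return False


# Hypothetical command line; on a running system the real one can be read
# from /proc/cmdline.
sample = "ro root=/dev/mapper/vg-root crashkernel=128M nmi_watchdog=panic"
print(has_nmi_watchdog_panic(sample))
```

On a live machine the same function could be called with the contents of /proc/cmdline before starting the crash test.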
Reporter, Could I please ask you to provide a priority assessment (set the priority field to one of urgent/high/medium/low) for the impact of this issue? This will help us prioritize this issue with our other outstanding bugs for the current release cycle ... Regards, Brock
Patch(es) available on kernel-2.6.32-121.el6
Tested with RHEL-20110311.3, kernel-2.6.32-122.el6 on ibm-x3950m2-01.rhts.eng.bos.redhat.com. No "WARNING: at kernel/watchdog.c:229 watchdog_overflow_callback+0xa9/0xd0() (Not tainted)" found. Changing status to VERIFIED.
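The verification above amounts to searching the console log for the watchdog warning. A minimal sketch of that check in Python follows, assuming the console log is available as plain text. The function name find_watchdog_warnings and the regular expression are illustrative only and are not part of the actual test harness.

```python
import re
from typing import List

# Match the stable parts of the warning header from this report. The source
# line number (229 here) can differ between kernel builds, so any number
# is accepted.
WATCHDOG_WARNING = re.compile(
    r"WARNING: at kernel/watchdog\.c:\d+ watchdog_overflow_callback"
)


def find_watchdog_warnings(log_text: str) -> List[str]:
    """Return every log line that contains the watchdog overflow warning."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if WATCHDOG_WARNING.search(line)
    ]
```

An empty result for a console log corresponds to the "no WARNING found" outcome reported in the verification comment.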
Patch(es) available on kernel-2.6.32-131.0.9.el6
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2011-0542.html