

Bug 677532

Summary:	[kdump] WARNING: at kernel/watchdog.c:229 watchdog_overflow_callback+0xa9/0xd0() (Not tainted)
Product:	Red Hat Enterprise Linux 6	Reporter:	Chao Ye <cye>
Component:	kernel	Assignee:	Don Zickus <dzickus>
Status:	CLOSED ERRATA	QA Contact:	Chao Ye <cye>
Severity:	high	Docs Contact:
Priority:	high
Version:	6.1	CC:	arozansk, czhang, emcnabb, jburke, nhorman, prarit, qcai, syeghiay
Target Milestone:	rc	Keywords:	Regression
Target Release:	---
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:	kernel-2.6.32-131.0.9.el6	Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2011-05-23 20:39:33 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	799186
Bug Blocks:	676037, 1300182

Description Chao Ye 2011-02-15 06:10:42 UTC

Description of problem:
When running the test /kernel/kdump/crash-crasher, the system hung and kdump was not executed. Part of the console log follows:
========================================================
------------[ cut here ]------------ 
WARNING: at kernel/watchdog.c:229 watchdog_overflow_callback+0xa9/0xd0() (Not tainted) 
Hardware name: IBM 3850 M2 / x3950 M2 -[71414RZ]- 
Watchdog detected hard LOCKUP on cpu 7 
Modules linked in: crasher(U) sunrpc cpufreq_ondemand acpi_cpufreq freq_table ipv6 dm_mirror dm_region_hash dm_log ses enclosure microcode ibmaem ipmi_msghandler sg iTCO_wdt iTCO_vendor_support bnx2 ext4 mbcache jbd2 sd_mod crc_t10dif megaraid_sas sr_mod cdrom pata_acpi ata_generic ata_piix radeon ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core dm_mod [last unloaded: scsi_wait_scan] 
Pid: 14704, comm: runtest.sh Not tainted 2.6.32-114.0.1.el6.x86_64 #1 
Call Trace: 
 <NMI>  [<ffffffff81066d27>] ? warn_slowpath_common+0x87/0xc0 
 [<ffffffff81066e16>] ? warn_slowpath_fmt+0x46/0x50 
 [<ffffffff810d5859>] ? watchdog_overflow_callback+0xa9/0xd0 
 [<ffffffff81107e56>] ? __perf_event_overflow+0x116/0x290 
 [<ffffffff8101b8a1>] ? intel_pmu_save_and_restart+0xa1/0xc0 
 [<ffffffff81108449>] ? perf_event_overflow+0x19/0x20 
 [<ffffffff8101d6fa>] ? intel_pmu_handle_irq+0x26a/0x4e0 
 [<ffffffff814dc4f6>] ? kprobe_exceptions_notify+0x16/0x430 
 [<ffffffff814dafe8>] ? perf_event_nmi_handler+0x58/0xe0 
 [<ffffffff814dcb25>] ? notifier_call_chain+0x55/0x80 
 [<ffffffff814dcb8a>] ? atomic_notifier_call_chain+0x1a/0x20 
 [<ffffffff8109377e>] ? notify_die+0x2e/0x30 
 [<ffffffff814da793>] ? do_nmi+0x173/0x2b0 
 [<ffffffff814da0a0>] ? nmi+0x20/0x30 
 [<ffffffff811d8280>] ? proc_file_write+0x0/0xc0 
 [<ffffffff814d9835>] ? _spin_lock_irq+0x25/0x40 
 <<EOE>>  [<ffffffffa00320db>] ? crasher_write+0x5b/0xa0 [crasher] 
 [<ffffffff811d82f4>] ? proc_file_write+0x74/0xc0 
 [<ffffffff811d294e>] ? proc_reg_write+0x7e/0xc0 
 [<ffffffff81170018>] ? vfs_write+0xb8/0x1a0 
 [<ffffffff810d0fd2>] ? audit_syscall_entry+0x272/0x2a0 
 [<ffffffff81170a51>] ? sys_write+0x51/0x90 
 [<ffffffff8100b172>] ? system_call_fastpath+0x16/0x1b 
---[ end trace ee603741dcc789f6 ]--- 
BUG: soft lockup - CPU#1 stuck for 67s! [parted:15073] 
Modules linked in: crasher(U) sunrpc cpufreq_ondemand acpi_cpufreq freq_table ipv6 dm_mirror dm_region_hash dm_log ses enclosure microcode ibmaem ipmi_msghandler sg iTCO_wdt iTCO_vendor_support bnx2 ext4 mbcache jbd2 sd_mod crc_t10dif megaraid_sas sr_mod cdrom pata_acpi ata_generic ata_piix radeon ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core dm_mod [last unloaded: scsi_wait_scan] 
CPU 1: 
Modules linked in: crasher(U) sunrpc cpufreq_ondemand acpi_cpufreq freq_table ipv6 dm_mirror dm_region_hash dm_log ses enclosure microcode ibmaem ipmi_msghandler sg iTCO_wdt iTCO_vendor_support bnx2 ext4 mbcache jbd2 sd_mod crc_t10dif megaraid_sas sr_mod cdrom pata_acpi ata_generic ata_piix radeon ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core dm_mod [last unloaded: scsi_wait_scan] 
Pid: 15073, comm: parted Tainted: G        W  ----------------  2.6.32-114.0.1.el6.x86_64 #1 IBM 3850 M2 / x3950 M2 -[71414RZ]- 
RIP: 0010:[<ffffffff810a3b72>]  [<ffffffff810a3b72>] smp_call_function_many+0x1b2/0x210 
RSP: 0018:ffff88041bd79d48  EFLAGS: 00000202 
RAX: 0000000000000010 RBX: ffff88041bd79d88 RCX: 00000000000000fc 
RDX: 000000000000000f RSI: 0000000000000010 RDI: 0000000000000286 
RBP: ffffffff8100bc8e R08: ffffffff81b9d840 R09: 0000000000000000 
R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff81260320 
R13: ffff88041bd79cf8 R14: 0000000000000292 R15: ffff88041bd79cf8 
FS:  00007fbae21707e0(0000) GS:ffff880028220000(0000) knlGS:0000000000000000 
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b 
CR2: 00007fcfbf08a610 CR3: 000000041c646000 CR4: 00000000000006e0 
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 
Call Trace: 
 [<ffffffff810a3b5a>] ? smp_call_function_many+0x19a/0x210 
 [<ffffffff811a0bd0>] ? invalidate_bh_lru+0x0/0x50 
 [<ffffffff810a3bf2>] ? smp_call_function+0x22/0x30 
 [<ffffffff8106eec4>] ? on_each_cpu+0x24/0x50 
 [<ffffffff811a084c>] ? invalidate_bh_lrus+0x1c/0x20 
 [<ffffffff811a16f5>] ? invalidate_bdev+0x25/0x40 
 [<ffffffff8124e054>] ? blkdev_ioctl+0x424/0x720 
 [<ffffffff811a6cec>] ? block_ioctl+0x3c/0x40 
 [<ffffffff81182b22>] ? vfs_ioctl+0x22/0xa0 
 [<ffffffff81182cc4>] ? do_vfs_ioctl+0x84/0x580 
 [<ffffffff81183241>] ? sys_ioctl+0x81/0xa0 
 [<ffffffff8100b172>] ? system_call_fastpath+0x16/0x1b 
BUG: soft lockup - CPU#1 stuck for 67s! [parted:15073] 


Version-Release number of selected component (if applicable):
kernel-firmware-2.6.32-114.0.1.el6.noarch
kernel-2.6.32-114.0.1.el6.x86_64
kexec-tools-2.0.0-165.el6.x86_64

How reproducible:
Found this issue on ibm-x3950m2-01.rhts.eng.bos.redhat.com

Steps to Reproduce:
1. Set up kdump.
2. Run /kernel/kdump/crash-crasher with TESTARGS=3.

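Step 1 can be sanity-checked before running the test. The following is a minimal sketch, assuming the kernel exposes the crash-kernel state in /sys/kernel/kexec_crash_loaded (where "1" means a crash kernel is loaded); the function name crash_kernel_loaded is hypothetical and not part of the test suite.

```python
from pathlib import Path


def crash_kernel_loaded(sysfs_file="/sys/kernel/kexec_crash_loaded"):
    """Return True if the given sysfs file reports a loaded crash kernel.

    The default path is an assumption about the running kernel; a missing
    or unreadable file is treated as "not loaded".
    """
    try:
        return Path(sysfs_file).read_text().strip() == "1"
    except OSError:
        return False
```

If this returns False on the test machine, kdump cannot run regardless of how the watchdog reacts to the lockup.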
Actual results:
The system hung and printed a call trace.

Expected results:
Kdump is executed as expected.

Additional info:
https://beaker.engineering.redhat.com/recipes/108037
http://beaker-archive.app.eng.bos.redhat.com/beaker-logs/2011/02/530/53008/108037///console.log

Comment 1 KernelOops Bot 2011-02-15 06:14:12 UTC

 with this guiltyfunc:  bug 527824 bug 528295 bug 528296 bug 528521 bug 528768 bug 530380 bug 536855 bug 536985 bug 542694 bug 562008

Comment 5 KernelOops Bot 2011-02-15 07:33:52 UTC

 with this guiltyfunc:  bug 527824 bug 528295 bug 528296 bug 528521 bug 528768 bug 530380 bug 536855 bug 536985 bug 542694 bug 562008

Comment 9 Don Zickus 2011-02-25 02:39:47 UTC

Hmm, that was dumb of me when I added the new nmi_watchdog.  I told it to display a WARNING instead of panic when it detected a lockup.

If you boot with 'nmi_watchdog=panic' on the kernel command line it will probably get kdump going.  That should be good enough for beta.

I'll fix it properly with a config option or something.

Cheers,
Don
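Whether the workaround from comment 9 is active on a given boot can be checked from the kernel command line. This is a minimal sketch: the helper name has_nmi_watchdog_panic is hypothetical, and it only recognizes the exact token quoted in the comment, so other spellings the kernel's own parser may accept are not handled.

```python
def has_nmi_watchdog_panic(cmdline):
    """Return True if 'nmi_watchdog=panic' appears as a whole token.

    `cmdline` is the text of a kernel command line, for example the
    contents of /proc/cmdline. Only the exact form quoted in comment 9
    is matched (an assumption of this sketch).
    """
    return "nmi_watchdog=panic" in cmdline.split()
```

On the test machine, passing the contents of /proc/cmdline to this function shows whether a detected hard lockup is expected to panic (and so reach kdump) rather than only print the WARNING.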

Comment 10 Brock Organ 2011-03-01 14:43:06 UTC

Reporter,

Could I please ask you to provide a priority assessment (set the priority field to one of urgent/high/medium/low) for the impact of this issue?  This will help us prioritize this issue with our other outstanding bugs for the current release cycle ...

Regards,

Brock

Comment 13 Aristeu Rozanski 2011-03-10 17:58:12 UTC

Patch(es) available on kernel-2.6.32-121.el6

Comment 16 Chao Ye 2011-03-16 03:00:35 UTC

Tested with RHEL-20110311.3, kernel-2.6.32-122.el6 on ibm-x3950m2-01.rhts.eng.bos.redhat.com.
No "WARNING: at kernel/watchdog.c:229 watchdog_overflow_callback+0xa9/0xd0() (Not tainted)" message was found.

Change status to VERIFIED
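A check like the one in this verification can be scripted against a saved console log. The sketch below is an illustration only, not the tool used by QA: the function name scan_console_log and the result layout are hypothetical, and the patterns are taken from the messages quoted in the description.

```python
import re

# Patterns based on the console messages quoted in this bug's description.
WATCHDOG_WARNING = re.compile(
    r"WARNING: at kernel/watchdog\.c:\d+ watchdog_overflow_callback"
)
HARD_LOCKUP = re.compile(r"Watchdog detected hard LOCKUP on cpu (\d+)")
SOFT_LOCKUP = re.compile(r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s!")


def scan_console_log(text):
    """Collect watchdog warnings and lockup reports from console log text."""
    findings = {
        "watchdog_warnings": 0,   # count of WARNING lines from watchdog.c
        "hard_lockup_cpus": [],   # CPU numbers reported as hard-locked
        "soft_lockups": [],       # (cpu, seconds stuck) tuples
    }
    for line in text.splitlines():
        if WATCHDOG_WARNING.search(line):
            findings["watchdog_warnings"] += 1
        match = HARD_LOCKUP.search(line)
        if match:
            findings["hard_lockup_cpus"].append(int(match.group(1)))
        match = SOFT_LOCKUP.search(line)
        if match:
            findings["soft_lockups"].append(
                (int(match.group(1)), int(match.group(2)))
            )
    return findings
```

A log from a fixed kernel would be expected to yield a watchdog_warnings count of zero, matching the result reported above.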

Comment 21 Aristeu Rozanski 2011-04-27 14:20:34 UTC

Patch(es) available on kernel-2.6.32-131.0.9.el6

Comment 26 errata-xmlrpc 2011-05-23 20:39:33 UTC

An advisory has been issued which should help with the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html