| Summary: | [RHEL6.1] PANIC booting kexec kernel: "Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0" | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | PaulB <pbunyan> | ||||
| Component: | kernel | Assignee: | Don Zickus <dzickus> | ||||
| Status: | CLOSED DUPLICATE | QA Contact: | Chao Ye <cye> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | high | ||||||
| Version: | 6.1 | CC: | amwang, arozansk, cye, jburke, mike.miller, pbunyan, phan, prarit, qcai, thenzl | ||||
| Target Milestone: | rc | Keywords: | Regression | ||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2011-10-28 16:18:47 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Attachments: |
Description
PaulB
2011-06-28 18:06:25 UTC
The hard lockup happened in the idle thread; the kernel seems to be stuck in:
static void mwait_idle(void)
{
	if (!need_resched()) {
		trace_power_start(POWER_CSTATE, 1, smp_processor_id());
		if (cpu_has(&current_cpu_data, X86_FEATURE_CLFLUSH_MONITOR))
			clflush((void *)&current_thread_info()->flags);
		__monitor((void *)&current_thread_info()->flags, 0, 0);
		smp_mb();
		if (!need_resched())
			__sti_mwait(0, 0);
		else
			local_irq_enable();
	} else
		local_irq_enable();
}
Prarit, any ideas?
I was poking at this box recently. I can reproduce the 6.1 hang without too much effort. However, after updating to the latest 6.2 tools/kernel, kdump recovered from the hang (it took a couple of minutes, but it recovered).
Kdump still failed, though: it could not mount the root filesystem because the cciss driver timed out waiting for the controller to become ready after the reset.
Snippet below:
Loading i6300esb.ko module
i6300ESB timer: Intel 6300ESB WatchDog Timer Driver v0.04
i6300ESB timer: initialized (0xffffc900000ea000). heartbeat=30 sec
(nowayout=0)
Loading shpchp.ko module
shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
Loading edac_core.ko module
EDAC MC: Ver: 2.1.0 Oct 25 2011
Loading mbcache.ko module
Loading jbd2.ko module
Loading cdrom.ko module
Loading hpsa.ko module
HP HPSA Driver (v 2.0.2-3)
hpsa 0000:02:01.0: unrecognized board ID: 0x40910e11, ignoring.
hpsa 0000:02:01.0: Not resetting device.
Loading cciss.ko module
HP CISS Driver (v 3.6.28-RH1)
cciss 0000:02:01.0: using PCI PM to reset controller
cciss 0000:02:01.0: Refused to change power state, currently in D3
cciss 0000:02:01.0: enabling device (0000 -> 0003)
cciss 0000:02:01.0: PCI INT A -> GSI 24 (level, low) -> IRQ 24
cciss 0000:02:01.0: Waiting for board to reset.
cciss 0000:02:01.0: board not ready, timed out.
cciss 0000:02:01.0: failed waiting for board to become ready after hard
reset
Loading pata_acpi.ko module
pata_acpi 0000:00:1f.1: PCI INT A -> GSI 18 (level, low) -> IRQ 18
pata_acpi 0000:00:1f.1: PCI INT A disabled
Loading ata_generic.ko module
Loading ata_piix.ko module
ata_piix 0000:00:1f.1: PCI INT A -> GSI 18 (level, low) -> IRQ 18
scsi0 : ata_piix
scsi1 : ata_piix
ata1: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0x500 irq 14
ata2: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0x508 irq 15
ata1.00: ATAPI: HL-DT-STCD-RW/DVD DRIVE GCC-4244N, 2.00, max UDMA/33
ata1.00: configured for UDMA/33
scsi 0:0:0:0: CD-ROM HL-DT-ST RW/DVD GCC-4244N 2.00 PQ: 0 ANSI: 5
scsi 0:0:0:0: Attached scsi generic sg0 type 5
Loading cpufreq_ondemand.ko module
Loading acpi-cpufreq.ko module
Loading iTCO_wdt.ko module
iTCO_wdt: Intel TCO WatchDog Timer Driver v1.05
iTCO_wdt: unable to reset NO_REBOOT flag, device disabled by hardware/BIOS
Loading e752x_edac.ko module
Contact your BIOS vendor to see if the E752x error registers can be safely
un-hidden
Loading ext4.ko module
Loading sr_mod.ko module
sr0: scsi3-mmc drive: 24x/24x writer cd/rw xa/form2 cdda tray
Uniform CD-ROM driver Revision: 3.20
Waiting for required block device discovery
Creating Block Devices
Creating block device loop0
Creating block device loop1
Creating block device loop2
Creating block device loop3
Creating block device loop4
Creating block device loop5
Creating block device loop6
Creating block device loop7
Creating block device ram0
Creating block device ram1
Creating block device ram10
Creating block device ram11
Creating block device ram12
Creating block device ram13
Creating block device ram14
Creating block device ram15
Creating block device ram2
Creating block device ram3
Creating block device ram4
Creating block device ram5
Creating block device ram6
Creating block device ram7
Creating block device ram8
Creating block device ram9
Creating block device sr0
Making device-mapper control node
Scanning logical volumes
Reading all physical volumes. This may take a while...
No volume groups found
No volume groups found
Activating logical volumes
No volume groups found
No volume groups found
Free memory/Total memory (free %): 206176 / 243020 ( 84.8391 )
Saving to the local filesystem /dev/mapper/vg_hpdl360g401-lv_root
e2fsck 1.41.12 (17-May-2010)
fsck.ext4: No such file or directory while trying to open
/dev/mapper/vg_hpdl360g401-lv_root
The superblock could not be read or does not describe a correct ext2
filesystem. If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 8193 <device>
mount: mounting /dev/mapper/vg_hpdl360g401-lv_root on /mnt failed: No such
file or directory
Attempting to enter user-space to capture vmcore
Resetting kernel time value to BIOS time and timezone value to UTC.
Free memory/Total memory (free %): 206176 / 243020 ( 84.8391 )
Creating root device.
Free memory/Total memory (free %): 206236 / 243020 ( 84.8638 )
Checking root filesystem.
fsck (busybox 1.15.1, 2010-11-30 08:10:31 EST)
e2fsck 1.41.12 (17-May-2010)
fsck.ext4: No such file or directory while trying to open
/dev/mapper/vg_hpdl360g401-lv_root
The superblock could not be read or does not describe a correct ext2
filesystem. If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 8193 <device>
Mounting root filesystem: mount -t ext4 /dev/mapper/vg_hpdl360g401-lv_root
/sysroot
unable to mount rootfs. Dropping to shell
/ #
/
I should probably re-assign this to someone like Tomas Henzl who looks after the cciss driver.
But I think all the strange panics and hangs on my end have disappeared through various fixes in the kernel.
Cheers,
Don
Which Smart Array is this? I'm guessing from the output in comment 9 it's a P600. If so, I just recently submitted a minor change to delay for 1/2 second in the reset code. That seems to resolve this issue.
(In reply to comment #10)
> Which Smart Array is this? I'm guessing from the output in comment 9 it's a
> P600. If so, I just recently submitted a minor change to delay for 1/2 second
> in the reset code. That seems to resolve this issue.
Hi Mike,
Where can I find that patch to try it?
Cheers,
Don
Created attachment 530676
Patch to add 500ms delay in PCI PM reset code
Don,
I just attached the patch to the BZ. This one is actually for upstream (can't find the ones I did for RH, arghhhhh). It should apply with an offset. But as you can see it's very simple.
-- mikem
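For anyone reading this without the attachment handy, the idea behind the fix can be sketched roughly as below. This is a userspace simulation, not the attached patch: `struct fake_pci_dev`, `settle_ms()`, `set_power_state()` and `pm_reset_with_delay()` are hypothetical names made up for illustration, and the placement of the delay after each power state write is an assumption. The attachment remains the authoritative version. The only values taken from the PCI power management register layout are the two-bit state field and the D0/D3hot encodings.

```c
#define _POSIX_C_SOURCE 199309L  /* for nanosleep() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* The low two bits of the PCI PM control/status register select the state. */
#define PM_CTRL_STATE_MASK 0x0003u
#define PM_STATE_D0        0x0000u
#define PM_STATE_D3HOT     0x0003u

/* Hypothetical stand-in for a PCI device; this is not a kernel structure. */
struct fake_pci_dev {
    uint16_t pmcsr;     /* simulated PM control/status register */
    unsigned writes;    /* number of simulated register writes */
    long     slept_ms;  /* total settle time requested so far */
};

/* Userspace substitute for the kernel's msleep(). */
static void settle_ms(struct fake_pci_dev *dev, long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };

    nanosleep(&ts, NULL);
    dev->slept_ms += ms;
}

/* Rewrite only the power state bits, leaving the rest of the register alone. */
static void set_power_state(struct fake_pci_dev *dev, uint16_t state)
{
    dev->pmcsr = (uint16_t)((dev->pmcsr & ~PM_CTRL_STATE_MASK) | state);
    dev->writes++;
}

/*
 * Reset by cycling D3hot -> D0, pausing after each transition so a slow
 * board has time to settle before the driver starts polling for readiness.
 * Where exactly the real patch sleeps is an assumption in this sketch.
 */
void pm_reset_with_delay(struct fake_pci_dev *dev, long delay_ms)
{
    assert(dev != NULL);

    set_power_state(dev, PM_STATE_D3HOT);
    settle_ms(dev, delay_ms);

    set_power_state(dev, PM_STATE_D0);
    settle_ms(dev, delay_ms);
}
```

Calling `pm_reset_with_delay(&dev, 500)` on a `struct fake_pci_dev` would correspond to the half-second delay described above. In the driver itself the sleep would presumably be an `msleep(500)` rather than `nanosleep()`.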
Thanks Mike. That fix worked for me.
Cheers,
Don
(In reply to comment #13)
> Thanks Mike. That fix worked for me.
>
> Cheers,
> Don
Excellent. Ship it! :)
*** This bug has been marked as a duplicate of bug 746317 ***