Bug 475652 - kdump panic introduced by hpet fix on systems without HPET
Summary: kdump panic introduced by hpet fix on systems without HPET
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.3
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Don Zickus
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2008-12-09 22:02 UTC by Doug Chapman
Modified: 2009-01-20 19:36 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-01-20 19:36:28 UTC
Target Upstream Version:
Embargoed:


Attachments
potential fix (1009 bytes, patch)
2008-12-09 22:05 UTC, Doug Chapman


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHSA-2009:0225 | 0 | normal | SHIPPED_LIVE | Important: Red Hat Enterprise Linux 5.3 kernel security and bug fix update | 2009-01-20 16:06:24 UTC

Description Doug Chapman 2008-12-09 22:02:23 UTC
Description of problem:
This is a regression introduced by BZ 473038.

That patch disables the HPET during machine_crash_shutdown(). However, the HP DL585 platform does not have an HPET, so the disable path causes a panic:

SysRq : Trigger a crashdump
Unable to handle kernel paging request at ffffffffff5fe010 RIP: 
 [<ffffffff80079db4>] machine_crash_shutdown+0xe0/0x104
PGD 203067 PUD 10e048067 PMD 10e049067 PTE 0
Oops: 0000 [1] SMP 
last sysfs file: /devices/pci0000:00/0000:00:07.0/0000:02:06.1/irq
CPU 4 
Modules linked in: ipv6 xfrm_nalgo crypto_api autofs4 hidp rfcomm l2cap bluetooth sunrpc dm_multipath scsi_dh video backlight sbs i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport tg3 i2c_amd756 ide_cd libphy i2c_core cdrom hpilo amd_rng shpchp k8temp hwmon k8_edac serio_raw edac_mc pcspkr floppy dm_snapshot dm_zero dm_mirror dm_log dm_mod cciss sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 4089, comm: bash Not tainted 2.6.18-126.el5 #1
RIP: 0010:[<ffffffff80079db4>]  [<ffffffff80079db4>] machine_crash_shutdown+0xe0/0x104
RSP: 0018:ffff8103ef43bdd8  EFLAGS: 00010092
RAX: 0000000000000012 RBX: 00000000000003e7 RCX: 0000000000803000
RDX: 0000000000000700 RSI: 0000000000000012 RDI: 0000000000000001
RBP: ffff8103ef43bdf8 R08: 0000000000000010 R09: 0000000001000000
R10: ffff8103ef43bd38 R11: 0000000000000000 R12: 0000000000000063
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000006
FS:  00002b9a0116ce10(0000) GS:ffff81070e1553c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffffff5fe010 CR3: 0000000afe02d000 CR4: 00000000000006e0
Process bash (pid: 4089, threadinfo ffff8103ef43a000, task ffff8103fd2640c0)
Stack:  0000000000000000 ffffffff8031b420 0000000000000000 ffffffff800aaa11
 0000000000000006 0000000000000000 0000000000000000 0000000000000063
 0000000000000000 ffffffff8031b420 0000000000000080 ffff8103ef43bb68
Call Trace:
 [<ffffffff800aaa11>] crash_kexec+0xcc/0xe8
 [<ffffffff800aa9fc>] crash_kexec+0xb7/0xe8
 [<ffffffff801a4184>] sysrq_handle_crashdump+0xc/0x34
 [<ffffffff801a3f7c>] __handle_sysrq+0x90/0x121
 [<ffffffff801031f1>] write_sysrq_trigger+0x2a/0x32
 [<ffffffff8001659e>] vfs_write+0xce/0x174
 [<ffffffff80016e6b>] sys_write+0x45/0x6e
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0


Code: 8b 04 25 10 e0 5f ff 83 e0 fc 89 04 25 10 e0 5f ff 65 8b 34 
RIP  [<ffffffff80079db4>] machine_crash_shutdown+0xe0/0x104
 RSP <ffff8103ef43bdd8>
CR2: ffffffffff5fe010
 <0>Kernel panic - not syncing: Fatal exception

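For readers decoding the trace: the Code: bytes start with 8b 04 25 10 e0 5f ff, 83 e0 fc, 89 04 25 10 e0 5f ff, which decode as a 32-bit load from ffffffffff5fe010, an AND with ~3, and a store back to the same address. The load address matches CR2, and offset 0x10 is the HPET general configuration register, whose low two bits are the enable and legacy-routing bits. That is consistent with an unconditional HPET disable touching a mapping that does not exist on this machine. Below is a minimal user-space C sketch of a guarded version of that read-modify-write. It is not the attached patch and not kernel code; the names hpet_regs, hpet_present and hpet_disable_guarded are hypothetical, and the NULL check only stands in for whatever HPET-presence test the kernel would actually use.

```c
#include <stddef.h>
#include <stdint.h>

/* Register offset and bit values as defined by the HPET specification. */
#define HPET_CFG        0x010u  /* general configuration register */
#define HPET_CFG_ENABLE 0x001u  /* main counter enable */
#define HPET_CFG_LEGACY 0x002u  /* legacy replacement routing */

/* Hypothetical stand-in for the HPET register mapping.
 * NULL models a system without an HPET (assumption for this sketch). */
volatile uint32_t *hpet_regs = NULL;

/* Hypothetical presence check. */
int hpet_present(void)
{
    return hpet_regs != NULL;
}

/* Clear the enable and legacy bits only when an HPET is mapped.
 * Returns 1 if the register was written, 0 if the HPET was skipped. */
int hpet_disable_guarded(void)
{
    uint32_t cfg;

    if (!hpet_present())
        return 0;               /* no HPET: never touch the registers */

    cfg = hpet_regs[HPET_CFG / 4];                  /* load   */
    cfg &= ~(HPET_CFG_ENABLE | HPET_CFG_LEGACY);    /* and ~3 */
    hpet_regs[HPET_CFG / 4] = cfg;                  /* store  */
    return 1;
}
```

With hpet_regs left as NULL the function returns 0 without dereferencing anything, which is the behaviour a crash-shutdown path needs on a machine like the DL585. Pointing hpet_regs at a buffer of at least five 32-bit words lets the masking be exercised in user space.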

Version-Release number of selected component (if applicable):
kernel-2.6.18-126.el5



Comment 1 Doug Chapman 2008-12-09 22:05:07 UTC
Created attachment 326408
potential fix

Untested fix. I am building both x86_64 and i686 kernels with this patch now.

Comment 2 Neil Horman 2008-12-09 23:19:15 UTC
Setting exception flags so this can hopefully make snap7.  Doug, you'll probably want to bring this up to Linda or someone in PM to get the blocker flag set on this.  Looking at the patch, I think it's reasonable.  Thanks for looking at this!

Comment 3 Linda Wang 2008-12-10 17:17:32 UTC
Since the patch for bug 473038 caused this regression, we will address it
by backing that patch out, and we will use this bug to track the
backout.

Comment 5 Don Zickus 2008-12-16 19:15:46 UTC
in kernel-2.6.18-127.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 7 Doug Chapman 2008-12-16 20:44:41 UTC
I have verified that the problem is resolved with kernel-2.6.18-127.el5 on hp-dl585-01 where I first uncovered the issue.

Comment 10 errata-xmlrpc 2009-01-20 19:36:28 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html

