Description of problem:

After upgrading my PowerEdge R300 to the 5.4 Xen kernel (2.6.18-164.el5), I now get a hypervisor panic on boot whenever "acpi=off noacpi" is specified on the kernel line in grub.conf:

(XEN) ----[ Xen-3.1.2-164.el5 x86_64 debug=n Not tainted ]----
(XEN) CPU: 1
(XEN) RIP: e008:[<ffff828c80127f1c>] msi_msg_read_remap_rte+0x2c/0x1b0
(XEN) RFLAGS: 0000000000010046 CONTEXT: hypervisor
(XEN) rax: 0000000000000000 rbx: 0000000000000001 rcx: 0000000000000002
(XEN) rdx: 0000000000000001 rsi: 0000000000000000 rdi: 0000000000000005
(XEN) rbp: ffff83007ebe8480 rsp: ffff83007ed6fcf8 r8: 0000000000000093
(XEN) r9: 0000000000000004 r10: 00000000000000f6 r11: 0000000000000000
(XEN) r12: ffff83007ed6fd78 r13: 0000000000000000 r14: 0000000000000000
(XEN) r15: 0000000000000005 cr0: 000000008005003b cr4: 00000000000026b0
(XEN) cr3: 000000005d42e000 cr2: 0000000000000050
(XEN) ds: 0000 es: 0000 fs: 0063 gs: 0000 ss: e010 cs: e008
(XEN) Xen stack trace from rsp=ffff83007ed6fcf8:
(XEN)    0000000000000000 0000000000000001 ffff83007ebe8480 0000000000000060
(XEN)    0000000000000000 ffff828c80153936 0000000000000001 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 ffff83007ed6fd78
(XEN)    0000000000000093 ffff828c80152a32 0000000000000100 0000000000000021
(XEN)    00000000fee0300c 0000000000004121 ffff83007ea08080 0000000000000000
(XEN)    ffff828c80323d80 0000000000000021 ffff83007ea08080 00000000000000f6
(XEN)    0000000000000001 ffff828c80139382 0000000000000001 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000293 ffff83007ebe8580
(XEN)    0000000000000001 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 ffffffffffffffff ffff88001ec29c18 ffff83007eafe380
(XEN)    00000000000000f6 ffff83007eaf8080 00000000000000f6 ffff828c8010733d
(XEN)    0000000000000008 ffff83007ed6fee4 ffff83007ed6fec0 ffff83007eaf6080
(XEN)    ffff83007ed6fed8 0000000000000060 0000000200000010 000000080000fec8
(XEN)    828c8018f720b948 c390ef66d1ffffff 0000000000000005 ffffffff00000000
(XEN)    0000000000000000 ffff828c801ea0ef ffffffffffffffff 0000000000000000
(XEN)    00000001000000f6 00a0fb001edd44f8 000000027ed6ff28 ffff83007eaf6080
(XEN)    ffffffff805e1a80 00000000000000f6 ffff88001eddadc0 ffffffff805e1abc
(XEN)    0000000000000000 ffff828c8018f107 0000000000000000 ffffffff805e1abc
(XEN)    ffff88001eddadc0 00000000000000f6 ffffffff805e1a80 00000000000000f6
(XEN)    0000000000000286 0000000000000004 000000000000f600 ffff88001ec29c18
(XEN) Xen call trace:
(XEN)    [<ffff828c80127f1c>] msi_msg_read_remap_rte+0x2c/0x1b0
(XEN)    [<ffff828c80153936>] set_msi_affinity+0x216/0x270
(XEN)    [<ffff828c80152a32>] write_msi_msg+0xc2/0x130
(XEN)    [<ffff828c80139382>] pirq_guest_bind+0x1c2/0x2f0
(XEN)    [<ffff828c8010733d>] do_event_channel_op+0x9fd/0x11d0
(XEN)    [<ffff828c8018f107>] syscall_enter+0x67/0x6c
(XEN)
(XEN) Pagetable walk from 0000000000000050:
(XEN)  L4[0x000] = 000000005d1d4067 000000000001f1d4
(XEN)  L3[0x000] = 000000005d11c067 000000000001f11c
(XEN)  L2[0x000] = 0000000000000000 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 1:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: 0000000000000050
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...

I can still boot the standard -164 kernel as well as kernel-xen-2.6.18-128; only kernel-xen-2.6.18-164 fails.

Version-Release number of selected component (if applicable):
kernel-xen-2.6.18-164.el5.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Set kernel options "noacpi acpi=off dom0_mem=512M com1=9600n8" in grub.conf:

title Red Hat Enterprise Linux Server (2.6.18-164.el5xen)
        root (hd0,0)
        kernel /xen.gz-2.6.18-164.el5 noacpi acpi=off dom0_mem=512M com1=9600n8
        module /vmlinuz-2.6.18-164.el5xen ro root=/dev/vg0/lv_root console=tty0 console=ttyS0,9600n8
        module /initrd-2.6.18-164.el5xen.img

2. Boot the system into the above entry.

Actual results:
The system panics before loading the kernel.

Expected results:
The system boots successfully.

Additional info:
I'm disabling ACPI because that is the recommended configuration for cluster nodes: leaving it enabled can cause a graceful shutdown when attempting to fence the node using a system management card (DRAC, iLO, etc.). For now I've removed the options, but since many cluster suite users may use this same configuration I wanted to report it.
Mirek, the patch you emailed me did indeed correct the problem. I'm attaching it here for reference. Let me know if you need anything else from me.

Thanks!
-John
Created attachment 364768
Patch to check for null drhd before reference in msi_msg_write_remap_rte
Created attachment 365017
Patch to check for null drhd before use.v2

The patch checks drhd not only in msi_msg_read_remap_rte and msi_msg_write_remap_rte but also in reassign_device_ownership.
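For readers without access to the attachments, the kind of guard described here can be sketched as follows. This is a minimal, self-contained illustration and not the attached patch: the types `struct iommu`, `struct drhd_unit`, and `struct msi_msg`, and the functions `find_drhd` and `msg_read_remap`, are simplified hypothetical stand-ins, and the assumption is that the lookup yields no remapping unit when ACPI is disabled.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the hypervisor's types. */
struct iommu { int id; };
struct drhd_unit { struct iommu *iommu; };
struct msi_msg { unsigned int address_lo; unsigned int data; };

/*
 * Hypothetical lookup: returns NULL when no remapping unit covers the
 * device, which is assumed here to be the case when ACPI is disabled.
 */
struct drhd_unit *find_drhd(struct drhd_unit *table, size_t count,
                            size_t device_index)
{
    if (table == NULL || device_index >= count)
        return NULL;
    return &table[device_index];
}

/*
 * Returns 1 if the message was rewritten, 0 if it was left untouched
 * because no remapping hardware is available for the device.
 */
int msg_read_remap(struct drhd_unit *table, size_t count,
                   size_t device_index, struct msi_msg *msg)
{
    struct drhd_unit *drhd = find_drhd(table, count, device_index);

    /* The guard: return before dereferencing a missing unit. */
    if (drhd == NULL || drhd->iommu == NULL)
        return 0;

    /* Placeholder for the real remapping work. */
    msg->data = (unsigned int)drhd->iommu->id;
    return 1;
}
```

With an empty table, `msg_read_remap` returns 0 and leaves the message as it was, instead of faulting on a null pointer.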
in kernel-2.6.18-170.el5

You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team has sent specific instructions indicating when to do so. However, feel free to provide a comment indicating that this fix has been verified.
The system fails to boot with kernel-xen 2.6.18-164.11.1.el5.
The system booted correctly with kernel-xen 2.6.18-192.el5.

In both cases I modified the kernel line in grub and added the "noacpi acpi=off" parameters.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html