Bug 525467

Summary: Xen panic in msi_msg_read_remap_rte with acpi=off
Product: Red Hat Enterprise Linux 5
Reporter: John Ruemker <jruemker>
Component: kernel-xen
Assignee: Miroslav Rezanina <mrezanin>
Status: CLOSED ERRATA
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: medium
Docs Contact:
Priority: medium
Version: 5.4
CC: clalance, dzickus, kraxel, mnovacek, mrezanin, xen-maint
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-03-30 07:36:44 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 526946
Attachments:
Patch to check for null drhd before reference in msi_msg_write_remap_rte (flags: none)
Patch to check for null drhd before use.v2 (flags: none)

Description John Ruemker 2009-09-24 13:58:55 UTC
Description of problem: After upgrading my PowerEdge R300 to the 5.4 Xen kernel (2.6.18-164.el5), I now get a kernel panic on boot whenever "acpi=off noacpi" is specified on the kernel line in grub.conf:

(XEN) ----[ Xen-3.1.2-164.el5  x86_64  debug=n  Not tainted ]----
(XEN) CPU:    1
(XEN) RIP:    e008:[<ffff828c80127f1c>] msi_msg_read_remap_rte+0x2c/0x1b0
(XEN) RFLAGS: 0000000000010046   CONTEXT: hypervisor
(XEN) rax: 0000000000000000   rbx: 0000000000000001   rcx: 0000000000000002
(XEN) rdx: 0000000000000001   rsi: 0000000000000000   rdi: 0000000000000005
(XEN) rbp: ffff83007ebe8480   rsp: ffff83007ed6fcf8   r8:  0000000000000093
(XEN) r9:  0000000000000004   r10: 00000000000000f6   r11: 0000000000000000
(XEN) r12: ffff83007ed6fd78   r13: 0000000000000000   r14: 0000000000000000
(XEN) r15: 0000000000000005   cr0: 000000008005003b   cr4: 00000000000026b0
(XEN) cr3: 000000005d42e000   cr2: 0000000000000050
(XEN) ds: 0000   es: 0000   fs: 0063   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff83007ed6fcf8:
(XEN)    0000000000000000 0000000000000001 ffff83007ebe8480 0000000000000060
(XEN)    0000000000000000 ffff828c80153936 0000000000000001 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 ffff83007ed6fd78
(XEN)    0000000000000093 ffff828c80152a32 0000000000000100 0000000000000021
(XEN)    00000000fee0300c 0000000000004121 ffff83007ea08080 0000000000000000
(XEN)    ffff828c80323d80 0000000000000021 ffff83007ea08080 00000000000000f6
(XEN)    0000000000000001 ffff828c80139382 0000000000000001 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000293 ffff83007ebe8580
(XEN)    0000000000000001 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 ffffffffffffffff ffff88001ec29c18 ffff83007eafe380
(XEN)    00000000000000f6 ffff83007eaf8080 00000000000000f6 ffff828c8010733d
(XEN)    0000000000000008 ffff83007ed6fee4 ffff83007ed6fec0 ffff83007eaf6080
(XEN)    ffff83007ed6fed8 0000000000000060 0000000200000010 000000080000fec8
(XEN)    828c8018f720b948 c390ef66d1ffffff 0000000000000005 ffffffff00000000
(XEN)    0000000000000000 ffff828c801ea0ef ffffffffffffffff 0000000000000000
(XEN)    00000001000000f6 00a0fb001edd44f8 000000027ed6ff28 ffff83007eaf6080
(XEN)    ffffffff805e1a80 00000000000000f6 ffff88001eddadc0 ffffffff805e1abc
(XEN)    0000000000000000 ffff828c8018f107 0000000000000000 ffffffff805e1abc
(XEN)    ffff88001eddadc0 00000000000000f6 ffffffff805e1a80 00000000000000f6
(XEN)    0000000000000286 0000000000000004 000000000000f600 ffff88001ec29c18
(XEN) Xen call trace:
(XEN)    [<ffff828c80127f1c>] msi_msg_read_remap_rte+0x2c/0x1b0
(XEN)    [<ffff828c80153936>] set_msi_affinity+0x216/0x270
(XEN)    [<ffff828c80152a32>] write_msi_msg+0xc2/0x130
(XEN)    [<ffff828c80139382>] pirq_guest_bind+0x1c2/0x2f0
(XEN)    [<ffff828c8010733d>] do_event_channel_op+0x9fd/0x11d0
(XEN)    [<ffff828c8018f107>] syscall_enter+0x67/0x6c
(XEN)    
(XEN) Pagetable walk from 0000000000000050:
(XEN)  L4[0x000] = 000000005d1d4067 000000000001f1d4
(XEN)  L3[0x000] = 000000005d11c067 000000000001f11c
(XEN)  L2[0x000] = 0000000000000000 ffffffffffffffff 
(XEN) 
(XEN) ****************************************
(XEN) Panic on CPU 1:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: 0000000000000050
(XEN) ****************************************
(XEN) 
(XEN) Reboot in five seconds...

I am still able to boot the standard -164 kernel as well as kernel-xen-2.6.18-128, just not the kernel-xen-2.6.18-164. 

Version-Release number of selected component (if applicable): kernel-xen-2.6.18-164.el5.x86_64

How reproducible: Always

Steps to Reproduce: 
1. Set kernel options "noacpi acpi=off dom0_mem=512M com1=9600n8" in grub.conf:

title Red Hat Enterprise Linux Server (2.6.18-164.el5xen)
    root (hd0,0)
    kernel /xen.gz-2.6.18-164.el5 noacpi acpi=off dom0_mem=512M com1=9600n8
    module /vmlinuz-2.6.18-164.el5xen ro root=/dev/vg0/lv_root  console=tty0 console=ttyS0,9600n8
    module /initrd-2.6.18-164.el5xen.img

2. Boot system into above entry
 
Actual results: System panics before loading the kernel

Expected results: System boots successfully

Additional info: I disable ACPI because that is the recommended configuration for cluster nodes: leaving it enabled can cause a graceful shutdown when fencing the node through a system management card (DRAC, iLO, etc.). For now I've removed the options, but since many Cluster Suite users may use this same configuration, I wanted to report it.

Comment 4 John Ruemker 2009-10-14 15:37:31 UTC
Mirek, the patch you emailed me did indeed correct the problem. I'm attaching it here for reference. Let me know if you need anything else from me.

Thanks!
-John

Comment 5 John Ruemker 2009-10-14 15:39:53 UTC
Created attachment 364768 [details]
Patch to check for null drhd before reference in msi_msg_write_remap_rte

Comment 6 Miroslav Rezanina 2009-10-16 05:19:24 UTC
Created attachment 365017 [details]
Patch to check for null drhd before use.v2

This version of the patch checks drhd for null not only in msi_msg_read_remap_rte and msi_msg_write_remap_rte but also in reassign_device_ownership.
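
The attached patches are not reproduced in this report, so the following is only a minimal sketch of the general pattern they describe: test the looked-up unit for null before dereferencing it. Every name here (drhd_sketch, iommu_sketch, find_drhd_sketch, read_remap_entry_sketch) is hypothetical and does not come from the Xen source. The sketch also assumes that with acpi=off no remapping unit is discovered, so the lookup returns null.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the hypervisor's types; NOT the real Xen definitions. */
struct iommu_sketch {
    int id;
};

struct drhd_sketch {
    struct iommu_sketch *iommu;   /* remapping hardware unit serving a device */
};

/* Hypothetical lookup: returns NULL when no unit was discovered
 * (assumed here to be the situation when booting with acpi=off). */
static struct drhd_sketch *find_drhd_sketch(struct drhd_sketch *unit, int present)
{
    return present ? unit : NULL;
}

/* Reads a value through the unit. Returns 0 on success, or -1 when there is
 * no unit to read from, instead of faulting on a null dereference. */
static int read_remap_entry_sketch(struct drhd_sketch *drhd, int *out)
{
    assert(out != NULL);          /* caller must supply an output slot */

    if (drhd == NULL || drhd->iommu == NULL)
        return -1;                /* guard: nothing to remap, bail out early */

    *out = drhd->iommu->id;
    return 0;
}
```

A caller would pass the lookup result straight in, for example read_remap_entry_sketch(find_drhd_sketch(&unit, 0), &id), and treat a return of -1 as "no interrupt remapping available" rather than as a fatal error.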

Comment 7 Don Zickus 2009-10-21 19:13:29 UTC
The fix is included in kernel-2.6.18-170.el5.
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so. However, feel free
to provide a comment indicating that this fix has been verified.

Comment 9 michal novacek 2010-03-16 16:22:38 UTC
The system fails to boot with kernel-xen 2.6.18-164.11.1.el5.
The system boots correctly with kernel-xen 2.6.18-192.el5.

In both cases I modified the kernel line in grub and added the "noacpi acpi=off" parameters.

Comment 11 errata-xmlrpc 2010-03-30 07:36:44 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html