Bug 541209 - kernel-xen hangs during boot after enabling IOMMU on AMD Magny-Cours system
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.4
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Assignee: Kiran Thirumalai
QA Contact: Red Hat Kernel QE team
Blocks: 514490
Reported: 2009-11-25 08:58 UTC by wmg
Modified: 2013-08-06 01:18 UTC
CC List: 9 users

Doc Type: Bug Fix
Last Closed: 2011-04-28 10:09:33 UTC


Attachments
Xen and dom0 log about successful -238 boot, with iommu enabled (58.69 KB, text/plain)
2011-04-28 08:49 UTC, Laszlo Ersek
log of 10 successful reboots in a row (8.86 KB, application/x-xz)
2011-04-28 10:02 UTC, Laszlo Ersek

Description wmg 2009-11-25 08:58:14 UTC
Description of problem:

kernel-xen hangs during boot after the IOMMU is enabled on an AMD Magny-Cours system.
amd-drachma-01.lab.bos.redhat.com:

input: USB HID v1.01 Mouse [Peppercon AG Multidevice] on usb-0000:00:13.2-2.2
ioc0: LSISAS1064E B3: Capabilities={Initiator}
mptbase: ioc0: Initiating recovery
BUG: soft lockup - CPU#13 stuck for 16s! [swapper:0]
CPU 13:
Modules linked in: mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 0, comm: swapper Not tainted 2.6.18-164.8.1.el5xen #1
RIP: e030:[<ffffffff802063aa>]  [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000
RSP: e02b:ffff88003326df08  EFLAGS: 00000246
RAX: 0000000000000000 RBX: 000000000000000d RCX: ffffffff802063aa
RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000001
RBP: 0000000000000000 R08: 0000000000000036 R09: 00000000fffef762
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffffffff805ca680(0000) knlGS:0000000000000000
CS:  e033 DS: 002b ES: 002b

Call Trace:
 [<ffffffff8026f4d1>] raw_safe_halt+0x84/0xa8
 [<ffffffff8026ca4c>] xen_idle+0x38/0x4a
 [<ffffffff8024b038>] cpu_idle+0x97/0xba

mptbase: ioc0: Initiating recovery
BUG: soft lockup - CPU#13 stuck for 16s! [swapper:0]
CPU 13:
Modules linked in: mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 0, comm: swapper Not tainted 2.6.18-164.8.1.el5xen #1
RIP: e030:[<ffffffff802063aa>]  [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000
RSP: e02b:ffff88003326df08  EFLAGS: 00000246
RAX: 0000000000000000 RBX: 000000000000000d RCX: ffffffff802063aa
RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000001
RBP: 0000000000000000 R08: 000000000000000f R09: 00000000ffff105a
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffffffff805ca680(0000) knlGS:0000000000000000
CS:  e033 DS: 002b ES: 002b

Call Trace:
 [<ffffffff8026f4d1>] raw_safe_halt+0x84/0xa8
 [<ffffffff8026ca4c>] xen_idle+0x38/0x4a
 [<ffffffff8024b038>] cpu_idle+0x97/0xba

mptbase: ioc0: Initiating recovery
BUG: soft lockup - CPU#13 stuck for 16s! [swapper:0]

CPU 13:
Modules linked in: mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 0, comm: swapper Not tainted 2.6.18-164.8.1.el5xen #1
    [---------- snip -----------]
unmounting old /proc
unmounting old /sys
switchroot: mount failed: No such file or directory
Kernel panic - not syncing: Attempted to kill init!
(XEN) Domain 0 crashed: rebooting machine in 5 seconds.

Version-Release number of selected component (if applicable):
2.6.18-164.8.1.el5xen (RHEL 5.4)

How reproducible:


Steps to Reproduce:
1. On an AMD Magny-Cours system, enable the IOMMU in the BIOS.
2. Pass the iommu option to xen.gz in the GRUB configuration.
3. Boot that Xen kernel.
  
Actual results:

A soft lockup occurs while mptbase is initializing, and a kernel panic follows later.


Expected results:

The kernel boots up correctly.


Additional info:

Comment 3 Bill Burns 2010-06-30 18:47:37 UTC
Bhavna, ever look at this?

Comment 6 Bill Burns 2010-12-06 16:39:38 UTC
Re-assigned.

Comment 7 Laszlo Ersek 2011-04-28 08:49:00 UTC
Created attachment 495454
Xen and dom0 log about successful -238 boot, with iommu enabled

As a standard assumption for any engineering sample, I believe the BIOS is broken and/or out of date:

    powernow-k8: invalid pstate 1 - bad value 1.
    powernow-k8: Please report to BIOS manufacturer

The IOMMUs were enabled however:

    (XEN) AMD-Vi: IOMMU 0 Enabled.
    (XEN) AMD-Vi: IOMMU 1 Enabled.
    (XEN) I/O virtualisation enabled

This provides the "expected results" in comment 0 (kernel boots up correctly). I'm starting a reboot loop with this RHEL-5.6 installation.
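A saved hypervisor log like the one attached in this comment can be checked for the same messages mechanically. The sketch below is a hypothetical helper, not part of any Red Hat or Xen tool: `iommu_status` is an invented name, it assumes Python 3 on whatever machine holds the saved log (RHEL 5 itself ships an older Python), and it matches only the exact message wording quoted above, which may differ on other hypervisor builds.

```python
import re
from typing import List, Tuple

# Hypothetical helper: the patterns copy the hypervisor messages quoted in
# comment 7 and may not match the wording of other Xen versions.
IOMMU_ENABLED_RE = re.compile(r"\(XEN\) AMD-Vi: IOMMU (\d+) Enabled\.")
IO_VIRT_MSG = "(XEN) I/O virtualisation enabled"


def iommu_status(log_text: str) -> Tuple[List[int], bool]:
    """Scan a saved Xen/dom0 console log.

    Returns the indices of AMD-Vi IOMMUs reported as enabled (in log order)
    and whether the hypervisor reported I/O virtualisation as enabled.
    """
    enabled = [int(m.group(1)) for m in IOMMU_ENABLED_RE.finditer(log_text)]
    return enabled, IO_VIRT_MSG in log_text


if __name__ == "__main__":
    # Example input copied from the log excerpt in comment 7.
    sample = (
        "(XEN) AMD-Vi: IOMMU 0 Enabled.\n"
        "(XEN) AMD-Vi: IOMMU 1 Enabled.\n"
        "(XEN) I/O virtualisation enabled\n"
    )
    print(iommu_status(sample))  # ([0, 1], True)
```

A log spanning several boots yields one entry per IOMMU per boot, so split such a log per boot first if per-boot results are needed.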

Comment 8 Laszlo Ersek 2011-04-28 10:02:12 UTC
Created attachment 495473
log of 10 successful reboots in a row

The usual reboot time was 7 minutes 10 seconds, with very low variance.

Comment 9 Laszlo Ersek 2011-04-28 10:09:33 UTC
The bug was reported for RHEL-5.4 (2.6.18-164.8.1.el5xen). The successful reboots above were done under RHEL-5.6 (-238). After RHEL-5.4, at least the following Magny-Cours-related patches were committed:

build 2.6.18-174.el5 kernel:

BZ      commit   one-liner
--      ------   ---------
513684  310690c  [x86] fix up threshold_bank4 support on AMD Magny-cours
513684  8c0ce9b  [x86] fix up L3 cache information for AMD Magny-cours

build 2.6.18-175.el5 kernel:

BZ      commit   one-liner
--      ------   ---------
522215  58d03d1  [x86] fix boot crash with < 8-core AMD Magny-cours system
513685  3cc6b97  [x86] support amd magny-cours power-aware scheduler fix
526770  0ed6049  [x86] amd: fix hot plug cpu issue on 32-bit magny-cours

build 2.6.18-175.el5 hypervisor:

BZ      commit   one-liner
--      ------   ---------
526051  02b0c79  [xen] fix numa on magny-cours systems

