Description of problem:
kernel-xen hangs during boot after enabling the IOMMU on the AMD Magny-Cours system amd-drachma-01.lab.bos.redhat.com:

input: USB HID v1.01 Mouse [Peppercon AG Multidevice] on usb-0000:00:13.2-2.2
ioc0: LSISAS1064E B3: Capabilities={Initiator}
mptbase: ioc0: Initiating recovery
BUG: soft lockup - CPU#13 stuck for 16s! [swapper:0]
CPU 13:
Modules linked in: mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 0, comm: swapper Not tainted 2.6.18-164.8.1.el5xen #1
RIP: e030:[<ffffffff802063aa>] [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000
RSP: e02b:ffff88003326df08 EFLAGS: 00000246
RAX: 0000000000000000 RBX: 000000000000000d RCX: ffffffff802063aa
RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000001
RBP: 0000000000000000 R08: 0000000000000036 R09: 00000000fffef762
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffffffff805ca680(0000) knlGS:0000000000000000
CS: e033 DS: 002b ES: 002b

Call Trace:
 [<ffffffff8026f4d1>] raw_safe_halt+0x84/0xa8
 [<ffffffff8026ca4c>] xen_idle+0x38/0x4a
 [<ffffffff8024b038>] cpu_idle+0x97/0xba

mptbase: ioc0: Initiating recovery
BUG: soft lockup - CPU#13 stuck for 16s! [swapper:0]
CPU 13:
Modules linked in: mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 0, comm: swapper Not tainted 2.6.18-164.8.1.el5xen #1
RIP: e030:[<ffffffff802063aa>] [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000
RSP: e02b:ffff88003326df08 EFLAGS: 00000246
RAX: 0000000000000000 RBX: 000000000000000d RCX: ffffffff802063aa
RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000001
RBP: 0000000000000000 R08: 000000000000000f R09: 00000000ffff105a
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffffffff805ca680(0000) knlGS:0000000000000000
CS: e033 DS: 002b ES: 002b

Call Trace:
 [<ffffffff8026f4d1>] raw_safe_halt+0x84/0xa8
 [<ffffffff8026ca4c>] xen_idle+0x38/0x4a
 [<ffffffff8024b038>] cpu_idle+0x97/0xba

mptbase: ioc0: Initiating recovery
BUG: soft lockup - CPU#13 stuck for 16s! [swapper:0]
CPU 13:
Modules linked in: mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 0, comm: swapper Not tainted 2.6.18-164.8.1.el5xen #1
[---------- snip -----------]
unmounting old /proc
unmounting old /sys
switchroot: mount failed: No such file or directory
Kernel panic - not syncing: Attempted to kill init!
(XEN) Domain 0 crashed: rebooting machine in 5 seconds.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. On an AMD Magny-Cours system, enable the IOMMU in the BIOS.
2. Pass iommu to xen.gz in the GRUB configuration.
3. Boot that Xen kernel.

Actual results:
A soft lockup occurs while mptbase initializes, followed later by a kernel panic.

Expected results:
The kernel boots up correctly.

Additional info:
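When comparing console captures from several boot attempts, it can help to count the soft-lockup banners per CPU and check whether the boot ended in the dom0 panic. The following is a minimal sketch, assuming the serial console output is saved as plain text; `count_soft_lockups` and `panicked` are hypothetical helper names, not part of any existing tool, and the matched strings are taken from the log above.

```python
import re
from collections import Counter

# Matches the soft-lockup banner seen in the log above, e.g.
# "BUG: soft lockup - CPU#13 stuck for 16s! [swapper:0]"
SOFT_LOCKUP = re.compile(r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s!")


def count_soft_lockups(console_text: str) -> Counter:
    """Return a Counter mapping CPU number -> number of soft-lockup reports."""
    counts = Counter()
    for match in SOFT_LOCKUP.finditer(console_text):
        counts[int(match.group(1))] += 1
    return counts


def panicked(console_text: str) -> bool:
    """True if the log shows the dom0 panic or Xen rebooting after a dom0 crash."""
    return ("Kernel panic - not syncing" in console_text
            or "Domain 0 crashed" in console_text)


if __name__ == "__main__":
    sample = (
        "mptbase: ioc0: Initiating recovery\n"
        "BUG: soft lockup - CPU#13 stuck for 16s! [swapper:0]\n"
        "mptbase: ioc0: Initiating recovery\n"
        "BUG: soft lockup - CPU#13 stuck for 16s! [swapper:0]\n"
        "Kernel panic - not syncing: Attempted to kill init!\n"
    )
    print(dict(count_soft_lockups(sample)))  # {13: 2}
    print(panicked(sample))                  # True
```

Matching on the literal banner text keeps the check independent of register dumps, which differ between lockups (for example R08/R09 above).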
Bhavna, ever look at this?
Re-assigned.
Created attachment 495454
Xen and dom0 log of a successful -238 boot with the IOMMU enabled

As a standard assumption for any engineering sample, I believe the BIOS is broken and/or out of date:

powernow-k8: invalid pstate 1 - bad value 1.
powernow-k8: Please report to BIOS manufacturer

The IOMMUs were enabled, however:

(XEN) AMD-Vi: IOMMU 0 Enabled.
(XEN) AMD-Vi: IOMMU 1 Enabled.
(XEN) I/O virtualisation enabled

This provides the "expected results" in comment 0 (kernel boots up correctly). I'm starting a reboot loop with this RHEL-5.6 installation.
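For a reboot loop, each captured hypervisor log could be checked for the AMD-Vi lines quoted above, to confirm that every boot really ran with the IOMMUs on. This is a minimal sketch, assuming the Xen boot log is available as plain text; `enabled_iommus` and `io_virtualisation_on` are hypothetical names, and only the exact message strings shown in this comment are matched.

```python
import re

# Hypervisor messages quoted in this comment, e.g.
# "(XEN) AMD-Vi: IOMMU 0 Enabled." and "(XEN) I/O virtualisation enabled"
IOMMU_ENABLED = re.compile(r"\(XEN\) AMD-Vi: IOMMU (\d+) Enabled\.")


def enabled_iommus(xen_log: str) -> list:
    """Return the sorted indices of the AMD IOMMUs that Xen reports as enabled."""
    return sorted(int(m.group(1)) for m in IOMMU_ENABLED.finditer(xen_log))


def io_virtualisation_on(xen_log: str) -> bool:
    """True if Xen reports that I/O virtualisation is enabled."""
    return "(XEN) I/O virtualisation enabled" in xen_log


if __name__ == "__main__":
    log = (
        "(XEN) AMD-Vi: IOMMU 0 Enabled.\n"
        "(XEN) AMD-Vi: IOMMU 1 Enabled.\n"
        "(XEN) I/O virtualisation enabled\n"
    )
    print(enabled_iommus(log))        # [0, 1]
    print(io_virtualisation_on(log))  # True
```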
Created attachment 495473
log of 10 successful reboots in a row

The usual reboot time was 7 minutes 10 seconds, with very low variance.
The bug was reported against RHEL-5.4 (2.6.18-164.8.1.el5xen). The successful reboots above were done under RHEL-5.6 (-238). Since RHEL-5.4, at least the following Magny-Cours related patches have been committed:

build 2.6.18-174.el5 kernel:

BZ      commit   one-liner
--      ------   ---------
513684  310690c  [x86] fix up threshold_bank4 support on AMD Magny-cours
513684  8c0ce9b  [x86] fix up L3 cache information for AMD Magny-cours

build 2.6.18-175.el5 kernel:

BZ      commit   one-liner
--      ------   ---------
522215  58d03d1  [x86] fix boot crash with < 8-core AMD Magny-cours system
513685  3cc6b97  [x86] support amd magny-cours power-aware scheduler fix
526770  0ed6049  [x86] amd: fix hot plug cpu issue on 32-bit magny-cours

build 2.6.18-175.el5 hypervisor:

BZ      commit   one-liner
--      ------   ---------
526051  02b0c79  [xen] fix numa on magny-cours systems