Bug 628534 - system reboots when AMD IOMMU is enabled
Summary: system reboots when AMD IOMMU is enabled
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.5
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Kiran Thirumalai
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks: 5.6-Known_Issues
Reported: 2010-08-30 10:26 UTC by Stefan Assmann
Modified: 2011-08-17 19:04 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
If AMD IOMMU is enabled in the BIOS on ProLiant DL165 G7 systems, the system will reboot automatically when the IOMMU attempts to initialize. To work around this issue, either disable the IOMMU or update the BIOS to version <filename>2010.09.06</filename> or later.
Clone Of:
Environment:
Last Closed: 2011-08-17 19:04:55 UTC
Target Upstream Version:
Embargoed:


Description Stefan Assmann 2010-08-30 10:26:53 UTC
Description of problem:
system: hp-dl165g7-01.rhts.eng.bos.redhat.com

After enabling AMD IOMMU in the BIOS the system always resets during IOMMU init.

hpet0: at MMIO 0xfed00000 (virtual 0xffffffffff5fe000), IRQs 2, 8, 0, 0
hpet0: 4 32-bit timers, 14318180 Hz
ACPI: DMAR not present
GSI 16 sharing vector 0xA9 and IRQ 16
ACPI: PCI Interrupt 0000:00:00.2[A] -> GSI 55 (level, low) -> IRQ 169
AMD IOMMU: enabling GFX workaround for PCI device 02:00.0
AMD IOMMU: Enabling IOMMU at 00:00.2cap 0x40

After that line the system resets. The only way to get past this point is to disable the AMD IOMMU again.

Comment 3 Andreas Herrmann 2010-10-22 14:49:11 UTC
AFAIK we did not see similar problems during our tests.

Does the reset still occur with a newer RHEL 5.x kernel,
say kernel-2.6.18-225.el5?

Comment 4 Stefan Assmann 2010-11-10 07:57:36 UTC
Yes, it still occurs with kernel-2.6.18-230.el5.

System is
ProLiant DL165 G7
HP System BIOS - O37  (07/30/2010)

Comment 6 RHEL Program Management 2010-11-10 08:19:30 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 7 Stefan Assmann 2010-11-10 11:03:34 UTC
This is possibly a BIOS bug.
Testing with upstream 2.6.35 revealed the following:
AMD-Vi: Can not reserve memory region fec20000 for mmio
AMD-Vi: This is a BIOS bug. Please contact your hardware vendor
Trying to free nonexistent resource <00000000fec20000-00000000fec23fff>

However, upstream handles this condition gracefully, while RHEL5 is stuck in an endless reboot cycle.

Comment 8 Stefan Assmann 2010-11-10 12:36:06 UTC
I think I found the upstream commit that fixed it.
e82752d8b5a7e0a5e4d607fd8713549e2a4e2741 
x86/amd-iommu: Fix crash when request_mem_region fails

Unfortunately this will require further changes to the AMD IOMMU code. I'll try to cook something up.

Comment 9 Stefan Assmann 2010-11-17 11:16:29 UTC
Box boots fine with the latest BIOS revision
HP System BIOS - O37  (09/06/2010)

Only some IO_PAGE_FAULT events are logged with a -194 kernel.
AMD IOMMU: Event logged [IO_PAGE_FAULT device=00:13.2 domain=0x0000 address=0x00000000000e43c0 flags=0x0050]
AMD IOMMU: Event logged [IO_PAGE_FAULT device=00:12.0 domain=0x0000 address=0x00000000000e5080 flags=0x0070]
AMD IOMMU: Event logged [IO_PAGE_FAULT device=00:12.0 domain=0x0000 address=0x00000000000e5040 flags=0x0050]
AMD IOMMU: Event logged [IO_PAGE_FAULT device=00:12.0 domain=0x0000 address=0x00000000ffffffc0 flags=0x0050]
AMD IOMMU: Event logged [IO_PAGE_FAULT device=00:12.0 domain=0x0000 address=0x00000000ffffffc0 flags=0x0050]
[...]
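
A minimal sketch for summarizing such events per device, assuming the log lines have exactly the layout quoted above (other kernels may format them differently). The names EVENT_RE, count_page_faults, and SAMPLE_LOG are hypothetical helpers introduced for illustration; they are not part of any kernel or Red Hat tool.

```python
import re
from collections import Counter

# Pattern derived only from the event lines quoted in this comment.
# Assumption: other kernel versions may print these events differently.
EVENT_RE = re.compile(
    r"AMD IOMMU: Event logged \[IO_PAGE_FAULT "
    r"device=(?P<device>[0-9a-fA-F:.]+) "
    r"domain=(?P<domain>0x[0-9a-fA-F]+) "
    r"address=(?P<address>0x[0-9a-fA-F]+) "
    r"flags=(?P<flags>0x[0-9a-fA-F]+)\]"
)


def count_page_faults(log_text):
    """Return a Counter mapping PCI device ID to its IO_PAGE_FAULT count."""
    counts = Counter()
    for match in EVENT_RE.finditer(log_text):
        counts[match.group("device")] += 1
    return counts


# The five event lines from this comment, used as sample input.
SAMPLE_LOG = """\
AMD IOMMU: Event logged [IO_PAGE_FAULT device=00:13.2 domain=0x0000 address=0x00000000000e43c0 flags=0x0050]
AMD IOMMU: Event logged [IO_PAGE_FAULT device=00:12.0 domain=0x0000 address=0x00000000000e5080 flags=0x0070]
AMD IOMMU: Event logged [IO_PAGE_FAULT device=00:12.0 domain=0x0000 address=0x00000000000e5040 flags=0x0050]
AMD IOMMU: Event logged [IO_PAGE_FAULT device=00:12.0 domain=0x0000 address=0x00000000ffffffc0 flags=0x0050]
AMD IOMMU: Event logged [IO_PAGE_FAULT device=00:12.0 domain=0x0000 address=0x00000000ffffffc0 flags=0x0050]
"""

if __name__ == "__main__":
    # Print one line per device with the number of faults it caused.
    for device, n in sorted(count_page_faults(SAMPLE_LOG).items()):
        print(device, n)
```

Grouping by device makes it easier to see which devices produce the faults. In the excerpt above, all of them come from 00:12.0 and 00:13.2.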

Comment 11 Linda Wang 2010-12-07 13:20:04 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
RHEL5 on ProLiant DL165 G7
systems the IOMMU needs to be disabled or the BIOS updated to version 
2010.09.06 or later.

Comment 16 Ryan Lerch 2011-01-05 02:06:40 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,3 +1,2 @@
-RHEL5 on ProLiant DL165 G7
+If AMD IOMMU is enabled in BIOS on ProLiant DL165 G7
-systems the IOMMU needs to be disabled or the BIOS updated to version 
+systems, the system will reboot automatically when IOMMU attempts to initalize. To work around this issue, either disable IOMMU, or update the BIOS to version <filename>2010.09.06</filename> or later.
-2010.09.06 or later.

Comment 17 Andreas Herrmann 2011-01-20 16:59:58 UTC
(For the sake of completeness.)
In reply to comment #9 regarding the IO page faults, here is a comment
from Joerg Roedel:

  "This is no real issue. The io-page-faults come from devices which are
  used by the BIOS and are not handed over to the OS yet (typically USB
  controllers). From the time the IOMMU is initialized up to the point
  Linux loads the USB drivers such io-page-faults can happen.
  The BIOS can prevent that by defining unity-mapped ranges or
  exclusion-ranges. But the BIOSes I have seen don't do this."

Comment 18 Tony Camuso 2011-01-21 16:03:26 UTC
If I read this correctly, the problem is fixed with a later version of the BIOS. In that case, all that's needed is a CA from HP and a RH release note advising users to either update the BIOS or disable IOMMU in the BIOS.
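
A rough sketch of how the "update the BIOS or disable IOMMU" advice could be checked on a given box, assuming the BIOS date is available as an MM/DD/YYYY string as shown in comments 4 and 9 (for example as printed by `dmidecode -s bios-release-date`; that the output always has this format is an assumption). FIXED_BIOS_DATE and bios_has_fix are hypothetical names used only for illustration.

```python
from datetime import date, datetime

# First BIOS date reported in this bug to boot with the IOMMU enabled
# (comment 9). Treating every later date as fixed is an assumption.
FIXED_BIOS_DATE = date(2010, 9, 6)


def bios_has_fix(bios_date_string):
    """Return True if an MM/DD/YYYY BIOS date is on or after the fixed release."""
    bios_date = datetime.strptime(bios_date_string.strip(), "%m/%d/%Y").date()
    return bios_date >= FIXED_BIOS_DATE


if __name__ == "__main__":
    # The two BIOS dates mentioned in this bug report.
    for reported in ("07/30/2010", "09/06/2010"):
        if bios_has_fix(reported):
            status = "BIOS includes the fix"
        else:
            status = "update the BIOS or disable IOMMU"
        print(reported, "->", status)
```

Parsing the string into a date avoids comparing MM/DD/YYYY text directly, which would order dates incorrectly across years.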

Comment 20 John Feeney 2011-06-22 18:45:22 UTC
I believe the note for this was added to the 5.6 Technical Notes. Do we have a CA from HP so we can close this now?

Comment 21 RHEL Program Management 2011-08-17 19:04:55 UTC
Product Management has reviewed and declined this request.  You may appeal this
decision by reopening this request.

