Description of problem:
System: hp-dl165g7-01.rhts.eng.bos.redhat.com
After enabling AMD IOMMU in the BIOS, the system always resets during IOMMU init.

hpet0: at MMIO 0xfed00000 (virtual 0xffffffffff5fe000), IRQs 2, 8, 0, 0
hpet0: 4 32-bit timers, 14318180 Hz
ACPI: DMAR not present
GSI 16 sharing vector 0xA9 and IRQ 16
ACPI: PCI Interrupt 0000:00:00.2[A] -> GSI 55 (level, low) -> IRQ 169
AMD IOMMU: enabling GFX workaround for PCI device 02:00.0
AMD IOMMU: Enabling IOMMU at 00:00.2cap 0x40

After that line the system resets. The only way to get past this point is to disable the AMD IOMMU again.
AFAIK we did not see similar problems during our tests. Does the reset still occur with a newer RHEL 5.x kernel, say kernel-2.6.18-225.el5?
Yes, it still occurs with kernel-2.6.18-230.el5.
System: ProLiant DL165 G7
BIOS: HP System BIOS - O37 (07/30/2010)
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
This is possibly a BIOS bug. Testing with upstream 2.6.35 revealed the following:

AMD-Vi: Can not reserve memory region fec20000 for mmio
AMD-Vi: This is a BIOS bug. Please contact your hardware vendor
Trying to free nonexistent resource <00000000fec20000-00000000fec23fff>

However, upstream handles this failure gracefully, while RHEL5 is stuck in an endless reboot cycle.
I think I found the upstream commit that fixed it:

e82752d8b5a7e0a5e4d607fd8713549e2a4e2741
x86/amd-iommu: Fix crash when request_mem_region fails

Unfortunately this will require further changes to the AMD IOMMU code. I'll try to cook something up.
The box boots fine with the latest BIOS revision:
HP System BIOS - O37 (09/06/2010)

Only some IO_PAGE_FAULT events are logged with a -194 kernel:

AMD IOMMU: Event logged [IO_PAGE_FAULT device=00:13.2 domain=0x0000 address=0x00000000000e43c0 flags=0x0050]
AMD IOMMU: Event logged [IO_PAGE_FAULT device=00:12.0 domain=0x0000 address=0x00000000000e5080 flags=0x0070]
AMD IOMMU: Event logged [IO_PAGE_FAULT device=00:12.0 domain=0x0000 address=0x00000000000e5040 flags=0x0050]
AMD IOMMU: Event logged [IO_PAGE_FAULT device=00:12.0 domain=0x0000 address=0x00000000ffffffc0 flags=0x0050]
AMD IOMMU: Event logged [IO_PAGE_FAULT device=00:12.0 domain=0x0000 address=0x00000000ffffffc0 flags=0x0050]
[...]
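
For anyone triaging similar output, here is a minimal sketch that groups the IO_PAGE_FAULT events by device. It assumes the RHEL5-style "AMD IOMMU: Event logged [...]" line format quoted above; the function names and the regular expression are illustrative and not part of any kernel tooling, and other kernels (for example upstream with the "AMD-Vi:" prefix) may format the line differently.

```python
import re
from collections import Counter

# Assumption: events look like the lines quoted in this report, e.g.
# [IO_PAGE_FAULT device=00:12.0 domain=0x0000 address=0x... flags=0x0050]
EVENT_RE = re.compile(
    r"IO_PAGE_FAULT device=(?P<device>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) "
    r"domain=(?P<domain>0x[0-9a-f]+) "
    r"address=(?P<address>0x[0-9a-f]+) "
    r"flags=(?P<flags>0x[0-9a-f]+)\]"
)


def parse_io_page_faults(log_text):
    """Return one dict per IO_PAGE_FAULT event found in log_text."""
    events = []
    for match in EVENT_RE.finditer(log_text):
        events.append({
            "device": match.group("device"),
            "domain": int(match.group("domain"), 16),
            "address": int(match.group("address"), 16),
            "flags": int(match.group("flags"), 16),
        })
    return events


def faults_per_device(events):
    """Count events per PCI device (bus:slot.function)."""
    return Counter(event["device"] for event in events)


# The five event lines from this comment, used as sample input.
SAMPLE = """\
AMD IOMMU: Event logged [IO_PAGE_FAULT device=00:13.2 domain=0x0000 address=0x00000000000e43c0 flags=0x0050]
AMD IOMMU: Event logged [IO_PAGE_FAULT device=00:12.0 domain=0x0000 address=0x00000000000e5080 flags=0x0070]
AMD IOMMU: Event logged [IO_PAGE_FAULT device=00:12.0 domain=0x0000 address=0x00000000000e5040 flags=0x0050]
AMD IOMMU: Event logged [IO_PAGE_FAULT device=00:12.0 domain=0x0000 address=0x00000000ffffffc0 flags=0x0050]
AMD IOMMU: Event logged [IO_PAGE_FAULT device=00:12.0 domain=0x0000 address=0x00000000ffffffc0 flags=0x0050]
"""

if __name__ == "__main__":
    for device, count in sorted(faults_per_device(parse_io_page_faults(SAMPLE)).items()):
        print(device, count)
```

The device IDs can then be looked up with lspci to see which controllers are generating the faults.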
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
RHEL5 on ProLiant DL165 G7 systems the IOMMU needs to be disabled or the BIOS updated to version 2010.09.06 or later.
Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,3 +1,2 @@
-RHEL5 on ProLiant DL165 G7
+If AMD IOMMU is enabled in BIOS on ProLiant DL165 G7
-systems the IOMMU needs to be disabled or the BIOS updated to version
+systems, the system will reboot automatically when IOMMU attempts to initalize. To work around this issue, either disable IOMMU, or update the BIOS to version <filename>2010.09.06</filename> or later.
-2010.09.06 or later.
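
As a rough aid for applying the note, the following sketch checks whether a BIOS date string is older than the 2010.09.06 revision. It assumes the date appears in MM/DD/YYYY form, as in the "HP System BIOS - O37 (07/30/2010)" strings quoted in this report; the helper names are hypothetical. On a running system the date can typically be read with `dmidecode -s bios-release-date`, on the assumption that it is reported in the same format.

```python
import re
from datetime import date

# BIOS revision named in the technical note (2010.09.06).
FIXED_BIOS_DATE = date(2010, 9, 6)


def parse_bios_date(text):
    """Extract a MM/DD/YYYY date such as the one in 'O37 (07/30/2010)'."""
    match = re.search(r"(\d{2})/(\d{2})/(\d{4})", text)
    if match is None:
        raise ValueError("no MM/DD/YYYY date found in %r" % text)
    month, day, year = (int(part) for part in match.groups())
    return date(year, month, day)


def bios_needs_update(text):
    """True if the BIOS date is older than the revision named in the note."""
    return parse_bios_date(text) < FIXED_BIOS_DATE


if __name__ == "__main__":
    for bios in ("HP System BIOS - O37 (07/30/2010)",
                 "HP System BIOS - O37 (09/06/2010)"):
        print(bios, "->", "update or disable IOMMU" if bios_needs_update(bios) else "ok")
```

This only compares dates; it does not verify that the machine is a ProLiant DL165 G7 or that the IOMMU is enabled.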
(For the sake of completeness.) In reply to comment #9 regarding the IO page faults, here is a comment from Joerg Roedel:

"This is no real issue. The io-page-faults come from devices which are used by the BIOS and are not handed over to the OS yet (typically USB controllers). From the time the IOMMU is initialized up to the point Linux loads the USB drivers such io-page-faults can happen. The BIOS can prevent that by defining unity-mapped ranges or exclusion-ranges. But the BIOSes I have seen don't do this."
If I read this correctly, the problem is fixed with a later version of the BIOS. In that case, all that's needed is a CA from HP and a RH release note advising users to either update the BIOS or disable IOMMU in the BIOS.
I believe the note for this was added to the 5.6 Technical Notes. Do we have a CA from HP so we can close this now?
Product Management has reviewed and declined this request. You may appeal this decision by reopening this request.