Bug 649766
| Summary: | DMAR Errors on HP RAID controller with intel_iommu set to on, system hangs | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Joseph Mann <joseph.mann> |
| Component: | kernel | Assignee: | Tony Camuso <tcamuso> |
| Status: | CLOSED ERRATA | QA Contact: | Barry Donahue <bdonahue> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 6.0 | CC: | arozansk, bzeranski, chrisw, dwmw2, jane.lv, joseph.mann, jvillalo, laurie.barry, linda.knippers, luyu, mike.miller, mzywusko, prarit, rdoty, sandy.garza, syeghiay, thenzl, vaios.papadimitriou |
| Target Milestone: | rc | | |
| Target Release: | 6.1 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | kernel 2.6.32-120.el6 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2011-06-09 19:06:20 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 564512, 580566 | | |
| Attachments: | | | |

Description
Joseph Mann
2010-11-04 14:13:18 UTC
(In reply to comment #0)
> Description of problem:
> When enabling intel_iommu on an HP DL380 G6, RHEL6 fails to complete its boot,
> the following errors are seen on the console:
> DRHD: handling fault status reg 2
> BUG: recent printk recursion!
> <3>DMAR:[DMA Read] Request device [04:00.0] fault addr ffff4000
> DMAR:[fault reason 06] PTE Read access is not set

This is typically a bug in the driver. The driver's incorrect use of the DMA API can cause this. In the above example, the driver would be failing to call something like pci_map_single(PCI_DMA_TODEVICE) before instructing the device to initiate a DMA read transaction from memory.

Could you install and boot the kernel-debug package? It has DMA API debugging enabled, which can keep track of this and generate a useful backtrace.

Also, in the meantime, if your intention is to test KVM PCI device assignment, you can boot the box (using the standard kernel) with "intel_iommu=on iommu=pt" to put the IOMMU in PassThrough mode. PT mode means the host devices are not isolated by the IOMMU, only guest devices. This should allow you to boot and test KVM PCI device assignment.

thanks,
-chris

Created attachment 458123 [details]
Failure log of boot with IOMMU enabled
Chris,
Attached is the log from booting with the RHEL6 debug kernel.
It looks like, by the end of the trace, udev is in some sort of failing loop. I continue to see kernel dumps, but I didn't capture any more since they appear to be in an infinite (or at least very long) loop.

Thanks Joseph. Really looks like a driver issue.

hpsa 0000:04:00.0: MSIX
hpsa 0000:04:00.0: hpsa0: <0x323a> at IRQ 63 using DAC
------------[ cut here ]------------
WARNING: at lib/dma-debug.c:802 check_unmap+0x6c7/0x700() (Not tainted)
Hardware name: ProLiant DL380 G6
hpsa 0000:04:00.0: DMA-API: device driver tries to free DMA memory it has not allocated [device address=0x00000000ffff5001] [size=640 bytes]

Just to add some more details to the debugging trail. I tested on local hw with current upstream kernel (2.6.37-rc1+, commit 5398a64). I was able to reproduce the DMA debugging warning above. However, I don't get DMAR faults when I enable intel_iommu in either RHEL 6 or upstream. So this will need some further investigation. Can you post the full dmesg from a failing boot? Boot the debug kernel and add to the commandline "debug intel_iommu=on"

Again, for now (assuming the Storage Array is not what you are trying to assign to a KVM guest), you can boot with intel_iommu=on iommu=pt and this should get things going. Be good to confirm that works for you.

(In reply to comment #5)
> Just to add some more details to the debugging trail. I tested on local hw
> with current upstream kernel (2.6.37-rc1+, commit 5398a64). I was able to
> reproduce the DMA debugging warning above. However, I don't get DMAR faults
> when I enable intel_iommu in either RHEL 6 or upstream. So this will need some
> further investigation. Can you post the full dmesg from a failing boot? Boot
> the debug kernel and add to the commandline "debug intel_iommu=on"

Sorry, I somehow missed the fact that the dmesg in Comment #3 includes intel_iommu=on and the failure.

(In reply to comment #5)
> Just to add some more details to the debugging trail. I tested on local hw
> with current upstream kernel (2.6.37-rc1+, commit 5398a64). I was able to
> reproduce the DMA debugging warning above. However, I don't get DMAR faults
> when I enable intel_iommu in either RHEL 6 or upstream. So this will need some
> further investigation. Can you post the full dmesg from a failing boot? Boot
> the debug kernel and add to the commandline "debug intel_iommu=on"
>
> Again, for now (assuming the Storage Array is not what you are trying to assign
> to a KVM guest), you can boot with intel_iommu=on iommu=pt and this should get
> things going. Be good to confirm that works for you.

Chris,
Setting 'iommu=pt' allows me to bypass this issue for the purpose of my testing.
Joe

(In reply to comment #7)
> Setting 'iommu=pt' allows me to bypass this issue for the purpose of my
> testing.

Great, thanks for letting me know.

Created attachment 459169 [details]
Mask off low order bits when unmapping
BTW, I dug into the warning only to notice it's purely cosmetic. The issue is simply that during pci_free_consistent() the low order bits are included in the dma_addr. These bits have been modified by the driver to encode extra information. Not a real issue since vt-d shifts down to page frame number. Here's an example of the patch. Will be sure this makes it upstream.
Still need to get more information about the hang that you are seeing, Joe.
Can you send lspci -vvv -xxxx of full pci tree? (I'm specifically interested in 00:14.2, but whole tree can be useful).
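
As a rough illustration of why the DMA-API warning discussed above can be cosmetic, the following C sketch compares an exact-match lookup (a stand-in for the bookkeeping a DMA debugging layer does) with a page-granular comparison (a stand-in for an unmap path that only looks at page frame numbers). The 4 KiB page size, the untagged base address 0xffff5000, and every `sketch_*` name are assumptions made for illustration; none of them are taken from the hpsa driver or the kernel sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, i.e. a page shift of 12. */
#define SKETCH_PAGE_SHIFT 12u

/* Hypothetical helper: encode extra information in the low bits of an address. */
static inline uint64_t sketch_tag(uint64_t base, uint64_t tag)
{
    /* The tag must stay below the page boundary so the page frame is unchanged. */
    assert(tag < (UINT64_C(1) << SKETCH_PAGE_SHIFT));
    return base | tag;
}

/* Page frame number: the address with the in-page offset shifted out. */
static inline uint64_t sketch_pfn(uint64_t dma_addr)
{
    return dma_addr >> SKETCH_PAGE_SHIFT;
}

/* Exact-match comparison: fails as soon as any low bit differs. */
static inline bool sketch_exact_match(uint64_t allocated, uint64_t freed)
{
    return allocated == freed;
}

/* Page-granular comparison: ignores everything below the page boundary. */
static inline bool sketch_same_page(uint64_t allocated, uint64_t freed)
{
    return sketch_pfn(allocated) == sketch_pfn(freed);
}
```

With the device address from the log, `sketch_exact_match(0xffff5000, 0xffff5001)` is false while `sketch_same_page(0xffff5000, 0xffff5001)` is true, which is one way to read the remark that the extra low-order bits trip the debug check without changing which page is unmapped.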

(In reply to comment #9)
> information. Not a real issue since vt-d shifts down to page frame number.
> Here's an example of the patch. Will be sure this makes it upstream.

The patch below is what is being prepared for 6.1, and I think it already was posted upstream. Please make sure you use the latest firmware; I seem to remember seeing some problems with firmware related to hangs ...

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index fc9ea5a..f2dccb6 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -169,6 +169,7 @@ static int __devinit hpsa_find_cfg_addrs(struct pci_dev *pdev,
 static int __devinit hpsa_pci_find_memory_BAR(struct pci_dev *pdev,
 	unsigned long *memory_bar);
 static int __devinit hpsa_lookup_board_id(struct pci_dev *pdev, u32 *board_id);
+static inline u32 hpsa_tag_discard_error_bits(u32 tag);
 
 static DEVICE_ATTR(raid_level, S_IRUGO, raid_level_show, NULL);
 static DEVICE_ATTR(lunid, S_IRUGO, lunid_show, NULL);
@@ -2259,8 +2260,8 @@ static void cmd_special_free(struct ctlr_info *h, struct CommandList *c)
 	temp64.val32.upper = c->ErrDesc.Addr.upper;
 	pci_free_consistent(h->pdev, sizeof(*c->err_info),
 			    c->err_info, (dma_addr_t) temp64.val);
-	pci_free_consistent(h->pdev, sizeof(*c),
-			    c, (dma_addr_t) c->busaddr);
+	pci_free_consistent(h->pdev, sizeof(*c), c,
+		(dma_addr_t) hpsa_tag_discard_error_bits((u32) c->busaddr));
 }
 
 #ifdef CONFIG_COMPAT

In reply to comment #10:
Thomas, does this patch completely address the DMAR issue? Is this patch required as well as latest RAID fw?

In reply to Description:
Joe, Do you have the latest fw for the RAID as well as the latest system BIOS?

I think these are 2 separate issues.
Masking off those lower bits is _supposed_ to fix:

WARNING: at lib/dma-debug.c:802 check_unmap+0x6c7/0x700() (Not tainted)
Hardware name: ProLiant DL380 G6
hpsa 0000:04:00.0: DMA-API: device driver tries to free DMA memory it has not allocated [device address=0x00000000ffff5001] [size=640 bytes]

I still see the message from time to time even after patching the driver.
If memory serves I saw DMAR errors when messing around with AER. But I'll have to dig thru any notes I may have.

Come to think of it I saw the DMAR messages when running a XEN kernel. Does this kernel have XEN enabled?

(In reply to comment #0)
> Version-Release number of selected component (if applicable):
> 2.6.32-71.el6.x86_64
>
>
> How reproducible:
> Every boot, with intel_iommu=on

I retested this on hp-dl380g6-01, kernel 2.6.32-71.el6.x86_64, P410i, booted normally.
Which machine did you test?

What is interesting, I have observed on that machine that it says while in bios (or just after):
-----------
Integrated Lights-Out 2 Advanced
iLO 2 v1.77 Apr 23 2009 10.16.65.40

Slot 0
NMI - Undetermined Source
----------
and freezes. This seems to happen every time when the previously booted kernel had the "intel_iommu=on" option - the box needs then a cold reset to boot.
Without the intel_iommu option it restarts fine.
I haven't noticed any traces this were related to the raid controller.

cat /proc/cmdline
ro root=/dev/mapper/vg_hpdl380g601-lv_root rd_LVM_LV=vg_hpdl380g601/lv_root rd_LVM_LV=vg_hpdl380g601/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us console=ttyS1,115200 crashkernel=129M@0M intel_iommu=on

Slot 0 is usually the embedded controller. Do you have hpwdt on the system? If so, my suggestion is to disable or remove it then try again. Whenever I see those "NMI - Undetermined Source" messages it's hpwdt. Supposedly all it does is try to source the NMI. But I have my doubts.

(In reply to comment #14)
> (In reply to comment #0)
> > Version-Release number of selected component (if applicable):
> > 2.6.32-71.el6.x86_64
> >
> >
> > How reproducible:
> > Every boot, with intel_iommu=on
>
> I retested this on hp-dl380g6-01, kernel 2.6.32-71.el6.x86_64, P410i, booted
> normally.
> Which machine did you test?
>
> What is interesting, I have observed on that machine that it says while in bios
> (or just after):
> -----------
> Integrated Lights-Out 2 Advanced
> iLO 2 v1.77 Apr 23 2009 10.16.65.40
>
> Slot 0
> NMI - Undetermined Source
> ----------
> and freezes. This seems to happen every time when the previously booted kernel
> had the "intel_iommu=on" option - the box needs then a cold reset to boot.
> Without the intel_iommu option it restarts fine.
> I haven't noticed any traces this were related to the raid controller.
>
> cat /proc/cmdline
> ro root=/dev/mapper/vg_hpdl380g601-lv_root rd_LVM_LV=vg_hpdl380g601/lv_root
> rd_LVM_LV=vg_hpdl380g601/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8
> SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us console=ttyS1,115200
> crashkernel=129M@0M intel_iommu=on

Can you add this to /etc/rc.d/rc.local:

setpci -s 00:14.0 0x1AC.L=0x80000000

Then see if the reboot after intel_iommu=on still triggers the NMI?

(In reply to comment #15,#16)
I'll test it, but the machine is at the moment used by someone else. When I get the box back I'll post the results.

Joseph,
I retested this on hp-dl380g6-01, kernel 2.6.32-71.el6.x86_64, P410i, booted normally.
Which machine did you test?

I originally hit the issue on DL380 G6.
When you say retest, do you mean with Chris's patch? I have not had a chance to try his patch yet.
Joe

(In reply to comment #19)
> I originally hit the issue on DL380 G6.
> When you say retest, do you mean with Chris's patch? I have not had a chance to
> try his patch yet.

Sorry, that wasn't a good question. I don't know why I had the feeling the tests were done on an internal machine.

(In reply to comment #15)
> Slot 0 is usually the embedded controller. Do you have hpwdt on the system? If
> so, my suggestion is to disable or remove it then try again. Whenever I see
> those "NMI - Undetermined Source" messages it's hpwdt. Supposedly all it does
> is try to source the NMI. But I have my doubts.

I haven't found hpwdt on that system.

(In reply to comment #16)
> Can you add this to /etc/rc.d/rc.local:
>
> setpci -s 00:14.0 0x1AC.L=0x80000000
>
> Then see if the reboot after intel_iommu=on still triggers the NMI?

I'm not sure if this helps, the system now reboots fine.

(In reply to comment #22)
> (In reply to comment #16)
> > Can you add this to /etc/rc.d/rc.local:
> >
> > setpci -s 00:14.0 0x1AC.L=0x80000000
> >
> > Then see if the reboot after intel_iommu=on still triggers the NMI?
>
> I'm not sure if this helps, the system now reboots fine.

OK, seems odd. I tried it on that same box and it seemed to work. I set it in a reboot loop to make sure, and somehow the machine was provisioned away from me. If it's already set ('setpci -s 00:14.0 0x1AC.L' will show), then indeed, it won't make a difference. The setting is persistent across warm reset.

At any rate, I believe we want this setting. Joe's original dmesg included this:

Uhhuh. NMI received for unknown reason a1 on CPU 0.
You have some hardware problem, likely on the PCI bus.
Dazed and confused, but trying to continue
DRHD: handling fault status reg 2
DMAR:[DMA Read] Request device [04:00.0] fault addr ffff0000
DMAR:[fault reason 06] PTE Read access is not set

The NMI is triggered by the VT-d fault. And setting the high bit in register 0x1AC (VTUNCERRMSK) will stop forwarding those faults to the IOH error handling logic (which appears to be set up to generate an NMI on this platform).

This request was evaluated by Red Hat Product Management for inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated in the current release, Red Hat is unfortunately unable to address this request at this time. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux. If you would like it considered as an exception in the current release, please ask your support representative.

This request was erroneously denied for the current release of Red Hat Enterprise Linux. The error has been fixed and this request has been re-proposed for the current release.

This request was evaluated by Red Hat Product Management for inclusion in the current release of Red Hat Enterprise Linux. Because the affected component is not scheduled to be updated in the current release, Red Hat is unfortunately unable to address this request at this time. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux. If you would like it considered as an exception in the current release, please ask your support representative.

This request was erroneously denied for the current release of Red Hat Enterprise Linux. The error has been fixed and this request has been re-proposed for the current release.

ACK

Looks like it's under control

Tony,
This bz is not under control. If you don't get the acks it won't get into 6.1.

Chris, Tomas,
1. Are you saying that this system is booting okay now?
2. Are you saying we need a kernel parameter or other setting?
3. Is there a patch forthcoming?

(In reply to comment #30)
> Chris, Tomas,
> 3. Is there a patch forthcoming?

I'm convinced a patch for this warning "device driver tries to free DMA memory it has not allocated" is a part of a driver update, which I hope will make it into the next release.

Adding Exception while it is determined if there is going to be a patch.
Laurie: Will there be a patch?
thanks,
Beth

Beth,
I am following up with Joe Mann.
Laurie

I am confused as to what is asked from Emulex, as far as a "patch" is concerned. Are you asking for a patch in the Emulex LPFC driver, and if yes, what exactly is this "patch" supposed to include/fix?

If you are referring to Comment # 12:
...
Masking off those lower bits is _supposed_ to fix:

WARNING: at lib/dma-debug.c:802 check_unmap+0x6c7/0x700() (Not tainted)
Hardware name: ProLiant DL380 G6
hpsa 0000:04:00.0: DMA-API: device driver tries to free DMA memory it has not allocated [device address=0x00000000ffff5001] [size=640 bytes]
...

notice that this refers to the "hpsa" driver, and not LPFC.
I also believe this is the same "patch" referred to in Comment # 31.

So, please be clear as to what exactly is expected by the Emulex LPFC driver. As far as I'm concerned no LPFC driver patch is expected and required for this BZ.

Thanks,
-Vaios-

(In reply to comment #34)
> notice that this refers to the "hpsa" driver, and not LPFC.
> I also believe this is the same "patch" referred to in Comment # 31.

I confirm that the patch I mentioned above belongs to the hpsa driver.

Thomas,

Do you know if the required patch has made it into the -120 kernel?

I see a large patch set dated March 4 that you checked in (35 patches) that are in the 120 kernel.

Is the patch for this BZ among those?

(In reply to comment #36)
> Thomas,
>
> Do you know if the required patch has made it into the -120 kernel?
>
> I see a large patch set dated March 4 that you checked in (35 patches) that are
> in the 120 kernel.
>
> Is the patch for this BZ among those?

It is the hpsa: fixup DMA address before freeing.

@@ -2249,7 +2249,7 @@ static void cmd_special_free(struct ctlr_info *h,
 	pci_free_consistent(h->pdev, sizeof(*c),
-			c, (dma_addr_t) c->busaddr);
+			c, (dma_addr_t) (c->busaddr & DIRECT_LOOKUP_MASK));
...
 #define DIRECT_LOOKUP_SHIFT 5
 #define DIRECT_LOOKUP_BIT 0x10
+#define DIRECT_LOOKUP_MASK (~((1 << DIRECT_LOOKUP_SHIFT) - 1))

This patch is in the current kernel 2.6.32-120.el6

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html
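
For readers who want to check the arithmetic of the mask in the hpsa fix quoted above, here is a small userspace C sketch. The two `DIRECT_LOOKUP_*` macros are copied from the quoted patch; `SKETCH_TAG_BITS`, `sketch_tag32()` and `sketch_untag32()` are hypothetical names introduced only for this illustration, and the assumption that the untagged address behind 0xffff5001 is 0xffff5000 is an inference from the log, not something the thread states.

```c
#include <assert.h>
#include <stdint.h>

/* Copied from the patch quoted in this report. */
#define DIRECT_LOOKUP_SHIFT 5
#define DIRECT_LOOKUP_MASK (~((1 << DIRECT_LOOKUP_SHIFT) - 1))

/* Hypothetical: the low bits that the mask clears (0x1f for a shift of 5). */
#define SKETCH_TAG_BITS (~(uint32_t)DIRECT_LOOKUP_MASK)

/* Hypothetical helper: store extra information in the low bits of an address. */
static inline uint32_t sketch_tag32(uint32_t base, uint32_t tag)
{
    assert((base & SKETCH_TAG_BITS) == 0); /* base must be 32-byte aligned */
    assert(tag <= SKETCH_TAG_BITS);        /* tag must fit in the low 5 bits */
    return base | tag;
}

/* Hypothetical helper: recover the address that was originally allocated. */
static inline uint32_t sketch_untag32(uint32_t busaddr)
{
    return busaddr & (uint32_t)DIRECT_LOOKUP_MASK;
}
```

Under these assumptions, `sketch_untag32(0xffff5001)` yields 0xffff5000, so an exact-match check on the freed address would see the same value that was allocated.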