Bug 649766 - DMAR Errors on HP RAID controller with intel_iommu set to on, system hangs
Summary: DMAR Errors on HP RAID controller with intel_iommu set to on, system hangs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 6.1
Assignee: Tony Camuso
QA Contact: Barry Donahue
URL:
Whiteboard:
Depends On:
Blocks: 564512 580566
Reported: 2010-11-04 14:13 UTC by Joseph Mann
Modified: 2011-06-09 19:06 UTC

Fixed In Version: kernel 2.6.32-120.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-06-09 19:06:20 UTC
Target Upstream Version:
Embargoed:


Attachments
Failure log of boot with IOMMU enabled (60.56 KB, text/plain), 2010-11-05 15:19 UTC, Joseph Mann
Mask off low order bits when unmapping (732 bytes, patch), 2010-11-09 16:10 UTC, Chris Wright


Links
System: Red Hat Product Errata
ID: RHSA-2011:0542
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 6.1 kernel security, bug fix and enhancement update
Last Updated: 2011-05-19 11:58:07 UTC

Description Joseph Mann 2010-11-04 14:13:18 UTC
Description of problem:
When intel_iommu is enabled on an HP DL380 G6, RHEL6 fails to complete its boot. The following errors are seen on the console:
DRHD: handling fault status reg 2
BUG: recent printk recursion!
<3>DMAR:[DMA Read] Request device [04:00.0] fault addr ffff4000 
DMAR:[fault reason 06] PTE Read access is not set 

Device 04:00.0 is the following device:

04:00.0 RAID bus controller: Hewlett-Packard Company Smart Array G6 controllers (rev 01)
	Subsystem: Hewlett-Packard Company Smart Array P410i
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 28
	Region 0: Memory at fb400000 (64-bit, non-prefetchable) [size=4M]
	Region 2: Memory at fb3f0000 (64-bit, non-prefetchable) [size=4K]
	Region 4: I/O ports at 4000 [size=256]
	[virtual] Expansion ROM at e4000000 [disabled] [size=512K]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1+ D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [70] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <1us, L1 <8us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 256 bytes, MaxReadReq 4096 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 5GT/s, Width x8, ASPM L0s, Latency L0 <512ns, L1 <64us
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis+
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB
	Capabilities: [ac] MSI-X: Enable+ Count=16 Masked-
		Vector table: BAR=0 offset=001c2000
		PBA: BAR=0 offset=001c4000
	Capabilities: [100] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
		UESvrt:	DLP- SDES+ TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
	Kernel driver in use: hpsa
	Kernel modules: hpsa






Version-Release number of selected component (if applicable):
2.6.32-71.el6.x86_64


How reproducible:
Every boot, with intel_iommu=on

Comment 2 Chris Wright 2010-11-04 16:45:41 UTC
(In reply to comment #0)
> Description of problem:
> When enabling intel_iommu on an HP DL380 G6, RHEL6 fails to complete its boot,
> the following errors are seen on the console:
> DRHD: handling fault status reg 2
> BUG: recent printk recursion!
> <3>DMAR:[DMA Read] Request device [04:00.0] fault addr ffff4000 
> DMAR:[fault reason 06] PTE Read access is not set 

This is typically a bug in the driver: incorrect use of the DMA API can cause this.  In the above example, the driver would be failing to call something like pci_map_single() with PCI_DMA_TODEVICE before instructing the device to initiate a DMA read transaction from memory.

Could you install and boot the kernel-debug package?  It has DMA API debugging enabled, which can keep track of this and generate a useful backtrace.

Also, in the meantime, if your intention is to test KVM PCI device assignment, you can boot the box (using the standard kernel) with "intel_iommu=on iommu=pt" to put the IOMMU in PassThrough mode.  PT mode means the host devices are not isolated by the IOMMU, only guest devices.  This should allow you to boot and test KVM PCI device assignment.

thanks,
-chris

Comment 3 Joseph Mann 2010-11-05 15:19:43 UTC
Created attachment 458123 [details]
Failure log of boot with IOMMU enabled

Chris,

Attached is the log from booting with the RHEL6 debug kernel.
By the end of the trace it looks like udev is stuck in some sort of failing loop. I continue to see kernel dumps, but I didn't capture any more of them since they appear to be in an infinite (or at least very long) loop.

Comment 4 Chris Wright 2010-11-08 18:45:15 UTC
Thanks Joseph.  Really looks like a driver issue.

hpsa 0000:04:00.0: MSIX
hpsa 0000:04:00.0: hpsa0: <0x323a> at IRQ 63 using DAC
------------[ cut here ]------------
WARNING: at lib/dma-debug.c:802 check_unmap+0x6c7/0x700() (Not tainted)
Hardware name: ProLiant DL380 G6
hpsa 0000:04:00.0: DMA-API: device driver tries to free DMA memory it has not allocated [device address=0x00000000ffff5001] [size=640 bytes]

Comment 5 Chris Wright 2010-11-08 20:43:50 UTC
Just to add some more details to the debugging trail.  I tested on local hw with current upstream kernel (2.6.37-rc1+, commit 5398a64).  I was able to reproduce the DMA debugging warning above.  However, I don't get DMAR faults when I enable intel_iommu in either RHEL 6 or upstream.  So this will need some further investigation.  Can you post the full dmesg from a failing boot?  Boot the debug kernel and add to the commandline "debug intel_iommu=on"

Again, for now (assuming the Storage Array is not what you are trying to assign to a KVM guest), you can boot with intel_iommu=on iommu=pt and this should get things going.  It would be good to confirm that this works for you.

Comment 6 Chris Wright 2010-11-08 20:48:40 UTC
(In reply to comment #5)
> Just to add some more details to the debugging trail.  I tested on local hw
> with current upstream kernel (2.6.37-rc1+, commit 5398a64).  I was able to
> reproduce the DMA debugging warning above.  However, I don't get DMAR faults
> when I enable intel_iommu in either RHEL 6 or upstream.  So this will need some
> further investigation.  Can you post the full dmesg from a failing boot?  Boot
> the debug kernel and add to the commandline "debug intel_iommu=on"

Sorry, I somehow missed the fact that the dmesg in Comment #3 includes intel_iommu=on and the failure.

Comment 7 Joseph Mann 2010-11-08 20:56:34 UTC
(In reply to comment #5)
> Just to add some more details to the debugging trail.  I tested on local hw
> with current upstream kernel (2.6.37-rc1+, commit 5398a64).  I was able to
> reproduce the DMA debugging warning above.  However, I don't get DMAR faults
> when I enable intel_iommu in either RHEL 6 or upstream.  So this will need some
> further investigation.  Can you post the full dmesg from a failing boot?  Boot
> the debug kernel and add to the commandline "debug intel_iommu=on"
> 
> Again, for now (assuming the Storage Array is not what you are trying to assign
> to a KVM guest), you can boot with intel_iommu=on iommu=pt and this should get
> things going.  Be good to confirm that works for you.

Chris,

Setting 'iommu=pt' allows me to bypass this issue for the purpose of my testing.

Joe

Comment 8 Chris Wright 2010-11-08 21:47:18 UTC
(In reply to comment #7)
> Setting 'iommu=pt' allows me to bypass this issue for the purpose of my
> testing.

Great, thanks for letting me know.

Comment 9 Chris Wright 2010-11-09 16:10:47 UTC
Created attachment 459169 [details]
Mask off low order bits when unmapping

BTW, I dug into the warning only to notice it's purely cosmetic.  The issue is simply that during pci_free_consistent() the low order bits are included in the dma_addr.  These bits have been modified by the driver to encode extra information.  This is not a real issue, since VT-d shifts the address down to a page frame number.  Here's an example of the patch.  I will make sure this makes it upstream.

I still need more information about the hang that you are seeing, Joe.
Can you send the output of lspci -vvv -xxxx for the full PCI tree?  (I'm specifically interested in 00:14.2, but the whole tree can be useful.)
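
The remark that the tag bits are harmless because VT-d works on page frame numbers can be checked with a little arithmetic. The following is a user-space sketch, not kernel code: it assumes 4 KiB VT-d pages (a 12-bit shift), and the names VTD_PAGE_SHIFT_ASSUMED, dma_pfn and same_dma_page are made up for this illustration. The example address is the device address from the dma-debug warning in comment #4, 0x00000000ffff5001.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, so the low 12 bits are the in-page offset. */
#define VTD_PAGE_SHIFT_ASSUMED 12

/* Hypothetical helper: page frame number of a DMA address. */
static uint64_t dma_pfn(uint64_t dma_addr)
{
	return dma_addr >> VTD_PAGE_SHIFT_ASSUMED;
}

/* Hypothetical helper: nonzero when both addresses fall in the same page. */
static int same_dma_page(uint64_t a, uint64_t b)
{
	return dma_pfn(a) == dma_pfn(b);
}
```

Under that assumption, same_dma_page(0xffff5001, 0xffff5000) is true: the tagged and untagged addresses share page frame 0xffff5, so an unmap keyed on the page frame would not notice the extra low bit, while a check on the exact address presumably would.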

Comment 10 Tomas Henzl 2010-11-09 16:47:19 UTC
(In reply to comment #9)
> information.  Not a real issue since vt-d shifts down to page frame number. 
> Here's an example of the patch.  Will be sure this makes it upstream.

The patch below is what is being prepared for 6.1, and I think it was already posted upstream.
Please make sure you use the latest firmware; I seem to remember seeing some firmware problems related to hangs.


diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index fc9ea5a..f2dccb6 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -169,6 +169,7 @@ static int __devinit hpsa_find_cfg_addrs(struct pci_dev *pdev,
 static int __devinit hpsa_pci_find_memory_BAR(struct pci_dev *pdev,
 	unsigned long *memory_bar);
 static int __devinit hpsa_lookup_board_id(struct pci_dev *pdev, u32 *board_id);
+static inline u32 hpsa_tag_discard_error_bits(u32 tag);
 
 static DEVICE_ATTR(raid_level, S_IRUGO, raid_level_show, NULL);
 static DEVICE_ATTR(lunid, S_IRUGO, lunid_show, NULL);
@@ -2259,8 +2260,8 @@ static void cmd_special_free(struct ctlr_info *h, struct CommandList *c)
 	temp64.val32.upper = c->ErrDesc.Addr.upper;
 	pci_free_consistent(h->pdev, sizeof(*c->err_info),
 			    c->err_info, (dma_addr_t) temp64.val);
-	pci_free_consistent(h->pdev, sizeof(*c),
-			    c, (dma_addr_t) c->busaddr);
+	pci_free_consistent(h->pdev, sizeof(*c), c,
+		(dma_addr_t) hpsa_tag_discard_error_bits((u32) c->busaddr));
 }
 
 #ifdef CONFIG_COMPAT

Comment 11 Tony Camuso 2010-12-15 15:50:20 UTC
In reply to comment #10:

Tomas, does this patch completely address the DMAR issue? Is this patch required in addition to the latest RAID firmware?

In reply to Description:

Joe, 

Do you have the latest fw for the RAID as well as the latest system BIOS?

Comment 12 Mike Miller (OS Dev) 2010-12-15 16:59:08 UTC
I think these are 2 separate issues. Masking off those lower bits is _supposed_ to fix:

WARNING: at lib/dma-debug.c:802 check_unmap+0x6c7/0x700() (Not tainted)
Hardware name: ProLiant DL380 G6
hpsa 0000:04:00.0: DMA-API: device driver tries to free DMA memory it has not allocated [device address=0x00000000ffff5001] [size=640 bytes]

I still see the message from time to time even after patching the driver. If memory serves, I saw DMAR errors when messing around with AER, but I'll have to dig through any notes I may have.

Comment 13 Mike Miller (OS Dev) 2010-12-15 19:42:05 UTC
Come to think of it, I saw the DMAR messages when running a XEN kernel. Does this kernel have XEN enabled?

Comment 14 Tomas Henzl 2010-12-16 17:12:01 UTC
(In reply to comment #0) 
> Version-Release number of selected component (if applicable):
> 2.6.32-71.el6.x86_64
> 
> 
> How reproducible:
> Every boot, with intel_iommu=on

I retested this on hp-dl380g6-01, kernel 2.6.32-71.el6.x86_64, P410i, booted normally.
Which machine did you test?

Interestingly, on that machine I observed the following message while in the BIOS (or just after):
-----------
Integrated Lights-Out 2 Advanced                          
iLO 2 v1.77 Apr 23 2009 10.16.65.40

Slot 0 
NMI - Undetermined Source
----------
and freezes. This seems to happen every time the previously booted kernel had the "intel_iommu=on" option; the box then needs a cold reset to boot.
Without the intel_iommu option it restarts fine.
I haven't noticed any sign that this is related to the RAID controller.

cat /proc/cmdline 
ro root=/dev/mapper/vg_hpdl380g601-lv_root rd_LVM_LV=vg_hpdl380g601/lv_root rd_LVM_LV=vg_hpdl380g601/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us console=ttyS1,115200 crashkernel=129M@0M intel_iommu=on

Comment 15 Mike Miller (OS Dev) 2010-12-16 17:22:29 UTC
Slot 0 is usually the embedded controller. Do you have hpwdt on the system? If so, my suggestion is to disable or remove it then try again. Whenever I see those "NMI - Undetermined Source" messages it's hpwdt. Supposedly all it does is try to source the NMI. But I have my doubts.

Comment 16 Chris Wright 2010-12-17 01:50:57 UTC
(In reply to comment #14)
> (In reply to comment #0) 
> > Version-Release number of selected component (if applicable):
> > 2.6.32-71.el6.x86_64
> > 
> > 
> > How reproducible:
> > Every boot, with intel_iommu=on
> 
> I retested this on hp-dl380g6-01, kernel 2.6.32-71.el6.x86_64, P410i, booted
> normally.
> Which machine did you test?
> 
> What is interesting, I have observed on that machine that it says while in bios
> (or just after): 
> -----------
> Integrated Lights-Out 2 Advanced                          
> iLO 2 v1.77 Apr 23 2009 10.16.65.40
> 
> Slot 0 
> NMI - Undetermined Source
> ----------
> and freezes. This seems to happen every time when the previously booted kernel
> had the "intel_iommu=on' option - the box needs then a cold reset to boot.
> Without the intel_iommu option it restarts fine.
> I haven't noticed any traces this were related to the raid controller.
> 
> cat /proc/cmdline 
> ro root=/dev/mapper/vg_hpdl380g601-lv_root rd_LVM_LV=vg_hpdl380g601/lv_root
> rd_LVM_LV=vg_hpdl380g601/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8
> SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us console=ttyS1,115200
> crashkernel=129M@0M intel_iommu=on

Can you add this to /etc/rc.d/rc.local:

setpci -s 00:14.0 0x1AC.L=0x80000000

Then see if the reboot after intel_iommu=on still triggers the NMI?

Comment 17 Tomas Henzl 2010-12-17 12:25:05 UTC
(In reply to comment #15,#16)
I'll test it, but the machine is at the moment used by someone else. 
When I get the box back I'll post the results.

Comment 18 Tomas Henzl 2010-12-17 12:28:16 UTC
Joseph,
I retested this on hp-dl380g6-01, kernel 2.6.32-71.el6.x86_64, P410i, booted
normally.
Which machine did you test?

Comment 19 Joseph Mann 2010-12-17 14:35:49 UTC
I originally hit the issue on DL380 G6.
When you say retest, do you mean with Chris's patch? I have not had a chance to try his patch yet.

Joe

Comment 20 Tomas Henzl 2010-12-17 14:49:37 UTC
(In reply to comment #19)
> I originally hit the issue on DL380 G6.
> When you say retest, do you mean with Chris's patch? I have not had a chance to
> try his patch yet.

Sorry, that wasn't a good question. I don't know why I had the feeling the tests were done on an internal machine.

Comment 21 Tomas Henzl 2010-12-21 15:29:14 UTC
(In reply to comment #15)
> Slot 0 is usually the embedded controller. Do you have hpwdt on the system? If
> so, my suggestion is to disable or remove it then try again. Whenever I see
> those "NMI - Undetermined Source" messages it's hpwdt. Supposedly all it does
> is try to source the NMI. But I have my doubts.
I haven't found hpwdt on that system.

Comment 22 Tomas Henzl 2010-12-21 15:36:02 UTC
(In reply to comment #16)
> Can you add this to /etc/rc.d/rc.local:
> 
> setpci -s 00:14.0 0x1AC.L=0x80000000
> 
> Then see if the reboot after intel_iommu=on still triggers the NMI?

I'm not sure whether this helps; the system now reboots fine.

Comment 23 Chris Wright 2010-12-22 16:06:57 UTC
(In reply to comment #22)
> (In reply to comment #16)
> > Can you add this to /etc/rc.d/rc.local:
> > 
> > setpci -s 00:14.0 0x1AC.L=0x80000000
> > 
> > Then see if the reboot after intel_iommu=on still triggers the NMI?
> 
> I'm not sure if this helps, the system now reboots fine.

OK, seems odd.  I tried it on that same box and it seemed to work.  I set it in a reboot loop to make sure, and somehow the machine was provisioned away from me.

If it's already set ('setpci -s 00:14.0 0x1AC.L' will show), then indeed, it won't make a difference.  The setting is persistent across warm reset.  At any rate, I believe we want this setting.  Joe's original dmesg included this:

Uhhuh. NMI received for unknown reason a1 on CPU 0.
You have some hardware problem, likely on the PCI bus.
Dazed and confused, but trying to continue
DRHD: handling fault status reg 2
DMAR:[DMA Read] Request device [04:00.0] fault addr ffff0000 
DMAR:[fault reason 06] PTE Read access is not set

The NMI is triggered by the VT-d fault.  Setting the high bit in register 0x1AC (VTUNCERRMSK) will stop forwarding those faults to the IOH error handling logic (which appears to be set up to generate an NMI on this platform).
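
To make "setting the high bit" concrete: the value 0x80000000 written by the setpci command is simply bit 31 of a 32-bit register. The sketch below models the register as a plain integer in user space rather than touching PCI config space. VTUNCERRMSK_BIT31 and mask_vtd_faults are hypothetical names chosen for this illustration, and the meaning of the bit is taken from the comment above, not from a datasheet.

```c
#include <stdint.h>

/* Bit 31 of a 32-bit register, i.e. the 0x80000000 passed to setpci. */
#define VTUNCERRMSK_BIT31 (UINT32_C(1) << 31)

/* Hypothetical helper: return the register value with bit 31 set. */
static uint32_t mask_vtd_faults(uint32_t reg)
{
	return reg | VTUNCERRMSK_BIT31;
}
```

Applying the helper to a value that already has bit 31 set returns the same value, which matches the observation that the command makes no difference once the bit is set. Note that the setpci invocation quoted above writes the whole 32-bit value, whereas this helper preserves the other bits.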

Comment 24 RHEL Program Management 2011-01-07 04:14:01 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support
representative.

Comment 25 Suzanne Logcher 2011-01-07 16:09:02 UTC
This request was erroneously denied for the current release of Red Hat
Enterprise Linux.  The error has been fixed and this request has been
re-proposed for the current release.

Comment 26 RHEL Program Management 2011-02-01 05:45:44 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support
representative.

Comment 27 RHEL Program Management 2011-02-01 18:50:48 UTC
This request was erroneously denied for the current release of
Red Hat Enterprise Linux.  The error has been fixed and this
request has been re-proposed for the current release.

Comment 28 Tony Camuso 2011-02-14 16:34:32 UTC
ACK

Looks like it's under control

Comment 29 Beth Zeranski 2011-02-14 22:12:08 UTC
Tony,
This bz is not under control. If you don't get the acks it won't get into 6.1.

Comment 30 Tony Camuso 2011-02-15 00:26:46 UTC
Chris, Tomas,

1. Are you saying that this system is booting okay now?

2. Are you saying we need a kernel parameter or other setting?

3. Is there a patch forthcoming?

Comment 31 Tomas Henzl 2011-02-15 13:48:04 UTC
(In reply to comment #30)
> Chris, Tomas,

> 3. Is there a patch forthcoming?

I'm convinced a patch for this warning, "device driver tries to free DMA memory it has not allocated", is part of a driver update, which I hope will make it into the next release.

Comment 32 Beth Zeranski 2011-02-21 17:50:37 UTC
Adding Exception while it is determined if there is going to be a patch.

Laurie: Will there be a patch?

thanks,
 Beth

Comment 33 laurie barry 2011-02-22 14:09:25 UTC
Beth,

I am following up with Joe Mann.

Laurie

Comment 34 Vaios Papadimitriou 2011-02-22 16:20:12 UTC
I am confused as to what is being asked of Emulex, as far as a "patch" is concerned.

Are you asking for a patch in the Emulex LPFC driver, and if yes, what exactly is this "patch" supposed to include/fix?

If you are referring to Comment # 12:
...
Masking off those lower bits is _supposed_
to fix:
WARNING: at lib/dma-debug.c:802 check_unmap+0x6c7/0x700() (Not tainted)
Hardware name: ProLiant DL380 G6
hpsa 0000:04:00.0: DMA-API: device driver tries to free DMA memory it has not
allocated [device address=0x00000000ffff5001] [size=640 bytes]
...

notice that this refers to the "hpsa" driver, and not LPFC.
I also believe this is the same "patch" referred to in Comment # 31.

So, please be clear as to what exactly is expected by the Emulex LPFC driver.

As far as I'm concerned no LPFC driver patch is expected and required for this BZ.

Thanks,
-Vaios-

Comment 35 Tomas Henzl 2011-02-23 14:45:35 UTC
(In reply to comment #34)
> notice that this refers to the "hpsa" driver, and not LPFC.
> I also believe this is the same "patch" referred to in Comment # 31.

Confirmed; the patch I mentioned above belongs to the hpsa driver.

Comment 36 Tony Camuso 2011-03-09 13:25:27 UTC
Tomas,

Do you know if the required patch has made it into the -120 kernel?

I see a large patch set dated March 4 that you checked in (35 patches) that are in the 120 kernel. 

Is the patch for this BZ among those?

Comment 37 Tomas Henzl 2011-03-09 14:17:03 UTC
(In reply to comment #36)
> Thomas, 
> 
> Do you know if the required patch has made it into the -120 kernel?
> 
> I see a large patch set dated March 4 that you checked in (35 patches) that are
> in the 120 kernel. 
> 
> Is the patch for this BZ among those?

It is the patch titled "hpsa: fixup DMA address before freeing".

@@ -2249,7 +2249,7 @@ static void cmd_special_free(struct ctlr_info *h,
        pci_free_consistent(h->pdev, sizeof(*c),
-                           c, (dma_addr_t) c->busaddr);
+                           c, (dma_addr_t) (c->busaddr & DIRECT_LOOKUP_MASK));
...
 #define DIRECT_LOOKUP_SHIFT 5
 #define DIRECT_LOOKUP_BIT 0x10
+#define DIRECT_LOOKUP_MASK (~((1 << DIRECT_LOOKUP_SHIFT) - 1))
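
The mask arithmetic in this hunk can be checked in user space. This sketch copies the three DIRECT_LOOKUP_* definitions from the patch (with unsigned 32-bit types added so the snippet is well defined outside the kernel) and applies the mask to the address from the dma-debug warning. strip_tag_bits is a hypothetical helper written for this illustration, not a function in hpsa.

```c
#include <stdint.h>

#define DIRECT_LOOKUP_SHIFT 5
#define DIRECT_LOOKUP_BIT   0x10
/* Clears the low DIRECT_LOOKUP_SHIFT bits: ~0x1f == 0xffffffe0. */
#define DIRECT_LOOKUP_MASK  ((uint32_t)~((UINT32_C(1) << DIRECT_LOOKUP_SHIFT) - 1))

/* Hypothetical helper: drop the tag bits the driver stored in the low bits. */
static uint32_t strip_tag_bits(uint32_t busaddr)
{
	return busaddr & DIRECT_LOOKUP_MASK;
}
```

With these definitions the mask is 0xffffffe0, so 0xffff5001 becomes 0xffff5000, and DIRECT_LOOKUP_BIT (0x10) falls inside the cleared range.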

Comment 38 Tony Camuso 2011-03-09 16:59:52 UTC
This patch is in the current kernel 2.6.32-120.el6

Comment 40 errata-xmlrpc 2011-06-09 19:06:20 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html

