Bug 252029 - [EMC/QLogic 5.2 bug] qla2xxx driver running IO on DM-MPIO devices cause "kernel: PCI-DMA: Out of SW-IOMMU space "
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Hardware: All
OS: Linux
Priority: medium
Severity: high
Assigned To: Xen Maintainance List
QA Contact: Martin Jenner
Keywords: OtherQA
Depends On: 219216
Blocks: 246139 216992 217106 296411
Reported: 2007-08-13 16:14 EDT by Rik van Riel
Modified: 2009-06-19 20:07 EDT (History)
Doc Type: Bug Fix
Last Closed: 2008-01-29 11:19:46 EST
Comment 1 Andrius Benokraitis 2007-08-14 11:33:48 EDT
This bug continues work done from bug 219216 which implements a short-term
printk limit workaround in RHEL 5.1.

This bug is for a longer-term fix slated for RHEL 5.2.
Comment 6 Tom Coughlan 2007-11-16 16:46:44 EST
A recap:

Bug #219216 Comment #62 From Andrew Vasquez 

"...a SCSI LLD (low-level driver) is simply a transparent
consumer of SG entries prepared and mapped by the upper-layers.  qla2xxx
doesn't manipulate sizes nor counts of SG entries.  Again, I'm not
entirely clear a LLD can 'do' something about this, if a request's SG
list can't be mapped by the upper-layers, the I/O is simply flagged for

Bug #219216 Comment #63 From Rik van Riel 

"The qla2xxx driver seems to intentionally fill up the swiotlb (with requests
that don't fit in a page, so they need to be bounce buffered under Xen)..."

And, when this was discussed upstream:



Subject: Re: [PATCH] quiet down swiotlb warnings

On Sat, Jun 02, 2007 at 06:21:46PM +0300, Muli Ben-Yehuda wrote:
> On Fri, Jun 01, 2007 at 10:26:01PM +0200, Andi Kleen wrote:
> > Normally swiotlb doesn't even try to bounce when dma mask is <=
> > end_pfn so something must be very wrong in your kernel. It
> > definitely isn't a mainline kernel. If this happens in Xen then Xen
> > just needs fixing -- it should not try to bounce when the normal
> > kernel wouldn't.
> Xen needs to bounce when the requested buffer is not contiguous in
> machine memory (and indeed uses swiotlb for that).

Then it should just restrict the sg list merging at the block layer
to never merge into anything larger than a page. Then this cannot
happen or only very rarely.

------end quote-----

So, if I understand correctly, the fix needs to be in the Xen kernel.
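
For illustration only, here is a small user-space sketch of the idea in the quoted reply: cap scatter-gather merging so that no merged segment exceeds one page. It is not the block layer's code. The names sketch_seg, sketch_merge_segs and SKETCH_PAGE_SIZE are made up for this example, and the 4096-byte page size is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size, for illustration */

/* Hypothetical stand-in for one scatter-gather entry. */
struct sketch_seg {
    uint64_t addr;  /* start address of the segment */
    uint32_t len;   /* length in bytes */
};

/*
 * Merge adjacent segments in place, but never let a merged segment
 * grow beyond max_len bytes. Returns the new number of segments.
 */
static size_t sketch_merge_segs(struct sketch_seg *segs, size_t n,
                                uint32_t max_len)
{
    size_t out = 0;  /* index of the segment currently being extended */

    if (n == 0)
        return 0;

    for (size_t i = 1; i < n; i++) {
        struct sketch_seg *cur = &segs[out];
        int adjacent = (cur->addr + cur->len == segs[i].addr);

        if (adjacent && (uint64_t)cur->len + segs[i].len <= max_len)
            cur->len += segs[i].len;  /* extend the current segment */
        else
            segs[++out] = segs[i];    /* start a new segment */
    }
    return out + 1;
}
```

With a 64 KiB cap, four adjacent 4 KiB segments collapse into one 16 KiB segment; with the cap set to SKETCH_PAGE_SIZE they stay as four page-sized segments. This sketch only limits size. A stricter variant might also refuse to merge across a page boundary.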

Comment 7 Andrius Benokraitis 2007-11-19 16:06:10 EST
After confirming with QLogic: they have no patch, since they believe the problem
is not in the QLogic driver.
Comment 9 Stephen Tweedie 2007-12-03 17:10:13 EST
   PCI-DMA: Out of SW-IOMMU space for 4608 bytes at device 0000:02:0b.0
   PCI-DMA: Out of SW-IOMMU space for 65536 bytes at device 0000:08:03.1

Those are all sector-aligned, large IOs.  Looks like we're passing SG lists
through the swiotlb.

Now, Xen's swiotlb_map_sg simply does not use the swiotlb unless this test is true:

		if (swiotlb_force || address_needs_mapping(hwdev, dev_addr)) {

So: is "swiotlb=force" being used on the kernel options line in the case which
is breaking?

If not, then we need to work out why the SG list mapping is entering swiotlb.
"address_needs_mapping" should not be returning true for a
64-bit-addressing-capable adapter, so we'll probably need to run the reproducer
on an instrumented kernel to go any further.
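
As a rough user-space model of the test quoted above (not the RHEL 5 Xen source), assume address_needs_mapping() amounts to checking whether the bus address has bits set outside the device's DMA mask. The names sketch_dev, sketch_address_needs_mapping and sketch_must_bounce are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a device; not the kernel's struct device. */
struct sketch_dev {
    uint64_t dma_mask;  /* all-ones mask of the address bits the device can use */
};

/* True when bus_addr has bits set outside the device's DMA mask. */
static bool sketch_address_needs_mapping(const struct sketch_dev *dev,
                                         uint64_t bus_addr)
{
    return (bus_addr & ~dev->dma_mask) != 0;
}

/* Same shape as the quoted test: bounce if forced, or if unreachable. */
static bool sketch_must_bounce(bool force, const struct sketch_dev *dev,
                               uint64_t bus_addr)
{
    return force || sketch_address_needs_mapping(dev, bus_addr);
}
```

Under this model, a device with a full 64-bit mask never bounces unless forced. The same buffer at an address above 4 GiB does bounce if the mask in effect is only 32 bits wide.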

Comment 10 Andrew Vasquez 2007-12-12 17:07:06 EST
We've already cycled through this in some of the earlier bugzilla comments.

Comment #32 (https://bugzilla.redhat.com/show_bug.cgi?id=219216#c32) has details
on what dma_get_required_mask() is returning while run with this kernel (it's
a 32-bit mask).

Comment #44 (https://bugzilla.redhat.com/show_bug.cgi?id=219216#c44) is Rik's 
note on how dma_get_required_mask() is implemented incorrectly in a Xen kernel.

I believe EMC was able to easily reproduce this in their labs.
Comment 11 Stephen Tweedie 2007-12-19 12:02:57 EST
Comment #44 was:

> A related problem: dma_get_required_mask() is wrong if the Xen kernel is 
> booted on a large system with a dom0 smaller than the maximum machine size.

But there's no information I can see indicating that that is the case in this
particular instance.
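
One way to picture the concern in comment #44, as a sketch under stated assumptions: suppose the required mask is derived as the smallest all-ones mask covering the highest address of whatever memory size is fed in. The real dma_get_required_mask() differs in detail, and sketch_required_mask is a made-up name.

```c
#include <stdint.h>

/* Smallest all-ones mask that covers every byte address below mem_bytes. */
static uint64_t sketch_required_mask(uint64_t mem_bytes)
{
    uint64_t top = mem_bytes ? mem_bytes - 1 : 0;  /* highest valid address */
    uint64_t mask = 0;

    while (mask < top)
        mask = (mask << 1) | 1;  /* widen the mask by one bit */
    return mask;
}
```

Fed a 2 GiB dom0 size, this yields a 31-bit mask; fed a 16 GiB machine size, it yields a 34-bit mask. A mask computed from the dom0 size would therefore not cover a dom0 page that happens to sit at, say, the 8 GiB mark in machine memory.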

And we are _still_ missing any kernel boot logs indicating early boot
configuration: the only logs posted have been after significant uptime, and
contained little except for storage error messages.

Complete boot logs really are going to be helpful here, along with the kernel
and hypervisor options being used.  If "dmesg" no longer has them, you may need
to reboot to obtain them.
Comment 12 Marcus Barrow 2007-12-22 10:38:59 EST
Added pan_haifeng to the cc: list. Not sure he can read this BZ...

Comment 13 Andrius Benokraitis 2008-01-28 14:10:31 EST
Wayne/Pan - have you been able to reproduce this issue? I think your group is
the only one that can nail this down. If not, I say we close this...
Comment 14 Pan Haifeng 2008-01-29 11:04:11 EST
The issue cannot be reproduced now. We agree to close it and will reopen when we hit
