Bug 433554
| Summary: | [RHEL5 U2] Kernel-xen PCI-DMA: Out of SW-IOMMU space for 57344 bytes at device 0000:03:04.0 | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Jeff Burke <jburke> |
| Component: | kernel-xen | Assignee: | Stephen Tweedie <sct> |
| Status: | CLOSED ERRATA | QA Contact: | Martin Jenner <mjenner> |
| Severity: | medium | Docs Contact: | |
| Priority: | high | ||
| Version: | 5.2 | CC: | aspi.charna, berthiaume_wayne, dzickus, gozen, jbastian, jmoyer, mbarrow, mgahagan, pan_haifeng, qcai, qlogic-redhat-ext, sn, tao, xen-maint |
| Target Milestone: | rc | Keywords: | Regression |
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| URL: | http://rhts.redhat.com/testlogs/15796/55410/456250/boot.kernel-xen-2.6.18-79.el5 | ||
| Whiteboard: | |||
| Fixed In Version: | RHBA-2008-0314 | Doc Type: | Bug Fix |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2008-05-21 15:10:18 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 391501, 445799 | ||
| Attachments: | |||
Description
Jeff Burke
2008-02-19 23:19:43 UTC
Created attachment 295354 [details]
Boot log for kernel-xen-2.6.18-79.el5
Created attachment 295355 [details]
Boot log for kernel-xen-2.6.18-53.el5
< Notes from Chip Coldwell >
mptsas is causing this. We may be wrong, but we don't know what other I/O device
might be doing large chunks of DMA.
Actually, not such a big assumption. The boot log has this:
ACPI: PCI Interrupt 0000:03:04.0[A] -> GSI 19 (level, low) -> IRQ 16
mptbase: ioc0: Initiating bringup
ioc0: LSISAS1064 A3: Capabilities={Initiator}
scsi0 : ioc0: LSISAS1064 A3, FwRev=000a0f00h, Ports=1, MaxQ=511, IRQ=16
Vendor: IBM-ESXS Model: MAY2036RC Rev: T106
Type: Direct-Access ANSI SCSI revision: 05
and the error messages are
PCI-DMA: Out of SW-IOMMU space for 65536 bytes at device 0000:03:04.0
I think it's pretty clear that the device at 0000:03:04.0 is mptsas.
--------------------------------------------------------------------------------
OK, mptsas_qcmd calls mptscsih_qcmd, which in turn will call either
pci_map_sg or pci_map_single, which are #defines for dma_map_(sg|single).
That's the end of the code path that leads to that error message. I cannot see
anywhere else where mptsas is calling into the SW-IOMMU.
mptsas_qcmd is installed as the .queuecommand method in the
mptsas_driver_template (an instance of struct scsi_host_template). This gets
called by scsi_dispatch_cmd, itself called by scsi_request_fn. What this boils
down to is that those requests are coming from I/Os submitted to the HBA.
< End of notes from Chip Coldwell >
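For readers unfamiliar with the dispatch path described in the notes above, here is a minimal userspace sketch of it. Every mock_ name below is hypothetical and only stands in for the kernel symbol named in its comment; none of this is the real mptsas or SCSI midlayer code, and the real functions take different arguments.

```c
/* Userspace model of the call chain from the notes. All mock_ names are
 * hypothetical stand-ins, not kernel definitions. */
struct mock_cmd {
    unsigned int nbytes;  /* size of the transfer */
    int use_sg;           /* non-zero: scatter-gather request */
};

/* Stands in for struct scsi_host_template and its .queuecommand method. */
struct mock_host_template {
    const char *name;
    int (*queuecommand)(struct mock_cmd *cmd);
};

static int mock_sg_maps;      /* counts calls standing in for pci_map_sg */
static int mock_single_maps;  /* counts calls standing in for pci_map_single */

static void mock_map_sg(struct mock_cmd *cmd)     { (void)cmd; mock_sg_maps++; }
static void mock_map_single(struct mock_cmd *cmd) { (void)cmd; mock_single_maps++; }

/* Stands in for mptscsih_qcmd: picks one of the two mapping calls. */
static int mock_scsih_qcmd(struct mock_cmd *cmd)
{
    if (cmd->use_sg)
        mock_map_sg(cmd);
    else
        mock_map_single(cmd);
    return 0;
}

/* Stands in for mptsas_qcmd: forwards to the shared handler. */
static int mock_sas_qcmd(struct mock_cmd *cmd)
{
    return mock_scsih_qcmd(cmd);
}

/* Stands in for mptsas_driver_template. */
static struct mock_host_template mock_template = {
    .name = "mptsas",
    .queuecommand = mock_sas_qcmd,
};

/* Stands in for scsi_dispatch_cmd: calls the driver's method. */
static int mock_dispatch(struct mock_host_template *t, struct mock_cmd *cmd)
{
    return t->queuecommand(cmd);
}
```

The point of the model is only that every mapping call is reached through the template's queuecommand pointer, which is why the notes conclude that the failing DMA mappings come from ordinary I/O submitted to the HBA.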
I'm getting quite a few of these errors during boot with kernel 2.6.18-82.el5-xen:
PCI-DMA: Out of SW-IOMMU space for 65536 bytes at device 0000:00:1f.2
ata1.00: exception Emask 0x0 SAct 0x40 SErr 0x0 action 0x0
ata1.00: cmd 61/00:30:d7:b7:7b/04:00:02:00:00/40 tag 6 ncq 524288 out
res 40/00:1c:d7:a7:7b/00:00:02:00:00/40 Emask 0x40 (internal error)
ata1.00: status: { DRDY }
ata1.00: configured for UDMA/133
ata1: EH complete
dmesg shows this for device 0000:00:1f.2:
libata version 3.00 loaded.
ahci 0000:00:1f.2: version 3.0
GSI 22 sharing vector 0xD0 and IRQ 22
ACPI: PCI Interrupt 0000:00:1f.2[C] -> GSI 20 (level, low) -> IRQ 22
ahci 0000:00:1f.2: AHCI 0001.0100 32 slots 4 ports 3 Gbps 0xf impl SATA mode
ahci 0000:00:1f.2: flags: 64bit ncq pm led clo pio slum part
PCI: Setting latency timer of device 0000:00:1f.2 to 64
scsi0 : ahci
scsi1 : ahci
scsi2 : ahci
scsi3 : ahci
ata1: SATA max UDMA/133 abar m1024@0xff970000 port 0xff970100 irq 22
ata2: SATA max UDMA/133 abar m1024@0xff970000 port 0xff970180 irq 22
ata3: SATA max UDMA/133 abar m1024@0xff970000 port 0xff970200 irq 22
ata4: SATA max UDMA/133 abar m1024@0xff970000 port 0xff970280 irq 22
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: ATA-7: SAMSUNG HD160JJ/P, ZM100-34, max UDMA7
ata1.00: 312500000 sectors, multi 8: LBA48 NCQ (depth 31/32)
ata1.00: configured for UDMA/133
Created attachment 295526 [details]
dmesg output for the machine in question
Jeff, for the later errors on PCI address 0000:00:1f.2 from comment #4, what's the controller in question? And do we know which was the latest kernel NOT to show these problems?

Stephen,
There are 2 Jeffs on this BZ. I see that you put in NEEDINFO for me. But can you be a little more specific? Which comment are you asking about?
Thanks,
JeffB

Stephen, this is the chunk that causes the problem. It was added in the -70.el5
kernel by Bill. As you can see, all it does is enforce the dma restrictions,
nothing too serious. The result is magnifying issues in the scsi layer it appears.
Before, address_needs_mapping would fail because the whole 64-bit range was
masked (which is expected). Now the code is checking to make sure the sg list
is a chain of pages, which in the printks we are seeing are clearly not.
@@ -529,7 +529,9 @@ swiotlb_map_sg(struct device *hwdev, struct scatterlist *sg, int nelems,

         for (i = 0; i < nelems; i++, sg++) {
                 dev_addr = SG_ENT_PHYS_ADDRESS(sg);
-                if (address_needs_mapping(hwdev, dev_addr)) {
+                if (range_straddles_page_boundary(page_to_pseudophys(sg->page)
+                                                  + sg->offset, sg->length)
+                    || address_needs_mapping(hwdev, dev_addr)) {
                         buffer.page = sg->page;
                         buffer.offset = sg->offset;
                         map = map_single(hwdev, buffer, sg->length, dir);
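To make the effect of the added condition concrete, here is a small userspace model of the old and new bounce decisions. It is a sketch under assumptions: a 4 KiB page size, a mask test of the form (addr & ~mask) != 0, and a straddle test that looks only at the offset and length. The model_ names are hypothetical, and the real range_straddles_page_boundary() may consult additional state (the patch header later in this report mentions regions created by xen_create_contiguous_region).

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed 4 KiB pages; all model_ names are hypothetical. */
#define MODEL_PAGE_SIZE  UINT64_C(4096)
#define MODEL_PAGE_MASK  (~(MODEL_PAGE_SIZE - 1))

/* 1 when [addr, addr + len) extends past the end of the page holding addr. */
static int model_straddles_page(uint64_t addr, size_t len)
{
    return ((addr & ~MODEL_PAGE_MASK) + len) > MODEL_PAGE_SIZE;
}

/* 1 when the address has bits set outside the device's DMA mask. */
static int model_needs_mapping(uint64_t dev_addr, uint64_t dma_mask)
{
    return (dev_addr & ~dma_mask) != 0;
}

/* Old condition: bounce only when the device cannot reach the address. */
static int model_bounce_old(uint64_t addr, size_t len, uint64_t dma_mask)
{
    (void)len;
    return model_needs_mapping(addr, dma_mask);
}

/* New condition: also bounce any segment that crosses a page boundary. */
static int model_bounce_new(uint64_t addr, size_t len, uint64_t dma_mask)
{
    return model_straddles_page(addr, len) ||
           model_needs_mapping(addr, dma_mask);
}
```

Under this model, a page-aligned 65536-byte segment on a device with a full 64-bit mask is never bounced by the old condition but always bounced by the new one, which matches the 65536-byte requests in the error messages.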
My question about the PCI address specifically referred to comment #4, so I'm asking Jeff M... But the question "do we know which was the latest kernel NOT to show these problems?" is a general request applicable to all of the instances of swiotlb in this BZ, so I'm leaving it open as NEEDINFO(reporter) in general, as I can't set the request to multiple people in BZ.

And to add yet _another_ person to the virtual NEEDINFO list... Don, do we have confirmation that backing out that one section eliminates the messages?

(In reply to comment #6)
> Jeff, for the later errors on PCI address 0000:00:1f.2 from comment #4, what's
> the controller in question?
>
00:1f.2 SATA controller: Intel Corporation 82801GR/GH (ICH7 Family) SATA AHCI Controller (rev 01) (prog-if 01 [AHCI 1.0])
        Subsystem: Dell Unknown device 01de
        Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 22
        I/O ports at fe00 [size=8]
        I/O ports at fe10 [size=4]
        I/O ports at fe20 [size=8]
        I/O ports at fe30 [size=4]
        I/O ports at fec0 [size=16]
        Memory at ff970000 (32-bit, non-prefetchable) [size=1K]
        Capabilities: [80] Message Signalled Interrupts: 64bit- Queue=0/0 Enable-
        Capabilities: [70] Power Management version 2

> And do we know which was the latest kernel NOT to show these problems?

No, I haven't tried earlier kernels. Would you like me to narrow it down? It will mean rebooting my workstation.

(In reply to comment #10)
No, I haven't confirmed it yet. It seemed obvious, but then again I guess it could be a combination with another patch (even though -70.el5 is mostly xen patches).

(In reply to comment #8)
> Stephen, this is the chunk that causes the problem. It was added in the -70.el5
> kernel by Bill. As you can see, all it does is enforce the dma restrictions,
> nothing too serious. The result is magnifying issues in the scsi layer it appears.
>
> Before, address_needs_mapping would fail because the whole 64-bit range was
> masked (which is expected). Now the code is checking to make sure the sg list
> is a chain of pages, which in the printks we are seeing are clearly not.
>
> @@ -529,7 +529,9 @@ swiotlb_map_sg(struct device *hwdev, struct scatterlist *sg,
> int nelems,
>
> for (i = 0; i < nelems; i++, sg++) {
> dev_addr = SG_ENT_PHYS_ADDRESS(sg);
> - if (address_needs_mapping(hwdev, dev_addr)) {
> + if (range_straddles_page_boundary(page_to_pseudophys(sg->page)
> + + sg->offset, sg->length)
> + || address_needs_mapping(hwdev, dev_addr)) {
> buffer.page = sg->page;
> buffer.offset = sg->offset;
> map = map_single(hwdev, buffer, sg->length, dir);

Don, please include a patch name in the future so I don't have to go digging. Looking at the patch (linux-2.6-xen-handle-multi-page-segments-in-dma_map_sg.patch), the chunk you mention is applied in arch/i386. I am running on an x86_64 box.

JeffM,
However, there is a *lot* of crossover between these two arches, especially
in the Xen case. And, if you look in arch/x86_64/kernel/Makefile, you'll see
that the i386 version is built in the x86_64 case as well. Another reason the
i386/x86_64 upstream merge was good, but we have to live with it for RHEL-5.
Chris Lalancette
Actually that isn't true. If you look in lib/Makefile you will find that not only do x86_64 and i386 share the same swiotlb.c file but it _differs_ from bare-metal. Which explains why you don't see it there.

I'm changing this back to ASSIGNED as all of the questions have been answered. Stephen, if you need to know if backing out that patch will fix things, then I'll kick off a build. Don seems convinced that the cause has been identified, though.

*** Bug 436265 has been marked as a duplicate of this bug. ***

*** Bug 436111 has been marked as a duplicate of this bug. ***

Created attachment 298053 [details]
Don't perform unnecessarily swiotlb copies
Possible fix: when we receive page-spanning scatter-gather segments which
happen to be machine-contiguous already, don't copy them via swiotlb
unnecessarily.
Fuller log for the fix, copied straight from the patch header:
xen dma: avoid unnecessarily SWIOTLB bounce buffering.
On Xen kernels, BIOVEC_PHYS_MERGEABLE permits merging of disk IOs that
span multiple pages, provided that the pages are both pseudophysically-
AND machine-contiguous ---
(((bvec_to_phys((vec1)) + (vec1)->bv_len) == bvec_to_phys((vec2))) && \
((bvec_to_pseudophys((vec1)) + (vec1)->bv_len) == \
bvec_to_pseudophys((vec2))))
However, this best-effort merging of adjacent pages can occur in
regions of dom0 memory which just happen, by virtue of having been
initially set up that way, to be machine-contiguous. Such pages
which occur outside of a range created by
xen_create_contiguous_region won't be seen as contiguous by
range_straddles_page_boundary(), so the pci-dma-xen.c code for
dma_map_sg() will send these regions to the swiotlb for bounce
buffering.
In RHEL-5.1 this did not happen, because we did not have the check
for range_straddles_page_boundary() in that code. Now that that check
has been added, these SG ranges --- which ARE machine contiguous and
which can perfectly well be sent to a dma engine --- are being bounce-
buffered in the swiotlb instead, causing a performance overhead and
potentially leading to early swiotlb exhaustion.
This patch adds a new check, check_pages_physically_contiguous(),
to the swiotlb_map_sg() code to capture these ranges and map them
directly via virt_to_bus() mapping rather than through the swiotlb.
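The check described in the patch header can be illustrated with a userspace sketch. This is not the attached patch: the sketch_ names, the tiny pseudophysical-to-machine table, and the 4 KiB page size are all assumptions made for illustration. The idea shown is to walk the pages a segment covers and confirm that their machine frame numbers are consecutive, and to bounce only when they are not.

```c
#include <stddef.h>

/* Assumed 4 KiB pages; all sketch_ names are hypothetical. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  ((size_t)1 << SKETCH_PAGE_SHIFT)

/* Made-up pseudophysical-to-machine table: index is the pfn, value the mfn. */
static const unsigned long sketch_p2m[] = { 100, 101, 102, 200, 201, 50 };
#define SKETCH_NR_PFNS (sizeof(sketch_p2m) / sizeof(sketch_p2m[0]))

/* 1 when every page covered by the segment maps to consecutive mfns. */
static int sketch_pages_machine_contiguous(unsigned long pfn,
                                           unsigned int offset,
                                           size_t length)
{
    size_t nr_pages = (offset + length + SKETCH_PAGE_SIZE - 1)
                      >> SKETCH_PAGE_SHIFT;
    unsigned long first_mfn;
    size_t i;

    if (pfn >= SKETCH_NR_PFNS)
        return 0;
    first_mfn = sketch_p2m[pfn];

    for (i = 1; i < nr_pages; i++) {
        if (pfn + i >= SKETCH_NR_PFNS)
            return 0;               /* segment runs off the model table */
        if (sketch_p2m[pfn + i] != first_mfn + i)
            return 0;               /* gap in machine frame numbers */
    }
    return 1;
}

/* Bounce only when the segment crosses a page and is not machine-contiguous. */
static int sketch_needs_bounce(unsigned long pfn, unsigned int offset,
                               size_t length)
{
    if (offset + length <= SKETCH_PAGE_SIZE)
        return 0;                   /* fits in a single page */
    return !sketch_pages_machine_contiguous(pfn, offset, length);
}
```

With the table above, a three-page segment starting at pfn 0 maps to mfns 100, 101, 102 and is passed through directly, while a two-page segment starting at pfn 2 maps to mfns 102 and 200 and would still be bounced.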
The patched kernel fixes the problem on my system; I no longer see any of the messages pertaining to SW-IOMMU exhaustion.

The 2.6.18-85.el5.swiotlbfix test kernel fixes the issue seen in RHTS as well.

*** Bug 438799 has been marked as a duplicate of this bug. ***

Created attachment 299461 [details]
Detect physically-contiguous pages when determining if memory spans a page boundary

Updates the previous patch (attachment 298053 [details]). The same test is still performed, but now in the core Xen dma layer, not in the swiotlb code, so the fix still works if we run with swiotlb=off.

Setting flags.

*** Bug 437031 has been marked as a duplicate of this bug. ***

*** Bug 440229 has been marked as a duplicate of this bug. ***

in kernel-2.6.18-89.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

*** Bug 441984 has been marked as a duplicate of this bug. ***

in kernel-2.6.18-90.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Sorry, disregard previous comment

*** Bug 442347 has been marked as a duplicate of this bug. ***

*** Bug 442094 has been marked as a duplicate of this bug. ***

Adding QLogic and EMC to this bug.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0314.html