Description of problem:
Under heavy load, pci_alloc_consistent()/dma_alloc_coherent() does not return an address aligned to the size argument passed in. The documentation is clear on this:

"pci_alloc_consistent returns two values: the virtual address which you can use to access it from the CPU and dma_handle which you pass to the card. The cpu return address and the DMA bus master address are both guaranteed to be aligned to the smallest PAGE_SIZE order which is greater than or equal to the requested size. This invariant exists (for example) to guarantee that if you allocate a chunk which is smaller than or equal to 64 kilobytes, the extent of the buffer you receive will not cross a 64K boundary."

Version-Release number of selected component (if applicable):
97.el5

How reproducible:
100%

Steps to Reproduce:
1. Reserve all of DMA'able memory below 4G.
2. Use a 32-bit DMA mask and attempt to reserve memory. The DMA memory allocation code will attempt to get iommu pages. alloc_iommu() is broken -- it does not return aligned addresses.

Actual results:
Unaligned DMA addresses are returned from pci_alloc_consistent()/dma_alloc_coherent().

Expected results:
The addresses should be aligned to the size argument passed in.

Additional info:
A little tricky to reproduce. One approach is to load one module that exhausts DMA memory, and then another that requests more DMA memory with a 32-bit mask. Alternatively, you can short-circuit the code to do a map_single request (which calls the iommu code). The latter option became my preferred choice.

The solution was to backport some code from upstream (see attached patch). Additionally, a bug was found in that code, and a patch was sent upstream to fix it.
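For anyone wanting to check returned handles against the documented guarantee, the rule quoted above can be sketched in plain C roughly as follows. This is only an illustration, not the attached patch or kernel code: the `sketch_*` names are made up for this example, and a 4 KiB page size is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12u
#define SKETCH_PAGE_SIZE  ((uint64_t)1 << SKETCH_PAGE_SHIFT)

/* Smallest order such that (PAGE_SIZE << order) >= size. */
static unsigned int sketch_get_order(size_t size)
{
    unsigned int order = 0;

    assert(size > 0);
    while ((SKETCH_PAGE_SIZE << order) < size)
        order++;
    return order;
}

/* Alignment in bytes that the documentation promises for a request of `size`. */
static uint64_t sketch_required_alignment(size_t size)
{
    return SKETCH_PAGE_SIZE << sketch_get_order(size);
}

/* True if the bus address satisfies the documented alignment for `size`. */
static bool sketch_is_aligned(uint64_t dma_handle, size_t size)
{
    return (dma_handle & (sketch_required_alignment(size) - 1)) == 0;
}

/* True if [dma_handle, dma_handle + size) spans a multiple of `boundary`. */
static bool sketch_crosses_boundary(uint64_t dma_handle, size_t size,
                                    uint64_t boundary)
{
    return (dma_handle / boundary) != ((dma_handle + size - 1) / boundary);
}
```

With these helpers, a 64K request returned at bus address 0x11000 fails `sketch_is_aligned()` and is reported by `sketch_crosses_boundary()` as crossing a 64K boundary, which is the kind of result described under "Actual results". The same request at 0x10000 passes both checks.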
This work was based on the report from 298811. This allocation method is also broken in RHEL4.
Created attachment 312088 [details]
RHEL5 fix for this issue

Initial rough patch.
Created attachment 312089 [details]
Upstream bug fix related to this issue

Sent to LKML & Jesse Barnes.
Created attachment 312135 [details] Additional Upstream bug fix related to this issue
Patch in comment #4 sent to LKML and Jesse Barnes.
Created attachment 312136 [details] RHEL5 fix for this issue (with DMA short-circuit code and debug)
Created attachment 312164 [details] RHEL5 fix for this issue
Links to upstream submits for this issue:

http://marc.info/?l=linux-kernel&m=121664984730778&w=2
http://marc.info/?l=linux-kernel&m=121664984830791&w=2

P.
Created attachment 312463 [details]
Upstream patch that fixes this issue

Submitted to LKML.
Patch upstream here: http://marc.info/?l=linux-kernel&m=121681201313560&w=2 P.
This request was evaluated by Red Hat Product Management for inclusion, but this component is not scheduled to be updated in the current Red Hat Enterprise Linux release. If you would like this request to be reviewed for the next minor release, ask your support representative to set the next rhel-x.y flag to "?".
Created attachment 313699 [details]
RHEL5 fix for this issue

Posted patch.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
in kernel-2.6.18-110.el5

You can download this test kernel from http://people.redhat.com/dzickus/el5
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html