From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.1.4322)

Description of problem:
Implement a better solution for the DMA memory allocation done in the kernel when a 32-bit device is used on a 64-bit extended architecture OS. Basically, pci_alloc_consistent will have to change.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Put a 32-bit device (a SCSI or RAID device, for example) on RHEL 3 x86_64.
2. Add stress and wait for a kernel panic.

Additional info:
We would like to see this fixed for U5.
A proposed fix from Matt Domsch:

"linux-2.6.9-5.EL/arch/x86_64/kernel/pci-nommu.c:

/*
 * Dummy IO MMU functions
 */
void *dma_alloc_coherent(struct device *hwdev, size_t size,
                         dma_addr_t *dma_handle, unsigned gfp)
{
        void *ret;
        u64 mask;
        int order = get_order(size);

        if (hwdev)
                mask = hwdev->coherent_dma_mask & *hwdev->dma_mask;
        else
                mask = 0xffffffff;
        for (;;) {
                ret = (void *)__get_free_pages(gfp, order);
                if (ret == NULL)
                        return NULL;
                *dma_handle = virt_to_bus(ret);
                if ((*dma_handle & ~mask) == 0)
                        break;
                free_pages((unsigned long)ret, order);
                if (gfp & GFP_DMA)
                        return NULL;
                gfp |= GFP_DMA;
        }

        memset(ret, 0, size);
        return ret;
}

So RHEL4 doesn't have quite the same restriction. When it allocates a page, if that page isn't DMA-able by the device, then it frees it up and tries again, using the GFP_DMA flag this time. Because generally there are *some* pages available in ZONE_NORMAL that are below the 4GB address limit, this works quite well in practice.

The same idea could/should be backported to RHEL3, it's certainly not been done yet for the RHEL3 U4 or earlier kernels."
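For readers who want to trace the retry logic outside the kernel, here is a minimal user-space sketch of the control flow described in the quoted comment. It is not kernel code and not the RHEL3 backport: struct sim_zone, sim_get_page, sim_dma_alloc and SIM_GFP_DMA are hypothetical stand-ins invented for illustration, and bus addresses are modelled as plain integers, with 0 standing for a failed allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for GFP_DMA; not the kernel's value. */
#define SIM_GFP_DMA 0x1u

/* A simulated memory zone: a fixed list of bus addresses it hands out in order. */
struct sim_zone {
    const uint64_t *pages;  /* bus addresses this zone will return */
    size_t count;           /* number of entries in pages */
    size_t next;            /* index of the next address to hand out */
};

/* Return the next bus address from the zone, or 0 when the zone is exhausted. */
static uint64_t sim_get_page(struct sim_zone *z)
{
    if (z->next >= z->count)
        return 0;
    return z->pages[z->next++];
}

/*
 * Mirrors the loop in the quoted dma_alloc_coherent():
 *   1. allocate from the zone selected by flags;
 *   2. accept the page if every address bit falls inside the device mask;
 *   3. otherwise give the page up and retry once with the DMA flag set;
 *   4. fail if the DMA-zone page is also out of reach.
 */
uint64_t sim_dma_alloc(struct sim_zone *normal, struct sim_zone *dma,
                       uint64_t mask, unsigned flags)
{
    assert(normal != NULL && dma != NULL);

    for (;;) {
        struct sim_zone *z = (flags & SIM_GFP_DMA) ? dma : normal;
        uint64_t addr = sim_get_page(z);

        if (addr == 0)
            return 0;               /* allocator is out of pages */
        if ((addr & ~mask) == 0)
            return addr;            /* device can reach this page */
        /* The real code calls free_pages() here; the sketch just drops the page. */
        if (flags & SIM_GFP_DMA)
            return 0;               /* already tried the DMA zone */
        flags |= SIM_GFP_DMA;       /* retry, restricted to the DMA zone */
    }
}
```

With a 32-bit mask of 0xffffffff, a normal zone that first returns an address above 4GB, and a DMA zone that returns a low address, the first attempt is rejected and the second succeeds. A normal-zone page that already lies below 4GB is accepted without touching the DMA zone, which is the case the quoted comment says is common in practice.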
PM ACK for U6
Development ACK. We are waiting for Emulex to verify this works OK for them. Larry
Created attachment 117134
This is the x86_64-swiotlb.patch that is being referred to here.
Larry Troan, regarding comment #25, this is a RHEL3 bug. Why is building the patch into a RHEL4 kernel relevant?
Per Matt.... Finger check.... kernel-2.4.21-32.EL.smp RHEL3.... This event sent from IssueTracker by ltroan, issue 73055
U6 is closed (and in beta already).
This bug apparently is not a DUP of Bug 146954 (Engineering's call) but is tied to it. It is believed that a common patch will resolve both the problems described here and those described in Bug 146954. Both bugs are now public.
Joshua, per Matt's email, can we close this bug since Dell no longer requires a solution to this problem?
Clarifying: Dell's solution is for customers experiencing this problem to migrate to RHEL4.
A fix for this problem was committed to the RHEL3 U8 patch pool this evening (in kernel version 2.4.21-40.7.EL).
Closing per customer feedback. A patch was included that adds the "maxdma" option, which will work around this problem.
Reverting to ON_QA after reopening.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2006-0437.html