The Red Hat 2.1 enterprise and summit kernels (2.4.9-e.XXenterprise, 2.4.9-e.XXsummit) seem to have a broken implementation of page_to_phys(). They define the macro with the code

    #ifdef CONFIG_HIGHMEM64G
    #define page_to_phys(page) ((u64)(page - mem_map) << PAGE_SHIFT)
    #else
    #define page_to_phys(page) ((page - mem_map) << PAGE_SHIFT)
    #endif

but their autoconf.h has

    #undef CONFIG_HIGHMEM64G
    #define CONFIG_HIGHMEM64G_HIGHPTE 1

Since CONFIG_HIGHMEM64G is NOT set, they get the wrong definition of page_to_phys(), which truncates the resulting addresses to 32 bits. On machines with more than 4G of RAM, any page above the 4G boundary therefore gets the wrong physical address. The use of page_to_phys() in the RH 2.1 kernel seems to be limited, but this bug does affect our out-of-tree driver. This applies to all enterprise and summit kernels up to (at least) 2.4.9-e.27.
Looks like an easy-to-fix bug that should just be fixed. Jason?
SCSI drivers and co. shouldn't actually be getting 64-bit addresses; more stuff is broken than just this...
Unless this is causing a specific problem, I vote to close this.
This affects our (out-of-tree) InfiniBand drivers. Right now we just have

    #if defined(__KERNEL__)
    #  if defined(__i386__) && (LINUX_VERSION_CODE == KERNEL_VERSION(2,4,9)) && defined(CONFIG_HIGHMEM64G_HIGHPTE)
         /* Work around RH AS 2.1 configuration bug */
    #    undef page_to_phys
    #    define page_to_phys(page) ((u64)(page - mem_map) << PAGE_SHIFT)
    #  endif
    #endif

which is ugly but works. However, I'm not sure why you want to leave an obvious, simple-to-fix bug in your kernel (lurking to bite other people in the future, since it causes someone to silently get the wrong physical address for a page, leading to all sorts of fun depending on how that address is used). Up to you, I guess.
page_to_phys() is referenced by page_to_bus(), which is in turn used by pci_map_sg(), which is used all over the place in driver code. Severe problems will arise whenever a driver capable of using bus addresses >32bit does DMA. AFAICS, several such drivers come with RHAS2.1 (think megaraid, for example). I think this is a serious bug that may lead to machine crashes and/or data corruption, and it is unacceptable in an enterprise Linux distribution. Please fix it as soon as possible, or show me why my argument is wrong. If, as comment #3 suggests, "more stuff [of similar severity] is broken than just this", then I would really like to know what that broken stuff is and what our enterprise customers should do to avoid being hit by it.
Added myself to cc list.
"Severe problems will arise whenever a driver capable of using bus addresses >32bit does DMA." That's the flawed reasoning, since in AS2.1 no driver will ever be handed a bus address above 4GB. (In your example: even though megaraid tells the kernel it can do >4GB, the kernel will NEVER give it an address above 4GB, and will act as if megaraid had reported a 4GB limit.)
Created attachment 100153: Fix for the problem

I do not understand why this simple fix isn't just applied.
Arjan, thanks for your reply. Can you point me to the code where the kernel makes sure no address >4GB is ever used in an SG list? I can't find it.
That line in drivers/scsi/scsi_merge.c:

    + bounce_limit = (unsigned long)SHpnt->pci_dev->dma_mask;
Uff, the "(unsigned long)". Thanks. Man, that line deserves a comment.
I'll be the first to admit that it's really, really subtle, and I only know it's there because I put it there ;(
It helps only for SCSI drivers, though. What about network drivers? Or other PCI devices, or even third-party modules? I still think page_to_phys() should be fixed, or at least a big fat comment should be placed there saying that it's only valid below 4GB.
Network drivers are unaffected by this; other drivers are expected to follow the SCSI example. I can agree with the idea of putting a comment there, sure. I wonder whether that's worth it at this stage in the product lifecycle, though.