Bug 107336 - i386 page_to_phys() bug for enterprise kernels
Summary: i386 page_to_phys() bug for enterprise kernels
Alias: None
Product: Red Hat Enterprise Linux 2.1
Classification: Red Hat
Component: kernel
Version: 2.1
Hardware: All
OS: Linux
Target Milestone: ---
Assignee: Jason Baron
QA Contact: Brian Brock
Depends On:
TreeView+ depends on / blocked
Reported: 2003-10-17 01:50 UTC by Roland Dreier
Modified: 2013-03-06 05:56 UTC (History)
3 users (show)

Clone Of:
Last Closed: 2004-05-11 16:48:15 UTC

Attachments (Terms of Use)
Fix for the problem (431 bytes, patch)
2004-05-11 16:53 UTC, Martin Wilck
no flags Details | Diff

Description Roland Dreier 2003-10-17 01:50:03 UTC
The Red Hat 2.1 enterprise and summit kernels (2.4.9-e.XXenterprise,
2.4.9-e.XXsummit) seem to have a broken implementation of page_to_phys().  They
define the macro with the code

    #ifdef CONFIG_HIGHMEM64G
    #define page_to_phys(page)     ((u64)(page - mem_map) << PAGE_SHIFT)
    #define page_to_phys(page)	((page - mem_map) << PAGE_SHIFT)

but in their autoconf.h they have

    #undef  CONFIG_HIGHMEM64G

Since CONFIG_HIGHMEM64G is NOT set, they get the wrong definition of
page_to_phys() and truncate the resulting addresses to 32 bits, so things get
screwed up on machines with more than 4G of RAM.

The use of page_to_phys() in the RH 2.1 kernel seems to be limited, but this bug
does affect our out-of-tree driver.

This applies to all enterprise and summit kernels up to (at least) 2.4.9-e.27.

Comment 1 Rik van Riel 2003-10-17 02:15:32 UTC
Looks like an easy to fix bug that should just be fixed.  Jason?

Comment 2 Arjan van de Ven 2003-10-17 07:48:30 UTC
Scsi drivers and co shouldn't really get 64 bit addresses actually, more stuff
is broken than just this...

Comment 4 Jason Baron 2004-02-03 21:53:11 UTC
unless this is causing a specific problem, i vote to close this.

Comment 5 Roland Dreier 2004-02-03 22:06:46 UTC
This affects our (out-of-tree) InfiniBand drivers.  Right now we just have

#if defined(__KERNEL__)
#  if defined(__i386__) && (LINUX_VERSION_CODE ==
/* Work around RH AS 2.1 configuration bug */
#    undef page_to_phys
#    define page_to_phys(page) ((u64)(page - mem_map) << PAGE_SHIFT)
#  endif

which is ugly but works.  However I'm not sure why you want to leave
an obvious, simple-to-fix bug in your kernel (lurking to bite other
people in the future, since it causes someone to silently get the
wrong physical address for a page, leading to all sorts of fun
depending on how that address is used).

Up to you I guess.

Comment 6 Martin Wilck 2004-05-11 16:45:16 UTC
page_to_phys() is referenced by page_to_bus(), which is in turn used
by pci_map_sg(), which is used all over the place in driver code.

Severe problems will arise whenever a driver capable of using bus
addresses >32bit does DMA. AFAICS, several such drivers come with
RHAS2.1 (think megaraid, for example),

I think that this is a serious bug that may lead to machine crashes
and/or data corruption, and is unacceptable in an enterprise Linux

Please fix it as soon as possible, or show me why my argument is wrong.

If, as comment #3 suggests, "more stuff [of similar severity] is
broken than just this", then I would really like to know what that
broken stuff is and what our enterprise customers should do to avoid 
being hit by it.

Comment 7 Martin Wilck 2004-05-11 16:46:17 UTC
Added myself to cc list.

Comment 8 Arjan van de Ven 2004-05-11 16:48:15 UTC
"Severe problems will arise whenever a driver capable of using bus
addresses >32bit does DMA. "

that's the flawed reasoning, since in AS2.1 no driver will. 
(in your example: even though megaraid tells the kernel it can do >
4Gb the kernel will NEVER give it an address, and will pretend
megaraid told it that it's limit was 4Gb).

Comment 9 Martin Wilck 2004-05-11 16:53:25 UTC
Created attachment 100153 [details]
Fix for the problem

I do not understand why this simple fix isn't just applied.

Comment 10 Martin Wilck 2004-05-11 17:07:54 UTC

thanks for your reply.

Can you point me to the code where the kernel makes sure no address
>4GB is ever used in an SG list?

I can't find it.

Comment 11 Arjan van de Ven 2004-05-11 17:12:50 UTC
+            bounce_limit = (unsigned long)SHpnt->pci_dev->dma_mask;

that line in drivers/scsi/scsi_merge.c 

Comment 12 Martin Wilck 2004-05-11 17:34:26 UTC
Uff, the "(unsigned long)". Thanks. Man, that line deserves a comment.

Comment 13 Arjan van de Ven 2004-05-11 17:35:51 UTC
I'll be the first to admit that it's really really subtle and I only
know it's there because I put it there ;(

Comment 14 Martin Wilck 2004-05-11 17:42:32 UTC
It helps only for SCSI drivers though. What about network drivers? Or
other PCI devices, or even 3rd party modules? I still think
page_to_phys() should be fixed, or at least a big fat comment should
be placed there that it's only valid below 4GB.

Comment 15 Arjan van de Ven 2004-05-11 17:45:53 UTC
network driver are unaffected by this; others are expected to follow
the example. I can agree with the idea of putting a comment there,
sure. I wonder if that's worth it at this stage in the product
lifecycle though.

Note You need to log in before you can comment on or make changes to this bug.