107336 – i386 page_to_phys() bug for enterprise kernels

Bug 107336 - i386 page_to_phys() bug for enterprise kernels

Summary: i386 page_to_phys() bug for enterprise kernels

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 2.1
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	2.1
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Jason Baron
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2003-10-17 01:50 UTC by Roland Dreier
Modified:	2013-03-06 05:56 UTC (History)
CC List:	3 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2004-05-11 16:48:15 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)
Fix for the problem (431 bytes, patch) 2004-05-11 16:53 UTC, Martin Wilck	no flags	Details \| Diff
View All

Description Roland Dreier 2003-10-17 01:50:03 UTC

The Red Hat 2.1 enterprise and summit kernels (2.4.9-e.XXenterprise,
2.4.9-e.XXsummit) seem to have a broken implementation of page_to_phys().  They
define the macro with the code

    #ifdef CONFIG_HIGHMEM64G
    #define page_to_phys(page)     ((u64)(page - mem_map) << PAGE_SHIFT)
    #else
    #define page_to_phys(page)	((page - mem_map) << PAGE_SHIFT)
    #endif

but in their autoconf.h they have

    #undef  CONFIG_HIGHMEM64G
    #define CONFIG_HIGHMEM64G_HIGHPTE 1

Since CONFIG_HIGHMEM64G is NOT set, they get the wrong definition of
page_to_phys() and truncate the resulting addresses to 32 bits, so things get
screwed up on machines with more than 4G of RAM.

The use of page_to_phys() in the RH 2.1 kernel seems to be limited, but this bug
does affect our out-of-tree driver.

This applies to all enterprise and summit kernels up to (at least) 2.4.9-e.27.

Comment 1 Rik van Riel 2003-10-17 02:15:32 UTC

Looks like an easy to fix bug that should just be fixed.  Jason?

Comment 2 Arjan van de Ven 2003-10-17 07:48:30 UTC

Scsi drivers and co shouldn't really get 64 bit addresses actually, more stuff
is broken than just this...

Comment 4 Jason Baron 2004-02-03 21:53:11 UTC

unless this is causing a specific problem, i vote to close this.

Comment 5 Roland Dreier 2004-02-03 22:06:46 UTC

This affects our (out-of-tree) InfiniBand drivers.  Right now we just have

#if defined(__KERNEL__)
#  if defined(__i386__) && (LINUX_VERSION_CODE ==
KERNEL_VERSION(2,4,9)) && defined(CONFIG_HIGHMEM64G_HIGHPTE)
/* Work around RH AS 2.1 configuration bug */
#    undef page_to_phys
#    define page_to_phys(page) ((u64)(page - mem_map) << PAGE_SHIFT)
#  endif
#endif

which is ugly but works.  However I'm not sure why you want to leave
an obvious, simple-to-fix bug in your kernel (lurking to bite other
people in the future, since it causes someone to silently get the
wrong physical address for a page, leading to all sorts of fun
depending on how that address is used).

Up to you I guess.

Comment 6 Martin Wilck 2004-05-11 16:45:16 UTC

page_to_phys() is referenced by page_to_bus(), which is in turn used
by pci_map_sg(), which is used all over the place in driver code.

Severe problems will arise whenever a driver capable of using bus
addresses >32bit does DMA. AFAICS, several such drivers come with
RHAS2.1 (think megaraid, for example),

I think that this is a serious bug that may lead to machine crashes
and/or data corruption, and is unacceptable in an enterprise Linux
distribution. 

Please fix it as soon as possible, or show me why my argument is wrong.

If, as comment #3 suggests, "more stuff [of similar severity] is
broken than just this", then I would really like to know what that
broken stuff is and what our enterprise customers should do to avoid 
being hit by it.

Comment 7 Martin Wilck 2004-05-11 16:46:17 UTC

Added myself to cc list.

Comment 8 Arjan van de Ven 2004-05-11 16:48:15 UTC

"Severe problems will arise whenever a driver capable of using bus
addresses >32bit does DMA. "

that's the flawed reasoning, since in AS2.1 no driver will. 
(in your example: even though megaraid tells the kernel it can do >
4Gb the kernel will NEVER give it an address, and will pretend
megaraid told it that it's limit was 4Gb).

Comment 9 Martin Wilck 2004-05-11 16:53:25 UTC

Created attachment 100153 [details]
Fix for the problem

I do not understand why this simple fix isn't just applied.

Comment 10 Martin Wilck 2004-05-11 17:07:54 UTC

Arjan,

thanks for your reply.

Can you point me to the code where the kernel makes sure no address
>4GB is ever used in an SG list?

I can't find it.

Comment 11 Arjan van de Ven 2004-05-11 17:12:50 UTC

+            bounce_limit = (unsigned long)SHpnt->pci_dev->dma_mask;

that line in drivers/scsi/scsi_merge.c

Comment 12 Martin Wilck 2004-05-11 17:34:26 UTC

Uff, the "(unsigned long)". Thanks. Man, that line deserves a comment.

Comment 13 Arjan van de Ven 2004-05-11 17:35:51 UTC

I'll be the first to admit that it's really really subtle and I only
know it's there because I put it there ;(

Comment 14 Martin Wilck 2004-05-11 17:42:32 UTC

It helps only for SCSI drivers though. What about network drivers? Or
other PCI devices, or even 3rd party modules? I still think
page_to_phys() should be fixed, or at least a big fat comment should
be placed there that it's only valid below 4GB.

Comment 15 Arjan van de Ven 2004-05-11 17:45:53 UTC

network driver are unaffected by this; others are expected to follow
the example. I can agree with the idea of putting a comment there,
sure. I wonder if that's worth it at this stage in the product
lifecycle though.

Note You need to log in before you can comment on or make changes to this bug.