Bug 186316 - nvidia cache aliasing problem: change_page_attr drops GLOBAL bit from executable kernel pages
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: i686 Linux
Priority: medium, Severity: medium
Assigned To: Jason Baron
QA Contact: Brian Brock
Blocks: 181409
Reported: 2006-03-22 16:14 EST by Terence Ripperda
Modified: 2013-03-06 00:59 EST
Fixed In Version: RHSA-2006-0575
Doc Type: Bug Fix
Last Closed: 2006-08-10 18:54:32 EDT


Description Terence Ripperda 2006-03-22 16:14:33 EST
Description of problem:

OK, in 2.6.9-22.0.2.EL's __change_page_attr(), the code that calls
split_large_page() looks like this:

 
                        ref_prot = 
                        ((address & LARGE_PAGE_MASK) < (unsigned long)&_etext) 
                                ? PAGE_KERNEL_EXEC : PAGE_KERNEL; 
                        split = split_large_page(address, prot, ref_prot); 
                        if (!split) 
                                return -ENOMEM; 
                        set_pmd_pte(kpte,address,mk_pte(split, ref_prot)); 


ref_prot is assigned PAGE_KERNEL_EXEC if the incoming address is below that of
_etext, which marks the end of the kernel's text section. The problem appears to
be that in setup_identity_mappings() (../arch/i386/mm/init.c), _PAGE_GLOBAL is
added to __PAGE_KERNEL if supported, but not to __PAGE_KERNEL_EXEC, so
PAGE_KERNEL_EXEC doesn't pick it up.
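
To make the effect concrete, here is a minimal user-space sketch of the flag logic described above. It is not kernel code: the struct prot_masks type and the functions setup_masks_unpatched() and pick_ref_prot() are hypothetical stand-ins, the NX bit is left out for brevity, and the addresses in the usage note are made up. Only the i386 PTE flag values and the ref_prot selection rule are taken from the kernel.

```c
/* i386 page-table entry flag bits. */
#define _PAGE_PRESENT  0x001UL
#define _PAGE_RW       0x002UL
#define _PAGE_ACCESSED 0x020UL
#define _PAGE_DIRTY    0x040UL
#define _PAGE_GLOBAL   0x100UL

/* Hypothetical container for the two runtime-adjusted protection masks
 * (__PAGE_KERNEL and __PAGE_KERNEL_EXEC in the kernel). */
struct prot_masks {
    unsigned long kernel;
    unsigned long kernel_exec;
};

/* Models the unpatched setup: when the CPU supports global pages,
 * only the non-exec mask receives _PAGE_GLOBAL. */
struct prot_masks setup_masks_unpatched(int cpu_has_pge)
{
    unsigned long base = _PAGE_PRESENT | _PAGE_RW | _PAGE_DIRTY | _PAGE_ACCESSED;
    struct prot_masks m = { base, base };

    if (cpu_has_pge)
        m.kernel |= _PAGE_GLOBAL;   /* kernel_exec is left untouched */
    return m;
}

/* Mirrors the ref_prot choice in __change_page_attr(): large pages that
 * start below _etext get the exec mask, everything else the plain mask. */
unsigned long pick_ref_prot(const struct prot_masks *m,
                            unsigned long large_page_base,
                            unsigned long etext)
{
    return (large_page_base < etext) ? m->kernel_exec : m->kernel;
}
```

With cpu_has_pge set and an assumed _etext of 0x00500000, pick_ref_prot() for a large page at 0x00400000 returns a mask without _PAGE_GLOBAL, while a large page at 0x00800000 keeps it. That is the mismatch this report is about: splitting a large page that covers kernel text rewrites the mapping without the GLOBAL bit.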

I guess the fix is to change the code in setup_identity_mappings() to read:

 
                                if (cpu_has_pge) { 
                                        set_in_cr4(X86_CR4_PGE); 
#if !defined(CONFIG_X86_SWITCH_PAGETABLES) 
                                        __pe += _PAGE_GLOBAL; 
                                        __PAGE_KERNEL |= _PAGE_GLOBAL; 
                                        __PAGE_KERNEL_EXEC |= _PAGE_GLOBAL; 
#endif 
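
As a rough check of what the change buys, the sketch below models the patched setup together with a simplified version of what split_large_page() does: it fills the 1024 small-page entries of one 4 MiB large page, giving the target page prot and every other page ref_prot. This is a user-space illustration under assumptions, not the kernel implementation: struct prot_masks, setup_masks_patched() and split_and_count_global() are hypothetical names, NX is omitted, and non-PAE paging is assumed.

```c
#include <stddef.h>

/* i386 page-table entry flag bits. */
#define _PAGE_PRESENT  0x001UL
#define _PAGE_RW       0x002UL
#define _PAGE_ACCESSED 0x020UL
#define _PAGE_DIRTY    0x040UL
#define _PAGE_GLOBAL   0x100UL

#define PAGE_SHIFT     12
#define PTRS_PER_PTE   1024   /* non-PAE i386: 1024 x 4 KiB per 4 MiB page */

/* Hypothetical container for __PAGE_KERNEL and __PAGE_KERNEL_EXEC. */
struct prot_masks {
    unsigned long kernel;
    unsigned long kernel_exec;
};

/* Models the proposed fix: both masks receive _PAGE_GLOBAL. */
struct prot_masks setup_masks_patched(int cpu_has_pge)
{
    unsigned long base = _PAGE_PRESENT | _PAGE_RW | _PAGE_DIRTY | _PAGE_ACCESSED;
    struct prot_masks m = { base, base };

    if (cpu_has_pge) {
        m.kernel      |= _PAGE_GLOBAL;
        m.kernel_exec |= _PAGE_GLOBAL;
    }
    return m;
}

/* Simplified model of splitting one large page at `base` (4 MiB aligned):
 * the page at `target` gets `prot`, all others get `ref_prot`.
 * Returns how many of the resulting entries carry _PAGE_GLOBAL. */
size_t split_and_count_global(unsigned long *ptes, unsigned long base,
                              unsigned long target, unsigned long prot,
                              unsigned long ref_prot)
{
    size_t global_entries = 0;

    for (size_t i = 0; i < PTRS_PER_PTE; i++) {
        unsigned long addr = base + ((unsigned long)i << PAGE_SHIFT);

        ptes[i] = addr | (addr == target ? prot : ref_prot);
        if (ptes[i] & _PAGE_GLOBAL)
            global_entries++;
    }
    return global_entries;
}
```

With the patched masks and kernel_exec used as ref_prot, every neighbouring entry of the split page keeps _PAGE_GLOBAL. In this model, using a mask without the bit as ref_prot would leave none of the neighbours global.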

Version-Release number of selected component (if applicable):


How reproducible:
very reproducible on specific hardware.

Steps to Reproduce:
1. load recent nvidia driver (1.0-8178) on 2.6.9-22.0.2 kernel
2. start X
3. kill gdm in a loop to cause continuous restarts of X

Hopefully the description of the problem above is enough; I can help further in
setting up a reproduction case if needed.
  
Comment 1 Terence Ripperda 2006-03-22 16:20:18 EST
sorry, after committing, I realized that my "How reproducible" comment is a bit
vague. the customer that reported this to us was using an Intel P4 Alderwood
based system, with an nvidia Quadro FX 1400 (I goofed and reported it initially
as x86_64, updated to i686). Looking at the details closer, I actually doubt
that the specific hardware is needed. 
Comment 2 Ernie Petrides 2006-03-22 21:03:40 EST
This is a RHEL4 bug.  RHEL3 already contains the line
"__PAGE_KERNEL_EXEC |= _PAGE_GLOBAL;" in the i386 version
of setup_identity_mappings().
Comment 3 Jason Baron 2006-03-23 09:22:32 EST
agreed this looks like a bug. thanks for the fix...i've posted kernels with this
patch at: http://people.redhat.com/~jbaron/bz186316/. I'd like to get back test
results confirming the fix, if possible. thanks.
Comment 4 Gunther Mayer 2006-03-23 10:09:36 EST
Can you please upload kernel-devel-2.6.9.xx, as -devel is needed for
installation via ./NVIDIA-Linux-x86-1.0-8751-pkg1.run -s ?


Comment 6 Jason Baron 2006-03-23 11:18:30 EST
good point. added -devel pkgs at the above spot.
Comment 7 Terence Ripperda 2006-03-23 13:20:54 EST
sorry for the version goof. the bug system defaulted to Q3, I wasn't clear what
that was, but thought perhaps that was RHEL4 update3.

I'll download the updated kernels this afternoon and test things on my side.
Comment 8 Gunther Mayer 2006-03-27 06:15:03 EST
2.6.9-34.7.EL.nvidia.1 fixes the problem for me.
The problem has not recurred since this kernel was installed.
Comment 9 Jason Baron 2006-03-27 17:17:30 EST
cool. Terence, do you have any more testing/comments from the NVIDIA side on
this one? If everything is positive, i think this patch is ready to go into the
beta kernel. thanks.
Comment 10 Terence Ripperda 2006-03-28 13:56:40 EST
I'm working on trying to verify this here, but I think if Gunther's not seeing
the problem anymore, that should be fine.
Comment 11 Jason Baron 2006-03-30 18:01:23 EST
committed in stream u3 build 34.10. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/
Comment 12 Terence Ripperda 2006-03-30 18:13:39 EST
thanks Jason!
Comment 13 Jason Baron 2006-03-30 20:41:46 EST
oops. i meant u4 34.10
Comment 18 Red Hat Bugzilla 2006-08-10 18:54:34 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0575.html
