Bug 186316

Summary: nvidia cache aliasing problem: change_page_attr drops GLOBAL bit from executable kernel pages
Product: Red Hat Enterprise Linux 4
Component: kernel
Version: 4.0
Hardware: i686
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Reporter: Terence Ripperda <ripperda>
Assignee: Jason Baron <jbaron>
QA Contact: Brian Brock <bbrock>
CC: gunther.mayer, jbaron, knoel, netllama, ripperda
Fixed In Version: RHSA-2006-0575
Doc Type: Bug Fix
Last Closed: 2006-08-10 22:54:32 UTC
Bug Blocks: 181409

Description Terence Ripperda 2006-03-22 21:14:33 UTC
Description of problem:

OK, in 2.6.9-22.0.2.EL's __change_page_attr(), the code that calls
split_large_page() looks like this:

 
        ref_prot =
                ((address & LARGE_PAGE_MASK) < (unsigned long)&_etext)
                        ? PAGE_KERNEL_EXEC : PAGE_KERNEL;
        split = split_large_page(address, prot, ref_prot);
        if (!split)
                return -ENOMEM;
        set_pmd_pte(kpte, address, mk_pte(split, ref_prot));


ref_prot is assigned PAGE_KERNEL_EXEC if the large page containing the incoming
address starts below _etext, which marks the end of the kernel's text section.
The problem appears to be that in setup_identity_mappings()
(../arch/i386/mm/init.c), _PAGE_GLOBAL is added to __PAGE_KERNEL if supported,
but not to __PAGE_KERNEL_EXEC, so PAGE_KERNEL_EXEC doesn't pick it up. Any large
page in the text range that gets split is therefore remapped without the GLOBAL
bit.
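
A minimal user-space sketch of the effect described above, not kernel code. The
PG_* names, the MODEL_ and model_ prefixed identifiers, and both helper
functions are hypothetical stand-ins for the kernel's _PAGE_* flags and
PAGE_KERNEL / PAGE_KERNEL_EXEC values. The NX bit is left out, and 4 MB large
pages (non-PAE) are assumed.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag bits mirroring the i386 _PAGE_* definitions (NX omitted for brevity). */
#define PG_PRESENT  0x001UL
#define PG_RW       0x002UL
#define PG_ACCESSED 0x020UL
#define PG_DIRTY    0x040UL
#define PG_GLOBAL   0x100UL
#define PG_BASE     (PG_PRESENT | PG_RW | PG_ACCESSED | PG_DIRTY)

/* Assumption: 4 MB large pages (non-PAE); PAE kernels use 2 MB. */
#define MODEL_LARGE_PAGE_MASK (~((1UL << 22) - 1))

/* State after the unpatched setup: only the non-exec protection is global. */
static const unsigned long model_page_kernel      = PG_BASE | PG_GLOBAL;
static const unsigned long model_page_kernel_exec = PG_BASE;

/* Same selection rule as the ref_prot expression in __change_page_attr(). */
static unsigned long pick_ref_prot(unsigned long address, unsigned long etext)
{
    return ((address & MODEL_LARGE_PAGE_MASK) < etext)
        ? model_page_kernel_exec : model_page_kernel;
}

/* True if the protection chosen for this address keeps the GLOBAL bit. */
static bool ref_prot_is_global(unsigned long address, unsigned long etext)
{
    return (pick_ref_prot(address, etext) & PG_GLOBAL) != 0;
}
```

With a made-up _etext of 0xC0350000, ref_prot_is_global(0xC0123000UL,
0xC0350000UL) is false (text range, GLOBAL lost), while
ref_prot_is_global(0xC0800000UL, 0xC0350000UL) is true.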

I guess the fix is to change the code in setup_identity_mappings() to read:

 
        if (cpu_has_pge) {
                set_in_cr4(X86_CR4_PGE);
#if !defined(CONFIG_X86_SWITCH_PAGETABLES)
                __pe += _PAGE_GLOBAL;
                __PAGE_KERNEL |= _PAGE_GLOBAL;
                __PAGE_KERNEL_EXEC |= _PAGE_GLOBAL;
#endif
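
As a rough illustration of what the proposed change is meant to guarantee, here
is a small user-space model, not the kernel function itself. The struct, the
PG_* constants and both function names are hypothetical. The
switch_pagetables argument stands in for the compile-time
CONFIG_X86_SWITCH_PAGETABLES check.

```c
#include <assert.h>
#include <stdbool.h>

#define PG_GLOBAL 0x100UL   /* mirrors _PAGE_GLOBAL on i386 */
#define PG_BASE   0x063UL   /* PRESENT | RW | ACCESSED | DIRTY, NX omitted */

struct kernel_prot {
    unsigned long page_kernel;       /* models __PAGE_KERNEL */
    unsigned long page_kernel_exec;  /* models __PAGE_KERNEL_EXEC */
};

/* Model of the patched setup: both protections gain GLOBAL together. */
static struct kernel_prot init_kernel_prot(bool cpu_has_pge,
                                           bool switch_pagetables)
{
    struct kernel_prot p = { PG_BASE, PG_BASE };

    if (cpu_has_pge && !switch_pagetables) {
        p.page_kernel |= PG_GLOBAL;
        p.page_kernel_exec |= PG_GLOBAL;  /* the line the report adds */
    }
    return p;
}

/* The two protections should never disagree on the GLOBAL bit. */
static bool prot_consistent(struct kernel_prot p)
{
    return ((p.page_kernel ^ p.page_kernel_exec) & PG_GLOBAL) == 0;
}
```

Under this model prot_consistent() holds for every combination of the two
inputs, which is the property the unpatched code violates when PGE is available.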

How reproducible:
Very reproducible on specific hardware.

Steps to Reproduce:
1. Load a recent nvidia driver (1.0-8178) on a 2.6.9-22.0.2 kernel
2. Start X
3. Kill gdm in a loop to cause continuous restarts of X

Hopefully the description of the problem above is enough; I can help further in
setting up a reproduction case if needed.
  

Comment 1 Terence Ripperda 2006-03-22 21:20:18 UTC
sorry, after committing, I realized that my "How reproducible" comment is a bit
vague. The customer that reported this to us was using an Intel P4
Alderwood-based system with an nvidia Quadro FX 1400 (I goofed and initially
reported it as x86_64; updated to i686). Looking at the details more closely, I
actually doubt that the specific hardware is needed.

Comment 2 Ernie Petrides 2006-03-23 02:03:40 UTC
This is a RHEL4 bug.  RHEL3 already contains the line
"__PAGE_KERNEL_EXEC |= _PAGE_GLOBAL;" in the i386 version
of setup_identity_mappings().

Comment 3 Jason Baron 2006-03-23 14:22:32 UTC
agreed, this looks like a bug. thanks for the fix. i've posted kernels with this
patch at: http://people.redhat.com/~jbaron/bz186316/. I'd like to get back test
results confirming the fix, if possible. thanks.

Comment 4 Gunther Mayer 2006-03-23 15:09:36 UTC
Can you please upload kernel-devel-2.6.9.xx, as -devel is needed for
installation via ./NVIDIA-Linux-x86-1.0-8751-pkg1.run -s ?




Comment 6 Jason Baron 2006-03-23 16:18:30 UTC
good point. added -devel pkgs at the above spot.

Comment 7 Terence Ripperda 2006-03-23 18:20:54 UTC
sorry for the version goof. the bug system defaulted to Q3, I wasn't clear what
that was, but thought perhaps that was RHEL4 update3.

I'll download the updated kernels this afternoon and test things on my side.

Comment 8 Gunther Mayer 2006-03-27 11:15:03 UTC
2.6.9-34.7.EL.nvidia.1 fixes the problem for me.
The problem has not recurred since this kernel was installed.

Comment 9 Jason Baron 2006-03-27 22:17:30 UTC
cool. Terence, do you have any more testing/comments from the NVIDIA side on
this one? If everything is positive, i think this patch is ready to go into the
beta kernel. thanks.

Comment 10 Terence Ripperda 2006-03-28 18:56:40 UTC
I'm working on trying to verify this here, but I think if Gunther's not seeing
the problem anymore, that should be fine.

Comment 11 Jason Baron 2006-03-30 23:01:23 UTC
committed in stream u3 build 34.10. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/


Comment 12 Terence Ripperda 2006-03-30 23:13:39 UTC
thanks Jason!

Comment 13 Jason Baron 2006-03-31 01:41:46 UTC
oops. i meant u4 34.10

Comment 18 Red Hat Bugzilla 2006-08-10 22:54:34 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0575.html