Red Hat Bugzilla – Bug 137201
RHEL3U2/U3 x86-64 - /proc/mtrr reported incorrectly
Last modified: 2007-11-30 17:07:04 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.3) Gecko/20040922

Description of problem:
The sizes reported in /proc/mtrr are multiplied by 16 on RHEL3 U3 x86_64.

x86:
reg00: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
reg01: base=0xfeda0000 (4077MB), size= 128KB: uncachable, count=1

x86_64:
reg00: base=0x00000000 (   0MB), size=32768MB: write-back, count=1
reg01: base=0xfeda0000 (4077MB), size=   2MB: uncachable, count=1

More details:
The simple statement of the problem is: on x86-64, the reported mtrr ranges are 16x the actual sizes. This makes drivers that depend on being able to allocate mtrr segments for transaction space fail, or default to a non-accelerated mode. This affects BOTH the open-source 'nv' driver and the accelerated 'nvidia' driver.

When using the 'nv' 2D driver, it's obvious: bring up the machine, grep for "WW" in /var/log/XFree86.0.log, and you will see that it "Failed to allocate...". The same happens with the accelerated driver, although the error/warning message isn't so obvious.

This is independent of the graphics card. It happens on all cards, because what is affected is the driver's ability to allocate mtrr segments. In nVIDIA's case, the driver gets confused, sees bogus overlapped segments, and bails.

Version-Release number of selected component (if applicable):
kernel-2.4.21-20.EL

How reproducible:
Always

Steps to Reproduce:
1. Install RHEL3 U3 x86_64.
2. Observe /proc/mtrr.

Actual Results:
mtrr values are displayed at 16x what they should be.

Expected Results:
On x86_64, /proc/mtrr values are reported correctly.

Additional info:
Event posted 12-14-2004 01:41pm by Daryl with duration of 0.00
Can I assume (due to inactivity here) that this is *not* in Update 4, and is therefore on the list for Update 5? The fear is that this defect is resulting in bizarre or unexplained performance problems that are very difficult to nail down.
---------------------------
Event posted 12-17-2004 09:41am by cww with duration of 0.10
Daryl,
This is on the must-fix list for U5.
Chris
Created attachment 108917
Patch to fix mtrr ranges on IA32E architecture for kernel 2.4.21-20.EL
I believe this bug to be IA32E-specific. Intel's implementation of the x86_64 architecture differs from AMD's in that the IA32E mtrrs are 36-bit instead of 40-bit. See the developer's docs for more details.

Given that:

    MTRR_BEG_BIT is defined to be 12
    MTRR_END_BIT is defined to be 7

the following code from mtrr.c, which is supposed to find the mtrr range, does not work properly for IA32E, because the top 4 bits in mask_hi, which are meaningful for AMD's mtrrs, are not meaningful for Intel's.

    count = 0;
    tmp = mask_lo >> MTRR_BEG_BIT;
    for (i = MTRR_BEG_BIT; i <= 31; i++, tmp = tmp >> 1)
        count = (count << (~tmp & 1)) | (~tmp & 1);
    tmp = mask_hi;
    for (i = 0; i <= MTRR_END_BIT; i++, tmp = tmp >> 1)
        count = (count << (~tmp & 1)) | (~tmp & 1);
    *size = (count+1);

---------------------

So mask_hi for AMD might look like:

    0000 0000 0000 0000 0000 0000 1111 1100

and for Intel:

    0000 0000 0000 0000 0000 0000 0000 1100

This is BAD because the loop happily counts the 4 zeros in bit positions 4-7 on the IA32E architecture, which results in an unfortunate 4-bit left shift of the range value, i.e. a multiplication of the range value by 16.

If I redefine MTRR_END_BIT to be 3 instead of 7, the problem goes away on the EM64T system I have access to (Supermicro X6DA8-G). This is not, however, an ideal solution, as these bits should not be ignored in the AMD case. Instead of adding something like:

    #ifdef IA32E
    #define MTRR_END_BIT 3
    #endif

to the code, I recommend the attached patch, based on suggestions by Phil Pokorny (ppokorny@penguincomputing.com), which should be good for both the AMD and Intel cases, and immune to the > 4GB problem associated with older-revision BIOSes on the Tyan S2885.

Justin Thiessen
----------------
jthiessen@penguincomputing.com
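
For anyone who wants to check the arithmetic without booting a kernel, here is a minimal user-space sketch of the counting loops quoted above. It is not the attached patch and not the actual mtrr.c. The function names, the end_bit parameter, and the mask values are assumptions made up for illustration: the masks are simply what a 36-bit part would plausibly hold for the 2048MB and 128KB ranges in the original report (bit 11 of mask_lo is taken to be the valid bit).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define MTRR_BEG_BIT 12  /* first mask bit, i.e. 4KB page granularity */

/* Hypothetical mask values for a 36-bit (IA32E) part, derived from the
 * sizes in the report; they were not read from real hardware. */
#define REG00_MASK_LO 0x80000800u  /* 2048MB range: mask bits 31..35 set */
#define REG01_MASK_LO 0xFFFE0800u  /* 128KB range: mask bits 17..35 set  */
#define MASK_HI_36BIT 0x0000000Fu  /* only bits 32..35 are implemented   */

/* Same counting loops as the quoted excerpt, except that the upper limit
 * of the mask_hi scan is a parameter so 7 and 3 can be compared.
 * Returns the range size in 4KB pages. */
uint64_t mtrr_size_pages(uint32_t mask_lo, uint32_t mask_hi, int end_bit)
{
    uint64_t count = 0;
    uint32_t tmp;
    int i;

    tmp = mask_lo >> MTRR_BEG_BIT;
    for (i = MTRR_BEG_BIT; i <= 31; i++, tmp >>= 1)
        count = (count << (~tmp & 1)) | (~tmp & 1);  /* every 0 bit doubles count+1 */

    tmp = mask_hi;
    for (i = 0; i <= end_bit; i++, tmp >>= 1)
        count = (count << (~tmp & 1)) | (~tmp & 1);

    return count + 1;
}

/* Convert a size in 4KB pages to kilobytes. */
uint64_t pages_to_kb(uint64_t pages)
{
    return pages * 4;
}

#ifdef MTRR_DEMO_MAIN
int main(void)
{
    uint64_t bad  = mtrr_size_pages(REG00_MASK_LO, MASK_HI_36BIT, 7);
    uint64_t good = mtrr_size_pages(REG00_MASK_LO, MASK_HI_36BIT, 3);

    /* Four unimplemented zero bits give four extra doublings. */
    assert(bad == good * 16);

    printf("reg00 with end bit 7: %lluMB\n",
           (unsigned long long)(pages_to_kb(bad) / 1024));
    printf("reg00 with end bit 3: %lluMB\n",
           (unsigned long long)(pages_to_kb(good) / 1024));
    return 0;
}
#endif
```

Building with -DMTRR_DEMO_MAIN enables the small driver at the bottom. With the assumed masks, scanning mask_hi up to bit 7 yields 32768MB and 2MB for the two registers, while stopping at bit 3 yields 2048MB and 128KB, which matches the x86_64 and x86 listings in the original report.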
Reassigning to lwoodman as he integrated the fix.
*** Bug 130113 has been marked as a duplicate of this bug. ***
A fix for this problem has just been committed to the RHEL3 U5 patch pool this evening (in kernel version 2.4.21-27.16.EL).
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2005-294.html