Bug 137201 (IT_51544)
| Summary: | RHEL3U2/U3 x86-64 - /proc/mtrr reported incorrectly | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 3 | Reporter: | Chris Williams <cww> | ||||
| Component: | kernel | Assignee: | Larry Woodman <lwoodman> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Brian Brock <bbrock> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | medium | ||||||
| Version: | 3.0 | CC: | george.liu, jparadis, peterm, petrides, Rainer.Koenig, riel, tao | ||||
| Target Milestone: | --- | ||||||
| Target Release: | --- | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2005-05-18 13:28:20 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 132991 | ||||||
Event posted 12-14-2004 01:41pm by Daryl with duration of 0.00
Can I assume (due to inactivity here) that this is *not* in Update 4, and is therefore on the list for Update 5? The fear is that this defect is causing bizarre or unexplained performance problems that are very difficult to nail down.
---------------------------
Event posted 12-17-2004 09:41am by cww with duration of 0.10
Daryl,
This is on the must-fix list for U5.
Chris

Created attachment 108917
Patch to fix mtrr ranges on IA32E architecture for kernel 2.4.21-20.EL
I believe this bug to be IA32E-specific. Intel's implementation of
the x86_64 architecture differs from AMD's in that the IA32E mtrr masks
are 36 bits wide instead of 40. See the developer's docs for more details.
Given that:
MTRR_BEG_BIT is defined to be 12
MTRR_END_BIT is defined to be 7
the following code from mtrr.c, which is supposed to find the mtrr
range, does not work properly for IA32E because the top 4 bits in
mask_hi, which are meaningful for AMD's mtrrs, are not for Intel's.
    count = 0;
    tmp = mask_lo >> MTRR_BEG_BIT;
    for (i = MTRR_BEG_BIT; i <= 31; i++, tmp = tmp >> 1)
        count = (count << (~tmp & 1)) | (~tmp & 1);
    tmp = mask_hi;
    for (i = 0; i <= MTRR_END_BIT; i++, tmp = tmp >> 1)
        count = (count << (~tmp & 1)) | (~tmp & 1);
    *size = (count+1);
---------------------
So mask_hi for AMD might look like:
0000 0000 0000 0000 0000 0000 1111 1100
and for Intel:
0000 0000 0000 0000 0000 0000 0000 1100
This is BAD because the loop happily counts the 4 zeros in positions
4-7 on the IA32E architecture, which results in an unfortunate 4-bit
left shift of the range value, that is, a multiplication of the range
value by 16.
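To make the arithmetic concrete, here is a minimal standalone sketch of the same counting loop, with the upper bit limit passed in as a parameter so both settings can be compared. The function name `mtrr_size_pages` and the `end_bit` parameter are illustrative and not part of mtrr.c. I am assuming the result is in 4 KB pages, since the counting starts at bit 12.

```c
#include <assert.h>
#include <stdint.h>

#define MTRR_BEG_BIT 12

/*
 * Illustrative re-creation of the quoted loop (not the kernel function).
 * Each zero bit of the mask shifts one more 1 into count, so the result
 * is 2^(number of zero bits examined). end_bit is the highest bit of
 * mask_hi that gets examined.
 */
uint64_t mtrr_size_pages(uint32_t mask_lo, uint32_t mask_hi, int end_bit)
{
    uint64_t count = 0;
    uint32_t tmp;
    int i;

    assert(end_bit >= 0 && end_bit < 32);

    /* Bits 12..31 of the low word. */
    tmp = mask_lo >> MTRR_BEG_BIT;
    for (i = MTRR_BEG_BIT; i <= 31; i++, tmp >>= 1)
        count = (count << (~tmp & 1)) | (~tmp & 1);

    /* Bits 0..end_bit of the high word. */
    tmp = mask_hi;
    for (i = 0; i <= end_bit; i++, tmp >>= 1)
        count = (count << (~tmp & 1)) | (~tmp & 1);

    return count + 1;
}
```

Take mask_lo = 0x80000800 and mask_hi = 0x0F, values worked out by hand for a 2048MB range with a 36-bit mask rather than read from real hardware. Examining bits 0-7 of mask_hi yields 2^23 pages (32768MB), while examining only bits 0-3 yields 2^19 pages (2048MB). That is the same 16x discrepancy shown in the /proc/mtrr output in this report. With a 40-bit mask (mask_hi = 0xFF), examining bits 0-7 gives the correct 2^19 pages.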
If I redefine MTRR_END_BIT to be 3 instead of 7, the problem goes away
on the EM64T system I have access to (Supermicro X6DA8-G). This is
not, however, an ideal solution, as these bits should not be ignored
in the AMD case. Instead of adding something like:
    #ifdef IA32E
    #define MTRR_END_BIT 3
    #endif
to the code, I recommend the attached patch, based on suggestions by
Phil Pokorny (ppokorny), which should be good for
both the AMD and Intel case, and immune to the > 4GB problem
associated with older rev BIOSs on the Tyan S2885.
Justin Thiessen
----------------
jthiessen
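The contents of attachment 108917 are not reproduced in this report, so the following is not that patch. It is only a sketch of one width-independent way to derive a range size: combine the two mask words and isolate the lowest set address bit, so unimplemented high bits (zero on a 36-bit part) never enter the calculation. The function name and `PAGE_SHIFT_4K` are hypothetical, and the sketch assumes a contiguous mask with the valid flag in bit 11 of the low word.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_4K 12   /* hypothetical name: address bits start at bit 12 */
#define MASK_VALID_BIT 11  /* valid flag in the low word of the mask */

/*
 * Sketch only, not the attached patch. Returns the range size in 4 KB
 * pages, or 0 if the register is disabled or has no address bits set.
 */
uint64_t mtrr_size_from_lowest_bit(uint32_t mask_lo, uint32_t mask_hi)
{
    uint64_t mask, size;

    if ((mask_lo & (1u << MASK_VALID_BIT)) == 0)
        return 0;                      /* register not in use */

    mask = (((uint64_t)mask_hi << 32) | mask_lo) >> PAGE_SHIFT_4K;
    if (mask == 0)
        return 0;                      /* no address bits in the mask */

    size = mask & (~mask + 1);         /* isolate the lowest set bit */
    assert((size & (size - 1)) == 0);  /* always a power of two */
    return size;
}
```

Because only the lowest set bit matters, the hand-derived 2048MB example (mask_lo = 0x80000800) gives 2^19 pages whether mask_hi is 0x0F (36-bit) or 0xFF (40-bit). The sketch does not validate that the mask is contiguous, and it makes no claim about the BIOS issue on the Tyan S2885.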
Reassigning to lwoodman as he integrated the fix.

*** Bug 130113 has been marked as a duplicate of this bug. ***

A fix for this problem has just been committed to the RHEL3 U5 patch pool this evening (in kernel version 2.4.21-27.16.EL).

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHSA-2005-294.html
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.3) Gecko/20040922

Description of problem:
The sizes reported in /proc/mtrr are multiplied by 16 in RHEL3 U3 x86_64.

x86:
reg00: base=0x00000000 ( 0MB), size=2048MB: write-back, count=1
reg01: base=0xfeda0000 (4077MB), size= 128KB: uncachable, count=1

x86_64:
reg00: base=0x00000000 ( 0MB), size=32768MB: write-back, count=1
reg01: base=0xfeda0000 (4077MB), size= 2MB: uncachable, count=1

More Details...
The simple statement of the problem is: on x86-64, the reported mtrr ranges are 16x the actual sizes. This makes drivers that depend on being able to allocate mtrr segments for transaction space fail, or default to a non-accelerated mode. This affects BOTH the open-source 'nv' driver and the accelerated 'nvidia' driver. With the 'nv' 2D driver it is obvious: bring up the machine, grep for "WW" in /var/log/XFree86.0.log, and you will see that it "Failed to allocate...". The same happens with the accelerated driver, although the error/warning message isn't so obvious. The problem is independent of the graphics card and happens on all cards, because what it affects is the driver's ability to allocate mtrr segments. In nVIDIA's case, the driver gets confused, sees bogus overlapped segments and bails.

Version-Release number of selected component (if applicable):
kernel-2.4.21-20.EL

How reproducible:
Always

Steps to Reproduce:
1. Install RHEL3 U3 x86_64.
2. Observe /proc/mtrr

Actual Results:
mtrr values displayed are 16x what they should be.

Expected Results:
On x86_64, /proc/mtrr values are reported correctly.

Additional info:
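For anyone comparing /proc/mtrr output between an x86 and an x86_64 install of the same machine, a small helper along these lines can make the 16x factor explicit. It is a rough sketch: the function name `mtrr_line_size_kb` is made up for illustration, and the format string assumes lines shaped like the ones quoted in this report, with sizes in KB or MB only.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse the size field of one /proc/mtrr line and return it in KB.
 * Returns 0 if the line does not match the assumed layout or uses a
 * unit other than KB or MB.
 */
unsigned long mtrr_line_size_kb(const char *line)
{
    int reg;
    unsigned long long base;
    unsigned long base_mb, size;
    char unit;

    assert(line != NULL);

    /* Whitespace in the format matches any run of blanks in the input. */
    if (sscanf(line, "reg%d: base=0x%llx (%luMB), size=%lu%cB",
               &reg, &base, &base_mb, &size, &unit) != 5)
        return 0;

    if (unit == 'M')
        return size * 1024UL;
    if (unit == 'K')
        return size;
    return 0;
}
```

Applied to the lines in this report, reg00 comes out as 2097152 KB on x86 and 33554432 KB on x86_64, and reg01 as 128 KB and 2048 KB, a factor of 16 in both cases.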