Bug 173506

Summary: Include workaround from 2.6.14-rc2 for TLB flush filter problem on SMP Opteron
Product: Red Hat Enterprise Linux 4
Reporter: Lon Hohberger <lhh>
Component: kernel
Assignee: Jim Paradis <jparadis>
Status: CLOSED WONTFIX
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: 4.0
CC: davej, jbaron, kanderso, peterm, rkenna
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2006-01-18 20:56:17 UTC
Bug Blocks: 170416    
Attachments:
Shell script to auto-apply the workaround for testing (flags: none)

Description Lon Hohberger 2005-11-17 18:15:48 UTC
Description of problem:

"Bad pmd" messages and random application segfaults. The patch (including
comments about what it does) is here:

http://lkml.org/lkml/2005/9/20/207

Version-Release number of selected component (if applicable): RHEL4-U2

How reproducible: Random

Actual results: Random application segfaults on some Opteron SMP machines

Additional info: The problem is fixed upstream in 2.6.14-rc2. We may be able to
work around it at boot time from userland: Dave Jones has a program that can
apply the workaround from userland if the MSR kernel module is loaded.

Comment 3 Lon Hohberger 2005-11-18 20:09:40 UTC
Created attachment 121253
Shell script to auto-apply the workaround for testing

This is not a long-term solution. It is based on the errata122.c program that
Dave sent me. It creates the MSR device nodes, compiles the inline-attached code
(which adds a little more error checking and does the equivalent of a
test-and-set on the MSR value), and applies it to each CPU found. It's crude. I
tested it on a 2-processor RHEL3 U6 machine (after 'modprobe msr') and a
4-processor RHEL4 U2 machine.

[root@bigisis ~]# ./amd-tlb-workaround.sh
AMD TLB filter flush workaround (auto-apply errata 122 workaround)
Checking Processor ID: 0
Applying workaround to /dev/msr0
Checking Processor ID: 1
Applying workaround to /dev/msr1
Checking Processor ID: 2
Applying workaround to /dev/msr2
Checking Processor ID: 3
Applying workaround to /dev/msr3
[root@bigisis ~]# ./amd-tlb-workaround.sh
AMD TLB filter flush workaround (auto-apply errata 122 workaround)
Checking Processor ID: 0
Workaround already applied to /dev/msr0
Checking Processor ID: 1
Workaround already applied to /dev/msr1
Checking Processor ID: 2
Workaround already applied to /dev/msr2
Checking Processor ID: 3
Workaround already applied to /dev/msr3

Comment 9 Lon Hohberger 2006-01-18 20:56:17 UTC
Closing. After reproducing the failure, this turned out not to be the cause.

Setting to WONTFIX because the fix isn't necessary.