Description of problem:
Bad pmd messages and random segfaults of applications. A patch (including comments about what it does) is here: http://lkml.org/lkml/2005/9/20/207

Version-Release number of selected component (if applicable):
RHEL4-U2

How reproducible:
Random

Actual results:
Random application segfaults on some Opteron SMP machines

Additional info:
The issue is addressed in 2.6.14-rc2. We may be able to address this at boot time from userland; Dave Jones has a program which can apply the workaround in userland if the MSR kernel module is loaded.
See also:
http://marc.theaimsgroup.com/?l=git-commits-head&m=112835880912108&w=2
http://marc.theaimsgroup.com/?l=git-commits-head&m=112803857324433&w=2
http://marc.theaimsgroup.com/?l=git-commits-head&m=112700568414344&w=2
Created attachment 121253
Shell script to auto-apply the workaround for testing

This is not a long-term solution. This is based on the errata122.c which Dave sent me. It creates MSR devices, compiles the inline-attached code (which has a little more error checking, and does the equivalent of a test-and-set on the MSR data), and applies it to each CPU found. It's crude. I tested it on 2P RHEL3U6 (after 'modprobe msr') and 4P RHEL4U2.

[root@bigisis ~]# ./amd-tlb-workaround.sh
AMD TLB filter flush workaround (auto-apply errata 122 workaround)
Checking Processor ID: 0
Applying workaround to /dev/msr0
Checking Processor ID: 1
Applying workaround to /dev/msr1
Checking Processor ID: 2
Applying workaround to /dev/msr2
Checking Processor ID: 3
Applying workaround to /dev/msr3

[root@bigisis ~]# ./amd-tlb-workaround.sh
AMD TLB filter flush workaround (auto-apply errata 122 workaround)
Checking Processor ID: 0
Workaround already applied to /dev/msr0
Checking Processor ID: 1
Workaround already applied to /dev/msr1
Checking Processor ID: 2
Workaround already applied to /dev/msr2
Checking Processor ID: 3
Workaround already applied to /dev/msr3
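
For anyone reading this later, the "test-and-set on the MSR data" step mentioned above can be sketched roughly as below. This is not the attached script or errata122.c, only a minimal illustration in Python. The function names are hypothetical. The MSR address (0xC0010015, K8 HWCR) and the bit (bit 6, FFDIS) are assumptions taken from my reading of the upstream kernel change, not from this report, so check them against the AMD revision guide before use. The device path is passed in by the caller; the stock msr driver normally exposes /dev/cpu/<n>/msr, while the script in this comment used /dev/msr<n> nodes.

```python
import os
import struct

# Assumptions, not stated in this report: the erratum 122 workaround
# sets bit 6 (FFDIS) of the K8 HWCR MSR at address 0xC0010015.
HWCR_MSR = 0xC0010015
FFDIS_BIT = 1 << 6


def with_bit_set(value, mask):
    """Return (new_value, changed); set mask only if it is not already set."""
    if value & mask:
        return value, False
    return value | mask, True


def apply_workaround(msr_path, msr=HWCR_MSR, mask=FFDIS_BIT):
    """Read one 64-bit MSR through an msr device node and set mask if needed.

    The MSR number is used as the file offset, which is how the Linux
    msr driver addresses registers. Returns True if a write was done,
    False if the bit was already set.
    """
    fd = os.open(msr_path, os.O_RDWR)
    try:
        (value,) = struct.unpack("<Q", os.pread(fd, 8, msr))
        new_value, changed = with_bit_set(value, mask)
        if changed:
            os.pwrite(fd, struct.pack("<Q", new_value), msr)
        return changed
    finally:
        os.close(fd)
```

Run as root with the msr module loaded, a caller would invoke apply_workaround once per CPU device node and print "Applying workaround" or "Workaround already applied" depending on the return value, which matches the two runs shown above.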
Closing. Upon reproduction, this turned out not to be the cause of the problem. Setting to WONTFIX because the fix isn't necessary.