Red Hat Bugzilla – Bug 472523
AMD: Panic if cpu_khz is incorrect
Last modified: 2009-09-03 09:46:43 EDT
Description of problem:
After code inspection, it was discovered that new(ish) AMD processors could boot with an incorrect value for cpu_khz. This in turn leads to an incorrect value for tsc_khz, which then causes significant problems on the system.
Version-Release number of selected component (if applicable): -124.el5
How reproducible: > 1% of the time
Additional info: The code in question was modified in bug 467782. With the new code, if a perfctr cannot be reserved, the code simply uses PERFCTR3, even if it is busy.
If it is busy, the resulting cpu_khz value is questionable.
In this case we should simply panic() and print a message telling the user to reboot because of a hardware error.
I have pushed a patch upstream (http://marc.info/?l=linux-kernel&m=122651496115998&w=2)
which emits a printk warning to the user.
In the Enterprise space, however, I think we should panic.
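The counter-selection behavior described above can be modeled in a few lines of userspace C. This is a hypothetical sketch, not the RHEL5 patch or the kernel source: the names `choose_perfctr`, `struct perfctr_choice`, and the `busy[]` array are invented for illustration, and it assumes four counters, PERFCTR0 to PERFCTR3. With `strict == false` it mirrors the modified code (fall back to the busy PERFCTR3); with `strict == true` it returns -1, which is the point where the proposed Enterprise fix would call panic().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_PERFCTRS 4      /* assumption: PERFCTR0..PERFCTR3 */
#define FALLBACK_PERFCTR 3  /* counter the modified code falls back to */

/* Hypothetical result type; index is -1 when no counter may safely be used. */
struct perfctr_choice {
    int index;
    bool reserved;  /* true only if the counter was free and could be reserved */
};

/*
 * busy[i] is true when counter i is already owned by another user.
 * strict == false: pick the first free counter, else use PERFCTR3 even
 * though it is busy (the cpu_khz result is then questionable).
 * strict == true: refuse instead; the kernel would panic() here.
 */
struct perfctr_choice choose_perfctr(const bool busy[NUM_PERFCTRS], bool strict)
{
    struct perfctr_choice c = { -1, false };

    assert(busy != NULL);
    for (int i = 0; i < NUM_PERFCTRS; i++) {
        if (!busy[i]) {
            c.index = i;
            c.reserved = true;
            return c;
        }
    }
    if (!strict)
        c.index = FALLBACK_PERFCTR;  /* shared with its current owner */
    return c;
}
```

The `reserved` flag makes the distinction explicit: a caller in lenient mode can still detect that it got a busy counter and warn (as the upstream patch does), while a strict caller never receives one at all.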
Created attachment 324472
RHEL5 fix for this issue
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release. Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products. This request is not yet committed for inclusion in an Update
release.
Updating PM score.
You can download this test kernel from http://people.redhat.com/dzickus/el5
Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so. However feel free
to provide a comment indicating that this fix has been verified.
I tested the old kernel: on each boot I recorded the bogomips value from /proc/cpuinfo, then restarted the machine. Over 314 boots, every bogomips value was between 4400 and 4500 except one (4332).
Then I tested the new kernel (160.el5): over 334 boots, no abnormal bogomips value appeared. I'll keep the machine running to try to reproduce an incorrect value.
I'm leaving this bug in ON_QA and will do a code review of the patch.
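The per-boot check described in the test comment above could be automated with a small C helper. This is a hypothetical sketch: `parse_bogomips`, `bogomips_normal`, and `read_boot_bogomips` are invented names, it assumes x86-style /proc/cpuinfo lines of the form `bogomips\t: <value>`, and the 4400 to 4500 band is simply what the tester observed on this one machine, not a general threshold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Find the first line starting with "bogomips" in /proc/cpuinfo-style
 * text and store the number after its ':' in *out. Returns true on success.
 */
bool parse_bogomips(const char *text, double *out)
{
    const char *p = text;

    while ((p = strstr(p, "bogomips")) != NULL) {
        if (p == text || p[-1] == '\n') {          /* match at line start only */
            const char *colon = strchr(p, ':');
            const char *eol = strchr(p, '\n');
            if (colon && (!eol || colon < eol)) {
                char *end;
                double v = strtod(colon + 1, &end); /* skips leading spaces */
                if (end != colon + 1) {
                    *out = v;
                    return true;
                }
            }
        }
        p += strlen("bogomips");
    }
    return false;
}

/* True if one boot's value lies inside the band assumed to be normal. */
bool bogomips_normal(double v, double lo, double hi)
{
    assert(lo < hi);
    return v >= lo && v <= hi;
}

/* Read /proc/cpuinfo and return its first bogomips value, or -1.0 on error. */
double read_boot_bogomips(void)
{
    FILE *f = fopen("/proc/cpuinfo", "r");
    char line[4096];
    double v = -1.0;

    if (!f)
        return -1.0;
    while (fgets(line, sizeof line, f)) {
        if (parse_bogomips(line, &v))
            break;
    }
    fclose(f);
    return v;
}
```

A boot-time script could call `read_boot_bogomips()` once per boot and log any value for which `bogomips_normal(v, 4400.0, 4500.0)` is false, such as the single 4332 reading seen on the old kernel.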
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.