Bug 562006

Summary: WARNING: APIC timer calibration may be wrong
Product: Red Hat Enterprise Linux 5
Component: kernel
Version: 5.4
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: low
Keywords: Regression
Target Milestone: rc
Reporter: Alok Kataria <akataria>
Assignee: Prarit Bhargava <prarit>
QA Contact: Red Hat Kernel QE team <kernel-qe>
CC: drjones, garrett, jsavanyo, jwilson, pbhaskar, prarit, qcai
Doc Type: Bug Fix
Clones: 665197
Bug Blocks: 665197
Last Closed: 2010-03-30 07:46:00 UTC
Attachments:
  Patch which introduces the warning. (Flags: none)
  Proposed patch, which bumps the maximum difference value to 5000 (Flags: none)
  RHEL5 fix for this issue (Flags: none)
  Bump apiccalibrationdiff to 10000 when running on VMware. (Flags: none)

Description Alok Kataria 2010-02-04 23:22:36 UTC
Created attachment 388914
Patch which introduces the warning.

Description of problem:
While booting newer RHEL 5.4 kernels, we see the following warning:
"WARNING calibrate_APIC_clock: the APIC timer calibration may be wrong."

We started seeing this warning with kernel version 2.6.18-164.6.1.el5.

After investigating, we found that this warning was added by the following commit:
commit 79792ece139c499d9a9133138851401f0c4faa64
Author: Prarit Bhargava <prarit>
Date:   Wed Jul 1 08:42:54 2009 -0400

This commit was part of the resolution for BZ 503957.

Looking at the patch, we think that the MAX_DIFFERENCE value of 1000 cycles it introduces is too aggressive for virtualized systems. APIC and TSC reads do take longer than 1000 cycles when done from inside a VM, because of the hypervisor exits they incur.
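To illustrate why a fixed cycle budget behaves differently on bare metal and in a VM, here is a minimal userspace simulation. It assumes the check brackets an APIC timer read between two TSC reads, rejects the sample when the TSC delta exceeds the limit, and retries a few times. The struct, the function names, the retry count and all cycle costs are hypothetical and are not taken from commit 79792ece.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_DIFFERENCE 1000ULL /* cycles: the limit introduced by the patch */
#define MAX_TRIES 5            /* hypothetical retry budget */

/* Hypothetical cost model for one calibration sample. */
struct read_cost {
	uint64_t rdtsc;     /* cycles spent in one RDTSC */
	uint64_t apic_read; /* cycles spent reading the APIC timer count */
	uint64_t smi_spike; /* one-off extra delay that hits the first attempt only */
};

/* TSC delta for one "TSC, APIC, TSC" bracket: the APIC read plus the
 * second RDTSC, plus the transient spike on the first attempt. */
static uint64_t bracket_cycles(const struct read_cost *c, int attempt)
{
	uint64_t delta = c->apic_read + c->rdtsc;

	if (attempt == 0)
		delta += c->smi_spike;
	return delta;
}

/* Returns true if some attempt stayed within max_diff cycles.
 * In this model, false corresponds to printing the warning. */
static bool calibrate_sample(const struct read_cost *c, uint64_t max_diff)
{
	for (int attempt = 0; attempt < MAX_TRIES; attempt++) {
		if (bracket_cycles(c, attempt) <= max_diff)
			return true;
	}
	return false;
}
```

With hypothetical bare-metal costs { .rdtsc = 30, .apic_read = 100, .smi_spike = 50000 }, calibrate_sample() succeeds on the second attempt. With hypothetical VM costs { .rdtsc = 1200, .apic_read = 2500, .smi_spike = 0 }, every attempt fails at 1000 cycles and passes at 5000. A transient event such as an SMI spoils only one attempt, whereas the exit cost is paid on every attempt, so retrying cannot help and only a larger limit does.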

IMO we should bump the maximum difference from 1000 to 5000 cycles. This still limits the error that an SMI (or a similar event) can introduce into APIC calibration to 100ppm on a 1GHz processor, while letting the algorithm work well for virtual machines.
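The 100ppm figure comes from dividing the undetected delay by the length of the calibration window. The report does not state the window length; 5000 cycles at 1GHz is 5 microseconds, which is 100ppm of a 50 ms window, so the sketch below assumes a 50 ms window. The helper name is hypothetical.

```c
#include <stdint.h>

/*
 * Worst-case relative calibration error, in parts per million, caused by
 * an undetected delay of max_diff_cycles during a calibration window of
 * window_us microseconds on a CPU running at cpu_mhz MHz.
 * The window length is an assumption; the bug report does not state it.
 * Integer division truncates, so the result is rounded down.
 */
static uint64_t calibration_error_ppm(uint64_t max_diff_cycles,
				      uint64_t cpu_mhz, uint64_t window_us)
{
	/* Cycles elapsed during the whole window = cpu_mhz * window_us. */
	return max_diff_cycles * 1000000ULL / (cpu_mhz * window_us);
}
```

Under that assumption, calibration_error_ppm(5000, 1000, 50000) gives 100, the original 1000-cycle limit gives 20, and a 10000-cycle limit gives 200. A longer calibration window shrinks the bound proportionally.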


Version-Release number of selected component (if applicable):


How reproducible:
Every time...

Steps to Reproduce:
1. Boot one of these newer kernels inside a VM (we used VMware Workstation 7.0) and look for the APIC calibration warning in dmesg.

Comment 1 Alok Kataria 2010-02-04 23:29:47 UTC
Created attachment 388921
Proposed patch, which bumps the maximum difference value to 5000

Comment 2 Prarit Bhargava 2010-02-08 15:19:27 UTC
5000 seems really long.  I'll check with the virt team to see if there is a better way of handling this.

P.

Comment 3 Prarit Bhargava 2010-02-08 16:18:42 UTC
Created attachment 389560
RHEL5 fix for this issue

Bump the default value to 5000 cycles and add a boot option.
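Comment 13 names the resulting boot parameter apiccalibrationdiff. The attached patch is not reproduced here; as a rough userspace sketch of the "default plus boot-time override" idea, the code below scans a space-separated kernel command line for apiccalibrationdiff=N and falls back to the 5000-cycle default. The helper is hypothetical and does not mirror the kernel's own option-parsing machinery.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define APIC_CALIBRATION_DIFF_DEFAULT 5000UL /* cycles, default from this fix */

/* Hypothetical helper: returns N from "apiccalibrationdiff=N" in a
 * space-separated command line, or the default if the option is absent,
 * malformed or zero. */
static unsigned long parse_apic_calibration_diff(const char *cmdline)
{
	static const char key[] = "apiccalibrationdiff=";
	const size_t keylen = sizeof(key) - 1;
	const char *p = cmdline;

	while ((p = strstr(p, key)) != NULL) {
		/* Only accept the key at the start of a word. */
		if (p == cmdline || p[-1] == ' ') {
			const char *num = p + keylen;
			char *end;
			unsigned long val;

			/* Reject signs, spaces and empty values up front. */
			if (!isdigit((unsigned char)*num))
				return APIC_CALIBRATION_DIFF_DEFAULT;
			val = strtoul(num, &end, 10);
			if ((*end == '\0' || *end == ' ') && val > 0)
				return val;
			return APIC_CALIBRATION_DIFF_DEFAULT;
		}
		p += keylen; /* key was embedded in another word; keep looking */
	}
	return APIC_CALIBRATION_DIFF_DEFAULT;
}
```

For example, a command line ending in "apiccalibrationdiff=10000" yields 10000, while one without the option yields the 5000-cycle default.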

Comment 5 RHEL Program Management 2010-02-08 16:33:01 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 12 errata-xmlrpc 2010-03-30 07:46:00 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html

Comment 13 Alok Kataria 2010-12-23 00:04:09 UTC
Created attachment 470348
Bump apiccalibrationdiff to 10000 when running on VMware.

Some of our customers have complained about still seeing this warning. 
After some analysis, we noticed that on some AMD Opteron processors, the hypervisor (HV) exits caused by RDTSC and APIC reads can exceed the 5000-cycle limit.

One could argue that affected users can simply use the apiccalibrationdiff boot parameter. The counter-argument is that people expect things to work right out of the box, i.e. without any warnings in dmesg.

Given the above, I think we should bump this value to around 10000 cycles so that the warning doesn't bother anyone. The attached patch does exactly that, but only when running on VMware, so that behavior in other environments is unaffected.
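The patch's VMware detection method is not shown in this report. As a minimal sketch, assuming the caller has already obtained the hypervisor vendor signature (VMware reports "VMwareVMware" through CPUID leaf 0x40000000), the default could be chosen as below. The function and macro names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define APIC_CALIBRATION_DIFF_DEFAULT 5000UL  /* cycles, generic default */
#define APIC_CALIBRATION_DIFF_VMWARE 10000UL  /* cycles, proposed for VMware */

/* Hypervisor vendor signature VMware reports via CPUID leaf 0x40000000. */
static const char vmware_signature[] = "VMwareVMware";

/* hv_signature: the 12-byte hypervisor vendor string, or NULL when no
 * hypervisor was detected. Reading it via CPUID is left out of this sketch. */
static unsigned long default_apic_calibration_diff(const char *hv_signature)
{
	if (hv_signature != NULL &&
	    strncmp(hv_signature, vmware_signature,
		    sizeof(vmware_signature) - 1) == 0)
		return APIC_CALIBRATION_DIFF_VMWARE;
	/* Bare metal and other hypervisors keep the existing default. */
	return APIC_CALIBRATION_DIFF_DEFAULT;
}
```

Keeping the larger limit behind a VMware check matches the stated goal of not changing behavior elsewhere; an explicit apiccalibrationdiff= boot option would still be expected to override whichever default is picked here.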

Please consider this trivial patch for the next update. Thanks.