Bug 462410

Summary: [BUG] Losing time with divider=10 option.
Product: Red Hat Enterprise Linux 5 Reporter: Alok Kataria <akataria>
Component: kernel Assignee: Prarit Bhargava <prarit>
Status: CLOSED DUPLICATE QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high Docs Contact:
Priority: medium    
Version: 5.2 CC: clalance, dhecht, garrett, peterm, riel, xen-maint
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-10-17 14:13:36 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 533192    
Attachments:
Description Flags
Fine tune the ACT_HZ value when divider= is enabled. none

Description Alok Kataria 2008-09-16 00:31:56 UTC
Description of problem:

When running the 2.6.18-92.1.10 kernel as a guest under VMware, we have seen that over a 5-hour run the guest OS loses about 3 seconds of time when the divider= option is passed.

After some analysis, we think the problem is due to an error in the calculations that the kernel does. Consider the case where the divider=10 option is passed and CLOCK_TICK_RATE is 1193182.

The LATCH value, based on REAL_HZ, is calculated as 11932; this is used to program the PIT period. OTOH, the LOGICAL_LATCH value, based on HZ, is calculated as 1193. This LOGICAL_LATCH value is used to calculate the TICK_NSEC value, and TICK_NSEC is used to adjust the time of day as jiffies advance. So on each real tick, the TOD is updated by an amount that corresponds to 11930 PIT ticks (10 x 1193), while the PIT actually counted 11932.
As a result we lose 2 PIT ticks every interrupt. This results in an error of 166PPM, which keeps accumulating.
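
The arithmetic above can be reproduced with a small sketch. The tick rate and divider are taken from this report. HZ=1000 and the round-to-nearest form of the latch calculation are assumptions about the 2.6.18-era headers, and the function names are made up for illustration.

```c
/* Values from the report; HZ and the rounding rule are assumptions. */
#define CLOCK_TICK_RATE 1193182UL            /* PIT input clock in Hz */
#define HZ              1000UL               /* logical tick rate (assumed) */
#define TICK_DIVIDER    10UL                 /* divider=10 */
#define REAL_HZ         (HZ / TICK_DIVIDER)  /* real interrupt rate */

/* Round-to-nearest PIT count for one tick at the given rate. */
unsigned long latch_for(unsigned long hz)
{
    return (CLOCK_TICK_RATE + hz / 2) / hz;
}

/* PIT ticks per real interrupt that the time of day never gets credited. */
unsigned long lost_pit_ticks(void)
{
    return latch_for(REAL_HZ) - TICK_DIVIDER * latch_for(HZ);
}

/* Relative error in parts per million of the real PIT period. */
double drift_ppm(void)
{
    return 1e6 * (double)lost_pit_ticks() / (double)latch_for(REAL_HZ);
}

/* Seconds lost over a run of the given length at that error rate. */
double seconds_lost(double run_seconds)
{
    return run_seconds * drift_ppm() / 1e6;
}
```

Under these assumptions latch_for(REAL_HZ) is 11932 and latch_for(HZ) is 1193, so 2 PIT ticks go missing per interrupt. The resulting rate is in the neighbourhood of the figure quoted above, and seconds_lost(5 * 3600.0) comes out at roughly 3 seconds, in line with the observed drift.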

In our opinion the calculations should be improved here by using LATCH to calculate the ACTHZ value.

The current definition of ACTHZ is
#define ACTHZ (SH_DIV (CLOCK_TICK_RATE, LOGICAL_LATCH, 8))

We think it should be
#define ACTHZ ((SH_DIV (CLOCK_TICK_RATE, LATCH, 8)) * tick_divider)
for higher precision.
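
A rough way to compare the two definitions is to evaluate them outside the kernel. In the sketch below, SH_DIV and the TICK_NSEC derivation are reproduced from memory of include/linux/jiffies.h, HZ=1000 is assumed, and TICK_DIVIDER is a compile-time stand-in for the runtime tick_divider variable. The names ACTHZ_CURRENT, ACTHZ_PROPOSED, tick_nsec_from and shortfall_ppm are hypothetical.

```c
/* SH_DIV as recalled from include/linux/jiffies.h: shifted division
 * with rounding, avoiding overflow of the intermediate product. */
#define SH_DIV(NOM, DEN, LSH) ((((NOM) / (DEN)) << (LSH)) \
        + ((((NOM) % (DEN)) << (LSH)) + (DEN) / 2) / (DEN))

#define CLOCK_TICK_RATE 1193182UL
#define HZ              1000UL      /* assumed */
#define TICK_DIVIDER    10UL        /* stand-in for tick_divider */
#define REAL_HZ         (HZ / TICK_DIVIDER)
#define LATCH           ((CLOCK_TICK_RATE + REAL_HZ / 2) / REAL_HZ)
#define LOGICAL_LATCH   ((CLOCK_TICK_RATE + HZ / 2) / HZ)

/* The two ACTHZ definitions discussed in this report. */
#define ACTHZ_CURRENT   (SH_DIV(CLOCK_TICK_RATE, LOGICAL_LATCH, 8))
#define ACTHZ_PROPOSED  ((SH_DIV(CLOCK_TICK_RATE, LATCH, 8)) * TICK_DIVIDER)

/* Nanoseconds credited per jiffy for a given ACTHZ value. */
unsigned long tick_nsec_from(unsigned long acthz)
{
    return SH_DIV(1000000UL * 1000, acthz, 8);
}

/* Time not credited per real interrupt, in ppm of the real PIT period. */
double shortfall_ppm(unsigned long acthz)
{
    double real_ns = 1e9 * (double)LATCH / (double)CLOCK_TICK_RATE;
    double credited_ns = (double)(TICK_DIVIDER * tick_nsec_from(acthz));
    return 1e6 * (real_ns - credited_ns) / real_ns;
}
```

With these inputs, tick_nsec_from(ACTHZ_CURRENT) is 999848 and tick_nsec_from(ACTHZ_PROPOSED) is 1000000, so the per-interrupt shortfall shrinks by roughly an order of magnitude but does not reach zero. This only checks the arithmetic; it says nothing about how the rest of the timekeeping code consumes these values.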

This problem should be seen on any system where the divider= option is used.

Please have a look.

Thanks,
Alok

Comment 1 Chris Lalancette 2008-09-18 10:44:59 UTC
I did some testing here.  I can definitely see the clock drift when booting the RHEL-5 i386 kernel with "clocksource=pit divider=10" under a fully virtualized Xen guest.  However, I don't think your analysis is quite right.  First, we can't actually change ACTHZ as you suggest because that would make it not a constant anymore, and TICK_NSEC is used to make pre-processor decisions.  That's minor, I hacked around that temporarily by modifying the "tick_nsec" value in kernel/timer.c.  However, doing that caused the time drift to become much worse, not better.  With a straight 92 kernel, I was falling behind about .005 seconds per second.  With this patch in place, I'm falling behind about .01 seconds per second, so the problem has doubled.  I'll have to do some more investigation here to see what exactly is going on.

Chris Lalancette

Comment 2 Alok Kataria 2008-09-18 17:15:27 UTC
Created attachment 317099 [details]
Fine tune the ACT_HZ value when divider= is enabled.

Comment 3 Alok Kataria 2008-09-18 17:16:11 UTC
(In reply to comment #1)
> I did some testing here.  I can definitely see the clock drift when booting the
> RHEL-5 i386 kernel with "clocksource=pit divider=10" under a fully virtualized
> Xen guest.  However, I don't think your analysis is quite right.  First, we
> can't actually change ACTHZ as you suggest because that would make it not a
> constant anymore, and TICK_NSEC is used to make pre-processor decisions. 
> That's minor, I hacked around that temporarily by modifying the "tick_nsec"
> value in kernel/timer.c.

IMO, you should also change the clocksource_jiffies.mult value in kernel/time/jiffies.c to reflect this. Have a look at this experimental patch. Another worry is that we may not be changing all the places where TICK_NSEC needs to reflect the more fine-tuned value.



>  However, doing that caused the time drift to become
> much worse, not better.  With a straight 92 kernel, I was falling behind about
> .005 seconds per second.  With this patch in place, I'm falling behind about
> .01 seconds per second, so the problem has doubled.  I'll have to do some more
> investigation here to see what exactly is going on.

Also it would be great if you could post your changes. 

Alok
> 
> Chris Lalancette

Comment 4 Rik van Riel 2009-01-12 15:51:38 UTC
Reassigning to kernel, since this bug is not Xen related.

Comment 5 Chris Lalancette 2009-01-12 16:07:05 UTC
Ah, true, but I'll assign it back to myself.  I do intend to do some work with the tick divider in the near future.

Chris Lalancette

Comment 6 RHEL Program Management 2009-02-16 15:37:12 UTC
Updating PM score.

Comment 9 Alok Kataria 2010-08-19 18:20:12 UTC
PR 463573 should fix this too so am duping it.

Comment 11 RHEL Program Management 2010-12-07 09:55:14 UTC
This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.6 and Red Hat does not plan to fix this issue in the currently developed update.

Contact your manager or support representative in case you need to escalate this bug.

Comment 12 RHEL Program Management 2011-06-20 21:13:21 UTC
This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.7 and Red Hat does not plan to fix this issue in the currently developed update.

Contact your manager or support representative in case you need to escalate this bug.

Comment 13 Prarit Bhargava 2011-10-17 14:13:36 UTC

*** This bug has been marked as a duplicate of bug 463573 ***