Bug 462410 - [BUG] Losing time with divider=10 option.
Status: CLOSED DUPLICATE of bug 463573
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.2
Hardware: All
OS: Linux
Priority: medium
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Prarit Bhargava
QA Contact: Red Hat Kernel QE team
Depends On:
Blocks: 533192
Reported: 2008-09-15 20:31 EDT by Alok Kataria
Modified: 2011-10-17 10:13 EDT (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-10-17 10:13:36 EDT

Attachments
Fine tune the ACT_HZ value when divider= is enabled. (3.52 KB, patch)
2008-09-18 13:15 EDT, Alok Kataria
Description Alok Kataria 2008-09-15 20:31:56 EDT
Description of problem:

When running the 2.6.18-92.1.10 kernel as a guest under VMware, we have seen that over a run of 5 hours the guest OS loses about 3 seconds of time when the divider= option is passed.

After some analysis, we think the problem is due to an error in the kernel's calculations. Consider the case where the divider=10 option is passed with a CLOCK_TICK_RATE of 1193182.

The LATCH value, based on REAL_HZ, is calculated as 11932; this is used to program the PIT period. OTOH, the LOGICAL_LATCH value, based on HZ, is calculated as 1193. This LOGICAL_LATCH value is used to calculate the TICK_NSEC value, and TICK_NSEC is used to adjust the time of day (TOD) as jiffies advance. So on each real tick, which covers 10 logical ticks, the TOD is advanced by an amount corresponding to only 10 * 1193 = 11930 PIT ticks.
As a result we lose 2 PIT ticks on every interrupt. This results in an error of about 166 PPM, which keeps accumulating.

In our opinion, the calculations should be improved here by using LATCH to calculate the ACTHZ value.

The current definition of ACTHZ is:
#define ACTHZ (SH_DIV (CLOCK_TICK_RATE, LOGICAL_LATCH, 8))

We think it should be:
#define ACTHZ ((SH_DIV (CLOCK_TICK_RATE, LATCH, 8)) * tick_divider)
for higher precision.

This problem should be seen on any system where the divider= option is used.

Please have a look.

Thanks,
Alok
Comment 1 Chris Lalancette 2008-09-18 06:44:59 EDT
I did some testing here.  I can definitely see the clock drift when booting the RHEL-5 i386 kernel with "clocksource=pit divider=10" under a fully virtualized Xen guest.  However, I don't think your analysis is quite right.  First, we can't actually change ACTHZ as you suggest, because that would make it no longer a constant, and TICK_NSEC is used to make pre-processor decisions.  That's minor; I hacked around it temporarily by modifying the "tick_nsec" value in kernel/timer.c.  However, doing that made the time drift much worse, not better.  With a straight 92 kernel, I was falling behind about 0.005 seconds per second.  With this patch in place, I'm falling behind about 0.01 seconds per second, so the problem has doubled.  I'll have to do some more investigation to see what exactly is going on.

Chris Lalancette
Comment 2 Alok Kataria 2008-09-18 13:15:27 EDT
Created attachment 317099
Fine tune the ACT_HZ value when divider= is enabled.
Comment 3 Alok Kataria 2008-09-18 13:16:11 EDT
(In reply to comment #1)
> I did some testing here.  I can definitely see the clock drift when booting the
> RHEL-5 i386 kernel with "clocksource=pit divider=10" under a fully virtualized
> Xen guest.  However, I don't think your analysis is quite right.  First, we
> can't actually change ACTHZ as you suggest because that would make it not a
> constant anymore, and TICK_NSEC is used to make pre-processor decisions. 
> That's minor, I hacked around that temporarily by modifying the "tick_nsec"
> value in kernel/timer.c.

IMO, you should also change the clocksource_jiffies.mult value in kernel/time/jiffies.c to reflect this. Have a look at this experimental patch. Another worry is that we may not be changing every place where TICK_NSEC needs to reflect the more finely tuned value.

>  However, doing that caused the time drift to become
> much worse, not better.  With a straight 92 kernel, I was falling behind about
> .005 seconds per second.  With this patch in place, I'm falling behind about
> .01 seconds per second, so the problem has doubled.  I'll have to do some more
> investigation here to see what exactly is going on.

Also it would be great if you could post your changes. 

Alok
> 
> Chris Lalancette
Comment 4 Rik van Riel 2009-01-12 10:51:38 EST
Reassigning to kernel, since this bug is not Xen related.
Comment 5 Chris Lalancette 2009-01-12 11:07:05 EST
Ah, true, but I'll assign it back to myself.  I do intend to do some work with the tick divider in the near future.

Chris Lalancette
Comment 6 RHEL Product and Program Management 2009-02-16 10:37:12 EST
Updating PM score.
Comment 9 Alok Kataria 2010-08-19 14:20:12 EDT
PR 463573 should fix this too, so I am duping it.
Comment 11 RHEL Product and Program Management 2010-12-07 04:55:14 EST
This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.6, and Red Hat does not plan to fix this issue in the currently developed update.

Contact your manager or support representative in case you need to escalate this bug.
Comment 12 RHEL Product and Program Management 2011-06-20 17:13:21 EDT
This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.7, and Red Hat does not plan to fix this issue in the currently developed update.

Contact your manager or support representative in case you need to escalate this bug.
Comment 13 Prarit Bhargava 2011-10-17 10:13:36 EDT

*** This bug has been marked as a duplicate of bug 463573 ***
