Bug 462410

Summary:

[BUG] Losing time with divider=10 option.

Product:

Red Hat Enterprise Linux 5

Reporter:

Alok Kataria <akataria>

Component:

kernel

Assignee:

Prarit Bhargava <prarit>

Status:

CLOSED DUPLICATE

QA Contact:

Red Hat Kernel QE team <kernel-qe>

Severity:

high

Docs Contact:

Priority:

medium

Version:

5.2

CC:

clalance, dhecht, garrett, peterm, riel, xen-maint

Target Milestone:

Target Release:

---

Hardware:

All

OS:

Linux

Whiteboard:

Fixed In Version:

Doc Type:

Bug Fix

Last Closed:

2011-10-17 14:13:36 UTC

Bug Blocks:

533192

Attachments:

Description	Flags
Fine tune the ACT_HZ value when divider= is enabled.	none

Description Alok Kataria 2008-09-16 00:31:56 UTC

Description of problem:

When running the 2.6.18-92.1.10 kernel as a guest under VMware, we have seen that over a run of 5 hours the guest OS loses about 3 sec of time when the divider= option is passed.

After doing some analysis, we think the problem is due to an error in the calculations that the kernel does. Consider the case where the divider=10 option is passed and the clock_tick_rate is 1193182.

The LATCH value, based on REAL_HZ, comes to 11932; this is used to program the PIT period. OTOH, the LOGICAL_LATCH value, based on HZ, comes to 1193. This LOGICAL_LATCH value is used to calculate the TICK_NSEC value. TICK_NSEC is used to adjust time of day based on jiffies advancement. So in a real tick, the value that TOD is updated by corresponds to 11930 PIT ticks (10 x 1193).
As a result we lose 2 PIT ticks every interrupt. This results in an error of 166PPM which keeps accumulating.
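
The arithmetic above can be checked with a small sketch. It assumes both latch values use the usual round-to-nearest form (CLOCK_TICK_RATE + hz/2) / hz, and the helper names (latch_for, lost_pit_ticks, drift_ppm) are hypothetical, chosen only for this illustration. Under this simplified model the drift for divider=10 comes out near 2/11932, in the same range as the figure quoted above, which is consistent with roughly 3 sec lost over 5 hours.

```c
#include <stdint.h>

/* Values from the report: PIT input clock and the logical HZ. */
#define CLOCK_TICK_RATE 1193182LL
#define HZ              1000LL

/* Round-to-nearest division (assumed form of the LATCH calculation). */
static int64_t latch_for(int64_t hz)
{
    return (CLOCK_TICK_RATE + hz / 2) / hz;
}

/* PIT counts per real interrupt that the time-of-day update does not cover. */
static int64_t lost_pit_ticks(int64_t divider)
{
    int64_t latch = latch_for(HZ / divider);   /* programmed into the PIT */
    int64_t logical_latch = latch_for(HZ);     /* basis of TICK_NSEC */
    return latch - divider * logical_latch;
}

/* Resulting drift in parts per million, relative to the real PIT period. */
static double drift_ppm(int64_t divider)
{
    return 1e6 * (double)lost_pit_ticks(divider)
               / (double)latch_for(HZ / divider);
}
```

Calling lost_pit_ticks(10) reproduces the 2 lost PIT ticks per interrupt; drift_ppm(10) gives the corresponding rate.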

In our opinion the calculations should be improved here by using LATCH to calculate the ACTHZ value 

Current definition of ACTHZ is 
#define ACTHZ (SH_DIV (CLOCK_TICK_RATE, LOGICAL_LATCH, 8))

We think it should be
#define ACTHZ ((SH_DIV (CLOCK_TICK_RATE, LATCH, 8)) * tick_divider)
for higher precision.
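
To compare the two definitions numerically, here is a rough sketch. It assumes SH_DIV has the same shape as the macro in include/linux/jiffies.h, that TICK_NSEC is derived as SH_DIV(1000000 * 1000, ACTHZ, 8), and that LATCH = 11932 and LOGICAL_LATCH = 1193 as in the divider=10 case above. The function names are hypothetical. This is arithmetic only and says nothing about how the kernel behaves once the value is changed.

```c
#include <stdint.h>

#define CLOCK_TICK_RATE 1193182ULL
#define LATCH           11932ULL   /* divider=10 case from the report */
#define LOGICAL_LATCH   1193ULL

/* (NOM / DEN) << LSH, with the remainder rounded to nearest. */
#define SH_DIV(NOM, DEN, LSH) \
    ((((NOM) / (DEN)) << (LSH)) + \
     ((((NOM) % (DEN)) << (LSH)) + (DEN) / 2) / (DEN))

/* ACTHZ as currently defined: based on LOGICAL_LATCH. */
static uint64_t acthz_current(void)
{
    return SH_DIV(CLOCK_TICK_RATE, LOGICAL_LATCH, 8);
}

/* ACTHZ as proposed: based on LATCH, scaled by the divider. */
static uint64_t acthz_proposed(uint64_t tick_divider)
{
    return SH_DIV(CLOCK_TICK_RATE, LATCH, 8) * tick_divider;
}

/* Nanoseconds credited per logical tick for a given ACTHZ. */
static uint64_t tick_nsec_from(uint64_t acthz)
{
    return SH_DIV(1000000ULL * 1000ULL, acthz, 8);
}

/* Drift in PPM: real PIT period versus time credited per interrupt. */
static double drift_ppm(uint64_t tick_nsec, uint64_t tick_divider)
{
    double real_ns = 1e9 * (double)LATCH / (double)CLOCK_TICK_RATE;
    double credited_ns = (double)(tick_divider * tick_nsec);
    return 1e6 * (real_ns - credited_ns) / real_ns;
}
```

With these assumptions, drift_ppm(tick_nsec_from(acthz_current()), 10) shows the error of the current definition and drift_ppm(tick_nsec_from(acthz_proposed(10)), 10) shows the residual error of the proposed one.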

This problem should be seen on any system where the divider= option is used.

Please have a look.

Thanks,
Alok

Comment 1 Chris Lalancette 2008-09-18 10:44:59 UTC

I did some testing here.  I can definitely see the clock drift when booting the RHEL-5 i386 kernel with "clocksource=pit divider=10" under a fully virtualized Xen guest.  However, I don't think your analysis is quite right.  First, we can't actually change ACTHZ as you suggest because that would make it not a constant anymore, and TICK_NSEC is used to make pre-processor decisions.  That's minor, I hacked around that temporarily by modifying the "tick_nsec" value in kernel/timer.c.  However, doing that caused the time drift to become much worse, not better.  With a straight 92 kernel, I was falling behind about .005 seconds per second.  With this patch in place, I'm falling behind about .01 seconds per second, so the problem has doubled.  I'll have to do some more investigation here to see what exactly is going on.

Chris Lalancette

Comment 2 Alok Kataria 2008-09-18 17:15:27 UTC

Created attachment 317099 [details]
Fine tune the ACT_HZ value when divider= is enabled.

Comment 3 Alok Kataria 2008-09-18 17:16:11 UTC

(In reply to comment #1)
> I did some testing here.  I can definitely see the clock drift when booting the
> RHEL-5 i386 kernel with "clocksource=pit divider=10" under a fully virtualized
> Xen guest.  However, I don't think your analysis is quite right.  First, we
> can't actually change ACTHZ as you suggest because that would make it not a
> constant anymore, and TICK_NSEC is used to make pre-processor decisions. 
> That's minor, I hacked around that temporarily by modifying the "tick_nsec"
> value in kernel/timer.c.

IMO, you should also change the clocksource_jiffies.mult value in kernel/time/jiffies.c to reflect this. Have a look at this experimental patch. Another worry is that we may not be changing all the places where TICK_NSEC needs to reflect the more finely tuned value.



>  However, doing that caused the time drift to become
> much worse, not better.  With a straight 92 kernel, I was falling behind about
> .005 seconds per second.  With this patch in place, I'm falling behind about
> .01 seconds per second, so the problem has doubled.  I'll have to do some more
> investigation here to see what exactly is going on.

Also it would be great if you could post your changes. 

Alok
> 
> Chris Lalancette

Comment 4 Rik van Riel 2009-01-12 15:51:38 UTC

Reassigning to kernel, since this bug is not Xen related.

Comment 5 Chris Lalancette 2009-01-12 16:07:05 UTC

Ah, true, but I'll assign it back to myself.  I do intend to do some work with the tick divider in the near future.

Chris Lalancette

Comment 6 RHEL Program Management 2009-02-16 15:37:12 UTC

Updating PM score.

Comment 9 Alok Kataria 2010-08-19 18:20:12 UTC

PR 463573 should fix this too so am duping it.

Comment 11 RHEL Program Management 2010-12-07 09:55:14 UTC

This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.6 and Red Hat does not plan to fix this issue in the currently developed update.

Contact your manager or support representative in case you need to escalate this bug.

Comment 12 RHEL Program Management 2011-06-20 21:13:21 UTC

This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.7 and Red Hat does not plan to fix this issue in the currently developed update.

Contact your manager or support representative in case you need to escalate this bug.

Comment 13 Prarit Bhargava 2011-10-17 14:13:36 UTC


*** This bug has been marked as a duplicate of bug 463573 ***