Description of problem: When running the 2.6.18-92.1.10 kernel as a guest under VMware, we have seen that over a run of 5 hours the guest OS loses about 3 seconds of time when the divider= option is passed.

After doing some analysis, we think the problem is due to an error in the calculations the kernel does. Consider the case where divider=10 is passed and the clock_tick_rate is 1193182. The LATCH value, based on REAL_HZ, comes out to 11932; this is used to program the PIT period. OTOH, the LOGICAL_LATCH value, based on HZ, comes out to 1193. This LOGICAL_LATCH value is used to calculate TICK_NSEC, and TICK_NSEC is used to adjust the time of day as jiffies advance. So on each real tick, the amount the TOD is updated by corresponds to only 11930 PIT ticks. As a result we lose 2 PIT ticks every interrupt. This is an error of about 166 ppm which keeps accumulating.

In our opinion the calculations should be improved here by using LATCH to calculate the ACTHZ value. The current definition of ACTHZ is

#define ACTHZ (SH_DIV (CLOCK_TICK_RATE, LOGICAL_LATCH, 8))

We think, for higher precision, it should be

#define ACTHZ ((SH_DIV (CLOCK_TICK_RATE, LATCH, 8)) * tick_divider)

This problem should be seen on any system where the divider= option is used. Please have a look.

Thanks,
Alok
I did some testing here. I can definitely see the clock drift when booting the RHEL-5 i386 kernel with "clocksource=pit divider=10" under a fully virtualized Xen guest. However, I don't think your analysis is quite right.

First, we can't actually change ACTHZ as you suggest, because that would make it no longer a constant, and TICK_NSEC is used to make pre-processor decisions. That's minor; I hacked around it temporarily by modifying the "tick_nsec" value in kernel/timer.c. However, doing that made the time drift much worse, not better. With a straight -92 kernel, I was falling behind about 0.005 seconds per second. With this patch in place, I'm falling behind about 0.01 seconds per second, so the problem has doubled. I'll have to do some more investigation here to see what exactly is going on.

Chris Lalancette
Created attachment 317099 [details] Fine tune the ACT_HZ value when divider= is enabled.
(In reply to comment #1)
> I did some testing here. I can definitely see the clock drift when booting the
> RHEL-5 i386 kernel with "clocksource=pit divider=10" under a fully virtualized
> Xen guest. However, I don't think your analysis is quite right. First, we
> can't actually change ACTHZ as you suggest because that would make it not a
> constant anymore, and TICK_NSEC is used to make pre-processor decisions.
> That's minor, I hacked around that temporarily by modifying the "tick_nsec"
> value in kernel/timer.c.

IMO, you should also change the clocksource_jiffies.mult value in kernel/time/jiffies.c to reflect this. Have a look at this experimental patch. Another worry is that we may not be changing all the places where TICK_NSEC needs to reflect the more finely tuned value.

> However, doing that caused the time drift to become
> much worse, not better. With a straight 92 kernel, I was falling behind about
> .005 seconds per second. With this patch in place, I'm falling behind about
> .01 seconds per second, so the problem has doubled. I'll have to do some more
> investigation here to see what exactly is going on.
>
> Chris Lalancette

Also it would be great if you could post your changes.

Alok
Reassigning to kernel, since this bug is not Xen-related.
Ah, true, but I'll assign it back to myself. I do intend to do some work with the tick divider in the near future. Chris Lalancette
Updating PM score.
PR 463573 should fix this too, so I am duping it.
This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.6, and Red Hat does not plan to fix this issue in the currently developed update. Contact your manager or support representative in case you need to escalate this bug.
This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.7, and Red Hat does not plan to fix this issue in the currently developed update. Contact your manager or support representative in case you need to escalate this bug.
*** This bug has been marked as a duplicate of bug 463573 ***