Description of problem: I've been recently working with some CentOS people using the tick divider patch: http://bugs.centos.org/view.php?id=2189 They pointed out a bug to me: if, under VMware, you boot a RHEL-5.1 (or 5.2) kernel with divider=10 clocksource=pit, the kernel will get softlockups and exhibit all kinds of strange behavior. The end result is that the kernel does not really boot, and doesn't work properly. I've tracked this down to arch/i386/kernel/io_apic.c: check_timer(). In there, there is a check (timer_irq_works()) to make sure that the PIT, when routed through the IO-APIC, actually works. However, it does it by enabling interrupts, mdelay((10*1000)/HZ), and then comparing the difference in the jiffies. In the case of a divided kernel, however, 0 jiffies may have elapsed during the mdelay, instead of the expected 10. I'm still working on a solution, but the simple way to go may be to just change HZ to REAL_HZ. This looks like it will also affect x86_64, although I haven't confirmed it there yet.
Regarding x86_64, the only available clocksource is jiffies. So choosing pit is not an option there. Akemi
FYI, we have only observed the "can't boot" problem when combining divider=10 with clock=pit on Xen with 32-bit kernels. 64-bit kernels boot fine. HOWEVER, we have just discovered that 64-bit kernels with divider=10 and clock=pit (nohpet, nopmtimer) show bad clock skew under Xen. Possibly this is hidden from real users by NTP, but I thought your debugging efforts might be easier if you know the problem does not just affect 32-bit.
If I'm not mistaken, xen kernels are set to 250Hz by default. You might not want to use the divider= option in this case. Akemi
Paravirtualized kernels are 250Hz. All of our measurements are with fully-virtualized ("hvm") kernels for which the HZ rate was compiled-in long ago.
I've been playing with this and I think I have a partial fix, which might make it easier to identify a complete fix. Test by booting with divider=10 but not clock=pit. Manually change clocksource to pit (via writing to sysfs). Havoc erupts instantaneously as the time-of-day clock starts gaining time very quickly. Now, in arch/i386/kernel/i8253.c, remove the line in pit_read that multiplies count by tick_divider (following comment "Adjust to logical ticks"). This changes the problem from instantaneous and devastating, to periodic and useable (though still unacceptable).
Ignore the above partial fix. I've had good luck in limited testing so far with the one-line patch below. No boot problems, no crazy time problems. And the evaluated condition is the same if tick_divider=1 so no change to the normal case.

--- arch/i386/kernel/i8253.c 2008-04-02 11:28:43.000000000 -0600
+++ arch.patch/i386/kernel/i8253.c 2008-04-02 12:25:14.000000000 -0600
@@ -86,7 +86,7 @@
  * Previous attempts to handle these cases intelligently were
  * buggy, so we just do the simple thing now.
  */
- if (count > old_count && jifs == old_jifs) {
+ if (count > old_count && (jifs - old_jifs) < tick_divider) {
  count = old_count;
  }
  old_count = count;
Paravirt uses 250Hz fixed and full virt is all a bit weird if it isn't related to the real Xen timing. The patch looks sensible to me.
Tested the patch under Microsoft Virtual PC (with clocksource=pit divider=10) and it appears to fix all the kernel instability issues. gettimeofday is still drifting, but I'm chalking that up to Microsoft for now.
From the CentOS bug entry: http://kb.vmware.com/kb/1006427 lists the timekeeping best practices for a number of distributions. See also https://bugzilla.redhat.com/show_bug.cgi?id=463573
Updating PM score.
Dan M, We want to pull the patch from Comment #7 into the next RHEL kernel. Given that it is a one-off fix and will never be upstream, I just wanted to make sure that I had your Signed-off-by to go ahead and put it into RHEL. Just let me know. Thanks, Chris Lalancette
Hi Chris -- Signed-off-by: Dan Magenheimer <dan.magenheimer> (Note that I myself have only done limited testing on the fix.)
Great, thanks. I'll definitely get it some QA here before we put it in. Thanks again, Chris Lalancette
in kernel-2.6.18-141.el5. You can download this test kernel from http://people.redhat.com/dzickus/el5 Please do NOT transition this bugzilla state to VERIFIED until our QE team has sent specific instructions indicating when to do so. However, feel free to provide a comment indicating that this fix has been verified.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2009-1243.html