Bug 427588
Summary: | [RHEL 5.2]: Tick divider bug when using clocksource=pit | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Chris Lalancette <clalance> |
Component: | kernel | Assignee: | Chris Lalancette <clalance> |
Status: | CLOSED ERRATA | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 5.2 | CC: | ahecox, amyagi, dan.magenheimer, dzickus, ian, jeff, johnny, k.georgiou, michael, mishu, nbryant, pasteur, riek, swoodcock, xen-maint |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Fixed In Version: | Doc Type: | Bug Fix | |
Last Closed: | 2009-09-02 08:18:37 UTC | Type: | --- |
Bug Blocks: | 483701 |
Description
Chris Lalancette
2008-01-04 22:03:46 UTC
Regarding x86_64, the only available clocksource is jiffies. So choosing pit is not an option there.

Akemi

FYI, we have only observed the "can't boot" problem when combining divider=10 with clock=pit on Xen on 32-bit kernels. 64-bit kernels boot fine. HOWEVER, we have just discovered that 64-bit kernels with divider=10 and clock=pit (nohpet, nopmtimer) suffer bad clock skew under Xen. Possibly this is hidden from real users by NTP, but I thought your debugging efforts might be easier if you know the problem does not just affect 32-bit.

If I'm not mistaken, xen kernels are set to 250Hz by default. You might not want to use the divider= option in this case.

Akemi

Paravirtualized kernels are 250Hz. All of our measurements are with fully-virtualized ("hvm") kernels, for which the HZ rate was compiled in long ago.

I've been playing with this and I think I have a partial fix, which might make it easier to identify a complete fix. To test:

1. Boot with divider=10 but not clock=pit.
2. Manually change the clocksource to pit (by writing to sysfs). Havoc erupts instantaneously as the time-of-day clock starts gaining time very quickly.
3. In arch/i386/kernel/i8253.c, remove the line in pit_read that multiplies count by tick_divider (following the comment "Adjust to logical ticks").

This changes the problem from instantaneous and devastating to periodic and usable (though still unacceptable).

Ignore the above partial fix. I've had good luck in limited testing so far with the one-line patch below. No boot problems, no crazy time problems. And the evaluated condition is the same if tick_divider=1, so there is no change to the normal case.

    --- arch/i386/kernel/i8253.c 2008-04-02 11:28:43.000000000 -0600
    +++ arch.patch/i386/kernel/i8253.c 2008-04-02 12:25:14.000000000 -0600
    @@ -86,7 +86,7 @@
          * Previous attempts to handle these cases intelligently were
          * buggy, so we just do the simple thing now.
          */
    -    if (count > old_count && jifs == old_jifs) {
    +    if (count > old_count && (jifs - old_jifs) < tick_divider) {
             count = old_count;
         }
         old_count = count;

Paravirt uses 250Hz fixed, and full virt is all a bit weird if it isn't related to the real Xen timing. The patch looks sensible to me.

Tested the patch under Microsoft Virtual PC (with clocksource=pit divider=10) and it appears to fix all the kernel instability issues. gettimeofday is still drifting, but I'm chalking that up to Microsoft for now.

From the CentOS bug entry: http://kb.vmware.com/kb/1006427 lists the timekeeping best practices for a number of distributions.

See also https://bugzilla.redhat.com/show_bug.cgi?id=463573

Updating PM score.

Dan M,
We want to pull the patch from Comment #7 into the next RHEL kernel. Given that it is a one-off fix and will never go upstream, I just wanted to make sure that I had your Signed-off-by before putting it into RHEL. Just let me know.

Thanks,
Chris Lalancette

Hi Chris --

Signed-off-by: Dan Magenheimer <dan.magenheimer>

(Note that I myself have only done limited testing on the fix.)

Great, thanks. I'll definitely get it some QA here before we put it in.

Thanks again,
Chris Lalancette

in kernel-2.6.18-141.el5

You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team has sent specific instructions indicating when to do so. However, feel free to provide a comment indicating that this fix has been verified.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1243.html
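
For readers who want to compare the two guard conditions from the one-line patch in this report, here is a minimal standalone sketch. It is not the kernel code: the function names clamp_original and clamp_patched are hypothetical, the PIT hardware and locking are not modelled, and the only thing taken from the report is the pair of if conditions. It assumes jiffies is read as an unsigned 32-bit value, so the subtraction stays correct across wraparound.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Guard as it stood before the patch: a count that rises between two
 * reads is treated as a glitch only when jiffies has not changed at all.
 */
int clamp_original(int count, int old_count,
                   uint32_t jifs, uint32_t old_jifs)
{
    if (count > old_count && jifs == old_jifs)
        count = old_count;
    return count;
}

/*
 * Guard after the patch: a rising count is treated as a glitch whenever
 * jiffies has advanced by fewer than tick_divider ticks since the last
 * read. The unsigned subtraction is wraparound-safe.
 */
int clamp_patched(int count, int old_count,
                  uint32_t jifs, uint32_t old_jifs,
                  uint32_t tick_divider)
{
    assert(tick_divider >= 1); /* sketch-only sanity check */
    if (count > old_count && (jifs - old_jifs) < tick_divider)
        count = old_count;
    return count;
}
```

With tick_divider set to 1, (jifs - old_jifs) < 1 can only be true when the two unsigned values are equal, so clamp_patched behaves exactly like clamp_original, which matches the remark that the normal case is unchanged. With tick_divider set to 10, a rising count is still clamped if jiffies has moved by, say, 3, and is accepted only once jiffies has moved by 10 or more.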