Description of problem: For this new kernel, ntpd is resetting time every 45 minues on the order of +0.2 seconds. Feb 24 06:09:56 white ntpd[2261]: time reset +0.186409 s Version-Release number of selected component (if applicable): kernel-2.6.20-1.2940.fc7 How reproducible: Always Steps to Reproduce: 1. Look at /var/log/messages 2. 3. Actual results: About every 45 minutes, there is a new ntpd log message because it needs to advance the clock about +0.2 seconds Expected results: The clock should be accurate enough to avoid resetting the clock. Additional info:
Created attachment 148745 [details] The log entries in /var/log/messages AND lspci output
Was it accurate with older kernels? If so, which ones?
I've switched to kernel-2.6.20-1.2942.fc7. While I'm still getting positive resets on the order of +0.15 seconds, the period between the resets is getting longer and longer today (~90, ~120, ~180). Feb 24 14:48:02 white ntpd[2070]: time reset +0.142277 s Feb 24 16:15:48 white ntpd[2070]: time reset +0.151547 s Feb 24 18:20:50 white ntpd[2070]: time reset +0.158067 s Feb 24 21:15:43 white ntpd[2070]: time reset +0.144398 s My previous kernel before 2940 was 2.6.20-1.2932.fc7 (which I booted on Feb 17). There were no time resets for that kernel. It's possible that ntpd is recalibrating the kernel as of 2940. Decreasing time resets and now increasing periods seem to reflect that.
Hmm. The ~.15s adjustments are expected, as .15 is the slew boundary (w/ default config, if you get a drift larger then .15s it will use settimeofday rather then slew the clock). However the frequency at which your seeing these time resets is quite unexpected. What is the drift value on your machine? (check /etc/ntp/drift or /var/lib/ntp/ntp.drift).
It's not always resetted in the same directions. Times also seem random. /var/log/messages:Feb 25 22:44:57 white ntpd[2070]: time reset -0.157131 s /var/log/messages:Feb 26 01:03:02 white ntpd[2070]: time reset -0.147778 s /var/log/messages:Feb 26 04:18:52 white ntpd[2070]: time reset -0.149802 s /var/log/messages:Feb 27 15:08:18 white ntpd[2070]: time reset +0.154075 s /var/log/messages:Feb 27 20:41:58 white ntpd[2070]: time reset +0.150290 s /var/log/messages:Feb 28 04:27:39 white ntpd[2070]: time reset -0.168500 s /var/log/messages:Feb 28 06:44:25 white ntpd[2070]: time reset -0.143982 s /var/log/messages:Feb 28 09:58:56 white ntpd[2070]: time reset -0.134518 s /var/log/messages:Feb 28 15:31:46 white ntpd[2070]: time reset -0.147648 s [root@white 22:56:40 ~]# more /etc/ntp/drift /etc/ntp/drift: No such file or directory [root@white 22:58:33 ~]# more /var/lib/ntp/ntp.drift /var/lib/ntp/ntp.drift: No such file or directory [root@white 22:58:47 ~]# more /var/lib/ntp/drift 164.271
FWIW, since upgrading to 2.6.20 (the latest x86_64-xen FC6 version) I see dramatic instability in the timekeeping too. (a couple of FreeBSD boxes on the same network with the same time servers are within a few ms of the correct time). [root@app1 ~]# ntpq -c pe remote refid st t when poll reach delay offset jitter =============================================================== =============== xclock-a.develoo 204.123.2.72 2 u 577 1024 377 7.200 -190.34 597.043 +clock-b.develoo 164.67.62.194 2 u 315 1024 377 0.001 243.504 434.286 -gw.bur.sol 10.0.201.1 3 u 328 1024 377 0.001 169.540 372.136 xgw-a.la.sol 216.52.237.152 3 u 601 1024 377 0.001 582.896 768.883 xgw-b.la.sol 216.52.237.151 3 u 602 1024 377 0.001 586.233 768.665 x10.0.201.116 .PPS. 1 u 662 1024 377 0.001 466.949 183.192 +tick.ucla.edu .PSC. 1 u 339 1024 377 0.001 269.785 460.487 *tick.usnogps.na .USNO. 1 u 838 1024 377 0.001 318.837 115.632 I saw the following come by on lkml, it might be related... I haven't had time to try the patch. From: tglx Subject: Re: [PATCH] Avoid time_offset overflows Date: March 25, 2007 4:07:52 AM PDT To: akpm Cc: zippel, johnstul.com, mingo, linux-kernel.org Reply-To: tglx On Sun, 2007-03-25 at 01:09 -0800, Andrew Morton wrote: On Sat, 24 Mar 2007 06:20:45 +0100 (CET) Roman Zippel <zippel> wrote: Hi, On Fri, 23 Mar 2007, john stultz wrote: @@ -314,8 +314,8 @@ #endif freq_adj += time_freq; freq_adj = min(freq_adj, (s64)MAXFREQ_NSEC); time_freq = max(freq_adj, (s64)-MAXFREQ_NSEC); - time_offset = (time_offset / NTP_INTERVAL_FREQ) - << SHIFT_UPDATE; + do_div(time_offset, NTP_INTERVAL_FREQ); + time_offset <<= SHIFT_UPDATE; } /* STA_PLL */ } /* txc->modes & ADJ_OFFSET */ if (txc->modes & ADJ_TICK) This is wrong, time_offset is signed and do_div is unsigned. In general I planned to do the same change, but the do_div API could use a little cleanup to provide some clear function for signed/unsigned divide (hopefully with a better name than div_long_long_rem_signed or do_div_llr). Can we do a minimal thing for 2.6.21, worry about API beautification later? Here you go. It's ugly, but it should do the trick for now. tglx Index: linux-2.6/kernel/time/ntp.c =============================================================== ==== --- linux-2.6.orig/kernel/time/ntp.c +++ linux-2.6/kernel/time/ntp.c @@ -196,7 +196,7 @@ void __attribute__ ((weak)) notify_arch_ */ int do_adjtimex(struct timex *txc) { - long mtemp, save_adjust; + long mtemp, save_adjust, rem; s64 freq_adj, temp64; int result; @@ -314,7 +314,9 @@ int do_adjtimex(struct timex *txc) freq_adj += time_freq; freq_adj = min(freq_adj, (s64)MAXFREQ_NSEC); time_freq = max(freq_adj, (s64)-MAXFREQ_NSEC); - do_div(time_offset, NTP_INTERVAL_FREQ); + time_offset = div_long_long_rem_signed(time_offset, + NTP_INTERVAL_FREQ, + &rem); time_offset <<= SHIFT_UPDATE; } /* STA_PLL */ } /* txc->modes & ADJ_OFFSET */
Can you attach dmesg output?
Created attachment 151640 [details] dmesg I don't see anything in dmesg that looks related, but attached anyway. I'll try recompiling the kernel with the patch from above to check if it makes a difference...
Hmm.. What was the last known working kernel version? Also can you reproduce the issue without running xen?
It wasn't great in 2.6.19-1.2911.6.5.fc6xen either, but much better: Linux db2.la.sol 2.6.19-1.2911.6.5.fc6xen #1 SMP Sun Mar 4 16:23:59 EST 2007 x86_64 x86_64 x86_64 GNU/Linux remote refid st t when poll reach delay offset jitter =============================================================== =============== +clock-a.develoo 207.200.81.113 2 u 224 256 377 0.001 -91.769 189.461 clock-b.develoo 10.0.201.116 2 u 218 512 377 0.001 -69.820 173.134 +gw.bur.sol 10.0.201.1 3 u 41 512 377 0.001 -89.930 170.287 +gw-a.la.sol 216.52.237.153 3 u 38 512 377 0.001 -81.617 161.867 +gw-b.la.sol 216.52.237.152 3 u 219 512 377 0.001 -89.302 158.438 *10.0.201.116 .PPS. 1 u 222 512 377 0.001 -85.575 160.003 Without xen in the latest kernel it's close to sane: Linux app3.la.sol 2.6.20-1.2933.fc6 #1 SMP Mon Mar 19 11:00:19 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux remote refid st t when poll reach delay offset jitter =============================================================== =============== +clock-a.develoo 207.200.81.113 2 u 35 64 377 2.127 66.149 19.534 clock-b.develoo 10.0.201.116 2 u 30 64 377 2.391 66.327 5.292 -gw.bur.sol 10.0.201.1 3 u 33 64 377 13.863 48.056 1.971 +gw-a.la.sol 216.52.237.153 3 u 34 64 377 0.764 65.999 1.241 -gw-b.la.sol 216.52.237.152 3 u 34 64 377 0.735 63.250 1.432 *10.0.201.116 .PPS. 1 u 25 64 377 13.197 66.117 7.241 And with Xen and the patch above it seems Pretty Good (not sure if that makes any sense)... Linux app2.la.sol 2.6.20-1.2933solfo1xen #1 SMP Tue Apr 3 23:59:03 PDT 2007 x86_64 x86_64 x86_64 GNU/Linux remote refid st t when poll reach delay offset jitter =============================================================== =============== +clock-a.develoo 207.200.81.113 2 u 16 256 377 0.001 2.134 462.272 +clock-b.develoo 10.0.201.116 2 u 4 256 377 0.001 8.422 470.036 +gw.bur.sol 10.0.201.1 3 u 3 256 377 0.001 -0.360 484.335 +gw-a.la.sol 216.52.237.153 3 u 5 256 377 0.001 7.211 467.613 +gw-b.la.sol 216.52.237.152 3 u 56 256 377 0.001 -9.638 446.548 *10.0.201.116 .PPS. 1 u 195 256 377 0.001 34.328 489.772
FYI - The time resets that started this report got further and further apart. My current 5 log files (/var/log/message*) don't have them and I think they stopped within a month of them starting.
Based on the date this bug was created, it appears to have been reported against rawhide during the development of a Fedora release that is no longer maintained. In order to refocus our efforts as a project we are flagging all of the open bugs for releases which are no longer maintained. If this bug remains in NEEDINFO thirty (30) days from now, we will automatically close it. If you can reproduce this bug in a maintained Fedora version (7, 8, or rawhide), please change this bug to the respective version and change the status to ASSIGNED. (If you're unable to change the bug's version or status, add a comment to the bug and someone will change it for you.) Thanks for your help, and we apologize again that we haven't handled these issues to this point. The process we're following is outlined here: http://fedoraproject.org/wiki/BugZappers/F9CleanUp We will be following the process here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this doesn't happen again.
This bug has been in NEEDINFO for more than 30 days since feedback was first requested. As a result we are closing it. If you can reproduce this bug in the future against a maintained Fedora version please feel free to reopen it against that version. The process we're following is outlined here: http://fedoraproject.org/wiki/BugZappers/F9CleanUp