Description of problem:
When adjtime() (i.e. adjtimex() in ADJ_OFFSET_SINGLESHOT mode) is called with an offset smaller than 1000 us, the clock is not adjusted.

Version-Release number of selected component (if applicable):
kernel-2.6.9-55.EL.ppc64

How reproducible:
Always

Steps to Reproduce:
1. gcc -x c - <<EOF
#include <stdio.h>
#include <sys/time.h>
#include <sys/timex.h>
#include <unistd.h>
#include <string.h>

int main()
{
	struct timeval tv;

	while (1) {
		struct timex t;

		/* sleep until the start of the next second */
		gettimeofday(&tv, NULL);
		usleep(1000000 - tv.tv_usec % 1000000);

		/* request a 999 us one-shot adjustment, as adjtime() does */
		memset(&t, 0, sizeof (t));
		t.modes = ADJ_OFFSET_SINGLESHOT;
		t.offset = 999;
		if (adjtimex(&t) >= 0)
			printf("%ld\n", t.offset);
	}
	return 0;
}
EOF
2. ./a.out &   (as root)
3. Compare the system time with a reference (e.g. ntpdate -q).

Actual results:
No adjustment.

Expected results:
Time adjusted.

Additional info:
This is blocking bug #431728, where ntpd in -x mode should be fixed to use only adjtime() instead of switching between adjtimex() and adjtime().
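For reference, the correspondence between adjtime() and adjtimex() in singleshot mode mentioned above can be sketched as follows. This is a simplified illustration, not glibc's actual adjtime() source: the helper name timeval_to_singleshot() is hypothetical, and it only fills in the struct timex that would then be passed to adjtimex() (which needs root to change the clock).

```c
#include <assert.h>
#include <string.h>
#include <sys/time.h>
#include <sys/timex.h>

/*
 * Hypothetical helper (not glibc's implementation): build the adjtimex()
 * request that an adjtime(delta, NULL) call is assumed to correspond to.
 * The timeval is folded into one offset in microseconds, and
 * ADJ_OFFSET_SINGLESHOT selects the old adjtime()-style slew.
 */
static void timeval_to_singleshot(const struct timeval *delta, struct timex *t)
{
	assert(delta != NULL && t != NULL);
	memset(t, 0, sizeof(*t));
	t->modes = ADJ_OFFSET_SINGLESHOT;
	t->offset = (long)delta->tv_sec * 1000000L + (long)delta->tv_usec;
}
```

With delta = {0, 999} this yields offset 999, the same request the reproducer sends directly; per this report, such sub-1000 us requests were not applied on the affected ppc64 kernels.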
------- Comment From mjwolf.com 2008-04-07 11:51 EDT-------
No, I haven't seen this error before. I have added some team members to the cc list to see if they recognize it.

------- Comment From mjwolf.com 2008-04-07 11:56 EDT-------
Also, I'm a little confused here: the kernel version is listed as kernel-2.6.9-55.EL.ppc64, which is a RHEL4 U5 kernel. I expected the kernel version to be -67 (RHEL4 U6) or a newer one for the upcoming RHEL4 U7. Can someone clarify whether this problem is seen on the newer kernels?
kernel-2.6.9-67.0.7.EL.ppc64 has the same problem.
Created attachment 301757
Fix detection of need for time adjustment.

When making the adjtimex() system call in ADJ_OFFSET_SINGLESHOT mode, we fail to correctly detect that time needs to be adjusted. This patch checks the current time step at the beginning of a decrementer tick and adjusts the various timing parameters to cope with the requested change in time. When we have successfully adjusted time as requested, we revert to the "stable" values. Fixes both the 32-bit (compat) and 64-bit cases.
Created attachment 301758
Fix detection of need for time adjustment. v2

When making the adjtimex() system call in ADJ_OFFSET_SINGLESHOT mode, we fail to correctly detect that time needs to be adjusted. This patch checks the current time step at the beginning of a decrementer tick and adjusts the various timing parameters to cope with the requested change in time. When we have successfully adjusted time as requested, we revert to the "stable" values. Fixes both the 32-bit (compat) and 64-bit cases.

The previous version had a DEBUG #define enabled by mistake. Can we get the original reporter to test?
With this patch applied it seems to work well.
------- Comment From tbreeds.com 2008-04-09 19:41 EDT-------
(In reply to comment #9)
> ------- Comment From mlichvar 2008-04-09 12:39 EST-------
> With this patch applied it seems to work well.

Okay, I'll update this to submitted for inclusion by RHEL.
Unfortunately, the patch breaks ntpd when it runs without the -x option: the clock frequency offset is adjusted, but the adjustment has no effect.
We are past code freeze and I do not see a patch proposed to resolve this issue as an exception. Radek you may want to consider reverting the ntp changes for R4.7.
------- Comment From mjwolf.com 2008-05-09 15:22 EDT-------
Is Red Hat still looking for a patch for this problem? I'm confused by the comment above:

"------- Comment From peterm 2008-04-22 15:58 EST-------
We are past code freeze and I do not see a patch proposed to resolve this issue as an exception. Radek you may want to consider reverting the ntp changes for R4.7."
------- Comment From mjwolf.com 2008-07-01 15:47 EDT-------
Reopening this bug and marking it as NEEDINFO. Would Red Hat like us to look at this for RHEL4 U8?
------- Comment From mjwolf.com 2008-07-15 09:49 EDT-------
Can IBM get a copy of the patched ntpd that showed the problem with adjtime()?
ntp packages showing the problem are available here: http://people.redhat.com/mlichvar/tmp/ntpadjtime/
Updating PM score.
Hmm... from what I can see, NTP along with adjtime() is working as expected. Here are my tests, my results, and why I think there is no problem to be fixed here.

I am running with the ntp package from Comment 20 and kernel:
Linux version 2.6.9-78.EL (brewbuilder.redhat.com) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-10)) #1 SMP Wed Jul 9 15:37:15 EDT 2008

Make sure the NTP server is up and running:
root> pgrep ntp
21838

Make sure the drift file is set to 0:
root> cat /var/lib/ntp/drift
0.000

Set the system clock to the time from the NTP server:
root> ntpdate -u 91.189.94.4 ; date ; hwclock --show
14 Oct 13:05:04 ntpdate[23026]: step time server 91.189.94.4 offset 223005059.003048 sec
Tue Oct 14 13:05:04 CDT 2008
Thu 20 Sep 2001 04:14:06 PM CDT -0.285046 seconds

Make sure the clocks are still as expected:
root> date ; hwclock --show
Tue Oct 14 13:05:17 CDT 2008
Thu 20 Sep 2001 04:14:19 PM CDT -0.624382 seconds

Set the hw clock to the same time as the system clock:
root> date ; hwclock --show ; hwclock --localtime --systohc ; date ; hwclock --show
Tue Oct 14 13:05:37 CDT 2008
Thu 20 Sep 2001 04:14:39 PM CDT -0.384789 seconds
Tue Oct 14 13:05:40 CDT 2008
Tue 14 Oct 2008 01:05:41 PM CDT -0.997783 seconds

Make sure the clocks are still as expected and progressing together:
root> date ; hwclock --show
Tue Oct 14 13:05:54 CDT 2008
Tue 14 Oct 2008 01:05:55 PM CDT -0.010500 seconds

Set the system clock to Sept 2001:
root> date ; hwclock --show ; date 092011082001 ; date ; hwclock --show
Tue Oct 14 13:06:00 CDT 2008
Tue 14 Oct 2008 01:06:01 PM CDT -0.090070 seconds
Thu Sep 20 11:08:00 CDT 2001
Thu Sep 20 11:08:00 CDT 2001
Tue 14 Oct 2008 01:06:02 PM CDT -0.996832 seconds

Make sure the two clocks are still as expected and are progressing:
root> date ; hwclock --show
Thu Sep 20 11:08:08 CDT 2001
Tue 14 Oct 2008 01:06:10 PM CDT -0.088859 seconds

Try to adjust the hw clock to be in sync with the system clock:
root> date ; hwclock --show ; hwclock --adjust ; date ; hwclock --show
Thu Sep 20 11:08:19 CDT 2001
Tue 14 Oct 2008 01:06:21 PM CDT -0.631395 seconds
Thu Sep 20 11:08:21 CDT 2001
Tue 14 Oct 2008 01:06:23 PM CDT -0.997863 seconds

Wait a few seconds, then check the times again:
root> date ; hwclock --show
Thu Sep 20 11:10:38 CDT 2001
Thu 20 Sep 2001 04:10:39 PM CDT -0.996076 seconds
I ran some tests on another machine to gather more data points and noticed the same behavior there as well. In particular, I ran some analysis tests on an x86-based Ubuntu system running kernel:
Linux version 2.6.20-17-generic (root@terranova) (gcc version 4.1.2 (Ubuntu 4.1.2-0ubuntu4)) #2 SMP Wed Aug 20 16:47:34 UTC 2008
and NTP:
14 Oct 13:32:21 ntpdate[25739]: ntpdate 4.2.2p4 Wed Mar 7 20:43:31 UTC 2007 (1)

Some helpful links:
http://linux.die.net/man/8/ntpd
http://linux.die.net/man/2/settimeofday
http://linux.about.com/library/cmd/blcmdl8_hwclock.htm
http://ecorrado.us/scholarly/documents/chlug-time.pdf
http://linux.die.net/man/1/ntpd
http://linux.about.com/od/commands/l/blcmdl2_gettime.htm
http://www.experts-exchange.com/OS/Microsoft_Operating_Systems/Server/2003_Server/Q_22945569.html
http://www.linuxhomenetworking.com/wiki/index.php/Quick_HOWTO_:_Ch24_:_The_NTP_Server

This bug, as defined, does not appear to be a problem. However, if there is something else not covered by this particular bug, please open another bugzilla and add an abundance of data to help debug and resolve the problem. Thanks in advance!
I can still reproduce the problem with kernel-2.6.9-78.EL.

- clock offset adjusted to about 0.3 s (the machine has a slightly slow clock; if uncorrected, the offset will increase)
- added "statistics loopstats" to ntp.conf
- ntpd started with the -x option to use adjtime() and with -s /tmp/ to specify a stats directory

The loopstats file in the stats directory shows:

54754 37605.169 0.325754698 0.000000 0.162877349 0.000000 4
54754 37632.213 0.311736644 4.756724 0.141229953 38.053790 4
54754 37649.243 0.297691084 9.299129 0.122510179 49.057199 4
54754 37666.272 0.293997093 13.785169 0.106113003 55.614098 4
54754 37683.302 0.278460071 15.112970 0.092224329 49.320700 4
...
54754 40947.741 0.130068301 494.654183 0.000783042 30.638022 4
54754 40964.760 0.128114085 496.486870 0.001189372 30.314612 4
54754 40981.777 0.128193493 498.442948 0.001030792 30.563231 4
54754 40998.794 0.128413508 498.810342 0.000899444 26.631221 4
54754 41015.811 0.128723566 500.000000 0.000794219 27.907437 4
54754 41032.829 0.128131188 500.000000 0.000748877 28.788194 4
54754 41049.845 0.128259471 500.000000 0.000651710 25.855169 4
54754 41066.863 0.128458423 500.000000 0.000573097 27.336054 4

The third column is the offset. ntpd is unable to bring it to zero; it hovers just above 128 ms, which is where ntpd's request crosses the 1000 us value used for adjtime() when the poll interval is 16 s and the frequency offset has reached the 500 ppm maximum. A plot of the offset is here: http://people.redhat.com/mlichvar/tmp/ntpadjtime/offset.png

Probably the easiest way to reproduce the bug is to run adjtime(999 us) in a loop and compare the clock with a reference.
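The 128 ms figure can be checked with a little arithmetic. The sketch below assumes (this is not taken from ntpd's source) that in -x mode ntpd asks adjtime() each second for offset / (16 * poll interval) plus the frequency correction, where 1 ppm corresponds to 1 us per second; with a 16 s poll and 500 ppm this puts the 1000 us crossing at exactly 128 ms, matching the comment. The parser reads only the third (offset, seconds) and fourth (frequency, ppm) loopstats columns, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* The two loopstats columns used here. */
struct loop_sample {
	double offset_s;  /* 3rd column: clock offset in seconds */
	double freq_ppm;  /* 4th column: frequency offset in ppm */
};

/* Parse one loopstats line; returns 1 on success, 0 on failure. */
static int parse_loopstats(const char *line, struct loop_sample *s)
{
	long mjd;     /* 1st column: day number (unused) */
	double secs;  /* 2nd column: seconds past midnight (unused) */

	assert(line != NULL && s != NULL);
	return sscanf(line, "%ld %lf %lf %lf",
		      &mjd, &secs, &s->offset_s, &s->freq_ppm) == 4;
}

/* ASSUMED model: per-second adjtime() request in microseconds. */
static double adjtime_request_us(const struct loop_sample *s, double poll_s)
{
	assert(s != NULL && poll_s > 0.0);
	return s->offset_s * 1e6 / (16.0 * poll_s) + s->freq_ppm;
}

/* Offset (us) at which the assumed request reaches cutoff_us. */
static double crossing_offset_us(double cutoff_us, double freq_ppm,
				 double poll_s)
{
	assert(poll_s > 0.0);
	return (cutoff_us - freq_ppm) * 16.0 * poll_s;
}
```

For the last sample in the table the assumed request comes out just above 1000 us, which is consistent with the offset settling just above 128 ms if smaller requests are being dropped.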
Tony Breeds writes: "I've been working to try and reproduce this bug with the patch already attached to this bug and the RPMs from RHEL 4.8 Beta. I do not see the problems detailed in comment 31. I'm not denying the existence of the problem, but I'm unable to reproduce it on my hardware." I'm going to close this bug as unreproducible.
Hello Red Hat, We are unable to reproduce this problem at this point using RHEL 4.8. Please let us know if you are still able to recreate the problem on RHEL 4.8. Thanks!
Reopening the bug. Just to be clear: we can reproduce the original problem stated in the bug. However, when we apply the patch currently attached to this bug, we no longer see errors. We cannot reproduce the secondary problem that was reported when the patch was tested at the distro.
I can still reproduce the secondary problem with kernel 2.6.9-86 and the "Fix detection of need for time adjustment. v2" patch, git commit 0b221e6. Changing the kernel frequency offset with ntptime -f doesn't change the real frequency. However, since the ntp bug depending on this probably won't get a chance to be fixed in RHEL4, I think this bug is very low priority.
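One way to see the secondary problem is to compare the frequency written to the kernel with the frequency actually observed against a reference. The sketch below is illustrative only: it builds an ADJ_FREQUENCY request (the struct timex freq field is in ppm with a 16-bit fractional part, so 65536 means 1 ppm, per the adjtimex(2) man page) and estimates the observed frequency offset from two pairs of local/reference readings. Both helper names are hypothetical, the request is assumed to be comparable to what ntptime -f sets, and nothing here calls adjtimex().

```c
#include <assert.h>
#include <string.h>
#include <sys/timex.h>

/* Hypothetical helper: request to set the kernel frequency offset.
 * freq is scaled ppm: 65536 == 1 ppm. */
static void frequency_request(double ppm, struct timex *t)
{
	assert(t != NULL);
	memset(t, 0, sizeof(*t));
	t->modes = ADJ_FREQUENCY;
	t->freq = (long)(ppm * 65536.0);
}

/* Observed frequency offset in ppm from two readings of the local clock
 * (local0, local1) and of a reference clock (ref0, ref1), in seconds.
 * Positive means the local clock gained time relative to the reference. */
static double observed_ppm(double local0, double local1,
			   double ref0, double ref1)
{
	double ref_elapsed = ref1 - ref0;

	assert(ref_elapsed > 0.0);
	return ((local1 - local0) - ref_elapsed) / ref_elapsed * 1e6;
}
```

If the kernel accepts a new frequency but observed_ppm() measured over a long enough interval does not change by the requested amount, the setting has no real effect, which is the behavior described in this comment.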
Miroslav, Thanks for the update. Does this mean you are willing to accept IBM's patch for the original problem? Reading through the bug maybe we should open a new Bugzilla for the secondary problem and I can reverse mirror that to IBM. What do you suggest? -Ameet
Ameet, the secondary problem is much worse than the original one; it would break ntp badly. I'd suggest leaving it as it is for now or closing it as WONTFIX.
------- Comment From kumarr.com 2009-05-12 16:00 EDT-------
(In reply to comment #43)
> Ameet,
>
> the secondary problem is much worse than the original one, it would break ntp
> badly. I'd suggest to leave it as it is for now or close it as WONTFIX.

I am closing this bug as WILL_NOT_FIX, since it is unlikely to be addressed in the near future. If this becomes a critical issue, please reopen it and raise its severity.
Closing on the Red Hat side as well.