From Bugzilla Helper: User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7 Description of problem: With the simple code shown below, gettimeofday() is randomly going backward by 1 second. This problem (time going backward) was discovered by a vendor looking at our log files to investigate a problem with one of their products. In an effort to test if there was something odd with our system clock, they sent the following code to see if it would show time going backwards. It does on at least two systems running RHEL3 U6. #include <stdio.h> #include <sys/time.h> int main(int argc, char *argv[]) { struct timeval t, old, delta; while (1) { usleep(400000); gettimeofday(&t, NULL); printf("%ld %ld\n", t.tv_sec, t.tv_usec); if (old.tv_sec > t.tv_sec) { delta.tv_sec = t.tv_sec - old.tv_sec; delta.tv_usec = t.tv_usec - old.tv_usec; printf("Delta time: %ld %ld\n", delta.tv_sec, delta.tv_usec); } old = t; } return 0; } Version-Release number of selected component (if applicable): kernel-smp-2.4.21-37.EL How reproducible: Always Steps to Reproduce: 1.Compile and run the above code 2.Program will print "Delta time: xx yy" when gettimeofday() reports going backward 3.The frequency of error seems similar regardless if ntpd is running and synchronized or not. Actual Results: 1128968736 51968 Delta time: -1 1025 1128968737 870948 1128968738 280953 1128968738 690968 1128968737 691995 Delta time: -1 1027 1128968738 101994 1128968739 920975 1128968738 922005 Delta time: -1 1030 1128968740 740983 1128968741 151007 1128968741 560997 1128968741 971003 1128968742 381013 1128968742 791010 1128968741 792028 Delta time: -1 1018 1128968742 202036 1128968744 21026 1128968743 22105 Delta time: -1 1079 1128968743 432134 1128968745 251046 1128968745 661044 1128968746 71068 1128968745 72078 Delta time: -1 1010 1128968745 482077 1128968747 301063 1128968746 302110 Delta time: -1 1047 1128968748 121075 1128968748 531087 Expected Results: I should not see the "Delta time: xx yy" message Additional info: I have run this test on several RHEL systems, all of which have their clocks sync'd via ntpd. I have not been able to reproduce on RHEL4, and have seen two Compaq DL380 G3 systems which exhibit the problem, and other RHEL3 systems that do not exhibit the problem.
what timesource was being used? HPET, PIT, PMTimer, TSC? i.e. "dmesg | grep time.c"
None of the RHEL systems here have "time.c" in the kernel buffer. Is there another way that I can answer this question for you? --Larry
"dmesg | grep time.c" is the only way to tell what timekeeping mechanism is being used. It may also be in some other file in /var/log, but this info shows up at boot time, so simply rebooting and looking at dmesg would be a sure way to find out.
Brian, Okay, I've just rebooted one of my test RHEL3 U6 boxes (the ones that are having the problem are production and cannot be rebooted at will). $ dmesg|grep -i time Real Time Clock Driver v1.10e I'm sorry, but there is no time.c mentioned in the kernel buffer, and I can see that the buffer has not yet been overwitten. Buffer begins with: Linux version 2.4.21-37.EL (bhcompile.redhat.com) (gcc version 3.2.3 20030502 (Red Hat Linux 3.2.3-53)) #1 Wed Sep 7 13:35:21 EDT 2005 --Larry
For RHEL3, the timekeeping stuff gets logged at loglevel "KERN_INFO", so you will need to adjust your /etc/syslog.conf accordingly otherwise it wont wont see this info logged. i.e. (syslog.conf) kern.* /var/log/kern
Brian, Still not seeing what you're asking for. # grep /var/log/kern /etc/syslog.conf kern.* /var/log/kern Reboot # dmesg | grep -i time Real Time Clock Driver v1.10e # grep -i time /var/log/kern Oct 11 15:51:16 zeus kernel: Real Time Clock Driver v1.10e Oct 11 15:51:22 zeus last message repeated 9 times --Larry
Brian, It looks like you are asking me to look for messages that are specific to the x86_64 kernel. This bug is against the i386 kernel and arch/i386/kernel/time.c does not contain these messages. As stated in the title of the bug, it does not matter if ntpd is running or not, time still goes backward by 1 second at random intervals. Is there some other data that I can provide to help you isolate this? --Larry
time.c has major differences between 2.4.21 and 2.6.9 and between i386 and x86_64. Regrettably RHEL3 on i386 provides a very limited log output and appears to use only TSC and Cyclone for timekeeping, which is known to be problematic on some hardware (like intel based SMP systems). The RHEL3 x86_64 kernel offers HPET, PMTimer and PIT for timekeeping. Also, the code appears to use write_locks which may be the issue here but would likely only affect SMP systems. Question: Is this a SMP system? Does the problem go away by booting a non SMP kernel? If the problem doesnt exist on non-smp systems, the locking mechanisms in time.c likely need re-working. time.c would need to be made more SMP aware. spin_locks are used in x86_64 time.c and should probably be used in i386. i386 kernels are less used for SMP and are not as optimized for SMP as the x86_64.
Brian, We don't have any x86_64 systems here. Two Compaq/HP DL380 SMP machines running RHEL3 U6 exhibit the problem, one Compaq/HP DL360 SMP machine running RHEL3 U6 does not exhibit the problem. None of these systems has the non-SMP kernel installed. I have not yet found a non-SMP RHEL3 or RHEL4 system that exhibits the problem. --Larry