Bug 170341 - gettimeofday goes backward 1 sec with or without ntpd
Summary: gettimeofday goes backward 1 sec with or without ntpd
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: i386
OS: Linux
medium
medium
Target Milestone: ---
Assignee: Brian Maly
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2005-10-10 20:06 UTC by Larry Fahnoe
Modified: 2007-11-30 22:07 UTC (History)
1 user (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-04-27 19:07:04 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Larry Fahnoe 2005-10-10 20:06:40 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7

Description of problem:
With the simple code shown below, gettimeofday() is randomly going backward by 1 second.  This problem (time going backward) was discovered by a vendor looking at our log files to investigate a problem with one of their products.  In an effort to test if there was something odd with our system clock, they sent the following code to see if it would show time going backwards.  It does on at least two systems running RHEL3 U6.

#include <stdio.h>
#include <sys/time.h>

int main(int argc, char *argv[])
{
  struct timeval t, old, delta;

  while (1) {
    usleep(400000);
    gettimeofday(&t, NULL);
    printf("%ld %ld\n", t.tv_sec, t.tv_usec);
    if (old.tv_sec > t.tv_sec) {
      delta.tv_sec = t.tv_sec - old.tv_sec;
      delta.tv_usec = t.tv_usec - old.tv_usec;
      printf("Delta time: %ld %ld\n", delta.tv_sec, delta.tv_usec);
    }
    old = t;
  }
  return 0;
}


Version-Release number of selected component (if applicable):
kernel-smp-2.4.21-37.EL

How reproducible:
Always

Steps to Reproduce:
1.Compile and run the above code
2.Program will print "Delta time: xx yy" when gettimeofday() reports going backward
3.The frequency of error seems similar regardless if ntpd is running and synchronized or not.
  

Actual Results:  1128968736 51968
Delta time: -1 1025
1128968737 870948
1128968738 280953
1128968738 690968
1128968737 691995
Delta time: -1 1027
1128968738 101994
1128968739 920975
1128968738 922005
Delta time: -1 1030
1128968740 740983
1128968741 151007
1128968741 560997
1128968741 971003
1128968742 381013
1128968742 791010
1128968741 792028
Delta time: -1 1018
1128968742 202036
1128968744 21026
1128968743 22105
Delta time: -1 1079
1128968743 432134
1128968745 251046
1128968745 661044
1128968746 71068
1128968745 72078
Delta time: -1 1010
1128968745 482077
1128968747 301063
1128968746 302110
Delta time: -1 1047
1128968748 121075
1128968748 531087


Expected Results:  I should not see the "Delta time: xx yy" message

Additional info:

I have run this test on several RHEL systems, all of which have their clocks sync'd via ntpd.  I have not been able to reproduce on RHEL4, and have seen two Compaq DL380 G3 systems which exhibit the problem, and other RHEL3 systems that do not exhibit the problem.

Comment 2 Brian Maly 2005-10-11 14:50:37 UTC
what timesource was being used? HPET, PIT, PMTimer, TSC?

i.e. "dmesg | grep time.c"

Comment 3 Larry Fahnoe 2005-10-11 17:31:44 UTC
None of the RHEL systems here have "time.c" in the kernel buffer.  Is there
another way that I can answer this question for you?

--Larry

Comment 4 Brian Maly 2005-10-11 18:22:12 UTC
"dmesg | grep time.c" is the only way to tell what timekeeping mechanism is
being used. It may also be in some other file in /var/log, but this info shows
up at boot time, so simply rebooting and looking at dmesg would be a sure way to
find out.

Comment 5 Larry Fahnoe 2005-10-11 18:40:53 UTC
Brian,

Okay, I've just rebooted one of my test RHEL3 U6 boxes (the ones that are having
the problem are production and cannot be rebooted at will).

$ dmesg|grep -i time
Real Time Clock Driver v1.10e

I'm sorry, but there is no time.c mentioned in the kernel buffer, and I can see
that the buffer has not yet been overwitten.  Buffer begins with:

Linux version 2.4.21-37.EL (bhcompile.redhat.com) (gcc version
3.2.3 20030502 (Red Hat Linux 3.2.3-53)) #1 Wed Sep 7 13:35:21 EDT 2005

--Larry

Comment 6 Brian Maly 2005-10-11 19:01:05 UTC
For RHEL3, the timekeeping stuff gets logged at loglevel "KERN_INFO", so you
will need to adjust your /etc/syslog.conf accordingly otherwise it wont wont see
this info logged. 

i.e. (syslog.conf) 

kern.*                                                 /var/log/kern

Comment 7 Larry Fahnoe 2005-10-11 20:52:57 UTC
Brian,

Still not seeing what you're asking for.

# grep /var/log/kern /etc/syslog.conf
kern.*                                                  /var/log/kern

Reboot

# dmesg | grep -i time
Real Time Clock Driver v1.10e

# grep -i time /var/log/kern
Oct 11 15:51:16 zeus kernel: Real Time Clock Driver v1.10e
Oct 11 15:51:22 zeus last message repeated 9 times

--Larry

Comment 8 Larry Fahnoe 2005-10-12 17:12:50 UTC
Brian,

It looks like you are asking me to look for messages that are specific to the
x86_64 kernel.  This bug is against the i386 kernel and arch/i386/kernel/time.c
does not contain these messages.

As stated in the title of the bug, it does not matter if ntpd is running or not,
time still goes backward by 1 second at random intervals.

Is there some other data that I can provide to help you isolate this?

--Larry

Comment 9 Brian Maly 2005-10-12 18:04:27 UTC
time.c has major differences between 2.4.21 and 2.6.9 and between i386 and
x86_64. Regrettably RHEL3 on i386 provides a very limited log output and appears
to use only TSC and Cyclone for timekeeping, which is known to be problematic on
some hardware (like intel based SMP systems). The RHEL3 x86_64 kernel offers
HPET, PMTimer and PIT for timekeeping. Also, the code appears to use write_locks
which may be the issue here but would likely only affect SMP systems.

Question: Is this a SMP system? Does the problem go away by booting a non SMP
kernel?

If the problem doesnt exist on non-smp systems, the locking mechanisms in time.c
likely need re-working. time.c would need to be made more SMP aware. spin_locks
are used in x86_64 time.c and should probably be used in i386. i386 kernels are
less used for SMP and are not as optimized for SMP as the x86_64.



Comment 10 Larry Fahnoe 2005-10-12 18:24:44 UTC
Brian,

We don't have any x86_64 systems here.  Two Compaq/HP DL380 SMP machines running
RHEL3 U6 exhibit the problem, one Compaq/HP DL360 SMP machine running RHEL3 U6
does not exhibit the problem.  None of these systems has the non-SMP kernel
installed.  I have not yet found a non-SMP RHEL3 or RHEL4 system that exhibits
the problem.

--Larry


Note You need to log in before you can comment on or make changes to this bug.