Bug 193838 - gettimeofday goes backwards on IBM x460 merged servers
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Hardware: x86_64 Linux
Priority: medium, Severity: high
Assigned To: Brian Maly
QA Contact: Brian Brock
Depends On:
Blocks: 181409
Reported: 2006-06-01 21:19 EDT by TDS HCM
Modified: 2007-11-30 17:07 EST (History)
Fixed In Version: RHSA-2006-0575
Doc Type: Bug Fix
Last Closed: 2006-08-10 19:26:22 EDT

Description TDS HCM 2006-06-01 21:19:55 EDT
Description of problem:
gettimeofday() goes backwards randomly

Version-Release number of selected component (if applicable):

How reproducible:
Compile the C program below and run it.

#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <time.h>
#include <sys/time.h>   /* gettimeofday(), struct timeval */

int main(int argc, char *argv[])
{
        int i, x = 0, rs;
        long long secs, elapse_time, start, end;
        struct timeval t1;

        printf("starting program\n");

        for (i = 0; i < 10000; i++) {
                /* first reading, converted to microseconds */
                rs = gettimeofday(&t1, 0);
                secs = t1.tv_sec;
                start = secs * 1000000 + t1.tv_usec;

                /* second reading, taken immediately afterwards */
                rs = gettimeofday(&t1, 0);
                secs = t1.tv_sec;
                end = secs * 1000000 + t1.tv_usec;

                /* progress dot every 1000 iterations */
                if (++x >= 1000) { x = 0; printf("."); fflush(NULL); }

                elapse_time = end - start;
                if (end < start) {
                        printf("\nerror:(%i)end time=%lld, start time=%20lld : time diff=%20lld\n",
                               i, end, start, elapse_time);
                }
        }
        return 0;
}

Steps to Reproduce:
1. Compile program
2. Run
Actual results:
(Time diff and frequency of errors vary on each run)

error:(402)end time=1149011014759468, start time=    1149011014850382 : time diff=              -90914
error:(1051)end time=1149011014760467, start time=    1149011014851331 : time diff=              -90864
error:(1717)end time=1149011014761454, start time=    1149011014852399 : time diff=              -90945
error:(3067)end time=1149011014763405, start time=    1149011014854292 : time diff=              -90887
error:(3739)end time=1149011014764465, start time=    1149011014855398 : time diff=              -90933
error:(4401)end time=1149011014765430, start time=    1149011014856382 : time diff=              -90952
error:(5068)end time=1149011014766406, start time=    1149011014857346 : time diff=              -90940
error:(5743)end time=1149011014767411, start time=    1149011014858375 : time diff=              -90964
error:(6409)end time=1149011014768464, start time=    1149011014859377 : time diff=              -90913
error:(8436)end time=1149011014771445, start time=    1149011014862320 : time diff=              -90875
Expected results:
Time diff should never go negative. On a working box, we can run it hundreds of
times with no errors:

starting program

Additional info:

RHAS 4 Update 3 x86_64
Kernel 2.6.9-34.ELlargesmp

This problem only occurs on the IBM x460 server, and only when it is "merged"
with another partition via an external cable. (Each half of the pair of x460
servers contains 4 dual-core CPUs and 64GB of RAM.)

Main server: IBM 8872-6RU
Second half: IBM 8874-2RU

I rebooted, bypassed the partition merge, and then ran the above program
repeatedly on just one half without errors.

I'm guessing there is a timing issue between the two halves of the server.
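
One way to probe that guess would be to read the clock while pinned to each CPU in turn and look for the first CPU whose reading is earlier than the one before it. This is a sketch only, assuming a Linux system with glibc's sched_setaffinity(); sample_on_cpu(), sweep_cpus() and first_regression() are hypothetical helpers and not part of the original report. A regression found this way would not by itself identify the cause, since it only shows where the backward step becomes visible.

```c
#define _GNU_SOURCE          /* for sched_setaffinity() and the CPU_* macros */
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <sys/time.h>

/* Pin the calling thread to one CPU and read the wall clock there.
 * Returns 0 on success, -1 if pinning or the clock read failed. */
int sample_on_cpu(int cpu, long long *usec)
{
    cpu_set_t set;
    struct timeval tv;

    assert(usec != NULL);
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0)
        return -1;
    if (gettimeofday(&tv, NULL) != 0)
        return -1;
    *usec = (long long)tv.tv_sec * 1000000LL + tv.tv_usec;
    return 0;
}

/* Visit CPUs 0..ncpus-1 in order; out[cpu] gets the reading in
 * microseconds, or -1 if that CPU could not be sampled. */
void sweep_cpus(int ncpus, long long *out)
{
    int cpu;

    for (cpu = 0; cpu < ncpus; cpu++)
        if (sample_on_cpu(cpu, &out[cpu]) != 0)
            out[cpu] = -1;
}

/* Return the first index whose reading is earlier than the previous
 * valid reading, or -1 if the readings never go backwards. */
int first_regression(const long long *ts, int n)
{
    long long prev = -1;
    int i;

    for (i = 0; i < n; i++) {
        if (ts[i] < 0)
            continue;            /* CPU was not sampled */
        if (prev >= 0 && ts[i] < prev)
            return i;
        prev = ts[i];
    }
    return -1;
}
```

Note that the sketch leaves the thread pinned to the last CPU it visited; a real tool would save and restore the original affinity mask.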

I've searched bugzilla for this bug, and there have been several reports of
similar problems in the last year or so. The problem we are experiencing is very
hardware specific, however. It does not seem to be related to the kernel version.

The x86 version of the kernel (2.6.9-34.ELhugemem) does not have this problem.
Comment 1 Kurtis Rader 2006-06-02 17:18:51 EDT
Known issue involving use of the HPET time source that is already fixed in the
U4 kernel.

Comment 2 TDS HCM 2006-06-02 17:36:41 EDT
Tested with pre-beta kernel-largesmp-2.6.9-37, and the problem is indeed fixed.

Rebooted with the "nohpet" kernel option and the problem is now gone on
kernel-largesmp-2.6.9-34 as well.

Thank you for the quick attention to this matter.
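
When comparing test runs with and without the workaround, it helps to confirm that the running kernel was really booted with "nohpet". A minimal sketch, assuming /proc/cmdline is readable; has_boot_option() and booted_with() are hypothetical helper names introduced here for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if opt appears as a whole space-separated word in cmdline,
 * 0 otherwise. A trailing newline after the last word is tolerated. */
int has_boot_option(const char *cmdline, const char *opt)
{
    const char *p = cmdline;
    size_t len;

    assert(cmdline != NULL && opt != NULL);
    len = strlen(opt);
    if (len == 0)
        return 0;
    while ((p = strstr(p, opt)) != NULL) {
        int starts = (p == cmdline) || (p[-1] == ' ');
        int ends = (p[len] == '\0') || (p[len] == ' ') || (p[len] == '\n');
        if (starts && ends)
            return 1;
        p++;                     /* keep searching after this match */
    }
    return 0;
}

/* Check the command line of the running kernel.
 * Returns 1 or 0, or -1 if /proc/cmdline could not be read. */
int booted_with(const char *opt)
{
    char buf[4096];
    FILE *f = fopen("/proc/cmdline", "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return has_boot_option(buf, opt);
}
```

Calling booted_with("nohpet") before starting the gettimeofday() loop would let the test program record which configuration each result came from.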
Comment 3 Jason Baron 2006-06-06 14:48:22 EDT
committed in stream U4. A test kernel with this patch is available from
Comment 7 Red Hat Bugzilla 2006-08-10 19:26:22 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.