Description of problem:
With Red Hat AS x86 on Opteron using the 2.4.21-9 kernel, the gettimeofday function demonstrated microsecond resolution. However, with the 2.4.21-15 kernel in Red Hat Update 2, the resolution of gettimeofday appears to be coarser: it now equals one Linux clock tick, 0.01 s. This problem causes MPI benchmarks like Pallas and b_eff to produce inaccurate results. Since these benchmarks are important tools for evaluating Linux cluster performance, this problem is considered serious.

Version-Release number of selected component (if applicable):
kernel-2.4.21-15.EL

How reproducible:
Every time

Steps to Reproduce:
1. Compile and run the following program, which demonstrates the problem:

#include <sys/time.h>
#include <stdio.h>
#include <strings.h>   /* bzero */

#define LIMIT 20000

int main(int argc, char* argv[])
{
    struct timeval start, prev, recent;
    long diff;
    int underflows = 0, overflows = 0;
    int counts[LIMIT + 1];   /* one slot per interval length, 0..LIMIT microseconds */

    bzero(counts, sizeof(counts));
    gettimeofday(&start, NULL);
    prev = start;
    while (1) {
        gettimeofday(&recent, NULL);
        /* microseconds since the previous call */
        diff = ((recent.tv_sec - prev.tv_sec)*1000000 +
                (recent.tv_usec - prev.tv_usec));
        if (diff < 0)
            underflows += 1;
        else if (diff > LIMIT)
            overflows += 1;
        else
            counts[diff] += 1;
        /* microseconds since the start of the run */
        diff = ((recent.tv_sec - start.tv_sec)*1000000 +
                (recent.tv_usec - start.tv_usec));
        if (diff > 5000000) /* 5 seconds */
            break;
        prev = recent;
    }
    if (underflows > 0)
        printf("%9d intervals took less than %6d microseconds\n",
               underflows, 0);
    if (overflows > 0)
        printf("%9d intervals took more than %6d microseconds\n",
               overflows, LIMIT);
    for (diff = 0; diff <= LIMIT; diff += 1)
        if (counts[diff] > 0)
            printf("%9d intervals took %9s %6ld microseconds\n",
                   counts[diff], "", diff);
    return 0;
}

Actual results:
[root@crs1 root]# uname -a
Linux crs1 2.4.21-15.ELsmp #1 SMP Thu Apr 22 00:09:01 EDT 2004 x86_64 x86_64 x86_64 GNU/Linux
[root@crs1 root]# ./dll
138948453 intervals took 0 microseconds
318 intervals took 10000 microseconds
182 intervals took 10001 microseconds
[root@crs1 root]# ./dll
138909560 intervals took 0 microseconds
501 intervals took 10000 microseconds

Expected results:
On a system with microsecond resolution, like 2.4.21-9, you see output like this:
5283049 intervals took 0 microseconds
4931788 intervals took 1 microseconds
3150 intervals took 2 microseconds
367 intervals took 3 microseconds
363 intervals took 4 microseconds
666 intervals took 5 microseconds
1220 intervals took 6 microseconds
2507 intervals took 7 microseconds
1618 intervals took 8 microseconds
582 intervals took 9 microseconds
132 intervals took 10 microseconds
152 intervals took 11 microseconds
152 intervals took 12 microseconds
31 intervals took 13 microseconds
13 intervals took 14 microseconds
9 intervals took 15 microseconds
12 intervals took 16 microseconds
4 intervals took 17 microseconds
9 intervals took 18 microseconds
6 intervals took 19 microseconds
10 intervals took 20 microseconds
5 intervals took 21 microseconds
By default, systems without an HPET timer fall back to using the PIT timer (which has .01 s resolution). Although the TSC timer is available for finer resolution, we disabled it by default due to another problem. A workaround for the problem you're seeing right now is to enable the TSC timer by specifying the "tsc" parameter at boot time. Meanwhile I'll revisit the TSC timer issue.
Can someone at Red Hat elaborate on what the TSC timer issue is? I have a 4-way Opteron system; is it safe to use tsc or cyclone as the kernel parameter at boot time?
cyclone is only appropriate for the IBM x440 line of machines. TSC will generally work, but it may be unreliable if the clocks drift between the CPUs, which becomes more likely the more CPUs you have (temperature differences alone can cause this over longer periods). HPET is the most reliable method, and in theory all modern systems have one.
So when Jim Paradis wrote "we disabled it by default due to another problem", was that entirely the clock skew problem? Our system doesn't even detect an HPET; does that mean it doesn't have one? Do I have to ask someone at AMD about that?
It wasn't just clock skew; there was a synchronization problem such that clock updates (e.g. via ntpdate) would occasionally update the wrong half of the doubleword and you'd see the year jump to something like 586562 (See Bug 114869).
So, is this TSC problem an issue on UP AMD64 systems or on the EM64T, or is it a problem only with NUMA machines? If it's not a problem on these arches then perhaps we should consider making TSC the default on them?
So, is there a fix for this problem yet? It has not been fixed as of RHEL 3 AS Update 4.
We are encountering a situation where the lack of precision is causing an issue with some Oracle timestamps. The database was exported from Oracle on Solaris into Oracle on Linux.
> 5/18/2005 5:28:17.358617 PM
> 5/18/2005 5:28:17.408617 PM
As you can see, the timestamps all end in "8617".
I am currently investigating the feasibility of backporting the ACPI power-management timer code from RHEL4 so as to make another free-running timer available for timekeeping. I have already backported this to another version of the 2.4 kernel, so this should not be terribly difficult.
A fix for this problem has just been committed to the RHEL3 U6 patch pool this evening (in kernel version 2.4.21-33.EL).
Note that to take advantage of the fix just committed one has to boot with the "pmtmr" boot command line option. This should be noted in the errata documentation.
Should this information be noted in the release notes for U6?
To kbaxley, not "clock=pmtmr", just "pmtmr".
Hi, Bastien. Please show the output of /proc/cmdline on the boot-up that shows the failure, and please also indicate whether the reproducer program in the initial comment of this bug report also fails. Thanks.
Re comment #46, the customer wasn't using the pmtmr option as mentioned in comment #29.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2005-663.html
*** Bug 210889 has been marked as a duplicate of this bug. ***
Undoing dup, because this bug was fixed in U6 and bug 210889 was entered on U8.