+++ This bug was initially created as a clone of Bug #170341 +++ Description of problem: With the simple code shown below, gettimeofday() is going backward by a random amount of time (between 10 us and 2 seconds). #include "stdio.h" #include <sys/time.h> #include <memory.h> #include "errno.h" typedef long long long64; int main(int, char**) { long64 issues = 0; long64 total = 0; while(1) { bool bError = false; timeval tv; memset(&tv, 0, sizeof(tv)); if(::gettimeofday (&tv, NULL) == -1) { printf("returned error:%d\n", errno); bError = true; continue; } long64 ticks = tv.tv_sec*1000000ll+tv.tv_usec; memset(&tv, 0, sizeof(tv)); if(::gettimeofday (&tv, NULL) == -1) { printf("returned error:%d\n", errno); bError = true; continue; } long64 ticks2 = tv.tv_sec*1000000ll+tv.tv_usec; if(ticks2 < ticks) { printf("Delta time:%lld us percentage:%f\n", ticks - ticks2, 100.*double(issues)/total); issues++; } total++; } return 1; } Version-Release number of selected component (if applicable): kernel version: 2.6.9-11.ELsmp How reproducible: Always Steps to Reproduce: 1.Compile and run the above code 2.Program will print "Delta time: xx us percentage: yy" when gettimeofday() reports going backward 3.The frequency of error seems similar regardless if ntpd is running and synchronized or not. Actual Results: Delta time:200 us percentage:0.000003 Delta time:375 us percentage:0.000003 Delta time:910 us percentage:0.000003 Delta time:177 us percentage:0.000003 Delta time:177 us percentage:0.000003 Delta time:177 us percentage:0.000003 Delta time:139 us percentage:0.000003 Delta time:177 us percentage:0.000003 Delta time:841 us percentage:0.000003 Delta time:179 us percentage:0.000003 Delta time:199 us percentage:0.000003 Delta time:178 us percentage:0.000003 Delta time:177 us percentage:0.000003 Delta time:552 us percentage:0.000003 Delta time:176 us percentage:0.000003 Delta time:176 us percentage:0.000003 Delta time:84 us percentage:0.000003 Delta time:258 us percentage:0.000003 Delta time:436 us percentage:0.000003 Delta time:726 us percentage:0.000003 Delta time:60 us percentage:0.000003 Delta time:585 us percentage:0.000003 Delta time:758 us percentage:0.000003 Delta time:717 us percentage:0.000003 Delta time:546 us percentage:0.000002 Delta time:372 us percentage:0.000002 Delta time:444 us percentage:0.000002 Delta time:30 us percentage:0.000002 Delta time:858 us percentage:0.000002 Delta time:392 us percentage:0.000002 Delta time:571 us percentage:0.000002 Delta time:759 us percentage:0.000002 Delta time:746 us percentage:0.000002 Delta time:99 us percentage:0.000002 Delta time:274 us percentage:0.000002 Delta time:760 us percentage:0.000002 Delta time:156 us percentage:0.000002 Delta time:759 us percentage:0.000002 Delta time:754916 us percentage:0.000002 Delta time:808 us percentage:0.000002 Delta time:846 us percentage:0.000002 Delta time:1788404 us percentage:0.000002 Delta time:1787652 us percentage:0.000002 Delta time:1396667 us percentage:0.000002 Delta time:390090 us percentage:0.000002 Delta time:1788633 us percentage:0.000002 Delta time:858 us percentage:0.000002 Delta time:877 us percentage:0.000002 Delta time:860 us percentage:0.000002 Delta time:860 us percentage:0.000002 Delta time:858 us percentage:0.000002 Delta time:858 us percentage:0.000002 Delta time:494 us percentage:0.000002 Delta time:673 us percentage:0.000002 Delta time:909 us percentage:0.000002 Delta time:438 us percentage:0.000002 Delta time:1394161 us percentage:0.000002 Delta time:758667 us percentage:0.000002 Delta time:757164 us percentage:0.000002 Delta time:758494 us percentage:0.000002 Expected Results: I should not see the "Delta time: xx yy" message Additional info: This problem happens on a dual-core bi-processor Opteron box. >cat /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 15 model : 33 model name : 02/21 stepping : 2 cpu MHz : 1994.147 cache size : 1024 KB fdiv_bug : no hlt_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 1 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext lm 3dnowext 3dnow pni bogomips : 3915.77 processor : 1 vendor_id : AuthenticAMD cpu family : 15 model : 33 model name : 02/21 stepping : 2 cpu MHz : 1994.147 cache size : 1024 KB fdiv_bug : no hlt_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 1 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext lm 3dnowext 3dnow pni bogomips : 3981.31 processor : 2 vendor_id : AuthenticAMD cpu family : 15 model : 33 model name : 02/21 stepping : 2 cpu MHz : 1994.147 cache size : 1024 KB fdiv_bug : no hlt_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 1 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext lm 3dnowext 3dnow pni bogomips : 3981.31 processor : 3 vendor_id : AuthenticAMD cpu family : 15 model : 33 model name : 02/21 stepping : 2 cpu MHz : 1994.147 cache size : 1024 KB fdiv_bug : no hlt_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 1 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext lm 3dnowext 3dnow pni bogomips : 3981.31 I also tried to disable TSC but this doesn't work since the kernel is compiled with CONFIG_X86_TSC. >dmesg Linux version 2.6.9-11.ELsmp (bhcompile.redhat.com) (gcc version 3.4.3 20050227 (Red Hat 3.4.3-22)) #1 SMP Fri May 20 18:26:27 EDT 2005 ............ Kernel command line: ro root=LABEL=/ rhgb quiet notsc notsc: Kernel compiled with CONFIG_X86_TSC, cannot disable TSC. Initializing CPU#0 CPU 0 irqstacks, hard=c03db000 soft=c03bb000 PID hash table entries: 4096 (order: 12, 65536 bytes) Detected 1994.147 MHz processor. Using tsc for high-res timesource
*** Bug 174401 has been marked as a duplicate of this bug. ***
Probably your program is bouncing between CPUs and each core is a few seconds different. I've seen that before with BIOS bugs. If you do a "while true ; do date ; done | uniq -c" does it go back and forth in time? If you have schedutils installed try setting the process affinity with something like: taskset -c 0 bash -c "while true ; do date ; done | uniq -c"
Comment #3 would only be true of TSC is being used for timekeeping. TSC's between processors are not syncronized on AMD based systems. As a result the kernel might read from different TSC's each time and introduce error into the timekeeping math. Use "notsc" at boot to prevent this problem. This is a known issue that will be fixed in future releases.
Even with notsc clock=pmtmr I get time.c: Using 1.193182 MHz PIT timer. time.c: Detected 2002.665 MHz processor. time.c: Using PIT/TSC based timekeeping. this even with the 2.6.9-42.ELsmp kernel Then again I am running at x86_64 so notsc might be working under i686.
Re: Comment #5, Does the system keep time properly with "notsc clock=pmtmr" on x86_64? i686 has different code for timekeeping anyway.
No it doesn't :( I am getting messages like this from ntp ntpd: time reset -0.828991 s for a while and then ntpd: frequency error -512 PPM exceeds tolerance 500 PPM Running "while true ; do date ; done | uniq -c" also shows that time sometimes goes backward :( Is there a similar patch to the one from comment 12 in #172199 for x86_64? I'll try to have a look at the code during the weekend.
Should I add a new bug for x86_64? Anyway from what I can see at the code (and I know almost nothing about the kernel) in arch/x86_64/kernel/time.c I can't see anywhere that pmtmr_ioport gets initialized unless arch/i386/kernel/acpi/boot.c is also used under x86_64 which I find unlikely. So if there is no hpet timer we end up in tsc :(
Well arch/i386/kernel/acpi/boot.c is used so it seems that somehow pm-timer doesn't get detected. The same motherboard at home A8V Deluxe (with the beta bios) detects the pm-timer under Fedora5 so I'll upgrade the bios in one of the machines at work to see if that helps or it is a problem with the detection code under RHEL4.
i686 timekeeping code only supports TSC, which wont work properly on multiprocessor AMD machines. a patch is currently being tested which disables C1 clock ramping so the system can use TSC safely
To avoid this issue, use "notsc" at boot on x86_64 to disable TSC based timekeeping. On i386 (i686), use "clock=hpet", "clock=pmtmr" or "clock=pit" to select a different timesource.
notsc doesn't do anything for the A8V Deluxe under x86_64 since the system doesn't have HPET and pm doesn't get detected the system still falls back to tsc :(
Re: Comment #13, looks like we will need to disable C1 clock ramping so TSC can be used on i386. RedHat is working on a patch for this. What chipset does the A8V Deluxe have? Seems like it should probably have a PMTimer... maybe thats a seperate issue worth filing a new and seperate BugZilla?
(In reply to comment #14) > Re: Comment #13, looks like we will need to disable C1 clock ramping so TSC can > be used on i386. RedHat is working on a patch for this. > > What chipset does the A8V Deluxe have? Seems like it should probably have a > PMTimer... maybe thats a seperate issue worth filing a new and seperate BugZilla? If this can help, I retried the very same test on a different version of RHAT >cat /etc/redhat-release Red Hat Enterprise Linux WS release 4 (Nahant Update 3) Linux version 2.6.9-34.ELsmp (bhcompile.redhat.com) (gcc version 3.4.5 20051201 (Red Hat 3.4.5-2)) #1 SMP Fri Feb 24 16:54:53 EST 2006 and my little program does no longer produce incorrect results. Besides, in dmesg logs I see now this line: checking TSC synchronization across 4 CPUs: passed. So I guess that this means that synchronisations issues between the 4 cores have been solved.
Re Comment #15 There were a few changes that went into RHEL4 U3 and U4.. One to check TSC synchronization and another to move TSC to the last picking order of timers. Are you using i686 or x86_64?
(In reply to comment #17) > Re Comment #15 > > There were a few changes that went into RHEL4 U3 and U4.. One to check TSC > synchronization and another to move TSC to the last picking order of timers. > > Are you using i686 or x86_64? i686 rigth now but I'm pretty sure that among many other tests I also ran this one on x86_64 some time ago and it worked.
Can the reporter try to reproduce this issue with the newest kernel (RHEL4.4)? We believe this is a regression that was fixed in the last release. Redhat is unable to reproduce this issue with the current U4 kernel.
Comment #14 Yes the non detection of PM-Timer should be a different bug report, I'll add one shortly. Testing again the system with a RHEl4.4 userland as well as kernel seems to have fixed the "time goes backward" problem as far as I can tell. I remember reading about changes in the way gibc calls gettimeofday not that I think about it. I suspect that the bug can be closed if Claudiu doesn't see any problems.
(In reply to comment #23) > Comment #14 > > Yes the non detection of PM-Timer should be a different bug report, I'll add one > shortly. Testing again the system with a RHEl4.4 userland as well as kernel > seems to have fixed the "time goes backward" problem as far as I can tell. I > remember reading about changes in the way gibc calls gettimeofday not that I > think about it. > > I suspect that the bug can be closed if Claudiu doesn't see any problems. I have no objections to close this case. (I can't test RHEL4.4 but I'll do it as soon as I can).
closing this bug since because was fixed in a previous release
Hmm after a few hours running RHEL4.4 x86_64: $ while true ; do date ; done | uniq -c 318 Sat Aug 19 18:46:54 BST 2006 1 Sat Aug 19 18:46:55 BST 2006 2 Sat Aug 19 18:46:54 BST 2006 1 Sat Aug 19 18:46:55 BST 2006 2 Sat Aug 19 18:46:54 BST 2006 1 Sat Aug 19 18:46:55 BST 2006 2 Sat Aug 19 18:46:54 BST 2006 1 Sat Aug 19 18:46:55 BST 2006 1 Sat Aug 19 18:46:54 BST 2006 1 Sat Aug 19 18:46:55 BST 2006 1 Sat Aug 19 18:46:54 BST 2006 312 Sat Aug 19 18:46:55 BST 2006 1 Sat Aug 19 18:46:56 BST 2006 1 Sat Aug 19 18:46:55 BST 2006 1 Sat Aug 19 18:46:56 BST 2006 1 Sat Aug 19 18:46:55 BST 2006 1 Sat Aug 19 18:46:56 BST 2006 1 Sat Aug 19 18:46:55 BST 2006 1 Sat Aug 19 18:46:56 BST 2006 1 Sat Aug 19 18:46:55 BST 2006 1 Sat Aug 19 18:46:56 BST 2006 I added #203236 for the non detection of pmtimer in the Asus A8V Deluxe mb.
Okay -- this bug is against i686 ... I'll test against x86_64. Could you send me the command line you used to boot? P.