Red Hat Bugzilla – Full Text Bug Listing
|Summary:||something deeply wrong with gettimeofday|
|Product:||[Fedora] Fedora||Reporter:||Bill Nottingham <notting>|
|Component:||kernel||Assignee:||Kernel Maintainer List <kernel-maint>|
|Status:||CLOSED RAWHIDE||QA Contact:||Fedora Extras Quality Assurance <extras-qa>|
|Version:||rawhide||CC:||adam, drepper.fsp, fedora, jakub, rvokal, tglx, valdis.kletnieks, zing|
|Fixed In Version:||Doc Type:||Bug Fix|
|Doc Text:||Story Points:||---|
|Last Closed:||2007-09-10 19:06:59 EDT||Type:||---|
Description Bill Nottingham 2007-08-29 01:12:20 EDT
Description of problem:
After 5 minutes of uptime, the system clock goes screwy. When running a simple
'while /bin/true ; date ; uptime ; sleep 2 ; done' loop:

Wed Aug 29 01:01:14 EDT 2007
 01:01:14 up 4 min, 4 users, load average: 0.13, 0.40, 0.21
Wed Aug 29 01:01:16 EDT 2007
 01:01:16 up 4 min, 4 users, load average: 0.13, 0.40, 0.21
Wed Aug 29 02:09:36 EDT 2007
 01:01:18 up 5 min, 4 users, load average: 0.12, 0.39, 0.21
Wed Aug 29 02:09:38 EDT 2007
 01:01:20 up 5 min, 4 users, load average: 0.12, 0.39, 0.21
Wed Aug 29 02:09:40 EDT 2007
 01:01:22 up 5 min, 4 users, load average: 0.19, 0.40, 0.22

At the 5 minute mark, the time jumps forward by a little over an hour as reported by date, *even though the kernel variant appears to be unchanged*. If you then try and reset the time (with 'date -s "01:04"'):

Wed Aug 29 02:11:06 EDT 2007
 01:02:48 up 6 min, 4 users, load average: 0.99, 0.60, 0.30
Wed Aug 29 02:12:19 EDT 2007
 01:04:01 up 6 min, 4 users, load average: 0.99, 0.60, 0.30
Wed Aug 29 02:12:21 EDT 2007
 01:04:03 up 6 min, 4 users, load average: 0.99, 0.61, 0.30

the kernel's date (as seen by uptime) is correctly set, but the date as returned by /bin/date remains wrong (although it changes by the same amount.)

Reproduced with multiple kernels; since it appears to be an issue reading the time (and seeing the changelogs), pushing to glibc.

Version-Release number of selected component (if applicable):
2.6.90-13
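The date/uptime loop above compares two views of the time by eye. A minimal sketch of the same check done programmatically follows. It is not part of the original report, and the helper names (`clock_offsets`, `find_jumps`, `sample_clocks`) are hypothetical. It assumes `time.monotonic()` is a usable reference clock; on a system with this particular bug the reference could be affected too, so treat it as an illustration rather than a guaranteed detector.

```python
import time


def clock_offsets(samples):
    """Return wall-minus-reference drift for each sample, relative to the first.

    `samples` is a list of (wall_seconds, reference_seconds) pairs.
    """
    base_wall, base_ref = samples[0]
    return [(wall - base_wall) - (ref - base_ref) for wall, ref in samples]


def find_jumps(samples, tolerance=1.0):
    """Return indices where the drift changes by more than `tolerance` seconds
    compared with the previous sample."""
    offsets = clock_offsets(samples)
    return [
        i
        for i in range(1, len(offsets))
        if abs(offsets[i] - offsets[i - 1]) > tolerance
    ]


def sample_clocks(count, interval=2.0):
    """Collect (wall, monotonic) pairs, roughly like the shell loop.

    Assumption: time.monotonic() stands in for the kernel-side view.
    """
    samples = []
    for _ in range(count):
        samples.append((time.time(), time.monotonic()))
        time.sleep(interval)
    return samples
```

Calling `find_jumps(sample_clocks(200))` would cover a bit more than the first five minutes of a run; any index it returns marks a sample where the wall clock moved differently from the reference clock.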
Comment 1 Bill Nottingham 2007-08-29 09:51:26 EDT
2.6.90-11 appears to be behaving better in short testing.
Comment 2 Jakub Jelinek 2007-08-29 10:01:11 EDT
Likely a kernel bug then. The change in 2.6.90-12/13 is just that for gettimeofday it calls gettimeofday@@LINUX_2.6 in kernel's VDSO if available, previously it would always call the vsyscall 0xffffffffff600000ul. gettimeofday in glibc is just a wrapper around either of those, only when that vsyscall or vdso call returns >= -4095UL (== error value), it instead returns -1 and sets errno to -retval.
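The error convention described in this comment can be sketched as follows. This is only an illustration of the ">= -4095UL" rule on a 64-bit value, not glibc's actual code; the function name `wrap_raw_return` and the constants are hypothetical.

```python
WORD = 1 << 64      # assumed register width on x86_64
MAX_ERRNO = 4095    # values in the top 4095 of the range are treated as errors


def wrap_raw_return(raw):
    """Apply the wrapper rule to a raw vsyscall/vdso return value.

    `raw` is interpreted as an unsigned 64-bit integer. Returns a pair
    (result, errno), where errno is None on success.
    """
    raw %= WORD                   # normalise negative inputs to unsigned
    if raw >= WORD - MAX_ERRNO:   # same test as `raw >= -4095UL` in C
        return -1, WORD - raw     # errno is the negated return value
    return raw, None
```

For example, a raw return of `-22` (in two's complement, `2**64 - 22`) maps to `(-1, 22)`, while `0` passes through as `(0, None)`.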
Comment 3 Chuck Ebbert 2007-08-29 11:34:38 EDT
(In reply to comment #1)
> 2.6.90-11 appears to be behaving better in short testing.

Does booting the kernel with "vdso=0" fix things with the newer glibc?
Comment 4 Jakub Jelinek 2007-08-29 13:14:24 EDT
*** Bug 264301 has been marked as a duplicate of this bug. ***
Comment 5 Valdis Kletnieks 2007-08-29 13:32:42 EDT
Yes, vdso=0 works as a workaround.
Comment 6 Valdis Kletnieks 2007-08-29 13:34:47 EDT
Forgot to add - if it's a kernel bug, it's probably not confined to Fedora kernels - I got bit on both 2.6.22-rc6-mm1 and 2.6.23-rc3-mm1. I haven't checked a vanilla Linus kernel yet.
Comment 7 Chuck Ebbert 2007-08-29 13:56:41 EDT
Almost certainly caused by the x86_64 vdso patch -- either that or the new glibc vdso support for x86_64 is broken.
Comment 8 Ulrich Drepper 2007-08-29 15:02:00 EDT
(In reply to comment #7)
> either that or the new glibc vdso support for x86_64 is broken.

All glibc does is jump to the provided address. Not much room to make mistakes.
Comment 9 Dave Jones 2007-08-29 15:11:46 EDT
I wonder if this started happening when we added the 64bit tickless patches (which afaik are in -mm, which explains why Valdis saw it there). Valdis, can you check if it reproduces on Linus' tree? Thanks.
Comment 10 Thomas Gleixner 2007-08-29 15:26:17 EDT
The tickless patches are not changing the VDSO stuff. Thanks, tglx
Comment 11 Valdis Kletnieks 2007-08-29 15:34:39 EDT
23-rc3-mm1 doesn't have the x86_64 tickless code, Andrew dropped it for the nonce. I'll replicate against a Linus -rc3/-rc4/-git later tonight and see what shakes out.
Comment 12 Valdis Kletnieks 2007-08-30 10:13:34 EDT
I took a Linus 2.6.22 tarball, applied 2.6.23-rc3 to it, built it - and the problem is there too. So whatever the issue is, it's in mainline kernels as well as the -mm and Fedora kernels.

I found this quote from #2 interesting:

> gettimeofday in glibc is just a wrapper around either of those, only when that
> vsyscall or vdso call returns >= -4095UL (== error value), it instead returns -1
> and sets errno to -retval.

mostly because the time offset is just about 4095/4096 seconds....
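The "about 4095/4096 seconds" observation can be checked against the timestamps in the original description. The sketch below only does that arithmetic; the helper name `offset_seconds` is hypothetical. The two clocks were sampled by separate commands at one-second resolution, so a difference of a few seconds either way is within the noise, and the result does not by itself show that the error threshold is involved.

```python
from datetime import datetime


def offset_seconds(date_output, kernel_time):
    """Difference in seconds between two HH:MM:SS times on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(date_output, fmt) - datetime.strptime(kernel_time, fmt)
    return int(delta.total_seconds())


# Pairs taken from the description: time printed by date, time printed by uptime.
after_jump = offset_seconds("02:09:36", "01:01:18")
after_reset = offset_seconds("02:12:19", "01:04:01")
print(after_jump, after_reset)
```

Both pairs give the same offset, which matches the description's remark that resetting the time shifts both views by the same amount.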
Comment 13 Chuck Ebbert 2007-08-31 13:30:24 EDT
Has anyone tried to see if this problem happens with different clock sources?
Comment 14 Chuck Ebbert 2007-09-07 15:08:19 EDT
How to change your clocksource: Look at /sys/devices/system/clocksource/clocksource0/current_clocksource and available_clocksource. Pick something different from available_clocksource and add a kernel boot parameter using that: clocksource=<whatever>
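The lookup described in this comment can be sketched as below. The sysfs paths are the ones named in the comment; the function names are hypothetical, and the script only prints candidate boot parameters. It changes nothing on the system.

```python
from pathlib import Path

# Directory named in the comment above.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base=CLOCKSOURCE_DIR):
    """Return (current, available) clocksource names read from sysfs."""
    base = Path(base)
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


def alternatives(current, available):
    """Clocksources that differ from the one currently in use."""
    return [name for name in available if name != current]


def boot_parameter(name):
    """Kernel boot parameter selecting the given clocksource."""
    return f"clocksource={name}"


if __name__ == "__main__":
    current, available = read_clocksources()
    print("current:", current)
    for name in alternatives(current, available):
        print("try booting with:", boot_parameter(name))
```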
Comment 15 Thorsten Leemhuis 2007-09-07 16:27:50 EDT
Booted with "clocksource=acpi_pm" and the problem vanished; it seems the CPU enters C3 again now as well (according to powertop it did not do that with recent rawhide kernels). Default clocksource was "hpet" beforehand; I can try "jiffies" or "tsc" as well if that is any help.
Comment 16 Chuck Ebbert 2007-09-07 16:46:54 EDT
What clocksource was it using originally? hpet? Trying jiffies or tsc could be interesting, but will probably be flaky on SMP and/or with cpufreq.
Comment 17 Thorsten Leemhuis 2007-09-07 16:57:15 EDT
(In reply to comment #16)
> What clocksource was it using originally? hpet?

yes, hpet.

> Trying jiffies or tsc could be interesting, but will probably be flaky on SMP
> and/or with cpufreq.

That's what I assumed. Are there any hpet specific options I could try to narrow down the problem further?
Comment 18 Valdis Kletnieks 2007-09-08 20:21:15 EDT
Confirming - hpet clocksource causes the clock warp, but acpi_pm clocksource works. I wasn't brave enough to test jiffies or tsc, I'm on a x86_64 SMP. ;)
Comment 19 Valdis Kletnieks 2007-09-08 23:33:42 EDT
Over on the lkml thread on this subject, Andi Kleen pointed out that since the vdso runs in ring 3, only the hpet and tsc clocksources are available - which probably means that when you're using acpi_pm, it's forced into a different codepath that avoids whatever bug we're seeing....
Comment 20 Valdis Kletnieks 2007-09-10 15:13:56 EDT
Andi Kleen posted a patch - Chuck Ebbert finally tracked this down. http://lkml.org/lkml/2007/9/9/8 has the patch. Congrats, Chuck! :)
Comment 21 Chuck Ebbert 2007-09-10 17:28:31 EDT
Comment 22 Chuck Ebbert 2007-09-10 19:06:59 EDT
Private build of kernel 0.171 works here where previous kernel failed, closing as fixed.