Bug 262481
Summary: | something deeply wrong with gettimeofday | | |
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Bill Nottingham <notting> |
Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
Status: | CLOSED RAWHIDE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | low | Priority: | medium |
Version: | rawhide | CC: | adam, drepper, fedora, jakub, rvokal, tglx, valdis.kletnieks, zing |
Hardware: | x86_64 | OS: | All |
Doc Type: | Bug Fix | Last Closed: | 2007-09-10 23:06:59 UTC |
Bug Blocks: | 235703 | | |
Description
Bill Nottingham
2007-08-29 05:12:20 UTC
2.6.90-11 appears to be behaving better in short testing.

Likely a kernel bug then. The change in 2.6.90-12/13 is just that for gettimeofday it calls gettimeofday@@LINUX_2.6 in kernel's VDSO if available, previously it would always call the vsyscall 0xffffffffff600000ul. gettimeofday in glibc is just a wrapper around either of those, only when that vsyscall or vdso call returns >= -4095UL (== error value), it instead returns -1 and sets errno to -retval.

(In reply to comment #1)
> 2.6.90-11 appears to be behaving better in short testing.

Does booting the kernel with "vdso=0" fix things with the newer glibc?

*** Bug 264301 has been marked as a duplicate of this bug. ***

Yes, vdso=0 works as a workaround.

Forgot to add - if it's a kernel bug, it's probably not confined to Fedora kernels - I got bit on both 2.6.22-rc6-mm1 and 2.6.23-rc3-mm1. I haven't checked a vanilla Linus kernel yet.

Almost certainly caused by the x86_64 vdso patch -- either that or the new glibc vdso support for x86_64 is broken.

(In reply to comment #7)
> either that or the new glibc vdso support for x86_64 is broken.

All glibc does is jump to the provided address. Not much room to make mistakes.

I wonder if this started happening when we added the 64bit tickless patches (which afaik are in -mm, which explains why Valdis saw it there). Valdis, can you check if it reproduces on Linus' tree ? thanks.

The tickless patches are not changing the VDSO stuff.

Thanks, tglx

23-rc3-mm1 doesn't have the x86_64 tickless code, Andrew dropped it for the nonce. I'll replicate against a Linus -rc3/-rc4/-git later tonight and see what shakes out.

I took a Linus 2.6.22 tarball, applied 2.6.23-rc3 to it, built it - and the problem is there too. So whatever the issue is, it's in mainline kernels as well as the -mm and Fedora kernels.
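For readers following along, the return-value convention described in comment #2 can be modelled in a few lines. This is only an illustrative sketch in Python, not glibc's actual code: the names `wrap_result`, `MASK64` and `ERR_THRESHOLD` are made up here, and it assumes a 64-bit unsigned long as on x86_64.

```python
# Hypothetical model of the wrapper convention from comment #2; not glibc code.
MASK64 = (1 << 64) - 1            # assume a 64-bit unsigned long (x86_64)
ERR_THRESHOLD = (-4095) & MASK64  # -4095UL, i.e. 0xfffffffffffff001


def wrap_result(raw):
    """Map a raw vsyscall/vDSO return value to a (retval, errno) pair.

    The raw value is compared as an unsigned 64-bit number. Anything at or
    above -4095UL counts as an error: the wrapper returns -1 and errno is
    set to -raw. Every other value is passed through with errno 0.
    """
    raw &= MASK64
    if raw >= ERR_THRESHOLD:
        return -1, (-raw) & MASK64
    return raw, 0
```

For example, a raw return of -14 gives `(-1, 14)`, while 0 is passed through as `(0, 0)`.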
I found this quote from #2 interesting:

> gettimeofday in glibc is just a wrapper around either of those, only when that vsyscall or vdso call returns >= -4095UL (== error value), it instead returns -1 and sets errno to -retval.

mostly because the time offset is just about 4095/4096 seconds....

Has anyone tried to see if this problem happens with different clock sources?

How to change your clocksource: Look at /sys/devices/system/clocksource/clocksource0/current_clocksource and available_clocksource. Pick something different from available_clocksource and add a kernel boot parameter using that: clocksource=<whatever>

Booted with "clocksource=acpi_pm" and problem vanished; seems the CPU enters C3 again now as well (according to powertop it did not do that with recent rawhide kernels). Default clocksource was "hpet" beforehand; I can try "jiffies" or "tsc" as well if that is any help.

What clocksource was it using originally? hpet? Trying jiffies or tsc could be interesting, but will probably be flaky on SMP and/or with cpufreq.

(In reply to comment #16)
> What clocksource was it using originally? hpet?

yes, hpet.

> Trying jiffies or tsc could be interesting, but will probably be flaky on SMP
> and/or with cpufreq.

That's what I assumed. Are there any hpet specific options I could try to narrow down the problem further?

Confirming - hpet clocksource causes the clock warp, but acpi_pm clocksource works. I wasn't brave enough to test jiffies or tsc, I'm on an x86_64 SMP. ;)

Over on the lkml thread on this subject, Andi Kleen pointed out that since the vdso runs in ring 3, only the hpet and tsc clocksources are available - which probably means that when you're using acpi_pm, it's forced into a different codepath that avoids whatever the bug we're seeing....

Andi Kleen posted a patch - Chuck Ebbert finally tracked this down. http://lkml.org/lkml/2007/9/9/8 has the patch. Congrats, Chuck!
:)

In kernel-2.6.23-0.171.rc5.git1

Private build of kernel 0.171 works here where previous kernel failed, closing as fixed.
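For anyone retracing the clocksource test from earlier in this thread, here is a rough Python sketch that reads the two sysfs files mentioned there and lists the clocksource= boot parameters you could try. The sysfs directory is the one given in the thread; the helper names `read_clocksources` and `boot_parameters` are made up for illustration, and it assumes available_clocksource is a whitespace-separated list.

```python
import os

# Directory named in the thread; the helper names below are hypothetical.
CLOCKSOURCE_DIR = "/sys/devices/system/clocksource/clocksource0"


def read_clocksources(base=CLOCKSOURCE_DIR):
    """Return (current, available) clocksources read from the given directory."""
    with open(os.path.join(base, "current_clocksource")) as f:
        current = f.read().strip()
    with open(os.path.join(base, "available_clocksource")) as f:
        # Assumption: names are separated by whitespace on a single line.
        available = f.read().split()
    return current, available


def boot_parameters(current, available):
    """List a clocksource=<name> boot parameter for every alternative source."""
    return ["clocksource=" + name for name in available if name != current]


if __name__ == "__main__":
    # Only touch sysfs when the directory actually exists on this machine.
    if os.path.isdir(CLOCKSOURCE_DIR):
        cur, avail = read_clocksources()
        print("current:", cur)
        for param in boot_parameters(cur, avail):
            print("try booting with:", param)
```

The `base` argument is there only so the helper can be pointed at a copy of the files for testing; on a real system the default should be what you want.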