262481 – something deeply wrong with gettimeofday

Summary: something deeply wrong with gettimeofday

Keywords:
Status:	CLOSED RAWHIDE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	rawhide
Hardware:	x86_64
OS:	All
Priority:	medium
Severity:	low
Target Milestone:	---
Assignee:	Kernel Maintainer List
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	264301
Depends On:
Blocks:	F8Blocker

Reported:	2007-08-29 05:12 UTC by Bill Nottingham
Modified:	2014-03-17 03:08 UTC
CC List:	8 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2007-09-10 23:06:59 UTC
Type:	---
Embargoed:
Dependent Products:

Description Bill Nottingham 2007-08-29 05:12:20 UTC

Description of problem:

After 5 minutes of uptime, the system clock goes screwy.

When running a simple 'while /bin/true ; date ; uptime ; sleep 2 ; done' loop:


Wed Aug 29 01:01:14 EDT 2007
 01:01:14 up 4 min,  4 users,  load average: 0.13, 0.40, 0.21
Wed Aug 29 01:01:16 EDT 2007
 01:01:16 up 4 min,  4 users,  load average: 0.13, 0.40, 0.21
Wed Aug 29 02:09:36 EDT 2007
 01:01:18 up 5 min,  4 users,  load average: 0.12, 0.39, 0.21
Wed Aug 29 02:09:38 EDT 2007
 01:01:20 up 5 min,  4 users,  load average: 0.12, 0.39, 0.21
Wed Aug 29 02:09:40 EDT 2007
 01:01:22 up 5 min,  4 users,  load average: 0.19, 0.40, 0.22

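At the 5 minute mark, the time reported by date jumps forward by a little over
an hour (1 h 8 min 18 s, or 4098 seconds, in the output above), *even though the
kernel's own notion of the time, as shown by uptime, appears to be unchanged*.

If you then try to reset the time (with 'date -s "01:04"'):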

Wed Aug 29 02:11:06 EDT 2007
 01:02:48 up 6 min,  4 users,  load average: 0.99, 0.60, 0.30
Wed Aug 29 02:12:19 EDT 2007
 01:04:01 up 6 min,  4 users,  load average: 0.99, 0.60, 0.30
Wed Aug 29 02:12:21 EDT 2007
 01:04:03 up 6 min,  4 users,  load average: 0.99, 0.61, 0.30

the kernel's date (as seen by uptime) is correctly set, but the date as returned
by /bin/date remains wrong (although it changes by the same amount.)

Reproduced with multiple kernels; since it appears to be an issue reading the
time (and seeing the changelogs), pushing to glibc.
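
A minimal sketch, not part of the original report, of one way to check whether
the time returned through the C library disagrees with the time returned by a
real system call. It assumes x86_64 Linux (where the gettimeofday syscall
number is 96) and assumes that Python's time.time() goes through the C
library's usual fast path; the helper names are made up for illustration.

```python
import ctypes
import platform
import sys
import time

# Assumption: this syscall number is valid on x86_64 Linux only.
SYS_GETTIMEOFDAY_X86_64 = 96


class Timeval(ctypes.Structure):
    """Mirror of struct timeval on x86_64 Linux."""
    _fields_ = [("tv_sec", ctypes.c_long), ("tv_usec", ctypes.c_long)]


def raw_gettimeofday():
    """Read the time with a real syscall, bypassing vsyscall/vDSO helpers."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    tv = Timeval()
    rc = libc.syscall(ctypes.c_long(SYS_GETTIMEOFDAY_X86_64),
                      ctypes.byref(tv), None)
    if rc != 0:
        raise OSError(ctypes.get_errno(), "gettimeofday syscall failed")
    return tv.tv_sec + tv.tv_usec / 1e6


def skew_seconds(library_clock, kernel_clock):
    """Return library time minus kernel time, in seconds."""
    return library_clock() - kernel_clock()


if __name__ == "__main__":
    if sys.platform.startswith("linux") and platform.machine() == "x86_64":
        print("skew: %.3f s" % skew_seconds(time.time, raw_gettimeofday))
    else:
        print("syscall number only known for x86_64 Linux; skipping")
```

On a healthy system the skew should be close to zero; a large, stable skew
would point at the user-space read path rather than at the kernel's clock.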

Version-Release number of selected component (if applicable):

2.6.90-13

Comment 1 Bill Nottingham 2007-08-29 13:51:26 UTC

2.6.90-11 appears to be behaving better in short testing.

Comment 2 Jakub Jelinek 2007-08-29 14:01:11 UTC

Likely a kernel bug then.  The change in 2.6.90-12/13 is just that for
gettimeofday it calls gettimeofday@@LINUX_2.6 in kernel's VDSO if available,
previously it would always call the vsyscall 0xffffffffff600000ul.
gettimeofday in glibc is just a wrapper around either of those, only when
that vsyscall or vdso call returns >= -4095UL (== error value), it instead
returns -1 and sets errno to -retval.
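
The error convention described above can be sketched as follows. This is an
illustration only, not glibc's actual code: it models the raw return value as
an unsigned 64-bit integer, and the name decode_return is hypothetical.

```python
MAX_ERRNO = 4095
WORD = 2 ** 64  # width of a return register on x86_64


def decode_return(raw):
    """Map a raw return value to (result, errno).

    Values in the top 4095 of the unsigned range, i.e. >= -4095UL, are
    treated as negated errno codes; everything else is a plain result.
    """
    raw %= WORD                      # view the value as unsigned 64-bit
    if raw >= WORD - MAX_ERRNO:      # same test as raw >= -4095UL
        return -1, WORD - raw        # errno = -retval
    return raw, 0


if __name__ == "__main__":
    print(decode_return(0))          # success: (0, 0)
    print(decode_return(-14))        # -EFAULT: (-1, 14)
```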

Comment 3 Chuck Ebbert 2007-08-29 15:34:38 UTC

(In reply to comment #1)
> 2.6.90-11 appears to be behaving better in short testing.

Does booting the kernel with "vdso=0" fix things with the newer glibc?

Comment 4 Jakub Jelinek 2007-08-29 17:14:24 UTC

*** Bug 264301 has been marked as a duplicate of this bug. ***

Comment 5 Valdis Kletnieks 2007-08-29 17:32:42 UTC

Yes, vdso=0 works as a workaround.

Comment 6 Valdis Kletnieks 2007-08-29 17:34:47 UTC

Forgot to add - if it's a kernel bug, it's probably not confined to Fedora
kernels - I got bit on both 2.6.22-rc6-mm1 and 2.6.23-rc3-mm1.  I haven't
checked a vanilla Linus kernel yet.

Comment 7 Chuck Ebbert 2007-08-29 17:56:41 UTC

Almost certainly caused by the x86_64 vdso patch -- either that or the new glibc
vdso support for x86_64 is broken.

Comment 8 Ulrich Drepper 2007-08-29 19:02:00 UTC

(In reply to comment #7)
> either that or the new glibc vdso support for x86_64 is broken.

All glibc does is jump to the provided address.  Not much room to make mistakes.

Comment 9 Dave Jones 2007-08-29 19:11:46 UTC

I wonder if this started happening when we added the 64bit tickless patches
(which afaik are in -mm, which explains why Valdis saw it there).

Valdis, can you check if it reproduces on Linus' tree ?

thanks.

Comment 10 Thomas Gleixner 2007-08-29 19:26:17 UTC

The tickless patches are not changing the VDSO stuff.

Thanks,
   tglx

Comment 11 Valdis Kletnieks 2007-08-29 19:34:39 UTC

23-rc3-mm1 doesn't have the x86_64 tickless code, Andrew dropped it for the nonce.

I'll replicate against a Linus -rc3/-rc4/-git later tonight and see what shakes out.

Comment 12 Valdis Kletnieks 2007-08-30 14:13:34 UTC

I took a Linus 2.6.22 tarball, applied 2.6.23-rc3 to it, built it - and the
problem is there too.  So whatever the issue is, it's in mainline kernels as
well as the -mm and Fedora kernels.

I found this quote from #2 interesting:

> gettimeofday in glibc is just a wrapper around either of those, only when
> that vsyscall or vdso call returns >= -4095UL (== error value), it instead
> returns -1 and sets errno to -retval.

mostly because the time offset is just about 4095/4096 seconds....
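
For reference, the offset can be worked out from the date/uptime pairs in the
original description. A small sketch; the timestamps are copied from the
report, the helper name is made up:

```python
def seconds(hms):
    """Convert an HH:MM:SS string to seconds since midnight."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


# (time printed by date, time printed by uptime) from the description
SAMPLES = [
    ("02:09:36", "01:01:18"),   # first bad reading
    ("02:11:06", "01:02:48"),   # just before 'date -s'
    ("02:12:19", "01:04:01"),   # just after 'date -s'
]

offsets = [seconds(d) - seconds(u) for d, u in SAMPLES]

if __name__ == "__main__":
    for (d, u), off in zip(SAMPLES, offsets):
        print(d, "-", u, "=", off, "seconds")   # 4098 for every pair
```

The offset is 4098 seconds in all three pairs, so it stays constant across the
'date -s' reset.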

Comment 13 Chuck Ebbert 2007-08-31 17:30:24 UTC

Has anyone tried to see if this problem happens with different clock sources?

Comment 14 Chuck Ebbert 2007-09-07 19:08:19 UTC

How to change your clocksource:

Look at /sys/devices/system/clocksource/clocksource0/current_clocksource
and available_clocksource. Pick something different from available_clocksource
and add a kernel boot parameter using that:

    clocksource=<whatever>
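
The lookup above can also be scripted. A rough sketch that reads the two sysfs
files named above and prints candidate boot parameters; the function names are
hypothetical, and it assumes available_clocksource is a whitespace-separated
list:

```python
from pathlib import Path

SYSFS_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base=SYSFS_DIR):
    """Return (current, available) clocksource names from sysfs."""
    base = Path(base)
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


def boot_parameters(current, available):
    """List a clocksource= parameter for every source not in use now."""
    return ["clocksource=" + name for name in available if name != current]


if __name__ == "__main__":
    if SYSFS_DIR.is_dir():
        cur, avail = read_clocksources()
        print("current:", cur)
        for param in boot_parameters(cur, avail):
            print("try booting with:", param)
    else:
        print("no clocksource sysfs directory on this system")
```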

Comment 15 Thorsten Leemhuis 2007-09-07 20:27:50 UTC

booted with "clocksource=acpi_pm" and problem vanished; seems the CPU enters C3
again now as well (according to powertop it did not do that with recent rawhide
kernels)

Default clocksource was "hpet" beforehand; I can try "jiffies" or "tsc" as well
if that is any help.

Comment 16 Chuck Ebbert 2007-09-07 20:46:54 UTC

What clocksource was it using originally? hpet?

Trying jiffies or tsc could be interesting, but will probably be flaky on SMP
and/or with cpufreq.

Comment 17 Thorsten Leemhuis 2007-09-07 20:57:15 UTC

(In reply to comment #16)
> What clocksource was it using originally? hpet?

yes, hpet.

> Trying jiffies or tsc could be interesting, but will probably be flaky on SMP
> and/or with cpufreq.

That's what I assumed.

Are there any hpet specific options I could try to narrow down the problem further?

Comment 18 Valdis Kletnieks 2007-09-09 00:21:15 UTC

Confirming - hpet clocksource causes the clock warp, but acpi_pm clocksource
works.  I wasn't brave enough to test jiffies or tsc, I'm on an x86_64 SMP. ;)

Comment 19 Valdis Kletnieks 2007-09-09 03:33:42 UTC

Over on the lkml thread on this subject, Andi Kleen pointed out that since the
vdso runs in ring 3, only the hpet and tsc clocksources are available - which
probably means that when you're using acpi_pm, it's forced into a different
codepath that avoids whatever bug we're seeing....

Comment 20 Valdis Kletnieks 2007-09-10 19:13:56 UTC

Andi Kleen posted a patch - Chuck Ebbert finally tracked this down.

http://lkml.org/lkml/2007/9/9/8 has the patch.

Congrats, Chuck! :)

Comment 21 Chuck Ebbert 2007-09-10 21:28:31 UTC

In kernel-2.6.23-0.171.rc5.git1

Comment 22 Chuck Ebbert 2007-09-10 23:06:59 UTC

Private build of kernel 0.171 works here where previous kernel failed, closing
as fixed.
