Bug 690478

Summary: vDSO gettimeofday used for Xen even though pvclocks do not support it
Product: Red Hat Enterprise Linux 5 Reporter: Stefan Nordhausen <stefan.nordhausen>
Component: kernel-xenAssignee: Laszlo Ersek <lersek>
Status: CLOSED DUPLICATE QA Contact: Virtualization Bugs <virt-bugs>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 5.6CC: drjones, leiwang, lersek, pbonzini, prarit, qwan, xen-maint, yuzhang
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-04-01 08:49:30 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Attachments:
Description Flags
C program to reproduce the bogus time results
none
output of "strace date" in main system, kernel 2.6.18-252.el5xen
none
output of "strace date" in chroot, kernel 2.6.18-252.el5xen
none
output of "ltrace date" in chroot, kernel 2.6.18-252.el5xen
none
strace of Laszlo's sample program
none
xen: set vxtime mode to VXTIME_KVM none

Description Stefan Nordhausen 2011-03-24 13:03:57 UTC
We have the problem that clock_gettime() and other time related glibc calls return bogus results when used inside a Debian chroot environment on a Xen-enabled kernel. This may be related to bug #675727 and/or bug #673616 which we also experienced.

With 'bogus results' I mean that time is advancing by 70-71 minutes per second, as seen in this example on the shell inside the chroot:


$ for i in 1 2 3 4 5; do date; sleep 1; done
Sat Apr  2 05:07:11 CEST 2011
Sat Apr  2 06:17:49 CEST 2011
Sat Apr  2 07:28:35 CEST 2011
Sat Apr  2 08:39:21 CEST 2011
Sat Apr  2 09:50:06 CEST 2011


Environment in our case is a 64 bit Debian Lenny/Squeeze chroot on a 64 bit RHEL 5.6, each with all current updates applied. The RHEL kernel was manually updated to 2.6.18-247.el5 which was needed to fix bug #675727 (see comment #17 in that bug), but the original kernel also had this problem.

Time related functions work fine, though,
- in the RHEL 5.6 main system
- when the Debian binaries are called from outside the chroot (=with RHEL libs)
- with a normal kernel instead of a Xen-kernel
- on a previous RHEL 5.5 installation with Xen-kernel

Note that we are not doing any virtual machine related things.

Comment 1 Stefan Nordhausen 2011-03-24 13:06:36 UTC
Created attachment 487322 [details]
C program to reproduce the bogus time results

Comment 3 Laszlo Ersek 2011-03-25 18:54:12 UTC
Hello Stefan,

please strace the test program both when it misbehaves and when it works alright.


I installed Debian 6.0.1 (and then updated it) on my amd64 workstation. There I built the following program both dynamically and statically:

----[ cut here ]----
#define _XOPEN_SOURCE 600

#include <time.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int
main(void)
{
  struct timespec foo;
  
  return 0 == clock_gettime(CLOCK_MONOTONIC, &foo)
      && 0 <= fprintf(stdout, "tv_sec %ju  tv_nsec %ju\n",
      (uintmax_t)foo.tv_sec, (uintmax_t)foo.tv_nsec)
      && 0 == fflush(stdout)
      ? EXIT_SUCCESS : EXIT_FAILURE;
}
----[ cut here ]----

$ gcc -o clock_gettime                -std=c99 -Wall -Wextra -Wformat=2 \
    clock_gettime.c -l rt
$ gcc -o clock_gettime.static -static -std=c99 -Wall -Wextra -Wformat=2 \
    clock_gettime.c -l rt


Both are working alright under Debian. I rebooted my RHEL-5.6 installation with
2.6.18-247.el5xen, mounted the Debian partition, and ran the statically linked executable again from there (as an initial test before setting up the chroot environment). Perhaps static linking affects the syscall behind clock_gettime(), but the static executable kept working correctly. (Also, the dynamic executable, linked against the RHEL-5.6 libs, was okay.)

This far I can't reproduce the problem. Can you please describe how you've set up your chroot environment? Thank you.

Comment 4 Stefan Nordhausen 2011-03-29 12:45:04 UTC
(In reply to comment #3)
> I installed Debian 6.0.1 (and then updated it) on my amd64 workstation.

Just to avoid possible confusion: I did NOT install a virtualized Debian. We are running Debian chroot that were created with "debootstrap".

To create your own Debian chroot do this as root:

- Get the most recent debootstrap from http://ftp.us.debian.org/debian/pool/main/d/debootstrap/debootstrap_1.0.26+squeeze1_all.deb
- Unpack with "ar x debootstrap*.deb"
- Install debootstrap with "tar xzf data.tar.gz -C /" (mind the "/" at the end)
- Create directory for chroot, e.g. "mkdir ~/gettime_chroot"
- Set up chroot with "debootstrap --arch amd64 squeeze ~/gettime_chroot"
- Enter the chroot with "chroot ~/gettime_chroot"
- Get new Debian signatures with "apt-get update"
- Install new packages as needed with "apt-get install <package name>"
- Still inside the chroot, run "watch date" to see if clock_gettime() is working. "date" uses clock_gettime() according to "ltrace".


I was able to reproduce the problem with a RHEL 5.6 installed in a VirtualBox VM instead of the real hardware (where we found the issue first). I tried out a number of kernels for the main system with Laszlo's program from comment #3:

2.6.18-238.5.1.el5 --> OK
2.6.18-247.el5xen --> NOT OK
2.6.18-252.el5xen --> NOT OK
2.6.18-252.el5 --> OK

So only the Xen kernels are not working.

Once I compiled the program from comment #3 statically(!) with RHEL's gcc, it worked as expected when run inside the chroot.
I also compiled the program dynamically with the RHEL's gcc and then ran the binary inside the chroot, which gave incorrect results.

Comment 5 Stefan Nordhausen 2011-03-29 12:57:39 UTC
Created attachment 488429 [details]
output of "strace date" in main system, kernel 2.6.18-252.el5xen

Comment 6 Stefan Nordhausen 2011-03-29 12:58:44 UTC
Created attachment 488430 [details]
output of "strace date" in chroot, kernel 2.6.18-252.el5xen

Note that no syscall to "clock_gettime" is made

Comment 7 Stefan Nordhausen 2011-03-29 12:59:47 UTC
Created attachment 488431 [details]
output of "ltrace date" in chroot, kernel 2.6.18-252.el5xen

Note that library function "clock_gettime" is called

Comment 8 Stefan Nordhausen 2011-03-29 13:11:56 UTC
Created attachment 488439 [details]
strace of Laszlo's sample program

Both binaries were compiled using RHEL's gcc, one dynamic, one static. They are run inside the Debian chroot.

Note that the static one is making a syscall (=RHEL libs compiled in) and the other is making a vsyscall (=Debian libs dynamically loaded).

The results are completely different, static binary says "tv_sec 4343" while dynamic one says "tv_sec 234150" even though there were just a few moments in between.

Comment 9 Laszlo Ersek 2011-03-29 14:32:01 UTC
It looks like all of the following are necessary to see the bug:
- RHEL xen kernel,
- Debian eglibc,
- dynamically linked app.

For a dynamically linked app, the Debian eglibc will use the vsyscall (while the RHEL glibc still uses the syscall, see comment #5 and comment #6). It seems that under the RHEL xen kernel, the vsyscall either returns bogus results, or the Debian glibc is unable to call it -- or interpret the answer -- correctly.

Given that the RHEL glibc avoids the vsyscall among the same circumstances (or that it calls a vsyscall that falls back successfully to the clock_gettime() syscall), and that Debian 6 even uses a fork (eglibc), I wonder if the verdict might simply be: "The eglibc in Squeeze is an unsupported userspace-kernel interface on RHEL-5".

Comment 10 Stefan Nordhausen 2011-03-29 15:24:19 UTC
(In reply to comment #9)
> I wonder if the verdict
> might simply be: "The eglibc in Squeeze is an unsupported userspace-kernel
> interface on RHEL-5".

The point is that it works fine with RHEL 5.5 Xen kernels and RHEL 5.6 non-Xen kernels, which looks a bit strange to me and hints at some unintentional modifications in the new Xen kernel.

Comment 11 Laszlo Ersek 2011-03-29 15:53:49 UTC
I'll try to setup the chroot environment as you described in comment 4, and to find the first RHEL-5 kernel that does not cooperate with eglibc's clock_gettime().

Comment 12 Stefan Nordhausen 2011-03-29 16:31:48 UTC
The same symptom of "date" going nuts also occurs for a chroot with openSuse 11.4 (=most recent version), so this is not restricted to Debian.

Comment 13 Laszlo Ersek 2011-03-30 08:59:36 UTC
(In reply to comment #4)

> To create your own Debian chroot do this as root:
> 
> - Get the most recent debootstrap from
> http://ftp.us.debian.org/debian/pool/main/d/debootstrap/debootstrap_1.0.26+squeeze1_all.deb
> - Unpack with "ar x debootstrap*.deb"
> - Install debootstrap with "tar xzf data.tar.gz -C /" (mind the "/" at the end)
> - Create directory for chroot, e.g. "mkdir ~/gettime_chroot"
> - Set up chroot with "debootstrap --arch amd64 squeeze ~/gettime_chroot"

Here's the end of debootstrap's output:

I: Extracting xz-utils...
I: Extracting zlib1g...
W: Failure trying to run: chroot /root/gettime_chroot mount -t proc proc /proc

The file "/root/gettime_chroot/debootstrap/debootstrap.log" says "Segmentation fault". Indeed, when I retry the above chroot+mount command manually, it says "Segmentation fault". Host kernel is 2.6.18-238.el5xen (RHEL-5.6).

I retried the bootstrap with a clean directory after booting a bare-metal 2.6.18-238.el5 kernel. It failed with the same symptoms.

This far we were unable to reproduce the error; we can't set up the three conditions described in comment 9.

IIRC Paolo tried to shoehorn the Debian eglibc into a dynamic executable, outside of the chroot, and it didn't work for him. I now tried to do something similar:

RHEL test:

$ cd /lib64 
$ ./ld-2.5.so --library-path $PWD ../bin/date
Wed Mar 30 04:54:28 EDT 2011

Debian test ("/root/gettime_chroot" tree in the state where the debootstrap failed above):

$ cd /root/gettime_chroot/lib64/
$ ./ld-2.5.so --library-path $PWD ../bin/date
../bin/date: error while loading shared libraries: ../bin/date: wrong ELF class: ELFCLASS64

$ file ld-2.5.so ../bin/date | cut -f 1 -d ,
ld-2.5.so:   ELF 32-bit LSB shared object
../bin/date: ELF 64-bit LSB executable

And now I see that in the chroot-to-be, "lib64" is actually a symlink to "lib". Why is that so?

Comment 14 Laszlo Ersek 2011-03-30 12:11:16 UTC
I installed the Debian partition again, as described in comment 3 (no,
this is not a guest, it is a normal disk partition to be mounted later
in the host). I'm unable to enter it "normally" with chroot (ie. to run
a shell) due to SIGSEGV, but following Paolo's recommendation, I could
run "chroot /misc /root/clock_gettime". (The latter is a dynamic
executable built within the Debian system -- ie. I booted temporarily
into Debian.) This program crashes with SIGSEGV as well. I could strace
it, and perhaps this is interesting:

    futex(0x7fff63ad44bc, FUTEX_WAKE_PRIVATE, 1) = 0

    futex(0x7fff63ad44bc, 0x189 /* FUTEX_??? */, 1, NULL, 2b98eedd9700)
    = -1 ENOSYS (Function not implemented)

This is similar to what can be seen in attachment 488430 [details] ('output of
"strace date" in chroot, kernel 2.6.18-252.el5xen')

    futex(0x7fff627e8c0c, FUTEX_WAKE_PRIVATE, 1) = 0

    futex(0x7fff627e8c0c,
    FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, NULL,
    2aaee2d3b700) = -1 ENOSYS (Function not implemented)

and in attachment 488439 [details] ("strace of Laszlo's sample program"), the dynamic case:

    futex(0x7fff29358b2c, FUTEX_WAKE_PRIVATE, 1) = 0

    futex(0x7fff29358b2c,
    FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, NULL, 2acbb48d4700) =
    -1 ENOSYS (Function not implemented)

I get the same result (including the SIGSEGV) under 2.6.18-238.el5, ie.
bare metal:

    futex(0x7fff1b51e67c, FUTEX_WAKE_PRIVATE, 1) = 0

    futex(0x7fff1b51e67c, 0x189 /* FUTEX_??? */, 1, NULL, 2abdd63f6700)
    = -1 ENOSYS (Function not implemented)

    rt_sigaction(SIGRTMIN, {0x2abdd61df860, [], SA_RESTORER|SA_SIGINFO,
    0x2abdd61e8f60}, NULL, 8) = 0

    rt_sigaction(SIGRT_1, {0x2abdd61df8f0, [],
    SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x2abdd61e8f60}, NULL, 8) = 0

    rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0

    getrlimit(RLIMIT_STACK, {rlim_cur=10240*1024,
    rlim_max=RLIM_INFINITY}) = 0

    --- SIGSEGV (Segmentation fault) @ 0 (0) ---
    +++ killed by SIGSEGV +++


However, under 2.6.18-194.el5xen, the clock_gettime program works as
expected -- no crash, and it even prints correct times:

    futex(0x7fffcb26c09c, FUTEX_WAKE_PRIVATE, 1) = 0

    futex(0x7fffcb26c09c, 0x189 /* FUTEX_??? */, 1, NULL, 2addb59a5700)
    = -1 ENOSYS (Function not implemented)
    
    rt_sigaction(SIGRTMIN, {0x2addb578e860, [], SA_RESTORER|SA_SIGINFO,
    0x2addb5797f60}, NULL, 8) = 0
    
    rt_sigaction(SIGRT_1, {0x2addb578e8f0, [],
    SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x2addb5797f60}, NULL, 8) = 0

    rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0

    getrlimit(RLIMIT_STACK, {rlim_cur=10240*1024,
    rlim_max=RLIM_INFINITY}) = 0
    
    clock_gettime(CLOCK_MONOTONIC, {562, 430186340}) = 0

Note how the first six steps match in the last two quoted sections. I
think it's probable that precisely the clock_gettime() libc function
crashes the processes

Even "chroot /misc" works under -194.el5xen. So I think we do have a
regression. The difference is in the symptoms I experience: it's not an
overrunning clock, but crashing processes. I shall start to bisect from
194 to 238 (as noted above, in the -238 build, even the bare metal
kernel crashes Debian's eglibc).

Comment 15 Stefan Nordhausen 2011-03-30 12:23:56 UTC
The reason why it's segfaulting is bug #688312 which is only fixed in kernel-2.6.18-238.8.1.el5 or newer. It is not yet fixed in the kernel which is currently delivered through online updating, since the bugfix is still in QA. That's why I'm using the kernels from http://people.redhat.com/jwilson/el5/

Comment 16 Laszlo Ersek 2011-03-30 12:51:22 UTC
The clock_gettime test program breaks from -212 (xen) to -214 (xen). No -213 is readily available, I'm re-brewing one now.

Comment 17 Laszlo Ersek 2011-03-30 13:17:39 UTC
(In reply to comment #15)
> The reason why it's segfaulting is bug #688312 which is only fixed in
> kernel-2.6.18-238.8.1.el5 or newer. It is not yet fixed in the kernel which is
> currently delivered through online updating, since the bugfix is still in QA.
> That's why I'm using the kernels from http://people.redhat.com/jwilson/el5/

Do I understand this right?

The test program happens to work with -212 due to a happy random constellation. Bug 675727 prevents me to bisect the real issue (the overrunning clock) from -214 to -246. With -247, where bug 675727 is fixed, the program should not crash, but the clock doesn't work (this is the current bug); it broke somewhere after -212.

So, if I wanted to bisect the *real* bug, I wouldn't have anything else to do than apply / backport the bug 675727 patch to *all* tested kernels, and brew them all. The last known good build is -212, 247-212-1 = 34, so I shouldn't have to patch/build more than 6 versions when bisecting.

Comment 18 Laszlo Ersek 2011-03-30 14:20:10 UTC
Here's a map I came up with, describing what commits to cherry-pick on top of any given base-build during the bisection. These should enable Debian's eglibc to at least call clock_gettime() via the vdso, and allow us to look at the result.

      bug 644860      bug 673616      bug 675727
      commit 863229a  commit a863435  commit c18529d

213-  backport        backport        backport
229-  present         backport        backport
245-  present         present         backport
247-  present         present         present

According to Paolo's suggestion: since -214 was the first build ever that saw vdso, it's possible that the added clock_gettime() functionality was broken wrt. eglibc from the start. For this reason I'll check out -214 before starting in the middle.

Comment 19 Laszlo Ersek 2011-03-30 17:30:55 UTC
Paolo was right, the vdso implementation of clock_gettime() never worked under Xen in RHEL.

I cherry-picked 863229a, a863435, and c18529d on top of -214, and rebuilt it locally. I used the hypervisor from the -214 kernel-xen RPM. Results:

while printf "%s -- " $(date +'%H:%M:%S'); do
  chroot /misc /root/clock_gettime
  sleep 1
done

18:37:20 -- tv_sec 1462890  tv_nsec 702634293
18:37:21 -- tv_sec 1467138  tv_nsec 131885746
18:37:22 -- tv_sec 1471383  tv_nsec 222874206
18:37:23 -- tv_sec 1475629  tv_nsec 285067096
18:37:24 -- tv_sec 1479874  tv_nsec 450274275
18:37:25 -- tv_sec 1484120  tv_nsec 362648368
18:37:26 -- tv_sec 1488366  tv_nsec 264068684
18:37:27 -- tv_sec 1492611  tv_nsec 821318262
18:37:28 -- tv_sec 1496856  tv_nsec 784331201

Paolo also pointed out earlier that the difference between adjacent lines is about 2^32 * 1000 nanoseconds (or 2^32 microseconds).

Comment 20 Laszlo Ersek 2011-03-31 11:56:30 UTC
Created attachment 489049 [details]
xen: set vxtime mode to VXTIME_KVM

This forces the arch/x86_64/vdso/vclock_gettime.c vdso functions to fall back to the heavy-weight syscalls. The RHEL libc directly relies on the syscalls anyway, but foreign C libraries might want to use the vdso functions.

Tested it on my workstation, on top of -214 + 863229a + a863435 + c18529d.

Brew link (build in progress, -253-based):

    http://brewweb.devel.redhat.com/brew/taskinfo?taskID=3219377

Comment 21 Laszlo Ersek 2011-04-01 08:49:30 UTC
Prarit's brew build for bug 691735 solves this bug, so I'm closing this as a duplicate.

The first two hunks of commit a863435 for bug 673616 (= attachment 475892 [details]) lack a level of dereferencing on "vdso_sysctl_vsyscall". This was reported (fixed) in bug 691735 (attachment 488409 [details]).

"vdso_sysctl_vsyscall" is declared in "arch/x86_64/vdso/vextern.h":

   3 #define VEXTERN(x) \
   4	    extern typeof(x) *vdso_ ## x __attribute__((visibility("hidden")));

   9 /* Any kernel variables used in the vDSO must be exported in the main
  10	kernel's vmlinux.lds.S/vsyscall.h/proper __section and
  11	put into vextern.h and be referenced as a pointer with vdso prefix.
  12	The main kernel later fills in the values.   */

  20 VEXTERN(sysctl_vsyscall)

Even though I looked at this before, I didn't notice that the VEXTERN() function-like macro declares a *pointer*, even though the comment says so explicitly.

The pointed-to object, "sysctl_vsyscall" is set corretly to zero in the Xen kernel ("arch/x86_64/kernel/vsyscall-xen.c", line 238):

  229  static int __init vsyscall_init(void)
  230  {

  236  #ifdef CONFIG_XEN
  237          map_vsyscall_user();
  238          sysctl_vsyscall = 0; /* disable vgettimeofay() */
  239  #endif

  244  }

Once "*vdso_sysctl_vsyscall" is accessed with correct dereferencing in the vdso time functions ("arch/x86_64/vdso/vclock_gettime.c"), we don't need to activate the second part of the condition, because the original code in vsyscall_init() above does the job through the first part.

*** This bug has been marked as a duplicate of bug 691735 ***

Comment 22 Stefan Nordhausen 2011-04-01 09:36:50 UTC
I get "Access Denied" when I try to access bug #691735. Can you please drop a note in this bug about the kernel version that will have the fix included and the date when it hits the update servers?

Comment 23 Prarit Bhargava 2011-04-01 11:26:51 UTC
Stefan, we are currently evaluating and testing the patch.  I can add you to the cc-list of 691735.  Are you in a position to apply and patch to the kernel source, build a kernel and test out the patch?

If so, please let us know.  Additional off-site testing is always appreciated :)

Thanks,

P.