| Summary: | vDSO gettimeofday used for Xen even though pvclocks do not support it | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Stefan Nordhausen <stefan.nordhausen> |
| Component: | kernel-xen | Assignee: | Laszlo Ersek <lersek> |
| Status: | CLOSED DUPLICATE | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 5.6 | CC: | drjones, leiwang, lersek, pbonzini, prarit, qwan, xen-maint, yuzhang |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2011-04-01 08:49:30 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Attachments: | |||
|
Description
Stefan Nordhausen
2011-03-24 13:03:57 UTC
Created attachment 487322 [details]
C program to reproduce the bogus time results
Hello Stefan,
please strace the test program both when it misbehaves and when it works alright.
I installed Debian 6.0.1 (and then updated it) on my amd64 workstation. There I built the following program both dynamically and statically:
----[ cut here ]----
#define _XOPEN_SOURCE 600
#include <time.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
int
main(void)
{
struct timespec foo;
return 0 == clock_gettime(CLOCK_MONOTONIC, &foo)
&& 0 <= fprintf(stdout, "tv_sec %ju tv_nsec %ju\n",
(uintmax_t)foo.tv_sec, (uintmax_t)foo.tv_nsec)
&& 0 == fflush(stdout)
? EXIT_SUCCESS : EXIT_FAILURE;
}
----[ cut here ]----
$ gcc -o clock_gettime -std=c99 -Wall -Wextra -Wformat=2 \
clock_gettime.c -l rt
$ gcc -o clock_gettime.static -static -std=c99 -Wall -Wextra -Wformat=2 \
clock_gettime.c -l rt
Both are working alright under Debian. I rebooted my RHEL-5.6 installation with
2.6.18-247.el5xen, mounted the Debian partition, and ran the statically linked executable again from there (as an initial test before setting up the chroot environment). Perhaps static linking affects the syscall behind clock_gettime(), but the static executable kept working correctly. (Also, the dynamic executable, linked against the RHEL-5.6 libs, was okay.)
This far I can't reproduce the problem. Can you please describe how you've set up your chroot environment? Thank you.
(In reply to comment #3) > I installed Debian 6.0.1 (and then updated it) on my amd64 workstation. Just to avoid possible confusion: I did NOT install a virtualized Debian. We are running Debian chroot that were created with "debootstrap". To create your own Debian chroot do this as root: - Get the most recent debootstrap from http://ftp.us.debian.org/debian/pool/main/d/debootstrap/debootstrap_1.0.26+squeeze1_all.deb - Unpack with "ar x debootstrap*.deb" - Install debootstrap with "tar xzf data.tar.gz -C /" (mind the "/" at the end) - Create directory for chroot, e.g. "mkdir ~/gettime_chroot" - Set up chroot with "debootstrap --arch amd64 squeeze ~/gettime_chroot" - Enter the chroot with "chroot ~/gettime_chroot" - Get new Debian signatures with "apt-get update" - Install new packages as needed with "apt-get install <package name>" - Still inside the chroot, run "watch date" to see if clock_gettime() is working. "date" uses clock_gettime() according to "ltrace". I was able to reproduce the problem with a RHEL 5.6 installed in a VirtualBox VM instead of the real hardware (where we found the issue first). I tried out a number of kernels for the main system with Laszlo's program from comment #3: 2.6.18-238.5.1.el5 --> OK 2.6.18-247.el5xen --> NOT OK 2.6.18-252.el5xen --> NOT OK 2.6.18-252.el5 --> OK So only the Xen kernels are not working. Once I compiled the program from comment #3 statically(!) with RHEL's gcc, it worked as expected when run inside the chroot. I also compiled the program dynamically with the RHEL's gcc and then ran the binary inside the chroot, which gave incorrect results. Created attachment 488429 [details]
output of "strace date" in main system, kernel 2.6.18-252.el5xen
Created attachment 488430 [details]
output of "strace date" in chroot, kernel 2.6.18-252.el5xen
Note that no syscall to "clock_gettime" is made
Created attachment 488431 [details]
output of "ltrace date" in chroot, kernel 2.6.18-252.el5xen
Note that library function "clock_gettime" is called
Created attachment 488439 [details]
strace of Laszlo's sample program
Both binaries were compiled using RHEL's gcc, one dynamic, one static. They are run inside the Debian chroot.
Note that the static one is making a syscall (=RHEL libs compiled in) and the other is making a vsyscall (=Debian libs dynamically loaded).
The results are completely different, static binary says "tv_sec 4343" while dynamic one says "tv_sec 234150" even though there were just a few moments in between.
It looks like all of the following are necessary to see the bug: - RHEL xen kernel, - Debian eglibc, - dynamically linked app. For a dynamically linked app, the Debian eglibc will use the vsyscall (while the RHEL glibc still uses the syscall, see comment #5 and comment #6). It seems that under the RHEL xen kernel, the vsyscall either returns bogus results, or the Debian glibc is unable to call it -- or interpret the answer -- correctly. Given that the RHEL glibc avoids the vsyscall among the same circumstances (or that it calls a vsyscall that falls back successfully to the clock_gettime() syscall), and that Debian 6 even uses a fork (eglibc), I wonder if the verdict might simply be: "The eglibc in Squeeze is an unsupported userspace-kernel interface on RHEL-5". (In reply to comment #9) > I wonder if the verdict > might simply be: "The eglibc in Squeeze is an unsupported userspace-kernel > interface on RHEL-5". The point is that it works fine with RHEL 5.5 Xen kernels and RHEL 5.6 non-Xen kernels, which looks a bit strange to me and hints at some unintentional modifications in the new Xen kernel. I'll try to setup the chroot environment as you described in comment 4, and to find the first RHEL-5 kernel that does not cooperate with eglibc's clock_gettime(). The same symptom of "date" going nuts also occurs for a chroot with openSuse 11.4 (=most recent version), so this is not restricted to Debian. (In reply to comment #4) > To create your own Debian chroot do this as root: > > - Get the most recent debootstrap from > http://ftp.us.debian.org/debian/pool/main/d/debootstrap/debootstrap_1.0.26+squeeze1_all.deb > - Unpack with "ar x debootstrap*.deb" > - Install debootstrap with "tar xzf data.tar.gz -C /" (mind the "/" at the end) > - Create directory for chroot, e.g. "mkdir ~/gettime_chroot" > - Set up chroot with "debootstrap --arch amd64 squeeze ~/gettime_chroot" Here's the end of debootstrap's output: I: Extracting xz-utils... I: Extracting zlib1g... W: Failure trying to run: chroot /root/gettime_chroot mount -t proc proc /proc The file "/root/gettime_chroot/debootstrap/debootstrap.log" says "Segmentation fault". Indeed, when I retry the above chroot+mount command manually, it says "Segmentation fault". Host kernel is 2.6.18-238.el5xen (RHEL-5.6). I retried the bootstrap with a clean directory after booting a bare-metal 2.6.18-238.el5 kernel. It failed with the same symptoms. This far we were unable to reproduce the error; we can't set up the three conditions described in comment 9. IIRC Paolo tried to shoehorn the Debian eglibc into a dynamic executable, outside of the chroot, and it didn't work for him. I now tried to do something similar: RHEL test: $ cd /lib64 $ ./ld-2.5.so --library-path $PWD ../bin/date Wed Mar 30 04:54:28 EDT 2011 Debian test ("/root/gettime_chroot" tree in the state where the debootstrap failed above): $ cd /root/gettime_chroot/lib64/ $ ./ld-2.5.so --library-path $PWD ../bin/date ../bin/date: error while loading shared libraries: ../bin/date: wrong ELF class: ELFCLASS64 $ file ld-2.5.so ../bin/date | cut -f 1 -d , ld-2.5.so: ELF 32-bit LSB shared object ../bin/date: ELF 64-bit LSB executable And now I see that in the chroot-to-be, "lib64" is actually a symlink to "lib". Why is that so? I installed the Debian partition again, as described in comment 3 (no, this is not a guest, it is a normal disk partition to be mounted later in the host). I'm unable to enter it "normally" with chroot (ie. to run a shell) due to SIGSEGV, but following Paolo's recommendation, I could run "chroot /misc /root/clock_gettime". (The latter is a dynamic executable built within the Debian system -- ie. I booted temporarily into Debian.) This program crashes with SIGSEGV as well. I could strace it, and perhaps this is interesting: futex(0x7fff63ad44bc, FUTEX_WAKE_PRIVATE, 1) = 0 futex(0x7fff63ad44bc, 0x189 /* FUTEX_??? */, 1, NULL, 2b98eedd9700) = -1 ENOSYS (Function not implemented) This is similar to what can be seen in attachment 488430 [details] ('output of "strace date" in chroot, kernel 2.6.18-252.el5xen') futex(0x7fff627e8c0c, FUTEX_WAKE_PRIVATE, 1) = 0 futex(0x7fff627e8c0c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, NULL, 2aaee2d3b700) = -1 ENOSYS (Function not implemented) and in attachment 488439 [details] ("strace of Laszlo's sample program"), the dynamic case: futex(0x7fff29358b2c, FUTEX_WAKE_PRIVATE, 1) = 0 futex(0x7fff29358b2c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, NULL, 2acbb48d4700) = -1 ENOSYS (Function not implemented) I get the same result (including the SIGSEGV) under 2.6.18-238.el5, ie. bare metal: futex(0x7fff1b51e67c, FUTEX_WAKE_PRIVATE, 1) = 0 futex(0x7fff1b51e67c, 0x189 /* FUTEX_??? */, 1, NULL, 2abdd63f6700) = -1 ENOSYS (Function not implemented) rt_sigaction(SIGRTMIN, {0x2abdd61df860, [], SA_RESTORER|SA_SIGINFO, 0x2abdd61e8f60}, NULL, 8) = 0 rt_sigaction(SIGRT_1, {0x2abdd61df8f0, [], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x2abdd61e8f60}, NULL, 8) = 0 rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0 getrlimit(RLIMIT_STACK, {rlim_cur=10240*1024, rlim_max=RLIM_INFINITY}) = 0 --- SIGSEGV (Segmentation fault) @ 0 (0) --- +++ killed by SIGSEGV +++ However, under 2.6.18-194.el5xen, the clock_gettime program works as expected -- no crash, and it even prints correct times: futex(0x7fffcb26c09c, FUTEX_WAKE_PRIVATE, 1) = 0 futex(0x7fffcb26c09c, 0x189 /* FUTEX_??? */, 1, NULL, 2addb59a5700) = -1 ENOSYS (Function not implemented) rt_sigaction(SIGRTMIN, {0x2addb578e860, [], SA_RESTORER|SA_SIGINFO, 0x2addb5797f60}, NULL, 8) = 0 rt_sigaction(SIGRT_1, {0x2addb578e8f0, [], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x2addb5797f60}, NULL, 8) = 0 rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0 getrlimit(RLIMIT_STACK, {rlim_cur=10240*1024, rlim_max=RLIM_INFINITY}) = 0 clock_gettime(CLOCK_MONOTONIC, {562, 430186340}) = 0 Note how the first six steps match in the last two quoted sections. I think it's probable that precisely the clock_gettime() libc function crashes the processes Even "chroot /misc" works under -194.el5xen. So I think we do have a regression. The difference is in the symptoms I experience: it's not an overrunning clock, but crashing processes. I shall start to bisect from 194 to 238 (as noted above, in the -238 build, even the bare metal kernel crashes Debian's eglibc). The reason why it's segfaulting is bug #688312 which is only fixed in kernel-2.6.18-238.8.1.el5 or newer. It is not yet fixed in the kernel which is currently delivered through online updating, since the bugfix is still in QA. That's why I'm using the kernels from http://people.redhat.com/jwilson/el5/ The clock_gettime test program breaks from -212 (xen) to -214 (xen). No -213 is readily available, I'm re-brewing one now. (In reply to comment #15) > The reason why it's segfaulting is bug #688312 which is only fixed in > kernel-2.6.18-238.8.1.el5 or newer. It is not yet fixed in the kernel which is > currently delivered through online updating, since the bugfix is still in QA. > That's why I'm using the kernels from http://people.redhat.com/jwilson/el5/ Do I understand this right? The test program happens to work with -212 due to a happy random constellation. Bug 675727 prevents me to bisect the real issue (the overrunning clock) from -214 to -246. With -247, where bug 675727 is fixed, the program should not crash, but the clock doesn't work (this is the current bug); it broke somewhere after -212. So, if I wanted to bisect the *real* bug, I wouldn't have anything else to do than apply / backport the bug 675727 patch to *all* tested kernels, and brew them all. The last known good build is -212, 247-212-1 = 34, so I shouldn't have to patch/build more than 6 versions when bisecting. Here's a map I came up with, describing what commits to cherry-pick on top of any given base-build during the bisection. These should enable Debian's eglibc to at least call clock_gettime() via the vdso, and allow us to look at the result.
bug 644860 bug 673616 bug 675727
commit 863229a commit a863435 commit c18529d
213- backport backport backport
229- present backport backport
245- present present backport
247- present present present
According to Paolo's suggestion: since -214 was the first build ever that saw vdso, it's possible that the added clock_gettime() functionality was broken wrt. eglibc from the start. For this reason I'll check out -214 before starting in the middle.
Paolo was right, the vdso implementation of clock_gettime() never worked under Xen in RHEL. I cherry-picked 863229a, a863435, and c18529d on top of -214, and rebuilt it locally. I used the hypervisor from the -214 kernel-xen RPM. Results: while printf "%s -- " $(date +'%H:%M:%S'); do chroot /misc /root/clock_gettime sleep 1 done 18:37:20 -- tv_sec 1462890 tv_nsec 702634293 18:37:21 -- tv_sec 1467138 tv_nsec 131885746 18:37:22 -- tv_sec 1471383 tv_nsec 222874206 18:37:23 -- tv_sec 1475629 tv_nsec 285067096 18:37:24 -- tv_sec 1479874 tv_nsec 450274275 18:37:25 -- tv_sec 1484120 tv_nsec 362648368 18:37:26 -- tv_sec 1488366 tv_nsec 264068684 18:37:27 -- tv_sec 1492611 tv_nsec 821318262 18:37:28 -- tv_sec 1496856 tv_nsec 784331201 Paolo also pointed out earlier that the difference between adjacent lines is about 2^32 * 1000 nanoseconds (or 2^32 microseconds). Created attachment 489049 [details] xen: set vxtime mode to VXTIME_KVM This forces the arch/x86_64/vdso/vclock_gettime.c vdso functions to fall back to the heavy-weight syscalls. The RHEL libc directly relies on the syscalls anyway, but foreign C libraries might want to use the vdso functions. Tested it on my workstation, on top of -214 + 863229a + a863435 + c18529d. Brew link (build in progress, -253-based): http://brewweb.devel.redhat.com/brew/taskinfo?taskID=3219377 Prarit's brew build for bug 691735 solves this bug, so I'm closing this as a duplicate. The first two hunks of commit a863435 for bug 673616 (= attachment 475892 [details]) lack a level of dereferencing on "vdso_sysctl_vsyscall". This was reported (fixed) in bug 691735 (attachment 488409 [details]). "vdso_sysctl_vsyscall" is declared in "arch/x86_64/vdso/vextern.h": 3 #define VEXTERN(x) \ 4 extern typeof(x) *vdso_ ## x __attribute__((visibility("hidden"))); 9 /* Any kernel variables used in the vDSO must be exported in the main 10 kernel's vmlinux.lds.S/vsyscall.h/proper __section and 11 put into vextern.h and be referenced as a pointer with vdso prefix. 12 The main kernel later fills in the values. */ 20 VEXTERN(sysctl_vsyscall) Even though I looked at this before, I didn't notice that the VEXTERN() function-like macro declares a *pointer*, even though the comment says so explicitly. The pointed-to object, "sysctl_vsyscall" is set corretly to zero in the Xen kernel ("arch/x86_64/kernel/vsyscall-xen.c", line 238): 229 static int __init vsyscall_init(void) 230 { 236 #ifdef CONFIG_XEN 237 map_vsyscall_user(); 238 sysctl_vsyscall = 0; /* disable vgettimeofay() */ 239 #endif 244 } Once "*vdso_sysctl_vsyscall" is accessed with correct dereferencing in the vdso time functions ("arch/x86_64/vdso/vclock_gettime.c"), we don't need to activate the second part of the condition, because the original code in vsyscall_init() above does the job through the first part. *** This bug has been marked as a duplicate of bug 691735 *** I get "Access Denied" when I try to access bug #691735. Can you please drop a note in this bug about the kernel version that will have the fix included and the date when it hits the update servers? Stefan, we are currently evaluating and testing the patch. I can add you to the cc-list of 691735. Are you in a position to apply and patch to the kernel source, build a kernel and test out the patch? If so, please let us know. Additional off-site testing is always appreciated :) Thanks, P. |