Bug 461036 - [RFE] pvclock vsyscall: Support necessary hypercalls in RHEL5 xen
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Assigned To: Paolo Bonzini
QA Contact: Martin Jenner
Keywords: FutureFeature
Depends On: 679207
Blocks: 514489
Reported: 2008-09-03 11:41 EDT by Chris Lalancette
Modified: 2011-07-18 08:15 EDT
CC: 13 users

Doc Type: Enhancement
Last Closed: 2011-07-18 08:15:59 EDT

Description Chris Lalancette 2008-09-03 11:41:06 EDT
Description of problem:
Currently the vsyscall implementation for the 64-bit Xen kernel is turned off. This makes certain system calls (gettimeofday, in particular) much slower. Consider a benchmark like the following:

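#include <stdio.h>
#include <sys/time.h>

int main(int argc, char **argv) {
    struct timeval tv;
    int i;

    /* Call gettimeofday 90 million times; time the whole run with time(1). */
    for (i = 0; i < 90000000; i++) {
        gettimeofday(&tv, NULL);
    }
    return 0;
}
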
Running it on a bare-metal kernel takes about 14s, but under Xen it takes 2m21s. If I go into arch/x86_64/kernel/vsyscall-xen.c and remove the "sysctl_vsyscall = 0" in vsyscall_init(), the benchmark drops to 30s (still not as fast as bare metal, but a big improvement).

Additionally, it looks like the upstream Linux Xen pv_ops implementation has vsyscall enabled. The only caveat is that there is a slight possibility that enabling it is unsafe under Xen; I'll have to ask around and find out.
Comment 1 C. Cooke 2008-09-26 09:06:37 EDT
Is there any update on this issue?

I note its title references RHEL 5.3, but elsewhere you reference 5.4.

Which version can we expect to see it fixed in?
Comment 2 Bill Burns 2008-09-26 09:49:00 EDT
Very unlikely to make 5.3, as it is not in the currently proposed beta kernel spin, and this is something that would need a full beta cycle. Possible for 5.4.
Comment 3 Chris Lalancette 2008-09-26 11:00:05 EDT
Yes, to reiterate, this can't make 5.3. Besides the fact that this needs a lot of testing, there are a couple of problems I've found:

1.  We have some code to handle some corner cases of time going backwards.  Unfortunately, that code is not vsyscall friendly, so we would have to find another way to fix that.

2.  Upstream Xen hasn't enabled this because vxtime is not being updated properly, which means vsyscall wouldn't work.  So we would need to code this up for upstream, get it accepted there, and then get it into RHEL.

So there is quite a bit of work to get this working.

Chris Lalancette
Comment 4 C. Cooke 2008-09-26 11:59:47 EDT

Thanks to both of you for the update.
Comment 12 Paolo Bonzini 2010-01-29 11:52:09 EST
I'm not Chris, but I don't think anything has changed since comment #3.
Comment 32 Paolo Bonzini 2010-09-21 08:57:11 EDT
After looking into this further, there is a fundamental problem with vsyscall in a virtualized environment: vsyscall assumes that the virtual CPUs never migrate across multiple physical CPUs.

There is code to deal with this in both the upstream hypervisor (but it was buggy, so it is currently disabled even there) and the upstream pvops kernel (ported up to 2.6.31.x).

I suggest that we change our course of action slightly:

1) First, fix the upstream hypervisor's implementation of VCPUOP_register_runstate_memory_area and get vsyscalls working with upstream pvops. For the original patch, see changeset 20339. For the bug, see http://permalink.gmane.org/gmane.comp.emulators.xen.devel/79038.

2) Backport VCPUOP_register_runstate_memory_area to the RHEL5 hypervisor, and vsyscall support to RHEL6.

3) Finally, backport vsyscall support to RHEL5.  This is complicated because it probably means using the pvclock infrastructure instead of what is now in time-xen.c.
Comment 46 Paolo Bonzini 2011-06-29 12:52:25 EDT
Here are some benchmark results for 5,000,000 calls using different clocksources:

clocksource  real        user       sys
syscall      0m15.038s   0m3.109s   0m11.722s
rdtsc        0m0.220s    0m0.208s   0m0.030s
hpet         0m3.344s    0m3.339s   0m0.004s
pvrdtscp     0m2.760s    0m2.737s   0m0.004s

All measurements were taken on an F15 (Fedora 15) machine. Measurements for RHEL5 were consistent with the above, except that pvrdtscp was not available on RHEL5. A vsyscall built on any of these clocksources would be roughly 0.2s slower than the raw rdtsc/hpet/pvrdtscp numbers, due to the overhead of scaling the counter to nanoseconds.

In order to support migration and save/restore, the only possible clocksource is, of course, rdtscp (pvrdtscp in the table above). The speedup would then be roughly a factor of 5 compared to current syscall performance, rather than the 10-50 it would be for TSC. Based on this data, unless there is a compelling application that makes very many gettimeofday (or clock_gettime) calls, I do not believe it is worth micro-optimizing Xen's performance at this stage.


Test program (HPET access would segfault without clocksource=hpet):

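#include <stddef.h>
#include <sys/time.h>

/* Xen forced-emulation prefix: "ud2a; .ascii xen" makes Xen trap and
   emulate the following cpuid, so the guest sees Xen's CPUID leaves. */
static inline void pv_cpuid(unsigned idx, unsigned sub, unsigned *eax,
                            unsigned *ebx, unsigned *ecx, unsigned *edx)
{
        *eax = idx, *ecx = sub;
        asm volatile ( "ud2a ; .ascii \"xen\"; cpuid" : "=a" (*eax),
            "=b" (*ebx), "=c" (*ecx), "=d" (*edx) : "0" (*eax), "2" (*ecx));
}

/* RDTSCP encoded as raw bytes (for assemblers lacking the mnemonic):
   returns the 64-bit TSC and stores TSC_AUX (ecx) in *aux. */
static inline unsigned long long do_rdtscp(unsigned *aux)
{
        unsigned lo32, hi32;

        asm volatile(".byte 0x0f,0x01,0xf9" : "=a" (lo32), "=d" (hi32), "=c" (*aux));
        return lo32 | ((unsigned long long)hi32 << 32);
}

int main(void)
{
        int i;
        struct timeval tv;

        /* Enable exactly one branch to select what is being timed. */
        for (i = 0; i < 5000000; i++) {
#if 0
                /* rdtsc */
                asm volatile ("rdtsc" : : : "rax", "rdx");
#elif 1
                /* hpet: read the HPET counter (offset 0xf0) via the kernel's fixmap */
                asm volatile ("mov    0xffffffffff5ff0f0,%%eax" : : : "rax");
#elif 0
                /* pvrdtscp */
                unsigned aux;
                do_rdtscp(&aux);
#elif 0
                /* Xen CPUID leaf 0x40000000, trapped by the hypervisor */
                unsigned eax, ebx, ecx, edx;
                pv_cpuid(0x40000000, 0, &eax, &ebx, &ecx, &edx);
#else
                /* syscall */
                gettimeofday(&tv, NULL);
#endif
        }
        return 0;
}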
Comment 47 Radim Krčmář 2011-07-07 10:45:34 EDT
For pvrdtscp to work natively, the hardware must support the RDTSCP instruction and an invariant TSC (CPUID.80000001H:EDX[27] and CPUID.80000007H:EDX[8]). It also needs a TSC synchronized across all cores, which hopefully is always the case on hardware that supports an invariant TSC.
That means only newer processors (the oldest being Nehalem or K10) can benefit. Otherwise pvrdtscp is emulated, as in Paolo's case, and is therefore much slower.

Xen's timekeeping code has also changed a lot since RHEL5, so even a conservative backport would touch at least 1000 lines.
