Bug 461036 - [RFE] pvclock vsyscall: Support necessary hypercalls in RHEL5 xen
Summary: [RFE] pvclock vsyscall: Support necessary hypercalls in RHEL5 xen
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Paolo Bonzini
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On: 679207
Blocks: 514489
Reported: 2008-09-03 15:41 UTC by Chris Lalancette
Modified: 2018-11-14 20:30 UTC
CC List: 13 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-07-18 12:15:59 UTC
Target Upstream Version:
Embargoed:



Description Chris Lalancette 2008-09-03 15:41:06 UTC
Description of problem:
Currently the vsyscall implementation for the 64-bit Xen kernel is turned off.  This makes certain system calls (gettimeofday in particular) much slower.  Consider a benchmark like the following:

#include <stdio.h>
#include <sys/time.h>

int main(int argc, char **argv)
{
    int i;
    struct timeval tv;

    for (i = 0; i < 90000000; i++)
        gettimeofday(&tv, NULL);

    return 0;
}

Running with a bare-metal kernel takes about 14s, but under Xen it takes 2m21s.  If I go into arch/x86_64/kernel/vsyscall-xen.c and remove the "sysctl_vsyscall = 0" in vsyscall_init(), the above benchmark drops to 30s (still not as good as bare metal, but a big improvement).
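
For reference, that experiment is just this one-line removal (sketched; the exact surrounding context in the RHEL5 tree may differ):

--- a/arch/x86_64/kernel/vsyscall-xen.c
+++ b/arch/x86_64/kernel/vsyscall-xen.c
@@ vsyscall_init @@
-	sysctl_vsyscall = 0;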

Additionally, it looks like the upstream Linux Xen pv_ops implementation has this enabled.  The only caveat is that there is a slight possibility it is unsafe to do this under Xen; I'll have to ask around and find out.

Comment 1 C. Cooke 2008-09-26 13:06:37 UTC
Is there any update on this issue?

I note its title references RHEL 5.3, but elsewhere you reference 5.4.

Which version can we expect to see it fixed in?

Comment 2 Bill Burns 2008-09-26 13:49:00 UTC
Very unlikely to make 5.3: it is not in the current proposed beta kernel spin, and a change like this needs a full beta cycle. Possible for 5.4.

Comment 3 Chris Lalancette 2008-09-26 15:00:05 UTC
Yes, to reiterate, this can't make 5.3.  Besides needing a lot of testing, there are a couple of problems I've found:

1.  We have code to handle some corner cases where time goes backwards.  Unfortunately, that code is not vsyscall-friendly, so we would have to fix those cases another way (see the sketch after this list).

2.  Upstream Xen hasn't enabled this because vxtime is not being updated properly, which means vsyscall wouldn't work.  So we would need to code this up for upstream, get it accepted there, and then get it into RHEL.
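
To illustrate item 1, the guard conceptually looks like the sketch below (illustrative only, not the actual RHEL5 code): each new reading is clamped to the last value returned, and that write to shared state is exactly what the read-only vsyscall page cannot do.

#include <sys/time.h>

/* Illustrative sketch only -- not the actual RHEL5 code. */
static struct timeval last;

static void clamp_monotonic(struct timeval *tv)
{
        if (timercmp(tv, &last, <))
                *tv = last;     /* time went backwards: reuse the last value */
        else
                last = *tv;     /* this write is what vsyscall code cannot do */
}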

So there is quite a bit of work to get this working.

Chris Lalancette

Comment 4 C. Cooke 2008-09-26 15:59:47 UTC
Okay.

Thanks to both of you for the update.

Comment 12 Paolo Bonzini 2010-01-29 16:52:09 UTC
I'm not Chris, but I don't think anything has changed since the situation in comment #3.

Comment 32 Paolo Bonzini 2010-09-21 12:57:11 UTC
After looking into this further, I found a fundamental problem with vsyscall in a virtualized environment: vsyscall assumes that virtual CPUs never migrate across physical CPUs.

There is code to deal with this in both the upstream hypervisor (but it was buggy, so it is currently disabled even there) and the upstream pvops kernel (ported up to 2.6.31.x).
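
To illustrate: any TSC-based fast path has to detect that it was not migrated between reading the per-CPU clock parameters and the TSC itself.  The sketch below shows the idea only; it uses sched_getcpu() for clarity, whereas real vsyscall code cannot call into the kernel and has to derive the CPU from something like rdtscp's aux word.

#define _GNU_SOURCE
#include <stdio.h>
#include <sched.h>
#include <stdint.h>

static inline uint64_t rdtsc(void)
{
        unsigned lo, hi;
        asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
        return lo | ((uint64_t)hi << 32);
}

int main(void)
{
        int cpu0, cpu1;
        uint64_t tsc;

        /* Retry until the TSC read is bracketed by the same CPU, so any
         * per-CPU parameters read alongside it were consistent: */
        do {
                cpu0 = sched_getcpu();
                tsc = rdtsc();
                cpu1 = sched_getcpu();
        } while (cpu0 != cpu1);

        printf("TSC %llu read on CPU %d\n", (unsigned long long)tsc, cpu0);
        return 0;
}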

I suggest we adjust our course of action a bit:

1) First, fix the upstream hypervisor's implementation of VCPUOP_register_runstate_memory_area and get the vsyscalls to work with upstream pvops (a sketch of the guest-side registration follows this list).  For the original patch, see changeset 20339.  For the bug, see http://permalink.gmane.org/gmane.comp.emulators.xen.devel/79038.

2) Backport VCPUOP_register_runstate_memory_area to the RHEL5 hypervisor, and vsyscall support to RHEL6.

3) Finally, backport vsyscall support to RHEL5.  This is complicated because it probably means using the pvclock infrastructure instead of what is now in time-xen.c.
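
For reference, the guest-side registration mentioned in step 1 looks roughly like this (a sketch modeled on the upstream pvops kernel's xen_setup_runstate_info(); header paths and names vary between trees):

#include <xen/interface/vcpu.h>   /* vcpu_runstate_info, VCPUOP_* */

static struct vcpu_runstate_info runstate;

static void register_runstate_area(int vcpu)
{
        struct vcpu_register_runstate_memory_area area;

        area.addr.v = &runstate;   /* guest virtual address of the buffer */

        /* After this hypercall the hypervisor keeps `runstate` up to date
         * for the given VCPU (time spent running/runnable/blocked/offline),
         * which a vsyscall-based clock needs to account for stolen time: */
        if (HYPERVISOR_vcpu_op(VCPUOP_register_runstate_memory_area,
                               vcpu, &area))
                BUG();
}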

Comment 46 Paolo Bonzini 2011-06-29 16:52:25 UTC
Here are some benchmark results for 5000000 calls using different clocksources.

================================================================
syscall  real    0m15.038s user    0m3.109s sys     0m11.722s
================================================================
rdtsc    real    0m0.220s  user    0m0.208s sys     0m0.030s
hpet     real    0m3.344s  user    0m3.339s sys     0m0.004s
pvrdtscp real    0m2.760s  user    0m2.737s sys     0m0.004s
================================================================

All measurements were taken on an F15 machine.  Measurements for RHEL5 were consistent with the above (except that pvrdtscp was not available on RHEL5).  vsyscall speed would be roughly 0.2s slower than rdtsc/hpet/pvrdtscp due to the overhead of scaling.

In order to support migration and save/restore, the only possible clocksource is of course pvrdtscp.  The speedup would then be roughly a factor of 5 compared to current syscall performance, rather than the 10-50x possible with the raw TSC.  Based on this data, unless there is a compelling application issuing this many gettimeofday (or clock_gettime) calls, I do not believe it is worth micro-optimizing Xen's performance at this stage.

==================================

Test program (HPET access would segfault without clocksource=hpet):

#include <sys/time.h>

/* Issue a Xen "forced emulation" cpuid: the ud2a + "xen" prefix makes the
 * hypervisor trap and emulate the following cpuid instruction. */
static inline void pv_cpuid(unsigned idx, unsigned sub, unsigned *eax,
                            unsigned *ebx, unsigned *ecx, unsigned *edx)
{
        *eax = idx, *ecx = sub;
        asm volatile ( "ud2a ; .ascii \"xen\"; cpuid" : "=a" (*eax),
            "=b" (*ebx), "=c" (*ecx), "=d" (*edx) : "0" (*eax), "2" (*ecx));
}

/* rdtscp, encoded as raw bytes for assemblers that do not know the
 * mnemonic; returns the TSC and stores the IA32_TSC_AUX value in *aux. */
static inline unsigned long long do_rdtscp(unsigned *aux)
{
        unsigned lo32, hi32;

        asm volatile(".byte 0x0f,0x01,0xf9":"=a"(lo32),"=d"(hi32),"=c" (*aux));
        return lo32 | ((unsigned long long)hi32 << 32);
}

int main()
{
        int i;
        struct timeval tv;

        /* Select exactly one of the branches below. */
        for (i = 0; i < 5000000; i++) {
#if 0
                /* raw rdtsc */
                asm ("rdtsc" : : : "rax", "rdx");
#elif 1
                /* read the HPET counter mapped into the vsyscall page */
                asm ("mov    0xffffffffff5ff0f0,%%eax" : : : "rax");
#elif 0
                /* pvrdtscp */
                unsigned aux;
                do_rdtscp(&aux);
#elif 0
                /* emulated cpuid on the Xen hypervisor leaf */
                unsigned eax, ebx, ecx, edx;
                pv_cpuid(0x40000000, 0, &eax, &ebx, &ecx, &edx);
#else
                /* plain syscall */
                gettimeofday (&tv, 0L);
#endif
        }
        return 0;
}

Comment 47 Radim Krčmář 2011-07-07 14:45:34 UTC
For pvrdtscp to work, the hardware must support the rdtscp instruction and an invariant TSC (CPUID.80000001H:EDX[27] and CPUID.80000007H:EDX[8]), plus a TSC synchronized across all cores, which hopefully always holds on hardware with an invariant TSC.
That means only newer processors (the oldest being Nehalem or K10) can benefit; otherwise pvrdtscp is emulated, as in Paolo's case, and thus much slower.
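
Those two bits can be checked from userspace, e.g. with this sketch using GCC's <cpuid.h>:

#include <stdio.h>
#include <cpuid.h>

int main(void)
{
        unsigned eax, ebx, ecx, edx;

        /* __get_cpuid returns 0 if the extended leaf is unavailable. */
        if (__get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx))
                printf("rdtscp:        %s\n", (edx >> 27) & 1 ? "yes" : "no");
        if (__get_cpuid(0x80000007, &eax, &ebx, &ecx, &edx))
                printf("invariant TSC: %s\n", (edx >> 8) & 1 ? "yes" : "no");
        return 0;
}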

The timekeeping code has also changed a lot in Xen since RHEL5, so even a conservative backport would touch at least 1000 lines.

