
Bug 461036

Summary: [RFE] pvclock vsyscall: Support necessary hypercalls in RHEL5 xen
Product: Red Hat Enterprise Linux 5
Component: kernel-xen
Version: 5.4
Hardware: All
OS: Linux
Status: CLOSED WONTFIX
Severity: medium
Priority: medium
Keywords: FutureFeature
Target Milestone: rc
Doc Type: Enhancement
Reporter: Chris Lalancette <clalance>
Assignee: Paolo Bonzini <pbonzini>
QA Contact: Martin Jenner <mjenner>
CC: charles.cooke, drjones, james.brown, jhunt, jlv, jmh, jon.shanks, pbonzini, rkrcmar, tao, tromer, xen-maint, zxvdr.au
Bug Depends On: 679207
Bug Blocks: 514489
Last Closed: 2011-07-18 12:15:59 UTC

Description Chris Lalancette 2008-09-03 15:41:06 UTC
Description of problem:
Currently the vsyscall implementation for the 64-bit Xen kernel is turned off.  This makes certain system calls (gettimeofday, in particular) much slower.  Consider a benchmark like the following:

#include <sys/time.h>

int main(int argc, char **argv)
{
    struct timeval tv;
    int i;

    /* Hammer gettimeofday() to measure its per-call overhead. */
    for (i = 0; i < 90000000; i++) {
        gettimeofday(&tv, NULL);
    }

    return 0;
}

Running with a bare-metal kernel takes about 14s, but under Xen it takes 2m21s.  If I go into arch/x86_64/kernel/vsyscall-xen.c and remove the "sysctl_vsyscall = 0" in vsyscall_init(), the above benchmark drops to 30s (still not as good as bare metal, but a big improvement).

Additionally, it looks like the upstream Linux Xen pv_ops implementation has this enabled.  The only caveat is that there is a slight possibility it is unsafe to do this under Xen; I'll have to ask around and find out.

Comment 1 C. Cooke 2008-09-26 13:06:37 UTC
Is there any update on this issue?

I note its title references RHEL 5.3, but elsewhere you reference 5.4.

Which version can we expect to see it fixed in?

Comment 2 Bill Burns 2008-09-26 13:49:00 UTC
Very unlikely to make 5.3: it is not in the current proposed beta kernel spin, and a change like this would need a full beta cycle.  Possible for 5.4.

Comment 3 Chris Lalancette 2008-09-26 15:00:05 UTC
Yes, to reiterate, this can't make 5.3.  Besides needing a lot of testing, there are a couple of problems I've found:

1.  We have some code to handle corner cases where time goes backwards.  Unfortunately, that code is not vsyscall friendly, so we would have to find another way to fix it (see the sketch at the end of this comment).

2.  Upstream Xen hasn't enabled this because vxtime is not being updated properly, which means vsyscall wouldn't work.  So we would need to code this up for upstream, get it accepted there, and then get it into RHEL.

So there is quite a bit of work to get this working.
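
To make item 1 concrete, here is a hypothetical sketch (the names are invented; this is not the actual RHEL5 code) of the kind of "time never goes backwards" clamp involved.  It needs writable shared state updated on every call, which a vsyscall running in user mode against a read-only page cannot provide:

#include <stdio.h>

static unsigned long long last_ns;  /* shared, writable state: the problem */

/* Return a value that never decreases, clamping against the last result.
 * Uses the gcc __sync compare-and-swap so concurrent callers are safe. */
static unsigned long long monotonic_ns(unsigned long long raw_ns)
{
        unsigned long long prev;

        do {
                prev = last_ns;
                if (raw_ns <= prev)
                        return prev;    /* clamp: never report time going backwards */
        } while (__sync_val_compare_and_swap(&last_ns, prev, raw_ns) != prev);

        return raw_ns;
}

int main(void)
{
        printf("%llu\n", monotonic_ns(100));  /* prints 100 */
        printf("%llu\n", monotonic_ns(90));   /* still 100: clamped */
        return 0;
}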

Chris Lalancette

Comment 4 C. Cooke 2008-09-26 15:59:47 UTC
Okay.

Thanks to both of you for the update.

Comment 12 Paolo Bonzini 2010-01-29 16:52:09 UTC
I'm not Chris, but I don't think anything has changed since the situation in comment #3.

Comment 32 Paolo Bonzini 2010-09-21 12:57:11 UTC
After looking into this further, there is a fundamental problem with vsyscall in a virtualized environment: vsyscall assumes that the virtual CPUs never migrate across multiple physical CPUs.

There is code to deal with this in both the upstream hypervisor (but it was buggy, so it is currently disabled even there) and the upstream pvops kernel (ported up to 2.6.31.x).

I suggest we adjust our course of action a bit:

1) First, fix the upstream hypervisor's implementation of VCPUOP_register_runstate_memory_area and get the vsyscalls to work with upstream pvops.  For the original patch, see changeset 20339.  For the bug, see http://permalink.gmane.org/gmane.comp.emulators.xen.devel/79038.

2) Backport VCPUOP_register_runstate_memory_area to the RHEL5 hypervisor, and vsyscall support to RHEL6.

3) Finally, backport vsyscall support to RHEL5.  This is complicated because it probably means using the pvclock infrastructure instead of what is now in time-xen.c.
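
To make step 3 concrete, here is a minimal sketch of the pvclock-style lock-free read it would build on.  The struct layout follows the upstream pvclock ABI (pvclock_vcpu_time_info); everything else (function names, the fake time_info in main) is invented for illustration.  The hypervisor keeps 'version' odd while it updates the record, so the reader retries until it sees a stable, unchanged version:

#include <stdio.h>

/* Layout per the upstream pvclock ABI; illustrative only. */
struct pvclock_vcpu_time_info {
        unsigned int version;            /* odd => update in progress */
        unsigned int pad0;
        unsigned long long tsc_timestamp;
        unsigned long long system_time;  /* ns as of tsc_timestamp */
        unsigned int tsc_to_system_mul;  /* 32.32 fixed-point multiplier */
        signed char tsc_shift;
        unsigned char flags;
        unsigned char pad[2];
};

static inline unsigned long long rdtsc64(void)
{
        unsigned lo, hi;
        asm volatile("rdtsc" : "=a" (lo), "=d" (hi));
        return lo | ((unsigned long long)hi << 32);
}

/* Retry while the hypervisor is mid-update or the version changed under
 * us.  Note: the real kernel uses a full 64x32->96-bit multiply; the
 * plain 64-bit multiply here can overflow for large deltas, which is
 * acceptable in a sketch. */
static unsigned long long pvclock_read_ns(volatile struct pvclock_vcpu_time_info *ti)
{
        unsigned int version;
        unsigned long long delta, ns;

        do {
                version = ti->version;
                __sync_synchronize();
                delta = rdtsc64() - ti->tsc_timestamp;
                if (ti->tsc_shift >= 0)
                        delta <<= ti->tsc_shift;
                else
                        delta >>= -ti->tsc_shift;
                ns = ti->system_time + ((delta * ti->tsc_to_system_mul) >> 32);
                __sync_synchronize();
        } while ((version & 1) || version != ti->version);

        return ns;
}

int main(void)
{
        /* Fake, frozen time_info just to exercise the read loop. */
        struct pvclock_vcpu_time_info ti = {
                .version = 2,                   /* even => stable */
                .tsc_timestamp = rdtsc64(),
                .system_time = 1000000000ULL,   /* 1 s */
                .tsc_to_system_mul = 1u << 31,  /* 0.5 in 32.32, i.e. a 2 GHz TSC */
                .tsc_shift = 0,
        };
        printf("%llu ns\n", pvclock_read_ns(&ti));
        return 0;
}

The rdtsc inside the loop is exactly where vCPU migration bites: the TSC sample and the per-vCPU scaling parameters must come from the same physical CPU, which is what the hypervisor-side fix in step 1 is about.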

Comment 46 Paolo Bonzini 2011-06-29 16:52:25 UTC
Here are some benchmark results for 5000000 calls using different clocksources.

================================================================
clocksource   real        user       sys
================================================================
syscall       0m15.038s   0m3.109s   0m11.722s
----------------------------------------------------------------
rdtsc         0m0.220s    0m0.208s   0m0.030s
hpet          0m3.344s    0m3.339s   0m0.004s
pvrdtscp      0m2.760s    0m2.737s   0m0.004s
================================================================

All measurements were taken on a Fedora 15 machine.  Measurements for RHEL5 were consistent with the above (except that pvrdtscp was not available on RHEL5).  vsyscall would be roughly 0.2s slower than rdtsc/hpet/pvrdtscp due to the overhead of scaling.

In order to support migration and save/restore, the only possible clocksource is of course pvrdtscp.  The speedup would then be roughly a factor of 5 over current syscall performance (15.0s vs. 2.8s above), rather than the factor of 10-50 the raw TSC would give.  Based on this data, unless there is a compelling application executing so many gettimeofday (or clock_gettime) syscalls, I do not believe it is worth micro-optimizing Xen's performance at this stage.

==================================

Test program (HPET access would segfault without clocksource=hpet):

#include <sys/time.h>

/* The forced-emulation prefix (ud2a; .ascii "xen") makes cpuid trap to
 * the Xen hypervisor as a paravirtual cpuid. */
static inline void pv_cpuid(unsigned idx, unsigned sub, unsigned *eax,
                            unsigned *ebx, unsigned *ecx, unsigned *edx)
{
        *eax = idx, *ecx = sub;
        asm volatile ( "ud2a ; .ascii \"xen\"; cpuid" : "=a" (*eax),
            "=b" (*ebx), "=c" (*ecx), "=d" (*edx) : "0" (*eax), "2" (*ecx));
}

static inline unsigned long long do_rdtscp(unsigned *aux)
{
        unsigned lo32, hi32;

        /* 0x0f,0x01,0xf9 is the rdtscp opcode, spelled out for old
         * assemblers that do not know the mnemonic. */
        asm volatile(".byte 0x0f,0x01,0xf9":"=a"(lo32),"=d"(hi32),"=c" (*aux));
        return lo32 | ((unsigned long long)hi32 << 32);
}

int main()
{
        int i;
        struct timeval tv;

        /* Exactly one branch below is active; flip the #if/#elif
         * conditions to select a clocksource. */
        for (i=0; i<5000000;i++) {
#if 0
                /* raw TSC */
                asm ("rdtsc" : : : "rax", "rdx");
#elif 1
                /* HPET counter, read through its vsyscall-area mapping */
                asm ("mov    0xffffffffff5ff0f0,%%eax" : : : "rax");
#elif 0
                /* rdtscp (pvrdtscp when emulated by Xen) */
                unsigned aux;
                do_rdtscp(&aux);
#elif 0
                /* paravirtual cpuid, leaf 0x40000000 */
                unsigned eax, ebx, ecx, edx;
                pv_cpuid(0x40000000, 0, &eax, &ebx, &ecx, &edx);
#else
                /* plain syscall */
                gettimeofday (&tv, 0L);
#endif
        }
        return 0;
}

Comment 47 Radim Krčmář 2011-07-07 14:45:34 UTC
For pvrdtscp to work, the hardware must support the rdtscp instruction and an invariant TSC (CPUID.80000001H:EDX[27] and CPUID.80000007H:EDX[8], respectively), plus a TSC synchronized across all cores, which hopefully always holds on hardware with an invariant TSC.
That means only newer processors (the oldest being Nehalem or K10) can benefit; otherwise pvrdtscp is emulated, as in Paolo's case, and thus much slower.
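
A quick way to check those two bits on a given machine (a minimal sketch using GCC's <cpuid.h> helper; the bit positions are the ones quoted above):

#include <cpuid.h>
#include <stdio.h>

int main(void)
{
        unsigned int eax, ebx, ecx, edx;

        /* CPUID.80000001H:EDX[27] -- rdtscp instruction */
        if (__get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx))
                printf("rdtscp:        %s\n", (edx & (1u << 27)) ? "yes" : "no");

        /* CPUID.80000007H:EDX[8] -- invariant TSC */
        if (__get_cpuid(0x80000007, &eax, &ebx, &ecx, &edx))
                printf("invariant TSC: %s\n", (edx & (1u << 8)) ? "yes" : "no");

        return 0;
}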

The timekeeping code in Xen has also changed a lot since RHEL5, so even a conservative backport would touch at least 1000 lines.