Bug 455259

Summary: More context for VDSO
Product: Red Hat Enterprise MRG
Reporter: Carl Trieloff <cctrieloff>
Component: Realtime_Tuning_Guide
Assignee: Lana Brindley <lbrindle>
Status: CLOSED CURRENTRELEASE
QA Contact: Jeff Needle <jneedle>
Severity: medium
Priority: high
Version: 1.0
CC: lgoncalv, mhideo, williams
Target Milestone: 1.1
Keywords: Documentation
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2008-11-25 01:54:24 UTC

Comment 1 Lana Brindley 2008-07-14 23:53:20 UTC
Hi Clark,

Context please?

LKB

Comment 2 Clark Williams 2008-10-10 19:34:40 UTC
Ok, here's an initial stab at a fuller explanation of VDSOs (thanks for the clarification Luis!):


- From the www.kernelnewbies.org/KernelGlossary:

VDSO

    Virtual Dynamically-linked Shared Object, a kernel-provided shared
    library that helps userspace perform a few kernel actions without
    the overhead of a system call, as well as automatically choosing
    the most efficient syscall mechanism. Also called the "vsyscall
    page". 

The VDSO is a way for the Linux kernel to give user-space programs low-overhead access to certain kernel data. The VDSO is mainly used to provide fast access to the data behind the gettimeofday(2) system call.

The VDSO is enabled in one of two ways:

1. Passing the vdso= parameter on the kernel boot line with a non-zero argument
2. Writing a non-zero value into the /proc/sys/kernel/syscall64 proc filesystem entry of a running kernel

More on just what those non-zero values should be later.
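
To check how a running kernel is currently configured, the proc entry can be read back from a program. The following is a minimal sketch, not part of the MRG tooling: read_int_setting() is a hypothetical helper name, and the path to pass in is the one quoted in this comment, which should be verified on the target system.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read a single integer from a proc-style file.
 * Returns 0 on success and stores the value in *value; returns -1 if the
 * file cannot be opened or does not start with an integer.
 */
int read_int_setting(const char *path, int *value)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (f == NULL)
        return -1;

    ok = (fscanf(f, "%d", value) == 1);
    fclose(f);

    return ok ? 0 : -1;
}
```

A caller would pass the proc entry named above, for example read_int_setting("/proc/sys/kernel/syscall64", &mode), and treat a return of -1 as "entry not present on this kernel".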

The kernel VDSO actually overrides library entry points for any user-specified library. So, when you enable the VDSO, you're effectively telling the kernel to use its definition of the symbols in the VDSO, rather than the ones found in user-space shared libraries (notably the GNU C library, glibc). Note that the effects of enabling the VDSO are system-wide; either all processes use it or none do.

The entry point provided by the VDSO that we're concerned with here is gettimeofday(2). When enabled, the kernel VDSO overrides the glibc definition of gettimeofday(2) with its own, and the system loader uses that address when dynamically linked programs reference gettimeofday(2). What does your program need to do to use this? Nothing. You write your code to call gettimeofday() and link it normally (the default nowadays is to link against shared libraries). When your program is loaded, the system loader (ld.so) looks first in the VDSO and, if the symbol is not available there, searches the C library and resolves the symbol there.
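
To illustrate the "nothing special" point, here is a minimal sketch of ordinary code that calls gettimeofday(). The function name now_usec() is made up for this example. Nothing in it refers to the VDSO; whether the call ends up in the VDSO or in a real system call is decided when the program is loaded.

```c
#include <stddef.h>
#include <sys/time.h>

/*
 * Hypothetical helper: current wall-clock time in microseconds since the
 * Epoch, or -1 if gettimeofday() fails. This is a plain library call; the
 * source code is the same whether or not the VDSO is enabled.
 */
long long now_usec(void)
{
    struct timeval tv;

    if (gettimeofday(&tv, NULL) != 0)
        return -1;

    return (long long)tv.tv_sec * 1000000LL + (long long)tv.tv_usec;
}
```

The file is compiled and linked in the usual way (for example with cc and the default shared C library); no extra flags or libraries are assumed.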

Why would you want to enable the VDSO? If you call gettimeofday(2) a lot, for example to timestamp large volumes of network traffic, the VDSO removes the overhead of a system call from your program: you call code that the kernel maps into your process, rather than going through the C library's trap into kernel space.

The value used to enable the VDSO affects the behavior of gettimeofday(2) on the MRG RT kernel. 

* echo 1 > /proc/sys/kernel/syscall64 (or boot with vdso=1)

  This option affects the behavior of several time-related functions. All
  calls to gettimeofday(2) are resolved in user space. The time granularity
  (i.e. the smallest interval you will be able to measure) is 1us (one
  microsecond). How it operates: the kernel maintains an internal variable
  that is updated every millisecond; when gettimeofday() is called, the
  clock is read from user space and the interval since the last update is
  added to that value (time interpolation).

* echo 2 > /proc/sys/kernel/syscall64 (or boot with vdso=2)

  (Note: this option is not available on RHEL5.)
  This option differs from the previous one in that there is no clock
  reading at all. The variable updated by the kernel is read and its value
  is returned. This method has a time granularity of 1ms. It has lower
  overhead than method 1 but is less accurate.
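
The difference between the two modes can be sketched as simple arithmetic. This is an illustration only, not the kernel's code: struct time_base, its fields, and both function names are hypothetical, and the "clock" is modelled as a free-running cycle counter with a known rate.

```c
#include <stdint.h>

/* Hypothetical snapshot taken by the kernel at its periodic update. */
struct time_base {
    int64_t  base_usec;       /* time of the last update, in microseconds */
    uint64_t base_cycles;     /* cycle counter value at that update */
    uint64_t cycles_per_usec; /* assumed counter rate */
};

/*
 * Mode 1 (interpolation): read the counter now and add the time elapsed
 * since the last update, giving microsecond granularity.
 */
int64_t interpolated_usec(const struct time_base *tb, uint64_t now_cycles)
{
    uint64_t elapsed_cycles = now_cycles - tb->base_cycles;

    return tb->base_usec + (int64_t)(elapsed_cycles / tb->cycles_per_usec);
}

/*
 * Mode 2 (no clock read): return the last updated value as-is, so the
 * result only changes when the kernel updates it (millisecond granularity).
 */
int64_t coarse_usec(const struct time_base *tb)
{
    return tb->base_usec;
}
```

With a base of 5000000 us taken at cycle 1000 and a rate of 2000 cycles per microsecond, a read at cycle 501000 interpolates to 5000250 us, while the coarse read still returns 5000000 us until the next update.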

Comment 3 Lana Brindley 2008-10-29 22:05:56 UTC
Due to be completed today and will be available for technical review tomorrow.

LKB

Comment 4 Lana Brindley 2008-10-30 05:42:18 UTC
Completed and available for review shortly.

LKB