Bug 828332

Summary:	Track recommended hypervisor timer settings
Product:	[Community] Virtualization Tools	Reporter:	Cole Robinson <crobinso>
Component:	libosinfo	Assignee:	Matthias Clasen <mclasen>
Status:	CLOSED DEFERRED	QA Contact:
Severity:	unspecified	Docs Contact:
Priority:	unspecified
Version:	unspecified	CC:	berrange, cfergeau, fidencio, hannsj_uhl, mclasen, mtosatti, starlight, tburke
Fixed In Version:		Doc Type:	Bug Fix
Last Closed:	2018-09-04 18:30:24 UTC	Type:	Bug

Description Cole Robinson 2012-06-04 15:36:31 UTC

Various OSes require specific qemu command-line parameters or libvirt XML to keep time optimally. Tracking this in libosinfo would be ideal.

Here's the matrix the KVM team has on an internal wiki:

|| OS || qemu || guest's kernel cmdline || newer AMD host || old AMD host (!constant_tsc) || Intel ||
|| RHEL 5.4 64 bit with pv clock || -no-kvm-pit-reinjection -rtc-td-hack || none || || || ||
|| RHEL 5.4 64 bit without pv clock || -no-kvm-pit-reinjection -rtc-td-hack || divider=10 notsc lpj=n || || || ||
|| RHEL 5.4 32 bit with pv clock || -no-kvm-pit-reinjection -rtc-td-hack || none || || || ||
|| RHEL 5.4 32 bit without pv clock || -no-kvm-pit-reinjection -rtc-td-hack || divider=10 clocksource=acpi_pm lpj=n || || || ||
|| RHEL 5.3 64 bit || -no-kvm-pit-reinjection -rtc-td-hack || divider=10 notsc || || || ||
|| RHEL 5.3 32 bit || -no-kvm-pit-reinjection -rtc-td-hack || divider=10 clocksource=acpi_pm || || || ||
|| RHEL 4.8 64 bit || -no-kvm-pit-reinjection -rtc-td-hack || notsc divider=10 || || || ||
|| RHEL 4.8 32 bit || -no-kvm-pit-reinjection -rtc-td-hack || clock=pmtmr divider=10 || || || ||
|| RHEL 3.9 64 bit || -no-kvm-pit-reinjection -rtc-td-hack || none || || || ||
|| RHEL 3.9 32 bit || -no-kvm-pit-reinjection -rtc-td-hack || none || || || ||
|| win2k3 64 bit || -rtc-td-hack || || || /use pmtimer in boot.ini || ||
|| win2k3 32 bit || -rtc-td-hack || || || /use pmtimer in boot.ini || ||
|| win2k8 64 bit || -rtc-td-hack || || || NO NEED TO USE PMTIMER || ||
|| win2k8 32 bit || -rtc-td-hack || || || NO NEED TO USE PMTIMER || ||
|| winxp 32 bit || -rtc-td-hack || || || /use pmtimer in boot.ini || ||

Comment 1 Daniel Berrangé 2012-06-11 21:05:02 UTC

As a point of reference, this is how those two pit/rtc flags map to libvirt XML:

  <clock offset='utc'>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='rtc' tickpolicy='catchup'/>
  </clock>
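
To make the mapping concrete, here is a minimal Python sketch that builds the same <clock> element from a list of legacy qemu flags. The table FLAG_TO_TIMER and the function clock_xml are hypothetical names made up for this illustration, and the flag-to-timer pairing is only the one implied by this comment, not libvirt's actual conversion code.

```python
import xml.etree.ElementTree as ET

# Assumed pairing, read off the XML in this comment (not libvirt's code):
#   -no-kvm-pit-reinjection -> <timer name='pit' tickpolicy='delay'/>
#   -rtc-td-hack            -> <timer name='rtc' tickpolicy='catchup'/>
FLAG_TO_TIMER = {
    "-no-kvm-pit-reinjection": {"name": "pit", "tickpolicy": "delay"},
    "-rtc-td-hack": {"name": "rtc", "tickpolicy": "catchup"},
}


def clock_xml(qemu_flags, offset="utc"):
    """Return a libvirt-style <clock> element as a string."""
    clock = ET.Element("clock", {"offset": offset})
    for flag in qemu_flags:
        attrs = FLAG_TO_TIMER.get(flag)
        if attrs is not None:  # flags outside the table are ignored
            ET.SubElement(clock, "timer", attrs)
    return ET.tostring(clock, encoding="unicode")


if __name__ == "__main__":
    print(clock_xml(["-no-kvm-pit-reinjection", "-rtc-td-hack"]))
```

Flags that are not in the table are silently skipped, so the sketch only covers the two options discussed here.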

Comment 2 starlight 2013-10-11 07:02:35 UTC

VM guest time-keeping is without a doubt
the closest thing to black magic found in
the realm of computers.

Much of the above has changed in just
the last year.

I've stumbled on the best case I've seen
so far for RHEL 4.9 so I'm documenting it
here.

First: "-rtc-td-hack" appears to be gone.

The 'libvirt' XML is

<clock offset='utc'>
  <timer name='rtc' tickpolicy='catchup' track='guest'/>
  <timer name='pit' tickpolicy='delay'/>
  <timer name='hpet' present='no'/>
</clock>

The resulting QEMU options are

  -rtc base=utc,clock=vm,driftfix=slew
  -no-kvm-pit-reinjection
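
The XML-to-option relationship described above can be sketched in Python as follows. This is only an illustration of the pairing observed in this comment: the function name qemu_clock_args is made up, the rules are inferred from the one config shown here rather than taken from libvirt's source, and the hpet timer is ignored because no resulting option is listed for it above.

```python
import xml.etree.ElementTree as ET


def qemu_clock_args(clock_xml):
    """Translate a <clock> element into the QEMU options seen above."""
    clock = ET.fromstring(clock_xml)
    timers = {t.get("name"): t for t in clock.findall("timer")}
    args = []

    rtc = timers.get("rtc")
    if rtc is not None:
        opts = ["base=" + clock.get("offset", "utc")]
        if rtc.get("track") == "guest":
            opts.append("clock=vm")       # assumed: track='guest' -> clock=vm
        if rtc.get("tickpolicy") == "catchup":
            opts.append("driftfix=slew")  # assumed: catchup -> driftfix=slew
        args += ["-rtc", ",".join(opts)]

    pit = timers.get("pit")
    if pit is not None and pit.get("tickpolicy") == "delay":
        args.append("-no-kvm-pit-reinjection")

    return args


EXAMPLE = """
<clock offset='utc'>
  <timer name='rtc' tickpolicy='catchup' track='guest'/>
  <timer name='pit' tickpolicy='delay'/>
  <timer name='hpet' present='no'/>
</clock>
"""

if __name__ == "__main__":
    print(" ".join(qemu_clock_args(EXAMPLE)))
```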

The RHEL 4.9 guest boot line can

a) have no clock parameters
b) have "clock=pmtmr" or
c) have "clock=pmtmr divider=100"

a/b/c all seem equivalent and result in
about 10 to 30 milliseconds negative
drift every five minutes.
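
For scale, those drift figures can be converted to parts per million and seconds per day. The short Python sketch below does only that arithmetic; the helper names are made up for illustration. It works out to roughly 33 to 100 ppm, or about 2.9 to 8.6 seconds per day if left uncorrected.

```python
def drift_ppm(drift_seconds, interval_seconds):
    """Clock drift as parts per million of elapsed time."""
    return drift_seconds / interval_seconds * 1e6


def drift_per_day(drift_seconds, interval_seconds):
    """Drift accumulated over 24 hours at the same rate, in seconds."""
    return drift_seconds / interval_seconds * 86400


INTERVAL = 5 * 60  # five minutes between measurements, in seconds

if __name__ == "__main__":
    for drift in (0.010, 0.030):  # 10 ms and 30 ms, magnitude only
        print("%.0f ms / 5 min = %.1f ppm = %.2f s/day" % (
            drift * 1000,
            drift_ppm(drift, INTERVAL),
            drift_per_day(drift, INTERVAL),
        ))
```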

"divider=10" is bad, causes rapid forward drift.

------

I have

  /usr/local/bin/ntpd -q -l /dev/null

set to run every five minutes in the
'crontab'.  /etc/ntpd.conf points to
a high quality CDMA time server that's
accurate to +/- 5 microseconds.

This works far better than running
'ntpd' continuously IMO.

The host server runs 'ntpd' against
the same CDMA time source and keeps time
to +/- 100 microseconds.

------

I have set the scheduler class and priorities
of the guest VM threads and the host 'ntpd':

# ps -Lce | fgrep -v ' TS '
  PID   LWP CLS PRI TTY     TIME CMD
 1647  1647 FF   70 ?   00:00:01 ntpd
  871   871 FF   60 ?   00:00:00 kvm-irqfd-clean
 4294  4294 FF   60 ?   00:00:04 qemu-kvm
 4294  4310 FF   60 ?   00:03:50 qemu-kvm
 4307  4307 FF   61 ?   00:00:20 kvm-pit/4294
 4308  4308 FF   60 ?   00:00:00 vhost-4294
 4309  4309 FF   60 ?   00:00:00 vhost-4294

versions
========
HOST
CentOS 6.4
kernel.org 3.10.15
qemu-img-0.12.1.2-2.355.0.1.el6_4.9.x86_64
qemu-kvm-0.12.1.2-2.355.0.1.el6_4.9.x86_64

GUEST
CentOS 4.9
CentOS 2.6.9-103.EL

CPU
===
AMD Athlon 4450B 2.3GHz

Comment 3 Cole Robinson 2013-10-11 12:29:31 UTC

Thanks for the info. There was just some discussion about ideal defaults internally, and what the qemu guys recommended for all OSes was:

<clock offset='utc'>
  <timer name='rtc' tickpolicy='catchup'/>
  <timer name='pit' tickpolicy='delay'/>
  <timer name='hpet' present='no'/>
</clock>

So basically what you have without the track='guest' bit. Though we didn't discuss guest kernel options at all.

FWIW -rtc-td-hack is the same thing as -rtc driftfix=slew

Comment 4 starlight 2013-10-11 17:30:08 UTC

I tried it without track='guest' and it does
not seem to work as well.

From what I've read, the idea behind the
config in comment 2 seems to be to have one
time source catch up (RTC), have another drop
ticks (PIT, since the 2.6.9 kernel expects
to compensate for lost hardware ticks),
and have the virtual TSC run relative
to the guest's time instead of the host's
time.  Some guessing here.

-----

Also I've found that in the guest

   a) have no clock parameters

which is probably equivalent to

   "clock=pmtmr divider=1000"

results in high idle CPU consumption
from interrupts, so I've settled on

   c) have "clock=pmtmr divider=100"

Of course "divider=10" will use even less
CPU but when a tick is missed the clock jumps
100ms and that's too much for me.  This
config runs about 10% of a core at idle.

-----

Also found that 'ntpd' now runs pretty well
on the guest, where in the past the clock
was so unstable that it would get hopelessly
lost and stay that way.  Config is


# Running in a VM.
tinker panic 0
tinker step 0.500
#disable ntp

# CDMA time servers.
server 10.29.78.3  minpoll  4 maxpoll  4 prefer
server 10.29.78.1  minpoll  4 maxpoll  4

# Resort to physical host clock if CDMA unreachable.
server 10.29.78.23 minpoll  4 maxpoll  4

# Access control.
restrict 0.0.0.0     mask 0.0.0.0         ignore
restrict 127.0.0.1   mask 255.255.255.255
#
restrict 10.29.78.3  mask 255.255.255.255 nopeer nomodify notrap
restrict 10.29.78.1  mask 255.255.255.255 nopeer nomodify notrap
restrict 10.29.78.23 mask 255.255.255.255 nopeer nomodify notrap

# Miscellaneous.
disable auth
driftfile /etc/ntp.drift
statsdir /etc/ntpstats/
#filegen loopstats file loopstats type day nolink enable
#filegen peerstats file peerstats type day nolink enable
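
A note on the poll settings above: ntpd takes minpoll and maxpoll as powers of two in seconds, so "minpoll 4 maxpoll 4" pins the poll interval at 16 seconds. The small Python sketch below just does that conversion for a server line. The function name is made up, and the defaults of 6 and 10 (64 s and 1024 s) are the usual ntpd defaults, stated here as an assumption about the ntpd build in use.

```python
def server_poll_range(line):
    """Return (address, min_poll_seconds, max_poll_seconds) for a server line."""
    fields = line.split()
    if len(fields) < 2 or fields[0] != "server":
        raise ValueError("not a server line: %r" % line)
    # Assumed ntpd defaults when the options are absent: 2**6 s and 2**10 s.
    exponents = {"minpoll": 6, "maxpoll": 10}
    # Walk adjacent (option, value) pairs after the address.
    for key, value in zip(fields[2:], fields[3:]):
        if key in exponents:
            exponents[key] = int(value)
    return fields[1], 2 ** exponents["minpoll"], 2 ** exponents["maxpoll"]


if __name__ == "__main__":
    print(server_poll_range("server 10.29.78.3  minpoll  4 maxpoll  4 prefer"))
```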

Comment 5 starlight 2013-10-11 17:31:54 UTC

Forgot to say this is for 32-bit RHEL 4.9.

Apparently 64-bit RHEL 4.9 is quite different.

Comment 7 Cole Robinson 2018-09-04 18:30:24 UTC

Since these timer defaults are not specific to the OS, this isn't really libosinfo's area to cover; it belongs in some hypothetical higher-level library like libvirt-designer or virtuned.