Description of problem:
We're seeing the clock in a KVM guest on a heavily loaded Fedora 11 server gain time very quickly, up to 20 seconds in a 60 second period. The date/time on the server is correct and does not drift.

Version-Release number of selected component (if applicable):
Fedora 11, kernel 2.6.29.4-167.fc11.x86_64, qemu-kvm-0.10.5-3.fc11.x86_64
Dell R710 server with Nehalem-based CPUs
Multiple 1-CPU KVM guests, all running RHEL-5.3
ntp enabled on both server and guests

How reproducible:
Easily

Steps to Reproduce:
1. Create lots of guests
2. Run heavy CPU-based workloads in all the guests
3. Watch the time drift

Actual results:
Clock drift of up to 20 seconds in a 60 second period.

Expected results:
No clock drift

Additional info:
Adding divider=10 to the guest's kernel command line reduces the clock drift to about a second every 60 seconds.
The server is using the TSC clock source.
Thanks for the report.

What is the qemu-kvm command line you are using? If you're launching the guest using libvirt, please include the guest's log from /var/log/libvirt/qemu

What clocksource is the guest using? Look at /sys/devices/system/clocksource/clocksource0/current_clocksource

I think this may be a well-known issue, and AFAIR the paravirt kvm clocksource is included in RHEL5.4 and will fix this issue. I'm very unclear on the details, though. Marcelo, Glauber?
Created attachment 349287 [details] kvm logfile
The guest's current_clocksource reports "jiffies"
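For anyone repeating this check across many guests, the sysfs lookup requested above can be wrapped in a small helper. This is only a sketch: the function name `guest_clocksource` and its optional path argument are made up for illustration, and the default path is the one quoted in this thread.

```shell
# Print the clocksource the running kernel is currently using.
# The optional first argument (hypothetical, mainly for testing)
# overrides the sysfs path quoted in this thread.
guest_clocksource() {
    local path="${1:-/sys/devices/system/clocksource/clocksource0/current_clocksource}"
    if [ -r "$path" ]; then
        cat "$path"
    else
        # Older kernels may not expose this file at all.
        echo "unknown"
    fi
}
```

On the RHEL-5.3 guest described here it would be expected to print "jiffies", and on the Fedora 11 guest "kvm-clock".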
(In reply to comment #1)
> Thanks for the report
>
> What is the qemu-kvm command line you are using? If you're launching the guest
> using libvirt, please include the guest's log from /var/log/libvirt/qemu
>
> What clocksource is the guest using - look at
> /sys/devices/system/clocksource/clocksource0/current_clocksource
>
> I think this may be a well known issue, and AFAIR the paravirt kvm clocksource
> is included in RHEL5.4 and will fix this issue. I've very unclear on the
> details, though. Marcelo, Glauber?

No, 5.4 does *not* have the paravirt kvm clocksource. What it does have is the hypercall to fetch the lpj from the host during boot; that helps to properly configure the clocksources initially, but if there are further drifting problems, they will still be present.

In terms of testing, it would be worthwhile to test out a 5.4 guest and see if the lpj stuff helps. Possibly with a 5.4 kernel + the tick divider, you might be able to get the drift under control.

Another thing to try would be to run an F-11 guest, and see what it looks like. F-11 does have the full paravirt clocksource, so confirming that the paravirt clocksource makes a difference here would be another good data point to have.

Chris Lalancette
I've tried a Fedora 11 guest (2.6.29.4-167.fc11.x86_64) and the timekeeping is much better with that kernel. The current_clocksource file reports that the kernel is using the "kvm-clock" kernel timer.

Are there any plans to include this clock source in the 5.3/5.4 kernels?
Allen: could you try the RHEL5 guest with "notsc divider=10"? We've seen reports that this helps.

Also, could you include the dmesg from the guest? We're looking for messages like "time.c: Using ... timer."

(There is a "time drift fix" -tdf option for qemu-kvm, but that affects the missed PIT interrupts, so I'm not sure that'll help.)
We have tested a guest running RHEL-5.3 (2.6.18-128.1.1.el5) with the "divider=10 notsc" options and, whilst the clock drift was reduced, it was still losing about a second every minute when the host and guest were heavily loaded.

$ cat /proc/cmdline
ro root=LABEL=/ panic=20 log_buf_len=131072 crashkernel=128M@16M divider=10 notsc iommu=soft elevator=noop

$ dmesg |grep time
Calibrating delay using timer specific routine.. 5328.84 BogoMIPS (lpj=2664424)
Using local APIC timer interrupts.
Detected 62.500 MHz APIC timer.
Disabling vsyscall due to use of PM timer
time.c: Using 3.579545 MHz WALL PM GTOD PM timer.
time.c: Detected 2659.908 MHz processor.
PCI: Setting latency timer of device 0000:00:01.1 to 64
PCI: Setting latency timer of device 0000:00:01.2 to 64
SELinux: Disabled at runtime.

$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
jiffies
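When comparing test runs it is easy to lose track of which boot options a guest actually came up with. The following is a minimal sketch of a check against /proc/cmdline; the function name `cmdline_has` and the optional file argument are hypothetical and exist only to make the example testable.

```shell
# cmdline_has OPTION [FILE]
# Succeeds if OPTION appears as a whole word on the kernel command line.
# FILE defaults to /proc/cmdline; the override is an assumption made
# here so the function can be exercised against a saved copy.
cmdline_has() {
    local opt="$1" file="${2:-/proc/cmdline}"
    # One option per line, then an exact, fixed-string, whole-line match.
    tr ' ' '\n' < "$file" | grep -qxF -- "$opt"
}
```

For the command line shown above, `cmdline_has divider=10` and `cmdline_has notsc` both succeed, while `cmdline_has clock=tsc` fails.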
Allen,

Two options:

1. Pass clock=tsc on the RHEL5 guest, where the system clock will lose time (but depending on the load, the drift might be small enough for ntp to correct by adjusting frequency).
2. Use clock=jiffies with the -no-kvm-pit-reinjection option to qemu-kvm (without the divider option).

Make sure to delete /var/lib/ntp/drift (and reboot the guest) between tests.

A better solution is being worked on.
Allen, if you're using libvirt and want to try -no-kvm-pit-reinjection, then create e.g. a /usr/bin/qemu-kvm-no-pit-reinjection script:

  #!/bin/bash
  # Pass all original arguments through unchanged, adding the extra option.
  exec /usr/bin/qemu-kvm -no-kvm-pit-reinjection "$@"

and change the <emulator> element in the guest XML config to point to it. (I needed to put selinux into enforcing mode to make that work)

Also, you need a 2.6.30 kernel for -no-kvm-pit-reinjection

(In reply to comment #11)
> A better solution is being worked on.

Care to elaborate a little, Marcelo?
> > A better solution is being worked on.
>
> Care to elaborate a little Marcelo?

The cause of the drift is that the guest time code expects a correlation between timer interrupts and the TSC, which is very imprecise in KVM.

clock=jiffies attempts to correct for lost ticks, and since KVM reinjects lost interrupts, the end result is the time gain mentioned in comment #1. -no-kvm-pit-reinjection stops that, but the guest is still susceptible to time loss (negative drift) depending on the load of the system.

clock=tsc does not attempt to correct for lost ticks.

So the suggestions in comment #11 alleviate the drift problem, in the hope that it's within the acceptable range for ntp jitter correction. From ntpd(8):

"The maximum slew rate possible is limited to 500 parts-per-million (PPM) as a consequence of the correctness principles on which the NTP protocol and algorithm design are based."

Note that even if the drift (or jitter in ntpd) is larger than 500 PPM, the clock will still be corrected via offset (offset correction means time jumps can be seen).

A better solution is planned which will improve the current situation.
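To put the 500 PPM limit next to the drift figures reported in this bug, the conversion is simply drift divided by elapsed time, scaled by one million. The sketch below does that arithmetic with awk; the function names `drift_ppm` and `within_slew_limit` are made up for illustration and are not part of ntp.

```shell
# drift_ppm DRIFT_SECONDS INTERVAL_SECONDS
# Prints the drift in parts-per-million, rounded to the nearest integer.
drift_ppm() {
    awk -v d="$1" -v t="$2" 'BEGIN { printf "%.0f\n", (d / t) * 1000000 }'
}

# within_slew_limit DRIFT_SECONDS INTERVAL_SECONDS
# Succeeds if the absolute drift rate is at most 500 PPM, the maximum
# slew rate quoted from ntpd(8) above.
within_slew_limit() {
    awk -v d="$1" -v t="$2" 'BEGIN {
        r = d / t
        if (r < 0) r = -r
        exit !(r * 1000000 <= 500)
    }'
}
```

`drift_ppm 1 60` prints 16667 and `drift_ppm 20 60` prints 333333, so both the one-second-per-minute and the twenty-seconds-per-minute cases are far outside what ntpd can slew away, which is consistent with the time steps seen in the guests.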
With RHEL 5.3 on 5.4beta I'm seeing this too, without the heavy load. It is so bad that even the workarounds posted here don't help. I see time jumps of a few seconds every hour when the system is not doing much, and of a few minutes when it is.
Roel, Are you running ntpd on the guest?
Yes (and removed drift files before start). I switched VMs to "divider=10 notsc". During the night, the clocks were actually synchronised. So that is looking good. Will put some load on the systems this afternoon and see if it stays that way.
OK, the results of the last 24 hours (4am - 4am) of operation.

The good: it looks like it is under control.
The bad: positive and negative drift. The mailserver has an average utilization of only 12%; who knows what will happen when the average is 50% or so.

The desktop VM:
Jul 14 04:08:02 ntpd[3474]: time reset -0.295500 s
Jul 14 04:12:20 ntpd[3474]: synchronized to LOCAL(0), stratum 10
Jul 14 04:13:26 ntpd[3474]: synchronized to a.b.c.d, stratum 3

The Mailserver VM (the busiest one):
Jul 14 07:50:11 ntpd[9434]: time reset +0.137525 s
Jul 14 07:54:32 ntpd[9434]: synchronized to LOCAL(0), stratum 10
Jul 14 07:55:37 ntpd[9434]: synchronized to a.b.c.d, stratum 3
Jul 14 08:42:09 ntpd[9434]: time reset -0.268744 s
Jul 14 08:46:09 ntpd[9434]: synchronized to LOCAL(0), stratum 10
Jul 14 08:48:18 ntpd[9434]: synchronized to a.b.c.d, stratum 3
Jul 14 13:38:42 ntpd[9434]: time reset -0.130925 s
Jul 14 13:43:01 ntpd[9434]: synchronized to LOCAL(0), stratum 10
Jul 14 13:44:05 ntpd[9434]: synchronized to a.b.c.d, stratum 3
Jul 14 16:55:47 ntpd[9434]: time reset -0.389383 s
Jul 14 17:00:05 ntpd[9434]: synchronized to LOCAL(0), stratum 10
Jul 14 17:01:11 ntpd[9434]: synchronized to a.b.c.d, stratum 3
Jul 14 17:29:11 ntpd[9434]: time reset -0.140383 s
Jul 14 17:33:17 ntpd[9434]: synchronized to LOCAL(0), stratum 10
Jul 14 17:33:32 ntpd[9434]: synchronized to a.b.c.d, stratum 3
Jul 14 22:52:22 ntpd[9434]: time reset -0.335109 s
Jul 14 22:56:44 ntpd[9434]: synchronized to LOCAL(0), stratum 10
Jul 14 22:57:32 ntpd[9434]: synchronized to a.b.c.d, stratum 3
Jul 15 03:37:50 ntpd[9434]: time reset -0.228269 s
Jul 15 03:41:31 ntpd[9434]: synchronized to a.b.c.d, stratum 3

The Webserver VM:
Jul 14 08:35:27 ntpd[5382]: time reset -0.155370 s
Jul 14 08:39:47 ntpd[5382]: synchronized to LOCAL(0), stratum 10
Jul 14 08:40:53 ntpd[5382]: synchronized to a.b.c.d, stratum 3
Jul 14 14:38:47 ntpd[5382]: time reset -0.168018 s
Jul 14 14:43:02 ntpd[5382]: synchronized to LOCAL(0), stratum 10
Jul 14 14:44:08 ntpd[5382]: synchronized to a.b.c.d, stratum 3
Jul 14 17:46:41 ntpd[5382]: time reset -0.260242 s
Jul 14 17:51:01 ntpd[5382]: synchronized to LOCAL(0), stratum 10
Jul 14 17:52:06 ntpd[5382]: synchronized to a.b.c.d, stratum 3
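Logs like these are easier to compare between guests when the "time reset" steps are totalled. The following is a rough sketch that assumes the ntpd syslog format shown above (the step size is the field right after the word "reset"); the function name `summarize_resets` is hypothetical.

```shell
# summarize_resets: read ntpd syslog lines on stdin and print
#   <number of resets> <net step in seconds> <largest absolute step in seconds>
summarize_resets() {
    awk '
        /time reset/ {
            for (i = 1; i < NF; i++) {
                if ($i == "reset") {
                    v = $(i + 1) + 0          # e.g. "-0.155370" or "+0.137525"
                    n++
                    sum += v
                    a = (v < 0) ? -v : v
                    if (a > max) max = a
                }
            }
        }
        END { printf "%d %.6f %.6f\n", n, sum, max }
    '
}
```

Fed the Webserver VM lines above, it prints `3 -0.583630 0.260242`: three steps, a net correction of about 0.58 s backwards, and a largest single step of about 0.26 s.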
After more testing, the combination of "divider=10 notsc" and the 'inject timer interrupts that got lost' option, -tdf, seems to control the worst of the clock drift issues with guests running 5.3. The -tdf option is not sufficient to control the drift without the "divider=10 notsc" options.
Allen,

Unless you are using -no-kvm-irqchip, the -tdf option has no effect.
Okay, it sounds to me like the conclusion here is that 5.3 guests need 'notsc divider=10' in order to avoid drift? Are people happy for this bug to be closed?
(In reply to comment #20)
> Okay, it sounds to me like the conclusion here is that 5.3 guests need 'notsc
> divider=10' in order to avoid drift?
>
> Are people happy for this bug to be closed?

I'll let Allen chime in here, but from what I have seen while working with Allen, regardless of -no-kvm-irqchip, -tdf, notsc, and divider=10, we still see skew. Quite a bit less, but still very evident. The only thing we've used thus far where skew was eliminated was kvm-clock.

--chris
Should backport the -no-kvm-pit-reinjection support for FC11's 2.6.29 kernel. Unfortunately I won't be able to do that until Aug 24th. Perhaps testing a 2.6.30 kernel (or kvm-88 modules) in the meantime is desired.
Testing with the RHEV KVM version we still see clock drifts of a second or more on guests running RHEL-5.3 or 5.4 using the "divider=10 notsc" kernel options. The Fedora 11 guests (2.6.29.4-167.fc11), with the kvm-clock timer, seem to keep time much more accurately.

These are the sort of variations in the times reported by the guests that I'm seeing (after about 30 minutes of load on the KVM server):

07:45:50.304438114
07:45:50.522347000
07:45:51.032452000
07:45:51.072752559
07:45:51.546714000
07:45:51.486993000
07:45:53.768901000
07:45:51.340134000
07:45:53.584773000
07:45:55.983851000

The Fedora 11 guest, using the kvm-clock timer, keeps time much more accurately:

07:50:08.427770702
07:50:08.492880000

What would it take to get the kvm-clock timer added to the RHEL kernels?
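One way to boil a list of sampled guest times down to a single figure is the spread between the earliest and latest reading. The sketch below assumes one HH:MM:SS.fraction timestamp per line, all taken on the same day; the function name `clock_spread` is made up for illustration.

```shell
# clock_spread: read HH:MM:SS.fraction timestamps on stdin (one per line)
# and print max - min in seconds, to millisecond precision.
# Does not handle samples that straddle midnight.
clock_spread() {
    awk -F: '
        NF == 3 {
            s = $1 * 3600 + $2 * 60 + $3
            if (!seen || s < min) min = s
            if (!seen || s > max) max = s
            seen = 1
        }
        END { if (seen) printf "%.3f\n", max - min }
    '
}
```

Applied to the ten RHEL guest readings above it prints 5.679, and applied to the two Fedora 11 readings it prints 0.065. How much of the latter is real skew rather than sampling delay is not something these numbers alone can tell.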
RHEL 5.4 kernel 164.2.1 seems to have resolved the problem. I've seen no more time jumps in the last 24 hours. (No clock-related kernel parameters are set.)
Many thanks for testing Roel, I'll close the bug then
Hi,

I have RHEL 5.4 with KVM virtualization. The time on the guests drifts, but the time on the server is correct.

Kernel version:

uname -a
Linux mrbbo-admin.pgsm.hu 2.6.18-164.11.1.el5 #1 SMP Wed Jan 6 13:26:04 EST 2010 x86_64 x86_64 x86_64 GNU/Linux

Requested information:

cat /proc/cmdline
ro root=/dev/vg00/lvol0 rhgb quiet

dmesg |grep time
time.c: Using tsc for timekeeping HZ 1000
Calibrating delay loop (skipped), value calculated using timer frequency.. 6000.20 BogoMIPS (lpj=3000104)
Using local APIC timer interrupts.
WARNING calibrate_APIC_clock: the APIC timer calibration may be wrong.
Detected 62.500 MHz APIC timer.
Calibrating delay using timer specific routine.. 5992.21 BogoMIPS (lpj=2996106)
Calibrating delay using timer specific routine.. 5993.51 BogoMIPS (lpj=2996757)
Calibrating delay using timer specific routine.. 5992.45 BogoMIPS (lpj=2996228)
time.c: Using 1.193182 MHz WALL KVM GTOD KVM timer.
time.c: Detected 3000.104 MHz processor.
PCI: Setting latency timer of device 0000:00:01.1 to 64
PCI: Setting latency timer of device 0000:00:01.2 to 64
SELinux: Disabled at runtime.
PCI: Setting latency timer of device 0000:00:03.0 to 64
time.c: can't update CMOS clock from 0 to 59

ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 time            192.168.96.68    4 u   57   64  377    0.205  99181.1 35920.0

How can I fix the ntp synchronization?

Thanks,
Gabor
2.6.18-164.11.1.el5 should already have all of kvmclock included. Are you seeing this need for synchronization at boot time only?
How can I check whether this kernel contains kvmclock? It's not only at boot time; I would like to keep the time synchronized at all times.
time.c: Using 1.193182 MHz WALL KVM GTOD KVM timer.   <=== this means you are using kvmclock.

In x86_64 RHEL5, you cannot change clocksources at runtime. So if you booted with it, you'll be using it.
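The same check can be scripted by looking for that banner in the boot messages. This is only a sketch based on the exact dmesg line quoted in this comment; the function name `uses_kvmclock` is hypothetical, and other kernels may word the message differently.

```shell
# uses_kvmclock: read dmesg output on stdin and succeed if the
# RHEL 5 x86_64 time.c banner names the KVM timer, as in the line
# quoted above.
uses_kvmclock() {
    grep -q 'time\.c: Using .* KVM GTOD KVM timer'
}
```

Typical use inside a guest would be `dmesg | uses_kvmclock && echo "kvmclock in use"`.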
How can I change the clocksource? In a KVM guest, does the time drift only in jiffies mode?

In guest OS:
time.c: Using 1.193182 MHz WALL KVM GTOD KVM timer.

In host OS:
time.c: Using 14.318180 MHz WALL HPET GTOD HPET/TSC timer.

In host OS:
cat /sys/devices/system/clocksource/clocksource0/available_clocksource
jiffies

On the host the clocksource is jiffies, and the time is correct:

[root@mrbbo-admin5 ~]# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*mrbbo-admin2.pg 192.168.96.69    4 u   19  128  377    0.188   -1.469   0.108
+mrbbo-admin1.pg 192.168.96.69    4 u   64  128  377    0.200   -0.146   0.119
 LOCAL(0)        .LOCL.          10 l   34   64  377    0.000    0.000   0.001