Red Hat Bugzilla – Bug 307201
Time flapping / drifting on guest, getting worse over time
Last modified: 2009-07-13 16:49:14 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:184.108.40.206) Gecko/20070914 Firefox/220.127.116.11
Description of problem:
I am running a Xen guest (HVM mode), also with RHEL5 as the OS.
After a while I notice that the system date is "jumping" around:
every second the date jumps between two values. At first the difference
is only a few seconds, but over time it grows to several minutes and more.
# date; date;
Wed Sep 26 16:50:20 CEST 2007
Wed Sep 26 16:50:16 CEST 2007
A few seconds later:
# date; date;
Wed Sep 26 16:50:48 CEST 2007
Wed Sep 26 16:50:44 CEST 2007
Here we only have a drift of 4 seconds... but it's getting worse.
The whole system starts to behave strangely after a while, because the
time is messed up.
Some more info:
Guest: Linux xxx 2.6.18-8.el5 #1 SMP Fri Jan 26 14:15:14 EST 2007 x86_64 x86_64 x86_64 GNU/Linux
Host: Linux xen 2.6.18-8.1.8.el5xen #1 SMP Mon Jun 25 17:19:38 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
# cat /proc/sys/xen/independent_wallclock
There is no ntpd running, but I do sync the time every few hours
with a timeserver (on both host and guest).
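For context, the periodic sync described above would look something like this cron entry (the server name and 4-hour interval are assumptions; ntpdate is the RHEL5-era one-shot sync tool):

```
# /etc/crontab -- one-shot time sync every 4 hours (hypothetical values)
0 */4 * * * root /usr/sbin/ntpdate -u pool.ntp.org
```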
There are no errors reported.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Install the guest like any other RHEL5 system, in HVM mode
2. Run the guest
3. Wait a few hours, then run "date; date;"
The dates differ by several seconds/minutes.
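A minimal sketch of the reproduction step above: read the clock twice in a row and print the difference. On a healthy system this is 0 or 1 second; on the affected HVM guests it grows to minutes.

```shell
# Two back-to-back clock reads, as in "date; date;", but in seconds
# since the epoch so the difference is easy to compute.
t1=$(date +%s)
t2=$(date +%s)
drift=$((t2 - t1))
echo "drift: ${drift}s"
```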
Any news here? We are running into serious problems, because we are using the
Xen-enabled hosts in an HA-Linux cluster environment and all the cluster
failover mechanisms are broken because of the flapping time.
These clusters are unfortunately already in production, which makes this really
critical for us.
Why do you have /proc/sys/xen/independent_wallclock set to 1? If these are
paravirt guests (which it seems like they are), you really want to have
independent_wallclock set to 0 so that it will keep the time in sync with the
HV. Otherwise it is a losing battle, since the guest cannot account for time it
wasn't on the processor at all.
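For paravirtualized guests, the setting above can be inspected and changed like this (a sketch; it assumes the RHEL5 Xen kernel exposes the knob under /proc/sys/xen, as shown earlier in this report):

```
# Check the current value (1 = guest keeps its own wallclock,
# 0 = guest time follows the hypervisor):
cat /proc/sys/xen/independent_wallclock

# Tell the PV guest to track the hypervisor clock:
echo 0 > /proc/sys/xen/independent_wallclock

# Persist across reboots:
echo "xen.independent_wallclock = 0" >> /etc/sysctl.conf
```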
At first /proc/sys/xen/independent_wallclock was set to 0 on the Xen host,
but we also had the problem with the guest there. We tried to fix the problem
by setting it to 1 and enabling an NTP sync on the guest. But even with the
NTP sync the times keep drifting apart.
(Current uptime of the guest is 6 days, and two "/bin/date" calls in the
shell right after each other print a difference of 10 minutes.)
I am not sure if the guest is paravirtual - it is completely hardware-emulated,
and there is nothing Xen-related installed on it. (It's a plain RHEL5 install.)
If you need more information I am more than happy to provide it.
Oh, OK, I was confused. You are running a fully-virtualized guest; in that
case, independent_wallclock really does nothing for you. It's an optimization
for paravirtualized guests. You should really leave it enabled (set to 1) in
fully-virtualized guests.
As far as the guest drift is concerned, sometimes it has to do with the actual
physical clock drifting, and sometimes it has to do with the guest kernel
requesting too many interrupts a second, which the host can't provide. If you
run NTP inside the guest, does that help keep it in sync? Is either the dom0 or
the guest very heavily loaded? Are you perhaps running a number of guests at
the same time?
An NTP sync within the guest does indeed correct the time for one of the two
dates. The other date corrects itself by the same offset, but is still wrong.
And it is the only guest on the Xen host, and we have very low load.
However, we have some new information now: when we set the virtual CPUs from
two to one, the problem goes away. So it seems to be a Xen bug that only
happens when the "vcpus" setting is bigger than 1.
It's not a good workaround, because the host is a quad-core machine and we are
basically just using one core now....
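For reference, this workaround corresponds to a one-line change in the guest's Xen config file (the file name is hypothetical):

```
# /etc/xen/myguest -- HVM guest definition (guest name is an assumption).
# Dropping from 2 VCPUs to 1 avoids the inter-VCPU clock skew, at the cost
# of leaving the other cores of the quad-core host unused.
vcpus = 1
```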
I have the same problem with RHEL5 hosts and RHEL4 and RHEL5 guests on them.
What's the status of this bug? It's giving me a lot of problems.
Now, with RedHat 5.1 available with Xen 3.1 included, we just stopped using
fully virtualized hosts. Paravirtual now works MUCH better and the virtual
machines are multiple times faster than with the "old" method.
However, we also tested the above setup (fully virtualized, more than one
virtual CPU assigned) and hit the exact same bug. It's even worse there: the
drifting of the time is more extreme.
We just reinstalled all our Xen machines as paravirtual and things are much
better now.
Partly in reply to comment 7, partly some more information.
We are running RHEL5.1, latest patch level, 64-bit, on Intel quad-core CPUs
with 16 GB of memory; 1, sometimes 2 guests on 1 host. The hosts keep up with
the time quite well, but the fully virtualized guests do not. We cannot make
all guests para because we need some 32-bit guests as well, and they need to
be fully virtualized. Within an hour, the time drifts by a couple of seconds.
So this is getting really frustrating.
OK. There are a couple of things to try here to keep your fully virtualized
guest clocks in sync:
1. Pass "clocksource=pit" in the guest kernel command-line; this should force
it to use the emulated PIT timesource, which seems to be a little more accurate
in the virtual environment than the emulated HPET.
2. Using RHEL-5.1 or later, use the tick divider by using "divider=10" on the
guest kernel command-line. This will effectively make the guest use a 100 Hz
clock instead of a 1000 Hz one, reducing the load on the HV and possibly keeping
your clocks in better sync. Note a couple of things here, however; you need at
least 2.6.18-53.1.4.el5 (since there were bugs before that), and you can't
currently use the divider option in conjunction with the recommendation in step
1 (because of BZ 427588).
3. Make sure to run NTP inside the guests to keep the clocks in sync.
Please let us know if some combination of the above improves the situation for you.
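For anyone trying this, here is a sketch of where the options from steps 1 and 2 go in the guest's /boot/grub/grub.conf (the kernel version and root device are assumptions; remember that clocksource=pit and divider=10 cannot be combined, because of BZ 427588):

```
title RHEL5 with PIT clocksource (step 1)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-53.1.4.el5 ro root=/dev/VolGroup00/LogVol00 clocksource=pit
        initrd /initrd-2.6.18-53.1.4.el5.img

title RHEL5 with tick divider (step 2)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-53.1.4.el5 ro root=/dev/VolGroup00/LogVol00 divider=10
        initrd /initrd-2.6.18-53.1.4.el5.img
```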
Unfortunately I no longer have any fully virtualized systems around. In order
to fix this problem we reinstalled all our systems in a way that lets us use
paravirtualization.
Sorry, so I am in no position to help you with testing anymore.
We are also experiencing problems with clock drift in fully-virtualized guests.
From our experience, running a paravirtualized guest is the best solution; not
only is the performance near-native, but clock drift isn't an issue.
Unfortunately, as with AJ in comment 8, we can't make all of our guests
paravirtualized, because some of them are 32-bit guests.
I'll try the workarounds in comment 9 and report our experiences.
Waiting on feedback from the reporters, so I'll put this in NEEDINFO for now.
This looks related to bug 449346, which is next on my todo list.
I am running RHEL 5.2 for my base Xen server. On any fully virtualized RHEL 5.2 install I have, I see this bug when more than 1 VCPU is used. I have a test system set up to track this down, and I am willing to gather any information needed or test any suggested fixes on it. When under load you can literally see the minutes roll by. Please let me know; I am willing to do any testing requested.
I am backporting a dozen or so HVM timer fixes from newer upstream Xen releases into RHEL 5. I will upload test RPMs once I have finished the backport and done some initial sanity testing on my systems.
On further reflection, one VCPU being behind the other may also be caused by the disk emulation code in qemu-dm, which can block timer interrupts going to the virtual CPU that handles the disk in the guest.
I have done a backport of the upstream qemu AIO code, which should alleviate that part of the problem to the point of it disappearing. I have some (experimental!) test RPMs of the AIO backport available at http://people.redhat.com/riel/.xen-aio/
If you are willing to experiment, trying out those RPMs could give us an important data point.
This bug can be caused by a combination of two main factors:
- while doing disk IO, one VCPU of an HVM guest can miss timer ticks
- Xen did not re-deliver those missed timer ticks later on, causing clock skew between VCPUs inside an HVM guest
Both of these issues should be resolved with the backport of the AIO disk handling code and upstream Xen 'no missed-tick accounting' timer code. Please test the test RPMs from http://people.redhat.com/riel/.xenaiotime/ and let us know if those (experimental!) test packages resolve the issue.
I believe this issue is resolved with the changes from Comment #17.
The fix is in the Beta release that addresses this particular request. Please test and report back results here, at your earliest convenience. RHEL 5.4 General Availability release is just around the corner!
*** This bug has been marked as a duplicate of bug 449346 ***