If a virtual machine running under KVM on a CentOS 6.2 host is saved and then restored a few hours later, the time in the VM does not sync with real time; it continues from the moment the VM was suspended. The VM uses kvm-clock as its time source (recommended according to the docs I've found). Even NTP does not help, because the time shift is too large for NTP to correct.
Reported to CentOS bug tracker: http://bugs.centos.org/view.php?id=5726
Unfortunately, we can't do much about it because, from the guest OS's point of view, nothing happened. It has no idea it was suspended and resumed, and we have no way of telling the guest OS about it. Therefore I'm closing the bug as CANTFIX.
However, it might be something that the qemu guest agent could help with.
This cannot be right. If there were "no way of telling" things to the guest OS, then a mere pause would desync the clock as well. However, that does not happen. How come the same mechanism that resets the clock after a suspend cannot be employed to reset the clock after a managed save? It seems like a genuine bug to me.
Consider:
rabbit@Dungeon:~$ clockdiff 192.168.58.165
.....
host=192.168.58.165 rtt=237(295)ms/0ms delta=-1ms/-1ms Tue Jun 19 08:16:36 2012
rabbit@Dungeon:~$ si virsh suspend ws_lin_dev
Domain ws_lin_dev suspended
rabbit@Dungeon:~$ clockdiff 192.168.58.165
192.168.58.165 is down
rabbit@Dungeon:~$ si virsh resume ws_lin_dev
Domain ws_lin_dev resumed
rabbit@Dungeon:~$ clockdiff 192.168.58.165
..
host=192.168.58.165 rtt=562(280)ms/0ms delta=-1ms/-1ms Tue Jun 19 08:17:52 2012
rabbit@Dungeon:~$ si virsh managedsave ws_lin_dev
Domain ws_lin_dev state saved by libvirt
rabbit@Dungeon:~$ clockdiff 192.168.58.165
192.168.58.165 is down
rabbit@Dungeon:~$ si virsh start ws_lin_dev
Domain ws_lin_dev started
rabbit@Dungeon:~$ clockdiff 192.168.58.165
..
host=192.168.58.165 rtt=562(280)ms/0ms delta=-57417ms/-57417ms Tue Jun 19 08:19:16 2012
# ^^ OUCH - 57 secs
rabbit@Dungeon:~$ si virsh suspend ws_lin_dev
Domain ws_lin_dev suspended
rabbit@Dungeon:~$ sleep 20
# sleep a while while paused
rabbit@Dungeon:~$ si virsh resume ws_lin_dev
Domain ws_lin_dev resumed
rabbit@Dungeon:~$ clockdiff 192.168.58.165
.
host=192.168.58.165 rtt=750(187)ms/0ms delta=-57417ms/-57417ms Tue Jun 19 08:20:00 2012
# difference exact down to the millisecond after 20 secs of paused sleep
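For anyone repeating the comparison, a minimal Python sketch that pulls the delta field out of the clockdiff lines above and compares the measurements. The helper name parse_delta_ms and the sample labels are made up for illustration; the input lines are copied from the transcript.

```python
import re

# Matches the "delta=<ms>ms/<ms>ms" field printed by clockdiff.
DELTA_RE = re.compile(r"delta=(-?\d+)ms/(-?\d+)ms")


def parse_delta_ms(line: str) -> int:
    """Return the first delta value (in milliseconds) from a clockdiff result line."""
    match = DELTA_RE.search(line)
    if match is None:
        raise ValueError(f"no delta field in line: {line!r}")
    return int(match.group(1))


# Result lines copied from the transcript above; the labels are illustrative only.
samples = {
    "after suspend/resume": "host=192.168.58.165 rtt=562(280)ms/0ms delta=-1ms/-1ms Tue Jun 19 08:17:52 2012",
    "after managedsave/start": "host=192.168.58.165 rtt=562(280)ms/0ms delta=-57417ms/-57417ms Tue Jun 19 08:19:16 2012",
    "after 20 s paused": "host=192.168.58.165 rtt=750(187)ms/0ms delta=-57417ms/-57417ms Tue Jun 19 08:20:00 2012",
}

deltas = {label: parse_delta_ms(line) for label, line in samples.items()}

# Offset introduced by the managed save, relative to the plain suspend/resume.
managed_save_jump_ms = deltas["after managedsave/start"] - deltas["after suspend/resume"]
# Additional offset introduced by the later 20-second pause.
pause_jump_ms = deltas["after 20 s paused"] - deltas["after managedsave/start"]

for label, delta in deltas.items():
    print(f"{label}: {delta} ms")
print(f"managed save added {managed_save_jump_ms} ms, pause added {pause_jump_ms} ms")
```

The point of the comparison is that the pause adds nothing to the offset while the managed save does, which is what the transcript is meant to demonstrate.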
Investigating further I found this warnocked patch, which looks like exactly the thing in question. Please reopen this ticket so that there is a placeholder of when this fix will land properly.
http://www.mail-archive.com/kvm@vger.kernel.org/msg67881.html
(In reply to comment #5)
> Investigating further I found this warnocked patch, which looks like exactly
> the thing in question. Please reopen this ticket so that there is a
> placeholder of when this fix will land properly.
>
> http://www.mail-archive.com/kvm@vger.kernel.org/msg67881.html
(For the record)
Looks like that patch (by Marcelo Tosatti) appeared in bugzilla #694801 and has been applied to the RHEL 6.3 GA kernel.
This is still a frequently raised issue on upstream lists. qemu 2.0 made it possible for the guest agent to run guest-set-time without arguments, which has the guest re-read the hardware clock and adjust its software time from that. qemu 2.1 is adding rtc-reset-reinjection to tell qemu that the agent has been used to force guest time and that qemu therefore no longer needs to slew clock interrupts. Libvirt 1.2.5 added a virDomainSetTime API to trigger the guest agent command, and future libvirt versions may add a flag to that API to also trigger the rtc-reset-reinjection follow-up. There is also an idea of adding an <on_resume> action to libvirt's domain XML so that libvirt can automatically trigger a time reset on resume. It can't be on by default, because it requires guest interaction, which is only safe if the guest is trusted, but it could be explicitly enabled by management as needed. Of course, as this is still under active work upstream, there is no telling how soon it can be backported downstream, or even if such a backport is feasible without a rebase.
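As a rough illustration of the two commands mentioned in this comment, here is a Python sketch that only builds the JSON payloads and the virsh argument lists; it does not run anything. It assumes a guest with a working qemu guest agent channel, reuses the domain name ws_lin_dev from the earlier transcript, and uses helper names invented for this example. Passing the payloads through virsh qemu-agent-command and virsh qemu-monitor-command is one possible route, not necessarily how libvirt itself will wire this up.

```python
import json


def guest_set_time_payload() -> str:
    """Guest agent command with no arguments: the guest re-reads its hardware clock."""
    return json.dumps({"execute": "guest-set-time"})


def rtc_reset_reinjection_payload() -> str:
    """QMP command telling qemu to stop re-injecting missed clock interrupts."""
    return json.dumps({"execute": "rtc-reset-reinjection"})


def resync_commands(domain: str) -> list[list[str]]:
    """Build (but do not execute) the virsh invocations for a manual time resync.

    The ordering is an assumption for illustration: set guest time first,
    then tell qemu the backlog of clock interrupts is no longer needed.
    """
    return [
        ["virsh", "qemu-agent-command", domain, guest_set_time_payload()],
        ["virsh", "qemu-monitor-command", domain, rtc_reset_reinjection_payload()],
    ]


if __name__ == "__main__":
    # Domain name taken from the transcript earlier in this bug.
    for argv in resync_commands("ws_lin_dev"):
        print(" ".join(argv))
```

The argument lists could be handed to subprocess.run on a host with a new enough qemu and libvirt; whether the second step is needed depends on the guest's clock configuration.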