Bug 624570 - [RFC] sync guest time post savevm/loadvm
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: qemu-kvm
Version: 6.0
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: beta
Target Release: ---
Assigned To: Marcelo Tosatti
QA Contact: Virtualization Bugs
Depends On:
Blocks: 580953
Reported: 2010-08-16 22:09 EDT by Shirley Zhou
Modified: 2015-03-04 19:52 EST
Doc Type: Bug Fix
Last Closed: 2011-08-04 16:08:46 EDT
Description Shirley Zhou 2010-08-16 22:09:15 EDT
Description of problem:
Guest time drifts after savevm/loadvm.

Version-Release number of selected component (if applicable):
qemu-kvm-0.12.1.2-2.109.el6.x86_64
kernel-2.6.32-63.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Sync the time on the host:
ntpdate -b clock.redhat.com
2. Run a RHEL6 guest on that host as follows:
 /usr/libexec/qemu-kvm -m 4G -smp 4 -cpu qemu64,+x2apic -usbdevice tablet -drive file=/mnt/rhel6.qcow2,if=none,id=drive-virtio0,boot=on,werror=stop,rerror=stop,cache=none,format=qcow2 -device ide-drive,drive=drive-virtio0,id=virtio-blk-pci0 -netdev tap,id=hostnet0,script=/mnt/qemu-ifup,vhost=on,ifname=virtio_nic_2 -device virtio-net-pci,netdev=hostnet0,mac=52:54:00:cc:7e:f7,bus=pci.0,id=virtio1  -uuid a2341245-8765-1234-95da-1dd0a8891cc4 -name rhel6 -qmp tcp:0:4446,server,nowait   -device virtio-balloon-pci,id=ba1 -monitor stdio -boot c -vnc :1  -no-kvm-pit-reinjection   -rtc base=utc,clock=host,driftfix=slew
3. Sync the time in the guest:
ntpdate -b clock.redhat.com
4. Query the time:
ntpdate -q clock.redhat.com
server 66.187.233.4, stratum 1, offset -0.000696, delay 0.32861
17 Aug 09:56:46 ntpdate[1970]: adjust time server 66.187.233.4 offset -0.000696 sec
5. Run savevm from the monitor, then query the time:
ntpdate -q clock.redhat.com
server 66.187.233.4, stratum 1, offset -0.090793, delay 0.33022
17 Aug 10:04:55 ntpdate[1975]: adjust time server 66.187.233.4 offset -0.090793 sec
6. Run loadvm, then query the time:
ntpdate -q clock.redhat.com
server 66.187.233.4, stratum 1, offset 78.049035, delay 0.32887
17 Aug 10:05:36 ntpdate[1994]: step time server 66.187.233.4 offset 78.049035 sec

Actual results:
After step 6, the guest clock is off by about 78 seconds.

Expected results:
There should not be so much time drift.
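
For anyone re-running the steps, the offsets can be pulled out of the ntpdate -q output with a small helper. This is only a sketch: the function names ntp_offset and ntp_action are hypothetical, and the 0.5 second boundary is the threshold described in the ntpdate man page (smaller errors are slewed, larger ones are stepped), which matches the "adjust" and "step" lines in the logs.

```shell
#!/bin/sh
# Hypothetical helpers for reading `ntpdate -q` output; not part of ntpdate.

# Print the offset (seconds) from the first "server ..." line on stdin.
ntp_offset() {
    awk -F'offset ' '/^server /{ split($2, a, ","); print a[1]; exit }'
}

# Print "step" if |offset| exceeds 0.5 s, otherwise "adjust".
ntp_action() {
    awk -v off="$1" 'BEGIN {
        if (off < 0) off = -off
        if (off > 0.5) print "step"; else print "adjust"
    }'
}

# Usage inside the guest (assumes the same server as in the report):
#   off=$(ntpdate -q clock.redhat.com | ntp_offset)
#   ntp_action "$off"
```

Running the helper before savevm, after savevm, and after loadvm gives three numbers that can be compared directly.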

Additional info:
Comment 2 RHEL Product and Program Management 2010-08-16 22:38:36 EDT
This issue has been proposed when we are only considering blocker
issues in the current Red Hat Enterprise Linux release.

** If you would still like this issue considered for the current
release, ask your support representative to file as a blocker on
your behalf. Otherwise ask that it be considered for the next
Red Hat Enterprise Linux release. **
Comment 3 Dor Laor 2010-11-21 17:33:22 EST
Please try with -rtc base=localtime
Comment 4 Shirley Zhou 2010-11-22 00:03:14 EST
(In reply to comment #3)
> Please try with -rtc base=localtime

I retried with the time device options set to:
-no-kvm-pit-reinjection -rtc base=localtime,clock=host,driftfix=slew

The bug still reproduces:

Guest: RHEL6.0 64 bit

1. Before saving the snapshot:
# ntpdate -q clock.redhat.com
server 66.187.233.4, stratum 1, offset -0.018592, delay 0.35168
21 Nov 23:56:27 ntpdate[2548]: adjust time server 66.187.233.4 offset -0.018592 sec
2. After saving the snapshot:
# ntpdate -q clock.redhat.com
server 66.187.233.4, stratum 1, offset -0.001604, delay 0.31325
21 Nov 23:56:53 ntpdate[2549]: adjust time server 66.187.233.4 offset -0.001604 sec
3. After loading the snapshot:
# ntpdate -q clock.redhat.com
server 66.187.233.4, stratum 1, offset 70.323973, delay 0.31897
21 Nov 23:58:16 ntpdate[2550]: step time server 66.187.233.4 offset 70.323973 sec

From the results above, we can see a time offset of about 70 seconds after loadvm.
Comment 5 Dor Laor 2010-11-22 02:59:35 EST
It might be the time it takes to load the image.
Worth checking.
Comment 6 Zachary Amsden 2010-11-22 16:57:25 EST
I don't find this particularly surprising.

You've just stopped the guest from running by loading a snapshot, and you expect it to have been keeping up with real time while it was not running?

If you stop a guest and restart it, you'll need to resync it with time servers as a manual action.  NTP is not designed to cope with service outages; it works properly only on a continually running machine.  It will absolutely show a huge offset after a loadvm, which corresponds to the time it was not running.

I'm not sure the slew computation will properly compensate for an offset of time like this, which could be a bug, or not, depending on your perspective.  Obviously, it is undesirable to send millions of interrupts to a VM which has been suspended and loaded a few days later, so there are good reasons not to use slew to catch up time lost while the VM was not running.
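
To put a rough number on the interrupt concern, here is a back-of-the-envelope sketch. The 1000 Hz timer rate and the two-day pause are assumptions chosen for illustration (the report does not state the guest tick rate), and missed_ticks is a hypothetical helper name.

```shell
#!/bin/sh
# Ticks that would have to be reinjected to catch up by slewing.
# $1 = pause length in seconds, $2 = timer frequency in Hz (both illustrative).
missed_ticks() {
    echo $(( $1 * $2 ))
}

missed_ticks 78 1000        # the ~78 s offset from the report: prints 78000
missed_ticks 172800 1000    # a two-day pause: prints 172800000
```

Under those assumptions a short pause costs tens of thousands of ticks, while a multi-day pause runs into the hundreds of millions.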
Comment 7 Dor Laor 2010-11-25 05:51:01 EST
Zach, if the -rtc localtime setting overrode the RTC data saved in the VM, it would fix things, assuming that ntp and other apps sync up from the OS (the OS itself would also need to get an interrupt about the RTC change).
Let's move it into RFC and push it to 6.2 or later.
Comment 8 Glauber Costa 2010-11-25 08:26:11 EST
I am in agreement with Dor here.

Upon load, there is a window of opportunity where we can do something; I'm just not sure yet what the best strategy is.
Comment 9 Zachary Amsden 2010-11-29 12:32:35 EST
Best strategy: configure NTP with a 5-second rejection window so that it quits when the offset falls outside that window.

Then configure scripts so that NTP restarts and forces a time sync from the server after it exits.

No engineering changes need to be made; the software support for this should already exist.
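
One way this could look in the guest, as a sketch rather than a tested setup: ntpd's panic threshold can be lowered with "tinker panic 5" in /etc/ntp.conf, so ntpd exits when the offset exceeds 5 seconds, and starting it with -g lets the first correction after a restart be of any size. The wrapper below is hypothetical (the names supervise_ntpd and NTPD_CMD are made up for illustration); on RHEL 6 the init scripts or a cron job could play the same role.

```shell
#!/bin/sh
# Assumed /etc/ntp.conf addition (not shown in the report):
#   tinker panic 5
# Hypothetical supervisor: rerun ntpd each time it exits.
# -n keeps ntpd in the foreground; -g allows one unrestricted correction at start.
NTPD_CMD=${NTPD_CMD:-"ntpd -n -g"}

supervise_ntpd() {
    max=${1:-0}     # number of restarts before giving up; 0 means loop forever
    count=0
    while :; do
        # Blocks until ntpd exits, e.g. because the offset passed the panic threshold.
        if $NTPD_CMD; then status=0; else status=$?; fi
        count=$((count + 1))
        echo "ntpd exited with status $status; restart #$count"
        if [ "$max" -gt 0 ] && [ "$count" -ge "$max" ]; then
            break
        fi
        sleep 1
    done
}

# Usage (as root in the guest): supervise_ntpd
```

After a loadvm, the guest's ntpd would see an offset well beyond 5 seconds, exit, and be restarted with -g, at which point it can set the clock in one correction.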
Comment 11 Marcelo Tosatti 2011-08-04 16:07:08 EDT
I agree with Zach, the current behaviour is correct. Correcting the
guest clock upon resuming is the responsibility of ntp, not
the hypervisor hardware emulation.
Comment 12 Marcelo Tosatti 2011-08-04 16:08:07 EDT
(In reply to comment #6)
> I don't find this particularly surprising.
> 
> You've just stopped the guest from running by loading a snapshot, and you expect
> it to have been keeping up with real time while it was not running?
> 
> If you stop a guest and restart it, you'll need to resync it with time servers
> as a manual action.  NTP is not designed to cope with service outages; it works
> properly only on a continually running machine.  It will absolutely show a huge
> offset after a loadvm, which corresponds to the time it was not running.
> 
> I'm not sure the slew computation will properly compensate for an offset of
> time like this, which could be a bug, or not, depending on your perspective. 

If it does not compensate, then ntp should step the guest clock to
the correct value.

> Obviously, it is undesirable to send millions of interrupts to a VM which has
> been suspended and loaded a few days later, so there are good reasons not to
> use slew to catch up time lost while the VM was not running.
Comment 13 Marcelo Tosatti 2011-08-04 16:08:46 EDT
Closing as NOTABUG.
