Bug 1156194 - [RHEV-M] Time sync after resuming from suspend
Summary: [RHEV-M] Time sync after resuming from suspend
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.4.3
Hardware: Unspecified
OS: Unspecified
high
medium
Target Milestone: ovirt-3.6.2
: 3.6.0
Assignee: Milan Zamazal
QA Contact: Israel Pinto
URL:
Whiteboard:
: 1247912 1270302 (view as bug list)
Depends On: 1115340
Blocks: 1230126 1287620
TreeView+ depends on / blocked
 
Reported: 2014-10-23 18:59 UTC by Marcelo Tosatti
Modified: 2019-08-15 04:02 UTC (History)
25 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, when a virtual machine was resumed from a suspended state its time was not updated, resulting in incorrect time in the guest operating system. This happened because the functionality to update time on resume was missing in VDSM. Now, the functionality has been added and the time is now updated after a virtual machine is resumed from a suspended state. As long as the guest operating system supports this feature and qemu-guest-agent is running. It is still recommended to have NTP services running in the guest operating system to provide precise time for the guest.
Clone Of:
: 1287620 (view as bug list)
Environment:
Last Closed: 2016-03-09 19:26:49 UTC
oVirt Team: Virt
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
engine_log (1.09 MB, application/zip)
2016-01-21 16:36 UTC, Israel Pinto
no flags Details
vdsm_log (666.23 KB, application/zip)
2016-01-21 16:36 UTC, Israel Pinto
no flags Details
vm_qemu_log (3.31 KB, application/zip)
2016-01-21 16:37 UTC, Israel Pinto
no flags Details
libvirt_log (1.31 MB, application/zip)
2016-01-21 16:38 UTC, Israel Pinto
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1510856 0 medium CLOSED [RFE] Time sync in VM after resuming from PAUSE state 2021-12-10 16:00:46 UTC
Red Hat Knowledge Base (Solution) 1244263 0 None None None Never
Red Hat Product Errata RHBA-2016:0362 0 normal SHIPPED_LIVE vdsm 3.6.0 bug fix and enhancement update 2016-03-09 23:49:32 UTC
oVirt gerrit 48860 0 master MERGED virt: vm: Update time on VM after resume 2020-11-11 19:17:36 UTC
oVirt gerrit 49212 0 master ABANDONED virt: vm: Update time on VM after migration 2020-11-11 19:17:36 UTC
oVirt gerrit 50426 0 ovirt-3.6 MERGED virt: vm: Update time on VM after resume 2020-11-11 19:17:36 UTC

Internal Links: 1510856

Description Marcelo Tosatti 2014-10-23 18:59:54 UTC
RHEV-M should execute commands to sync guest time upon guest unpause.

This is RFE for OpenStack:
https://access.redhat.com/support/cases/00991668/

https://bugzilla.redhat.com/show_bug.cgi?id=1049040

Several customers have asked for this to be functional.

Comment 10 Marina Kalinin 2014-10-30 14:11:27 UTC
Marcelo, thanks a lot.
I updated the article: https://access.redhat.com/solutions/1244263

Comment 12 Michal Skrivanek 2014-10-31 12:01:28 UTC
(In reply to Marcelo Tosatti from comment #0)
> RHEV-M should execute commands to sync guest time upon guest unpause.

is there a reason why this would not be "hidden" in libvirt on cont?

Comment 13 Michal Privoznik 2014-10-31 13:44:58 UTC
(In reply to Michal Skrivanek from comment #12)
> (In reply to Marcelo Tosatti from comment #0)
> > RHEV-M should execute commands to sync guest time upon guest unpause.
> 
> is there a reason why this would not be "hidden" in libvirt on cont?

Yes there is. Firstly, the API comes in pair: get & set time. Secondly, libvirt is unable to make such decision whether guest time should be synced when doing domain resume. Libvirt knows nothing about guest time policy, whether it's using NTP or other mechanisms to sync time. It's mgmt app to make such decision.

Comment 14 Michal Skrivanek 2014-10-31 14:38:48 UTC
(In reply to Michal Privoznik from comment #13)
> (In reply to Michal Skrivanek from comment #12)
> > (In reply to Marcelo Tosatti from comment #0)
> > > RHEV-M should execute commands to sync guest time upon guest unpause.
> > 
> > is there a reason why this would not be "hidden" in libvirt on cont?
> 
> Yes there is. Firstly, the API comes in pair: get & set time. Secondly,
> libvirt is unable to make such decision whether guest time should be synced
> when doing domain resume. Libvirt knows nothing about guest time policy,
> whether it's using NTP or other mechanisms to sync time. It's mgmt app to
> make such decision.

Well, RHEV does not have that guest time policy either. There are no plans to add any.

If RHEV had such a mechanism it would have to be an interaction with the guest, i.e. using guest agent. And when you have such a thing there's no reason for anything else in qemu or libvirt, guest agent can just set the time to whatever.

If it's needed to have a decision whether to sync time or not, it needs to be part of the same call. And it doesn't make much sense to me to not make it a default

Comment 15 Michal Skrivanek 2014-11-04 10:06:16 UTC
since the resume API doesn't take any arguments yet we have to make an extra call in RHEV to sync the time.

Comment 19 Michal Skrivanek 2015-08-12 12:27:27 UTC
(In reply to Michal Privoznik from comment #13)
> (In reply to Michal Skrivanek from comment #12)
> > (In reply to Marcelo Tosatti from comment #0)
> > > RHEV-M should execute commands to sync guest time upon guest unpause.
> > 
> > is there a reason why this would not be "hidden" in libvirt on cont?
> 
> Yes there is. Firstly, the API comes in pair: get & set time. Secondly,
> libvirt is unable to make such decision whether guest time should be synced
> when doing domain resume. Libvirt knows nothing about guest time policy,
> whether it's using NTP or other mechanisms to sync time. It's mgmt app to
> make such decision.

I can't imagine a situation when someone would *not* want to reset the time to somethign close to reality. I think it makes sense to make it a libvirt default behavior. Scott?

Comment 20 Michal Privoznik 2015-08-12 13:50:36 UTC
(In reply to Michal Skrivanek from comment #19)
> (In reply to Michal Privoznik from comment #13)
> > (In reply to Michal Skrivanek from comment #12)
> > > (In reply to Marcelo Tosatti from comment #0)
> > > > RHEV-M should execute commands to sync guest time upon guest unpause.
> > > 
> > > is there a reason why this would not be "hidden" in libvirt on cont?
> > 
> > Yes there is. Firstly, the API comes in pair: get & set time. Secondly,
> > libvirt is unable to make such decision whether guest time should be synced
> > when doing domain resume. Libvirt knows nothing about guest time policy,
> > whether it's using NTP or other mechanisms to sync time. It's mgmt app to
> > make such decision.
> 
> I can't imagine a situation when someone would *not* want to reset the time
> to somethign close to reality. I think it makes sense to make it a libvirt
> default behavior. Scott?

Well, for instance if you have NTP enabled guest (which you want to maintain continuous time) and the time that domain was suspended was just a short while, e.g. couple of seconds. And what is still a short while and what is not - that's for mgmt app to decide, not libvirt.

Comment 21 Marcelo Tosatti 2015-08-12 21:28:38 UTC
(In reply to Michal Privoznik from comment #20)
> (In reply to Michal Skrivanek from comment #19)
> > (In reply to Michal Privoznik from comment #13)
> > > (In reply to Michal Skrivanek from comment #12)
> > > > (In reply to Marcelo Tosatti from comment #0)
> > > > > RHEV-M should execute commands to sync guest time upon guest unpause.
> > > > 
> > > > is there a reason why this would not be "hidden" in libvirt on cont?
> > > 
> > > Yes there is. Firstly, the API comes in pair: get & set time. Secondly,
> > > libvirt is unable to make such decision whether guest time should be synced
> > > when doing domain resume. Libvirt knows nothing about guest time policy,
> > > whether it's using NTP or other mechanisms to sync time. It's mgmt app to
> > > make such decision.
> > 
> > I can't imagine a situation when someone would *not* want to reset the time
> > to somethign close to reality. I think it makes sense to make it a libvirt
> > default behavior. Scott?
> 
> Well, for instance if you have NTP enabled guest (which you want to maintain
> continuous time) and the time that domain was suspended was just a short
> while, e.g. couple of seconds. And what is still a short while and what is
> not - that's for mgmt app to decide, not libvirt.

Michal,

Sorry, can't see why this example is a valid one. Please be more verbose?

Can't see why it should not be a libvirt default (the alternative is to expect 
all mgmt software to have this problem handled properly while users suffer 
from the bug).

Comment 22 Michal Privoznik 2015-08-13 08:00:16 UTC
Well, maybe that was not the best example. The other reason that libvirt should not do this automatically is that it would mask two actions into a single API. What should the API return if resuming succeeded but time reset didn't? It can't return success, it partially failed. So it will return failure. But the domain transitioned into a different state. So client needs to call yet another API to check the domain state. And since client ends up calling two APIs anyway, I rather not join two actions into a single one.

Comment 23 Michal Skrivanek 2015-08-13 08:09:25 UTC
(In reply to Michal Privoznik from comment #22)
> Well, maybe that was not the best example. The other reason that libvirt
> should not do this automatically is that it would mask two actions into a
> single API. What should the API return if resuming succeeded but time reset
> didn't? It can't return success, it partially failed. So it will return
> failure. But the domain transitioned into a different state. So client needs
> to call yet another API to check the domain state. And since client ends up
> calling two APIs anyway, I rather not join two actions into a single one.

I can give you a counterexample:) During migration in final phase when everything is finished the VM is resumed on destination from paused state. You can also view it as two independent actions...if resume is the only thing failing at the very end....

(restoring needinfo in comment #19)

Comment 24 Michal Privoznik 2015-08-13 09:06:22 UTC
(In reply to Michal Skrivanek from comment #23)
> (In reply to Michal Privoznik from comment #22)
>>

> I can give you a counterexample:) During migration in final phase when
> everything is finished the VM is resumed on destination from paused state.
> You can also view it as two independent actions...if resume is the only
> thing failing at the very end....

That's something slightly different. Either you have a live migration, where domain vCPUs are running and you don't care whether domain got some extra cpu time, or you are doing paused migration in which case the whole operation is atomic too.
This in fact is the reason I don't like the idea - we can't guarantee atomicity of two different operations, therefore they should not be hidden under the same API.

To extend my example, if libvirt would detect that partially failed state, during rollback it would need to suspend the domain again, so that domain state is the same as prior to calling the api. However, it would never be, because while libvirt is trying to reset the time, domain already is being scheduled and got a host cpu time.

> 
> (restoring needinfo in comment #19)

Ah, sorry for that.

Comment 25 Marcelo Tosatti 2015-08-14 11:14:45 UTC
(In reply to Michal Privoznik from comment #22)
> Well, maybe that was not the best example. The other reason that libvirt
> should not do this automatically is that it would mask two actions into a
> single API. What should the API return if resuming succeeded but time reset
> didn't? It can't return success, it partially failed. 

Success.

> So it will return failure. But the domain transitioned into a different state. 

How do you expect management to handle the failure?

> So client needs
> to call yet another API to check the domain state. And since client ends up
> calling two APIs anyway, I rather not join two actions into a single one.

I don't see a problem with that proposal.

Comment 26 Marcelo Tosatti 2015-08-14 11:24:51 UTC
(In reply to Marcelo Tosatti from comment #25)
> (In reply to Michal Privoznik from comment #22)
> > Well, maybe that was not the best example. The other reason that libvirt
> > should not do this automatically is that it would mask two actions into a
> > single API. What should the API return if resuming succeeded but time reset
> > didn't? It can't return success, it partially failed. 
> 
> Success.
> 
> > So it will return failure. But the domain transitioned into a different state. 
> 
> How do you expect management to handle the failure?
> 
> > So client needs
> > to call yet another API to check the domain state. And since client ends up
> > calling two APIs anyway, I rather not join two actions into a single one.
> 
> I don't see a problem with that proposal.

So the commands would be, from the mgmt perspective:

* current management sw:

1) resume
2) check for resume errors

outcomes:
* benefit: guest clock is synchronized if guest agent is setup properly.
* downside: none.

* clock sync aware management sw:
1) resume
2) check for resume errors
3) query clock sync initiated by resume
4) take action on 3.

Note that there is no strict need for atomicity because error checking 
for clock sync usually involves reporting (resume operation did not fail).

outcomes:
* benefit: guest clock is synchronized if guest agent is setup properly.
* downside: none.

Comment 28 Michal Privoznik 2015-10-12 11:32:34 UTC
(In reply to Marcelo Tosatti from comment #26)
> (In reply to Marcelo Tosatti from comment #25)
> > (In reply to Michal Privoznik from comment #22)
> > > Well, maybe that was not the best example. The other reason that libvirt
> > > should not do this automatically is that it would mask two actions into a
> > > single API. What should the API return if resuming succeeded but time reset
> > > didn't? It can't return success, it partially failed. 
> > 
> > Success.
> > 
> > > So it will return failure. But the domain transitioned into a different state. 
> > 
> > How do you expect management to handle the failure?
> > 
> > > So client needs
> > > to call yet another API to check the domain state. And since client ends up
> > > calling two APIs anyway, I rather not join two actions into a single one.
> > 
> > I don't see a problem with that proposal.
> 
> So the commands would be, from the mgmt perspective:
> 
> * current management sw:
> 
> 1) resume
> 2) check for resume errors

This ^^ will end up in two different API calls anyway. If the joint resume API returns an error, you have to call another API to see if domain is resumed and thus it's time sync what failed.

> 
> outcomes:
> * benefit: guest clock is synchronized if guest agent is setup properly.
> * downside: none.
> 
> * clock sync aware management sw:
> 1) resume
> 2) check for resume errors
> 3) query clock sync initiated by resume
> 4) take action on 3.
> 
> Note that there is no strict need for atomicity because error checking 
> for clock sync usually involves reporting (resume operation did not fail).
> 
> outcomes:
> * benefit: guest clock is synchronized if guest agent is setup properly.
> * downside: none.

So when there's not downside for any of the approaches, I think we should not wrap two semantically different operations into single API.

Comment 29 Michal Privoznik 2015-10-13 11:14:53 UTC
I've proposed an RFC upstream:

https://www.redhat.com/archives/libvir-list/2015-October/msg00382.html

Lets see what upstream thinks.

BTW: I'd appreciate any input from vdsm/RHEV developer here, since it's two low level developers discussing something here that upper layers may not even want.

Comment 30 Michal Skrivanek 2015-10-14 06:07:30 UTC
it looks settled now. So I'll just note my disagreement here below and we'll use the API in RHEV:)
I consider it an elementary functionality, one that you can reasonably expect any sane management client would want to do. As such an elementary management library/API should take care of such operations. 
It's not a big burden to push it up the stack, but I believe in the long run it would eliminate the extra work needed in higher management apps, eliminate repeating code flow patterns and issues

Comment 31 Moran Goldboim 2015-10-20 09:58:30 UTC
*** Bug 1247912 has been marked as a duplicate of this bug. ***

Comment 32 Marcelo Tosatti 2015-10-25 19:54:40 UTC
(In reply to Michal Skrivanek from comment #30)
> it looks settled now. So I'll just note my disagreement here below and we'll
> use the API in RHEV:)
> I consider it an elementary functionality, one that you can reasonably
> expect any sane management client would want to do. As such an elementary
> management library/API should take care of such operations. 
> It's not a big burden to push it up the stack, but I believe in the long run
> it would eliminate the extra work needed in higher management apps,
> eliminate repeating code flow patterns and issues

Agreed.

Comment 33 Michal Privoznik 2015-10-26 08:38:32 UTC
(In reply to Michal Skrivanek from comment #30)
> it looks settled now. So I'll just note my disagreement here below and we'll
> use the API in RHEV:)
> I consider it an elementary functionality, one that you can reasonably
> expect any sane management client would want to do. As such an elementary
> management library/API should take care of such operations. 
> It's not a big burden to push it up the stack, but I believe in the long run
> it would eliminate the extra work needed in higher management apps,
> eliminate repeating code flow patterns and issues

In fact, it would not. Either you will have:

if (virDomainResumeFlags($dom, SYNC_TIME) < 0) {
  if (virDomainIsActive())
    error_syncing_time;
  else
    error_resuming_domain;
}

you will need to:

if (virDomainResume($dom) < 0)
  error_resuming_domain;
if (virDomainSetTime($dom) < 0)
  error_syncing_time;


OR, in case as Marcelo suggested, return success even if syncing time failed (best effort):

if (virDomainResumeFlags($dom, SYNC_TIME) < 0)
  error_resuming_domain;

vs:

if (virDomainResume($dom) < 0)
  error_resuming_domain;
virDomainSetTime($dom);

At any rate - change in mgmt app code is required regardless if we wrap this into a single API because current resume API is missing flags. The only advantage is in case #3 where you save a single LOC.

Comment 36 Michal Skrivanek 2015-12-02 13:26:31 UTC
*** Bug 1270302 has been marked as a duplicate of this bug. ***

Comment 37 Israel Pinto 2016-01-21 16:31:25 UTC
Verify with 

Scenario:
Create VM with OS and start it, connect to VM and check time with date command
suspend VM for 5 minutes, connect VM and check time with date command.
Guest with RHEL 7.2 OS - check pass  
Guest with RHEL 6.7 OS - check failed - The time did not update to current time after resume VM. VM name vm_67 ID: 72021d20-3530-4a56-b98c-411d09dea9c6
See logs attached.

Comment 38 Israel Pinto 2016-01-21 16:35:16 UTC
(In reply to Israel Pinto from comment #37)
> Verify with 
> 
> Scenario:
> Create VM with OS and start it, connect to VM and check time with date
> command
> suspend VM for 5 minutes, connect VM and check time with date command.
> Guest with RHEL 7.2 OS - check pass  
> Guest with RHEL 6.7 OS - check failed - The time did not update to current
> time after resume VM. VM name vm_67 ID: 72021d20-3530-4a56-b98c-411d09dea9c6
> See logs attached.

Verify with: 
RHEVM Version: 3.6.2.6-0.1.el6
VDSM: vdsm-4.17.15-0.el7ev
libvirt:  libvirt-1.2.17-13.el7_2.2

Comment 39 Israel Pinto 2016-01-21 16:36:13 UTC
Created attachment 1116973 [details]
engine_log

Comment 40 Israel Pinto 2016-01-21 16:36:50 UTC
Created attachment 1116974 [details]
vdsm_log

Comment 41 Israel Pinto 2016-01-21 16:37:34 UTC
Created attachment 1116975 [details]
vm_qemu_log

Comment 42 Israel Pinto 2016-01-21 16:38:24 UTC
Created attachment 1116976 [details]
libvirt_log

Comment 43 Milan Zamazal 2016-01-22 09:51:26 UTC
According to the libvirt log, setting time was attempted by libvirt but failed.

What's the version of qemu-guest-agent on the RHEL6 guest? It seems the set-time capability wasn't introduced before 1.5 so if the version is older then setting time by QEMU is unsupported in the guest OS.

Comment 44 Israel Pinto 2016-01-24 07:56:30 UTC
I install qemu-guest-agent, and it works. 
Verify with: 
RHEVM Version: 3.6.2.6-0.1.el6
VDSM: vdsm-4.17.15-0.el7ev
libvirt:  libvirt-1.2.17-13.el7_2.2

qemu version on guest 6.7:
qemu-img-rhev-0.12.1.2-2.479.el6_7.3.x86_64
gpxe-roms-qemu-0.9.7-6.14.el6.noarch
qemu-guest-agent-0.12.1.2-2.479.el6_7.2.x86_64
qemu-kvm-rhev-0.12.1.2-2.479.el6_7.3.x86_64

Comment 46 errata-xmlrpc 2016-03-09 19:26:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0362.html


Note You need to log in before you can comment on or make changes to this bug.