RHEV-M should execute commands to sync guest time upon guest unpause. This is RFE for OpenStack: https://access.redhat.com/support/cases/00991668/ https://bugzilla.redhat.com/show_bug.cgi?id=1049040 Several customers have asked for this to be functional.
Marcelo, thanks a lot. I updated the article: https://access.redhat.com/solutions/1244263
(In reply to Marcelo Tosatti from comment #0) > RHEV-M should execute commands to sync guest time upon guest unpause. is there a reason why this would not be "hidden" in libvirt on cont?
(In reply to Michal Skrivanek from comment #12) > (In reply to Marcelo Tosatti from comment #0) > > RHEV-M should execute commands to sync guest time upon guest unpause. > > is there a reason why this would not be "hidden" in libvirt on cont? Yes there is. Firstly, the API comes in pair: get & set time. Secondly, libvirt is unable to make such decision whether guest time should be synced when doing domain resume. Libvirt knows nothing about guest time policy, whether it's using NTP or other mechanisms to sync time. It's mgmt app to make such decision.
(In reply to Michal Privoznik from comment #13) > (In reply to Michal Skrivanek from comment #12) > > (In reply to Marcelo Tosatti from comment #0) > > > RHEV-M should execute commands to sync guest time upon guest unpause. > > > > is there a reason why this would not be "hidden" in libvirt on cont? > > Yes there is. Firstly, the API comes in pair: get & set time. Secondly, > libvirt is unable to make such decision whether guest time should be synced > when doing domain resume. Libvirt knows nothing about guest time policy, > whether it's using NTP or other mechanisms to sync time. It's mgmt app to > make such decision. Well, RHEV does not have that guest time policy either. There are no plans to add any. If RHEV had such a mechanism it would have to be an interaction with the guest, i.e. using guest agent. And when you have such a thing there's no reason for anything else in qemu or libvirt, guest agent can just set the time to whatever. If it's needed to have a decision whether to sync time or not, it needs to be part of the same call. And it doesn't make much sense to me to not make it a default
since the resume API doesn't take any arguments yet we have to make an extra call in RHEV to sync the time.
(In reply to Michal Privoznik from comment #13) > (In reply to Michal Skrivanek from comment #12) > > (In reply to Marcelo Tosatti from comment #0) > > > RHEV-M should execute commands to sync guest time upon guest unpause. > > > > is there a reason why this would not be "hidden" in libvirt on cont? > > Yes there is. Firstly, the API comes in pair: get & set time. Secondly, > libvirt is unable to make such decision whether guest time should be synced > when doing domain resume. Libvirt knows nothing about guest time policy, > whether it's using NTP or other mechanisms to sync time. It's mgmt app to > make such decision. I can't imagine a situation when someone would *not* want to reset the time to somethign close to reality. I think it makes sense to make it a libvirt default behavior. Scott?
(In reply to Michal Skrivanek from comment #19) > (In reply to Michal Privoznik from comment #13) > > (In reply to Michal Skrivanek from comment #12) > > > (In reply to Marcelo Tosatti from comment #0) > > > > RHEV-M should execute commands to sync guest time upon guest unpause. > > > > > > is there a reason why this would not be "hidden" in libvirt on cont? > > > > Yes there is. Firstly, the API comes in pair: get & set time. Secondly, > > libvirt is unable to make such decision whether guest time should be synced > > when doing domain resume. Libvirt knows nothing about guest time policy, > > whether it's using NTP or other mechanisms to sync time. It's mgmt app to > > make such decision. > > I can't imagine a situation when someone would *not* want to reset the time > to somethign close to reality. I think it makes sense to make it a libvirt > default behavior. Scott? Well, for instance if you have NTP enabled guest (which you want to maintain continuous time) and the time that domain was suspended was just a short while, e.g. couple of seconds. And what is still a short while and what is not - that's for mgmt app to decide, not libvirt.
(In reply to Michal Privoznik from comment #20) > (In reply to Michal Skrivanek from comment #19) > > (In reply to Michal Privoznik from comment #13) > > > (In reply to Michal Skrivanek from comment #12) > > > > (In reply to Marcelo Tosatti from comment #0) > > > > > RHEV-M should execute commands to sync guest time upon guest unpause. > > > > > > > > is there a reason why this would not be "hidden" in libvirt on cont? > > > > > > Yes there is. Firstly, the API comes in pair: get & set time. Secondly, > > > libvirt is unable to make such decision whether guest time should be synced > > > when doing domain resume. Libvirt knows nothing about guest time policy, > > > whether it's using NTP or other mechanisms to sync time. It's mgmt app to > > > make such decision. > > > > I can't imagine a situation when someone would *not* want to reset the time > > to somethign close to reality. I think it makes sense to make it a libvirt > > default behavior. Scott? > > Well, for instance if you have NTP enabled guest (which you want to maintain > continuous time) and the time that domain was suspended was just a short > while, e.g. couple of seconds. And what is still a short while and what is > not - that's for mgmt app to decide, not libvirt. Michal, Sorry, can't see why this example is a valid one. Please be more verbose? Can't see why it should not be a libvirt default (the alternative is to expect all mgmt software to have this problem handled properly while users suffer from the bug).
Well, maybe that was not the best example. The other reason that libvirt should not do this automatically is that it would mask two actions into a single API. What should the API return if resuming succeeded but time reset didn't? It can't return success, it partially failed. So it will return failure. But the domain transitioned into a different state. So client needs to call yet another API to check the domain state. And since client ends up calling two APIs anyway, I rather not join two actions into a single one.
(In reply to Michal Privoznik from comment #22) > Well, maybe that was not the best example. The other reason that libvirt > should not do this automatically is that it would mask two actions into a > single API. What should the API return if resuming succeeded but time reset > didn't? It can't return success, it partially failed. So it will return > failure. But the domain transitioned into a different state. So client needs > to call yet another API to check the domain state. And since client ends up > calling two APIs anyway, I rather not join two actions into a single one. I can give you a counterexample:) During migration in final phase when everything is finished the VM is resumed on destination from paused state. You can also view it as two independent actions...if resume is the only thing failing at the very end.... (restoring needinfo in comment #19)
(In reply to Michal Skrivanek from comment #23) > (In reply to Michal Privoznik from comment #22) >> > I can give you a counterexample:) During migration in final phase when > everything is finished the VM is resumed on destination from paused state. > You can also view it as two independent actions...if resume is the only > thing failing at the very end.... That's something slightly different. Either you have a live migration, where domain vCPUs are running and you don't care whether domain got some extra cpu time, or you are doing paused migration in which case the whole operation is atomic too. This in fact is the reason I don't like the idea - we can't guarantee atomicity of two different operations, therefore they should not be hidden under the same API. To extend my example, if libvirt would detect that partially failed state, during rollback it would need to suspend the domain again, so that domain state is the same as prior to calling the api. However, it would never be, because while libvirt is trying to reset the time, domain already is being scheduled and got a host cpu time. > > (restoring needinfo in comment #19) Ah, sorry for that.
(In reply to Michal Privoznik from comment #22) > Well, maybe that was not the best example. The other reason that libvirt > should not do this automatically is that it would mask two actions into a > single API. What should the API return if resuming succeeded but time reset > didn't? It can't return success, it partially failed. Success. > So it will return failure. But the domain transitioned into a different state. How do you expect management to handle the failure? > So client needs > to call yet another API to check the domain state. And since client ends up > calling two APIs anyway, I rather not join two actions into a single one. I don't see a problem with that proposal.
(In reply to Marcelo Tosatti from comment #25) > (In reply to Michal Privoznik from comment #22) > > Well, maybe that was not the best example. The other reason that libvirt > > should not do this automatically is that it would mask two actions into a > > single API. What should the API return if resuming succeeded but time reset > > didn't? It can't return success, it partially failed. > > Success. > > > So it will return failure. But the domain transitioned into a different state. > > How do you expect management to handle the failure? > > > So client needs > > to call yet another API to check the domain state. And since client ends up > > calling two APIs anyway, I rather not join two actions into a single one. > > I don't see a problem with that proposal. So the commands would be, from the mgmt perspective: * current management sw: 1) resume 2) check for resume errors outcomes: * benefit: guest clock is synchronized if guest agent is setup properly. * downside: none. * clock sync aware management sw: 1) resume 2) check for resume errors 3) query clock sync initiated by resume 4) take action on 3. Note that there is no strict need for atomicity because error checking for clock sync usually involves reporting (resume operation did not fail). outcomes: * benefit: guest clock is synchronized if guest agent is setup properly. * downside: none.
(In reply to Marcelo Tosatti from comment #26) > (In reply to Marcelo Tosatti from comment #25) > > (In reply to Michal Privoznik from comment #22) > > > Well, maybe that was not the best example. The other reason that libvirt > > > should not do this automatically is that it would mask two actions into a > > > single API. What should the API return if resuming succeeded but time reset > > > didn't? It can't return success, it partially failed. > > > > Success. > > > > > So it will return failure. But the domain transitioned into a different state. > > > > How do you expect management to handle the failure? > > > > > So client needs > > > to call yet another API to check the domain state. And since client ends up > > > calling two APIs anyway, I rather not join two actions into a single one. > > > > I don't see a problem with that proposal. > > So the commands would be, from the mgmt perspective: > > * current management sw: > > 1) resume > 2) check for resume errors This ^^ will end up in two different API calls anyway. If the joint resume API returns an error, you have to call another API to see if domain is resumed and thus it's time sync what failed. > > outcomes: > * benefit: guest clock is synchronized if guest agent is setup properly. > * downside: none. > > * clock sync aware management sw: > 1) resume > 2) check for resume errors > 3) query clock sync initiated by resume > 4) take action on 3. > > Note that there is no strict need for atomicity because error checking > for clock sync usually involves reporting (resume operation did not fail). > > outcomes: > * benefit: guest clock is synchronized if guest agent is setup properly. > * downside: none. So when there's not downside for any of the approaches, I think we should not wrap two semantically different operations into single API.
I've proposed an RFC upstream: https://www.redhat.com/archives/libvir-list/2015-October/msg00382.html Lets see what upstream thinks. BTW: I'd appreciate any input from vdsm/RHEV developer here, since it's two low level developers discussing something here that upper layers may not even want.
it looks settled now. So I'll just note my disagreement here below and we'll use the API in RHEV:) I consider it an elementary functionality, one that you can reasonably expect any sane management client would want to do. As such an elementary management library/API should take care of such operations. It's not a big burden to push it up the stack, but I believe in the long run it would eliminate the extra work needed in higher management apps, eliminate repeating code flow patterns and issues
*** Bug 1247912 has been marked as a duplicate of this bug. ***
(In reply to Michal Skrivanek from comment #30) > it looks settled now. So I'll just note my disagreement here below and we'll > use the API in RHEV:) > I consider it an elementary functionality, one that you can reasonably > expect any sane management client would want to do. As such an elementary > management library/API should take care of such operations. > It's not a big burden to push it up the stack, but I believe in the long run > it would eliminate the extra work needed in higher management apps, > eliminate repeating code flow patterns and issues Agreed.
(In reply to Michal Skrivanek from comment #30) > it looks settled now. So I'll just note my disagreement here below and we'll > use the API in RHEV:) > I consider it an elementary functionality, one that you can reasonably > expect any sane management client would want to do. As such an elementary > management library/API should take care of such operations. > It's not a big burden to push it up the stack, but I believe in the long run > it would eliminate the extra work needed in higher management apps, > eliminate repeating code flow patterns and issues In fact, it would not. Either you will have: if (virDomainResumeFlags($dom, SYNC_TIME) < 0) { if (virDomainIsActive()) error_syncing_time; else error_resuming_domain; } you will need to: if (virDomainResume($dom) < 0) error_resuming_domain; if (virDomainSetTime($dom) < 0) error_syncing_time; OR, in case as Marcelo suggested, return success even if syncing time failed (best effort): if (virDomainResumeFlags($dom, SYNC_TIME) < 0) error_resuming_domain; vs: if (virDomainResume($dom) < 0) error_resuming_domain; virDomainSetTime($dom); At any rate - change in mgmt app code is required regardless if we wrap this into a single API because current resume API is missing flags. The only advantage is in case #3 where you save a single LOC.
*** Bug 1270302 has been marked as a duplicate of this bug. ***
Verify with Scenario: Create VM with OS and start it, connect to VM and check time with date command suspend VM for 5 minutes, connect VM and check time with date command. Guest with RHEL 7.2 OS - check pass Guest with RHEL 6.7 OS - check failed - The time did not update to current time after resume VM. VM name vm_67 ID: 72021d20-3530-4a56-b98c-411d09dea9c6 See logs attached.
(In reply to Israel Pinto from comment #37) > Verify with > > Scenario: > Create VM with OS and start it, connect to VM and check time with date > command > suspend VM for 5 minutes, connect VM and check time with date command. > Guest with RHEL 7.2 OS - check pass > Guest with RHEL 6.7 OS - check failed - The time did not update to current > time after resume VM. VM name vm_67 ID: 72021d20-3530-4a56-b98c-411d09dea9c6 > See logs attached. Verify with: RHEVM Version: 3.6.2.6-0.1.el6 VDSM: vdsm-4.17.15-0.el7ev libvirt: libvirt-1.2.17-13.el7_2.2
Created attachment 1116973 [details] engine_log
Created attachment 1116974 [details] vdsm_log
Created attachment 1116975 [details] vm_qemu_log
Created attachment 1116976 [details] libvirt_log
According to the libvirt log, setting time was attempted by libvirt but failed. What's the version of qemu-guest-agent on the RHEL6 guest? It seems the set-time capability wasn't introduced before 1.5 so if the version is older then setting time by QEMU is unsupported in the guest OS.
I install qemu-guest-agent, and it works. Verify with: RHEVM Version: 3.6.2.6-0.1.el6 VDSM: vdsm-4.17.15-0.el7ev libvirt: libvirt-1.2.17-13.el7_2.2 qemu version on guest 6.7: qemu-img-rhev-0.12.1.2-2.479.el6_7.3.x86_64 gpxe-roms-qemu-0.9.7-6.14.el6.noarch qemu-guest-agent-0.12.1.2-2.479.el6_7.2.x86_64 qemu-kvm-rhev-0.12.1.2-2.479.el6_7.3.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0362.html