libvirt provides "expected downtime" as part of its job statistics; would that be enough? It's not the exact number, though.
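For reference, this is roughly how those job statistics can be read with the libvirt-python bindings. A minimal sketch only - the VM name is a placeholder and the exact keys in the returned dict (e.g. 'downtime', in milliseconds) depend on the libvirt/qemu version:

import libvirt

conn = libvirt.open('qemu:///system')
dom = conn.lookupByName('vm_name')  # placeholder VM name

# While a migration job is running: current statistics, including the
# expected downtime (if the qemu driver reports it).
stats = dom.jobStats()
print(stats.get('downtime'))

# Right after the migration finishes, the statistics of the completed
# job can be queried as well (where supported).
completed = dom.jobStats(libvirt.VIR_DOMAIN_JOB_STATS_COMPLETED)
print(completed.get('downtime'))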
(In reply to Michal Skrivanek from comment #1)
> libvirt provides "expected downtime" as part of its job statistics; would
> that be enough? It's not the exact number, though.

No, this request is for the webadmin portal to report in the Events section the downtime incurred by each live migration.
(In reply to Julio Entrena Perez from comment #4)
> (In reply to Michal Skrivanek from comment #1)
> > libvirt provides "expected downtime" as part of its job statistics; would
> > that be enough? It's not the exact number, though.
>
> No, this request is for the webadmin portal to report in the Events section
> the downtime incurred by each live migration.

The need to see that in the portal is understood. Correlating timestamps from the source and destination hosts would be difficult. We poll for task status periodically, so we can use the last polled value as a very close estimate (rough sketch below); in most cases this should correspond to the real downtime.

Another possibility is to report it afterwards. If we do that in RHEV-M it may still be misleading if the source and destination host clocks differ. IMHO libvirt/qemu should provide such a value if it needs to be really exact.
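To illustrate the "use the last polled value" idea (a sketch under the same libvirt-python assumption as above, not the actual VDSM code): keep polling the migration job statistics, remember the last reported downtime value, and return it once the job is gone.

import time
import libvirt

def poll_downtime_estimate(dom, interval=1.0):
    # Keep the last 'downtime' value reported while a migration job is
    # active; once the job (or the domain on this host) is gone, return it.
    last_downtime = None
    try:
        while dom.jobInfo()[0] != libvirt.VIR_DOMAIN_JOB_NONE:
            stats = dom.jobStats()
            if 'downtime' in stats:
                last_downtime = stats['downtime']
            time.sleep(interval)
    except libvirt.libvirtError:
        pass  # the domain has already moved away from this host
    return last_downtime  # milliseconds, or None if never reported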
(In reply to Michal Skrivanek from comment #5)
> Another possibility is to report it afterwards.

That's indeed what the customer expects: downtime reported after live migration completion.

Currently the RHEV-M webadmin portal reports the following in the Events section after a successful live migration:

Migration complete (VM: vm_name, Source Host: host_name)

They expect to see:

Migration complete (VM: vm_name, Source Host: host_name, Downtime xxx ms)
posted at: http://gerrit.ovirt.org/#/c/16399
(In reply to Shahar Havivi from comment #7)
> posted at: http://gerrit.ovirt.org/#/c/16399

Is this measuring the time elapsed between the VM being suspended on the source host and the VM being resumed on the destination host? The proposed patch seems to measure the duration of the entire live migration.

This request is to report the *downtime* experienced by the VM during the live migration, that is, the amount of time the VM is not running on either host; in other words, the time between the "Suspended" event on the source host and the "Resumed" event on the destination host.
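Put differently (a purely illustrative sketch of the value being requested, not the engine implementation; the event timestamps and any skew correction are assumptions):

from datetime import datetime, timedelta

def migration_downtime(suspended_on_source, resumed_on_destination,
                       clock_skew=timedelta(0)):
    # Downtime = time between the "Suspended" event on the source host and
    # the "Resumed" event on the destination host, corrected for any known
    # clock skew between the two hosts.
    return (resumed_on_destination - suspended_on_source) - clock_skew

# Example: paused on the source at 12:00:00.000, resumed on the destination
# at 12:00:00.250 -> 250 ms of downtime (assuming synchronized clocks).
downtime = migration_downtime(datetime(2016, 1, 1, 12, 0, 0, 0),
                              datetime(2016, 1, 1, 12, 0, 0, 250000))
print(int(downtime.total_seconds() * 1000), "ms")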
(In reply to Julio Entrena Perez from comment #8)
You are right, there will be a different patch for this bug.

This patch can still be merged because it gives the user additional information about how long the migration took.
(In reply to Shahar Havivi from comment #9)
> (In reply to Julio Entrena Perez from comment #8)
> You are right, there will be a different patch for this bug.

Thanks for clarifying this.

> This patch can still be merged because it gives the user additional
> information about how long the migration took.

Thanks Shahar, the customer would welcome RHEV-M reporting the duration of the entire live migration too, in addition to the downtime incurred during it.
Scott, is this scoped for 3.5?
One more thing - we should ensure the hosts' clocks are in sync. Currently we alert when the drift is 300s; that's too much, we need something like 100ms...
Setting the Release Note flag since we must mention the change of the time drift tolerance from 5 minutes to 100ms.
(In reply to Michal Skrivanek from comment #17)
> Setting the Release Note flag since we must mention the change of the time
> drift tolerance from 5 minutes to 100ms.

Maybe also worth noting that, accordingly, the name of the configuration option changed from HostTimeDriftInSec to HostTimeDriftInMS (when using engine-config).
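Not the engine's actual check, just a sketch of what the tolerance controlled by HostTimeDriftInMS amounts to (the millisecond timestamps are assumptions):

def host_time_drift_ok(engine_time_ms, host_time_ms, max_drift_ms=100):
    # Flag the host if its clock differs from the engine's by more than
    # the configured tolerance (100 ms instead of the old 300 s).
    return abs(engine_time_ms - host_time_ms) <= max_drift_ms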
See the enhancement in libvirt reporting (bug 1213434); it should provide more accurate numbers.
ovirt-3.6.0-3 release
Verified with:

Setup:
RHEVM version: 3.6.1.2-0.1.el6
vdsm: vdsm-4.17.13-1.el7ev
libvirt: libvirt-1.2.17-13.el7_2.2

Test cases run according to the Polarion test case.

Results: PASS
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-0376.html