Bug 970711
Summary: [RFE] Report downtime for each live migration

Product: Red Hat Enterprise Virtualization Manager
Component: RFEs
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Version: 3.1.4
Hardware: All
OS: Linux
Target Milestone: ovirt-3.6.0-rc
Target Release: 3.6.0
Reporter: Julio Entrena Perez <jentrena>
Assignee: Shahar Havivi <shavivi>
QA Contact: Israel Pinto <ipinto>
CC: iheim, ipinto, istein, jentrena, lpeer, michal.skrivanek, mtessun, nbarcet, pdwyer, rbalakri, shavivi, sherold
Keywords: FutureFeature, Improvement
Flags: istein: needinfo+; sherold: Triaged+
Doc Type: Release Note
Doc Text:
With this release, the downtime during a virtual machine migration is reported. This is the duration of the handover needed to transfer execution from the source host to the destination host (the last phase of migration).
Note: as part of this enhancement, stricter clock synchronization is enforced between the Manager and hosts. Previously, an alert was raised when a host's clock was 5 minutes off the Manager's time; the threshold is now 100 ms. Accurate downtime reporting requires the source and destination hosts to have the same clock time. This may cause many new alerts in environments that are not configured properly. The configuration option (used in engine-config) has changed from 'HostTimeDriftInSec' to 'HostTimeDriftInMS'.
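The renamed drift option can be read and adjusted with the engine-config tool. A minimal sketch, assuming a standard RHEV-M/oVirt Manager installation (exact output and service name may vary by version):

```shell
# Show the currently allowed clock drift between the Manager and hosts (ms).
engine-config -g HostTimeDriftInMS

# Set the threshold to 100 ms, then restart the engine service
# so the new value takes effect.
engine-config -s HostTimeDriftInMS=100
systemctl restart ovirt-engine
```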
Story Points: ---
Last Closed: 2016-03-09 20:31:29 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: Virt
Cloudforms Team: ---
Bug Depends On: 1063486, 1063724, 1138570, 1162588, 1208772, 1213434
Comment 1 — Michal Skrivanek — 2013-06-12 05:23:19 UTC
(In reply to Michal Skrivanek from comment #1)
> libvirt provides "expected downtime" as part of job statistics, would that be enough? It's not the exact number though.

No, this request is for the webadmin portal to report in the Events section the downtime incurred by each live migration.

(In reply to Julio Entrena Perez from comment #4)
> (In reply to Michal Skrivanek from comment #1)
> > libvirt provides "expected downtime" as part of job statistics, would that be enough? It's not the exact number though.
>
> No, this request is for the webadmin portal to report in the Events section the downtime incurred by each live migration.

The need to see that in the portal is understood. Correlating timestamps from the source and destination hosts would be difficult. We poll for task status periodically, so we can use the last poll as a very close estimate; in most cases this should correspond to the real downtime.

Another possibility is to report it afterwards. If we do it in RHEV-M it may still be misleading if the source and destination host times differ. IMHO libvirt/qemu should provide such a value if it needs to be really exact.

(In reply to Michal Skrivanek from comment #5)
> Another possibility is to report it afterwards.

That's indeed what the customer expects: downtime reported after live migration completion. Currently the RHEV-M webadmin portal reports the following in the Events section after a successful live migration:

Migration complete (VM: vm_name, Source Host: host_name)

They expect to see:

Migration complete (VM: vm_name, Source Host: host_name, Downtime xxx ms)

Posted at: http://gerrit.ovirt.org/#/c/16399

(In reply to Shahar Havivi from comment #7)
> posted at: http://gerrit.ovirt.org/#/c/16399

Is this measuring the time elapsed between the VM being suspended on the source host and the VM being resumed on the destination host? The proposed patch seems to be measuring the duration of the entire live migration.
This request is to report the *downtime* experienced by the VM during the live migration, that is, the amount of time the VM is not running on either host; in other words, the time between the "Suspended" event on the source host and the "Resumed" event on the destination host.

(In reply to Julio Entrena Perez from comment #8)
You are right; there will be a different patch for this bug. This patch may still be posted, because it gives the user additional information about how long the migration took.

(In reply to Shahar Havivi from comment #9)
> You are right; there will be a different patch for this bug.

Thanks for clarifying this.

> This patch may still be posted, because it gives the user additional information about how long the migration took.

Thanks Shahar. The customer would welcome RHEV-M also reporting the duration of the entire live migration, in addition to the downtime incurred during it.

Scott, is this scoped for 3.5?

One more thing: we should ensure host clocks are in sync. Currently we alert when the drift reaches 300 s; that's far too much, we need something like 100 ms.

Setting the Release note flag, since we must mention the change of time drift tolerance from 5 minutes to 100 ms.

(In reply to Michal Skrivanek from comment #17)
> setting the Release note flag, since we must mention the change of time drift tolerance from 5 minutes to 100 ms

Maybe also worth noting that, accordingly, the name of the configuration option changed from HostTimeDriftInSec to HostTimeDriftInMS (when using engine-config).

See the enhancement in libvirt reporting (bug 1213434), which should provide more accurate numbers.

ovirt-3.6.0-3 release

Verified with:
Setup: RHEVM version 3.6.1.2-0.1.el6, vdsm-4.17.13-1.el7ev, libvirt-1.2.17-13.el7_2.2
Test cases according to the Polarion test case.
Results: PASS

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-0376.html
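As a rough illustration of why the drift tolerance had to be tightened: the reported downtime is essentially the destination host's "Resumed" timestamp minus the source host's "Suspended" timestamp, so any clock drift between the two hosts shifts the result one-for-one. This is a minimal sketch, not engine code; the function name and values are hypothetical:

```python
from datetime import datetime, timedelta

def migration_downtime_ms(suspended_at_src, resumed_at_dst, dst_clock_drift_ms=0):
    """Downtime as the Manager would compute it: the time between the
    'Suspended' event on the source host and the 'Resumed' event on the
    destination host. dst_clock_drift_ms models how far the destination
    clock runs ahead of the source clock; it distorts the result one-for-one.
    """
    measured = (resumed_at_dst - suspended_at_src) / timedelta(milliseconds=1)
    return measured + dst_clock_drift_ms

suspend = datetime(2016, 1, 1, 12, 0, 0)
resume = suspend + timedelta(milliseconds=250)   # true downtime: 250 ms

print(migration_downtime_ms(suspend, resume))            # 250.0
print(migration_downtime_ms(suspend, resume, 300_000))   # 300250.0
```

With the old 5-minute (300 s) tolerance, the drift term can be three orders of magnitude larger than a typical sub-second downtime, making the report meaningless; at 100 ms tolerance the error is bounded well below the value being measured.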