Bug 970711 - [RFE] Report downtime for each live migration
Status: CLOSED ERRATA
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: RFEs
Version: 3.1.4
Hardware: All, OS: Linux
Priority: medium, Severity: medium
Target Milestone: ovirt-3.6.0-rc
Target Release: 3.6.0
Assigned To: Shahar Havivi
QA Contact: Israel Pinto
Keywords: FutureFeature, Improvement
Depends On: 1063486 1063724 1138570 1162588 1208772 1213434
Blocks:
Reported: 2013-06-04 12:51 EDT by Julio Entrena Perez
Modified: 2016-03-09 15:31 EST
CC List: 12 users

See Also:
Fixed In Version:
Doc Type: Release Note
Doc Text:
With this release, the downtime during a virtual machine migration is reported. This is the handover time needed to transfer execution from the source host to the destination host (the last phase of migration). Note: as part of this enhancement, stricter clock synchronization is enforced between the Manager and the hosts. Previously, an alert was raised when a host's clock was 5 minutes off the Manager's; the threshold is now 100 ms. This is because accurate downtime reporting requires the source and destination hosts to have the same clock time. Environments that are not configured properly may therefore see many new alerts. The configuration option (used with engine-config) has changed from 'HostTimeDriftInSec' to 'HostTimeDriftInMS'.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-03-09 15:31:29 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
istein: needinfo+
sherold: Triaged+




External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
Red Hat Knowledge Base (Solution) | 406563 | None | None | None | Never
oVirt gerrit | 37075 | master | ABANDONED | RFE: Report downtime for each live migration | Never
oVirt gerrit | 38057 | master | ABANDONED | RFE: Report downtime for each live migration | Never
oVirt gerrit | 38405 | master | ABANDONED | Set default sync time between hosts to 100ms | Never
oVirt gerrit | 40100 | master | MERGED | RFE: Report downtime for each live migration | Never
oVirt gerrit | 40103 | master | ABANDONED | RFE: Report downtime for each live migration | Never
oVirt gerrit | 41415 | master | MERGED | Report downtime for each live migration | Never

Comment 1 Michal Skrivanek 2013-06-12 01:23:19 EDT
libvirt provides "expected downtime" as part of job statistics, would that be enough? It's not the exact number though.
Comment 4 Julio Entrena Perez 2013-06-12 05:34:24 EDT
(In reply to Michal Skrivanek from comment #1)
> libvirt provides "expected downtime" as part of job statistics, would that
> be enough? It's not the exact number though.

No, this request is for the webadmin portal to report, in the Events section, the downtime incurred by each live migration.
Comment 5 Michal Skrivanek 2013-07-03 06:40:40 EDT
(In reply to Julio Entrena Perez from comment #4)
> (In reply to Michal Skrivanek from comment #1)
> > libvirt provides "expected downtime" as part of job statistics, would that
> > be enough? It's not the exact number though.
> 
> No, this request is for webadmin portal to report in the Events section the
> incurred downtime by each live migration.
The need to see that in the portal is understood.
Correlating timestamps from the src and dst hosts would be difficult. We're polling for task status periodically, so we can use the last poll as a really close estimate. In most cases this should correspond to the real downtime.

Other possibility is to report it afterwards.
If we do it in RHEV-M, it may still be misleading if the src and dst host clocks differ. IMHO libvirt/qemu should provide such a value if it needs to be really exact.
Comment 6 Julio Entrena Perez 2013-07-03 06:51:38 EDT
(In reply to Michal Skrivanek from comment #5)

> Other possibility is to report it afterwards.

That's indeed what the customer expects: downtime reported after live migration completion.

Currently the RHEV-M webadmin portal reports the following in the Events section after a successful live migration:

Migration complete (VM: vm_name, Source Host: host_name)

They expect to see:

Migration complete (VM: vm_name, Source Host: host_name, Downtime xxx ms)
Comment 7 Shahar Havivi 2013-07-03 09:16:00 EDT
posted at: http://gerrit.ovirt.org/#/c/16399
Comment 8 Julio Entrena Perez 2013-07-12 08:36:58 EDT
(In reply to Shahar Havivi from comment #7)
> posted at: http://gerrit.ovirt.org/#/c/16399

Is this measuring the time elapsed between the VM being suspended on the source host and the VM being resumed on the destination host?

The proposed patch seems to measure the duration of the entire live migration.

This request is to report the *downtime* experienced by the VM during the live migration, that is, the amount of time the VM is not running on either host; in other words, the time between the "Suspended" event on the source host and the "Resumed" event on the destination host.
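The definition above (Suspended on the source to Resumed on the destination) can be sketched in Python. This is a minimal illustration, not the patch that was eventually merged: the function names and the `dst_clock_offset` parameter are hypothetical, and the sketch assumes the offset between the two host clocks is known. It also makes the clock-skew concern from comment #5 concrete: any uncorrected offset between the hosts lands directly in the computed downtime.

```python
from datetime import datetime, timedelta


def migration_downtime_ms(suspended_on_src: datetime,
                          resumed_on_dst: datetime,
                          dst_clock_offset: timedelta = timedelta(0)) -> float:
    """Downtime in ms: from the "Suspended" event on the source host to the
    "Resumed" event on the destination host.

    dst_clock_offset (hypothetical): how far the destination clock runs ahead
    of the source clock. Assumed known; any error in it goes straight into
    the result.
    """
    # Express the destination timestamp in the source host's time base.
    resumed_in_src_time = resumed_on_dst - dst_clock_offset
    delta = resumed_in_src_time - suspended_on_src
    if delta < timedelta(0):
        raise ValueError("resume precedes suspend: host clocks are probably out of sync")
    return delta / timedelta(milliseconds=1)


def migration_complete_event(vm: str, src_host: str, downtime_ms: float) -> str:
    """Event text in the format requested in comment #6."""
    return (f"Migration complete (VM: {vm}, Source Host: {src_host}, "
            f"Downtime {downtime_ms:.0f} ms)")


if __name__ == "__main__":
    suspended = datetime(2015, 12, 10, 8, 0, 0)
    resumed = datetime(2015, 12, 10, 8, 0, 0, 250000)  # 250 ms later
    print(migration_downtime_ms(suspended, resumed))  # 250.0
    print(migration_downtime_ms(suspended, resumed, timedelta(milliseconds=100)))  # 150.0
    print(migration_complete_event("vm_name", "host_name", 250.0))
```

With illustrative timestamps 250 ms apart, the result is 250 ms if the clocks agree, but 150 ms once the destination clock is known to run 100 ms ahead.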
Comment 9 Shahar Havivi 2013-07-14 03:35:17 EDT
(In reply to Julio Entrena Perez from comment #8)
You are right.
There will be a different patch for this bug.

This patch may still be posted because it gives the user additional info about how long the migration took.
Comment 10 Julio Entrena Perez 2013-07-15 04:47:14 EDT
(In reply to Shahar Havivi from comment #9)
> (In reply to Julio Entrena Perez from comment #8)
> You are right,
> There will be different patch for this bug.
Thanks for clarifying this.
> 
> This patch may be posted because it give the user additional info for the
> time that the migration took time.
Thanks Shahar, the customer would also welcome RHEV-M reporting the duration of the entire live migration, in addition to the downtime incurred during it.
Comment 12 Arthur Berezin 2014-01-30 11:48:27 EST
Scott, is this scoped for 3.5?
Comment 16 Michal Skrivanek 2015-03-04 06:03:48 EST
One more thing: we should ensure that host clocks are in sync. Currently we alert when the drift is 300 s; that's too much, we need something like 100 ms...
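A drift check against the proposed 100 ms tolerance could look like the following sketch. It assumes the Manager can obtain each host's current time; `hosts_needing_alert` and `HOST_TIME_DRIFT_MS` are hypothetical names used for illustration, not the engine's actual code.

```python
from datetime import datetime, timedelta
from typing import Dict, List

# Hypothetical tolerance, mirroring the 100 ms proposed above (previously 300 s).
HOST_TIME_DRIFT_MS = 100


def drift_ms(manager_time: datetime, host_time: datetime) -> float:
    """Absolute difference between a host clock and the Manager clock, in ms."""
    return abs(host_time - manager_time) / timedelta(milliseconds=1)


def hosts_needing_alert(manager_time: datetime,
                        host_times: Dict[str, datetime],
                        threshold_ms: float = HOST_TIME_DRIFT_MS) -> List[str]:
    """Return the sorted names of hosts whose clock drift exceeds threshold_ms."""
    return sorted(name for name, t in host_times.items()
                  if drift_ms(manager_time, t) > threshold_ms)


if __name__ == "__main__":
    now = datetime(2015, 3, 4, 6, 0, 0)
    hosts = {
        "host-a": now + timedelta(milliseconds=50),   # within tolerance
        "host-b": now - timedelta(milliseconds=150),  # too far behind
        "host-c": now + timedelta(seconds=2),         # too far ahead
    }
    print(hosts_needing_alert(now, hosts))  # ['host-b', 'host-c']
```

Because each host is compared with the Manager rather than with the other host, a source and destination that both pass the check can still be up to twice the tolerance apart (200 ms at 100 ms, 600 s at the old 300 s). That is the worst-case skew that could leak into a downtime computed from timestamps on the two hosts.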
Comment 17 Michal Skrivanek 2015-03-05 09:59:54 EST
Setting the Release Note flag, since we must mention the change of the time drift tolerance from 5 min to 100 ms.
Comment 18 Omer Frenkel 2015-03-08 07:58:19 EDT
(In reply to Michal Skrivanek from comment #17)
> setting Release note flag since we must mention the change of time drift
> tolerance from 5 mins to 100ms

Maybe also worth noting that, accordingly, the name of the configuration option changed from
HostTimeDriftInSec
to
HostTimeDriftInMS

(when using engine-config)
Comment 19 Michal Skrivanek 2015-04-21 03:21:22 EDT
See the enhancement in libvirt reporting (bug 1213434); it should provide more accurate numbers.
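Comments #1, #5 and #19 point at libvirt's job statistics as the source of the number. The following is a minimal sketch of how a reporter might pick the value. It assumes the statistics arrive as the dictionaries returned by libvirt-python's `virDomain.jobStats()` and that the relevant key is `downtime` (milliseconds). As we understand the libvirt documentation, this key holds the expected downtime while a migration is running and the measured value when queried with `VIR_DOMAIN_JOB_STATS_COMPLETED`; verify this against your libvirt version. The helper name is hypothetical. It follows comment #5's idea of using the last polled value but prefers a completed-job figure when one is available.

```python
from typing import Mapping, Optional, Sequence

# Key used in libvirt job statistics for migration downtime in milliseconds
# (VIR_DOMAIN_JOB_DOWNTIME). Treat its exact semantics as an assumption to verify.
DOWNTIME_KEY = "downtime"


def best_downtime_estimate(polled: Sequence[Mapping[str, int]],
                           completed: Optional[Mapping[str, int]] = None) -> Optional[int]:
    """Pick the most trustworthy downtime value (ms), or None if unavailable.

    polled:    job stats sampled while the migration was running, oldest first.
    completed: job stats of the finished migration, if they could be fetched.
    """
    # A value reported for the completed job is preferred over any estimate.
    if completed is not None and DOWNTIME_KEY in completed:
        return int(completed[DOWNTIME_KEY])
    # Otherwise fall back to the most recent expected downtime seen while polling.
    for stats in reversed(polled):
        if DOWNTIME_KEY in stats:
            return int(stats[DOWNTIME_KEY])
    return None


# Possible wiring with libvirt-python (not executed here; check which host
# still has the domain and its completed-job stats after migration):
#   import libvirt
#   dom = libvirt.open("qemu:///system").lookupByName("vm_name")
#   running = dom.jobStats()                                     # while migrating
#   done = dom.jobStats(libvirt.VIR_DOMAIN_JOB_STATS_COMPLETED)  # after completion

if __name__ == "__main__":
    samples = [{"downtime": 300}, {"downtime": 120}]
    print(best_downtime_estimate(samples))                    # 120
    print(best_downtime_estimate(samples, {"downtime": 95}))  # 95
```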
Comment 21 Max Kovgan 2015-06-28 10:12:29 EDT
ovirt-3.6.0-3 release
Comment 22 Israel Pinto 2015-12-10 08:18:34 EST
Verified with:
Setup:
RHEVM Version: 3.6.1.2-0.1.el6
vdsm: vdsm-4.17.13-1.el7ev
libvirt: libvirt-1.2.17-13.el7_2.2

Tested according to the Polarion test case.

Results: PASS
Comment 24 errata-xmlrpc 2016-03-09 15:31:29 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-0376.html
