Bug 1539777 - Improve Migration summary message
Summary: Improve Migration summary message
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.1.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.2.3
Assignee: Shmuel Melamud
QA Contact: Israel Pinto
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-01-29 15:22 UTC by Steffen Froemer
Modified: 2021-06-10 14:34 UTC
CC List: 11 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, the migration summary message showed the same value for 'total migration time' and 'actual migration time'. This value was calculated as the period of time from the start of execution of the migration command until the end of the entire migration process. In the current release, 'actual migration time' is calculated from the first migration progress event to the end of the entire migration process. If the migration command is run several times, 'actual migration time' reflects only the last run, while the 'total migration time' reflects the total time for all runs.
Clone Of:
Environment:
Last Closed: 2018-05-15 17:47:24 UTC
oVirt Team: Virt
Target Upstream Version:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2018:1488 0 None None None 2018-05-15 17:48:17 UTC
oVirt gerrit 89356 0 master MERGED core: Fix MigrateVmParameters.getTotalMigrationTime() 2020-05-06 01:46:15 UTC
oVirt gerrit 89497 0 master MERGED core: Migration duration from the first progress event 2020-05-06 01:46:15 UTC
oVirt gerrit 90025 0 ovirt-engine-4.2 MERGED core: Fix MigrateVmParameters.getTotalMigrationTime() 2020-05-06 01:46:15 UTC
oVirt gerrit 90026 0 ovirt-engine-4.2 MERGED core: Migration duration from the first progress event 2020-05-06 01:46:15 UTC

Description Steffen Froemer 2018-01-29 15:22:27 UTC
1. Proposed title of this feature request
Improve the information of migration summary message in RHV UI


3. What is the nature and description of the request?
When starting a migration of multiple virtual machines, the total migration time is nearly the same as the migration duration. It can be higher if there were retries.
But when a migration has to wait before it can start, because other migrations are still in progress, this wait time is not added to the total time.


4. Why does the customer need this? (List the business requirements here)
The current information can lead to a misunderstanding of the overall migration process.


5. How would the customer like to achieve this? (List the functional requirements here)
The "total time of migration" should be the time between starting the migration process (clicking the Migrate button) and the completion of the VM's own migration.
If I select multiple VMs (e.g. 10) and only 2 migrations are allowed in parallel, the start of the whole migration process is much earlier than the finish of the last migration.


6. For each functional requirement listed, specify how Red Hat and the customer can test to confirm the requirement is successfully implemented.
Select 10 VMs to migrate, with only 2 migrations allowed in parallel. The total time of the last VM should be about 5 times that of the first migrated VM, as it has to wait for all the other VMs to finish.


7. Is there already an existing RFE upstream or in Red Hat Bugzilla?
no


8. Does the customer have any specific timeline dependencies and which release would they like to target (i.e. RHEL5, RHEL6)?
asap


9. Is the sales team involved in this request and do they have any additional input?
no


10. List any affected packages or components.
ovirt-engine


11. Would the customer be able to assist in testing this functionality if implemented?
yes

Comment 7 Michal Skrivanek 2018-02-26 09:50:03 UTC
sure, it's just that it was already supposed to be like that. Arik, do we now do any throttling on engine side? We should initiate all the migrations at the same time and queue in vdsm

Comment 8 Arik 2018-02-26 11:34:37 UTC
(In reply to Michal Skrivanek from comment #7)
> sure, it's just that it was already supposed to be like that. Arik, do we
> now do any throttling on engine side? We should initiate all the migrations
> at the same time and queue in vdsm

Well, there are several operations that may take time which will not be included in the current measurements:
1. the time it took to validate the input at the client side (should be negligible)
2. the time it took to process the request (that depends on the number of threads that are available for processing requests).
3. if there are multiple VMs being selected together, we execute them as multiple-actions, which may cause their validation to run in parallel - initiating the threads and waiting for results may introduce some overhead.
4. the time it took to validate the migration request (e.g., that there is an available host).
5. the time it took to schedule the destination host.

From that point we start our 'timer'. The timer stops when the event that the migration completed is produced. Another phase that may take some time: the infrastructure used to hold 2 connections with VDSM, so theoretically, if we're about to send 3 migration requests at the same time, the third will be sent to VDSM only after we get a response from VDSM about one of the first two requests. I'm not sure if it has changed though.

Unfortunately, we cannot start the timer on the client side since the client side may not be in sync with the engine. I see the motivation in having such measurement but really, the time that the steps above are expected to take is negligible compared to what we measure today.
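The timer Arik describes can be sketched roughly as follows. This is an illustrative sketch only, not the actual ovirt-engine code; the class and method names are hypothetical. The clock starts only once validation and host scheduling are done, and stops when the "migration completed" event is produced.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of the engine-side timer: it measures from the
// point the migrate command actually starts executing (after validation
// and scheduling) until the completion event arrives.
class MigrationTimer {
    private Instant start; // set once validation/scheduling has finished
    private Instant end;   // set when the 'migration completed' event arrives

    void markStarted(Instant when) { start = when; }
    void markCompleted(Instant when) { end = when; }

    Duration measured() {
        return Duration.between(start, end);
    }
}
```

The steps Arik lists in items 1-5 all happen before markStarted() would be called, which is why they are invisible in the reported numbers.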

Comment 9 Michal Skrivanek 2018-02-26 11:58:38 UTC
(In reply to Arik from comment #8)
> Another phase that may take some time -
> the infrastructure used to hold 2 connections with VDSM so theoretically if
> we're about to send 3 migration requests at the same time, the third will be
> sent to VDSM only after we get response from VDSM about one of the first two
> requests. I'm not sure if it has changed though.

well, it's queued on the semaphore inside a separate thread, so all 3 requests should be processed immediately. But looking at the bug description it looks more like requests queued up in engine already and they are not sent to vdsm.

Comment 10 Steffen Froemer 2018-02-26 15:00:58 UTC
Well, most of the data is already included. Let me try to describe it with facts and an example.

I started a migration of 3 virtual machines at the same time.

 2018-02-26 09:47:19.082-05 |       62 | Migration started (VM: vm1, Source: hostC, Destination: hostA, User: someone).
 2018-02-26 09:47:19.334-05 |       62 | Migration started (VM: vm2, Source: hostC, Destination: hostA, User: someone).
 2018-02-26 09:47:19.581-05 |       62 | Migration started (VM: vm3, Source: hostC, Destination: hostB, User: someone).
 2018-02-26 09:47:41.499-05 |       63 | Migration completed (VM: vm1, Source: hostC, Destination: hostA, Duration: 22 seconds, Total: 22 seconds, Actual downtime: 228ms)
 2018-02-26 09:47:42.637-05 |       63 | Migration completed (VM: vm2, Source: hostC, Destination: hostA, Duration: 23 seconds, Total: 23 seconds, Actual downtime: 284ms)
 2018-02-26 09:48:07.171-05 |       63 | Migration completed (VM: vm3, Source: hostC, Destination: hostB, Duration: 47 seconds, Total: 47 seconds, Actual downtime: (N/A))

As you can see, the migration started for all three systems at the same time
  => 2018-02-26 09:47:19 (ignoring milliseconds)

But in fact, "Duration" and "Total" are always the same.
While "Total" shows the correct value, the "Duration" for vm3 should be lower, probably around ~20 seconds, as "Duration" should measure only the real migration progress.

Does this make sense to you?
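The "Total" values in the log above can be reproduced directly from the event timestamps. A small sketch (the helper class is hypothetical; the timestamps below are copied from the log, which uses a "-05" UTC offset):

```java
import java.time.Duration;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

// Recomputes a migration's total time from the "Migration started" and
// "Migration completed" event timestamps, in the log's own format.
class MigrationLogMath {
    static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSSx");

    static long totalSeconds(String started, String completed) {
        return Duration.between(
                OffsetDateTime.parse(started, FMT),
                OffsetDateTime.parse(completed, FMT)).getSeconds();
    }
}
```

For vm3 this gives 47 seconds (09:47:19.581 to 09:48:07.171), matching the reported "Total" and confirming that "Duration" was being computed over the same interval rather than over the migration progress alone.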

Comment 11 Michal Skrivanek 2018-02-26 15:36:00 UTC
yes, thanks a lot. That matches the implementation, it seems the total is just lost somewhere in the process later on
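The fix described in the Doc Text (and landed in the gerrit patches above) separates the two measurements: 'total' runs from the start of the migrate command, while 'actual' runs from the first migration progress event. The sketch below illustrates that bookkeeping; the field and method names are hypothetical, not the real MigrateVmParameters API.

```java
import java.time.Duration;
import java.time.Instant;

// Illustrative sketch of the corrected bookkeeping: two separate clocks
// instead of one. On a retry, the first-progress timestamp is recorded
// again, so 'actual' reflects only the last run while 'total' spans all runs.
class MigrationTimes {
    private Instant commandStart;  // migrate command started executing
    private Instant firstProgress; // first progress event of the current run
    private Instant migrationEnd;  // end of the entire migration process

    void onCommandStart(Instant t) { commandStart = t; }
    void onFirstProgressEvent(Instant t) { firstProgress = t; }
    void onMigrationEnd(Instant t) { migrationEnd = t; }

    Duration totalMigrationTime() {
        return Duration.between(commandStart, migrationEnd);
    }

    Duration actualMigrationTime() {
        return Duration.between(firstProgress, migrationEnd);
    }
}
```

With this split, vm3 in comment 10 would report a Total of ~47 seconds but a much smaller Duration, since its progress events only started once a migration slot freed up.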

Comment 28 Israel Pinto 2018-04-25 06:40:36 UTC
Verify with:
Engine version: 4.2.3.2-0.1.el7

Steps:
1. Migrate one VM and check that the duration is reported at the end of the migration
2. Migrate 5 VMs and check that the duration is reported for each VM
3. Migrate a VM with each migration policy and check that the policy has no effect on the report

PASS

Comment 32 errata-xmlrpc 2018-05-15 17:47:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:1488

Comment 33 Franta Kust 2019-05-16 13:05:01 UTC
BZ<2>Jira Resync

