Bug 1097341 - The start time for 'migration_max_time_per_gib_mem' appears to be calculated too early.
Summary: The start time for 'migration_max_time_per_gib_mem' appears to be calculated ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.3.0
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: 3.3.3
Assignee: Vinzenz Feenstra [evilissimo]
QA Contact: meital avital
URL:
Whiteboard: virt
Depends On: 1090109 1097332
Blocks:
TreeView+ depends on / blocked
 
Reported: 2014-05-13 15:14 UTC by Chris Pelland
Modified: 2019-04-28 09:25 UTC (History)
16 users

Fixed In Version: vdsm-4.13.2-0.16.el6ev
Doc Type: Bug Fix
Doc Text:
* Previously, migration start time was captured at the start of the MigrationSourceThread process. This meant that the migration would fail if the virtual machine had to wait a long time to acquire the migration semaphore. Now, the migration start time is captured when migration begins.
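The change can be illustrated with a minimal sketch (hypothetical code; names are illustrative and vdsm's actual implementation differs): the start time is taken only after the migration semaphore is acquired, so time spent queued behind other migrations no longer counts against the timeout.

```python
import threading
import time

# Hypothetical sketch of the fix; not vdsm's actual code.
migration_semaphore = threading.Semaphore(2)  # cap on concurrent outgoing migrations

def migrate(vm_id, work_seconds=0.0):
    # Before the fix, start_time was captured roughly here, at the start of
    # the MigrationSourceThread, so time spent waiting for the semaphore was
    # charged against the migration timeout.
    with migration_semaphore:
        # After the fix, the clock starts only once the semaphore is held
        # and the migration actually begins.
        start_time = time.monotonic()
        time.sleep(work_seconds)  # stands in for the real migration work
        return time.monotonic() - start_time
```

With this ordering, a VM that waits a long time for the semaphore still gets its full timeout budget once its migration starts.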
Clone Of: 1097332
Environment:
Last Closed: 2014-05-27 08:57:42 UTC
oVirt Team: ---
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
vdsm.log (753.86 KB, application/x-gzip)
2014-05-20 09:45 UTC, Eldad Marciano


Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 869483 0 None None None Never
Red Hat Knowledge Base (Solution) 873473 0 None None None Never
Red Hat Product Errata RHBA-2014:0548 0 normal SHIPPED_LIVE vdsm 3.3.3 bug fix update 2014-05-27 12:56:53 UTC
oVirt gerrit 27135 0 None None None Never
oVirt gerrit 27637 0 None MERGED virt: Capture migration start time after the semaphore was acquired Never

Comment 5 Eldad Marciano 2014-05-20 09:40:16 UTC
Reproduced the bug 100% of the time.

As described above, without the fix the 'migration_max_time_per_gib_mem' timeout is computed once for all VMs that vdsm should migrate.

With the fix, vdsm computes the timeout per VM being migrated.

To force the timeout calculation to fail, we measured how long it takes to migrate a single idle VM with 1 GB RAM (~9 seconds), then set 'migration_max_time_per_gib_mem' to 15 (9 seconds plus a buffer) so that migration would fail without the fix.
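The timeout scaling in this test setup can be sketched as follows (a simplification under the assumption that the setting means seconds allowed per GiB of guest RAM; the exact vdsm formula may differ):

```python
def migration_timeout(mem_mib, max_time_per_gib_mem=15):
    # Allowed migration time in seconds, scaled by guest memory:
    # 'migration_max_time_per_gib_mem' seconds for each GiB of RAM.
    # Simplified assumption, not vdsm's exact calculation.
    return max_time_per_gib_mem * mem_mib // 1024
```

Under this assumption, a 1 GiB idle VM (which took ~9 seconds to migrate here) gets a 15-second budget with the tuned setting, leaving only a small buffer before the migration is aborted.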


Using the same use case:
- migrating 6 VMs
- without the fix, 3 of them fail to migrate
- with the fix, all of them migrate

see the logs on failure (attached whole log):
Thread-131::DEBUG::2014-05-20 09:01:37,459::vm::377::vm.Vm::(_startUnderlyingMigration) vmId=`69da39b5-8633-4d2d-b469-54550bf67fef`::starting migration to qemu+tls://host27-rack06.scale.openstack.engineering.redhat.com/system with miguri tcp://host27-rack06.scale.openstack.engineering.redhat.com

Thread-133::DEBUG::2014-05-20 09:01:37,501::vm::377::vm.Vm::(_startUnderlyingMigration) vmId=`24bc780a-1dfc-4d8a-94ca-0bc65fa9b76b`::starting migration to qemu+tls://host27-rack06.scale.openstack.engineering.redhat.com/system with miguri tcp://host27-rack06.scale.openstack.engineering.redhat.com

Thread-136::DEBUG::2014-05-20 09:01:38,111::vm::377::vm.Vm::(_startUnderlyingMigration) vmId=`a722fcca-8224-450c-9261-70ae94b5711d`::starting migration to qemu+tls://host27-rack06.scale.openstack.engineering.redhat.com/system with miguri tcp://host27-rack06.scale.openstack.engineering.redhat.com

Thread-142::DEBUG::2014-05-20 09:01:53,905::vm::377::vm.Vm::(_startUnderlyingMigration) vmId=`51a42fb1-569f-4fa5-b306-91e24bcfedf9`::starting migration to qemu+tls://host27-rack06.scale.openstack.engineering.redhat.com/system with miguri tcp://host27-rack06.scale.openstack.engineering.redhat.com

Thread-150::DEBUG::2014-05-20 09:01:57,344::vm::377::vm.Vm::(_startUnderlyingMigration) vmId=`ffab1eb0-fdb8-46d3-a39e-73c8a76adf58`::starting migration to qemu+tls://host27-rack06.scale.openstack.engineering.redhat.com/system with miguri tcp://host27-rack06.scale.openstack.engineering.redhat.com

Thread-146::DEBUG::2014-05-20 09:01:57,352::vm::377::vm.Vm::(_startUnderlyingMigration) vmId=`fbca5fc9-c163-419d-8992-62cdc3d61fe6`::starting migration to qemu+tls://host27-rack06.scale.openstack.engineering.redhat.com/system with miguri tcp://host27-rack06.scale.openstack.engineering.redhat.com



Here we can see the timeout (computed in total, across all of the VMs) expiring:
Thread-156::WARNING::2014-05-20 09:02:03,909::vm::805::vm.Vm::(run) vmId=`51a42fb1-569f-4fa5-b306-91e24bcfedf9`::The migration took 26 seconds which is exceeding the configured maximum time for migrations of 15 seconds. The migration will be aborted.

Thread-158::WARNING::2014-05-20 09:02:07,362::vm::805::vm.Vm::(run) vmId=`ffab1eb0-fdb8-46d3-a39e-73c8a76adf58`::The migration took 27 seconds which is exceeding the configured maximum time for migrations of 15 seconds. The migration will be aborted.

Thread-160::WARNING::2014-05-20 09:02:07,369::vm::805::vm.Vm::(run) vmId=`fbca5fc9-c163-419d-8992-62cdc3d61fe6`::The migration took 28 seconds which is exceeding the configured maximum time for migrations of 15 seconds. The migration will be aborted.

Comment 6 Eldad Marciano 2014-05-20 09:45:00 UTC
Created attachment 897521 [details]
vdsm.log

Comment 7 Eldad Marciano 2014-05-20 14:47:26 UTC
- build 36.4 installed
- reduced the timeout in order to reproduce the problem for 1 GB RAM
- bug fixed
- migration time did not expire; multiple migrations passed
- verified the code fix in vm.py

Comment 9 errata-xmlrpc 2014-05-27 08:57:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0548.html

