The bug reproduces 100% of the time. As described above, without the fix the 'migration_max_time_per_gib_mem' timeout is computed across all of the VMs that vdsm should migrate. With the fix, vdsm computes the timeout per VM being migrated.

To force the timeout to expire, we measured how long it takes to migrate a single idle VM with 1 GB of RAM (about 9 seconds), then set 'migration_max_time_per_gib_mem' to 15 (9 plus a buffer) so that the migration fails when the timeout is counted across all VMs.

Using the same use case:
- migrating 6 VMs
- without the fix, 3 of them fail to migrate
- with the fix, all of them migrate

See the logs from the failure (whole log attached):

Thread-131::DEBUG::2014-05-20 09:01:37,459::vm::377::vm.Vm::(_startUnderlyingMigration) vmId=`69da39b5-8633-4d2d-b469-54550bf67fef`::starting migration to qemu+tls://host27-rack06.scale.openstack.engineering.redhat.com/system with miguri tcp://host27-rack06.scale.openstack.engineering.redhat.com
Thread-133::DEBUG::2014-05-20 09:01:37,501::vm::377::vm.Vm::(_startUnderlyingMigration) vmId=`24bc780a-1dfc-4d8a-94ca-0bc65fa9b76b`::starting migration to qemu+tls://host27-rack06.scale.openstack.engineering.redhat.com/system with miguri tcp://host27-rack06.scale.openstack.engineering.redhat.com
Thread-136::DEBUG::2014-05-20 09:01:38,111::vm::377::vm.Vm::(_startUnderlyingMigration) vmId=`a722fcca-8224-450c-9261-70ae94b5711d`::starting migration to qemu+tls://host27-rack06.scale.openstack.engineering.redhat.com/system with miguri tcp://host27-rack06.scale.openstack.engineering.redhat.com
Thread-142::DEBUG::2014-05-20 09:01:53,905::vm::377::vm.Vm::(_startUnderlyingMigration) vmId=`51a42fb1-569f-4fa5-b306-91e24bcfedf9`::starting migration to qemu+tls://host27-rack06.scale.openstack.engineering.redhat.com/system with miguri tcp://host27-rack06.scale.openstack.engineering.redhat.com
Thread-150::DEBUG::2014-05-20 09:01:57,344::vm::377::vm.Vm::(_startUnderlyingMigration) vmId=`ffab1eb0-fdb8-46d3-a39e-73c8a76adf58`::starting migration to qemu+tls://host27-rack06.scale.openstack.engineering.redhat.com/system with miguri tcp://host27-rack06.scale.openstack.engineering.redhat.com
Thread-146::DEBUG::2014-05-20 09:01:57,352::vm::377::vm.Vm::(_startUnderlyingMigration) vmId=`fbca5fc9-c163-419d-8992-62cdc3d61fe6`::starting migration to qemu+tls://host27-rack06.scale.openstack.engineering.redhat.com/system with miguri tcp://host27-rack06.scale.openstack.engineering.redhat.com

Here we can see that the timeout (counted in total, across all of the VMs) expired:

Thread-156::WARNING::2014-05-20 09:02:03,909::vm::805::vm.Vm::(run) vmId=`51a42fb1-569f-4fa5-b306-91e24bcfedf9`::The migration took 26 seconds which is exceeding the configured maximum time for migrations of 15 seconds. The migration will be aborted.
Thread-158::WARNING::2014-05-20 09:02:07,362::vm::805::vm.Vm::(run) vmId=`ffab1eb0-fdb8-46d3-a39e-73c8a76adf58`::The migration took 27 seconds which is exceeding the configured maximum time for migrations of 15 seconds. The migration will be aborted.
Thread-160::WARNING::2014-05-20 09:02:07,369::vm::805::vm.Vm::(run) vmId=`fbca5fc9-c163-419d-8992-62cdc3d61fe6`::The migration took 28 seconds which is exceeding the configured maximum time for migrations of 15 seconds. The migration will be aborted.
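
To illustrate why the second batch was aborted, here is a rough sketch using the timestamps from the log above for VM 51a42fb1. It is not the vdsm code. It assumes the limit is 'migration_max_time_per_gib_mem' multiplied by the VM's memory in GiB, and that without the fix the clock effectively started when the first migration in the batch started. The helper name `exceeded` is made up for this example.

```python
from datetime import datetime

MAX_TIME_PER_GIB = 15  # seconds; the migration_max_time_per_gib_mem value used in this test
MEM_GIB = 1            # each test VM has 1 GB of RAM


def parse(ts):
    """Parse a timestamp in the format used by the vdsm log lines above."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S,%f")


def exceeded(clock_start, now, mem_gib=MEM_GIB, per_gib=MAX_TIME_PER_GIB):
    """Hypothetical check: has more than per_gib * mem_gib seconds elapsed?"""
    elapsed = (parse(now) - parse(clock_start)).total_seconds()
    return elapsed > per_gib * mem_gib


batch_start = "2014-05-20 09:01:37,459"  # first VM in the batch starts migrating
vm_start = "2014-05-20 09:01:53,905"     # VM 51a42fb1 actually starts migrating
abort_time = "2014-05-20 09:02:03,909"   # VM 51a42fb1 gets the abort warning

# Clock shared across the batch (assumed pre-fix behaviour): about 26 s elapsed
print(exceeded(batch_start, abort_time))
# Clock per VM (behaviour described for the fix): about 10 s elapsed
print(exceeded(vm_start, abort_time))
```

With a shared clock the VM is over the 15 second limit after only about 10 seconds of its own migration, which matches the "took 26 seconds" warning. Measured from its own start it is still within the limit.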
Created attachment 897521 [details] vdsm.log
- Build 36.4 is installed.
- Reduced the timeout in order to reproduce the problem with 1 GB of RAM.
- Bug is fixed.
- The migration timeout did not expire; multiple migrations pass.
- Located the code fix in vm.py.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2014-0548.html