The confusion is caused by variable naming. Actually, migration_timeout is counted not from the migration start, but from the moment the migration is stalled, so here it worked as designed. But the issue raises more than the question of variable naming, which is easy and will be fixed. More serious is the behaviour of the destination host, which is totally wrong. That is being investigated.
(In reply to Saveliev Peter from comment #2)
> The confusion is caused by variable naming.
According to /usr/share/doc/vdsm-4.10.2/vdsm.conf.sample:
# Maximum time the destination waits for migration to end. Source
# waits twice as long (to avoid races).
# migration_timeout = 300
>
> Actually, migration_timeout is counted not from the migration start, but
> from the moment the migration is stalled, so here it worked as designed.
If that's the case we still need to rephrase the above comment (and explain behaviour around migration_timeout properly somewhere).
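To make the difference between "time since migration start" and "time since the migration stalled" concrete, here is a minimal Python sketch. It is not VDSM code: the class name `StallTimeoutMonitor`, the `update` method and the "data remaining" progress measure are all hypothetical and only illustrate a timeout whose clock restarts whenever progress is observed.

```python
import time


class StallTimeoutMonitor:
    """Hypothetical helper: signal an abort once progress has stalled too long.

    The clock restarts every time progress is seen, so the total
    migration time may be much longer than stall_timeout.
    """

    def __init__(self, stall_timeout, clock=time.monotonic):
        self.stall_timeout = stall_timeout
        self._clock = clock
        self._best_remaining = None
        self._last_progress_at = clock()

    def update(self, data_remaining):
        """Record a progress sample; return True if the migration should abort."""
        now = self._clock()
        if self._best_remaining is None or data_remaining < self._best_remaining:
            # Progress was made: remember it and restart the stall clock.
            self._best_remaining = data_remaining
            self._last_progress_at = now
            return False
        # No progress since the best sample: compare the stalled time only.
        return now - self._last_progress_at > self.stall_timeout
```

With a 300 second limit, a migration that keeps making progress for an hour is never aborted under this scheme, while one that makes no progress for more than 300 seconds is, however recently it started.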
Yes, surely. It will be done as well.
also need to address/verify engine error on timeout as it seems the migration fails with "Migration failed due to Error: Internal Engine Error (VM: dev31bc4a, Source Host: devrhev06)."
(In reply to Michal Skrivanek from comment #5)
> also need to address/verify engine error on timeout as it seems the
> migration fails with "Migration failed due to Error: Internal Engine Error
> (VM: dev31bc4a, Source Host: devrhev06)."
Ok.
*** Bug 965172 has been marked as a duplicate of this bug. ***
The internal error happened due to a 'ClassCastException' in the vdsbroker:
2013-05-17 12:34:00,569 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (pool-3-thread-49) START, MigrateStatusVDSCommand(HostName = i-mpapp3, HostId = 1a62f776-695e-11e2-a97a-fb8bf5530f36, vmId=d6446340-b00a-4068-8778-2227f89776fd), log id: 3b3e8edd
2013-05-17 12:34:00,607 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand] (pool-3-thread-49) Failed in MigrateStatusVDS method, for vds: i-mpapp3; host: 10.204.125.31
2013-05-17 12:34:00,607 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (pool-3-thread-49) Command MigrateStatusVDS execution failed. Exception: ClassCastException: java.util.HashMap cannot be cast to java.lang.Integer
2013-05-17 12:34:00,607 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (pool-3-thread-49) FINISH, MigrateStatusVDSCommand, log id: 3b3e8edd
2013-05-17 12:34:00,781 INFO [org.ovirt.engine.core.bll.VdsSelector] (pool-3-thread-49) VDS i-mpapp1 419a3eb6-4452-11e2-ab96-575e82ebec1e is not in up status or belongs to the VM's cluster
VDS i-mpapp4 2bb65ff4-5bd0-11e2-8088-8f3b14835353 have failed running this VM in the current selection cycle
VDS jtest02 1948e33c-490b-11e2-8443-1b53e1383a1a is not in up status or belongs to the VM's cluster
VDS i-mpweb2 33ff1c5e-7a9e-11e2-ab5e-170d2d7c2bd6 is not in up status or belongs to the VM's cluster
VDS jtest01 c5ea366a-43a0-11e2-b207-ff9e163144da is not in up status or belongs to the VM's cluster
VDS i-mpapp2 3550eabc-5b43-11e2-af4e-5b3ed4fe7828 is not in up status or belongs to the VM's cluster
VDS i-mpweb1 92af67dc-4938-11e2-baf4-eb85f55b5ed5 is not in up status or belongs to the VM's cluster
2013-05-17 12:34:00,781 WARN [org.ovirt.engine.core.bll.MigrateVmCommand] (pool-3-thread-49) CanDoAction of action MigrateVm failed. Reasons:ACTION_TYPE_FAILED_VDS_VM_CLUSTER,VAR__ACTION__MIGRATE,VAR__TYPE__VM
This is most likely due to receiving a different value from VDSM than was expected (probably an error message).
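The failure mode above (a map arriving where an integer was expected) can be illustrated with a small Python sketch. The engine itself is Java and the actual shape of the VDSM reply is not shown in this report, so everything here is an assumption for illustration: `UnexpectedReply` and `parse_integer_field` are hypothetical names, and the example only shows checking the type before use so that the caller gets a descriptive error instead of a bare cast failure.

```python
class UnexpectedReply(Exception):
    """Hypothetical error raised when a reply field has the wrong type."""


def parse_integer_field(value):
    """Return value if it is an integer, otherwise raise a descriptive error.

    Hypothetical: assumes the caller expects an integer but may instead
    receive a map (for example one carrying an error message).
    """
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    if isinstance(value, dict):
        raise UnexpectedReply("expected an integer, got a map: %r" % (value,))
    raise UnexpectedReply("expected an integer, got %s" % type(value).__name__)
```

Reporting the unexpected value itself makes it much easier to see what the other side actually sent than "Internal Engine Error" does.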
bug 1015887 is supposedly fixing comment #10
moving to 3.3.2 since 3.3.1 was built and moved to QE. please make sure to backport into z-stream.
FailedQA
Changing migration_max_time_per_gib_mem to a smaller value (5) makes the migration time out. An appropriate message about this should be displayed in the event log. Instead we get two errors:
2014-Feb-27, 16:22 Migration failed due to Error: Migration not in progress (VM: a, Source: host1, Destination: host2).
2014-Feb-27, 16:22 Migration failed due to Error: Migration not in progress. Trying to migrate to another Host (VM: a, Source: host1, Destination: host2).
A message like "Migration timed out after %d seconds." should be displayed instead.
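As a rough illustration of why a value of 5 times out so quickly, and of the message requested above, here is a small Python sketch. It assumes the overall limit is the configured per-GiB value multiplied by the guest memory rounded up to whole GiB; that formula and the function names `max_migration_time` and `timeout_message` are assumptions for illustration, not taken from the VDSM source.

```python
def max_migration_time(max_time_per_gib, mem_size_mib):
    """Return the overall time limit in seconds.

    Assumption: the limit scales with guest memory, rounded up to whole GiB.
    """
    gib = (mem_size_mib + 1023) // 1024
    return max_time_per_gib * gib


def timeout_message(seconds):
    """Format the event-log message suggested in this comment."""
    return "Migration timed out after %d seconds." % seconds
```

Under these assumptions a guest with 2048 MiB of memory and a per-GiB value of 5 gets a limit of 10 seconds, so `timeout_message(max_migration_time(5, 2048))` gives "Migration timed out after 10 seconds."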
error message tracked as bug 1071260. moving back to ON_QA as the functionality is not affected
functionality working, moving to verified
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2014-0504.html