Bug 970645 - migration_timeout not honoured, live migration goes on beyond it
Status: CLOSED ERRATA
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.1.4
Hardware: All
OS: Linux
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 3.4.0
Assigned To: Vinzenz Feenstra [evilissimo]
QA Contact: Lukas Svaty
Whiteboard: virt
Keywords: Triaged, ZStream
Duplicates: 965172
Depends On: 1015887
Blocks: 1069220 1069731 rhev3.4beta 1142926

Reported: 2013-06-04 09:46 EDT by Julio Entrena Perez
Modified: 2014-09-18 08:24 EDT

Fixed In Version: ovirt-3.4.0-beta2
Doc Type: Bug Fix
Doc Text:
Live migration operations now respect the 300-second limit and no longer continue beyond it.
Clones: 1069220
Last Closed: 2014-06-09 09:24:50 EDT
Type: Bug

External Trackers
Tracker                            ID              Priority  Status        Summary                                    Last Updated
Red Hat Knowledge Base (Solution)  390493          None      None          None                                       Never
oVirt gerrit                       16382           None      None          None                                       Never
oVirt gerrit                       21708           None      None          None                                       Never
Red Hat Product Errata             RHBA-2014:0504  normal    SHIPPED_LIVE  vdsm 3.4.0 bug fix and enhancement update  2014-06-09 13:21:35 EDT

Comment 2 Saveliev Peter 2013-06-05 09:14:26 EDT
The confusion is caused by variable naming.

Actually, migration_timeout is counted not from the migration start, but from the moment the migration is stalled, so here it worked as designed.

But the issue raises more than the question of variable naming, which is easy and will be fixed. More serious is the behaviour of the destination host, which is totally wrong. That is being investigated.
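
To make the distinction concrete, here is a minimal sketch of a stall-based timer as opposed to one counted from migration start. It is not the vdsm implementation: the class `StallTimer`, the helper `first_abort_time`, and the idea of sampling a "data remaining" counter are assumptions made for illustration only.

```python
class StallTimer:
    """Tracks how long a migration has gone without progress (illustrative only)."""

    def __init__(self, timeout):
        self.timeout = timeout          # seconds allowed without any progress
        self.lowest_remaining = None    # smallest "data remaining" value seen so far
        self.last_progress_at = None    # time at which that value was recorded

    def update(self, now, data_remaining):
        """Record one sample; return True once the stall limit is exceeded."""
        if self.lowest_remaining is None or data_remaining < self.lowest_remaining:
            # Progress was made, so the stall clock restarts.
            self.lowest_remaining = data_remaining
            self.last_progress_at = now
            return False
        return (now - self.last_progress_at) > self.timeout


def first_abort_time(samples, timeout):
    """Return the sample time at which the stall limit trips, or None if it never does."""
    timer = StallTimer(timeout)
    for now, remaining in samples:
        if timer.update(now, remaining):
            return now
    return None


# A slow but steadily progressing migration runs for 900 s and is never aborted.
progressing = [(t, 10000 - t) for t in range(0, 901, 100)]
# A migration that stops making progress at t=100 is aborted once the stall exceeds 300 s.
stalled = [(0, 1000), (100, 900), (200, 900), (300, 900), (400, 900), (500, 900)]
```

With a 300-second limit, the first series never trips the timer even though it runs well past 300 seconds, which matches what the reporter observed.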
Comment 3 Julio Entrena Perez 2013-06-05 09:39:47 EDT
(In reply to Saveliev Peter from comment #2)
> The confusion is caused by variable naming.

According to /usr/share/doc/vdsm-4.10.2/vdsm.conf.sample :

# Maximum time the destination waits for migration to end. Source
# waits twice as long (to avoid races).
# migration_timeout = 300

> 
> Actually, migration_timeout is counted not from the migration start, but
> from the moment the migration is stalled, so here it worked as designed.

If that's the case we still need to rephrase the above comment (and explain behaviour around migration_timeout properly somewhere).
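
Taken at face value, the sample comment describes two waits derived from one setting. The sketch below only restates that arithmetic. The `[vars]` section name and the function `migration_waits` are assumptions for illustration, and it does not reflect the stall-based counting described in comment #2.

```python
import configparser

# Hypothetical excerpt of a vdsm.conf; the [vars] section name is an assumption.
SAMPLE = """
[vars]
# migration_timeout = 300
"""


def migration_waits(conf_text, default=300):
    """Return (destination_wait, source_wait) in seconds as the sample comment describes."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # A commented-out or missing option falls back to the documented default.
    dest_wait = parser.getint("vars", "migration_timeout", fallback=default)
    # "Source waits twice as long (to avoid races)."
    return dest_wait, 2 * dest_wait
```

With the option left commented out, as in the sample file, this yields 300 seconds for the destination and 600 seconds for the source.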
Comment 4 Saveliev Peter 2013-06-05 12:28:19 EDT
Yes, surely. It will be done as well.
Comment 5 Michal Skrivanek 2013-07-03 00:03:39 EDT
also need to address/verify the engine error on timeout, as it seems the migration fails with "Migration failed due to Error: Internal Engine Error (VM: dev31bc4a, Source Host: devrhev06)."
Comment 6 Saveliev Peter 2013-07-09 10:32:38 EDT
(In reply to Michal Skrivanek from comment #5)
> also need to address/verify engine error on timeout as it seems the
> migration fails with Migration failed due to Error: Internal Engine Error
> (VM: dev31bc4a, Source Host: devrhev06)."

Ok.
Comment 7 Martin Kletzander 2013-08-15 10:05:18 EDT
*** Bug 965172 has been marked as a duplicate of this bug. ***
Comment 10 Vinzenz Feenstra [evilissimo] 2013-11-06 04:07:49 EST
The internal error happened due to a 'ClassCastException' in the vdsbroker:

2013-05-17 12:34:00,569 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (pool-3-thread-49) START, MigrateStatusVDSCommand(HostName = i-mpapp3, HostId = 1a62f776-695e-11e2-a97a-fb8bf5530f36, vmId=d6446340-b00a-4068-8778-2227f89776fd), log id: 3b3e8edd
2013-05-17 12:34:00,607 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand] (pool-3-thread-49) Failed in MigrateStatusVDS method, for vds: i-mpapp3; host: 10.204.125.31
2013-05-17 12:34:00,607 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (pool-3-thread-49) Command MigrateStatusVDS execution failed. Exception: ClassCastException: java.util.HashMap cannot be cast to java.lang.Integer
2013-05-17 12:34:00,607 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (pool-3-thread-49) FINISH, MigrateStatusVDSCommand, log id: 3b3e8edd
2013-05-17 12:34:00,781 INFO  [org.ovirt.engine.core.bll.VdsSelector] (pool-3-thread-49)  VDS i-mpapp1 419a3eb6-4452-11e2-ab96-575e82ebec1e is not in up status or belongs to the VM's cluster VDS i-mpapp4 2bb65ff4-5bd0-11e2-8088-8f3b14835353 have failed running this VM in the current selection cycle VDS jtest02 1948e33c-490b-11e2-8443-1b53e1383a1a is not in up status or belongs to the VM's cluster VDS i-mpweb2 33ff1c5e-7a9e-11e2-ab5e-170d2d7c2bd6 is not in up status or belongs to the VM's cluster VDS jtest01 c5ea366a-43a0-11e2-b207-ff9e163144da is not in up status or belongs to the VM's cluster VDS i-mpapp2 3550eabc-5b43-11e2-af4e-5b3ed4fe7828 is not in up status or belongs to the VM's cluster VDS i-mpweb1 92af67dc-4938-11e2-baf4-eb85f55b5ed5 is not in up status or belongs to the VM's cluster
2013-05-17 12:34:00,781 WARN  [org.ovirt.engine.core.bll.MigrateVmCommand] (pool-3-thread-49) CanDoAction of action MigrateVm failed. Reasons:ACTION_TYPE_FAILED_VDS_VM_CLUSTER,VAR__ACTION__MIGRATE,VAR__TYPE__VM

This is most likely due to receiving a different value from VDSM (probably an error message) than was expected.
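
The kind of defensive handling that avoids such a cast failure can be sketched as follows. This is written in Python for illustration and is not the engine's Java code. The reply shape, the `code` key, the value 47, and the function `extract_status_code` are all hypothetical.

```python
def extract_status_code(value):
    """Return an integer status code from a reply field that may be an int or a map."""
    if isinstance(value, int):
        # The expected case: the field already holds a plain integer.
        return value
    if isinstance(value, dict) and isinstance(value.get("code"), int):
        # The unexpected case: a structured error arrived where an integer was assumed.
        return value["code"]
    # Anything else is reported explicitly instead of failing on a blind cast.
    raise TypeError("unexpected status payload: %r" % (value,))


# Both shapes are handled without a cast error.
plain = extract_status_code(0)
structured = extract_status_code({"code": 47, "message": "Migration not in progress"})
```

Checking the type before use turns an opaque internal error into either a usable code or a descriptive failure.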
Comment 11 Michal Skrivanek 2013-11-19 04:13:28 EST
bug 1015887 is supposedly fixing comment #10
Comment 15 Eyal Edri 2014-02-10 05:31:34 EST
moving to 3.3.2 since 3.3.1 was built and moved to QE.
please make sure to backport into z-stream.
Comment 19 Lukas Svaty 2014-02-27 10:29:54 EST
FailedQA

Changing migration_max_time_per_gib_mem to a smaller value (5) makes the migration time out.

Appropriate message should be displayed about this in the event log. Instead we get two errors:

2014-Feb-27, 16:22
Migration failed due to Error: Migration not in progress (VM: a, Source: host1, Destination: host2).
		
2014-Feb-27, 16:22
Migration failed due to Error: Migration not in progress. Trying to migrate to another Host (VM: a, Source: host1, Destination: host2).

A message like "Migration timed out after %d seconds." should be displayed instead.
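
For reference, a rough sketch of how a per-GiB limit translates into a total time limit and into the requested message. It assumes the setting is in seconds per GiB of guest memory, rounded up with a floor of 1 GiB. The function names and the rounding rule are illustrative assumptions, not taken from vdsm.

```python
def max_migration_seconds(max_time_per_gib, mem_size_mib):
    """Total allowed migration time, assuming the setting is seconds per GiB of guest memory."""
    # Round memory up to whole GiB and never go below 1 GiB (assumed rounding rule).
    mem_gib = max(1, (mem_size_mib + 1023) // 1024)
    return max_time_per_gib * mem_gib


def timeout_message(seconds):
    """Format the event-log text proposed in this comment."""
    return "Migration timed out after %d seconds." % seconds


# With the value 5 used in this test, a hypothetical 4 GiB guest gets 20 seconds in total.
limit = max_migration_seconds(5, 4096)
message = timeout_message(limit)
```

Under these assumptions a value of 5 leaves very little time for any realistically sized guest, so a timeout is the expected outcome.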
Comment 22 Michal Skrivanek 2014-02-28 06:45:56 EST
error message tracked as bug 1071260. moving back to ON_QA as the functionality is not affected
Comment 23 Lukas Svaty 2014-02-28 10:32:49 EST
Functionality is working; moving to VERIFIED.
Comment 24 errata-xmlrpc 2014-06-09 09:24:50 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0504.html
