Bug 970645
Summary: | migration_timeout not honoured, live migration goes on beyond it | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Julio Entrena Perez <jentrena> | |
Component: | vdsm | Assignee: | Vinzenz Feenstra [evilissimo] <vfeenstr> | |
Status: | CLOSED ERRATA | QA Contact: | Lukas Svaty <lsvaty> | |
Severity: | medium | Docs Contact: | ||
Priority: | high | |||
Version: | 3.1.4 | CC: | acathrow, bazulay, eedri, flo_bugzilla, iheim, jentrena, jkt, lbopf, lpeer, lsvaty, lyarwood, mavital, michal.skrivanek, pbandark, pstehlik, sbonazzo, sputhenp, vfeenstr, yeylon | |
Target Milestone: | --- | Keywords: | Triaged, ZStream | |
Target Release: | 3.4.0 | |||
Hardware: | All | |||
OS: | Linux | |||
Whiteboard: | virt | |||
Fixed In Version: | ovirt-3.4.0-beta2 | Doc Type: | Bug Fix | |
Doc Text: |
Live migration operations now respect the 300-second limit and no longer continue beyond it.
|
Story Points: | --- | |
Clone Of: | ||||
: | 1069220 | Environment: | |
Last Closed: | 2014-06-09 13:24:50 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | 1015887 | |||
Bug Blocks: | 1069220, 1069731, 1078909, 1142926 |
Comment 2
Saveliev Peter
2013-06-05 13:14:26 UTC
(In reply to Saveliev Peter from comment #2)
> The confusion is caused by variable naming.

According to /usr/share/doc/vdsm-4.10.2/vdsm.conf.sample:

# Maximum time the destination waits for migration to end. Source
# waits twice as long (to avoid races).
# migration_timeout = 300

> Actually, migration_timeout is counted not from the migration start, but
> from the moment the migration is stalled, so here it worked as designed.

If that's the case, we still need to rephrase the above comment (and explain the behaviour around migration_timeout properly somewhere).

Yes, surely. It will be done as well.

Also need to address/verify the engine error on timeout, as it seems the migration fails with "Migration failed due to Error: Internal Engine Error (VM: dev31bc4a, Source Host: devrhev06)."

(In reply to Michal Skrivanek from comment #5)
> also need to address/verify engine error on timeout as it seems the
> migration fails with Migration failed due to Error: Internal Engine Error
> (VM: dev31bc4a, Source Host: devrhev06)."

Ok.

*** Bug 965172 has been marked as a duplicate of this bug. ***

The internal error happened due to a 'ClassCastException' in the vdsbroker:

2013-05-17 12:34:00,569 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (pool-3-thread-49) START, MigrateStatusVDSCommand(HostName = i-mpapp3, HostId = 1a62f776-695e-11e2-a97a-fb8bf5530f36, vmId=d6446340-b00a-4068-8778-2227f89776fd), log id: 3b3e8edd
2013-05-17 12:34:00,607 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand] (pool-3-thread-49) Failed in MigrateStatusVDS method, for vds: i-mpapp3; host: 10.204.125.31
2013-05-17 12:34:00,607 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (pool-3-thread-49) Command MigrateStatusVDS execution failed. Exception: ClassCastException: java.util.HashMap cannot be cast to java.lang.Integer
2013-05-17 12:34:00,607 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (pool-3-thread-49) FINISH, MigrateStatusVDSCommand, log id: 3b3e8edd
2013-05-17 12:34:00,781 INFO [org.ovirt.engine.core.bll.VdsSelector] (pool-3-thread-49)
VDS i-mpapp1 419a3eb6-4452-11e2-ab96-575e82ebec1e is not in up status or belongs to the VM's cluster
VDS i-mpapp4 2bb65ff4-5bd0-11e2-8088-8f3b14835353 have failed running this VM in the current selection cycle
VDS jtest02 1948e33c-490b-11e2-8443-1b53e1383a1a is not in up status or belongs to the VM's cluster
VDS i-mpweb2 33ff1c5e-7a9e-11e2-ab5e-170d2d7c2bd6 is not in up status or belongs to the VM's cluster
VDS jtest01 c5ea366a-43a0-11e2-b207-ff9e163144da is not in up status or belongs to the VM's cluster
VDS i-mpapp2 3550eabc-5b43-11e2-af4e-5b3ed4fe7828 is not in up status or belongs to the VM's cluster
VDS i-mpweb1 92af67dc-4938-11e2-baf4-eb85f55b5ed5 is not in up status or belongs to the VM's cluster
2013-05-17 12:34:00,781 WARN [org.ovirt.engine.core.bll.MigrateVmCommand] (pool-3-thread-49) CanDoAction of action MigrateVm failed. Reasons:ACTION_TYPE_FAILED_VDS_VM_CLUSTER,VAR__ACTION__MIGRATE,VAR__TYPE__VM

This is most likely due to receiving a different value (probably an error message) from VDSM than was expected.

Bug 1015887 is supposedly fixing comment #10.

Moving to 3.3.2 since 3.3.1 was built and moved to QE. Please make sure to backport into z-stream.

FailedQA

Changing migration_max_time_per_gib_mem to a smaller value (5) makes the migration time out. An appropriate message about this should be displayed in the event log. Instead we get two errors:

2014-Feb-27, 16:22 Migration failed due to Error: Migration not in progress (VM: a, Source: host1, Destination: host2).
2014-Feb-27, 16:22 Migration failed due to Error: Migration not in progress. Trying to migrate to another Host (VM: a, Source: host1, Destination: host2).

A message like "migration timed out after %d seconds." should be displayed instead.

The error message is tracked as bug 1071260. Moving back to ON_QA as the functionality is not affected.

Functionality working, moving to verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0504.html
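
The two timing behaviours discussed in this thread (a timeout counted from the moment the migration stalls, versus a cap on total migration time scaled by guest memory) can be illustrated with a small sketch. This is not vdsm's actual code: the class `MigrationMonitor`, its fields, and the `check` method are hypothetical names, and multiplying the per-GiB value by the guest memory size is an assumption based only on the name of `migration_max_time_per_gib_mem`.

```python
from dataclasses import dataclass


@dataclass
class MigrationMonitor:
    """Hypothetical monitor, for illustration only (not vdsm's implementation)."""
    stall_timeout: float     # seconds allowed without progress (stall-based reading of migration_timeout)
    max_time_per_gib: float  # assumed meaning of migration_max_time_per_gib_mem
    mem_gib: float           # guest memory size in GiB
    started_at: float = 0.0  # time the migration started, in seconds

    def __post_init__(self):
        self.last_progress_at = self.started_at
        self.lowest_remaining = None

    @property
    def max_total_time(self):
        # Assumption: the total cap scales linearly with guest memory.
        return self.max_time_per_gib * self.mem_gib

    def check(self, now, data_remaining):
        """Return None to keep migrating, or a reason string to abort."""
        # Progress means the amount of data left to transfer reached a new low.
        if self.lowest_remaining is None or data_remaining < self.lowest_remaining:
            self.lowest_remaining = data_remaining
            self.last_progress_at = now
        # Cap counted from the start of the migration.
        if now - self.started_at > self.max_total_time:
            return "migration timed out after %d seconds" % (now - self.started_at)
        # Timeout counted from the last observed progress, not from the start.
        if now - self.last_progress_at > self.stall_timeout:
            return "migration stalled for %d seconds" % (now - self.last_progress_at)
        return None


# With only the stall check in effect, a migration that keeps making progress
# can run past 300 seconds, which matches the behaviour originally reported.
slow = MigrationMonitor(stall_timeout=300, max_time_per_gib=1000, mem_gib=1)
slow.check(200, 400)         # progress seen at t=200
print(slow.check(450, 400))  # None: 450 s elapsed, but only 250 s since progress

# A small per-GiB value, as in the FailedQA test, trips the total cap quickly.
capped = MigrationMonitor(stall_timeout=300, max_time_per_gib=5, mem_gib=4)
print(capped.check(25, 900))  # migration timed out after 25 seconds
```

Under these assumptions, only the second check bounds the total duration of a migration; the first one alone explains why a steadily progressing migration could exceed `migration_timeout`.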