Bug 1154397 - Live migration fails due to timeout exceeded, no reason displayed in audit log.
Summary: Live migration fails due to timeout exceeded, no reason displayed in audit log.
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: General
Version: ---
Hardware: Unspecified
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Marek Libra
QA Contact: Israel Pinto
URL:
Whiteboard:
Depends On:
Blocks: 1286708
Reported: 2014-10-19 12:52 UTC by Israel Pinto
Modified: 2022-06-30 12:56 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-01-18 14:47:21 UTC
oVirt Team: Virt
Embargoed:
sbonazzo: ovirt-4.1-


Attachments
engine logs (1.12 MB, application/zip)
2014-10-19 12:52 UTC, Israel Pinto
Host Logs (1.18 MB, application/zip)
2014-10-19 12:52 UTC, Israel Pinto


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHV-46830 0 None None None 2022-06-30 12:56:04 UTC
oVirt gerrit 42801 0 master ABANDONED migration: on abort, override reason on dest VM Never
oVirt gerrit 49525 0 master ABANDONED migration: track and report abort reason 2016-02-16 10:38:52 UTC

Description Israel Pinto 2014-10-19 12:52:17 UTC
Created attachment 948263 [details]
engine logs

Checked with version: 3.5.0-0.12.beta.el6ev

Steps to reproduce:
1. RHEL 6.5 VM with Defined Memory: 2G
2. 2 hosts
3. On the VM to be migrated, run the Linux stress tool with the command:
    stress --vm 1 --vm-bytes 512M --vm-hang 2 --timeout 3600s &
4. Migrate the VM via the UI.
5. The migration failed after ~2 min. In the Events tab the message was:
   2014-Oct-19, 14:40
   Migration failed due to Error: Migration not in progress (VM: test-02, Source: 10.35.4.161, Destination: 10.35.4.137).
   
Actual results:
No explanation of why the migration failed.

Expected results:
An explanation of why the migration failed: timeout.

Additional info:
* From the vdsm log:
Thread-75::WARNING::2014-10-19 14:40:35,602::migration::435::vm.Vm::(monitor_migration) vmId=`6b3cd572-a7ce-4775-b405-4eb53e7a0968`::The migration took 130 seconds which is exceeding the configured maximum time for migrations of 128 seconds. The migration will be aborted.
The migration failed because the timeout was exceeded.

* See attached logs.
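
Until the reason reaches the audit log, the timeout can be confirmed by scanning the source host's vdsm log for the warning quoted above. The following is a minimal sketch only: the pattern is modeled on that single log line, and the helper name `find_migration_timeouts` is made up for illustration. Other vdsm versions may word the message differently.

```python
import re

# Pattern modeled on the one WARNING line quoted in this report (assumption:
# other vdsm versions may phrase the message differently).
TIMEOUT_RE = re.compile(
    r"vmId=`(?P<vm_id>[0-9a-f-]+)`::The migration took (?P<elapsed>\d+) seconds "
    r"which is exceeding the configured maximum time for migrations of "
    r"(?P<limit>\d+) seconds"
)


def find_migration_timeouts(lines):
    """Return (vm_id, elapsed_seconds, limit_seconds) for each timeout warning."""
    hits = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            hits.append(
                (
                    match.group("vm_id"),
                    int(match.group("elapsed")),
                    int(match.group("limit")),
                )
            )
    return hits


# The log line from this report, used as sample input.
sample = (
    "Thread-75::WARNING::2014-10-19 14:40:35,602::migration::435::vm.Vm::"
    "(monitor_migration) vmId=`6b3cd572-a7ce-4775-b405-4eb53e7a0968`::"
    "The migration took 130 seconds which is exceeding the configured maximum "
    "time for migrations of 128 seconds. The migration will be aborted."
)

print(find_migration_timeouts([sample]))
```

In practice the lines would come from the vdsm log file on the source host rather than from a hard-coded sample.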

Comment 1 Israel Pinto 2014-10-19 12:52:58 UTC
Created attachment 948264 [details]
Host Logs

Comment 2 Michal Skrivanek 2015-06-02 10:26:54 UTC
migration reporting improvement

Comment 3 Francesco Romani 2015-06-24 11:11:46 UTC
VDSM patch posted. Not sure whether this is better handled on the VDSM side or on the Engine side. Let's discuss on gerrit.

Comment 4 Francesco Romani 2015-06-24 11:26:15 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=1134974 is related, probably same (or very similar) root cause

Comment 5 Omer Frenkel 2015-09-08 13:13:48 UTC
Fixing this in vdsm might be more complicated than fixing it in the engine.
The suggestion is that on migration failure, the engine queries both the src and dst hosts for the failure reason
(instead of the current way of querying only the src)
and provides better info to the user.

This will not make it in time for 3.6.0.
Suggesting 3.6.z because this can improve the user experience, but in any case the failure reason is available to the user manually on the hosts.
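
A rough sketch of the selection logic this suggestion implies, written in Python for brevity and not taken from the engine code. The function name, the set of generic messages, and the example destination reason are all hypothetical. The idea is to prefer whichever host reports something more specific than the generic "Migration not in progress" text seen in the event above.

```python
from typing import Optional

# Hypothetical set of messages considered too generic to be useful; the one
# entry is the text from the event quoted in the description.
GENERIC_REASONS = {"Migration not in progress"}


def pick_failure_reason(src_reason: Optional[str], dst_reason: Optional[str]) -> str:
    """Prefer the most specific failure reason reported by either host."""
    # Drop missing or blank answers, keeping source-first order.
    candidates = [r.strip() for r in (src_reason, dst_reason) if r and r.strip()]
    specific = [r for r in candidates if r not in GENERIC_REASONS]
    if specific:
        return specific[0]
    if candidates:
        return candidates[0]
    return "Unknown"


# Example with a made-up destination reason:
print(pick_failure_reason("Migration not in progress", "Migration timeout exceeded"))
```

How each host would actually expose its reason is left open here; that is the part the abandoned gerrit patches tried to address.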

Comment 10 Red Hat Bugzilla Rules Engine 2015-11-30 19:17:12 UTC
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.

Comment 11 Sandro Bonazzola 2016-05-02 10:02:32 UTC
Moving from 4.0 alpha to 4.0 beta since 4.0 alpha has already been released and the bug is not ON_QA.

Comment 12 Yaniv Lavi 2016-05-23 13:17:46 UTC
oVirt 4.0 beta has been released, moving to RC milestone.

Comment 13 Yaniv Lavi 2016-05-23 13:21:46 UTC
oVirt 4.0 beta has been released, moving to RC milestone.

Comment 14 Tomas Jelinek 2016-06-01 11:48:32 UTC
will not fit into 4.0 - pushing to 4.1

Comment 15 Tomas Jelinek 2017-01-18 14:47:21 UTC
There is no simple/good solution to this. The previous patches have been abandoned.
Considering the changes to migration in 4.0 and the addition of post-copy in 4.1, I think this is not relevant anymore, so closing.

If you face this and it is important to you, please feel free to reopen.

