Bug 1534664 - rhevm reports VM "up" on destination Host even after VM migration failure
Summary: rhevm reports VM "up" on destination Host even after VM migration failure
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.1.5
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ovirt-4.2.2
Assignee: Shmuel Melamud
QA Contact: Israel Pinto
URL:
Whiteboard:
Depends On:
Blocks: 1541529
 
Reported: 2018-01-15 17:11 UTC by Koutuk Shukla
Modified: 2021-06-10 14:14 UTC (History)
18 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-05-15 17:47:24 UTC
oVirt Team: Virt
Target Upstream Version:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2018:1488 0 None None None 2018-05-15 17:48:17 UTC

Description Koutuk Shukla 2018-01-15 17:11:35 UTC
Description of problem:

rhevm reports VM "up" on destination Host even after VM migration failure. 

Version-Release number of selected component (if applicable):

RHEV 4.1.5
vdsm-4.19.15-1.el7ev.x86_64

How reproducible:
N/A

Steps to Reproduce:
1.
2.
3.

Actual results:

Host A was moved to maintenance mode after it hit storage connection issues, and the VM went into an unknown state.

VM migration was triggered from host A to host B and failed, but rhevm reported the VM as "up" on Host B and "not responding" on Host A.

Expected results:

When a VM migration fails, the VM should not be reported as "up" on the destination host.

Additional info:

Comment 3 Yaniv Kaul 2018-01-15 17:49:25 UTC
We have fixed what sounds like a very similar issue in 4.1.6. See https://bugzilla.redhat.com/show_bug.cgi?id=1487913 .

Comment 5 Marina Kalinin 2018-01-30 18:53:03 UTC
Meital, can your team please check whether this scenario is reproducible or was fixed in 4.1.6 as part of bz#1487913?
From the two bug descriptions it sounds like two different issues.

Thank you!

Comment 8 meital avital 2018-02-07 12:30:44 UTC
(In reply to Marina from comment #5)
> Meital, can your team please check whether this scenario is reproducible or
> was fixed in 4.1.6 as part of bz#1487913?
> From the two bug descriptions it sounds like two different issues.
> 
> Thank you!

Yes, we can try.
Israel, can you please try to reproduce?

Comment 11 Israel Pinto 2018-02-08 07:07:12 UTC
I will try to reproduce with the following steps (non-HA VM):
1. Start the VM on the source host
2. Block the storage connection to the source host
3. Wait for the VM to become unknown
4. Switch the source host to maintenance
5. Check that the VM migrates and is up on the destination host,
   and no longer exists on the source host, using "virsh -r list"
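The final check can be scripted. A minimal sketch, assuming libvirt's default table layout for `virsh -r list` output (the helper name `vm_is_listed` and the sample VM name are illustrative):

```python
# Parse `virsh -r list` output and report whether a VM name appears.
# Data rows in libvirt's table output look like: "<id> <name> <state>".

def vm_is_listed(virsh_output: str, vm_name: str) -> bool:
    """Return True if vm_name appears in the Name column of `virsh list` output."""
    for line in virsh_output.splitlines():
        parts = line.split()
        # Skip the header ("Id Name State") and the separator line of dashes.
        if len(parts) >= 3 and parts[0].isdigit() and parts[1] == vm_name:
            return True
    return False

sample = """\
 Id    Name                           State
----------------------------------------------------
 3     golden_env_vm_1                running
"""

print(vm_is_listed(sample, "golden_env_vm_1"))  # expected on destination: True
print(vm_is_listed(sample, "golden_env_vm_1") and False)  # on source the VM should be absent
```

After the migration, the check should return True against the destination host's output and False against the source host's.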

Comment 16 Michal Skrivanek 2018-02-10 18:35:06 UTC
where do you see postcopy migration?

Comment 24 Israel Pinto 2018-02-21 12:56:05 UTC
Please approve the steps at: https://bugzilla.redhat.com/show_bug.cgi?id=1534664#c11

Comment 25 Michal Skrivanek 2018-02-21 14:26:53 UTC
It's a bit more tricky. You'd need to reproduce exactly what happened in comment #17:
1. Start a migration.
2. During the migration, fencing (or perhaps a manual restart) restarts vdsm; recovery finishes while that VM is still migrating.
3. The engine migrates the VM again to a different host (this should happen automatically once the vdsm recovery in the previous step finishes).
4. The second migration should fail after some time, and the first one concludes (successfully or not; that should not matter).

Comment 26 Israel Pinto 2018-03-12 15:38:54 UTC
Verified with engine version: 4.2.2.2-0.1.el7
Host:
OS Version: RHEL 7.5 - 8.el7
Kernel Version: 3.10.0-858.el7.x86_64
KVM Version: 2.10.0-21.el7
libvirt Version: libvirt-3.9.0-14.el7
VDSM Version: vdsm-4.20.19-1.el7ev

For migration policies Post-copy and Minimal downtime,
ran the following cases:
Case 1:
 1. Run the VM (with load on the VM to slow the migration)
 2. Migrate the VM (wait about 30 sec)
 3. Block the storage connection on the source host (wait about 60 sec)
 4. Reconnect the storage connection
 5. Restart vdsm on both the source and destination hosts

Case 2:
 1. Run the VM (with load on the VM to slow the migration)
 2. Migrate the VM (wait about 30 sec)
 3. Block the storage connection on the destination host (wait about 60 sec)
 4. Reconnect the storage connection
 5. Restart vdsm on both the source and destination hosts
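Steps 3 and 4 in both cases are commonly done by dropping traffic to the storage server with iptables. A minimal sketch that only builds the command strings (the storage IP is a placeholder, not taken from this bug's environment):

```python
# Build the iptables commands typically used to block and later restore
# traffic to a storage server during such a test.

STORAGE_IP = "10.0.0.42"  # placeholder: NFS/iSCSI server address

def block_cmd(ip: str) -> str:
    # Append a rule dropping all outgoing packets to the storage server.
    return f"iptables -A OUTPUT -d {ip} -j DROP"

def unblock_cmd(ip: str) -> str:
    # Delete the same rule, restoring connectivity.
    return f"iptables -D OUTPUT -d {ip} -j DROP"

print(block_cmd(STORAGE_IP))
print(unblock_cmd(STORAGE_IP))
```

The delete command must match the appended rule exactly, which is why both strings are generated from the same template.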

Results:
With post-copy, the VM was up the whole migration time and the migration succeeded.
With Minimal downtime, the VM became "not responding" after the storage connection was blocked;
the migration succeeded after the connection was restored.

The VM was not seen running on both hosts after the migration completed, and the migration did not fail.

Comment 27 RHV bug bot 2018-03-16 15:02:26 UTC
INFO: Bug status (VERIFIED) wasn't changed but the following should be fixed:

[No relevant external trackers attached]

For more info please contact: rhv-devops@redhat.com

Comment 33 errata-xmlrpc 2018-05-15 17:47:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:1488

Comment 34 Franta Kust 2019-05-16 13:03:51 UTC
BZ<2>Jira Resync

