Bug 1238114

Summary: VM is down although migration succeeded
Product: Red Hat Enterprise Virtualization Manager Reporter: Israel Pinto <ipinto>
Component: ovirt-engine Assignee: Michal Skrivanek <michal.skrivanek>
Status: CLOSED CURRENTRELEASE QA Contact: Israel Pinto <ipinto>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 3.6.0 CC: gklein, istein, lpeer, lsurette, michal.skrivanek, oourfali, rbalakri, Rhev-m-bugs, srevivo, ykaul
Target Milestone: ovirt-3.6.0-rc Keywords: AutomationBlocker
Target Release: 3.6.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 3.6.0-4 alpha3 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-04-20 01:30:41 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Virt RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
logs none

Description Israel Pinto 2015-07-01 08:51:41 UTC
Created attachment 1044952
logs

Description of problem:
Migrating one VM in a cluster with two hosts:
the migration succeeds, but the VM is reported as down, and only after ~2 is it up on the destination.


Version-Release number of selected component (if applicable):

oVirt Engine Version: 3.6.0-0.0.master.20150627185750.git6f063c1.el6
(3.6.0-03)

How reproducible:
2 out of 5 times manually;
also reproduces in automation.

Steps to Reproduce:
Migrate VM

Actual results:
VM is down although migration succeeded

Expected results:
VM up and migration succeeded


Additional info:
Event log:

Comment 2 Omer Frenkel 2015-07-01 15:03:39 UTC
The problem is in the new events infrastructure for the VM stats mechanism:
the engine receives an event sent from one host on all the hosts.
So once the migration has completed and the VM is UP on the destination,
the engine receives this event on the destination host, but also on the source host.
This makes the engine think the migration failed (because the event says the VM moved to UP on the source);
later the engine discovers the real status.
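
For illustration only, here is a minimal Python sketch of the failure mode described above. It is not the engine code: `VmStatusEvent`, `broadcast`, `deliver_to_origin`, and `interpret` are hypothetical names, and the scoped delivery shown is an assumption about what a fix could look like, not a description of the merged patch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmStatusEvent:
    """Hypothetical VM status event; not the real engine/VDSM event type."""
    origin_host: str  # host that actually emitted the event
    vm_id: str
    status: str       # e.g. "Up"


def broadcast(event, hosts):
    """Faulty delivery: the event is handed to every host's monitor."""
    return [(host, event) for host in hosts]


def deliver_to_origin(event, hosts):
    """Scoped delivery: only the monitor of the emitting host gets the event."""
    return [(host, event) for host in hosts if host == event.origin_host]


def interpret(deliveries, source_host, dest_host, vm_id):
    """Assumed engine-side reading of 'Up' events during a migration.

    An 'Up' event seen for the source host is read as a failed migration,
    an 'Up' event seen only for the destination as a successful one.
    """
    seen_on = {
        host
        for host, event in deliveries
        if event.vm_id == vm_id and event.status == "Up"
    }
    if source_host in seen_on:
        return "failed"
    if dest_host in seen_on:
        return "succeeded"
    return "unknown"


if __name__ == "__main__":
    hosts = ["src", "dest"]
    up_on_dest = VmStatusEvent(origin_host="dest", vm_id="vm1", status="Up")
    print(interpret(broadcast(up_on_dest, hosts), "src", "dest", "vm1"))
    print(interpret(deliver_to_origin(up_on_dest, hosts), "src", "dest", "vm1"))
```

With broadcast delivery the destination's "Up" event is also attributed to the source host, so the sketch reports the migration as failed; with delivery scoped to the emitting host the same event is read as a success.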

I was able to reproduce this easily on latest master and verified the above with extra logging of the events.

Adding a link to a patch that was merged earlier today that should fix this.

Comment 3 Omer Frenkel 2015-07-05 13:02:55 UTC
Now that the bug in the events is fixed,
I tried to verify that it also fixes the migration scenario.
I can see there is also a bug in the code that reads the event and executes the monitoring code, which still causes the reported issue.

Moving back to virt to handle the new issue.

Comment 4 Israel Pinto 2015-08-11 12:23:11 UTC
Verified with version: 3.6.0-5
3.6.0-0.0.master.20150804111407.git122a3a0.el6
VDSM: vdsm-4.17.0-1239.git6575e3f.el7.noarch
Checked with automation and manually:
https://rhev-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/3.6-GE-compute/144/