Bug 1134974

Summary: "Domain not found: no domain with matching uuid" error logged to audit_log after live migration fails due to timeout exceeded
Product: [oVirt] ovirt-engine Reporter: Arik <ahadas>
Component: General Assignee: Francesco Romani <fromani>
Status: CLOSED CURRENTRELEASE QA Contact: Israel Pinto <ipinto>
Severity: medium Docs Contact:
Priority: medium    
Version: --- CC: ahadas, bazulay, bugs, fromani, gklein, ipinto, jentrena, lpeer, mgoldboi, michal.skrivanek, rbalakri, Rhev-m-bugs, srevivo, ykaul
Target Milestone: ovirt-4.1.0-beta Keywords: Reopened
Target Release: 4.1.0.2 Flags: rule-engine: ovirt-4.1+
rule-engine: planning_ack+
rule-engine: devel_ack+
gklein: testing_ack+
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1104195 Environment:
Last Closed: 2017-03-16 14:47:46 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Virt RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1104195    
Bug Blocks:    

Comment 1 Arik 2014-08-28 14:31:10 UTC
As part of migration monitoring, we should show error messages that occur on the destination host before the handoff phase. As part of the fix for bz 1104195, we removed such audit logs because they were confusing and were not shown properly.

We probably need vdsm on the source host to fetch the error message from the destination host, if one exists, so that we can query it from the engine using the migrate-status command. Then it would be nice to present a summary of the migration:
tried from host A to B, exit status (from the dest host): <..>
...
succeeded to migrate to host Z
And to show the duration of the whole process as well.
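
The summary proposed above could be sketched roughly as follows. This is only an illustration, not Engine or Vdsm code: `MigrationAttempt`, `summarize_migration`, and all field names are hypothetical, and the sketch assumes that the destination-side error and the per-attempt duration have already been collected somehow.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MigrationAttempt:
    """One migration attempt (hypothetical structure, not an oVirt type)."""
    source: str
    destination: str
    duration_seconds: float
    # Error reported by the destination host, or None if the attempt succeeded.
    destination_error: Optional[str] = None


def summarize_migration(vm_name: str, attempts: List[MigrationAttempt]) -> str:
    """Build a human-readable summary with one line per attempt plus the total duration."""
    lines = [f"Migration of {vm_name}:"]
    for attempt in attempts:
        if attempt.destination_error is None:
            lines.append(f"succeeded to migrate to host {attempt.destination}")
        else:
            lines.append(
                f"tried from host {attempt.source} to {attempt.destination}, "
                f"exit status (from the dest host): {attempt.destination_error}"
            )
    # Duration of the whole process, summed over all attempts.
    total = sum(a.duration_seconds for a in attempts)
    lines.append(f"total duration: {total:.1f}s")
    return "\n".join(lines)


if __name__ == "__main__":
    attempts = [
        MigrationAttempt("A", "B", 30.0, "timeout exceeded"),
        MigrationAttempt("A", "Z", 12.5),
    ]
    print(summarize_migration("vm1", attempts))
```

Keeping one record per attempt means a failed attempt and its destination-side error stay visible even when a later attempt succeeds.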

Comment 2 Michal Skrivanek 2015-06-02 09:19:30 UTC
Francesco, getting the dst side logs is probably a good idea

Comment 3 Francesco Romani 2015-06-24 11:23:53 UTC
This seems to mirror what we have in https://bugzilla.redhat.com/show_bug.cgi?id=1154397

I think we can merge the two issues, since they seem to be actually one: we should centralize the information about migration failures and make it easy for Engine to understand what is happening or has happened, without losing information.

On https://bugzilla.redhat.com/show_bug.cgi?id=1154397 I added a patch to propagate the abort reason to the destination.
After a quick read of this BZ, it seems that we also need the other way around.

Comment 4 Red Hat Bugzilla Rules Engine 2015-10-19 10:53:26 UTC
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.

Comment 5 Yaniv Kaul 2016-03-10 13:44:33 UTC
Any update to this bug? Is it being worked on for 4.0?

Comment 6 Francesco Romani 2016-03-11 07:33:52 UTC
We require changes on both Engine and Vdsm to properly fix this. We are discussing the issue internally on the virt team but no final patch(es) yet.

Comment 7 Moran Goldboim 2016-03-27 08:47:28 UTC
postponing to 4.1 due to capacity.

Comment 8 Francesco Romani 2016-12-05 18:45:23 UTC
I think this was solved by the monitoring and migration changes introduced in 4.1.0 (postcopy et al.). Please reopen if it still happens.

Comment 9 Francesco Romani 2016-12-06 07:56:21 UTC
Worth further consideration (with no prio increase) after

http://lists.ovirt.org/pipermail/users/2016-December/044477.html

Comment 10 Francesco Romani 2017-01-16 09:31:30 UTC
doc_text not required; this error was just misleading, and now is gone.
Patches merged in the ovirt-4.1 branch -> MODIFIED

Comment 12 Francesco Romani 2017-01-17 08:57:12 UTC
(In reply to Francesco Romani from comment #10)
> doc_text not required; this error was just misleading, and now is gone.
> Patches merged in the ovirt-4.1 branch -> MODIFIED

The fix was done on the Vdsm side.

Comment 13 Israel Pinto 2017-03-07 08:25:32 UTC
Verified with:
Engine: 4.1.1.3-0.1.el7
Host:
OS Version:RHEL - 7.3 - 7.el7
Kernel Version:3.10.0 - 550.el7.x86_64
KVM Version:2.6.0 - 28.el7_3.3.1
LIBVIRT Version:libvirt-2.0.0-10.el7_3.5
VDSM Version:vdsm-4.19.7-1.el7ev
SPICE Version:0.12.4 - 20.el7_3

Steps:
Migrate a VM under load and make the migration fail,
then check that the error message 'Domain not found: no domain with matching uuid'
does not appear.
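
The check in the steps above could be automated along these lines. This is a minimal sketch that assumes the audit log messages have already been exported as a list of plain strings; `misleading_entries` is a hypothetical helper, not part of any oVirt API.

```python
# Error text quoted in this bug's summary.
MISLEADING_ERROR = "Domain not found: no domain with matching uuid"


def misleading_entries(audit_messages):
    """Return the audit log messages that still contain the misleading error."""
    return [msg for msg in audit_messages if MISLEADING_ERROR in msg]


if __name__ == "__main__":
    # Made-up sample messages, for illustration only.
    sample = [
        "Migration failed (VM: vm1, Source: hostA, Destination: hostB).",
        "Migration started (VM: vm1, Source: hostA, Destination: hostZ).",
    ]
    leftovers = misleading_entries(sample)
    print("misleading entries found:", len(leftovers))
```

An empty result after a deliberately failed migration corresponds to the verified behaviour described here.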