Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1414626

Summary: VM crashes during migration with error "Failed in MigrateBrokerVDS"
Product: [oVirt] vdsm
Reporter: lifeman <creatmbox>
Component: Core
Assignee: Francesco Romani <fromani>
Status: CLOSED CURRENTRELEASE
QA Contact: Israel Pinto <ipinto>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.18.22
CC: bugs, creatmbox, fromani, gklein, mst, tjelinek
Target Milestone: ovirt-4.1.1
Flags: fromani: needinfo-
       rule-engine: ovirt-4.1+
       rule-engine: planning_ack+
       tjelinek: devel_ack+
       mavital: testing_ack+
Target Release: 4.19.6
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-04-21 09:35:41 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description        Flags
engine.log         none
vdsm-node01.log    none
vdsm-node02.log    none
messages           none
vm.log             none

Description lifeman 2017-01-19 03:46:02 UTC
Created attachment 1242331
engine.log

Description of problem:

The VM crashed during migration with error "Failed in MigrateBrokerVDS" at 10:24:54 PM.

Comment 1 lifeman 2017-01-19 03:49:15 UTC
Created attachment 1242332
vdsm-node01.log

Comment 2 lifeman 2017-01-19 03:51:23 UTC
Created attachment 1242333
vdsm-node02.log

Comment 3 Tomas Jelinek 2017-01-19 07:38:16 UTC
Could you please also attach the libvirt and QEMU logs?

Comment 4 lifeman 2017-01-19 07:50:56 UTC
Where are the logs located? (/var/log/libvirt/qemu/vm.log?)

Comment 5 Tomas Jelinek 2017-01-24 08:00:18 UTC
Yes, that one, and also /var/log/messages.
If there is nothing interesting there, we can enable libvirt debug logging and look at that.

Comment 6 lifeman 2017-01-24 08:38:39 UTC
Created attachment 1243854
messages

Comment 7 lifeman 2017-01-24 08:39:05 UTC
Created attachment 1243855
vm.log

Comment 8 Tomas Jelinek 2017-01-24 15:27:58 UTC
hmm:
2017-01-19 03:23:22.986+0000: initiating migration
2017-01-19 03:24:38.939+0000: shutting down
2017-01-19T03:24:39.464773Z qemu-kvm: terminating on signal 15 from pid 2669

@Francesco: any idea?

Comment 9 Tomas Jelinek 2017-01-25 10:21:43 UTC
*** Bug 1413847 has been marked as a duplicate of this bug. ***

Comment 10 Francesco Romani 2017-01-25 10:46:10 UTC
It looks like there are a couple of bugs in Vdsm.
1. Vdsm fails to retrieve the migration progress from the libvirt job stats. This is an issue in its own right: without the progress, we fail to update the downtime, which can keep the migration from converging, or make it converge more slowly.
2. There is a race in migration progress reporting. It can make the progress meter go backward, and it is much easier to trigger once we hit bug #1. In this case, the race confused the migration source Vdsm, leading it to believe the migration was NOT completed, while it actually was. What happened:
2.a. Migration attempt #1 completed, despite the lack of downtime adjustment.
2.b. Due to bug #1 and the race, the progress was not set to 100% after the migration completed.
2.c. The migration source handler failed to detect that the migration had completed (because the progress was not 100% when it ended) and started a new one, which failed.
2.d. The Engine saw only the last, failed migration (a bogus error) and acted accordingly.

We will fix both issues.
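A minimal, hypothetical Python sketch of the two fixes described above follows; it is not the actual Vdsm patch. It assumes the job statistics arrive as a dict keyed by libvirt's documented job-stats parameter names `data_total` and `data_remaining` (libvirt-python's `virDomain.jobStats()` returns such a dict). The names `progress_from_job_stats` and `MigrationProgress` are illustrative only. Unavailable stats leave the last value in place instead of failing, the progress never moves backward, and completion is recorded by the thread that saw the migrate call succeed, so a late monitor sample cannot make a finished migration look unfinished.

```python
import threading
from typing import Mapping, Optional


def progress_from_job_stats(stats: Mapping[str, int]) -> Optional[int]:
    """Return migration progress in percent, or None if it cannot be computed.

    Assumption: keys follow libvirt's job-stats names ("data_total",
    "data_remaining"). Returning None lets the caller keep its old value.
    """
    total = stats.get("data_total")
    remaining = stats.get("data_remaining")
    if not total or remaining is None:
        return None
    done = max(0, total - remaining)
    return min(100, done * 100 // total)


class MigrationProgress:
    """Hypothetical thread-safe progress holder that never moves backward."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._percent = 0
        self._completed = False

    def update(self, stats: Mapping[str, int]) -> None:
        # Called periodically by a monitor thread while the migration runs.
        percent = progress_from_job_stats(stats)
        with self._lock:
            if self._completed or percent is None:
                # Migration already finished, or no usable stats:
                # keep the last known value.
                return
            self._percent = max(self._percent, percent)

    def mark_completed(self) -> None:
        # Called by the migration thread once the migrate call succeeded;
        # this, not the percentage, is the authoritative completion signal.
        with self._lock:
            self._completed = True
            self._percent = 100

    @property
    def percent(self) -> int:
        with self._lock:
            return self._percent

    @property
    def completed(self) -> bool:
        with self._lock:
            return self._completed


if __name__ == "__main__":
    p = MigrationProgress()
    p.update({"data_total": 1000, "data_remaining": 400})  # 60%
    p.update({})  # stats unavailable: stays at 60
    p.mark_completed()
    p.update({"data_total": 1000, "data_remaining": 900})  # stale sample ignored
    print(p.percent, p.completed)  # prints "100 True"
```

The design choice is that the source handler checks `completed`, set from the migration call's result, rather than testing whether the progress reached 100%; per 2.c, the percentage test is the check that misfired here.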

Comment 11 Francesco Romani 2017-02-06 15:59:22 UTC
The bug is actually in Vdsm, and it is fixed there. The Engine reacted according to the (false) information reported, so it is innocent.

Comment 12 Red Hat Bugzilla Rules Engine 2017-02-06 15:59:30 UTC
Target release should be set once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.

Comment 13 Francesco Romani 2017-02-06 16:01:46 UTC
No doc_text: this is just a plain bug caused by an unusual, but possible, sequence of events.

Comment 14 Francesco Romani 2017-02-10 12:00:43 UTC
Patches merged into the stable branch -> MODIFIED

Comment 15 Israel Pinto 2017-02-22 11:20:59 UTC
Verified with:
Red Hat Virtualization Manager Version: 4.1.1.2-0.1.el7

Ran the migration sanity tests; all passed.