Bug 1390557

Summary: Migrating "Powering Up" VM will drop the VM to the "pause" state

Product: [oVirt] vdsm
Component: Core
Version: 4.18.15
Hardware: x86_64
OS: Linux
Status: CLOSED INSUFFICIENT_DATA
Severity: high
Priority: unspecified
Reporter: Artyom <alukiano>
Assignee: Dan Kenigsberg <danken>
QA Contact: Artyom <alukiano>
Docs Contact:
CC: alukiano, bugs, gklein, mavital, tjelinek
Target Milestone: ---
Target Release: ---
Flags: rule-engine: planning_ack?, devel_ack?, testing_ack?
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-10 11:44:18 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Attachments: logs

Description Artyom 2016-11-01 11:56:41 UTC
Created attachment 1216071 [details]
logs

Description of problem:
I see this bug only in automation runs.
The test starts the VM and, while the VM is in the "Powering Up" state, migrates it.

Version-Release number of selected component (if applicable):
vdsm-yajsonrpc-4.18.15.2-1.el7ev.noarch
vdsm-jsonrpc-4.18.15.2-1.el7ev.noarch
vdsm-hook-vmfex-dev-4.18.15.2-1.el7ev.noarch
vdsm-hook-openstacknet-4.18.15.2-1.el7ev.noarch
vdsm-api-4.18.15.2-1.el7ev.noarch
vdsm-infra-4.18.15.2-1.el7ev.noarch
vdsm-hook-vhostmd-4.18.15.2-1.el7ev.noarch
vdsm-python-4.18.15.2-1.el7ev.noarch
vdsm-cli-4.18.15.2-1.el7ev.noarch
vdsm-4.18.15.2-1.el7ev.x86_64
vdsm-hook-ethtool-options-4.18.15.2-1.el7ev.noarch
vdsm-xmlrpc-4.18.15.2-1.el7ev.noarch
vdsm-hook-fcoe-4.18.15.2-1.el7ev.noarch

ovirt-engine-4.0.5.4-0.1.el7ev.noarch

How reproducible:
About 10%, and only under automation tests

Steps to Reproduce:
1. Start the VM
2. Migrate the VM while it is in the "Powering Up" state (see the sketch below)
3.
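
A minimal sketch of what the automated test does, assuming the oVirt Python SDK (ovirtsdk4); the engine URL, credentials and VM name are placeholders, not taken from this bug:

import time

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# Placeholder connection details, not from this bug.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='secret',
    insecure=True,
)

vms_service = connection.system_service().vms_service()
vm = vms_service.list(search='name=migration-test')[0]   # hypothetical VM name
vm_service = vms_service.vm_service(vm.id)

# Step 1: start the VM.
vm_service.start()

# Step 2: migrate it as soon as it reports "Powering Up".
while vm_service.get().status != types.VmStatus.POWERING_UP:
    time.sleep(1)
vm_service.migrate()   # let the engine pick the destination host

connection.close()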

Actual results:
Migration succeeds, but the VM appears in the paused state on the destination host

Expected results:
Migration succeeds, and the VM is in the "Up" state

Additional info:
I am not sure whether the bug relates to the engine or to vdsm.

Comment 1 Dan Kenigsberg 2016-11-01 12:34:33 UTC
Is this a regression? Did the same test pass with vdsm-v4.18.2?

What happens if you manually continue the VM on destination? Does the guest start running all right?
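
For reference, a hedged sketch of manually "continuing" the paused guest on the destination host via the libvirt Python bindings (the same effect as "virsh resume"); the domain name is a placeholder:

import libvirt

conn = libvirt.open('qemu:///system')      # run this on the destination host
dom = conn.lookupByName('migration-test')  # hypothetical domain name

# A migrated-but-paused guest reports VIR_DOMAIN_PAUSED; resume() is the
# libvirt equivalent of "continue".
state, _reason = dom.state()
if state == libvirt.VIR_DOMAIN_PAUSED:
    dom.resume()

conn.close()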

Comment 2 Artyom 2016-11-01 13:20:55 UTC
As I said, it happens to me only under automation runs (not in the same test case), so I cannot reproduce it manually.
I believe it is a regression because the bug appears only in the last automation runs, but again, I am not 100% sure, because of all the tests we run it happens only for a single test case.

Comment 3 Dan Kenigsberg 2016-11-01 13:29:18 UTC
Artyom, what is the most recent version of vdsm that passes this test 100%? Can you stop the test after the failure, in order to see what happens inside the guest and whether a manual "continue" solves things?

Comment 4 Artyom 2016-11-01 14:01:46 UTC
We do not have such a long history of runs in Jenkins, but from email I can see that 4.0.4-1 does not have tests that fail because of this bug.
I will try to catch the bug in my local automation environment and check the guest, but I am not sure I will succeed.
Until then, can someone take a look at the logs?

Comment 5 Tomas Jelinek 2016-11-08 12:16:42 UTC
Hmmm, I have not found anything too interesting in the logs. Could you please try to provide the qemu and libvirt logs from the destination host?
Thanks

Comment 6 Artyom 2016-11-10 10:55:31 UTC
I do not have libvirt and qemu logs for the automation run.
I tried to reproduce this problem locally, but without success.
100 iterations of the following (see the sketch after this comment):
1) Start the VM
2) Migrate the VM straight after start
3) Stop the VM

Maybe this problem is related to a specific problematic host.

I believe you can close it as INSUFFICIENT_DATA; if I see this error again, I will attach logs.
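
A sketch of the 100-iteration local attempt described above, again assuming the oVirt Python SDK (ovirtsdk4) with placeholder engine URL, credentials and VM name:

import time

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# Placeholder connection details, as in the earlier sketch.
connection = sdk.Connection(url='https://engine.example.com/ovirt-engine/api',
                            username='admin@internal', password='secret',
                            insecure=True)
vms_service = connection.system_service().vms_service()
vm = vms_service.list(search='name=migration-test')[0]   # hypothetical VM name
vm_service = vms_service.vm_service(vm.id)

def wait_for(wanted, timeout=300):
    # Poll until the VM reaches one of the wanted statuses.
    deadline = time.time() + timeout
    while time.time() < deadline:
        status = vm_service.get().status
        if status in wanted:
            return status
        time.sleep(2)
    raise RuntimeError('VM did not reach %s in time' % (wanted,))

for i in range(100):
    vm_service.start()                              # 1) start the VM
    wait_for({types.VmStatus.POWERING_UP, types.VmStatus.UP})

    vm_service.migrate()                            # 2) migrate straight after start
    status = wait_for({types.VmStatus.UP, types.VmStatus.PAUSED})
    if status == types.VmStatus.PAUSED:
        print('iteration %d: VM ended up paused after migration' % i)
        break

    vm_service.stop()                               # 3) stop the VM
    wait_for({types.VmStatus.DOWN})

connection.close()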

Comment 7 Tomas Jelinek 2016-11-10 11:44:18 UTC
Thank you!
So closing as INSUFFICIENT_DATA, since without the libvirt/qemu logs there is not much we can do.

If it happens again and you have these logs, please reopen.