Bug 1649615

Summary: [downstream clone] engine fails to import external VMs
Product: Red Hat Enterprise Virtualization Manager
Reporter: Germano Veit Michel <gveitmic>
Component: ovirt-engine
Assignee: Ryan Barry <rbarry>
Status: CLOSED ERRATA
QA Contact: Liran Rotenberg <lrotenbe>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 4.2.7
CC: amashah, bugs, dario.pulcinelli, lrotenbe, michal.skrivanek, mkalinin, ratamir, rbarry, Rhev-m-bugs, stirabos, usurse
Target Milestone: ovirt-4.2.8
Keywords: Regression, ZStream
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: ovirt-engine-4.2.8.1
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1639604
Environment:
Last Closed: 2019-01-22 12:44:51 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On: 1639604
Bug Blocks:

Description Germano Veit Michel 2018-11-14 04:21:16 UTC
+++ This bug was initially created as a clone of Bug #1639604 +++

Description of problem:
After vintage deployment, adding a storage domain doesn't add the hosted-engine storage domain and the HE VM to the environment. 

Version-Release number of selected component (if applicable):
ovirt-host-deploy-1.7.5-0.0.master.20180530161905.gitc423dec.el7.noarch
ovirt-engine-appliance-4.2-20181014.1.el7.noarch
ovirt-imageio-common-1.4.5-0.el7.x86_64
ovirt-release42-snapshot-4.2.7-0.2.rc2.20181014014958.gitfb30674.el7.noarch
ovirt-release42-4.2.7-0.2.rc2.20181014014958.gitfb30674.el7.noarch
ovirt-engine-sdk-python-3.6.9.2-0.1.20180209.gite99bbd1.el7.centos.noarch
python-ovirt-engine-sdk4-4.2.9-2.20181004git4d189a6.el7.x86_64
ovirt-provider-ovn-driver-1.2.17-0.20181003135950.git6aa6b37.el7.noarch
cockpit-ovirt-dashboard-0.11.35-1.el7.noarch
ovirt-setup-lib-1.1.6-0.0.master.20180921125403.git90612e6.el7.noarch
ovirt-vmconsole-host-1.0.6-1.el7.noarch
ovirt-host-dependencies-4.2.3-1.el7.x86_64
ovirt-hosted-engine-setup-2.2.29-0.0.master.20181002122252.git9ae169e.el7.noarch
cockpit-machines-ovirt-176-1.el7.noarch
ovirt-imageio-daemon-1.4.5-0.el7.noarch
ovirt-host-4.2.3-1.el7.x86_64
ovirt-vmconsole-1.0.6-1.el7.noarch
ovirt-hosted-engine-ha-2.2.19-0.0.master.20181002122327.20181002122322.gitb449616.el7.noarch
vdsm-4.20.42-4.git43e2555.el7.x86_64
libvirt-4.5.0-10.el7.x86_64
glusterfs-cli-3.12.14-1.el7.x86_64
glusterfs-fuse-3.12.14-1.el7.x86_64
glusterfs-client-xlators-3.12.14-1.el7.x86_64
glusterfs-libs-3.12.14-1.el7.x86_64
glusterfs-3.12.14-1.el7.x86_64
glusterfs-rdma-3.12.14-1.el7.x86_64
glusterfs-api-3.12.14-1.el7.x86_64
libvirt-daemon-driver-storage-gluster-4.5.0-10.el7.x86_64

ovirt-engine-4.2.7.3-0.0.master.20181012152958.gitfc1595b.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. Vintage deploy HE environment (I saw it on gluster storage).
2. Add storage domain to the environment.

Actual results:
The storage domain is added after step 2. No other storage is added, HE-VM is not seen in the web-ui.
On the engine every ~15 seconds there is a task:
"Adding unmanaged VMs running on Host ocelot05.qa.lab.tlv.redhat.com to Cluster Default". 

Expected results:
After adding a storage domain, the auto import should activate and succeed. Another storage domain should be added (the storage of the hosted engine), and the HE VM should be shown in the environment.

Additional info:
It is possible to manually add the HE-VM's storage domain (using import), but the HE-VM is still not shown in the environment; also, the Virtual Machines tab of that storage domain lists no VM. In that case the engine keeps repeating the task above ("Adding unmanaged VMs...").
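
The roughly 15-second cadence reported under "Actual results" could be confirmed by measuring the gaps between successive occurrences of the task message. The sketch below is only illustrative: it assumes each log line starts with a `YYYY-MM-DD HH:MM:SS` timestamp, and the sample lines (including the host name `example-host`) are invented for the example rather than taken from the attached logs.

```python
from datetime import datetime
from typing import Iterable, List

# Substring of the task message quoted in the report.
TASK_TEXT = "Adding unmanaged VMs running on Host"


def task_intervals(lines: Iterable[str], needle: str = TASK_TEXT) -> List[float]:
    """Return the gaps, in seconds, between successive lines containing needle."""
    stamps = []
    for line in lines:
        if needle in line:
            # Assumption: the first 19 characters are 'YYYY-MM-DD HH:MM:SS'.
            stamps.append(datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S"))
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]


# Hypothetical sample lines, made up for illustration only.
sample = [
    "2018-10-16 10:00:00 Adding unmanaged VMs running on Host example-host to Cluster Default",
    "2018-10-16 10:00:07 some unrelated message",
    "2018-10-16 10:00:15 Adding unmanaged VMs running on Host example-host to Cluster Default",
    "2018-10-16 10:00:30 Adding unmanaged VMs running on Host example-host to Cluster Default",
]

print(task_intervals(sample))
```

A steady run of near-identical gaps with no successful import in between would match the behaviour described here.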

--- Additional comment from Liran Rotenberg on 2018-10-16 04:05 EDT ---



--- Additional comment from Liran Rotenberg on 2018-10-16 04:06 EDT ---



--- Additional comment from Simone Tiraboschi on 2018-10-16 04:51:48 EDT ---

This happens on engine side: the engine continuously scans for the external VMs on the host but it never imports them.

For instance, in engine.log in the attached engine-sosreport we can count 43 instances of "Running command: AddUnmanagedVmsCommand internal: true." but none of them either succeeds or fails with a clear error.
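
A count like this can be obtained with a few lines of Python. This is a minimal sketch, not the method actually used for the figure above; the function name `count_marker` and the sample lines are hypothetical, and only the marker string comes from the log message quoted in this comment.

```python
from typing import Iterable

# Message quoted in the comment above.
MARKER = "Running command: AddUnmanagedVmsCommand internal: true."


def count_marker(lines: Iterable[str], marker: str = MARKER) -> int:
    """Count the lines that contain marker."""
    return sum(1 for line in lines if marker in line)


# Hypothetical sample lines, made up for illustration only.
sample_log = [
    "INFO Running command: AddUnmanagedVmsCommand internal: true.",
    "INFO some unrelated entry",
    "INFO Running command: AddUnmanagedVmsCommand internal: true.",
]

print(count_marker(sample_log))
```

Since an open text file iterates line by line, the same function can be given a file object for an engine.log extracted from a sosreport.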

--- Additional comment from Sandro Bonazzola on 2018-10-17 05:30:15 EDT ---

Moving to Ryan, as this has been identified as a Virt-related issue.

--- Additional comment from Simone Tiraboschi on 2018-10-17 05:46:06 EDT ---

I reproduced it also with an external VM that is not related to hosted-engine.

--- Additional comment from Red Hat Bugzilla Rules Engine on 2018-10-17 06:07:59 EDT ---

This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

--- Additional comment from Ryan Barry on 2018-11-07 21:54:42 EST ---

I've repeatedly tried to reproduce this, and failed.

It is not reproducible on master.

It is not reproducible on 4.2.

I'm currently editing an appliance image with a development build of 4.2, but that's a very slow process.

Simone, can you please provide steps to reproduce with another VM? I've created with virt-install and virsh, with both successfully imported (on both 4.2 and 4.3), though neither was on a storage domain -- both were local storage.

I'll continue trying to reproduce with the appliance, but it's a long turnaround?

--- Additional comment from Simone Tiraboschi on 2018-11-08 04:29:42 EST ---

(In reply to Ryan Barry from comment #7)
> I've repeatedly tried to reproduce this, and failed.
> 
> It is not reproducible on master.
> 
> It is not reproducible on 4.2.

It's systematically reproduced on master and on 4.2, please check:
https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_he-basic-suite-4.2/
https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_he-basic-iscsi-suite-4.2/
https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_he-basic-suite-master/
https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_he-basic-iscsi-suite-master/

> Simone, can you please provide steps to reproduce with another VM? I've
> created with virt-install and virsh, with both successfully imported (on
> both 4.2 and 4.3), though neither was on a storage domain -- both were local
> storage.
> 
> I'll continue trying to reproduce with the appliance, but it's a long
> turnaround?

The engine VM to be imported resides on a VDSM-managed Storage Domain and it has been created directly through VDSM.
Maybe it depends on a specific device or something like that.
I'd suggest to try the vintage hosted-engine deployment (deploy with 'hosted-engine --deploy --noansible') and check what happens on that engine.

--- Additional comment from Ryan Barry on 2018-11-08 07:18:47 EST ---

(In reply to Simone Tiraboschi from comment #8)
> It's systematically reproduced on master and on 4.2, please check:
> https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_he-basic-suite-4.2/
> https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_he-basic-iscsi-suite-4.2/
> https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_he-basic-suite-master/
> https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_he-basic-iscsi-suite-master/

Right, but all of these involve hosted engine itself. It doesn't seem to be reproducible outside of hosted engine.

> The engine VM to be imported resides on a VDSM-managed Storage Domain and
> it has been created directly through VDSM.
> Maybe it depends on a specific device or something like that.
> I'd suggest to try the vintage hosted-engine deployment (deploy with
> 'hosted-engine --deploy --noansible') and check what happens on that engine.

I'll test this way as well.

If it's not reproducible there, I'll test with the appliance, but this is an extremely slow process, because to find a root cause, it involves rebuilding the engine RPM, editing the appliance qcow, deploying HE, waiting for it to fail, and repeating.

That's ok, but I wouldn't expect to find a cause until later this week.

--- Additional comment from Michal Skrivanek on 2018-11-12 08:52:29 EST ---

Note that unless someone helps with a reproduction scenario, we'll have to close this. Contrary to comment #5, we're not able to reproduce this with regular VMs.

--- Additional comment from Simone Tiraboschi on 2018-11-12 09:45:41 EST ---

Isn't the engine VM in the vintage flow a good example by itself?

--- Additional comment from Michal Skrivanek on 2018-11-12 10:09:43 EST ---

don't know. It did work for me in bug 1626157. So there is probably something else involved here.

--- Additional comment from Ryan Barry on 2018-11-12 19:39:06 EST ---

Note:

I've also failed to reproduce this with the vintage flow on both NFS and iSCSI.

Liran reproduced on Gluster, but I don't have a gluster environment set up. If this is isolated to a deprecated flow on OST/Gluster only, I'm nacking until we see a "real world" report or a more reliable reproducer is found, since it's unlikely that any HC users will select the vintage flow.

--- Additional comment from Simone Tiraboschi on 2018-11-13 04:49:21 EST ---

(In reply to Ryan Barry from comment #13)
> Note:
> 
> I've also failed to reproduce this with the vintage flow on both NFS and
> iSCSI.
> 
> Liran reproduced on Gluster, but I don't have a gluster environment set up.
> If this is isolated to a deprecated flow on OST/Gluster only, I'm nacking
> until we see a "real world" report or a more reliable reproducer is found,
> since it's unlikely that any HC users will select the vintage flow

We have for sure a report by an upstream user on:
https://lists.ovirt.org/archives/list/users@ovirt.org/message/DOJENTCWFPFQGDT3IZW542POCTTNAZOW/

The user specifies *nested*, and in OST we are running nested as well.
Maybe the issue is just there?

--- Additional comment from Ryan Barry on 2018-11-13 07:04:37 EST ---

Unfortunately, the tests were all run nested.

I'll try to reproduce a couple more times, but it seems extremely reliable in my environment.

Alternatively, if QE can provide an environment where this is reliably reproducible (without actually reproducing it, so I can edit the appliance), that may yield progress

--- Additional comment from Ryan Barry on 2018-11-13 22:05:32 EST ---

Finally got a reproducer, which was actually as trivial as logging into the engine after vintage HE deployment. I'm not sure how it succeeds without the VM registered.

It looks like FullList is not actually returning disks. Patch tomorrow, hopefully

Comment 1 Germano Veit Michel 2018-11-14 04:23:30 UTC
Exact same sequence of logs, a loop on:

VmAnalyzer -> AddUnmanagedVmsCommand -> DumpXmlsVDSCommand

Comment 4 Germano Veit Michel 2018-11-14 04:39:34 UTC
Versions:
vdsm-4.20.43-1.el7ev.x86_64
rhvm-4.2.7.4-0.1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.30-1.el7ev.noarch

Comment 6 Ryan Barry 2018-11-15 11:20:11 UTC
As a temporary workaround, I'd suggest using an older version of the appliance to restore, updating when available. 4.2.4 is definitely safe

Comment 7 RHV bug bot 2018-11-28 14:38:09 UTC
WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Found non-acked flags: '{'rhevm-4.2.z': '?'}', ]

For more info please contact: rhv-devops@redhat.com

Comment 8 Simone Tiraboschi 2018-12-03 08:20:44 UTC
*** Bug 1655172 has been marked as a duplicate of this bug. ***

Comment 9 Dario Pulcinelli 2018-12-04 21:48:15 UTC
Hello,

my case is a little different

I'm not able to manually import the HE storage domain and I'm using FC



Thanks

Comment 10 Ryan Barry 2018-12-04 21:51:24 UTC
Same case, unfortunately. Resolved in current master or 4.2.8

Comment 12 Liran Rotenberg 2018-12-18 08:23:01 UTC
Verified on:
ovirt-engine-4.2.8.1-0.1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.32-1.el7ev.noarch
rhvm-appliance-4.2-20181212.0.el7.noarch

Steps:
1. Vintage deploy HE environment (NFS used).
2. Add storage domain to the environment.

Results:
After adding a storage domain, the auto import succeeded. The hosted-engine storage domain was added and the HE VM was imported into the environment.
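
To check whether a given engine build is at or past the fixed version (this report's "Fixed In Version" is ovirt-engine-4.2.8.1), a dotted version can be compared as a tuple of integers. This is a rough sketch under stated assumptions: the helper names `parse_version` and `has_fix` are hypothetical, the input must be the purely numeric part of the version (no release suffix such as `-0.1.el7ev`), and it is assumed that upstream and downstream version numbers line up.

```python
from typing import Tuple

# From this report's "Fixed In Version" field: ovirt-engine-4.2.8.1
FIXED_IN = (4, 2, 8, 1)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted numeric version such as '4.2.7.4' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def has_fix(version: str, fixed_in: Tuple[int, ...] = FIXED_IN) -> bool:
    """Return True if version is at or past fixed_in."""
    return parse_version(version) >= fixed_in


print(has_fix("4.2.7.4"))  # numeric part of the rhvm version in Comment 4
print(has_fix("4.2.8.1"))  # numeric part of the version verified in Comment 12
```

The check only says whether a build is at or past the fixed version. It does not say that every older build is affected; Comment 6 notes that the 4.2.4 appliance is safe.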

Comment 13 Ryan Barry 2018-12-26 00:14:41 UTC
*** Bug 1661990 has been marked as a duplicate of this bug. ***

Comment 18 errata-xmlrpc 2019-01-22 12:44:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0121