+++ This bug was initially created as a clone of Bug #1639604 +++

Description of problem:
After vintage deployment, adding a storage domain doesn't add the hosted-engine storage domain and the HE VM to the environment.

Version-Release number of selected component (if applicable):
ovirt-host-deploy-1.7.5-0.0.master.20180530161905.gitc423dec.el7.noarch
ovirt-engine-appliance-4.2-20181014.1.el7.noarch
ovirt-imageio-common-1.4.5-0.el7.x86_64
ovirt-release42-snapshot-4.2.7-0.2.rc2.20181014014958.gitfb30674.el7.noarch
ovirt-release42-4.2.7-0.2.rc2.20181014014958.gitfb30674.el7.noarch
ovirt-engine-sdk-python-3.6.9.2-0.1.20180209.gite99bbd1.el7.centos.noarch
python-ovirt-engine-sdk4-4.2.9-2.20181004git4d189a6.el7.x86_64
ovirt-provider-ovn-driver-1.2.17-0.20181003135950.git6aa6b37.el7.noarch
cockpit-ovirt-dashboard-0.11.35-1.el7.noarch
ovirt-setup-lib-1.1.6-0.0.master.20180921125403.git90612e6.el7.noarch
ovirt-vmconsole-host-1.0.6-1.el7.noarch
ovirt-host-dependencies-4.2.3-1.el7.x86_64
ovirt-hosted-engine-setup-2.2.29-0.0.master.20181002122252.git9ae169e.el7.noarch
cockpit-machines-ovirt-176-1.el7.noarch
ovirt-imageio-daemon-1.4.5-0.el7.noarch
ovirt-host-4.2.3-1.el7.x86_64
ovirt-vmconsole-1.0.6-1.el7.noarch
ovirt-hosted-engine-ha-2.2.19-0.0.master.20181002122327.20181002122322.gitb449616.el7.noarch
vdsm-4.20.42-4.git43e2555.el7.x86_64
libvirt-4.5.0-10.el7.x86_64
glusterfs-cli-3.12.14-1.el7.x86_64
glusterfs-fuse-3.12.14-1.el7.x86_64
glusterfs-client-xlators-3.12.14-1.el7.x86_64
glusterfs-libs-3.12.14-1.el7.x86_64
glusterfs-3.12.14-1.el7.x86_64
glusterfs-rdma-3.12.14-1.el7.x86_64
glusterfs-api-3.12.14-1.el7.x86_64
libvirt-daemon-driver-storage-gluster-4.5.0-10.el7.x86_64
ovirt-engine-4.2.7.3-0.0.master.20181012152958.gitfc1595b.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. Vintage deploy HE environment (I saw it on gluster storage).
2. Add a storage domain to the environment.

Actual results:
The storage domain is added after step 2.
No other storage is added, and the HE VM is not seen in the web UI. On the engine, every ~15 seconds there is a task: "Adding unmanaged VMs running on Host ocelot05.qa.lab.tlv.redhat.com to Cluster Default".

Expected results:
After adding a storage domain, the auto-import should activate and succeed. Another storage domain should be added - the storage domain of the hosted engine - and the HE VM should be shown in the environment.

Additional info:
It is possible to manually add the storage domain of the HE VM (using import), but the HE VM is still not shown in the environment; also, under that specific storage domain's virtual machines tab there is no VM. In that case the engine keeps running the task above ("Adding unmanaged VMs...").

--- Additional comment from Liran Rotenberg on 2018-10-16 04:05 EDT ---

--- Additional comment from Liran Rotenberg on 2018-10-16 04:06 EDT ---

--- Additional comment from Simone Tiraboschi on 2018-10-16 04:51:48 EDT ---

This happens on the engine side: the engine continuously scans for the external VMs on the host, but it never imports them.
For instance, in engine.log in the attached engine-sosreport we can count 43 instances of "Running command: AddUnmanagedVmsCommand internal: true.", but none of them succeeds or fails with a clear error.

--- Additional comment from Sandro Bonazzola on 2018-10-17 05:30:15 EDT ---

Moving to Ryan as this has been identified as a Virt-related issue.

--- Additional comment from Simone Tiraboschi on 2018-10-17 05:46:06 EDT ---

I reproduced it also with an external VM that is not related to hosted-engine.

--- Additional comment from Red Hat Bugzilla Rules Engine on 2018-10-17 06:07:59 EDT ---

This bug report has Keywords: Regression or TestBlocker. Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

--- Additional comment from Ryan Barry on 2018-11-07 21:54:42 EST ---

I've repeatedly tried to reproduce this, and failed.
It is not reproducible on master. It is not reproducible on 4.2. I'm currently editing an appliance image with a development build of 4.2, but that's a very slow process.

Simone, can you please provide steps to reproduce with another VM? I've created VMs with virt-install and virsh, and both were successfully imported (on both 4.2 and 4.3), though neither was on a storage domain -- both were on local storage. I'll continue trying to reproduce with the appliance, but it's a long turnaround.

--- Additional comment from Simone Tiraboschi on 2018-11-08 04:29:42 EST ---

(In reply to Ryan Barry from comment #7)
> I've repeatedly tried to reproduce this, and failed.
>
> It is not reproducible on master.
>
> It is not reproducible on 4.2.

It's systematically reproduced on master and on 4.2, please check:
https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_he-basic-suite-4.2/
https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_he-basic-iscsi-suite-4.2/
https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_he-basic-suite-master/
https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_he-basic-iscsi-suite-master/

> Simone, can you please provide steps to reproduce with another VM? I've
> created with virt-install and virsh, with both successfully imported (on
> both 4.2 and 4.3), though neither was on a storage domain -- both were local
> storage.
>
> I'll continue trying to reproduce with the appliance, but it's a long
> turnaround?

The engine VM to be imported resides on a VDSM-managed storage domain, and it has been directly created through VDSM.
Maybe it depends on specific devices or something like that.
I'd suggest trying the vintage hosted-engine deployment (deploy with 'hosted-engine --deploy --noansible') and checking what happens on that engine.
--- Additional comment from Ryan Barry on 2018-11-08 07:18:47 EST ---

(In reply to Simone Tiraboschi from comment #8)
> It's systematically reproduced on master and on 4.2, please check:
> https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_he-basic-suite-4.2/
> https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_he-basic-iscsi-suite-4.2/
> https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_he-basic-suite-master/
> https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_he-basic-iscsi-suite-master/

Right, but all of these involve hosted engine itself. It doesn't seem to be reproducible outside of hosted engine.

> That engine VM to be imported resides on a VDSM managed Storage Domain and
> it has been directly created through VDSM.
> Maybe it depends on specific devices or something like that.
> I'd suggest to try the vintage hosted-engine deployment (deploy with
> 'hosted-engine --deploy --noansible') and check what happens on that engine.

I'll test this way as well. If it's not reproducible there, I'll test with the appliance, but this is an extremely slow process: to find a root cause, it involves rebuilding the engine RPM, editing the appliance qcow, deploying HE, waiting for it to fail, and repeating. That's OK, but I wouldn't expect to find a cause until later this week.

--- Additional comment from Michal Skrivanek on 2018-11-12 08:52:29 EST ---

Note that unless anyone helps with a reproduction scenario, we'll have to close this. Contrary to comment #5, we're not able to reproduce this with regular VMs.

--- Additional comment from Simone Tiraboschi on 2018-11-12 09:45:41 EST ---

Isn't the engine VM in the vintage flow a good example by itself?

--- Additional comment from Michal Skrivanek on 2018-11-12 10:09:43 EST ---

I don't know. It did work for me in bug 1626157, so there is probably something else involved here.
--- Additional comment from Ryan Barry on 2018-11-12 19:39:06 EST ---

Note:

I've also failed to reproduce this with the vintage flow on both NFS and iSCSI.

Liran reproduced on Gluster, but I don't have a gluster environment set up. If this is isolated to a deprecated flow on OST/Gluster only, I'm nacking until we see a "real world" report or a more reliable reproducer is found, since it's unlikely that any HC users will select the vintage flow.

--- Additional comment from Simone Tiraboschi on 2018-11-13 04:49:21 EST ---

(In reply to Ryan Barry from comment #13)
> I've also failed to reproduce this with the vintage flow on both NFS and
> iSCSI.
>
> Liran reproduced on Gluster, but I don't have a gluster environment set up.

We have for sure a report by an upstream user on:
https://lists.ovirt.org/archives/list/users@ovirt.org/message/DOJENTCWFPFQGDT3IZW542POCTTNAZOW/

The user specifies *nested*, and in OST we are running nested as well. Maybe the issue is just there?

--- Additional comment from Ryan Barry on 2018-11-13 07:04:37 EST ---

Unfortunately, the tests were all run nested. I'll try to reproduce a couple more times, but it seems extremely reliable in my environment. Alternatively, if QE can provide an environment where this is reliably reproducible (without actually reproducing it, so I can edit the appliance), that may yield progress.

--- Additional comment from Ryan Barry on 2018-11-13 22:05:32 EST ---

Finally got a reproducer, which was actually as trivial as logging into the engine after vintage HE deployment. I'm not sure how it succeeds without the VM registered. It looks like FullList is not actually returning disks. Patch tomorrow, hopefully.
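To illustrate the suspected failure mode (FullList returning a VM without its disk devices, so the import never completes), here is a minimal sketch. This is a hypothetical, simplified model, not actual engine code: the function name `try_import_unmanaged_vm` and the dict shapes are illustrative, and real FullList output contains far more fields.

```python
# Hypothetical, simplified model of the failure mode described above:
# the engine's periodic scan finds an external VM, but the device dump
# comes back without any disk devices, so the import is skipped and the
# "Adding unmanaged VMs..." task fires again on the next cycle.

def try_import_unmanaged_vm(vm_dump):
    """Import the VM only if its device dump includes at least one disk."""
    devices = vm_dump.get("devices", [])
    if not any(d.get("type") == "disk" for d in devices):
        # No disks in the dump: skip the import, so the scan will retry.
        return False
    # A real importer would register the VM and its disks here.
    return True

# A dump missing its disks (as hypothesized in this report) vs. a complete one.
broken_dump = {"vmName": "HostedEngine", "devices": [{"type": "interface"}]}
fixed_dump = {"vmName": "HostedEngine",
              "devices": [{"type": "interface"}, {"type": "disk"}]}
print(try_import_unmanaged_vm(broken_dump))  # False
print(try_import_unmanaged_vm(fixed_dump))   # True
```

This matches the observed symptom: the import neither succeeds nor raises a clear error, so the engine just keeps retrying.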
Exact same sequence of logs, a loop on: VmAnalyzer -> AddUnmanagedVmsCommand -> DumpXmlsVDSCommand
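One way to quantify this retry loop is to count the repeated AddUnmanagedVmsCommand runs in engine.log (comment #4 counted 43 of them). The log-line text below is taken from this report; the script itself is only a sketch, shown here against a trimmed embedded sample rather than a real log file.

```python
import re

# Trimmed sample lines in the spirit of the loop described above
# (real engine.log lines also carry timestamps, thread and correlation IDs).
SAMPLE_LOG = """\
INFO  Running command: AddUnmanagedVmsCommand internal: true.
INFO  START, DumpXmlsVDSCommand(HostName = host1), log id: 1a
INFO  Running command: AddUnmanagedVmsCommand internal: true.
INFO  START, DumpXmlsVDSCommand(HostName = host1), log id: 1b
"""

def count_import_attempts(log_text):
    """Count repeated AddUnmanagedVmsCommand runs, i.e. import retries."""
    return len(re.findall(r"Running command: AddUnmanagedVmsCommand", log_text))

print(count_import_attempts(SAMPLE_LOG))  # 2
```

Against a real engine.log, the same pattern count growing steadily over time (roughly one hit every ~15 seconds) indicates the import loop described here.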
Versions:
vdsm-4.20.43-1.el7ev.x86_64
rhvm-4.2.7.4-0.1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.30-1.el7ev.noarch
As a temporary workaround, I'd suggest using an older version of the appliance to restore, and updating when a fixed version is available. 4.2.4 is definitely safe.
WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason: [Found non-acked flags: '{'rhevm-4.2.z': '?'}', ] For more info please contact: rhv-devops
*** Bug 1655172 has been marked as a duplicate of this bug. ***
Hello, my case is a little different: I'm not able to manually import the HE storage domain, and I'm using FC. Thanks.
Same case, unfortunately. Resolved in current master or 4.2.8.
Verified on:
ovirt-engine-4.2.8.1-0.1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.32-1.el7ev.noarch
rhvm-appliance-4.2-20181212.0.el7.noarch

Steps:
1. Vintage deploy HE environment (NFS used).
2. Add a storage domain to the environment.

Results:
After adding a storage domain, the auto-import succeeded. The hosted-engine storage domain was added and the HE VM was imported into the environment.
*** Bug 1661990 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0121