Description of problem:

We attempted to run an unattended install (--config-append) on a second host in an oVirt environment. We were able to perform this successfully before with a hosted-engine SD name of 'int_domain'. After reading bug 1269768 we changed the SD name to 'hosted_storage' so that the domain could be imported into oVirt. After doing this, our unattended install on a second host began to fail with the VDSM error below.

We performed the following tests with the following results:

-- Hosted-engine SD name "hosted_storage"
1) Unattended install (--config-append) on host 2 -> Failed
2) Interactive install (no config-append) on host 2 -> Works

-- Hosted-engine SD name "int_domain"
1) Unattended install (--config-append) on host 2 -> Works
2) Interactive install (no config-append) on host 2 -> Works

NOTE: Both unattended installs used exactly the same answer file except for the SD name.

Thread-138::INFO::2016-02-23 15:03:06,122::logUtils::48::dispatcher::(wrapper) Run and protect: prepareImage(sdUUID='1c0494f1-8931-4f51-93c5-dae8f927ab2e', spUUID='00000001-0001-0001-0001-00000000011a', imgUUID='c8ddbc32-1b14-4a18-b94a-497a8724fcb5', leafUUID='c3d30793-5dce-4fab-81b9-cb70b64caad2')
Thread-138::ERROR::2016-02-23 15:03:06,122::task::866::Storage.TaskManager.Task::(_setError) Task=`1a5bdf78-8507-4b35-b4c1-7e7439946694`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 3211, in prepareImage
    self.getPool(spUUID)
  File "/usr/share/vdsm/storage/hsm.py", line 322, in getPool
    raise se.StoragePoolUnknown(spUUID)
StoragePoolUnknown: Unknown pool id, pool not connected: ('00000001-0001-0001-0001-00000000011a',)

Version-Release number of selected component (if applicable):

ovirt-hosted-engine-setup-1.3.2.3-1.el7.centos.noarch
libgovirt-0.3.3-1.el7.x86_64
ovirt-vmconsole-host-1.0.0-1.el7.centos.noarch
ovirt-host-deploy-1.4.1-1.el7.centos.noarch
ovirt-hosted-engine-ha-1.3.3.7-1.el7.centos.noarch
ovirt-vmconsole-1.0.0-1.el7.centos.noarch
ovirt-setup-lib-1.0.1-1.el7.centos.noarch
ovirt-engine-sdk-python-3.6.2.1-1.el7.centos.noarch
vdsm-yajsonrpc-4.17.18-0.el7.centos.noarch
vdsm-infra-4.17.18-0.el7.centos.noarch
vdsm-xmlrpc-4.17.18-0.el7.centos.noarch
vdsm-gluster-4.17.18-0.el7.centos.noarch
vdsm-python-4.17.18-0.el7.centos.noarch
vdsm-jsonrpc-4.17.18-0.el7.centos.noarch
vdsm-4.17.18-0.el7.centos.noarch
vdsm-hook-vmfex-dev-4.17.18-0.el7.centos.noarch
vdsm-cli-4.17.18-0.el7.centos.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy a hosted engine with an SD name of "hosted_storage".
2. Attempt an unattended (--config-append) install on host 2.
3. The install fails with the VDSM error above.

Actual results:
The second host fails to deploy unattended with the VDSM error above.

Expected results:
The second host is deployed and configured into oVirt.

Additional info:
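For reference, this is roughly how we drive the unattended install on the additional hosts. It is a minimal sketch only: the answer-file path is a hypothetical placeholder, and the hosted-engine --deploy --config-append=<file> invocation is assumed to match the syntax referred to throughout this report.

#!/usr/bin/env python
# Minimal sketch of the unattended (--config-append) deployment on host 2.
# The answer-file path below is a placeholder; substitute the answer file
# attached to this bug (SD name 'hosted_storage' or 'int_domain').
import subprocess
import sys

ANSWER_FILE = "/root/he-answers.conf"  # hypothetical path


def deploy_unattended(answer_file):
    """Run hosted-engine --deploy seeded with a pre-built answer file."""
    cmd = [
        "hosted-engine",
        "--deploy",
        "--config-append={0}".format(answer_file),
    ]
    return subprocess.call(cmd)


if __name__ == "__main__":
    sys.exit(deploy_unattended(ANSWER_FILE))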
Created attachment 1129946 [details]
/var/log/vdsm/mom.log

Created attachment 1129947 [details]
/var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup*.log

Created attachment 1129948 [details]
/var/log/vdsm/vdsm.log
The major difference we noted was in the vdsm error in comment #1: the spUUID in the error is not all zeros, whereas when we did the interactive install, as well as the unattended install with a different SD name, the spUUID was set to all zeros ("spUUID='00000000-0000-0000-0000-000000000000'"). Whenever the spUUID was set to all zeros the install worked, but when it was not all zeros the install failed with the vdsm error above.

We tested the above scenario with 3 nodes in each test, i.e. when we attempted the unattended install with a hosted-engine SD name of 'hosted_storage' we attempted it on 3 hosts with the same result each time, and when we tested the unattended install with a hosted-engine SD name of 'int_domain' we tested it on 3 hosts and all three worked.
On the unattended install that failed, our answer file has the following line:

OVEHOSTED_STORAGE/spUUID=str:00000000-0000-0000-0000-000000000000

But looking at the logs and output, the unattended install is changing that value to the spUUID seen above. When it makes that change, the install does not succeed.
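To double-check this, we compared the value seeded in the answer file against the value vdsm logs in the prepareImage() call. Below is a minimal sketch of that comparison, assuming a hypothetical local answer-file path and the plain key=str:value format of the answer files attached to this bug.

# Sketch of the comparison between the spUUID seeded in the answer file and
# the spUUID that appears in the vdsm prepareImage() call above.
# ANSWER_FILE is a placeholder path for the file attached to this bug.
BLANK_UUID = "00000000-0000-0000-0000-000000000000"
ANSWER_FILE = "/root/he-answers.conf"  # hypothetical path
VDSM_SPUUID = "00000001-0001-0001-0001-00000000011a"  # from the log above


def answer_file_spuuid(path):
    """Return the OVEHOSTED_STORAGE/spUUID value from the answer file."""
    with open(path) as f:
        for line in f:
            if line.startswith("OVEHOSTED_STORAGE/spUUID=str:"):
                return line.strip().split("=str:", 1)[1]
    return None


seeded = answer_file_spuuid(ANSWER_FILE)
print("answer file spUUID: %s" % seeded)
print("vdsm call spUUID:   %s" % VDSM_SPUUID)
print("seeded value is the blank UUID: %s" % (seeded == BLANK_UUID))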
Created attachment 1129956 [details]
Answer file for SD name of 'hosted_storage'

This is the answer file we are using for the hosted-engine SD name of 'hosted_storage'.
Created attachment 1129957 [details]
Answer file for SD name of 'int_domain'

This is the answer file we are using for the SD name of 'int_domain', which works.
The issue is not directly related to the storage domain name but to the fact that the auto-import procedure attaches the hosted-engine storage domain to the engine storage pool (the auto-import fails if the name is different, as per rhbz#1269768).

The interactive setup works because in that case the first host's answer file gets parsed only at the 'SYSTEM CONFIGURATION' stage, which comes after the 'STORAGE CONFIGURATION' stage (where we check the connected storage pool UUID), whereas an answer file appended on the command line is parsed in earlier stages.
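To make the ordering explicit for anyone hitting this, the toy listing below is purely illustrative (it is not the real otopi/hosted-engine-setup code); it only visualizes where the answer-file parsing falls relative to the 'STORAGE CONFIGURATION' stage in the two flows described above.

# Purely illustrative: visualize where the answer file is parsed relative to
# the STORAGE CONFIGURATION stage in the two flows described in this comment.
INTERACTIVE = [
    "STORAGE CONFIGURATION  (connected storage pool UUID is checked here)",
    "SYSTEM CONFIGURATION   (first host's answer file is parsed here)",
]
UNATTENDED = [
    "early stages           (--config-append answer file is parsed here)",
    "STORAGE CONFIGURATION  (connected storage pool UUID is checked here)",
    "SYSTEM CONFIGURATION",
]

for flow, stages in (("interactive", INTERACTIVE),
                     ("unattended (--config-append)", UNATTENDED)):
    print(flow)
    for idx, stage in enumerate(stages, 1):
        print("  %d. %s" % (idx, stage))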
Any possibility that this fix can be incorporated in 3.6.4? This breaks our deployment scenario, as we rely on multiple oVirt hosts.
(In reply to Charlie Inglese from comment #9)
> Any possibility that this fix can be incorporated in 3.6.4? This breaks our
> deployment scenario, as we rely on multiple oVirt hosts.

It should be there.
This probably has something in common with https://bugzilla.redhat.com/show_bug.cgi?id=1306825. I think it is the same root cause.
Works for me on these components:

libvirt-client-1.2.17-13.el7_2.4.x86_64
ovirt-hosted-engine-setup-1.3.4.0-1.el7ev.noarch
sanlock-3.2.4-2.el7_2.x86_64
ovirt-vmconsole-host-1.0.0-1.el7ev.noarch
ovirt-host-deploy-1.4.1-1.el7ev.noarch
qemu-kvm-rhev-2.3.0-31.el7_2.10.x86_64
mom-0.5.2-1.el7ev.noarch
ovirt-setup-lib-1.0.1-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.5.1-1.el7ev.noarch
ovirt-vmconsole-1.0.0-1.el7ev.noarch
vdsm-4.17.23.1-0.el7ev.noarch

Red Hat Enterprise Linux Server release 7.2 (Maipo)
Linux alma04.qa.lab.tlv.redhat.com 3.10.0-327.13.1.el7.x86_64 #1 SMP Mon Feb 29 13:22:02 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
This fix was not propagated to the ovirt-engine-appliance RPM. Within the latest 3.6 ovirt repo, the latest ovirt-engine-appliance RPM is ovirt-engine-appliance-3.6-20160301.1.el7.centos.noarch.rpm.
(In reply to Charlie Inglese from comment #14)
> This fix was not propagated to the ovirt-engine-appliance RPM. Within the
> latest 3.6 ovirt repo, the latest ovirt-engine-appliance RPM is
> ovirt-engine-appliance-3.6-20160301.1.el7.centos.noarch.rpm.

Our team is also using the appliance for our deployment. Will there be a new appliance spun up for 3.6.4? If not, this will severely impact our deployment as we are dependent on this fix in 3.6.4 but also dependent on the appliance currently.

Thanks
You can gather the latest pre-releases of the engine appliance for 3.6 for testing purposes from here:
http://jenkins.ovirt.org/job/ovirt-appliance_ovirt-3.6_build-artifacts-el7-x86_64/

But in order to get this issue fixed you don't need a fresh appliance, since the issue was in hosted-engine-setup.
(In reply to Simone Tiraboschi from comment #16)
> You can gather latest pre-releases of the engine appliance for 3.6 for
> testing purposes from here:
> http://jenkins.ovirt.org/job/ovirt-appliance_ovirt-3.6_build-artifacts-el7-
> x86_64/
>
> But in order to get this issue fixed you don't need a new fresh appliance
> since the issue was on hosted-engine-setup

Yes, that is true, but by deploying the old appliance we do not get the new ovirt-engine RPM installed on the engine host, so we have a mixed bag of versions. Do you know whether all the new RPMs were tested against the old 3.6.3.4 version of the engine? We would also like to have the latest engine, 3.6.4.
(In reply to Simone Tiraboschi from comment #16)
> You can gather latest pre-releases of the engine appliance for 3.6 for
> testing purposes from here:
> http://jenkins.ovirt.org/job/ovirt-appliance_ovirt-3.6_build-artifacts-el7-
> x86_64/

This provides the latest stable build, not a pre-release.

> But in order to get this issue fixed you don't need a new fresh appliance
> since the issue was on hosted-engine-setup

Handling bug #1325332 right now; the appliance will be ready in ~15 minutes.