Bug 1311317 - Unattended Ovirt host deploy fails on second host when the engine already imported the hosted-engine storage domain
Status: CLOSED CURRENTRELEASE
Product: ovirt-hosted-engine-setup
Classification: oVirt
Component: General
Version: 1.3.2.3
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ovirt-3.6.5
Target Release: 1.3.4.0
Assigned To: Simone Tiraboschi
QA Contact: Nikolai Sednev
Whiteboard: Triaged
Depends On:
Blocks: 1315808
Reported: 2016-02-23 18:00 EST by Bryan
Modified: 2017-05-11 05:26 EDT (History)
CC: 9 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: In the interactive flow, the answer file is fetched from the first host only after the storage device has been validated. A user can instead append a custom answer file on the CLI to achieve a fully unattended setup; in that case the answer file is parsed earlier, so storage validation can fail. Consequence: Storage validation fails because the initial answer file no longer matches the configuration produced by the auto-import of the hosted-engine storage domain. Fix: Honor the BLANK storage-pool UUID also when it is supplied via the answer file. Result: The unattended deployment works as expected.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-04-21 10:41:28 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Integration
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: ovirt-3.6.z+
ylavi: planning_ack+
sbonazzo: devel_ack+
mavital: testing_ack+


Attachments
- /var/log/vdsm/mom.log (645.50 KB, text/plain), 2016-02-23 18:01 EST, Charlie Inglese
- /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup*.log (380.10 KB, text/plain), 2016-02-23 18:02 EST, Charlie Inglese
- /var/log/vdsm/vdsm.log (915.08 KB, text/plain), 2016-02-23 18:02 EST, Charlie Inglese
- Answer file for SD name of 'hosted_storage' (3.51 KB, text/plain), 2016-02-23 18:18 EST, Bryan
- Answer file for SD name of 'int_domain' (3.51 KB, text/plain), 2016-02-23 18:18 EST, Bryan


External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 53975 master MERGED storage: honor SP=BLANK if provided on CLI answerfile 2016-02-25 09:11 EST
oVirt gerrit 53976 ovirt-hosted-engine-setup-1.3 MERGED storage: honor SP=BLANK if provided on CLI answerfile 2016-02-25 09:12 EST

Description Bryan 2016-02-23 18:00:34 EST
Description of problem:
We attempted to run an unattended install (--config-append) on a second host in an oVirt environment. We had previously done this successfully with a hosted-engine SD name of 'int_domain'. After reading bug 1269768 we changed the SD name to 'hosted_storage' so the domain could be imported into oVirt. After doing this, our unattended install on a second host began to fail with the VDSM error below. We performed the following tests with the following results:

-- Hosted engine SD name "hosted_storage"
1) unattended install (--config-append) on host 2   -> Failed
2) Interactive install (no config-append) on host 2 -> Works

-- Hosted engine SD name "int_domain"
1) unattended install (--config-append) on host 2   -> Works
2) Interactive install (no config-append) on host 2 -> Works

NOTE: Both unattended installs used the exact same answer file except for the SD name.

Thread-138::INFO::2016-02-23 15:03:06,122::logUtils::48::dispatcher::(wrapper) Run and protect: prepareImage(sdUUID='1c0494f1-8931-4f51-93c5-dae8f927ab2e', spUUID='00000001-0001-0001-0001-00000000011a', imgUUID='c8ddbc32-1b14-4a18-b94a-497a8724fcb5', leafUUID='c3d30793-5dce-4fab-81b9-cb70b64caad2')
Thread-138::ERROR::2016-02-23 15:03:06,122::task::866::Storage.TaskManager.Task::(_setError) Task=`1a5bdf78-8507-4b35-b4c1-7e7439946694`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 3211, in prepareImage
    self.getPool(spUUID)
  File "/usr/share/vdsm/storage/hsm.py", line 322, in getPool
    raise se.StoragePoolUnknown(spUUID)
StoragePoolUnknown: Unknown pool id, pool not connected: ('00000001-0001-0001-0001-00000000011a',)


Version-Release number of selected component (if applicable):
ovirt-hosted-engine-setup-1.3.2.3-1.el7.centos.noarch
libgovirt-0.3.3-1.el7.x86_64
ovirt-vmconsole-host-1.0.0-1.el7.centos.noarch
ovirt-host-deploy-1.4.1-1.el7.centos.noarch
ovirt-hosted-engine-ha-1.3.3.7-1.el7.centos.noarch
ovirt-vmconsole-1.0.0-1.el7.centos.noarch
ovirt-setup-lib-1.0.1-1.el7.centos.noarch
ovirt-engine-sdk-python-3.6.2.1-1.el7.centos.noarch
vdsm-yajsonrpc-4.17.18-0.el7.centos.noarch
vdsm-infra-4.17.18-0.el7.centos.noarch
vdsm-xmlrpc-4.17.18-0.el7.centos.noarch
vdsm-gluster-4.17.18-0.el7.centos.noarch
vdsm-python-4.17.18-0.el7.centos.noarch
vdsm-jsonrpc-4.17.18-0.el7.centos.noarch
vdsm-4.17.18-0.el7.centos.noarch
vdsm-hook-vmfex-dev-4.17.18-0.el7.centos.noarch
vdsm-cli-4.17.18-0.el7.centos.noarch

How reproducible: 100%


Steps to Reproduce:
1. Deploy a hosted engine with SD name of "hosted_storage"
2. Attempt an unattended (--config-append) install on host 2
3. The deployment fails with the VDSM error above

Actual results:
Second host fails to deploy unattended with VDSM error above.

Expected results:
Second host is deployed and configured into Ovirt


Additional info:
Comment 1 Charlie Inglese 2016-02-23 18:01 EST
Created attachment 1129946 [details]
/var/log/vdsm/mom.log
Comment 2 Charlie Inglese 2016-02-23 18:02 EST
Created attachment 1129947 [details]
/var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup*.log
Comment 3 Charlie Inglese 2016-02-23 18:02 EST
Created attachment 1129948 [details]
/var/log/vdsm/vdsm.log
Comment 4 Bryan 2016-02-23 18:06:28 EST
The major difference we noted was in the VDSM error in comment #1: the spUUID in the error is not blank, whereas when we did the interactive install, as well as the unattended install with a different SD name, the spUUID was set to all zeros ("spUUID='00000000-0000-0000-0000-000000000000'").

Whenever the spUUID was set to all zeros the install worked, but when it was not set to all zeros the install failed with the VDSM error above.

We tested the above scenario on 3 hosts in each case. When we attempted the unattended install with a hosted engine SD name of 'hosted_storage', it failed on all 3 hosts; with an SD name of 'int_domain', it succeeded on all 3 hosts.
Comment 5 Bryan 2016-02-23 18:15:20 EST
On the unattended install that failed our answer file has the following line:

OVEHOSTED_STORAGE/spUUID=str:00000000-0000-0000-0000-000000000000

But looking at the logs and output, the unattended install is changing that line to the spUUID seen above. When it makes that change, the install does not succeed.
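For reference, answer-file lines like the one above follow a "section/key=type:value" shape. The sketch below is only an illustration of how such a line can be decoded; the real parser lives in otopi, and this function name is made up:

```python
def parse_answer_line(line):
    """Return (key, value) from a 'section/key=type:value' answer-file line.

    Illustrative only: a minimal decoder for lines such as
    OVEHOSTED_STORAGE/spUUID=str:00000000-0000-0000-0000-000000000000
    """
    key, _, typed_value = line.partition('=')
    vtype, _, value = typed_value.partition(':')
    if vtype == 'str':
        return key, value
    if vtype == 'int':
        return key, int(value)
    if vtype == 'bool':
        return key, value.lower() == 'true'
    if vtype == 'none':
        return key, None
    raise ValueError('unsupported value type: %s' % vtype)
```

The "str:" prefix in the quoted spUUID line means the all-zero UUID reaches the setup as a plain string, which is why it can be compared directly against the pool UUID VDSM reports.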
Comment 6 Bryan 2016-02-23 18:18 EST
Created attachment 1129956 [details]
Answer file for SD name of 'hosted_storage'

This is the answer file we are using for the hosted SD name of 'hosted_storage'
Comment 7 Bryan 2016-02-23 18:18 EST
Created attachment 1129957 [details]
Answer file for SD name of 'int_domain'

This is the answer file we are using for the SD name of int_domain that works
Comment 8 Simone Tiraboschi 2016-02-24 09:32:27 EST
The issue is not directly related to the storage domain name but to the fact that the auto-import procedure attaches the hosted-engine storage domain to the engine storage pool (the auto-import fails if the name is different, as in rhbz#1269768).

The interactive setup is unaffected because in that case the first host's answer file is parsed only at the 'SYSTEM CONFIGURATION' stage, which comes after the 'STORAGE CONFIGURATION' stage (where we check the connected storage pool UUID), while an answer file appended on the command line is parsed in earlier stages.
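The merged fix is summarized in the gerrit patches as "honor SP=BLANK if provided on CLI answerfile". A minimal sketch of that idea, with invented names (this is not the actual ovirt-hosted-engine-setup code):

```python
# The special all-zero UUID oVirt uses to mean "no storage pool".
BLANK_UUID = '00000000-0000-0000-0000-000000000000'

def resolve_sp_uuid(answerfile_sp_uuid, detected_sp_uuid):
    """Pick the storage-pool UUID the setup should use.

    A BLANK spUUID supplied via a CLI answer file means the hosted-engine
    storage domain should be treated as unattached; honor it even if the
    engine already auto-imported the domain into its own pool. Otherwise
    fall back to whatever was detected on the connected storage.
    """
    if answerfile_sp_uuid == BLANK_UUID:
        return BLANK_UUID
    return answerfile_sp_uuid or detected_sp_uuid
```

Under this sketch, the early-parsed answer file's BLANK value is no longer overwritten by the pool UUID created on auto-import (e.g. 00000001-0001-0001-0001-00000000011a in the traceback above), so prepareImage is not called against a pool the host never connected to.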
Comment 9 Charlie Inglese 2016-03-10 09:04:31 EST
Any possibility that this fix can be incorporated in 3.6.4? This breaks our deployment scenario, as we rely on multiple oVirt hosts.
Comment 10 Simone Tiraboschi 2016-03-22 06:10:53 EDT
(In reply to Charlie Inglese from comment #9)
> Any possibility that this fix can be incorporated in 3.6.4? This breaks our
> deployment scenario, as we rely on multiple oVirt hosts.

It should be there.
Comment 12 Nikolai Sednev 2016-03-23 09:48:10 EDT
This probably has something in common with https://bugzilla.redhat.com/show_bug.cgi?id=1306825; I think it has the same root cause.
Comment 13 Nikolai Sednev 2016-03-24 03:37:22 EDT
Works for me on these components:
libvirt-client-1.2.17-13.el7_2.4.x86_64
ovirt-hosted-engine-setup-1.3.4.0-1.el7ev.noarch
sanlock-3.2.4-2.el7_2.x86_64
ovirt-vmconsole-host-1.0.0-1.el7ev.noarch
ovirt-host-deploy-1.4.1-1.el7ev.noarch
qemu-kvm-rhev-2.3.0-31.el7_2.10.x86_64
mom-0.5.2-1.el7ev.noarch
ovirt-setup-lib-1.0.1-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.5.1-1.el7ev.noarch
ovirt-vmconsole-1.0.0-1.el7ev.noarch
vdsm-4.17.23.1-0.el7ev.noarch
Red Hat Enterprise Linux Server release 7.2 (Maipo)
Linux alma04.qa.lab.tlv.redhat.com 3.10.0-327.13.1.el7.x86_64 #1 SMP Mon Feb 29 13:22:02 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
Comment 14 Charlie Inglese 2016-04-06 10:03:33 EDT
This fix was not propagated to the ovirt-engine-appliance RPM. Within the latest 3.6 ovirt repo, the latest ovirt-engine-appliance RPM is ovirt-engine-appliance-3.6-20160301.1.el7.centos.noarch.rpm.
Comment 15 Bryan 2016-04-08 09:01:42 EDT
(In reply to Charlie Inglese from comment #14)
> This fix was not propagated to the ovirt-engine-appliance RPM. Within the
> latest 3.6 ovirt repo, the latest ovirt-engine-appliance RPM is
> ovirt-engine-appliance-3.6-20160301.1.el7.centos.noarch.rpm.

Our team is also using the appliance for our deployment.  Will there be a new appliance spun up for 3.6.4?  If not, this will severely impact our deployment as we are dependent on this fix in 3.6.4 but also dependent on the appliance currently.

Thanks
Comment 16 Simone Tiraboschi 2016-04-08 09:24:33 EDT
You can gather latest pre-releases of the engine appliance for 3.6 for testing purposes from here: http://jenkins.ovirt.org/job/ovirt-appliance_ovirt-3.6_build-artifacts-el7-x86_64/

But in order to get this issue fixed you don't need a fresh appliance, since the issue was in hosted-engine-setup.
Comment 17 Bryan 2016-04-11 08:05:28 EDT
(In reply to Simone Tiraboschi from comment #16)
> You can gather latest pre-releases of the engine appliance for 3.6 for
> testing purposes from here:
> http://jenkins.ovirt.org/job/ovirt-appliance_ovirt-3.6_build-artifacts-el7-
> x86_64/
> 
> But in order to get this issue fixed you don't need a new fresh appliance
> since the issue was on hosted-engine-setup

Yes, that is true, but by deploying the old appliance we do not get the new ovirt-engine RPM installed on the engine host, so we have a mixed bag of versions.

Do you know if all the new RPMs were tested against the old 3.6.3.4 version of the engine? 

We would also like to have the latest engine 3.6.4.
Comment 18 Sandro Bonazzola 2016-04-12 03:40:13 EDT
(In reply to Simone Tiraboschi from comment #16)
> You can gather latest pre-releases of the engine appliance for 3.6 for
> testing purposes from here:
> http://jenkins.ovirt.org/job/ovirt-appliance_ovirt-3.6_build-artifacts-el7-
> x86_64/
> 

This provides the latest stable release, not a pre-release.


> But in order to get this issue fixed you don't need a new fresh appliance
> since the issue was on hosted-engine-setup

I'm handling bug #1325332 right now; the appliance will be ready in ~15 minutes.
