Bug 1311317 - Unattended Ovirt host deploy fails on second host when the engine already imported the hosted-engine storage domain
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-hosted-engine-setup
Classification: oVirt
Component: General
Version: 1.3.2.3
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ovirt-3.6.5
Target Release: 1.3.4.0
Assignee: Simone Tiraboschi
QA Contact: Nikolai Sednev
URL:
Whiteboard:
Depends On:
Blocks: 1315808
Reported: 2016-02-23 23:00 UTC by Bryan
Modified: 2019-04-28 13:48 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2016-04-21 14:41:28 UTC
oVirt Team: Integration
Embargoed:
rule-engine: ovirt-3.6.z+
ylavi: planning_ack+
sbonazzo: devel_ack+
mavital: testing_ack+


Attachments
/var/log/vdsm/mom.log (645.50 KB, text/plain)
2016-02-23 23:01 UTC, Charlie Inglese
/var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup*.log (380.10 KB, text/plain)
2016-02-23 23:02 UTC, Charlie Inglese
/var/log/vdsm/vdsm.log (915.08 KB, text/plain)
2016-02-23 23:02 UTC, Charlie Inglese
Answer file for SD name of 'hosted_storage' (3.51 KB, text/plain)
2016-02-23 23:18 UTC, Bryan
Answer file for SD name of 'int_domain' (3.51 KB, text/plain)
2016-02-23 23:18 UTC, Bryan


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1277010 0 high CLOSED hosted-engine --deploy fails in second host when using gluster volume 2021-02-22 00:41:40 UTC
Red Hat Bugzilla 1320128 0 urgent CLOSED Host setup fails - network disconnection causes SetupNetwork not to be sent (or not received by host) 2024-07-23 03:55:37 UTC
oVirt gerrit 53975 0 master MERGED storage: honor SP=BLANK if provided on CLI answerfile 2016-02-25 14:11:24 UTC
oVirt gerrit 53976 0 ovirt-hosted-engine-setup-1.3 MERGED storage: honor SP=BLANK if provided on CLI answerfile 2016-02-25 14:12:02 UTC

Internal Links: 1277010 1320128 1326847

Description Bryan 2016-02-23 23:00:34 UTC
Description of problem:
We attempted to run an unattended install (--config-append) on a second host in an oVirt environment. We had previously done this successfully with a hosted-engine SD name of 'int_domain'. After reading bug 1269768 we changed the SD name to 'hosted_storage' so that the domain could be imported into oVirt. After this change, our unattended install on a second host began to fail again with the VDSM error below. We ran the following tests:

-- Hosted engine SD name "hosted_storage"
1) Unattended install (--config-append) on host 2   -> Failed
2) Interactive install (no config-append) on host 2 -> Works

-- Hosted engine SD name "int_domain"
1) unattended install (--config-append) on host 2   -> Works
2) Interactive install (no config-append) on host 2 -> Works

NOTE: Both unattended installs used exactly the same answer file except for the SD name.

Thread-138::INFO::2016-02-23 15:03:06,122::logUtils::48::dispatcher::(wrapper) Run and protect: prepareImage(sdUUID='1c0494f1-8931-4f51-93c5-dae8f927ab2e', spUUID='00000001-0001-0001-0001-00000000011a', imgUUID='c8ddbc32-1b14-4a18-b94a-497a8724fcb5', leafUUID='c3d30793-5dce-4fab-81b9-cb70b64caad2')
Thread-138::ERROR::2016-02-23 15:03:06,122::task::866::Storage.TaskManager.Task::(_setError) Task=`1a5bdf78-8507-4b35-b4c1-7e7439946694`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 3211, in prepareImage
    self.getPool(spUUID)
  File "/usr/share/vdsm/storage/hsm.py", line 322, in getPool
    raise se.StoragePoolUnknown(spUUID)
StoragePoolUnknown: Unknown pool id, pool not connected: ('00000001-0001-0001-0001-00000000011a',)


Version-Release number of selected component (if applicable):
ovirt-hosted-engine-setup-1.3.2.3-1.el7.centos.noarch
libgovirt-0.3.3-1.el7.x86_64
ovirt-vmconsole-host-1.0.0-1.el7.centos.noarch
ovirt-host-deploy-1.4.1-1.el7.centos.noarch
ovirt-hosted-engine-ha-1.3.3.7-1.el7.centos.noarch
ovirt-vmconsole-1.0.0-1.el7.centos.noarch
ovirt-setup-lib-1.0.1-1.el7.centos.noarch
ovirt-engine-sdk-python-3.6.2.1-1.el7.centos.noarch
vdsm-yajsonrpc-4.17.18-0.el7.centos.noarch
vdsm-infra-4.17.18-0.el7.centos.noarch
vdsm-xmlrpc-4.17.18-0.el7.centos.noarch
vdsm-gluster-4.17.18-0.el7.centos.noarch
vdsm-python-4.17.18-0.el7.centos.noarch
vdsm-jsonrpc-4.17.18-0.el7.centos.noarch
vdsm-4.17.18-0.el7.centos.noarch
vdsm-hook-vmfex-dev-4.17.18-0.el7.centos.noarch
vdsm-cli-4.17.18-0.el7.centos.noarch

How reproducible: 100%


Steps to Reproduce:
1. Deploy a hosted engine with SD name of "hosted_storage"
2. Attempt an unattended (--config-append) install on host 2
3. The install fails with the VDSM error above

Actual results:
Second host fails to deploy unattended with VDSM error above.

Expected results:
Second host is deployed and configured into Ovirt


Additional info:

Comment 1 Charlie Inglese 2016-02-23 23:01:16 UTC
Created attachment 1129946 [details]
/var/log/vdsm/mom.log

Comment 2 Charlie Inglese 2016-02-23 23:02:07 UTC
Created attachment 1129947 [details]
/var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup*.log

Comment 3 Charlie Inglese 2016-02-23 23:02:29 UTC
Created attachment 1129948 [details]
/var/log/vdsm/vdsm.log

Comment 4 Bryan 2016-02-23 23:06:28 UTC
The major difference we noted was in the VDSM error in the description. The spUUID in the error is not blank, whereas in both the interactive install and the unattended install with a different SD name the spUUID was set to all zeros: "spUUID='00000000-0000-0000-0000-000000000000'"

Whenever the spUUID was set to all zeros the install worked but when it was not set to all zeros the install would fail with the vdsm error above.

We tested each scenario above on 3 nodes. That is, when we attempted the unattended install with a hosted engine SD name of 'hosted_storage', we attempted it on 3 hosts with the same result each time.

When we tested the unattended install with a hosted engine SD name of 'int_domain' we tested it on 3 hosts and all three worked.
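
The comparison described in this comment can be checked mechanically. The following is a minimal sketch, not part of VDSM or hosted-engine-setup: it extracts the spUUID argument from prepareImage log lines in the format shown in the description and reports whether it is the all-zero UUID. The function names and the regular expression are illustrative assumptions.

```python
import re

BLANK_UUID = "00000000-0000-0000-0000-000000000000"

# Assumed pattern: matches the spUUID argument of a prepareImage call as it
# appears in the vdsm.log excerpt in the description.
SP_UUID_RE = re.compile(r"prepareImage\(.*?spUUID='([0-9a-fA-F-]{36})'")


def prepare_image_pool_ids(log_lines):
    """Return the spUUID of every prepareImage call found in the given lines."""
    found = []
    for line in log_lines:
        match = SP_UUID_RE.search(line)
        if match:
            found.append(match.group(1))
    return found


def uses_blank_pool(sp_uuid):
    """True when the pool ID is the all-zero (blank) UUID."""
    return sp_uuid == BLANK_UUID


# Log line copied from the description of this bug.
sample = [
    "Thread-138::INFO::2016-02-23 15:03:06,122::logUtils::48::dispatcher::(wrapper) "
    "Run and protect: prepareImage(sdUUID='1c0494f1-8931-4f51-93c5-dae8f927ab2e', "
    "spUUID='00000001-0001-0001-0001-00000000011a', "
    "imgUUID='c8ddbc32-1b14-4a18-b94a-497a8724fcb5', "
    "leafUUID='c3d30793-5dce-4fab-81b9-cb70b64caad2')",
]

for sp_uuid in prepare_image_pool_ids(sample):
    print(sp_uuid, "blank" if uses_blank_pool(sp_uuid) else "NOT blank")
```

To run it against a real log, pass an open file object for /var/log/vdsm/vdsm.log instead of the sample list.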

Comment 5 Bryan 2016-02-23 23:15:20 UTC
On the unattended install that failed our answer file has the following line:

OVEHOSTED_STORAGE/spUUID=str:00000000-0000-0000-0000-000000000000

But according to the logs and output, the unattended install changes that value to the spUUID seen above. When it makes that change, the install does not succeed.
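
Before launching an unattended install, it may help to confirm that the answer file really requests the blank pool. The following is a minimal sketch that reads the KEY=type:value line quoted in this comment. The helper names are hypothetical, and the "[environment:default]" section header in the sample is an assumption about the answer-file layout; the parser simply skips any section header.

```python
BLANK_UUID = "00000000-0000-0000-0000-000000000000"
SP_UUID_KEY = "OVEHOSTED_STORAGE/spUUID"


def read_answer_value(text, key):
    """Return (type, value) for a KEY=type:value line, or None if absent."""
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and section headers.
        if not line or line.startswith("#") or line.startswith("["):
            continue
        name, sep, rest = line.partition("=")
        if not sep or name.strip() != key:
            continue
        value_type, sep, value = rest.partition(":")
        if not sep:
            return None  # malformed entry: no type prefix
        return value_type, value
    return None


def requests_blank_pool(text):
    """True when the answer file sets the storage pool UUID to all zeros."""
    entry = read_answer_value(text, SP_UUID_KEY)
    return entry is not None and entry[1] == BLANK_UUID


# Assumed layout; only the spUUID line is taken from this comment.
sample = """[environment:default]
OVEHOSTED_STORAGE/spUUID=str:00000000-0000-0000-0000-000000000000
"""

print(requests_blank_pool(sample))
```

This only confirms what the file asks for. As this comment notes, the value can still be replaced during setup, so the VDSM log remains the place to see which pool ID was actually used.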

Comment 6 Bryan 2016-02-23 23:18:07 UTC
Created attachment 1129956 [details]
Answer file for SD name of 'hosted_storage'

This is the answer file we are using for the hosted SD name of 'hosted_storage'

Comment 7 Bryan 2016-02-23 23:18:49 UTC
Created attachment 1129957 [details]
Answer file for SD name of 'int_domain'

This is the answer file we are using for the SD name of int_domain that works

Comment 8 Simone Tiraboschi 2016-02-24 14:32:27 UTC
The issue is not directly related to the storage domain name but to the fact that the auto-import procedure attaches the hosted-engine storage domain to the engine storage pool (the auto-import fails if the name is different, as described in rhbz#1269768).

Interactive setup is not affected because in that case the first host's answer file gets parsed only at the 'SYSTEM CONFIGURATION' stage, which comes after the 'STORAGE CONFIGURATION' stage (where we check the connected storage pool UUID), whereas an answer file appended on the command line is parsed in earlier stages.
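
The ordering effect described in this comment can be illustrated with a toy model. This is not the hosted-engine-setup code: the function, its parameters, and the exact override rules are assumptions chosen to reproduce the behaviour reported in this bug, and the honor_cli_blank switch only mirrors the title of the linked gerrit patches ("honor SP=BLANK if provided on CLI answerfile").

```python
BLANK_UUID = "00000000-0000-0000-0000-000000000000"
# Pool ID taken from the vdsm.log excerpt in the description.
ENGINE_POOL = "00000001-0001-0001-0001-00000000011a"


def run_setup(cli_answer_sp, first_host_answer_sp, detected_sp,
              honor_cli_blank=False):
    """Toy model: return the spUUID left in the environment after setup.

    cli_answer_sp        -- value from an answer file given on the command
                            line (None for an interactive run)
    first_host_answer_sp -- value from the first host's answer file
    detected_sp          -- pool the storage domain is found attached to
    honor_cli_blank      -- hypothetical switch modelling the fix
    """
    env = {}
    from_cli = cli_answer_sp is not None

    # Early stage: an answer file appended on the command line is parsed.
    if from_cli:
        env["spUUID"] = cli_answer_sp

    # 'STORAGE CONFIGURATION': the connected storage pool UUID is checked.
    keep_blank = honor_cli_blank and from_cli and env["spUUID"] == BLANK_UUID
    if not keep_blank:
        env["spUUID"] = detected_sp

    # 'SYSTEM CONFIGURATION': interactive runs parse the first host's
    # answer file only now, so its value is applied last.
    if not from_cli and first_host_answer_sp is not None:
        env["spUUID"] = first_host_answer_sp

    return env["spUUID"]


interactive = run_setup(None, BLANK_UUID, ENGINE_POOL)
unattended = run_setup(BLANK_UUID, None, ENGINE_POOL)
unattended_fixed = run_setup(BLANK_UUID, None, ENGINE_POOL,
                             honor_cli_blank=True)
print(interactive, unattended, unattended_fixed, sep="\n")
```

In this model the interactive run and the run with the switch enabled both end with the blank pool ID, while the unattended run without it ends with the engine's pool ID, which matches the failing case in the description.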

Comment 9 Charlie Inglese 2016-03-10 14:04:31 UTC
Any possibility that this fix can be incorporated in 3.6.4? This breaks our deployment scenario, as we rely on multiple oVirt hosts.

Comment 10 Simone Tiraboschi 2016-03-22 10:10:53 UTC
(In reply to Charlie Inglese from comment #9)
> Any possibility that this fix can be incorporated in 3.6.4? This breaks our
> deployment scenario, as we rely on multiple oVirt hosts.

It should be there.

Comment 12 Nikolai Sednev 2016-03-23 13:48:10 UTC
This probably has something in common with https://bugzilla.redhat.com/show_bug.cgi?id=1306825.
I think it is the same root cause.

Comment 13 Nikolai Sednev 2016-03-24 07:37:22 UTC
Works for me on these components:
libvirt-client-1.2.17-13.el7_2.4.x86_64
ovirt-hosted-engine-setup-1.3.4.0-1.el7ev.noarch
sanlock-3.2.4-2.el7_2.x86_64
ovirt-vmconsole-host-1.0.0-1.el7ev.noarch
ovirt-host-deploy-1.4.1-1.el7ev.noarch
qemu-kvm-rhev-2.3.0-31.el7_2.10.x86_64
mom-0.5.2-1.el7ev.noarch
ovirt-setup-lib-1.0.1-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.5.1-1.el7ev.noarch
ovirt-vmconsole-1.0.0-1.el7ev.noarch
vdsm-4.17.23.1-0.el7ev.noarch
Red Hat Enterprise Linux Server release 7.2 (Maipo)
Linux alma04.qa.lab.tlv.redhat.com 3.10.0-327.13.1.el7.x86_64 #1 SMP Mon Feb 29 13:22:02 EST 2016 x86_64 x86_64 x86_64 GNU/Linux

Comment 14 Charlie Inglese 2016-04-06 14:03:33 UTC
This fix was not propagated to the ovirt-engine-appliance RPM. Within the latest 3.6 ovirt repo, the latest ovirt-engine-appliance RPM is ovirt-engine-appliance-3.6-20160301.1.el7.centos.noarch.rpm.

Comment 15 Bryan 2016-04-08 13:01:42 UTC
(In reply to Charlie Inglese from comment #14)
> This fix was not propagated to the ovirt-engine-appliance RPM. Within the
> latest 3.6 ovirt repo, the latest ovirt-engine-appliance RPM is
> ovirt-engine-appliance-3.6-20160301.1.el7.centos.noarch.rpm.

Our team is also using the appliance for our deployment.  Will there be a new appliance spun up for 3.6.4?  If not, this will severely impact our deployment as we are dependent on this fix in 3.6.4 but also dependent on the appliance currently.

Thanks

Comment 16 Simone Tiraboschi 2016-04-08 13:24:33 UTC
You can gather latest pre-releases of the engine appliance for 3.6 for testing purposes from here: http://jenkins.ovirt.org/job/ovirt-appliance_ovirt-3.6_build-artifacts-el7-x86_64/

But in order to get this issue fixed you don't need a new fresh appliance since the issue was on hosted-engine-setup

Comment 17 Bryan 2016-04-11 12:05:28 UTC
(In reply to Simone Tiraboschi from comment #16)
> You can gather latest pre-releases of the engine appliance for 3.6 for
> testing purposes from here:
> http://jenkins.ovirt.org/job/ovirt-appliance_ovirt-3.6_build-artifacts-el7-x86_64/
> 
> But in order to get this issue fixed you don't need a new fresh appliance
> since the issue was on hosted-engine-setup

Yes, that is true, but when deploying the old appliance we do not get the new ovirt-engine RPM installed on the engine host, so we end up with a mix of versions.

Do you know if all the new RPMs were tested against the old 3.6.3.4 version of the engine? 

We would also like to have the latest engine 3.6.4.

Comment 18 Sandro Bonazzola 2016-04-12 07:40:13 UTC
(In reply to Simone Tiraboschi from comment #16)
> You can gather latest pre-releases of the engine appliance for 3.6 for
> testing purposes from here:
> http://jenkins.ovirt.org/job/ovirt-appliance_ovirt-3.6_build-artifacts-el7-x86_64/
> 

This provides the latest stable build, not a pre-release.


> But in order to get this issue fixed you don't need a new fresh appliance
> since the issue was on hosted-engine-setup

Handling bug #1325332 right now; the appliance will be ready in ~15 minutes.

