Bug 1277010
Summary: hosted-engine --deploy fails in second host when using gluster volume
Product: [oVirt] ovirt-hosted-engine-setup | Reporter: Ramesh N <rnachimu>
Component: Plugins.Gluster | Assignee: Simone Tiraboschi <stirabos>
Status: CLOSED CURRENTRELEASE | QA Contact: Elad <ebenahar>
Severity: high | Docs Contact:
Priority: high
Version: 1.3.0 | CC: acanan, bhughes, bugs, cinglese, dfediuck, didi, gklein, lveyde, rmartins, rnachimu, sabose, sbonazzo, stirabos, ylavi
Target Milestone: ovirt-3.6.2 | Flags: rule-engine: ovirt-3.6.z+; ylavi: planning_ack+; sbonazzo: devel_ack+; gklein: testing_ack+
Target Release: 1.3.2.3
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ovirt-hosted-engine-setup-1.3.2.3-1 | Doc Type: Bug Fix
Doc Text:
hosted-engine-setup generates the answer file with a blank SP UUID for additional hosts, because it has correctly disconnected its storage domain from the bootstrap storage pool. If we want to use the answer file to replay the scenario non-interactively, we need a valid spUUID value; otherwise we are not able to create the bootstrap storage pool. On the other hand, passing an answer file via the CLI is a valid alternative to downloading it from another host via scp, and in that case we should honor the BLANK UUID. So the spUUID is now reset to blank only when it is really needed to create a new bootstrap storage pool.
Story Points: ---
Clone Of: | Environment:
Last Closed: 2016-02-18 11:01:55 UTC | Type: Bug
Regression: --- | Mount Type: ---
Documentation: --- | CRM:
Verified Versions: | Category: ---
oVirt Team: Integration | RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- | Target Upstream Version:
Embargoed:
Bug Depends On: 995362, 1304445, 1304514
Bug Blocks: 1258386
Attachments:
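The Doc Text above describes the fix only in prose. As a hedged illustration (the function name, parameters, and `generate_uuid` callable are hypothetical, not the actual ovirt-hosted-engine-setup code), the decision it describes can be sketched as:

```python
# Hypothetical sketch of the behavior described in the Doc Text: the spUUID
# from the answer file is replaced only when a new bootstrap storage pool
# actually has to be created; otherwise a blank UUID passed via
# --config-append is honored as-is.
BLANK_UUID = "00000000-0000-0000-0000-000000000000"

def effective_sp_uuid(answerfile_sp_uuid, need_bootstrap_pool, generate_uuid):
    """Decide which storage-pool UUID the setup should use.

    answerfile_sp_uuid  -- value read from the answer file (may be blank)
    need_bootstrap_pool -- True when a new bootstrap storage pool is required
    generate_uuid       -- callable producing a fresh UUID string
    """
    if need_bootstrap_pool:
        # Creating the bootstrap pool requires a real pool UUID, so a blank
        # value coming from the answer file must be replaced here.
        if answerfile_sp_uuid == BLANK_UUID:
            return generate_uuid()
        return answerfile_sp_uuid
    # No bootstrap pool to create: honor the (possibly blank) answer-file value.
    return answerfile_sp_uuid
```

This matches both flows in the bug: additional hosts that get a blank spUUID keep it, while a non-interactive replay that must recreate the bootstrap pool gets a valid UUID.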
Description
Ramesh N
2015-11-02 04:45:34 UTC
Created attachment 1088457 [details]
ovirt-hosted-engine-setup log

Ramesh, could you please attach also /var/log/messages and the vdsm logs from that host?

I ran into the same issue today while trying out the latest RHEVM 3.6:
ovirt-hosted-engine-setup-1.3.0-1.el7ev.noarch
vdsm-jsonrpc-4.17.10.1-0.1.el7ev.noarch (with vdsm-gluster enabled)

Created attachment 1097097 [details]
messages

Created attachment 1097098 [details]
vdsm.log

Created attachment 1097099 [details]
ovirt-hosted-engine-setup

Why was this moved to 4.0? Is this a blocker to HE support for shared Gluster?

(In reply to Yaniv Dary from comment #7)
> Why was this moved to 4.0?
It has always been on 4.0:
Doron Fediuck 2015-11-02 02:20:21 EST | Target Milestone: --- → ovirt-4.0.0
> Is this a blocker to HE support for shared Gluster?
Looks like it. Moving to Simone, who has already started investigating.

I see this problem only when I run hosted-engine deploy with the answer file from the first host using "hosted-engine --deploy --config-append=answers.conf" and say no to scp'ing the config file from the first host. If I say 'yes' to scp'ing and downloading the config file, then it works. I am not sure what happens when the config file is scp'ed as part of the hosted-engine deploy script.

(In reply to Ramesh N from comment #9)
> I see this problem only when I run hosted-engine deploy with the answer file
> from the first host using "hosted-engine --deploy --config-append=answers.conf"
> and say no to scp'ing the config file from the first host.
So it doesn't seem like a blocker for 3.6.2 to me. Postponing to 3.6.5.
Second HE host deployment succeeded over Gluster using the following:
vdsm-xmlrpc-4.17.18-0.el7ev.noarch
vdsm-4.17.18-0.el7ev.noarch
vdsm-python-4.17.18-0.el7ev.noarch
vdsm-hook-vmfex-dev-4.17.18-0.el7ev.noarch
vdsm-jsonrpc-4.17.18-0.el7ev.noarch
vdsm-yajsonrpc-4.17.18-0.el7ev.noarch
vdsm-cli-4.17.18-0.el7ev.noarch
vdsm-infra-4.17.18-0.el7ev.noarch
ovirt-vmconsole-1.0.0-1.el7ev.noarch
ovirt-host-deploy-1.4.1-1.el7ev.noarch
ovirt-setup-lib-1.0.1-1.el7ev.noarch
ovirt-vmconsole-host-1.0.0-1.el7ev.noarch
libgovirt-0.3.3-1.el7_2.1.x86_64
ovirt-hosted-engine-setup-1.3.2.3-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.3.7-1.el7ev.noarch
rhevm-3.6.2.6-0.1.el6.noarch
glusterfs-devel-3.7.1-16.el7.x86_64
glusterfs-rdma-3.7.1-16.el7.x86_64
glusterfs-client-xlators-3.7.1-16.el7.x86_64
glusterfs-api-devel-3.7.1-16.el7.x86_64
glusterfs-3.7.1-16.el7.x86_64
glusterfs-fuse-3.7.1-16.el7.x86_64
glusterfs-cli-3.7.1-16.el7.x86_64
glusterfs-libs-3.7.1-16.el7.x86_64
glusterfs-api-3.7.1-16.el7.x86_64

I am encountering the same issue (using an answer file):

RPMs:
ovirt-hosted-engine-setup-1.3.2.3-1.el7.centos.noarch
ovirt-hosted-engine-ha-1.3.3.7-1.el7.centos.noarch

Output from hosted-engine --deploy --config-append=<answerfile>:
[ ERROR ] Failed to execute stage 'Setup validation': failed to read metadata: [Errno 2] No such file or directory: '/rhev/data-center/mnt/glusterSD/pserver7:_engine/9495c726-12a1-4fa5-b943-a0dc0a91bd16/ha_agent/hosted-engine.metadata'
[ INFO ] Stage: Clean up
[ INFO ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20160201121651.conf'
[ INFO ] Stage: Pre-termination
[ INFO ] Stage: Termination
[ ERROR ] Hosted Engine deployment failed: this system is not reliable, please check the issue, fix and redeploy
Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20160201121639-t0vqx7.log

Output (ERROR lines only) from /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20160201121639-t0vqx7.log:
2016-02-01 12:16:51 DEBUG otopi.context context._executeMethod:142 Stage validation METHOD otopi.plugins.ovirt_hosted_engine_setup.sanlock.lockspace.Plugin._validation
2016-02-01 12:16:51 DEBUG otopi.context context._executeMethod:142 Stage validation METHOD otopi.plugins.ovirt_hosted_engine_setup.storage.glusterfs.Plugin._validate
2016-02-01 12:16:51 DEBUG otopi.context context._executeMethod:142 Stage validation METHOD otopi.plugins.ovirt_hosted_engine_setup.storage.storage.Plugin._validation
2016-02-01 12:16:51 DEBUG otopi.context context._executeMethod:156 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 146, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/ovirt-hosted-engine-setup/storage/storage.py", line 454, in _validation
    ] + ".metadata",
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 171, in get_all_host_stats_direct
    self.StatModes.HOST)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 122, in get_all_stats_direct
    stats = sb.get_raw_stats_for_service_type("client", service_type)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 140, in get_raw_stats_for_service_type
    .format(str(e)))
RequestError: failed to read metadata: [Errno 2] No such file or directory: '/rhev/data-center/mnt/glusterSD/pserver7:_engine/9495c726-12a1-4fa5-b943-a0dc0a91bd16/ha_agent/hosted-engine.metadata'
2016-02-01 12:16:51 ERROR otopi.context context._executeMethod:165 Failed to execute stage 'Setup validation': failed to read metadata: [Errno 2] No such file or directory: '/rhev/data-center/mnt/glusterSD/pserver7:_engine/9495c726-12a1-4fa5-b943-a0dc0a91bd16/ha_agent/hosted-engine.metadata'
2016-02-01 12:16:51 ERROR otopi.plugins.ovirt_hosted_engine_setup.core.misc misc._terminate:170 Hosted Engine deployment failed: this system is not reliable, please check the issue, fix and redeploy

(In reply to Charlie Inglese from comment #13)
> RequestError: failed to read metadata: [Errno 2] No such file or directory:
> '/rhev/data-center/mnt/glusterSD/pserver7:_engine/9495c726-12a1-4fa5-b943-a0dc0a91bd16/ha_agent/hosted-engine.metadata'

ls -l /rhev/data-center/mnt/glusterSD/pserver7\:_engine/9495c726-12a1-4fa5-b943-a0dc0a91bd16/ha_agent/:
lrwxrwxrwx. 1 vdsm kvm 132 Jan 28 01:55 hosted-engine.lockspace -> /var/run/vdsm/storage/9495c726-12a1-4fa5-b943-a0dc0a91bd16/7b9406eb-461c-4c31-8b07-e7efde56e9c6/caf6bd87-a378-497f-b320-4936db48becc
lrwxrwxrwx. 1 vdsm kvm 132 Jan 28 01:55 hosted-engine.metadata -> /var/run/vdsm/storage/9495c726-12a1-4fa5-b943-a0dc0a91bd16/acf1c484-edf0-4a27-9eea-9adacc186cbf/ace65aa0-6628-4cbf-b420-5ef58f780a25

Both links (hosted-engine.lockspace, hosted-engine.metadata) are broken; the /var/run/vdsm/storage directory doesn't exist.

ls -l /var/run/vdsm:
-rw-r--r--. 1 vdsm kvm  0 Feb  1 12:16 client.log
drwxr-xr-x. 2 vdsm kvm 60 Feb  1 12:16 lvm
srwxr-xr-x. 1 vdsm kvm  0 Feb  1 12:16 mom-vdsm.sock
-rw-r--r--. 1 root root 0 Feb  1 12:16 nets_restored
drwxr-xr-x. 2 vdsm kvm 40 Jan 19 13:04 sourceRoutes
srwxr-xr-x. 1 vdsm kvm  0 Feb  1 12:16 svdsm.sock
drwxr-xr-x. 2 vdsm kvm 40 Jan 19 13:04 trackedInterfaces
drwxr-xr-x. 2 vdsm kvm 40 Jan 19 13:04 v2v

I'm attaching my vdsm logs to this ticket.

Charlie, can you please attach also the answer file and the whole /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20160201121639-t0vqx7.log?

Created attachment 1120193 [details]
VDSM log (cinglese)
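The broken hosted-engine.lockspace/hosted-engine.metadata links reported above can be detected mechanically. This is a small standalone helper (not part of hosted-engine-setup; the function name is my own) that lists dangling symlinks in a directory, i.e. entries where `islink()` is true but the resolved target no longer exists:

```python
# Standalone helper to spot dangling symlinks such as the ha_agent links
# above: os.path.islink() is true for any symlink, while os.path.exists()
# follows the link and is false when the target is missing.
import os

def broken_symlinks(directory):
    """Return the sorted names of symlinks in `directory` whose targets are missing."""
    broken = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.islink(path) and not os.path.exists(path):
            broken.append(name)
    return broken
```

Run against the ha_agent mount point in this report, it would have flagged both links, since the /var/run/vdsm/storage directory they point into was missing.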
Created attachment 1120195 [details]
ovirt-hosted-engine-setup answer file
Created attachment 1120197 [details]
ovirt-hosted-engine-setup log
Simone, I uploaded my answer file and complete ovirt-hosted-engine-setup log as requested; thanks. At this point, I cannot add an additional oVirt HE host either via answer file or interactively. I used the oVirt HE appliance on my initial host (ovirt-hosted-engine-ha-1.3.3.7-1.el7.centos.noarch).

RPM baseline (on additional oVirt HE node):
glusterfs-3.7.6-1.el7.x86_64
glusterfs-api-3.7.6-1.el7.x86_64
glusterfs-cli-3.7.6-1.el7.x86_64
glusterfs-client-xlators-3.7.6-1.el7.x86_64
glusterfs-fuse-3.7.6-1.el7.x86_64
glusterfs-libs-3.7.6-1.el7.x86_64
libgovirt-0.3.3-1.el7.x86_64
ovirt-engine-sdk-python-3.6.2.1-1.el7.centos.noarch
ovirt-host-deploy-1.4.1-1.el7.centos.noarch
ovirt-hosted-engine-ha-1.3.3.7-1.el7.centos.noarch
ovirt-hosted-engine-setup-1.3.2.3-1.el7.centos.noarch
ovirt-setup-lib-1.0.1-1.el7.centos.noarch
ovirt-vmconsole-1.0.0-1.el7.centos.noarch
ovirt-vmconsole-host-1.0.0-1.el7.centos.noarch
vdsm-4.17.18-0.el7.centos.noarch
vdsm-cli-4.17.18-0.el7.centos.noarch
vdsm-hook-vmfex-dev-4.17.18-0.el7.centos.noarch
vdsm-infra-4.17.18-0.el7.centos.noarch
vdsm-jsonrpc-4.17.18-0.el7.centos.noarch
vdsm-python-4.17.18-0.el7.centos.noarch
vdsm-xmlrpc-4.17.18-0.el7.centos.noarch
vdsm-yajsonrpc-4.17.18-0.el7.centos.noarch

OK, you hit a corner case that differs from what we fixed and verified: ovirt-hosted-engine-setup now honors the UUID in the answer file, but in your case the hosted-engine storage domain had already been imported by the engine, so hosted-engine-setup tried to use the SP_UUID the hosted-engine SD got when it was imported.

In the normal interactive flow (without providing the answer file on the CLI with --config-append) this shouldn't happen, because the first host's answer file gets parsed after scanning the hosted-engine storage domain, so it is the definitive response.

Can you please try deploying the second host without passing an answer file with --config-append?
(In reply to Simone Tiraboschi from comment #21)
> Can you please try deploying the second host without passing an answer file
> with --config-append?

Simone, to confirm: would you like me to run "hosted-engine --deploy" or "hosted-engine --deploy --config-append=<null>"? If you are asking me to execute "hosted-engine --deploy", I've already tried that and ran into the same exact issue. If you want me to try passing in a null answer file, I can attempt that next.

Charlie, can you please attach the hosted-engine-setup log file you get executing just 'hosted-engine --deploy'?

Simone, I believe I'm now running into another issue. The installation on the additional host is now failing with the following error: "The VDSM host was found in a failed state. Please check the engine and bootstrap installation logs. Unable to add OvirtHost2 to the manager." I believe that this is now an iptables issue (I've attached the iptables configuration on the system as autoconfigured by the oVirt installation). From looking at the mom.log and vdsm.log files I see: "mom.vdsmInterface - ERROR - Cannot connect to VDSM! [Errno 111] Connection refused". SELinux on the additional host is in permissive mode, and I've tried stopping iptables/firewalld prior to initiating the install.

Created attachment 1120794 [details]
iptables -nvL
Created attachment 1120795 [details]
/var/log/messages
installation of additional oVirt HE host failing
Created attachment 1120796 [details]
/var/log/vdsm/mom.log
installation of additional oVirt HE host failing
Created attachment 1120797 [details]
/var/log/vdsm/vdsm.log
installation of additional oVirt HE host failing
Can you please attach engine.log and the host-deploy logs from the engine VM?

Simone, after additional troubleshooting, we've been able to get to the bottom of what's going on (documented in Bug 1304514). It appears that when using glusterfs, we now need to install the gluster RPMs prior to executing hosted-engine --deploy on the additional nodes. It now works in both interactive and answer-file installs when the gluster RPMs are installed beforehand. For reference, the following RPMs were required:
bridge-utils
vdsm
vdsm-cli
glusterfs-server
vdsm-gluster
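The package prerequisite identified above lends itself to a preflight check before running `hosted-engine --deploy` on an additional Gluster node. This is a hypothetical sketch (the function is not part of ovirt-hosted-engine-setup); the required-package list is taken from the comment above, and the installed-package names would typically come from `rpm -qa --queryformat '%{NAME} '`:

```python
# Hypothetical preflight check mirroring the workaround above: verify the
# packages the reporter had to install are present before deploying an
# additional hosted-engine host on Gluster.
REQUIRED_PACKAGES = [
    "bridge-utils",
    "vdsm",
    "vdsm-cli",
    "glusterfs-server",
    "vdsm-gluster",
]

def missing_packages(installed, required=REQUIRED_PACKAGES):
    """Return the required package names that are absent from `installed`."""
    installed_set = set(installed)
    return [pkg for pkg in required if pkg not in installed_set]
```

If the returned list is non-empty, the deployment is expected to fail the way this bug describes, so installing those RPMs first is the safer path.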