Description of problem:
hosted-engine --deploy fails on the second host when a Gluster volume is used as the storage for the hosted engine.

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-setup-1.3.0-1.el7ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Deploy the hosted engine using Gluster storage on the first host
2. Run hosted-engine --deploy on the second host

Actual results:
Fails with error:

RequestError: failed to read metadata: [Errno 2] No such file or directory: '/rhev/data-center/mnt/glusterSD/rhsdev-docker1.lab.eng.blr.**FILTERED**.com:_engine/94450737-e549-46e4-ae12-35ea392dd19b/ha_agent/hosted-engine.metadata'

Expected results:
hosted-engine --deploy should succeed.

Additional info:
The file *:_engine/94450737-e549-46e4-ae12-35ea392dd19b/ha_agent/hosted-engine.metadata is a soft link to /var/run/vdsm/storage/94450737-e549-46e4-ae12-35ea392dd19b/ed043a70-f524-4321-9007-02a1b554a6d2/fd230752-9ebe-4ae6-965e-9a2dca27cf83, and that target file doesn't exist. When I remove the soft link and copy the file into place instead of linking it, deployment works.
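The manual workaround described in the additional info can be sketched as a small shell helper. This is illustrative only: fix_dangling_link is a hypothetical name, not part of the setup tooling, and the real arguments would be the ha_agent symlink from the error and a known-good copy of the metadata file.

```shell
#!/bin/sh
# Hypothetical sketch of the workaround above: if the hosted-engine.metadata
# path is a dangling symlink, remove the link and put a real copy of the
# file in its place.
fix_dangling_link() {
    link=$1   # suspect symlink, e.g. .../ha_agent/hosted-engine.metadata
    src=$2    # a good copy of the file the link should point at
    # readlink -e (GNU coreutils) exits nonzero when the link target is missing
    if [ -L "$link" ] && ! readlink -e "$link" >/dev/null; then
        rm -f "$link"
        cp "$src" "$link"
    fi
}
```

Note this only papers over the missing /var/run/vdsm/storage target; it does not address why the target was never created.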
Created attachment 1088457 [details] ovirt-hosted-engine-setup log
Ramesh, could you please also attach /var/log/messages and the vdsm logs from that host?
I ran into the same issue today while trying out the latest RHEVM 3.6:

ovirt-hosted-engine-setup-1.3.0-1.el7ev.noarch
vdsm-jsonrpc-4.17.10.1-0.1.el7ev.noarch (with vdsm-gluster enabled)
Created attachment 1097097 [details] messages
Created attachment 1097098 [details] vdsm.log
Created attachment 1097099 [details] ovirt-hosted-engine-setup
Why was this moved to 4.0? Is this a blocker to HE support for shared Gluster?
(In reply to Yaniv Dary from comment #7)
> Why was this moved to 4.0?

It has always been on 4.0:

  Doron Fediuck 2015-11-02 02:20:21 EST  Target Milestone: --- → ovirt-4.0.0

> Is this a blocker to HE support for shared Gluster?

Looks like it. Moving to Simone, who already started investigating.
I see this problem only when I run the hosted-engine deploy with the answer file from the first host using "hosted-engine --deploy --config-append=answers.conf" and answer 'no' to scp'ing the config file from the first host. If I answer 'yes' and let it download the config file, it works. I am not sure what happens when the config file is scp'd as part of the hosted-engine deploy script.
(In reply to Ramesh N from comment #9)
> I see this problem only when I run hosted engine deploy with the answer file
> from first host using "hosted-engine --deploy --config-append=answers.conf"
> and say no to scp the config file from first. If i say to 'yes' to scp the
> config file and download the config file then it works.

So it doesn't seem to be a blocker for 3.6.2 to me. Postponing to 3.6.5.
Second HE host deployment succeeded over Gluster using the following:

vdsm-xmlrpc-4.17.18-0.el7ev.noarch
vdsm-4.17.18-0.el7ev.noarch
vdsm-python-4.17.18-0.el7ev.noarch
vdsm-hook-vmfex-dev-4.17.18-0.el7ev.noarch
vdsm-jsonrpc-4.17.18-0.el7ev.noarch
vdsm-yajsonrpc-4.17.18-0.el7ev.noarch
vdsm-cli-4.17.18-0.el7ev.noarch
vdsm-infra-4.17.18-0.el7ev.noarch
ovirt-vmconsole-1.0.0-1.el7ev.noarch
ovirt-host-deploy-1.4.1-1.el7ev.noarch
ovirt-setup-lib-1.0.1-1.el7ev.noarch
ovirt-vmconsole-host-1.0.0-1.el7ev.noarch
libgovirt-0.3.3-1.el7_2.1.x86_64
ovirt-hosted-engine-setup-1.3.2.3-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.3.7-1.el7ev.noarch
rhevm-3.6.2.6-0.1.el6.noarch
glusterfs-devel-3.7.1-16.el7.x86_64
glusterfs-rdma-3.7.1-16.el7.x86_64
glusterfs-client-xlators-3.7.1-16.el7.x86_64
glusterfs-api-devel-3.7.1-16.el7.x86_64
glusterfs-3.7.1-16.el7.x86_64
glusterfs-fuse-3.7.1-16.el7.x86_64
glusterfs-cli-3.7.1-16.el7.x86_64
glusterfs-libs-3.7.1-16.el7.x86_64
glusterfs-api-3.7.1-16.el7.x86_64
I am encountering the same issue (using an answer file):

RPMs:
ovirt-hosted-engine-setup-1.3.2.3-1.el7.centos.noarch
ovirt-hosted-engine-ha-1.3.3.7-1.el7.centos.noarch

Output from hosted-engine --deploy --config-append=<answerfile>:

[ ERROR ] Failed to execute stage 'Setup validation': failed to read metadata: [Errno 2] No such file or directory: '/rhev/data-center/mnt/glusterSD/pserver7:_engine/9495c726-12a1-4fa5-b943-a0dc0a91bd16/ha_agent/hosted-engine.metadata'
[ INFO  ] Stage: Clean up
[ INFO  ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20160201121651.conf'
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination
[ ERROR ] Hosted Engine deployment failed: this system is not reliable, please check the issue, fix and redeploy
          Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20160201121639-t0vqx7.log

Output (relevant lines only) from /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20160201121639-t0vqx7.log:

2016-02-01 12:16:51 DEBUG otopi.context context._executeMethod:142 Stage validation METHOD otopi.plugins.ovirt_hosted_engine_setup.sanlock.lockspace.Plugin._validation
2016-02-01 12:16:51 DEBUG otopi.context context._executeMethod:142 Stage validation METHOD otopi.plugins.ovirt_hosted_engine_setup.storage.glusterfs.Plugin._validate
2016-02-01 12:16:51 DEBUG otopi.context context._executeMethod:142 Stage validation METHOD otopi.plugins.ovirt_hosted_engine_setup.storage.storage.Plugin._validation
2016-02-01 12:16:51 DEBUG otopi.context context._executeMethod:156 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 146, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/ovirt-hosted-engine-setup/storage/storage.py", line 454, in _validation
    ] + ".metadata",
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 171, in get_all_host_stats_direct
    self.StatModes.HOST)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 122, in get_all_stats_direct
    stats = sb.get_raw_stats_for_service_type("client", service_type)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 140, in get_raw_stats_for_service_type
    .format(str(e)))
RequestError: failed to read metadata: [Errno 2] No such file or directory: '/rhev/data-center/mnt/glusterSD/pserver7:_engine/9495c726-12a1-4fa5-b943-a0dc0a91bd16/ha_agent/hosted-engine.metadata'
2016-02-01 12:16:51 ERROR otopi.context context._executeMethod:165 Failed to execute stage 'Setup validation': failed to read metadata: [Errno 2] No such file or directory: '/rhev/data-center/mnt/glusterSD/pserver7:_engine/9495c726-12a1-4fa5-b943-a0dc0a91bd16/ha_agent/hosted-engine.metadata'
2016-02-01 12:16:51 ERROR otopi.plugins.ovirt_hosted_engine_setup.core.misc misc._terminate:170 Hosted Engine deployment failed: this system is not reliable, please check the issue, fix and redeploy
(In reply to Charlie Inglese from comment #13)
> I am encountering the same issue (using an answer file):
> [...]
> RequestError: failed to read metadata: [Errno 2] No such file or directory:
> '/rhev/data-center/mnt/glusterSD/pserver7:_engine/9495c726-12a1-4fa5-b943-a0dc0a91bd16/ha_agent/hosted-engine.metadata'

ls -l /rhev/data-center/mnt/glusterSD/pserver7\:_engine/9495c726-12a1-4fa5-b943-a0dc0a91bd16/ha_agent/:

lrwxrwxrwx. 1 vdsm kvm 132 Jan 28 01:55 hosted-engine.lockspace -> /var/run/vdsm/storage/9495c726-12a1-4fa5-b943-a0dc0a91bd16/7b9406eb-461c-4c31-8b07-e7efde56e9c6/caf6bd87-a378-497f-b320-4936db48becc
lrwxrwxrwx. 1 vdsm kvm 132 Jan 28 01:55 hosted-engine.metadata -> /var/run/vdsm/storage/9495c726-12a1-4fa5-b943-a0dc0a91bd16/acf1c484-edf0-4a27-9eea-9adacc186cbf/ace65aa0-6628-4cbf-b420-5ef58f780a25

Both links (hosted-engine.lockspace, hosted-engine.metadata) are broken; the /var/run/vdsm/storage directory doesn't exist.

ls -l /var/run/vdsm:

-rw-r--r--. 1 vdsm kvm   0 Feb  1 12:16 client.log
drwxr-xr-x. 2 vdsm kvm  60 Feb  1 12:16 lvm
srwxr-xr-x. 1 vdsm kvm   0 Feb  1 12:16 mom-vdsm.sock
-rw-r--r--. 1 root root  0 Feb  1 12:16 nets_restored
drwxr-xr-x. 2 vdsm kvm  40 Jan 19 13:04 sourceRoutes
srwxr-xr-x. 1 vdsm kvm   0 Feb  1 12:16 svdsm.sock
drwxr-xr-x. 2 vdsm kvm  40 Jan 19 13:04 trackedInterfaces
drwxr-xr-x. 2 vdsm kvm  40 Jan 19 13:04 v2v

I'm attaching my vdsm logs to this ticket.
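The broken-link state shown in the ls -l output above can be checked mechanically. This is a sketch using GNU find's -xtype l test, which matches symlinks whose target does not resolve; on an affected host the argument would be the ha_agent directory.

```shell
#!/bin/sh
# Sketch: print any dangling symlinks directly inside the given directory.
# On an affected host, pass the ha_agent directory, e.g.
#   list_dangling_links /rhev/data-center/mnt/glusterSD/<server>:_engine/<sd_uuid>/ha_agent
list_dangling_links() {
    # -xtype l (GNU find): symlinks whose target is missing
    find "$1" -maxdepth 1 -xtype l
}
```

An empty result means every link in the directory resolves; any output lists the links that would trigger the "No such file or directory" error above.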
Charlie, can you please attach also the answerfile and the whole /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20160201121639-t0vqx7.log ?
Created attachment 1120193 [details] VDSM log (cinglese)
Created attachment 1120195 [details] ovirt-hosted-engine-setup answer file
Created attachment 1120197 [details] ovirt-hosted-engine-setup log
Simone, I uploaded my answer file and complete ovirt-hosted-engine-setup log as requested; thanks.
At this point, I cannot add an additional oVirt HE host either via answer file or interactively. I used the oVirt HE appliance on my initial host (ovirt-hosted-engine-ha-1.3.3.7-1.el7.centos.noarch).

RPM baseline (on the additional oVirt HE node):

glusterfs-3.7.6-1.el7.x86_64
glusterfs-api-3.7.6-1.el7.x86_64
glusterfs-cli-3.7.6-1.el7.x86_64
glusterfs-client-xlators-3.7.6-1.el7.x86_64
glusterfs-fuse-3.7.6-1.el7.x86_64
glusterfs-libs-3.7.6-1.el7.x86_64
libgovirt-0.3.3-1.el7.x86_64
ovirt-engine-sdk-python-3.6.2.1-1.el7.centos.noarch
ovirt-host-deploy-1.4.1-1.el7.centos.noarch
ovirt-hosted-engine-ha-1.3.3.7-1.el7.centos.noarch
ovirt-hosted-engine-setup-1.3.2.3-1.el7.centos.noarch
ovirt-setup-lib-1.0.1-1.el7.centos.noarch
ovirt-vmconsole-1.0.0-1.el7.centos.noarch
ovirt-vmconsole-host-1.0.0-1.el7.centos.noarch
vdsm-4.17.18-0.el7.centos.noarch
vdsm-cli-4.17.18-0.el7.centos.noarch
vdsm-hook-vmfex-dev-4.17.18-0.el7.centos.noarch
vdsm-infra-4.17.18-0.el7.centos.noarch
vdsm-jsonrpc-4.17.18-0.el7.centos.noarch
vdsm-python-4.17.18-0.el7.centos.noarch
vdsm-xmlrpc-4.17.18-0.el7.centos.noarch
vdsm-yajsonrpc-4.17.18-0.el7.centos.noarch
OK, you hit a corner case that differs from what we fixed and verified: ovirt-hosted-engine-setup now honors the UUID in the answer file but, in your case, the hosted-engine storage domain had already been imported by the engine, so hosted-engine-setup tried to use the SP_UUID the hosted-engine SD got when it was imported.

In the normal interactive flow (without providing the answer file on the CLI with --config-append) this shouldn't happen, because the first host's answer file gets parsed after scanning the hosted-engine storage domain, so it becomes the definitive response.

Can you please try deploying the second host without passing an answer file with --config-append?
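For reference, the storage-pool UUID an answer file would force can be inspected before re-running the deploy. This sketch assumes the otopi answer-file convention of OVEHOSTED_STORAGE/spUUID=str:<uuid> lines; treat the key name and the str: value prefix as assumptions.

```shell
#!/bin/sh
# Sketch: extract the spUUID value from a hosted-engine answer file.
# Assumes lines of the form OVEHOSTED_STORAGE/spUUID=str:<uuid>.
get_sp_uuid() {
    sed -n 's|^OVEHOSTED_STORAGE/spUUID=str:||p' "$1"
}

# Usage (hypothetical path):
#   get_sp_uuid /etc/ovirt-hosted-engine/answers.conf
```

Comparing this value against the SP_UUID of the already-imported storage domain would confirm the mismatch described above.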
(In reply to Simone Tiraboschi from comment #21)
> [...]
> Can you please try deploying the second host without passing an answerfile
> with --config-append?

Simone,

To confirm, would you like me to run "hosted-engine --deploy" or "hosted-engine --deploy --config-append=<null>"? If you are asking me to execute "hosted-engine --deploy", I've already tried this and ran into the exact same issue. If you want me to try passing in a null answer file, I can attempt that next.
Charlie, can you please attach the hosted-engine-setup log file you get when executing just 'hosted-engine --deploy'?
Simone,

I believe I'm now running into another issue. The installation on the additional host is now failing with the following error:

"The VDSM host was found in a failed state. Please check the engine and bootstrap installation logs. Unable to add OvirtHost2 to the manager."

I believe that this is now an iptables issue (I've attached the iptables configuration on the system as autoconfigured by the oVirt installation). From looking at the mom.log and vdsm.log files I see: "mom.vdsmInterface - ERROR - Cannot connect to VDSM! [Errno 111] Connection refused". SELinux on the additional host is in permissive mode, and I've tried stopping iptables/firewalld prior to initiating the install.
Created attachment 1120794 [details] iptables -nvL
Created attachment 1120795 [details] /var/log/messages installation of additional oVirt HE host failing
Created attachment 1120796 [details] /var/log/vdsm/mom.log installation of additional oVirt HE host failing
Created attachment 1120797 [details] /var/log/vdsm/vdsm.log installation of additional oVirt HE host failing
Can you please attach engine.log and host-deploy logs from the engine VM?
Simone,

After additional troubleshooting, we've gotten to the bottom of what's going on (documented in Bug 1304514): when using glusterfs, the gluster RPMs now need to be installed before executing hosted-engine --deploy on the additional nodes. With those RPMs installed first, both interactive and answer-file installs work. For reference, the following RPMs were required:

bridge-utils
vdsm
vdsm-cli
glusterfs-server
vdsm-gluster
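The prerequisite check implied by the list above can be sketched as follows. check_missing is a hypothetical helper, and the probe command is parameterized so the function can be exercised anywhere; on a real host the probe would be "rpm -q".

```shell
#!/bin/sh
# Sketch: report which of a list of packages are missing, given a probe
# command that succeeds when a package is present (e.g. "rpm -q").
check_missing() {
    probe=$1; shift
    for pkg in "$@"; do
        $probe "$pkg" >/dev/null 2>&1 || printf '%s\n' "$pkg"
    done
}

# On a real host, before deploying an additional HE node:
#   check_missing "rpm -q" bridge-utils vdsm vdsm-cli glusterfs-server vdsm-gluster
```

Any names printed would need to be installed before running hosted-engine --deploy.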