Bug 1277010

Summary: hosted-engine --deploy fails in second host when using gluster volume
Product: [oVirt] ovirt-hosted-engine-setup Reporter: Ramesh N <rnachimu>
Component: Plugins.Gluster Assignee: Simone Tiraboschi <stirabos>
Status: CLOSED CURRENTRELEASE QA Contact: Elad <ebenahar>
Severity: high Docs Contact:
Priority: high    
Version: 1.3.0 CC: acanan, bhughes, bugs, cinglese, dfediuck, didi, gklein, lveyde, rmartins, rnachimu, sabose, sbonazzo, stirabos, ylavi
Target Milestone: ovirt-3.6.2 Flags: rule-engine: ovirt-3.6.z+
ylavi: planning_ack+
sbonazzo: devel_ack+
gklein: testing_ack+
Target Release: 1.3.2.3   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: ovirt-hosted-engine-setup-1.3.2.3-1 Doc Type: Bug Fix
Doc Text:
hosted-engine-setup generates the answer file with a blank SP UUID for additional hosts because it correctly disconnected its storage domain from the bootstrap storage pool. If we want to use the answer file to non-interactively replay the scenario, we need a valid spUUID value; otherwise we are not able to create the bootstrap storage pool. On the other side, passing an answer file via the CLI is a valid alternative to downloading it from another host via scp, and in that case we should honour the blank UUID. So the fix resets spUUID to blank only if it is really needed to create a new bootstrap storage pool.
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-02-18 11:01:55 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Integration RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 995362, 1304445, 1304514    
Bug Blocks: 1258386    
Attachments:
Description Flags
ovirt-hosted-engine-setup log
none
messages
none
vdsm.log
none
ovirt-hosted-engine-setup
none
VDSM log (cinglese)
none
ovirt-hosted-engine-setup answer file
none
ovirt-hosted-engine-setup log
none
iptables -nvL
none
/var/log/messages
none
/var/log/vdsm/mom.log
none
/var/log/vdsm/vdsm.log
none

Description Ramesh N 2015-11-02 04:45:34 UTC
Description of problem:

hosted-engine --deploy is failing in the second host when gluster volume is used as a storage for hosted_engine.

Version-Release number of selected component (if applicable):

ovirt-hosted-engine-setup-1.3.0-1.el7ev.noarch

How reproducible:

Always

Steps to Reproduce:
1. Deploy hosted engine using gluster storage in first hosts
2. hosted-engine --deploy in the second host


Actual results:
Fails with error 
    RequestError: failed to read metadata: [Errno 2] No such file or directory: '/rhev/data-center/mnt/glusterSD/rhsdev-docker1.lab.eng.blr.**FILTERED**.com:_engine/94450737-e549-46e4-ae12-35ea392dd19b/ha_agent/hosted-engine.metadata'

Expected results:

 hosted-engine --deploy should succeed. 

Additional info:

  The file *:_engine/94450737-e549-46e4-ae12-35ea392dd19b/ha_agent/hosted-engine.metadata is a soft link to /var/run/vdsm/storage/94450737-e549-46e4-ae12-35ea392dd19b/ed043a70-f524-4321-9007-02a1b554a6d2/fd230752-9ebe-4ae6-965e-9a2dca27cf83, and that target file doesn't exist.

 When I remove the soft link and copy the file to that location instead of linking it, deployment works.
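A quick way to spot this condition is to look for symlinks whose targets are missing. The helper below is a sketch added for illustration (it is not part of the original report); point it at the ha_agent directory under the gluster mount:

```python
import os

def broken_links(directory):
    """Return the names of symlinks in `directory` whose targets do not exist.

    os.path.islink() is true for a symlink itself, while os.path.exists()
    follows the link, so the combination detects dangling links.
    """
    broken = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.islink(path) and not os.path.exists(path):
            broken.append(name)
    return sorted(broken)
```

On an affected host, running this against `.../ha_agent/` would be expected to list `hosted-engine.metadata` (and possibly `hosted-engine.lockspace`), since their targets under /var/run/vdsm/storage are absent.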

Comment 1 Ramesh N 2015-11-02 04:48:01 UTC
Created attachment 1088457 [details]
ovirt-hosted-engine-setup log

Comment 2 Simone Tiraboschi 2015-11-20 09:04:50 UTC
Ramesh, could you please attach also /var/log/message and vdsm logs from that host?

Comment 3 Sahina Bose 2015-11-20 10:48:50 UTC
I ran into the same issue today while trying out the latest RHEVM 3.6

ovirt-hosted-engine-setup-1.3.0-1.el7ev.noarch
vdsm-jsonrpc-4.17.10.1-0.1.el7ev.noarch (with vdsm-gluster enabled)

Comment 4 Sahina Bose 2015-11-20 10:49:52 UTC
Created attachment 1097097 [details]
messages

Comment 5 Sahina Bose 2015-11-20 10:50:45 UTC
Created attachment 1097098 [details]
vdsm.log

Comment 6 Sahina Bose 2015-11-20 10:51:35 UTC
Created attachment 1097099 [details]
ovirt-hosted-engine-setup

Comment 7 Yaniv Lavi 2015-11-24 15:42:21 UTC
Why was this moved to 4.0? Is this a blocker to HE support for shared Gluster?

Comment 8 Sandro Bonazzola 2015-11-30 12:17:53 UTC
(In reply to Yaniv Dary from comment #7)
> Why was this moved to 4.0?

It has always been on 4.0:
Doron Fediuck 2015-11-02 02:20:21 EST
Target Milestone: --- → ovirt-4.0.0

> Is this a blocker to HE support for shared Gluster?

Looks like that. Moving to Simone who already started investigating.

Comment 9 Ramesh N 2016-01-08 09:17:00 UTC
I see this problem only when I run hosted engine deploy with the answer file from the first host using "hosted-engine --deploy --config-append=answers.conf" and say no to scp'ing the config file from the first host. If I say 'yes' to scp the config file and download it, then it works. I am not sure what happens when the config file is scp'ed as part of the hosted engine deploy script.

Comment 10 Sandro Bonazzola 2016-01-14 07:59:34 UTC
(In reply to Ramesh N from comment #9)
> I see this problem only when I run hosted engine deploy with the answer file
> from first host using "hosted-engine --deploy  --config-append=answers.conf"
> and say no to scp the config file from first. If i say to 'yes' to scp the
> config file and download the config file then it works. I am not sure what
> happens in case of scping the config file as part of hosted engine deploy
> script.

So it doesn't seem a blocker for 3.6.2 to me. Postponing to 3.6.5.

Comment 11 Elad 2016-01-24 16:36:08 UTC
Second HE host deployment succeeded over Gluster using the following:

vdsm-xmlrpc-4.17.18-0.el7ev.noarch
vdsm-4.17.18-0.el7ev.noarch
vdsm-python-4.17.18-0.el7ev.noarch
vdsm-hook-vmfex-dev-4.17.18-0.el7ev.noarch
vdsm-jsonrpc-4.17.18-0.el7ev.noarch
vdsm-yajsonrpc-4.17.18-0.el7ev.noarch
vdsm-cli-4.17.18-0.el7ev.noarch
vdsm-infra-4.17.18-0.el7ev.noarch
ovirt-vmconsole-1.0.0-1.el7ev.noarch
ovirt-host-deploy-1.4.1-1.el7ev.noarch
ovirt-setup-lib-1.0.1-1.el7ev.noarch
ovirt-vmconsole-host-1.0.0-1.el7ev.noarch
libgovirt-0.3.3-1.el7_2.1.x86_64
ovirt-hosted-engine-setup-1.3.2.3-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.3.7-1.el7ev.noarch

rhevm-3.6.2.6-0.1.el6.noarch

Comment 12 Elad 2016-01-24 16:37:45 UTC
glusterfs-devel-3.7.1-16.el7.x86_64
glusterfs-rdma-3.7.1-16.el7.x86_64
glusterfs-client-xlators-3.7.1-16.el7.x86_64
glusterfs-api-devel-3.7.1-16.el7.x86_64
glusterfs-3.7.1-16.el7.x86_64
glusterfs-fuse-3.7.1-16.el7.x86_64
glusterfs-cli-3.7.1-16.el7.x86_64
glusterfs-libs-3.7.1-16.el7.x86_64
glusterfs-api-3.7.1-16.el7.x86_64

Comment 13 Charlie Inglese 2016-02-01 17:26:39 UTC
I am encountering the same issue (using an answer file): 

RPMs:
ovirt-hosted-engine-setup-1.3.2.3-1.el7.centos.noarch
ovirt-hosted-engine-ha-1.3.3.7-1.el7.centos.noarch

Output from hosted-engine --deploy --config-append=<answerfile>:
[ ERROR ] Failed to execute stage 'Setup validation': failed to read metadata: [Errno 2] No such file or directory: '/rhev/data-center/mnt/glusterSD/pserver7:_engine/9495c726-12a1-4fa5-b943-a0dc0a91bd16/ha_agent/hosted-engine.metadata'
[ INFO  ] Stage: Clean up
[ INFO  ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20160201121651.conf'
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination
[ ERROR ] Hosted Engine deployment failed: this system is not reliable, please check the issue, fix and redeploy
          Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20160201121639-t0vqx7.log

Output (ERROR lines only) from /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20160201121639-t0vqx7.log:
2016-02-01 12:16:51 DEBUG otopi.context context._executeMethod:142 Stage validation METHOD otopi.plugins.ovirt_hosted_engine_setup.sanlock.lockspace.Plugin._validation
2016-02-01 12:16:51 DEBUG otopi.context context._executeMethod:142 Stage validation METHOD otopi.plugins.ovirt_hosted_engine_setup.storage.glusterfs.Plugin._validate
2016-02-01 12:16:51 DEBUG otopi.context context._executeMethod:142 Stage validation METHOD otopi.plugins.ovirt_hosted_engine_setup.storage.storage.Plugin._validation
2016-02-01 12:16:51 DEBUG otopi.context context._executeMethod:156 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 146, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/ovirt-hosted-engine-setup/storage/storage.py", line 454, in _validation
    ] + ".metadata",
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 171, in get_all_host_stats_direct
    self.StatModes.HOST)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 122, in get_all_stats_direct
    stats = sb.get_raw_stats_for_service_type("client", service_type)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 140, in get_raw_stats_for_service_type
    .format(str(e)))
RequestError: failed to read metadata: [Errno 2] No such file or directory: '/rhev/data-center/mnt/glusterSD/pserver7:_engine/9495c726-12a1-4fa5-b943-a0dc0a91bd16/ha_agent/hosted-engine.metadata'
2016-02-01 12:16:51 ERROR otopi.context context._executeMethod:165 Failed to execute stage 'Setup validation': failed to read metadata: [Errno 2] No such file or directory: '/rhev/data-center/mnt/glusterSD/pserver7:_engine/9495c726-12a1-4fa5-b943-a0dc0a91bd16/ha_agent/hosted-engine.metadata'
2016-02-01 12:16:51 ERROR otopi.plugins.ovirt_hosted_engine_setup.core.misc misc._terminate:170 Hosted Engine deployment failed: this system is not reliable, please check the issue, fix and redeploy

Comment 14 Charlie Inglese 2016-02-01 17:32:43 UTC
(In reply to Charlie Inglese from comment #13)

ls -l /rhev/data-center/mnt/glusterSD/pserver7\:_engine/9495c726-12a1-4fa5-b943-a0dc0a91bd16/ha_agent/:
lrwxrwxrwx. 1 vdsm kvm 132 Jan 28 01:55 hosted-engine.lockspace -> /var/run/vdsm/storage/9495c726-12a1-4fa5-b943-a0dc0a91bd16/7b9406eb-461c-4c31-8b07-e7efde56e9c6/caf6bd87-a378-497f-b320-4936db48becc
lrwxrwxrwx. 1 vdsm kvm 132 Jan 28 01:55 hosted-engine.metadata -> /var/run/vdsm/storage/9495c726-12a1-4fa5-b943-a0dc0a91bd16/acf1c484-edf0-4a27-9eea-9adacc186cbf/ace65aa0-6628-4cbf-b420-5ef58f780a25

Both links (hosted-engine.lockspace, hosted-engine.metadata) are broken. The /var/run/vdsm/storage directory doesn't exist.

ls -l /var/run/vdsm:
-rw-r--r--. 1 vdsm kvm   0 Feb  1 12:16 client.log
drwxr-xr-x. 2 vdsm kvm  60 Feb  1 12:16 lvm
srwxr-xr-x. 1 vdsm kvm   0 Feb  1 12:16 mom-vdsm.sock
-rw-r--r--. 1 root root  0 Feb  1 12:16 nets_restored
drwxr-xr-x. 2 vdsm kvm  40 Jan 19 13:04 sourceRoutes
srwxr-xr-x. 1 vdsm kvm   0 Feb  1 12:16 svdsm.sock
drwxr-xr-x. 2 vdsm kvm  40 Jan 19 13:04 trackedInterfaces
drwxr-xr-x. 2 vdsm kvm  40 Jan 19 13:04 v2v

I'm attaching my vdsm logs to this ticket.

Comment 15 Simone Tiraboschi 2016-02-01 17:34:07 UTC
Charlie, can you please attach also the answerfile and the whole /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20160201121639-t0vqx7.log ?

Comment 16 Charlie Inglese 2016-02-01 17:34:22 UTC
Created attachment 1120193 [details]
VDSM log (cinglese)

Comment 17 Charlie Inglese 2016-02-01 17:42:23 UTC
Created attachment 1120195 [details]
ovirt-hosted-engine-setup answer file

Comment 18 Charlie Inglese 2016-02-01 17:42:57 UTC
Created attachment 1120197 [details]
ovirt-hosted-engine-setup log

Comment 19 Charlie Inglese 2016-02-01 17:43:49 UTC
Simone,

I uploaded my answer file and complete ovirt-hosted-engine-setup log as requested; thanks.

Comment 20 Charlie Inglese 2016-02-01 22:06:36 UTC
At this point, I cannot add an additional oVirt HE host either via answer file or interactively. I used the oVirt HE appliance on my initial host (ovirt-hosted-engine-ha-1.3.3.7-1.el7.centos.noarch). 

RPM baseline (on additional oVirt HE node):
glusterfs-3.7.6-1.el7.x86_64
glusterfs-api-3.7.6-1.el7.x86_64
glusterfs-cli-3.7.6-1.el7.x86_64
glusterfs-client-xlators-3.7.6-1.el7.x86_64
glusterfs-fuse-3.7.6-1.el7.x86_64
glusterfs-libs-3.7.6-1.el7.x86_64
libgovirt-0.3.3-1.el7.x86_64
ovirt-engine-sdk-python-3.6.2.1-1.el7.centos.noarch
ovirt-host-deploy-1.4.1-1.el7.centos.noarch
ovirt-hosted-engine-ha-1.3.3.7-1.el7.centos.noarch
ovirt-hosted-engine-setup-1.3.2.3-1.el7.centos.noarch
ovirt-setup-lib-1.0.1-1.el7.centos.noarch
ovirt-vmconsole-1.0.0-1.el7.centos.noarch
ovirt-vmconsole-host-1.0.0-1.el7.centos.noarch
vdsm-4.17.18-0.el7.centos.noarch
vdsm-cli-4.17.18-0.el7.centos.noarch
vdsm-hook-vmfex-dev-4.17.18-0.el7.centos.noarch
vdsm-infra-4.17.18-0.el7.centos.noarch
vdsm-jsonrpc-4.17.18-0.el7.centos.noarch
vdsm-python-4.17.18-0.el7.centos.noarch
vdsm-xmlrpc-4.17.18-0.el7.centos.noarch
vdsm-yajsonrpc-4.17.18-0.el7.centos.noarch

Comment 21 Simone Tiraboschi 2016-02-01 22:15:30 UTC
OK,
you hit a corner case that differs from what we fixed and verified: now ovirt-hosted-engine-setup honours the UUID in the answer file but, in your case, the hosted-engine storage domain was already imported by the engine, so hosted-engine-setup tried to use the SP_UUID the hosted-engine SD got when it was imported.

In the normal interactive flow (without providing the answer file on the CLI with --config-append), this shouldn't happen because the first host's answer file gets parsed after scanning the hosted-engine storage domain, so that scan will be the definitive response.

Can you please try deploying the second host without passing an answer file with --config-append?
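The spUUID handling described in the Doc Text of this bug ("resetting spUUID to blank only if really needed to create a new bootstrap storage pool") can be sketched roughly as below. The function and parameter names are hypothetical, not the actual ovirt-hosted-engine-setup code, and the "really needed" condition is simplified to a single boolean:

```python
# The blank UUID oVirt uses as a placeholder storage pool UUID.
BLANK_UUID = "00000000-0000-0000-0000-000000000000"

def resolve_sp_uuid(answerfile_sp_uuid, must_reset_for_bootstrap_pool):
    """Honour the spUUID carried by a CLI-provided answer file
    (--config-append), resetting it to the blank UUID only when the
    bootstrap storage pool setup really requires it."""
    if must_reset_for_bootstrap_pool:
        return BLANK_UUID
    return answerfile_sp_uuid
```

The point of the fix, per the comments above, is that the answer file's value is no longer overwritten unconditionally, which is what broke the --config-append flow on additional hosts.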

Comment 22 Charlie Inglese 2016-02-02 15:14:50 UTC
(In reply to Simone Tiraboschi from comment #21)

Simone,

To confirm, would you like me to run "hosted-engine --deploy" or "hosted-engine --deploy --config-append=<null>"? If you are asking me to execute "hosted-engine --deploy", I've already tried this and ran into the exact same issue. If you want me to try passing in a null answer file, I can attempt that next.

Comment 23 Simone Tiraboschi 2016-02-02 16:02:08 UTC
Charlie,
can you please attach the hosted-engine-setup log file you get executing just 'hosted-engine --deploy'?

Comment 24 Charlie Inglese 2016-02-03 14:23:37 UTC
Simone,

I believe I'm now running into another issue. The installation on the additional host is now failing with the following error: "The VDSM host was found in a failed state. Please check the engine and bootstrap installation logs. Unable to add OvirtHost2 to the manager."

I believe that this is now an iptables issue (I've attached the iptables configuration on the system as autoconfigured by the oVirt installation). From looking at the mom.log and vdsm.log files I see: "mom.vdsmInterface - ERROR - Cannot connect to VDSM! [Errno 111] Connection refused".

SELinux on the additional host is in permissive mode, and I've tried stopping iptables/firewalld prior to initiating the install.

Comment 25 Charlie Inglese 2016-02-03 14:24:04 UTC
Created attachment 1120794 [details]
iptables -nvL

Comment 26 Charlie Inglese 2016-02-03 14:24:42 UTC
Created attachment 1120795 [details]
/var/log/messages

installation of additional oVirt HE host failing

Comment 27 Charlie Inglese 2016-02-03 14:25:09 UTC
Created attachment 1120796 [details]
/var/log/vdsm/mom.log

installation of additional oVirt HE host failing

Comment 28 Charlie Inglese 2016-02-03 14:25:40 UTC
Created attachment 1120797 [details]
/var/log/vdsm/vdsm.log

installation of additional oVirt HE host failing

Comment 29 Simone Tiraboschi 2016-02-03 22:14:18 UTC
Can you please attach engine.log and host-deploy logs from the engine VM?

Comment 30 Charlie Inglese 2016-02-04 19:27:07 UTC
Simone,

After additional troubleshooting, we've been able to get to the bottom of what's going on (documented in Bug 1304514). It appears that when using glusterfs, we now need to install the gluster RPMs prior to executing hosted-engine --deploy on the additional nodes. 

It now works for both interactive and answer file installs when the gluster RPMs are installed prior to executing the deploy.

For reference, the following RPMs were required: bridge-utils vdsm vdsm-cli glusterfs-server vdsm-gluster
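A pre-flight check for this prerequisite could be sketched as follows. The package list is taken from the comment above; `rpm_installed` uses the standard `rpm -q` query, which exits non-zero when a package is absent (the helper names themselves are illustrative):

```python
import subprocess

# Packages comment 30 reports as required before `hosted-engine --deploy`
# on additional gluster-backed hosts.
REQUIRED = ["bridge-utils", "vdsm", "vdsm-cli", "glusterfs-server", "vdsm-gluster"]

def missing_packages(required, is_installed):
    """Return the packages from `required` that `is_installed` reports absent."""
    return [p for p in required if not is_installed(p)]

def rpm_installed(package):
    """True if `rpm -q` finds the package installed (rpm exits 0 on a hit)."""
    return subprocess.call(
        ["rpm", "-q", package],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    ) == 0
```

Running `missing_packages(REQUIRED, rpm_installed)` before the deploy and installing whatever it returns would avoid the failure mode described here; `is_installed` is injected so the check can also be exercised without rpm present.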