+++ This bug was initially created as a clone of Bug #1595322 +++

Description of problem:
-------------------------------------
The expand cluster operation hangs after the peers are probed; the gdeploy process is getting terminated.

Version-Release number of selected component:
-------------------------------------
cockpit-ovirt-dashboard-0.11.28-1.el7ev.noarch
ansible-2.6.0-0.3.rc3.el7ae.noarch

How reproducible:
------------------------------------
Every time

Steps to Reproduce:
-----------------------------------
1. Complete a gluster and hosted engine deployment successfully.
2. After the deployment, click on the hosted engine tab, then click the manage gluster button.
3. A window with the operations expand cluster and create volume will appear; click the expand cluster button.
4. Configure the hosts, packages, volume and brick tabs, then deploy.
5. The process hangs at the step where the shell script is run.

Actual results:
-----------------------------
The operation hangs and gdeploy is getting terminated. The peers are probed successfully.

Expected results:
-----------------------------
The operation should proceed.
I could find the root cause of this issue. When the expand cluster operation is attempted, the grafton-sanity-check.sh script is run against the 3 new nodes. This script basically checks the reachability of each host from the others. The conf file (/var/lib/ovirt-hosted-engine-setup/gdeploy/gdeployConfig.conf) looks like the following when the expansion is triggered from 10.70.36.79:

<snip>
#gdeploy configuration generated by cockpit-gluster plugin
[hosts]
10.70.36.244
10.70.36.245
10.70.36.246

[script1:10.70.36.244]
action=execute
ignore_script_errors=no
file=/usr/share/gdeploy/scripts/grafton-sanity-check.sh -d sdb,sde -h 10.70.36.244,10.70.36.245,10.70.36.246

[script1:10.70.36.245]
action=execute
ignore_script_errors=no
file=/usr/share/gdeploy/scripts/grafton-sanity-check.sh -d sdb,sde -h 10.70.36.244,10.70.36.245,10.70.36.246

[script1:10.70.36.246]
action=execute
ignore_script_errors=no
file=/usr/share/gdeploy/scripts/grafton-sanity-check.sh -d sdb,sde -h 10.70.36.244,10.70.36.245,10.70.36.246
</snip>

The script tries to execute the command on 10.70.36.{245,246}, which do not have SSH public key authentication set up. To expand host{4,5,6} into the existing cluster of host{1,2,3}, SSH public key authentication is set up from host1 to host{4,5,6}. But there is no such authentication set up from host4 to host{4,5,6}. This is what leads to the problem.
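As a workaround sketch (not an official procedure; the IPs are the new-node addresses from the gdeployConfig.conf above), SSH public key authentication can be set up from each new node to every new node, including itself, before retrying the expansion. The loop below only prints the commands that would be run on the first new node, for illustration:

```shell
#!/bin/sh
# New expansion nodes, taken from the [hosts] section of gdeployConfig.conf
NEW_HOSTS="10.70.36.244 10.70.36.245 10.70.36.246"

# On each new node, generate a key pair if one does not already exist, e.g.:
#   ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
# Then copy the public key to every new node (including the node itself),
# so grafton-sanity-check.sh can SSH between the nodes without a password.
for host in $NEW_HOSTS; do
    # Printed rather than executed here, since this is only a sketch:
    echo "ssh-copy-id root@${host}"
done
```

Repeating this on each of the three new nodes gives the full mesh of passwordless SSH that the sanity-check script expects.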
Should we document this or is there a fix planned in 4.2.5?
We need to document this for 4.2.5; we will give an option to set up passwordless SSH from the cockpit UI in 4.2.6.
The issue was the passwordless SSH configuration, and we have documented it. So closing this bug.