Bug 1595323 - The expand cluster (Day 2 operations) is hung.
Summary: The expand cluster (Day 2 operations) is hung.
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: cockpit-ovirt
Classification: oVirt
Component: Gdeploy
Version: 0.11.20
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ovirt-4.2.6
Target Release: ---
Assignee: Gobinda Das
QA Contact: SATHEESARAN
URL:
Whiteboard:
Depends On:
Blocks: 1595322
Reported: 2018-06-26 15:41 UTC by Mugdha Soni
Modified: 2018-08-16 10:34 UTC (History)

Fixed In Version:
Clone Of: 1595322
Environment:
Last Closed: 2018-08-16 10:34:18 UTC
oVirt Team: Gluster
Embargoed:
rule-engine: ovirt-4.2?
sasundar: planning_ack?
sasundar: devel_ack?
sasundar: testing_ack?



Description Mugdha Soni 2018-06-26 15:41:31 UTC
+++ This bug was initially created as a clone of Bug #1595322 +++

Description of problem:
-------------------------------------
The expand cluster operation hangs after the peers are probed, and the gdeploy process is getting terminated.

Version-Release number of selected component:
-------------------------------------
cockpit-ovirt-dashboard-0.11.28-1.el7ev.noarch

ansible-2.6.0-0.3.rc3.el7ae.noarch


How reproducible:
------------------------------------
Every time

Steps to Reproduce:
-----------------------------------
1. Successfully deploy gluster and the hosted engine.

2. After the deployment, click on the hosted engine tab and then click on the manage gluster button.

3. A window with the operations expand cluster and create volume appears; click on the expand cluster button.

4. Configure the hosts, packages, volume and brick tabs, and then deploy.

5. The process hangs at the step where the shell script is run.

Actual results:
-----------------------------
The peers are probed successfully, but the operation then hangs and the gdeploy process is getting terminated.

Expected results:
-----------------------------
The operation should proceed.

Comment 1 SATHEESARAN 2018-07-02 07:05:32 UTC
I could find the root cause for this issue. 

When the expand cluster operation is attempted, the grafton-sanity-check.sh script is run against the 3 new nodes. This script basically checks that the hosts are reachable from each other.

The conf (/var/lib/ovirt-hosted-engine-setup/gdeploy/gdeployConfig.conf) looks like the following when the expansion is triggered from 10.70.36.79:
<snip>

#gdeploy configuration generated by cockpit-gluster plugin
[hosts]
10.70.36.244
10.70.36.245
10.70.36.246

[script1:10.70.36.244]
action=execute
ignore_script_errors=no
file=/usr/share/gdeploy/scripts/grafton-sanity-check.sh -d sdb,sde -h 10.70.36.244,10.70.36.245,10.70.36.246

[script1:10.70.36.245]
action=execute
ignore_script_errors=no
file=/usr/share/gdeploy/scripts/grafton-sanity-check.sh -d sdb,sde -h 10.70.36.244,10.70.36.245,10.70.36.246

[script1:10.70.36.246]
action=execute
ignore_script_errors=no
file=/usr/share/gdeploy/scripts/grafton-sanity-check.sh -d sdb,sde -h 10.70.36.244,10.70.36.245,10.70.36.246

</snip>

The script tries to execute commands on 10.70.36.{245,246}, which do not have SSH public key authentication set up.
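
To make the source-to-target relationship in the conf above explicit, here is a minimal Python sketch that parses a copy of it and lists which host runs the sanity check against which targets. It assumes the [scriptN:host] section layout and the -h option exactly as shown in the snippet; the helper name sanity_check_pairs is hypothetical and not part of gdeploy.

```python
import configparser
import shlex

# Copy of the conf quoted in this comment.
CONF = """\
#gdeploy configuration generated by cockpit-gluster plugin
[hosts]
10.70.36.244
10.70.36.245
10.70.36.246

[script1:10.70.36.244]
action=execute
ignore_script_errors=no
file=/usr/share/gdeploy/scripts/grafton-sanity-check.sh -d sdb,sde -h 10.70.36.244,10.70.36.245,10.70.36.246

[script1:10.70.36.245]
action=execute
ignore_script_errors=no
file=/usr/share/gdeploy/scripts/grafton-sanity-check.sh -d sdb,sde -h 10.70.36.244,10.70.36.245,10.70.36.246

[script1:10.70.36.246]
action=execute
ignore_script_errors=no
file=/usr/share/gdeploy/scripts/grafton-sanity-check.sh -d sdb,sde -h 10.70.36.244,10.70.36.245,10.70.36.246
"""


def sanity_check_pairs(conf_text):
    """Return (runner, target) pairs implied by the script sections.

    Hypothetical helper: this is a reading of the conf layout above,
    not how gdeploy itself interprets the file.
    """
    parser = configparser.ConfigParser(
        allow_no_value=True,   # [hosts] lists bare addresses without values
        delimiters=("=",),     # keep ':' available for section names and paths
        interpolation=None,
    )
    parser.read_string(conf_text)

    pairs = []
    for section in parser.sections():
        name, sep, runner = section.partition(":")
        if not sep or not name.startswith("script"):
            continue  # skip [hosts] and any non-script section
        args = shlex.split(parser[section]["file"])
        if "-h" not in args:
            continue
        targets = args[args.index("-h") + 1].split(",")
        pairs.extend((runner, target) for target in targets)
    return pairs


pairs = sanity_check_pairs(CONF)
for runner, target in pairs:
    print(f"{runner} checks {target}")
```

Each new node appears as a runner with all three new nodes as targets, which is why key-based login from the deploying host alone does not appear to be enough.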

To expand host{4,5,6} into the existing cluster of host{1,2,3}, SSH public key authentication is set up from host1 to host{4,5,6}.
But no such authentication is set up from host4 to host{4,5,6}, and this leads to the problem.
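
The gap described above can be sketched as a small set computation. This is an illustrative Python model, not code from gdeploy or cockpit-ovirt: the labels host1 to host6 follow this comment, and missing_key_pairs and batch_mode_probe are hypothetical helper names. The probe only builds an ssh command line using OpenSSH's BatchMode option, which makes ssh fail instead of prompting for a password; the root login is an assumption.

```python
from itertools import product


def missing_key_pairs(new_hosts, configured):
    """Return (source, target) pairs that need key-based SSH but lack it.

    Assumption, following the comment above: every new host must be able
    to log in to every new host, itself included.
    """
    required = set(product(new_hosts, new_hosts))
    return sorted(required - set(configured))


def batch_mode_probe(target, user="root"):
    """Build (not run) an ssh command that fails rather than prompting.

    The 'root' user and the 5 second timeout are assumptions.
    """
    return [
        "ssh",
        "-o", "BatchMode=yes",
        "-o", "ConnectTimeout=5",
        f"{user}@{target}",
        "true",
    ]


new_hosts = ["host4", "host5", "host6"]
# Key-based login exists only from host1 to the new hosts.
configured = [("host1", host) for host in new_hosts]

missing = missing_key_pairs(new_hosts, configured)
for source, target in missing:
    print(f"{source} -> {target}: no key-based login")
```

Running the command returned by batch_mode_probe on a new host (for example through subprocess.run) should exit with a non-zero status for each missing pair, which would be one way to confirm the setup before retrying the expansion.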

Comment 2 Sahina Bose 2018-07-06 12:07:53 UTC
Should we document this or is there a fix planned in 4.2.5?

Comment 3 Gobinda Das 2018-07-06 12:41:29 UTC
We need to document this for 4.2.5; we will provide an option to set up passwordless SSH from the cockpit UI in 4.2.6.

Comment 5 Gobinda Das 2018-08-16 10:34:18 UTC
The issue was the passwordless SSH configuration, and we have documented this, so I am closing this bug.

