Bug 1636616

Summary: OCS 3.10 fails to deploy at 'Load heketi topology' with 'command not found'
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Davi Garcia <dvercill>
Component: CNS-deployment
Assignee: Jose A. Rivera <jarrpa>
Status: CLOSED NOTABUG
QA Contact: Prasanth <pprakash>
Severity: high
Docs Contact:
Priority: urgent
Version: cns-3.10
CC: aclewett, ajuricic, akhakhar, bkunal, dvercill, hchiramm, jarrpa, kramdoss, lasilva, madam, pprakash, rhs-bugs, rtalur, sankarshan
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-11-07 18:40:04 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1641915, 1642792    

Description Davi Garcia 2018-10-05 21:28:57 UTC
>> Description of problem:

We are trying to deploy OCS 3.10 in Independent Mode on an OCP 3.9 cluster (which should be a supported combination), but the deployment fails during the 'openshift_storage_glusterfs : Load heketi topology' task due to multiple 'command not found' errors.

TASK [openshift_storage_glusterfs : Load heketi topology] ********************************************************************************************
Friday 05 October 2018  16:56:04 -0300 (0:00:01.729)       0:04:58.627 ******** 
fatal: [s01ops04]: FAILED! => {"changed": true, "cmd": ["oc", "--config=/tmp/openshift-glusterfs-ansible-GgLL9D/admin.kubeconfig", "rsh", "--namespace=openshift-storage-ext", "deploy-heketi-storage-2-6vg72", "heketi-cli", "-s", "http://localhost:8080", "--user", "admin", "--secret", "SkACnWqb3pfWaQKV6WGsM/0bhfo9OTsg7PE0g4t4o1E=", "topology", "load", "--json=/tmp/openshift-glusterfs-ansible-GgLL9D/topology.json", "2>&1"], "delta": "0:00:05.826397", "end": "2018-10-05 16:56:11.009775", "failed_when_result": true, "rc": 0, "start": "2018-10-05 16:56:05.183378", "stderr": "", "stderr_lines": [], "stdout": "Creating cluster ... ID: f22481462a83889ba9082080a563e9f2\n\tAllowing file volumes on cluster.\n\tAllowing block volumes on cluster.\n\tCreating node s01ocs01.example.com ... ID: a00903ff26d020f48592b44f07283605\n\t\tAdding device /dev/sdb ... Unable to add device: sudo: pvcreate: command not found\n\tCreating node s01ocs02.example.com ... Unable to create node: sudo: gluster: command not found\n\tCreating node s01ocs03.example.com ... Unable to create node: sudo: gluster: command not found\n\tCreating node s01ocs04.example.com ... Unable to create node: sudo: gluster: command not found", "stdout_lines": ["Creating cluster ... ID: f22481462a83889ba9082080a563e9f2", "\tAllowing file volumes on cluster.", "\tAllowing block volumes on cluster.", "\tCreating node s01ocs01.example.com ... ID: a00903ff26d020f48592b44f07283605", "\t\tAdding device /dev/sdb ... Unable to add device: sudo: pvcreate: command not found", "\tCreating node s01ocs02.example.com ... Unable to create node: sudo: gluster: command not found", "\tCreating node s01ocs03.example.com ... Unable to create node: sudo: gluster: command not found", "\tCreating node s01ocs04.example.com ... Unable to create node: sudo: gluster: command not found"]}

>> Version-Release:

OpenShift Container Platform 3.9.33
Advanced Installer (openshift-ansible) 3.9.z
OpenShift Container Storage 3.10.z

>> How reproducible:

Easy

>> Steps to Reproduce:
1. Using an OCP 3.9.z environment, customize the inventory with:

openshift_storage_glusterfs_namespace=openshift-storage-ext
openshift_storage_glusterfs_storageclass=true
openshift_storage_glusterfs_storageclass_default=false
openshift_storage_glusterfs_block_deploy=true
openshift_storage_glusterfs_block_host_vol_create=true
openshift_storage_glusterfs_block_host_vol_size=400
openshift_storage_glusterfs_block_storageclass=true
openshift_storage_glusterfs_block_storageclass_default=false
openshift_storage_glusterfs_is_native=false
openshift_storage_glusterfs_heketi_is_native=true
openshift_storage_glusterfs_heketi_executor=ssh
openshift_storage_glusterfs_heketi_ssh_port=22
openshift_storage_glusterfs_heketi_ssh_user=heketi
openshift_storage_glusterfs_heketi_ssh_sudo=true
openshift_storage_glusterfs_heketi_ssh_keyfile="/root/.ssh/heketi_rsa"
openshift_storage_glusterfs_image=registry.access.redhat.com/rhgs3/rhgs-server-rhel7:v3.10
openshift_storage_glusterfs_block_image=registry.access.redhat.com/rhgs3/rhgs-gluster-block-prov-rhel7:v3.10
openshift_storage_glusterfs_heketi_image=registry.access.redhat.com/rhgs3/rhgs-volmanager-rhel7:v3.10

2. Run the playbook openshift-glusterfs:

ansible-playbook -i ~/inventory /usr/share/ansible/openshift-ansible/playbooks/openshift-glusterfs/config.yml

>> Actual results:

TASK [openshift_storage_glusterfs : Load heketi topology] ********************************************************************************************
Friday 05 October 2018  16:56:04 -0300 (0:00:01.729)       0:04:58.627 ******** 
fatal: [s01ops04]: FAILED! => {"changed": true, "cmd": ["oc", "--config=/tmp/openshift-glusterfs-ansible-GgLL9D/admin.kubeconfig", "rsh", "--namespace=openshift-storage-ext", "deploy-heketi-storage-2-6vg72", "heketi-cli", "-s", "http://localhost:8080", "--user", "admin", "--secret", "SkACnWqb3pfWaQKV6WGsM/0bhfo9OTsg7PE0g4t4o1E=", "topology", "load", "--json=/tmp/openshift-glusterfs-ansible-GgLL9D/topology.json", "2>&1"], "delta": "0:00:05.826397", "end": "2018-10-05 16:56:11.009775", "failed_when_result": true, "rc": 0, "start": "2018-10-05 16:56:05.183378", "stderr": "", "stderr_lines": [], "stdout": "Creating cluster ... ID: f22481462a83889ba9082080a563e9f2\n\tAllowing file volumes on cluster.\n\tAllowing block volumes on cluster.\n\tCreating node s01ocs01.example.com ... ID: a00903ff26d020f48592b44f07283605\n\t\tAdding device /dev/sdb ... Unable to add device: sudo: pvcreate: command not found\n\tCreating node s01ocs02.example.com ... Unable to create node: sudo: gluster: command not found\n\tCreating node s01ocs03.example.com ... Unable to create node: sudo: gluster: command not found\n\tCreating node s01ocs04.example.com ... Unable to create node: sudo: gluster: command not found", "stdout_lines": ["Creating cluster ... ID: f22481462a83889ba9082080a563e9f2", "\tAllowing file volumes on cluster.", "\tAllowing block volumes on cluster.", "\tCreating node s01ocs01.example.com ... ID: a00903ff26d020f48592b44f07283605", "\t\tAdding device /dev/sdb ... Unable to add device: sudo: pvcreate: command not found", "\tCreating node s01ocs02.example.com ... Unable to create node: sudo: gluster: command not found", "\tCreating node s01ocs03.example.com ... Unable to create node: sudo: gluster: command not found", "\tCreating node s01ocs04.example.com ... Unable to create node: sudo: gluster: command not found"]}

PLAY RECAP *******************************************************************************************************************************************
localhost                  : ok=12   changed=0    unreachable=0    failed=0   
s01ocs01.example.com       : ok=20   changed=0    unreachable=0    failed=0   
s01ocs02.example.com       : ok=17   changed=0    unreachable=0    failed=0   
s01ocs03.example.com       : ok=17   changed=0    unreachable=0    failed=0   
s01ocs04.example.com       : ok=17   changed=0    unreachable=0    failed=0   
s01ops04                   : ok=64   changed=12   unreachable=0    failed=1   
s01ops05                   : ok=19   changed=1    unreachable=0    failed=0   
s01ops06                   : ok=19   changed=1    unreachable=0    failed=0   
s01ops11                   : ok=1    changed=0    unreachable=0    failed=0     

>> Expected results:

OCS in Independent Mode installs successfully.

>> Additional info:

We have already verified that the remote user configured for Heketi has 'sudo' permissions and that the proper SSH keys are distributed. We can also run the commands manually without problems:

[root@s01ocs01 ~]# su heketi -
[heketi@s01ocs01 root]$ whoami
heketi
[heketi@s01ocs01 root]$ sudo pvcreate --metadatasize=128M --dataalignment=256K '/dev/sdb'
  Physical volume "/dev/sdb" successfully created.
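Note that heketi's ssh executor runs its commands over a non-interactive SSH session, which can behave differently from the interactive shell used in the manual test above. A quick way to cross-check (the hostname is taken from this environment; the check itself is generic):

```shell
# Login shell: /etc/profile is sourced, so PATH normally includes /sbin and /usr/sbin
ssh heketi@s01ocs01.example.com 'bash -lc "command -v pvcreate gluster"'

# Non-interactive shell, as used by heketi's ssh executor: /etc/profile is NOT
# sourced, so the same binaries may be missing from PATH
ssh heketi@s01ocs01.example.com 'command -v pvcreate gluster'
```

If the second command fails while the first succeeds, the executor's environment is the problem rather than sudo or SSH keys.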

Comment 18 Davi Garcia 2018-11-07 18:40:04 UTC
We did some extra tests and found that the user 'heketi' can only find the binaries on its PATH in login shells (the PATH variable was being exported from /etc/profile). In non-interactive shells, such as those started by Heketi's SSH executor, PATH was not set correctly, and that was causing the failures. We fixed this by putting the export in .bashrc for the 'heketi' user.
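A minimal sketch of the fix described above, to be run as root on each storage node. The exact directory list is an assumption (whatever /etc/profile exported in this environment); pvcreate and gluster typically live in /usr/sbin and /sbin. Bash sources ~/.bashrc for non-interactive shells started over SSH, which is why this makes the binaries visible to heketi's ssh executor:

```shell
# Append the PATH export to the heketi user's ~/.bashrc so that
# non-interactive SSH sessions (which never source /etc/profile)
# can still find pvcreate, gluster, etc.
cat >> ~heketi/.bashrc <<'EOF'
export PATH=$PATH:/usr/local/sbin:/usr/sbin:/sbin
EOF

# Verify from another host that a non-interactive shell now finds the tools:
ssh heketi@s01ocs01.example.com 'command -v pvcreate gluster'
```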