Bug 1628046

Summary: Deployment of gluster-registry fails on OCP3.9+ CNS3.10 setup using ansible
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: vinutha <vinug>
Component: CNS-deployment
Assignee: Michael Adam <madam>
Status: CLOSED WORKSFORME
QA Contact: vinutha <vinug>
Severity: high
Docs Contact:
Priority: unspecified
Version: cns-3.10
CC: akhakhar, hchiramm, jarrpa, kramdoss, madam, pprakash, rhs-bugs, rtalur, sankarshan, sarumuga, vinug
Target Milestone: ---
Keywords: ZStream
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-02-04 11:48:59 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description             Flags
Ansible inventory file  none
Ansible-log file        none

Description vinutha 2018-09-12 06:26:46 UTC
Description of problem:
Deploying gluster-registry with Ansible on an OCP 3.9 + CNS 3.10 setup fails at the task 'Create heketi config secret'.

Version-Release number of selected component (if applicable):
# rpm -qa| grep openshift
openshift-ansible-roles-3.9.41-1.git.0.4c55974.el7.noarch
atomic-openshift-3.9.41-1.git.0.67432b0.el7.x86_64
openshift-ansible-playbooks-3.9.41-1.git.0.4c55974.el7.noarch
openshift-ansible-docs-3.9.41-1.git.0.4c55974.el7.noarch
atomic-openshift-utils-3.9.41-1.git.0.4c55974.el7.noarch
atomic-openshift-docker-excluder-3.9.41-1.git.0.67432b0.el7.noarch
atomic-openshift-clients-3.9.41-1.git.0.67432b0.el7.x86_64
atomic-openshift-excluder-3.9.41-1.git.0.67432b0.el7.noarch
atomic-openshift-sdn-ovs-3.9.41-1.git.0.67432b0.el7.x86_64
openshift-ansible-3.9.41-1.git.0.4c55974.el7.noarch
atomic-openshift-master-3.9.41-1.git.0.67432b0.el7.x86_64
atomic-openshift-node-3.9.41-1.git.0.67432b0.el7.x86_64

# oc rsh glusterfs-storage-mncjh
sh-4.2# rpm -qa| grep gluster 
glusterfs-client-xlators-3.12.2-18.el7rhgs.x86_64
glusterfs-cli-3.12.2-18.el7rhgs.x86_64
python2-gluster-3.12.2-18.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-18.el7rhgs.x86_64
glusterfs-libs-3.12.2-18.el7rhgs.x86_64
glusterfs-3.12.2-18.el7rhgs.x86_64
glusterfs-api-3.12.2-18.el7rhgs.x86_64
glusterfs-fuse-3.12.2-18.el7rhgs.x86_64
glusterfs-server-3.12.2-18.el7rhgs.x86_64
gluster-block-0.2.1-26.el7rhgs.x86_64

# oc rsh heketi-storage-1-f5jq9 
sh-4.2# rpm -qa| grep heketi
heketi-7.0.0-11.el7rhgs.x86_64
heketi-client-7.0.0-11.el7rhgs.x86_64



How reproducible:
4 out of 4 attempts

Steps to Reproduce:
1. Create an OCP 3.9 + CNS 3.10 setup with 1 master and 3 gluster nodes using Ansible.
2. Run the gluster-registry deployment Ansible playbook. It fails at the task 'Create heketi config secret'.
3. Also, the registry namespace is not created.


Actual results:
TASK [openshift_storage_glusterfs : Copy heketi private key] ******************************************************************************************************************************************************
changed: [dhcp47-10.lab.eng.blr.redhat.com]
 
TASK [openshift_storage_glusterfs : Create heketi config secret] **************************************************************************************************************************************************
fatal: [dhcp47-10.lab.eng.blr.redhat.com]: FAILED! => {"changed": false, "failed": true, "msg": {"cmd": "/usr/bin/oc replace -f /tmp/heketi-storage-config-secret --force -n glusterfs", "results": {}, "returncode": 1, "stderr": "error: timed out waiting for the condition\n", "stdout": "secret \"heketi-storage-config-secret\" deleted\n"}}
        to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/openshift-glusterfs/config.retry

Expected results:
Gluster registry should be deployed successfully 

Additional info:

Comment 2 vinutha 2018-09-12 06:33:12 UTC
Created attachment 1482545 [details]
Ansible inventory file

Comment 4 vinutha 2018-09-12 06:46:25 UTC
Created attachment 1482573 [details]
Ansible-log file

Comment 6 Saravanakumar 2018-09-12 09:17:21 UTC
(In reply to vinutha from comment #4)
> Created attachment 1482573 [details]
> Ansible-log file


According to the logs, the namespace being used is glusterfs, not infra-storage as specified in the inventory file.


As I understand it, you specify both glusterfs and glusterfs_registry in the inventory file when you want to deploy both glusterfs and the glusterfs registry.

If only the glusterfs registry needs to be deployed, the glusterfs group must be removed from the inventory file or commented out.

Could you remove the glusterfs group from the inventory and try running the playbook again?

Comment 9 Jose A. Rivera 2018-09-12 12:02:30 UTC
I should note that the glusterfs_registry pods will be deployed in whatever namespace the registry is using, which is 'default' by default.

Vinutha, reproduce the error and then try running "/usr/bin/oc replace -f /tmp/heketi-storage-config-secret --force -n glusterfs" manually on the master node. If it fails, look through "journalctl -xe" to see if you can spot any errors regarding that operation.

Comment 11 Jose A. Rivera 2018-09-20 13:47:51 UTC
Hmm... okay, looks like the oc_secret task is deleting the secret manifest before failing. Darn.

If you still have the environment, provide the output of:

oc get dc,po,secret --all-namespaces

Comment 14 Jose A. Rivera 2018-10-10 15:53:02 UTC
Oh, sorry, I lost track of this one.

Okay, from the look of things I think the problem is that when you run the registry.yml playbook it is trying to redeploy the "glusterfs" cluster, when we only want the "glusterfs_registry" cluster. So, for this operation, comment out the "glusterfs" nodes in your inventory file and try running the registry playbook again.
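
For anyone trying this, a minimal shell sketch of commenting out the "glusterfs" group follows. It assumes an INI-style inventory in which the group appears both as a [glusterfs] section and as a bare "glusterfs" entry under a ":children" section. The function name comment_out_group is hypothetical and not part of openshift-ansible, and the sketch does not touch a [glusterfs:vars] section if one exists.

```shell
#!/bin/sh
# Hypothetical helper (not part of openshift-ansible): print a copy of an
# INI-style Ansible inventory with one group commented out.
# Usage: comment_out_group GROUP INVENTORY_FILE
comment_out_group() {
    group="$1"
    inventory="$2"
    awk -v group="$group" '
        # On every section header, record which kind of section we entered.
        /^\[.*\]/ {
            in_group    = ($0 == "[" group "]")
            in_children = ($0 ~ /:children\]/)
        }
        # Comment each non-empty, not yet commented line of the target group,
        # including the [group] header itself.
        in_group && $0 != "" && $0 !~ /^#/ { print "#" $0; next }
        # Comment the bare group name where a parent group lists it as a child.
        in_children && NF == 1 && $1 == group { print "#" $0; next }
        # Everything else passes through unchanged.
        { print }
    ' "$inventory"
}
```

The function writes to standard output, so the original inventory stays untouched: redirect the output of "comment_out_group glusterfs <inventory>" to a new file and pass that file to the registry playbook with -i.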

If that works, we have identified the bug as one of idempotence. I'm not sure how high a priority this should take, but it's probably solvable.