Description of problem:
Deployment of gluster-registry fails on an OCP 3.9 + CNS 3.10 setup using ansible at task 'Create heketi config secret'.

Version-Release number of selected component (if applicable):
# rpm -qa | grep openshift
openshift-ansible-roles-3.9.41-1.git.0.4c55974.el7.noarch
atomic-openshift-3.9.41-1.git.0.67432b0.el7.x86_64
openshift-ansible-playbooks-3.9.41-1.git.0.4c55974.el7.noarch
openshift-ansible-docs-3.9.41-1.git.0.4c55974.el7.noarch
atomic-openshift-utils-3.9.41-1.git.0.4c55974.el7.noarch
atomic-openshift-docker-excluder-3.9.41-1.git.0.67432b0.el7.noarch
atomic-openshift-clients-3.9.41-1.git.0.67432b0.el7.x86_64
atomic-openshift-excluder-3.9.41-1.git.0.67432b0.el7.noarch
atomic-openshift-sdn-ovs-3.9.41-1.git.0.67432b0.el7.x86_64
openshift-ansible-3.9.41-1.git.0.4c55974.el7.noarch
atomic-openshift-master-3.9.41-1.git.0.67432b0.el7.x86_64
atomic-openshift-node-3.9.41-1.git.0.67432b0.el7.x86_64

# oc rsh glusterfs-storage-mncjh
sh-4.2# rpm -qa | grep gluster
glusterfs-client-xlators-3.12.2-18.el7rhgs.x86_64
glusterfs-cli-3.12.2-18.el7rhgs.x86_64
python2-gluster-3.12.2-18.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-18.el7rhgs.x86_64
glusterfs-libs-3.12.2-18.el7rhgs.x86_64
glusterfs-3.12.2-18.el7rhgs.x86_64
glusterfs-api-3.12.2-18.el7rhgs.x86_64
glusterfs-fuse-3.12.2-18.el7rhgs.x86_64
glusterfs-server-3.12.2-18.el7rhgs.x86_64
gluster-block-0.2.1-26.el7rhgs.x86_64

# oc rsh heketi-storage-1-f5jq9
sh-4.2# rpm -qa | grep heketi
heketi-7.0.0-11.el7rhgs.x86_64
heketi-client-7.0.0-11.el7rhgs.x86_64

How reproducible:
4:4

Steps to Reproduce:
1. Create an OCP 3.9 + CNS 3.10 setup having 1 master + 3 gluster nodes using ansible.
2. Run the gluster-registry deployment ansible playbook; it fails at task 'Create heketi config secret'.
Also, the registry namespace is not created.

Actual results:
TASK [openshift_storage_glusterfs : Copy heketi private key] ******************************************************************************************************************************************************
changed: [dhcp47-10.lab.eng.blr.redhat.com]

TASK [openshift_storage_glusterfs : Create heketi config secret] **************************************************************************************************************************************************
fatal: [dhcp47-10.lab.eng.blr.redhat.com]: FAILED! => {"changed": false, "failed": true, "msg": {"cmd": "/usr/bin/oc replace -f /tmp/heketi-storage-config-secret --force -n glusterfs", "results": {}, "returncode": 1, "stderr": "error: timed out waiting for the condition\n", "stdout": "secret \"heketi-storage-config-secret\" deleted\n"}}
	to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/openshift-glusterfs/config.retry

Expected results:
Gluster registry should be deployed successfully.

Additional info:
Created attachment 1482545 [details] Ansible inventory file
Created attachment 1482573 [details] Ansible-log file
(In reply to vinutha from comment #4)
> Created attachment 1482573 [details]
> Ansible-log file

As per the logs, the namespace referred to is 'glusterfs', not 'infra-storage' as specified in the inventory file. As I understand it, you need to specify both the glusterfs and glusterfs_registry groups in the inventory file when you want to deploy both glusterfs and the glusterfs registry. If only the glusterfs registry is to be deployed, then the glusterfs group needs to be removed or commented out in the inventory file. Could you remove the glusterfs group from the inventory and try running the playbook again?
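To illustrate, a hypothetical minimal inventory fragment (all host names and device paths here are placeholders, not taken from the attached inventory) with the glusterfs group commented out so that only the registry cluster is deployed:

```ini
; Hypothetical inventory sketch -- hosts and devices are placeholders.
[OSEv3:children]
masters
nodes
; glusterfs            <- commented out: deploy only the registry cluster
glusterfs_registry

; [glusterfs]
; node1.example.com glusterfs_devices='["/dev/sdb"]'
; node2.example.com glusterfs_devices='["/dev/sdb"]'
; node3.example.com glusterfs_devices='["/dev/sdb"]'

[glusterfs_registry]
node1.example.com glusterfs_devices='["/dev/sdc"]'
node2.example.com glusterfs_devices='["/dev/sdc"]'
node3.example.com glusterfs_devices='["/dev/sdc"]'
```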
I should note, the glusterfs_registry pods will be deployed in whatever namespace the registry is using, which is 'default' by default. Vinutha, reproduce the error and then try running "/usr/bin/oc replace -f /tmp/heketi-storage-config-secret --force -n glusterfs" manually on the master node. If it fails, look through "journalctl -xe" to see if you can spot any errors regarding that operation.
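The manual reproduction steps above can be sketched as follows (to be run on the master node; the secret path and namespace are copied from the failing task output, and the grep filter is only an illustrative way to narrow the journal output):

```shell
# Re-run the failing task's command by hand to see whether the timeout
# reproduces outside of ansible:
/usr/bin/oc replace -f /tmp/heketi-storage-config-secret --force -n glusterfs

# If it hangs or fails, scan recent journal entries for related errors:
journalctl -xe --no-pager | grep -i -e secret -e heketi
```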
Hmm... okay, looks like the oc_secret task is deleting the secret manifest before failing. Darn. If you still have the environment, provide the output of: oc get dc,po,secret --all-namespaces
Oh, sorry, I lost track of this one. Okay, from the look of things I think the problem is that when you run the registry.yml playbook it is trying to redeploy the "glusterfs" cluster, when we only want the "glusterfs_registry" cluster. So, for this operation, comment out the "glusterfs" nodes in your inventory file and try running the registry playbook again. If that works, we have identified the bug as one of idempotence. I'm not sure how high a priority this should be, but it's probably solvable.