Description of problem:
Deployment of gluster-registry fails on an OCP 3.9 + CNS 3.10 setup using ansible at task 'Create heketi config secret'.

Version-Release number of selected component (if applicable):
# rpm -qa | grep openshift
openshift-ansible-roles-3.9.41-1.git.0.4c55974.el7.noarch
atomic-openshift-3.9.41-1.git.0.67432b0.el7.x86_64
openshift-ansible-playbooks-3.9.41-1.git.0.4c55974.el7.noarch
openshift-ansible-docs-3.9.41-1.git.0.4c55974.el7.noarch
atomic-openshift-utils-3.9.41-1.git.0.4c55974.el7.noarch
atomic-openshift-docker-excluder-3.9.41-1.git.0.67432b0.el7.noarch
atomic-openshift-clients-3.9.41-1.git.0.67432b0.el7.x86_64
atomic-openshift-excluder-3.9.41-1.git.0.67432b0.el7.noarch
atomic-openshift-sdn-ovs-3.9.41-1.git.0.67432b0.el7.x86_64
openshift-ansible-3.9.41-1.git.0.4c55974.el7.noarch
atomic-openshift-master-3.9.41-1.git.0.67432b0.el7.x86_64
atomic-openshift-node-3.9.41-1.git.0.67432b0.el7.x86_64

# oc rsh glusterfs-storage-mncjh
sh-4.2# rpm -qa | grep gluster
glusterfs-client-xlators-3.12.2-18.el7rhgs.x86_64
glusterfs-cli-3.12.2-18.el7rhgs.x86_64
python2-gluster-3.12.2-18.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-18.el7rhgs.x86_64
glusterfs-libs-3.12.2-18.el7rhgs.x86_64
glusterfs-3.12.2-18.el7rhgs.x86_64
glusterfs-api-3.12.2-18.el7rhgs.x86_64
glusterfs-fuse-3.12.2-18.el7rhgs.x86_64
glusterfs-server-3.12.2-18.el7rhgs.x86_64
gluster-block-0.2.1-26.el7rhgs.x86_64

# oc rsh heketi-storage-1-f5jq9
sh-4.2# rpm -qa | grep heketi
heketi-7.0.0-11.el7rhgs.x86_64
heketi-client-7.0.0-11.el7rhgs.x86_64

How reproducible:
4:4

Steps to Reproduce:
1. Create an OCP 3.9 + CNS 3.10 setup having 1 master + 3 gluster nodes using ansible.
2. Run the gluster-registry deployment ansible playbook; it fails at task 'Create heketi config secret'.
Also, the registry namespace is not created.

Actual results:
TASK [openshift_storage_glusterfs : Copy heketi private key] ******************************************************************************************************************************************************
changed: [dhcp47-10.lab.eng.blr.redhat.com]

TASK [openshift_storage_glusterfs : Create heketi config secret] **************************************************************************************************************************************************
fatal: [dhcp47-10.lab.eng.blr.redhat.com]: FAILED! => {"changed": false, "failed": true, "msg": {"cmd": "/usr/bin/oc replace -f /tmp/heketi-storage-config-secret --force -n glusterfs", "results": {}, "returncode": 1, "stderr": "error: timed out waiting for the condition\n", "stdout": "secret \"heketi-storage-config-secret\" deleted\n"}}
	to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/openshift-glusterfs/config.retry

Expected results:
Gluster registry should be deployed successfully.

Additional info:
Created attachment 1482545 [details] Ansible inventory file
Created attachment 1482573 [details] Ansible-log file
(In reply to vinutha from comment #4)
> Created attachment 1482573 [details]
> Ansible-log file

As per the logs, the namespace referred to is 'glusterfs', not 'infra-storage' as specified in the inventory file. As I understand it, you need to specify both the glusterfs and glusterfs_registry groups in the inventory file when you want to deploy both glusterfs and the glusterfs registry. If only the glusterfs registry is to be deployed, then the glusterfs group needs to be removed or commented out in the inventory file. Could you remove the glusterfs group from the inventory and try running the playbook again?
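To illustrate, a hypothetical minimal inventory fragment (all host names and device paths here are placeholders, not taken from the attached inventory) with the glusterfs group commented out so that only the registry cluster is deployed:

```ini
; Hypothetical inventory sketch -- hosts and devices are placeholders.
[OSEv3:children]
masters
nodes
; glusterfs            <- commented out: deploy only the registry cluster
glusterfs_registry

; [glusterfs]
; node1.example.com glusterfs_devices='["/dev/sdb"]'
; node2.example.com glusterfs_devices='["/dev/sdb"]'
; node3.example.com glusterfs_devices='["/dev/sdb"]'

[glusterfs_registry]
node1.example.com glusterfs_devices='["/dev/sdc"]'
node2.example.com glusterfs_devices='["/dev/sdc"]'
node3.example.com glusterfs_devices='["/dev/sdc"]'
```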
I should note, the glusterfs_registry pods will be deployed in whatever namespace the registry is using, which is 'default' by default. Vinutha, reproduce the error and then try running "/usr/bin/oc replace -f /tmp/heketi-storage-config-secret --force -n glusterfs" manually on the master node. If it fails, look through "journalctl -xe" to see if you can spot any errors regarding that operation.
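The manual reproduction steps above can be sketched as follows (to be run on the master node; the secret path and namespace are copied from the failing task output, and the grep filter is only an illustrative way to narrow the journal output):

```shell
# Re-run the failing task's command by hand to see whether the timeout
# reproduces outside of ansible:
/usr/bin/oc replace -f /tmp/heketi-storage-config-secret --force -n glusterfs

# If it hangs or fails, scan recent journal entries for related errors:
journalctl -xe --no-pager | grep -i -e secret -e heketi
```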
Hmm... okay, looks like the oc_secret task is deleting the secret manifest before failing. Darn. If you still have the environment, provide the output of: oc get dc,po,secret --all-namespaces
Oh, sorry, I lost track of this one. Okay, from the look of things I think the problem is that when you run the registry.yml playbook it is trying to redeploy the "glusterfs" cluster, when we only want the "glusterfs_registry" cluster. So, for this operation, comment out the "glusterfs" nodes in your inventory file and try running the registry playbook again. If that works, we have identified the bug as one of idempotence. I'm not sure how high a priority this should be, but it's probably solvable.