Description of problem:
CNS gluster pods start and reach the "Running" 1/1 state even if the gluster cluster has not formed within the CNS pods. Inside all CNS pods:

Number of Peers: 0

Version-Release number of selected component (if applicable):
OCP v3.9 and gluster image
Image: rhgs3/rhgs-server-rhel7:v3.9
Image ID: docker-pullable://brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhgs3/rhgs-server-rhel7@sha256:e8ba9f0b090108d468b8c012201e11e3f8b769f9b6adc90f53b184c2157e5e5b

Actual results:
Gluster pods are in the Running 1/1 state, but the gluster cluster is not formed.

Expected results:
Gluster pods in the Running state should correspond to a formed, working gluster cluster inside the CNS pods.

Additional info:
Currently the livenessProbe and readinessProbe are:

---
readinessProbe:
  timeoutSeconds: 3
  initialDelaySeconds: 40
  exec:
    command:
    - "/bin/bash"
    - "-c"
    - systemctl status glusterd.service
  periodSeconds: 25
  successThreshold: 1
  failureThreshold: 50
livenessProbe:
  timeoutSeconds: 3
  initialDelaySeconds: 40
  exec:
    command:
    - "/bin/bash"
    - "-c"
    - systemctl status glusterd.service
---

Only the status of the glusterd service is checked. The glusterd service can be running without the cluster being formed. I think the livenessProbe can keep checking the status of the glusterd service, but the readinessProbe should report "Ready" only once the pod has connected with its peers.
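A minimal sketch of what a peer-aware readiness check could look like. The function name peers_ready is hypothetical, and the sketch assumes that "gluster peer status" prints a line of the form "State: Peer in Cluster (Connected)" for each healthy peer. It also assumes that peers are probed by something that does not itself wait for pod readiness; otherwise the pods would never become Ready.

```shell
#!/bin/bash
# peers_ready (hypothetical helper): reads `gluster peer status` output on
# stdin and succeeds only if at least one peer is connected and no peer is
# reported as disconnected.
peers_ready() {
  local status
  status=$(cat)
  # Require at least one connected peer; "Number of Peers: 0" fails here.
  printf '%s\n' "$status" | grep -q 'State: Peer in Cluster (Connected)' || return 1
  # Fail if any peer is listed as disconnected.
  ! printf '%s\n' "$status" | grep -q '(Disconnected)'
}
```

Inside a probe command this could be used as "gluster peer status | peers_ready", leaving the livenessProbe on "systemctl status glusterd.service" as it is today.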
I'm hitting this bug too. In my case the workaround was to run "wipefs -a <glusterfs_device>" and redeploy. An XFS filesystem that already existed on one of the block devices tripped up the installer even though glusterfs_wipe was used. "wipefs -a" should be added to the installer if/when openshift_storage_glusterfs_wipe is defined.
Upstream workaround: https://github.com/openshift/openshift-ansible/pull/7863
The liveness probe is for the pod, afaict, so formation of the trusted storage pool (TSP) does not really fall under the liveness probe. Ultimately, glusterd is the service we expect to run in this pod or container.
Adding devices will fail if the gluster cluster is not formed, so it is necessary to ensure that the gluster cluster is up and running (gluster peer status) before proceeding with adding devices in the CNS configuration. If there is anything I can help with, please let me know.
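The pre-check described above could be sketched as a small retry loop. The function name wait_for_peers and the ATTEMPTS and DELAY variables are hypothetical; the only thing taken from this report is the "Number of Peers: N" line printed by "gluster peer status".

```shell
#!/bin/bash
# wait_for_peers CMD [ARGS...] (hypothetical helper): runs CMD repeatedly
# until its output contains "Number of Peers: N" with N > 0.
# ATTEMPTS and DELAY (seconds) are illustrative knobs, not installer options.
wait_for_peers() {
  local attempts="${ATTEMPTS:-30}" delay="${DELAY:-5}" i=0 count
  while [ "$i" -lt "$attempts" ]; do
    # Extract the peer count from the command output, if present.
    count=$("$@" 2>/dev/null | sed -n 's/^Number of Peers: \([0-9][0-9]*\).*/\1/p')
    if [ -n "$count" ] && [ "$count" -gt 0 ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

Run inside a gluster pod as "wait_for_peers gluster peer status" and only proceed with adding devices if it returns success.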