Bug 1564070

Summary: ensure gluster cluster is formed in cns cluster - before reporting cns pods in ready state
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Elvir Kuric <ekuric>
Component: CNS-deployment
Assignee: Michael Adam <madam>
Status: CLOSED NOTABUG
QA Contact: Prasanth <pprakash>
Severity: medium
Docs Contact:
Priority: medium
Version: rhgs-3.0
CC: akhakhar, aos-bugs, aos-storage-staff, ekuric, hchiramm, jarrpa, jmencak, jmulligan, kramdoss, madam, pprakash, rhs-bugs, rtalur, sankarshan
Target Milestone: ---
Keywords: ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: aos-scalability-39
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-04-22 20:06:32 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Elvir Kuric 2018-04-05 09:37:32 UTC
Description of problem:

CNS gluster pods will start and report the "Running" 1/1 state even though the gluster cluster is not formed within the CNS pods.

Inside all CNS pods:

Number of Peers: 0
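
For illustration, the symptom can be observed with something like the following; the pod name and namespace are hypothetical, not taken from this bug:

  oc get pods -n glusterfs                 # gluster pods show 1/1 Running
  oc exec -n glusterfs glusterfs-storage-abc12 -- gluster peer status
  # every pod reports "Number of Peers: 0", i.e. no trusted storage pool was formed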


Version-Release number of selected component (if applicable):
OCP v3.9 and gluster image 

 Image:          rhgs3/rhgs-server-rhel7:v3.9
 Image ID:       docker-pullable://brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhgs3/rhgs-server-rhel7@sha256:e8ba9f0b090108d468b8c012201e11e3f8b769f9b6adc90f53b184c2157e5e5b



Actual results:

gluster pods are in Running state 1/1, but the gluster cluster is not formed


Expected results:

Gluster pods reporting the Running/Ready state should imply a formed and working gluster cluster inside the CNS pods.

Additional info:

currently the livenessProbe and readinessProbe are:

---
          readinessProbe:
            timeoutSeconds: 3
            initialDelaySeconds: 40
            exec:
              command:
              - "/bin/bash"
              - "-c"
              - systemctl status glusterd.service
            periodSeconds: 25
            successThreshold: 1
            failureThreshold: 50
          livenessProbe:
            timeoutSeconds: 3
            initialDelaySeconds: 40
            exec:
              command:
              - "/bin/bash"
              - "-c"
              - systemctl status glusterd.service
---
where only the status of the glusterd service is checked. glusterd can be running even when the cluster is not formed. I think the livenessProbe can keep checking the status of the glusterd service, but the readinessProbe should only report "Ready" once the pod has connected with its peers.
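
As a minimal sketch (not the shipped probe), the readinessProbe exec command could also require at least one peer in the trusted storage pool before reporting Ready. The threshold and the output parsing below are assumptions, and a single-node pool legitimately reports 0 peers:

  #!/bin/bash
  # stricter readiness check (sketch): glusterd must be active ...
  systemctl is-active --quiet glusterd.service || exit 1
  # ... and the trusted storage pool must contain at least one peer
  peers=$(gluster peer status | awk '/^Number of Peers:/ {print $4}')
  [ "${peers:-0}" -ge 1 ] || exit 1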

Comment 1 Jiří Mencák 2018-04-08 07:20:32 UTC
I'm hitting this bug too.  In my case the workaround was to "wipefs -a <glusterfs_device>" and redeploy.  An xfs filesystem that existed on one of the block devices tripped the installer even though glusterfs_wipe was used.  "wipefs -a" should be added to the installer if/when openshift_storage_glusterfs_wipe is defined.
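
For illustration, the manual workaround looked roughly like this; the device path is a placeholder, not taken from the bug:

  # clear stale filesystem signatures from the device intended for glusterfs
  wipefs -a /dev/sdX
  # then redeploy (e.g. rerun the openshift-ansible glusterfs playbooks)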

Comment 2 Jiří Mencák 2018-04-09 15:54:44 UTC
Upstream workaround: https://github.com/openshift/openshift-ansible/pull/7863

Comment 4 Humble Chirammal 2018-04-24 11:42:53 UTC
The liveness probe is for a pod afaict, so TSP (trusted storage pool) formation does not really fall under the liveness probe. Ultimately, glusterd is the service we expect to run in this pod or container.

Comment 5 Elvir Kuric 2018-05-15 06:41:07 UTC
Adding devices will fail if the gluster cluster is not formed, so it is necessary to ensure the gluster cluster is up and running (gluster peer status) before proceeding with adding devices in the CNS configuration.
If there is anything I can help with, please let me know.
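
A rough sketch of such a pre-check before adding devices; the namespace and label selector here are assumptions, not taken from this bug:

  # wait until every gluster pod reports at least one peer before adding devices
  for pod in $(oc get pods -n glusterfs -l glusterfs=storage-pod -o name); do
    until oc exec -n glusterfs "$pod" -- gluster peer status | grep -q '^Number of Peers: [1-9]'; do
      sleep 5
    done
  done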