Bug 1564070 - ensure gluster cluster is formed in cns cluster - before reporting cns pods in ready state
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: CNS-deployment
Version: rhgs-3.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Michael Adam
QA Contact: Prasanth
Whiteboard: aos-scalability-39
Depends On:
Reported: 2018-04-05 09:37 UTC by Elvir Kuric
Modified: 2019-04-22 20:06 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2019-04-22 20:06:32 UTC
Target Upstream Version:


Description Elvir Kuric 2018-04-05 09:37:32 UTC
Description of problem:

CNS gluster pods start and reach the "Running" 1/1 state even if the gluster cluster has not formed within the CNS pods.

Inside all CNS pods :

Number of Peers: 0

Version-Release number of selected component (if applicable):
OCP v3.9 and gluster image 

 Image:          rhgs3/rhgs-server-rhel7:v3.9
 Image ID:       docker-pullable://brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhgs3/rhgs-server-rhel7@sha256:e8ba9f0b090108d468b8c012201e11e3f8b769f9b6adc90f53b184c2157e5e5b

Actual results:

Gluster pods are in Running state 1/1, but the gluster cluster is not formed.

Expected results:

Gluster pods should be reported as Running 1/1 only when a formed, working gluster cluster exists inside the CNS pods.

Additional info:

Currently the livenessProbe and readinessProbe are:

          readinessProbe:
            timeoutSeconds: 3
            initialDelaySeconds: 40
            exec:
              command:
              - "/bin/bash"
              - "-c"
              - systemctl status glusterd.service
            periodSeconds: 25
            successThreshold: 1
            failureThreshold: 50
          livenessProbe:
            timeoutSeconds: 3
            initialDelaySeconds: 40
            exec:
              command:
              - "/bin/bash"
              - "-c"
              - systemctl status glusterd.service

Both probes check only the status of the glusterd service. The glusterd service can be running without the cluster being formed. I think the livenessProbe can keep checking the status of the glusterd service, but the readinessProbe should report "Ready" only once the node is connected to its peers.
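A minimal sketch of what such a readiness check could look like, assuming the probe runs a script inside the pod. The function names and the MIN_PEERS variable are hypothetical and not part of the current template; the check counts peers that `gluster peer status` reports as "Peer in Cluster (Connected)".

```shell
#!/bin/bash
# Hypothetical readiness check: ready only when glusterd is active AND
# at least MIN_PEERS peers are connected. Names here are illustrative.

# Count connected peers in `gluster peer status` output read from stdin.
count_connected_peers() {
    # grep -c prints 0 and exits non-zero when nothing matches; mask that.
    grep -c 'State: Peer in Cluster (Connected)' || true
}

# peers_ready MIN_PEERS: reads peer status on stdin, succeeds when enough
# peers are connected.
peers_ready() {
    local min_peers="${1:-1}"
    local connected
    connected="$(count_connected_peers)"
    [ "$connected" -ge "$min_peers" ]
}

# Full check: the service must be active and the peers must be connected.
gluster_ready() {
    systemctl -q is-active glusterd.service || return 1
    gluster peer status 2>/dev/null | peers_ready "${MIN_PEERS:-1}"
}
```

The readinessProbe command would then call gluster_ready, while the livenessProbe could keep the plain glusterd service check. A freshly deployed single node with "Number of Peers: 0" would stay not ready under this check, so MIN_PEERS would need to match the intended cluster size.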

Comment 1 jmencak 2018-04-08 07:20:32 UTC
I'm hitting this bug too.  In my case the workaround was to "wipefs -a <glusterfs_device>" and redeploy.  An xfs filesystem that existed on one of the block devices tripped the installer even though glusterfs_wipe was used.  "wipefs -a" should be added to the installer if/when openshift_storage_glusterfs_wipe is defined.

Comment 2 jmencak 2018-04-09 15:54:44 UTC
Upstream workaround: https://github.com/openshift/openshift-ansible/pull/7863

Comment 4 Humble Chirammal 2018-04-24 11:42:53 UTC
The liveness probe is for a pod afaict, so the TSP (trusted storage pool) formation does not really fall under the liveness probe. Ultimately glusterd is the service we are expected to run in this pod or container.

Comment 5 Elvir Kuric 2018-05-15 06:41:07 UTC
Adding devices will fail if the gluster cluster is not formed, so it is necessary to ensure that the gluster cluster is up and running (gluster peer status) before proceeding with adding devices in the CNS configuration.
If there is anything I can help with, please let me know.
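As a rough sketch of the gating step described above, a deployment script could poll `gluster peer status` until the expected number of peers is connected, and only then move on to adding devices. The function name wait_for_peers and its arguments are hypothetical; the status command can be overridden, which is only there so the loop can be exercised without a gluster installation.

```shell
#!/bin/bash
# wait_for_peers EXPECTED TIMEOUT_SECS INTERVAL_SECS [STATUS_CMD...]
# Hypothetical helper: returns 0 once at least EXPECTED peers are
# connected, or 1 if TIMEOUT_SECS elapses first.
wait_for_peers() {
    local expected="$1" timeout="$2" interval="$3"
    shift 3
    # Default to the real CLI when no status command is supplied.
    [ "$#" -gt 0 ] || set -- gluster peer status
    local deadline=$(( SECONDS + timeout ))
    local connected
    while :; do
        # Count peers in the connected state; a failing command counts as 0.
        connected="$("$@" 2>/dev/null | grep -c 'Peer in Cluster (Connected)' || true)"
        if [ "${connected:-0}" -ge "$expected" ]; then
            return 0
        fi
        [ "$SECONDS" -ge "$deadline" ] && return 1
        sleep "$interval"
    done
}
```

For a three-node cluster, each node sees two peers, so a call such as `wait_for_peers 2 300 10` inside one gluster pod would wait up to five minutes before the device-adding step is allowed to run.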
