Bug 1435401

Summary: [Tracker Bug (OCP)] [RFE] cns-deploy should permanently set the storagenode=gluster label to enable automatic restart of GlusterFS pods
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Daniel Messer <dmesser>
Component: CNS-deployment    Assignee: Michael Adam <madam>
Status: CLOSED DUPLICATE QA Contact: Prasanth <pprakash>
Severity: high Docs Contact:
Priority: high    
Version: cns-3.4CC: aclewett, akhakhar, annair, hchiramm, jarrpa, madam, mifiedle, pprakash, rcyriac, rhs-bugs, rreddy, rtalur
Target Milestone: ---    Keywords: FutureFeature, ZStream
Target Release: CNS 3.7   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version:    Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-02-05 10:40:41 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1326732, 1559271    
Bug Blocks:    

Description Daniel Messer 2017-03-23 17:41:56 UTC
Description of problem:

During a cns-deploy run, a DaemonSet template containing a node selector on 'storagenode=glusterfs' is used to deploy the GlusterFS pods. The label is applied to each node via the Kubernetes API. When a node is rebooted or temporarily shut down, the OpenShift masters delete the node object. Upon restart of the node, the label is not re-applied because the node registers from scratch.
Hence the GlusterFS pod does not start up automatically, leaving the deployment in a degraded state despite the node being back up and healthy.
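
For illustration, the relevant part of the DaemonSet template looks roughly like this (a sketch only; the exact template shipped with cns-deploy may differ):

  spec:
    template:
      spec:
        nodeSelector:
          storagenode: glusterfs

Because the scheduler only places the DaemonSet pod on nodes carrying that label, a re-registered node without the label never gets a GlusterFS pod back.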

Request for enhancement:

cns-deploy should modify /etc/origin/node/node-config.yaml to include the storagenode=glusterfs label so that it is present upon node registration.
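
One way to achieve this (a sketch; the exact key names should be verified against the OpenShift Container Platform version in use) is to add the label under kubeletArguments in node-config.yaml:

  kubeletArguments:
    node-labels:
      - "storagenode=glusterfs"

With the label in the node configuration, the kubelet re-applies it every time the node registers, so a rebooted node comes back with the label already in place.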

Version-Release number of selected component (if applicable):


How reproducible:

- Deploy CNS on OpenShift Container Platform
- Observe that all GlusterFS pods are healthy
- Observe that the label is present: oc get nodes --show-labels
- Shut down one of the OpenShift nodes hosting a GlusterFS pod
- Observe the node being deleted by the masters: oc get nodes
- Restart the node
- Observe the node rejoin the cluster, but without the label: oc get nodes --show-labels

Actual results:

- Observe the node rejoin the cluster, but without the label: oc get nodes --show-labels
- Observe the GlusterFS pod missing: oc get pods


Expected results:

- Observe the node rejoin the cluster with the label: oc get nodes --show-labels
- Observe the GlusterFS pod spawned again: oc get pods


Additional info:

A temporary workaround is to relabel the affected node(s) with: oc label node <node-name> storagenode=glusterfs
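
If multiple storage nodes were rebooted, a small loop can re-apply the label to all of them (a sketch; the node names below are placeholders for your environment):

  # Re-label every node that should host a GlusterFS pod
  for node in node1.example.com node2.example.com node3.example.com; do
      oc label node "$node" storagenode=glusterfs --overwrite
  done

The --overwrite flag makes the command safe to re-run on nodes that still carry the label.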

Comment 2 Michael Adam 2017-03-30 13:52:52 UTC
Interesting observation, thanks!

We are doing the initial CLI labeling in cns-deploy, but if I understand you correctly, there is no mechanism to re-apply the label when a node is brought down and up again. So the node-config.yaml change would be a way to make this labeling permanent?

Thanks - Michael

Comment 3 Daniel Messer 2017-03-31 08:47:04 UTC
Correct. You may want to check back with the OpenShift/Kubernetes folks to verify this is indeed the best way. It worked just fine in my environment.

Comment 8 Humble Chirammal 2017-08-03 04:33:03 UTC
This has been communicated to the CNS program and there are no objections to moving this out of the CNS 3.6 release. I am changing the flags accordingly.

Comment 11 Niels de Vos 2019-02-05 10:40:41 UTC
According to the dependent bz, this has been fixed through bug 1559271.

*** This bug has been marked as a duplicate of bug 1559271 ***