Description of problem:
During the cns-deploy run, a DaemonSet template containing a node selector on 'storagenode=glusterfs' is used to deploy the GlusterFS pods. The label is applied to each node via the Kubernetes API. When a node is rebooted or temporarily shut down, the OpenShift masters delete the node object. Upon restart of the node, the label is not re-applied because the node registers from scratch.
Hence the GlusterFS pod does not start up automatically, leaving the deployment in a degraded state despite the node being back up and healthy.
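For reference, the relevant part of the GlusterFS DaemonSet is a node selector of roughly this shape (a sketch of the standard Kubernetes field, not copied verbatim from the cns-deploy template):

  spec:
    template:
      spec:
        nodeSelector:
          storagenode: glusterfs

Because DaemonSet scheduling is driven purely by this selector, the pod can only be scheduled back onto the node once the label is present on the node object again.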
Request for enhancement:
cns-deploy should modify /etc/origin/node/node-config.yaml to include the storagenode=glusterfs label so it is present upon node registration.
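A sketch of what such a change to /etc/origin/node/node-config.yaml could look like (the exact stanza may differ between OpenShift versions; node-labels is passed through to the kubelet's --node-labels argument):

  kubeletArguments:
    node-labels:
      - "storagenode=glusterfs"

With the label defined in the node configuration, it is re-applied every time the node registers with the masters, so a reboot no longer drops it.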
Version-Release number of selected component (if applicable):
How reproducible:
- Deploy CNS on OpenShift Container Platform
- Observe that all GlusterFS pods are healthy
- Observe the label present: oc get nodes --show-labels
- Shut down one of the OpenShift nodes hosting a GlusterFS pod
- Observe the node being erased by the masters: oc get nodes
- Restart the node
- Observe the node rejoin the cluster, but without the label: oc get nodes --show-labels
Actual results:
- Observe the node rejoin the cluster, but without the label: oc get nodes --show-labels
- Observe the GlusterFS pod missing: oc get pods
Expected results:
- Observe the node rejoin the cluster with the label: oc get nodes --show-labels
- Observe the GlusterFS pod spawned again: oc get pods
Additional info:
A temporary workaround is to relabel the node(s) with: oc label node <node-name> storagenode=glusterfs
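A minimal sequence for the workaround, with <glusterfs-namespace> as a placeholder for whatever project the GlusterFS pods run in:

  oc label node <node-name> storagenode=glusterfs
  oc get nodes --show-labels
  oc get pods -n <glusterfs-namespace> -o wide

Once the label is back, the DaemonSet controller schedules the GlusterFS pod onto the node again without further intervention.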
Interesting observation, thanks!
We are doing the initial CLI labeling in cns-deploy, but if I get you right, there is no mechanism to re-apply the label when a gluster node is brought down and comes back up. So the node-config.yaml change would be a way to make this labeling permanent?
Thanks - Michael