Description of problem:
During the cns-deploy run, a DaemonSet template containing a node selector on 'storagenode=glusterfs' is used to deploy the GlusterFS pods. The label is applied to each node via the Kubernetes API. When a node is rebooted or temporarily shut down, the OpenShift masters delete the node object. Upon restart of the node, the label is not re-applied because the node registers from scratch.
Hence the GlusterFS pod does not start up automatically, leaving the deployment in a degraded state despite the node being back up and healthy.
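For reference, the relevant part of the GlusterFS DaemonSet is a node selector of roughly this shape (a sketch of the standard Kubernetes field, not copied verbatim from the cns-deploy template):

  spec:
    template:
      spec:
        nodeSelector:
          storagenode: glusterfs

Because DaemonSet scheduling is driven purely by this selector, the pod can only be scheduled back onto the node once the label is present on the node object again.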
Request for enhancement:
cns-deploy should modify /etc/origin/node/node-config.yaml to include the storagenode=glusterfs label so it is present upon node registration.
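A sketch of what such a change to /etc/origin/node/node-config.yaml could look like (the exact stanza may differ between OpenShift versions; node-labels is passed through to the kubelet's --node-labels argument):

  kubeletArguments:
    node-labels:
      - "storagenode=glusterfs"

With the label defined in the node configuration, it is re-applied every time the node registers with the masters, so a reboot no longer drops it.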
Version-Release number of selected component (if applicable):
How reproducible:
- Deploy CNS on OpenShift Container Platform
- Observe that all GlusterFS pods are healthy
- Observe the label present: oc get nodes --show-labels
- Shut down one of the OpenShift nodes hosting a GlusterFS pod
- Observe the node being erased by the masters: oc get nodes
- Restart the node
- Observe the node rejoin the cluster, but without the label: oc get nodes --show-labels
Actual results:
- Observe the node rejoin the cluster, but without the label: oc get nodes --show-labels
- Observe the GlusterFS pod missing: oc get pods
Expected results:
- Observe the node rejoin the cluster with the label: oc get nodes --show-labels
- Observe the GlusterFS pod spawned again: oc get pods
Additional info:
A temporary workaround is to relabel the node(s) with: oc label node <node-name> storagenode=glusterfs
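A minimal sequence for the workaround, with <glusterfs-namespace> as a placeholder for whatever project the GlusterFS pods run in:

  oc label node <node-name> storagenode=glusterfs
  oc get nodes --show-labels
  oc get pods -n <glusterfs-namespace> -o wide

Once the label is back, the DaemonSet controller schedules the GlusterFS pod onto the node again without further intervention.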
Interesting observation, thanks!
We are doing the initial CLI labeling in cns-deploy, but if I get you right, there is no mechanism to re-apply the label when a gluster node is brought down and comes back up. So the node-config.yaml change would be a way to make this labeling permanent?
Thanks - Michael