Description of problem:

During a cns-deploy run, a DaemonSet template containing a node selector on 'storagenode=glusterfs' is used to deploy the GlusterFS pods. The label is applied to each node via the Kubernetes API. When a node is rebooted or temporarily shut down, the OpenShift masters delete the node. Upon restart of the node, the label is not re-applied because the node registers from scratch. Hence the GlusterFS pod does not start up automatically, leaving the deployment in a degraded state despite the node being back up and healthy.

Request for enhancement: cns-deploy should modify /etc/origin/node/node-config.yaml to include the storagenode=glusterfs label so that it is present upon node registration.

Version-Release number of selected component (if applicable):

How reproducible:

- Deploy CNS on OpenShift Container Platform
- Observe that all GlusterFS pods are healthy
- Observe that the label is present: oc get nodes --show-labels
- Shut down one of the OpenShift nodes hosting a GlusterFS pod
- Observe the node being removed by the masters: oc get nodes
- Restart the node
- Observe the node rejoin the cluster, but without the label: oc get nodes --show-labels

Actual results:

- The node rejoins the cluster, but without the label: oc get nodes --show-labels
- The GlusterFS pod is missing: oc get pods

Expected results:

- The node rejoins the cluster with the label: oc get nodes --show-labels
- The GlusterFS pod is spawned again: oc get pods

Additional info:

A temporary workaround is to relabel the node(s) with: oc label node <node-name> storagenode=glusterfs
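To make it easier to spot affected nodes when applying the workaround, the following is a minimal sketch of a filter over the output of oc get nodes --show-labels. The function name nodes_missing_label is hypothetical, and the sketch assumes the labels appear as a comma-separated list in the last column of each line.

```shell
#!/bin/sh
# Hypothetical helper: reads `oc get nodes --show-labels --no-headers`
# output on stdin and prints the names of nodes that lack the given label.
# Assumption: the node name is the first column and the comma-separated
# label list is the last column.
nodes_missing_label() {
  awk -v want="$1" '
    {
      n = split($NF, labels, ",")
      found = 0
      for (i = 1; i <= n; i++) {
        if (labels[i] == want) found = 1
      }
      if (!found) print $1
    }'
}

# Example usage (left commented out because it needs a live cluster):
#   oc get nodes --show-labels --no-headers | nodes_missing_label storagenode=glusterfs
```

The output lists every node without the label, including nodes that were never meant to host GlusterFS, so only the storage nodes among them should be relabeled with oc label node <node-name> storagenode=glusterfs.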
Interesting observation, thanks! We are doing the initial CLI labeling in cns-deploy, but if I get you right, there is no mechanism to reapply the label when a gluster pod is brought down and up again. So the node-config.yaml change would be a way to make this labeling permanent? Thanks - Michael
Correct. You may want to check back with the OpenShift/Kubernetes folks to verify this is indeed the best way. It worked just fine in my environment.
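For anyone verifying the change on a node, here is a rough check of whether the label is already present in the node configuration. The function name config_has_label is hypothetical, and the YAML layout shown in the comment (a node-labels entry under kubeletArguments) is an assumption about how the label would be persisted, not something confirmed in this report.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given label string occurs anywhere
# in the given node configuration file. This is a plain text match, not a
# YAML parse.
# Assumed (unconfirmed) layout of the relevant part of node-config.yaml:
#   kubeletArguments:
#     node-labels:
#     - storagenode=glusterfs
config_has_label() {
  # $1: path to the config file, $2: label as key=value
  grep -qF -- "$2" "$1"
}

# Example usage (commented out, path taken from the report):
#   config_has_label /etc/origin/node/node-config.yaml storagenode=glusterfs \
#     && echo "label present" || echo "label missing"
```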
This has been communicated to the CNS program, and there were no objections to moving this out of the CNS 3.6 release. I am changing the flag accordingly.
According to the dependent bz, this has been fixed through bug 1559271. *** This bug has been marked as a duplicate of bug 1559271 ***