Description of problem:
The sdn pods should tolerate all taints, like the other similar pods (sync, ovs, and the logging pods) already do.

Version:
# oc version
oc v3.10.145
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://xxxx-master-etcd-1:8443
openshift v3.10.139
kubernetes v1.10.0+b81c8f8
openshift-ansible-3.10.145-1.git.0.b76c9df.el7.noarch.rpm

How reproducible:
Always

Steps to Reproduce:
1. $node is the compute node where the pods are running.
2. Note the pods running for sync, ovs and sdn:
# oc get pods -n openshift-node -o wide | grep $node ; oc get pods -n openshift-sdn -o wide | grep $node
sync-vkxwk 1/1 Running
ovs-gjkts 1/1 Running
sdn-c7c7f 1/1 Running
3. Note that there are 3 pods running on the node, one each for sync, ovs and sdn.
4. Taint the node:
# oc adm taint node $node NodeWithImpairedVolumes=true:NoExecute
# oc describe node $node | grep -i taint
Taints: NodeWithImpairedVolumes=true:NoExecute
5. Note the pods for sync, ovs and sdn again:
# oc get pods -n openshift-node -o wide | grep $node ; oc get pods -n openshift-sdn -o wide | grep $node
sync-mtjx7 1/1 Running 0 18m 10.240.0.51
ovs-sb8qz 1/1 Running 0 18m 10.240.0.51

Actual results:
The ovs and sync pods tolerate the taint, but the sdn pod does not and is evicted from the node.

Expected results:
The sdn pod should tolerate the taint like the sync and ovs pods.

Additional info:
This also affects the logging pods (https://bugzilla.redhat.com/show_bug.cgi?id=1690200).
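To see which tolerations the sdn pods currently carry, the DaemonSet spec can be inspected directly. This is a sketch that assumes the DaemonSet is named sdn in the openshift-sdn namespace, matching the pod names above:

# oc get daemonset sdn -n openshift-sdn -o jsonpath='{.spec.template.spec.tolerations}'

If the bug is present, the output would not include a blanket toleration matching the NodeWithImpairedVolumes=true:NoExecute taint, while the sync and ovs DaemonSets in openshift-node would.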
PR: https://github.com/openshift/openshift-ansible/pull/11616
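For reference, a toleration that matches every taint (the behaviour the sync and ovs pods already show) is expressed in a DaemonSet pod template roughly as follows. This is a minimal sketch of the general Kubernetes mechanism, not necessarily the exact change made in the linked PR:

spec:
  template:
    spec:
      tolerations:
      - operator: Exists

With operator: Exists and no key or effect specified, the toleration matches all taints, so the DaemonSet pods keep running (and are not evicted) regardless of any NoSchedule or NoExecute taints added to the node.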
The sdn pods are no longer affected by taints, hence moving this to VERIFIED.

# oc version
oc v3.10.149
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO
openshift v3.10.149
kubernetes v1.10.0+b81c8f8
openshift-ansible-3.10.149-1.git.0.eb0262c.el7.noarch.rpm

Steps to verify:
1. $node is the compute node where the pods are running.
2. Note the pods running for sync, ovs and sdn:
# oc get pods -n openshift-node -o wide | grep $node ; oc get pods -n openshift-sdn -o wide | grep $node
3. Note that there are 3 pods running on the node, one each for sync, ovs and sdn.
4. Taint the node:
# oc adm taint node $node NodeWithImpairedVolumes=true:NoExecute
# oc describe node $node | grep -i taint
Taints: NodeWithImpairedVolumes=true:NoExecute
5. Note the pods for sync, ovs and sdn again:
# oc get pods -n openshift-node -o wide | grep $node ; oc get pods -n openshift-sdn -o wide | grep $node
Even after applying the taint, the sdn pod keeps running, unlike before the fix where it was terminated.
6. Check the fluentd pods:
# oc project
Using project "openshift-logging" on server
# oc get pods
7. Recreate the fluentd pods by toggling the label from true to false and back to true:
# oc label node --all --overwrite logging-infra-fluentd=false -- note that all fluentd pods get terminated
# oc label node --all --overwrite logging-infra-fluentd=true -- note that all fluentd pods come back up

Also verified that the logging pods are no longer affected by taints.
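As a cleanup step after verification (not part of the original steps above), the test taint can be removed again with the usual trailing-dash syntax:

# oc adm taint node $node NodeWithImpairedVolumes=true:NoExecute-

Running oc describe node $node | grep -i taint afterwards should then report Taints: <none> for the node.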
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1607