Description Joel Rosental R.
2021-09-16 07:54:54 UTC
Description of problem:
An availability test was performed by isolating some worker nodes from the network. As stated in the documentation below, the expectation is that the taint "node.kubernetes.io/unreachable:NoExecute" would be added to the nodes after five minutes and that Pods could then be rescheduled onto other available nodes.
https://docs.openshift.com/container-platform/4.6/nodes/scheduling/nodes-scheduler-taints-tolerations.html#nodes-scheduler-taints-tolerations-about-taintBasedEvictions_nodes-scheduler-taints-tolerations
However, it was noticed that once any node becomes "unreachable", either by turning it off or by shutting down the kubelet service, only a taint with the "NoSchedule" effect is added to the node, so the pods that were running on this node are never evicted.
Version-Release number of selected component (if applicable):
4.6.21
How reproducible:
Always, in the customer environment.
Steps to Reproduce:
1. Either shut down the node or stop the kubelet service.
2. Wait until the node is marked as "NotReady" and "unreachable".
Actual results:
Only the "node.kubernetes.io/unreachable:NoSchedule" taint is added to the node.
Expected results:
The "node.kubernetes.io/unreachable:NoExecute" taint should be added as well.
Additional info:
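As a rough aid for confirming the difference between the actual and expected results, the following Python sketch lists which effects of the "node.kubernetes.io/unreachable" taint are present on a node object. It assumes the node has been exported as JSON and that the taints appear under `spec.taints`. The helper name `unreachable_taint_effects` and the sample node `worker-0` are hypothetical and are not taken from the affected cluster.

```python
import json

# Taint key discussed in this report.
UNREACHABLE_KEY = "node.kubernetes.io/unreachable"


def unreachable_taint_effects(node: dict) -> set:
    """Return the set of effects carried by the unreachable taint on a node object."""
    # spec.taints may be absent or null on an untainted node.
    taints = node.get("spec", {}).get("taints") or []
    return {t.get("effect") for t in taints if t.get("key") == UNREACHABLE_KEY}


# Hypothetical sample that mirrors the state described under "Actual results":
# only the NoSchedule effect is present.
sample_node = json.loads("""
{
  "metadata": {"name": "worker-0"},
  "spec": {
    "taints": [
      {"key": "node.kubernetes.io/unreachable", "effect": "NoSchedule"}
    ]
  }
}
""")

effects = unreachable_taint_effects(sample_node)
missing_no_execute = "NoExecute" not in effects
print(sorted(effects), "NoExecute missing:", missing_no_execute)
```

The node JSON could be obtained with, for example, `oc get node <node-name> -o json` and loaded in place of the sample above. With the expected behavior, the returned set would contain both "NoSchedule" and "NoExecute".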