Bug 2004836 - NoExecute taint is not being applied when nodes become unreachable
Summary: NoExecute taint is not being applied when nodes become unreachable
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-controller-manager
Version: 4.6
Hardware: Unspecified
OS: All
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Jan Chaloupka
QA Contact: zhou ying
URL:
Whiteboard:
Depends On: 1994111 2008266
Blocks:
Reported: 2021-09-16 07:54 UTC by Joel Rosental R.
Modified: 2022-11-21 13:05 UTC (History)
CC List: 17 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-12-26 14:20:23 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift kubernetes pull 962 | 0 | None | Merged | [release-4.6] Bug 2008266: Rebase 1.19.14 | 2021-10-13 08:54:48 UTC
Red Hat Knowledge Base (Solution) | 6369741 | 0 | None | None | None | 2021-09-28 19:58:33 UTC

Description Joel Rosental R. 2021-09-16 07:54:54 UTC
Description of problem:
During an availability test, some worker nodes were network-isolated. According to the documentation below, the taint "node.kubernetes.io/unreachable:NoExecute" is expected to be added to the nodes after five minutes so that Pods can be rescheduled onto other available nodes.

https://docs.openshift.com/container-platform/4.6/nodes/scheduling/nodes-scheduler-taints-tolerations.html#nodes-scheduler-taints-tolerations-about-taintBasedEvictions_nodes-scheduler-taints-tolerations

However, once a node becomes "unreachable", either because it is powered off or because the kubelet service is stopped, only a taint with the "NoSchedule" effect is added to the node, so the pods that were running on this node are never evicted.
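
To illustrate why a NoSchedule-only taint leaves the running pods in place, here is a minimal Python sketch of the taint/toleration rules. It is a simplified model and not the kube-controller-manager code: the function names `tolerates` and `eviction_delay` are hypothetical, and the 300-second toleration assumes the default toleration that Kubernetes adds to pods for the unreachable taint.

```python
from typing import Optional

UNREACHABLE = "node.kubernetes.io/unreachable"


def tolerates(toleration: dict, taint: dict) -> bool:
    """Return True if the pod toleration matches the node taint."""
    # A toleration with no effect matches taints of every effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    operator = toleration.get("operator", "Equal")
    key = toleration.get("key")
    if not key:
        # An empty key is only meaningful with Exists and matches every key.
        return operator == "Exists"
    if key != taint["key"]:
        return False
    if operator == "Exists":
        return True
    return toleration.get("value", "") == taint.get("value", "")


def eviction_delay(taints: list, tolerations: list) -> Optional[int]:
    """Seconds until a running pod is evicted, or None if it never is."""
    delays = []
    for taint in taints:
        if taint["effect"] != "NoExecute":
            continue  # NoSchedule only keeps new pods off the node
        match = next((t for t in tolerations if tolerates(t, taint)), None)
        if match is None:
            return 0  # untolerated NoExecute taint: evicted right away
        if "tolerationSeconds" in match:
            delays.append(match["tolerationSeconds"])
    return min(delays) if delays else None


# Assumed default toleration on the pod (300 seconds = five minutes).
default_tolerations = [
    {"key": UNREACHABLE, "operator": "Exists",
     "effect": "NoExecute", "tolerationSeconds": 300},
]
observed = [{"key": UNREACHABLE, "effect": "NoSchedule"}]
expected = observed + [{"key": UNREACHABLE, "effect": "NoExecute"}]

print(eviction_delay(observed, default_tolerations))  # None: never evicted
print(eviction_delay(expected, default_tolerations))  # 300
```

With only the NoSchedule taint present there is nothing for the pod's toleration timer to count against, which matches the behaviour observed here.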

Version-Release number of selected component (if applicable):

4.6.21

How reproducible:
Always, in the customer environment.

Steps to Reproduce:
1. Shut down either the node or the kubelet service.
2. Wait until the node is marked as "NotReady" and "unreachable".


Actual results:
Only the "node.kubernetes.io/unreachable:NoSchedule" taint is added to the node.

Expected results:
The "node.kubernetes.io/unreachable:NoExecute" taint should be added as well.

Additional info:
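
The difference between the actual and expected results can be checked from the node object. The following is a minimal sketch, assuming the JSON printed by `oc get node <node-name> -o json`; the helper names and the sample node (`worker-0`) are hypothetical and only mirror the state described in this report.

```python
import json

UNREACHABLE_KEY = "node.kubernetes.io/unreachable"


def unreachable_effects(node: dict) -> set:
    """Collect the effects of all unreachable taints on a node object."""
    taints = node.get("spec", {}).get("taints") or []
    return {t.get("effect") for t in taints if t.get("key") == UNREACHABLE_KEY}


def missing_noexecute(node: dict) -> bool:
    """True if the node shows the symptom from this report:
    unreachable:NoSchedule is present but unreachable:NoExecute is not."""
    effects = unreachable_effects(node)
    return "NoSchedule" in effects and "NoExecute" not in effects


# Hypothetical sample shaped like the output of `oc get node <node-name> -o json`.
sample = json.loads("""
{
  "metadata": {"name": "worker-0"},
  "spec": {
    "taints": [
      {"key": "node.kubernetes.io/unreachable", "effect": "NoSchedule"}
    ]
  }
}
""")

print(sorted(unreachable_effects(sample)))  # ['NoSchedule']
print(missing_noexecute(sample))            # True
```

On a healthy cluster the same check should return False for an unreachable node, because both effects are expected to be present.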

Comment 32 RamaKasturi 2022-11-21 13:05:31 UTC
Setting the qe_test_coverage flag to '+' because verification of this bug is covered by the z-stream regression e2e and upgrade tests.

