Bug 2004836

Summary: NoExecute taint is not being applied when nodes become unreachable
Product: OpenShift Container Platform
Reporter: Joel Rosental R. <jrosenta>
Component: kube-controller-manager
Assignee: Jan Chaloupka <jchaloup>
Status: CLOSED CURRENTRELEASE
QA Contact: zhou ying <yinzhou>
Severity: high
Docs Contact:
Priority: high
Version: 4.6
CC: aarrichi, alosadag, aos-bugs, bdeschen, cldavey, ddelcian, eparis, fburatti, fherrman, jchaloup, knarra, lsaldana, mfojtik, ptang, rhowe, riontel, vmedina
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: All   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-12-26 14:20:23 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
Bug Depends On: 1994111, 2008266    
Bug Blocks:    

Description Joel Rosental R. 2021-09-16 07:54:54 UTC
Description of problem:
An availability test was performed by isolating some worker nodes from the network. As stated in the documentation below, the expectation was that the taint "node.kubernetes.io/unreachable:NoExecute" would be added to the nodes after five minutes and that pods would be rescheduled onto other available nodes.

https://docs.openshift.com/container-platform/4.6/nodes/scheduling/nodes-scheduler-taints-tolerations.html#nodes-scheduler-taints-tolerations-about-taintBasedEvictions_nodes-scheduler-taints-tolerations

However, once a node becomes "unreachable", either because it is powered off or because the kubelet service is stopped, only a taint with the "NoSchedule" effect is added to the node, so the pods that were running on it are never evicted.

Version-Release number of selected component (if applicable):

4.6.21

How reproducible:
Always, in the customer environment.

Steps to Reproduce:
1. Either shut down the node or stop the kubelet service.
2. Wait until the node is marked as "NotReady" and "unreachable".


Actual results:
Only the "node.kubernetes.io/unreachable:NoSchedule" taint is added to the node.

Expected results:
The "node.kubernetes.io/unreachable:NoExecute" taint should be added as well.
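
The difference between the actual and expected results can be checked mechanically from a node object exported as JSON. The following is a minimal sketch, not part of the original report: the helper names and the sample node are hypothetical, and it only assumes the standard Node layout in which taints are listed under spec.taints with "key" and "effect" fields.

```python
import json

# Taint key discussed in this report.
UNREACHABLE_KEY = "node.kubernetes.io/unreachable"


def unreachable_taint_effects(node: dict) -> set:
    """Return the effects carried by the unreachable taint on a node object."""
    # spec.taints may be absent or null on an untainted node.
    taints = node.get("spec", {}).get("taints") or []
    return {t.get("effect") for t in taints if t.get("key") == UNREACHABLE_KEY}


def missing_noexecute(node: dict) -> bool:
    """True when the node shows the symptom described in this bug:
    the NoSchedule unreachable taint is present but NoExecute is not."""
    effects = unreachable_taint_effects(node)
    return "NoSchedule" in effects and "NoExecute" not in effects


# Hypothetical sample mimicking the "Actual results" above.
sample_node = json.loads("""
{
  "metadata": {"name": "worker-0"},
  "spec": {
    "taints": [
      {"key": "node.kubernetes.io/unreachable", "effect": "NoSchedule"}
    ]
  }
}
""")

print(missing_noexecute(sample_node))  # prints True
```

In practice the input would be the JSON form of the affected node rather than the hard-coded sample; a healthy result for an unreachable node would list both NoSchedule and NoExecute.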

Additional info:

Comment 32 RamaKasturi 2022-11-21 13:05:31 UTC
Setting the qe_test_coverage flag to '+' because verification of this fix is covered by the z-stream regression e2e and upgrade tests.