Bug 2004836

Summary:	NoExecute taint is not being applied when nodes become unreachable
Product:	OpenShift Container Platform	Reporter:	Joel Rosental R. <jrosenta>
Component:	kube-controller-manager	Assignee:	Jan Chaloupka <jchaloup>
Status:	CLOSED CURRENTRELEASE	QA Contact:	zhou ying <yinzhou>
Severity:	high	Docs Contact:
Priority:	high
Version:	4.6	CC:	aarrichi, alosadag, aos-bugs, bdeschen, cldavey, ddelcian, eparis, fburatti, fherrman, jchaloup, knarra, lsaldana, mfojtik, ptang, rhowe, riontel, vmedina
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	All
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2021-12-26 14:20:23 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1994111, 2008266
Bug Blocks:

Description Joel Rosental R. 2021-09-16 07:54:54 UTC

Description of problem:
We ran an availability test by network-isolating some worker nodes. As stated in the documentation below, the expectation was that the taint "node.kubernetes.io/unreachable:NoExecute" would be added to the nodes after five minutes and that Pods would be rescheduled on other available nodes.

https://docs.openshift.com/container-platform/4.6/nodes/scheduling/nodes-scheduler-taints-tolerations.html#nodes-scheduler-taints-tolerations-about-taintBasedEvictions_nodes-scheduler-taints-tolerations
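To make the expected behavior concrete, the following is a minimal sketch of NoExecute eviction timing, not the kube-controller-manager implementation. It assumes the upstream Kubernetes semantics: a pod without a matching toleration is evicted immediately, a toleration without `tolerationSeconds` keeps the pod bound indefinitely, and otherwise the pod is evicted `tolerationSeconds` after the taint is added. The 300-second default is what the DefaultTolerationSeconds admission plugin normally injects for the unreachable taint, which is where the five-minute figure comes from. A NoSchedule taint never evicts running pods. The function name `eviction_time` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed default injected by the DefaultTolerationSeconds admission plugin
# when the pod spec does not set its own toleration for the unreachable taint.
DEFAULT_TOLERATION_SECONDS = 300


def eviction_time(
    taint_added: datetime,
    effect: str,
    tolerates: bool,
    toleration_seconds: Optional[int],
) -> Optional[datetime]:
    """Hypothetical helper: when is a running pod evicted for a taint?

    Returns None when the pod is never evicted by this taint.
    """
    if effect != "NoExecute":
        # NoSchedule only blocks new scheduling; running pods stay put.
        return None
    if not tolerates:
        # No matching toleration: evicted as soon as the taint appears.
        return taint_added
    if toleration_seconds is None:
        # Toleration without tolerationSeconds: tolerated forever.
        return None
    # Zero or negative tolerationSeconds is treated as "evict immediately".
    return taint_added + timedelta(seconds=max(0, toleration_seconds))


added = datetime(2021, 9, 16, 8, 0, tzinfo=timezone.utc)
print(eviction_time(added, "NoExecute", True, DEFAULT_TOLERATION_SECONDS))
print(eviction_time(added, "NoSchedule", True, DEFAULT_TOLERATION_SECONDS))
```

The second call mirrors the behavior reported here: with only the NoSchedule taint present, the pods are never evicted.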

However, we noticed that once any node becomes "unreachable", either by powering it off or by stopping the kubelet service, only a taint with the "NoSchedule" effect is added to the node, so the pods that were running on that node are never evicted.

Version-Release number of selected component (if applicable):

4.6.21

How reproducible:
Always, in the customer environment.

Steps to Reproduce:
1. Shut down either the node or the kubelet service.
2. Wait until the node is marked as "NotReady" and "unreachable".


Actual results:
Only the "node.kubernetes.io/unreachable:NoSchedule" taint is added to the node.

Expected results:
The "node.kubernetes.io/unreachable:NoExecute" taint should be added as well.
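Actual and expected results can be compared mechanically with a small sketch like the one below. It assumes the input is the Node object as JSON, for example the output of `oc get node <node-name> -o json`, where taints appear under `spec.taints` with `key` and `effect` fields. The helper names `unreachable_effects` and `has_expected_taints` are hypothetical.

```python
import json

UNREACHABLE_KEY = "node.kubernetes.io/unreachable"


def unreachable_effects(node_json: str) -> set:
    """Hypothetical helper: effects of the unreachable taint(s) on a Node."""
    node = json.loads(node_json)
    taints = node.get("spec", {}).get("taints") or []
    return {t.get("effect") for t in taints if t.get("key") == UNREACHABLE_KEY}


def has_expected_taints(node_json: str) -> bool:
    """True only if both NoSchedule and NoExecute unreachable taints exist."""
    return {"NoSchedule", "NoExecute"} <= unreachable_effects(node_json)


# Shape of the "Actual results" case: only the NoSchedule taint is present.
actual = json.dumps({"spec": {"taints": [
    {"key": UNREACHABLE_KEY, "effect": "NoSchedule"},
]}})
print(sorted(unreachable_effects(actual)), has_expected_taints(actual))
```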

Additional info:

Comment 32 RamaKasturi 2022-11-21 13:05:31 UTC

Setting the qe_test_coverage flag to '+' because verification of this fix is covered by the z-stream regression e2e and upgrade tests.