2004836 – NoExecute taint is not being applied when nodes become unreachable

Summary: NoExecute taint is not being applied when nodes become unreachable

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	kube-controller-manager
Sub Component:
Version:	4.6
Hardware:	Unspecified
OS:	All
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Jan Chaloupka
QA Contact:	zhou ying
Docs Contact:
URL:
Whiteboard:
Depends On:	1994111 2008266
Blocks:

Reported:	2021-09-16 07:54 UTC by Joel Rosental R.
Modified:	2024-12-20 21:04 UTC
CC List:	17 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-12-26 14:20:23 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift kubernetes pull 962	0	None	Merged	[release-4.6] Bug 2008266: Rebase 1.19.14	2021-10-13 08:54:48 UTC
Red Hat Knowledge Base (Solution)	6369741	0	None	None	None	2021-09-28 19:58:33 UTC

Description Joel Rosental R. 2021-09-16 07:54:54 UTC

Description of problem:
During an availability test, some worker nodes were network-isolated. According to the documentation below, the taint "node.kubernetes.io/unreachable:NoExecute" was expected to be added to the nodes after five minutes, so that Pods could be rescheduled onto other available nodes.

https://docs.openshift.com/container-platform/4.6/nodes/scheduling/nodes-scheduler-taints-tolerations.html#nodes-scheduler-taints-tolerations-about-taintBasedEvictions_nodes-scheduler-taints-tolerations

However, once a node becomes "unreachable", either because it is powered off or because the kubelet service is stopped, only a taint with the "NoSchedule" effect is added to the node, so the pods running on that node are never evicted.

Version-Release number of selected component (if applicable):

4.6.21

How reproducible:
Always, in the customer environment.

Steps to Reproduce:
1. Either shut down the node or stop the kubelet service.
2. Wait until the node is marked as "NotReady" and "unreachable".


Actual results:
Only the "node.kubernetes.io/unreachable:NoSchedule" taint is added to the node.

Expected results:
The "node.kubernetes.io/unreachable:NoExecute" taint should be added as well.
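
The gap between the actual and expected results can be checked mechanically. The following is a minimal sketch, not part of the original report: it assumes the node is available as a parsed Kubernetes Node object (for example, the JSON printed by `oc get node <node-name> -o json`) and inspects `spec.taints`. The function name and the sample node name "worker-0" are hypothetical.

```python
# Taint key and effects taken from the "Actual results" and "Expected results" above.
UNREACHABLE_KEY = "node.kubernetes.io/unreachable"
EXPECTED_EFFECTS = ("NoSchedule", "NoExecute")


def missing_unreachable_effects(node):
    """Return the expected unreachable-taint effects that are absent from a Node object."""
    # spec.taints may be missing entirely on an untainted node.
    taints = (node.get("spec") or {}).get("taints") or []
    present = {t.get("effect") for t in taints if t.get("key") == UNREACHABLE_KEY}
    return [effect for effect in EXPECTED_EFFECTS if effect not in present]


# Hypothetical node mirroring the reported state: only the NoSchedule taint is set.
reported_node = {
    "metadata": {"name": "worker-0"},
    "spec": {"taints": [{"key": UNREACHABLE_KEY, "effect": "NoSchedule"}]},
}

print(missing_unreachable_effects(reported_node))  # ['NoExecute']
```

An empty list would indicate that both taints are present, which is the expected result described above. To check a live node, the same function could be applied to the object obtained by parsing the `oc get node` JSON output with `json.load`.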

Additional info:

Comment 32 RamaKasturi 2022-11-21 13:05:31 UTC

Setting the qe_test_coverage flag to '+' because verification of this fix is covered by the z-stream regression e2e and upgrade tests.