Bug 1822211

Summary: taint a node cause debug node to fail
Product: OpenShift Container Platform
Reporter: Raif Ahmed <rahmed>
Component: oc
Assignee: Jan Chaloupka <jchaloup>
oc sub component: oc
QA Contact: zhou ying <yinzhou>
Status: CLOSED WONTFIX
Docs Contact:
Severity: low
Priority: low
CC: aos-bugs, asoto, fminafra, jchaloup, jokerman, mfojtik, nnosenzo, pmannidi, rahmed, rsandu, yhe
Version: 4.2.z
Target Milestone: ---
Target Release: 4.5.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-04-17 15:56:56 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Raif Ahmed 2020-04-08 13:56:43 UTC
Description of problem:

Tainting a node causes oc debug node to fail: the debug pod is evicted by the taint controller.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Add NoSchedule and NoExecute taints to the node:
oc patch node invd086.nxdi.nl-htc01.nxp.com --type=merge -p '{"spec":{"taints": [{ "key":"infra", "value":"reserved", "effect":"NoSchedule"},{ "key":"infra", "value":"reserved", "effect":"NoExecute"}]}}'

2. Try to debug the node:

 oc debug node/invd086.nxdi.nl-htc01.nxp.com 

3. The following event is generated: "Generated from taint-controller, Marking for deletion Pod dummy/invd094nxdinl-htc01nxpcom-debug"
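
To confirm the state described in the steps above, the node's taints and the eviction event can be inspected from the command line. This is a minimal sketch: the helper names show_node_taints and find_taint_evictions are hypothetical (they are not part of oc), and the grep pattern is simply the message text quoted in step 3.

```shell
# Hypothetical helper names for illustration; only the oc calls are real.

# Print the taints currently set on a node (empty output means no taints).
show_node_taints() {
  oc get node "$1" -o jsonpath='{.spec.taints}{"\n"}'
}

# List events in a project, oldest first, keeping only the eviction
# message quoted in step 3.
find_taint_evictions() {
  oc get events -n "$1" --sort-by=.lastTimestamp | grep 'Marking for deletion'
}

# Example (requires a logged-in oc session):
#   show_node_taints invd086.nxdi.nl-htc01.nxp.com
#   find_taint_evictions dummy
```

If the second helper prints a line naming the debug pod, the pod was removed because it did not tolerate the NoExecute taint.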

Actual results:

The debug pod is deleted.

Expected results:

The debug pod runs and carries a toleration that matches any node taint.

Additional info:

I was able to work around this issue by adding a default-tolerations annotation to the namespace; the toleration was then applied to the debug pod created in that namespace.

oc patch namespace dummy --type=merge -p '{"metadata": {"annotations": { "scheduler.alpha.kubernetes.io/defaultTolerations": "[{\"operator\": \"Exists\"}]"}}}'
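
One way to check that the workaround took effect is to read the annotation back and then look at the tolerations on pods in the namespace. This is a sketch only: the helper names are hypothetical, and it assumes the cluster honors this annotation (to my understanding it is read by the PodTolerationRestriction admission plugin, which may not be enabled everywhere).

```shell
# Hypothetical helper names for illustration; only the oc calls are real.

# Print the default tolerations annotation stored on a namespace.
show_default_tolerations() {
  oc get namespace "$1" \
    -o jsonpath='{.metadata.annotations.scheduler\.alpha\.kubernetes\.io/defaultTolerations}{"\n"}'
}

# Print each pod in the namespace with the tolerations it actually received.
show_pod_tolerations() {
  oc get pods -n "$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.tolerations}{"\n"}{end}'
}

# Example (requires a logged-in oc session):
#   show_default_tolerations dummy
#   show_pod_tolerations dummy
```

A debug pod created in the namespace after the patch should list a toleration with operator Exists; pods created before the patch are not updated.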

Comment 1 Harshal Patil 2020-04-17 15:56:56 UTC
It's working as expected, although this could be added to the documentation or a blog post as a simple recipe for debugging tainted nodes.

Comment 2 Raif Ahmed 2020-04-17 16:34:45 UTC
OK, I will write a blog post covering taints in more detail.

Comment 3 yhe 2021-03-03 08:42:50 UTC
You may also need to use the --to-namespace option of the oc debug node command so that the debug pod is created in the dummy namespace. I have updated the corresponding KCS as well.

$ oc debug node/worker-0.example.redhat.com --to-namespace dummy
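
Putting the pieces from this report together, the recipe could be wrapped in a small shell function. This is a sketch under assumptions: debug_tainted_node is a hypothetical name, the namespace is assumed to exist already, and the blanket Exists toleration is assumed to be acceptable for every pod created in that namespace.

```shell
# Hypothetical wrapper: annotate the namespace with a tolerate-everything
# default, then start the debug pod in that namespace.
# Usage: debug_tainted_node <node-name> <namespace>
debug_tainted_node() {
  node="$1"
  ns="$2"

  # Same patch as in the original workaround; stop if it fails.
  oc patch namespace "$ns" --type=merge \
    -p '{"metadata": {"annotations": {"scheduler.alpha.kubernetes.io/defaultTolerations": "[{\"operator\": \"Exists\"}]"}}}' \
    || return 1

  # Create the debug pod in the annotated namespace (see comment 3).
  oc debug "node/$node" --to-namespace "$ns"
}

# Example (requires a logged-in oc session):
#   debug_tainted_node worker-0.example.redhat.com dummy
```

Because the annotation applies to every new pod in the namespace, a dedicated namespace used only for debugging keeps the blanket toleration away from regular workloads.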