Bug 1459745

Summary:	When DefaultTolerationSeconds is enabled, daemonset pods shouldn't have tolerationSeconds
Product:	OpenShift Container Platform	Reporter:	DeShuai Ma <dma>
Component:	Node	Assignee:	Ryan Phillips <rphillips>
Status:	CLOSED CURRENTRELEASE	QA Contact:	Xiaoli Tian <xtian>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	3.6.0	CC:	aos-bugs, decarr, jokerman, mmccomas
Target Milestone:	---
Target Release:	3.0.2
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:	Previously, when the DefaultTolerationSeconds admission plugin was enabled, DaemonSets were created with default NoExecute tolerations with a tolerationSeconds value of 300, which would cause their pods to be evicted in case of node problems. This fix ensures that DaemonSets are created with no tolerationSeconds (that is, they tolerate the taints indefinitely), so their pods are not evicted in case of node problems.	Story Points:	---
Clone Of:		Environment:
Last Closed:	2019-11-21 18:38:34 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description DeShuai Ma 2017-06-08 05:02:04 UTC

Description of problem:
When DefaultTolerationSeconds is enabled, DaemonSet pods should be created with NoExecute tolerations for node.alpha.kubernetes.io/unreachable and node.alpha.kubernetes.io/notReady with no tolerationSeconds. This ensures that DaemonSet pods are never evicted due to these problems, which matches the behavior when this feature is disabled.
But currently these tolerations are created with tolerationSeconds set (300s).
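
The expected behavior described above can be modelled with a small Python sketch. It is not the plugin's actual code (the plugin is part of the Kubernetes/origin Go code base); the function names and the dict-based toleration shape are hypothetical, chosen only to mirror the pod JSON shown in this report. The assumption is that the admission step adds a default toleration only when the pod does not already tolerate the taint, so tolerations without tolerationSeconds that are already present on a DaemonSet pod are left untouched.

```python
from typing import Dict, List

# Taint keys as they appear in this report (alpha names from Kubernetes 1.6).
NOT_READY = "node.alpha.kubernetes.io/notReady"
UNREACHABLE = "node.alpha.kubernetes.io/unreachable"

Toleration = Dict[str, object]


def tolerates(tolerations: List[Toleration], key: str) -> bool:
    """Return True if `key` is already tolerated for NoExecute (or for any effect)."""
    for t in tolerations:
        if t.get("key") == key and t.get("effect") in ("NoExecute", "", None):
            return True
    return False


def apply_default_toleration_seconds(
    tolerations: List[Toleration], default_seconds: int = 300
) -> List[Toleration]:
    """Hypothetical model of the admission step: add defaults only when absent."""
    result = [dict(t) for t in tolerations]  # copy so the input is not mutated
    for key in (NOT_READY, UNREACHABLE):
        if not tolerates(result, key):
            result.append({
                "key": key,
                "operator": "Exists",
                "effect": "NoExecute",
                "tolerationSeconds": default_seconds,
            })
    return result


def daemonset_tolerations() -> List[Toleration]:
    """Tolerations this report expects on DaemonSet pods: no tolerationSeconds."""
    return [
        {"key": key, "operator": "Exists", "effect": "NoExecute"}
        for key in (NOT_READY, UNREACHABLE)
    ]
```

Under this model, a pod that arrives with the DaemonSet tolerations keeps them unchanged, while a pod with no tolerations receives both defaults with tolerationSeconds of 300.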

Version-Release number of selected component (if applicable):
openshift v3.6.96
kubernetes v1.6.1+5115d708d7
etcd 3.1.0


How reproducible:
Always

Steps to Reproduce:
1.Enable DefaultTolerationSeconds
admissionConfig:
  pluginConfig:
    DefaultTolerationSeconds:
      configuration:
        kind: DefaultAdmissionConfig
        apiVersion: v1
        disable: false
2.Create a daemonset
[root@qe-dma36-master-1 ~]# oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/daemon/daemonset.yaml -n dma
daemonset "hello-daemonset" created
[root@qe-dma36-master-1 ~]# oc get ds -n dma
NAME              DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE-SELECTOR   AGE
hello-daemonset   2         2         0         2            0           <none>          4s

3.Check the daemonset pods' tolerations
[root@qe-dma36-master-1 ~]# oc describe po hello-daemonset-3scfh -n dma | grep -i NoExecute
Tolerations:	node.alpha.kubernetes.io/notReady=:Exists:NoExecute for 300s
		node.alpha.kubernetes.io/unreachable=:Exists:NoExecute for 300s

Actual results:
3. "tolerationSeconds": 300

Expected results:
3. no "tolerationSeconds"

Additional info:
In upstream:
[root@dhcp-140-98 ~]# kubectl describe po hello-daemonset-mz1z | grep NoExecute
Tolerations:    node.alpha.kubernetes.io/notReady=:Exists:NoExecute
        node.alpha.kubernetes.io/unreachable=:Exists:NoExecute

Comment 1 Avesh Agarwal 2017-06-14 19:37:04 UTC

Sent pr to origin: https://github.com/openshift/origin/pull/14653

Comment 2 DeShuai Ma 2017-06-21 05:14:06 UTC

Tested on openshift v3.6.121; this bug is fixed.

# oc describe po hello-daemonset-x5q5g -n dma | grep -i NoExecute
Tolerations:	node.alpha.kubernetes.io/notReady=:Exists:NoExecute
		node.alpha.kubernetes.io/unreachable=:Exists:NoExecute

        "tolerations": [
            {
                "effect": "NoExecute",
                "key": "node.alpha.kubernetes.io/notReady",
                "operator": "Exists"
            },
            {
                "effect": "NoExecute",
                "key": "node.alpha.kubernetes.io/unreachable",
                "operator": "Exists"
            }
        ],
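
The same check could be scripted against the pod JSON instead of grepping the describe output. The following is a minimal sketch, assuming the pod document has the usual spec.tolerations list; the helper name and the two sample documents are hypothetical and only modelled on the outputs in this report (the "before" sample uses the tolerationSeconds value of 300 from the actual results).

```python
import json
from typing import List


def finite_noexecute_tolerations(pod_json: str) -> List[str]:
    """Return the keys of NoExecute tolerations that carry tolerationSeconds."""
    pod = json.loads(pod_json)
    tolerations = pod.get("spec", {}).get("tolerations") or []
    return [
        t.get("key", "")
        for t in tolerations
        if t.get("effect") == "NoExecute" and "tolerationSeconds" in t
    ]


KEYS = (
    "node.alpha.kubernetes.io/notReady",
    "node.alpha.kubernetes.io/unreachable",
)

# Sample pod documents, trimmed to spec.tolerations (illustrative only).
BEFORE_FIX = json.dumps({"spec": {"tolerations": [
    {"effect": "NoExecute", "key": k, "operator": "Exists", "tolerationSeconds": 300}
    for k in KEYS
]}})
AFTER_FIX = json.dumps({"spec": {"tolerations": [
    {"effect": "NoExecute", "key": k, "operator": "Exists"}
    for k in KEYS
]}})

if __name__ == "__main__":
    # A non-empty list means the pod can still be evicted after a timeout.
    print(finite_noexecute_tolerations(BEFORE_FIX))
    print(finite_noexecute_tolerations(AFTER_FIX))
```

In practice the input could come from something like `oc get po <pod-name> -n <namespace> -o json`; an empty result corresponds to the fixed behavior.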

Comment 3 DeShuai Ma 2017-06-21 05:15:22 UTC

Could you help move the bug to ON_QA status? I'll verify it.

Comment 4 Avesh Agarwal 2018-02-09 17:59:07 UTC

Not sure what the target release should be here. Is just closing it enough?