Description of problem:
The sdn pods should tolerate all taints, like the other similar pods (sync, ovs, and the logging pods) already do.

Version:
# oc version
oc v3.10.145
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://xxxx-master-etcd-1:8443
openshift v3.10.139
kubernetes v1.10.0+b81c8f8
openshift-ansible-3.10.145-1.git.0.b76c9df.el7.noarch.rpm

How reproducible:
Always

Steps to Reproduce:
1. $node is the compute node where the pods are running.
2. Note the pods running for sync, ovs and sdn:
# oc get pods -n openshift-node -o wide | grep $node ; oc get pods -n openshift-sdn -o wide | grep $node
sync-vkxwk 1/1 Running
ovs-gjkts 1/1 Running
sdn-c7c7f 1/1 Running
3. Note that there are 3 pods running on the node, one each for sync, ovs and sdn.
4. Taint the node:
# oc adm taint node $node NodeWithImpairedVolumes=true:NoExecute
# oc describe node $node | grep -i taint
Taints: NodeWithImpairedVolumes=true:NoExecute
5. Note the pods for sync, ovs and sdn again:
# oc get pods -n openshift-node -o wide | grep $node ; oc get pods -n openshift-sdn -o wide | grep $node
sync-mtjx7 1/1 Running 0 18m 10.240.0.51
ovs-sb8qz 1/1 Running 0 18m 10.240.0.51

Actual results:
The ovs and sync pods tolerate the taint, but the sdn pod does not and is evicted from the node.

Expected results:
The sdn pod should tolerate the taint like the sync and ovs pods.

Additional info:
This also affects the logging pods (https://bugzilla.redhat.com/show_bug.cgi?id=1690200).
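To see which tolerations the sdn pods currently carry, the DaemonSet spec can be inspected directly. This is a sketch that assumes the DaemonSet is named sdn in the openshift-sdn namespace, matching the pod names above:

# oc get daemonset sdn -n openshift-sdn -o jsonpath='{.spec.template.spec.tolerations}'

If the bug is present, the output would not include a blanket toleration matching the NodeWithImpairedVolumes=true:NoExecute taint, while the sync and ovs DaemonSets in openshift-node would.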
PR: https://github.com/openshift/openshift-ansible/pull/11616
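For reference, a toleration that matches every taint (the behaviour the sync and ovs pods already show) is expressed in a DaemonSet pod template roughly as follows. This is a minimal sketch of the general Kubernetes mechanism, not necessarily the exact change made in the linked PR:

spec:
  template:
    spec:
      tolerations:
      - operator: Exists

With operator: Exists and no key or effect specified, the toleration matches all taints, so the DaemonSet pods keep running (and are not evicted) regardless of any NoSchedule or NoExecute taints added to the node.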
The sdn pods are no longer affected by taints, hence moving this to VERIFIED.

# oc version
oc v3.10.149
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO
openshift v3.10.149
kubernetes v1.10.0+b81c8f8
openshift-ansible-3.10.149-1.git.0.eb0262c.el7.noarch.rpm

Steps to verify:
1. $node is the compute node where the pods are running.
2. Note the pods running for sync, ovs and sdn:
# oc get pods -n openshift-node -o wide | grep $node ; oc get pods -n openshift-sdn -o wide | grep $node
3. Note that there are 3 pods running on the node, one each for sync, ovs and sdn.
4. Taint the node:
# oc adm taint node $node NodeWithImpairedVolumes=true:NoExecute
# oc describe node $node | grep -i taint
Taints: NodeWithImpairedVolumes=true:NoExecute
5. Note the pods for sync, ovs and sdn again:
# oc get pods -n openshift-node -o wide | grep $node ; oc get pods -n openshift-sdn -o wide | grep $node
Even after applying the taint, the sdn pod keeps running, unlike before the fix where it was terminated.
6. Check the fluentd pods:
# oc project
Using project "openshift-logging" on server
# oc get pods
7. Recreate the fluentd pods by toggling the label from true to false and back to true:
# oc label node --all --overwrite logging-infra-fluentd=false -- note that all fluentd pods get terminated
# oc label node --all --overwrite logging-infra-fluentd=true -- note that all fluentd pods come back up

Also verified that the logging pods are no longer affected by taints.
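As a cleanup step after verification (not part of the original steps above), the test taint can be removed again with the usual trailing-dash syntax:

# oc adm taint node $node NodeWithImpairedVolumes=true:NoExecute-

Running oc describe node $node | grep -i taint afterwards should then report Taints: <none> for the node.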
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1607