Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1709422

Summary: sdn daemonset should tolerate taints (3.10)
Product: OpenShift Container Platform Reporter: Siva Reddy <schituku>
Component: Installer Assignee: Joseph Callen <jcallen>
Installer sub component: openshift-ansible QA Contact: Siva Reddy <schituku>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: medium    
Version: 3.10.0   
Target Milestone: ---   
Target Release: 3.10.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-06-27 16:41:12 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1690200    

Description Siva Reddy 2019-05-13 14:57:21 UTC
Description of problem:
  The sdn pods should tolerate all taints, as similar pods such as the sync, ovs, and logging pods do.
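
  For reference, a minimal sketch of how Kubernetes matches a toleration against a taint, to show why a pod without a matching toleration is removed by a NoExecute taint while a pod with a catch-all toleration stays. This is an illustration in Python, not the scheduler's code and not the contents of any fix. The helper names `tolerates` and `pod_tolerates` and the `catch_all` list are hypothetical, and `tolerationSeconds` is ignored.

```python
def tolerates(toleration, taint):
    """Return True if a single toleration matches a single taint."""
    operator = toleration.get("operator", "Equal")  # Kubernetes defaults to Equal
    key = toleration.get("key", "")
    effect = toleration.get("effect", "")

    # An empty effect matches every effect; otherwise it must match exactly.
    if effect and effect != taint["effect"]:
        return False
    # An empty key is only valid with Exists, where it matches every key.
    if key == "":
        return operator == "Exists"
    if key != taint["key"]:
        return False
    if operator == "Exists":
        return True
    # Equal: the values must be identical.
    return toleration.get("value", "") == taint.get("value", "")


def pod_tolerates(tolerations, taint):
    """A pod tolerates a taint if any one of its tolerations matches it."""
    return any(tolerates(t, taint) for t in tolerations)


# The taint used in the reproduction steps of this report.
taint = {"key": "NodeWithImpairedVolumes", "value": "true", "effect": "NoExecute"}

# Hypothetical pod specs: one with a catch-all toleration, one with none.
catch_all = [{"operator": "Exists"}]
no_tolerations = []

print("catch-all tolerates taint:", pod_tolerates(catch_all, taint))
print("no tolerations tolerates taint:", pod_tolerates(no_tolerations, taint))
```

  A toleration with only `operator: Exists` and no key or effect matches every taint, which is the behaviour requested here for the sdn pods.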

version:
# oc version
oc v3.10.145
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://xxxx-master-etcd-1:8443
openshift v3.10.139
kubernetes v1.10.0+b81c8f8

openshift-ansible-3.10.145-1.git.0.b76c9df.el7.noarch.rpm

How reproducible:
Always

Steps to Reproduce:
1. $node is the compute node where the pods are running
2. Note the running sync, ovs, and sdn pods
   # oc get pods -n openshift-node -o wide | grep $node ;  oc get pods -n openshift-sdn -o wide | grep $node ;
    sync-vkxwk   1/1       Running   
    ovs-gjkts   1/1       Running   
    sdn-c7c7f   1/1       Running   
3. Note that 3 pods are running on the node, one each for sync, ovs, and sdn
4. Taint the node
    # oc adm taint node $node NodeWithImpairedVolumes=true:NoExecute
    # oc describe node $node | grep -i taint
      Taints:             NodeWithImpairedVolumes=true:NoExecute
5. Note the sync, ovs, and sdn pods again.
   # oc get pods -n openshift-node -o wide | grep $node ;  oc get pods -n openshift-sdn -o wide | grep $node ;
    sync-mtjx7   1/1       Running   0          18m       10.240.0.51
    ovs-sb8qz   1/1       Running   0          18m       10.240.0.51

Actual Results:
   The ovs and sync pods tolerate the taint, but the sdn pod does not and is no longer running on the node.

Expected results:
   The sdn pod should tolerate the taint like the sync and ovs pods.
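
   The before/after comparison in the steps above can be sketched as a small check over the `oc get pods -o wide` listings. This is only an illustration: the helper names `components_on_node` and `missing_components` are hypothetical, the listings are copied from this report, and the component is assumed to be the pod name with its trailing `-<suffix>` removed.

```python
def components_on_node(listing, components=("sync", "ovs", "sdn")):
    """Map each component to the names of its Running pods in the listing."""
    found = {name: [] for name in components}
    for line in listing.splitlines():
        fields = line.split()
        # Expect at least NAME, READY, STATUS; skip anything else.
        if len(fields) < 3 or fields[2] != "Running":
            continue
        # Assumption: pod names look like "<component>-<suffix>".
        prefix = fields[0].rsplit("-", 1)[0]
        if prefix in found:
            found[prefix].append(fields[0])
    return found


def missing_components(listing, components=("sync", "ovs", "sdn")):
    """Return the components that have no Running pod in the listing."""
    found = components_on_node(listing, components)
    return [name for name in components if not found[name]]


# Listings as shown in the reproduction steps of this report.
before_taint = """\
sync-vkxwk   1/1       Running
ovs-gjkts   1/1       Running
sdn-c7c7f   1/1       Running
"""
after_taint = """\
sync-mtjx7   1/1       Running   0          18m       10.240.0.51
ovs-sb8qz   1/1       Running   0          18m       10.240.0.51
"""

print("missing before taint:", missing_components(before_taint))
print("missing after taint:", missing_components(after_taint))
```

   Against a live cluster, the listings would come from the two `oc get pods ... | grep $node` commands in step 2.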

Additional info:
   Note that this also affects the logging pods (https://bugzilla.redhat.com/show_bug.cgi?id=1690200)

Comment 3 Joseph Callen 2019-05-16 16:30:51 UTC
PR: https://github.com/openshift/openshift-ansible/pull/11616

Comment 5 Siva Reddy 2019-06-17 15:12:09 UTC
   The sdn pods are no longer affected by taints, so moving this bug to VERIFIED.

# oc version
oc v3.10.149
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

openshift v3.10.149
kubernetes v1.10.0+b81c8f8

openshift-ansible-3.10.149-1.git.0.eb0262c.el7.noarch.rpm

Steps to verify:
1. $node is the compute node where the pods are running
2. Note the running sync, ovs, and sdn pods
   # oc get pods -n openshift-node -o wide | grep $node ;  oc get pods -n openshift-sdn -o wide | grep $node ;
3. Note that 3 pods are running on the node, one each for sync, ovs, and sdn
4. Taint the node
    # oc adm taint node $node NodeWithImpairedVolumes=true:NoExecute
    # oc describe node $node | grep -i taint
      Taints:             NodeWithImpairedVolumes=true:NoExecute
5. Note the sync, ovs, and sdn pods again.
   # oc get pods -n openshift-node -o wide | grep $node ;  oc get pods -n openshift-sdn -o wide | grep $node ;

      Even after the taint is applied, the sdn pods are still running; previously they were terminated.

6. Check the fluentd pods
  # oc project
  Using project "openshift-logging" on server
  # oc get pods
7. Recreate the pods by toggling the logging-infra-fluentd label from true to false and then back to true
  # oc label node --all --overwrite logging-infra-fluentd=false
    -- note that all fluentd pods get terminated
  # oc label node --all --overwrite logging-infra-fluentd=true
    -- note that all the fluentd pods come back up

Also verified that the logging pods are no longer affected by taints.
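
When scripting the verification, the `Taints:` line printed by `oc describe node $node | grep -i taint` can be turned into a structured taint. The sketch below is an assumption about that output format (`key=value:effect`, `key:effect`, or `<none>`) and handles only a single taint on the `Taints:` line; `parse_taint` and `taints_from_describe` are hypothetical helper names.

```python
def parse_taint(text):
    """Parse 'key=value:effect' or 'key:effect' into a dict."""
    spec, sep, effect = text.strip().rpartition(":")
    if not sep or not spec or not effect:
        raise ValueError("not a taint: %r" % text)
    # The value is optional; partition leaves it empty when '=' is absent.
    key, _, value = spec.partition("=")
    return {"key": key, "value": value, "effect": effect}


def taints_from_describe(line):
    """Parse a single 'Taints: ...' line from the node description."""
    label, _, rest = line.partition(":")
    if label.strip() != "Taints":
        raise ValueError("not a Taints line: %r" % line)
    rest = rest.strip()
    if rest in ("", "<none>"):
        return []
    return [parse_taint(rest)]


# The line shown in step 4 of this report.
line = "      Taints:             NodeWithImpairedVolumes=true:NoExecute"
print(taints_from_describe(line))
```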

Comment 7 errata-xmlrpc 2019-06-27 16:41:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1607