Bug 1690200

Summary: logging fluentd daemonset should tolerate all taints (3.10)
Product: OpenShift Container Platform
Reporter: Siva Reddy <schituku>
Component: Installer
Assignee: Vadim Rutkovsky <vrutkovs>
Status: CLOSED ERRATA
QA Contact: Siva Reddy <schituku>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 3.10.0
CC: aos-bugs, ewolinet, gpei, jdesousa, jialiu, jokerman, jupierce, mmccomas, rmeggins, schituku, shiywang, smossber, vrutkovs, wmeng
Target Milestone: ---
Target Release: 3.10.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: 1685952
Environment:
Last Closed: 2019-06-27 16:41:12 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1635462, 1685952, 1685970, 1709422
Bug Blocks:
Attachments (description / flags):
  The fluentd daemonset file used to create the ds (flags: none)
  The yaml of the pod that is stuck (flags: none)

Comment 1 Siva Reddy 2019-03-19 17:39:18 UTC
Created attachment 1545757 [details]
The fluentd daemonset file used to create the ds

# oc project
Using project "openshift-logging" on server 
# oc get ds
NAME              DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR                AGE
logging-fluentd   3         3         3         3            3           logging-infra-fluentd=true   2h
# oc get ds logging-fluentd > logging-fluentd.yaml

Comment 12 Siva Reddy 2019-04-22 14:36:47 UTC
Created attachment 1557175 [details]
The yaml of the pod that is stuck

Comment 16 Vadim Rutkovsky 2019-04-25 08:32:41 UTC
3.10 PR - https://github.com/openshift/openshift-ansible/pull/11552

Comment 17 Vadim Rutkovsky 2019-05-02 09:38:20 UTC
Fix is available in openshift-ansible-3.10.143-1
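For reference, a DaemonSet tolerates every taint when its pod template carries a key-less toleration with `operator: Exists`. The manifest below is a trimmed, hypothetical sketch illustrating that pattern, not the exact change from the linked PR; the image name is a placeholder:

```yaml
# Sketch of a catch-all toleration on the fluentd DaemonSet pod template.
# A toleration with no "key", no "effect", and operator "Exists" matches
# every taint, including NoSchedule and NoExecute effects.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: logging-fluentd
  namespace: openshift-logging
spec:
  selector:
    matchLabels:
      component: fluentd
  template:
    metadata:
      labels:
        component: fluentd
    spec:
      tolerations:
      - operator: Exists          # tolerate all taints
      containers:
      - name: fluentd
        image: registry.example.com/logging-fluentd:latest  # placeholder image
```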

Comment 21 Siva Reddy 2019-05-13 15:00:48 UTC
The taint does not affect fluentd pods that are already running, but new fluentd pods are not coming up because the taints also affect the sdn pods. Marking this bug as verified; however, for the fluentd pods to come up, the sdn pods also need to tolerate the taints, which is tracked in a separate bug.

version:
# oc version
oc v3.10.145
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://xxxx-master-etcd-1:8443
openshift v3.10.139
kubernetes v1.10.0+b81c8f8

openshift-ansible-3.10.145-1.git.0.b76c9df.el7.noarch.rpm

Steps to verify the bug:

1. Taint the node:
   # oc adm taint node $node NodeWithImpairedVolumes=true:NoExecute
   # oc describe node $node | grep -i taint
2. Check the fluentd pods:
   # oc project
   Using project "openshift-logging" on server 
   # oc get pods
   -- Note that there is one fluentd pod running per node; the taint does not affect the existing pods.
3. Recreate the pods by toggling the label from true to false and back to true:
   # oc label node --all --overwrite logging-infra-fluentd=false
   -- Note that all fluentd pods are terminated.
   # oc label node --all --overwrite logging-infra-fluentd=true
   -- Note that the fluentd pods come back up on all nodes except the tainted node, where the pod is stuck in ContainerCreating.
   # oc get pods -o wide | grep $node
logging-fluentd-52dwf                      0/1       ContainerCreating   0          14m       <none>

    As mentioned in comment 21, even though the fluentd pod on the tainted node is not yet running, it is verified that the taints do not prevent the fluentd pods from being created (the remaining blocker is the sdn pods, tracked in a separate bug); hence the bug is marked verified.
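A quick way to confirm the fix on an exported manifest is to look for a key-less `operator: Exists` toleration. The snippet below is a sketch that checks a local file; the embedded manifest is a trimmed, hypothetical example, and on a live cluster the file would instead come from `oc get ds logging-fluentd -o yaml`:

```shell
# Write a trimmed, hypothetical DaemonSet manifest (illustration only; on a
# real cluster: oc get ds logging-fluentd -o yaml > /tmp/logging-fluentd.yaml).
cat > /tmp/logging-fluentd.yaml <<'EOF'
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: logging-fluentd
spec:
  template:
    spec:
      tolerations:
      - operator: Exists
EOF

# A key-less toleration with operator Exists matches every taint, so the
# DaemonSet pods tolerate all taints.
if grep -q 'operator: Exists' /tmp/logging-fluentd.yaml; then
  echo "logging-fluentd tolerates all taints"
else
  echo "no catch-all toleration found"
fi
```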

Comment 23 errata-xmlrpc 2019-06-27 16:41:12 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1607