Description of problem:
When following the documentation for deploying ClusterLogging and tainting nodes so that they run only Logging components, the image registry 'node-ca' daemonset does not include a matching toleration, so the tainted nodes don't run the 'node-ca' pods.

The node-ca daemonset has this toleration:

    tolerations:
    - effect: NoSchedule
      key: node-role.kubernetes.io/master
      operator: Exists

To run on all nodes, regardless of any taint, this could be:

    tolerations:
    - operator: Exists

Version-Release number of selected component (if applicable):
4.2.16

How reproducible:
1. Deploy a ClusterLogging instance, customized to use tolerations and taints:
   https://docs.openshift.com/container-platform/4.2/logging/config/cluster-logging-tolerations.html

   Toleration customization:

    tolerations:
    - effect: NoExecute
      key: logging
      operator: Exists

2. Taint nodes with:

    $ oc adm taint nodes <node1|node2|node3> logging=true:NoExecute

Actual results:
After the taint, 'node-ca' pods were deleted from the tainted nodes:

    $ oc get events -n openshift-image-registry
    [...]
    53m   Normal   SuccessfulDelete   daemonset/node-ca   Deleted pod: node-ca-pjxfn
    53m   Normal   SuccessfulDelete   daemonset/node-ca   Deleted pod: node-ca-2kclr
    53m   Normal   SuccessfulDelete   daemonset/node-ca   Deleted pod: node-ca-c5nn9

Expected results:
Pods are not deleted.

Additional info:
Duplicate of Bug 1785115 - this will be fixed in v4.4.0. *** This bug has been marked as a duplicate of bug 1785115 ***
Bug 1785115 was about NoSchedule, but this one is about NoExecute. I agree we need to tolerate all effects.
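To illustrate why the effect matters here, the following is a minimal Python sketch of the toleration matching rules as described in the Kubernetes documentation (an omitted effect matches every effect; an omitted key with operator Exists matches every key). It is a simplified model for reasoning about this bug, not the scheduler's actual code, and the function names `tolerates` and `pod_tolerates` are hypothetical.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Return True if a single toleration matches a single taint (simplified model)."""
    # An omitted effect matches every effect; otherwise it must be identical.
    effect = toleration.get("effect")
    if effect and effect != taint["effect"]:
        return False

    key = toleration.get("key")
    operator = toleration.get("operator", "Equal")  # Equal is the default operator

    # An omitted key is only meaningful with Exists, and then matches every key.
    if not key:
        return operator == "Exists"
    if key != taint["key"]:
        return False

    # Exists ignores the value; Equal requires the values to be identical.
    if operator == "Exists":
        return True
    return toleration.get("value", "") == taint.get("value", "")


def pod_tolerates(tolerations: list, taint: dict) -> bool:
    """A pod tolerates a taint if any one of its tolerations matches it."""
    return any(tolerates(t, taint) for t in tolerations)


# The taint applied in the reproduction steps: logging=true:NoExecute
logging_taint = {"key": "logging", "value": "true", "effect": "NoExecute"}

# The toleration node-ca has in 4.2.16, as quoted in the description.
master_only = [
    {"effect": "NoSchedule", "key": "node-role.kubernetes.io/master", "operator": "Exists"}
]
# Any key, but NoSchedule effect only.
no_schedule_any_key = [{"effect": "NoSchedule", "operator": "Exists"}]
# Any key, any effect.
tolerate_all = [{"operator": "Exists"}]

for name, tols in [
    ("master_only", master_only),
    ("no_schedule_any_key", no_schedule_any_key),
    ("tolerate_all", tolerate_all),
]:
    print(name, pod_tolerates(tols, logging_taint))
```

Under this model, only the last form (operator Exists with neither key nor effect) tolerates the NoExecute taint; both variants that pin the effect to NoSchedule leave the pods subject to eviction.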
The toleration below was added to node-ca in 4.4.0-0.nightly-2020-02-17-192940:

    tolerations:
    - operator: Exists
Any chance of backporting to 4.2/4.3, or alternative workarounds? The documented process for setting up dedicated OCS nodes [1] has the user taint the storage nodes, which will break nodeCA. [1] https://access.redhat.com/documentation/en-us/red_hat_openshift_container_storage/4.2/html-single/deploying_openshift_container_storage/index#creating-an-openshift-container-storage-service_rhocs
From PR 457, it looks like what you actually have is:

    tolerations:
    - effect: NoSchedule
      operator: Exists

not this:

    tolerations:
    - operator: Exists

Line 38 (- effect: NoSchedule) is not actually removed, correct? That would not allow for NoExecute taints.
Sorry, that was PR 421 I was looking at, from a linked BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1785115
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0581