Bug 1801474 - node-ca daemonset toleration conflicts with clusterlogging CR
Summary: node-ca daemonset toleration conflicts with clusterlogging CR
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Image Registry
Version: 4.2.z
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.4.0
Assignee: Oleg Bulatov
QA Contact: Wenjing Zheng
URL:
Whiteboard:
Depends On:
Blocks: 1820242
Reported: 2020-02-10 23:11 UTC by Hugo Cisneiros (Eitch)
Modified: 2020-05-04 11:35 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The node-ca daemon did not tolerate the NoExecute taint, but the ClusterLogging documentation recommends using NoExecute.
Consequence: The node-ca daemon does not manage certificates on nodes with that taint.
Fix: Tolerate all taints.
Result: additionalTrustedCA certificates are synced to all nodes, whatever taints they carry.
Clone Of:
: 1820242 (view as bug list)
Environment:
Last Closed: 2020-05-04 11:35:28 UTC
Target Upstream Version:


Links
System ID Priority Status Summary Last Updated
Github openshift cluster-image-registry-operator pull 457 None closed Bug 1801474: nodeca daemon should tolerate all taints 2020-10-14 20:29:35 UTC
Red Hat Product Errata RHBA-2020:0581 None None None 2020-05-04 11:35:55 UTC

Description Hugo Cisneiros (Eitch) 2020-02-10 23:11:24 UTC
Description of problem:

When following the documentation for deploying ClusterLogging and tainting nodes so that they run only Logging components, the image registry 'node-ca' daemonset lacks a matching toleration, so the tainted nodes do not run 'node-ca' pods.

node-ca daemonset has this toleration:

     tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
        operator: Exists

To run on all nodes, regardless of any taint, this could be:

     tolerations:
      - operator: Exists
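
As a rough illustration of why the existing toleration does not cover the logging taint, here is a minimal Python sketch of the toleration-matching rule as the Kubernetes documentation describes it (an empty key with operator Exists matches every key; an empty effect matches every effect). The Taint and Toleration classes and the tolerates helper are hypothetical names made up for this sketch; they are not part of any OpenShift or Kubernetes client library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    effect: str
    value: str = ""


@dataclass(frozen=True)
class Toleration:
    key: str = ""            # empty key: match any taint key (requires Exists)
    operator: str = "Equal"  # Kubernetes defaults the operator to Equal
    value: str = ""
    effect: str = ""         # empty effect: match any taint effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if the toleration matches the taint (illustrative only)."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    return tol.value == taint.value


# Taint applied by: oc adm taint nodes <node> logging=true:NoExecute
logging_taint = Taint(key="logging", effect="NoExecute", value="true")

# Toleration currently on the node-ca daemonset
master_only = Toleration(
    key="node-role.kubernetes.io/master", operator="Exists", effect="NoSchedule"
)

# Proposed toleration: no key, no effect, operator Exists
wildcard = Toleration(operator="Exists")

print(tolerates(master_only, logging_taint))  # False
print(tolerates(wildcard, logging_taint))     # True
```

Under this model the master-only toleration does not match logging=true:NoExecute, so pods carrying only that toleration are evicted, while a bare "operator: Exists" toleration matches any taint.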

Version-Release number of selected component (if applicable):

4.2.16

How reproducible:

1. Deploy a ClusterLogging instance, customized to use tolerations and taints:

https://docs.openshift.com/container-platform/4.2/logging/config/cluster-logging-tolerations.html

Toleration customization:

      tolerations:
      - effect: NoExecute
        key: logging
        operator: Exists


2. Taint nodes with:

$ oc adm taint nodes <node1|node2|node3> logging=true:NoExecute

Actual results:

After the taint, 'node-ca' pods were deleted from tainted nodes:

$ oc get events -n openshift-image-registry
[...]
53m         Normal    SuccessfulDelete            daemonset/node-ca                                       Deleted pod: node-ca-pjxfn
53m         Normal    SuccessfulDelete            daemonset/node-ca                                       Deleted pod: node-ca-2kclr
53m         Normal    SuccessfulDelete            daemonset/node-ca                                       Deleted pod: node-ca-c5nn9

Expected results:

Pods are not deleted.

Additional info:

Comment 1 Adam Kaplan 2020-02-11 13:47:21 UTC
Duplicate of Bug 1785115 - this will be fixed in v4.4.0.

*** This bug has been marked as a duplicate of bug 1785115 ***

Comment 2 Oleg Bulatov 2020-02-11 16:59:39 UTC
Bug 1785115 was about NoSchedule, but this one is about NoExecute. I agree we need to tolerate all effects.

Comment 4 Wenjing Zheng 2020-02-18 10:08:27 UTC
The toleration below is added to node-ca on 4.4.0-0.nightly-2020-02-17-192940:
     tolerations:
      - operator: Exists

Comment 5 Chet Hosey 2020-02-27 15:00:05 UTC
Any chance of backporting to 4.2/4.3, or alternative workarounds?

The documented process for setting up dedicated OCS nodes [1] has the user taint the storage nodes, which will break nodeCA.

[1] https://access.redhat.com/documentation/en-us/red_hat_openshift_container_storage/4.2/html-single/deploying_openshift_container_storage/index#creating-an-openshift-container-storage-service_rhocs

Comment 6 David Dreeggors 2020-04-02 14:25:22 UTC
It looks like from PR 457 that what you actually have is:

tolerations:
      - effect: NoSchedule
        operator: Exists

not this:

tolerations:
      - operator: Exists


Line 38 (- effect: NoSchedule) is not actually removed, correct? This would not allow for NoExecute taints.

Comment 7 David Dreeggors 2020-04-02 14:28:27 UTC
Sorry, that was PR 421 I was looking at, from a linked BZ:

https://bugzilla.redhat.com/show_bug.cgi?id=1785115
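
To make the difference between the two tolerations quoted in comment 6 concrete, the following Python sketch evaluates each one against a logging taint with either effect. It assumes the standard Kubernetes matching rule; the tolerates function and the dict layout are illustrative only and are not taken from the operator's code.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Hypothetical helper mirroring the documented Kubernetes matching rule."""
    effect = toleration.get("effect", "")
    key = toleration.get("key", "")
    if effect and effect != taint["effect"]:
        return False  # toleration is restricted to a different effect
    if key and key != taint["key"]:
        return False  # toleration is restricted to a different key
    if toleration.get("operator", "Equal") == "Exists":
        return True
    return toleration.get("value", "") == taint.get("value", "")


# Toleration quoted in comment 6 as coming from the earlier PR
effect_only = {"effect": "NoSchedule", "operator": "Exists"}
# Toleration reported in comment 4 for the 4.4.0 nightly
wildcard = {"operator": "Exists"}

results = {}
for effect in ("NoSchedule", "NoExecute"):
    taint = {"key": "logging", "value": "true", "effect": effect}
    results[effect] = (tolerates(effect_only, taint), tolerates(wildcard, taint))
    print(effect, results[effect])
```

In this sketch the effect-restricted toleration covers only NoSchedule taints, whereas the toleration without an effect covers both, which is the distinction comment 2 draws between the two bugs.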

Comment 9 errata-xmlrpc 2020-05-04 11:35:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581

