Bug 1801474
| Summary: | node-ca daemonset toleration conflicts with clusterlogging CR | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Hugo Cisneiros (Eitch) <hcisneir> |
| Component: | Image Registry | Assignee: | Oleg Bulatov <obulatov> |
| Status: | CLOSED ERRATA | QA Contact: | Wenjing Zheng <wzheng> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.2.z | CC: | adam.kaplan, aos-bugs, ChetRHosey, ddreggor, wewang |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | 4.4.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Story Points: | --- | | |
| Clone Of: | | | |
| : | 1820242 (view as bug list) | Environment: | |
| Last Closed: | 2020-05-04 11:35:28 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1820242 | | |

Doc Text:

- Cause: the node-ca daemon did not tolerate the NoExecute taint, while the ClusterLogging documentation recommends using NoExecute.
- Consequence: the node-ca daemon did not manage certificates on such nodes.
- Fix: tolerate all taints.
- Result: additionalTrustedCA certificates are synced to all nodes, regardless of any taints.
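The cause/fix summary above follows from how Kubernetes matches tolerations to taints. The sketch below is a simplified Python model of that matching rule (an illustration, not the real scheduler code): an empty key with operator `Exists` matches every taint key, and an empty effect matches every effect, which is why the catch-all toleration fixes the bug.

```python
def tolerates(toleration, taint):
    """Simplified sketch of Kubernetes toleration-to-taint matching.

    An empty key with operator Exists matches any taint key;
    an empty effect matches any taint effect.
    """
    key = toleration.get("key", "")
    if key and key != taint["key"]:
        return False
    if toleration.get("operator", "Equal") == "Equal":
        if toleration.get("value", "") != taint.get("value", ""):
            return False
    effect = toleration.get("effect", "")
    return effect == "" or effect == taint["effect"]

# The old node-ca toleration only covers the master NoSchedule taint...
old = {"effect": "NoSchedule", "key": "node-role.kubernetes.io/master",
       "operator": "Exists"}
# ...so the logging NoExecute taint from the ClusterLogging docs evicts its pods:
logging_taint = {"key": "logging", "value": "true", "effect": "NoExecute"}
print(tolerates(old, logging_taint))    # False: pod is evicted

# The fixed catch-all toleration matches any taint, key, and effect:
fixed = {"operator": "Exists"}
print(tolerates(fixed, logging_taint))  # True: pod stays
```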
Duplicate of Bug 1785115 - this will be fixed in v4.4.0.

*** This bug has been marked as a duplicate of bug 1785115 ***

Bug 1785115 was about NoSchedule, but this one is about NoExecute. I agree we need to tolerate all effects.

The toleration below is added to node-ca on 4.4.0-0.nightly-2020-02-17-192940:

```yaml
tolerations:
- operator: Exists
```

Any chance of backporting to 4.2/4.3, or alternative workarounds? The documented process for setting up dedicated OCS nodes [1] has the user taint the storage nodes, which will break node-ca.

[1] https://access.redhat.com/documentation/en-us/red_hat_openshift_container_storage/4.2/html-single/deploying_openshift_container_storage/index#creating-an-openshift-container-storage-service_rhocs

It looks like from PR 457 that what you actually have is:

```yaml
tolerations:
- effect: NoSchedule
  operator: Exists
```

not this:

```yaml
tolerations:
- operator: Exists
```

Line 38 (`- effect: NoSchedule`) is not actually removed, correct? That would not allow for NoExecute taints.

Sorry, that was PR 421 I was looking at, from a linked BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1785115

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581
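Regarding the backport question, one interim workaround could be patching the catch-all toleration into the DaemonSet pod spec directly. This is a sketch only: the node-ca DaemonSet is operator-managed, so a manual edit may be reverted on the operator's next reconciliation (an assumption, not confirmed in this thread).

```yaml
# Sketch: spec.template.spec fragment of the node-ca DaemonSet
# in the openshift-image-registry namespace, with the catch-all
# toleration from the 4.4 fix applied manually.
spec:
  template:
    spec:
      tolerations:
      - operator: Exists   # matches every taint key, value, and effect
```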
Description of problem:

When following the documentation for deploying ClusterLogging and adding taints to nodes so they run only Logging components, the image registry node-ca daemonset does not include the proper toleration, and the nodes with these taints do not run the node-ca pods.

The node-ca daemonset has this toleration:

```yaml
tolerations:
- effect: NoSchedule
  key: node-role.kubernetes.io/master
  operator: Exists
```

To run on all nodes, regardless of any taint, this could be:

```yaml
tolerations:
- operator: Exists
```

Version-Release number of selected component (if applicable): 4.2.16

How reproducible:

1. Deploy a ClusterLogging instance, customized to use tolerations and taints: https://docs.openshift.com/container-platform/4.2/logging/config/cluster-logging-tolerations.html

   Toleration customization:

   ```yaml
   tolerations:
   - effect: NoExecute
     key: logging
     operator: Exists
   ```

2. Taint nodes with:

   ```
   $ oc adm taint nodes <node1|node2|node3> logging=true:NoExecute
   ```

Actual results:

After the taint, node-ca pods were deleted from the tainted nodes:

```
$ oc get events -n openshift-image-registry
[...]
53m  Normal  SuccessfulDelete  daemonset/node-ca  Deleted pod: node-ca-pjxfn
53m  Normal  SuccessfulDelete  daemonset/node-ca  Deleted pod: node-ca-2kclr
53m  Normal  SuccessfulDelete  daemonset/node-ca  Deleted pod: node-ca-c5nn9
```

Expected results: Pods are not deleted.

Additional info:
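The taint argument in the reproduction step uses the `key=value:effect` form. A small Python sketch of parsing that form (simplified: real `oc adm taint` also accepts `key:effect` and removal syntax with a trailing `-`, which this ignores) shows how `logging=true:NoExecute` decomposes into the taint the node carries:

```python
def parse_taint(spec):
    """Parse the key=value:effect form used by `oc adm taint` (simplified sketch)."""
    kv, _, effect = spec.rpartition(":")   # split off the effect after the last ':'
    key, _, value = kv.partition("=")      # split key from value at the first '='
    return {"key": key, "value": value, "effect": effect}

print(parse_taint("logging=true:NoExecute"))
# {'key': 'logging', 'value': 'true', 'effect': 'NoExecute'}
```

Since the taint's effect is NoExecute, any pod without a matching toleration is evicted, which is exactly why the node-ca pods above were deleted rather than merely not scheduled.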