Bug 1785115
Summary: | No global tolerations for NodeCA DaemonSet | |||
---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | rdomnu | |
Component: | Image Registry | Assignee: | Oleg Bulatov <obulatov> | |
Status: | CLOSED ERRATA | QA Contact: | Wenjing Zheng <wzheng> | |
Severity: | high | Docs Contact: | ||
Priority: | unspecified | |||
Version: | 4.2.z | CC: | adam.kaplan, adeshpan, andcosta, aos-bugs, ChetRHosey, ddreggor, hcisneir, mharri, susuresh, wewang | |
Target Milestone: | --- | |||
Target Release: | 4.4.0 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | Bug Fix | ||
Doc Text: |
Cause: The nodeca daemonset did not tolerate NoSchedule taints.
Consequence: Its pods were missing on nodes with such taints.
Fix: A toleration was added to the daemonset.
Result: Tainted nodes receive updates from the nodeca daemonset.
|
Story Points: | --- | |
Clone Of: | ||||
: | 1808431 | Environment: | ||
Last Closed: | 2020-05-04 11:20:43 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1808431 |
Description
rdomnu
2019-12-19 07:49:10 UTC
After tainting the nodes, the number of pods that the NodeCA DaemonSet controls remains the same with 4.4.0-0.nightly-2020-02-03-005212:

    spec:
      providerID: aws:///us-east-2b/i-097409042f4872c6c
      taints:
      - effect: NoSchedule
        key: infra
        value: "true"

Any chance of backporting to 4.2/4.3, or alternative workarounds? The documented process for setting up dedicated OCS nodes [1] has the user taint the storage nodes, which will break nodeCA.

[1] https://access.redhat.com/documentation/en-us/red_hat_openshift_container_storage/4.2/html-single/deploying_openshift_container_storage/index#creating-an-openshift-container-storage-service_rhocs

*** Bug 1801474 has been marked as a duplicate of this bug. ***

@Chet - backport PRs are now up [1][2]; there will be separate BZs to track the release process for 4.3.z and 4.2.z. It may take some time for the 4.2.z fix to go out, since it must be released in 4.3.z first. Currently (02/28/2020) there is a large backlog of 4.3.z fixes, and we are gating patches based on our QE team's capacity.

[1] https://github.com/openshift/cluster-image-registry-operator/pull/472
[2] https://github.com/openshift/cluster-image-registry-operator/pull/473

I think these PRs are not correct for NoExecute. Should this be:

    tolerations:
    - operator: Exists

as seen in this PR? https://github.com/openshift/cluster-image-registry-operator/pull/457

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581
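To illustrate the NoSchedule versus NoExecute question raised in the comments, here is a minimal Python sketch of how a toleration is matched against a taint. It follows the matching rules described in the Kubernetes taints and tolerations documentation (an empty effect matches any effect; an empty key with `operator: Exists` matches any taint). The names `Taint`, `Toleration`, and `tolerates` are hypothetical and exist only for this example, and the NoSchedule-only toleration is an assumed illustration, not a copy of what the backport PRs contain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    """A node taint, e.g. infra=true:NoSchedule (hypothetical helper type)."""
    key: str
    effect: str
    value: str = ""


@dataclass(frozen=True)
class Toleration:
    """A pod toleration; empty key/effect act as wildcards (hypothetical helper type)."""
    key: str = ""
    operator: str = "Equal"  # Kubernetes treats an unset operator as Equal
    value: str = ""
    effect: str = ""


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if the toleration matches the taint."""
    # A non-empty effect must match the taint's effect exactly.
    if tol.effect and tol.effect != taint.effect:
        return False
    # A non-empty key must match the taint's key exactly.
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    if tol.operator in ("", "Equal"):
        return tol.value == taint.value
    return False


# The taint from this bug, plus an assumed NoExecute variant of it.
no_schedule = Taint(key="infra", value="true", effect="NoSchedule")
no_execute = Taint(key="infra", value="true", effect="NoExecute")

# "tolerations: - operator: Exists" as suggested in the comment.
global_toleration = Toleration(operator="Exists")
# Assumed example of a toleration limited to the NoSchedule effect.
schedule_only = Toleration(operator="Exists", effect="NoSchedule")

for name, tol in [("global", global_toleration), ("schedule_only", schedule_only)]:
    for taint in (no_schedule, no_execute):
        print(name, taint.effect, tolerates(tol, taint))
```

Under these rules, the bare `operator: Exists` toleration matches both taints, while a toleration restricted to NoSchedule leaves pods on NoExecute-tainted nodes subject to eviction.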