Bug 1671140
| Summary: | openshift-operator-lifecycle-manager olm-operators pods do not tolerate masters | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | W. Trevor King <wking> |
| Component: | OLM | Assignee: | Evan Cordell <ecordell> |
| Status: | CLOSED ERRATA | QA Contact: | Jian Zhang <jiazha> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.1.0 | CC: | kkeane, nhale, shlao, sponnaga |
| Target Milestone: | --- | | |
| Target Release: | 4.1.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-06-04 10:42:30 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
W. Trevor King, 2019-01-30 20:46:21 UTC
I think that unless you have a technical reason why your component won't currently work on masters (e.g. [1]), you should *tolerate* them. This is different from *restricting* to masters; you can certainly continue to tolerate compute nodes as well. One use case is all-in-one libvirt clusters (one master, zero compute nodes).

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1671136#c1

(In reply to W. Trevor King from comment #3)
> I think that unless you have a technical reason why your component won't
> currently work on masters (e.g. [1]), you should *tolerate* them. This is
> different from *restricting* to masters; you can certainly continue to
> tolerate compute nodes as well. One use-case is all-in-one libvirt clusters
> (one master, zero compute nodes).
>
> [1]: https://bugzilla.redhat.com/show_bug.cgi?id=1671136#c1

Is tolerating everything acceptable, or is there a specific taint key for masters?

```yaml
tolerations:
- operator: "Exists"
```

https://github.com/operator-framework/operator-lifecycle-manager/pull/708

From looking at OLM's other deployments and the cluster-ingress-operator manifests, it seems like this works. I'll let you know once merged.

The toleration change has been merged:
https://github.com/operator-framework/operator-lifecycle-manager/pull/708
https://jira.coreos.com/browse/ALM-908

Now that the nodeSelector label has been added to the deployments, all of the OLM pods are running on the master nodes. Verifying it below.
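As an aside on the taint-key question above: rather than tolerating everything with `operator: "Exists"` and no key, a toleration can be scoped to the master taint key specifically. A minimal deployment-spec sketch (field values assumed to match the verification output below; the manifest actually merged in PR #708 may differ):

```yaml
# Sketch only: schedule onto masters and tolerate only the master taint,
# instead of a blanket `operator: "Exists"` toleration.
spec:
  template:
    spec:
      nodeSelector:
        beta.kubernetes.io/os: linux
        node-role.kubernetes.io/master: ""
      tolerations:
      - key: "node-role.kubernetes.io/master"
        operator: "Exists"
        effect: "NoSchedule"
```

The `nodeSelector` restricts scheduling to masters, while the toleration is what allows the pod past the master taint; both are needed to pin the pods there.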
OLM version id: cce4af21efb662527a8f71d22f7f2c37007ea4bf

```
[jzhang@dhcp-140-18 payload]$ oc get deployment
NAME               READY   UP-TO-DATE   AVAILABLE   AGE
catalog-operator   1/1     1            1           6h26m
olm-operator       1/1     1            1           6h26m
packageserver      2/2     2            2           36m

[jzhang@dhcp-140-18 payload]$ oc get deployment -o yaml | grep nodeSelector -A 3
      nodeSelector:
        beta.kubernetes.io/os: linux
        node-role.kubernetes.io/master: ""
      restartPolicy: Always
--
      nodeSelector:
        beta.kubernetes.io/os: linux
        node-role.kubernetes.io/master: ""
      restartPolicy: Always
--
      nodeSelector:
        beta.kubernetes.io/os: linux
        node-role.kubernetes.io/master: ""
      restartPolicy: Always

[jzhang@dhcp-140-18 payload]$ oc get pods -o wide
NAME                                READY   STATUS    RESTARTS   AGE     IP            NODE                                         NOMINATED NODE
catalog-operator-7fc5d98dbd-vf5rc   1/1     Running   0          6h26m   10.130.0.11   ip-10-0-141-17.us-east-2.compute.internal    <none>
olm-operator-75558c6d7-s7mrt        1/1     Running   0          40m     10.130.0.47   ip-10-0-141-17.us-east-2.compute.internal    <none>
olm-operators-ldt4l                 1/1     Running   0          6h26m   10.130.0.12   ip-10-0-141-17.us-east-2.compute.internal    <none>
packageserver-54d858d7c6-jwkmf      1/1     Running   0          23m     10.128.0.72   ip-10-0-175-83.us-east-2.compute.internal    <none>
packageserver-54d858d7c6-kx478      1/1     Running   0          23m     10.129.0.54   ip-10-0-154-197.us-east-2.compute.internal   <none>

[jzhang@dhcp-140-18 payload]$ oc get nodes --show-labels
NAME                                         STATUS   ROLES    AGE     VERSION              LABELS
ip-10-0-141-17.us-east-2.compute.internal    Ready    master   6h49m   v1.12.4+ec459b84aa   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m4.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2a,kubernetes.io/hostname=ip-10-0-141-17,node-role.kubernetes.io/master=
ip-10-0-142-252.us-east-2.compute.internal   Ready    worker   6h34m   v1.12.4+ec459b84aa   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m4.large,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2a,kubernetes.io/hostname=ip-10-0-142-252,node-role.kubernetes.io/worker=
ip-10-0-152-115.us-east-2.compute.internal   Ready    worker   6h34m   v1.12.4+ec459b84aa   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m4.large,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2b,kubernetes.io/hostname=ip-10-0-152-115,node-role.kubernetes.io/worker=
ip-10-0-154-197.us-east-2.compute.internal   Ready    master   6h49m   v1.12.4+ec459b84aa   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m4.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2b,kubernetes.io/hostname=ip-10-0-154-197,node-role.kubernetes.io/master=
ip-10-0-168-71.us-east-2.compute.internal    Ready    worker   6h34m   v1.12.4+ec459b84aa   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m4.large,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2c,kubernetes.io/hostname=ip-10-0-168-71,node-role.kubernetes.io/worker=
ip-10-0-175-83.us-east-2.compute.internal    Ready    master   6h49m   v1.12.4+ec459b84aa   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m4.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2c,kubernetes.io/hostname=ip-10-0-175-83,node-role.kubernetes.io/master=
```

PS: the operators installed *by* OLM are not deployed on the master nodes; their placement depends on each operator component itself, which is as expected. Correct me if I'm wrong.
```
[jzhang@dhcp-140-18 payload]$ oc get pods -n openshift-operators -o wide
NAME                             READY   STATUS    RESTARTS   AGE     IP            NODE                                         NOMINATED NODE
etcd-operator-755449645b-ljfkk   3/3     Running   0          2m37s   10.128.2.17   ip-10-0-152-115.us-east-2.compute.internal   <none>
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758
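A quick way to repeat the verification above without eyeballing the tables is to join saved `oc get pods -o wide` output against `oc get nodes` by node name. This is a sketch only: the file names and sample lines below are illustrative stand-ins for the real command output.

```shell
# Sketch: list any pod that is NOT on a master node; no output means
# every pod landed on a master. Sample data mimics the `oc` output above.
cat > /tmp/nodes.txt <<'EOF'
ip-10-0-141-17.us-east-2.compute.internal Ready master
ip-10-0-142-252.us-east-2.compute.internal Ready worker
EOF
cat > /tmp/pods.txt <<'EOF'
olm-operator-75558c6d7-s7mrt 1/1 Running 0 40m 10.130.0.47 ip-10-0-141-17.us-east-2.compute.internal
EOF
# First file: node name -> role (column 3); second file: column 7 of
# `oc get pods -o wide` is the pod's node.
awk 'NR==FNR { role[$1] = $3; next }
     role[$7] != "master" { print $1, "is on non-master node", $7 }' \
    /tmp/nodes.txt /tmp/pods.txt
```

With the sample data, the single OLM pod sits on a master, so the pipeline prints nothing.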