The canary daemonset is currently unschedulable on infra nodes because the canary namespace uses the cluster-wide default node selector, which targets only worker nodes. This causes alerts to fire about the ingress canary daemonset being unable to completely roll out, in some clusters, in some edge cases described in CoreOS Slack. We want the canary daemonset to schedule pods to both worker and infra nodes (infra nodes typically run monitoring workloads and therefore need to be reachable via routes). The canary namespace needs to override the default node selector via the `openshift.io/node-selector` annotation. In addition, the canary daemonset needs to specify a Linux node selector as well as infra node taint tolerations. Note that specifying the node selector only in the canary daemonset is not sufficient, since the cluster-wide default node selector would be AND'd with the daemonset's node selector, and a pod node selector can only target one node type. These changes need to be backported to 4.7 and no further.
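The change described above amounts to manifests along these lines (a sketch of the intended shape, not the operator's exact generated output):

```yaml
# Namespace: an empty openshift.io/node-selector annotation overrides the
# cluster-wide default node selector, so pods are no longer pinned to workers.
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-ingress-canary
  annotations:
    openshift.io/node-selector: ""
---
# DaemonSet pod template (abridged): a Linux-only node selector plus a
# toleration for the infra NoSchedule taint lets pods land on infra nodes too.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ingress-canary
  namespace: openshift-ingress-canary
spec:
  template:
    spec:
      nodeSelector:
        kubernetes.io/os: linux
      tolerations:
      - key: node-role.kubernetes.io/infra
        effect: NoSchedule
        operator: Exists
```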
Is there any way this could get backported to 4.6 as well? We (SREP) are trying to roll out protections on our infra nodes via the `NoSchedule` taint, and as we add that taint to clusters, the openshift-ingress-canary daemonset is throwing DaemonSetMisScheduled alerts (as one would expect, since the pods are not evicted from the infra nodes). We have to support 4.6 until 4.8 goes GA, and getting this protection for infra nodes is becoming more important by the day, as customers end up overloading their clusters and customer workloads then get scheduled onto infra nodes. Otherwise, I think our only path forward will be to evict this DS off of infra nodes until users upgrade to 4.7, which is less than ideal.
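The infra node protection described here is typically declared on the infra MachineSet, so every node it creates carries the taint. A minimal sketch (the MachineSet name is hypothetical):

```yaml
# Abridged MachineSet spec: taints listed under spec.template.spec are
# applied to every node this MachineSet provisions.
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: example-infra-us-east-2a   # hypothetical name
  namespace: openshift-machine-api
spec:
  template:
    spec:
      taints:
      - key: node-role.kubernetes.io/infra
        effect: NoSchedule
```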
(In reply to Kirk Bater from comment #1)
> Is there any way this could get backported to 4.6 as well? We (SREP) are
> trying to roll out protections on our infra nodes via the `NoSchedule` taint
> and as we add that taint to clusters the openshift-ingress-canary is
> throwing DaemonSetMisScheduled alerts (as one would expect as they are not
> evicted off of the infra nodes). We have to support 4.6 until 4.8 goes GA,
> and getting this protection for infra nodes is becoming more and more
> important by the day as customers end up overloading their clusters and then
> customer workloads get scheduled to infra nodes. Otherwise, I think our
> only path forward will be to evict this DS off of infra nodes until users
> upgrade to 4.7, which is less than ideal.

The canary daemonset is new in OCP 4.7. There is no canary controller component for the ingress operator in OCP 4.6.
Welp, that sure explains why we're only seeing this on certain clusters then. Sorry for the bother, but thank you for explaining.
(In reply to Kirk Bater from comment #3)
> Welp, that sure explains why we're only seeing this on certain clusters then.
>
> Sorry for the bother, but thank you for explaining.

No worries. Having the canary daemonset tolerate the infra node taint should be sufficient to resolve the issue in your case, right?
That's correct. Thank you.
Verified in the "4.8.0-0.nightly-2021-03-05-194645" release version. With this payload, the canary namespace now gets the `openshift.io/node-selector: ""` annotation by default, and the canary daemonset now ships with the required toleration to deploy pods on nodes with the infra role:

------
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-03-05-194645   True        False         16m     Cluster version is 4.8.0-0.nightly-2021-03-05-194645

New machineset deployed with the infra role:

$ oc -n openshift-machine-api get machineset
NAME                                        DESIRED   CURRENT   READY   AVAILABLE   AGE
aiyengar-oc480803-qnlt7-infra-us-east-2a    1         1                             17h
aiyengar-oc480803-qnlt7-worker-us-east-2a   1         1         1       1           18h
aiyengar-oc480803-qnlt7-worker-us-east-2b   1         1         1       1           18h
aiyengar-oc480803-qnlt7-worker-us-east-2c   1         1         1       1           18h

$ oc get nodes
NAME                                         STATUS   ROLES          AGE   VERSION
ip-10-0-135-84.us-east-2.compute.internal    Ready    worker         18h   v1.20.0+aa519d9
ip-10-0-142-76.us-east-2.compute.internal    Ready    infra,worker   29m   v1.20.0+aa519d9
ip-10-0-159-110.us-east-2.compute.internal   Ready    master         18h   v1.20.0+aa519d9
ip-10-0-163-38.us-east-2.compute.internal    Ready    worker         18h   v1.20.0+aa519d9
ip-10-0-166-247.us-east-2.compute.internal   Ready    master         18h   v1.20.0+aa519d9
ip-10-0-209-121.us-east-2.compute.internal   Ready    master         18h   v1.20.0+aa519d9
ip-10-0-216-250.us-east-2.compute.internal   Ready    worker         18h   v1.20.0+aa519d9

$ oc -n openshift-machine-api get machineset
NAME                                        DESIRED   CURRENT   READY   AVAILABLE   AGE
aiyengar-oc480803-qnlt7-infra-us-east-2a    1         1         1       1           17h
aiyengar-oc480803-qnlt7-worker-us-east-2a   1         1         1       1           18h
aiyengar-oc480803-qnlt7-worker-us-east-2b   1         1         1       1           18h
aiyengar-oc480803-qnlt7-worker-us-east-2c   1         1         1       1           18h

Canary pods get deployed on the infra node, even with the "node-role.kubernetes.io/infra:NoSchedule" taint added:

$ oc -n openshift-ingress-canary get pods -o wide
NAME                   READY   STATUS    RESTARTS   AGE   IP           NODE                                         NOMINATED NODE   READINESS GATES
ingress-canary-hz4w8   1/1     Running   0          18h   10.131.0.2   ip-10-0-216-250.us-east-2.compute.internal   <none>           <none>
ingress-canary-l4x7q   1/1     Running   0          40m   10.130.2.2   ip-10-0-142-76.us-east-2.compute.internal    <none>           <none>   <---
ingress-canary-njrkk   1/1     Running   0          18h   10.129.2.5   ip-10-0-163-38.us-east-2.compute.internal    <none>           <none>
ingress-canary-rp5bb   1/1     Running   0          18h   10.128.2.5   ip-10-0-135-84.us-east-2.compute.internal    <none>           <none>

The infra node carries the taint:

Name:               ip-10-0-142-76.us-east-2.compute.internal
Roles:              infra,worker
Labels:             beta.kubernetes.io/arch=amd64
                    ....
CreationTimestamp:  Tue, 09 Mar 2021 10:59:09 +0530
Taints:             node-role.kubernetes.io/infra:NoSchedule   <----
Unschedulable:      false

This is because the canary daemonset has the required toleration in place by default:

$ oc -n openshift-ingress-canary get daemonset/ingress-canary -o yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ingress-canary
  namespace: openshift-ingress-canary
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      ingresscanary.operator.openshift.io/daemonset-ingresscanary: canary_controller
  template:
    metadata:
      creationTimestamp: null
      labels:
        ingresscanary.operator.openshift.io/daemonset-ingresscanary: canary_controller
      ....
      nodeSelector:
        kubernetes.io/os: linux
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      tolerations:                            <-----
      - effect: NoSchedule                    <------
        key: node-role.kubernetes.io/infra    <-----
        operator: Exists                      <-----

And the namespace has the selector annotation in place:

$ oc get ns openshift-ingress-canary -o yaml
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    openshift.io/node-selector: ""   <----
    openshift.io/sa.scc.mcs: s0:c24,c14
    openshift.io/sa.scc.supplemental-groups: 1000580000/10000
------
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438
Wouldn't it be much better to have it tolerate all taints, not just node-role.kubernetes.io/infra? We use a different taint, and as it stands now, the only solution I see is to apply the defaultTolerations annotation to the entire namespace (openshift-ingress-canary). I found this case after we had already set up all of our taints and tolerations, and I'd rather not change the taint, as I would have to change all tolerations everywhere across all clusters as well.
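For reference, the two options mentioned here would look roughly like this. This is only a sketch: the blanket toleration would have to ship in the daemonset's own pod template, and the namespace-level defaultTolerations annotation depends on the PodTolerationRestriction admission plugin being enabled on the cluster.

```yaml
# A Namespace-level workaround: default tolerations injected into every pod
# in the namespace (requires the PodTolerationRestriction admission plugin).
# An Exists toleration with no key matches every taint.
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-ingress-canary
  annotations:
    scheduler.alpha.kubernetes.io/defaultTolerations: '[{"operator": "Exists"}]'
```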