+++ This bug was initially created as a clone of Bug #1933102 +++
The canary daemonset is currently unschedulable on infra nodes because the canary namespace uses the default node selector, which targets worker nodes only. In some clusters this causes alerts to fire about the ingress canary daemonset being unable to roll out completely (the edge cases were described in CoreOS Slack).
We want the canary daemonset to schedule pods to both worker and infra nodes (since infra nodes typically run monitoring workloads and therefore need to be reachable via routes).
The canary namespace needs to override the default node selector via the `openshift.io/node-selector` annotation. In addition, the canary daemonset needs to specify a linux node selector as well as infra node taint tolerations. Note that specifying the node selector in the canary daemonset is not sufficient since the cluster-wide default node selector will be AND'd with the daemonset's node selector, and because you can only target one node type with a pod node selector.
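The changes described above can be sketched as the following manifests. This is an illustrative fragment, not the operator's actual manifest: the annotation value, the `kubernetes.io/os=linux` selector, the `NoSchedule` effect, and the `node-role.kubernetes.io/infra` taint key come from the verification output later in this bug, while `operator: Exists`, the labels, the container name, and the image are assumptions added only to make the example complete.

```yaml
# Sketch only. Values not shown elsewhere in this bug are marked as hypothetical.
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-ingress-canary
  annotations:
    # An empty value overrides the cluster-wide default node selector for
    # this namespace, so it is no longer AND'd with the daemonset's selector.
    openshift.io/node-selector: ""
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ingress-canary
  namespace: openshift-ingress-canary
spec:
  selector:
    matchLabels:
      app: ingress-canary-example        # hypothetical label
  template:
    metadata:
      labels:
        app: ingress-canary-example      # hypothetical label
    spec:
      nodeSelector:
        # Selects Linux nodes regardless of worker or infra role.
        kubernetes.io/os: linux
      tolerations:
      # Lets the pods stay on infra nodes that carry the NoSchedule taint.
      - key: node-role.kubernetes.io/infra
        operator: Exists                 # assumed; only the effect appears in this bug
        effect: NoSchedule
      containers:
      - name: canary-example             # hypothetical
        image: example.invalid/canary    # hypothetical placeholder image
```

The namespace annotation and the daemonset selector have to change together: with only the daemonset selector set, the default worker-only selector would still be AND'd in and infra nodes would remain excluded.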
These changes need to be backported to 4.7 and no further.
--- Additional comment from email@example.com on 2021-03-02 21:05:25 UTC ---
Is there any way this could get backported to 4.6 as well? We (SREP) are trying to roll out protections on our infra nodes via the `NoSchedule` taint and as we add that taint to clusters the openshift-ingress-canary is throwing DaemonSetMisScheduled alerts (as one would expect as they are not evicted off of the infra nodes). We have to support 4.6 until 4.8 goes GA, and getting this protection for infra nodes is becoming more and more important by the day as customers end up overloading their clusters and then customer workloads get scheduled to infra nodes. Otherwise, I think our only path forward will be to evict this DS off of infra nodes until users upgrade to 4.7, which is less than ideal.
--- Additional comment from firstname.lastname@example.org on 2021-03-02 21:07:24 UTC ---
(In reply to Kirk Bater from comment #1)
> Is there any way this could get backported to 4.6 as well? We (SREP) are
> trying to roll out protections on our infra nodes via the `NoSchedule` taint
> and as we add that taint to clusters the openshift-ingress-canary is
> throwing DaemonSetMisScheduled alerts (as one would expect as they are not
> evicted off of the infra nodes). We have to support 4.6 until 4.8 goes GA,
> and getting this protection for infra nodes is becoming more and more
> important by the day as customers end up overloading their clusters and then
> customer workloads get scheduled to infra nodes. Otherwise, I think our
> only path forward will be to evict this DS off of infra nodes until users
> upgrade to 4.7, which is less than ideal.
The canary daemonset is new in OCP 4.7. There is no canary controller component for the ingress operator in OCP 4.6.
--- Additional comment from email@example.com on 2021-03-02 21:10:24 UTC ---
Welp, that sure explains why we're only seeing this on certain clusters then.
Sorry for the bother, but thank you for explaining.
--- Additional comment from firstname.lastname@example.org on 2021-03-02 21:11:25 UTC ---
(In reply to Kirk Bater from comment #3)
> Welp, that sure explains why we're only seeing this on certain clusters then.
> Sorry for the bother, but thank you for explaining.
No worries. Having the canary daemonset tolerate the infra node taint should be sufficient to resolve the issue in your case, right?
--- Additional comment from email@example.com on 2021-03-02 21:31:52 UTC ---
That's correct. Thank you.
Verified in "4.7.0-0.ci.test-2021-03-10-040947-ci-ln-w7wib1b". With this payload, the canary daemonset is now able to spawn pods on infra nodes as well:
oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.7.0-0.ci.test-2021-03-10-040947-ci-ln-w7wib1b True False 10m Cluster version is 4.7.0-0.ci.test-2021-03-10-040947-ci-ln-w7wib1b
Before machineset creation:
$ oc -n openshift-machine-api get machineset
NAME DESIRED CURRENT READY AVAILABLE AGE
ci-ln-w7wib1b-f76d1-rtxkb-worker-b 1 1 1 1 57m
ci-ln-w7wib1b-f76d1-rtxkb-worker-c 1 1 1 1 57m
ci-ln-w7wib1b-f76d1-rtxkb-worker-d 1 1 1 1 57m
$ oc get machines -n openshift-machine-api
NAME PHASE TYPE REGION ZONE AGE
ci-ln-w7wib1b-f76d1-rtxkb-master-0 Running n1-standard-4 us-east1 us-east1-b 57m
ci-ln-w7wib1b-f76d1-rtxkb-master-1 Running n1-standard-4 us-east1 us-east1-c 57m
ci-ln-w7wib1b-f76d1-rtxkb-master-2 Running n1-standard-4 us-east1 us-east1-d 57m
ci-ln-w7wib1b-f76d1-rtxkb-worker-b-lj4vd Running n1-standard-4 us-east1 us-east1-b 48m
ci-ln-w7wib1b-f76d1-rtxkb-worker-c-sh2fm Running n1-standard-4 us-east1 us-east1-c 48m
ci-ln-w7wib1b-f76d1-rtxkb-worker-d-dbhnz Running n1-standard-4 us-east1 us-east1-d 48m
Adding new machinesets:
$ oc create -f ci-machineset-test.yaml
$ oc -n openshift-machine-api get machineset
NAME DESIRED CURRENT READY AVAILABLE AGE
ci-ln-w7wib1b-f76d1-rtxkb-worker-b 1 1 1 1 132m
ci-ln-w7wib1b-f76d1-rtxkb-worker-c 2 2 2 2 132m
ci-ln-w7wib1b-f76d1-rtxkb-worker-d 1 1 1 1 132m
ci-ln-w7wib1b-f76d1-rtxkb-worker-inf 2 2 2 2 4m2s <---
oc get nodes
NAME STATUS ROLES AGE VERSION
ci-ln-w7wib1b-f76d1-rtxkb-master-0 Ready master 137m v1.20.0+5fbfd19
ci-ln-w7wib1b-f76d1-rtxkb-master-1 Ready master 136m v1.20.0+5fbfd19
ci-ln-w7wib1b-f76d1-rtxkb-master-2 Ready master 136m v1.20.0+5fbfd19
ci-ln-w7wib1b-f76d1-rtxkb-worker-b-lj4vd Ready worker 129m v1.20.0+5fbfd19
ci-ln-w7wib1b-f76d1-rtxkb-worker-c-pcgxp Ready worker 20m v1.20.0+5fbfd19
ci-ln-w7wib1b-f76d1-rtxkb-worker-c-sh2fm Ready worker 130m v1.20.0+5fbfd19
ci-ln-w7wib1b-f76d1-rtxkb-worker-d-dbhnz Ready worker 127m v1.20.0+5fbfd19
ci-ln-w7wib1b-f76d1-rtxkb-worker-inf-fhhr5 Ready infra,worker 11m v1.20.0+5fbfd19
ci-ln-w7wib1b-f76d1-rtxkb-worker-inf-r85wt Ready infra,worker 11m v1.20.0+5fbfd19
The canary namespace has the required annotation, and the daemonset now includes the default tolerations for the 'infra' role:
oc get ns openshift-ingress-canary -o yaml
openshift.io/node-selector: "" <-------
- apiVersion: v1
oc -n openshift-ingress-canary get daemonset.apps/ingress-canary -o yaml
- effect: NoSchedule
After tainting the infra nodes, the canary pods remain up and functional on those nodes:
oc adm taint nodes ci-ln-w7wib1b-f76d1-rtxkb-worker-inf-fhhr5 node-role.kubernetes.io/infra:NoSchedule
oc adm taint nodes ci-ln-w7wib1b-f76d1-rtxkb-worker-inf-r85wt node-role.kubernetes.io/infra:NoSchedule
oc -n openshift-ingress-canary get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
ingress-canary-56j9l 1/1 Running 0 130m 10.129.2.5 ci-ln-w7wib1b-f76d1-rtxkb-worker-d-dbhnz <none> <none>
ingress-canary-892mt 1/1 Running 0 23m 10.130.2.2 ci-ln-w7wib1b-f76d1-rtxkb-worker-c-pcgxp <none> <none>
ingress-canary-m7z8q 1/1 Running 0 14m 10.131.2.5 ci-ln-w7wib1b-f76d1-rtxkb-worker-inf-fhhr5 <none> <none>
ingress-canary-n6xkv 1/1 Running 0 133m 10.131.0.2 ci-ln-w7wib1b-f76d1-rtxkb-worker-c-sh2fm <none> <none>
ingress-canary-t4tbf 1/1 Running 0 14m 10.128.4.2 ci-ln-w7wib1b-f76d1-rtxkb-worker-inf-r85wt <none> <none>
ingress-canary-v49w5 1/1 Running 0 133m 10.128.2.5 ci-ln-w7wib1b-f76d1-rtxkb-worker-b-lj4vd <none> <none>
oc -n openshift-ingress-canary get daemonset.apps/ingress-canary
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
ingress-canary 6 6 6 6 6 kubernetes.io/os=linux 137m
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (OpenShift Container Platform 4.7.3 bug fix update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.