Bug 1933102 - Canary daemonset uses default node selector
Summary: Canary daemonset uses default node selector
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Routing
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: Stephen Greene
QA Contact: Arvind iyengar
URL:
Whiteboard:
Depends On:
Blocks: 1934904
 
Reported: 2021-02-25 16:50 UTC by Stephen Greene
Modified: 2021-07-30 21:16 UTC
CC List: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The canary daemonset did not specify a node selector. Consequence: The canary daemonset used the default node selector for the canary namespace (worker nodes only), so it could not schedule to infra nodes and in some cases fired alerts. Fix: The canary daemonset now explicitly schedules to infra nodes and tolerates infra node taints. Result: The canary daemonset can safely roll out to worker and infra nodes without issues or firing alerts.
Clone Of:
Environment:
Last Closed: 2021-07-27 22:48:26 UTC
Target Upstream Version:




Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-ingress-operator pull 560 0 None open Bug 1933102: Canary: Override the default node selector for the canary namespace 2021-02-25 16:53:25 UTC
Red Hat Product Errata RHSA-2021:2438 0 None None None 2021-07-27 22:48:51 UTC

Description Stephen Greene 2021-02-25 16:50:20 UTC
The canary daemonset is currently unschedulable on infra nodes because the canary namespace has the default worker-only node selector. This leads to alerts firing about the ingress canary daemonset being unable to completely roll out in some clusters, in edge cases described on CoreOS Slack.

We want the canary daemonset to schedule pods to both worker and infra nodes (since infra nodes typically run monitoring workloads and therefore need to be reachable via routes).

The canary namespace needs to override the default node selector via the `openshift.io/node-selector` annotation.  In addition, the canary daemonset needs to specify a linux node selector as well as infra node taint tolerations. Note that specifying the node selector in the canary daemonset is not sufficient since the cluster-wide default node selector will be AND'd with the daemonset's node selector, and because you can only target one node type with a pod node selector.

These changes need to be backported to 4.7 and no further.
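The two changes described above (the namespace annotation override plus the daemonset's node selector and toleration) can be sketched roughly as follows. This is an illustrative manifest fragment based on the description and the verification output in this bug, not the exact diff from the PR:

```yaml
# 1) Override the cluster-wide default node selector on the canary namespace:
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-ingress-canary
  annotations:
    openshift.io/node-selector: ""   # empty selector overrides the worker-only default
---
# 2) Give the daemonset a linux node selector and an infra taint toleration:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ingress-canary
  namespace: openshift-ingress-canary
spec:
  template:
    spec:
      nodeSelector:
        kubernetes.io/os: linux
      tolerations:
      - key: node-role.kubernetes.io/infra
        operator: Exists
        effect: NoSchedule
```

With the namespace selector emptied, only the daemonset's own `kubernetes.io/os: linux` selector applies, so pods can land on both worker and infra nodes.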

Comment 1 Kirk Bater 2021-03-02 21:05:25 UTC
Is there any way this could get backported to 4.6 as well?  We (SREP) are trying to roll out protections on our infra nodes via the `NoSchedule` taint and as we add that taint to clusters the openshift-ingress-canary is throwing DaemonSetMisScheduled alerts (as one would expect as they are not evicted off of the infra nodes).  We have to support 4.6 until 4.8 goes GA, and getting this protection for infra nodes is becoming more and more important by the day as customers end up overloading their clusters and then customer workloads get scheduled to infra nodes.  Otherwise, I think our only path forward will be to evict this DS off of infra nodes until users upgrade to 4.7, which is less than ideal.
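For context, the taint rollout described in this comment would look something like the following (the node name is a placeholder; `oc adm taint` mirrors `kubectl taint` syntax):

```shell
# Add a NoSchedule taint to an infra node so untolerated workloads stay off it.
# Daemonset pods without a matching toleration (like the pre-fix canary)
# trigger DaemonSetMisScheduled alerts when present on such a node.
oc adm taint nodes <infra-node-name> node-role.kubernetes.io/infra=:NoSchedule
```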

Comment 2 Stephen Greene 2021-03-02 21:07:24 UTC
(In reply to Kirk Bater from comment #1)
> Is there any way this could get backported to 4.6 as well?  We (SREP) are
> trying to roll out protections on our infra nodes via the `NoSchedule` taint
> and as we add that taint to clusters the openshift-ingress-canary is
> throwing DaemonSetMisScheduled alerts (as one would expect as they are not
> evicted off of the infra nodes).  We have to support 4.6 until 4.8 goes GA,
> and getting this protection for infra nodes is becoming more and more
> important by the day as customers end up overloading their clusters and then
> customer workloads get scheduled to infra nodes.  Otherwise, I think our
> only path forward will be to evict this DS off of infra nodes until users
> upgrade to 4.7, which is less than ideal.

The canary daemonset is new in OCP 4.7. There is no canary controller component for the ingress operator in OCP 4.6.

Comment 3 Kirk Bater 2021-03-02 21:10:24 UTC
Welp, that sure explains why we're only seeing this on certain clusters then.

Sorry for the bother, but thank you for explaining.

Comment 4 Stephen Greene 2021-03-02 21:11:25 UTC
(In reply to Kirk Bater from comment #3)
> Welp, that sure explains why we're only seeing this on certain clusters then.
> 
> Sorry for the bother, but thank you for explaining.

No worries. Having the canary daemonset tolerate the infra node taint should be sufficient to resolve the issue in your case, right?

Comment 5 Kirk Bater 2021-03-02 21:31:52 UTC
That's correct.  Thank you.

Comment 7 Arvind iyengar 2021-03-09 10:40:24 UTC
Verified in the "4.8.0-0.nightly-2021-03-05-194645" release version. With this payload, the canary namespace now gets the `openshift.io/node-selector: ""` selector field by default, and the canary daemonset now spawns with the required toleration to deploy pods on nodes with infra roles:
------
$ oc get clusterversion                                                       
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-03-05-194645   True        False         16m     Cluster version is 4.8.0-0.nightly-2021-03-05-194645

new machineset deployed with infra role:

oc -n openshift-machine-api get machineset          
NAME                                        DESIRED   CURRENT   READY   AVAILABLE   AGE
aiyengar-oc480803-qnlt7-infra-us-east-2a    1         1                             17h
aiyengar-oc480803-qnlt7-worker-us-east-2a   1         1         1       1           18h
aiyengar-oc480803-qnlt7-worker-us-east-2b   1         1         1       1           18h
aiyengar-oc480803-qnlt7-worker-us-east-2c   1         1         1       1           18h

oc get nodes                           
NAME                                         STATUS   ROLES          AGE   VERSION
ip-10-0-135-84.us-east-2.compute.internal    Ready    worker         18h   v1.20.0+aa519d9
ip-10-0-142-76.us-east-2.compute.internal    Ready    infra,worker   29m   v1.20.0+aa519d9
ip-10-0-159-110.us-east-2.compute.internal   Ready    master         18h   v1.20.0+aa519d9
ip-10-0-163-38.us-east-2.compute.internal    Ready    worker         18h   v1.20.0+aa519d9
ip-10-0-166-247.us-east-2.compute.internal   Ready    master         18h   v1.20.0+aa519d9
ip-10-0-209-121.us-east-2.compute.internal   Ready    master         18h   v1.20.0+aa519d9
ip-10-0-216-250.us-east-2.compute.internal   Ready    worker         18h   v1.20.0+aa519d9

oc -n openshift-machine-api get machineset      
NAME                                        DESIRED   CURRENT   READY   AVAILABLE   AGE
aiyengar-oc480803-qnlt7-infra-us-east-2a    1         1         1       1           17h
aiyengar-oc480803-qnlt7-worker-us-east-2a   1         1         1       1           18h
aiyengar-oc480803-qnlt7-worker-us-east-2b   1         1         1       1           18h
aiyengar-oc480803-qnlt7-worker-us-east-2c   1         1         1       1           18h


Canary pods get deployed on the infra node, even if it has a "node-role.kubernetes.io/infra:NoSchedule" taint added:

oc -n openshift-ingress-canary get pods -o wide   
NAME                   READY   STATUS    RESTARTS   AGE   IP           NODE                                         NOMINATED NODE   READINESS GATES
ingress-canary-hz4w8   1/1     Running   0          18h   10.131.0.2   ip-10-0-216-250.us-east-2.compute.internal   <none>           <none>
ingress-canary-l4x7q   1/1     Running   0          40m   10.130.2.2   ip-10-0-142-76.us-east-2.compute.internal    <none>           <none>         <---
ingress-canary-njrkk   1/1     Running   0          18h   10.129.2.5   ip-10-0-163-38.us-east-2.compute.internal    <none>           <none>
ingress-canary-rp5bb   1/1     Running   0          18h   10.128.2.5   ip-10-0-135-84.us-east-2.compute.internal    <none>           <none>

Name:               ip-10-0-142-76.us-east-2.compute.internal
Roles:              infra,worker
Labels:             beta.kubernetes.io/arch=amd64
....
CreationTimestamp:  Tue, 09 Mar 2021 10:59:09 +0530
Taints:             node-role.kubernetes.io/infra:NoSchedule <----
Unschedulable:      false


This is because the canary daemonset has the required toleration in place by default:

oc -n openshift-ingress-canary get daemonset/ingress-canary -o yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ingress-canary
  namespace: openshift-ingress-canary
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      ingresscanary.operator.openshift.io/daemonset-ingresscanary: canary_controller
  template:
    metadata:
      creationTimestamp: null
      labels:
        ingresscanary.operator.openshift.io/daemonset-ingresscanary: canary_controller
 ....
      nodeSelector:
        kubernetes.io/os: linux
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      tolerations:  <-----
      - effect: NoSchedule <------
        key: node-role.kubernetes.io/infra  <-----
        operator: Exists <-----


And the namespace has the selector in place:
oc get ns openshift-ingress-canary -o yaml         
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    openshift.io/node-selector: "" <----
    openshift.io/sa.scc.mcs: s0:c24,c14
    openshift.io/sa.scc.supplemental-groups: 1000580000/10000
------

Comment 10 errata-xmlrpc 2021-07-27 22:48:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

Comment 11 Aleksey Usov 2021-07-30 21:16:35 UTC
Wouldn't it be much better to have it tolerate all taints, not just node-role.kubernetes.io/infra? We use a different taint, and as it stands now, the only solution I see is to apply the defaultTolerations annotation to the entire namespace (openshift-ingress-canary). I found this case after we had already set all our taints and tolerations, and I'd rather not change the taint, as I would then have to change all tolerations everywhere across all clusters as well.
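The namespace-level workaround mentioned here relies on the `scheduler.alpha.kubernetes.io/defaultTolerations` annotation, which is honored by the PodTolerationRestriction admission plugin when that plugin is enabled. A hedged sketch (the taint key below is a placeholder for the cluster's custom taint):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-ingress-canary
  annotations:
    # JSON list of tolerations merged into every pod created in this namespace.
    # Replace the key with your cluster's actual custom taint key.
    scheduler.alpha.kubernetes.io/defaultTolerations: >-
      [{"operator": "Exists", "effect": "NoSchedule", "key": "example.com/custom-taint"}]
```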

