Bug 1934904 - Canary daemonset uses default node selector
Summary: Canary daemonset uses default node selector
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Routing
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.7.z
Assignee: Stephen Greene
QA Contact: Arvind iyengar
URL:
Whiteboard:
Depends On: 1933102
Blocks:
 
Reported: 2021-03-04 00:43 UTC by OpenShift BugZilla Robot
Modified: 2021-03-25 01:53 UTC
CC: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The canary daemonset does not specify a node selector. Consequence: The canary daemonset uses the default node selector for the canary namespace (worker nodes only). The canary daemonset cannot schedule to infra nodes and in some cases may throw alerts. Fix: Explicitly schedule the canary daemonset to infra nodes. Tolerate infra node taints. Result: Canary daemonset can safely roll out to worker and infra nodes without issues or throwing alerts.
Clone Of:
Environment:
Last Closed: 2021-03-25 01:53:00 UTC
Target Upstream Version:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-ingress-operator pull 564 0 None open [release-4.7] Bug 1934904: Canary: Schedule canary server pods to worker and infra nodes 2021-03-09 15:06:00 UTC
Red Hat Product Errata RHBA-2021:0821 0 None None None 2021-03-25 01:53:14 UTC

Description OpenShift BugZilla Robot 2021-03-04 00:43:13 UTC
+++ This bug was initially created as a clone of Bug #1933102 +++

The canary daemonset is currently unschedulable on infra nodes since the canary namespace has the default worker-node-only selector. In some clusters this causes alerts to fire about the ingress canary daemonset being unable to completely roll out, in edge cases described in CoreOS Slack.

We want the canary daemonset to schedule pods to both worker and infra nodes (since infra nodes typically run monitoring workloads and therefore need to be reachable via routes).

The canary namespace needs to override the default node selector via the `openshift.io/node-selector` annotation.  In addition, the canary daemonset needs to specify a linux node selector as well as infra node taint tolerations. Note that specifying the node selector in the canary daemonset is not sufficient since the cluster-wide default node selector will be AND'd with the daemonset's node selector, and because you can only target one node type with a pod node selector.
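The fix described above can be sketched as follows. This is an illustrative fragment under the assumptions stated in this bug, not the ingress operator's actual manifests: the empty `openshift.io/node-selector` annotation clears the cluster-wide default selector for the namespace, while the daemonset keeps only an OS node selector and tolerates the infra node taint.

```yaml
# Sketch of the intended configuration; field names are standard
# Kubernetes/OpenShift, but the real manifests are generated and
# managed by the cluster-ingress-operator.
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-ingress-canary
  annotations:
    # An empty value overrides the cluster-wide default node selector,
    # which would otherwise restrict pods to worker nodes only.
    openshift.io/node-selector: ""
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ingress-canary
  namespace: openshift-ingress-canary
spec:
  template:
    spec:
      # A pod node selector can only target one label set, so the
      # daemonset selects on OS only and relies on the namespace
      # annotation above to make infra nodes schedulable.
      nodeSelector:
        kubernetes.io/os: linux
      tolerations:
      # Tolerate the NoSchedule taint that SREs place on infra nodes,
      # so canary pods are neither blocked nor evicted there.
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        operator: Exists
```

These are the same `nodeSelector`, annotation, and toleration values that the verification output in comment 1 later confirms on the live cluster.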

These changes need to be backported to 4.7 and no further.

--- Additional comment from kbater@redhat.com on 2021-03-02 21:05:25 UTC ---

Is there any way this could get backported to 4.6 as well?  We (SREP) are trying to roll out protections on our infra nodes via the `NoSchedule` taint and as we add that taint to clusters the openshift-ingress-canary is throwing DaemonSetMisScheduled alerts (as one would expect as they are not evicted off of the infra nodes).  We have to support 4.6 until 4.8 goes GA, and getting this protection for infra nodes is becoming more and more important by the day as customers end up overloading their clusters and then customer workloads get scheduled to infra nodes.  Otherwise, I think our only path forward will be to evict this DS off of infra nodes until users upgrade to 4.7, which is less than ideal.

--- Additional comment from sgreene@redhat.com on 2021-03-02 21:07:24 UTC ---

(In reply to Kirk Bater from comment #1)
> Is there any way this could get backported to 4.6 as well?  We (SREP) are
> trying to roll out protections on our infra nodes via the `NoSchedule` taint
> and as we add that taint to clusters the openshift-ingress-canary is
> throwing DaemonSetMisScheduled alerts (as one would expect as they are not
> evicted off of the infra nodes).  We have to support 4.6 until 4.8 goes GA,
> and getting this protection for infra nodes is becoming more and more
> important by the day as customers end up overloading their clusters and then
> customer workloads get scheduled to infra nodes.  Otherwise, I think our
> only path forward will be to evict this DS off of infra nodes until users
> upgrade to 4.7, which is less than ideal.

The canary daemonset is new in OCP 4.7. There is no canary controller component for the ingress operator in OCP 4.6.

--- Additional comment from kbater@redhat.com on 2021-03-02 21:10:24 UTC ---

Welp, that sure explains why we're only seeing this on certain clusters then.

Sorry for the bother, but thank you for explaining.

--- Additional comment from sgreene@redhat.com on 2021-03-02 21:11:25 UTC ---

(In reply to Kirk Bater from comment #3)
> Welp, that sure explains why we're only seeing this on certain clusters then.
> 
> Sorry for the bother, but thank you for explaining.

No worries. Having the canary daemonset tolerate the infra node taint should be sufficient to resolve the issue in your case, right?

--- Additional comment from kbater@redhat.com on 2021-03-02 21:31:52 UTC ---

That's correct.  Thank you.

Comment 1 Arvind iyengar 2021-03-10 06:52:32 UTC
Verified in "4.7.0-0.ci.test-2021-03-10-040947-ci-ln-w7wib1b". With this payload, the canary daemonset is now able to spawn pods on infra nodes as well:
-------
oc get clusterversion                    
NAME      VERSION                                           AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.ci.test-2021-03-10-040947-ci-ln-w7wib1b   True        False         10m     Cluster version is 4.7.0-0.ci.test-2021-03-10-040947-ci-ln-w7wib1b

Before machineset creation:
$ oc -n openshift-machine-api get machineset
NAME                                 DESIRED   CURRENT   READY   AVAILABLE   AGE
ci-ln-w7wib1b-f76d1-rtxkb-worker-b   1         1         1       1           57m
ci-ln-w7wib1b-f76d1-rtxkb-worker-c   1         1         1       1           57m
ci-ln-w7wib1b-f76d1-rtxkb-worker-d   1         1         1       1           57m

$ oc get machines -n openshift-machine-api  
NAME                                       PHASE     TYPE            REGION     ZONE         AGE
ci-ln-w7wib1b-f76d1-rtxkb-master-0         Running   n1-standard-4   us-east1   us-east1-b   57m
ci-ln-w7wib1b-f76d1-rtxkb-master-1         Running   n1-standard-4   us-east1   us-east1-c   57m
ci-ln-w7wib1b-f76d1-rtxkb-master-2         Running   n1-standard-4   us-east1   us-east1-d   57m
ci-ln-w7wib1b-f76d1-rtxkb-worker-b-lj4vd   Running   n1-standard-4   us-east1   us-east1-b   48m
ci-ln-w7wib1b-f76d1-rtxkb-worker-c-sh2fm   Running   n1-standard-4   us-east1   us-east1-c   48m
ci-ln-w7wib1b-f76d1-rtxkb-worker-d-dbhnz   Running   n1-standard-4   us-east1   us-east1-d   48m

Adding new machinesets:
$ oc create -f ci-machineset-test.yaml 
machineset.machine.openshift.io/ci-ln-w7wib1b-f76d1-rtxkb-infra-d created

NAME                                   DESIRED   CURRENT   READY   AVAILABLE   AGE
ci-ln-w7wib1b-f76d1-rtxkb-worker-b     1         1         1       1           132m
ci-ln-w7wib1b-f76d1-rtxkb-worker-c     2         2         2       2           132m
ci-ln-w7wib1b-f76d1-rtxkb-worker-d     1         1         1       1           132m
ci-ln-w7wib1b-f76d1-rtxkb-worker-inf   2         2         2       2           4m2s <---

oc get nodes                          
NAME                                         STATUS   ROLES          AGE    VERSION
ci-ln-w7wib1b-f76d1-rtxkb-master-0           Ready    master         137m   v1.20.0+5fbfd19
ci-ln-w7wib1b-f76d1-rtxkb-master-1           Ready    master         136m   v1.20.0+5fbfd19
ci-ln-w7wib1b-f76d1-rtxkb-master-2           Ready    master         136m   v1.20.0+5fbfd19
ci-ln-w7wib1b-f76d1-rtxkb-worker-b-lj4vd     Ready    worker         129m   v1.20.0+5fbfd19
ci-ln-w7wib1b-f76d1-rtxkb-worker-c-pcgxp     Ready    worker         20m    v1.20.0+5fbfd19
ci-ln-w7wib1b-f76d1-rtxkb-worker-c-sh2fm     Ready    worker         130m   v1.20.0+5fbfd19
ci-ln-w7wib1b-f76d1-rtxkb-worker-d-dbhnz     Ready    worker         127m   v1.20.0+5fbfd19
ci-ln-w7wib1b-f76d1-rtxkb-worker-inf-fhhr5   Ready    infra,worker   11m    v1.20.0+5fbfd19
ci-ln-w7wib1b-f76d1-rtxkb-worker-inf-r85wt   Ready    infra,worker   11m    v1.20.0+5fbfd19

The canary namespace has the required annotation, and the daemonset now includes the default toleration for the 'infra' node taint:
oc get ns  openshift-ingress-canary -o yaml 
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    openshift.io/node-selector: ""   <-------
    openshift.io/sa.scc.mcs: s0:c24,c9
    openshift.io/sa.scc.supplemental-groups: 1000570000/10000
    openshift.io/sa.scc.uid-range: 1000570000/10000
  creationTimestamp: "2021-03-10T04:25:57Z"
  managedFields:
  - apiVersion: v1

oc -n openshift-ingress-canary get daemonset.apps/ingress-canary -o yaml       
 apiVersion: apps/v1
 kind: DaemonSet
 metadata:
   labels:
     ingress.openshift.io/canary: canary_controller
 .....
       nodeSelector:
         kubernetes.io/os: linux
       restartPolicy: Always
       schedulerName: default-scheduler
       securityContext: {}
       terminationGracePeriodSeconds: 30
       tolerations:
       - effect: NoSchedule
         key: node-role.kubernetes.io/infra
         operator: Exists


After tainting the infra nodes, the canary pods continue to run and remain functional on those nodes:

oc adm taint nodes ci-ln-w7wib1b-f76d1-rtxkb-worker-inf-fhhr5 node-role.kubernetes.io/infra:NoSchedule
node/ci-ln-w7wib1b-f76d1-rtxkb-worker-inf-fhhr5 tainted

oc adm taint nodes ci-ln-w7wib1b-f76d1-rtxkb-worker-inf-r85wt node-role.kubernetes.io/infra:NoSchedule
node/ci-ln-w7wib1b-f76d1-rtxkb-worker-inf-r85wt tainted

oc -n openshift-ingress-canary get pods -o wide
NAME                   READY   STATUS    RESTARTS   AGE    IP           NODE                                         NOMINATED NODE   READINESS GATES
ingress-canary-56j9l   1/1     Running   0          130m   10.129.2.5   ci-ln-w7wib1b-f76d1-rtxkb-worker-d-dbhnz     <none>           <none>
ingress-canary-892mt   1/1     Running   0          23m    10.130.2.2   ci-ln-w7wib1b-f76d1-rtxkb-worker-c-pcgxp     <none>           <none>
ingress-canary-m7z8q   1/1     Running   0          14m    10.131.2.5   ci-ln-w7wib1b-f76d1-rtxkb-worker-inf-fhhr5   <none>           <none>
ingress-canary-n6xkv   1/1     Running   0          133m   10.131.0.2   ci-ln-w7wib1b-f76d1-rtxkb-worker-c-sh2fm     <none>           <none>
ingress-canary-t4tbf   1/1     Running   0          14m    10.128.4.2   ci-ln-w7wib1b-f76d1-rtxkb-worker-inf-r85wt   <none>           <none>
ingress-canary-v49w5   1/1     Running   0          133m   10.128.2.5   ci-ln-w7wib1b-f76d1-rtxkb-worker-b-lj4vd     <none>           <none>

oc -n openshift-ingress-canary get daemonset.apps/ingress-canary
NAME             DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
ingress-canary   6         6         6       6            6           kubernetes.io/os=linux   137m
-------

Comment 5 errata-xmlrpc 2021-03-25 01:53:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.7.3 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0821

