Bug 1934904
| Summary: | Canary daemonset uses default node selector | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | OpenShift BugZilla Robot <openshift-bugzilla-robot> |
| Component: | Networking | Assignee: | Stephen Greene <sgreene> |
| Networking sub component: | router | QA Contact: | Melvin Joseph <mjoseph> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | | |
| Priority: | high | CC: | aiyengar, amcdermo, aos-bugs, hongli, kbater |
| Version: | 4.7 | | |
| Target Milestone: | --- | | |
| Target Release: | 4.7.z | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-03-25 01:53:00 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1933102 | | |
| Bug Blocks: | | | |

Doc Text:

Cause: The canary daemonset does not specify a node selector.

Consequence: The daemonset inherits the default node selector of the canary namespace (worker nodes only), so it cannot schedule to infra nodes and may in some cases fire alerts.

Fix: Explicitly schedule the canary daemonset to infra nodes and tolerate infra node taints.

Result: The canary daemonset can safely roll out to both worker and infra nodes without issues or alerts.
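The shape of the fix shows up in the daemonset's pod template: a Linux node selector plus an explicit toleration for the infra taint. A minimal sketch of the relevant fields (values taken from the verified daemonset in the transcript below; all surrounding fields elided):

```yaml
# Scheduling-related fields of the ingress-canary daemonset pod template.
# The empty openshift.io/node-selector annotation on the namespace disables
# the default (worker-only) node selector, and the toleration below lets
# the canary pods land on tainted infra nodes.
spec:
  template:
    spec:
      nodeSelector:
        kubernetes.io/os: linux
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        operator: Exists
```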
Description
OpenShift BugZilla Robot
2021-03-04 00:43:13 UTC
Verified in "4.7.0-0.ci.test-2021-03-10-040947-ci-ln-w7wib1b". With this payload, the canary daemonset is now able to spawn pods on infra nodes as well:
-------
oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.7.0-0.ci.test-2021-03-10-040947-ci-ln-w7wib1b True False 10m Cluster version is 4.7.0-0.ci.test-2021-03-10-040947-ci-ln-w7wib1b
Before machineset creation:
$ oc -n openshift-machine-api get machineset
NAME DESIRED CURRENT READY AVAILABLE AGE
ci-ln-w7wib1b-f76d1-rtxkb-worker-b 1 1 1 1 57m
ci-ln-w7wib1b-f76d1-rtxkb-worker-c 1 1 1 1 57m
ci-ln-w7wib1b-f76d1-rtxkb-worker-d 1 1 1 1 57m
$ oc get machines -n openshift-machine-api
NAME PHASE TYPE REGION ZONE AGE
ci-ln-w7wib1b-f76d1-rtxkb-master-0 Running n1-standard-4 us-east1 us-east1-b 57m
ci-ln-w7wib1b-f76d1-rtxkb-master-1 Running n1-standard-4 us-east1 us-east1-c 57m
ci-ln-w7wib1b-f76d1-rtxkb-master-2 Running n1-standard-4 us-east1 us-east1-d 57m
ci-ln-w7wib1b-f76d1-rtxkb-worker-b-lj4vd Running n1-standard-4 us-east1 us-east1-b 48m
ci-ln-w7wib1b-f76d1-rtxkb-worker-c-sh2fm Running n1-standard-4 us-east1 us-east1-c 48m
ci-ln-w7wib1b-f76d1-rtxkb-worker-d-dbhnz Running n1-standard-4 us-east1 us-east1-d 48m
Adding new machinesets:
$ oc create -f ci-machineset-test.yaml
machineset.machine.openshift.io/ci-ln-w7wib1b-f76d1-rtxkb-infra-d created
$ oc -n openshift-machine-api get machineset
NAME DESIRED CURRENT READY AVAILABLE AGE
ci-ln-w7wib1b-f76d1-rtxkb-worker-b 1 1 1 1 132m
ci-ln-w7wib1b-f76d1-rtxkb-worker-c 2 2 2 2 132m
ci-ln-w7wib1b-f76d1-rtxkb-worker-d 1 1 1 1 132m
ci-ln-w7wib1b-f76d1-rtxkb-worker-inf 2 2 2 2 4m2s <---
oc get nodes
NAME STATUS ROLES AGE VERSION
ci-ln-w7wib1b-f76d1-rtxkb-master-0 Ready master 137m v1.20.0+5fbfd19
ci-ln-w7wib1b-f76d1-rtxkb-master-1 Ready master 136m v1.20.0+5fbfd19
ci-ln-w7wib1b-f76d1-rtxkb-master-2 Ready master 136m v1.20.0+5fbfd19
ci-ln-w7wib1b-f76d1-rtxkb-worker-b-lj4vd Ready worker 129m v1.20.0+5fbfd19
ci-ln-w7wib1b-f76d1-rtxkb-worker-c-pcgxp Ready worker 20m v1.20.0+5fbfd19
ci-ln-w7wib1b-f76d1-rtxkb-worker-c-sh2fm Ready worker 130m v1.20.0+5fbfd19
ci-ln-w7wib1b-f76d1-rtxkb-worker-d-dbhnz Ready worker 127m v1.20.0+5fbfd19
ci-ln-w7wib1b-f76d1-rtxkb-worker-inf-fhhr5 Ready infra,worker 11m v1.20.0+5fbfd19
ci-ln-w7wib1b-f76d1-rtxkb-worker-inf-r85wt Ready infra,worker 11m v1.20.0+5fbfd19
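The contents of ci-machineset-test.yaml are not shown in the transcript. A machineset that produces infra-labeled nodes like the ones above would look roughly like the following sketch; the name and every field value here are illustrative assumptions, not the actual file used. (In the transcript the infra taint is applied manually later; a machineset can also set it up front, as shown.)

```yaml
# Hypothetical infra MachineSet sketch -- not the actual ci-machineset-test.yaml.
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: ci-ln-example-infra-b        # illustrative name
  namespace: openshift-machine-api
spec:
  replicas: 2
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-machineset: ci-ln-example-infra-b
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-machineset: ci-ln-example-infra-b
    spec:
      metadata:
        labels:
          node-role.kubernetes.io/infra: ""   # gives the nodes the infra role
      taints:
      - key: node-role.kubernetes.io/infra
        effect: NoSchedule
```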
The canary namespace has the required node-selector annotation, and the daemonset now includes the default toleration for the 'infra' role:
oc get ns openshift-ingress-canary -o yaml
apiVersion: v1
kind: Namespace
metadata:
annotations:
openshift.io/node-selector: "" <-------
openshift.io/sa.scc.mcs: s0:c24,c9
openshift.io/sa.scc.supplemental-groups: 1000570000/10000
openshift.io/sa.scc.uid-range: 1000570000/10000
creationTimestamp: "2021-03-10T04:25:57Z"
managedFields:
- apiVersion: v1
oc -n openshift-ingress-canary get daemonset.apps/ingress-canary -o yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
labels:
ingress.openshift.io/canary: canary_controller
.....
nodeSelector:
kubernetes.io/os: linux
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoSchedule
key: node-role.kubernetes.io/infra
operator: Exists
After tainting the infra nodes, the canary pods continue to run and remain functional on those nodes:
oc adm taint nodes ci-ln-w7wib1b-f76d1-rtxkb-worker-inf-fhhr5 node-role.kubernetes.io/infra:NoSchedule
node/ci-ln-w7wib1b-f76d1-rtxkb-worker-inf-fhhr5 tainted
oc adm taint nodes ci-ln-w7wib1b-f76d1-rtxkb-worker-inf-r85wt node-role.kubernetes.io/infra:NoSchedule
node/ci-ln-w7wib1b-f76d1-rtxkb-worker-inf-r85wt tainted
oc -n openshift-ingress-canary get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
ingress-canary-56j9l 1/1 Running 0 130m 10.129.2.5 ci-ln-w7wib1b-f76d1-rtxkb-worker-d-dbhnz <none> <none>
ingress-canary-892mt 1/1 Running 0 23m 10.130.2.2 ci-ln-w7wib1b-f76d1-rtxkb-worker-c-pcgxp <none> <none>
ingress-canary-m7z8q 1/1 Running 0 14m 10.131.2.5 ci-ln-w7wib1b-f76d1-rtxkb-worker-inf-fhhr5 <none> <none>
ingress-canary-n6xkv 1/1 Running 0 133m 10.131.0.2 ci-ln-w7wib1b-f76d1-rtxkb-worker-c-sh2fm <none> <none>
ingress-canary-t4tbf 1/1 Running 0 14m 10.128.4.2 ci-ln-w7wib1b-f76d1-rtxkb-worker-inf-r85wt <none> <none>
ingress-canary-v49w5 1/1 Running 0 133m 10.128.2.5 ci-ln-w7wib1b-f76d1-rtxkb-worker-b-lj4vd <none> <none>
oc -n openshift-ingress-canary get daemonset.apps/ingress-canary
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
ingress-canary 6 6 6 6 6 kubernetes.io/os=linux 137m
-------
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.7.3 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0821