Bug 1934904
Summary: | Canary daemonset uses default node selector | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | OpenShift BugZilla Robot <openshift-bugzilla-robot> |
Component: | Networking | Assignee: | Stephen Greene <sgreene> |
Networking sub component: | router | QA Contact: | Melvin Joseph <mjoseph> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | high | ||
Priority: | high | CC: | aiyengar, amcdermo, aos-bugs, hongli, kbater |
Version: | 4.7 | ||
Target Milestone: | --- | ||
Target Release: | 4.7.z | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Cause:
The canary daemonset does not specify a node selector.
Consequence:
The canary daemonset falls back to the default node selector for the canary namespace (worker nodes only). As a result, the canary daemonset cannot schedule onto infra nodes and in some cases may fire alerts.
Fix:
Explicitly schedule the canary daemonset to infra nodes.
Tolerate infra node taints.
Result:
The canary daemonset can safely roll out to both worker and infra nodes without scheduling issues or firing alerts.
|
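The fix described above amounts to two settings: an empty node-selector annotation on the canary namespace (so the cluster-wide "worker only" default no longer applies) and a toleration for the infra taint in the daemonset's pod template. A minimal sketch of the relevant fields, with values taken from the verification output in the comment below (this is an illustrative fragment, not the exact manifest shipped by the ingress operator):

```yaml
# Namespace: an empty openshift.io/node-selector annotation overrides
# the cluster default node selector for this namespace.
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-ingress-canary
  annotations:
    openshift.io/node-selector: ""
---
# DaemonSet pod template (fragment): schedule on any Linux node and
# tolerate the NoSchedule taint carried by infra nodes.
spec:
  template:
    spec:
      nodeSelector:
        kubernetes.io/os: linux
      tolerations:
      - key: node-role.kubernetes.io/infra
        operator: Exists
        effect: NoSchedule
```

With these two settings in place, the daemonset controller creates a canary pod on every Linux node, including tainted infra nodes, instead of being restricted to workers.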
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2021-03-25 01:53:00 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1933102 | ||
Bug Blocks: |
Description
OpenShift BugZilla Robot
2021-03-04 00:43:13 UTC
Verified in "4.7.0-0.ci.test-2021-03-10-040947-ci-ln-w7wib1b". With this payload, the canary daemonset is now able to schedule pods on infra nodes as well:

```
$ oc get clusterversion
NAME      VERSION                                           AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.ci.test-2021-03-10-040947-ci-ln-w7wib1b   True        False         10m     Cluster version is 4.7.0-0.ci.test-2021-03-10-040947-ci-ln-w7wib1b
```

Before machineset creation:

```
$ oc -n openshift-machine-api get machineset
NAME                                 DESIRED   CURRENT   READY   AVAILABLE   AGE
ci-ln-w7wib1b-f76d1-rtxkb-worker-b   1         1         1       1           57m
ci-ln-w7wib1b-f76d1-rtxkb-worker-c   1         1         1       1           57m
ci-ln-w7wib1b-f76d1-rtxkb-worker-d   1         1         1       1           57m

$ oc get machines -n openshift-machine-api
NAME                                       PHASE     TYPE            REGION     ZONE         AGE
ci-ln-w7wib1b-f76d1-rtxkb-master-0         Running   n1-standard-4   us-east1   us-east1-b   57m
ci-ln-w7wib1b-f76d1-rtxkb-master-1         Running   n1-standard-4   us-east1   us-east1-c   57m
ci-ln-w7wib1b-f76d1-rtxkb-master-2         Running   n1-standard-4   us-east1   us-east1-d   57m
ci-ln-w7wib1b-f76d1-rtxkb-worker-b-lj4vd   Running   n1-standard-4   us-east1   us-east1-b   48m
ci-ln-w7wib1b-f76d1-rtxkb-worker-c-sh2fm   Running   n1-standard-4   us-east1   us-east1-c   48m
ci-ln-w7wib1b-f76d1-rtxkb-worker-d-dbhnz   Running   n1-standard-4   us-east1   us-east1-d   48m
```

Adding new machinesets:

```
$ oc create -f ci-machineset-test.yaml
machineset.machine.openshift.io/ci-ln-w7wib1b-f76d1-rtxkb-infra-d created

$ oc -n openshift-machine-api get machineset
NAME                                   DESIRED   CURRENT   READY   AVAILABLE   AGE
ci-ln-w7wib1b-f76d1-rtxkb-worker-b     1         1         1       1           132m
ci-ln-w7wib1b-f76d1-rtxkb-worker-c     2         2         2       2           132m
ci-ln-w7wib1b-f76d1-rtxkb-worker-d     1         1         1       1           132m
ci-ln-w7wib1b-f76d1-rtxkb-worker-inf   2         2         2       2           4m2s   <---

$ oc get nodes
NAME                                         STATUS   ROLES          AGE    VERSION
ci-ln-w7wib1b-f76d1-rtxkb-master-0           Ready    master         137m   v1.20.0+5fbfd19
ci-ln-w7wib1b-f76d1-rtxkb-master-1           Ready    master         136m   v1.20.0+5fbfd19
ci-ln-w7wib1b-f76d1-rtxkb-master-2           Ready    master         136m   v1.20.0+5fbfd19
ci-ln-w7wib1b-f76d1-rtxkb-worker-b-lj4vd     Ready    worker         129m   v1.20.0+5fbfd19
ci-ln-w7wib1b-f76d1-rtxkb-worker-c-pcgxp     Ready    worker         20m    v1.20.0+5fbfd19
ci-ln-w7wib1b-f76d1-rtxkb-worker-c-sh2fm     Ready    worker         130m   v1.20.0+5fbfd19
ci-ln-w7wib1b-f76d1-rtxkb-worker-d-dbhnz     Ready    worker         127m   v1.20.0+5fbfd19
ci-ln-w7wib1b-f76d1-rtxkb-worker-inf-fhhr5   Ready    infra,worker   11m    v1.20.0+5fbfd19
ci-ln-w7wib1b-f76d1-rtxkb-worker-inf-r85wt   Ready    infra,worker   11m    v1.20.0+5fbfd19
```

The canary namespace has the required node-selector annotation, and the daemonset now includes the default tolerations for the 'infra' role:

```
$ oc get ns openshift-ingress-canary -o yaml
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    openshift.io/node-selector: ""          <-------
    openshift.io/sa.scc.mcs: s0:c24,c9
    openshift.io/sa.scc.supplemental-groups: 1000570000/10000
    openshift.io/sa.scc.uid-range: 1000570000/10000
  creationTimestamp: "2021-03-10T04:25:57Z"
  managedFields:
  - apiVersion: v1

$ oc -n openshift-ingress-canary get daemonset.apps/ingress-canary -o yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    ingress.openshift.io/canary: canary_controller
.....
      nodeSelector:
        kubernetes.io/os: linux
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        operator: Exists
```

After tainting the infra nodes, the canary pods continue to remain up and functional on those nodes:

```
$ oc adm taint nodes ci-ln-w7wib1b-f76d1-rtxkb-worker-inf-fhhr5 node-role.kubernetes.io/infra:NoSchedule
node/ci-ln-w7wib1b-f76d1-rtxkb-worker-inf-fhhr5 tainted

$ oc adm taint nodes ci-ln-w7wib1b-f76d1-rtxkb-worker-inf-r85wt node-role.kubernetes.io/infra:NoSchedule
node/ci-ln-w7wib1b-f76d1-rtxkb-worker-inf-r85wt tainted

$ oc -n openshift-ingress-canary get pods -o wide
NAME                   READY   STATUS    RESTARTS   AGE    IP           NODE                                         NOMINATED NODE   READINESS GATES
ingress-canary-56j9l   1/1     Running   0          130m   10.129.2.5   ci-ln-w7wib1b-f76d1-rtxkb-worker-d-dbhnz     <none>           <none>
ingress-canary-892mt   1/1     Running   0          23m    10.130.2.2   ci-ln-w7wib1b-f76d1-rtxkb-worker-c-pcgxp     <none>           <none>
ingress-canary-m7z8q   1/1     Running   0          14m    10.131.2.5   ci-ln-w7wib1b-f76d1-rtxkb-worker-inf-fhhr5   <none>           <none>
ingress-canary-n6xkv   1/1     Running   0          133m   10.131.0.2   ci-ln-w7wib1b-f76d1-rtxkb-worker-c-sh2fm     <none>           <none>
ingress-canary-t4tbf   1/1     Running   0          14m    10.128.4.2   ci-ln-w7wib1b-f76d1-rtxkb-worker-inf-r85wt   <none>           <none>
ingress-canary-v49w5   1/1     Running   0          133m   10.128.2.5   ci-ln-w7wib1b-f76d1-rtxkb-worker-b-lj4vd     <none>           <none>

$ oc -n openshift-ingress-canary get daemonset.apps/ingress-canary
NAME             DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
ingress-canary   6         6         6       6            6           kubernetes.io/os=linux   137m
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.7.3 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0821