Bug 1960446
| Summary: | nmstate operator doesn't handle nodes with taints | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Alan Chan <alchan> |
| Component: | Networking | Assignee: | Ben Nemec <bnemec> |
| Networking sub component: | kubernetes-nmstate-operator | QA Contact: | Oleg Sher <osher> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | low | ||
| Priority: | low | CC: | aos-bugs, eparis, imelofer, jokerman, sdasu, tsedovic, welin |
| Version: | 4.7 | Keywords: | Triaged |
| Target Milestone: | --- | ||
| Target Release: | 4.8.z | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | Cause: Incorrect toleration setting on the nmstate-handler pod. Consequence: Handler pods could not be deployed on infra nodes with NoSchedule taints, which made it impossible to configure networking on such nodes with the nmstate-operator. Fix: The handler pod toleration was changed to allow deployment on all nodes. Result: The nmstate-operator can now be used to configure networking on all nodes, regardless of taints. | Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2021-08-10 11:27:36 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1970127 | ||
Hello everyone, we also face the same issue: NMState cannot handle nodes with taints.

Actual results:

```
nmstate-handler-rwvqd
0/8 nodes are available: 3 node(s) had taint {node.ocs.openshift.io/storage: true}, that the pod didn't tolerate, 5 node(s) didn't match Pod's node affinity.
```

Is there any plan to solve this issue? Our OpenShift version is 4.7.11, and the NMState Operator version is 4.7.0-202104250659.p0.

A fix merged upstream recently: https://github.com/nmstate/kubernetes-nmstate/pull/755. That will need to be pulled in downstream and backported to 4.7.

The current nmstate version installed from OperatorHub is kubernetes-nmstate-operator.v4.7.0 (Kubernetes NMState Operator 4.7.0-202103010125.p0), but the bug was opened for:

```
$ oc -n openshift-nmstate get csv | grep nmstate
kubernetes-nmstate-operator.v4.7.0   Kubernetes NMState Operator   4.7.0-202104250659.p0
```

so the fix can't be verified.

*** Bug 1977577 has been marked as a duplicate of this bug. ***

The pull request linked to this BZ has been merged in the 4.8 branch, so it is fixed in 4.9 and 4.8. We still need a backport to 4.7, which is also linked in this BZ.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.4 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2983
Description of problem:

The default tolerations of the nmstate-handler daemonset don't allow it to deploy to nodes with taints:

```
$ oc get ds nmstate-handler -oyaml | yq e '.spec.template.spec.tolerations' -
- effect: NoSchedule
  key: node-role.kubernetes.io/master
  operator: Exists
```

Version-Release number of selected component (if applicable):

```
$ oc version
Client Version: 4.7.9
Server Version: 4.7.9
Kubernetes Version: v1.20.0+7d0a2b2

$ oc -n openshift-nmstate get csv | grep nmstate
kubernetes-nmstate-operator.v4.7.0   Kubernetes NMState Operator   4.7.0-202104250659.p0   Succeeded
```

How reproducible:

1. Have nodes with taints, then deploy the nmstate operator and create an NMState instance.
2. Check which nodes the handler pods get deployed to.

```
$ oc get nodes -o custom-columns=NODE:.metadata.name,TAINTS:.spec.taints
NODE                                         TAINTS
ip-10-0-135-5.us-east-2.compute.internal     <none>
ip-10-0-139-144.us-east-2.compute.internal   [map[effect:NoSchedule key:node-role.kubernetes.io/master]]
ip-10-0-145-42.us-east-2.compute.internal    [map[effect:NoSchedule key:infra value:reserved] map[effect:NoExecute key:infra value:reserved]]
ip-10-0-152-162.us-east-2.compute.internal   [map[effect:NoSchedule key:node.ocs.openshift.io/storage value:true]]
ip-10-0-165-116.us-east-2.compute.internal   [map[effect:NoSchedule key:node-role.kubernetes.io/master]]
ip-10-0-170-249.us-east-2.compute.internal   <none>
ip-10-0-171-65.us-east-2.compute.internal    [map[effect:NoSchedule key:infra value:reserved] map[effect:NoExecute key:infra value:reserved]]
ip-10-0-178-136.us-east-2.compute.internal   [map[effect:NoSchedule key:node.ocs.openshift.io/storage value:true]]
ip-10-0-196-0.us-east-2.compute.internal     [map[effect:NoSchedule key:infra value:reserved] map[effect:NoExecute key:infra value:reserved]]
ip-10-0-202-245.us-east-2.compute.internal   <none>
ip-10-0-207-208.us-east-2.compute.internal   [map[effect:NoSchedule key:node.ocs.openshift.io/storage value:true]]
ip-10-0-218-204.us-east-2.compute.internal   [map[effect:NoSchedule key:node-role.kubernetes.io/master]]

$ oc -n openshift-nmstate get pod -l name=nmstate-handler -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName --sort-by='{.spec.nodeName}'
NAME                    NODE
nmstate-handler-77kzd   ip-10-0-135-5.us-east-2.compute.internal
nmstate-handler-jb6cb   ip-10-0-139-144.us-east-2.compute.internal
nmstate-handler-nwpxt   ip-10-0-165-116.us-east-2.compute.internal
nmstate-handler-6cxl6   ip-10-0-170-249.us-east-2.compute.internal
nmstate-handler-tv7xv   ip-10-0-202-245.us-east-2.compute.internal
nmstate-handler-78nsx   ip-10-0-218-204.us-east-2.compute.internal
```

There are only 6 pods running, but there are 12 nodes total. The nmstate-handler pods only run on the master nodes and on the worker nodes with no taints.

Expected results:

The operator needs to better handle nodes with taints. It may need to expose a tolerations API in the NMState kind resource. Something like this somewhere would tolerate all taints:

```
tolerations:
- operator: "Exists"
```
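Until a build with the fix is available, one possible workaround is to merge-patch the handler DaemonSet's pod template with the tolerate-all toleration suggested above. This is a sketch, not a supported procedure: the namespace and DaemonSet name are taken from the output above, and the operator may reconcile a manual edit back to its original spec.

```yaml
# Illustrative merge-patch body, applied with something like:
#   oc -n openshift-nmstate patch ds nmstate-handler --type merge -p "$(cat patch.yaml)"
# A toleration with no key and operator "Exists" matches every taint.
# Caveat: the nmstate operator may revert manual changes to its DaemonSet.
spec:
  template:
    spec:
      tolerations:
      - operator: "Exists"
```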