Description of problem:
With its default tolerations, the nmstate-handler daemonset doesn't deploy to nodes that have taints:

$ oc get ds nmstate-handler -oyaml | yq e '.spec.template.spec.tolerations' -
- effect: NoSchedule
  key: node-role.kubernetes.io/master
  operator: Exists

Version-Release number of selected component (if applicable):
$ oc version
Client Version: 4.7.9
Server Version: 4.7.9
Kubernetes Version: v1.20.0+7d0a2b2

$ oc -n openshift-nmstate get csv | grep nmstate
kubernetes-nmstate-operator.v4.7.0   Kubernetes NMState Operator   4.7.0-202104250659.p0   Succeeded

Steps to Reproduce:
1. Have nodes with taints, then deploy the nmstate operator and create an NMState instance.
2. Check which nodes the nmstate-handler pods get deployed to.

$ oc get nodes -o custom-columns=NODE:.metadata.name,TAINTS:.spec.taints
NODE                                         TAINTS
ip-10-0-135-5.us-east-2.compute.internal     <none>
ip-10-0-139-144.us-east-2.compute.internal   [map[effect:NoSchedule key:node-role.kubernetes.io/master]]
ip-10-0-145-42.us-east-2.compute.internal    [map[effect:NoSchedule key:infra value:reserved] map[effect:NoExecute key:infra value:reserved]]
ip-10-0-152-162.us-east-2.compute.internal   [map[effect:NoSchedule key:node.ocs.openshift.io/storage value:true]]
ip-10-0-165-116.us-east-2.compute.internal   [map[effect:NoSchedule key:node-role.kubernetes.io/master]]
ip-10-0-170-249.us-east-2.compute.internal   <none>
ip-10-0-171-65.us-east-2.compute.internal    [map[effect:NoSchedule key:infra value:reserved] map[effect:NoExecute key:infra value:reserved]]
ip-10-0-178-136.us-east-2.compute.internal   [map[effect:NoSchedule key:node.ocs.openshift.io/storage value:true]]
ip-10-0-196-0.us-east-2.compute.internal     [map[effect:NoSchedule key:infra value:reserved] map[effect:NoExecute key:infra value:reserved]]
ip-10-0-202-245.us-east-2.compute.internal   <none>
ip-10-0-207-208.us-east-2.compute.internal   [map[effect:NoSchedule key:node.ocs.openshift.io/storage value:true]]
ip-10-0-218-204.us-east-2.compute.internal   [map[effect:NoSchedule key:node-role.kubernetes.io/master]]

$ oc -n openshift-nmstate get pod -l name=nmstate-handler -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName --sort-by='{.spec.nodeName}'
NAME                    NODE
nmstate-handler-77kzd   ip-10-0-135-5.us-east-2.compute.internal
nmstate-handler-jb6cb   ip-10-0-139-144.us-east-2.compute.internal
nmstate-handler-nwpxt   ip-10-0-165-116.us-east-2.compute.internal
nmstate-handler-6cxl6   ip-10-0-170-249.us-east-2.compute.internal
nmstate-handler-tv7xv   ip-10-0-202-245.us-east-2.compute.internal
nmstate-handler-78nsx   ip-10-0-218-204.us-east-2.compute.internal

Actual results:
Only 6 pods are running, but there are 12 nodes in total. The nmstate-handler pods run only on the master nodes and on the worker nodes without taints.

Expected results:
The daemonset needs to handle tainted nodes better. One option is to add a tolerations API to the NMState kind resource, with something like this to tolerate all taints:

tolerations:
- operator: "Exists"
Hello everyone, we are also facing the same issue: NMState cannot handle nodes with taints!

Actual results:
nmstate-handler-rwvqd   0/8 nodes are available: 3 node(s) had taint {node.ocs.openshift.io/storage: true}, that the pod didn't tolerate, 5 node(s) didn't match Pod's node affinity.

Is there any plan to solve this issue?
Our OpenShift version is 4.7.11, and the NMState Operator version is 4.7.0-202104250659.p0.
A fix was merged upstream recently: https://github.com/nmstate/kubernetes-nmstate/pull/755
It will need to be pulled in downstream and backported to 4.7.
The current nmstate version installed from OperatorHub is:
kubernetes-nmstate-operator.v4.7.0   Kubernetes NMState Operator   4.7.0-202103010125.p0

but the bug was opened against:
$ oc -n openshift-nmstate get csv | grep nmstate
kubernetes-nmstate-operator.v4.7.0   Kubernetes NMState Operator   4.7.0-202104250659.p0

so the fix can't be verified.
*** Bug 1977577 has been marked as a duplicate of this bug. ***
The pull request linked to this BZ has been merged into the 4.8 branch, so it's fixed in 4.9 and 4.8. We still need a backport to 4.7, which is also linked in this BZ.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.4 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2983