Description of problem:
By default, the nodeSelector of the nmstate-handler DaemonSet is "beta.kubernetes.io/arch=amd64". This also matches Windows nodes added to the cluster, so nmstate-handler pods are scheduled on Windows nodes as well and remain in "Pending" status.

If I understand the code correctly, NodesRunningNmstate is calculated by comparing the output of "get nodes --selector=beta.kubernetes.io/arch=amd64" against pod.Spec.NodeName from "get pods --selector 'component=kubernetes-nmstate-handler'". This count includes Windows nodes, since nmstate-handler is scheduled on them even though the pods stay Pending. So when any NNCP is created, it waits for an NNCE to be created on the Windows nodes as well. Since nmstate-handler is always Pending on Windows nodes, the NNCE is never created, and the NNCP remains in "ConfigurationProgressing" state forever:

~~~
  message: Policy is progressing 7/9 nodes finished
  reason: ConfigurationProgressing
  status: Unknown
  type: Available
~~~

Here, the two unfinished nodes are Windows nodes.

Version-Release number of selected component (if applicable):
v2.6.6

How reproducible:
100%

Steps to Reproduce:
1. Create an NNCP on an OpenShift cluster that has Windows nodes.

Actual results:
The NNCP stays in progressing state forever when the cluster has Windows nodes.

Expected results:
We probably also have to add "beta.kubernetes.io/os: linux" to the nodeSelector of the nmstate-handler DaemonSet, since nmstate will not work on Windows nodes.

Additional info:
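The fix suggested above would look roughly like this in the DaemonSet spec. This is a minimal sketch, not the actual shipped manifest; the metadata and the existing label values are taken from the report, everything else is illustrative:

```yaml
# Hypothetical excerpt of the nmstate-handler DaemonSet pod template.
# Adding the os selector keeps the handler off Windows nodes.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nmstate-handler
spec:
  template:
    spec:
      nodeSelector:
        beta.kubernetes.io/arch: amd64
        kubernetes.io/os: linux   # proposed addition: never schedule on Windows nodes
```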
Thanks for filing this bug, Nijin. We do not support OpenShift Virtualization on Windows workers. virt-handler has kubernetes.io/os=linux as its nodeSelector. It makes sense to add it to the network DaemonSets: bridge-marker, kube-cni-linux-bridge-plugin, nmstate-handler, ovs-cni-amd64.
(In reply to Dan Kenigsberg from comment #1)
> Thanks for filing this bug, Nijin. We do not support OpenShift
> Virtualization on Windows workers.

Thank you, Dan. Yes, but I think we should be able to ignore Windows worker nodes if they are added to the same cluster.

> virt-handler has kubernetes.io/os=linux
> as its nodeSelector. It makes sense to add it to the network DaemonSets:
> bridge-marker, kube-cni-linux-bridge-plugin, nmstate-handler, ovs-cni-amd64.

I think that will help here.
The network DaemonSets still hold the old nodeSelector: "beta.kubernetes.io/arch=amd64".
It seems that https://github.com/nmstate/kubernetes-nmstate/pull/856 did not fix the issue. This may be because CNAO overwrites the placement configuration: https://github.com/kubevirt/cluster-network-addons-operator/blob/8d0037553962ff72226a817036214b6017fcce20/data/nmstate/operand/operator.yaml#L28.
Grooming: Meni raised that a better and more explicit approach would be to fail the NNCP if its selector matches unsupported (Windows) nodes, instead of silently ignoring them.
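Until something like that lands, a user-side mitigation under the current behaviour is to scope the NNCP's own nodeSelector to Linux nodes so Windows nodes are never matched. A sketch; the policy name and bridge interface are illustrative, only the spec.nodeSelector field is the point:

```yaml
apiVersion: nmstate.io/v1beta1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: br1-policy             # illustrative name
spec:
  nodeSelector:
    kubernetes.io/os: linux    # restrict the policy to Linux nodes only
  desiredState:
    interfaces:
      - name: br1              # illustrative linux-bridge interface
        type: linux-bridge
        state: up
```

With this selector, no NNCE is expected for Windows nodes, so the policy can reach Available.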
We need to set linux as the default placement configuration in CNAO: https://github.com/kubevirt/cluster-network-addons-operator/blob/main/pkg/network/placement_configuration.go#L54-L56
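Besides changing the built-in default, the placement can also be overridden per cluster through the NetworkAddonsConfig placementConfiguration field. A sketch, assuming the CNAO API field names are unchanged from the version linked above:

```yaml
# Hypothetical NetworkAddonsConfig overriding operand placement so the
# workload DaemonSets (including nmstate-handler) stay on Linux nodes.
apiVersion: networkaddonsoperator.network.kubevirt.io/v1
kind: NetworkAddonsConfig
metadata:
  name: cluster
spec:
  nmstate: {}
  placementConfiguration:
    workloads:
      nodeSelector:
        kubernetes.io/os: linux   # keep operand pods off Windows nodes
```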
Looks like we already did that in CNAO, just for nmstate: https://github.com/kubevirt/cluster-network-addons-operator/pull/1124. Are we sure we are testing it?
Checking an OCP 4.10 cluster, it looks like "os: linux" is there:

  nodeSelector:
    beta.kubernetes.io/arch: amd64
    kubernetes.io/os: linux

I think we can close this bz.
Verified on OCP 4.10, k-nmstate-handler-4.10.0-41. Same results as Quique:

  nodeSelector:
    beta.kubernetes.io/arch: amd64
    kubernetes.io/os: linux

Nodes hold the kubernetes.io/os: linux label.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Virtualization 4.10.0 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0947