Description of problem:
When creating multiple SriovNetworkNodePolicy resources with the following steps, one worker node is left marked as draining and VF initialization is blocked:
- Create two YAML files with policy definitions:
  file1.yaml: policies for worker1 nic1 and worker2 nic1
  file2.yaml: policies for worker1 nic2 and worker2 nic2
- Apply file1.yaml
- Wait until worker1 starts rebooting
- Apply file2.yaml
- Wait until worker1 comes back up

Version-Release number of selected component (if applicable):
4.10.0-fc.1
Red Hat Enterprise Linux CoreOS 410.84.202201122058-0
4.18.0-305.30.1.el8_4.x86_64
cri-o://1.23.0-100.rhaos4.10.git77d20b2.el8
SR-IOV operator version: 4.10.0-202201181018

How reproducible:

Steps to Reproduce:
1. Set up a cluster with the SR-IOV operator installed.
2. Make sure there are two workers with supported SR-IOV NICs.
3. Create the following files:

# cat mlx277-rdma
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: mlx277-dpdk
  namespace: openshift-sriov-network-operator
spec:
  mtu: 1500
  nicSelector:
    pfNames:
    - ens2f1
    vendor: '15b3'
    deviceID: '1015'
  nodeSelector:
    feature.node.kubernetes.io/sriov-capable: 'true'
  numVfs: 3
  isRdma: true
  resourceName: mlx277dpdk

# cat mlx278-rdma
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: mlx278-dpdk
  namespace: openshift-sriov-network-operator
spec:
  mtu: 1550
  nicSelector:
    pfNames:
    - ens3f1
    rootDevices:
    - '0000:5e:00.1'
    vendor: '15b3'
  nodeSelector:
    feature.node.kubernetes.io/sriov-capable: 'true'
  numVfs: 3
  isRdma: true
  resourceName: mlx278dpdk

4. Create the first policy:
# oc create -f mlx277-rdma
5. Watch the node status; when the worker starts rebooting, apply the second YAML file:

# oc get node
NAME                                      STATUS                        ROLES    AGE   VERSION
dell-per740-13.rhts.eng.pek2.redhat.com   Ready                         master   57d   v1.23.0+50f645e
dell-per740-14.rhts.eng.pek2.redhat.com   NotReady,SchedulingDisabled   worker   56d   v1.23.0+50f645e
dell-per740-31.rhts.eng.pek2.redhat.com   Ready                         master   57d   v1.23.0+50f645e
dell-per740-32.rhts.eng.pek2.redhat.com   Ready                         master   57d   v1.23.0+50f645e
dell-per740-35.rhts.eng.pek2.redhat.com   Ready                         worker   56d   v1.23.0+50f645e

### Check that the worker above is rebooting, then create the second policy:
# oc create -f mlx278-rdma

After that, one node is left marked as `SchedulingDisabled`:

# oc get node
NAME                                      STATUS                     ROLES    AGE   VERSION
dell-per740-13.rhts.eng.pek2.redhat.com   Ready,SchedulingDisabled   master   57d   v1.23.0+50f645e
dell-per740-14.rhts.eng.pek2.redhat.com   Ready                      worker   57d   v1.23.0+50f645e
dell-per740-31.rhts.eng.pek2.redhat.com   Ready                      master   57d   v1.23.0+50f645e
dell-per740-32.rhts.eng.pek2.redhat.com   Ready                      master   57d   v1.23.0+50f645e
dell-per740-35.rhts.eng.pek2.redhat.com   Ready                      worker   57d   v1.23.0+50f645e

Actual results:
The sriov-config-daemon sync is stuck in progress; see the detailed logs in Additional info.

# oc get sriovnetworknodestates.sriovnetwork.openshift.io dell-per740-14.rhts.eng.pek2.redhat.com -o yaml
...
      mtu: 1500
      name: ens2f1
      numVfs: 2
      pciAddress: 0000:60:00.1
      totalvfs: 2
      vendor: 15b3
  syncStatus: InProgress

Expected results:
Both policies are applied successfully: the drained node returns to Ready and syncStatus reaches Succeeded on all nodes.

Additional info:
Detailed SR-IOV logs: http://file.apac.redhat.com/~zzhao/sriovlog.tar.gz
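To hit the race window consistently, the second policy has to be applied exactly while the first node is cordoned for drain. A minimal sketch of how that timing could be scripted; `node_is_cordoned` is a hypothetical helper (not part of the operator) that only parses `oc get node`-style output, so the live-cluster usage is shown in comments:

```shell
# Hypothetical helper: succeeds (exit 0) when the named node's STATUS
# column contains SchedulingDisabled in `oc get node` output read from stdin.
node_is_cordoned() {
  local node="$1"
  awk -v n="$node" '$1 == n && $2 ~ /SchedulingDisabled/ { found = 1 }
                    END { exit !found }'
}

# Usage against a live cluster (requires oc and a configured kubeconfig):
#   until oc get node | node_is_cordoned <draining-worker>; do sleep 5; done
#   oc create -f mlx278-rdma
```

The helper keys off the STATUS column rather than node conditions, which is enough for a reproduction script; a robust check would use `oc get node -o jsonpath` on `.spec.unschedulable` instead.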
The fix https://github.com/openshift/sriov-network-operator/commit/d48291194e861bcbcb575b9884d6a0a7a615d461 has been merged via https://github.com/openshift/sriov-network-operator/pull/620
Moving this to VERIFIED so that the fix can be backported.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069