@zzhao I am unable to reproduce this on my setup for 4.8.

[root@wsfd-advnetlab50 sriov-network-operator]# cat policy-mlx.yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy-mlx
  namespace: openshift-sriov-network-operator
spec:
  deviceType: netdevice
  nicSelector:
    deviceID: "1019"
    rootDevices:
    - 0000:d8:00.0
    vendor: "15b3"
    pfNames:
    - ens8f0
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  numVfs: 6
  priority: 5
  resourceName: mlxnics

[root@wsfd-advnetlab50 sriov-network-operator]# cat policy-mlx-2.yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy-mlx-2
  namespace: openshift-sriov-network-operator
spec:
  deviceType: netdevice
  mtu: 1100
  nicSelector:
    deviceID: "1019"
    rootDevices:
    - 0000:d8:00.0
    vendor: "15b3"
    pfNames:
    - ens8f0
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  numVfs: 6
  priority: 5
  resourceName: mlxnics

When I create these two policies, the webhook already stops me from doing this:

[root@wsfd-advnetlab50 sriov-network-operator]# oc create -f policy-mlx-2.yaml
Error from server (VF index range in ens8f0 is overlapped with existing policy policy-mlx): error when creating "policy-mlx-2.yaml": admission webhook "operator-webhook.sriovnetwork.openshift.io" denied the request: VF index range in ens8f0 is overlapped with existing policy policy-mlx

Can you please check how to reproduce this?
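The denial is consistent with the manifests: neither policy sets an explicit VF range, so each policy with `numVfs: 6` implicitly claims VF indices 0-5 on ens8f0, and the two ranges intersect. A rough sketch of that kind of interval check (illustrative only, not the operator's actual webhook code):

```python
# Illustrative sketch: each policy claims an inclusive (start, end) VF index
# range on a PF; a webhook-style validation rejects intersecting ranges.
def ranges_overlap(a, b):
    """Return True if inclusive ranges a and b share any VF index."""
    return a[0] <= b[1] and b[0] <= a[1]

# Both policies above select ens8f0 with numVfs: 6 and no explicit range,
# so each implicitly covers indices 0-5 and the second one is rejected.
print(ranges_overlap((0, 5), (0, 5)))  # True
```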
Balazs Nemeth, we need to disable the webhook first, as the first step above mentions:

1. Disable the webhook by editing sriovoperatorconfigs.sriovnetwork.openshift.io to set `enableOperatorWebhook: false`.
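For reference, a sketch of what the edited resource would look like. The field name comes from the comment above; the `default` object name is an assumption (it is the operator's usual singleton config):

```yaml
# Assumed singleton SriovOperatorConfig with the webhook disabled
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovOperatorConfig
metadata:
  name: default
  namespace: openshift-sriov-network-operator
spec:
  enableOperatorWebhook: false
```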
Verified this bug on 4.8.0-202207180915.

# oc get csv -n openshift-sriov-network-operator
NAME                                        DISPLAY                   VERSION              REPLACES                                    PHASE
sriov-network-operator.4.8.0-202207180915   SR-IOV Network Operator   4.8.0-202207180915   sriov-network-operator.4.8.0-202207071636   Succeeded

Steps:

1. Disable the webhook by editing sriovoperatorconfigs.sriovnetwork.openshift.io to set `enableOperatorWebhook: false`.

2. Create two policies with the same PF, e.g.:

# cat intel-dpdk.yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: intel-dpdk
  namespace: openshift-sriov-network-operator
spec:
  deviceType: vfio-pci
  mtu: 1700
  nicSelector:
    deviceID: "158b"
    pfNames:
    - ens1f1
    rootDevices:
    - '0000:3b:00.1'
    vendor: '8086'
  nodeSelector:
    feature.node.kubernetes.io/sriov-capable: 'true'
  numVfs: 2
  priority: 99
  resourceName: inteldpdk

[root@dell-per740-36 rhcos]# cat intel-dpdk.yaml_2
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: intel-dpdk3
  namespace: openshift-sriov-network-operator
spec:
  deviceType: vfio-pci
  nicSelector:
    deviceID: "158b"
    pfNames:
    - ens1f1
    rootDevices:
    - '0000:3b:00.1'
    vendor: '8086'
  nodeSelector:
    feature.node.kubernetes.io/sriov-capable: 'true'
  numVfs: 2
  priority: 99
  resourceName: inteldpdk3

3.
Check the device-plugin logs:

# oc logs sriov-device-plugin-8w7sm -n openshift-sriov-network-operator
I0719 07:09:40.565241 1 manager.go:112] number of config: 2
I0719 07:09:40.565247 1 manager.go:116]
I0719 07:09:40.565252 1 manager.go:117] Creating new ResourcePool: inteldpdk
I0719 07:09:40.565257 1 manager.go:118] DeviceType: netDevice
I0719 07:09:40.569687 1 factory.go:108] device added: [pciAddr: 0000:3b:0a.0, vendor: 8086, device: 154c, driver: vfio-pci]
I0719 07:09:40.569701 1 factory.go:108] device added: [pciAddr: 0000:3b:0a.1, vendor: 8086, device: 154c, driver: vfio-pci]
I0719 07:09:40.569720 1 manager.go:146] New resource server is created for inteldpdk ResourcePool
I0719 07:09:40.569725 1 manager.go:116]
I0719 07:09:40.569728 1 manager.go:117] Creating new ResourcePool: inteldpdk3
I0719 07:09:40.569732 1 manager.go:118] DeviceType: netDevice
W0719 07:09:40.574739 1 manager.go:159] Cannot add PCI Address [0000:3b:0a.0]. Already allocated.
W0719 07:09:40.574752 1 manager.go:159] Cannot add PCI Address [0000:3b:0a.1]. Already allocated.
I0719 07:09:40.574757 1 manager.go:132] no devices in device pool, skipping creating resource server for inteldpdk3
I0719 07:09:40.574763 1 main.go:72] Starting all servers...
I0719 07:09:40.575039 1 server.go:196] starting inteldpdk device plugin endpoint at: openshift.io_inteldpdk.sock
I0719 07:09:40.576932 1 server.go:222] inteldpdk device plugin endpoint started serving
I0719 07:09:40.577070 1 main.go:77] All servers started.
I0719 07:09:40.577079 1 main.go:78] Listening for term signals
I0719 07:09:41.153694 1 server.go:106] Plugin: openshift.io_inteldpdk.sock gets registered successfully at Kubelet
I0719 07:09:41.153730 1 server.go:131] ListAndWatch(inteldpdk) invoked
I0719 07:09:41.153791 1 server.go:139] ListAndWatch(inteldpdk): send devices &ListAndWatchResponse{Devices:[]*Device{&Device{ID:0000:3b:0a.0,Health:Healthy,Topology:&TopologyInfo{Nodes:[]*NUMANode{&NUMANode{ID:0,},},},},&Device{ID:0000:3b:0a.1,Health:Healthy,Topology:&TopologyInfo{Nodes:[]*NUMANode{&NUMANode{ID:0,},},},},},}

4. No inteldpdk3 resource was created:

# oc describe node dell-per740-14.rhts.eng.pek2.redhat.com | grep "openshift.io/inteldpdk"
  openshift.io/inteldpdk:  2
  openshift.io/inteldpdk:  2
  openshift.io/inteldpdk   0
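The logs above show the behaviour being verified: VFs already claimed by an earlier resource pool are skipped with a warning, and a pool that ends up empty gets no resource server. A minimal sketch of that deduplication logic (illustrative Python, not the actual sriov-network-device-plugin code; the helper name is hypothetical):

```python
# Hypothetical sketch of building resource pools from overlapping policies:
# a PCI address can belong to at most one pool, and an empty pool is skipped.
def build_pools(configs):
    """configs: list of (resource_name, [pci_addresses]) in priority order."""
    allocated = set()
    pools = {}
    for name, pci_addrs in configs:
        devices = []
        for addr in pci_addrs:
            if addr in allocated:
                print(f"Cannot add PCI Address [{addr}]. Already allocated.")
                continue
            allocated.add(addr)
            devices.append(addr)
        if devices:
            pools[name] = devices
        else:
            print(f"no devices in device pool, skipping resource server for {name}")
    return pools

# Same two VFs selected by both policies, as in the verification above:
pools = build_pools([
    ("inteldpdk", ["0000:3b:0a.0", "0000:3b:0a.1"]),
    ("inteldpdk3", ["0000:3b:0a.0", "0000:3b:0a.1"]),
])
# pools contains only "inteldpdk"; "inteldpdk3" is skipped
```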
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.8.47 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:5889