Description of problem:
After creating SR-IOV policies, neither the allocatable resources nor the VFs are listed on the node until we manually power cycle it.

Version-Release number of selected component (if applicable):
$ oc version
Client Version: 4.8.31
Server Version: 4.8.31
Kubernetes Version: v1.21.6+b82a451

$ oc get csv -n openshift-sriov-network-operator
NAME                                        DISPLAY                   VERSION              REPLACES   PHASE
sriov-network-operator.4.8.0-202201210133   SR-IOV Network Operator   4.8.0-202201210133              Succeeded

How reproducible:
Always, on OCP 4.8 with Mellanox cards. We are running SNO (single-node OpenShift).

Steps to Reproduce:
1. Deploy the latest OCP 4.8.
2. Install the SR-IOV operator.
3. Deploy one or more SR-IOV policies targeting a Mellanox card.
4. Verify that the policies are created but no VFs are created on the node.

Actual results:
The SR-IOV policies are created, but the VFs are not listed on the interface and no allocatable SR-IOV resources appear on the node.

$ oc get SriovNetworkNodePolicy -n openshift-sriov-network-operator
NAME                       AGE
default                    62m
mlnx6-dpdk-node-policy01   32m
mlnx6-dpdk-node-policy02   32m
mlnx6-dpdk-node-policy03   32m
mlnx6-dpdk-node-policy04   32m

$ oc get node
NAME         STATUS   ROLES           AGE   VERSION
snohost-02   Ready    master,worker   82m   v1.21.6+b82a451

$ oc get node snohost-02 -o json | jq .status.allocatable
{
  "cpu": "111500m",
  "ephemeral-storage": "482690118881",
  "hugepages-1Gi": "64Gi",
  "hugepages-2Mi": "0",
  "memory": "195460796Ki",
  "pods": "250"
}

$ sudo ip link show ens8f1
9: ens8f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 0c:42:a1:e1:db:d9 brd ff:ff:ff:ff:ff:ff

Expected results:
- The requested VFs and allocatable resources are listed on the node. If a power cycle is required, the SR-IOV operator and the MC (Machine Config) should trigger a reboot of the node.
[kni05@sno-provisioner01 ~]$ oc get node snohost-02 -o json | jq .status.allocatable
{
  "cpu": "111500m",
  "ephemeral-storage": "482690118881",
  "hugepages-1Gi": "64Gi",
  "hugepages-2Mi": "0",
  "memory": "195460796Ki",
  "openshift.io/cucp": "4",
  "openshift.io/cuup": "4",
  "openshift.io/n3": "4",
  "openshift.io/n6": "4",
  "pods": "250"
}

[core@snohost-02 ~]$ sudo lspci -nn | grep Eth
...
d8:00.1 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6]
d8:02.2 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
d8:02.3 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
d8:02.4 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
d8:02.5 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
d8:02.6 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
d8:02.7 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
d8:03.0 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
d8:03.1 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
d8:03.2 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
d8:03.3 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
d8:03.4 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
d8:03.5 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
d8:03.6 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
d8:03.7 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
d8:04.0 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
d8:04.1 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
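As a quicker check for step 4 than reading lspci output by eye, here is a minimal shell sketch that counts the VFs currently instantiated on a PF. It assumes the standard Linux sysfs layout, in which each VF appears as a virtfnN symlink under /sys/class/net/<pf>/device/. The function name count_vfs and the SYSFS_ROOT override (there only so the function can be exercised against a fake directory tree) are hypothetical and are not part of the operator or of oc.

```shell
#!/usr/bin/env bash
# Count the VFs currently instantiated on a physical function (PF).
# Assumption: each VF shows up as a virtfnN symlink in the PF's sysfs
# device directory. SYSFS_ROOT is a hypothetical override for testing.
count_vfs() {
    local pf="$1"
    local root="${SYSFS_ROOT:-/sys}"
    local dev="$root/class/net/$pf/device"
    local count=0
    local link
    if [ ! -d "$dev" ]; then
        echo "interface $pf not found under $root/class/net" >&2
        return 1
    fi
    for link in "$dev"/virtfn*; do
        # If nothing matches, the glob stays literal; skip it unless it
        # really exists (also accept dangling symlinks).
        [ -e "$link" ] || [ -L "$link" ] || continue
        count=$((count + 1))
    done
    echo "$count"
}
```

For example, running count_vfs ens8f1 on the node prints the number of VFs on that PF; a result of 0 matches the state described under Actual results.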
Additional info:
[core@snohost-02 ~]$ sudo lspci -nn | grep Eth
...
d8:00.1 Ethernet controller [0200]: Mellanox Technologies MT28908 Family [ConnectX-6] [15b3:101b]

[core@snohost-02 ~]$ sudo ethtool -i ens8f1
driver: mlx5_core
version: 5.0-0
firmware-version: 20.27.4000 (MT_0000000236)
expansion-rom-version:
bus-info: 0000:d8:00.1

must-gather data is attached, collected via the ose-sriov-operator-must-gather image.
Hi,

I think I reproduced the problem and found how to fix it: SR-IOV was not enabled on the NIC at the BIOS level. As a result, lspci did not show the SR-IOV capability and the file /sys/class/net/<nic-name>/device/sriov_totalvfs did not exist. I reproduced this with a Mellanox Technologies MT27800 Family [ConnectX-5] [15b3:1017] NIC, but I suspect the same behavior occurs with any NIC in the same situation.

We have a Dell R640 and an HP ProLiant DL360 Gen10 and got the same results on both. As soon as I enabled SR-IOV in the NIC settings (not the global BIOS option), the capability showed up in lspci, the sriov_totalvfs file was created, and it returned the value I set in the BIOS. Now I can create the SR-IOV policies and the VFs are created without a reboot; I can also delete the policies and the VFs are cleaned up.

We also have a Dell R740 where we could not find the Mellanox ConnectX-6 card options anywhere in the BIOS. I checked some Dell blogs, but the options are simply not present. We suspect an old firmware or iDRAC version is causing this.

So at this point I do not think this is an OCP or operator issue, but if you have heard of anything related to Mellanox cards, it would be appreciated. I'll post an update if we find anything in the next few days.

Thanks,
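The diagnosis above can be checked per NIC with a small sketch like the following, run on the node. It assumes, as described in the comment, that a missing sriov_totalvfs file means the SR-IOV capability is not exposed to the kernel; a disabled NIC firmware/BIOS setting is the likely cause, but not necessarily the only one. The function name check_sriov_capability and the SYSFS_ROOT override (used only for testing against a fake tree) are hypothetical.

```shell
#!/usr/bin/env bash
# Report whether a NIC exposes the SR-IOV capability via sysfs.
# Assumption (from the comment above): when SR-IOV is disabled in the NIC's
# own settings, sriov_totalvfs is absent from the device directory.
# SYSFS_ROOT is a hypothetical override so this can run against a fake tree.
check_sriov_capability() {
    local nic="$1"
    local dev="${SYSFS_ROOT:-/sys}/class/net/$nic/device"
    local total numvfs
    if [ ! -r "$dev/sriov_totalvfs" ]; then
        echo "$nic: sriov_totalvfs missing, SR-IOV likely disabled in NIC settings"
        return 1
    fi
    # sriov_totalvfs: maximum VFs the device advertises.
    total=$(cat "$dev/sriov_totalvfs")
    # sriov_numvfs: VFs currently enabled (fall back to "unknown" if unreadable).
    numvfs=$(cat "$dev/sriov_numvfs" 2>/dev/null || echo "unknown")
    echo "$nic: sriov_totalvfs=$total sriov_numvfs=$numvfs"
}
```

For example, check_sriov_capability ens8f1 either reports the missing file (the state before enabling SR-IOV on the NIC) or prints the advertised maximum alongside the number of VFs currently enabled.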