Description of problem:

When a policy requesting VFs is deleted and immediately recreated, the operator does not enable the VFs, but the sync still ends successfully.

Version-Release number of selected component (if applicable):

4.4

How reproducible:

Always

Steps to Reproduce:

Start with a clean node:

[root@fci1-installer ~]# oc get sriovnetworknodepolicy -A
NAMESPACE                          NAME      AGE
openshift-sriov-network-operator   default   25h

[root@fci1-installer ~]# oc get -A sriovnetworknodestates.sriovnetwork.openshift.io -o yaml
apiVersion: v1
items:
- apiVersion: sriovnetwork.openshift.io/v1
  kind: SriovNetworkNodeState
  metadata:
    creationTimestamp: "2020-03-18T08:58:46Z"
    generation: 62
    name: NODENAME
    namespace: openshift-sriov-network-operator
    ownerReferences:
    - apiVersion: sriovnetwork.openshift.io/v1
      blockOwnerDeletion: true
      controller: true
      kind: SriovNetworkNodePolicy
      name: default
      uid: ef8f10be-3ebe-43de-87b3-8fddb59689b3
    resourceVersion: "997897"
    selfLink: /apis/sriovnetwork.openshift.io/v1/namespaces/openshift-sriov-network-operator/sriovnetworknodestates/NODENAME
    uid: 7d6a4b29-a1cc-4fa3-a05a-7a55fd89b392
  spec:
    dpConfigVersion: "997404"
  status:
    interfaces:
    - deviceID: "1015"
      driver: mlx5_core
      mtu: 1500
      name: eno1
      pciAddress: "0000:19:00.0"
      totalvfs: 5
      vendor: 15b3
    - deviceID: "1015"
      driver: mlx5_core
      mtu: 1500
      name: eno2
      pciAddress: "0000:19:00.1"
      totalvfs: 5
      vendor: 15b3
    syncStatus: Succeeded
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Create a policy.yaml selecting that node:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: testpolicy
  namespace: openshift-sriov-network-operator
spec:
  deviceType: netdevice
  nicSelector:
    pfNames:
    - eno1
  nodeSelector:
    kubernetes.io/hostname: NODENAME
  numVfs: 5
  priority: 99
  resourceName: testresource

Apply it and wait for the state to settle:

[root@fci1-installer ~]# oc get -n openshift-sriov-network-operator sriovnetworknodestates.sriovnetwork.openshift.io NODENAME -o yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodeState
metadata:
  creationTimestamp: "2020-03-18T08:58:46Z"
  generation: 63
  name: NODENAME
  namespace: openshift-sriov-network-operator
  ownerReferences:
  - apiVersion: sriovnetwork.openshift.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: SriovNetworkNodePolicy
    name: default
    uid: ef8f10be-3ebe-43de-87b3-8fddb59689b3
  resourceVersion: "999999"
  selfLink: /apis/sriovnetwork.openshift.io/v1/namespaces/openshift-sriov-network-operator/sriovnetworknodestates/NODENAME
  uid: 7d6a4b29-a1cc-4fa3-a05a-7a55fd89b392
spec:
  dpConfigVersion: "999259"
  interfaces:
  - name: eno1
    numVfs: 5
    pciAddress: "0000:19:00.0"
    vfGroups:
    - deviceType: netdevice
      resourceName: testresource
      vfRange: 0-4
status:
  interfaces:
  - Vfs:
    - deviceID: "1016"
      driver: mlx5_core
      mtu: 1500
      pciAddress: "0000:19:00.2"
      vendor: 15b3
      vfID: 0
    - deviceID: "1016"
      driver: mlx5_core
      mtu: 1500
      pciAddress: "0000:19:00.3"
      vendor: 15b3
      vfID: 1
    - deviceID: "1016"
      driver: mlx5_core
      mtu: 1500
      pciAddress: "0000:19:00.4"
      vendor: 15b3
      vfID: 2
    - deviceID: "1016"
      driver: mlx5_core
      mtu: 1500
      pciAddress: "0000:19:00.5"
      vendor: 15b3
      vfID: 3
    - deviceID: "1016"
      driver: mlx5_core
      mtu: 1500
      pciAddress: "0000:19:00.6"
      vendor: 15b3
      vfID: 4
    deviceID: "1015"
    driver: mlx5_core
    mtu: 1500
    name: eno1
    numVfs: 5
    pciAddress: "0000:19:00.0"
    totalvfs: 5
    vendor: 15b3
  - deviceID: "1015"
    driver: mlx5_core
    mtu: 1500
    name: eno2
    pciAddress: "0000:19:00.1"
    totalvfs: 5
    vendor: 15b3
  syncStatus: Succeeded
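(To tell when the state has settled without dumping the whole object, the syncStatus field shown above can be polled directly; a minimal sketch, assuming the same NODENAME and namespace:)

# Poll until this prints Succeeded:
oc get sriovnetworknodestates.sriovnetwork.openshift.io NODENAME \
  -n openshift-sriov-network-operator \
  -o jsonpath='{.status.syncStatus}{"\n"}'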
Then delete and recreate the policy without waiting:

[root@fci1-installer ~]# oc delete -f policy.yaml
sriovnetworknodepolicy.sriovnetwork.openshift.io "testpolicy" deleted
[root@fci1-installer ~]# oc create -f policy.yaml
sriovnetworknodepolicy.sriovnetwork.openshift.io/testpolicy created

Wait for the sync to complete.

Actual results:

The VFs are not enabled:

[root@fci1-installer ~]# oc get -n openshift-sriov-network-operator sriovnetworknodestates.sriovnetwork.openshift.io NODENAME -o yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodeState
metadata:
  creationTimestamp: "2020-03-18T08:58:46Z"
  generation: 65
  name: NODENAME
  namespace: openshift-sriov-network-operator
  ownerReferences:
  - apiVersion: sriovnetwork.openshift.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: SriovNetworkNodePolicy
    name: default
    uid: ef8f10be-3ebe-43de-87b3-8fddb59689b3
  resourceVersion: "1002677"
  selfLink: /apis/sriovnetwork.openshift.io/v1/namespaces/openshift-sriov-network-operator/sriovnetworknodestates/NODENAME
  uid: 7d6a4b29-a1cc-4fa3-a05a-7a55fd89b392
spec:
  dpConfigVersion: "1001702"
  interfaces:
  - name: eno1
    numVfs: 5
    pciAddress: "0000:19:00.0"
    vfGroups:
    - deviceType: netdevice
      resourceName: testresource
      vfRange: 0-4
status:
  interfaces:
  - deviceID: "1015"
    driver: mlx5_core
    mtu: 1500
    name: eno1
    pciAddress: "0000:19:00.0"
    totalvfs: 5
    vendor: 15b3
  - deviceID: "1015"
    driver: mlx5_core
    mtu: 1500
    name: eno2
    pciAddress: "0000:19:00.1"
    totalvfs: 5
    vendor: 15b3
  syncStatus: Succeeded

No VFs are available to the node.

Expected results:

VFs are available and shown in the node state.

Additional info:
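(As a node-side cross-check that the VFs really are missing, the PF's sysfs entries can be inspected directly; a sketch, assuming the standard Linux SR-IOV sysfs layout and shell access to the node, e.g. via oc debug node/NODENAME followed by chroot /host:)

# Number of currently enabled VFs on the PF; prints 0 when none are enabled:
cat /sys/class/net/eno1/device/sriov_numvfs

# virtfn* symlinks exist only while VFs are instantiated:
ls /sys/class/net/eno1/device/ | grep virtfn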
Hi Federico, could you attach the config daemon pod logs here? I suspect the config daemon pod is still in the process of initializing the VFs.
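(For reference, a sketch of how those logs can be gathered; the app=sriov-network-config-daemon label selector is an assumption about the default config daemon daemonset, so adjust it if the labels differ in your deployment:)

# Find the config daemon pod running on the affected node:
oc get pods -n openshift-sriov-network-operator -l app=sriov-network-config-daemon -o wide

# Dump the logs of the pod scheduled on NODENAME (substitute the pod name found above):
oc logs -n openshift-sriov-network-operator <config-daemon-pod-name>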
Created attachment 1671710 [details] sriov daemon logs + comments
Done. There are also some comments in the log; I hope they help.
Please note also that this step:

[root@fci1-installer ~]# oc delete -f policy.yaml
sriovnetworknodepolicy.sriovnetwork.openshift.io "testpolicy" deleted
[root@fci1-installer ~]# oc create -f policy.yaml
sriovnetworknodepolicy.sriovnetwork.openshift.io/testpolicy created

is key to triggering the bug. You don't have to wait for the status to be in sync, but you do need to run the create immediately after the delete.
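(In other words, the two operations have to happen back to back; a minimal sketch of the timing-sensitive step, assuming the same policy.yaml shown in the reproduction steps:)

# Delete and immediately recreate the policy, giving the operator no
# time to finish tearing down the previous configuration:
oc delete -f policy.yaml && oc create -f policy.yaml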
Thanks. I can reproduce this issue by deleting and then immediately recreating the policy.
Verified this bug on 4.5.0-202004191920. The VFs are initialized correctly when the same policy is deleted and then recreated right away.
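(A quick check that the fix took effect: once syncStatus reports Succeeded again, the VF PCI addresses should show up under status.interfaces; a sketch against the same node state object, assuming the Vfs field layout shown in the reproduction steps:)

# List the VF PCI addresses recorded for eno1 in the node state:
oc get sriovnetworknodestates.sriovnetwork.openshift.io NODENAME \
  -n openshift-sriov-network-operator \
  -o jsonpath='{range .status.interfaces[?(@.name=="eno1")].Vfs[*]}{.pciAddress}{"\n"}{end}'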
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.5 image release advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409