Bug 1815039

Summary: Deleting and applying a policy do not enable vfs
Product: OpenShift Container Platform
Component: Networking (sub component: SR-IOV)
Version: 4.4
Target Release: 4.5.0
Reporter: Federico Paolinelli <fpaoline>
Assignee: Peng Liu <pliu>
QA Contact: zhaozhanqi <zzhao>
CC: bbennett, pliu, zshi
Status: CLOSED ERRATA
Severity: medium
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Type: Bug
Clones: 1834201 (view as bug list)
Bug Blocks: 1771572, 1834201
Last Closed: 2020-08-04 18:06:01 UTC

Attachments: sriov daemon logs + comments

Description Federico Paolinelli 2020-03-19 11:16:37 UTC
Description of problem:
When deleting and then re-creating a policy that requests VFs, the operator does not enable the VFs, yet the sync ends successfully.


Version-Release number of selected component (if applicable):
4.4

How reproducible:
Always

Steps to Reproduce:

Start with a clean node:
[root@fci1-installer ~]# oc get sriovnetworknodepolicy -A 
NAMESPACE                          NAME      AGE
openshift-sriov-network-operator   default   25h


[root@fci1-installer ~]# oc get -A sriovnetworknodestates.sriovnetwork.openshift.io -o yaml
apiVersion: v1
items:
- apiVersion: sriovnetwork.openshift.io/v1
  kind: SriovNetworkNodeState
  metadata:
    creationTimestamp: "2020-03-18T08:58:46Z"
    generation: 62
    name: NODENAME
    namespace: openshift-sriov-network-operator
    ownerReferences:
    - apiVersion: sriovnetwork.openshift.io/v1
      blockOwnerDeletion: true
      controller: true
      kind: SriovNetworkNodePolicy
      name: default
      uid: ef8f10be-3ebe-43de-87b3-8fddb59689b3
    resourceVersion: "997897"
    selfLink: /apis/sriovnetwork.openshift.io/v1/namespaces/openshift-sriov-network-operator/sriovnetworknodestates/NODENAME
    uid: 7d6a4b29-a1cc-4fa3-a05a-7a55fd89b392
  spec:
    dpConfigVersion: "997404"
  status:
    interfaces:
    - deviceID: "1015"
      driver: mlx5_core
      mtu: 1500
      name: eno1
      pciAddress: "0000:19:00.0"
      totalvfs: 5
      vendor: 15b3
    - deviceID: "1015"
      driver: mlx5_core
      mtu: 1500
      name: eno2
      pciAddress: "0000:19:00.1"
      totalvfs: 5
      vendor: 15b3
    syncStatus: Succeeded
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Create a policy.yaml selecting that node:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: testpolicy
  namespace: openshift-sriov-network-operator
spec:
  deviceType: netdevice
  nicSelector:
    pfNames:
      - eno1
  nodeSelector:
    kubernetes.io/hostname: NODENAME
  numVfs: 5
  priority: 99
  resourceName: testresource

Apply it and wait for it to settle:

[root@fci1-installer ~]# oc get -n openshift-sriov-network-operator sriovnetworknodestates.sriovnetwork.openshift.io NODENAME -o yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodeState
metadata:
  creationTimestamp: "2020-03-18T08:58:46Z"
  generation: 63
  name: NODENAME
  namespace: openshift-sriov-network-operator
  ownerReferences:
  - apiVersion: sriovnetwork.openshift.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: SriovNetworkNodePolicy
    name: default
    uid: ef8f10be-3ebe-43de-87b3-8fddb59689b3
  resourceVersion: "999999"
  selfLink: /apis/sriovnetwork.openshift.io/v1/namespaces/openshift-sriov-network-operator/sriovnetworknodestates/NODENAME
  uid: 7d6a4b29-a1cc-4fa3-a05a-7a55fd89b392
spec:
  dpConfigVersion: "999259"
  interfaces:
  - name: eno1
    numVfs: 5
    pciAddress: "0000:19:00.0"
    vfGroups:
    - deviceType: netdevice
      resourceName: testresource
      vfRange: 0-4
status:
  interfaces:
  - Vfs:
    - deviceID: "1016"
      driver: mlx5_core
      mtu: 1500
      pciAddress: "0000:19:00.2"
      vendor: 15b3
      vfID: 0
    - deviceID: "1016"
      driver: mlx5_core
      mtu: 1500
      pciAddress: "0000:19:00.3"
      vendor: 15b3
      vfID: 1
    - deviceID: "1016"
      driver: mlx5_core
      mtu: 1500
      pciAddress: "0000:19:00.4"
      vendor: 15b3
      vfID: 2
    - deviceID: "1016"
      driver: mlx5_core
      mtu: 1500
      pciAddress: "0000:19:00.5"
      vendor: 15b3
      vfID: 3
    - deviceID: "1016"
      driver: mlx5_core
      mtu: 1500
      pciAddress: "0000:19:00.6"
      vendor: 15b3
      vfID: 4
    deviceID: "1015"
    driver: mlx5_core
    mtu: 1500
    name: eno1
    numVfs: 5
    pciAddress: "0000:19:00.0"
    totalvfs: 5
    vendor: 15b3
  - deviceID: "1015"
    driver: mlx5_core
    mtu: 1500
    name: eno2
    pciAddress: "0000:19:00.1"
    totalvfs: 5
    vendor: 15b3
  syncStatus: Succeeded

Then delete and recreate the policy without waiting:

[root@fci1-installer ~]# oc delete -f policy.yaml 
sriovnetworknodepolicy.sriovnetwork.openshift.io "testpolicy" deleted
[root@fci1-installer ~]# oc create -f policy.yaml 
sriovnetworknodepolicy.sriovnetwork.openshift.io/testpolicy created


Wait for the sync to complete.
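The waiting step can be sketched as a small polling loop on the node state's syncStatus (a sketch; NODENAME is a placeholder for the real node name, and synced is a hypothetical helper that just makes the loop condition explicit):

```shell
# Poll the SriovNetworkNodeState until syncStatus reports Succeeded.
synced() { [ "$1" = "Succeeded" ]; }

ns=openshift-sriov-network-operator
until synced "$(oc get -n "$ns" \
      sriovnetworknodestates.sriovnetwork.openshift.io NODENAME \
      -o jsonpath='{.status.syncStatus}')"; do
  sleep 5
done
```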

Actual results:

The VFs are not enabled:

[root@fci1-installer ~]# oc get -n openshift-sriov-network-operator  sriovnetworknodestates.sriovnetwork.openshift.io NODENAME -o yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodeState
metadata:
  creationTimestamp: "2020-03-18T08:58:46Z"
  generation: 65
  name: NODENAME
  namespace: openshift-sriov-network-operator
  ownerReferences:
  - apiVersion: sriovnetwork.openshift.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: SriovNetworkNodePolicy
    name: default
    uid: ef8f10be-3ebe-43de-87b3-8fddb59689b3
  resourceVersion: "1002677"
  selfLink: /apis/sriovnetwork.openshift.io/v1/namespaces/openshift-sriov-network-operator/sriovnetworknodestates/NODENAME
  uid: 7d6a4b29-a1cc-4fa3-a05a-7a55fd89b392
spec:
  dpConfigVersion: "1001702"
  interfaces:
  - name: eno1
    numVfs: 5
    pciAddress: "0000:19:00.0"
    vfGroups:
    - deviceType: netdevice
      resourceName: testresource
      vfRange: 0-4
status:
  interfaces:
  - deviceID: "1015"
    driver: mlx5_core
    mtu: 1500
    name: eno1
    pciAddress: "0000:19:00.0"
    totalvfs: 5
    vendor: 15b3
  - deviceID: "1015"
    driver: mlx5_core
    mtu: 1500
    name: eno2
    pciAddress: "0000:19:00.1"
    totalvfs: 5
    vendor: 15b3
  syncStatus: Succeeded

No VFs are available on the node.

Expected results:

VFs are available and shown in the node state.
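A host-level cross-check can confirm whether the kernel actually created the VFs (a sketch; it assumes shell access to the node, vfs_enabled is a hypothetical helper, and the SYSFS variable only exists so the /sys path can be overridden for testing):

```shell
# The PF's sriov_numvfs sysfs attribute should match the policy's
# numVfs (5 here) once the VFs are enabled.
vfs_enabled() {
  # $1 = PF name (e.g. eno1), $2 = expected VF count
  [ "$(cat "${SYSFS:-/sys}/class/net/$1/device/sriov_numvfs" 2>/dev/null)" = "$2" ]
}

vfs_enabled eno1 5 && echo "VFs enabled" || echo "VFs missing"
```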

Additional info:

Comment 1 zhaozhanqi 2020-03-20 07:06:38 UTC
hi, Federico

Could you attach the config daemon pod logs here? I suspect the config daemon pod is still in the process of initializing the VFs.

Comment 2 Federico Paolinelli 2020-03-20 08:03:15 UTC
Created attachment 1671710 [details]
sriov daemon logs + comments

Comment 3 Federico Paolinelli 2020-03-20 08:04:13 UTC
Done. The log also contains some comments; I hope they help.

Comment 4 Federico Paolinelli 2020-03-20 08:22:39 UTC
Please note also that this:

Then delete and recreate the policy without waiting:

[root@fci1-installer ~]# oc delete -f policy.yaml 
sriovnetworknodepolicy.sriovnetwork.openshift.io "testpolicy" deleted
[root@fci1-installer ~]# oc create -f policy.yaml 
sriovnetworknodepolicy.sriovnetwork.openshift.io/testpolicy created



is key to triggering the bug. You don't have to wait for the status to be in sync, but you do need to run the create immediately after the delete.
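One way to make the timing deterministic (a sketch, using the same policy.yaml as above) is to chain the two commands, so the create runs as soon as the delete returns:

```shell
# '&&' starts the create immediately after the delete succeeds,
# leaving no time for the node state to settle in between.
oc delete -f policy.yaml && oc create -f policy.yaml
```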

Comment 5 zhaozhanqi 2020-03-24 09:31:43 UTC
Thanks. I can reproduce this issue by deleting and immediately re-creating the policy.

Comment 9 zhaozhanqi 2020-04-20 10:11:35 UTC
Verified this bug on 4.5.0-202004191920

The VFs are initialized correctly when the same policy is deleted and then immediately re-created.

Comment 11 errata-xmlrpc 2020-08-04 18:06:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5 image release advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409