Bug 1815039 - Deleting and applying a policy do not enable vfs
Summary: Deleting and applying a policy do not enable vfs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.5.0
Assignee: Peng Liu
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks: 1771572 1834201
 
Reported: 2020-03-19 11:16 UTC by Federico Paolinelli
Modified: 2020-08-04 18:06 UTC (History)
3 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1834201 (view as bug list)
Environment:
Last Closed: 2020-08-04 18:06:01 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
sriov daemon logs + comments (335.81 KB, text/plain)
2020-03-20 08:03 UTC, Federico Paolinelli


Links
System ID Private Priority Status Summary Last Updated
Github openshift sriov-network-operator pull 175 0 None closed BUG 1815039: Retrieve the latest status of SriovNetworkNodeState CR when update is… 2020-07-09 03:26:17 UTC
Red Hat Product Errata RHBA-2020:2409 0 None None None 2020-08-04 18:06:07 UTC

Description Federico Paolinelli 2020-03-19 11:16:37 UTC
Description of problem:
When deleting and immediately recreating a policy that requests VFs, the operator does not enable the VFs, yet the sync still ends successfully.


Version-Release number of selected component (if applicable):
4.4

How reproducible:
Always

Steps to Reproduce:

Start with a clean node:
[root@fci1-installer ~]# oc get sriovnetworknodepolicy -A 
NAMESPACE                          NAME      AGE
openshift-sriov-network-operator   default   25h


[root@fci1-installer ~]# oc get -A sriovnetworknodestates.sriovnetwork.openshift.io -o yaml
apiVersion: v1
items:
- apiVersion: sriovnetwork.openshift.io/v1
  kind: SriovNetworkNodeState
  metadata:
    creationTimestamp: "2020-03-18T08:58:46Z"
    generation: 62
    name: NODENAME
    namespace: openshift-sriov-network-operator
    ownerReferences:
    - apiVersion: sriovnetwork.openshift.io/v1
      blockOwnerDeletion: true
      controller: true
      kind: SriovNetworkNodePolicy
      name: default
      uid: ef8f10be-3ebe-43de-87b3-8fddb59689b3
    resourceVersion: "997897"
    selfLink: /apis/sriovnetwork.openshift.io/v1/namespaces/openshift-sriov-network-operator/sriovnetworknodestates/NODENAME
    uid: 7d6a4b29-a1cc-4fa3-a05a-7a55fd89b392
  spec:
    dpConfigVersion: "997404"
  status:
    interfaces:
    - deviceID: "1015"
      driver: mlx5_core
      mtu: 1500
      name: eno1
      pciAddress: "0000:19:00.0"
      totalvfs: 5
      vendor: 15b3
    - deviceID: "1015"
      driver: mlx5_core
      mtu: 1500
      name: eno2
      pciAddress: "0000:19:00.1"
      totalvfs: 5
      vendor: 15b3
    syncStatus: Succeeded
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Create a policy.yaml selecting that node:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: testpolicy
  namespace: openshift-sriov-network-operator
spec:
  deviceType: netdevice
  nicSelector:
    pfNames:
      - eno1
  nodeSelector:
    kubernetes.io/hostname: NODENAME
  numVfs: 5
  priority: 99
  resourceName: testresource

Apply it and wait for the state to settle:

[root@fci1-installer ~]# oc get -n openshift-sriov-network-operator  sriovnetworknodestates.sriovnetwork.openshift.io NODENAME -o yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodeState
metadata:
  creationTimestamp: "2020-03-18T08:58:46Z"
  generation: 63
  name: NODENAME
  namespace: openshift-sriov-network-operator
  ownerReferences:
  - apiVersion: sriovnetwork.openshift.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: SriovNetworkNodePolicy
    name: default
    uid: ef8f10be-3ebe-43de-87b3-8fddb59689b3
  resourceVersion: "999999"
  selfLink: /apis/sriovnetwork.openshift.io/v1/namespaces/openshift-sriov-network-operator/sriovnetworknodestates/NODENAME
  uid: 7d6a4b29-a1cc-4fa3-a05a-7a55fd89b392
spec:
  dpConfigVersion: "999259"
  interfaces:
  - name: eno1
    numVfs: 5
    pciAddress: "0000:19:00.0"
    vfGroups:
    - deviceType: netdevice
      resourceName: testresource
      vfRange: 0-4
status:
  interfaces:
  - Vfs:
    - deviceID: "1016"
      driver: mlx5_core
      mtu: 1500
      pciAddress: "0000:19:00.2"
      vendor: 15b3
      vfID: 0
    - deviceID: "1016"
      driver: mlx5_core
      mtu: 1500
      pciAddress: "0000:19:00.3"
      vendor: 15b3
      vfID: 1
    - deviceID: "1016"
      driver: mlx5_core
      mtu: 1500
      pciAddress: "0000:19:00.4"
      vendor: 15b3
      vfID: 2
    - deviceID: "1016"
      driver: mlx5_core
      mtu: 1500
      pciAddress: "0000:19:00.5"
      vendor: 15b3
      vfID: 3
    - deviceID: "1016"
      driver: mlx5_core
      mtu: 1500
      pciAddress: "0000:19:00.6"
      vendor: 15b3
      vfID: 4
    deviceID: "1015"
    driver: mlx5_core
    mtu: 1500
    name: eno1
    numVfs: 5
    pciAddress: "0000:19:00.0"
    totalvfs: 5
    vendor: 15b3
  - deviceID: "1015"
    driver: mlx5_core
    mtu: 1500
    name: eno2
    pciAddress: "0000:19:00.1"
    totalvfs: 5
    vendor: 15b3
  syncStatus: Succeeded

Then delete and recreate the policy without waiting:

[root@fci1-installer ~]# oc delete -f policy.yaml 
sriovnetworknodepolicy.sriovnetwork.openshift.io "testpolicy" deleted
[root@fci1-installer ~]# oc create -f policy.yaml 
sriovnetworknodepolicy.sriovnetwork.openshift.io/testpolicy created


Wait for the sync to complete.
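The "wait for the sync" step can be scripted by polling `syncStatus` until it reports Succeeded. In this sketch the `oc` call is replaced by a hypothetical stub so the script is self-contained; drop the stub to run it against a real cluster (NODENAME is the placeholder used throughout this report).

```shell
#!/bin/sh
# Hypothetical stub standing in for the real oc CLI; remove on a real cluster.
oc() { echo "Succeeded"; }

# Poll the node state until syncStatus reports Succeeded.
wait_for_sync() {
  node="$1"
  while true; do
    status=$(oc get sriovnetworknodestates.sriovnetwork.openshift.io "$node" \
      -n openshift-sriov-network-operator \
      -o jsonpath='{.status.syncStatus}')
    [ "$status" = "Succeeded" ] && break
    sleep 5
  done
  echo "sync complete for $node"
}

wait_for_sync NODENAME
```

Note that, as this bug shows, `syncStatus: Succeeded` alone is not sufficient evidence that the VFs were actually created.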

Actual results:


The VFs are not enabled:

[root@fci1-installer ~]# oc get -n openshift-sriov-network-operator  sriovnetworknodestates.sriovnetwork.openshift.io NODENAME -o yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodeState
metadata:
  creationTimestamp: "2020-03-18T08:58:46Z"
  generation: 65
  name: NODENAME
  namespace: openshift-sriov-network-operator
  ownerReferences:
  - apiVersion: sriovnetwork.openshift.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: SriovNetworkNodePolicy
    name: default
    uid: ef8f10be-3ebe-43de-87b3-8fddb59689b3
  resourceVersion: "1002677"
  selfLink: /apis/sriovnetwork.openshift.io/v1/namespaces/openshift-sriov-network-operator/sriovnetworknodestates/NODENAME
  uid: 7d6a4b29-a1cc-4fa3-a05a-7a55fd89b392
spec:
  dpConfigVersion: "1001702"
  interfaces:
  - name: eno1
    numVfs: 5
    pciAddress: "0000:19:00.0"
    vfGroups:
    - deviceType: netdevice
      resourceName: testresource
      vfRange: 0-4
status:
  interfaces:
  - deviceID: "1015"
    driver: mlx5_core
    mtu: 1500
    name: eno1
    pciAddress: "0000:19:00.0"
    totalvfs: 5
    vendor: 15b3
  - deviceID: "1015"
    driver: mlx5_core
    mtu: 1500
    name: eno2
    pciAddress: "0000:19:00.1"
    totalvfs: 5
    vendor: 15b3
  syncStatus: Succeeded

No VFs are available on the node.
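Whether the bug hit can be checked mechanically by counting `vfID` entries in the node-state dump. A minimal sketch follows; the YAML is an inlined, trimmed sample of the broken state shown above -- in practice, pipe the output of `oc get sriovnetworknodestates.sriovnetwork.openshift.io NODENAME -n openshift-sriov-network-operator -o yaml` into `count_vfs` instead.

```shell
#!/bin/sh
# Count configured VFs in a node-state YAML dump: a buggy run reports 0,
# a healthy one reports one line per VF (5 in this reproduction).
# grep -c exits non-zero on zero matches, hence the || true.
count_vfs() { grep -c 'vfID:' || true; }

# Trimmed sample of the broken state shown above (no Vfs section at all).
cat <<'EOF' | count_vfs
status:
  interfaces:
  - deviceID: "1015"
    name: eno1
    totalvfs: 5
  syncStatus: Succeeded
EOF
```

On the broken state this prints `0`; after a successful sync of the policy above it would print `5`.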

Expected results:

VFs are available and shown in the node state.

Additional info:

Comment 1 zhaozhanqi 2020-03-20 07:06:38 UTC
hi, Federico

Could you attach the config daemon pod logs here? I suspect the config daemon pod is still in the process of initializing the VFs.

Comment 2 Federico Paolinelli 2020-03-20 08:03:15 UTC
Created attachment 1671710 [details]
sriov daemon logs + comments

Comment 3 Federico Paolinelli 2020-03-20 08:04:13 UTC
Done. The log also contains some comments; I hope they help.

Comment 4 Federico Paolinelli 2020-03-20 08:22:39 UTC
Please note also that this:

Then delete and recreate the policy without waiting:

[root@fci1-installer ~]# oc delete -f policy.yaml 
sriovnetworknodepolicy.sriovnetwork.openshift.io "testpolicy" deleted
[root@fci1-installer ~]# oc create -f policy.yaml 
sriovnetworknodepolicy.sriovnetwork.openshift.io/testpolicy created



is key to triggering the bug. You don't have to wait for the status to be in sync, but you do need to run the delete + create back to back.
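The back-to-back timing described here can be captured in a single compound command. A sketch, with `oc` replaced by a hypothetical stub that just echoes its arguments so the snippet is self-contained (drop the stub on a real cluster; policy.yaml as defined in the steps above):

```shell
#!/bin/sh
# Hypothetical stub echoing the calls; remove to run against a real cluster.
oc() { echo "oc $*"; }

# Delete and recreate with no wait in between -- the timing that triggers
# the bug.
oc delete -f policy.yaml && oc create -f policy.yaml
```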

Comment 5 zhaozhanqi 2020-03-24 09:31:43 UTC
Thanks. I can reproduce this issue by deleting and immediately recreating the policy.

Comment 9 zhaozhanqi 2020-04-20 10:11:35 UTC
Verified this bug on 4.5.0-202004191920

The VFs are now initialized correctly when the same policy is deleted and then recreated immediately.

Comment 11 errata-xmlrpc 2020-08-04 18:06:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5 image release advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

