Bug 1860286 - [sriov] [4.5] VF cannot be inited when apply one policy if the 'default' policy is deleted and restored by operator
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.5
Hardware: All
OS: All
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 4.5.z
Assignee: Peng Liu
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On: 1860302
Blocks: 1860288 1871681
Reported: 2020-07-24 08:55 UTC by zhaozhanqi
Modified: 2020-09-08 10:54 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1860288 1860302
Environment:
Last Closed: 2020-09-08 10:54:03 UTC
Target Upstream Version:
Embargoed:


Attachments
sriov operator and config daemon logs (120.00 KB, application/x-tar)
2020-07-24 08:55 UTC, zhaozhanqi


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift sriov-network-operator pull 311 | 0 | None | closed | [release-4.5] Bug 1860286: load plugins only when none has been loaded | 2020-09-17 15:41:36 UTC
Red Hat Product Errata | RHBA-2020:3510 | 0 | None | None | None | 2020-09-08 10:54:26 UTC

Description zhaozhanqi 2020-07-24 08:55:52 UTC
Created attachment 1702323
sriov operator and config daemon logs

Description of problem:
After I deleted the 'default' sriovnetworknodepolicies object, the operator restored it. I then applied a new policy and found that the VFs could not be initialized.

Version-Release number of selected component (if applicable):
4.5

How reproducible:


Steps to Reproduce:
1. Install the sriov operator.
2. Delete the 'default' policy:
oc delete sriovnetworknodepolicies.sriovnetwork.openshift.io default
3. Check that the 'default' policy is restored:
oc get sriovnetworknodepolicies.sriovnetwork.openshift.io

4. Apply a policy such as the following:
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: intel-netdevice
  namespace: openshift-sriov-network-operator
spec:
  deviceType: netdevice
  nicSelector:
    pfNames:
      - ens1f0
    rootDevices:
      - '0000:3b:00.0'
    vendor: '8086'
  nodeSelector:
    feature.node.kubernetes.io/sriov-capable: 'true'
  numVfs: 5
  priority: 99
  resourceName: intelnetdevice

5. Wait until the sriov-device-plugin and sriov-cni pods are 'Running' and the config daemon logs show 'setNodeStateStatus(): syncStatus: Succeeded, lastSyncError:'.
6. Check the 'sriovnetworknodestates.sriovnetwork.openshift.io' object of the node:
oc get sriovnetworknodestates.sriovnetwork.openshift.io -o yaml
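
Steps 2 to 4 above could be scripted roughly as follows. This is only a sketch: it assumes `oc` is already logged in to the cluster, that the policy from step 4 has been saved to a local file, and that the 'default' policy lives in the namespace used by that policy. The function names and the retry count are hypothetical, not part of the operator or of the original report.

```shell
# Namespace and resource name taken from the steps above.
NS=openshift-sriov-network-operator
POLICY_KIND=sriovnetworknodepolicies.sriovnetwork.openshift.io

# Steps 2-3: delete the 'default' policy, then poll until the operator
# has restored it. The retry budget (one attempt per second) is arbitrary.
restore_default_policy() {
  local tries="${1:-30}"
  local i=0
  oc delete "$POLICY_KIND" default -n "$NS" || return 1
  while [ "$i" -lt "$tries" ]; do
    if oc get "$POLICY_KIND" default -n "$NS" >/dev/null 2>&1; then
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  return 1
}

# Step 4: apply the policy file only after 'default' is back.
apply_policy_after_restore() {
  local policy_file="$1"
  restore_default_policy || { echo "default policy was not restored" >&2; return 1; }
  oc apply -f "$policy_file"
}
```

Calling `apply_policy_after_restore intel-netdevice.yaml` (file name is an example) covers steps 2 to 4; the wait in step 5 and the check in step 6 are left manual here.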

Actual results:
Step 6: no VFs were initialized. The spec requests 5 VFs on ens1f0, but the status entry for ens1f0 reports none.

spec:
  dpConfigVersion: "1540830"
  interfaces:
  - name: ens1f0
    numVfs: 5
    pciAddress: 0000:3b:00.0
    vfGroups:
    - deviceType: netdevice
      policyName: intel-netdevice
      resourceName: intelnetdevice
      vfRange: 0-4
status:
  interfaces:
  - deviceID: "1521"
    driver: igb
    mtu: 1500
    name: eno1
    pciAddress: "0000:18:00.0"
    totalvfs: 7
    vendor: "8086"
  - deviceID: "1521"
    driver: igb
    mtu: 1500
    name: eno2
    pciAddress: "0000:18:00.1"
    totalvfs: 7
    vendor: "8086"
  - deviceID: "1521"
    driver: igb
    mtu: 1500
    name: eno3
    pciAddress: "0000:18:00.2"
    totalvfs: 7
    vendor: "8086"
  - deviceID: "1521"
    driver: igb
    mtu: 1500
    name: eno4
    pciAddress: "0000:18:00.3"
    totalvfs: 7
    vendor: "8086"
  - deviceID: 158b
    driver: i40e
    mtu: 9200
    name: ens1f0
    pciAddress: 0000:3b:00.0
    totalvfs: 64
    vendor: "8086"
  - deviceID: 158b
    driver: i40e
    mtu: 1500
    name: ens1f1
    pciAddress: 0000:3b:00.1
    totalvfs: 64
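
The mismatch between spec and status in the output above can be checked mechanically. The sketch below reads a node state in the same layout (top-level `spec:` and `status:` keys, two-space list indentation) from stdin and reports whether the status entry for a given interface carries a `numVfs` key. That a successfully configured PF reports `numVfs` in its status is an assumption, since the report does not show a working status. The function name `status_has_vfs` is hypothetical.

```shell
# Exit 0 if the status entry for interface $1 contains a numVfs key,
# exit 1 otherwise. Reads single-object node state YAML from stdin.
# Assumes the indentation used in the output above.
status_has_vfs() {
  awk -v ifname="$1" '
    # Record a match once an interface entry has been read completely.
    function finish() { if (name == ifname && found) ok = 1 }

    /^status:/ { in_status = 1; next }
    # Any other top-level key ends the status section.
    /^[^ ]/    { finish(); in_status = 0; name = ""; found = 0 }
    # Ignore everything outside status (the spec also has numVfs).
    !in_status { next }
    # "  - " starts a new interface entry.
    /^  - /    { finish(); name = ""; found = 0 }
    {
      # Interface-level keys start at column 5, after "  - " or "    ".
      key = substr($0, 5)
      if (key ~ /^name: /)   name = substr(key, 7)
      if (key ~ /^numVfs: /) found = 1
    }
    END { finish(); exit (ok ? 0 : 1) }
  '
}
```

For example, `oc get sriovnetworknodestates.sriovnetwork.openshift.io <node> -o yaml | status_has_vfs ens1f0` would fail for the state shown above, because only the spec mentions `numVfs` for ens1f0.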


Expected results:

The VFs are initialized.

Additional info:

Deleting the config daemon pod so that it is re-created resolves this issue.
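
A minimal sketch of that workaround, assuming the config daemon pods run in the operator namespace used by the policy above and carry the label `app=sriov-network-config-daemon`. The label is an assumption and the function name is hypothetical; confirm the label with `oc get pods -n openshift-sriov-network-operator --show-labels` first.

```shell
# Namespace taken from the policy in the steps to reproduce.
NS=openshift-sriov-network-operator

# Delete the config daemon pods so that they are re-created.
# The label selector below is an assumption; verify it on your cluster.
restart_config_daemon() {
  oc delete pod -n "$NS" -l app=sriov-network-config-daemon
}
```

After running `restart_config_daemon`, repeat step 6 to see whether the VFs appear.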

Comment 4 zhaozhanqi 2020-08-24 02:25:25 UTC
Verified this bug on 4.5.0-202008210149.p0

The VFs are now initialized when a policy is created after the default policy has been deleted and restored.

Comment 6 errata-xmlrpc 2020-09-08 10:54:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5.8 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3510

