Bug 1860286

Summary: [sriov] [4.5] VF cannot be inited when apply one policy if the 'default' policy is deleted and restored by operator
Product: OpenShift Container Platform Reporter: zhaozhanqi <zzhao>
Component: Networking Assignee: Peng Liu <pliu>
Networking sub component: SR-IOV QA Contact: zhaozhanqi <zzhao>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: high CC: dosmith
Version: 4.5   
Target Milestone: ---   
Target Release: 4.5.z   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1860288 1860302 (view as bug list) Environment:
Last Closed: 2020-09-08 10:54:03 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1860302    
Bug Blocks: 1860288, 1871681    
Attachments:
Description Flags
sriov operator and config daemon logs none

Description zhaozhanqi 2020-07-24 08:55:52 UTC
Created attachment 1702323 [details]
sriov operator and config daemon logs

Description of problem:
After I deleted the 'default' sriovnetworknodepolicies object, the operator restored it. I then applied one policy and found that the VFs could not be initialized.

Version-Release number of selected component (if applicable):
4.5

How reproducible:


Steps to Reproduce:
1. Install the SR-IOV operator
2. Delete the 'default' policy
oc delete sriovnetworknodepolicies.sriovnetwork.openshift.io default
3. Check that the 'default' policy is restored
oc get sriovnetworknodepolicies.sriovnetwork.openshift.io

4. Apply one policy with the content below
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: intel-netdevice
  namespace: openshift-sriov-network-operator
spec:
  deviceType: netdevice
  nicSelector:
    pfNames:
      - ens1f0
    rootDevices:
      - '0000:3b:00.0'
    vendor: '8086'
  nodeSelector:
    feature.node.kubernetes.io/sriov-capable: 'true'
  numVfs: 5
  priority: 99
  resourceName: intelnetdevice

5. Wait until the sriov-device-plugin and sriov-cni pods become 'Running' and the config daemon logs show 'setNodeStateStatus(): syncStatus: Succeeded, lastSyncError:'
6. Check the 'sriovnetworknodestates.sriovnetwork.openshift.io' of the node
oc get sriovnetworknodestates.sriovnetwork.openshift.io -o yaml
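
Steps 2 and 3 above can be scripted. The following is a minimal sketch, not part of the original report: it assumes `oc` is on the PATH and logged in, and that the policy lives in the `openshift-sriov-network-operator` namespace (the namespace used by the policy in step 4). The function name, retry count, and delay are hypothetical.

```python
import subprocess
import time

RESOURCE = "sriovnetworknodepolicies.sriovnetwork.openshift.io"
# Assumption: the 'default' policy lives in the operator namespace.
NAMESPACE = "openshift-sriov-network-operator"


def delete_and_wait_for_default(run=subprocess.run, sleep=time.sleep,
                                attempts=30, delay=2.0):
    """Delete the 'default' policy, then poll until the operator restores it.

    Returns True if the policy is seen again within `attempts` polls,
    otherwise False. `run` and `sleep` are injectable for dry runs.
    """
    run(["oc", "delete", RESOURCE, "default", "-n", NAMESPACE], check=True)
    for _ in range(attempts):
        result = run(["oc", "get", RESOURCE, "default", "-n", NAMESPACE],
                     check=False, capture_output=True)
        if result.returncode == 0:  # `oc get` succeeds once the object exists
            return True
        sleep(delay)
    return False
```

Calling `delete_and_wait_for_default()` against a cluster covers steps 2 and 3; the policy from step 4 can then be applied.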

Actual results:
Step 6: no VFs were initialized successfully. The spec requests 5 VFs on ens1f0, but the status lists no VFs for it.

spec:
  dpConfigVersion: "1540830"
  interfaces:
  - name: ens1f0
    numVfs: 5
    pciAddress: 0000:3b:00.0
    vfGroups:
    - deviceType: netdevice
      policyName: intel-netdevice
      resourceName: intelnetdevice
      vfRange: 0-4
status:
  interfaces:
  - deviceID: "1521"
    driver: igb
    mtu: 1500
    name: eno1
    pciAddress: "0000:18:00.0"
    totalvfs: 7
    vendor: "8086"
  - deviceID: "1521"
    driver: igb
    mtu: 1500
    name: eno2
    pciAddress: "0000:18:00.1"
    totalvfs: 7
    vendor: "8086"
  - deviceID: "1521"
    driver: igb
    mtu: 1500
    name: eno3
    pciAddress: "0000:18:00.2"
    totalvfs: 7
    vendor: "8086"
  - deviceID: "1521"
    driver: igb
    mtu: 1500
    name: eno4
    pciAddress: "0000:18:00.3"
    totalvfs: 7
    vendor: "8086"
  - deviceID: 158b
    driver: i40e
    mtu: 9200
    name: ens1f0
    pciAddress: 0000:3b:00.0
    totalvfs: 64
    vendor: "8086"
  - deviceID: 158b
    driver: i40e
    mtu: 1500
    name: ens1f1
    pciAddress: 0000:3b:00.1
    totalvfs: 64
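
The mismatch in the output above (spec requests VFs, status reports none) can be detected programmatically. The sketch below is illustrative only: it works on an already-parsed node state (for example, loaded from `oc get ... -o json`), and it assumes that a healthy status reports created VFs through a `numVfs` field and/or a `Vfs` list on the interface. The function name is hypothetical, and the sample data is a trimmed copy of the output in this report.

```python
def interfaces_missing_vfs(node_state):
    """Return (name, requested, reported) for interfaces short of VFs."""
    spec_ifaces = node_state.get("spec", {}).get("interfaces") or []
    status_ifaces = node_state.get("status", {}).get("interfaces") or []
    # Match spec and status entries by PCI address.
    status_by_pci = {i.get("pciAddress"): i for i in status_ifaces}
    missing = []
    for want in spec_ifaces:
        requested = int(want.get("numVfs", 0))
        if requested == 0:
            continue
        have = status_by_pci.get(want.get("pciAddress"), {})
        # Assumption: created VFs show up as `numVfs` and/or a `Vfs` list.
        reported = int(have.get("numVfs", 0) or len(have.get("Vfs") or []))
        if reported < requested:
            missing.append((want.get("name"), requested, reported))
    return missing


# Trimmed copy of the node state shown in this report.
state = {
    "spec": {"interfaces": [
        {"name": "ens1f0", "numVfs": 5, "pciAddress": "0000:3b:00.0"},
    ]},
    "status": {"interfaces": [
        {"name": "eno1", "pciAddress": "0000:18:00.0", "totalvfs": 7},
        {"name": "ens1f0", "pciAddress": "0000:3b:00.0", "totalvfs": 64},
    ]},
}
print(interfaces_missing_vfs(state))  # prints [('ens1f0', 5, 0)]
```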


Expected results:

The VFs can be initialized.

Additional info:

Deleting the config daemon pod so that it is re-created resolves this issue.
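
A minimal sketch of that workaround follows. It is not taken from the report: the pod label `app=sriov-network-config-daemon` is an assumption about how the config daemon pods are labelled and should be checked with `oc get pods --show-labels` first. The function names and the optional `node` argument are hypothetical.

```python
import subprocess

NAMESPACE = "openshift-sriov-network-operator"
# Assumption: config daemon pods carry this label; verify before use.
DAEMON_LABEL = "app=sriov-network-config-daemon"


def restart_config_daemon_cmd(node=None):
    """Build the `oc` command that deletes the config daemon pod(s)."""
    cmd = ["oc", "delete", "pod", "-n", NAMESPACE, "-l", DAEMON_LABEL]
    if node:
        # Limit the deletion to the pod scheduled on one node.
        cmd += ["--field-selector", f"spec.nodeName={node}"]
    return cmd


def restart_config_daemon(node=None, run=subprocess.run):
    """Delete the pod(s); the daemon set controller then re-creates them."""
    return run(restart_config_daemon_cmd(node), check=True)
```

Printing `restart_config_daemon_cmd()` before running it is a cheap way to review the exact command.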

Comment 4 zhaozhanqi 2020-08-24 02:25:25 UTC
Verified this bug on 4.5.0-202008210149.p0

The VFs can be initialized when a policy is created after the default policy is deleted and restored.

Comment 6 errata-xmlrpc 2020-09-08 10:54:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5.8 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3510