Bug 1734626 - [sriov] sriov-network-config-daemon pod enters CrashLoopBackOff when initializing the XXV710 VF
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.2.0
Hardware: All
OS: All
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.2.0
Assignee: Peng Liu
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-07-31 06:36 UTC by zhaozhanqi
Modified: 2019-10-16 06:34 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-16 06:34:11 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:2922 0 None None None 2019-10-16 06:34:23 UTC

Description zhaozhanqi 2019-07-31 06:36:47 UTC
Description of problem:
The SR-IOV operator is installed from OperatorHub.
After creating a SriovNetworkNodePolicy CR for an XXV710 NIC, the sriov-network-config-daemon pod enters CrashLoopBackOff.


Version-Release number of selected component (if applicable):
 Red Hat Enterprise Linux CoreOS 420.8.20190721.0 (Ootpa)

How reproducible:


Steps to Reproduce:
1. Set up the SR-IOV cluster.
2. Install the SR-IOV operator from the web console.
3. Create the following SriovNetworkNodePolicy CR targeting the XXV710 NIC:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  creationTimestamp: '2019-07-30T08:28:20Z'
  generation: 4
  name: policy-1
  namespace: sriov-network-operator
  resourceVersion: '1994420'
  selfLink: >-
    /apis/sriovnetwork.openshift.io/v1/namespaces/sriov-network-operator/sriovnetworknodepolicies/policy-1
  uid: fd1f597a-b2a3-11e9-a8e4-2e808f6aa7e1
spec:
  deviceType: vfio-pci
  mtu: 1500
  nicSelector:
    pfNames:
      - ens1f0
    rootDevices:
      - '0000:3b:00.0'
    vendor: '8086'
  nodeSelector:
    feature.node.kubernetes.io/sriov-capable: 'true'
  numVfs: 6
  priority: 99
  resourceName: intelnics
**************
4. Check the config daemon pod; it goes into CrashLoopBackOff.

Actual results:

logs:

E0730 08:31:11.652714   44720 utils.go:118] SyncNodeState(): fail to config sriov interface 0000:3b:00.0: write /sys/bus/pci/drivers/vfio-pci/bind: no such device
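
For context, the failing write in the log above is the sysfs bind step: a PCI address is written to the driver's `bind` file, and the kernel answers "no such device" (ENODEV) when the driver does not end up bound to the device. The following is a minimal Python sketch of that step, not the operator's actual code; the function name `try_bind` and the `sysfs_root` parameter are hypothetical and exist only so the logic can be exercised against a scratch directory.

```python
import errno
import os


def try_bind(pci_addr, driver="vfio-pci", sysfs_root="/sys"):
    """Write pci_addr to <sysfs_root>/bus/pci/drivers/<driver>/bind.

    Returns "ok" on success, otherwise a short string naming the errno.
    Illustrative helper only; names are assumptions, not operator code.
    """
    bind_path = os.path.join(sysfs_root, "bus", "pci", "drivers", driver, "bind")
    try:
        # The write is flushed when the file is closed, so any kernel
        # error surfaces inside this try block.
        with open(bind_path, "w") as f:
            f.write(pci_addr)
    except OSError as e:
        if e.errno == errno.ENODEV:
            # Same condition as the "no such device" message in the log.
            return "ENODEV: kernel did not bind the device to " + driver
        if e.errno == errno.ENOENT:
            # The bind file is missing, e.g. the driver module is not loaded.
            return "ENOENT: " + bind_path + " does not exist"
        return "{}: {}".format(errno.errorcode.get(e.errno, e.errno), e.strerror)
    return "ok"
```

On an affected node, calling `try_bind` as root with the address of a device that vfio-pci cannot take would be expected to return the ENODEV string rather than "ok".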


Expected results:

The SriovNetworkNodePolicy should be applied and the sriov-network-config-daemon pod should keep running.

Additional info:

This issue happens on RHCOS (Red Hat Enterprise Linux CoreOS).

Comment 1 Peng Liu 2019-07-31 07:13:50 UTC
Because IOMMU is not enabled through kernel arguments on RHCOS, the vfio-pci driver could not be bound to the device.

PR https://github.com/openshift/sriov-network-operator/pull/46
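
A quick way to confirm this diagnosis on a node is to look at the kernel command line and at the IOMMU groups the kernel has created. The sketch below is a rough check under stated assumptions: it assumes an Intel platform, where the relevant argument is `intel_iommu=on`, and it relies on the standard `/proc/cmdline` and `/sys/kernel/iommu_groups` locations. The function names are hypothetical, and this is not the change made in the PR above.

```python
import os


def iommu_requested(cmdline):
    """True if the kernel command line contains intel_iommu=on.

    Assumption: Intel platform; other platforms use different arguments.
    """
    return "intel_iommu=on" in cmdline.split()


def iommu_active(groups_dir="/sys/kernel/iommu_groups"):
    """True if the kernel has created at least one IOMMU group."""
    return os.path.isdir(groups_dir) and bool(os.listdir(groups_dir))


def report():
    """Print both checks for the node this runs on (Linux only)."""
    with open("/proc/cmdline") as f:
        cmdline = f.read()
    print("intel_iommu=on on kernel cmdline:", iommu_requested(cmdline))
    print("IOMMU groups present:", iommu_active())
```

Running `report()` on the node (for example from a debug shell) prints both results. If neither check is true, vfio-pci has no IOMMU group to attach the device to, which is consistent with the bind failure reported in the description.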

Comment 2 Peng Liu 2019-08-05 11:00:27 UTC
PR merged.

Comment 4 zhaozhanqi 2019-08-06 10:22:03 UTC
Moving this bug to 'MODIFIED' since the new image has not been built yet.

Comment 6 zhaozhanqi 2019-08-08 03:28:49 UTC
Verified this bug with quay.io/openshift-release-dev/ocp-v4.0-art-dev:v4.2.0-201908061019-ose-sriov-network-operator

Comment 7 errata-xmlrpc 2019-10-16 06:34:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

