Bug 1842733

Summary: SR-IOV not working for HPE Intel and Mellanox NICs

Product: OpenShift Container Platform
Component: Networking
Sub component: SR-IOV
Version: 4.4
Status: CLOSED DUPLICATE
Severity: unspecified
Priority: unspecified
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified

Reporter: vkhanna
Assignee: Peng Liu <pliu>
QA Contact: zhaozhanqi <zzhao>
CC: bbennett, pliu, vkhanna
Last Closed: 2020-06-04 15:19:37 UTC
Type: Bug
Attachments:
must-gather logs

Description vkhanna 2020-06-02 01:21:31 UTC
Description of problem:
HPE DL360 servers with Intel XXV710 and Mellanox ConnectX-5 25GbE NICs. Both cards are listed as supported for SR-IOV in the 4.4 documentation, but SR-IOV does not work with either on an OCP 4.4 cluster.

Version-Release number of selected component (if applicable):
4.4.4

How reproducible:
Reproducible every time

Steps to Reproduce:
1. Install SR-IOV Network Operator
2. Create a Subscription and an SR-IOV network node policy with the NIC details shown below (a sketch of such a policy follows the device details under Additional info)

Actual results:
VFs are not created

Expected results:
VFs should be created


Additional info:

- deviceID: 158b
  driver: i40e
  mtu: 1500
  name: ens1f0
  pciAddress: "0000:12:00.0"
  totalvfs: 64
  vendor: "8086"

- deviceID: "1017"
  driver: mlx5_core
  mtu: 1500
  name: ens3f0
  pciAddress: 0000:d8:00.0
  totalvfs: 8
  vendor: 15b3
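
For reference, a node policy targeting the Intel NIC above would look roughly like the sketch below; the resourceName, numVfs, and node selector here are illustrative, not the exact values from this cluster.

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy-intel
  namespace: openshift-sriov-network-operator
spec:
  resourceName: intelnics          # illustrative resource name
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  numVfs: 4                        # illustrative; the NIC reports totalvfs: 64
  deviceType: netdevice
  nicSelector:
    vendor: "8086"                 # from the device info above
    deviceID: "158b"
    pfNames:
    - ens1f0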

Comment 1 Peng Liu 2020-06-02 02:12:52 UTC
@Varun

Please help collect the logs with the script https://github.com/openshift/sriov-network-operator/blob/master/must-gather/collection-scripts/gather.
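
One way to run it (a sketch, assuming a logged-in oc session and that the script can run outside the must-gather image; output paths may differ):

$ git clone https://github.com/openshift/sriov-network-operator
$ cd sriov-network-operator
$ ./must-gather/collection-scripts/gather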

Comment 2 vkhanna 2020-06-02 22:24:12 UTC
Created attachment 1694615 [details]
must-gather logs

Comment 3 vkhanna 2020-06-02 22:28:19 UTC
Hi Peng,

Please find attached the must-gather output.

I noticed that the node did not reboot while I was monitoring the console. Also, the pods get stuck in the Terminating state.

$ oc get pods -n openshift-sriov-network-operator -o wide | grep worker-0
sriov-cni-w6t85                          1/1     Terminating   0          10m   10.128.2.5    worker-0.clus0.t5g.lab.eng.rdu2.redhat.com   <none>           <none>
sriov-device-plugin-4d4c7                1/1     Terminating   0          66s   10.1.24.4     worker-0.clus0.t5g.lab.eng.rdu2.redhat.com   <none>           <none>
sriov-network-config-daemon-jbdmx        0/1     Terminating   0          16s   10.1.24.4     worker-0.clus0.t5g.lab.eng.rdu2.redhat.com   <none>           <none>
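
To dig into why the pods are stuck, something like the following could help (pod name taken from the output above):

$ oc -n openshift-sriov-network-operator describe pod sriov-network-config-daemon-jbdmx
$ oc -n openshift-sriov-network-operator get events --sort-by=.lastTimestamp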

Comment 4 Peng Liu 2020-06-03 06:21:40 UTC
Hi Varun,

From the logs you provided, it looks like you may have used the wrong channel when installing the operator. Could you provide the following output?

1. oc get csv -n openshift-sriov-network-operator -o yaml 
2. oc get subscription -n openshift-sriov-network-operator -o yaml (the channel should be "4.4")
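
As a quicker check of just the channel field (assuming the Subscription object is named sriov-network-operator):

$ oc -n openshift-sriov-network-operator get subscription sriov-network-operator -o jsonpath='{.spec.channel}{"\n"}'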

Comment 5 Peng Liu 2020-06-03 06:35:23 UTC
You may have been misled by the 4.4 docs; see https://bugzilla.redhat.com/show_bug.cgi?id=1839068

Comment 6 vkhanna 2020-06-03 20:55:48 UTC
It looks like I don't have access to BZ https://bugzilla.redhat.com/show_bug.cgi?id=1839068

As per the instructions in the OCP 4.4 docs for the SR-IOV Operator installation, the following command returns a channel value of 4.2:

$ oc get packagemanifest sriov-network-operator -n openshift-marketplace -o jsonpath='{.status.channels[].name}'

Hardcoding the channel to 4.4 when creating the Subscription fixes the issue. Thanks for your time.
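
For anyone hitting the same problem, a Subscription pinned to the 4.4 channel would look roughly like this (metadata.name and the catalog source are illustrative; adjust to your environment):

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: sriov-network-operator-subscription
  namespace: openshift-sriov-network-operator
spec:
  channel: "4.4"                  # hardcoded instead of the packagemanifest default
  name: sriov-network-operator
  source: redhat-operators        # illustrative catalog source
  sourceNamespace: openshift-marketplace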

Comment 7 Ben Bennett 2020-06-04 13:23:10 UTC
Setting the target to the current development branch.  We can consider backporting a fix once the root cause has been identified.

Comment 9 Peng Liu 2020-06-04 15:19:37 UTC

*** This bug has been marked as a duplicate of bug 1839068 ***