
Bug 2081781

Summary: SR-IOV Config Daemon not able to set up Intel X710 while specifying minimal VF value
Product: OpenShift Container Platform
Reporter: Akash Dubey <adubey>
Component: Networking
Assignee: Balazs Nemeth <bnemeth>
Networking sub component: SR-IOV
QA Contact: zhaozhanqi <zzhao>
Status: CLOSED NOTABUG
Docs Contact:
Severity: medium
Priority: medium
CC: wizhao, zshi
Version: 4.8
Target Milestone: ---
Target Release: 4.8.0
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-11-14 20:13:02 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Akash Dubey 2022-05-04 15:33:47 UTC
Description of problem:
The customer is using an SR-IOV capable Intel X710 NIC. Even when a very small number of VFs is specified, the SR-IOV config daemon fails with an error stating that the requested number of VFs is larger than the total number of VFs.

However, the NumVfs value the customer is using is much lower than the total VF count given in the datasheet.

Version-Release number of selected component (if applicable):
SR-IOV Operator on OCP v4.8

How reproducible:
This should be reproducible on a cluster with an Intel X710 NIC.

Steps to Reproduce:
1.
2.
3.

Actual results:
The following error appears in the SR-IOV config daemon logs:

2022-04-11T19:13:22.387353543Z E0411 19:13:22.387258   10482 daemon.go:405] error syncing: cannot config SRIOV device: NumVfs is larger than TotalVfs, requeuing

Expected results:
The SR-IOV operator should initialize the SR-IOV VFs.

Additional info:
The maximum number of VFs supported by this NIC is 128. The customer is able to get VFs initialized on only one of their nodes, and even there only a few VFs.

Comment 3 Balazs Nemeth 2022-05-09 18:22:15 UTC
Can we run a custom version of the sriov device plugin with more verbose logging? If we see the error message "NumVfs is larger than TotalVfs", I want to compare it with the value in /sys.

In the meantime, I will push a PR that makes the error include the actual number observed.

Comment 4 Balazs Nemeth 2022-05-09 18:26:05 UTC
Also, please provide the output of ethtool -i ... on the node where it doesn't work, and the contents of

/sys/class/net/.../device/sriov_numvfs
/sys/class/net/.../device/sriov_totalvfs
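
For anyone collecting the two sysfs values requested above, a minimal Python sketch follows. It is not part of the operator or of any Red Hat tooling. The helper name read_sriov_counts and its sys_class_net parameter are hypothetical and exist only for illustration; the sketch assumes the standard sysfs layout shown in the paths above and returns None for a file that is not present.

```python
from pathlib import Path


def read_sriov_counts(iface, sys_class_net="/sys/class/net"):
    """Return (sriov_numvfs, sriov_totalvfs) for iface.

    A value is None when the corresponding sysfs file is absent,
    for example when the device exposes no SR-IOV capability.
    """
    device_dir = Path(sys_class_net) / iface / "device"
    counts = []
    for name in ("sriov_numvfs", "sriov_totalvfs"):
        path = device_dir / name
        # Each file holds a single decimal integer followed by a newline.
        counts.append(int(path.read_text().strip()) if path.is_file() else None)
    return tuple(counts)


if __name__ == "__main__":
    import sys

    # Interface names are passed on the command line by the caller.
    for iface in sys.argv[1:]:
        numvfs, totalvfs = read_sriov_counts(iface)
        print(f"{iface}: sriov_numvfs={numvfs} sriov_totalvfs={totalvfs}")
```

Run it on the affected node with the interface name as the argument; a missing or unexpectedly small sriov_totalvfs is the value to compare against the requested NumVfs.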

Comment 5 Balazs Nemeth 2022-05-10 07:11:59 UTC
https://github.com/k8snetworkplumbingwg/sriov-network-operator/pull/304


PR posted upstream (will backport when merged) to make this error message more useful.

Comment 6 Balazs Nemeth 2022-05-10 07:14:20 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=2083459

Comment 7 Balazs Nemeth 2022-05-17 16:05:11 UTC
I've pushed a PR for https://bugzilla.redhat.com/show_bug.cgi?id=2083459. 

Can you try to run that to get a better error message?

Comment 8 William Zhao 2022-11-14 20:13:02 UTC
It seems that the customer found out the BIOS was causing issues. Maybe they did not enable SR-IOV support in the BIOS, and thus sriov-network-operator was unable to create the VFs.

The customer has closed the case. Also closing this bug as NOTABUG.
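
To illustrate how the BIOS explanation could produce the reported error, here is a minimal Python sketch of a NumVfs versus TotalVfs comparison. It is not the operator's code: check_num_vfs is a hypothetical helper, and the message format that includes both numbers is an assumption for illustration, not the wording used upstream. It also assumes, purely for the example, that a node with SR-IOV disabled in firmware reports a TotalVfs of 0.

```python
def check_num_vfs(num_vfs, total_vfs):
    """Return None if the request fits, otherwise an error string.

    The message format is hypothetical; it only shows how including
    both numbers makes the failure easier to diagnose.
    """
    if num_vfs > total_vfs:
        return (
            f"cannot config SRIOV device: NumVfs ({num_vfs}) "
            f"is larger than TotalVfs ({total_vfs})"
        )
    return None


if __name__ == "__main__":
    # Assumed case: firmware hides SR-IOV, so the node reports 0 total VFs.
    print(check_num_vfs(4, 0))
    # Datasheet case from the description: 128 VFs supported, small request.
    print(check_num_vfs(4, 128))
```

Under that assumption, even a very small NumVfs exceeds the reported total, which would match the behavior described in the original report.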