Bug 2103629

Summary: nmstate verification fails with error "Found VF ports count does not match desired"
Product: Red Hat Enterprise Linux 8
Reporter: nijin ashok <nashok>
Component: nmstate
Assignee: Gris Ge <fge>
Status: CLOSED ERRATA
QA Contact: Mingyu Shi <mshi>
Severity: high
Priority: unspecified
Version: 8.4
CC: ferferna, jiji, jishi, network-qe, phoracek, sfaye, till
Target Milestone: rc
Keywords: Triaged
Target Release: ---
Hardware: All
OS: Linux
Last Closed: 2022-11-08 09:17:52 UTC
Type: Bug

Description nijin ashok 2022-07-04 09:22:37 UTC
Description of problem:

When creating a bond on top of NICs that have SR-IOV VFs configured, nmstate verification fails with the error below:

~~~
2022-06-30 10:29:16,299 root         DEBUG    Async action: Waiting activation of ens1f1 ethernet finished
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/libnmstate/nmstate.py", line 53, in plugin_context
    yield plugins
  File "/usr/lib/python3.6/site-packages/libnmstate/netapplier.py", line 81, in apply
    _apply_ifaces_state(plugins, net_state, verify_change, save_to_disk)
  File "/usr/lib/python3.6/site-packages/libnmstate/netapplier.py", line 132, in _apply_ifaces_state
    _verify_change(plugins, net_state)
  File "/usr/lib/python3.6/site-packages/libnmstate/netapplier.py", line 145, in _verify_change
    net_state.verify(current_state)
  File "/usr/lib/python3.6/site-packages/libnmstate/net_state.py", line 86, in verify
    self._ifaces.verify(current_state.get(Interface.KEY))
  File "/usr/lib/python3.6/site-packages/libnmstate/ifaces/ifaces.py", line 584, in verify
    verify_sriov_vf(iface, cur_ifaces)
  File "/usr/lib/python3.6/site-packages/libnmstate/ifaces/ethernet.py", line 156, in verify_sriov_vf
    f"Found VF ports count does not match desired "
libnmstate.error.NmstateVerificationError: Found VF ports count does not match desired 32, current is: ens1f1v3,ens1f1v4,ens1f1v5,ens1f1v6,ens1f1v7,ens1f1v8,ens1f1v9,ens1f1v10,ens1f1v11,ens1f1v12,ens1f1v13,ens1f1v14,ens1f1v15
~~~

The following configuration was applied:

~~~
interfaces:
  - description: data_25g bond
    ipv4:
      dhcp: false
      enabled: false
    ipv6:
      enabled: false
    link-aggregation:
      mode: 802.3ad
      options:
        lacp_rate: fast
        miimon: "100"
      port:
      - ens1f1
      - ens1f2
    mtu: 9000
    name: data_25g
    state: up
    type: bond
~~~

The VFs appear to take longer to initialize than nmstate allows for: nmstate reads the list of VFs before all of them have finished initializing, so verification fails.

The kernel logs the following messages for multiple VFs:

~~~
Jun 30 10:29:05 vm175-122. kernel: i40e 0000:3b:00.1: Allocating 32 VFs.
Jun 30 10:29:05 vm175-122 kernel: iavf 0000:3b:06.0: Device is still in reset (-16), retrying
Jun 30 10:29:05 vm175-122 kernel: iavf 0000:3b:06.1: Device is still in reset (-16), retrying
Jun 30 10:29:05 vm175-122 kernel: iavf 0000:3b:06.2: Device is still in reset (-16), retrying
~~~
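To make the mismatch easier to observe on an affected host, here is a minimal sketch that compares the configured VF count with the VF network interfaces that exist at that moment. It assumes the standard Linux sysfs layout (`/sys/class/net/<pf>/device/virtfn*/net/`); the function names and the `sysfs_root` parameter are made up for illustration and are not part of libnmstate.

```python
import glob
import os


def list_vf_netdevs(pf_name, sysfs_root="/sys"):
    """Return the names of VF network interfaces that currently exist for a PF.

    A VF whose driver is still in reset has a virtfnN entry but no net/
    subdirectory yet, so it does not show up in the result.
    """
    pattern = os.path.join(
        sysfs_root, "class", "net", pf_name, "device", "virtfn*", "net", "*"
    )
    return sorted(os.path.basename(path) for path in glob.glob(pattern))


def read_numvfs(pf_name, sysfs_root="/sys"):
    """Return the number of VFs configured on the PF (sriov_numvfs)."""
    path = os.path.join(
        sysfs_root, "class", "net", pf_name, "device", "sriov_numvfs"
    )
    with open(path) as f:
        return int(f.read().strip())


def missing_vf_count(pf_name, sysfs_root="/sys"):
    """Return how many configured VFs do not have a network interface yet."""
    return read_numvfs(pf_name, sysfs_root) - len(
        list_vf_netdevs(pf_name, sysfs_root)
    )
```

Calling `missing_vf_count("ens1f1")` right after the VFs are allocated should return a non-zero value for as long as some VFs are still in reset, which would match the partial list in the error above.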

If we apply the state with --no-verify, the journal shows that some of the VFs take around 10 seconds to initialize.

~~~
nmstate apply --no-verify

2022-06-30 11:08:42,048 root         DEBUG    Nmstate version: 1.0.2
2022-06-30 11:08:42,048 root         DEBUG    Applying desire state: {'interfaces': [{'description': 'data_25g bond', 'ipv4': {'dhcp': False, 'enabled': False}, 'ipv6': {'enabled': False}, 'link-aggregation': {'mode': '802.3ad', 'options': {'lacp_rate': 'fast', 'miimon': '100'}, 'port': ['ens1f1', 'ens1f2']}, 'mtu': 9000, 'name': 'data_25g', 'state': 'up', 'type': 'bond'}]}

~~~

Journalctl:

~~~
Jun 30 11:08:49 vm175-122 kernel: iavf 0000:3b:0a.0: Device is still in reset (-16), retrying
Jun 30 11:08:50 vm175-122 kernel: iavf 0000:3b:0a.0: Device is still in reset (-16), retrying
Jun 30 11:09:00 vm175-122 kernel: iavf 0000:3b:0a.0: Multiqueue Enabled: Queue pair count = 4
Jun 30 11:09:01 vm175-122 kernel: iavf 0000:3b:0a.0 ens1f2v0: renamed from eth0
~~~
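The "around 10 seconds" figure can be checked directly from the timestamps in the journal excerpt above. The sketch below is only illustrative arithmetic: the helper names are made up, and the year 2022 is taken from the report date because syslog timestamps do not carry one.

```python
from datetime import datetime

# Kernel messages for VF 0000:3b:0a.0, copied from the journal excerpt.
JOURNAL = """\
Jun 30 11:08:49 vm175-122 kernel: iavf 0000:3b:0a.0: Device is still in reset (-16), retrying
Jun 30 11:08:50 vm175-122 kernel: iavf 0000:3b:0a.0: Device is still in reset (-16), retrying
Jun 30 11:09:00 vm175-122 kernel: iavf 0000:3b:0a.0: Multiqueue Enabled: Queue pair count = 4
Jun 30 11:09:01 vm175-122 kernel: iavf 0000:3b:0a.0 ens1f2v0: renamed from eth0
"""


def parse_stamp(line, year=2022):
    """Parse the 15-character syslog timestamp at the start of a line."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def seconds_between(lines, start_marker, end_marker):
    """Seconds from the first line containing start_marker to the first
    line containing end_marker."""
    start = next(parse_stamp(l) for l in lines if start_marker in l)
    end = next(parse_stamp(l) for l in lines if end_marker in l)
    return (end - start).total_seconds()


lines = JOURNAL.splitlines()
# First "still in reset" message until the VF interface gets its final name.
print(seconds_between(lines, "still in reset", "renamed from"))  # 12.0
```

So for this VF, at least 12 seconds pass between the first retry message and the interface getting its final name.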

Version-Release number of selected component (if applicable):

The issue was observed with Kubernetes-Nmstate in an OpenShift Virtualization environment. The latest nmstate version available there is `1.0.2`.

Tested with an X710/X557 card.

How reproducible:

100%

Steps to Reproduce:

1. Configure SR-IOV VFs on two NICs.

~~~
cat /sys/class/net/ens1f1/device/sriov_numvfs
32

cat /sys/class/net/ens1f2/device/sriov_numvfs
32
~~~

2. Try creating a bond on top of the above NICs.

3. Applying the configuration fails with the error below:

~~~
libnmstate.error.NmstateVerificationError: Found VF ports count does not match desired 32, current is: ens1f1v3,ens1f1v4,ens1f1v5,ens1f1v6,ens1f1v7,ens1f1v8,ens1f1v9,ens1f1v10,ens1f1v11,ens1f1v12,ens1f1v13,ens1f1v14,ens1f1v15
~~~

Actual results:

nmstate verification fails with the error "Found VF ports count does not match desired".

Expected results:

nmstate should wait for the VFs to finish initializing before verifying the VF count.
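
To illustrate the expected behaviour, here is a rough sketch of a bounded wait: poll the number of ready VFs until it matches the desired count or a timeout expires. This is not the actual fix shipped in the erratum. `wait_for_vfs`, `count_ready_vfs`, and the default timeout and interval are hypothetical names and values chosen for the example.

```python
import time


def wait_for_vfs(desired, count_ready_vfs, timeout=30.0, interval=1.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll until count_ready_vfs() equals desired or the timeout expires.

    count_ready_vfs is a caller-supplied function (hypothetical here) that
    returns how many VF interfaces exist right now. Returns True when the
    desired count is reached and False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if count_ready_vfs() == desired:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Simulated run: 13 VFs are ready at first (as in the error above),
# and the rest appear on later polls.
observed = iter([13, 20, 32])
print(wait_for_vfs(32, lambda: next(observed), sleep=lambda s: None))  # True
```

With a wait like this before verification, the 10 to 12 seconds that some VFs need to leave reset would fall inside the timeout instead of producing a verification error.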

Additional info:

Comment 13 errata-xmlrpc 2022-11-08 09:17:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (nmstate bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:7465