Bug 2038050 - [i40e sr-iov] failed to increase or reduce VF numbers when many VFs exists
Summary: [i40e sr-iov] failed to increase or reduce VF numbers when many VFs exists
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: NetworkManager
Version: 8.6
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 8.7
Assignee: Beniamino Galvani
QA Contact: Vladimir Benes
URL:
Whiteboard:
Depends On:
Blocks: 2150831
Reported: 2022-01-07 07:55 UTC by Mingyu Shi
Modified: 2023-05-16 11:04 UTC

Fixed In Version: NetworkManager-1.40.2-1.el8
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 2150831
Environment:
Last Closed: 2023-05-16 09:04:54 UTC
Type: Bug
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker | NMT-150 | 0 | None | None | None | 2023-01-23 15:53:27 UTC
Red Hat Issue Tracker | RHELPLAN-107063 | 0 | None | None | None | 2022-01-07 08:03:55 UTC
Red Hat Product Errata | RHBA-2023:2968 | 0 | None | None | None | 2023-05-16 09:06:21 UTC
freedesktop.org Gitlab | NetworkManager/NetworkManager-ci/-/commit/122b7137aee07cf887e5838ddc2eb4b561430427 | 0 | None | None | None | 2023-01-23 15:50:24 UTC
freedesktop.org Gitlab | NetworkManager NetworkManager merge_requests 1413 | 0 | None | merged | platform: set custom netlink buffer size when adding SR-IOV VFs | 2022-10-17 08:40:17 UTC

Description Mingyu Shi 2022-01-07 07:55:44 UTC
Description of problem:
When a PF already has many VFs, nmstate cannot increase or reduce the VF count. How many VFs count as "many" seems to depend on the host environment.

Version-Release number of selected component (if applicable):
nmstate-1.2.0-1.el8.x86_64
nispor-1.2.2-1.el8.x86_64
NetworkManager-1.36.0-0.3.el8.x86_64

driver: i40e
version: 4.18.0-357.el8.x86_64
firmware-version: 6.00 0x800036cb 1.1747.0
expansion-rom-version: 
bus-info: 0000:e3:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes


How reproducible:
100%

Steps to Reproduce:
# cat many-vfs.yaml 
---
interfaces:
- name: ens4f0
  type: ethernet
  state: up
  ethernet:
    sr-iov:
      total-vfs: 64

[15:25:32@netqe-amd-01 ~/repo-nmstate/sriov]0# nmstatectl apply many-vfs.yaml 
/tmp/nmstatelog/2022-01-07-15:25:48-358932058.log
Desired state applied: 
---
interfaces:
- name: ens4f0
  type: ethernet
  state: up
  ethernet:
    sr-iov:
      total-vfs: 64
/tmp/nmstatelog/2022-01-07-15:25:48-358932058.0.log nmstatectl apply many-vfs.yaml return 0

#reduce 64 to 63, failed:
[15:25:56@netqe-amd-01 ~/repo-nmstate/sriov]0# sed 's/64/63/' many-vfs.yaml | nmstatectl apply -
/tmp/nmstatelog/2022-01-07-15:26:29-753811422.log
Traceback (most recent call last):
  File "/usr/bin/nmstatectl", line 11, in <module>
    load_entry_point('nmstate==1.2.0', 'console_scripts', 'nmstatectl')()
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 74, in main
    return args.func(args)
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 355, in apply
    args.save_to_disk,
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 419, in apply_state
    save_to_disk=save_to_disk,
  File "/usr/lib/python3.6/site-packages/libnmstate/netapplier.py", line 89, in apply
    _apply_ifaces_state(plugins, net_state, verify_change, save_to_disk)
  File "/usr/lib/python3.6/site-packages/libnmstate/netapplier.py", line 122, in _apply_ifaces_state
    plugin.apply_changes(net_state, save_to_disk)
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/plugin.py", line 233, in apply_changes
    NmProfiles(self.context).apply_config(net_state, save_to_disk)
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/profiles.py", line 99, in apply_config
    self._ctx.wait_all_finish()
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/context.py", line 217, in wait_all_finish
    raise tmp_error
libnmstate.error.NmstateLibnmError: Activate profile uuid:d7e9349a-a8e8-408b-95e8-e6fd76fc7a7d iface:ens4f0 type: ethernet failed: reason=<enum NM_ACTIVE_CONNECTION_STATE_REASON_DEVICE_DISCONNECTED of type NM.ActiveConnectionStateReason><enum NM_DEVICE_STATE_REASON_SRIOV_CONFIGURATION_FAILED of type NM.DeviceStateReason>

Actual results:
Failed

Expected results:
No failure

Additional info:
Changing from 10 VFs to 9 or 11 works, but changing from 60 to 59 or 61 fails.
This is an extension of https://bugzilla.redhat.com/show_bug.cgi?id=1938675 and has nothing to do with VF profiles.
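
The manual steps above (apply a state, then change total-vfs and apply again) could be scripted to probe where the failure starts on a given host. The following is only a sketch: the helper names render_state, apply_state and sweep are made up for illustration, and the only nmstate interface used is the `nmstatectl apply -` invocation already shown in this report.

```python
import subprocess

# YAML layout copied from many-vfs.yaml in this report; only the interface
# name and the VF count are substituted.
STATE_TEMPLATE = """\
---
interfaces:
- name: {iface}
  type: ethernet
  state: up
  ethernet:
    sr-iov:
      total-vfs: {total_vfs}
"""


def render_state(iface, total_vfs):
    """Return the desired-state YAML for `total_vfs` VFs on `iface`."""
    return STATE_TEMPLATE.format(iface=iface, total_vfs=total_vfs)


def apply_state(state_yaml):
    """Pipe the state into `nmstatectl apply -`; True if it exited with 0."""
    result = subprocess.run(
        ["nmstatectl", "apply", "-"],
        input=state_yaml,
        universal_newlines=True,  # works on the python3.6 seen in the tracebacks
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return result.returncode == 0


def sweep(iface, counts):
    """Apply each VF count in order and record which ones succeeded."""
    return {n: apply_state(render_state(iface, n)) for n in counts}
```

Calling, for example, sweep("ens4f0", [10, 9, 60, 59]) as root on the affected host would apply the four counts in sequence and return a dict mapping each count to True or False.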

Comment 1 Mingyu Shi 2022-01-07 08:02:02 UTC
Also on netqe-amd-01.knqe.lab.eng.bos.redhat.com
I first created 40 VFs on this interface. Increasing the count to 41 then failed with a different error message:
# cat many-vfs.yaml 
---
interfaces:
- name: ens4f0
  type: ethernet
  state: up
  ethernet:
    sr-iov:
      total-vfs: 40

nmstatectl apply many-vfs.yaml
sed 's/40/41/' many-vfs.yaml | nmstatectl apply -

Traceback (most recent call last):
  File "/usr/bin/nmstatectl", line 11, in <module>
    load_entry_point('nmstate==1.2.0', 'console_scripts', 'nmstatectl')()
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 74, in main
    return args.func(args)
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 338, in set
    return apply(args)
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 355, in apply
    args.save_to_disk,
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 419, in apply_state
    save_to_disk=save_to_disk,
  File "/usr/lib/python3.6/site-packages/libnmstate/netapplier.py", line 89, in apply
    _apply_ifaces_state(plugins, net_state, verify_change, save_to_disk)
  File "/usr/lib/python3.6/site-packages/libnmstate/netapplier.py", line 140, in _apply_ifaces_state
    _verify_change(plugins, net_state)
  File "/usr/lib/python3.6/site-packages/libnmstate/netapplier.py", line 155, in _verify_change
    net_state.verify(current_state)
  File "/usr/lib/python3.6/site-packages/libnmstate/net_state.py", line 86, in verify
    self._ifaces.verify(current_state.get(Interface.KEY))
  File "/usr/lib/python3.6/site-packages/libnmstate/ifaces/ifaces.py", line 621, in verify
    verify_sriov_vf(iface, cur_ifaces)
  File "/usr/lib/python3.6/site-packages/libnmstate/ifaces/ethernet.py", line 169, in verify_sriov_vf
    f"Found VF ports count does not match desired "
libnmstate.error.NmstateVerificationError: Found VF ports count does not match desired 41, current is: ens4f0v0,ens4f0v1,ens4f0v3,ens4f0v4,ens4f0v5,ens4f0v6,ens4f0v7,ens4f0v8,ens4f0v9,ens4f0v10,ens4f0v11,ens4f0v12,ens4f0v13,ens4f0v14,ens4f0v15,ens4f0v16,ens4f0v17,ens4f0v18,ens4f0v19,ens4f0v20,ens4f0v21,ens4f0v22,ens4f0v23,ens4f0v24,ens4f0v25,ens4f0v26,ens4f0v27,ens4f0v28,ens4f0v29,ens4f0v30,ens4f0v31,ens4f0v32,ens4f0v33,ens4f0v34,ens4f0v35,ens4f0v36,ens4f0v37,ens4f0v38,ens4f0v39,ens4f0v40
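
The VF list in the error above is long enough that a gap is easy to miss by eye; it appears to skip ens4f0v2. A small sketch for finding missing indices is shown here. The function name is hypothetical, and it assumes VFs are named "<PF name>v<index>" as they are in this error message.

```python
def missing_vf_indices(pf_name, desired_total, found_names):
    """Return VF indices in range(desired_total) absent from found_names.

    Assumes VF interfaces are named "<pf_name>v<index>", which is the
    naming seen in the error message in this report.
    """
    prefix = pf_name + "v"
    found = set()
    for name in found_names:
        suffix = name[len(prefix):]
        if name.startswith(prefix) and suffix.isdigit():
            found.add(int(suffix))
    return [i for i in range(desired_total) if i not in found]


# Same shape as the list in the error: v0..v40 with v2 absent.
names = ["ens4f0v%d" % i for i in range(41) if i != 2]
print(missing_vf_indices("ens4f0", 41, names))
```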

Comment 2 Gris Ge 2022-01-07 09:15:37 UTC
Acceptance criteria: nmstate should not fail when changing the SR-IOV total VF count within the range supported by the kernel.

Comment 3 Gris Ge 2022-07-27 07:41:44 UTC
Hi Mingyu,

The SR-IOV code has changed a lot in nmstate-1.3.1-1.el8. Could you try again to see whether this bug is fixed as well?

Comment 4 Mingyu Shi 2022-08-31 02:14:41 UTC
Hi Gris,

Sorry for the late response.

tested with:
nmstate-1.3.3-1.el8.x86_64
nispor-1.2.7-1.el8.x86_64
NetworkManager-1.40.0-1.el8.x86_64
openvswitch2.15-2.15.0-113.2.el8fdp.x86_64
Linux dell-per740-79.rhts.eng.pek2.redhat.com 4.18.0-422.el8.x86_64 #1 SMP Thu Aug 25 21:40:53 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux
DISTRO=RHEL-8.7.0-20220829.1

driver: i40e
version: 4.18.0-422.el8.x86_64
firmware-version: 7.10 0x800075df 19.5.12
expansion-rom-version: 
bus-info: 0000:3b:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

I mentioned two different errors in #comment0 and #comment1:
For #comment1 (increasing from 40 to 41), it works well now.
For #comment0 (reducing from 64 to 63), I still get the same error:
Traceback (most recent call last):
  File "/usr/bin/nmstatectl", line 11, in <module>
    load_entry_point('nmstate==1.3.3', 'console_scripts', 'nmstatectl')()
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 74, in main
    return args.func(args)
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 338, in set
    return apply(args)
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 355, in apply
    args.save_to_disk,
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 419, in apply_state
    save_to_disk=save_to_disk,
  File "/usr/lib/python3.6/site-packages/libnmstate/netapplier.py", line 89, in apply
    _apply_ifaces_state(plugins, net_state, verify_change, save_to_disk)
  File "/usr/lib/python3.6/site-packages/libnmstate/netapplier.py", line 122, in _apply_ifaces_state
    plugin.apply_changes(net_state, save_to_disk)
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/plugin.py", line 233, in apply_changes
    NmProfiles(self.context).apply_config(net_state, save_to_disk)
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/profiles.py", line 100, in apply_config
    self._ctx.wait_all_finish()
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/context.py", line 217, in wait_all_finish
    raise tmp_error
libnmstate.error.NmstateLibnmError: Activate profile uuid:d1b8e2ce-9554-475f-87b2-c891f50de887 iface:ens1f0 type: ethernet failed: reason=<enum NM_ACTIVE_CONNECTION_STATE_REASON_DEVICE_DISCONNECTED of type NM.ActiveConnectionStateReason><enum NM_DEVICE_STATE_REASON_SRIOV_CONFIGURATION_FAILED of type NM.DeviceStateReason>

Comment 6 Mingyu Shi 2022-08-31 04:57:23 UTC
Everything works fine on RHEL 9
nmstate-2.1.3-1.el9.x86_64
NetworkManager-1.39.90-1.el9.x86_64

Comment 7 Gris Ge 2022-09-07 02:49:00 UTC
Hi Beniamino,

Could you take a look at the logs above regarding the NM_ACTIVE_CONNECTION_STATE_REASON_DEVICE_DISCONNECTED / NM_DEVICE_STATE_REASON_SRIOV_CONFIGURATION_FAILED failure when changing SR-IOV settings?

Thank you!

Comment 8 Beniamino Galvani 2022-10-05 04:49:54 UTC
>  <debug> [1661912355.4769] platform: (ens1f0) link:   VF 0 mac FE:68:46:1F:42:FE spoofchk 1 trust 0
>  <debug> [1661912355.4770] platform: (ens1f0) link:   VF 1 mac 96:71:B0:5A:41:FA spoofchk 1 trust 0
>  <debug> [1661912355.4770] platform: (ens1f0) link:   VF 2 mac 62:E6:BB:05:46:3A spoofchk 1 trust 0
>  ...
>  <debug> [1661912355.4782] platform: (ens1f0) link:   VF 63 mac 26:B6:D0:84:9A:C3 spoofchk 1 trust 0
>  <error> [1661912355.4783] device (ens1f0): failed to apply SR-IOV VFs

I think the problem is that we allocate a 4KiB buffer for the netlink message and when there are many VFs with parameters the buffer is not large enough. I'll prepare a patch for that.
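
To get a feel for why a fixed 4 KiB buffer can run out, here is a rough size estimate. It is only a sketch under assumptions: it counts just the three per-VF parameters visible in the debug log above (mac, spoofchk, trust), takes the payload sizes from my reading of the structs in linux/if_link.h, and does not claim to reproduce the message NetworkManager actually builds, which may carry more attributes per VF.

```python
NLMSG_HDRLEN = 16    # sizeof(struct nlmsghdr)
IFINFOMSG_LEN = 16   # sizeof(struct ifinfomsg)
NLA_HDRLEN = 4       # sizeof(struct nlattr)
NLA_ALIGNTO = 4      # netlink attributes are padded to 4 bytes

# Assumed per-VF attribute payload sizes in bytes (illustrative subset only).
VF_ATTR_PAYLOADS = {
    "IFLA_VF_MAC": 36,       # u32 vf + 32-byte address field
    "IFLA_VF_SPOOFCHK": 8,   # u32 vf + u32 setting
    "IFLA_VF_TRUST": 8,      # u32 vf + u32 setting
}


def nla_total_size(payload_len):
    """Attribute header plus payload, rounded up to the alignment."""
    raw = NLA_HDRLEN + payload_len
    return (raw + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1)


def estimate_vf_message_size(num_vfs):
    """Estimated bytes for a link message carrying a nested list of VFs."""
    attrs = sum(nla_total_size(p) for p in VF_ATTR_PAYLOADS.values())
    per_vf = nla_total_size(attrs)               # one nested entry per VF
    vf_list = nla_total_size(num_vfs * per_vf)   # outer nested list
    return NLMSG_HDRLEN + IFINFOMSG_LEN + vf_list


for n in (10, 64):
    print(n, estimate_vf_message_size(n), estimate_vf_message_size(n) > 4096)
```

Even with this minimal attribute set, the estimate grows linearly with the VF count and passes 4096 bytes for 64 VFs, while 10 VFs stay far below it. That pattern fits the symptom, although the exact threshold on a real system depends on which attributes are sent.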

Comment 9 Beniamino Galvani 2022-10-05 10:03:05 UTC
I have opened an upstream merge request for NM to size the buffer based on the number of VFs.

Comment 10 Beniamino Galvani 2022-10-05 10:03:31 UTC
Should I reassign this bz to NM or clone it?

Comment 14 Vladimir Benes 2022-11-23 09:49:58 UTC
We do see some crashes, so we are postponing the ITM a bit.

Comment 19 Vladimir Benes 2022-11-29 18:36:12 UTC
We cannot reproduce the crash on the 1.40.2-1 and the original machine was reinstalled, moving back to ON_QA

Comment 22 errata-xmlrpc 2023-05-16 09:04:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (NetworkManager bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2968

