Bug 2038050
| Summary: | [i40e sr-iov] failed to increase or reduce VF numbers when many VFs exists | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Mingyu Shi <mshi> | |
| Component: | NetworkManager | Assignee: | Beniamino Galvani <bgalvani> | |
| Status: | CLOSED ERRATA | QA Contact: | Vladimir Benes <vbenes> | |
| Severity: | medium | Docs Contact: | ||
| Priority: | medium | |||
| Version: | 8.6 | CC: | bgalvani, ferferna, fge, jiji, jishi, lrintel, network-qe, rkhan, sfaye, sukulkar, till, vbenes | |
| Target Milestone: | rc | Keywords: | Triaged | |
| Target Release: | 8.7 | Flags: | pm-rhel: mirror+ | |
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | NetworkManager-1.40.2-1.el8 | Doc Type: | No Doc Update | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 2150831 | Environment: | ||
| Last Closed: | 2023-05-16 09:04:54 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 2150831 | |||
Also reproduced on netqe-amd-01.knqe.lab.eng.bos.redhat.com.
Initially I created 40 VFs on this interface. Increasing the count to 41 then failed with a different error message:
# cat many-vfs.yaml
---
interfaces:
- name: ens4f0
  type: ethernet
  state: up
  ethernet:
    sr-iov:
      total-vfs: 40
# nmstatectl apply many-vfs.yaml
# sed 's/40/41/' many-vfs.yaml | nmstatectl apply -
Traceback (most recent call last):
  File "/usr/bin/nmstatectl", line 11, in <module>
    load_entry_point('nmstate==1.2.0', 'console_scripts', 'nmstatectl')()
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 74, in main
    return args.func(args)
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 338, in set
    return apply(args)
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 355, in apply
    args.save_to_disk,
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 419, in apply_state
    save_to_disk=save_to_disk,
  File "/usr/lib/python3.6/site-packages/libnmstate/netapplier.py", line 89, in apply
    _apply_ifaces_state(plugins, net_state, verify_change, save_to_disk)
  File "/usr/lib/python3.6/site-packages/libnmstate/netapplier.py", line 140, in _apply_ifaces_state
    _verify_change(plugins, net_state)
  File "/usr/lib/python3.6/site-packages/libnmstate/netapplier.py", line 155, in _verify_change
    net_state.verify(current_state)
  File "/usr/lib/python3.6/site-packages/libnmstate/net_state.py", line 86, in verify
    self._ifaces.verify(current_state.get(Interface.KEY))
  File "/usr/lib/python3.6/site-packages/libnmstate/ifaces/ifaces.py", line 621, in verify
    verify_sriov_vf(iface, cur_ifaces)
  File "/usr/lib/python3.6/site-packages/libnmstate/ifaces/ethernet.py", line 169, in verify_sriov_vf
    f"Found VF ports count does not match desired "
libnmstate.error.NmstateVerificationError: Found VF ports count does not match desired 41, current is: ens4f0v0,ens4f0v1,ens4f0v3,ens4f0v4,ens4f0v5,ens4f0v6,ens4f0v7,ens4f0v8,ens4f0v9,ens4f0v10,ens4f0v11,ens4f0v12,ens4f0v13,ens4f0v14,ens4f0v15,ens4f0v16,ens4f0v17,ens4f0v18,ens4f0v19,ens4f0v20,ens4f0v21,ens4f0v22,ens4f0v23,ens4f0v24,ens4f0v25,ens4f0v26,ens4f0v27,ens4f0v28,ens4f0v29,ens4f0v30,ens4f0v31,ens4f0v32,ens4f0v33,ens4f0v34,ens4f0v35,ens4f0v36,ens4f0v37,ens4f0v38,ens4f0v39,ens4f0v40
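The VF list in the error above has 40 entries and skips ens4f0v2. A minimal sketch for spotting such gaps, assuming VF ports are named `<pf>v<index>` as they are in this message; the helper name `missing_vf_indexes` is hypothetical and not part of nmstate:

```python
import re


def missing_vf_indexes(pf_name, desired_total, reported_vfs):
    """Return the VF indexes below desired_total that are absent from reported_vfs."""
    pattern = re.compile(rf"^{re.escape(pf_name)}v(\d+)$")
    found = set()
    for name in reported_vfs:
        match = pattern.match(name.strip())
        if match:
            found.add(int(match.group(1)))
    return [index for index in range(desired_total) if index not in found]


# VF names as listed in the error above: v0, v1, then v3 through v40.
reported = ["ens4f0v0", "ens4f0v1"] + [f"ens4f0v{i}" for i in range(3, 41)]

print(len(reported))                               # 40
print(missing_vf_indexes("ens4f0", 41, reported))  # [2]
```

In practice the list would come from splitting the comma-separated names in the exception text.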
Acceptance criteria: nmstate should not fail when changing the SR-IOV total VFs count within the range the kernel supports.

Hi Mingyu,
The SR-IOV code has changed a lot in nmstate-1.3.1-1.el8. Could you try again to see whether this bug is also fixed?

Hi Gris,
Sorry for the late response.
Tested with:
nmstate-1.3.3-1.el8.x86_64
nispor-1.2.7-1.el8.x86_64
NetworkManager-1.40.0-1.el8.x86_64
openvswitch2.15-2.15.0-113.2.el8fdp.x86_64
Linux dell-per740-79.rhts.eng.pek2.redhat.com 4.18.0-422.el8.x86_64 #1 SMP Thu Aug 25 21:40:53 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux
DISTRO=RHEL-8.7.0-20220829.1
driver: i40e
version: 4.18.0-422.el8.x86_64
firmware-version: 7.10 0x800075df 19.5.12
expansion-rom-version:
bus-info: 0000:3b:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

I mentioned 2 different errors in #comment0 and #comment1:
for #comment1, increasing from 40 to 41, it works well now
for #comment0, reducing from 64 to 63, I still got the same error:
Traceback (most recent call last):
  File "/usr/bin/nmstatectl", line 11, in <module>
    load_entry_point('nmstate==1.3.3', 'console_scripts', 'nmstatectl')()
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 74, in main
    return args.func(args)
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 338, in set
    return apply(args)
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 355, in apply
    args.save_to_disk,
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 419, in apply_state
    save_to_disk=save_to_disk,
  File "/usr/lib/python3.6/site-packages/libnmstate/netapplier.py", line 89, in apply
    _apply_ifaces_state(plugins, net_state, verify_change, save_to_disk)
  File "/usr/lib/python3.6/site-packages/libnmstate/netapplier.py", line 122, in _apply_ifaces_state
    plugin.apply_changes(net_state, save_to_disk)
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/plugin.py", line 233, in apply_changes
    NmProfiles(self.context).apply_config(net_state, save_to_disk)
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/profiles.py", line 100, in apply_config
    self._ctx.wait_all_finish()
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/context.py", line 217, in wait_all_finish
    raise tmp_error
libnmstate.error.NmstateLibnmError: Activate profile uuid:d1b8e2ce-9554-475f-87b2-c891f50de887 iface:ens1f0 type: ethernet failed: reason=<enum NM_ACTIVE_CONNECTION_STATE_REASON_DEVICE_DISCONNECTED of type NM.ActiveConnectionStateReason><enum NM_DEVICE_STATE_REASON_SRIOV_CONFIGURATION_FAILED of type NM.DeviceStateReason>

Everything works fine on RHEL 9:
nmstate-2.1.3-1.el9.x86_64
NetworkManager-1.39.90-1.el9.x86_64

Hi Beniamino,
Could you take a look at the above logs regarding the NM_ACTIVE_CONNECTION_STATE_REASON_DEVICE_DISCONNECTED / NM_DEVICE_STATE_REASON_SRIOV_CONFIGURATION_FAILED failure of SR-IOV changes?
Thank you!

> <debug> [1661912355.4769] platform: (ens1f0) link: VF 0 mac FE:68:46:1F:42:FE spoofchk 1 trust 0
> <debug> [1661912355.4770] platform: (ens1f0) link: VF 1 mac 96:71:B0:5A:41:FA spoofchk 1 trust 0
> <debug> [1661912355.4770] platform: (ens1f0) link: VF 2 mac 62:E6:BB:05:46:3A spoofchk 1 trust 0
> ...
> <debug> [1661912355.4782] platform: (ens1f0) link: VF 63 mac 26:B6:D0:84:9A:C3 spoofchk 1 trust 0
> <error> [1661912355.4783] device (ens1f0): failed to apply SR-IOV VFs
I think the problem is that we allocate a 4KiB buffer for the netlink message and when there are many VFs with parameters the buffer is not large enough. I'll prepare a patch for that.
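A rough back-of-the-envelope sketch of why a fixed 4 KiB buffer can run out. It assumes each VF contributes one nested attribute carrying a MAC, a spoofchk, and a trust setting (the three parameters visible in the debug log above), with structure sizes taken from the Linux UAPI headers as I understand them. The message NetworkManager actually builds may carry other attributes, so the numbers are illustrative only, and all function names are hypothetical:

```python
# Sizes in bytes; assumptions based on the Linux netlink UAPI structures.
NLMSG_HDR = 16      # struct nlmsghdr
IFINFOMSG = 16      # struct ifinfomsg
NLA_HDR = 4         # struct nlattr (length + type)
VF_MAC = 36         # struct ifla_vf_mac: u32 vf + 32-byte address field
VF_SPOOFCHK = 8     # struct ifla_vf_spoofchk: u32 vf + u32 setting
VF_TRUST = 8        # struct ifla_vf_trust: u32 vf + u32 setting
FIXED_BUFFER = 4096


def nla_size(payload):
    """Attribute header plus payload padded to a 4-byte boundary."""
    return NLA_HDR + ((payload + 3) & ~3)


def estimate_vf_message_size(num_vfs):
    """Estimate the size of a message that sets MAC, spoofchk and trust per VF."""
    per_vf = NLA_HDR + nla_size(VF_MAC) + nla_size(VF_SPOOFCHK) + nla_size(VF_TRUST)
    # One extra attribute header for the list that wraps all per-VF entries.
    return NLMSG_HDR + IFINFOMSG + NLA_HDR + num_vfs * per_vf


def buffer_size_for(num_vfs):
    """Hypothetical sizing rule: never smaller than the fixed buffer."""
    return max(FIXED_BUFFER, estimate_vf_message_size(num_vfs))


print(estimate_vf_message_size(10))   # 716
print(estimate_vf_message_size(64))   # 4388
print(buffer_size_for(64))            # 4388
```

Under these assumptions the estimate grows by 68 bytes per VF, so a few dozen VFs are enough to exceed a 4096-byte buffer, while small counts such as 10 fit easily.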
I have opened an upstream merge request for NM to increase the buffer based on the number of VFs. Should I reassign this bz to NM or clone it?

We do see some crashes, postponing ITM a bit.

We cannot reproduce the crash on 1.40.2-1 and the original machine was reinstalled, moving back to ON_QA.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (NetworkManager bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2023:2968
Description of problem:
When a PF has many VFs, nmstate cannot increase or reduce the number of VFs. What counts as "many" seems to depend on the host environment.

Version-Release number of selected component (if applicable):
nmstate-1.2.0-1.el8.x86_64
nispor-1.2.2-1.el8.x86_64
NetworkManager-1.36.0-0.3.el8.x86_64
driver: i40e
version: 4.18.0-357.el8.x86_64
firmware-version: 6.00 0x800036cb 1.1747.0
expansion-rom-version:
bus-info: 0000:e3:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

How reproducible:
100%

Steps to Reproduce:
# cat many-vfs.yaml
---
interfaces:
- name: ens4f0
  type: ethernet
  state: up
  ethernet:
    sr-iov:
      total-vfs: 64
[15:25:32@netqe-amd-01 ~/repo-nmstate/sriov]0# nmstatectl apply many-vfs.yaml
/tmp/nmstatelog/2022-01-07-15:25:48-358932058.log
Desired state applied:
---
interfaces:
- name: ens4f0
  type: ethernet
  state: up
  ethernet:
    sr-iov:
      total-vfs: 64
/tmp/nmstatelog/2022-01-07-15:25:48-358932058.0.log
nmstatectl apply many-vfs.yaml return 0
#reduce 64 to 63, failed:
[15:25:56@netqe-amd-01 ~/repo-nmstate/sriov]0# sed 's/64/63/' many-vfs.yaml | nmstatectl apply -
/tmp/nmstatelog/2022-01-07-15:26:29-753811422.log
Traceback (most recent call last):
  File "/usr/bin/nmstatectl", line 11, in <module>
    load_entry_point('nmstate==1.2.0', 'console_scripts', 'nmstatectl')()
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 74, in main
    return args.func(args)
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 355, in apply
    args.save_to_disk,
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 419, in apply_state
    save_to_disk=save_to_disk,
  File "/usr/lib/python3.6/site-packages/libnmstate/netapplier.py", line 89, in apply
    _apply_ifaces_state(plugins, net_state, verify_change, save_to_disk)
  File "/usr/lib/python3.6/site-packages/libnmstate/netapplier.py", line 122, in _apply_ifaces_state
    plugin.apply_changes(net_state, save_to_disk)
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/plugin.py", line 233, in apply_changes
    NmProfiles(self.context).apply_config(net_state, save_to_disk)
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/profiles.py", line 99, in apply_config
    self._ctx.wait_all_finish()
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/context.py", line 217, in wait_all_finish
    raise tmp_error
libnmstate.error.NmstateLibnmError: Activate profile uuid:d7e9349a-a8e8-408b-95e8-e6fd76fc7a7d iface:ens4f0 type: ethernet failed: reason=<enum NM_ACTIVE_CONNECTION_STATE_REASON_DEVICE_DISCONNECTED of type NM.ActiveConnectionStateReason><enum NM_DEVICE_STATE_REASON_SRIOV_CONFIGURATION_FAILED of type NM.DeviceStateReason>

Actual results:
Failed

Expected results:
No failure

Additional info:
Changing 10 VFs to 9 or 11 is OK, but changing from 60 to 59 or 61 fails.
This is an extension of https://bugzilla.redhat.com/show_bug.cgi?id=1938675 and has nothing to do with VF profiles.
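The reproduction steps change the VF count with `sed`, which rewrites any matching digits in the file. A small sketch that builds the same desired state programmatically may be less fragile when stepping through many counts. It assumes nmstatectl accepts JSON as well as YAML on standard input; the helper name `sriov_state` is hypothetical:

```python
import json


def sriov_state(iface, total_vfs):
    """Build the desired state from many-vfs.yaml for a given VF count."""
    return {
        "interfaces": [
            {
                "name": iface,
                "type": "ethernet",
                "state": "up",
                "ethernet": {"sr-iov": {"total-vfs": total_vfs}},
            }
        ]
    }


# Print the state for 63 VFs; the output can be piped to `nmstatectl apply -`.
print(json.dumps(sriov_state("ens4f0", 63), indent=2))
```

Looping over a range of counts with this helper makes it easier to find the threshold at which a given host starts failing.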