Bug 1972766 - [OSP 16.1][ML2/OVS] sr-iov agent failing on get_vfs function while processing port
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-pyroute2
Version: 16.1 (Train)
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: z7
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: Rodolfo Alonso
QA Contact: Candido Campos
URL:
Whiteboard:
Depends On:
Blocks: 1973700 1973730
Reported: 2021-06-16 15:09 UTC by Matt Flusche
Modified: 2024-12-20 20:16 UTC (History)
CC List: 11 users

Fixed In Version: python-pyroute2-0.5.6-6.1.el8ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 1973700 1973730
Environment:
Last Closed: 2021-12-09 20:20:10 UTC
Target Upstream Version:
Embargoed:


Links
System                             ID              Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker              OSP-5241        0        None      None    None     2021-11-18 11:34:36 UTC
Red Hat Knowledge Base (Solution)  6526111         0        None      None    None     2021-11-18 19:07:23 UTC
Red Hat Product Errata             RHBA-2021:3762  0        None      None    None     2021-12-09 20:20:38 UTC

Description Matt Flusche 2021-06-16 15:09:41 UTC
Description of problem:
New OSP 16.1.6 deployment with ML2/OVS and SR-IOV. The SR-IOV interfaces are Intel X710 cards. This is a pretty standard SR-IOV deployment.

After creating an SR-IOV port and attempting to launch an instance, the following error is seen in the neutron SR-IOV agent log.

2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [req-92cbc476-51ad-46a6-a108-c524f9da3612 - - - - -] Error in agent loop. Devices info: {}: TypeError: can not serialize 'error' object
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent Traceback (most recent call last):
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py", line 473, in daemon_loop
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     device_info = self.scan_devices(devices, updated_devices_copy)
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python3.6/site-packages/osprofiler/profiler.py", line 160, in wrapper
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     result = f(*args, **kwargs)
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py", line 243, in scan_devices
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     curr_devices = self.eswitch_mgr.get_assigned_devices_info()
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 344, in get_assigned_devices_info
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     for device in embedded_switch.get_assigned_devices_info():
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 186, in get_assigned_devices_info
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     mac = self.get_pci_device(pci_slot)
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 297, in get_pci_device
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     macs = self.pci_dev_wrapper.get_assigned_macs([vf_index])
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/pci_lib.py", line 46, in get_assigned_macs
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     vfs = ip.link.get_vfs()
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python3.6/site-packages/neutron/agent/linux/ip_lib.py", line 516, in get_vfs
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     return privileged.get_link_vfs(self.name, self._parent.namespace)
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python3.6/site-packages/neutron/privileged/agent/linux/ip_lib.py", line 73, in sync_inner
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     return input_func(*args, **kwargs)
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python3.6/site-packages/oslo_privsep/priv_context.py", line 245, in _wrap
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     return self.channel.remote_call(name, args, kwargs)
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python3.6/site-packages/oslo_privsep/daemon.py", line 224, in remote_call
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     raise exc_type(*result[2])
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent TypeError: can not serialize 'error' object
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent
2021-06-15 17:10:26.970 224488 DEBUG neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [req-92cbc476-51ad-46a6-a108-c524f9da3612 - - - - -] Agent rpc_loop - iteration:6846 started daemon_loop /usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py:456

While troubleshooting this issue, we can see the initial setup of the SR-IOV port: the PCI device is allocated and the VF is set up with the correct MAC. Then the above error is raised and the port provisioning never completes.

This is a pretty generic SR-IOV config; I'll provide additional details in private comments.


Version-Release number of selected component (if applicable):
16.1.6 current


How reproducible:
100% in this environment

Steps to Reproduce:
1. Create sr-iov port
2. deploy instance with sr-iov interface


Additional info:
I'll provide additional information and specific config details in follow-up comments.

Comment 3 Matt Flusche 2021-06-17 13:21:44 UTC
This seems to be caused by the number of VFs allocated. On this system, there are two interfaces configured with 64 VFs each, 128 total. We dropped the VFs to 16 per interface and SR-IOV now works properly.
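
For anyone checking whether a compute node is in the same situation, the configured VF count per interface can be read from sysfs. The following is only a rough sketch, not part of the fix: it assumes the standard Linux sriov_numvfs attribute under /sys/class/net/<iface>/device/, and the helper name count_vfs and its sysfs_net parameter are made up for illustration.

```python
from pathlib import Path


def count_vfs(sysfs_net="/sys/class/net"):
    """Return {interface name: configured VF count} for SR-IOV capable NICs.

    Hypothetical helper: reads <sysfs_net>/<iface>/device/sriov_numvfs and
    skips interfaces that do not expose that attribute.
    """
    counts = {}
    root = Path(sysfs_net)
    if not root.is_dir():
        return counts
    for iface in sorted(root.iterdir()):
        numvfs = iface / "device" / "sriov_numvfs"
        try:
            counts[iface.name] = int(numvfs.read_text().strip())
        except (OSError, ValueError):
            # Not an SR-IOV capable interface, or the value is unreadable.
            continue
    return counts


if __name__ == "__main__":
    per_pf = count_vfs()
    for name, vfs in per_pf.items():
        print(f"{name}: {vfs} VFs")
    print(f"total: {sum(per_pf.values())} VFs")
```

On the system described here, this would be expected to report 64 VFs on each of the two SR-IOV interfaces before the workaround and 16 on each afterwards.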

Comment 40 errata-xmlrpc 2021-12-09 20:20:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.7 (Train) bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3762

