Bug 1972766

Summary: [OSP 16.1][ML2/OVS] sr-iov agent failing on get_vfs function while processing port
Product: Red Hat OpenStack Reporter: Matt Flusche <mflusche>
Component: python-pyroute2 Assignee: Rodolfo Alonso <ralonsoh>
Status: CLOSED ERRATA QA Contact: Candido Campos <ccamposr>
Severity: high Docs Contact:
Priority: high    
Version: 16.1 (Train) CC: atragler, ccamposr, chrisw, dalvarez, dhill, jpretori, kbeavers, pkundal, pveiga, ralonsoh, scohen
Target Milestone: z7 Keywords: Triaged
Target Release: 16.1 (Train on RHEL 8.2)   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: python-pyroute2-0.5.6-6.1.el8ost Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 1973700 1973730 Environment:
Last Closed: 2021-12-09 20:20:10 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1973700, 1973730    

Description Matt Flusche 2021-06-16 15:09:41 UTC
Description of problem:
New OSP 16.1.6 deployment with ml2/ovs and sr-iov. The sr-iov interfaces are Intel X710 cards. This is a pretty standard sr-iov deployment.

After creating an sr-iov port and attempting to launch an instance, the following error is seen in the neutron sr-iov agent log.

2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [req-92cbc476-51ad-46a6-a108-c524f9da3612 - - - - -] Error in agent loop. Devices info: {}: TypeError: can not serialize 'error' object
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent Traceback (most recent call last):
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py", line 473, in daemon_loop
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     device_info = self.scan_devices(devices, updated_devices_copy)
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python3.6/site-packages/osprofiler/profiler.py", line 160, in wrapper
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     result = f(*args, **kwargs)
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py", line 243, in scan_devices
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     curr_devices = self.eswitch_mgr.get_assigned_devices_info()
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 344, in get_assigned_devices_info
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     for device in embedded_switch.get_assigned_devices_info():
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 186, in get_assigned_devices_info
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     mac = self.get_pci_device(pci_slot)
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 297, in get_pci_device
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     macs = self.pci_dev_wrapper.get_assigned_macs([vf_index])
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/pci_lib.py", line 46, in get_assigned_macs
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     vfs = ip.link.get_vfs()
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python3.6/site-packages/neutron/agent/linux/ip_lib.py", line 516, in get_vfs
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     return privileged.get_link_vfs(self.name, self._parent.namespace)
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python3.6/site-packages/neutron/privileged/agent/linux/ip_lib.py", line 73, in sync_inner
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     return input_func(*args, **kwargs)
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python3.6/site-packages/oslo_privsep/priv_context.py", line 245, in _wrap
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     return self.channel.remote_call(name, args, kwargs)
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python3.6/site-packages/oslo_privsep/daemon.py", line 224, in remote_call
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     raise exc_type(*result[2])
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent TypeError: can not serialize 'error' object
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent
2021-06-15 17:10:26.970 224488 DEBUG neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [req-92cbc476-51ad-46a6-a108-c524f9da3612 - - - - -] Agent rpc_loop - iteration:6846 started daemon_loop /usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py:456

In troubleshooting this issue, we can see the initial setup of the sr-iov port. The PCI device is allocated and the VF is set up with the correct MAC; then the above error is raised and the port provisioning never completes.

This is a pretty generic sr-iov config; I'll provide additional details in private comments.


Version-Release number of selected component (if applicable):
16.1.6 current


How reproducible:
100% in this environment

Steps to Reproduce:
1. Create sr-iov port
2. deploy instance with sr-iov interface


Additional info:
I'll provide additional information and specific config details in follow-up comments.

Comment 3 Matt Flusche 2021-06-17 13:21:44 UTC
This seems to be caused by the number of VFs allocated. On this system, there are two interfaces configured with 64 VFs each, 128 total. We dropped the VFs to 16 per interface and sr-iov now works properly.
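
For anyone checking whether a host is in the same situation, the configured VF count per interface can be read from sysfs. The following is a minimal sketch, not part of the original report: it assumes the standard Linux `sriov_numvfs` attribute under `/sys/class/net/<iface>/device/`, and the function name `configured_vfs` is hypothetical.

```python
from pathlib import Path
from typing import Dict

# Standard sysfs location for network interfaces on Linux.
SYS_CLASS_NET = Path("/sys/class/net")


def configured_vfs(sys_class_net: Path = SYS_CLASS_NET) -> Dict[str, int]:
    """Map each SR-IOV capable interface to its configured VF count.

    Interfaces without a device/sriov_numvfs attribute are skipped.
    """
    counts: Dict[str, int] = {}
    if not sys_class_net.is_dir():
        return counts
    for iface in sorted(sys_class_net.iterdir()):
        numvfs = iface / "device" / "sriov_numvfs"
        if numvfs.is_file():
            counts[iface.name] = int(numvfs.read_text().strip())
    return counts


if __name__ == "__main__":
    counts = configured_vfs()
    for name, vfs in counts.items():
        print(f"{name}: {vfs} VFs")
    print(f"total: {sum(counts.values())} VFs")
```

On the system described above, this would be expected to report 64 VFs on each of the two sr-iov interfaces before the change and 16 after it.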

Comment 40 errata-xmlrpc 2021-12-09 20:20:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.7 (Train) bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3762