Description of problem:
New 16.1.6 OSP deployment with ML2/OVS and SR-IOV. The SR-IOV interfaces are Intel X710 cards; this is a fairly standard SR-IOV deployment.
After creating an SR-IOV port and attempting to launch an instance, the following error is seen in the neutron SR-IOV agent log:
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [req-92cbc476-51ad-46a6-a108-c524f9da3612 - - - - -] Error in agent loop. Devices info: {}: TypeError: can not serialize 'error' object
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent Traceback (most recent call last):
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py", line 473, in daemon_loop
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent device_info = self.scan_devices(devices, updated_devices_copy)
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3.6/site-packages/osprofiler/profiler.py", line 160, in wrapper
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent result = f(*args, **kwargs)
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py", line 243, in scan_devices
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent curr_devices = self.eswitch_mgr.get_assigned_devices_info()
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 344, in get_assigned_devices_info
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent for device in embedded_switch.get_assigned_devices_info():
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 186, in get_assigned_devices_info
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent mac = self.get_pci_device(pci_slot)
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 297, in get_pci_device
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent macs = self.pci_dev_wrapper.get_assigned_macs([vf_index])
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/pci_lib.py", line 46, in get_assigned_macs
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent vfs = ip.link.get_vfs()
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3.6/site-packages/neutron/agent/linux/ip_lib.py", line 516, in get_vfs
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent return privileged.get_link_vfs(self.name, self._parent.namespace)
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3.6/site-packages/neutron/privileged/agent/linux/ip_lib.py", line 73, in sync_inner
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent return input_func(*args, **kwargs)
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3.6/site-packages/oslo_privsep/priv_context.py", line 245, in _wrap
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent return self.channel.remote_call(name, args, kwargs)
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3.6/site-packages/oslo_privsep/daemon.py", line 224, in remote_call
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent raise exc_type(*result[2])
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent TypeError: can not serialize 'error' object
2021-06-15 17:10:25.201 224488 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent
2021-06-15 17:10:26.970 224488 DEBUG neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [req-92cbc476-51ad-46a6-a108-c524f9da3612 - - - - -] Agent rpc_loop - iteration:6846 started daemon_loop /usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py:456
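The last frames of the traceback show the oslo.privsep client re-raising whatever exception the privileged daemon reported (`raise exc_type(*result[2])`). The agent therefore only sees the daemon's final `TypeError`, not which object failed to serialize or why. The sketch below imitates that pattern to show how an unencodable reply can surface on the caller's side as nothing more than a `TypeError`. It is an illustration only: the standard-library `json` module stands in for privsep's real wire format (msgpack), and the function names, the reply layout and the use of `struct.error` (picked only because its class name is `error`) are hypothetical, not taken from oslo.privsep or from this report.

```python
import builtins
import json
import struct


def fake_get_link_vfs():
    # Hypothetical privileged call: it succeeds, but its result contains an
    # object the wire format cannot encode. struct.error is used only because
    # its class name is "error"; the real object in this bug is not known.
    return [struct.error("illustrative payload")]


def daemon_handle(func):
    """Daemon side: run func and encode a reply (layout is hypothetical).

    Success -> ["ok", result]
    Failure -> ["err", exception_class_name, exception_args]
    """
    try:
        # json.dumps raises TypeError for objects it cannot serialize.
        return json.dumps(["ok", func()])
    except Exception as exc:
        # Whatever went wrong is reduced to a class name plus string args.
        return json.dumps(["err", type(exc).__name__, [str(a) for a in exc.args]])


def remote_call(func):
    """Client side: decode the reply and re-raise remote errors locally."""
    reply = json.loads(daemon_handle(func))
    if reply[0] == "ok":
        return reply[1]
    # Mirrors the `raise exc_type(*result[2])` frame in the traceback.
    exc_type = getattr(builtins, reply[1], RuntimeError)
    if not (isinstance(exc_type, type) and issubclass(exc_type, BaseException)):
        exc_type = RuntimeError
    raise exc_type(*reply[2])


if __name__ == "__main__":
    try:
        remote_call(fake_get_link_vfs)
    except TypeError as exc:
        print("caller saw TypeError:", exc)
```

The point of the sketch is that the caller's exception type and message describe the encoding failure, so the frame that produced the unencodable object has to be found on the daemon side.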
While troubleshooting this issue, we can see the initial setup of the SR-IOV port: the PCI device is allocated and the VF is set up with the correct MAC. The above error is then raised, and port provisioning never completes.
This is a fairly generic SR-IOV configuration; I'll provide additional details in private comments.
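To confirm from the host side that the VF carries the port's MAC (the step that succeeds before the error), the per-VF lines of `ip link show <PF>` can be inspected. The parser below is a minimal sketch: the sample text, interface name and MACs are made up, and because the VF line format differs between iproute2 versions (older releases print `vf N MAC xx:...`, newer ones `vf N link/ether xx:...`), the regular expression accepts both forms as an assumption rather than a complete grammar.

```python
import re
import subprocess

# Matches per-VF lines such as "vf 3 MAC fa:16:3e:..." or
# "vf 3     link/ether fa:16:3e:... brd ...". Both formats are assumptions.
VF_LINE = re.compile(
    r"^\s*vf\s+(\d+)\s+(?:MAC|link/ether)\s+([0-9A-Fa-f]{2}(?::[0-9A-Fa-f]{2}){5})"
)


def parse_vf_macs(ip_link_output):
    """Return {vf_index: mac} parsed from `ip link show <pf>` text."""
    macs = {}
    for line in ip_link_output.splitlines():
        match = VF_LINE.match(line)
        if match:
            macs[int(match.group(1))] = match.group(2).lower()
    return macs


def vf_macs_for(pf_name):
    """Run `ip link show <pf_name>` on the host and parse its VF lines."""
    out = subprocess.run(
        ["ip", "link", "show", pf_name],
        stdout=subprocess.PIPE, universal_newlines=True, check=True,
    ).stdout
    return parse_vf_macs(out)


if __name__ == "__main__":
    # Made-up sample output for a hypothetical PF named p1p1.
    sample = """\
4: p1p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP
    link/ether 3c:fd:fe:00:00:01 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether fa:16:3e:12:34:56 brd ff:ff:ff:ff:ff:ff, spoof checking on
    vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
"""
    print(parse_vf_macs(sample))
```

`subprocess.run` is called with `stdout=PIPE` and `universal_newlines=True` rather than `capture_output`/`text`, so the sketch also runs on the Python 3.6 interpreter shown in the traceback.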
Version-Release number of selected component (if applicable):
16.1.6 current
How reproducible:
100% in this environment
Steps to Reproduce:
1. Create an SR-IOV port.
2. Deploy an instance with the SR-IOV interface.
Additional info:
I'll provide additional information and specific config details in follow-up comments.
This appears to be caused by the number of VFs allocated. On this system, two interfaces are configured with 64 VFs each (128 in total). After we reduced the count to 16 VFs per interface, SR-IOV works properly.
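For a quick runtime check of this workaround, the Linux kernel exposes each PF's VF count in sysfs (`sriov_numvfs`, with the device maximum in `sriov_totalvfs`), and a non-zero count can only be changed by writing 0 first. The helper below is a minimal sketch built on those kernel semantics; the interface names and the `sysfs_root` parameter (added so it can be exercised against a scratch directory) are illustrative. On a director-deployed node the persistent VF count is normally set through the deployment configuration, so a change made this way does not survive a reboot or redeploy, and resetting to 0 removes any VFs currently in use.

```python
import os

SYSFS_NET = "/sys/class/net"


def _vf_file(iface, name, sysfs_root):
    # e.g. /sys/class/net/p1p1/device/sriov_numvfs
    return os.path.join(sysfs_root, iface, "device", name)


def read_vf_counts(iface, sysfs_root=SYSFS_NET):
    """Return (configured_vfs, max_vfs) for a physical function."""
    with open(_vf_file(iface, "sriov_numvfs", sysfs_root)) as f:
        current = int(f.read().strip())
    with open(_vf_file(iface, "sriov_totalvfs", sysfs_root)) as f:
        total = int(f.read().strip())
    return current, total


def set_num_vfs(iface, count, sysfs_root=SYSFS_NET):
    """Set the VF count, resetting to 0 first as the kernel requires.

    Needs root on a real host; resetting destroys the existing VFs.
    """
    current, total = read_vf_counts(iface, sysfs_root)
    if not 0 <= count <= total:
        raise ValueError("count %d outside 0..%d for %s" % (count, total, iface))
    if count == current:
        return
    path = _vf_file(iface, "sriov_numvfs", sysfs_root)
    if current != 0:
        with open(path, "w") as f:
            f.write("0")
    if count != 0:
        with open(path, "w") as f:
            f.write(str(count))


if __name__ == "__main__":
    # Hypothetical PF names; run on the affected compute node.
    for pf in ("p1p1", "p1p2"):
        print(pf, read_vf_counts(pf))
```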
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory (Red Hat OpenStack Platform 16.1.7 (Train) bug fix and enhancement advisory), and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2021:3762