1486076 – Plugging VFs no longer works without a readable phys_switch_id

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-neutron
Sub Component:
Version:	12.0 (Pike)
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	urgent
Target Milestone:	beta
Target Release:	12.0 (Pike)
Assignee:	Brent Eagles
QA Contact:	Eran Kuris
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2017-08-28 23:49 UTC by Brent Eagles
Modified:	2018-02-05 19:12 UTC
CC List:	12 users
Fixed In Version:	openstack-neutron-11.0.1-0.20170923193224.5b0191f.el7ost
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2017-12-13 21:58:11 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Launchpad	1713590	None	None	None	2017-08-28 23:49:28 UTC
OpenStack gerrit	499203	None	MERGED	ovs mech: bind only if user request switchdev	2020-12-31 21:26:59 UTC
OpenStack gerrit	504427	None	MERGED	ovs mech: bind only if user request switchdev	2020-12-31 21:26:59 UTC
Red Hat Product Errata	RHEA-2017:3462	normal	SHIPPED_LIVE	Red Hat OpenStack Platform 12.0 Enhancement Advisory	2018-02-16 01:43:25 UTC

Description Brent Eagles 2017-08-28 23:49:08 UTC

Description of problem:

Attempting to plug a VF fails with the following stack trace in the nova compute logs:

2017-08-28 17:50:34.716 2843 ERROR os_vif [req-9fe05e3e-f7ae-4b2d-be27-90d81fe0b9fd 66e36d5620c24020ac6fa6fb8e580b6c df21f729c47347b299783a4c1f83e774 - default default] Failed to plug vif VIFHostDevice(active=False,address=fa:16:3e:de:b2:7d,dev_address=0000:0b:11.0,dev_type='ethernet',has_traffic_filtering=True,id=b5858ca0-c315-4b2a-b1a9-82a5b508bf2f,network=Network(19c75cc1-a553-4d3d-9a1a-9ad010102e31),plugin='ovs',port_profile=VIFPortProfileOVSRepresentor,preserve_on_delete=True): PciDeviceNotFoundById: PCI device 0000:0b:11.0 not found
2017-08-28 17:50:34.716 2843 ERROR os_vif Traceback (most recent call last):
2017-08-28 17:50:34.716 2843 ERROR os_vif File "/usr/lib/python2.7/site-packages/os_vif/__init__.py", line 77, in plug
2017-08-28 17:50:34.716 2843 ERROR os_vif plugin.plug(vif, instance_info)
2017-08-28 17:50:34.716 2843 ERROR os_vif File "/usr/lib/python2.7/site-packages/vif_plug_ovs/ovs.py", line 191, in plug
2017-08-28 17:50:34.716 2843 ERROR os_vif self._plug_vf_passthrough(vif, instance_info)
2017-08-28 17:50:34.716 2843 ERROR os_vif File "/usr/lib/python2.7/site-packages/vif_plug_ovs/ovs.py", line 163, in _plug_vf_passthrough
2017-08-28 17:50:34.716 2843 ERROR os_vif pci_slot, pf_interface=True, switchdev=True)
2017-08-28 17:50:34.716 2843 ERROR os_vif File "/usr/lib/python2.7/site-packages/vif_plug_ovs/linux_net.py", line 373, in get_ifname_by_pci_address
2017-08-28 17:50:34.716 2843 ERROR os_vif raise exception.PciDeviceNotFoundById(id=pci_addr)
2017-08-28 17:50:34.716 2843 ERROR os_vif PciDeviceNotFoundById: PCI device 0000:0b:11.0 not found
2017-08-28 17:50:34.716 2843 ERROR os_vif

It appears that patch https://review.openstack.org/#/c/484051/ altered get_ifname_by_pci_address() to always run a new helper function, _is_switchdev(), apparently on the assumption that switchdev is always True. As a result, plugging VFs fails on systems whose drivers do not support a readable phys_switch_id.

I ran the code interactively on the host system to determine the actual exception:

>>> f = open('/sys/class/net/enp11s17/phys_switch_id', 'r')
>>> print f.readline()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: [Errno 95] Operation not supported

From what I can tell, this should also cause plugging to fail on systems that have no phys_switch_id file at all.
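
The two failure modes described above (a phys_switch_id file that exists but cannot be read, and a file that is missing entirely) could both be tolerated by a helper along the following lines. This is only a minimal sketch, not the os-vif implementation: the function name read_phys_switch_id and the sysfs_net parameter are hypothetical and exist purely for illustration.

```python
import errno
import os


def read_phys_switch_id(ifname, sysfs_net="/sys/class/net"):
    """Return the phys_switch_id of ifname, or None if it is unavailable.

    Hypothetical helper for illustration; not part of os-vif.
    """
    path = os.path.join(sysfs_net, ifname, "phys_switch_id")
    # Case 1: the driver does not expose the file at all.
    if not os.path.isfile(path):
        return None
    try:
        with open(path) as f:
            value = f.readline().strip()
    except (IOError, OSError) as exc:
        # Case 2: the file exists but reading it fails, e.g. with
        # "[Errno 95] Operation not supported" as seen in this report.
        if exc.errno in (errno.EOPNOTSUPP, errno.ENOENT):
            return None
        raise
    # An empty file is treated the same as "no switch id".
    return value or None
```

A caller could then treat a None result as "not a switchdev device" instead of letting the exception propagate into the plug path.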


How reproducible: seems it should fail every time.


Steps to Reproduce:

Boot a VM with a 'direct' neutron port.

Comment 2 Brent Eagles 2017-08-29 15:00:04 UTC

Copied from Sean Mooney's comments on the upstream bug:

This is caused by trying to use SR-IOV passthrough on a host that does not support hardware offload of OVS.

The workaround is to list the SR-IOV NIC mechanism driver (sriovnicswitch) before OVS in the ML2 configuration.

For example, change /etc/neutron/plugins/ml2/ml2_conf.ini from
[ml2]
...
mechanism_drivers = openvswitch,sriovnicswitch

to

[ml2]
...
mechanism_drivers = sriovnicswitch,openvswitch

You may also want to make sure that supported_pci_vendor_devs
in the ml2_sriov section does not contain the vendor ID and product ID of
the VF used for OVS offload. This ensures that the sriovnic agent
only manages interfaces that do not require OVS configuration.

If you have a NIC that supports OVS offload and offload is enabled, then PCI
passthrough of the device without os-vif plugging the NIC would result in a broken
dataplane, hence the reason for removing such devices from supported_pci_vendor_devs.
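
To check whether a given ml2_conf.ini already has the driver order suggested in the workaround, something like the sketch below could be used. It relies on Python's standard configparser rather than oslo.config, so it assumes the file parses as plain INI; the function name sriov_listed_before_ovs is hypothetical.

```python
import configparser


def sriov_listed_before_ovs(ml2_conf_text):
    """Return True if sriovnicswitch precedes openvswitch in [ml2].

    Hypothetical helper for illustration. Returns False if either
    driver is missing from mechanism_drivers.
    """
    # strict=False tolerates repeated options; interpolation is disabled
    # so that '%' characters in other options are not interpreted.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(ml2_conf_text)
    raw = parser.get("ml2", "mechanism_drivers", fallback="")
    drivers = [d.strip() for d in raw.split(",") if d.strip()]
    if "sriovnicswitch" not in drivers or "openvswitch" not in drivers:
        return False
    return drivers.index("sriovnicswitch") < drivers.index("openvswitch")
```

For instance, passing the text of a file whose [ml2] section reads "mechanism_drivers = sriovnicswitch,openvswitch" returns True, while the reverse order returns False.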

There is still a bug in os-vif here: we should first check that the file exists before trying to use it, so the code should still be hardened. Let's keep this open to track that.

Comment 3 Brent Eagles 2017-08-30 15:55:32 UTC

A potential fix is being worked on:

https://review.openstack.org/#/c/499203/

The intent is to require users who wish to use the OVS SR-IOV offload feature to include "--binding-profile '{"capabilities": ["switchdev"]}'" when creating the port. Without this, the ovs mechanism driver will not attempt to bind and we will get the original intended behavior.
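
The check described above can be sketched roughly as follows. This is an illustration of the idea only, not the code in the Neutron patch: the function name requests_switchdev is hypothetical, and the sketch assumes the binding profile arrives either as a dict or as a JSON string shaped like the example in this comment.

```python
import json


def requests_switchdev(binding_profile):
    """Return True if the binding profile lists the 'switchdev' capability.

    Hypothetical helper for illustration; not the Neutron implementation.
    """
    # Accept the JSON string form passed via --binding-profile as well.
    if isinstance(binding_profile, str):
        binding_profile = json.loads(binding_profile) if binding_profile else {}
    capabilities = (binding_profile or {}).get("capabilities") or []
    return "switchdev" in capabilities
```

With a check of this kind, a 'direct' port created without the switchdev capability would be left for the SR-IOV mechanism driver to bind, and only ports that explicitly request switchdev would be bound by the OVS mechanism driver.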

I am testing this in my environment at the moment but it would be great if we can also test this patch in the original environment where it was found. Would this be possible Yariv?

Comment 4 Brent Eagles 2017-08-31 13:16:23 UTC

I applied the key part of the patch on the test system and the VM plugged fine. The guest is failing to get its IP address at the moment, but that is a different issue.

We'll need to get the upstream patch through the approval/merge process so that we can backport it to Pike.

Comment 8 Assaf Muller 2017-09-16 17:01:18 UTC

Fix merged to master, Ihar is backporting to Pike.

Comment 11 Eran Kuris 2017-11-06 14:49:54 UTC

fix verified: openstack-neutron-11.0.2-0.20171020230401.el7ost.noarch

Comment 14 errata-xmlrpc 2017-12-13 21:58:11 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462
