Bug 1486076 - Plugging VFs no longer works without a readable phys_switch_id
Summary: Plugging VFs no longer works without a readable phys_switch_id
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: beta
Target Release: 12.0 (Pike)
Assignee: Brent Eagles
QA Contact: Eran Kuris
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-08-28 23:49 UTC by Brent Eagles
Modified: 2018-02-05 19:12 UTC (History)

Fixed In Version: openstack-neutron-11.0.1-0.20170923193224.5b0191f.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-13 21:58:11 UTC
Target Upstream Version:


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1713590 0 None None None 2017-08-28 23:49:28 UTC
OpenStack gerrit 499203 0 None MERGED ovs mech: bind only if user request switchdev 2020-12-31 21:26:59 UTC
OpenStack gerrit 504427 0 None MERGED ovs mech: bind only if user request switchdev 2020-12-31 21:26:59 UTC
Red Hat Product Errata RHEA-2017:3462 0 normal SHIPPED_LIVE Red Hat OpenStack Platform 12.0 Enhancement Advisory 2018-02-16 01:43:25 UTC

Description Brent Eagles 2017-08-28 23:49:08 UTC
Description of problem:

Attempting to plug a VF fails with the following stack trace in the nova compute logs:

2017-08-28 17:50:34.716 2843 ERROR os_vif [req-9fe05e3e-f7ae-4b2d-be27-90d81fe0b9fd 66e36d5620c24020ac6fa6fb8e580b6c df21f729c47347b299783a4c1f83e774 - default default] Failed to plug vif VIFHostDevice(active=False,address=fa:16:3e:de:b2:7d,dev_address=0000:0b:11.0,dev_type='ethernet',has_traffic_filtering=True,id=b5858ca0-c315-4b2a-b1a9-82a5b508bf2f,network=Network(19c75cc1-a553-4d3d-9a1a-9ad010102e31),plugin='ovs',port_profile=VIFPortProfileOVSRepresentor,preserve_on_delete=True): PciDeviceNotFoundById: PCI device 0000:0b:11.0 not found
2017-08-28 17:50:34.716 2843 ERROR os_vif Traceback (most recent call last):
2017-08-28 17:50:34.716 2843 ERROR os_vif File "/usr/lib/python2.7/site-packages/os_vif/__init__.py", line 77, in plug
2017-08-28 17:50:34.716 2843 ERROR os_vif plugin.plug(vif, instance_info)
2017-08-28 17:50:34.716 2843 ERROR os_vif File "/usr/lib/python2.7/site-packages/vif_plug_ovs/ovs.py", line 191, in plug
2017-08-28 17:50:34.716 2843 ERROR os_vif self._plug_vf_passthrough(vif, instance_info)
2017-08-28 17:50:34.716 2843 ERROR os_vif File "/usr/lib/python2.7/site-packages/vif_plug_ovs/ovs.py", line 163, in _plug_vf_passthrough
2017-08-28 17:50:34.716 2843 ERROR os_vif pci_slot, pf_interface=True, switchdev=True)
2017-08-28 17:50:34.716 2843 ERROR os_vif File "/usr/lib/python2.7/site-packages/vif_plug_ovs/linux_net.py", line 373, in get_ifname_by_pci_address
2017-08-28 17:50:34.716 2843 ERROR os_vif raise exception.PciDeviceNotFoundById(id=pci_addr)
2017-08-28 17:50:34.716 2843 ERROR os_vif PciDeviceNotFoundById: PCI device 0000:0b:11.0 not found
2017-08-28 17:50:34.716 2843 ERROR os_vif

It appears that patch https://review.openstack.org/#/c/484051/ altered get_ifname_by_pci_address() to always run a new helper function, _is_switchdev(), apparently on the assumption that switchdev is always True. As a result, plugging VFs fails on systems whose drivers do not support a readable phys_switch_id.

I ran the code interactively on the host system to determine the actual exception:

>>> f = open('/sys/class/net/enp11s17/phys_switch_id', 'r')
>>> print f.readline()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: [Errno 95] Operation not supported

From what I can tell, this should also cause plugging to fail on systems that have no phys_switch_id file at all.
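
The two failure modes above (a phys_switch_id file that exists but cannot be read, and no file at all) could be handled along the following lines. This is only a minimal sketch, not the os-vif implementation: the function names read_phys_switch_id and is_switchdev and the sysfs_root parameter are hypothetical and chosen for illustration.

```python
import errno
import os


def read_phys_switch_id(ifname, sysfs_root="/sys/class/net"):
    """Return the phys_switch_id of ifname, or None if it is unavailable.

    Hypothetical helper for illustration; not the os-vif API.
    """
    path = os.path.join(sysfs_root, ifname, "phys_switch_id")
    try:
        with open(path, "r") as f:
            value = f.readline().strip()
    except OSError as exc:
        # ENOENT: the file does not exist at all.
        # EOPNOTSUPP: the file exists but the driver does not implement it
        # (the "[Errno 95] Operation not supported" seen above).
        if exc.errno in (errno.ENOENT, errno.EOPNOTSUPP):
            return None
        raise
    # An empty file is treated the same as a missing one.
    return value or None


def is_switchdev(ifname, sysfs_root="/sys/class/net"):
    """Treat an interface as switchdev only if a switch id can be read."""
    return read_phys_switch_id(ifname, sysfs_root) is not None
```

With a helper like this, an interface such as enp11s17 would simply be reported as not switchdev-capable instead of raising an exception during plugging.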


How reproducible: seems it should fail every time.


Steps to Reproduce:

Boot a VM with a 'direct' neutron port.

Comment 2 Brent Eagles 2017-08-29 15:00:04 UTC
Copied from Sean Mooney's comments on the u/s bug:

This is caused by trying to use SR-IOV passthrough on a host that does not support hardware offload of OVS.

The workaround is to list the sriovnic agent before ovs in the ml2 conf.

e.g. change /etc/neutron/plugins/ml2/ml2_conf.ini from
[ml2]
...
mechanism_drivers = openvswitch,sriovnicswitch

to

[ml2]
...
mechanism_drivers = sriovnicswitch,openvswitch
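
To check which ordering a given configuration uses, something like the following could be run against the file contents. It is a rough sketch using only the Python standard library; the function name sriov_listed_before_ovs is hypothetical, and it assumes the file parses as a plain INI file with an [ml2] section.

```python
import configparser


def sriov_listed_before_ovs(ml2_conf_text):
    """Return True if sriovnicswitch precedes openvswitch in mechanism_drivers.

    Returns None when either driver is not listed. Hypothetical helper for
    illustration only.
    """
    # Disable interpolation so '%' characters in other options are harmless.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ml2_conf_text)
    raw = parser.get("ml2", "mechanism_drivers", fallback="")
    drivers = [name.strip() for name in raw.split(",") if name.strip()]
    if "sriovnicswitch" not in drivers or "openvswitch" not in drivers:
        return None
    return drivers.index("sriovnicswitch") < drivers.index("openvswitch")
```

For the real file, the text would be read from /etc/neutron/plugins/ml2/ml2_conf.ini and passed in as a string.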

You might also want to make sure that supported_pci_vendor_devs
in the ml2_sriov section does not contain the vendor ID and product ID of
the VF used for OVS offload. This ensures that the sriovnic agent will
only manage interfaces that do not require OVS configuration.

If you had a NIC that supported OVS offload and it was enabled, then doing a PCI
passthrough of the device without os-vif plugging the NIC would result in a broken
dataplane, hence the reason for removing them from supported_pci_vendor_devs.

There is still a bug in os-vif here: we should first check that the file exists before trying to use it, so we should still harden the code. Let's keep this open to track that.

Comment 3 Brent Eagles 2017-08-30 15:55:32 UTC
A potential fix is being worked on:

https://review.openstack.org/#/c/499203/

The intent is to require users who wish to use the OVS SR-IOV offload feature to include "--binding-profile '{"capabilities": ["switchdev"]}'" when creating the port. Without this, the ovs mechanism driver will not attempt to bind and we will get the original intended behavior.
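
The intended decision can be sketched roughly as below. This is an illustration of the idea only, not the code in the patch under review: the function name ovs_should_bind is hypothetical, and treating every non-direct vnic type as bindable is a simplifying assumption.

```python
VNIC_DIRECT = "direct"


def ovs_should_bind(vnic_type, binding_profile):
    """Decide whether the ovs mechanism driver should attempt to bind a port.

    Hypothetical helper: a 'direct' port is bound by ovs only if the user
    explicitly requested the switchdev capability in the binding profile.
    """
    capabilities = (binding_profile or {}).get("capabilities", [])
    if vnic_type == VNIC_DIRECT:
        return "switchdev" in capabilities
    # Assumption for this sketch: other vnic types are unaffected.
    return True
```

Under this rule, a 'direct' port created without the binding profile is left for the sriovnicswitch driver, whatever the order of mechanism_drivers.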

I am testing this in my environment at the moment but it would be great if we can also test this patch in the original environment where it was found. Would this be possible Yariv?

Comment 4 Brent Eagles 2017-08-31 13:16:23 UTC
I applied the key part of the patch on the test system and the VM plugged fine. The guest is failing to get its IP address at the moment, but that is a different issue.

We'll need to get the upstream patch through the approval/merge process so we can backport to pike.

Comment 8 Assaf Muller 2017-09-16 17:01:18 UTC
Fix merged to master, Ihar is backporting to Pike.

Comment 11 Eran Kuris 2017-11-06 14:49:54 UTC
fix verified: openstack-neutron-11.0.2-0.20171020230401.el7ost.noarch

Comment 14 errata-xmlrpc 2017-12-13 21:58:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462
