Description of problem:
Attempting to plug a VF fails with the following stack trace in the nova compute logs:

2017-08-28 17:50:34.716 2843 ERROR os_vif [req-9fe05e3e-f7ae-4b2d-be27-90d81fe0b9fd 66e36d5620c24020ac6fa6fb8e580b6c df21f729c47347b299783a4c1f83e774 - default default] Failed to plug vif VIFHostDevice(active=False,address=fa:16:3e:de:b2:7d,dev_address=0000:0b:11.0,dev_type='ethernet',has_traffic_filtering=True,id=b5858ca0-c315-4b2a-b1a9-82a5b508bf2f,network=Network(19c75cc1-a553-4d3d-9a1a-9ad010102e31),plugin='ovs',port_profile=VIFPortProfileOVSRepresentor,preserve_on_delete=True): PciDeviceNotFoundById: PCI device 0000:0b:11.0 not found
2017-08-28 17:50:34.716 2843 ERROR os_vif Traceback (most recent call last):
2017-08-28 17:50:34.716 2843 ERROR os_vif   File "/usr/lib/python2.7/site-packages/os_vif/__init__.py", line 77, in plug
2017-08-28 17:50:34.716 2843 ERROR os_vif     plugin.plug(vif, instance_info)
2017-08-28 17:50:34.716 2843 ERROR os_vif   File "/usr/lib/python2.7/site-packages/vif_plug_ovs/ovs.py", line 191, in plug
2017-08-28 17:50:34.716 2843 ERROR os_vif     self._plug_vf_passthrough(vif, instance_info)
2017-08-28 17:50:34.716 2843 ERROR os_vif   File "/usr/lib/python2.7/site-packages/vif_plug_ovs/ovs.py", line 163, in _plug_vf_passthrough
2017-08-28 17:50:34.716 2843 ERROR os_vif     pci_slot, pf_interface=True, switchdev=True)
2017-08-28 17:50:34.716 2843 ERROR os_vif   File "/usr/lib/python2.7/site-packages/vif_plug_ovs/linux_net.py", line 373, in get_ifname_by_pci_address
2017-08-28 17:50:34.716 2843 ERROR os_vif     raise exception.PciDeviceNotFoundById(id=pci_addr)
2017-08-28 17:50:34.716 2843 ERROR os_vif PciDeviceNotFoundById: PCI device 0000:0b:11.0 not found
2017-08-28 17:50:34.716 2843 ERROR os_vif

It appears that patch https://review.openstack.org/#/c/484051/ altered get_ifname_by_pci_address() to always run a new helper function, _is_switchdev(), apparently on the assumption that switchdev should always be True. This causes plugging VFs to fail on systems whose drivers do not support a readable phys_switch_id.

I ran the code interactively on the host system to determine the actual exception:

>>> f = open('/sys/class/net/enp11s17/phys_switch_id', 'r')
>>> print f.readline()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: [Errno 95] Operation not supported

From what I can tell, this should also cause plugging to fail on systems that have no phys_switch_id file at all.

How reproducible:
It seems it should fail every time.

Steps to Reproduce:
Boot a VM with a 'direct' neutron port.
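For illustration, here is a minimal sketch of a more tolerant switchdev check. It is not the os-vif code: the function name is_switchdev and the sysfs_root parameter are hypothetical, and treating ENOENT and EOPNOTSUPP as "not switchdev" is an assumption based on the two failure cases described above (missing file, and a file the driver cannot read).

```python
import errno
import os


def is_switchdev(ifname, sysfs_root="/sys/class/net"):
    """Return True only if phys_switch_id is readable and non-empty.

    Hypothetical helper for illustration; not the os-vif implementation.
    """
    path = os.path.join(sysfs_root, ifname, "phys_switch_id")
    try:
        with open(path, "r") as fd:
            return bool(fd.readline().strip())
    except (IOError, OSError) as exc:
        # ENOENT: the file (or interface) does not exist.
        # EOPNOTSUPP: the driver exposes the file but cannot report an ID
        # (errno 95 on Linux, as in the interactive session above).
        if exc.errno in (errno.ENOENT, errno.EOPNOTSUPP):
            return False
        raise
```

With a check like this, a caller could fall back to the non-switchdev lookup instead of raising PciDeviceNotFoundById when the answer is False.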
Copied from Sean Mooney's comments on the u/s bug:

This is caused by trying to use SR-IOV passthrough on a host that does not support hardware offload of OVS. The workaround is to list the sriovnic agent before ovs in the ml2 conf, e.g. change /etc/neutron/plugins/ml2/ml2_conf.ini from

[ml2]
...
mechanism_drivers = openvswitch,sriovnicswitch

to

[ml2]
...
mechanism_drivers = sriovnicswitch,openvswitch

You might also want to make sure that supported_pci_vendor_devs in the ml2_sriov section does not contain the vendor ID and product ID of the VF used for OVS offload. This will ensure that the sriovnic agent only manages interfaces that do not require OVS configuration. If you had a NIC that supported OVS offload and it was enabled, then doing a PCI passthrough of the device without os-vif plugging the NIC would result in a broken dataplane, hence the reason for removing them from supported_pci_vendor_devs.

There is still a bug in os-vif here: we should first check that the file exists before trying to use it, so we should still harden the code. Let's keep this open to track that.
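As a quick way to check whether a host already has the workaround ordering, the following sketch parses the text of an ml2_conf.ini and reports whether sriovnicswitch comes before openvswitch. The function name is hypothetical, it uses Python 3's standard configparser rather than anything from neutron, and it assumes the file is plain INI that configparser can read.

```python
import configparser


def sriov_listed_before_ovs(ml2_conf_text):
    """Return True/False for the driver order, or None if either is absent.

    Hypothetical helper for illustration only.
    """
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(ml2_conf_text)
    raw = parser.get("ml2", "mechanism_drivers", fallback="")
    drivers = [d.strip() for d in raw.split(",") if d.strip()]
    if "sriovnicswitch" not in drivers or "openvswitch" not in drivers:
        return None
    return drivers.index("sriovnicswitch") < drivers.index("openvswitch")
```

It could be called with the contents of /etc/neutron/plugins/ml2/ml2_conf.ini read from disk; a result of False corresponds to the ordering that triggers this bug.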
A potential fix is being worked on: https://review.openstack.org/#/c/499203/

The intent is to require users who wish to use the OVS SR-IOV offload feature to include "--binding-profile '{"capabilities": ["switchdev"]}'" when creating the port. Without this, the ovs mechanism driver will not attempt to bind and we will get the originally intended behavior.

I am testing this in my environment at the moment, but it would be great if we could also test this patch in the original environment where the problem was found. Would this be possible, Yariv?
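The opt-in described above amounts to a check on the port's binding profile. The sketch below shows that check in isolation; it is not taken from the neutron patch, and the function name requests_switchdev is hypothetical. It accepts the profile either as a dict or as the JSON string passed on the command line.

```python
import json


def requests_switchdev(binding_profile):
    """Return True if the binding profile lists the 'switchdev' capability.

    Hypothetical helper for illustration; not the code from the review above.
    """
    if isinstance(binding_profile, str):
        # An empty string is treated as an empty profile.
        binding_profile = json.loads(binding_profile) if binding_profile else {}
    capabilities = (binding_profile or {}).get("capabilities", [])
    return "switchdev" in capabilities
```

Under this scheme, a 'direct' port created without the capability would be left to the SR-IOV mechanism driver, which matches the behavior before the offload feature was added.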
I applied the key part of the patch on the test system and the VM plugged fine. The guest is failing to get its IP address at the moment, but that is a different issue. We'll need to get the upstream patch through the approval/merge process so we can backport it to Pike.
Fix merged to master, Ihar is backporting to Pike.
Fix verified: openstack-neutron-11.0.2-0.20171020230401.el7ost.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:3462