Bug 1390576 - Couldn't boot several PFs from separate physical interfaces
Summary: Couldn't boot several PFs from separate physical interfaces
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: 10.0 (Newton)
Assignee: Vladik Romanovsky
QA Contact: Yariv
URL:
Whiteboard:
Duplicates: 1391680
Depends On:
Blocks:
Reported: 2016-11-01 12:42 UTC by Eyal Dannon
Modified: 2019-09-09 15:13 UTC
CC List: 21 users

Fixed In Version: openstack-nova-14.0.1-6.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-12-14 16:27:18 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Launchpad | 1613434 | 0 | None | None | None | 2016-11-02 13:36:59 UTC
Launchpad | 1618984 | 0 | None | None | None | 2016-11-02 13:36:30 UTC
Red Hat Product Errata | RHEA-2016:2948 | 0 | normal | SHIPPED_LIVE | Red Hat OpenStack Platform 10 enhancement update | 2016-12-14 19:55:27 UTC

Description Eyal Dannon 2016-11-01 12:42:26 UTC
Description of problem:

As part of OSPd10 + SR-IOV verification, I tried to boot 2 VMs with PFs as their interfaces, each on a separate physical network.
I couldn't boot an instance because of a nova-scheduler PciPassthroughFilter issue. Here are the "pci_passthrough_whitelist" combinations I tried:

#Only VF works#
pci_passthrough_whitelist=[{"devname": "p6p1", "physical_network": "tenant"}, {"devname": "p6p2", "physical_network": "tenant2"}]

#Only VF works#
pci_passthrough_whitelist=[{"vendor_id": "8086", "product_id": "154d", "physical_network": "tenant", "address": "0000:06:00.0"},{"vendor_id": "8086", "product_id":"10ed", "physical_network": "tenant", "address": "0000:06:00.0"}]

#PF+VF works while using 1 physical network#
pci_passthrough_whitelist=[{"vendor_id": "8086", "product_id": "154d", "physical_network": "tenant"},{"vendor_id": "8086", "product_id":"10ed", "physical_network": "tenant"}]

#Only 1 PF boots (from the first physical_network)#
pci_passthrough_whitelist=[{"vendor_id": "8086", "product_id": "154d", "physical_network": "tenant"},{"vendor_id": "8086", "product_id":"10ed", "physical_network": "tenant"},{"vendor_id": "8086", "product_id": "154d", "physical_network": "tenant2"},{"vendor_id": "8086", "product_id":"10ed", "physical_network": "tenant2"}]

#Only 1 PF boots (from tenant2)#
pci_passthrough_whitelist=[{"vendor_id": "8086", "product_id": "154d", "physical_network": "tenant2"},{"vendor_id": "8086", "product_id":"10ed", "physical_network": "tenant2"},{"vendor_id": "8086", "product_id": "154d", "physical_network": "tenant"},{"vendor_id": "8086", "product_id":"10ed", "physical_network": "tenant"}]

#Couldn't boot any VM with PF#
pci_passthrough_whitelist=[{"vendor_id": "8086", "product_id": "154d"},{"vendor_id": "8086", "product_id":"10ed"}]

#PF won't boot#
pci_passthrough_whitelist=[{"vendor_id": "8086", "product_id": "154d", "physical_network": "tenant"},{"vendor_id": "8086", "product_id":"154d", "physical_network": "tenant2"}]
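The "Only 1 PF boots" results are consistent with a first-match rule: each device is tagged with the physical_network of the first whitelist entry it matches. The Python sketch below models that rule. It is an assumption about nova's behavior, not nova's code. It also assumes that 8086:154d is the PF model and 8086:10ed the VF model, and the two PF addresses are hypothetical. Using the first "Only 1 PF boots" whitelist, both PFs pick up "tenant", so a request for a PF on "tenant2" has nothing to match.

```python
# Hypothetical PFs on one compute node: same Intel model (8086:154d),
# cabled to different physical networks. Addresses are illustrative.
DEVICES = [
    {"vendor_id": "8086", "product_id": "154d", "address": "0000:06:00.0"},
    {"vendor_id": "8086", "product_id": "154d", "address": "0000:06:00.1"},
]

# The first "Only 1 PF boots" whitelist from this report, as Python dicts.
WHITELIST = [
    {"vendor_id": "8086", "product_id": "154d", "physical_network": "tenant"},
    {"vendor_id": "8086", "product_id": "10ed", "physical_network": "tenant"},
    {"vendor_id": "8086", "product_id": "154d", "physical_network": "tenant2"},
    {"vendor_id": "8086", "product_id": "10ed", "physical_network": "tenant2"},
]


def entry_matches(entry, device):
    """True if every match key in the entry equals the device's value.

    physical_network is a tag applied to matching devices, not a criterion.
    """
    return all(
        device.get(key) == value
        for key, value in entry.items()
        if key != "physical_network"
    )


def tag_devices(whitelist, devices):
    """Assumed rule: tag each device with the FIRST entry it matches."""
    tags = {}
    for device in devices:
        for entry in whitelist:
            if entry_matches(entry, device):
                tags[device["address"]] = entry.get("physical_network")
                break  # later entries are never consulted for this device
    return tags


if __name__ == "__main__":
    print(tag_devices(WHITELIST, DEVICES))
    # {'0000:06:00.0': 'tenant', '0000:06:00.1': 'tenant'}
```

Swapping the two halves of the list (the "from tenant2" variant) tags both PFs "tenant2" instead, which matches the reporter's observation that the order decides which physical network gets a working PF.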

Version-Release number of selected component (if applicable):

openstack-nova-novncproxy-14.0.0-0.20160929203854.59653c6.el7ost.noarch
openstack-nova-api-14.0.0-0.20160929203854.59653c6.el7ost.noarch
openstack-nova-console-14.0.0-0.20160929203854.59653c6.el7ost.noarch
openstack-nova-common-14.0.0-0.20160929203854.59653c6.el7ost.noarch
openstack-nova-scheduler-14.0.0-0.20160929203854.59653c6.el7ost.noarch
openstack-nova-conductor-14.0.0-0.20160929203854.59653c6.el7ost.noarch
openstack-nova-cert-14.0.0-0.20160929203854.59653c6.el7ost.noarch
openstack-nova-compute-14.0.0-0.20160929203854.59653c6.el7ost.noarch


How reproducible:
Always

Steps to Reproduce:
1. Set up an OSPd10 + SR-IOV environment
2. Set "pci_passthrough_whitelist" as shown above
3. Boot an instance with a direct-physical port as its interface

Actual results:
The VM goes into ERROR state

Expected results:
The VM should boot to ACTIVE, RUNNING state

Additional info:
Error example from /var/log/nova/nova-scheduler.log:

2016-11-01 09:50:23.161 2804 INFO nova.filters [req-893f7cfb-5c79-413b-b70d-a20442fe411f 7cdae8e58bd240868fea4cf8f10f4ed7 16d7d8002a954229bee64f24e9ec25bd - - -] Filtering removed all hosts for the request with instance ID 'a736782b-cb40-4ecd-bad5-93cf8f80fbab'. Filter results: ['AvailabilityZoneFilter: (start: 1, end: 1)', 'RamFilter: (start: 1, end: 1)', 'ComputeFilter: (start: 1, end: 1)', 'ComputeCapabilitiesFilter: (start: 1, end: 1)', 'ImagePropertiesFilter: (start: 1, end: 1)', 'ServerGroupAntiAffinityFilter: (start: 1, end: 1)', 'ServerGroupAffinityFilter: (start: 1, end: 1)', 'PciPassthroughFilter: (start: 1, end: 0)']
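To see which scheduler filter eliminated the host, compare each filter's start and end host counts in the "Filter results" list. The Python sketch below does that with a regular expression over the text of the log line. The function name is hypothetical, and the pattern assumes the "Name: (start: N, end: M)" format shown above.

```python
import re

# The "Filter results" portion of the nova-scheduler log line above.
FILTER_RESULTS = (
    "['AvailabilityZoneFilter: (start: 1, end: 1)', 'RamFilter: (start: 1, end: 1)', "
    "'ComputeFilter: (start: 1, end: 1)', 'ComputeCapabilitiesFilter: (start: 1, end: 1)', "
    "'ImagePropertiesFilter: (start: 1, end: 1)', "
    "'ServerGroupAntiAffinityFilter: (start: 1, end: 1)', "
    "'ServerGroupAffinityFilter: (start: 1, end: 1)', "
    "'PciPassthroughFilter: (start: 1, end: 0)']"
)

# One filter entry: its name, hosts before the filter, hosts after it.
FILTER_ENTRY = re.compile(r"(\w+): \(start: (\d+), end: (\d+)\)")


def hosts_removed_by_filter(log_text):
    """Return (filter_name, hosts_before, hosts_after) for filters that dropped hosts."""
    removed = []
    for name, start, end in FILTER_ENTRY.findall(log_text):
        if int(end) < int(start):
            removed.append((name, int(start), int(end)))
    return removed


if __name__ == "__main__":
    print(hosts_removed_by_filter(FILTER_RESULTS))
    # [('PciPassthroughFilter', 1, 0)]
```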

Comment 1 Vladik Romanovsky 2016-11-01 14:33:12 UTC
Hi Eyal,

Yes, currently it's not possible to distinguish between Physical Functions that share the same vendor/product id on a host.

We will need to wait until the following patch is merged:

https://review.openstack.org/#/c/363884

Then we will be able to whitelist these devices according to their PCI addresses.

For even more fine-grained control, we will need to wait for the following spec to be implemented in the Ocata release:

https://review.openstack.org/#/c/350211

Thanks,
Vladik
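Once PFs can be whitelisted by PCI address, each PF can get its own entry, so two PFs of the same model no longer share one entry. The Python sketch below generates such a "pci_passthrough_whitelist" value from an address-to-physical-network map. It assumes the address-based PF whitelisting from the patch above is available. The addresses and helper names are hypothetical.

```python
import json

# Hypothetical mapping of PF PCI addresses to physical networks.
PF_PHYSNETS = {
    "0000:06:00.0": "tenant",
    "0000:06:00.1": "tenant2",
}


def build_whitelist(pf_physnets):
    """One whitelist entry per PF, keyed on its PCI address."""
    return [
        {"address": address, "physical_network": physnet}
        for address, physnet in sorted(pf_physnets.items())
    ]


def render_option(whitelist):
    """Render the entries in the same JSON-list form used in this report."""
    return "pci_passthrough_whitelist=" + json.dumps(whitelist)


if __name__ == "__main__":
    print(render_option(build_whitelist(PF_PHYSNETS)))
```

An address identifies exactly one device, so each PF falls under exactly one entry and keeps its own physical_network tag.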

Comment 3 Brent Eagles 2016-11-02 13:47:11 UTC
Another reason we'll need PCI address control: if a system has only SR-IOV cards of the same make/model, but only a few are used for guests while the others are used for control plane/management/storage/etc., it is very possible that OpenStack will allocate one of the host's own devices to a guest, potentially disconnecting the compute node from the cloud. I think we need some kind of release note strongly warning users NOT to use PFs if they are using similar NIC hardware for multiple purposes on their hypervisor nodes.
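The risk can be checked before deploying by testing whether any whitelist entry also matches a device the host depends on. The Python sketch below uses a hypothetical inventory in which a same-model NIC carries the management network. It uses exact-match keys only. Nova's real matcher also handles wildcards and devname, which this sketch ignores. The "role" field and all addresses are hypothetical.

```python
# Hypothetical inventory: two guest PFs plus a same-model NIC that the
# host uses for management. "role" is an illustrative field, not nova's.
HOST_DEVICES = {
    "0000:06:00.0": {"vendor_id": "8086", "product_id": "154d", "role": "guest"},
    "0000:06:00.1": {"vendor_id": "8086", "product_id": "154d", "role": "guest"},
    "0000:81:00.0": {"vendor_id": "8086", "product_id": "154d", "role": "management"},
}


def entry_matches(entry, address, device):
    """Exact-match check of one whitelist entry against one device."""
    for key, value in entry.items():
        if key == "physical_network":
            continue  # a tag applied to matches, not a match criterion
        actual = address if key == "address" else device.get(key)
        if actual != value:
            return False
    return True


def host_devices_at_risk(whitelist, devices):
    """Addresses of non-guest devices that some whitelist entry would expose."""
    return sorted(
        address
        for address, device in devices.items()
        if device["role"] != "guest"
        and any(entry_matches(entry, address, device) for entry in whitelist)
    )


BY_MODEL = [{"vendor_id": "8086", "product_id": "154d", "physical_network": "tenant"}]
BY_ADDRESS = [
    {"address": "0000:06:00.0", "physical_network": "tenant"},
    {"address": "0000:06:00.1", "physical_network": "tenant2"},
]

if __name__ == "__main__":
    print(host_devices_at_risk(BY_MODEL, HOST_DEVICES))    # ['0000:81:00.0']
    print(host_devices_at_risk(BY_ADDRESS, HOST_DEVICES))  # []
```

A whitelist keyed only on vendor/product ID also captures the management NIC. The address-keyed version leaves it alone.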

Comment 4 Sahid Ferdjaoui 2016-11-04 12:41:16 UTC
An attempt to backport it to Newton:

   https://review.openstack.org/#/c/393752/

Comment 10 Stephen Gordon 2016-11-09 00:35:30 UTC
*** Bug 1391680 has been marked as a duplicate of this bug. ***

Comment 13 Ziv Greenberg 2016-11-16 15:47:20 UTC
Hi,

I have verified this bug.
I was able to boot two VMs using PFs that share the same vendor/product ID on the host.


Thank you,
Ziv

Comment 15 errata-xmlrpc 2016-12-14 16:27:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2948.html

