Bug 1217493 - Booting instance in nova with pci passthrough will sometimes fail because VF has already been assigned
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: async
Target Release: 7.0 (Kilo)
Assignee: Vladik Romanovsky
QA Contact: Prasanth Anbalagan
URL:
Whiteboard:
Depends On:
Blocks: 1340470
Reported: 2015-04-30 13:42 UTC by Sean Toner
Modified: 2020-03-11 14:54 UTC (History)

Fixed In Version: openstack-nova-2015.1.4-3.el7ost
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1340470 (view as bug list)
Environment:
Last Closed: 2016-06-23 17:36:19 UTC
Target Upstream Version:
Embargoed:


Attachments
How to do pci passthrough and boot instance (17.47 KB, text/plain), 2015-04-30 13:43 UTC, Sean Toner


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2016:1313 0 normal SHIPPED_LIVE openstack-nova bug fix advisory 2016-06-23 21:36:11 UTC

Description Sean Toner 2015-04-30 13:42:08 UTC
Description of problem:
=======================

While trying to create several nova instances requesting a PCI passthrough device, one of the instances failed to boot with the following error:

2015-04-30 05:19:48.035 27033 ERROR nova.compute.manager [-] [instance: cb7bda2d-7b92-4c77-94a3-753389991b2b] Instance failed to spawn
2015-04-30 05:19:48.035 27033 TRACE nova.compute.manager [instance: cb7bda2d-7b92-4c77-94a3-753389991b2b] Traceback (most recent call last):
2015-04-30 05:19:48.035 27033 TRACE nova.compute.manager [instance: cb7bda2d-7b92-4c77-94a3-753389991b2b]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2288, in _build_resources
2015-04-30 05:19:48.035 27033 TRACE nova.compute.manager [instance: cb7bda2d-7b92-4c77-94a3-753389991b2b]     yield resources
2015-04-30 05:19:48.035 27033 TRACE nova.compute.manager [instance: cb7bda2d-7b92-4c77-94a3-753389991b2b]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2158, in _build_and_run_instance
2015-04-30 05:19:48.035 27033 TRACE nova.compute.manager [instance: cb7bda2d-7b92-4c77-94a3-753389991b2b]     block_device_info=block_device_info)
2015-04-30 05:19:48.035 27033 TRACE nova.compute.manager [instance: cb7bda2d-7b92-4c77-94a3-753389991b2b]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2635, in spawn
2015-04-30 05:19:48.035 27033 TRACE nova.compute.manager [instance: cb7bda2d-7b92-4c77-94a3-753389991b2b]     block_device_info, disk_info=disk_info)
2015-04-30 05:19:48.035 27033 TRACE nova.compute.manager [instance: cb7bda2d-7b92-4c77-94a3-753389991b2b]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 4558, in _create_domain_and_network
2015-04-30 05:19:48.035 27033 TRACE nova.compute.manager [instance: cb7bda2d-7b92-4c77-94a3-753389991b2b]     power_on=power_on)
2015-04-30 05:19:48.035 27033 TRACE nova.compute.manager [instance: cb7bda2d-7b92-4c77-94a3-753389991b2b]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 4482, in _create_domain
2015-04-30 05:19:48.035 27033 TRACE nova.compute.manager [instance: cb7bda2d-7b92-4c77-94a3-753389991b2b]     LOG.error(err)
2015-04-30 05:19:48.035 27033 TRACE nova.compute.manager [instance: cb7bda2d-7b92-4c77-94a3-753389991b2b]   File "/usr/lib/python2.7/site-packages/nova/openstack/common/excutils.py", line 82, in __exit__
2015-04-30 05:19:48.035 27033 TRACE nova.compute.manager [instance: cb7bda2d-7b92-4c77-94a3-753389991b2b]     six.reraise(self.type_, self.value, self.tb)
2015-04-30 05:19:48.035 27033 TRACE nova.compute.manager [instance: cb7bda2d-7b92-4c77-94a3-753389991b2b]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 4472, in _create_domain
2015-04-30 05:19:48.035 27033 TRACE nova.compute.manager [instance: cb7bda2d-7b92-4c77-94a3-753389991b2b]     domain.createWithFlags(launch_flags)
2015-04-30 05:19:48.035 27033 TRACE nova.compute.manager [instance: cb7bda2d-7b92-4c77-94a3-753389991b2b]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 183, in doit
2015-04-30 05:19:48.035 27033 TRACE nova.compute.manager [instance: cb7bda2d-7b92-4c77-94a3-753389991b2b]     result = proxy_call(self._autowrap, f, *args, **kwargs)
2015-04-30 05:19:48.035 27033 TRACE nova.compute.manager [instance: cb7bda2d-7b92-4c77-94a3-753389991b2b]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 141, in proxy_call
2015-04-30 05:19:48.035 27033 TRACE nova.compute.manager [instance: cb7bda2d-7b92-4c77-94a3-753389991b2b]     rv = execute(f, *args, **kwargs)
2015-04-30 05:19:48.035 27033 TRACE nova.compute.manager [instance: cb7bda2d-7b92-4c77-94a3-753389991b2b]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 122, in execute
2015-04-30 05:19:48.035 27033 TRACE nova.compute.manager [instance: cb7bda2d-7b92-4c77-94a3-753389991b2b]     six.reraise(c, e, tb)
2015-04-30 05:19:48.035 27033 TRACE nova.compute.manager [instance: cb7bda2d-7b92-4c77-94a3-753389991b2b]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 80, in tworker
2015-04-30 05:19:48.035 27033 TRACE nova.compute.manager [instance: cb7bda2d-7b92-4c77-94a3-753389991b2b]     rv = meth(*args, **kwargs)
2015-04-30 05:19:48.035 27033 TRACE nova.compute.manager [instance: cb7bda2d-7b92-4c77-94a3-753389991b2b]   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 996, in createWithFlags
2015-04-30 05:19:48.035 27033 TRACE nova.compute.manager [instance: cb7bda2d-7b92-4c77-94a3-753389991b2b]     if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
2015-04-30 05:19:48.035 27033 TRACE nova.compute.manager [instance: cb7bda2d-7b92-4c77-94a3-753389991b2b] libvirtError: Requested operation is not valid: PCI device 0000:07:10.3 is in use by driver QEMU, domain instance-00000003


It appears that the database is not tracking which VFs have already been assigned.
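
When triaging many compute logs for this failure, it can help to pull the contested device and the owning domain out of the libvirt message. The following is a minimal sketch, assuming the message wording shown in the traceback above; other libvirt versions may phrase it differently, and the function name parse_in_use_error is made up for illustration.

```python
import re

# Pattern for the libvirt message quoted in the traceback above.
# The wording is taken from this report; it is an assumption that other
# libvirt versions use the same phrasing.
IN_USE_RE = re.compile(
    r"PCI device (?P<address>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]) "
    r"is in use by driver (?P<driver>[^,\s]+), domain (?P<domain>\S+)"
)


def parse_in_use_error(message):
    """Return (address, driver, domain) if the message matches, else None."""
    match = IN_USE_RE.search(message)
    if match is None:
        return None
    return match.group("address"), match.group("driver"), match.group("domain")


line = ("libvirtError: Requested operation is not valid: PCI device "
        "0000:07:10.3 is in use by driver QEMU, domain instance-00000003")
print(parse_in_use_error(line))
```

The returned address can then be compared with the rows in the pci_devices table for the same compute node.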


Version-Release number of selected component (if applicable):
=============================================================

openstack-nova-novncproxy-2014.2.3-9.el7ost.noarch
python-nova-2014.2.3-9.el7ost.noarch
python-novaclient-2.20.0-1.el7ost.noarch
openstack-nova-console-2014.2.3-9.el7ost.noarch
openstack-nova-scheduler-2014.2.3-9.el7ost.noarch
openstack-nova-api-2014.2.3-9.el7ost.noarch
openstack-nova-conductor-2014.2.3-9.el7ost.noarch
openstack-nova-cert-2014.2.3-9.el7ost.noarch
openstack-nova-common-2014.2.3-9.el7ost.noarch
openstack-nova-compute-2014.2.3-9.el7ost.noarch


How reproducible:
=================

Which VF nova selects appears to be random, so the number of boot attempts needed to hit this depends on how many VFs are available for the PCI device. To reproduce it quickly, I recommend configuring the Ethernet driver with a small number of VFs. For example, on Intel NICs:

modprobe igb max_vfs=2

That will set 2 VFs per PF, so a 2-port NIC will provide 4 VFs.


Steps to Reproduce:
===================
1. Follow the how-to directions attached
2. Start booting instances off of the flavor one by one

Actual results:
===============

Nova may try to create an instance using a VF that has already been assigned to another instance, so creation fails.


Expected results:
=================

You should always be able to create as many instances with the requested PCI devices as there are VFs.  For example, if your flavor has extra_specs of pci_passthrough:alias="my_igb:2"  and you have 8 VFs, you should always be able to create 4 instances with that flavor.
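
The expected capacity above is simple integer division. A small sketch, assuming the alias value holds a single "name:count" pair as in the example (a value listing several aliases is not handled here); the helper names are made up for illustration.

```python
def parse_alias_spec(spec):
    """Split a single 'name:count' alias value such as 'my_igb:2'."""
    name, _, count = spec.partition(":")
    return name.strip(), int(count)


def max_instances(total_vfs, alias_spec):
    """Number of instances that fit if each one consumes `count` VFs."""
    _, per_instance = parse_alias_spec(alias_spec)
    return total_vfs // per_instance


# 8 VFs, 2 VFs per instance, as in the expected-results example
print(max_instances(8, "my_igb:2"))
```

With 8 VFs and "my_igb:2" this gives 4, matching the number of instances stated above.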

Comment 3 Sean Toner 2015-04-30 13:43:06 UTC
Created attachment 1020593 [details]
How to do pci passthrough and boot instance

Comment 4 Stephen Gordon 2015-06-04 06:04:19 UTC
Does it appear to make a difference how many seconds elapse between instance creation requests?

Comment 21 Jeremy 2016-02-19 20:43:05 UTC
I'm not sure if this helps, but these are my findings:


*****pci device  domain="0x0000" bus="0x06" slot="0x11" function="0x7" or 0000:06:11.7 is assigned to instance-00000594

2016-02-19 08:21:59.669 47185 DEBUG nova.virt.libvirt.config [req-74ab5a83-a732-4ff2-bc9a-481ac711dc97 233a8c747d034e3a87b03f10663df397 d86bd63166e0413e97eca88a1cd39639 - - -] Generated XML ('<domain type="kvm">\n  <uuid>680dad83-a9f5-418d-a1f9-5938f84c5306</uuid>\n  <name>instance-00000594</name>\n  <memory>134217728</memory>\n  <vcpu>8</vcpu>\n  <metadata>\n    <nova:instance xmlns:nova="http://openstack.org/xmlns/libvirt/nova/1.0">\n      <nova:package version="2015.1.1-3.el7ost"/>\n      <nova:name>FRIPCRF1v-PCF-L-CI-ENT-MD--1-SM-01</nova:name>\n      <nova:creationTime>2016-02-19 16:21:59</nova:creationTime>\n      <nova:flavor name="PCF-CI-ENT-SESSIONMGR">\n        <nova:memory>131072</nova:memory>\n        <nova:disk>0</nova:disk>\n        <nova:swap>0</nova:swap>\n        <nova:ephemeral>0</nova:ephemeral>\n        <nova:vcpus>8</nova:vcpus>\n      </nova:flavor>\n      <nova:owner>\n        <nova:user uuid="233a8c747d034e3a87b03f10663df397">ericsson-1</nova:user>\n        <nova:project uuid="d86bd63166e0413e97eca88a1cd39639">ericsson-orchestrator</nova:project>\n      </nova:owner>\n      <nova:root type="image" uuid="9a07966d-b4ac-479a-9d35-f804eaf5df91"/>\n    </nova:instance>\n  </metadata>\n  <sysinfo type="smbios">\n    <system>\n      <entry name="manufacturer">Red Hat</entry>\n      <entry name="product">OpenStack Compute</entry>\n      <entry name="version">2015.1.1-3.el7ost</entry>\n      <entry name="serial">21be02b4-12a0-4541-9db7-f9e32150a2dd</entry>\n      <entry name="uuid">680dad83-a9f5-418d-a1f9-5938f84c5306</entry>\n    </system>\n  </sysinfo>\n  <os>\n    <type>hvm</type>\n    <boot dev="hd"/>\n    <smbios mode="sysinfo"/>\n  </os>\n  <features>\n    <acpi/>\n    <apic/>\n  </features>\n  <cputune>\n    <shares>8192</shares>\n  </cputune>\n  <clock offset="utc">\n    <timer name="pit" tickpolicy="delay"/>\n    <timer name="rtc" tickpolicy="catchup"/>\n    <timer name="hpet" present="no"/>\n  </clock>\n  <cpu mode="host-model" 
match="exact">\n    <topology sockets="8" cores="1" threads="1"/>\n  </cpu>\n  <devices>\n    <disk type="file" device="disk">\n      <driver name="qemu" type="qcow2" cache="none"/>\n      <source file="/var/lib/nova/instances/680dad83-a9f5-418d-a1f9-5938f84c5306/disk"/>\n      <target bus="virtio" dev="vda"/>\n    </disk>\n    <disk type="file" device="cdrom">\n      <driver name="qemu" type="raw" cache="none"/>\n      <source file="/var/lib/nova/instances/680dad83-a9f5-418d-a1f9-5938f84c5306/disk.config"/>\n      <target bus="ide" dev="hdd"/>\n    </disk>\n    <interface type="bridge">\n      <mac address="fa:16:3e:6b:d0:b6"/>\n      <model type="virtio"/>\n      <source bridge="qbrbbbaee14-6e"/>\n      <target dev="tapbbbaee14-6e"/>\n    </interface>\n    <interface type="hostdev" managed="yes">\n      <mac address="fa:16:3e:99:bf:38"/>\n      <source>\n        <address type="pci" domain="0x0000" bus="0x06" slot="0x11" function="0x7"/>\n      </source>\n      <vlan>\n        <tag id="0"/>\n      </vlan>\n    </interface>\n    <interface type="hostdev" managed="yes">\n      <mac address="fa:16:3e:30:ea:72"/>\n      <source>\n        <address type="pci" domain="0x0000" bus="0x06" slot="0x12" function="0x5"/>\n      </source>\n      <vlan>\n        <tag id="0"/>\n      </vlan>\n    </interface>\n    <serial type="file">\n      <source path="/var/lib/nova/instances/680dad83-a9f5-418d-a1f9-5938f84c5306/console.log"/>\n    </serial>\n    <serial type="pty"/>\n    <input type="tablet" bus="usb"/>\n    <graphics type="vnc" autoport="yes" keymap="en-us" listen="0.0.0.0"/>\n    <video>\n      <model type="cirrus"/>\n    </video>\n    <memballoon model="virtio">\n      <stats period="10"/>\n    </memballoon>\n  </devices>\n</domain>\n',)  to_xml /usr/lib/python2.7/site-packages/nova/virt/libvirt/config.py:82

2016-02-19 08:21:59.705 47185 DEBUG nova.virt.libvirt.driver [req-74ab5a83-a732-4ff2-bc9a-481ac711dc97 233a8c747d034e3a87b03f10663df397 d86bd63166e0413e97eca88a1cd39639 - - -] [instance: 680dad83-a9f5-418d-a1f9-5938f84c5306] End _get_guest_xml xml=<domain type="kvm">
  <uuid>680dad83-a9f5-418d-a1f9-5938f84c5306</uuid>
  <name>instance-00000594</name>
  <memory>134217728</memory>
  <vcpu>8</vcpu>
  <metadata>
    <nova:instance xmlns:nova="http://openstack.org/xmlns/libvirt/nova/1.0">
      <nova:package version="2015.1.1-3.el7ost"/>
      <nova:name>FRIPCRF1v-PCF-L-CI-ENT-MD--1-SM-01</nova:name>
      <nova:creationTime>2016-02-19 16:21:59</nova:creationTime>
      <nova:flavor name="PCF-CI-ENT-SESSIONMGR">
        <nova:memory>131072</nova:memory>
        <nova:disk>0</nova:disk>
        <nova:swap>0</nova:swap>
        <nova:ephemeral>0</nova:ephemeral>
        <nova:vcpus>8</nova:vcpus>
      </nova:flavor>
      <nova:owner>
        <nova:user uuid="233a8c747d034e3a87b03f10663df397">ericsson-1</nova:user>
        <nova:project uuid="d86bd63166e0413e97eca88a1cd39639">ericsson-orchestrator</nova:project>
      </nova:owner>
      <nova:root type="image" uuid="9a07966d-b4ac-479a-9d35-f804eaf5df91"/>
    </nova:instance>
  </metadata>
  <sysinfo type="smbios">
    <system>
      <entry name="manufacturer">Red Hat</entry>
      <entry name="product">OpenStack Compute</entry>
      <entry name="version">2015.1.1-3.el7ost</entry>
      <entry name="serial">21be02b4-12a0-4541-9db7-f9e32150a2dd</entry>
      <entry name="uuid">680dad83-a9f5-418d-a1f9-5938f84c5306</entry>
    </system>
  </sysinfo>
  <os>
    <type>hvm</type>
    <boot dev="hd"/>
    <smbios mode="sysinfo"/>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cputune>
    <shares>8192</shares>
  </cputune>
  <clock offset="utc">
    <timer name="pit" tickpolicy="delay"/>
    <timer name="rtc" tickpolicy="catchup"/>
    <timer name="hpet" present="no"/>
  </clock>
  <cpu mode="host-model" match="exact">
    <topology sockets="8" cores="1" threads="1"/>
  </cpu>
  <devices>
    <disk type="file" device="disk">
      <driver name="qemu" type="qcow2" cache="none"/>
      <source file="/var/lib/nova/instances/680dad83-a9f5-418d-a1f9-5938f84c5306/disk"/>
      <target bus="virtio" dev="vda"/>
    </disk>
    <disk type="file" device="cdrom">
      <driver name="qemu" type="raw" cache="none"/>
      <source file="/var/lib/nova/instances/680dad83-a9f5-418d-a1f9-5938f84c5306/disk.config"/>
      <target bus="ide" dev="hdd"/>
    </disk>
    <interface type="bridge">
      <mac address="fa:16:3e:6b:d0:b6"/>
      <model type="virtio"/>
      <source bridge="qbrbbbaee14-6e"/>
      <target dev="tapbbbaee14-6e"/>
    </interface>
    <interface type="hostdev" managed="yes">
      <mac address="fa:16:3e:99:bf:38"/>
      <source>
        <address type="pci" domain="0x0000" bus="0x06" slot="0x11" function="0x7"/>

*****resource tracker says 0000:06:11.7 is assignable??


2016-02-19 08:22:44.238 47185 DEBUG nova.compute.resource_tracker [req-9977ce40-b8c5-46d2-816e-bd0c8f74cf7b - - - - -] Hypervisor: assignable PCI devices: [{"dev_id": "pci_0000_00_00_0", "product_id": "0e00", "dev_type": "type-PCI", "numa_node": 0, "vendor_id": "8086", "label": "label_8086_0e00", "address": "0000:00:00.0"}, {"dev_id": "pci_0000_00_01_0", "product_id": "0e02", "dev_type": "type-PCI", "numa_node": 0, "vendor_id": "8086", "label": "label_8086_0e02", "address": "0000:00:01.0"}, {"dev_id": "pci_0000_00_01_1", "product_id": "0e03", "dev_type": "type-PCI", "numa_node": 0, "vendor_id": "8086", "label": "label_8086_0e03", "address": "0000:00:01.1"}, {"dev_id": "pci_0000_00_02_0", "product_id": "0e04", "dev_type": "type-PCI", "numa_node": 0, "vendor_id": "8086", "label": "label_8086_0e04", "address": "0000:00:02.0"}, {"dev_id": "pci_0000_06_00_0", "product_id": "10fb", "dev_type": "type-PCI", "numa_node": 0, "vendor_id": "8086", "label": "label_8086_10fb", "address": "0000:06:00.0"}, {"dev_id": "pci_0000_06_00_1", "product_id": "10fb", "dev_type": "type-PF", "numa_node": 0, "vendor_id": "8086", "label": "label_8086_10fb", "address": "0000:06:00.1"}, {"dev_id": "pci_0000_06_10_1", "product_id": "10ed", "dev_type": "type-VF", "numa_node": 0, "vendor_id": "8086", "label": "label_8086_10ed", "phys_function": "0000:06:00.1", "address": "0000:06:10.1"}, {"dev_id": "pci_0000_06_10_3", "product_id": "10ed", "dev_type": "type-VF", "numa_node": 0, "vendor_id": "8086", "label": "label_8086_10ed", "phys_function": "0000:06:00.1", "address": "0000:06:10.3"}, {"dev_id": "pci_0000_06_10_5", "product_id": "10ed", "dev_type": "type-VF", "numa_node": 0, "vendor_id": "8086", "label": "label_8086_10ed", "phys_function": "0000:06:00.1", "address": "0000:06:10.5"}, {"dev_id": "pci_0000_06_10_7", "product_id": "10ed", "dev_type": "type-VF", "numa_node": 0, "vendor_id": "8086", "label": "label_8086_10ed", "phys_function": "0000:06:00.1", "address": "0000:06:10.7"}, {"dev_id": 
"pci_0000_06_11_1", "product_id": "10ed", "dev_type": "type-VF", "numa_node": 0, "vendor_id": "8086", "label": "label_8086_10ed", "phys_function": "0000:06:00.1", "address": "0000:06:11.1"}, {"dev_id": "pci_0000_06_11_3", "product_id": "10ed", "dev_type": "type-VF", "numa_node": 0, "vendor_id": "8086", "label": "label_8086_10ed", "phys_function": "0000:06:00.1", "address": "0000:06:11.3"}, {"dev_id": "pci_0000_06_11_5", "product_id": "10ed", "dev_type": "type-VF", "numa_node": 0, "vendor_id": "8086", "label": "label_8086_10ed", "phys_function": "0000:06:00.1", "address": "0000:06:11.5"}, {"dev_id": "pci_0000_06_11_7", "product_id": "10ed", "dev_type": "type-VF", "numa_node": 0, "vendor_id": "8086", "label": "label_8086_10ed", "phys_function": "0000:06:00.1", "address": "0000:06:11.7"}

**** An attempt to assign 0000:06:11.7 to the new instance d4a8262f-958a-4802-84e1-c09d22b7d6df fails with an error:


2016-02-19 08:37:16.263 47185 DEBUG nova.compute.utils [req-0949b11d-d9be-4cdb-afc2-b94a981597a6 233a8c747d034e3a87b03f10663df397 d86bd63166e0413e97eca88a1cd39639 - - -] [instance: d4a8262f-958a-4802-84e1-c09d22b7d6df] Requested operation is not valid: PCI device 0000:06:11.7 is in use by driver QEMU, domain instance-00000594 notify_about_instance_usage /usr/lib/python2.7/site-packages/nova/compute/utils.py:310
2016-02-19 08:37:16.263 47185 DEBUG nova.compute.manager [req-0949b11d-d9be-4cdb-afc2-b94a981597a6 233a8c747d034e3a87b03f10663df397 d86bd63166e0413e97eca88a1cd39639 - - -] [instance: d4a8262f-958a-4802-84e1-c09d22b7d6df] Build of instance d4a8262f-958a-4802-84e1-c09d22b7d6df was re-scheduled: Requested operation is not valid: PCI device 0000:06:11.7 is in use by driver QEMU, domain instance-00000594 _do_build_and_run_instance /usr/lib/python2.7/site-packages/nova/compute/manager.py:2268
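
The findings above map the domain/bus/slot/function attributes in the guest XML to the 0000:06:11.7 address form by hand. A minimal sketch of that conversion, assuming only interfaces of type "hostdev" are of interest (plain hostdev elements are ignored) and using a trimmed XML fragment modelled on the one in the log; the function name is made up for illustration.

```python
import xml.etree.ElementTree as ET


def hostdev_addresses(domain_xml):
    """Return PCI addresses (dddd:bb:ss.f) of hostdev interfaces in a guest XML."""
    root = ET.fromstring(domain_xml)
    addresses = []
    for iface in root.iter("interface"):
        if iface.get("type") != "hostdev":
            continue
        addr = iface.find("./source/address")
        if addr is None or addr.get("type") != "pci":
            continue
        # Attribute values look like "0x06"; int(..., 16) accepts the 0x prefix.
        addresses.append("%04x:%02x:%02x.%x" % (
            int(addr.get("domain"), 16),
            int(addr.get("bus"), 16),
            int(addr.get("slot"), 16),
            int(addr.get("function"), 16),
        ))
    return addresses


# Trimmed fragment; only the two hostdev interfaces from the log are kept.
SAMPLE_XML = """
<domain type="kvm">
  <devices>
    <interface type="bridge"><mac address="fa:16:3e:6b:d0:b6"/></interface>
    <interface type="hostdev" managed="yes">
      <source>
        <address type="pci" domain="0x0000" bus="0x06" slot="0x11" function="0x7"/>
      </source>
    </interface>
    <interface type="hostdev" managed="yes">
      <source>
        <address type="pci" domain="0x0000" bus="0x06" slot="0x12" function="0x5"/>
      </source>
    </interface>
  </devices>
</domain>
"""

print(hostdev_addresses(SAMPLE_XML))
```

The resulting strings use the same format as the "address" field in the resource tracker output and in the libvirt error message, so they can be compared directly.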

Comment 24 Jeremy 2016-02-19 23:37:50 UTC
The devices-$date files were created using:

mysql -u root nova -e "SELECT hypervisor_hostname, address, instance_uuid, status FROM pci_devices JOIN compute_nodes on compute_nodes.id=compute_node_id" > devices-$(date +%Y%m%d%H%M%S)

on the controller. Also, it appears there are duplicate PCI device entries in the DB:

MariaDB [nova]> select hypervisor_hostname,address,count(*) from pci_devices JOIN compute_nodes on compute_nodes.id=compute_node_id group by hypervisor_hostname,address having count(*) > 1;
+-----------------------------+--------------+----------+
| hypervisor_hostname         | address      | count(*) |
+-----------------------------+--------------+----------+
| l3-compute1.vz.rhelosp.demo | 0000:06:10.1 |        2 |
| l3-compute1.vz.rhelosp.demo | 0000:06:10.3 |        2 |
| l3-compute1.vz.rhelosp.demo | 0000:06:10.5 |        2 |
| l3-compute1.vz.rhelosp.demo | 0000:06:10.7 |        2 |
| l3-compute1.vz.rhelosp.demo | 0000:06:11.1 |        2 |
| l3-compute1.vz.rhelosp.demo | 0000:06:11.3 |        2 |
| l3-compute1.vz.rhelosp.demo | 0000:06:11.5 |        2 |
| l3-compute1.vz.rhelosp.demo | 0000:06:11.7 |        2 |
| l3-compute1.vz.rhelosp.demo | 0000:06:12.1 |        2 |
| l3-compute1.vz.rhelosp.demo | 0000:06:12.3 |        2 |
| l3-compute1.vz.rhelosp.demo | 0000:06:12.5 |        2 |
| l3-compute1.vz.rhelosp.demo | 0000:06:12.7 |        2 |
| l3-compute1.vz.rhelosp.demo | 0000:06:13.1 |        2 |
| l3-compute1.vz.rhelosp.demo | 0000:06:13.3 |        2 |
| l3-compute1.vz.rhelosp.demo | 0000:06:13.5 |        2 |
| l3-compute1.vz.rhelosp.demo | 0000:06:13.7 |        2 |
| l3-compute2.vz.rhelosp.demo | 0000:06:10.1 |        2 |
| l3-compute2.vz.rhelosp.demo | 0000:06:10.3 |        2 |
| l3-compute2.vz.rhelosp.demo | 0000:06:10.5 |        2 |
| l3-compute2.vz.rhelosp.demo | 0000:06:10.7 |        2 |
| l3-compute2.vz.rhelosp.demo | 0000:06:11.1 |        2 |
| l3-compute2.vz.rhelosp.demo | 0000:06:11.3 |        2 |
| l3-compute2.vz.rhelosp.demo | 0000:06:11.5 |        2 |
| l3-compute2.vz.rhelosp.demo | 0000:06:11.7 |        2 |
| l3-compute2.vz.rhelosp.demo | 0000:06:12.1 |        2 |
| l3-compute2.vz.rhelosp.demo | 0000:06:12.3 |        2 |
| l3-compute2.vz.rhelosp.demo | 0000:06:12.5 |        2 |
| l3-compute2.vz.rhelosp.demo | 0000:06:12.7 |        2 |
| l3-compute2.vz.rhelosp.demo | 0000:06:13.1 |        2 |
| l3-compute2.vz.rhelosp.demo | 0000:06:13.3 |        2 |
| l3-compute2.vz.rhelosp.demo | 0000:06:13.5 |        2 |
| l3-compute2.vz.rhelosp.demo | 0000:06:13.7 |        2 |
+-----------------------------+--------------+----------+
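
The same duplicate check can be run offline against one of the devices-$date dump files. This is a sketch that assumes the dump is the tab-separated batch output of the mysql command above, with a header row naming the four selected columns; the sample rows and host names below are invented for illustration and are not taken from the environment.

```python
import csv
import io
from collections import Counter


def duplicate_devices(dump_text):
    """Map (hypervisor_hostname, address) to its row count when it is > 1."""
    reader = csv.DictReader(io.StringIO(dump_text), delimiter="\t")
    counts = Counter(
        (row["hypervisor_hostname"], row["address"]) for row in reader
    )
    return {key: n for key, n in counts.items() if n > 1}


# Invented sample rows in the assumed dump layout.
SAMPLE_DUMP = (
    "hypervisor_hostname\taddress\tinstance_uuid\tstatus\n"
    "compute-a\t0000:06:10.1\tNULL\tavailable\n"
    "compute-a\t0000:06:10.1\tNULL\tavailable\n"
    "compute-a\t0000:06:10.3\tNULL\tavailable\n"
    "compute-b\t0000:06:10.1\tNULL\tavailable\n"
)

print(duplicate_devices(SAMPLE_DUMP))
```

For a real dump, read the file contents and pass them to duplicate_devices in place of SAMPLE_DUMP.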

Comment 25 Vladik Romanovsky 2016-02-20 07:47:53 UTC
After looking at the logs again and the db dump, I agree with Sahid.
It looks like we are reusing the same pci devices on reschedule.
Considering this, there will be no re-allocation of the devices, since the network is already allocated.

It just looked like a weird coincidence that the PCI devices chosen for the instances are exactly the same on all of the nodes, while so many are available.

Also, I can confirm that the problem is not related to the allocation of the PCI devices: the status of the PCI devices of the first instance (the one that was successfully spawned on the l3-compute4 host) changed to 'allocated' before the rescheduled instance was launched on that host. The status is cleared later (I still need to look into this point).

l3-compute4.vz.rhelosp.demo	0000:06:11.7	680dad83-a9f5-418d-a1f9-5938f84c5306	allocated
l3-compute4.vz.rhelosp.demo	0000:06:12.5	680dad83-a9f5-418d-a1f9-5938f84c5306	allocated

I will submit a patch upstream to address the deallocation of the networks on reschedule.

Thanks,
Vladik
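
The reschedule behaviour described in this comment can be illustrated with a toy model. This is not nova code: the Host class, the request dictionary and the deallocate_on_reschedule switch are all hypothetical and only mimic the idea that a device picked on the first host is carried over to the next host unless the earlier allocation is dropped.

```python
class Host:
    """Toy compute host that tracks which instance holds each VF."""

    def __init__(self, name, vfs):
        self.name = name
        self.vfs = list(vfs)
        self.in_use = {}  # PCI address -> instance name

    def free_vfs(self):
        return [a for a in self.vfs if a not in self.in_use]

    def attach(self, instance, address):
        if address in self.in_use:
            raise RuntimeError(
                "PCI device %s is in use by %s" % (address, self.in_use[address])
            )
        self.in_use[address] = instance


def pick_and_attach(host, instance, request):
    # Reuse the address already recorded in the request, if there is one.
    if request.get("pci_address") is None:
        request["pci_address"] = host.free_vfs()[0]
    host.attach(instance, request["pci_address"])


def simulate(deallocate_on_reschedule):
    vfs = ["0000:06:11.7", "0000:06:12.5"]
    host_a = Host("host-a", vfs)
    host_b = Host("host-b", vfs)
    pick_and_attach(host_b, "instance-1", {})  # holds 0000:06:11.7 on host-b

    # First attempt on host-a picks a device, then the build fails for an
    # unrelated reason and the request is rescheduled to host-b.
    request = {"pci_address": host_a.free_vfs()[0]}
    if deallocate_on_reschedule:
        request["pci_address"] = None
    try:
        pick_and_attach(host_b, "instance-2", request)
    except RuntimeError as err:
        return str(err)
    return request["pci_address"]


print(simulate(deallocate_on_reschedule=False))
print(simulate(deallocate_on_reschedule=True))
```

In this model the stale address collides with the device already held on the second host, while dropping it before the retry lets the second host pick a free VF.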

Comment 26 Tom Bonds 2016-02-20 14:06:09 UTC
(In reply to Vladik Romanovsky from comment #25)
> After looking at the logs again and the db dump, I agree with Sahid.
> It looks like we are reusing the same pci devices on reschedule.
> Considering this, there will be no re-allocation of the devices, since the
> network is already allocated.
> 
> It just looked like a weird coincidence that the pci devices that are being
> chosen for the instances are exactly the same on all of the nodes, while
> there are so many available..
> 
> Also, I can confirm that the problem is not related to the allocation of the
> pci devices, as it is possible to see that the status of the first
> instance's (the one that has been successfully spawned on the l3-compute4
> host) pci devices has been changed to 'allocated' before the rescheduled
> instance was launched on the host, later the status is getting cleared..
> (still need to look into this point..)
> 
> l3-compute4.vz.rhelosp.demo	0000:06:11.7
> 680dad83-a9f5-418d-a1f9-5938f84c5306	allocated
> l3-compute4.vz.rhelosp.demo	0000:06:12.5
> 680dad83-a9f5-418d-a1f9-5938f84c5306	allocated
> 
> I will submit a patch upstream to address the deallocation of the networks
> on reschedule.
> 
> Thanks,
> Vladik

The allocated status of the PCI devices for 680dad83-a9f5-418d-a1f9-5938f84c5306 / instance-00000594 was cleared because the instance was deleted.

The orchestration tool being used deletes the entire set of instances it spawned when there is a failure. Let me know if you need anything else from the environment.

Thanks!

-tom

Comment 27 Sahid Ferdjaoui 2016-02-22 15:02:38 UTC
Test packages are available at [1]. Please let us know of any feedback from the customer.

Thanks

[1] https://brewweb.devel.redhat.com/taskinfo?taskID=10538873

Comment 36 Prasanth Anbalagan 2016-06-15 18:49:21 UTC
Verified as follows: used "modprobe igb max_vfs=2", and since the setup had a 2-port NIC, a total of 4 VFs were available. The flavor was set up with "pci_pass_test:1". We were able to create 4 instances successfully (as expected).

********
VERSION
********

[root@rhos-compute-node-02 ~(keystone_admin)]# yum list installed | grep openstack-nova
openstack-nova-api.noarch            2015.1.4-5.el7ost       @rhelosp-7.0-puddle
openstack-nova-cert.noarch           2015.1.4-5.el7ost       @rhelosp-7.0-puddle
openstack-nova-common.noarch         2015.1.4-5.el7ost       @rhelosp-7.0-puddle
openstack-nova-compute.noarch        2015.1.4-5.el7ost       @rhelosp-7.0-puddle
openstack-nova-conductor.noarch      2015.1.4-5.el7ost       @rhelosp-7.0-puddle
openstack-nova-console.noarch        2015.1.4-5.el7ost       @rhelosp-7.0-puddle
openstack-nova-novncproxy.noarch     2015.1.4-5.el7ost       @rhelosp-7.0-puddle
openstack-nova-scheduler.noarch      2015.1.4-5.el7ost       @rhelosp-7.0-puddle


******
LOGS
******

[root@serverA ~(keystone_admin)]# nova flavor-show pci-pass
+----------------------------+----------------------------------------------+
| Property                   | Value                                        |
+----------------------------+----------------------------------------------+
| OS-FLV-DISABLED:disabled   | False                                        |
| OS-FLV-EXT-DATA:ephemeral  | 0                                            |
| disk                       | 5                                            |
| extra_specs                | {"pci_passthrough:alias": "pci_pass_test:1"} |
| id                         | 100                                          |
| name                       | pci-pass                                     |
| os-flavor-access:is_public | True                                         |
| ram                        | 512                                          |
| rxtx_factor                | 1.0                                          |
| swap                       |                                              |
| vcpus                      | 1                                            |
+----------------------------+----------------------------------------------+


[root@serverA ~(keystone_admin)]# nova boot --flavor pci-pass --image cirros --nic net-id=8ad820a7-9382-4319-afbb-153058c3271d vm_pci
+--------------------------------------+-----------------------------------------------+
| Property                             | Value                                         |
+--------------------------------------+-----------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                        |
| OS-EXT-AZ:availability_zone          |                                               |
| OS-EXT-SRV-ATTR:host                 | -                                             |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | -                                             |
| OS-EXT-SRV-ATTR:instance_name        | instance-00000003                             |
| OS-EXT-STS:power_state               | 0                                             |
| OS-EXT-STS:task_state                | scheduling                                    |
| OS-EXT-STS:vm_state                  | building                                      |
| OS-SRV-USG:launched_at               | -                                             |
| OS-SRV-USG:terminated_at             | -                                             |
| accessIPv4                           |                                               |
| accessIPv6                           |                                               |
| adminPass                            | v6thAL3ycwJP                                  |
| config_drive                         |                                               |
| created                              | 2016-06-15T18:32:27Z                          |
| flavor                               | pci-pass (100)                                |
| hostId                               |                                               |
| id                                   | 5cbb85af-dbf0-4a0f-b175-64afa147c984          |
| image                                | cirros (1b484e42-8e2d-4f89-aab5-c74d8e874a1b) |
| key_name                             | -                                             |
| metadata                             | {}                                            |
| name                                 | vm_pci                                        |
| os-extended-volumes:volumes_attached | []                                            |
| progress                             | 0                                             |
| security_groups                      | default                                       |
| status                               | BUILD                                         |
| tenant_id                            | 539312767fc24dd78293d660373a7293              |
| updated                              | 2016-06-15T18:32:27Z                          |
| user_id                              | 621dbc15638845779419793637abab66              |
+--------------------------------------+-----------------------------------------------+
[root@serverA ~(keystone_admin)]# nova delete vm_pci
Request to delete server vm_pci has been accepted.
[root@serverA ~(keystone_admin)]# nova list
+----+------+--------+------------+-------------+----------+
| ID | Name | Status | Task State | Power State | Networks |
+----+------+--------+------------+-------------+----------+
+----+------+--------+------------+-------------+----------+
[root@serverA ~(keystone_admin)]# nova boot --flavor pci-pass --image cirros --nic net-id=8ad820a7-9382-4319-afbb-153058c3271d vm_pci
+--------------------------------------+-----------------------------------------------+
| Property                             | Value                                         |
+--------------------------------------+-----------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                        |
| OS-EXT-AZ:availability_zone          |                                               |
| OS-EXT-SRV-ATTR:host                 | -                                             |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | -                                             |
| OS-EXT-SRV-ATTR:instance_name        | instance-00000004                             |
| OS-EXT-STS:power_state               | 0                                             |
| OS-EXT-STS:task_state                | scheduling                                    |
| OS-EXT-STS:vm_state                  | building                                      |
| OS-SRV-USG:launched_at               | -                                             |
| OS-SRV-USG:terminated_at             | -                                             |
| accessIPv4                           |                                               |
| accessIPv6                           |                                               |
| adminPass                            | dHn6Hf42VzWS                                  |
| config_drive                         |                                               |
| created                              | 2016-06-15T18:35:02Z                          |
| flavor                               | pci-pass (100)                                |
| hostId                               |                                               |
| id                                   | fae0b702-5b7b-48ca-b881-1ad5884f289c          |
| image                                | cirros (1b484e42-8e2d-4f89-aab5-c74d8e874a1b) |
| key_name                             | -                                             |
| metadata                             | {}                                            |
| name                                 | vm_pci                                        |
| os-extended-volumes:volumes_attached | []                                            |
| progress                             | 0                                             |
| security_groups                      | default                                       |
| status                               | BUILD                                         |
| tenant_id                            | 539312767fc24dd78293d660373a7293              |
| updated                              | 2016-06-15T18:35:02Z                          |
| user_id                              | 621dbc15638845779419793637abab66              |
+--------------------------------------+-----------------------------------------------+
[root@serverA ~(keystone_admin)]# nova list
+--------------------------------------+--------+--------+------------+-------------+------------------+
| ID                                   | Name   | Status | Task State | Power State | Networks         |
+--------------------------------------+--------+--------+------------+-------------+------------------+
| fae0b702-5b7b-48ca-b881-1ad5884f289c | vm_pci | ACTIVE | -          | Running     | private=10.0.0.6 |
+--------------------------------------+--------+--------+------------+-------------+------------------+
[root@serverA ~(keystone_admin)]# nova boot --flavor pci-pass --image cirros --nic net-id=8ad820a7-9382-4319-afbb-153058c3271d vm_pci1
+--------------------------------------+-----------------------------------------------+
| Property                             | Value                                         |
+--------------------------------------+-----------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                        |
| OS-EXT-AZ:availability_zone          |                                               |
| OS-EXT-SRV-ATTR:host                 | -                                             |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | -                                             |
| OS-EXT-SRV-ATTR:instance_name        | instance-00000005                             |
| OS-EXT-STS:power_state               | 0                                             |
| OS-EXT-STS:task_state                | scheduling                                    |
| OS-EXT-STS:vm_state                  | building                                      |
| OS-SRV-USG:launched_at               | -                                             |
| OS-SRV-USG:terminated_at             | -                                             |
| accessIPv4                           |                                               |
| accessIPv6                           |                                               |
| adminPass                            | Qk86kHT8zsnj                                  |
| config_drive                         |                                               |
| created                              | 2016-06-15T18:38:59Z                          |
| flavor                               | pci-pass (100)                                |
| hostId                               |                                               |
| id                                   | 3b5fbe04-46c0-403b-863b-621d490283f3          |
| image                                | cirros (1b484e42-8e2d-4f89-aab5-c74d8e874a1b) |
| key_name                             | -                                             |
| metadata                             | {}                                            |
| name                                 | vm_pci1                                       |
| os-extended-volumes:volumes_attached | []                                            |
| progress                             | 0                                             |
| security_groups                      | default                                       |
| status                               | BUILD                                         |
| tenant_id                            | 539312767fc24dd78293d660373a7293              |
| updated                              | 2016-06-15T18:38:59Z                          |
| user_id                              | 621dbc15638845779419793637abab66              |
+--------------------------------------+-----------------------------------------------+
[root@serverA ~(keystone_admin)]# nova list
+--------------------------------------+---------+--------+------------+-------------+------------------+
| ID                                   | Name    | Status | Task State | Power State | Networks         |
+--------------------------------------+---------+--------+------------+-------------+------------------+
| fae0b702-5b7b-48ca-b881-1ad5884f289c | vm_pci  | ACTIVE | -          | Running     | private=10.0.0.6 |
| 3b5fbe04-46c0-403b-863b-621d490283f3 | vm_pci1 | BUILD  | spawning   | NOSTATE     | private=10.0.0.7 |
+--------------------------------------+---------+--------+------------+-------------+------------------+
[root@serverA ~(keystone_admin)]# nova list
+--------------------------------------+---------+--------+------------+-------------+------------------+
| ID                                   | Name    | Status | Task State | Power State | Networks         |
+--------------------------------------+---------+--------+------------+-------------+------------------+
| fae0b702-5b7b-48ca-b881-1ad5884f289c | vm_pci  | ACTIVE | -          | Running     | private=10.0.0.6 |
| 3b5fbe04-46c0-403b-863b-621d490283f3 | vm_pci1 | ACTIVE | -          | Running     | private=10.0.0.7 |
+--------------------------------------+---------+--------+------------+-------------+------------------+
[root@serverA ~(keystone_admin)]# nova boot --flavor pci-pass --image cirros --nic net-id=8ad820a7-9382-4319-afbb-153058c3271d vm_pci2
+--------------------------------------+-----------------------------------------------+
| Property                             | Value                                         |
+--------------------------------------+-----------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                        |
| OS-EXT-AZ:availability_zone          |                                               |
| OS-EXT-SRV-ATTR:host                 | -                                             |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | -                                             |
| OS-EXT-SRV-ATTR:instance_name        | instance-00000006                             |
| OS-EXT-STS:power_state               | 0                                             |
| OS-EXT-STS:task_state                | scheduling                                    |
| OS-EXT-STS:vm_state                  | building                                      |
| OS-SRV-USG:launched_at               | -                                             |
| OS-SRV-USG:terminated_at             | -                                             |
| accessIPv4                           |                                               |
| accessIPv6                           |                                               |
| adminPass                            | JHN2a7XtZVny                                  |
| config_drive                         |                                               |
| created                              | 2016-06-15T18:39:17Z                          |
| flavor                               | pci-pass (100)                                |
| hostId                               |                                               |
| id                                   | 67833c76-ebe8-4be6-9480-bc89e3ba8679          |
| image                                | cirros (1b484e42-8e2d-4f89-aab5-c74d8e874a1b) |
| key_name                             | -                                             |
| metadata                             | {}                                            |
| name                                 | vm_pci2                                       |
| os-extended-volumes:volumes_attached | []                                            |
| progress                             | 0                                             |
| security_groups                      | default                                       |
| status                               | BUILD                                         |
| tenant_id                            | 539312767fc24dd78293d660373a7293              |
| updated                              | 2016-06-15T18:39:17Z                          |
| user_id                              | 621dbc15638845779419793637abab66              |
+--------------------------------------+-----------------------------------------------+
[root@serverA ~(keystone_admin)]# nova list
+--------------------------------------+---------+--------+------------+-------------+------------------+
| ID                                   | Name    | Status | Task State | Power State | Networks         |
+--------------------------------------+---------+--------+------------+-------------+------------------+
| fae0b702-5b7b-48ca-b881-1ad5884f289c | vm_pci  | ACTIVE | -          | Running     | private=10.0.0.6 |
| 3b5fbe04-46c0-403b-863b-621d490283f3 | vm_pci1 | ACTIVE | -          | Running     | private=10.0.0.7 |
| 67833c76-ebe8-4be6-9480-bc89e3ba8679 | vm_pci2 | BUILD  | spawning   | NOSTATE     | private=10.0.0.8 |
+--------------------------------------+---------+--------+------------+-------------+------------------+
[root@serverA ~(keystone_admin)]# nova list
+--------------------------------------+---------+--------+------------+-------------+------------------+
| ID                                   | Name    | Status | Task State | Power State | Networks         |
+--------------------------------------+---------+--------+------------+-------------+------------------+
| fae0b702-5b7b-48ca-b881-1ad5884f289c | vm_pci  | ACTIVE | -          | Running     | private=10.0.0.6 |
| 3b5fbe04-46c0-403b-863b-621d490283f3 | vm_pci1 | ACTIVE | -          | Running     | private=10.0.0.7 |
| 67833c76-ebe8-4be6-9480-bc89e3ba8679 | vm_pci2 | ACTIVE | -          | Running     | private=10.0.0.8 |
+--------------------------------------+---------+--------+------------+-------------+------------------+
[root@serverA ~(keystone_admin)]# nova boot --flavor pci-pass --image cirros --nic net-id=8ad820a7-9382-4319-afbb-153058c3271d vm_pci3
+--------------------------------------+-----------------------------------------------+
| Property                             | Value                                         |
+--------------------------------------+-----------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                        |
| OS-EXT-AZ:availability_zone          |                                               |
| OS-EXT-SRV-ATTR:host                 | -                                             |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | -                                             |
| OS-EXT-SRV-ATTR:instance_name        | instance-00000007                             |
| OS-EXT-STS:power_state               | 0                                             |
| OS-EXT-STS:task_state                | scheduling                                    |
| OS-EXT-STS:vm_state                  | building                                      |
| OS-SRV-USG:launched_at               | -                                             |
| OS-SRV-USG:terminated_at             | -                                             |
| accessIPv4                           |                                               |
| accessIPv6                           |                                               |
| adminPass                            | 4DxENjbLC8uo                                  |
| config_drive                         |                                               |
| created                              | 2016-06-15T18:39:58Z                          |
| flavor                               | pci-pass (100)                                |
| hostId                               |                                               |
| id                                   | d8607f43-7c0f-42fb-816a-7f2b0bdcdedb          |
| image                                | cirros (1b484e42-8e2d-4f89-aab5-c74d8e874a1b) |
| key_name                             | -                                             |
| metadata                             | {}                                            |
| name                                 | vm_pci3                                       |
| os-extended-volumes:volumes_attached | []                                            |
| progress                             | 0                                             |
| security_groups                      | default                                       |
| status                               | BUILD                                         |
| tenant_id                            | 539312767fc24dd78293d660373a7293              |
| updated                              | 2016-06-15T18:39:58Z                          |
| user_id                              | 621dbc15638845779419793637abab66              |
+--------------------------------------+-----------------------------------------------+
[root@serverA ~(keystone_admin)]# nova list
+--------------------------------------+---------+--------+------------+-------------+------------------+
| ID                                   | Name    | Status | Task State | Power State | Networks         |
+--------------------------------------+---------+--------+------------+-------------+------------------+
| fae0b702-5b7b-48ca-b881-1ad5884f289c | vm_pci  | ACTIVE | -          | Running     | private=10.0.0.6 |
| 3b5fbe04-46c0-403b-863b-621d490283f3 | vm_pci1 | ACTIVE | -          | Running     | private=10.0.0.7 |
| 67833c76-ebe8-4be6-9480-bc89e3ba8679 | vm_pci2 | ACTIVE | -          | Running     | private=10.0.0.8 |
| d8607f43-7c0f-42fb-816a-7f2b0bdcdedb | vm_pci3 | BUILD  | spawning   | NOSTATE     | private=10.0.0.9 |
+--------------------------------------+---------+--------+------------+-------------+------------------+
[root@serverA ~(keystone_admin)]# nova list
+--------------------------------------+---------+--------+------------+-------------+------------------+
| ID                                   | Name    | Status | Task State | Power State | Networks         |
+--------------------------------------+---------+--------+------------+-------------+------------------+
| fae0b702-5b7b-48ca-b881-1ad5884f289c | vm_pci  | ACTIVE | -          | Running     | private=10.0.0.6 |
| 3b5fbe04-46c0-403b-863b-621d490283f3 | vm_pci1 | ACTIVE | -          | Running     | private=10.0.0.7 |
| 67833c76-ebe8-4be6-9480-bc89e3ba8679 | vm_pci2 | ACTIVE | -          | Running     | private=10.0.0.8 |
| d8607f43-7c0f-42fb-816a-7f2b0bdcdedb | vm_pci3 | ACTIVE | -          | Running     | private=10.0.0.9 |
+--------------------------------------+---------+--------+------------+-------------+------------------+
[root@serverA ~(keystone_admin)]# nova list
+--------------------------------------+---------+--------+------------+-------------+------------------+
| ID                                   | Name    | Status | Task State | Power State | Networks         |
+--------------------------------------+---------+--------+------------+-------------+------------------+
| fae0b702-5b7b-48ca-b881-1ad5884f289c | vm_pci  | ACTIVE | -          | Running     | private=10.0.0.6 |
| 3b5fbe04-46c0-403b-863b-621d490283f3 | vm_pci1 | ACTIVE | -          | Running     | private=10.0.0.7 |
| 67833c76-ebe8-4be6-9480-bc89e3ba8679 | vm_pci2 | ACTIVE | -          | Running     | private=10.0.0.8 |
| d8607f43-7c0f-42fb-816a-7f2b0bdcdedb | vm_pci3 | ACTIVE | -          | Running     | private=10.0.0.9 |
+--------------------------------------+---------+--------+------------+-------------+------------------+
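
The verification above was done by booting one instance at a time and re-running `nova list` by hand until the new instance went from BUILD/spawning to ACTIVE/Running. A minimal sketch of automating that check follows, assuming the table layout printed in this transcript. The helper names `parse_nova_list` and `all_running` are hypothetical and not part of novaclient; the sketch only parses captured text and does not call the CLI.

```python
# Sketch only: parses the ASCII table that `nova list` printed in the
# transcript above. Helper names are hypothetical, not a novaclient API.

SAMPLE_SPAWNING = """\
[root@serverA ~(keystone_admin)]# nova list
+--------------------------------------+---------+--------+------------+-------------+------------------+
| ID                                   | Name    | Status | Task State | Power State | Networks         |
+--------------------------------------+---------+--------+------------+-------------+------------------+
| fae0b702-5b7b-48ca-b881-1ad5884f289c | vm_pci  | ACTIVE | -          | Running     | private=10.0.0.6 |
| 3b5fbe04-46c0-403b-863b-621d490283f3 | vm_pci1 | ACTIVE | -          | Running     | private=10.0.0.7 |
| 67833c76-ebe8-4be6-9480-bc89e3ba8679 | vm_pci2 | ACTIVE | -          | Running     | private=10.0.0.8 |
| d8607f43-7c0f-42fb-816a-7f2b0bdcdedb | vm_pci3 | BUILD  | spawning   | NOSTATE     | private=10.0.0.9 |
+--------------------------------------+---------+--------+------------+-------------+------------------+
"""


def parse_nova_list(text):
    """Return one dict per instance row, keyed by the table's column names."""
    header = None
    rows = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip border lines (+---+), shell prompts, and blank lines.
        if not (line.startswith("|") and line.endswith("|")):
            continue
        cells = [cell.strip() for cell in line[1:-1].split("|")]
        if header is None:
            header = cells  # the first |...| line is the column header
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def all_running(rows):
    """True only if there is at least one row and every row is ACTIVE/Running."""
    return bool(rows) and all(
        row["Status"] == "ACTIVE" and row["Power State"] == "Running"
        for row in rows
    )


rows = parse_nova_list(SAMPLE_SPAWNING)
pending = [row["Name"] for row in rows if row["Status"] != "ACTIVE"]
print("all running:", all_running(rows), "pending:", pending)
```

In a real check the text would come from the command's captured output, and the loop would stop with a failure if any row reported an ERROR status, which is how this bug originally showed up.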

Comment 38 errata-xmlrpc 2016-06-23 17:36:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1313

