Bug 1767797

Summary: When unshelving an SR-IOV instance, the binding profile isn't reclaimed or rescheduled, and this might cause PCI-PT conflicts
Product: Red Hat OpenStack Reporter: David Vallee Delisle <dvd>
Component: openstack-nova Assignee: Artom Lifshitz <alifshit>
Status: CLOSED ERRATA QA Contact: James Parker <jparker>
Severity: high Docs Contact:
Priority: high    
Version: 13.0 (Queens) CC: alifshit, dasmith, dhill, don.weeks, ebarrera, eglynn, jhakimra, jlema, jparker, jzaher, kchamart, lhh, lyarwood, mflusche, mircea.vutcovici, rurena, sbauza, sgordon, smooney, vromanso
Target Milestone: ga Keywords: Patch, Triaged, ZStream
Target Release: 17.0 Flags: rurena: needinfo? (mbooth)
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-nova-23.2.1-0.20220428212241.327693a.el9ost Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1852110 Environment:
Last Closed: 2022-09-21 12:07:58 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
Embargoed:
Attachments: overcloud database dump

Description David Vallee Delisle 2019-11-01 12:46:58 UTC
Description of problem:
When we unshelve an instance, it might land on a different compute node. Apparently, the binding profile isn't recalculated and the original one is reused.

Based on comment 5 in #1413010, this should have been fixed by commit [a], which is present in their version.


Version-Release number of selected component (if applicable):
openstack-nova 14.1.0-26

How reproducible:
Every time an instance is unshelved on a host where the PCI device isn't available.

Steps to Reproduce:
1. Shelve instance
2. Find host with matching PCI devices already in use
3. Unshelve on that host (by disabling nova-compute on all the other hosts); see the CLI sketch below
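
A rough CLI sketch of these steps (the instance and host names are placeholders for your environment):
~~~
# 1. Shelve the SR-IOV instance.
openstack server shelve my-sriov-instance

# 2./3. Disable nova-compute everywhere except the host whose matching VFs
#       are already claimed, so the unshelve can only land there.
openstack compute service set --disable compute2.mydomain.com nova-compute
openstack compute service set --disable compute3.mydomain.com nova-compute

# Unshelve and check where the instance landed / whether it went to ERROR.
openstack server unshelve my-sriov-instance
openstack server show my-sriov-instance -c status -c OS-EXT-SRV-ATTR:host
~~~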

Actual results:
See [1]

Expected results:
PCI devices should be re-calculated


Additional info:
[a] https://review.opendev.org/#/c/242573
[1]
~~~
nova-compute.log:2019-10-29 19:42:35.309 75567 INFO nova.compute.manager [req-5a4213b5-8684-4a5c-9c70-db5ad1bbce21 0ef1edad07ae478296c53922739d0a0f 7d0f409f58554e249ce3cdbc72ac1794 - - -] [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51] Unshelving
nova-compute.log:2019-10-29 19:42:35.741 75567 INFO nova.network.neutronv2.api [req-5a4213b5-8684-4a5c-9c70-db5ad1bbce21 0ef1edad07ae478296c53922739d0a0f 7d0f409f58554e249ce3cdbc72ac1794 - - -] [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51] Updating port 6a364c2d-ac68-497c-949c-9321555c917c with attributes {'binding:host_id': 'compute1.mydomain.com'}
nova-compute.log:2019-10-29 19:42:36.318 75567 INFO nova.network.neutronv2.api [req-5a4213b5-8684-4a5c-9c70-db5ad1bbce21 0ef1edad07ae478296c53922739d0a0f 7d0f409f58554e249ce3cdbc72ac1794 - - -] [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51] Updating port 6a364c2d-ac68-497c-949c-9321555c917c with attributes {'binding:host_id': 'compute1.mydomain.com'}
nova-compute.log:2019-10-29 19:42:37.036 75567 INFO nova.network.neutronv2.api [req-5a4213b5-8684-4a5c-9c70-db5ad1bbce21 0ef1edad07ae478296c53922739d0a0f 7d0f409f58554e249ce3cdbc72ac1794 - - -] [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51] Updating port 6a364c2d-ac68-497c-949c-9321555c917c with attributes {'binding:host_id': 'compute1.mydomain.com'}
nova-compute.log:2019-10-29 19:42:37.612 75567 INFO nova.network.neutronv2.api [req-5a4213b5-8684-4a5c-9c70-db5ad1bbce21 0ef1edad07ae478296c53922739d0a0f 7d0f409f58554e249ce3cdbc72ac1794 - - -] [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51] Updating port 6a364c2d-ac68-497c-949c-9321555c917c with attributes {'binding:host_id': 'compute1.mydomain.com'}
nova-compute.log:2019-10-29 19:42:39.016 75567 WARNING nova.compute.resource_tracker [req-5a4213b5-8684-4a5c-9c70-db5ad1bbce21 0ef1edad07ae478296c53922739d0a0f 7d0f409f58554e249ce3cdbc72ac1794 - - -] [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51] Host field should not be set on the instance until resources have been claimed.
nova-compute.log:2019-10-29 19:42:39.016 75567 WARNING nova.compute.resource_tracker [req-5a4213b5-8684-4a5c-9c70-db5ad1bbce21 0ef1edad07ae478296c53922739d0a0f 7d0f409f58554e249ce3cdbc72ac1794 - - -] [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51] Node field should not be set on the instance until resources have been claimed.
nova-compute.log:2019-10-29 19:42:39.029 75567 INFO nova.compute.claims [req-5a4213b5-8684-4a5c-9c70-db5ad1bbce21 0ef1edad07ae478296c53922739d0a0f 7d0f409f58554e249ce3cdbc72ac1794 - - -] [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51] Attempting claim: memory 16384 MB, disk 40 GB, vcpus 4 CPU
nova-compute.log:2019-10-29 19:42:39.029 75567 INFO nova.compute.claims [req-5a4213b5-8684-4a5c-9c70-db5ad1bbce21 0ef1edad07ae478296c53922739d0a0f 7d0f409f58554e249ce3cdbc72ac1794 - - -] [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51] Total memory: 523955 MB, used: 24576.00 MB
nova-compute.log:2019-10-29 19:42:39.030 75567 INFO nova.compute.claims [req-5a4213b5-8684-4a5c-9c70-db5ad1bbce21 0ef1edad07ae478296c53922739d0a0f 7d0f409f58554e249ce3cdbc72ac1794 - - -] [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51] memory limit: 523955.00 MB, free: 499379.00 MB
nova-compute.log:2019-10-29 19:42:39.030 75567 INFO nova.compute.claims [req-5a4213b5-8684-4a5c-9c70-db5ad1bbce21 0ef1edad07ae478296c53922739d0a0f 7d0f409f58554e249ce3cdbc72ac1794 - - -] [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51] Total disk: 2047 GB, used: 40.00 GB
nova-compute.log:2019-10-29 19:42:39.030 75567 INFO nova.compute.claims [req-5a4213b5-8684-4a5c-9c70-db5ad1bbce21 0ef1edad07ae478296c53922739d0a0f 7d0f409f58554e249ce3cdbc72ac1794 - - -] [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51] disk limit not specified, defaulting to unlimited
nova-compute.log:2019-10-29 19:42:39.030 75567 INFO nova.compute.claims [req-5a4213b5-8684-4a5c-9c70-db5ad1bbce21 0ef1edad07ae478296c53922739d0a0f 7d0f409f58554e249ce3cdbc72ac1794 - - -] [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51] Total vcpu: 36 VCPU, used: 8.00 VCPU
nova-compute.log:2019-10-29 19:42:39.031 75567 INFO nova.compute.claims [req-5a4213b5-8684-4a5c-9c70-db5ad1bbce21 0ef1edad07ae478296c53922739d0a0f 7d0f409f58554e249ce3cdbc72ac1794 - - -] [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51] vcpu limit: 36.00 VCPU, free: 28.00 VCPU
nova-compute.log:2019-10-29 19:42:39.104 75567 INFO nova.compute.claims [req-5a4213b5-8684-4a5c-9c70-db5ad1bbce21 0ef1edad07ae478296c53922739d0a0f 7d0f409f58554e249ce3cdbc72ac1794 - - -] [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51] Claim successful
nova-api.log:2019-10-29 19:42:39.348 588801 INFO nova.api.openstack.compute.server_external_events [req-6b40efea-bd48-4445-bdd1-e84532ec364e 94b28f9a4eb84b19bee6b3f9f9a312b1 bb56be8b88264cea94923c2399b06bd3 - default default] Creating event network-changed:None for instance 1220e935-1b76-4e49-ad54-35975a3c8c51
nova-compute.log:2019-10-29 19:42:39.392 75567 WARNING nova.scheduler.client.report [req-5a4213b5-8684-4a5c-9c70-db5ad1bbce21 0ef1edad07ae478296c53922739d0a0f 7d0f409f58554e249ce3cdbc72ac1794 - - -] Unable to refresh my resource provider record
nova-compute.log:2019-10-29 19:42:39.463 75567 INFO nova.virt.libvirt.driver [req-5a4213b5-8684-4a5c-9c70-db5ad1bbce21 0ef1edad07ae478296c53922739d0a0f 7d0f409f58554e249ce3cdbc72ac1794 - - -] [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51] Creating image
nova-api.log:2019-10-29 19:43:05.713 588842 INFO nova.osapi_compute.wsgi.server [req-ea45d7cd-b916-4af3-9b64-1cbe0d7fbe48 0ef1edad07ae478296c53922739d0a0f 7d0f409f58554e249ce3cdbc72ac1794 - default default] 192.168.12.7 "GET /v2.1/servers/1220e935-1b76-4e49-ad54-35975a3c8c51 HTTP/1.1" status: 200 len: 5241 time: 0.8003979
nova-api.log:2019-10-29 19:43:11.172 588843 INFO nova.osapi_compute.wsgi.server [req-8d7d0f08-5bad-4fd2-ac8e-c50d308cc742 0ef1edad07ae478296c53922739d0a0f 7d0f409f58554e249ce3cdbc72ac1794 - default default] 192.168.12.7 "GET /v2.1/servers/1220e935-1b76-4e49-ad54-35975a3c8c51 HTTP/1.1" status: 200 len: 5241 time: 0.7007968
nova-compute.log:2019-10-29 19:43:18.698 75567 INFO os_vif [req-5a4213b5-8684-4a5c-9c70-db5ad1bbce21 0ef1edad07ae478296c53922739d0a0f 7d0f409f58554e249ce3cdbc72ac1794 - - -] Successfully plugged vif VIFBridge(active=False,address=fa:16:3e:10:0a:f7,bridge_name='qbr6a364c2d-ac',has_traffic_filtering=True,id=6a364c2d-ac68-497c-949c-9321555c917c,network=Network(5be8676e-2b43-49cc-bc7f-459fd9c9f962),plugin='ovs',port_profile=VIFPortProfileBase,preserve_on_delete=True,vif_name='tap6a364c2d-ac')
nova-compute.log:2019-10-29 19:43:18.792 75567 INFO os_vif [req-5a4213b5-8684-4a5c-9c70-db5ad1bbce21 0ef1edad07ae478296c53922739d0a0f 7d0f409f58554e249ce3cdbc72ac1794 - - -] Successfully plugged vif VIFBridge(active=False,address=fa:16:3e:28:30:84,bridge_name='qbr210b5e5b-75',has_traffic_filtering=True,id=210b5e5b-7546-4fe9-8a75-9872aeca7097,network=Network(12af7a9c-2ba1-4bed-aef8-5f90f5c77607),plugin='ovs',port_profile=VIFPortProfileBase,preserve_on_delete=True,vif_name='tap210b5e5b-75')
nova-compute.log:2019-10-29 19:43:19.133 75567 INFO nova.virt.libvirt.driver [req-5a4213b5-8684-4a5c-9c70-db5ad1bbce21 0ef1edad07ae478296c53922739d0a0f 7d0f409f58554e249ce3cdbc72ac1794 - - -] [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51] Using config drive
nova-compute.log:2019-10-29 19:43:19.311 75567 INFO nova.virt.libvirt.driver [req-5a4213b5-8684-4a5c-9c70-db5ad1bbce21 0ef1edad07ae478296c53922739d0a0f 7d0f409f58554e249ce3cdbc72ac1794 - - -] [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51] Creating config drive at /var/lib/nova/instances/1220e935-1b76-4e49-ad54-35975a3c8c51/disk.config
nova-compute.log:2019-10-29 19:43:19.543 75567 ERROR nova.virt.libvirt.guest [req-5a4213b5-8684-4a5c-9c70-db5ad1bbce21 0ef1edad07ae478296c53922739d0a0f 7d0f409f58554e249ce3cdbc72ac1794 - - -] Error launching a defined domain with XML: <domain type='kvm'>
nova-compute.log:2019-10-29 19:43:19.543 75567 ERROR nova.virt.libvirt.driver [req-5a4213b5-8684-4a5c-9c70-db5ad1bbce21 0ef1edad07ae478296c53922739d0a0f 7d0f409f58554e249ce3cdbc72ac1794 - - -] [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51] Failed to start libvirt guest
nova-compute.log:2019-10-29 19:43:19.643 75567 INFO os_vif [req-5a4213b5-8684-4a5c-9c70-db5ad1bbce21 0ef1edad07ae478296c53922739d0a0f 7d0f409f58554e249ce3cdbc72ac1794 - - -] Successfully unplugged vif VIFBridge(active=False,address=fa:16:3e:10:0a:f7,bridge_name='qbr6a364c2d-ac',has_traffic_filtering=True,id=6a364c2d-ac68-497c-949c-9321555c917c,network=Network(5be8676e-2b43-49cc-bc7f-459fd9c9f962),plugin='ovs',port_profile=VIFPortProfileBase,preserve_on_delete=True,vif_name='tap6a364c2d-ac')
nova-compute.log:2019-10-29 19:43:19.726 75567 INFO os_vif [req-5a4213b5-8684-4a5c-9c70-db5ad1bbce21 0ef1edad07ae478296c53922739d0a0f 7d0f409f58554e249ce3cdbc72ac1794 - - -] Successfully unplugged vif VIFBridge(active=False,address=fa:16:3e:28:30:84,bridge_name='qbr210b5e5b-75',has_traffic_filtering=True,id=210b5e5b-7546-4fe9-8a75-9872aeca7097,network=Network(12af7a9c-2ba1-4bed-aef8-5f90f5c77607),plugin='ovs',port_profile=VIFPortProfileBase,preserve_on_delete=True,vif_name='tap210b5e5b-75')
nova-compute.log:2019-10-29 19:43:19.739 75567 INFO nova.virt.libvirt.driver [req-5a4213b5-8684-4a5c-9c70-db5ad1bbce21 0ef1edad07ae478296c53922739d0a0f 7d0f409f58554e249ce3cdbc72ac1794 - - -] [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51] Deleting instance files /var/lib/nova/instances/1220e935-1b76-4e49-ad54-35975a3c8c51_del
nova-compute.log:2019-10-29 19:43:19.740 75567 INFO nova.virt.libvirt.driver [req-5a4213b5-8684-4a5c-9c70-db5ad1bbce21 0ef1edad07ae478296c53922739d0a0f 7d0f409f58554e249ce3cdbc72ac1794 - - -] [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51] Deletion of /var/lib/nova/instances/1220e935-1b76-4e49-ad54-35975a3c8c51_del complete
nova-compute.log:2019-10-29 19:43:20.031 75567 WARNING nova.scheduler.client.report [req-5a4213b5-8684-4a5c-9c70-db5ad1bbce21 0ef1edad07ae478296c53922739d0a0f 7d0f409f58554e249ce3cdbc72ac1794 - - -] No authentication information found for placement API. Placement is optional in Newton, but required in Ocata. Please enable the placement service before upgrading.
nova-compute.log:2019-10-29 19:43:20.285 75567 WARNING nova.scheduler.client.report [req-5a4213b5-8684-4a5c-9c70-db5ad1bbce21 0ef1edad07ae478296c53922739d0a0f 7d0f409f58554e249ce3cdbc72ac1794 - - -] Unable to refresh my resource provider record
nova-compute.log:2019-10-29 19:43:20.319 75567 ERROR nova.compute.manager [req-5a4213b5-8684-4a5c-9c70-db5ad1bbce21 0ef1edad07ae478296c53922739d0a0f 7d0f409f58554e249ce3cdbc72ac1794 - - -] [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51] Instance failed to spawn
nova-compute.log:2019-10-29 19:43:20.319 75567 ERROR nova.compute.manager [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51] Traceback (most recent call last):
nova-compute.log:2019-10-29 19:43:20.319 75567 ERROR nova.compute.manager [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 4386, in _unshelve_instance
nova-compute.log:2019-10-29 19:43:20.319 75567 ERROR nova.compute.manager [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51]     block_device_info=block_device_info)
nova-compute.log:2019-10-29 19:43:20.319 75567 ERROR nova.compute.manager [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2742, in spawn
nova-compute.log:2019-10-29 19:43:20.319 75567 ERROR nova.compute.manager [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51]     destroy_disks_on_failure=True)
nova-compute.log:2019-10-29 19:43:20.319 75567 ERROR nova.compute.manager [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5121, in _create_domain_and_network
nova-compute.log:2019-10-29 19:43:20.319 75567 ERROR nova.compute.manager [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51]     destroy_disks_on_failure)
nova-compute.log:2019-10-29 19:43:20.319 75567 ERROR nova.compute.manager [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
nova-compute.log:2019-10-29 19:43:20.319 75567 ERROR nova.compute.manager [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51]     self.force_reraise()
nova-compute.log:2019-10-29 19:43:20.319 75567 ERROR nova.compute.manager [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
nova-compute.log:2019-10-29 19:43:20.319 75567 ERROR nova.compute.manager [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51]     six.reraise(self.type_, self.value, self.tb)
nova-compute.log:2019-10-29 19:43:20.319 75567 ERROR nova.compute.manager [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5093, in _create_domain_and_network
nova-compute.log:2019-10-29 19:43:20.319 75567 ERROR nova.compute.manager [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51]     post_xml_callback=post_xml_callback)
nova-compute.log:2019-10-29 19:43:20.319 75567 ERROR nova.compute.manager [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5011, in _create_domain
nova-compute.log:2019-10-29 19:43:20.319 75567 ERROR nova.compute.manager [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51]     guest.launch(pause=pause)
nova-compute.log:2019-10-29 19:43:20.319 75567 ERROR nova.compute.manager [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 144, in launch
nova-compute.log:2019-10-29 19:43:20.319 75567 ERROR nova.compute.manager [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51]     self._encoded_xml, errors='ignore')
nova-compute.log:2019-10-29 19:43:20.319 75567 ERROR nova.compute.manager [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
nova-compute.log:2019-10-29 19:43:20.319 75567 ERROR nova.compute.manager [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51]     self.force_reraise()
nova-compute.log:2019-10-29 19:43:20.319 75567 ERROR nova.compute.manager [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
nova-compute.log:2019-10-29 19:43:20.319 75567 ERROR nova.compute.manager [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51]     six.reraise(self.type_, self.value, self.tb)
nova-compute.log:2019-10-29 19:43:20.319 75567 ERROR nova.compute.manager [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 139, in launch
nova-compute.log:2019-10-29 19:43:20.319 75567 ERROR nova.compute.manager [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51]     return self._domain.createWithFlags(flags)
nova-compute.log:2019-10-29 19:43:20.319 75567 ERROR nova.compute.manager [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 186, in doit
nova-compute.log:2019-10-29 19:43:20.319 75567 ERROR nova.compute.manager [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51]     result = proxy_call(self._autowrap, f, *args, **kwargs)
nova-compute.log:2019-10-29 19:43:20.319 75567 ERROR nova.compute.manager [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 144, in proxy_call
nova-compute.log:2019-10-29 19:43:20.319 75567 ERROR nova.compute.manager [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51]     rv = execute(f, *args, **kwargs)
nova-compute.log:2019-10-29 19:43:20.319 75567 ERROR nova.compute.manager [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 125, in execute
nova-compute.log:2019-10-29 19:43:20.319 75567 ERROR nova.compute.manager [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51]     six.reraise(c, e, tb)
nova-compute.log:2019-10-29 19:43:20.319 75567 ERROR nova.compute.manager [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 83, in tworker
nova-compute.log:2019-10-29 19:43:20.319 75567 ERROR nova.compute.manager [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51]     rv = meth(*args, **kwargs)
nova-compute.log:2019-10-29 19:43:20.319 75567 ERROR nova.compute.manager [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51]   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1099, in createWithFlags
nova-compute.log:2019-10-29 19:43:20.319 75567 ERROR nova.compute.manager [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51]     if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
nova-compute.log:2019-10-29 19:43:20.319 75567 ERROR nova.compute.manager [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51] libvirtError: Requested operation is not valid: PCI device 0000:5d:17.4 is in use by driver QEMU, domain instance-00003be0
nova-compute.log:2019-10-29 19:43:20.319 75567 ERROR nova.compute.manager [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51]
nova-compute.log:2019-10-29 19:43:20.592 75567 INFO nova.compute.manager [req-5a4213b5-8684-4a5c-9c70-db5ad1bbce21 0ef1edad07ae478296c53922739d0a0f 7d0f409f58554e249ce3cdbc72ac1794 - - -] [instance: 1220e935-1b76-4e49-ad54-35975a3c8c51] Successfully reverted task state from spawning on failure for instance.
nova-compute.log:2019-10-29 19:43:20.614 75567 ERROR oslo_messaging.rpc.server [req-5a4213b5-8684-4a5c-9c70-db5ad1bbce21 0ef1edad07ae478296c53922739d0a0f 7d0f409f58554e249ce3cdbc72ac1794 - - -] Exception during message handling
~~~

Comment 1 David Vallee Delisle 2019-11-01 12:48:42 UTC
I was referring to bz1413010 in my previous comment.

Comment 4 David Vallee Delisle 2019-11-06 17:35:24 UTC
Hello

I think I understand the issue. 
- When we have a failure, we see "Updating port 991cbd39-47f7-4cab-bf65-0c19a920a718 with attributes {'binding:host_id': 'xxx'}", which brings us here [1].
- When we look just below, at [2], we see that the PCI devices are never recalculated and the profile is not updated with new devices when we unshelve, because that only happens in the case of a migration.
- That brings us back to the commit [3] that Sean pointed out yesterday, and to this upstream bug [4].
- I would assume that if we simply remove the "migration is not None" test, we would still fail, as in [3], because we get the pci_mapping from a migration object.

Now I'm not sure how to generate the pci_mapping without a migration object/context.
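
Incidentally, a quick way to observe the stale profile described above (just a sketch; <sriov-port-id> is a placeholder for the instance's direct/SR-IOV port):
~~~
# After an unshelve onto a different host, binding:host_id points at the new
# host while binding:profile still carries the pci_slot claimed on the
# original one.
openstack port show <sriov-port-id> | grep binding
~~~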

Maybe I'm also wrong; please enlighten me.

Many thanks,

DVD

[1] https://github.com/openstack/nova/blob/newton-eol/nova/network/neutronv2/api.py#L2405-L2411
[2] https://github.com/openstack/nova/blob/newton-eol/nova/network/neutronv2/api.py#L2417-L2418
[3] https://github.com/openstack/nova/commit/70c1eb689ad174b61ad915ae5384778bd536c16c
[4] https://bugs.launchpad.net/nova/+bug/1677621/

Comment 5 David Vallee Delisle 2019-11-06 18:14:10 UTC
I opened a bug upstream [1]

[1] https://bugs.launchpad.net/nova/+bug/1851545

Comment 12 David Vallee Delisle 2019-11-26 19:19:14 UTC
After talking with Sean from engineering, we're going to try this workaround until this is fixed.
- Ideally, unshelve on a compute with available PCI devices
- If that's not possible, we're going to try this:
-- openstack port set --binding-profile pci_vendor_info=xxx dc50d863-8922-4820-b6a3-4bcb3182cfdb --binding-profile pci_slot='xxx' --binding-profile physical_network='xxx'
-- Retry the unshelve and validate pci_devices in the nova database (see the query sketch after this list)
-- If the pci_devices table isn't updated (which is expected, because nova populates neutron and not the other way around), we might need a support exception to update the table with the right information.
-- If nova.pci_devices isn't updated, it might generate erroneous XML or, at the very least, break the resource tracker.
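
A sketch of that validation query (the container name, compute_node_id and filters are environment-specific; the same kind of query appears later in comment 20):
~~~
# Run on a controller; adjust the container name and the WHERE clause.
docker exec -ti galera-bundle-docker-0 mysql -D nova \
  -e "select address, status, instance_uuid, numa_node from pci_devices where compute_node_id = 2 and deleted = 0\G"
~~~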

Comment 13 David Vallee Delisle 2020-01-30 02:38:58 UTC
We hit another issue when unshelving like this: if the unshelved instance (with PCI-PT) was originally scheduled on NUMA node X and is unshelved on NUMA node Y, then, since the pci_request isn't recalculated during unshelving, the unshelve process binds PCI devices on the wrong NUMA node. This can impact performance, but it also breaks scheduling of future instances on the same compute node: the nova resource tracker apparently tries to re-assign the newly reserved PCI devices to new instances. It's as if they are not reserved for some reason, even though we see them mapped in the ovs_neutron DB.

Comment 14 David Vallee Delisle 2020-01-30 20:18:14 UTC
We were hoping we could force the re-scheduling of PCI devices by setting --no-binding-profile on the port(s), but the instantiation fails on the compute node with the traceback in [1]. I reproduced this in one of our internal labs if you're interested in playing with the environment.
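
For reference, the attempt looked roughly like this (the port ID is the one from comment 12; the instance name is a placeholder):
~~~
# Clear the stale binding profile, then retry the unshelve; this is what
# leads to the "Unsupported VIF type binding_failed" traceback in [1] below.
openstack port set --no-binding-profile dc50d863-8922-4820-b6a3-4bcb3182cfdb
openstack server unshelve my-sriov-instance
~~~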

Thanks,

DVD

[1]
~~~
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/exception_wrapper.py", line 66, in wrapped
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server     return f(self, context, *args, **kw)
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 188, in decorated_function
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server     LOG.warning(msg, e, instance=instance)
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server     self.force_reraise()
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 157, in decorated_function
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server     return function(self, context, *args, **kwargs)
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/utils.py", line 613, in decorated_function
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server     return function(self, context, *args, **kwargs)
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 216, in decorated_function
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server     kwargs['instance'], e, sys.exc_info())
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server     self.force_reraise()
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 204, in decorated_function
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server     return function(self, context, *args, **kwargs)
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 4332, in unshelve_instance
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server     do_unshelve_instance()
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 271, in inner
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server     return f(*args, **kwargs)
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 4331, in do_unshelve_instance
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server     filter_properties, node)
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 4390, in _unshelve_instance
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server     instance=instance)
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server     self.force_reraise()
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 4386, in _unshelve_instance
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server     block_device_info=block_device_info)
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2737, in spawn
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server     block_device_info=block_device_info)
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 4891, in _get_guest_xml
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server     context)
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 4717, in _get_guest_config
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server     flavor, virt_type, self._host)
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/vif.py", line 640, in get_config
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server     vif_obj = os_vif_util.nova_to_osvif_vif(vif)
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/network/os_vif_util.py", line 408, in nova_to_osvif_vif
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server     {'type': vif['type'], 'func': funcname})
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server NovaException: Unsupported VIF type binding_failed convert '_nova_to_osvif_vif_binding_failed'
~~~

Comment 17 David Vallee Delisle 2020-03-19 20:21:13 UTC
I was able to reproduce this issue on RHOSP13:
openstack-nova-compute-17.0.13-2.el7ost.noarch

When we clear the binding profile, we have this failure [1].

When we don't clear the binding profile, we have this failure [2].

I'll attach sosreports from all overcloud nodes, as well as a database dump, to this BZ. I reproduced this in one of our internal lab environments and I'll gladly give you access if it helps. The only thing is that some people are waiting to use this lab, so ideally it would have to be this week.

Thanks,

DVD

[1]
~~~
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server [req-bb05827d-cf3d-4f72-88b9-6f712b340861 010b0d44dce1415ebabb5f0848699601 e774604d0b5e4454984ef838266479b8 - default default] Exception during message handling: KeyError: 'pci_slot'
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 166, in _process_incoming
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 220, in dispatch
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server     return self._do_dispatch(endpoint, method, ctxt, args)
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 190, in _do_dispatch
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server     result = func(ctxt, **new_args)
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/exception_wrapper.py", line 76, in wrapped
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server     function_name, call_dict, binary)
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server     self.force_reraise()
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/exception_wrapper.py", line 67, in wrapped
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server     return f(self, context, *args, **kw)
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 189, in decorated_function
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server     "Error: %s", e, instance=instance)
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server     self.force_reraise()
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 159, in decorated_function
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server     return function(self, context, *args, **kwargs)
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/utils.py", line 1021, in decorated_function
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server     return function(self, context, *args, **kwargs)
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 217, in decorated_function
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server     kwargs['instance'], e, sys.exc_info())
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server     self.force_reraise()
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 205, in decorated_function
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server     return function(self, context, *args, **kwargs)
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 5183, in unshelve_instance
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server     do_unshelve_instance()
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 274, in inner
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server     return f(*args, **kwargs)
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 5182, in do_unshelve_instance
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server     filter_properties, node)
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 5259, in _unshelve_instance
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server     self._nil_out_instance_obj_host_and_node(instance)
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server     self.force_reraise()
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 5243, in _unshelve_instance
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server     block_device_info=block_device_info)
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 3181, in spawn
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server     mdevs=mdevs)
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5493, in _get_guest_xml
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server     context, mdevs)
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5290, in _get_guest_config
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server     flavor, virt_type, self._host)
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/vif.py", line 701, in get_config
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server     inst_type, virt_type, host)
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/vif.py", line 397, in get_config_hw_veb
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server     conf, net_type, profile['pci_slot'],
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server KeyError: 'pci_slot'
~~~

[2]
~~~
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server [req-d98a1ace-47b7-44b8-b056-83af58e6d069 010b0d44dce1415ebabb5f0848699601 e774604d0b5e4454984ef838266479b8 - default default] Exception during message handling: libvirtError: Requested operation is not valid: PCI device 0000:af:01.6 is in use by driver QEMU, domain instance-00000131
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 166, in _process_incoming
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 220, in dispatch
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server     return self._do_dispatch(endpoint, method, ctxt, args)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 190, in _do_dispatch
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server     result = func(ctxt, **new_args)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/exception_wrapper.py", line 76, in wrapped
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server     function_name, call_dict, binary)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server     self.force_reraise()
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/exception_wrapper.py", line 67, in wrapped
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server     return f(self, context, *args, **kw)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 189, in decorated_function
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server     "Error: %s", e, instance=instance)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server     self.force_reraise()
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 159, in decorated_function
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server     return function(self, context, *args, **kwargs)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/utils.py", line 1021, in decorated_function
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server     return function(self, context, *args, **kwargs)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 217, in decorated_function
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server     kwargs['instance'], e, sys.exc_info())
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server     self.force_reraise()
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 205, in decorated_function
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server     return function(self, context, *args, **kwargs)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 5183, in unshelve_instance
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server     do_unshelve_instance()
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 274, in inner
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server     return f(*args, **kwargs)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 5182, in do_unshelve_instance
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server     filter_properties, node)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 5259, in _unshelve_instance
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server     self._nil_out_instance_obj_host_and_node(instance)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server     self.force_reraise()
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 5243, in _unshelve_instance
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server     block_device_info=block_device_info)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 3186, in spawn
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server     destroy_disks_on_failure=True)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5709, in _create_domain_and_network
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server     destroy_disks_on_failure)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server     self.force_reraise()
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5678, in _create_domain_and_network
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server     post_xml_callback=post_xml_callback)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5613, in _create_domain
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server     guest.launch(pause=pause)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 144, in launch
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server     self._encoded_xml, errors='ignore')
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server     self.force_reraise()
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 139, in launch
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server     return self._domain.createWithFlags(flags)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 186, in doit
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server     result = proxy_call(self._autowrap, f, *args, **kwargs)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 144, in proxy_call
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server     rv = execute(f, *args, **kwargs)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 125, in execute
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server     six.reraise(c, e, tb)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 83, in tworker
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server     rv = meth(*args, **kwargs)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1110, in createWithFlags
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server     if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server libvirtError: Requested operation is not valid: PCI device 0000:af:01.6 is in use by driver QEMU, domain instance-00000131
~~~

Comment 18 David Vallee Delisle 2020-03-19 20:29:35 UTC
Created attachment 1671589 [details]
overcloud database dump

Comment 20 David Vallee Delisle 2020-03-20 14:02:22 UTC
So I tested the manual definition of the binding profile on this port [1]:
# openstack port set --binding-profile '{"pci_slot": "0000:af:01.7", "physical_network": "sriov1", "pci_vendor_info": "15b3:1018"}' cc84e61e-e188-4129-a6c0-b95789e84e49

I was able to unshelve the instance and the XML had the right information [2].

The only issue I see is that the pci_devices table in nova isn't updated [3], so that could possibly cause scheduling conflicts in the future.



[1]
~~~
| binding:host_id       | ess13sriov-scpu-1.gsslab.rdu2.redhat.com                                                   |
| binding:profile       | {"pci_slot": "0000:af:01.6", "physical_network": "sriov1", "pci_vendor_info": "15b3:1018"} |
| binding:vif_details   | {"port_filter": false, "vlan": "1270"}                                                     |
| binding:vif_type      | hw_veb                                                                                     |
| binding:vnic_type     | direct      
~~~



[2]
~~~
      <source>
        <address type='pci' domain='0x0000' bus='0xaf' slot='0x01' function='0x7'/>
      </source>
~~~

[3]
~~~
[root@ess13sriov-ctrl-0 ~]# docker exec -ti galera-bundle-docker-0 mysql -D nova -e "select * from pci_devices where address rlike '0000:af:01.[6-7]' and compute_node_id = 2\G"
*************************** 1. row ***************************
     created_at: 2020-03-18 07:03:31
     updated_at: 2020-03-18 17:29:21
     deleted_at: NULL
        deleted: 0
             id: 107
compute_node_id: 2
        address: 0000:af:01.6
     product_id: 1018
      vendor_id: 15b3
       dev_type: type-VF
         dev_id: pci_0000_af_01_6
          label: label_15b3_1018
         status: allocated
     extra_info: {"capabilities": "{\"network\": [\"rx\", \"tx\", \"sg\", \"tso\", \"gso\", \"gro\", \"rxvlan\", \"txvlan\", \"rxhash\", \"rdma\"]}"}
  instance_uuid: 7c6350dd-4284-476a-a4b7-ee2e7edcbfeb
     request_id: e8197a9d-f2f3-474d-9195-b5a7cda065ad
      numa_node: 1
    parent_addr: 0000:af:00.0
           uuid: fb486d42-e8d5-4784-a2fc-3fab4f822e20
*************************** 2. row ***************************
     created_at: 2020-03-18 07:03:31
     updated_at: 2020-03-19 18:24:51
     deleted_at: NULL
        deleted: 0
             id: 113
compute_node_id: 2
        address: 0000:af:01.7
     product_id: 1018
      vendor_id: 15b3
       dev_type: type-VF
         dev_id: pci_0000_af_01_7
          label: label_15b3_1018
         status: available
     extra_info: {"capabilities": "{\"network\": [\"rx\", \"tx\", \"sg\", \"tso\", \"gso\", \"gro\", \"rxvlan\", \"txvlan\", \"rxhash\", \"rdma\"]}"}
  instance_uuid: NULL
     request_id: NULL
      numa_node: 1
    parent_addr: 0000:af:00.0
           uuid: 24c72412-1524-49be-89c4-828ff3c5741d
~~~

Comment 31 Lon Hohberger 2022-06-17 15:49:32 UTC
*** Bug 1851490 has been marked as a duplicate of this bug. ***

Comment 38 errata-xmlrpc 2022-09-21 12:07:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:6543

Comment 39 Artom Lifshitz 2023-07-13 05:44:08 UTC
*** Bug 1911710 has been marked as a duplicate of this bug. ***