Bug 1767797
| Summary: | When unshelving an SR-IOV instance, the binding profile isn't reclaimed or rescheduled, and this might cause PCI-PT conflicts | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | David Vallee Delisle <dvd> |
| Component: | openstack-nova | Assignee: | Artom Lifshitz <alifshit> |
| Status: | CLOSED ERRATA | QA Contact: | James Parker <jparker> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 13.0 (Queens) | CC: | alifshit, dasmith, dhill, don.weeks, ebarrera, eglynn, jhakimra, jlema, jparker, jzaher, kchamart, lhh, lyarwood, mflusche, mircea.vutcovici, rurena, sbauza, sgordon, smooney, vromanso |
| Target Milestone: | ga | Keywords: | Patch, Triaged, ZStream |
| Target Release: | 17.0 | Flags: | rurena: needinfo? (mbooth) |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | openstack-nova-23.2.1-0.20220428212241.327693a.el9ost | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1852110 (view as bug list) | Environment: | |
| Last Closed: | 2022-09-21 12:07:58 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description — David Vallee Delisle, 2019-11-01 12:46:58 UTC
Hello,
(Note: I was referring to bz1413010 in my previous comment.)
I think I understand the issue.
- When we have a failure, we see "Updating port 991cbd39-47f7-4cab-bf65-0c19a920a718 with attributes {'binding:host_id': 'xxx'}", which brings us here [1]
- When we look a few lines below [2], we see that the PCI devices are never recalculated and the profile is never updated with the new devices on unshelve, because this only happens in the case of a migration.
- That brings us back to the commit [3] that Sean pointed out yesterday, and to this upstream bug [4]
- I would assume that if we simply removed the "migration is not None" test, we would hit this bug [3], because the pci_mapping is derived from a migration object
Now I'm not sure how to generate the pci_mapping without a migration object/context.
I may also be wrong here; please enlighten me.
Many thanks,
DVD
[1] https://github.com/openstack/nova/blob/newton-eol/nova/network/neutronv2/api.py#L2405-L2411
[2] https://github.com/openstack/nova/blob/newton-eol/nova/network/neutronv2/api.py#L2417-L2418
[3] https://github.com/openstack/nova/commit/70c1eb689ad174b61ad915ae5384778bd536c16c
[4] https://bugs.launchpad.net/nova/+bug/1677621/
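The guard described above can be sketched as follows. This is a hypothetical illustration, not nova's actual code: the function name, signature, and `pci_mapping` shape are invented for clarity. It shows why the unshelve path (no migration object) re-sends the stale PCI address to Neutron.

```python
# Illustrative sketch of the "migration is not None" guard (not nova code):
# the binding profile's PCI info is only refreshed when a migration object
# exists, so unshelve (migration=None) keeps the original host's pci_slot.

def updated_binding_profile(profile, migration, pci_mapping):
    """Return the binding:profile dict to send back to Neutron.

    profile     -- current binding:profile dict from the port
    migration   -- migration object, or None on the unshelve path
    pci_mapping -- {old_pci_slot: new_pci_slot}, derived from the migration
    """
    if migration is None:
        # Unshelve: no migration context, so the stale PCI address is
        # re-sent even though the device may now belong to another
        # instance, or sit on another NUMA node.
        return profile
    new_profile = dict(profile)
    old_slot = profile.get('pci_slot')
    if old_slot in pci_mapping:
        new_profile['pci_slot'] = pci_mapping[old_slot]
    return new_profile
```

Under this sketch, removing the `migration is None` early return is not enough: without a migration object there is nothing to build `pci_mapping` from, which matches the question raised above.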
I opened a bug upstream [1].

[1] https://bugs.launchpad.net/nova/+bug/1851545

After talking with Sean from engineering, we're going to try this workaround until this is fixed:

- Ideally, unshelve on a compute with available PCI devices.
- If that's not possible, we're going to try this:
  - `openstack port set --binding-profile pci_vendor_info=xxx dc50d863-8922-4820-b6a3-4bcb3182cfdb --binding-profile pci_slot='xxx' --binding-profile physical_network='xxx'`
  - Retry the unshelve and validate pci_devices in the nova database.
  - If the pci_devices table isn't updated (which is expected, because nova populates neutron, not the other way around), we might need a support exception to update the table with the right information.
  - If nova.pci_devices isn't updated, it might generate erroneous XML, or at least break the resource tracker.

We hit another issue when unshelving like this: if the unshelved instance (with PCI-PT) was originally scheduled on numaX and it's unshelved on numaY, then, since the pci_request isn't recalculated during unshelve, the unshelve process will bind PCI devices on the wrong NUMA node. This can hurt performance, but it will also break scheduling of future instances on the same compute node. Apparently, the nova compute resource tracker will try to re-assign the newly reserved PCI devices to new instances; it's as if they are not reserved for some reason, even though we see them mapped in the ovs_neutron DB.

We were hoping we could force the re-scheduling of PCI devices by setting --no-binding-profile on the port(s), but the instantiation fails on the compute with this traceback [1]. I reproduced this in one of our internal labs if you're interested in playing with the environment.
Thanks,
DVD
[1]
~~~
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/exception_wrapper.py", line 66, in wrapped
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server return f(self, context, *args, **kw)
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 188, in decorated_function
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server LOG.warning(msg, e, instance=instance)
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server self.force_reraise()
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 157, in decorated_function
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/compute/utils.py", line 613, in decorated_function
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 216, in decorated_function
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server kwargs['instance'], e, sys.exc_info())
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server self.force_reraise()
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 204, in decorated_function
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 4332, in unshelve_instance
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server do_unshelve_instance()
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 271, in inner
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server return f(*args, **kwargs)
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 4331, in do_unshelve_instance
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server filter_properties, node)
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 4390, in _unshelve_instance
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server instance=instance)
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server self.force_reraise()
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 4386, in _unshelve_instance
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server block_device_info=block_device_info)
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2737, in spawn
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server block_device_info=block_device_info)
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 4891, in _get_guest_xml
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server context)
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 4717, in _get_guest_config
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server flavor, virt_type, self._host)
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/vif.py", line 640, in get_config
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server vif_obj = os_vif_util.nova_to_osvif_vif(vif)
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/network/os_vif_util.py", line 408, in nova_to_osvif_vif
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server {'type': vif['type'], 'func': funcname})
2020-01-30 20:07:29.202 52372 ERROR oslo_messaging.rpc.server NovaException: Unsupported VIF type binding_failed convert '_nova_to_osvif_vif_binding_failed'
~~~
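The NUMA-affinity problem described above can be made concrete with a small check. This is an illustrative probe, not a nova tool: the function and its inputs are invented, with `host_pci_devices` standing in for the address-to-NUMA-node mapping that nova keeps in its `pci_devices` table.

```python
# Illustrative check (not nova code) for the NUMA issue described above:
# a stale binding profile can pin a VF whose NUMA node differs from the
# node the unshelved guest actually lands on.

def numa_mismatch(port_pci_slot, host_pci_devices, guest_numa_node):
    """Return True when the bound VF sits on a different NUMA node.

    port_pci_slot    -- pci_slot from the port's binding:profile
    host_pci_devices -- {pci_address: numa_node} on the destination host
    guest_numa_node  -- NUMA node the guest is pinned to after unshelve
    """
    device_node = host_pci_devices.get(port_pci_slot)
    if device_node is None:
        return True  # device unknown on this host: the binding is stale
    return device_node != guest_numa_node

host = {'0000:af:01.6': 1, '0000:af:01.7': 1}
# Guest unshelved onto numa0, but the stale profile pins a numa1 VF.
assert numa_mismatch('0000:af:01.6', host, 0) is True
```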
I was able to reproduce this issue on RHOSP13:
openstack-nova-compute-17.0.13-2.el7ost.noarch
When we clear the binding profile, we have this failure [1].
When we don't clear the binding profile, we have this failure [2].
I'll attach sosreports from all overcloud nodes, as well as a database dump, to this BZ. I reproduced this in one of our internal lab environments and I'll gladly give you guys access if it helps. The only thing is that some people are waiting to use this lab, so it would have to be this week, ideally.
Thanks,
DVD
[1]
~~~
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server [req-bb05827d-cf3d-4f72-88b9-6f712b340861 010b0d44dce1415ebabb5f0848699601 e774604d0b5e4454984ef838266479b8 - default default] Exception during message handling: KeyError: 'pci_slot'
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 166, in _process_incoming
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 220, in dispatch
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 190, in _do_dispatch
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/exception_wrapper.py", line 76, in wrapped
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server function_name, call_dict, binary)
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server self.force_reraise()
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/exception_wrapper.py", line 67, in wrapped
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server return f(self, context, *args, **kw)
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 189, in decorated_function
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server "Error: %s", e, instance=instance)
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server self.force_reraise()
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 159, in decorated_function
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/compute/utils.py", line 1021, in decorated_function
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 217, in decorated_function
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server kwargs['instance'], e, sys.exc_info())
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server self.force_reraise()
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 205, in decorated_function
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 5183, in unshelve_instance
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server do_unshelve_instance()
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 274, in inner
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server return f(*args, **kwargs)
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 5182, in do_unshelve_instance
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server filter_properties, node)
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 5259, in _unshelve_instance
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server self._nil_out_instance_obj_host_and_node(instance)
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server self.force_reraise()
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 5243, in _unshelve_instance
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server block_device_info=block_device_info)
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 3181, in spawn
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server mdevs=mdevs)
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5493, in _get_guest_xml
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server context, mdevs)
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5290, in _get_guest_config
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server flavor, virt_type, self._host)
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/vif.py", line 701, in get_config
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server inst_type, virt_type, host)
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/vif.py", line 397, in get_config_hw_veb
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server conf, net_type, profile['pci_slot'],
/var/log/containers/nova/nova-compute.log.1:2020-03-18 17:53:00.589 9 ERROR oslo_messaging.rpc.server KeyError: 'pci_slot'
~~~
[2]
~~~
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server [req-d98a1ace-47b7-44b8-b056-83af58e6d069 010b0d44dce1415ebabb5f0848699601 e774604d0b5e4454984ef838266479b8 - default default] Exception during message handling: libvirtError: Requested operation is not valid: PCI device 0000:af:01.6 is in use by driver QEMU, domain instance-00000131
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 166, in _process_incoming
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 220, in dispatch
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 190, in _do_dispatch
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/exception_wrapper.py", line 76, in wrapped
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server function_name, call_dict, binary)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server self.force_reraise()
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/exception_wrapper.py", line 67, in wrapped
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server return f(self, context, *args, **kw)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 189, in decorated_function
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server "Error: %s", e, instance=instance)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server self.force_reraise()
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 159, in decorated_function
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/compute/utils.py", line 1021, in decorated_function
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 217, in decorated_function
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server kwargs['instance'], e, sys.exc_info())
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server self.force_reraise()
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 205, in decorated_function
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 5183, in unshelve_instance
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server do_unshelve_instance()
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 274, in inner
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server return f(*args, **kwargs)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 5182, in do_unshelve_instance
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server filter_properties, node)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 5259, in _unshelve_instance
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server self._nil_out_instance_obj_host_and_node(instance)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server self.force_reraise()
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 5243, in _unshelve_instance
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server block_device_info=block_device_info)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 3186, in spawn
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server destroy_disks_on_failure=True)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5709, in _create_domain_and_network
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server destroy_disks_on_failure)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server self.force_reraise()
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5678, in _create_domain_and_network
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server post_xml_callback=post_xml_callback)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5613, in _create_domain
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server guest.launch(pause=pause)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 144, in launch
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server self._encoded_xml, errors='ignore')
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server self.force_reraise()
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 139, in launch
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server return self._domain.createWithFlags(flags)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 186, in doit
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server result = proxy_call(self._autowrap, f, *args, **kwargs)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 144, in proxy_call
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server rv = execute(f, *args, **kwargs)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 125, in execute
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server six.reraise(c, e, tb)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 83, in tworker
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server rv = meth(*args, **kwargs)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1110, in createWithFlags
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
2020-03-19 20:17:34.552 9 ERROR oslo_messaging.rpc.server libvirtError: Requested operation is not valid: PCI device 0000:af:01.6 is in use by driver QEMU, domain instance-00000131
~~~
Created attachment 1671589 [details]
overcloud database dump
So I tested the manual definition of the binding profile on this port [1]:
# openstack port set --binding-profile '{"pci_slot": "0000:af:01.7", "physical_network": "sriov1", "pci_vendor_info": "15b3:1018"}' cc84e61e-e188-4129-a6c0-b95789e84e49
I was able to unshelve the instance and the XML had the right information [2].
The only issue I see is that the pci_devices table in nova isn't updated [3], so that could possibly cause some scheduling conflicts in the future.
[1]
~~~
| binding:host_id | ess13sriov-scpu-1.gsslab.rdu2.redhat.com |
| binding:profile | {"pci_slot": "0000:af:01.6", "physical_network": "sriov1", "pci_vendor_info": "15b3:1018"} |
| binding:vif_details | {"port_filter": false, "vlan": "1270"} |
| binding:vif_type | hw_veb |
| binding:vnic_type | direct |
~~~
[2]
~~~
<source>
<address type='pci' domain='0x0000' bus='0xaf' slot='0x01' function='0x7'/>
</source>
~~~
[3]
~~~
[root@ess13sriov-ctrl-0 ~]# docker exec -ti galera-bundle-docker-0 mysql -D nova -e "select * from pci_devices where address rlike '0000:af:01.[6-7]' and compute_node_id = 2\G"
*************************** 1. row ***************************
created_at: 2020-03-18 07:03:31
updated_at: 2020-03-18 17:29:21
deleted_at: NULL
deleted: 0
id: 107
compute_node_id: 2
address: 0000:af:01.6
product_id: 1018
vendor_id: 15b3
dev_type: type-VF
dev_id: pci_0000_af_01_6
label: label_15b3_1018
status: allocated
extra_info: {"capabilities": "{\"network\": [\"rx\", \"tx\", \"sg\", \"tso\", \"gso\", \"gro\", \"rxvlan\", \"txvlan\", \"rxhash\", \"rdma\"]}"}
instance_uuid: 7c6350dd-4284-476a-a4b7-ee2e7edcbfeb
request_id: e8197a9d-f2f3-474d-9195-b5a7cda065ad
numa_node: 1
parent_addr: 0000:af:00.0
uuid: fb486d42-e8d5-4784-a2fc-3fab4f822e20
*************************** 2. row ***************************
created_at: 2020-03-18 07:03:31
updated_at: 2020-03-19 18:24:51
deleted_at: NULL
deleted: 0
id: 113
compute_node_id: 2
address: 0000:af:01.7
product_id: 1018
vendor_id: 15b3
dev_type: type-VF
dev_id: pci_0000_af_01_7
label: label_15b3_1018
status: available
extra_info: {"capabilities": "{\"network\": [\"rx\", \"tx\", \"sg\", \"tso\", \"gso\", \"gro\", \"rxvlan\", \"txvlan\", \"rxhash\", \"rdma\"]}"}
instance_uuid: NULL
request_id: NULL
numa_node: 1
parent_addr: 0000:af:00.0
uuid: 24c72412-1524-49be-89c4-828ff3c5741d
~~~
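The inconsistency shown in [1] and [3] — Neutron's binding:profile pointing at one VF while nova.pci_devices still marks a different VF as allocated to the instance — can be detected with a small probe. This is a hypothetical sketch, not a nova or neutron tool; the function and its input shapes are invented for illustration.

```python
# Hypothetical consistency probe for the state shown above: compare the
# pci_slot Neutron believes an instance uses against the VF nova's
# pci_devices table has marked as allocated to that instance.

def find_stale_allocations(binding_profiles, pci_devices):
    """binding_profiles -- {instance_uuid: pci_slot from binding:profile}
    pci_devices        -- rows of nova.pci_devices as dicts with
                          'address', 'status', 'instance_uuid'
    Returns (instance_uuid, neutron_slot, nova_slot) for each disagreement.
    """
    allocated = {d['instance_uuid']: d['address']
                 for d in pci_devices if d['status'] == 'allocated'}
    return [(uuid, slot, allocated.get(uuid))
            for uuid, slot in binding_profiles.items()
            if allocated.get(uuid) != slot]

rows = [
    {'address': '0000:af:01.6', 'status': 'allocated',
     'instance_uuid': '7c6350dd-4284-476a-a4b7-ee2e7edcbfeb'},
    {'address': '0000:af:01.7', 'status': 'available', 'instance_uuid': None},
]
# Port manually rebound to .7, but nova still tracks .6 as allocated.
print(find_stale_allocations(
    {'7c6350dd-4284-476a-a4b7-ee2e7edcbfeb': '0000:af:01.7'}, rows))
```

A non-empty result is exactly the condition the comment warns about: the resource tracker's view has diverged, so future SR-IOV scheduling on this compute can conflict.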
*** Bug 1851490 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:6543

*** Bug 1911710 has been marked as a duplicate of this bug. ***