Bug 1413010
Summary: | unable to unshelve instances | |||
---|---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Pratik Pravin Bandarkar <pbandark> | |
Component: | openstack-nova | Assignee: | Vladik Romanovsky <vromanso> | |
Status: | CLOSED ERRATA | QA Contact: | awaugama | |
Severity: | high | Docs Contact: | ||
Priority: | high | |||
Version: | 8.0 (Liberty) | CC: | aguetta, berrange, ccollett, cshastri, dasmith, eglynn, jhakimra, jjoyce, kchamart, mbooth, mlopes, mschuppe, pbandark, sbauza, sferdjao, sgordon, srevivo, vaggarwa, vromanso | |
Target Milestone: | zstream | Keywords: | OtherQA, TestOnly, Triaged, ZStream | |
Target Release: | 8.0 (Liberty) | |||
Hardware: | All | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | openstack-nova-12.0.6-14.el7ost | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1414965 | Environment: | |
Last Closed: | 2017-10-25 17:10:24 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | 1409356 | |||
Bug Blocks: | 1414965 |
Description
Pratik Pravin Bandarkar
2017-01-13 12:08:24 UTC
Hello,

Apologies for the delay. Unfortunately, I didn't find any mention of the provided traces in the logs. These traces are from 2017-01-03, but the attached logs contain only activity from 2017-01-13. Would it be possible to reproduce the issue and capture the logs right after it occurs?

Also, in OSP8 we didn't allocate new PCI devices during unshelve/rebuild/evacuate operations. We also don't update the neutron port binding, which holds the PCI address of the device; the nova libvirt driver uses it to configure the virtual interfaces, as in [1]. However, this change relies on work done across two cycles (Mitaka and Newton) that introduced a migration context object and made resources be claimed and allocated during the above operations ([2] and [3]). These patches are not backportable, due to RPC and object changes.

[1] https://review.openstack.org/#/c/242573
[2] https://review.openstack.org/#/q/topic:bug/1417667
[3] https://review.openstack.org/#/q/topic:bp/migration-fix-resource-tracking

I may have a path to follow... As Vladik indicated in comment #14, the filter 'pci_passthrough_filter.py' returns true when it does not find any pci_requests attached to the instance. My thinking is that when the API loads the instance and then passes it to the compute API, the conductor and finally the scheduler, the instance does not have the 'pci_requests' attribute loaded, with the result that the instance can land on a compute node which can't accept the request.

This is the patch I would propose. I can still provide a test build if the customer prefers.
    diff --git a/nova/api/openstack/compute/shelve.py b/nova/api/openstack/compute/shelve.py
    index 6f9f8ae..2f31554 100644
    --- a/nova/api/openstack/compute/shelve.py
    +++ b/nova/api/openstack/compute/shelve.py
    @@ -59,7 +59,8 @@ class ShelveController(wsgi.Controller):
             context = req.environ["nova.context"]
             authorize(context, action='shelve_offload')
     
    -        instance = common.get_instance(self.compute_api, context, id)
    +        instance = common.get_instance(
    +            self.compute_api, context, id, expected_attrs=['pci_requests'])
             try:
                 self.compute_api.shelve_offload(context, instance)
             except exception.InstanceUnknownCell as e:

Another way to "fix" the issue (if that is really the root cause) would be to replace that part of the code [0] with a call to the database to get the pci_requests related to the instance being scheduled, but I would say it is going to create a larger overhead, since the database would be hit for each instance scheduled.

[0] https://code.engineering.redhat.com/gerrit/gitweb?p=nova.git;a=blob;f=nova/scheduler/filter_scheduler.py;h=ec986252f49f60640b8d75f8162b6a39aa640fd1;hb=refs/heads/rhos-8.0-patches#l114

Like I said in comment #16, the problem is that the instance we get when calling unshelve does not have the pci_requests field set. So, yeah, I definitely agree with the proposal in comment #17 to load the PCI bits when getting the instance.

To be clear, that issue is not present in OSP9 because there we should get the original RequestSpec record, which includes the pci_requests field, when calling unshelve, *but* the upstream Gerrit change I commented on in comment #16 is not backportable given the many RPC changes and DB modifications involved in that feature.

About comment #18, I disagree with making such a modification in the filter. Conceptually, we don't want (mostly for performance reasons) to query the Nova DB when we run the filters, in particular the instances table, which is very large.
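To make the failure mode discussed above concrete, here is a minimal toy model in Python. It is not nova code: `PciRequest`, `Host`, `get_instance`, `pci_passthrough_filter` and `schedule` are all hypothetical stand-ins written for this illustration, and the device IDs are made up. It only assumes the behaviour described in the comments, namely that the PCI filter passes every host when no pci_requests are attached to the instance, and that the attribute is only populated when it is asked for via `expected_attrs`.

```python
from dataclasses import dataclass, field


@dataclass
class PciRequest:
    """Hypothetical PCI request: which device type and how many."""
    vendor_id: str
    product_id: str
    count: int = 1


@dataclass
class Host:
    """Hypothetical host state: free PCI devices keyed by (vendor, product)."""
    name: str
    free_pci: dict = field(default_factory=dict)

    def supports(self, requests):
        # A host supports the requests if it has enough free devices of each type.
        return all(
            self.free_pci.get((r.vendor_id, r.product_id), 0) >= r.count
            for r in requests
        )


def pci_passthrough_filter(host, pci_requests):
    # Assumed behaviour from the discussion: with no requests, every host passes.
    if not pci_requests:
        return True
    return host.supports(pci_requests)


def get_instance(db, instance_id, expected_attrs=None):
    # Toy loader: the optional field is filled only when explicitly requested.
    instance = {"id": instance_id, "pci_requests": None}
    if "pci_requests" in (expected_attrs or []):
        instance["pci_requests"] = db[instance_id]["pci_requests"]
    return instance


def schedule(hosts, instance):
    # Return the names of all hosts that pass the PCI filter for this instance.
    return [
        h.name for h in hosts
        if pci_passthrough_filter(h, instance["pci_requests"])
    ]


db = {"vm-1": {"pci_requests": [PciRequest("8086", "10ed")]}}
hosts = [
    Host("compute-sriov", {("8086", "10ed"): 2}),
    Host("compute-plain"),
]

# Without the attribute loaded, the host with no PCI devices is also accepted.
without_pci = schedule(hosts, get_instance(db, "vm-1"))

# With the attribute loaded, only the host with a matching free device passes.
with_pci = schedule(
    hosts, get_instance(db, "vm-1", expected_attrs=["pci_requests"]))
```

In this sketch the filter itself never touches `db`; the requests arrive already attached to the instance, which matches the preference stated above for loading the PCI bits once when the instance is fetched rather than querying the database from inside the filter.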
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3068

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days