Bug 1714039

Summary: [RFE] In limited resources environments, rebuilding VMs fails due to NUMATopologyFilter because scheduling isn't skipped.
Product: Red Hat OpenStack Reporter: David Hill <dhill>
Component: openstack-nova Assignee: OSP DFG:Compute <osp-dfg-compute>
Status: CLOSED DUPLICATE QA Contact: OSP DFG:Compute <osp-dfg-compute>
Severity: high Docs Contact:
Priority: high    
Version: 17.0 (Wallaby) CC: dasmith, eglynn, jhakimra, kchamart, madgupta, mbooth, rlondhe, sbauza, sgordon, smooney, vromanso
Target Milestone: --- Keywords: FutureFeature, Triaged
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-10-15 15:05:29 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description David Hill 2019-05-26 20:18:12 UTC
Description of problem:
In resource-limited environments, rebuilding a VM fails in NUMATopologyFilter because scheduling isn't skipped, even though the image metadata is the same:

nova-scheduler.log:2019-04-25 15:37:07.031 3002 INFO nova.filters [req-c49164d8-95d4-4483-85ac-7cf6eb5e5264 e5a411de82e94a94ae80181bc710696b 17d1fb77d99b4f24a4898222e8c912ab - - -] Filtering removed all hosts for the request with instance ID 'a89e4dff-f247-4c10-b6fb-4d20faa77af8'. Filter results: ['AvailabilityZoneFilter: (start: 1, end: 1)', 'RamFilter: (start: 1, end: 1)', 'ComputeFilter: (start: 1, end: 1)', 'ComputeCapabilitiesFilter: (start: 1, end: 1)', 'ImagePropertiesFilter: (start: 1, end: 1)', 'ServerGroupAntiAffinityFilter: (start: 1, end: 1)', 'ServerGroupAffinityFilter: (start: 1, end: 1)', 'PciPassthroughFilter: (start: 1, end: 1)', 'NUMATopologyFilter: (start: 1, end: 0)']
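To read a "Filter results" list like the one above more easily, it can be parsed to find the first filter that dropped the host count to zero. This is only a minimal sketch: the helper names (parse_filter_results, first_eliminating_filter) are made up for illustration and are not part of Nova, and the regular expression assumes the exact log format shown in this report.

```python
import re

# Filter list copied from the nova-scheduler.log line quoted in this report.
LOG_LINE = (
    "Filter results: ['AvailabilityZoneFilter: (start: 1, end: 1)', "
    "'RamFilter: (start: 1, end: 1)', "
    "'ComputeFilter: (start: 1, end: 1)', "
    "'ComputeCapabilitiesFilter: (start: 1, end: 1)', "
    "'ImagePropertiesFilter: (start: 1, end: 1)', "
    "'ServerGroupAntiAffinityFilter: (start: 1, end: 1)', "
    "'ServerGroupAffinityFilter: (start: 1, end: 1)', "
    "'PciPassthroughFilter: (start: 1, end: 1)', "
    "'NUMATopologyFilter: (start: 1, end: 0)']"
)

# Assumption: every entry looks like 'Name: (start: N, end: M)'.
FILTER_RE = re.compile(r"'(\w+): \(start: (\d+), end: (\d+)\)'")


def parse_filter_results(line):
    """Return [(filter_name, hosts_before, hosts_after), ...] in log order."""
    return [
        (name, int(start), int(end))
        for name, start, end in FILTER_RE.findall(line)
    ]


def first_eliminating_filter(line):
    """Return the first filter that reduced the host list to zero, or None."""
    for name, start, end in parse_filter_results(line):
        if start > 0 and end == 0:
            return name
    return None


if __name__ == "__main__":
    print(first_eliminating_filter(LOG_LINE))
```

For the line in this report, the helper points at NUMATopologyFilter, which matches the failure described here.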

Version-Release number of selected component (if applicable):
openstack-nova-compute-14.1.0-40.el7ost.noarch

How reproducible:
Always

Steps to Reproduce:
1. Create VMs with pinned vCPUs until the overcloud is saturated.
2. Rebuild one of them with a different image that has the same image metadata.

Actual results:
The rebuild fails: NUMATopologyFilter removes the only candidate host (start: 1, end: 0).

Expected results:
The rebuild succeeds if the image metadata is the same.
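The expected behaviour hinges on deciding whether two images carry "the same" metadata for NUMA purposes. A rough sketch of such a comparison is below. It is an illustration only, not Nova's actual logic: the function names are hypothetical, and the set of property prefixes treated as NUMA-relevant is an assumption based on the commonly documented hw_* image properties.

```python
# Assumption: these image property names/prefixes are the ones that affect
# NUMA topology and CPU pinning. The list is illustrative, not exhaustive.
NUMA_RELEVANT_PREFIXES = (
    "hw_numa_",
    "hw_cpu_policy",
    "hw_cpu_thread_policy",
    "hw_mem_page_size",
)


def numa_relevant(props):
    """Keep only the image properties assumed to influence NUMA placement."""
    return {
        key: value
        for key, value in props.items()
        if key.startswith(NUMA_RELEVANT_PREFIXES)
    }


def numa_metadata_unchanged(old_props, new_props):
    """True if both images agree on every NUMA-relevant property."""
    return numa_relevant(old_props) == numa_relevant(new_props)


if __name__ == "__main__":
    old_image = {"hw_cpu_policy": "dedicated", "os_distro": "rhel"}
    new_image = {"hw_cpu_policy": "dedicated", "os_distro": "centos"}
    # Only os_distro differs, which this sketch does not treat as NUMA-relevant.
    print(numa_metadata_unchanged(old_image, new_image))
```

Under this sketch, a rebuild whose old and new images compare as unchanged is the case this report expects to succeed on the current host.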

Additional info:

Comment 1 Artom Lifshitz 2019-05-30 16:52:08 UTC
FWIW, I think the proper long-term fix for this is to use the Placement update allocations API [1] along with the standard CPU resource tracking spec [2] and eventually NUMA in placement [3] [4]. Not sure what would be an acceptable short-term solution/workaround.

[1] https://developer.openstack.org/api-ref/placement/?expanded=update-allocations-detail#update-allocations
[2] https://review.opendev.org/#/c/555081/
[3] https://review.opendev.org/#/c/662191/
[4] https://review.opendev.org/#/c/658510/

Comment 2 Matthew Booth 2019-05-31 14:45:26 UTC
We think this is a valid bug, but it's still an open question whether we can fix it in OSP10.

Comment 3 Artom Lifshitz 2019-06-07 14:59:07 UTC
We can't fix this in OSP10 unfortunately, but we'd like to keep tracking this for OSP17. It's far away, but realistically it's going to be the first release where we might be able to address this.

Comment 5 Stephen Finucane 2019-07-25 15:31:02 UTC
*** Bug 1731847 has been marked as a duplicate of this bug. ***

Comment 12 smooney 2019-10-15 15:05:29 UTC

*** This bug has been marked as a duplicate of bug 1700412 ***