Description of problem:
Instance creation fails at NUMATopologyFilter even though at least one NUMA node appears to have enough resources.

Version-Release number of selected component (if applicable):
Red Hat OpenStack Platform release 16.2.2 (Train)

How reproducible:
Every time we try to spawn an instance using this flavor.

Steps to Reproduce:
1. Try to create a VM using the flavor.

Actual results:
Creation is blocked at NUMATopologyFilter.

Expected results:
The VM is created on a NUMA node with available resources.

Additional info:
[stack@director ]$ openstack flavor show ovn-dpdk
+----------------------------+------------------------------------------------------------------------------------------------------------------------+
| Field                      | Value                                                                                                                  |
+----------------------------+------------------------------------------------------------------------------------------------------------------------+
| OS-FLV-DISABLED:disabled   | False                                                                                                                  |
| OS-FLV-EXT-DATA:ephemeral  | 0                                                                                                                      |
| access_project_ids         | None                                                                                                                   |
| description                | None                                                                                                                   |
| disk                       | 20                                                                                                                     |
| extra_specs                | {'hw:cpu_policy': 'dedicated', 'hw:emulator_threads_policy': 'isolate', 'hw:mem_page_size': '1GB', 'ovn-dpdk': 'true'} |
| name                       | ovn-dpdk                                                                                                               |
| os-flavor-access:is_public | True                                                                                                                   |
| properties                 | hw:cpu_policy='dedicated', hw:emulator_threads_policy='isolate', hw:mem_page_size='1GB', ovn-dpdk='true'               |
| ram                        | 4096                                                                                                                   |
| rxtx_factor                | 1.0                                                                                                                    |
| swap                       | 0                                                                                                                      |
| vcpus                      | 4                                                                                                                      |
+----------------------------+------------------------------------------------------------------------------------------------------------------------+
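For reference, the per-NUMA-node requirement implied by this flavor can be worked out by hand. The sketch below is a simplified illustration, not Nova's actual fitting logic. It assumes the instance is confined to a single host NUMA cell (no hw:numa_nodes is set), that hw:emulator_threads_policy='isolate' consumes one extra dedicated host CPU, and that 4096 MB of RAM needs four free 1 GB pages. The FlavorRequest and HostCell classes and their field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FlavorRequest:
    """Hypothetical summary of the flavor fields relevant to NUMA fitting."""
    vcpus: int
    ram_mb: int
    page_size_mb: int
    isolate_emulator_threads: bool


@dataclass
class HostCell:
    """Hypothetical view of the free resources on one host NUMA cell."""
    cell_id: int
    free_dedicated_cpus: int
    free_pages: int  # free huge pages of the requested size


def required_pcpus(req: FlavorRequest) -> int:
    # Assumption: the 'isolate' policy takes one extra dedicated host CPU
    # for the emulator threads, on top of one host CPU per guest vCPU.
    return req.vcpus + (1 if req.isolate_emulator_threads else 0)


def required_pages(req: FlavorRequest) -> int:
    # Round up so that a partial page still consumes a whole page.
    return -(-req.ram_mb // req.page_size_mb)


def cell_fits(cell: HostCell, req: FlavorRequest) -> bool:
    # Both CPUs and huge pages must come from the same cell.
    return (cell.free_dedicated_cpus >= required_pcpus(req)
            and cell.free_pages >= required_pages(req))


# The ovn-dpdk flavor from this report.
ovn_dpdk = FlavorRequest(vcpus=4, ram_mb=4096, page_size_mb=1024,
                         isolate_emulator_threads=True)
```

Under these assumptions the flavor needs 5 free dedicated host CPUs and 4 free 1 GB pages in the same cell, so a cell with only 4 free dedicated CPUs is rejected even if it has plenty of huge pages.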
Updating the title to reflect that this bug is now being used to track a logging improvement. tl;dr: the original bug report was invalid because the customer did not actually have enough space to boot all the VMs they wanted on the host in question. However, while debugging this we noticed that _numa_cells_support_network_metadata does not have any logging, so when it eliminates a host cell because the NUMA-aware vSwitch feature is in use, there is no log message to indicate that. As a result, debugging scheduling issues related to NUMA-aware vSwitches is very difficult without intimate knowledge of the code. We can improve this trivially by adding logging at debug and/or info level when a cell is eliminated.
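To illustrate the kind of logging meant here, the following is a minimal standalone sketch, not the Nova function itself and not the eventual patch. The function name cells_support_network_metadata, the NetworkMetadata class, and the physnet_affinity and tunnel_affinity arguments are all hypothetical stand-ins; the only point is that each rejection path emits a debug message naming the network and the cells involved.

```python
import logging
from dataclasses import dataclass, field

LOG = logging.getLogger(__name__)


@dataclass
class NetworkMetadata:
    """Hypothetical stand-in for the networks an instance requests."""
    physnets: set = field(default_factory=set)
    tunneled: bool = False


def cells_support_network_metadata(physnet_affinity, tunnel_affinity,
                                   chosen_cells, required):
    """Return True if the chosen host cells satisfy the network affinity.

    physnet_affinity: dict mapping physnet name -> set of host cell IDs
    tunnel_affinity: set of host cell IDs with affinity to tunneled networks
    chosen_cells: iterable of host cell IDs picked for the instance
    required: NetworkMetadata for the requested networks
    """
    chosen = set(chosen_cells)

    for physnet in sorted(required.physnets):
        allowed = physnet_affinity.get(physnet)
        if allowed is None:
            # No affinity configured for this physnet: nothing to enforce.
            continue
        if not chosen & allowed:
            LOG.debug("Host cells %s rejected: physnet '%s' has affinity "
                      "to cells %s", sorted(chosen), physnet, sorted(allowed))
            return False

    if required.tunneled and tunnel_affinity and not chosen & tunnel_affinity:
        LOG.debug("Host cells %s rejected: tunneled networks have affinity "
                  "to cells %s", sorted(chosen), sorted(tunnel_affinity))
        return False

    return True
```

With messages like these in the scheduler log at debug level, an operator can see which physnet or tunnel affinity ruled out a cell without reading the code.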
I'm going to convert this to a bug to improve logging in that area of the code, target 16.x because we'll need it for customer cases.
Upstream bug at: https://bugs.launchpad.net/nova/+bug/1751784
I think aiming for 16.2.6 with this is realistic, given how small the patch is.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (RHOSP 17.1.4 bug fix and enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2024:9974