+++ This bug was initially created as a clone of Bug #1664701 +++

Description of problem:

As described in [1], the fix to [2] appears to have inadvertently broken
oversubscription of memory for instances with a NUMA topology but no
hugepages.

Version-Release number of selected component (if applicable):

N/A

How reproducible:

Always.

Steps to Reproduce:

1. Create a flavor that will consume more than 50% of the available memory on
   your host(s) and specify an explicit NUMA topology. For example, on my
   all-in-one deployment, where the host has 32 GB of RAM, we request a 20 GB
   instance:

   $ openstack flavor create --vcpu 2 --disk 0 --ram 20480 test.numa
   $ openstack flavor set test.numa --property hw:numa_nodes=2

2. Boot an instance using this flavor:

   $ openstack server create --flavor test.numa --image cirros-0.3.6-x86_64-disk --wait test

3. Boot another instance using this flavor:

   $ openstack server create --flavor test.numa --image cirros-0.3.6-x86_64-disk --wait test2

Actual results:

The second instance fails to boot. We see the following error messages in the
logs:

nova-scheduler[18295]: DEBUG nova.virt.hardware [None req-f7a6594b-8d25-424c-9c6e-8522f66ffd22 demo admin] No specific pagesize requested for instance, selected pagesize: 4 {{(pid=18318) _numa_fit_instance_cell /opt/stack/nova/nova/virt/hardware.py:1045}}
nova-scheduler[18295]: DEBUG nova.virt.hardware [None req-f7a6594b-8d25-424c-9c6e-8522f66ffd22 demo admin] Not enough available memory to schedule instance with pagesize 4. Required: 10240, available: 5676, total: 15916. {{(pid=18318) _numa_fit_instance_cell /opt/stack/nova/nova/virt/hardware.py:1055}}

If we revert the patch that addressed the bug [3], the correct behaviour is
restored and the second instance boots, though we obviously lose whatever
benefits that change gave us.

Expected results:

The second instance should boot.
Additional info:

[1] http://lists.openstack.org/pipermail/openstack-discuss/2019-January/001459.html
[2] https://bugs.launchpad.net/nova/+bug/1734204
[3] https://review.openstack.org/#/c/532168
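To make the failure mode concrete, here is a minimal Python sketch, not nova's actual code and with hypothetical function names, of the two memory checks. After [3], instances with a NUMA topology but no hugepages take the "pagesize" path, which compares the request against free memory on the cell instead of applying ram_allocation_ratio to total memory:

```python
def fits_pagesize_path(requested_mb, available_mb):
    # Behaviour after [3]: strict check against currently available
    # memory; ram_allocation_ratio is never consulted.
    return requested_mb <= available_mb

def fits_oversubscription_path(requested_mb, used_mb, total_mb, ram_allocation_ratio):
    # Pre-[3] behaviour for non-hugepage instances: the effective limit
    # is total * ram_allocation_ratio, so memory can be oversubscribed.
    return used_mb + requested_mb <= total_mb * ram_allocation_ratio

# Per-cell numbers (in MB) taken from the scheduler log above:
requested, available, total = 10240, 5676, 15916
used = total - available

print(fits_pagesize_path(requested, available))                  # False
print(fits_oversubscription_path(requested, used, total, 1.5))   # True
```

With the same inputs, the strict path rejects the second instance while a ratio of 1.5 (a value chosen here for illustration) would have accepted it, which is the regression being reported.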
Verification steps:

# 2 compute nodes with ~6 GB of memory each
[stack@undercloud-0 ~]$ for i in 6 8; do ssh heat-admin.24.$i 'echo $(hostname) $(grep MemTotal /proc/meminfo)'; done
compute-1 MemTotal: 5944884 kB
compute-0 MemTotal: 5944892 kB

# Create a large flavor with numa_nodes set
[stack@undercloud-0 ~]$ openstack flavor create --vcpu 2 --disk 0 --ram 4096 test.numa
[stack@undercloud-0 ~]$ openstack flavor set test.numa --property hw:numa_nodes=1

# Boot 2 instances with this flavor. This works because each instance
# lands on a separate compute node.
[stack@undercloud-0 ~]$ nova boot --poll --image cirros --flavor test.numa test1 --nic net-id=353d787b-7788-40b0-aaff-a0ab2325b64e
[stack@undercloud-0 ~]$ nova boot --poll --image cirros --flavor test.numa test2 --nic net-id=353d787b-7788-40b0-aaff-a0ab2325b64e

# Negative test: booting a third instance fails with a 'No valid host' error
[stack@undercloud-0 ~]$ nova boot --poll --image cirros --flavor test.numa test3 --nic net-id=353d787b-7788-40b0-aaff-a0ab2325b64e

# Modify `ram_allocation_ratio` in nova.conf on the compute node
[heat-admin@compute-1 ~]$ sudo grep ram_allocation_ratio /etc/nova/nova.conf
ram_allocation_ratio=2.0

# Boot a 4th instance; it boots successfully
[stack@undercloud-0 ~]$ nova boot --poll --image cirros --flavor test.numa test4 --nic net-id=353d787b-7788-40b0-aaff-a0ab2325b64e
[stack@undercloud-0 ~]$ nova list
+--------------------------------------+-------+--------+------------+-------------+------------------------+
| ID                                   | Name  | Status | Task State | Power State | Networks               |
+--------------------------------------+-------+--------+------------+-------------+------------------------+
| 4baccd63-0a8e-4288-97a0-b2b449d45a39 | test1 | ACTIVE | -          | Running     | private=192.168.100.9  |
| ff0a5dd2-a1b8-4937-a3e9-c8a45f5253dd | test2 | ACTIVE | -          | Running     | private=192.168.100.6  |
| 5bb3597c-a193-479a-9292-6d652b799a66 | test3 | ERROR  | -          | NOSTATE     |                        |
| 81ce205a-1a15-48f6-8055-3c1a39334602 | test4 | ACTIVE | -          | Running     | private=192.168.100.16 |
+--------------------------------------+-------+--------+------------+-------------+------------------------+

# Package version:
openstack-nova-common.noarch 1:14.1.0-44.el7ost @rhos-10.0-signed
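The arithmetic behind these verification steps can be sketched as follows. This is a hypothetical helper, not nova's code, using the MemTotal and flavor values from the output above; it shows why the third instance gets 'No valid host' at the default ratio while the fourth fits once ram_allocation_ratio is raised to 2.0:

```python
def fits(requested_mb, used_mb, total_mb, ram_allocation_ratio):
    # With the fix, non-hugepage NUMA instances honour
    # ram_allocation_ratio again: limit = total * ratio.
    return used_mb + requested_mb <= total_mb * ram_allocation_ratio

total_mb = 5944884 // 1024   # MemTotal of compute-1 (kB -> MB), ~5805 MB
flavor_mb = 4096             # the test.numa flavor's RAM

# Third instance, default ratio of 1.0, one 4096 MB instance already placed:
print(fits(flavor_mb, flavor_mb, total_mb, 1.0))   # False -> 'No valid host'

# Fourth instance, after setting ram_allocation_ratio=2.0 on the compute:
print(fits(flavor_mb, flavor_mb, total_mb, 2.0))   # True -> boots
```

Note that the default ratio of 1.0 is assumed here for the pre-change configuration; the point is only that raising the ratio changes the scheduling outcome, which is exactly what the broken pagesize path ignored.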
Since the problem described in this bug report should be resolved in a recent
advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow
the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0923