Bug 1664701 - [OSP13] Oversubscription broken for instances with NUMA topologies
Summary: [OSP13] Oversubscription broken for instances with NUMA topologies
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 13.0 (Queens)
Hardware: All
OS: All
Priority: high
Severity: high
Target Milestone: z6
Target Release: 13.0 (Queens)
Assignee: Stephen Finucane
QA Contact: OSP DFG:Compute
URL:
Whiteboard:
Depends On: 1625120 1664698
Blocks: 1664702
Reported: 2019-01-09 13:36 UTC by Stephen Finucane
Modified: 2023-03-21 19:10 UTC (History)

Fixed In Version: openstack-nova-17.0.9-5.el7ost
Doc Type: Known Issue
Doc Text:
A recent change made memory allocation for instances with NUMA topologies pagesize aware. With this change, memory for instances with NUMA topologies can no longer be oversubscribed. As a result, memory oversubscription is currently disabled for all instances with a NUMA topology, whereas previously only instances with hugepages were not allowed to use oversubscription. This affects instances with an explicit NUMA topology and those with an implicit topology. An instance can have an implicit NUMA topology due to the use of hugepages or CPU pinning. If possible, avoid the use of explicit NUMA topologies. If CPU pinning is required, resulting in an implicit NUMA topology, there is no workaround.
Clone Of: 1664698
Clones: 1664702
Environment:
Last Closed: 2019-04-30 17:13:59 UTC
Target Upstream Version:
Embargoed:


Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Launchpad               1810977         0        None      None    None     2019-01-09 13:36:21 UTC
Red Hat Product Errata  RHBA-2019:0924  0        None      None    None     2019-04-30 17:14:13 UTC

Description Stephen Finucane 2019-01-09 13:36:22 UTC
+++ This bug was initially created as a clone of Bug #1664698 +++

Description of problem:

As described in [1], the fix for [2] appears to have inadvertently broken memory oversubscription for instances that have a NUMA topology but do not use hugepages.

Version-Release number of selected component (if applicable):

N/A

How reproducible:

Always.

Steps to Reproduce:

1. Create a flavor that consumes more than 50% of the available memory on your host(s) and specify an explicit NUMA topology. For example, on my all-in-one deployment, where the host has 32GB RAM, we request a 20GB instance:

   $ openstack flavor create --vcpu 2 --disk 0 --ram 20480 test.numa
   $ openstack flavor set test.numa --property hw:numa_nodes=2

2. Boot an instance using this flavor:

   $ openstack server create --flavor test.numa --image cirros-0.3.6-x86_64-disk --wait test

3. Boot another instance using this flavor:

   $ openstack server create --flavor test.numa --image cirros-0.3.6-x86_64-disk --wait test2

Actual results:

The second instance fails to boot. The scheduler logs show the following messages:

  nova-scheduler[18295]: DEBUG nova.virt.hardware [None req-f7a6594b-8d25-424c-9c6e-8522f66ffd22 demo admin] No specific pagesize requested for instance, selected pagesize: 4 {{(pid=18318) _numa_fit_instance_cell /opt/stack/nova/nova/virt/hardware.py:1045}}
  nova-scheduler[18295]: DEBUG nova.virt.hardware [None req-f7a6594b-8d25-424c-9c6e-8522f66ffd22 demo admin] Not enough available memory to schedule instance with pagesize 4. Required: 10240, available: 5676, total: 15916. {{(pid=18318) _numa_fit_instance_cell /opt/stack/nova/nova/virt/hardware.py:1055}}

If we revert the patch that addressed that bug (see [3]), the correct behaviour returns and the instance boots. Reverting, though, obviously loses whatever benefits that change gave us.
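
The figures in the log above can be reproduced with some simple arithmetic. The following is only a sketch of the per-cell memory check, not nova's actual code: the function name cell_has_room is hypothetical, and the 1.5 ratio is an assumed ram_allocation_ratio used purely for illustration. The cell total and the request size come from the log and from the 20GB flavor split across hw:numa_nodes=2.

```python
def cell_has_room(total_mb, used_mb, requested_mb, ram_allocation_ratio=None):
    """Return True if a host NUMA cell can take requested_mb more memory.

    ram_allocation_ratio=None models the strict check seen in the log
    (no oversubscription). A float models a limit scaled by that ratio.
    This is a hypothetical helper, not a nova function.
    """
    if ram_allocation_ratio is None:
        limit_mb = total_mb
    else:
        limit_mb = total_mb * ram_allocation_ratio
    return used_mb + requested_mb <= limit_mb


# Values from the scheduler log, in MB, for a single host NUMA cell.
CELL_TOTAL_MB = 15916
# 20480 MB flavor spread evenly over two guest NUMA nodes.
PER_CELL_REQUEST_MB = 20480 // 2

# After the first instance boots, each cell already holds 10240 MB.
used_after_first = PER_CELL_REQUEST_MB
available = CELL_TOTAL_MB - used_after_first

print(available)
# Strict check: 10240 + 10240 > 15916, so the second instance is rejected.
print(cell_has_room(CELL_TOTAL_MB, used_after_first, PER_CELL_REQUEST_MB))
# With an assumed ratio of 1.5 the limit becomes 23874 MB and the request fits.
print(cell_has_room(CELL_TOTAL_MB, used_after_first, PER_CELL_REQUEST_MB, 1.5))
```

The first value printed matches the "available: 5676" figure in the log, which suggests the check compares the request against the raw cell total without applying any allocation ratio.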

Expected results:

The second instance should boot.

Additional info:

[1] http://lists.openstack.org/pipermail/openstack-discuss/2019-January/001459.html
[2] https://bugs.launchpad.net/nova/+bug/1734204
[3] https://review.openstack.org/#/c/532168

Comment 8 Joe H. Rahme 2019-04-29 16:02:24 UTC
* Puddle version

	[stack@undercloud-0 ~]$ cat /etc/yum.repos.d/latest-installed
	13  -p 2019-04-18.2

* Compute nodes

	(overcloud) [stack@undercloud-0 ~]$ openstack resource provider inventory show bb4c605c-2fa6-4cf6-bdf4-b05e6b157d33 MEMORY_MB
	+------------------+-------+
	| Field            | Value |
	+------------------+-------+
	| allocation_ratio | 3.0   |
	| max_unit         | 6143  |
	| reserved         | 4096  |
	| step_size        | 1     |
	| min_unit         | 1     |
	| total            | 6143  |
	+------------------+-------+
	(overcloud) [stack@undercloud-0 ~]$ openstack resource provider inventory show 5b51b165-82ce-42d9-80e6-9874a7f4b0ab MEMORY_MB
	+------------------+-------+
	| Field            | Value |
	+------------------+-------+
	| allocation_ratio | 1.0   |
	| max_unit         | 6143  |
	| reserved         | 4096  |
	| step_size        | 1     |
	| min_unit         | 1     |
	| total            | 6143  |
	+------------------+-------+
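
For reference, the schedulable MEMORY_MB capacity of each provider can be estimated from the inventories above. This is a rough sketch assuming the usual placement capacity formula, (total - reserved) * allocation_ratio; the helper name placement_capacity is hypothetical, and the 3000 MB figure is the RAM of the flavor created below.

```python
def placement_capacity(total, reserved, allocation_ratio):
    """Estimate schedulable capacity as (total - reserved) * allocation_ratio.

    Hypothetical helper for illustration; truncates to a whole number of MB.
    """
    return int((total - reserved) * allocation_ratio)


# MEMORY_MB inventories copied from the two resource providers above.
inventories = {
    "bb4c605c-2fa6-4cf6-bdf4-b05e6b157d33": {
        "total": 6143, "reserved": 4096, "allocation_ratio": 3.0,
    },
    "5b51b165-82ce-42d9-80e6-9874a7f4b0ab": {
        "total": 6143, "reserved": 4096, "allocation_ratio": 1.0,
    },
}

FLAVOR_RAM_MB = 3000  # RAM of half_node_flavor

for uuid, inv in inventories.items():
    capacity = placement_capacity(**inv)
    fits = FLAVOR_RAM_MB <= capacity
    print(uuid, capacity, fits)
```

Under this assumption only the provider with allocation_ratio 3.0 has enough capacity for the flavor, and any instance of this flavor already exceeds the 2047 MB of unreserved physical memory, so a successful boot exercises oversubscription.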



* Instance flavor

	(overcloud) [stack@undercloud-0 ~]$ openstack flavor create --vcpu 2 --disk 0 --ram 3000 half_node_flavor
	+----------------------------+--------------------------------------+
	| Field                      | Value                                |
	+----------------------------+--------------------------------------+
	| OS-FLV-DISABLED:disabled   | False                                |
	| OS-FLV-EXT-DATA:ephemeral  | 0                                    |
	| disk                       | 0                                    |
	| id                         | 64918380-b183-43f5-9896-8b59020d1a3a |
	| name                       | half_node_flavor                     |
	| os-flavor-access:is_public | True                                 |
	| properties                 |                                      |
	| ram                        | 3000                                 |
	| rxtx_factor                | 1.0                                  |
	| swap                       |                                      |
	| vcpus                      | 2                                    |
	+----------------------------+--------------------------------------+
	(overcloud) [stack@undercloud-0 ~]$ openstack flavor set half_node_flavor --property hw:numa_nodes=1


* Boot 3 instances

	(overcloud) [stack@undercloud-0 os-smoke]$ for i in {1..3}; do openstack server create --flavor half_node_flavor --image cirros-0.3.5-x86_64-disk.img --nic net-id=fd62fe37-9669-4720-b8da-5574c09d0fc2 $i --wait;
	done

	+-------------------------------------+---------------------------------------------------------------------+
	| Field                               | Value                                                               |
	+-------------------------------------+---------------------------------------------------------------------+
	| OS-DCF:diskConfig                   | MANUAL                                                              |
	| OS-EXT-AZ:availability_zone         | nova                                                                |
	| OS-EXT-SRV-ATTR:host                | compute-0.localdomain                                               |
	| OS-EXT-SRV-ATTR:hypervisor_hostname | compute-0.localdomain                                               |
	| OS-EXT-SRV-ATTR:instance_name       | instance-0000000b                                                   |
	| OS-EXT-STS:power_state              | Running                                                             |
	| OS-EXT-STS:task_state               | None                                                                |
	| OS-EXT-STS:vm_state                 | active                                                              |
	| OS-SRV-USG:launched_at              | 2019-04-29T15:59:21.000000                                          |
	| OS-SRV-USG:terminated_at            | None                                                                |
	| accessIPv4                          |                                                                     |
	| accessIPv6                          |                                                                     |
	| addresses                           | devstack=192.168.100.11                                             |
	| adminPass                           | oMPhrkvN9iCm                                                        |
	| config_drive                        |                                                                     |
	| created                             | 2019-04-29T15:59:10Z                                                |
	| flavor                              | half_node_flavor (64918380-b183-43f5-9896-8b59020d1a3a)             |
	| hostId                              | 6d6b6aa9727938b4d53fe771464bd22e688b32326d21866e0130240a            |
	| id                                  | 94d8c935-131b-4a39-b899-a076c1ce273a                                |
	| image                               | cirros-0.3.5-x86_64-disk.img (020c49be-3cb5-40ec-acad-8d6f67663785) |
	| key_name                            | None                                                                |
	| name                                | 1                                                                   |
	| progress                            | 0                                                                   |
	| project_id                          | 19a12ec527c649b2928fadb009f84196                                    |
	| properties                          |                                                                     |
	| security_groups                     | name='default'                                                      |
	| status                              | ACTIVE                                                              |
	| updated                             | 2019-04-29T15:59:21Z                                                |
	| user_id                             | 47becba725b741b8800ca5a15591924b                                    |
	| volumes_attached                    |                                                                     |
	+-------------------------------------+---------------------------------------------------------------------+

	+-------------------------------------+---------------------------------------------------------------------+
	| Field                               | Value                                                               |
	+-------------------------------------+---------------------------------------------------------------------+
	| OS-DCF:diskConfig                   | MANUAL                                                              |
	| OS-EXT-AZ:availability_zone         | nova                                                                |
	| OS-EXT-SRV-ATTR:host                | compute-0.localdomain                                               |
	| OS-EXT-SRV-ATTR:hypervisor_hostname | compute-0.localdomain                                               |
	| OS-EXT-SRV-ATTR:instance_name       | instance-0000000e                                                   |
	| OS-EXT-STS:power_state              | Running                                                             |
	| OS-EXT-STS:task_state               | None                                                                |
	| OS-EXT-STS:vm_state                 | active                                                              |
	| OS-SRV-USG:launched_at              | 2019-04-29T15:59:43.000000                                          |
	| OS-SRV-USG:terminated_at            | None                                                                |
	| accessIPv4                          |                                                                     |
	| accessIPv6                          |                                                                     |
	| addresses                           | devstack=192.168.100.6                                              |
	| adminPass                           | yJrMvrSKndL9                                                        |
	| config_drive                        |                                                                     |
	| created                             | 2019-04-29T15:59:33Z                                                |
	| flavor                              | half_node_flavor (64918380-b183-43f5-9896-8b59020d1a3a)             |
	| hostId                              | 6d6b6aa9727938b4d53fe771464bd22e688b32326d21866e0130240a            |
	| id                                  | 86217109-9003-45d5-ac4a-59952378bbe7                                |
	| image                               | cirros-0.3.5-x86_64-disk.img (020c49be-3cb5-40ec-acad-8d6f67663785) |
	| key_name                            | None                                                                |
	| name                                | 2                                                                   |
	| progress                            | 0                                                                   |
	| project_id                          | 19a12ec527c649b2928fadb009f84196                                    |
	| properties                          |                                                                     |
	| security_groups                     | name='default'                                                      |
	| status                              | ACTIVE                                                              |
	| updated                             | 2019-04-29T15:59:43Z                                                |
	| user_id                             | 47becba725b741b8800ca5a15591924b                                    |
	| volumes_attached                    |                                                                     |
	+-------------------------------------+---------------------------------------------------------------------+

	+-------------------------------------+---------------------------------------------------------------------+
	| Field                               | Value                                                               |
	+-------------------------------------+---------------------------------------------------------------------+
	| OS-DCF:diskConfig                   | MANUAL                                                              |
	| OS-EXT-AZ:availability_zone         | nova                                                                |
	| OS-EXT-SRV-ATTR:host                | compute-0.localdomain                                               |
	| OS-EXT-SRV-ATTR:hypervisor_hostname | compute-0.localdomain                                               |
	| OS-EXT-SRV-ATTR:instance_name       | instance-0000000e                                                   |
	| OS-EXT-STS:power_state              | Running                                                             |
	| OS-EXT-STS:task_state               | None                                                                |
	| OS-EXT-STS:vm_state                 | active                                                              |
	| OS-SRV-USG:launched_at              | 2019-04-29T15:59:43.000000                                          |
	| OS-SRV-USG:terminated_at            | None                                                                |
	| accessIPv4                          |                                                                     |
	| accessIPv6                          |                                                                     |
	| addresses                           | devstack=192.168.100.6                                              |
	| adminPass                           | yJrMvrSKndL9                                                        |
	| config_drive                        |                                                                     |
	| created                             | 2019-04-29T15:59:33Z                                                |
	| flavor                              | half_node_flavor (64918380-b183-43f5-9896-8b59020d1a3a)             |
	| hostId                              | 6d6b6aa9727938b4d53fe771464bd22e688b32326d21866e0130240a            |
	| id                                  | 86217109-9003-45d5-ac4a-59952378bbe7                                |
	| image                               | cirros-0.3.5-x86_64-disk.img (020c49be-3cb5-40ec-acad-8d6f67663785) |
	| key_name                            | None                                                                |
	| name                                | 2                                                                   |
	| progress                            | 0                                                                   |
	| project_id                          | 19a12ec527c649b2928fadb009f84196                                    |
	| properties                          |                                                                     |
	| security_groups                     | name='default'                                                      |
	| status                              | ACTIVE                                                              |
	| updated                             | 2019-04-29T15:59:43Z                                                |
	| user_id                             | 47becba725b741b8800ca5a15591924b                                    |
	| volumes_attached                    |                                                                     |
	+-------------------------------------+---------------------------------------------------------------------+

Comment 10 errata-xmlrpc 2019-04-30 17:13:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0924

