Bug 1664701

Summary: [OSP13] Oversubscription broken for instances with NUMA topologies
Product: Red Hat OpenStack
Component: openstack-nova
Version: 13.0 (Queens)
Status: CLOSED ERRATA
Severity: high
Priority: high
Target Milestone: z6
Target Release: 13.0 (Queens)
Hardware: All
OS: All
Reporter: Stephen Finucane <stephenfin>
Assignee: Stephen Finucane <stephenfin>
QA Contact: OSP DFG:Compute <osp-dfg-compute>
CC: amcleod, dasmith, eglynn, jhakimra, kchamart, lyarwood, mbooth, sbauza, sgordon, vromanso
Keywords: Triaged, ZStream
Fixed In Version: openstack-nova-17.0.9-5.el7ost
Doc Type: Known Issue
Doc Text:
A recent change made memory allocation for instances with NUMA topologies pagesize aware. With this change, memory for instances with NUMA topologies can no longer be oversubscribed. As a result, memory oversubscription is currently disabled for all instances with a NUMA topology, whereas previously only instances with hugepages were not allowed to use oversubscription. This affects instances with an explicit NUMA topology and those with an implicit topology. An instance can have an implicit NUMA topology due to the use of hugepages or CPU pinning. If possible, avoid the use of explicit NUMA topologies. If CPU pinning is required, resulting in an implicit NUMA topology, there is no workaround.
Clone Of: 1664698
Last Closed: 2019-04-30 17:13:59 UTC
Bug Depends On: 1625120, 1664698
Bug Blocks: 1664702

Description Stephen Finucane 2019-01-09 13:36:22 UTC
+++ This bug was initially created as a clone of Bug #1664698 +++

Description of problem:

As described in [1], the fix for [2] appears to have inadvertently broken memory oversubscription for instances that have a NUMA topology but do not use hugepages.

Version-Release number of selected component (if applicable):

N/A

How reproducible:

Always.

Steps to Reproduce:

1. Create a flavor that consumes more than 50% of the available memory on your host(s) and specifies an explicit NUMA topology. For example, on my all-in-one deployment, where the host has 32GB of RAM, I request a 20GB instance:

   $ openstack flavor create --vcpu 2 --disk 0 --ram 20480 test.numa
   $ openstack flavor set test.numa --property hw:numa_nodes=2

2. Boot an instance using this flavor:

   $ openstack server create --flavor test.numa --image cirros-0.3.6-x86_64-disk --wait test

3. Boot another instance using this flavor:

   $ openstack server create --flavor test.numa --image cirros-0.3.6-x86_64-disk --wait test2

Actual results:

The second instance fails to boot. We see the following error message in the logs.

  nova-scheduler[18295]: DEBUG nova.virt.hardware [None req-f7a6594b-8d25-424c-9c6e-8522f66ffd22 demo admin] No specific pagesize requested for instance, selected pagesize: 4 {{(pid=18318) _numa_fit_instance_cell /opt/stack/nova/nova/virt/hardware.py:1045}}
  nova-scheduler[18295]: DEBUG nova.virt.hardware [None req-f7a6594b-8d25-424c-9c6e-8522f66ffd22 demo admin] Not enough available memory to schedule instance with pagesize 4. Required: 10240, available: 5676, total: 15916. {{(pid=18318) _numa_fit_instance_cell /opt/stack/nova/nova/virt/hardware.py:1055}}

Reverting the patch that addressed that bug [3] restores the correct behaviour and the instance boots. However, doing so also discards whatever benefits that change provided.
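
The numbers in the scheduler log above can be checked by hand. The following is a minimal sketch of the arithmetic only, not Nova's actual code: the function names are hypothetical, and the allocation ratio of 1.5 is an assumed value for illustration, since the report does not state the ratio configured on this host.

```python
# Figures taken from the scheduler log above (MB, for one host NUMA cell).
CELL_TOTAL_MB = 15916
CELL_AVAILABLE_MB = 5676   # free memory left after the first instance
REQUIRED_MB = 10240        # 20480 MB flavor split across 2 guest NUMA nodes

# Assumption: illustrative RAM allocation ratio, not stated in the report.
RAM_ALLOCATION_RATIO = 1.5


def fits_strict(required_mb, available_mb):
    """Hypothetical strict check: compare the request to free memory only."""
    return required_mb <= available_mb


def fits_oversubscribed(required_mb, used_mb, total_mb, ratio):
    """Hypothetical oversubscribed check: allow usage up to total * ratio."""
    return used_mb + required_mb <= total_mb * ratio


# Memory already consumed in the cell by the first instance.
used_mb = CELL_TOTAL_MB - CELL_AVAILABLE_MB

print("used:", used_mb)
print("strict fit:", fits_strict(REQUIRED_MB, CELL_AVAILABLE_MB))
print(
    "oversubscribed fit:",
    fits_oversubscribed(
        REQUIRED_MB, used_mb, CELL_TOTAL_MB, RAM_ALLOCATION_RATIO
    ),
)
```

Under these assumptions, the first instance accounts for exactly the 10240 MB missing from the cell, the strict comparison rejects the second instance, and a ratio-based comparison would accept it.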

Expected results:

The second instance should boot.

Additional info:

[1] http://lists.openstack.org/pipermail/openstack-discuss/2019-January/001459.html
[2] https://bugs.launchpad.net/nova/+bug/1734204
[3] https://review.openstack.org/#/c/532168

Comment 8 Joe H. Rahme 2019-04-29 16:02:24 UTC
* Puddle version

	[stack@undercloud-0 ~]$ cat /etc/yum.repos.d/latest-installed
	13  -p 2019-04-18.2

* Compute nodes

	(overcloud) [stack@undercloud-0 ~]$ openstack resource provider inventory show bb4c605c-2fa6-4cf6-bdf4-b05e6b157d33 MEMORY_MB
	+------------------+-------+
	| Field            | Value |
	+------------------+-------+
	| allocation_ratio | 3.0   |
	| max_unit         | 6143  |
	| reserved         | 4096  |
	| step_size        | 1     |
	| min_unit         | 1     |
	| total            | 6143  |
	+------------------+-------+
	(overcloud) [stack@undercloud-0 ~]$ openstack resource provider inventory show 5b51b165-82ce-42d9-80e6-9874a7f4b0ab MEMORY_MB
	+------------------+-------+
	| Field            | Value |
	+------------------+-------+
	| allocation_ratio | 1.0   |
	| max_unit         | 6143  |
	| reserved         | 4096  |
	| step_size        | 1     |
	| min_unit         | 1     |
	| total            | 6143  |
	+------------------+-------+



* Instance flavor

	(overcloud) [stack@undercloud-0 ~]$ openstack flavor create --vcpu 2 --disk 0 --ram 3000 half_node_flavor
	+----------------------------+--------------------------------------+
	| Field                      | Value                                |
	+----------------------------+--------------------------------------+
	| OS-FLV-DISABLED:disabled   | False                                |
	| OS-FLV-EXT-DATA:ephemeral  | 0                                    |
	| disk                       | 0                                    |
	| id                         | 64918380-b183-43f5-9896-8b59020d1a3a |
	| name                       | half_node_flavor                     |
	| os-flavor-access:is_public | True                                 |
	| properties                 |                                      |
	| ram                        | 3000                                 |
	| rxtx_factor                | 1.0                                  |
	| swap                       |                                      |
	| vcpus                      | 2                                    |
	+----------------------------+--------------------------------------+
	(overcloud) [stack@undercloud-0 ~]$ openstack flavor set half_node_flavor --property hw:numa_nodes=1
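
For context, the MEMORY_MB inventories and the 3000 MB flavor above can be combined using the placement capacity formula, (total - reserved) * allocation_ratio. This is a rough sketch with hypothetical helper names. It covers only the placement capacity check and ignores every other scheduler filter, so it should not be read as a prediction of where instances land.

```python
# MEMORY_MB inventories copied from the output above, keyed by provider UUID.
INVENTORIES = {
    "bb4c605c-2fa6-4cf6-bdf4-b05e6b157d33": {
        "total": 6143, "reserved": 4096, "allocation_ratio": 3.0,
    },
    "5b51b165-82ce-42d9-80e6-9874a7f4b0ab": {
        "total": 6143, "reserved": 4096, "allocation_ratio": 1.0,
    },
}

FLAVOR_RAM_MB = 3000  # half_node_flavor


def capacity_mb(inventory):
    """Placement capacity: (total - reserved) * allocation_ratio."""
    usable = inventory["total"] - inventory["reserved"]
    return int(usable * inventory["allocation_ratio"])


def max_instances(inventory, flavor_ram_mb):
    """How many instances of this flavor fit within the capacity."""
    return capacity_mb(inventory) // flavor_ram_mb


for uuid, inventory in INVENTORIES.items():
    print(uuid, capacity_mb(inventory), max_instances(inventory, FLAVOR_RAM_MB))
```

With a ratio of 3.0 the first provider has a MEMORY_MB capacity of 6141, above its 6143 MB physical total minus the reserved 4096 MB, so any NUMA instance placed beyond the first relies on oversubscription.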


* Boot 3 instances

	(overcloud) [stack@undercloud-0 os-smoke]$ for i in {1..3}; do openstack server create --flavor half_node_flavor --image cirros-0.3.5-x86_64-disk.img --nic net-id=fd62fe37-9669-4720-b8da-5574c09d0fc2 $i --wait;
	done

	+-------------------------------------+---------------------------------------------------------------------+
	| Field                               | Value                                                               |
	+-------------------------------------+---------------------------------------------------------------------+
	| OS-DCF:diskConfig                   | MANUAL                                                              |
	| OS-EXT-AZ:availability_zone         | nova                                                                |
	| OS-EXT-SRV-ATTR:host                | compute-0.localdomain                                               |
	| OS-EXT-SRV-ATTR:hypervisor_hostname | compute-0.localdomain                                               |
	| OS-EXT-SRV-ATTR:instance_name       | instance-0000000b                                                   |
	| OS-EXT-STS:power_state              | Running                                                             |
	| OS-EXT-STS:task_state               | None                                                                |
	| OS-EXT-STS:vm_state                 | active                                                              |
	| OS-SRV-USG:launched_at              | 2019-04-29T15:59:21.000000                                          |
	| OS-SRV-USG:terminated_at            | None                                                                |
	| accessIPv4                          |                                                                     |
	| accessIPv6                          |                                                                     |
	| addresses                           | devstack=192.168.100.11                                             |
	| adminPass                           | oMPhrkvN9iCm                                                        |
	| config_drive                        |                                                                     |
	| created                             | 2019-04-29T15:59:10Z                                                |
	| flavor                              | half_node_flavor (64918380-b183-43f5-9896-8b59020d1a3a)             |
	| hostId                              | 6d6b6aa9727938b4d53fe771464bd22e688b32326d21866e0130240a            |
	| id                                  | 94d8c935-131b-4a39-b899-a076c1ce273a                                |
	| image                               | cirros-0.3.5-x86_64-disk.img (020c49be-3cb5-40ec-acad-8d6f67663785) |
	| key_name                            | None                                                                |
	| name                                | 1                                                                   |
	| progress                            | 0                                                                   |
	| project_id                          | 19a12ec527c649b2928fadb009f84196                                    |
	| properties                          |                                                                     |
	| security_groups                     | name='default'                                                      |
	| status                              | ACTIVE                                                              |
	| updated                             | 2019-04-29T15:59:21Z                                                |
	| user_id                             | 47becba725b741b8800ca5a15591924b                                    |
	| volumes_attached                    |                                                                     |
	+-------------------------------------+---------------------------------------------------------------------+

	+-------------------------------------+---------------------------------------------------------------------+
	| Field                               | Value                                                               |
	+-------------------------------------+---------------------------------------------------------------------+
	| OS-DCF:diskConfig                   | MANUAL                                                              |
	| OS-EXT-AZ:availability_zone         | nova                                                                |
	| OS-EXT-SRV-ATTR:host                | compute-0.localdomain                                               |
	| OS-EXT-SRV-ATTR:hypervisor_hostname | compute-0.localdomain                                               |
	| OS-EXT-SRV-ATTR:instance_name       | instance-0000000e                                                   |
	| OS-EXT-STS:power_state              | Running                                                             |
	| OS-EXT-STS:task_state               | None                                                                |
	| OS-EXT-STS:vm_state                 | active                                                              |
	| OS-SRV-USG:launched_at              | 2019-04-29T15:59:43.000000                                          |
	| OS-SRV-USG:terminated_at            | None                                                                |
	| accessIPv4                          |                                                                     |
	| accessIPv6                          |                                                                     |
	| addresses                           | devstack=192.168.100.6                                              |
	| adminPass                           | yJrMvrSKndL9                                                        |
	| config_drive                        |                                                                     |
	| created                             | 2019-04-29T15:59:33Z                                                |
	| flavor                              | half_node_flavor (64918380-b183-43f5-9896-8b59020d1a3a)             |
	| hostId                              | 6d6b6aa9727938b4d53fe771464bd22e688b32326d21866e0130240a            |
	| id                                  | 86217109-9003-45d5-ac4a-59952378bbe7                                |
	| image                               | cirros-0.3.5-x86_64-disk.img (020c49be-3cb5-40ec-acad-8d6f67663785) |
	| key_name                            | None                                                                |
	| name                                | 2                                                                   |
	| progress                            | 0                                                                   |
	| project_id                          | 19a12ec527c649b2928fadb009f84196                                    |
	| properties                          |                                                                     |
	| security_groups                     | name='default'                                                      |
	| status                              | ACTIVE                                                              |
	| updated                             | 2019-04-29T15:59:43Z                                                |
	| user_id                             | 47becba725b741b8800ca5a15591924b                                    |
	| volumes_attached                    |                                                                     |
	+-------------------------------------+---------------------------------------------------------------------+

Comment 10 errata-xmlrpc 2019-04-30 17:13:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0924