Bug 1664698
Summary: | [OSP14] Oversubscription broken for instances with NUMA topologies | |||
---|---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Stephen Finucane <stephenfin> | |
Component: | openstack-nova | Assignee: | Stephen Finucane <stephenfin> | |
Status: | CLOSED ERRATA | QA Contact: | OSP DFG:Compute <osp-dfg-compute> | |
Severity: | high | Docs Contact: | ||
Priority: | high | |||
Version: | 14.0 (Rocky) | CC: | dasmith, dvd, eglynn, jhakimra, kchamart, lyarwood, marjones, mbooth, sbauza, sclewis, sgordon, tvignaud, vromanso | |
Target Milestone: | z2 | Keywords: | Triaged, ZStream | |
Target Release: | 14.0 (Rocky) | |||
Hardware: | All | |||
OS: | All | |||
Whiteboard: | ||||
Fixed In Version: | openstack-nova-18.1.1-0.20190313210932.300c9b2.el7 | Doc Type: | Known Issue | |
Doc Text: |
A recent change made memory allocation for instances with NUMA topologies pagesize aware. With this change, memory for instances with NUMA topologies can no longer be oversubscribed.
Memory oversubscription is currently disabled for all instances with a NUMA topology, whereas previously only instances with hugepages were not allowed to use oversubscription. This affects instances with an explicit NUMA topology and those with an implicit topology. An instance can have an implicit NUMA topology due to the use of hugepages or CPU pinning.
If possible, avoid the use of explicit NUMA topologies. If CPU pinning is required, resulting in an implicit NUMA topology, there is no workaround.
|
Story Points: | --- | |
Clone Of: | ||||
: | 1664701 | Environment: | |
Last Closed: | 2019-04-30 17:47:20 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | 1625122 | |||
Bug Blocks: | 1664701, 1664702 |
Description
Stephen Finucane
2019-01-09 13:33:46 UTC
Test case A: This is a positive test case.

1. Configure the host to use a RAM overcommit ratio of 16.0, adding the following to nova.conf:

    ram_allocation_ratio = 16.0

2. Create a flavor that will consume more than 50% of the available memory on your host(s) and specify an explicit NUMA topology. For example, on my all-in-one deployment where the host has 32GB RAM, we will request a 20GB instance:

    $ openstack flavor create --vcpus 2 --disk 0 --ram 20480 test.numa
    $ openstack flavor set test.numa --property hw:numa_nodes=2

3. Boot an instance using this flavor:

    $ openstack server create --flavor test.numa --image cirros-0.3.6-x86_64-disk --wait test

4. Boot another instance using this flavor:

    $ openstack server create --flavor test.numa --image cirros-0.3.6-x86_64-disk --wait test2

Both instances should boot.

Test case B: This is a negative test case.

1. Configure the host to use a RAM overcommit ratio of 1.0, adding the following to nova.conf:

    ram_allocation_ratio = 1.0

2. Create a flavor that will consume more than 50% of the available memory on your host(s) and specify an explicit NUMA topology. For example, on my all-in-one deployment where the host has 32GB RAM, we will request a 20GB instance:

    $ openstack flavor create --vcpus 2 --disk 0 --ram 20480 test.numa
    $ openstack flavor set test.numa --property hw:numa_nodes=2

3. Boot an instance using this flavor:

    $ openstack server create --flavor test.numa --image cirros-0.3.6-x86_64-disk --wait test

4. Boot another instance using this flavor:

    $ openstack server create --flavor test.numa --image cirros-0.3.6-x86_64-disk --wait test2

The first instance should boot. The second instance should fail to boot.
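The expected outcomes of the two test cases follow from simple arithmetic. The sketch below is a deliberate simplification: it assumes an instance is admitted whenever the total requested RAM fits within host RAM multiplied by ram_allocation_ratio, and it ignores reserved host memory and per-NUMA-cell accounting. The function name `instances_that_fit` is hypothetical and is not part of nova.

```python
def instances_that_fit(host_ram_mb: int, flavor_ram_mb: int,
                       ram_allocation_ratio: float) -> int:
    """Count how many instances of a flavor fit under a simplified RAM check.

    Assumption: schedulable RAM is host RAM times the allocation ratio.
    Reserved memory and NUMA cell boundaries are not modelled here.
    """
    schedulable_mb = host_ram_mb * ram_allocation_ratio
    return int(schedulable_mb // flavor_ram_mb)


HOST_RAM_MB = 32 * 1024   # the 32GB all-in-one host from the description
FLAVOR_RAM_MB = 20480     # the 20GB test.numa flavor

# Test case A (ratio 16.0): more than two instances fit, so both should boot.
print(instances_that_fit(HOST_RAM_MB, FLAVOR_RAM_MB, 16.0))

# Test case B (ratio 1.0): only one instance fits, so the second should fail.
print(instances_that_fit(HOST_RAM_MB, FLAVOR_RAM_MB, 1.0))
```

Under these assumptions the first call yields a count well above two and the second yields exactly one, which matches the pass and fail expectations stated for test cases A and B.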
As of OSP13, the ram alloc ratio is defined in the placement API

* nova version

(overcloud) [stack@undercloud-0 ~]$ cat /etc/yum.repos.d/latest-installed
14 -p 2019-04-05.1

* Compute nodes

(overcloud) [stack@undercloud-0 ~]$ openstack resource provider inventory show 8cc922c3-3f8d-4a2f-ac64-c5090423dcf1 MEMORY_MB
+------------------+-------+
| Field            | Value |
+------------------+-------+
| allocation_ratio | 3.0   |
| max_unit         | 6143  |
| reserved         | 4096  |
| step_size        | 1     |
| min_unit         | 1     |
| total            | 6143  |
+------------------+-------+

(overcloud) [stack@undercloud-0 ~]$ openstack resource provider inventory show f16eeb0d-5833-4471-b615-fb61ed3fa49c MEMORY_MB
+------------------+-------+
| Field            | Value |
+------------------+-------+
| allocation_ratio | 1.5   |
| max_unit         | 6143  |
| reserved         | 4096  |
| step_size        | 1     |
| min_unit         | 1     |
| total            | 6143  |
+------------------+-------+

* Instance flavor

(overcloud) [stack@undercloud-0 ~]$ openstack flavor show half_node_flavor
+----------------------------+--------------------------------------+
| Field                      | Value                                |
+----------------------------+--------------------------------------+
| OS-FLV-DISABLED:disabled   | False                                |
| OS-FLV-EXT-DATA:ephemeral  | 0                                    |
| access_project_ids         | None                                 |
| disk                       | 0                                    |
| id                         | 8394a344-24eb-4d72-bd83-3c34ef57e2d5 |
| name                       | half_node_flavor                     |
| os-flavor-access:is_public | True                                 |
| properties                 | hw:numa_nodes='1'                    |
| ram                        | 3000                                 |
| rxtx_factor                | 1.0                                  |
| swap                       |                                      |
| vcpus                      | 2                                    |
+----------------------------+--------------------------------------+

* Boot 3 instances (validates that oversubscription works):

(overcloud) [stack@undercloud-0 ~]$ for i in {1..3}; do openstack server create --flavor half_node_flavor --image cirros-0.3.5-x86_64-disk.img --nic net-id=cada4c5e-3987-4ac8-9e82-9bff0a9fb291 $i --wait; done
+-------------------------------------+---------------------------------------------------------------------+
| Field                               | Value                                                               |
+-------------------------------------+---------------------------------------------------------------------+
| OS-DCF:diskConfig                   | MANUAL                                                              |
| OS-EXT-AZ:availability_zone         | nova                                                                |
| OS-EXT-SRV-ATTR:host                | compute-1.localdomain                                               |
| OS-EXT-SRV-ATTR:hypervisor_hostname | compute-1.localdomain                                               |
| OS-EXT-SRV-ATTR:instance_name       | instance-00000026                                                   |
| OS-EXT-STS:power_state              | Running                                                             |
| OS-EXT-STS:task_state               | None                                                                |
| OS-EXT-STS:vm_state                 | active                                                              |
| OS-SRV-USG:launched_at              | 2019-04-29T15:18:12.000000                                          |
| OS-SRV-USG:terminated_at            | None                                                                |
| accessIPv4                          |                                                                     |
| accessIPv6                          |                                                                     |
| addresses                           | private=192.168.200.15                                              |
| adminPass                           | RRez8juH22Zh                                                        |
| config_drive                        |                                                                     |
| created                             | 2019-04-29T15:17:57Z                                                |
| flavor                              | half_node_flavor (8394a344-24eb-4d72-bd83-3c34ef57e2d5)             |
| hostId                              | 1ff3d27b587c01eff3a0d6a53e67207207ded52654e8597517c5ef26            |
| id                                  | 0973bd20-3acd-4331-a265-68d0016271e9                                |
| image                               | cirros-0.3.5-x86_64-disk.img (f272f221-fa36-4e18-800d-1bbcb9eb1bb2) |
| key_name                            | None                                                                |
| name                                | 1                                                                   |
| progress                            | 0                                                                   |
| project_id                          | be47c8cfb80446fd85e58f9ef060db0f                                    |
| properties                          |                                                                     |
| security_groups                     | name='default'                                                      |
| status                              | ACTIVE                                                              |
| updated                             | 2019-04-29T15:18:12Z                                                |
| user_id                             | 17be1bcfab9c438f99201442cd73f6e2                                    |
| volumes_attached                    |                                                                     |
+-------------------------------------+---------------------------------------------------------------------+
+-------------------------------------+---------------------------------------------------------------------+
| Field                               | Value                                                               |
+-------------------------------------+---------------------------------------------------------------------+
| OS-DCF:diskConfig                   | MANUAL                                                              |
| OS-EXT-AZ:availability_zone         | nova                                                                |
| OS-EXT-SRV-ATTR:host                | compute-0.localdomain                                               |
| OS-EXT-SRV-ATTR:hypervisor_hostname | compute-0.localdomain                                               |
| OS-EXT-SRV-ATTR:instance_name       | instance-00000029                                                   |
| OS-EXT-STS:power_state              | Running                                                             |
| OS-EXT-STS:task_state               | None                                                                |
| OS-EXT-STS:vm_state                 | active                                                              |
| OS-SRV-USG:launched_at              | 2019-04-29T15:18:45.000000                                          |
| OS-SRV-USG:terminated_at            | None                                                                |
| accessIPv4                          |                                                                     |
| accessIPv6                          |                                                                     |
| addresses                           | private=192.168.200.26                                              |
| adminPass                           | QgEzm29JBN3c                                                        |
| config_drive                        |                                                                     |
| created                             | 2019-04-29T15:18:30Z                                                |
| flavor                              | half_node_flavor (8394a344-24eb-4d72-bd83-3c34ef57e2d5)             |
| hostId                              | 07663e9fd4675aadfe5d7732494e7db23f8d4700383dd6697e8b05dd            |
| id                                  | de6581af-5dc5-408d-ae8a-b681741f4b5c                                |
| image                               | cirros-0.3.5-x86_64-disk.img (f272f221-fa36-4e18-800d-1bbcb9eb1bb2) |
| key_name                            | None                                                                |
| name                                | 2                                                                   |
| progress                            | 0                                                                   |
| project_id                          | be47c8cfb80446fd85e58f9ef060db0f                                    |
| properties                          |                                                                     |
| security_groups                     | name='default'                                                      |
| status                              | ACTIVE                                                              |
| updated                             | 2019-04-29T15:18:45Z                                                |
| user_id                             | 17be1bcfab9c438f99201442cd73f6e2                                    |
| volumes_attached                    |                                                                     |
+-------------------------------------+---------------------------------------------------------------------+
+-------------------------------------+---------------------------------------------------------------------+
| Field                               | Value                                                               |
+-------------------------------------+---------------------------------------------------------------------+
| OS-DCF:diskConfig                   | MANUAL                                                              |
| OS-EXT-AZ:availability_zone         | nova                                                                |
| OS-EXT-SRV-ATTR:host                | compute-1.localdomain                                               |
| OS-EXT-SRV-ATTR:hypervisor_hostname | compute-1.localdomain                                               |
| OS-EXT-SRV-ATTR:instance_name       | instance-0000002c                                                   |
| OS-EXT-STS:power_state              | Running                                                             |
| OS-EXT-STS:task_state               | None                                                                |
| OS-EXT-STS:vm_state                 | active                                                              |
| OS-SRV-USG:launched_at              | 2019-04-29T15:19:17.000000                                          |
| OS-SRV-USG:terminated_at            | None                                                                |
| accessIPv4                          |                                                                     |
| accessIPv6                          |                                                                     |
| addresses                           | private=192.168.200.8                                               |
| adminPass                           | ZjvsHSY8BcXA                                                        |
| config_drive                        |                                                                     |
| created                             | 2019-04-29T15:19:06Z                                                |
| flavor                              | half_node_flavor (8394a344-24eb-4d72-bd83-3c34ef57e2d5)             |
| hostId                              | 1ff3d27b587c01eff3a0d6a53e67207207ded52654e8597517c5ef26            |
| id                                  | bb5ec2b9-eaa5-4b9b-beea-45b7d5ff0d66                                |
| image                               | cirros-0.3.5-x86_64-disk.img (f272f221-fa36-4e18-800d-1bbcb9eb1bb2) |
| key_name                            | None                                                                |
| name                                | 3                                                                   |
| progress                            | 0                                                                   |
| project_id                          | be47c8cfb80446fd85e58f9ef060db0f                                    |
| properties                          |                                                                     |
| security_groups                     | name='default'                                                      |
| status                              | ACTIVE                                                              |
| updated                             | 2019-04-29T15:19:17Z                                                |
| user_id                             | 17be1bcfab9c438f99201442cd73f6e2                                    |
| volumes_attached                    |                                                                     |
+-------------------------------------+---------------------------------------------------------------------+

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0941
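The MEMORY_MB inventories and the 3000 MB flavor in the verification comment above can be checked by hand. The sketch below assumes the usual placement capacity formula, (total - reserved) * allocation_ratio, and uses a hypothetical helper name, `placement_capacity`. The comment does not state which resource provider UUID belongs to which compute host, so the code makes no claim about that mapping.

```python
def placement_capacity(total: int, reserved: int, allocation_ratio: float) -> float:
    """Capacity of an inventory, assuming (total - reserved) * allocation_ratio."""
    return (total - reserved) * allocation_ratio


FLAVOR_RAM_MB = 3000  # 'ram' of half_node_flavor

# allocation_ratio per resource provider, copied from the inventory output.
# total and reserved are identical for both providers (6143 and 4096).
ratios = {
    "8cc922c3-3f8d-4a2f-ac64-c5090423dcf1": 3.0,
    "f16eeb0d-5833-4471-b615-fb61ed3fa49c": 1.5,
}

total_slots = 0
for uuid, ratio in ratios.items():
    capacity_mb = placement_capacity(6143, 4096, ratio)
    slots = int(capacity_mb // FLAVOR_RAM_MB)
    total_slots += slots
    print(f"{uuid}: capacity {capacity_mb} MB, fits {slots} instance(s)")

# With a ratio of 1.0 the capacity would be below the flavor size,
# so any successful boot here relies on oversubscription.
print(placement_capacity(6143, 4096, 1.0) < FLAVOR_RAM_MB)
print(total_slots)
```

Under this formula one provider has room for two instances and the other for one, which is consistent with the three boots being split two and one across compute-1 and compute-0.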