Bug 1348732 - Nova incorrectly assumes it can use all allocated hugepages for VM creation
Summary: Nova incorrectly assumes it can use all allocated hugepages for VM creation
Keywords:
Status: CLOSED DUPLICATE of bug 1300680
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 8.0 (Liberty)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Sahid Ferdjaoui
QA Contact: Prasanth Anbalagan
URL:
Whiteboard: hot
Depends On:
Blocks: 1194008 1295530
 
Reported: 2016-06-21 22:24 UTC by joycej
Modified: 2019-09-09 13:37 UTC
CC List: 12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-06-24 15:00:25 UTC
Target Upstream Version:
Embargoed:



Description joycej 2016-06-21 22:24:32 UTC
Description of problem:
Nova determines the number of huge pages allocated to a compute node (and its NUMA nodes) only at its initial run time. It ignores the fact that other consumers may also use huge pages and never updates what is actually available based on that usage. As a result, its scheduling can go wrong: the scheduler believes there are enough huge pages to support a VM when there are not, and the VM fails to spawn. Exacerbating this problem, the scheduler generally places all VMs on a single NUMA node before moving to the next NUMA node.
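
For illustration only (not part of Nova), here is a small sketch that compares the per-NUMA-node huge page totals the kernel exposes with what is actually free right now; the 2 MiB page size is just an assumption for the example:

    # Compare allocated vs. free huge pages per NUMA node via standard sysfs files.
    import glob
    import os

    def huge_page_counts(page_kb=2048):
        counts = {}
        for node_dir in glob.glob("/sys/devices/system/node/node[0-9]*"):
            hp = os.path.join(node_dir, "hugepages", "hugepages-%dkB" % page_kb)
            with open(os.path.join(hp, "nr_hugepages")) as f:
                total = int(f.read())
            with open(os.path.join(hp, "free_hugepages")) as f:
                free = int(f.read())
            counts[os.path.basename(node_dir)] = (total, free)
        return counts

    for node, (total, free) in sorted(huge_page_counts().items()):
        print("%s: %d allocated, %d free (%d consumed)" % (node, total, free, total - free))

Nova records only the allocated totals at start-up; the "free" column is what can actually back a new VM.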


Version-Release number of selected component (if applicable):


How reproducible:
Easily

Steps to Reproduce:
1. On the compute host, consume some huge pages outside of Nova: at least as many as the flavor used in step 2 requires (see the sketch after these steps).
2. Create VMs on that compute host until the capacity of a single NUMA node is exceeded.
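
As an illustration for step 1, one hypothetical way to hold huge pages outside of Nova is to mmap() a file on a hugetlbfs mount; the mount point, page size, and page count below are assumptions and must be adjusted to the host:

    # Hold huge pages outside of Nova by mapping and touching a hugetlbfs-backed file.
    import mmap
    import os

    HUGETLBFS_MOUNT = "/dev/hugepages"   # assumption: adjust to the host's mount point
    PAGE_SIZE = 2 * 1024 * 1024          # 2 MiB huge pages (assumption)
    NUM_PAGES = 512                      # 1 GiB held outside Nova's accounting

    path = os.path.join(HUGETLBFS_MOUNT, "external-consumer")
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    os.ftruncate(fd, NUM_PAGES * PAGE_SIZE)

    # Touch every page so the kernel faults it in and takes it from the free pool.
    mem = mmap.mmap(fd, NUM_PAGES * PAGE_SIZE)
    for off in range(0, NUM_PAGES * PAGE_SIZE, PAGE_SIZE):
        mem[off] = 0

    input("Huge pages held; press Enter to release...")
    mem.close()
    os.close(fd)
    os.unlink(path)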

Actual results:
VMs fail to spawn with a memory error.  


Expected results:

It should be possible to spawn VMs up to the limits of all NUMA nodes on the compute host.

Additional info:

Joe:
    As I mentioned, there are a couple of OpenStack bugs that are problematic for our solution. The most critical one is:

https://bugs.launchpad.net/nova/+bug/1594529

Due to this bug we can lose an entire NUMA node of compute capacity.   This problem has been fixed in Newton via the following patches: 

https://review.openstack.org/#/c/292499/
https://review.openstack.org/#/c/324379/
https://review.openstack.org/#/c/292500/

The community won't backport these to Liberty, as it is only taking security patches. The community is also unlikely to backport these as-is to Mitaka due to the implementation and the way the objects are versioned.

There is a simple one-line fix that can be applied to Nova that would essentially resolve this for us, but that change uses a hard-coded value that may not be appropriate for others facing the same issue.

In summary, we will need to work with you on a fix that Red Hat is willing to support, even if that fix is Cisco-specific, until we get to Newton. I think the best approach here is for you to have a Red Hat expert evaluate the above information, and then we can have a short discussion on possible solutions that Red Hat would be willing to support. Paul from our team has been fairly active with the Nova experts on what they might or might not approve for Newton.

Comment 1 Sahid Ferdjaoui 2016-06-22 10:30:26 UTC
Investigating possibilities of a downstream patch for OSP8 and 9 without breaking upgrades.

Probably dup of bug 1300680

Comment 2 joycej 2016-06-22 14:52:02 UTC
Yes, this is almost certainly a dup of 1300680, but from 1300680 it did not appear that any resolution or patch was available for this problem. Please advise.

Comment 4 Sahid Ferdjaoui 2016-06-24 15:00:25 UTC
One possible solution (the easy one) is to have the reserved_huge_pages option set for all services, so the scheduler will know how many pages are reserved. The problem is that all compute nodes would then share the same number of reserved pages.

Another solution (probably better) would be a libvirt-driver-specific fix: the reserved_huge_pages option would be read when the driver computes available resources, and the number of pages reported as available would be the total minus the number of reserved pages. A rough sketch of that arithmetic follows.
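
For clarity, a minimal sketch (hypothetical helper, not the shipped Nova code) of what such a driver-side fix would compute per NUMA node and page size:

    # Pages the scheduler may still place, after removing both guest usage
    # and the operator-reserved pages for non-Nova consumers.
    def pages_available(total_pages, used_pages, reserved_pages):
        return max(total_pages - used_pages - reserved_pages, 0)

    # Example: a NUMA node with 1024 x 2 MiB pages, 256 already used by
    # guests, and 128 reserved for other huge page consumers.
    print(pages_available(1024, 256, 128))   # -> 640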

I'm closing this one as a duplicate since we do not want to track two BZs for the same problem.

*** This bug has been marked as a duplicate of bug 1300680 ***

Comment 5 joycej 2016-06-24 15:14:15 UTC
I don't have an issue with marking this as a duplicate, but I do have an issue with the lack of activity on the base bug. There has been no meaningful activity on the other bug in almost 3 months. Can someone provide an update on when this is targeted to be fixed?

