Bug 1245714 - set mem overcommit to 1:1
Summary: set mem overcommit to 1:1
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 10.0 (Newton)
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: y1
Target Release: 7.0 (Kilo)
Assignee: Marios Andreou
QA Contact: Alexander Chuzhoy
URL:
Whiteboard:
Duplicates: 1257001
Depends On:
Blocks: 1259539 1425164
Reported: 2015-07-22 14:54 UTC by Joe Talerico
Modified: 2023-02-22 23:02 UTC (History)

Fixed In Version: openstack-tripleo-heat-templates-0.8.6-50.el7ost
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1259539 1391446
Environment:
Last Closed: 2016-11-03 10:45:04 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 218944 | 0 | None | MERGED | Set the nova scheduler ram_allocation_ration to 1.0 | 2021-01-18 06:23:41 UTC
Red Hat Product Errata | RHSA-2015:1862 | 0 | normal | SHIPPED_LIVE | Moderate: Red Hat Enterprise Linux OpenStack Platform 7 director update | 2015-10-08 16:05:50 UTC

Description Joe Talerico 2015-07-22 14:54:07 UTC
Description of problem:
[heat-admin@overcloud-compute-0 ~]$ sudo cat /etc/nova/nova.conf  | grep ram_allocation
#ram_allocation_ratio=1.5
[heat-admin@overcloud-compute-0 ~]$ cat /proc/meminfo | grep Swap
SwapCached:            0 kB
SwapTotal:             0 kB
SwapFree:              0 kB


How reproducible:
Install Overcloud

Steps to Reproduce:
Install Overcloud

Actual results:
No Swap

Expected results:
By Default OSP Ships with 1.5:1 Memory overcommit. 

Two options:
Change it to 1:1 so we don't allow overcommit.

or 

Create enough swap to allow overcommit 1.5:1
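
To make the two options concrete, here is a rough sketch of the arithmetic behind the ratio. It assumes the scheduler treats physical RAM multiplied by ram_allocation_ratio as allocatable memory, and it ignores any host memory reservation. The function name and the 64 GB host are hypothetical, chosen only for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: allocatable RAM (MB) for a given physical total and ratio.
# schedulable_mb is a hypothetical helper, not part of nova.
schedulable_mb() {
    local total_mb=$1 ratio=$2
    # awk handles the fractional ratio; the result is truncated to whole MB
    awk -v t="$total_mb" -v r="$ratio" 'BEGIN { printf "%d\n", t * r }'
}

# A hypothetical compute node with 64 GB (65536 MB) of RAM:
schedulable_mb 65536 1.5   # 98304: 32 GB more than is physically present
schedulable_mb 65536 1.0   # 65536: no overcommit
```

With SwapTotal at 0 kB, the extra 32 GB in the 1.5 case has nowhere to go if guests actually use their full allocation, which is why the choice is between a 1:1 ratio and adding swap.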

Comment 3 chris alfonso 2015-07-24 15:30:46 UTC
Joe, which of the options that you suggested do you recommend? We need to isolate it to one implementation.

Comment 4 Joe Talerico 2015-07-27 11:40:23 UTC
Chris, It would be best to allocate some swap to accommodate the typical 1.5:1 over-commit.

Comment 5 Marios Andreou 2015-08-04 16:54:17 UTC
poking at this today. I am getting something wrong in my syntax to add swap_size to the OS::Nova::Server... (will parameterize later, just testing for now) latest one I tried looks like:

  NovaCompute:
    properties:
      block_device_mapping_v2:
      - swap_size: 1

I took http://docs.openstack.org/developer/heat/template_guide/openstack.html#OS::Nova::Server as a guide... but deploy gives me

ERROR: openstack ERROR: Failed to validate: Failed to validate: any() takes exactly one argument (3 given)

I can't see that tuskar is doing anything strange with this param ... in any case I will continue to poke tomorrow

Comment 6 Marios Andreou 2015-08-07 15:19:57 UTC
An update in case anyone is looking here. Working on what should be an easy addition (swap_size) but I am hitting a bug, I believe with heatclient. I updated the relevant template to include the swap space like:

  NovaCompute:
    type: OS::Nova::Server    
    properties:
      image:
        {get_param: Image}
      image_update_policy:
        get_param: ImageUpdatePolicy
      flavor: {get_param: Flavor}
      key_name: {get_param: KeyName}
      networks:
        - network: ctlplane
      block_device_mapping_v2:
        - swap_size: 1

(I also tried including explicit "" for all the things in block_device_mapping_v2 but same result)

I'm pretty sure the syntax is right; as comparison, block_device_mapping_v2 and networks have the same type/syntax according to http://docs.openstack.org/developer/heat/template_guide/openstack.html#OS::Nova::Server
 (and networks is/deploys fine).
 
After reloading the roles like https://repos.fedorapeople.org/repos/openstack-m/docs/internal/master/advanced_deployment/managing_plans_and_roles.html#reload-the-deployment-plan-and-all-deployment-roles and deploying like 

openstack overcloud deploy --plan overcloud --control-scale 1  --compute-scale 1

I always get this:
ERROR: openstack ERROR: Failed to validate: Failed to validate: any() takes exactly one argument (3 given). A fuller stack trace is like [1] below.

The reason I think it is an issue with heatclient (or at least it happens there, could be from jsonutils.dumps oslo_serialization), is because when I grab the templates from tuskar like 

tuskar plan-templates -O directory plan_uuid

and then 

heat -v template-validate --template-file plan.yaml -e environment.yaml

they validate fine :/ so seems once they are passed through heatclient something happens. 




[1] > /usr/lib/python2.7/site-packages/heatclient/common/http.py(257)json_request()
-> def json_request(self, method, url, **kwargs):
'{"explanation": "The server could not comply with the request since it is either malformed or otherwise incorrect.", "code": 400, "error": {"message": "Failed to validate: Failed to validate: any() takes exactly one argument (3 given)", "traceback": "Traceback (most recent call last):
\\n\\n  File \\"/usr/lib/python2.7/site-packages/heat/common/context.py\\", line 300, in wrapped\\n    return func(self, ctx, *args, **kwargs)\\n\\n  
File \\"/usr/lib/python2.7/site-packages/heat/engine/service.py\\", line 669, in create_stack\\n    parent_resource_name)\\n\\n  
File \\"/usr/lib/python2.7/site-packages/heat/engine/service.py\\", line 577, in _parse_template_and_validate_stack\\n    stack.validate()\\n\\n
 File \\"/usr/lib/python2.7/site-packages/osprofiler/profiler.py\\", line 105, in wrapper\\n    return f(*args, **kwargs)\\n\\n  
 File \\"/usr/lib/python2.7/site-packages/heat/engine/stack.py\\", line 598, in validate\\n    raise ex\\n\\nStackValidationFailed: Failed to validate: Failed to validate: any() takes exactly one argument (3 given)\\n", "type": "StackValidationFailed"}, "title": "Bad Request"}'

Comment 7 Mike Burns 2015-08-26 05:38:08 UTC
*** Bug 1257001 has been marked as a duplicate of this bug. ***

Comment 8 Marios Andreou 2015-08-27 14:05:41 UTC
    tl;dr Since this bug is a blocker, I am going to proceed with plan A below: a docs update to add --swap to the flavor creation, followed by a (hopefully small) heat template fixup to enable the swap space.

    The quickest way to achieve this right now is to define the swap space as part of the baremetal flavor creation: 

    openstack flavor create --id auto --ram 4096 --disk 40 --swap 2 --vcpus 1 baremetal
    openstack flavor set --property "cpu_arch"="x86_64" --property "capabilities:boot_option"="local" baremetal

    This adds 2 MB of swap space (the --swap value is in MB) for all overcloud nodes deployed with this flavor. However, this doesn't get us all the way there: though the disk partition is created correctly, it isn't used by booted nodes (e.g. there is no entry in /etc/fstab for it; it is not clear to me yet whether this is a limitation of our overcloud-full.qcow2). In any case the workaround is to just swapon the given device, for example:

    [heat-admin@overcloud-compute-0 ~]$ swap_device=$(sudo fdisk -l | grep swap | awk '{print $1}')
    [heat-admin@overcloud-compute-0 ~]$ echo $swap_device
    /dev/sda1
    [heat-admin@overcloud-compute-0 ~]$ sudo swapon $swap_device
    [heat-admin@overcloud-compute-0 ~]$ cat /proc/meminfo | grep Swap
    SwapCached:            0 kB
    SwapTotal:          2044 kB
    SwapFree:           2044 kB
    [heat-admin@overcloud-compute-0 ~]$ cat /proc/swaps 
    Filename				Type		Size	Used	Priority
    /dev/sda1                               partition	2044	0	-1

    For now this can happen by delivering a simple script (via the heat templates, for example) to /etc/rc.d/rc.local on each node that does the above, something like: swap_device=$(sudo fdisk -l | grep swap | awk '{print $1}') ; sudo swapon $swap_device

    This last part is something i can work on straight away (will try get a simple poc before eod).
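
A slightly more defensive variant of that one-liner could look like the sketch below. It is not the script from the upstream review. The function names are hypothetical, and it assumes the swap partition shows up in `fdisk -l` output on a line containing the word "swap" with the device in the first column, as in the transcript above.

```shell
#!/usr/bin/env bash
# Sketch only: enable the flavor-created swap partition at boot, skipping
# the swapon call if the device is already active.

# Reads fdisk-style output on stdin and prints the first swap device found.
find_swap_device() {
    awk '/swap/ { print $1; exit }'
}

# Hypothetical wrapper intended to be called from /etc/rc.d/rc.local (as root).
enable_swap() {
    local dev
    dev=$(fdisk -l 2>/dev/null | find_swap_device)
    # Nothing to do when the flavor defined no swap partition
    [ -n "$dev" ] || return 0
    # Only call swapon if the device is not already listed in /proc/swaps
    if ! awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' /proc/swaps; then
        swapon "$dev"
    fi
}

# In rc.local one would then call:
#   enable_swap
```

Checking /proc/swaps first keeps the script harmless if it runs more than once.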

    Moving forward, if we want to make this a configurable parameter, we can do so, but will need more work. First of all, we still/always need to define a flavor with swap defined, but can then override when we nova boot (so via the heat templates). In this case, what is on the flavor becomes the maximum (as well as the default), but you can override (in the nova cli this is like 'nova boot --swap X'), so we would set the swap value in the template for each node type set by a relevant template param (i.e. we define a baremetal flavor with swap 10GB, launch controller --swap 2GB and compute --swap 3GB. The 10GB is the max, but also what you get if you don't specify swap and use that flavor).
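
The default/maximum rule described above can be sketched as a small function. This only illustrates the rule as stated in this comment and does not call nova. The function name is hypothetical, values are in MB, and rejecting a request above the flavor value is an assumption about how "maximum" is enforced.

```shell
#!/usr/bin/env bash
# Sketch only: which swap size a node ends up with, given the flavor's swap
# and an optional per-boot override.
effective_swap_mb() {
    local flavor_swap=$1 requested=${2:-}
    # No override: the flavor value is the default
    if [ -z "$requested" ]; then
        echo "$flavor_swap"
        return 0
    fi
    # The flavor value is also the ceiling (assumed here to be a hard error)
    if [ "$requested" -gt "$flavor_swap" ]; then
        echo "requested swap ${requested} MB exceeds flavor maximum ${flavor_swap} MB" >&2
        return 1
    fi
    echo "$requested"
}

effective_swap_mb 10240          # flavor default applies
effective_swap_mb 10240 2048     # e.g. a controller override
effective_swap_mb 10240 3072     # e.g. a compute override
```

A flavor with swap 0 would make every non-zero override fail under this rule, which matches the point that the flavor must always define swap.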

    Since this bug is a blocker, I am going to proceed with plan A above: a docs update to add --swap to the flavor creation, followed by a (hopefully small) heat template fixup to enable the swap space.

    I can/will work on getting the more general configuration of this if it is useful/necessary later.

    thoughts and comments very much appreciated, 




    (extra info might be useful for anyone poking at this... our current undercloud novaclient (python-novaclient-2.23.0-1.el7ost.noarch) needs patching in order to get nova boot --swap to work... there are many relevant bugs but the fix at https://review.openstack.org/#/c/216401/1/novaclient/v2/servers.py for https://bugs.launchpad.net/python-novaclient/+bug/1442501 worked for me )

Comment 9 Marios Andreou 2015-08-27 17:10:43 UTC
So as promised a workaround that is usable right now. I have a WIP upstream review mainly so I can point at it here, details below. Even if not landed upstream and we continue/want to use this workaround we can just document how to create that file, like we do for general first boot customization example in docs. You should be able to copy/paste the following, in the undercloud, having sourced .stackrc

# First grab the first boot enable swap, from the review at https://review.openstack.org/#/c/217796/
# Need root permissions as we are putting directly into /usr/share/openstack-tripleo-heat-templates for minimum hassle
sudo su
curl "https://review.openstack.org/gitweb?p=openstack/tripleo-heat-templates.git;a=blob_plain;f=firstboot/enable_swap.yaml;h=29e56bf8c662d21ee9a6779646ee65f1c4213f5a;hb=feae87572125388b0922a9ad424b3b9b688053c0" > /usr/share/openstack-tripleo-heat-templates/firstboot/enable_swap.yaml
exit

# Now define a local environment file called "enable_swap.yaml" to call that script at first boot (NodeUserData):
cat << EOF > enable_swap.yaml
resource_registry:
  OS::TripleO::NodeUserData: /usr/share/openstack-tripleo-heat-templates/firstboot/enable_swap.yaml
EOF

# Delete the baremetal flavor 
openstack flavor delete baremetal

# Define a baremetal flavor with the swap space you want, here 2GB:
openstack flavor create --id auto --ram 4096 --disk 40 --vcpus 1 --swap 2 baremetal
openstack flavor set --property "cpu_arch"="x86_64" --property "capabilities:boot_option"="local" baremetal

# Deploy with the enable_swap.yaml:
openstack overcloud deploy --templates --compute-scale 1 --control-scale 1 -e enable_swap.yaml

you should be able to easily verify on your nodes that:

[root@overcloud-compute-0 heat-admin]# cat /proc/swaps 
Filename				Type		Size	Used	Priority
/dev/sda1                               partition	2044	0	-1
[root@overcloud-compute-0 heat-admin]# tail -2 /etc/rc.d/rc.local 
touch /var/lock/subsys/local
swapon /dev/sda1 
[root@overcloud-compute-0 heat-admin]# cat /proc/meminfo | grep Swap
SwapCached:            0 kB
SwapTotal:          2044 kB
SwapFree:           2044 kB

Comment 10 Marios Andreou 2015-08-28 08:48:22 UTC
> # Define a baremetal flavor with the swap space you want, here 2GB:

MB, not GB sorry
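
A quick way to check the unit is to convert the SwapTotal value, which /proc/meminfo reports in kB, to MB. The sketch below reads meminfo-style text on stdin so it can be tried against the sample output from comment 9. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: print SwapTotal rounded to the nearest whole MB.
swap_total_mb() {
    awk '/^SwapTotal:/ { printf "%d\n", ($2 + 512) / 1024 }'
}

# Sample values taken from the transcript in comment 9:
printf 'SwapCached: 0 kB\nSwapTotal: 2044 kB\nSwapFree: 2044 kB\n' | swap_total_mb   # 2

# On a real node:
#   swap_total_mb < /proc/meminfo
```

So `--swap 2` on the flavor produced roughly 2 MB of swap, not 2 GB.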

Comment 11 Marios Andreou 2015-08-28 16:06:49 UTC
Today I spent some time trying to get the swap space creation working via Heat. I hit a nit @ https://review.openstack.org/#/q/I2c538161d88a51022b91b584f16c1439848e7ada,n,z (on our openstack-heat-engine-2015.1.0-5.el7ost.noarch ). Otherwise I couldn't get this to work and have posted an upstream request for info @ openstack-dev http://lists.openstack.org/pipermail/openstack-dev/2015-August/073154.html (especially since block_device_mapping_v2 is a relatively recent addition, landed 06/2015).

The reason we need that ^^^ is so we can then add (for example) ComputeSwap, ControlSwap parameters and then feed those into the nodes for deploy. Note that this would only override (in theory, if the heat part works) what is on the Baremetal flavor (and if that flavor has no swap space defined then you can't launch an instance with swap at all... we will always need to add a Baremetal flavor with swap already defined, regardless of the fix here).

Comment 12 chris alfonso 2015-08-31 16:19:20 UTC
Marios, would you be able to do the work for the second option before Thursday EOD?

Change it to 1:1 so we don't allow overcommit.

Comment 13 Emilien Macchi 2015-08-31 16:36:52 UTC
If you decide to change the default nova.conf option, you should change the default parameter like I'm doing here:

https://review.openstack.org/218944

Comment 15 Mike Burns 2015-09-03 00:32:24 UTC
There are two things going on here: 1. setting the ram_allocation_ratio for nova scheduler in nova.conf, and 2. enabling swap space on overcloud nodes.

1. setting the ram_allocation_ratio for nova scheduler in nova.conf

https://review.openstack.org/#/c/218944/3


2. enabling swap space on overcloud nodes
You need to define your baremetal flavor as having swap space and then enable it with a heat template change @

https://review.openstack.org/#/c/217796/ 


I also filed an upstream bug at https://bugs.launchpad.net/tripleo/+bug/1491335

Comment 17 Mike Burns 2015-09-03 00:33:48 UTC
(In reply to Mike Burns from comment #15)
> There are two things going on here: 1. setting the ram_allocation_ratio for
> nova scheduler in nova.conf, and 2. enabling swap space on overcloud nodes.
> 
> 1. setting the ram_allocation_ratio for nova scheduler in nova.conf
> 
> https://review.openstack.org/#/c/218944/3
> 
> 
> 2. enabling swap space on overcloud nodes
> You need to define your baremetal flavor as having swap space and then
> enable it with a heat template change @
> 
> https://review.openstack.org/#/c/217796/ 
> 
> 
> I also filed an upstream bug at
> https://bugs.launchpad.net/tripleo/+bug/1491335

So we need to split this bug, which I will do shortly.  This bug will be for setting the ram_allocation_ratio only.  The enabling of swap and dynamic configuration of ram allocation will be part of the new bug.

Comment 20 Alexander Chuzhoy 2015-09-21 20:24:18 UTC
FailedQA

Environment:
openstack-tripleo-heat-templates-0.8.6-64.el7ost.noarch

So based on comment #17, this bug will be for setting the ram_allocation_ratio only.

[heat-admin@overcloud-compute-0 ~]$ sudo cat /etc/nova/nova.conf  | grep ram_allocation
#ram_allocation_ratio=1.5

Comment 21 Marios Andreou 2015-09-22 07:15:01 UTC
(In reply to Alexander Chuzhoy from comment #20)
> FailedQA
> 
> Environment:
> openstack-tripleo-heat-templates-0.8.6-64.el7ost.noarch
> 
> So based on comment #17, this bug will be for setting the
> ram_allocation_ratio only.
> 
> [heat-admin@overcloud-compute-0 ~]$ sudo cat /etc/nova/nova.conf  | grep
> ram_allocation
> #ram_allocation_ratio=1.5

Hi Alexander, can you check the controller instead (I had the same thought actually, see comments @ [1] ). We have nova-compute on the computes but nova-scheduler is actually on the controllers (so we set this config value there).

Another thing to confirm is that the heat templates from which you are deploying include this (my latest poodle deploy definitely does), like [2] from my undercloud vm.




[1] https://review.openstack.org/#/c/218944/2/puppet/hieradata/controller.yaml
[2] 
[root@instack ~]# grep -rni 'ram_allocation_ratio' /usr/share/openstack-tripleo-heat-templates/puppet/*
/usr/share/openstack-tripleo-heat-templates/puppet/hieradata/controller.yaml:76:nova::scheduler::filter::ram_allocation_ratio: '1.0'

Comment 22 Alexander Chuzhoy 2015-09-22 13:23:11 UTC
On controllers:
sudo cat /etc/nova/nova.conf  | grep ram_alloc
#ram_allocation_ratio=1.5
ram_allocation_ratio=1.0

Template:
grep -rni 'ram_allocation_ratio' /usr/share/openstack-tripleo-heat-templates/puppet/*
/usr/share/openstack-tripleo-heat-templates/puppet/hieradata/controller.yaml:76:nova::scheduler::filter::ram_allocation_ratio: '1.0'
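
Since a plain grep also matches the commented-out default, a check along these lines reports only the value that is actually in effect. This is a sketch: the function name is hypothetical, it ignores ini section headers, and it assumes the built-in default of 1.5 shown in the commented line above.

```shell
#!/usr/bin/env bash
# Sketch only: print the effective ram_allocation_ratio from a nova.conf-style
# file, falling back to a default when no uncommented setting is present.
effective_ram_ratio() {
    local conf=$1 default=${2:-1.5}
    awk -F= -v d="$default" '
        # Match only uncommented assignments; the last one wins
        /^[[:space:]]*ram_allocation_ratio[[:space:]]*=/ { v = $2 }
        END {
            gsub(/[[:space:]]/, "", v)
            print (v == "" ? d : v)
        }
    ' "$conf"
}

# On a node (needs read access to the file):
#   effective_ram_ratio /etc/nova/nova.conf
```

On the controller output above this would print 1.0. On a node that has only the commented line it would print the 1.5 default.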

Comment 23 Joe Talerico 2015-09-22 14:34:25 UTC
This is correct. I thought we migrated to have this setting on the computes, but it is on the controllers.

Comment 24 Joe Talerico 2015-09-22 14:34:31 UTC
This is correct. I thought we migrated to have this setting on the computes, but it is on the controllers.

Comment 25 Alexander Chuzhoy 2015-09-22 14:37:25 UTC
Verified:


Environment:
openstack-tripleo-heat-templates-0.8.6-64.el7ost.noarch

Verifying based on comments #21-24
Getting the correct value on the controllers.

Comment 26 Alexander Chuzhoy 2015-09-22 14:43:35 UTC
correction:
nova.conf should be identical for all the nodes (controllers+computes)
moving back to assigned.

Comment 27 Marios Andreou 2015-09-23 09:55:38 UTC
(In reply to Alexander Chuzhoy from comment #26)
> correction:
> nova.conf should be identical for all the nodes (controllers+computes)
> moving back to assigned.

is this a hard requirement? To include this in the nova.conf of the compute nodes, we need to also include the nova::scheduler::filter class, as we do for controllers like [1] and discussed somewhat in [2] (that bug isn't relevant, just the comments, the use of ram_allocation_ratio in the example there is a happy coincidence :) ).

AFAIK we have not enforced a requirement for conf files to be homogeneous across different node types. As just one example, in [3] we are removing a nova conf from the hieradata since it isn't being used/consumed (actually because the class isn't included, like here).

Since nova-scheduler isn't running on the compute nodes I don't see why this needs to be set there but please correct me if I am wrong. Further if there is a requirement for all nova.conf to be the same, then we should file a BZ to track that because I don't think this is the only place we will have divergence.


[1] https://review.openstack.org/#/c/218944/3/puppet/manifests/overcloud_controller_pacemaker.pp
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1243971#c3
[3] https://review.openstack.org/#/c/226264/

Comment 28 Marios Andreou 2015-09-23 09:57:04 UTC
(In reply to marios from comment #27)
> (In reply to Alexander Chuzhoy from comment #26)
> > correction:
> > nova.conf should be identical for all the nodes (controllers+computes)
> > moving back to assigned.
> 
> is this a hard requirement? To include this in the nova.conf of the compute
> nodes, we need to also include the nova::scheduler::filter class, as we do
> for controllers like [1] and discussed somewhat in [2] (that bug isn't

^^^ sorry, should have stressed - which means running the nova-scheduler on computes too (unless we declare the class but then make sure the service doesn't start here, but again, do we really need to do that?)

Comment 29 chris alfonso 2015-09-23 16:07:42 UTC
Joe, Any response? We are tracking this as a blocker for Y1 and need to resolve it.

Comment 30 James Slagle 2015-09-23 17:01:07 UTC
(In reply to chris alfonso from comment #29)
> Joe, Any response? We are tracking this as a blocker for Y1 and need to
> resolve it.

joe has already provided the info in comment 23 and comment 24:

This is correct. I thought we migrated to have this setting on the computes, but it is on the controllers.


ram_allocation_ratio is a nova-scheduler configuration option, and nova-scheduler only runs on the controllers.

The action on the bug is now on sasha to clarify comment 26. I don't think that is accurate. There is no need that I'm aware of to have nova.conf identical on both controllers and computes.

Comment 31 Alexander Chuzhoy 2015-09-23 17:10:28 UTC
Joe, could you please clarify comment #26 for James.

Comment 32 Joe Talerico 2015-09-23 17:18:18 UTC
While chatting with nova developer Sylvain Bauza, it was recommended that nova.conf be identical across nodes. That is why I recommended the same to Alexander.

Comment 33 Alexander Chuzhoy 2015-09-23 17:29:28 UTC
Verified:


Environment:
openstack-tripleo-heat-templates-0.8.6-64.el7ost.noarch

Verifying based on comments #21-32
Getting the corrected ratio on controllers is sufficient.

Comment 35 errata-xmlrpc 2015-10-08 12:15:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2015:1862

Comment 36 Joe Talerico 2016-11-02 23:57:30 UTC
Reopening this bug:

Our Undercloud is set properly :
[stack@undercloud ~]$ sudo grep ram_allo /etc/nova/nova.conf 
ram_allocation_ratio=1.0
[stack@undercloud ~]$ 

However the Overcloud is using the default(1.5:1):
[heat-admin@overcloud-controller-0 ~]$ sudo grep ram_allo /etc/nova/nova.conf 
[heat-admin@overcloud-controller-0 ~]$

This is using OSP10:
openstack-tripleo-heat-templates-5.0.0-1.1.el7ost.noarch

Comment 37 Mike Burns 2016-11-03 10:45:04 UTC
We can't re-open bugs that were closed Errata.  We need to clone it to a new bug.

Comment 38 Mike Burns 2016-11-03 10:51:28 UTC
Cloned to bug 1391446

