Bug 1391446 - set mem overcommit to 1:1
Summary: set mem overcommit to 1:1
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 10.0 (Newton)
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 10.0 (Newton)
Assignee: Ollie Walsh
QA Contact: Prasanth Anbalagan
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-11-03 10:50 UTC by Mike Burns
Modified: 2016-12-14 16:29 UTC (History)

Fixed In Version: openstack-tripleo-heat-templates-5.0.0-1.4.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1245714
Environment:
Last Closed: 2016-12-14 16:29:09 UTC


Links
System | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHEA-2016:2948 | normal | SHIPPED_LIVE | Red Hat OpenStack Platform 10 enhancement update | 2016-12-14 19:55:27 UTC
OpenStack gerrit | 392108 | None | None | None | 2016-11-03 15:51:04 UTC
OpenStack gerrit | 394488 | None | None | None | 2016-11-07 15:37:53 UTC

Description Mike Burns 2016-11-03 10:50:14 UTC
This no longer appears to be working. See the last comment from Joe T in the cloned description below.

+++ This bug was initially created as a clone of Bug #1245714 +++

Description of problem:
[heat-admin@overcloud-compute-0 ~]$ sudo cat /etc/nova/nova.conf  | grep ram_allocation
#ram_allocation_ratio=1.5
[heat-admin@overcloud-compute-0 ~]$ cat /proc/meminfo | grep Swap
SwapCached:            0 kB
SwapTotal:             0 kB
SwapFree:              0 kB


How reproducible:
Install Overcloud

Steps to Reproduce:
Install Overcloud

Actual results:
No Swap

Expected results:
By default, OSP ships with 1.5:1 memory overcommit, but the deployed nodes have no swap to back it.

Two options:
Change it to 1:1 so we don't allow overcommit.

or 

Create enough swap to allow overcommit 1.5:1
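
For reference, the arithmetic behind the two options can be sketched as below. This is a rough illustration only: the function name and the worst-case assumption (every guest touches all of its allocated memory, no host reservation, no memory sharing) are not from the report.

```python
def overcommit_plan(physical_ram_mb: int, ratio: float) -> dict:
    """Return the RAM the scheduler may hand out and the part not backed by RAM.

    Hypothetical helper for illustration; assumes the worst case where every
    guest uses all of its allocated memory.
    """
    if physical_ram_mb <= 0 or ratio <= 0:
        raise ValueError("physical_ram_mb and ratio must be positive")
    # Memory the scheduler is willing to allocate on this host.
    schedulable_mb = int(physical_ram_mb * ratio)
    # Anything above physical RAM would have to live in swap.
    shortfall_mb = max(0, schedulable_mb - physical_ram_mb)
    return {"schedulable_mb": schedulable_mb, "shortfall_mb": shortfall_mb}


if __name__ == "__main__":
    # 1.5:1 on a 4096 MB node: part of the allocation has no RAM behind it.
    print(overcommit_plan(4096, 1.5))
    # 1:1: nothing is overcommitted, so no swap is needed for this purpose.
    print(overcommit_plan(4096, 1.0))
```

With a ratio of 1.0 the shortfall is zero, which is why the first option does not depend on swap at all.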


--- Additional comment from chris alfonso on 2015-07-24 11:30:46 EDT ---

Joe, which of the options that you suggested do you recommend? We need to isolate it to one implementation.

--- Additional comment from Joe Talerico on 2015-07-27 07:40:23 EDT ---

Chris, it would be best to allocate some swap to accommodate the typical 1.5:1 over-commit.

--- Additional comment from marios on 2015-08-04 12:54:17 EDT ---

poking at this today. I am getting something wrong in my syntax to add swap_size to the OS::Nova::Server... (will parameterize later, just testing for now) latest one I tried looks like:

  NovaCompute:
    properties:
      block_device_mapping_v2:
      - swap_size: 1

I took http://docs.openstack.org/developer/heat/template_guide/openstack.html#OS::Nova::Server as a guide... but deploy gives me

ERROR: openstack ERROR: Failed to validate: Failed to validate: any() takes exactly one argument (3 given)

I can't see that tuskar is doing something strange with this param ... in any case will continue to poke tomorrow

--- Additional comment from marios on 2015-08-07 11:19:57 EDT ---

An update in case anyone is looking here. Working on what should be an easy addition (swap_size) but I am hitting a bug, I believe with heatclient. I updated the relevant template to include the swap space like:

  NovaCompute:
    type: OS::Nova::Server    
    properties:
      image:
        {get_param: Image}
      image_update_policy:
        get_param: ImageUpdatePolicy
      flavor: {get_param: Flavor}
      key_name: {get_param: KeyName}
      networks:
        - network: ctlplane
      block_device_mapping_v2:
        - swap_size: 1

(I also tried including explicit "" for all the things in block_device_mapping_v2 but same result)

I'm pretty sure the syntax is right; as comparison, block_device_mapping_v2 and networks have the same type/syntax according to http://docs.openstack.org/developer/heat/template_guide/openstack.html#OS::Nova::Server
 (and networks is/deploys fine).
 
After reloading the roles like https://repos.fedorapeople.org/repos/openstack-m/docs/internal/master/advanced_deployment/managing_plans_and_roles.html#reload-the-deployment-plan-and-all-deployment-roles and deploying like 

openstack overcloud deploy --plan overcloud --control-scale 1  --compute-scale 1

I always get this:
ERROR: openstack ERROR: Failed to validate: Failed to validate: any() takes exactly one argument (3 given). A fuller stack trace is like [1] below.

The reason I think it is an issue with heatclient (or at least it happens there, could be from jsonutils.dumps oslo_serialization), is because when I grab the templates from tuskar like 

tuskar plan-templates -O directory plan_uuid

and then 

heat -v template-validate --template-file plan.yaml -e environment.yaml

they validate fine :/ so seems once they are passed through heatclient something happens. 




[1] > /usr/lib/python2.7/site-packages/heatclient/common/http.py(257)json_request()
-> def json_request(self, method, url, **kwargs):
'{"explanation": "The server could not comply with the request since it is either malformed or otherwise incorrect.", "code": 400, "error": {"message": "Failed to validate: Failed to validate: any() takes exactly one argument (3 given)", "traceback": "Traceback (most recent call last):
\\n\\n  File \\"/usr/lib/python2.7/site-packages/heat/common/context.py\\", line 300, in wrapped\\n    return func(self, ctx, *args, **kwargs)\\n\\n  
File \\"/usr/lib/python2.7/site-packages/heat/engine/service.py\\", line 669, in create_stack\\n    parent_resource_name)\\n\\n  
File \\"/usr/lib/python2.7/site-packages/heat/engine/service.py\\", line 577, in _parse_template_and_validate_stack\\n    stack.validate()\\n\\n
 File \\"/usr/lib/python2.7/site-packages/osprofiler/profiler.py\\", line 105, in wrapper\\n    return f(*args, **kwargs)\\n\\n  
 File \\"/usr/lib/python2.7/site-packages/heat/engine/stack.py\\", line 598, in validate\\n    raise ex\\n\\nStackValidationFailed: Failed to validate: Failed to validate: any() takes exactly one argument (3 given)\\n", "type": "StackValidationFailed"}, "title": "Bad Request"}'

--- Additional comment from Mike Burns on 2015-08-26 01:38:08 EDT ---



--- Additional comment from marios on 2015-08-27 10:05:41 EDT ---

    tl;dr Since this bug is a blocker, I am going to proceed with plan A below: a docs update to add --swap to the flavor creation, and then a (hopefully small) heat template fixup to enable the swap space.

    The quickest way to achieve this right now is to define the swap space as part of the baremetal flavor creation: 

    openstack flavor create --id auto --ram 4096 --disk 40 --swap 2 --vcpus 1 baremetal
    openstack flavor set --property "cpu_arch"="x86_64" --property "capabilities:boot_option"="local" baremetal

    will add 2 MB swap space (the --swap value is in MB) for all overcloud nodes deployed with this flavor. However, this doesn't get us all the way there... though the disk partition is created correctly, it isn't used by booted nodes (e.g. there is no entry in /etc/fstab for it; not clear to me yet if this is a limitation of our overcloud-full.qcow2). In any case the workaround is to just swapon for the given device, for example:

    [heat-admin@overcloud-compute-0 ~]$ swap_device=$(sudo fdisk -l | grep swap | awk '{print $1}')
    [heat-admin@overcloud-compute-0 ~]$ echo $swap_device
    /dev/sda1
    [heat-admin@overcloud-compute-0 ~]$ sudo swapon $swap_device
    [heat-admin@overcloud-compute-0 ~]$ cat /proc/meminfo | grep Swap
    SwapCached:            0 kB
    SwapTotal:          2044 kB
    SwapFree:           2044 kB
    [heat-admin@overcloud-compute-0 ~]$ cat /proc/swaps 
    Filename				Type		Size	Used	Priority
    /dev/sda1                               partition	2044	0	-1

    For now this can happen by delivering a simple script (via the heat templates, for example) to /etc/rc.d/rc.local on each node to do the above, something like swap_device=$(sudo fdisk -l | grep swap | awk '{print $1}') ; sudo swapon $swap_device.

    This last part is something i can work on straight away (will try get a simple poc before eod).

    Moving forward, if we want to make this a configurable parameter, we can do so, but will need more work. First of all, we still/always need to define a flavor with swap defined, but can then override when we nova boot (so via the heat templates). In this case, what is on the flavor becomes the maximum (as well as the default), but you can override (in the nova cli this is like 'nova boot --swap X'), so we would set the swap value in the template for each node type set by a relevant template param (i.e. we define a baremetal flavor with swap 10GB, launch controller --swap 2GB and compute --swap 3GB. The 10GB is the max, but also what you get if you don't specify swap and use that flavor).

    Since this bug is a blocker, I am going to proceed with plan A above: a docs update to add --swap to the flavor creation, and then a (hopefully small) heat template fixup to enable the swap space.

    I can/will work on getting the more general configuration of this if it is useful/necessary later.

    thoughts and comments very much appreciated, 




    (extra info might be useful for anyone poking at this... our current undercloud novaclient (python-novaclient-2.23.0-1.el7ost.noarch) needs patching in order to get nova boot --swap to work... there are many relevant bugs but the fix at https://review.openstack.org/#/c/216401/1/novaclient/v2/servers.py for https://bugs.launchpad.net/python-novaclient/+bug/1442501 worked for me )
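
The flavor/boot-time override rule described in this comment (the flavor's swap is both the default and the maximum, and a per-node value may only lower it) can be sketched as follows. The function name is hypothetical and the rule is taken from the comment as written, not checked against nova itself.

```python
from typing import Optional


def effective_swap_mb(flavor_swap_mb: int,
                      requested_swap_mb: Optional[int] = None) -> int:
    """Resolve the swap size for a node under the rule described above.

    Hypothetical helper: flavor swap acts as the default when nothing is
    requested at boot and as the upper bound when something is.
    """
    if flavor_swap_mb < 0:
        raise ValueError("flavor swap cannot be negative")
    if requested_swap_mb is None:
        # No override at boot time: the flavor value is what you get.
        return flavor_swap_mb
    if requested_swap_mb < 0:
        raise ValueError("requested swap cannot be negative")
    if requested_swap_mb > flavor_swap_mb:
        # Covers the "flavor has no swap" case too: any request above 0 fails.
        raise ValueError("requested swap exceeds the flavor maximum")
    return requested_swap_mb


if __name__ == "__main__":
    print(effective_swap_mb(10240))        # flavor default
    print(effective_swap_mb(10240, 2048))  # e.g. a controller override
    print(effective_swap_mb(10240, 3072))  # e.g. a compute override
```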

--- Additional comment from marios on 2015-08-27 13:10:43 EDT ---

So as promised a workaround that is usable right now. I have a WIP upstream review mainly so I can point at it here, details below. Even if not landed upstream and we continue/want to use this workaround we can just document how to create that file, like we do for general first boot customization example in docs. You should be able to copy/paste the following, in the undercloud, having sourced .stackrc

# First grab the first boot enable swap, from the review at https://review.openstack.org/#/c/217796/
# Need root permissions as we are putting directly into /usr/share/openstack-tripleo-heat-templates for minimum hassle
sudo su
curl "https://review.openstack.org/gitweb?p=openstack/tripleo-heat-templates.git;a=blob_plain;f=firstboot/enable_swap.yaml;h=29e56bf8c662d21ee9a6779646ee65f1c4213f5a;hb=feae87572125388b0922a9ad424b3b9b688053c0" > /usr/share/openstack-tripleo-heat-templates/firstboot/enable_swap.yaml
exit

# Now define a local environment file called "enable_swap.yaml" to call that script at first boot (NodeUserData):
cat << EOF > enable_swap.yaml
resource_registry:
  OS::TripleO::NodeUserData: /usr/share/openstack-tripleo-heat-templates/firstboot/enable_swap.yaml
EOF

# Delete the baremetal flavor 
openstack flavor delete baremetal

# Define a baremetal flavor with the swap space you want, here 2GB:
openstack flavor create --id auto --ram 4096 --disk 40 --vcpus 1 --swap 2 baremetal
openstack flavor set --property "cpu_arch"="x86_64" --property "capabilities:boot_option"="local" baremetal

# Deploy with the enable_swap.yaml:
openstack overcloud deploy --templates --compute-scale 1 --control-scale 1 -e enable_swap.yaml

you should be able to easily verify on your nodes that:

[root@overcloud-compute-0 heat-admin]# cat /proc/swaps 
Filename				Type		Size	Used	Priority
/dev/sda1                               partition	2044	0	-1
[root@overcloud-compute-0 heat-admin]# tail -2 /etc/rc.d/rc.local 
touch /var/lock/subsys/local
swapon /dev/sda1 
[root@overcloud-compute-0 heat-admin]# cat /proc/meminfo | grep Swap
SwapCached:            0 kB
SwapTotal:          2044 kB
SwapFree:           2044 kB

--- Additional comment from marios on 2015-08-28 04:48:22 EDT ---


> # Define a baremetal flavor with the swap space you want, here 2GB:

MB, not GB sorry

--- Additional comment from marios on 2015-08-28 12:06:49 EDT ---

today I spent some time trying to get the swap space creation working via Heat. I hit a nit @ https://review.openstack.org/#/q/I2c538161d88a51022b91b584f16c1439848e7ada,n,z (on our openstack-heat-engine-2015.1.0-5.el7ost.noarch). Otherwise I couldn't get this to work, so I have posted an upstream request for info @ openstack-dev http://lists.openstack.org/pipermail/openstack-dev/2015-August/073154.html (especially since block_device_mapping_v2 is a relatively recent addition, landed 06/2015).

The reason we need that ^^^ is so we can then add (for example) ComputeSwap, ControlSwap parameters and then feed those into the nodes for deploy. Note that this would only override (in theory, if the heat part works) what is on the Baremetal flavor (and if that flavor has no swap space defined then you can't launch an instance with swap at all... we will always need to add a Baremetal flavor with swap already defined, regardless of the fix here).

--- Additional comment from chris alfonso on 2015-08-31 12:19:20 EDT ---

Marios, would you be able to do the work for the second option before Thursday EOD?

Change it to 1:1 so we don't allow overcommit.

--- Additional comment from Emilien Macchi on 2015-08-31 12:36:52 EDT ---

If you decide to change the default nova.conf option, you should change the default parameter like I'm doing here:

https://review.openstack.org/218944

--- Additional comment from marios on 2015-09-02 06:54:40 EDT ---


There are two things going on here: 1. setting the ram_allocation_ratio for nova scheduler in nova.conf, and 2. enabling swap space on overcloud nodes.

1. setting the ram_allocation_ratio for nova scheduler in nova.conf

upstream review https://review.openstack.org/#/c/218944/3


2. enabling swap space on overcloud nodes
You need to define your baremetal flavor as having swap space and then enable it with a heat template change @

upstream @ https://review.openstack.org/#/c/217796/ 



I also filed an upstream bug at https://bugs.launchpad.net/tripleo/+bug/1491335


--- Additional comment from Mike Burns on 2015-09-02 20:33:48 EDT ---

(In reply to Mike Burns from comment #15)
> There are two things going on here: 1. setting the ram_allocation_ratio for
> nova scheduler in nova.conf, and 2. enabling swap space on overcloud nodes.
> 
> 1. setting the ram_allocation_ratio for nova scheduler in nova.conf
> 
> https://review.openstack.org/#/c/218944/3
> 
> 
> 2. enabling swap space on overcloud nodes
> You need to define your baremetal flavor as having swap space and then
> enable it with a heat template change @
> 
> https://review.openstack.org/#/c/217796/ 
> 
> 
> I also filed an upstream bug at
> https://bugs.launchpad.net/tripleo/+bug/1491335

So we need to split this bug, which I will do shortly.  This bug will be for setting the ram_allocation_ratio only.  The enabling of swap and dynamic configuration of ram allocation will be part of the new bug.

--- Additional comment from Alexander Chuzhoy on 2015-09-21 16:24:18 EDT ---

FailedQA

Environment:
openstack-tripleo-heat-templates-0.8.6-64.el7ost.noarch

So based on comment #17, this bug will be for setting the ram_allocation_ratio only.

[heat-admin@overcloud-compute-0 ~]$ sudo cat /etc/nova/nova.conf  | grep ram_allocation
#ram_allocation_ratio=1.5

--- Additional comment from marios on 2015-09-22 03:15:01 EDT ---

(In reply to Alexander Chuzhoy from comment #20)
> FailedQA
> 
> Environment:
> openstack-tripleo-heat-templates-0.8.6-64.el7ost.noarch
> 
> So based on comment #17, this bug will be for setting the
> ram_allocation_ratio only.
> 
> [heat-admin@overcloud-compute-0 ~]$ sudo cat /etc/nova/nova.conf  | grep
> ram_allocation
> #ram_allocation_ratio=1.5

Hi Alexander, can you check the controller instead (I had the same thought actually, see comments @ [1] ). We have nova-compute on the computes but nova-scheduler is actually on the controllers (so we set this config value there).

Another thing to confirm is that the heat templates from which you are deploying include this (my latest poodle deploy definitely does), like [2] from my undercloud vm.




[1] https://review.openstack.org/#/c/218944/2/puppet/hieradata/controller.yaml
[2] 
[root@instack ~]# grep -rni 'ram_allocation_ratio' /usr/share/openstack-tripleo-heat-templates/puppet/*
/usr/share/openstack-tripleo-heat-templates/puppet/hieradata/controller.yaml:76:nova::scheduler::filter::ram_allocation_ratio: '1.0'

--- Additional comment from Alexander Chuzhoy on 2015-09-22 09:23:11 EDT ---

On controllers:
sudo cat /etc/nova/nova.conf  | grep ram_alloc
#ram_allocation_ratio=1.5
ram_allocation_ratio=1.0

Template:
grep -rni 'ram_allocation_ratio' /usr/share/openstack-tripleo-heat-templates/puppet/*
/usr/share/openstack-tripleo-heat-templates/puppet/hieradata/controller.yaml:76:nova::scheduler::filter::ram_allocation_ratio: '1.0'
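
Note that a plain grep also matches the commented-out default, which is easy to misread. A minimal sketch of reading the effective value is shown below; the function name is hypothetical, the 1.5 fallback is the default quoted in this thread, and section headers are ignored on the assumption that the option lives in [DEFAULT].

```python
def effective_ram_allocation_ratio(conf_text: str, default: float = 1.5) -> float:
    """Return the ram_allocation_ratio that is actually in effect.

    Hypothetical helper: commented lines are skipped, the last uncommented
    assignment wins, and the default applies when the option is unset.
    """
    value = default
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or commented-out default
        key, sep, val = line.partition("=")
        if sep and key.strip() == "ram_allocation_ratio":
            value = float(val.strip())
    return value


if __name__ == "__main__":
    controller = "#ram_allocation_ratio=1.5\nram_allocation_ratio=1.0\n"
    compute = "#ram_allocation_ratio=1.5\n"
    print(effective_ram_allocation_ratio(controller))
    print(effective_ram_allocation_ratio(compute))
```

An empty grep result, as on a node where the option was never written, falls through to the default in the same way.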

--- Additional comment from Joe Talerico on 2015-09-22 10:34:25 EDT ---

This is correct. I thought we migrated to have this setting on the computes, but it is on the controllers.

--- Additional comment from Joe Talerico on 2015-09-22 10:34:31 EDT ---

This is correct. I thought we migrated to have this setting on the computes, but it is on the controllers.

--- Additional comment from Alexander Chuzhoy on 2015-09-22 10:37:25 EDT ---

Verified:


Environment:
openstack-tripleo-heat-templates-0.8.6-64.el7ost.noarch

Verifying based on comments #21-24
Getting the correct value on the controllers.

--- Additional comment from Alexander Chuzhoy on 2015-09-22 10:43:35 EDT ---

correction:
nova.conf should be identical for all the nodes (controllers+computes)
moving back to assigned.

--- Additional comment from marios on 2015-09-23 05:55:38 EDT ---

(In reply to Alexander Chuzhoy from comment #26)
> correction:
> nova.conf should be identical for all the nodes (controllers+computes)
> moving back to assigned.

is this a hard requirement? To include this in the nova.conf of the compute nodes, we need to also include the nova::scheduler::filter class, as we do for controllers like [1] and discussed somewhat in [2] (that bug isn't relevant, just the comments, the use of ram_allocation_ratio in the example there is a happy coincidence :) ).

AFAIK we have not enforced the requirement for conf files to be homogeneous cross different node types. As just one example, in [3] we are removing a nova conf from the hieradata since it isn't being used/consumed (actually because the class isn't included, like here).

Since nova-scheduler isn't running on the compute nodes I don't see why this needs to be set there but please correct me if I am wrong. Further if there is a requirement for all nova.conf to be the same, then we should file a BZ to track that because I don't think this is the only place we will have divergence.


[1] https://review.openstack.org/#/c/218944/3/puppet/manifests/overcloud_controller_pacemaker.pp
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1243971#c3
[3] https://review.openstack.org/#/c/226264/

--- Additional comment from marios on 2015-09-23 05:57:04 EDT ---

(In reply to marios from comment #27)
> (In reply to Alexander Chuzhoy from comment #26)
> > correction:
> > nova.conf should be identical for all the nodes (controllers+computes)
> > moving back to assigned.
> 
> is this a hard requirement? To include this in the nova.conf of the compute
> nodes, we need to also include the nova::scheduler::filter class, as we do
> for controllers like [1] and discussed somewhat in [2] (that bug isn't

^^^ sorry, should have stressed - which means running the nova-scheduler on computes too (unless we declare the class but then make sure the service doesn't start here, but again, do we really need to do that?)

--- Additional comment from chris alfonso on 2015-09-23 12:07:42 EDT ---

Joe, Any response? We are tracking this as a blocker for Y1 and need to resolve it.

--- Additional comment from James Slagle on 2015-09-23 13:01:07 EDT ---

(In reply to chris alfonso from comment #29)
> Joe, Any response? We are tracking this as a blocker for Y1 and need to
> resolve it.

joe has already provided the info in comment 23 and comment 24:

This is correct. I thought we migrated to have this setting on the computes, but it is on the controllers.


ram_allocation_ratio is a nova scheduler configuration option, and the scheduler only runs on the controllers.

The action on the bug is now on sasha to clarify this comment 26. I don't think that is accurate. There is not a need I'm aware of to have nova.conf identical on both controllers and computes.
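
To illustrate why the value matters where the scheduler runs, here is a simplified sketch of a RAM check of the kind a scheduler filter performs. It is modeled loosely on that behaviour, is not nova's actual code, and all names are hypothetical.

```python
def host_passes_ram_check(total_ram_mb: int, free_ram_mb: int,
                          requested_ram_mb: int,
                          ram_allocation_ratio: float) -> bool:
    """Decide whether a host can take an instance of the requested size.

    Simplified illustration: the ratio scales the host's total RAM into an
    allocation limit, and memory already in use is subtracted from it.
    """
    memory_limit_mb = total_ram_mb * ram_allocation_ratio
    used_ram_mb = total_ram_mb - free_ram_mb
    usable_ram_mb = memory_limit_mb - used_ram_mb
    return usable_ram_mb >= requested_ram_mb


if __name__ == "__main__":
    # A 4096 MB host that is already fully allocated.
    print(host_passes_ram_check(4096, 0, 1024, 1.5))  # overcommit allows more
    print(host_passes_ram_check(4096, 0, 1024, 1.0))  # 1:1 refuses it
```

The check is evaluated by the scheduler, so under this model the ratio configured on a node that does not run the scheduler has no effect on placement.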

--- Additional comment from Alexander Chuzhoy on 2015-09-23 13:10:28 EDT ---

Joe, could you please clarify comment #26 for James.

--- Additional comment from Joe Talerico on 2015-09-23 13:18:18 EDT ---

After chatting with nova developer Sylvain Bauza, it was recommended to have nova.conf be identical across nodes. That is why I recommended this to Alexander.

--- Additional comment from Alexander Chuzhoy on 2015-09-23 13:29:28 EDT ---

Verified:


Environment:
openstack-tripleo-heat-templates-0.8.6-64.el7ost.noarch

Verifying based on comments #21-32
Getting the corrected ratio on controllers is sufficient.


--- Additional comment from errata-xmlrpc on 2015-10-08 08:15:36 EDT ---

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2015:1862

--- Additional comment from Joe Talerico on 2016-11-02 19:57:30 EDT ---

Reopening this bug:

Our Undercloud is set properly :
[stack@undercloud ~]$ sudo grep ram_allo /etc/nova/nova.conf 
ram_allocation_ratio=1.0
[stack@undercloud ~]$ 

However the Overcloud is using the default(1.5:1):
[heat-admin@overcloud-controller-0 ~]$ sudo grep ram_allo /etc/nova/nova.conf 
[heat-admin@overcloud-controller-0 ~]$

This is using OSP10:
openstack-tripleo-heat-templates-5.0.0-1.1.el7ost.noarch

--- Additional comment from Mike Burns on 2016-11-03 06:45:04 EDT ---

We can't re-open bugs that were closed Errata.  We need to clone it to a new bug.

Comment 2 Prasanth Anbalagan 2016-11-15 22:58:11 UTC
Verified as follows - ram_allocation ratio is set to 1.0.

*************
VERSION
*************

[stack@undercloud-0 ~]$ rpm -qi openstack-tripleo-heat-templates.noarch
Name        : openstack-tripleo-heat-templates
Version     : 5.0.0
Release     : 1.7.el7ost
Architecture: noarch
Install Date: Tue 15 Nov 2016 01:07:31 PM EST
Group       : System Environment/Base
Size        : 1368646
License     : ASL 2.0
Signature   : (none)
Source RPM  : openstack-tripleo-heat-templates-5.0.0-1.7.el7ost.src.rpm
Build Date  : Mon 14 Nov 2016 10:57:27 AM EST
Build Host  : x86-019.build.eng.bos.redhat.com
Relocations : (not relocatable)
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
Vendor      : Red Hat, Inc.
URL         : https://wiki.openstack.org/wiki/TripleO
Summary     : Heat templates for TripleO
Description :
OpenStack TripleO Heat Templates is a collection of templates and tools for
building Heat Templates to do deployments of OpenStack.

***********
LOGS
***********

[stack@undercloud-0 ~]$ ssh heat-admin@192.0.2.14
The authenticity of host '192.0.2.14 (192.0.2.14)' can't be established.
ECDSA key fingerprint is 61:b7:f9:c2:a8:33:a5:71:9a:ef:a7:34:3f:1f:98:99.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.0.2.14' (ECDSA) to the list of known hosts.
Last login: Tue Nov 15 19:02:48 2016 from 192.0.2.254

[heat-admin@controller-0 ~]$ sudo grep "ram_allocation" /etc/nova/nova.conf 
ram_allocation_ratio=1.0

Comment 5 errata-xmlrpc 2016-12-14 16:29:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2948.html

