Bug 1288153 - [Docs] [Director] Add Tuning Instructions in Troubleshooting Section
Status: CLOSED CURRENTRELEASE
Product: Red Hat OpenStack
Classification: Red Hat
Component: documentation
Version: 7.0 (Kilo)
Hardware: Unspecified OS: Unspecified
Priority: high Severity: medium
Target Milestone: ga
Target Release: 8.0 (Liberty)
Assigned To: Dan Macpherson
QA Contact: Radek Bíba
Keywords: Documentation
Depends On:
Blocks:
Reported: 2015-12-03 12:04 EST by Dan Yocum
Modified: 2016-04-13 00:42 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-04-13 00:42:31 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Description Dan Yocum 2015-12-03 12:04:39 EST
Description of problem:

The Director node spec'd in the guide is much too small. Even with 2 vCPUs and 12 GB of RAM, I am unable to successfully deploy a 20-node overcloud.

Version-Release number of selected component (if applicable):

7

Solution:

slagle suggests a minimum of 8 cores and 16 GB of RAM on a dedicated baremetal system for a 65-node overcloud.

For a production overcloud of more than 100 nodes, 16 cores and 32 GB of RAM are recommended.
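
As a rough illustration of the sizing guidance above, the following Python sketch checks a director host against the two figures quoted in this report. The function names are hypothetical, and applying the 65-node figure to every overcloud of up to 100 nodes is an assumption, not something stated in the report.

```python
def recommended_director_size(overcloud_nodes):
    """Return (cores, ram_gb) taken from the figures quoted in this report."""
    # Assumption: the 8-core / 16 GB minimum quoted for a 65-node overcloud
    # is used as the floor for any overcloud of up to 100 nodes.
    if overcloud_nodes > 100:
        return (16, 32)
    return (8, 16)


def meets_recommendation(cores, ram_gb, overcloud_nodes):
    """True if the host has at least the suggested cores and RAM."""
    need_cores, need_ram_gb = recommended_director_size(overcloud_nodes)
    return cores >= need_cores and ram_gb >= need_ram_gb


# The director described in the report: 2 vCPUs, 12 GB RAM, 20-node overcloud.
print(meets_recommendation(2, 12, 20))
```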
Comment 2 Andrew Dahms 2015-12-07 17:57:33 EST
Assigning to Dan for review.
Comment 3 Dan Yocum 2015-12-07 21:20:08 EST
I have a feeling that the undercloud can be smaller *if* the following BZs are addressed before y2:

https://bugzilla.redhat.com/show_bug.cgi?id=1289287
https://bugzilla.redhat.com/show_bug.cgi?id=1212126
Comment 5 Dan Yocum 2015-12-08 09:57:34 EST
Slightly OT, but still document related: I think there should be a tuning section added to the Troubleshooting portion of the guide.  Off the top of my head, these things should be included:

1) A crontab to purge the keystone.token table so that it doesn't grow without bound.  This may need to be done more than once per day - possibly every hour:

3 1 * * * /bin/keystone-manage token_flush

2) A crontab to purge the heat.raw_templates table so that it doesn't grow without bound.  This may need to be done more than once per week - possibly every day - and the 30-day age threshold may need to be shortened:

3 2 * * 6 /bin/heat-manage purge_deleted -g days 30

3) If heat-engine and heat-api consume too many resources (i.e., they peg the CPU repeatedly and for long periods of time), set max_resources_per_stack=-1 in /etc/heat/heat.conf.
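
The two crontab entries above run once a day and once a week. As a minimal sketch of the more frequent schedules mentioned (hourly token flush, daily Heat purge), the Python below formats the corresponding crontab lines. The helper name cron_entry and the constants are hypothetical; the commands themselves are the ones quoted in this comment.

```python
def cron_entry(minute, hour, day_of_week, command):
    """Format one crontab line: minute hour day-of-month month day-of-week command."""
    # Day-of-month and month are left as "*" because neither entry needs them.
    return f"{minute} {hour} * * {day_of_week} {command}"


TOKEN_FLUSH = "/bin/keystone-manage token_flush"
HEAT_PURGE = "/bin/heat-manage purge_deleted -g days 30"

# Flush expired Keystone tokens at minute 3 of every hour.
hourly_flush = cron_entry(3, "*", "*", TOKEN_FLUSH)

# Purge deleted Heat data every day at 02:03 instead of only on Saturdays.
daily_purge = cron_entry(3, 2, "*", HEAT_PURGE)

print(hourly_flush)
print(daily_purge)
```

How often each job should run depends on how quickly the tables grow in a given deployment.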
Comment 6 Dan Macpherson 2015-12-08 11:32:57 EST
OT, yes, but I don't think another BZ is necessary for the moment.

I think it's a great idea so I'll add these things in another section in the guide tomorrow.
Comment 7 Dan Yocum 2015-12-08 14:26:17 EST
4) If deployment fails and the system load is very high, reduce the number of concurrent instance builds to something less than the default of 10.  Edit /etc/nova/nova.conf and set:

max_concurrent_builds=3

Then restart the nova-api and nova-scheduler services.
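
For scripting this kind of change, a minimal Python sketch using the standard configparser module is shown below. It works on an in-memory sample and does not touch /etc/nova/nova.conf. The helper name set_option is hypothetical, placing the option in the [DEFAULT] section is an assumption, and configparser discards comments when it writes a file back, so this is not a drop-in editor for a heavily commented configuration file.

```python
import configparser
import io


def set_option(conf_text, section, option, value):
    """Return conf_text with section/option set to value (comments are lost)."""
    # interpolation=None keeps literal "%" characters in values intact.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    # DEFAULT always exists in configparser; other sections may need creating.
    if section != "DEFAULT" and not parser.has_section(section):
        parser.add_section(section)
    parser.set(section, option, str(value))
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Hypothetical sample content standing in for the real file.
sample = "[DEFAULT]\nmax_concurrent_builds = 10\n"
updated = set_option(sample, "DEFAULT", "max_concurrent_builds", 3)
print(updated)
```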

5) Tune MariaDB to increase max_connections (if it is not already 4096) and adjust several InnoDB parameters.  Edit /etc/my.cnf.d/server.cnf:


innodb_additional_mem_pool_size = 20M
innodb_buffer_pool_size = 1000M
innodb_flush_log_at_trx_commit = 1
innodb_lock_wait_timeout = 50
innodb_max_purge_lag = 10000
innodb_thread_concurrency = <2*(NumCPUs+NumDisks)>

NB: Ensure that the director has enough RAM, typically 512MB to 1GB more than the value of innodb_buffer_pool_size.
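
The arithmetic behind innodb_thread_concurrency and the RAM note can be worked through with a short Python sketch. The function names are hypothetical, 1GB is taken as 1024MB, and the 8-CPU, 2-disk host is only an example.

```python
def innodb_thread_concurrency(num_cpus, num_disks):
    """Apply the 2 * (NumCPUs + NumDisks) rule quoted above."""
    return 2 * (num_cpus + num_disks)


def director_ram_range_mb(buffer_pool_mb):
    """Return (low, high) RAM in MB: the buffer pool plus 512MB to 1GB of headroom."""
    return (buffer_pool_mb + 512, buffer_pool_mb + 1024)


# Example host with 8 CPUs and 2 disks (assumed values).
print(innodb_thread_concurrency(8, 2))

# The 1000M buffer pool from the settings above.
print(director_ram_range_mb(1000))
```

For the example host this gives a thread concurrency of 20, and a 1000M buffer pool implies roughly 1512MB to 2024MB of RAM for the database alone.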
Comment 8 Dan Yocum 2015-12-28 11:13:11 EST
From this BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1290949, comment 15, do this:

1. Edit the file /etc/heat/heat.conf on the undercloud and uncomment the line:
#num_engine_workers = 4

2. Restart openstack-heat-engine.
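
Step 1 can be scripted without disturbing the rest of the file. The Python sketch below removes the leading "#" from the first commented-out occurrence of an option in a block of text. The helper name uncomment_option is hypothetical, and the sample string stands in for the real contents of /etc/heat/heat.conf.

```python
import re


def uncomment_option(conf_text, option):
    """Uncomment the first '#option = ...' line, leaving all other lines as-is."""
    # [ \t]* rather than \s* so the match can never span a line break.
    pattern = re.compile(
        rf"^#[ \t]*({re.escape(option)}[ \t]*=.*)$", re.MULTILINE
    )
    return pattern.sub(r"\1", conf_text, count=1)


# Hypothetical sample content standing in for the real file.
sample = "[DEFAULT]\n#num_engine_workers = 4\n"
print(uncomment_option(sample, "num_engine_workers"))
```

Unlike a parse-and-rewrite approach, a line-level substitution keeps the file's existing comments and ordering.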
Comment 9 Dan Yocum 2015-12-28 11:14:17 EST
Also, by far the cronjob to clean out the keystone.token table has had the biggest positive influence on tuning the undercloud.
Comment 10 Dan Macpherson 2016-01-21 22:46:53 EST
Resetting to ASSIGNED due to other docs changes.
Comment 18 Dan Macpherson 2016-03-16 22:51:26 EDT
Thanks, Radek.

Have implemented a commit with your suggested changes:

https://gitlab.cee.redhat.com/rhci-documentation/docs-Red_Hat_Enterprise_Linux_OpenStack_Platform/commit/8525a8c46347cf773f6f29992a2052bb88967b9a
Comment 19 Radek Bíba 2016-03-17 02:38:04 EDT
Great, moving to VERIFIED.
Comment 20 Andrew Dahms 2016-04-13 00:42:31 EDT
This content is now live on the Customer Portal.

Closing.
