Red Hat Bugzilla – Bug 1288153
[Docs] [Director] Add Tuning Instructions in Troubleshooting Section
Last modified: 2018-06-12 03:24:03 EDT
Description of problem:
The Director node specified in the guide is much too small. Even with 2 vCPUs and 12 GB of RAM, I am unable to successfully deploy a 20-node overcloud.
Version-Release number of selected component (if applicable):
slagle suggests a minimum of 8 cores and 16GB of RAM on a dedicated baremetal system for a 65 node overcloud.
For a production overcloud of more than 100 nodes, 16 cores and 32 GB of RAM are recommended.
Assigning to Dan for review.
I have a feeling that the undercloud can be smaller *if* the following BZs are addressed before y2:
Slightly OT, but still document related: I think there should be a tuning section added to the Troubleshooting portion of the guide. Off the top of my head, these things should be included:
1) A crontab to purge the keystone.token table so that it doesn't grow without bound. This may need to be done more than once per day - possibly every hour:
3 1 * * * /bin/keystone-manage token_flush
2) A crontab to purge the heat.raw_templates table so that it doesn't grow without bound. This may need to run more than once per week, possibly every day, and the age threshold may need to be shorter than 30 days:
3 2 * * 6 /bin/heat-manage purge_deleted -g days 30
3) If heat-engine and heat-api consume too many resources (i.e., they peg the CPU repeatedly and for long periods of time) set max_resources_per_stack=-1 in /etc/heat/heat.conf
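For reference, the more frequent schedules floated in items 1 and 2 (hourly token flush, daily Heat purge) would look roughly like the output of the sketch below. The commands are copied from the entries above; the schedules and the cron_entry helper are only illustrative assumptions, not tested recommendations.

```python
# Build crontab lines for the two purge jobs at tighter schedules.
# cron_entry is a hypothetical helper; the schedules are assumptions.

def cron_entry(minute, hour, day_of_month, month, day_of_week, command):
    """Return one crontab line: five schedule fields, then the command."""
    fields = (minute, hour, day_of_month, month, day_of_week)
    return " ".join(str(f) for f in fields) + " " + command

TOKEN_FLUSH = "/bin/keystone-manage token_flush"
HEAT_PURGE = "/bin/heat-manage purge_deleted -g days 30"

entries = [
    cron_entry(3, "*", "*", "*", "*", TOKEN_FLUSH),  # every hour at minute 3
    cron_entry(3, 2, "*", "*", "*", HEAT_PURGE),     # every day at 02:03
]
print("\n".join(entries))
```

The printed lines can be pasted into the crontab in place of the daily and weekly entries above.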
OT, yes, but I don't think another BZ is necessary for the moment.
I think it's a great idea so I'll add these things in another section in the guide tomorrow.
4) If deployment fails and the system load is very high, reduce the number of concurrent instance builds to something less than the default of 10. Edit /etc/nova/nova.conf and set:
Then restart the nova-api and nova-scheduler services.
5) Tune MariaDB to increase max_connections (if it is not already 4096) and adjust several InnoDB parameters. Edit /etc/my.cnf.d/server.cnf:
innodb_additional_mem_pool_size = 20M
innodb_buffer_pool_size = 1000M
innodb_flush_log_at_trx_commit = 1
innodb_lock_wait_timeout = 50
innodb_max_purge_lag = 10000
innodb_thread_concurrency = <2*(NumCPUs+NumDisks)>
NB: Ensure that the director has enough RAM, typically 512MB to 1GB more than the value of innodb_buffer_pool_size.
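To make the two sizing rules in item 5 concrete, here is a minimal sketch of the arithmetic. The function names are hypothetical, and the example host (8 cores, 2 disks) is an assumption chosen only for illustration.

```python
# Arithmetic behind the innodb_thread_concurrency placeholder and the
# RAM rule of thumb from the NB. Function names are hypothetical.

def innodb_thread_concurrency(num_cpus, num_disks):
    """2 * (NumCPUs + NumDisks), as given in the server.cnf fragment."""
    return 2 * (num_cpus + num_disks)

def ram_range_mb(buffer_pool_mb):
    """Buffer pool size plus 512 MB to 1 GB, per the NB above."""
    return (buffer_pool_mb + 512, buffer_pool_mb + 1024)

# Assumed example host: 8 cores and 2 disks, with the 1000M buffer pool.
print(innodb_thread_concurrency(8, 2))  # 20
print(ram_range_mb(1000))               # (1512, 2024)
```

So on that assumed host the setting would be innodb_thread_concurrency = 20, with roughly 1.5 to 2 GB of RAM accounted for by the 1000M buffer pool.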
From this BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1290949, comment 15, do this:
1. Edit the file /etc/heat/heat.conf on the undercloud and uncomment the line:
#num_engine_workers = 4
2. Restart openstack-heat-engine.
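The two heat.conf changes mentioned in this bug (max_resources_per_stack=-1 and num_engine_workers = 4) could be applied together along these lines. This is only a sketch: tune_heat_conf is a hypothetical helper, it assumes both options live in the [DEFAULT] section, and Python's configparser drops comments when it rewrites a file, so editing by hand is probably safer on a real undercloud.

```python
import configparser
import io

def tune_heat_conf(text, workers=4):
    """Return heat.conf text with the two tuning options set in [DEFAULT]."""
    # interpolation=None keeps any literal '%' in existing values intact;
    # strict=False tolerates duplicate options in the input.
    cfg = configparser.ConfigParser(interpolation=None, strict=False)
    cfg.read_string(text)
    cfg["DEFAULT"]["max_resources_per_stack"] = "-1"
    cfg["DEFAULT"]["num_engine_workers"] = str(workers)
    out = io.StringIO()
    cfg.write(out)  # note: comments from the original text are not preserved
    return out.getvalue()

# Assumed sample input: a heat.conf with the worker line still commented out.
sample = "[DEFAULT]\n#num_engine_workers = 4\n"
print(tune_heat_conf(sample))
```

The services still need to be restarted afterwards for the new values to take effect.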
Also, of all these changes, the cron job that cleans out the keystone.token table has had by far the biggest positive effect on undercloud performance.
Resetting to ASSIGNED due to other docs changes.
Have implemented a commit with your suggested changes:
Great, moving to VERIFIED.
This content is now live on the Customer Portal.