1288153 – [Docs] [Director] Add Tuning Instructions in Troubleshooting Section

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	documentation
Sub Component:
Version:	7.0 (Kilo)
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	medium
Target Milestone:	ga
Target Release:	8.0 (Liberty)
Assignee:	Dan Macpherson
QA Contact:	Radek Bíba
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2015-12-03 17:04 UTC by Dan Yocum
Modified:	2022-08-09 14:28 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2016-04-13 04:42:31 UTC
Target Upstream Version:
Embargoed:


Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Issue Tracker	OSP-8477	0	None	None	None	2022-08-09 14:28:37 UTC

Description Dan Yocum 2015-12-03 17:04:39 UTC

Description of problem:

The Director node spec'd in the guide is much too small.  Even with 2 vCPUs and 12GB of RAM, I am unable to successfully deploy a 20 node overcloud.

Version-Release number of selected component (if applicable):

7

Solution:

slagle suggests a minimum of 8 cores and 16GB of RAM on a dedicated baremetal system for a 65 node overcloud.

For a production overcloud of more than 100 nodes, 16 cores and 32GB of RAM are recommended.

Comment 2 Andrew Dahms 2015-12-07 22:57:33 UTC

Assigning to Dan for review.

Comment 3 Dan Yocum 2015-12-08 02:20:08 UTC

I have a feeling that the undercloud can be smaller *if* the following BZs are addressed before y2:

https://bugzilla.redhat.com/show_bug.cgi?id=1289287
https://bugzilla.redhat.com/show_bug.cgi?id=1212126

Comment 5 Dan Yocum 2015-12-08 14:57:34 UTC

Slightly OT, but still document related: I think there should be a tuning section added to the Troubleshooting portion of the guide.  Off the top of my head, these things should be included:

1) A crontab to purge the keystone.token table so that it doesn't grow without bound.  This may need to be done more than once per day - possibly every hour:

3 1 * * * /bin/keystone-manage token_flush

2) A crontab to purge the heat.raw_templates table so that it doesn't grow without bound.  This may need to be done more than once per week - possibly every day - and may need to clear entries younger than 30 days:

3 2 * * 6 /bin/heat-manage purge_deleted -g days 30

3) If heat-engine and heat-api consume too many resources (i.e., they peg the CPU repeatedly and for long periods of time) set max_resources_per_stack=-1 in /etc/heat/heat.conf
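
The two crontab lines in items 1 and 2 run once per day and once per week. As a minimal sketch, assuming the same commands and paths as above, the more frequent variants mentioned there (hourly token flush, daily heat purge) could be generated like this. The `cron_entry` helper and the chosen minute and hour values are illustrative assumptions, not taken from the guide:

```python
def cron_entry(minute, hour, day_of_week, command):
    """Format a five-field crontab line; day-of-month and month stay '*'."""
    return f"{minute} {hour} * * {day_of_week} {command}"


# Hourly token flush at minute 3 of every hour (assumed schedule).
token_flush = cron_entry(3, "*", "*", "/bin/keystone-manage token_flush")

# Daily purge at 02:03 of entries marked deleted and older than 30 days
# (assumed schedule; the command itself is the one quoted in item 2).
heat_purge = cron_entry(3, 2, "*", "/bin/heat-manage purge_deleted -g days 30")

print(token_flush)
print(heat_purge)
```

The printed lines can be pasted into the relevant user's crontab. Whether hourly and daily are the right intervals depends on how quickly the tables grow in a given deployment.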

Comment 6 Dan Macpherson 2015-12-08 16:32:57 UTC

OT, yes, but I don't think another BZ is necessary for the moment.

I think it's a great idea so I'll add these things in another section in the guide tomorrow.

Comment 7 Dan Yocum 2015-12-08 19:26:17 UTC

4) If deployment fails and the system load is very high, reduce the number of concurrent instance builds to something less than the default of 10.  Edit /etc/nova/nova.conf and set:

max_concurrent_builds=3

And restart nova-api and nova-scheduler services.

5) Tune MariaDB to increase max_connections (if not already 4096) and adjust several InnoDB parameters.  Edit /etc/my.cnf.d/server.cnf:

innodb_additional_mem_pool_size = 20M
innodb_buffer_pool_size = 1000M
innodb_flush_log_at_trx_commit = 1
innodb_lock_wait_timeout = 50
innodb_max_purge_lag = 10000
innodb_thread_concurrency = <2*(NumCPUs+NumDisks)>

NB: Ensure that the director has enough RAM, typically 512MB to 1GB more than the size of the innodb_buffer_pool_size.
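
To make the placeholder and the RAM note concrete, here is a small sketch of the arithmetic. The function names and the example values (8 CPUs, 2 disks) are assumptions for illustration. The formula `2*(NumCPUs+NumDisks)` and the 512MB to 1GB headroom come from the comment above, read here as memory needed on top of the buffer pool:

```python
def innodb_thread_concurrency(num_cpus, num_disks):
    """Value for innodb_thread_concurrency: 2 * (NumCPUs + NumDisks)."""
    return 2 * (num_cpus + num_disks)


def ram_needed_mb(buffer_pool_mb):
    """Return (low, high) RAM in MB: buffer pool plus 512MB to 1GB headroom."""
    return (buffer_pool_mb + 512, buffer_pool_mb + 1024)


# Example: a hypothetical director with 8 CPUs and 2 disks,
# using the 1000M buffer pool from the settings above.
print(innodb_thread_concurrency(8, 2))
print(ram_needed_mb(1000))
```

For the hypothetical 8-CPU, 2-disk director this gives a thread concurrency of 20 and roughly 1.5GB to 2GB of RAM for the database, in addition to whatever the other undercloud services need.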

Comment 8 Dan Yocum 2015-12-28 16:13:11 UTC

From this BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1290949, comment 15, do this:

1. edit the file /etc/heat/heat.conf on the undercloud and uncomment the line:
#num_engine_workers = 4

2. restart openstack-heat-engine

Comment 9 Dan Yocum 2015-12-28 16:14:17 UTC

Also, the cron job that cleans out the keystone.token table has had by far the biggest positive effect on tuning the undercloud.

Comment 10 Dan Macpherson 2016-01-22 03:46:53 UTC

Resetting to ASSIGNED due to other docs changes.

Comment 18 Dan Macpherson 2016-03-17 02:51:26 UTC

Thanks, Radek.

Have implemented a commit with your suggested changes:

https://gitlab.cee.redhat.com/rhci-documentation/docs-Red_Hat_Enterprise_Linux_OpenStack_Platform/commit/8525a8c46347cf773f6f29992a2052bb88967b9a

Comment 19 Radek Bíba 2016-03-17 06:38:04 UTC

Great, moving to VERIFIED.

Comment 20 Andrew Dahms 2016-04-13 04:42:31 UTC

This content is now live on the Customer Portal.

Closing.
