Bug 1347305 - Overcloud deployed with keystone as single process leads to abysmal performance
Summary: Overcloud deployed with keystone as single process leads to abysmal performance
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 9.0 (Mitaka)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ga
Target Release: 9.0 (Mitaka)
Assignee: Marios Andreou
QA Contact: Rodrigo Duarte
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-06-16 13:32 UTC by Alex Krzos
Modified: 2016-08-24 13:01 UTC
CC List: 16 users

Fixed In Version: openstack-tripleo-heat-templates-2.0.0-21.el7ost
Doc Type: Bug Fix
Doc Text:
Previously, the upstream puppet-keystone default was 1 worker and $::processorcount threads, as noted in the upstream bug report (https://bugs.launchpad.net/tripleo/+bug/1598092). These settings are inefficient when keystone is deployed on a machine with a multicore CPU, as discussed at https://bugzilla.redhat.com/show_bug.cgi?id=1347305#c2. A change to these defaults was proposed at https://review.openstack.org/#/c/297342/, but the review proved controversial and did not merge into puppet-keystone. With this update, director inverts these defaults: 1 thread and $::processorcount workers. Both values can be overridden through tripleo-heat-templates: the existing template parameter "KeystoneWorkers" now maps to keystone::wsgi::apache::workers (it previously mapped to "admin_workers", which was removed in Mitaka), and keystone::wsgi::apache::threads is set to '1' in the hieradata, which the user can override. For details of the fix, see the upstream review at https://review.openstack.org/#/c/336520/. As a result, /etc/httpd/conf.d/10-keystone_wsgi_admin.conf (and likewise 10-keystone_wsgi_main.conf) should contain the "processes" and "threads" settings, set to $::processorcount and 1 respectively, or to whatever values the user has set. When keystone is deployed on a multicore machine, the director defaults should give much better keystone performance than the current upstream puppet-keystone defaults.
Clone Of:
Environment:
Last Closed: 2016-08-24 13:01:18 UTC
Target Upstream Version:


Links
System	ID	Priority	Status	Summary	Last Updated
Launchpad	1598092	None	None	None	2016-07-01 10:20:34 UTC
OpenStack gerrit	336520	None	None	None	2016-07-01 11:56:28 UTC
Red Hat Product Errata	RHEA-2016:1762	normal	SHIPPED_LIVE	Red Hat OpenStack Platform 9 director Advisory	2016-08-24 16:59:57 UTC

Description Alex Krzos 2016-06-16 13:32:47 UTC
Description of problem:

I deployed OSP9 on a multi-core machine. Keystone is deployed in Apache; however, it is tuned to a single process with threads == logical core count. This is known to be suboptimal because the threads cannot use more than a single core (keystone is a Python application, and CPython's global interpreter lock lets only one thread run Python code at a time). We need a multi-process setup to allow keystone to use more cores and improve its performance and capacity. This is the same issue as reported in bug https://bugzilla.redhat.com/show_bug.cgi?id=1330980, but for the overcloud.


Version-Release number of selected component (if applicable):
OSP 9 Mitaka


Comment 2 Alex Krzos 2016-06-16 13:36:54 UTC
Simple rally results:

1 process / 24 threads:

Action	Min (sec)	Median (sec)	90%ile (sec)	95%ile (sec)	Max (sec)	Avg (sec)	Success	Count
authenticate.keystone	5.84	9.098	11.461	11.828	13.684	9.377	100.0%	500


24 processes / 1 thread:

Action	Min (sec)	Median (sec)	90%ile (sec)	95%ile (sec)	Max (sec)	Avg (sec)	Success	Count
authenticate.keystone	0.764	1.244	1.548	1.603	2.07	1.278	100.0%	500


More comprehensive results are to come; however, the results above clearly show how much this issue limits keystone's ability to handle concurrent requests.
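For a rough sense of scale (this arithmetic is not part of the original comment), dividing each 1 process / 24 threads timing by the matching 24 processes / 1 thread timing gives the speedup. A minimal sketch, using awk for the floating-point division; the helper name is hypothetical:

```shell
#!/bin/bash
# Speedup of the 24-process/1-thread run over the 1-process/24-thread run,
# using the rally timings (seconds) from comment 2.
speedup() {
  # $1: time with 1 process / 24 threads; $2: time with 24 processes / 1 thread
  awk -v before="$1" -v after="$2" 'BEGIN { printf "%.2f\n", before / after }'
}

echo "median: $(speedup 9.098 1.244)x"   # prints "median: 7.31x"
echo "95%ile: $(speedup 11.828 1.603)x"
echo "avg:    $(speedup 9.377 1.278)x"   # prints "avg:    7.34x"
```

Both the median and the average authentication time improve by roughly 7.3x; with a single pair of runs, treat the ratios as indicative rather than precise.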

Comment 3 Mark McLoughlin 2016-06-22 13:18:26 UTC
Noting, from bz #1330980, the proposed fix to puppet-keystone at https://review.openstack.org/297342

The way I interpret the feedback is that puppet-keystone is supposed to be "unopinionated", whereas TripleO and OSP director are a bit more opinionated. I suspect a patch somewhere like puppet-tripleo might be better received.

Comment 5 Jiri Stransky 2016-06-24 16:08:04 UTC
In the meantime, a workaround should be possible via an environment file, e.g.:

parameter_defaults:
  controllerExtraConfig:
    keystone::wsgi::apache::workers: 32
    keystone::wsgi::apache::threads: 1

Comment 6 Alex Krzos 2016-06-28 17:54:50 UTC
(In reply to Jiri Stransky from comment #5)
> In the meantime, a workaround should be possible via an environment file, e.g.:
> 
> parameter_defaults:
>   controllerExtraConfig:
>     keystone::wsgi::apache::workers: 32
>     keystone::wsgi::apache::threads: 1

Hi Jiri

This workaround works great. My cloud now deploys with more keystone processes. Can we get TripleO to do this by default? My assumption is that most people won't know that running keystone as a single process limits its performance.

-Alex
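The workaround from comment 5 has to reach the deploy command as an environment file. A minimal sketch that writes it; the helper name and file path are hypothetical, and the worker count is an argument because 32 was only an example (it should roughly match the controller's core count, not the undercloud's):

```shell
#!/bin/bash
# Hypothetical helper: write the comment 5 workaround to an environment file.
# $1: output path; $2: number of keystone WSGI worker processes.
write_keystone_workaround() {
  local path="$1" workers="$2"
  # One thread per process, several processes (the inverse of the old defaults).
  cat > "$path" <<EOF
parameter_defaults:
  controllerExtraConfig:
    keystone::wsgi::apache::workers: ${workers}
    keystone::wsgi::apache::threads: 1
EOF
}
```

For example, run write_keystone_workaround ~/keystone-workers.yaml 32, then add -e ~/keystone-workers.yaml to the existing openstack overcloud deploy --templates command line, alongside the environment files the deployment already uses.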

Comment 7 Marios Andreou 2016-07-22 11:19:16 UTC
https://review.openstack.org/#/c/336520/ "Repurpose KeystoneWorkers add keystone::wsgi::apache::threads" landed in stable/mitaka. It sets keystone::wsgi::apache::threads to 1 and keystone::wsgi::apache::workers to $processorcount.
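With this change, the worker count can be overridden through the KeystoneWorkers template parameter instead of raw hieradata. A minimal sketch of such an override file; the helper name, file path and values are hypothetical, and routing the threads override through controllerExtraConfig is an assumption about how the hieradata value of 1 is meant to be replaced:

```shell
#!/bin/bash
# Hypothetical helper: write an override for the post-fix keystone defaults.
# $1: output path; $2: KeystoneWorkers value; $3: threads (defaults to 1).
write_keystone_override() {
  local path="$1" workers="$2" threads="${3:-1}"
  # KeystoneWorkers maps to keystone::wsgi::apache::workers after the fix;
  # the threads line assumes controllerExtraConfig overrides the hieradata.
  cat > "$path" <<EOF
parameter_defaults:
  KeystoneWorkers: ${workers}
  controllerExtraConfig:
    keystone::wsgi::apache::threads: ${threads}
EOF
}
```

For example, write_keystone_override ~/keystone-override.yaml 8 requests 8 workers and keeps 1 thread; pass the file with -e on the deploy command line like any other environment file.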

Comment 11 Emilien Macchi 2016-08-12 12:26:46 UTC
I took over https://review.openstack.org/#/c/297342/ and am trying to fix it upstream.

Comment 12 Rodrigo Duarte 2016-08-17 13:06:04 UTC
This does not look fixed yet, since the upstream patch for puppet-keystone is still under review. Should I move it back to ASSIGNED?

Comment 13 Mike Burns 2016-08-17 13:28:07 UTC
(In reply to Rodrigo Duarte from comment #12)
> This does not look fixed yet, since the upstream patch for puppet-keystone
> is still under review. Should I move it back to ASSIGNED?

This was fixed in tripleo-heat-templates, not puppet-keystone.  If there is a need for the puppet fix in addition to this, then please clone this to openstack-puppet-modules.  

Failing this bug because a different patch has not merged is not the right thing to do.

Comment 14 Rodrigo Duarte 2016-08-17 23:01:43 UTC
Verified with openstack-tripleo-heat-templates-2.0.0-31.el7ost.noarch.

As proposed by the fix, the number of keystone processes should match the number of cores on the overcloud controller:

1 - Confirming the number of cores on the overcloud controller:

# cat /proc/cpuinfo | grep -c proc
4

2 - Checking httpd configs:

# cat 10-keystone_wsgi_admin.conf | grep proc
  WSGIDaemonProcess keystone_admin display-name=keystone-admin group=keystone processes=4 threads=1 user=keystone

# cat 10-keystone_wsgi_main.conf | grep proc
  WSGIDaemonProcess keystone_main display-name=keystone-main group=keystone processes=4 threads=1 user=keystone

3 - Checking the number of running processes:

We should have 4 processes for "keystone_admin" and 4 processes for "keystone_main", as defined in /etc/httpd/conf.d/10-keystone_wsgi_admin.conf and /etc/httpd/conf.d/10-keystone_wsgi_main.conf.

# ps aux | grep keystone
keystone 24988  0.1  0.4 536424 76160 ?        Sl   Ago16   1:13 keystone-admin  -DFOREGROUND
keystone 24989  0.1  0.4 536424 76092 ?        Sl   Ago16   1:13 keystone-admin  -DFOREGROUND
keystone 24990  0.1  0.4 536424 74196 ?        Sl   Ago16   1:13 keystone-admin  -DFOREGROUND
keystone 24991  0.1  0.4 536424 75992 ?        Sl   Ago16   1:16 keystone-admin  -DFOREGROUND
keystone 24992  0.0  0.4 536424 75000 ?        Sl   Ago16   1:08 keystone-main   -DFOREGROUND
keystone 24993  0.0  0.4 536424 73504 ?        Sl   Ago16   1:08 keystone-main   -DFOREGROUND
keystone 24994  0.0  0.4 536424 73464 ?        Sl   Ago16   1:06 keystone-main   -DFOREGROUND
keystone 24995  0.0  0.4 536424 75928 ?        Sl   Ago16   1:07 keystone-main   -DFOREGROUND

4 - Running rally benchmark:

+--------------------------------------------------------------------------------------------------------------------------+
|                                                   Response Times (sec)                                                   |
+-----------------------+-----------+--------------+--------------+--------------+-----------+-----------+---------+-------+
| Action                | Min (sec) | Median (sec) | 90%ile (sec) | 95%ile (sec) | Max (sec) | Avg (sec) | Success | Count |
+-----------------------+-----------+--------------+--------------+--------------+-----------+-----------+---------+-------+
| authenticate.keystone | 0.579     | 1.698        | 2.064        | 2.261        | 2.347     | 1.642     | 100.0%  | 100   |
| total                 | 0.579     | 1.698        | 2.064        | 2.261        | 2.347     | 1.642     | 100.0%  | 100   |
+-----------------------+-----------+--------------+--------------+--------------+-----------+-----------+---------+-------+

The performance results above are much better than those reported for the single-process configuration in comment 2.
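Steps 1 to 3 above can be bundled into a small script to run as root on a controller. This is a minimal sketch, not the verification as performed: the function names are hypothetical, it assumes the vhost paths quoted in this comment, and it uses nproc as a stand-in for facter's $::processorcount (the two may differ if the shell's CPU affinity is restricted).

```shell
#!/bin/bash
# Succeeds if the WSGIDaemonProcess line in $1 has processes=$2 and threads=1.
check_wsgi_conf() {
  local conf="$1" expected="$2"
  grep -Eq "WSGIDaemonProcess .*processes=${expected}( |$)" "$conf" &&
    grep -Eq "WSGIDaemonProcess .*threads=1( |$)" "$conf"
}

# Reads `ps -eo args=` output on stdin and counts commands whose first word
# is the mod_wsgi display-name given in $1 (e.g. keystone-admin).
count_daemons() {
  grep -Ec "^$1( |$)"
}

verify_keystone() {
  local cores conf name
  cores=$(nproc)
  for conf in /etc/httpd/conf.d/10-keystone_wsgi_admin.conf \
              /etc/httpd/conf.d/10-keystone_wsgi_main.conf; do
    if check_wsgi_conf "$conf" "$cores"; then
      echo "OK: $conf"
    else
      echo "MISMATCH: $conf (expected processes=$cores threads=1)"
    fi
  done
  for name in keystone-admin keystone-main; do
    echo "$name: $(ps -eo args= | count_daemons "$name") running (expected $cores)"
  done
}
```

Source the file and call verify_keystone; on the 4-core controller above, both vhosts should report OK and 4 running processes each.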

Comment 16 errata-xmlrpc 2016-08-24 13:01:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-1762.html

