Bug 1247924 - RFE: MariaDB max_connections must be based on controller count * core count
Status: NEW
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 7.0 (Kilo)
Hardware: x86_64 Linux
Priority: medium, Severity: medium
Assigned To: Damien Ciabrini
QA Contact: Udi Shkalim
Keywords: FutureFeature
Duplicates: 1273557
Reported: 2015-07-29 05:13 EDT by Giulio Fidente
Modified: 2018-07-13 21:54 EDT
Doc Type: Bug Fix
Clone Of: 1240824
Type: Bug
Comment 2 Ofer Blaut 2015-08-04 07:59:42 EDT
We need an automated solution for https://bugzilla.redhat.com/show_bug.cgi?id=1240824, not just a single manually configured parameter.
Comment 3 Giulio Fidente 2015-08-06 07:21:45 EDT
Some investigation reveals that the processes which fork according to the number of cores are the following:

keystone-all (x2 per core)
neutron-server (x2 per core)
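Given the per-core fan-out above, a back-of-the-envelope estimate of per-controller connection demand can be sketched as follows. This is a minimal illustration, not a formula from the bug: the pool size and per-core worker count are assumptions.

```shell
# Rough, illustrative estimate of DB connection demand on one controller.
# Assumes the two services above each fork 2 workers per core, and that
# each worker holds up to POOL_SIZE connections (an assumed number).
CORES=$(getconf _NPROCESSORS_ONLN)
SERVICES=2              # keystone-all and neutron-server, per the list above
WORKERS_PER_CORE=2
POOL_SIZE=5             # assumed per-worker connection pool size

PER_CONTROLLER=$((CORES * SERVICES * WORKERS_PER_CORE * POOL_SIZE))
echo "estimated connections per controller: ${PER_CONTROLLER}"
```

Multiplying by the number of controllers would then give a ballpark for max_connections, which is the premise of this RFE.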
Comment 4 Giulio Fidente 2015-08-06 07:23:01 EDT
There are also some more recommendations in https://access.redhat.com/articles/1432053 from https://bugzilla.redhat.com/show_bug.cgi?id=1195292
Comment 6 Jiri Stransky 2015-09-16 07:20:48 EDT
The performance tuning document which Giulio linked suggests setting max_connections = 15360. Should we just use this as the default? Details on why I'd like to avoid doing this dynamically follow.


I'm not convinced we should go with a dynamically generated default, at least not at the Puppet level. If the controllers are deployed on hardware that is not fully homogeneous (e.g. PoC environments), each Galera cluster member could end up with a different max_connections value. If this caused an issue with Galera, the cause could be hard to find (one doesn't expect this setting to differ between cluster members). Furthermore, computing the value means the user doesn't know it before triggering the deployment; it wouldn't be visible in config files or manifests (grepping the Puppet manifests or Hiera files for the particular number would yield no results), so it wouldn't be immediately obvious where the value came from. I think dynamic defaults computed at the Puppet level like this add unpredictability and obscurity to the deployment, and we'd be better off avoiding them if possible.

Another way to do this dynamically would be to shift the responsibility into the phase before Heat stack creation is triggered. The code would then reside in the CLI (or better yet, in tripleo-common); it would work with Ironic introspection data and generate a parameter for the Heat stack. This would guarantee a single value for the whole cluster, which the user could review prior to kicking off the deployment. However, that's quite a major RFE, and the cost/benefit ratio should be evaluated first (I'm not convinced it's worth it at this point).
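The pre-deployment approach described above could look roughly like this: compute one cluster-wide value from per-node data, so every Galera member gets the same max_connections. In this sketch the core counts (which would really come from Ironic introspection) and the per-core factor are hardcoded assumptions.

```shell
# Hypothetical sketch: derive ONE cluster-wide max_connections before
# the deployment is triggered. The core counts would come from Ironic
# introspection; here they are hardcoded, and 640 is an arbitrary
# per-core factor chosen only for illustration.
NODE_CORES="24 24 16"           # assumed core counts of three controllers
MIN_CORES=999999
for c in $NODE_CORES; do
    if [ "$c" -lt "$MIN_CORES" ]; then
        MIN_CORES=$c            # size for the weakest node
    fi
done
MAX_CONNECTIONS=$((MIN_CORES * 640))
echo "cluster-wide max_connections: ${MAX_CONNECTIONS}"
```

Sizing for the minimum across nodes addresses the heterogeneous-hardware concern: the value is uniform and reviewable before the stack is created.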
Comment 7 Jiri Stransky 2015-09-16 07:24:24 EDT
(In reply to Jiri Stransky from comment #6)
> it wouldn't be visible from config files

Just to clarify -- I meant it wouldn't be visible from *Puppet* config files (the Hiera files under /etc/puppet/hieradata).
Comment 8 Michael Bayer 2015-09-16 10:49:23 EDT
> The performance tuning document which Giulio linked suggests setting max_connections = 15360. Should we just use this as the default?

That's not a suggestion, it's an *example*. The document gives no guidance at all on how to set this number, and that is a question I'm working on right now.

Suffice it to say, 15360 is not a number that is achievable on all hardware, because max_connections is directly tied to the number of threads that can feasibly run per process on the target server, i.e. the number in /proc/sys/kernel/threads-max, which the kernel determines at boot time based on available memory pages. However, threads-max is still twice the default per-user ulimit on the server, so in practice the maximum number of connections without changing ulimits is that of "ulimit -u". In my own testing I've experimented with raising not only the ulimits but also the threads-max kernel setting, and so far it seems that while you can create as many threads as you want by manipulating these numbers, those connections/threads are unusable for doing any real work even at very low levels of activity, so the default "ulimit -u" number is already a pretty good baseline to work from.

There is also the question of using MariaDB's thread-pooling feature, which again makes many more *idle* connections possible, but doesn't increase the amount of concurrent SQL work that is feasible.

tldr; we don't have a non-arbitrary number for max_connections right now, nor does anyone else.
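The two limits discussed above can be inspected directly on any Linux box; a quick check (the commands are standard, though the numbers are machine-specific):

```shell
# Inspect the two limits discussed above: the kernel-wide thread cap
# (set at boot from available memory) and the per-user limit reported
# by "ulimit -u", which is typically half of threads-max.
THREADS_MAX=$(cat /proc/sys/kernel/threads-max)
ULIMIT_U=$(ulimit -u)
echo "kernel threads-max:         ${THREADS_MAX}"
echo "per-user limit (ulimit -u): ${ULIMIT_U}"
```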
Comment 9 Mike Burns 2015-10-20 13:12:10 EDT
*** Bug 1273557 has been marked as a duplicate of this bug. ***
Comment 10 Sadique Puthen 2015-10-20 13:30:38 EDT
I have worked with Michael Bayer to develop an article that explains the formula for deriving the number. It's here: https://access.redhat.com/solutions/1990433

Can we apply this logic and decide this dynamically for each deployment?
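The kind of per-deployment calculation proposed here might take roughly the following shape. The actual formula lives in the linked article; every constant below is a placeholder assumption, not taken from it.

```shell
# Hypothetical shape of a per-deployment max_connections calculation.
# All constants here are placeholders; the real coefficients are in the
# linked article (https://access.redhat.com/solutions/1990433).
CONTROLLERS=3                   # assumed controller count
CORES=$(getconf _NPROCESSORS_ONLN)
WORKERS=$((CORES * 2))          # assumed workers per service
POOL=5                          # assumed per-worker pool size
OVERFLOW=10                     # assumed pool overflow
SERVICES=10                     # assumed number of DB-using services

MAX_CONNECTIONS=$((CONTROLLERS * SERVICES * WORKERS * (POOL + OVERFLOW)))
echo "computed max_connections: ${MAX_CONNECTIONS}"
```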
Comment 11 Jiri Stransky 2015-10-21 05:10:30 EDT
As I wrote in comments 6 and 7, my opinion is that doing this dynamically with the current capabilities (at the Heat/Puppet level) would open the door to uncertainty and a new class of deployment bugs. Setting this value automatically in the GUI/CLI phase, before a deployment is triggered, could be better, but it is probably a non-trivial RFE and should be weighed in priority against other planned features.

However, if a short-term improvement is needed, we can raise the current default of 4096 to a higher value, if there's a recommendation for a better one. (I don't know the numbers, but my thinking is: if we see that we need to raise the value for, say, half of the deployments, we could raise the default to work well for a higher percentage of them.)
Comment 12 Mike Burns 2016-04-07 16:47:27 EDT
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.
Comment 14 Michael Bayer 2016-11-17 13:56:30 EST
Looking at http://lists.openstack.org/pipermail/openstack-dev/2016-September/104819.html

this subject area is changing.

Per the above thread, processorcount is no longer the only factor determining worker configuration, and I think this means the Puppet installer is going to write a non-blank value into the service .conf files that caps the process count at 8. Additionally, most OpenStack services are moving off the eventlet server and onto Apache, which has a much more mature process model.

At the very least, max_connections should be derived from these new values if they are actually being pushed into the .conf files and/or Apache service files.
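If the worker caps do end up written into the service .conf files, max_connections could be derived from them directly. A sketch follows; the file paths and option names are illustrative, and a real implementation would need to enumerate every DB-using service.

```shell
# Sum worker settings from service .conf files (illustrative paths and
# option names) as a basis for computing max_connections. Files that
# are absent or unreadable are simply skipped.
TOTAL_WORKERS=0
for conf in /etc/keystone/keystone.conf /etc/neutron/neutron.conf; do
    if [ -r "$conf" ]; then
        # Take the first "workers" or "api_workers" option in the file.
        w=$(awk -F= '/^[ \t]*(api_)?workers[ \t]*=/ {gsub(/[ \t]/,"",$2); print $2; exit}' "$conf")
        TOTAL_WORKERS=$((TOTAL_WORKERS + ${w:-0}))
    fi
done
echo "workers found in conf files: ${TOTAL_WORKERS}"
```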
Comment 16 Red Hat Bugzilla Rules Engine 2017-06-03 22:23:21 EDT
This bugzilla has been removed from the release and needs to be reviewed and Triaged for another Target Release.
