Red Hat Bugzilla – Bug 1247924
RFE: MariaDB max_connections must be based on controller count * core count
Last modified: 2018-02-13 22:01:44 EST
We need an automated solution for https://bugzilla.redhat.com/show_bug.cgi?id=1240824 , not just a single manually configured parameter.
Some investigation reveals that the processes which fork according to the number of cores are the following:
keystone-all (x2 per core)
neutron-server (x2 per core)
There are also some more recommendations in https://access.redhat.com/articles/1432053 from https://bugzilla.redhat.com/show_bug.cgi?id=1195292
The performance tuning document which Giulio linked suggests setting max_connections = 15360. Should we just use this as the default? Details on why I'd like to avoid doing this dynamically follow.
I'm not convinced we should go with a dynamically generated default, at least not at the Puppet level. If the controllers are deployed on hardware that is not fully homogeneous (e.g. PoC environments), each Galera cluster member could end up with a different max_connections value. If that caused an issue with Galera, it could be hard to find the cause, because one doesn't expect this setting to differ between cluster members. Furthermore, computing the value means the user doesn't know it before triggering the deployment, and it wouldn't be visible in config files or manifests (grepping the Puppet manifests or Hiera files for the particular number would yield no results), so it wouldn't be immediately obvious where the value came from. I think dynamic defaults computed at the Puppet level like this add unpredictability and obscurity to the deployment, and we'd better avoid them if possible.
Another way to do this dynamically would be to shift the responsibility to the phase before Heat stack creation is triggered. The code would reside in the CLI (or better yet in tripleo-common); it would work with Ironic introspection data and generate a parameter for the Heat stack. This would mean there is always a single value for the whole cluster, and the user can review it before kicking off the deployment. However, that's probably quite a major RFE, and the cost/benefit ratio should be evaluated (I'm not convinced it's worth it at this point).
(In reply to Jiri Stransky from comment #6)
> it wouldn't be visible from config files
Just to clarify -- I meant it wouldn't be visible from *Puppet* config files (the Hiera files under /etc/puppet/hieradata).
> The performance tuning document which Giulio linked suggests setting max_connections = 15360. Should we just use this as default?
That's not a suggestion, that's an *example*. The document offers no guidance at all on how to choose this number, and that is the question I'm working on right now.
Suffice to say, 15360 is not a number that is achievable on all hardware, because max_connections is directly tied to the number of threads that can feasibly run per process on the target server, i.e. the number in /proc/sys/kernel/threads-max, which the kernel determines at boot time based on available memory pages. However, threads-max is still twice the size of the default ulimit on the server, so the real maximum number of connections without changing ulimits is that of "ulimit -u". In my own testing, I've experimented with raising not only the ulimits but also the threads-max kernel setting, and so far it seems that while you can create as many threads as you want by manipulating these numbers, those connections/threads are unusable for doing any work even at very low levels of activity, so the default "ulimit -u" number is already a pretty good baseline to work from. There is also the question of using MariaDB's thread-pooling feature, which again makes many more *idle* connections possible, but doesn't increase the amount of concurrent SQL work that is feasible.
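The relationship described above can be inspected programmatically. A small sketch, assuming a Linux host (the /proc path is Linux-specific, and the "threads-max is about twice the nproc ulimit" relationship is the observed default, not a guarantee):

```python
import resource

def read_threads_max(path="/proc/sys/kernel/threads-max"):
    # Kernel-computed thread cap, derived from available memory
    # pages at boot time (see the comment above).
    with open(path) as f:
        return int(f.read().strip())

def practical_connection_ceiling():
    # Without raising ulimits, the usable ceiling is the soft
    # "ulimit -u" (RLIMIT_NPROC) value, typically about half of
    # threads-max on a default install.
    soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return min(soft, read_threads_max())
```

This makes the argument concrete: any max_connections default above this ceiling cannot be honored on that hardware without also retuning kernel limits.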
tl;dr: we don't have a non-arbitrary number for max_connections right now, and neither does anyone else.
*** Bug 1273557 has been marked as a duplicate of this bug. ***
I have worked with Michael Bayer to develop an article that explains the formula for getting this number. It's here: https://access.redhat.com/solutions/1990433
Can we apply this logic and decide this dynamically for each deployment?
As I wrote in comments 6/7, my opinion is that doing this dynamically with the current capabilities (at the Heat/Puppet level) would open the door to uncertainty and a new class of deployment bugs. Setting this value automatically at the GUI/CLI phase, before a deployment is triggered, could be better, but that is probably a non-trivial RFE and should be weighed in priority against other planned features.
However, if a short-term improvement is needed, we can raise the current default of 4096 to a higher value, if there's a recommendation for a better one. (I don't know the numbers, but my thinking is that if we need to raise the value for, say, half of the deployments, we could raise the default so it works well for a higher percentage of deployments.)
This bug did not make the OSP 8.0 release. It is being deferred to OSP 10.
Looking at http://lists.openstack.org/pipermail/openstack-dev/2016-September/104819.html , this subject area is changing.
Per the above thread, processorcount is no longer the only determining factor in worker configuration, and I think this means the Puppet installer is going to write a non-blank value into the service .conf files that caps the process count at 8. Additionally, most OpenStack services are moving off the eventlet server and onto Apache, which has a much more mature process model.
At the very least, max_connections should be driven by these new values if they are actually being pushed into the .conf files and/or Apache service files.
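If worker counts are now capped in the .conf files as described, max_connections could be derived from the capped worker count rather than the raw core count. A sketch under the cap-at-8 behavior mentioned above (the cap of 8 comes from the thread; the service count and connections-per-worker multiplier are illustrative assumptions):

```python
def capped_workers(cpu_count, cap=8):
    # Per the openstack-dev thread above, worker defaults are no
    # longer raw processorcount; here the cap is assumed to be 8.
    return min(cpu_count, cap)

def max_connections_from_workers(controllers, cpu_count,
                                 services=10, per_worker=2):
    # services and per_worker are hypothetical placeholders for the
    # number of DB-using services and connections each worker holds.
    return (controllers * capped_workers(cpu_count)
            * services * per_worker)

print(max_connections_from_workers(3, 32))
```

Because the worker count is capped, this estimate stops growing with core count beyond the cap, which makes a static, reviewable default much more defensible than a per-host dynamic one.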
This bugzilla has been removed from the release and needs to be reviewed and Triaged for another Target Release.