We need an automated solution for https://bugzilla.redhat.com/show_bug.cgi?id=1240824, not just configuring one manual parameter.
Some investigation reveals that the processes which fork to the number of cores are the following: neutron-metadata-agent, heat-engine, glance-registry, cinder-api, keystone-all (x2 per core), glance-api, proxy-server, neutron-server (x2 per core), nova-conductor.
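As a rough back-of-the-envelope check, the fork behaviour listed above can be turned into a total worker count per controller. This is a sketch with assumptions of mine (one worker per core for each service, two per core for the ones marked x2); the names are illustrative, not taken from any deployment tool:

```python
# Sketch (assumptions mine): estimate the total number of forked worker
# processes on one controller node, given the per-core fork behaviour
# listed in the comment above. Multipliers: 1 worker per core for most
# services, 2 per core for keystone-all and neutron-server.
WORKERS_PER_CORE = {
    "neutron-metadata-agent": 1,
    "heat-engine": 1,
    "glance-registry": 1,
    "cinder-api": 1,
    "keystone-all": 2,
    "glance-api": 1,
    "proxy-server": 1,
    "neutron-server": 2,
    "nova-conductor": 1,
}

def total_workers(cores):
    """Total worker processes across the listed services for a core count."""
    return cores * sum(WORKERS_PER_CORE.values())
```

On a 24-core controller this already comes to 264 worker processes, each holding its own database connection pool, which is why a single static max_connections default is hard to pick.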
There are also some more recommendations in https://access.redhat.com/articles/1432053 from https://bugzilla.redhat.com/show_bug.cgi?id=1195292
The performance tuning document which Giulio linked suggests setting max_connections = 15360. Should we just use this as the default? Details on why I'd like to avoid doing this dynamically follow.

-----

I'm not convinced we should go with a dynamically generated default, at least not at the Puppet level. If the controllers are deployed on hardware that is not fully homogeneous (e.g. PoC environments), each Galera cluster member could end up with a different max_connections value. If that caused an issue with Galera, it could be hard to find the cause -- one doesn't expect this setting to differ between cluster members.

Furthermore, computing the value means the user doesn't know it before triggering the deployment, and it wouldn't be visible from config files or manifests (grepping the Puppet manifests or Hiera files for the particular number would yield no results), so it wouldn't be immediately obvious where the value came from. I think dynamic defaults computed at the Puppet level like this add unpredictability and obscurity to the deployment, and we'd probably better avoid them if possible.

Another way to do this dynamically would be to shift the responsibility into the phase before triggering Heat stack creation. The code would then reside in the CLI (or better yet tripleo-common), work with Ironic introspection data, and generate a parameter for the Heat stack. This would mean there is always a single value for the whole cluster, and that the user can review the value before kicking off the deployment. However, that's probably quite a major RFE, and the cost/benefit ratio should be evaluated here (I'm not convinced it's worth it at this point).
(In reply to Jiri Stransky from comment #6)
> it wouldn't be visible from config files

Just to clarify -- I meant it wouldn't be visible from *Puppet* config files (the Hiera files under /etc/puppet/hieradata).
> The performance tuning document which Giulio linked suggests setting max_connections = 15360. Should we just use this as default?

That's not a suggestion, that's an *example*. The document has no guidance at all on how to set this number, and this is a question I'm working on right now.

Suffice to say, 15360 is not a number that is achievable on all hardware, because max_connections is directly linked to the number of threads it is feasible to run per-process on the target server, e.g. the number in /proc/sys/kernel/threads-max, which is determined by the kernel at boot time based on the available memory pages. However, the number in threads-max is still twice the default ulimit on the server, so really the maximum number of connections without changing ulimits is that of "ulimit -u".

In my own testing, I've experimented with raising not only the ulimits but also the threads-max kernel setting. So far it seems that while you can create as many threads as you want by manipulating these numbers, those connections/threads are unusable for doing any work even at very low levels of activity, so the "ulimit -u" default is already a pretty good baseline to work from. There is also the question of using MariaDB's thread-pooling feature, which again makes lots more *idle* connections possible, but doesn't increase the amount of concurrent SQL work that is feasible.

tl;dr: we don't have a non-arbitrary number for max_connections right now, nor does anyone else.
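To make the two limits discussed above concrete, here is a small sketch (mine, not from the tuning document) that reads them on a Linux host: kernel.threads-max and the soft "ulimit -u" value (RLIMIT_NPROC):

```python
import resource

def thread_limits():
    """Read the two limits discussed in the comment above (Linux only).

    threads-max is computed by the kernel at boot from the available
    memory pages; the soft RLIMIT_NPROC ("ulimit -u") is typically
    about half of it, and is the practical ceiling on server threads
    -- and hence on max_connections -- without raising ulimits.
    """
    with open("/proc/sys/kernel/threads-max") as f:
        threads_max = int(f.read().strip())
    nproc_soft, _nproc_hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return threads_max, nproc_soft
```

Per the comment, the nproc_soft value is the better baseline to reason from: pushing threads-max higher mostly buys idle threads, not usable concurrent capacity.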
*** Bug 1273557 has been marked as a duplicate of this bug. ***
I have worked with Michael Bayer to develop an article that explains the formula for getting this number. It's here: https://access.redhat.com/solutions/1990433 Can we apply this logic and decide this dynamically for each deployment?
As I wrote in comments 6/7, my opinion is that doing this dynamically with the current capabilities (at the Heat/Puppet level) would open up uncertainty and a new class of deployment bugs. Setting the value automatically in the GUI/CLI phase, before a deployment is triggered, could be better, but that is probably a non-trivial RFE and should be weighed in priority against other planned features.

However, if a short-term improvement is needed, we can raise the current default of 4096 to a higher value, if there's some recommendation for a better one. (I don't know the numbers, but my thinking is that if we see we need to raise the value for, say, half of the deployments, we could raise the default to work well for a higher percentage of deployments.)
This bug did not make the OSP 8.0 release. It is being deferred to OSP 10.
Looking at http://lists.openstack.org/pipermail/openstack-dev/2016-September/104819.html this subject area is changing. Per the above thread, processorcount is no longer the only determining factor in worker configuration, and I think it means the Puppet installer is going to write a non-blank value into the service .conf files that caps the process count at 8. Additionally, most OpenStack services are moving off the eventlet server and onto Apache, which has a much more mature process model. At the very least, max_connections should be driven by these new values if they are actually being pushed into the .conf files and/or Apache service files.
This bugzilla has been removed from the release and needs to be reviewed and Triaged for another Target Release.
AFAIK, OpenStack service workers are no longer scaled linearly with CPU core counts, given the dramatic increase in server core density. I'd be interested to see if there are improvements we can make in keeping max_connections in line with whatever determination Director is making on how many service workers to run, but I believe tying it to core count no longer makes any sense.
It would have to aggregate the settings being made within all the service-level puppet-xyz packages, i.e. connection pool size * number of worker processes * number of controllers for each one, then add all that up. It might be best if each puppet-xyz package could report this number individually, in case there are idiosyncratic behaviors specific to a certain sub-package. Whatever the number is, though, it should probably be doubled and floored at 4096 in any case.
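The aggregation described above could be sketched as follows. Only the per-service multiplication, the doubling, and the 4096 floor are taken from this comment; the function name and the example numbers are illustrative assumptions of mine:

```python
def galera_max_connections(services, n_controllers):
    """Aggregate per-service connection demand into a max_connections value.

    `services` is a list of (pool_size, worker_count) pairs, one entry
    per OpenStack service, as each puppet-xyz module might report them.
    The total is doubled for headroom and floored at the current
    default of 4096, per the heuristic in the comment above.
    """
    total = sum(pool * workers for pool, workers in services)
    return max(2 * total * n_controllers, 4096)
```

For example, two services with pools of 50 and 20 connections and 12 and 8 workers respectively, across 3 controllers, give (50*12 + 20*8) * 3 * 2 = 4560; a smaller deployment would simply stay at the 4096 floor.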