Bug 1240824

Summary: MariaDB max_connections made configurable in yaml files
Product: Red Hat OpenStack Reporter: Ofer Blaut <oblaut>
Component: rhosp-directorAssignee: Giulio Fidente <gfidente>
Status: CLOSED ERRATA QA Contact: Asaf Hirshberg <ahirshbe>
Severity: urgent Docs Contact:
Priority: high    
Version: DirectorCC: ahirshbe, akrzos, dmacpher, fdinitto, gfidente, jliberma, jslagle, jstransk, mandreou, markmc, mbayer, mburns, mwagner, oblaut, ohochman, rhel-osp-director-maint, rohara, rrosa, sclewis, vcojot
Target Milestone: gaKeywords: Triaged
Target Release: Director   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-0.8.6-33.el7ost Doc Type: Known Issue
Doc Text:
The number of connections to the database scaled depending on the number of Controllers and the cores of each Controller. In HA scenarios with three controllers where each has more than 12 cores, the database connections could reach the default max_connections value set to 1024. This caused services to not respond to requests. As a workaround, increase the max_connections limit with the following command: $ openstack management plan set [tuskar_plan_uuid] -P "Controller-1::MysqlMaxConnections=4096" Replace [tuskar_plan_uuid] with the actual plan UUID, which you can find with: $ openstack management plan list To increase the max_connections value when deploying with the --templates argument, provide to the deploy command an additional customization environment file containing the following: parameters: MysqlMaxConnections: 4096 Add it to the deploy command with: $ openstack management deploy --plan overcloud -e /path/to/custom_environment_file.yaml
Story Points: ---
Clone Of:
: 1240826 1247924 (view as bug list) Environment:
Last Closed: 2015-08-05 13:58:44 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1266253    

Description Ofer Blaut 2015-07-07 21:01:12 UTC
Description of problem:

Currently mariadb max_connections is 1024 regardless of number of cores


Each conductor node will start multiple worker threads, and each of
those threads has a connection pool. The connection pool start with 5 upto 15 large, so you could potentially have three controller nodes with eight cpus:

  3 * 8 * 15 = 360

This will be the same for nova API, so double the result by 2


in our setup we have 24 core:

 3 * 24 * 5 ( or more) * 2  = 720 and more.

when we hit limit galera resource agent will fail to access DB, so pacemaker will restart the  mariadb 


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.check max_connections configured

MariaDB [(none)]> show variables like "max_connections";
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| max_connections | 1024  |
+-----------------+-------+

2.check connections used


MariaDB [(none)]> SHOW STATUS WHERE `variable_name` = 'Threads_connected';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| Threads_connected | 1014  |
+-------------------+-------+
1 row in set (0.00 sec)

MariaDB [(none)]>

MariaDB [(none)]> SHOW STATUS WHERE `variable_name` = 'Threads_connected';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| Threads_connected | 1     |
+-------------------+-------+
1 row in set (0.00 sec)

3. when running load, check connections to mariadb are working and connection counts 

Actual results:


Expected results:


Additional info:

Comment 2 Giulio Fidente 2015-07-08 11:04:01 UTC
*** Bug 1241050 has been marked as a duplicate of this bug. ***

Comment 3 Mike Burns 2015-07-08 20:36:47 UTC
There are 3 things we could do here, I think:

1.  make the value configurable generically (template, not hiera)
2.  Set a very high default
3.  Make the value smarter like the summary says.

I think we need #1 regardless
#2 would be a temporary stopgap solution
#3 is a significant amount of background work.

Comment 5 Giulio Fidente 2015-07-10 01:31:15 UTC
To start with we make it configurable

Comment 7 Mark Wagner 2015-07-10 16:47:09 UTC
If we are kicking up the number of connections, do we need to look at things like the amount of memory allocated as well ?

Each connection requires a certain amount of memory and can possibly use more than that while running queries.

Comment 8 Giulio Fidente 2015-07-13 12:06:32 UTC
hi Mark, we still don't have a formula to scale the value automatically so the patch currently makes max_connections configurable but does not change the setting which is left to 1024.

Comment 9 Mark Wagner 2015-07-13 12:20:51 UTC
We know that 1024 fails on a 3 node HA configuration with fairly standard hardware. Please make the default 4096 so it works "out of the box". People not using it in production or on their laptops can tune it smaller. The default needs to work for production use.

Comment 10 Mark McLoughlin 2015-07-16 07:40:40 UTC
Please note https://bugzilla.redhat.com/show_bug.cgi?id=1241569#c13 where we discovered in a setup consisting of 3 controllers with 96 logical CPUs, that the steady state was over 2000 connections to the DB

We also discovered that the haproxy max connections setting for the DB proxy frontend was set to 2000, so we increased that to over 8192 since the DB max connections was set to 8192

Comment 11 Giulio Fidente 2015-07-16 12:16:37 UTC
hi, thanks for adding some real data into the thread; there are similar problems with rabbitmq as well so I pushed a bunch of additional changes to make parametric the rabbitmq fd limit [1], increase the default mysql max_connections setting [2] and haproxy frontend maxconn [3]

1. https://review.openstack.org/#/c/201796/
2. https://review.openstack.org/#/c/202516/
3. https://review.openstack.org/#/c/202525/

Comment 12 Giulio Fidente 2015-07-20 08:56:22 UTC
With the patch attached to the BZ it is possible to customize the Galera max_connections setting but it is not scaled by default with the number of cores.

To customze max_connections one can deploy using a custom environment file containing:

parameter_defaults: 
  MysqlMaxConnections: 2048

Comment 13 Giulio Fidente 2015-07-20 09:16:41 UTC
Use of parameter_defaults: is unsupported when deploying with Tuskar; alternatively the following will work when deploying with Tuskar and also without Tuskar:

parameters:
  MysqlMaxConnections: 2048

this should be edited into a .yaml file passed with the -e argument when deploying or it can be appended into the parameters: section of a pre-existing .yaml (eg. network-isolation.yaml)

Comment 14 Perry Myers 2015-07-21 12:12:46 UTC
@gfidente I just took a look at the patches in gerrit, and the value is now configurable but the default is set to 1024. Is that what is being presently QA'd?

If so, I have to agree with MarkW in comment #9 that we should set the default higher, otherwise every person doing a realistic deployment is going to have to remember to set this to 4096 or higher.

I think the only negative to setting to 4096 by default is the memory consumption of mariadb when it starts up? Is there any other reason why we shouldn't do this by default so that out of the box this is optimized for production usage.

@mbayer, can you comment on the above?

Secondly... we also found that HAProxy needed it's connection limit adjusted from the default of 2000 per VIP, because the mariadb connections are long-lived. So, even if you set the max connections in mariadb to 4096, if the HAProxy settings are left as defaults, you'll be capped at 2000 simultaneous connections anyhow.

@rohara has suggested to either set this value to at or higher than the DB max connections value, or to leave the max connections for HAProxy completely unbounded. @rohara, please chime in.

Comment 15 Fabio Massimo Di Nitto 2015-07-21 12:34:15 UTC
I agree with Perry and Mark. Default should be "enterprise/production ready".

Comment 16 Giulio Fidente 2015-07-21 13:11:41 UTC
Ack, I think it is safe to cherry-pick at least:

https://review.openstack.org/#/c/202516/

which increases the db max_connections to 4096.

There is also a matching change for haproxy to increase the per-frontend limit to same amount and this should be safe to merge as well (both landed upstream already):

https://review.openstack.org/#/c/202525/

The change which increases the RabbitMQ FD limit instead is not passing CI and it needs an update to make puppet echo the actual limit into rabbitmq-env.conf so it might need a little more time before it's ready.

Comment 17 Mike Burns 2015-07-21 13:49:58 UTC
I don't disagree with the comments at all.  My one question is whether these are release blockers or should we clone this to a bug for the next release to increase the defaults.  Or should we respin and likely slip the release?

Comment 18 Ryan O'Hara 2015-07-21 14:11:51 UTC
We do not want haproxy refusing connections for galera. If anything is going to enforce maximum connections, if should be the database itself. Therefore haproxy maxconn limit for the galera proxy should be greater than the database max_connections.

Also, I am not entirely sure what is setting the max_connections in TripleO (puppet?), but the puppet module we are using in RHOS6 (read: old) will set max_connections based on hardware resources. I think the script included with the puppet-mysql module is mysqltuner.pl. Perhaps this has changed in newer versions of puppet-myself, and perhaps if you explicitly set max_connections as a parameter in the puppet class if will avoid tuning this value.

Comment 19 Ofer Blaut 2015-07-27 13:45:51 UTC
Hi 

The bug is on_qa and i'm not sure what is expected here ( comment #15) ? 


yaml files should use default values without need to configure them ?

test increase the default values by editing network-isolation.yaml ?

Which parameters ?  

MysqlMaxConnections to 4096 

$ sudo grep -i MysqlMaxConnections /usr/share/openstack-tripleo-heat-templates/puppet/controller-puppet.yaml -A 4

Still 1024 


haproxy_global_maxconn = 20480
haproxy_default_maxconn = 4096

[stack@puma33 ~]$ sudo grep -ir haproxy_global_maxconn /usr/share/openstack-puppet/modules/tripleo/manifests/loadbalancer.pp  -A 4


Still 10000

Comment 20 Marios Andreou 2015-07-28 15:55:21 UTC
I came here with the intent to help out on the doctext... didn't quite get there but to that extent I have another workaround to offer for time.now(). From my local deploy of latest poodle it seems Giulio's patches for making this configurable ARE downstream, whereas his patches for making this default to 4096 are not. 

Besides Giulio's workaround above at comment 13 you can also patch the openstack overcloud deployment plan - I just tested this with (what i got when i deployed latest i think) openstack-heat-templates-0-0.6.20150605git.el7ost.noarch
 and it works like so:

(before deploying an overcloud):

    #  get the plan id:
    openstack management plan list
    # set the mysql connections max to 4096:
    openstack management plan set 44874dcc-9b5b-404e-929d-471e36c7b5e0 -P "Controller-1::MysqlMaxConnections=4096"
    # deploy your overcloud:
    openstack management deploy --plan overcloud --control-scale 3 --compute-scale 3
    # after 'Overcloud Deployed' is reported, confirm this limit on the controllers:
    controller_ips=`nova list | grep control | grep ctlplane | awk '{print $12}' | tr "ctlplane="  "\n"`
    for i in ${controller_ips[@]}; do ssh heat-admin@$i "sudo grep -rni 'max_connections' /etc/*"; echo " ===== "; done

Comment 21 Giulio Fidente 2015-07-29 09:09:54 UTC
hi Ofer, only the patch which allows for customization of max_connections is included in the initial release so to test this the only option we have is to customize the default setting and check if that is increased in galera.cnf.

I have pased into doc_text some commands to customize it before the deployment; it would indeed be nice to check if the instructions work as intended.

Comment 22 Ofer Blaut 2015-08-02 17:36:16 UTC
option suggested in comment 13 didn't work
parameters:
  MysqlMaxConnections: 2048

 openstack overcloud deploy -e /home/stack/network-environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml --plan overcloud   --control-scale 3 --compute-scale 1   --neutron-network-type vxlan  --neutron-tunnel-types vxlan --ntp-server clock.redhat.com 




/tmp/tmp7NIm1z/extraconfig/post_deploy/default.yaml
/tmp/tmp7NIm1z/puppet/all-nodes-config.yaml
ERROR: openstack ERROR: The Parameter (MysqlMaxConnections) was not defined in template.

Comment 23 Ofer Blaut 2015-08-02 19:42:22 UTC
only thing worked is


MariaDB [(none)]> show variables like "max_connections";
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| max_connections | 4096  |
+-----------------+-------+
1 row in set (0.00 sec)

parameters:
  Controller-1::NeutronExternalNetworkBridge: "''"
  Controller-1::MysqlMaxConnections: 4096



please advise 


openstack-tripleo-heat-templates-0.8.6-45.el7ost.noarch

Comment 24 Giulio Fidente 2015-08-03 09:44:19 UTC
hi Ofer,

thanks for the feedback, comment #13 would still only work with standard templated, the instructions in the doc text should be correct though:

Controller-1::MysqlMaxConnections

is the parameter name to be used when deploying with tuskar and it seems to have worked as per your comment #23, while:

MysqlMaxConnections

is the parameter name to be used when deploying with regular templates (passing --templates to the deploy command)

Comment 25 Ofer Blaut 2015-08-04 11:59:07 UTC
Workaround works, bug is cloned to https://bugzilla.redhat.com/show_bug.cgi?id=1247924 , for automated solution

Comment 27 errata-xmlrpc 2015-08-05 13:58:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2015:1549

Comment 28 Mike Burns 2015-08-27 16:46:23 UTC
*** Bug 1246084 has been marked as a duplicate of this bug. ***

Comment 29 James Slagle 2015-12-07 20:39:32 UTC
adding the link to the downstream patch that made the default 4096

Comment 30 jliberma@redhat.com 2016-06-20 05:31:03 UTC
If we determine that the MysqlMaxConnections should be 8192, and the haproxy maxconn default is still 4096, what is the best way to increase the HAproxy maxconn through the templates in a repeatable way?  Is this exposed anywhere?

I see haproxy_default_maxconn is set in /usr/share/openstack-puppet/modules/tripleo/manifests/loadbalancer.pp, but is there a way to change it in Heat templates?

Is this exposed anywhere or do I have to manually change the puppet manifest?

Comment 31 Jiri Stransky 2016-06-20 09:03:28 UTC
(Gfidente is just getting back from PTO today so i'll reduce the e-mail flood on him by answering this one :) )

It can be set in a way described here:

https://access.redhat.com/documentation/en/red-hat-openstack-platform/8/director-installation-and-usage/616-customizing-puppet-configuration-data

So e.g.

parameter_defaults:
  controllerExtraConfig:
    tripleo::loadbalancer::haproxy_default_maxconn: 8192


There are two relevant changes coming in Newton: `controllerExtraConfig` will be deprecated in favor of `ControllerExtraConfig`, and `tripleo::loadbalancer` will become `tripleo::haproxy`.