Bug 1281584

Summary: Director does not create an haproxy configuration that conforms to our best-practice recommendations
Product: Red Hat OpenStack
Reporter: Lars Kellogg-Stedman <lars>
Component: openstack-puppet-modules
Assignee: Jason Guiditta <jguiditt>
Status: CLOSED ERRATA
QA Contact: Leonid Natapov <lnatapov>
Severity: high
Priority: urgent
Version: 7.0 (Kilo)
CC: agarciam, aortega, bperkins, dciabrin, dnavale, emacchi, fdinitto, ggillies, jcoufal, jguiditt, jstransk, lbezdick, mburns, rhel-osp-director-maint, rohara, sasha, yeylon
Target Milestone: z3
Keywords: Triaged, ZStream
Target Release: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openstack-puppet-modules-2015.1.8-32.el7ost
Doc Type: Bug Fix
Doc Text:
Previously, although HAProxy was configured to allow a 'maxconn' value of 10000 for all proxies together, each individual proxy had a default 'maxconn' value of 2000. If the proxy used for MySQL reached its limit of 2000 connections, it dropped all further connections to the database; because the client did not retry, API calls timed out and subsequent commands failed. With this update, the default value of the 'maxconn' parameter has been increased to better suit production environments. As a result, database connections are far less likely to time out.
Clones: 1289180
Last Closed: 2015-12-21 17:11:56 UTC
Type: Bug
Bug Blocks: 1289180    

Description Lars Kellogg-Stedman 2015-11-12 20:54:51 UTC
Description of problem:

According to https://access.redhat.com/solutions/1595673 (and confirmed by rohara):

> Though haproxy is configured to allow maxconn 10000 for all proxies together,
> there is a default maxconn of 2000 for each proxy. If the specific proxy used
> for mysql reaches 2000 limit, it will drop further connections to database
> and client would not retry which causes API timeout and subsequent command to
> fail.
>
> [...]
>
> If you have decided 4096 is the right value for your deployment, add maxconn
> 4096 to mysql proxy.

But a director-installed system has an haproxy.cfg that contains:

  listen mysql
    bind 10.19.94.11:3306 
    option tcpka
    option httpchk
    stick on dst
    stick-table type ip size 1000
    timeout client 0
    timeout server 0
    server overcloud-controller-0 10.19.94.15:3306 backup check fall 5 inter 2000 on-marked-down shutdown-sessions port 9200 rise 2
    server overcloud-controller-1 10.19.94.13:3306 backup check fall 5 inter 2000 on-marked-down shutdown-sessions port 9200 rise 2
    server overcloud-controller-2 10.19.94.16:3306 backup check fall 5 inter 2000 on-marked-down shutdown-sessions port 9200 rise 2

Because this section sets no maxconn of its own, I expect haproxy to start
dropping connections well before MySQL reaches the 4096 max_connections limit set in /etc/my.cnf.d/galera.cnf.

(openstack-tripleo-heat-templates-0.8.6-71.el7ost.noarch)
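To make the reasoning above concrete, here is a minimal Python sketch that reads haproxy.cfg text and estimates the connection limit each `listen` or `frontend` section effectively gets. It assumes, per the KB article quoted above, that a proxy with no `maxconn` of its own (and none in a preceding `defaults` section) falls back to 2000, and it caps the result at the `global` maxconn. The function name `effective_maxconn` and the sample `global` value of 10000 are illustrative assumptions; the director-generated global section is not shown in this report.

```python
# Illustrative sketch: estimate each proxy's effective maxconn from haproxy.cfg text.
from __future__ import annotations

# Assumption taken from the KB article quoted above: a proxy with no maxconn of
# its own, and none in a preceding defaults section, is limited to 2000.
ASSUMED_PROXY_DEFAULT = 2000

SECTION_KEYWORDS = {"global", "defaults", "frontend", "backend", "listen"}


def effective_maxconn(cfg_text: str) -> dict[str, int]:
    """Map each listen/frontend section name to its effective connection limit."""
    global_maxconn = None
    defaults_maxconn = None  # maxconn of the most recent defaults section
    explicit = {}            # proxy name -> its own maxconn, or None
    inherited = {}           # proxy name -> defaults maxconn in force for it
    section = name = None

    for raw in cfg_text.splitlines():
        words = raw.split("#", 1)[0].split()  # drop comments, tokenize
        if not words:
            continue
        if words[0] in SECTION_KEYWORDS:
            section = words[0]
            name = words[1] if len(words) > 1 else None
            if section == "defaults":
                # A new defaults section replaces all earlier defaults.
                defaults_maxconn = None
            elif section in ("listen", "frontend"):
                explicit[name] = None
                inherited[name] = defaults_maxconn
            continue
        if words[0] == "maxconn" and len(words) > 1:
            value = int(words[1])
            if section == "global":
                global_maxconn = value
            elif section == "defaults":
                defaults_maxconn = value
            elif section in ("listen", "frontend"):
                explicit[name] = value

    limits = {}
    for proxy, own in explicit.items():
        if own is not None:
            limit = own
        elif inherited[proxy] is not None:
            limit = inherited[proxy]
        else:
            limit = ASSUMED_PROXY_DEFAULT
        if global_maxconn is not None:
            # The process-wide limit also caps what any single proxy can accept.
            limit = min(limit, global_maxconn)
        limits[proxy] = limit
    return limits


# Hypothetical input: the mysql section above, plus a global maxconn of 10000
# as described in the KB article (the real global section is not shown here).
SAMPLE = """
global
  maxconn 10000

listen mysql
  bind 10.19.94.11:3306
  option tcpka
  timeout client 0
"""

GALERA_MAX_CONNECTIONS = 4096  # max_connections in /etc/my.cnf.d/galera.cnf

for proxy, limit in effective_maxconn(SAMPLE).items():
    print(f"{proxy}: {limit} (galera allows {GALERA_MAX_CONNECTIONS})")
# prints: mysql: 2000 (galera allows 4096)
```

A `listen` section that carries its own `maxconn` line, such as the galera example in Comment 2, never reaches the 2000 fallback in this sketch.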

Comment 2 Jason Guiditta 2015-12-02 21:41:02 UTC
For comparison, this is how we configured haproxy for galera in ofi/osp7:

listen galera
  bind 192.168.201.7:3306
  mode tcp
  maxconn 3996
  option tcplog
  option httpchk
  option tcpka
  stick on dst
  stick-table type ip size 1000
  timeout client 90m
  timeout server 90m
  server pcmk-c1a1 192.168.200.10:3306 backup check inter 1s on-marked-down shutdown-sessions port 9200
  server pcmk-c1a2 192.168.200.20:3306 backup check inter 1s on-marked-down shutdown-sessions port 9200
  server pcmk-c1a3 192.168.200.30:3306 backup check inter 1s on-marked-down shutdown-sessions port 9200

Comment 3 Perry Myers 2015-12-05 18:58:51 UTC
*** Bug 1287988 has been marked as a duplicate of this bug. ***

Comment 4 Jiri Stransky 2015-12-07 15:26:08 UTC
Changing the component to OPM (openstack-puppet-modules), as the linked patch is to puppet-tripleo, which is packaged in OPM.

Comment 6 Jason Guiditta 2015-12-07 20:38:50 UTC
(In reply to Jiri Stransky from comment #4)
> Changing the component to OPM, as the linked patch is to puppet-tripleo,
> which is packaged with OPM.

So, it appears to me that this is fixed upstream in the patch referenced in the duplicate bug: https://review.openstack.org/#/c/202525/ . If that is correct, I believe all this BZ needs is an updated puppet-tripleo module in OPM and a new OSP build of OPM. Is that right?

Sasha, looking at the patch, it appears there should be a maxconn of 4096 in the default options and a global maxconn of 20480.
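As a rough arithmetic check of those numbers, the sketch below compares the MySQL proxy's connection ceiling before and after the patch. It assumes the 2000 per-proxy default and 10000 global value from the KB article for the "before" case and the 4096/20480 values read from the patch for the "after" case; the helper name `proxy_ceiling` is hypothetical.

```python
# Back-of-the-envelope comparison of the MySQL proxy's connection ceiling.


def proxy_ceiling(proxy_maxconn: int, global_maxconn: int) -> int:
    """Hypothetical helper: a proxy cannot exceed its own or the global maxconn."""
    return min(proxy_maxconn, global_maxconn)


GALERA_MAX_CONNECTIONS = 4096  # max_connections in /etc/my.cnf.d/galera.cnf

# Assumed pre-patch values: 2000 per-proxy default, 10000 global (KB article).
before = proxy_ceiling(2000, 10000)
# Values that appear to be set by the patch: 4096 in defaults, 20480 global.
after = proxy_ceiling(4096, 20480)

print(before, after)                     # 2000 4096
print(after >= GALERA_MAX_CONNECTIONS)   # True
# Proxies that could sit at their full 4096 limit before the global cap is hit:
print(20480 // 4096)                     # 5
```

With these values the per-proxy ceiling no longer sits below galera's max_connections, and the global limit leaves room for five proxies to be fully loaded at once.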

Comment 7 Jason Guiditta 2015-12-07 21:08:52 UTC
Lukas, this is the other one we are tracking to get into a build ASAP.

Comment 10 Leonid Natapov 2015-12-16 10:25:13 UTC
openstack-puppet-modules-2015.1.8-32.el7ost.noarch
 maxconn  20480

Comment 11 Ryan O'Hara 2015-12-17 16:36:47 UTC
(In reply to Leonid Natapov from comment #10)
> openstack-puppet-modules-2015.1.8-32.el7ost.noarch
>  maxconn  20480

This lacks context. Where is haproxy getting "maxconn 20480"? What about the per-proxy maxconn?

Comment 13 errata-xmlrpc 2015-12-21 17:11:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2015:2677