Bug 1211781 - "stick" setting in HAProxy fails for highly concurrent MySQL connections
Summary: "stick" setting in HAProxy fails for highly concurrent MySQL connections
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-foreman-installer
Version: 6.0 (Juno)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: z4
: Installer
Assignee: Jason Guiditta
QA Contact: Leonid Natapov
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-04-14 21:07 UTC by Michael Bayer
Modified: 2015-08-24 15:18 UTC (History)

Fixed In Version: openstack-foreman-installer-3.0.25-1.el7ost
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-08-24 15:18:29 UTC
Target Upstream Version:


Attachments
show hosts script (2.99 KB, text/plain)
2015-04-14 21:08 UTC, Michael Bayer
haproxy config (1.62 KB, text/plain)
2015-04-14 21:09 UTC, Michael Bayer


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2015:1662 normal SHIPPED_LIVE Red Hat Enterprise Linux OpenStack Platform Installer update 2015-08-24 19:16:51 UTC

Description Michael Bayer 2015-04-14 21:07:51 UTC
Description of problem:

While testing how HAProxy behaves when many connections to a MariaDB-Galera cluster are opened at once, I found that the "stick" setting we use to memoize on destination IP is not honored in all cases.

I've been playing with variants of this test for a few weeks now and at this point I've been staring at it too long; so I'm hoping there isn't some obvious thing I'm missing.


How reproducible:

100%


Steps to Reproduce:

1. Start with a three-node Galera setup and an HAProxy gateway, using a config equivalent to the attached haproxy.cfg.

2. Run the attached show_hosts.py script. This script runs a series of Python processes, each of which reconnects every five seconds and continuously runs the query "SHOW VARIABLES WHERE Variable_name = 'hostname'" to show which host it is currently connected to. In my environment, running it looks like:

[mbayer@thinkpad hammer]$ .venv/bin/python show_hosts.py -u root -H rhel7-1 -P 3456 -p root  -n 10

3. Restart servers periodically to also watch failover send connections to random nodes.
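
The attached show_hosts.py is not reproduced in this report. As a rough sketch of the per-worker loop that step 2 describes, and not the actual script, the following assumes a hypothetical `query_hostname` callable that opens a fresh connection through HAProxy, runs the hostname query, and returns the host name. The message wording is copied from the output in this report; the function names, and the simplification of one query per reconnect, are illustrative assumptions.

```python
import time


def describe(worker_id, effective_host, new_host, now):
    """Format one log line in the style of the output shown in this report.

    Returns None when the host did not change (nothing worth logging).
    """
    shown = effective_host if effective_host is not None else "<unknown>"
    if new_host is None:
        message = "Error!"
    elif effective_host is None:
        message = "Selected new host %s" % new_host
    elif new_host != effective_host:
        message = "Host switched from %s to %s" % (effective_host, new_host)
    else:
        return None
    return "%s Effective host %s modulus %d %s" % (
        round(now, 2), shown, worker_id, message)


def run_worker(worker_id, query_hostname, rounds,
               reconnect_every=5.0, sleep=time.sleep):
    """Reconnect `rounds` times and return the log lines produced.

    `query_hostname` is a hypothetical callable (an assumption, not part of
    the attached script): it should connect via HAProxy and return the
    backend's hostname, raising an exception if the connection fails.
    """
    effective_host = None
    lines = []
    for _ in range(rounds):
        try:
            new_host = query_hostname()
        except Exception:
            new_host = None  # treated as a connection error
        line = describe(worker_id, effective_host, new_host, time.time())
        if line is not None:
            lines.append(line)
        effective_host = new_host
        sleep(reconnect_every)
    return lines
```

Running ten such workers in separate processes with no spacing between their first connections would approximate the "-n 10" invocation above.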

Actual results:

In my environment it immediately shows that it is connecting to more than one server, meaning that not every connection is going through the stick setting, which should be pointing everyone to the same server based on dest IP:

1429044327.14 Effective host <unknown> modulus 0 Selected new host rhel7-3
1429044327.14 Effective host <unknown> modulus 4 Selected new host rhel7-2
1429044327.14 Effective host <unknown> modulus 2 Selected new host rhel7-1
1429044327.14 Effective host <unknown> modulus 1 Selected new host rhel7-1
1429044327.14 Effective host <unknown> modulus 3 Selected new host rhel7-1
1429044327.15 Effective host <unknown> modulus 5 Selected new host rhel7-3
1429044327.16 Effective host <unknown> modulus 7 Selected new host rhel7-1
1429044327.16 Effective host <unknown> modulus 6 Selected new host rhel7-1
1429044327.19 Effective host <unknown> modulus 9 Selected new host rhel7-2
1429044327.2 Effective host <unknown> modulus 8 Selected new host rhel7-3
1429044332.18 Effective host rhel7-3 modulus 5 Host switched from rhel7-3 to rhel7-1
1429044332.18 Effective host rhel7-2 modulus 9 Host switched from rhel7-2 to rhel7-1
1429044332.48 Effective host rhel7-2 modulus 4 Host switched from rhel7-2 to rhel7-1
1429044332.54 Effective host rhel7-3 modulus 0 Host switched from rhel7-3 to rhel7-1
1429044332.77 Effective host rhel7-3 modulus 8 Host switched from rhel7-3 to rhel7-1


Then restart the MariaDB node that's getting traffic. In my environment, I'm doing a "kill" and waiting for Pacemaker to restart it. First you'll see all the connection failures; this is normal:

(_mysql_exceptions.OperationalError) (2006, 'MySQL server has gone away')
1429044376.8 Effective host rhel7-1 modulus 1 Error!
(_mysql_exceptions.OperationalError) (2006, 'MySQL server has gone away')
1429044376.89 Effective host rhel7-1 modulus 8 Error!
(_mysql_exceptions.OperationalError) (2006, 'MySQL server has gone away')
1429044377.07 Effective host rhel7-1 modulus 0 Error!
(_mysql_exceptions.OperationalError) (2006, 'MySQL server has gone away')

etc. But then you'll see them all fail over to *either* of the two other nodes, not just one. The stick table here refers to a dead server and is completely ignored:

1429044378.65 Effective host <unknown> modulus 5 Selected new host rhel7-3
1429044378.65 Effective host <unknown> modulus 3 Selected new host rhel7-2
1429044378.65 Effective host <unknown> modulus 1 Selected new host rhel7-3
1429044378.66 Effective host <unknown> modulus 8 Selected new host rhel7-3
1429044378.66 Effective host <unknown> modulus 4 Selected new host rhel7-2
1429044378.66 Effective host <unknown> modulus 6 Selected new host rhel7-2

5. When the cluster is back up, it will stabilize, typically on one of the failover nodes. But if we restart the script, which has the most concurrent "new connection" effect, we still get a random distribution of nodes:

$ .venv/bin/python show_hosts.py -u root -H rhel7-1 -P 3456 -p root  -n 10
1429044520.3 Effective host <unknown> modulus 0 Selected new host rhel7-2
1429044520.3 Effective host <unknown> modulus 2 Selected new host rhel7-1
1429044520.31 Effective host <unknown> modulus 9 Selected new host rhel7-3
1429044520.31 Effective host <unknown> modulus 8 Selected new host rhel7-1
1429044520.31 Effective host <unknown> modulus 1 Selected new host rhel7-2
1429044520.32 Effective host <unknown> modulus 5 Selected new host rhel7-2
1429044520.32 Effective host <unknown> modulus 4 Selected new host rhel7-2
1429044520.32 Effective host <unknown> modulus 6 Selected new host rhel7-2
1429044520.32 Effective host <unknown> modulus 3 Selected new host rhel7-2
1429044520.32 Effective host <unknown> modulus 7 Selected new host rhel7-2




Expected results:

All workers in this script should be connected to only one host at any given time.  Any time that the host switches, all connections should be forcibly ejected from the previous host.
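
To make the "only one host at any given time" expectation checkable against captured output, here is a small sketch that replays log lines in the format shown above and tracks which host each worker is on. It assumes the line layout seen in this report (timestamp, "Effective host", "modulus N", then the message); the function names are made up for illustration, and it only knows about workers that have logged at least one event.

```python
import re

# Matches lines such as:
# 1429044327.14 Effective host <unknown> modulus 0 Selected new host rhel7-3
LINE = re.compile(
    r"^(?P<ts>\d+(?:\.\d+)?) Effective host (?P<old>\S+) "
    r"modulus (?P<worker>\d+) (?P<msg>.+)$"
)


def hosts_in_use(log_lines):
    """Yield (timestamp, set of hosts in use) after every parsed event."""
    current = {}  # worker number -> host that worker is connected to
    for raw in log_lines:
        match = LINE.match(raw.strip())
        if match is None:
            continue  # e.g. the OperationalError lines
        worker = int(match.group("worker"))
        msg = match.group("msg")
        if msg.startswith(("Selected new host ", "Host switched from ")):
            current[worker] = msg.split()[-1]  # last word is the new host
        elif msg == "Error!":
            current.pop(worker, None)  # worker is disconnected
        yield float(match.group("ts")), set(current.values())


def max_simultaneous_hosts(log_lines):
    """Largest number of distinct hosts in use at any point in the log."""
    return max((len(hosts) for _, hosts in hosts_in_use(log_lines)), default=0)
```

A result greater than 1 means the workers were split across nodes at some point, which is the behavior this bug describes.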


Additional info:

The configuration that works best is to name two of the servers as "backups" which makes HAProxy much more conservative in its selection of servers, especially during failover.  Then use the stick table with the "nopurge" option which will establish the "primary" node as the one and only "sticky" node, so that during "fail up", connections go back to the original node.   With this setting, we start cleanly:

[mbayer@thinkpad hammer]$ .venv/bin/python show_hosts.py -u root -H rhel7-1 -P 3456 -p root  -n 10       
1429045372.81 Effective host <unknown> modulus 0 Selected new host rhel7-1
1429045372.81 Effective host <unknown> modulus 2 Selected new host rhel7-1
1429045372.81 Effective host <unknown> modulus 1 Selected new host rhel7-1
1429045372.81 Effective host <unknown> modulus 3 Selected new host rhel7-1
1429045372.82 Effective host <unknown> modulus 7 Selected new host rhel7-1
1429045372.82 Effective host <unknown> modulus 4 Selected new host rhel7-1
1429045372.82 Effective host <unknown> modulus 9 Selected new host rhel7-1
1429045372.82 Effective host <unknown> modulus 8 Selected new host rhel7-1
1429045372.82 Effective host <unknown> modulus 6 Selected new host rhel7-1
1429045372.82 Effective host <unknown> modulus 5 Selected new host rhel7-1


shutting down rhel7-1, we see disconnects:

1429045382.06 Effective host rhel7-1 modulus 4 Error!
(_mysql_exceptions.OperationalError) (2006, 'MySQL server has gone away')
1429045382.15 Effective host rhel7-1 modulus 3 Error!
(_mysql_exceptions.OperationalError) (2006, 'MySQL server has gone away')
1429045382.19 Effective host rhel7-1 modulus 6 Error!
(_mysql_exceptions.OperationalError) (2006, 'MySQL server has gone away')
1429045382.21 Effective host rhel7-1 modulus 1 Error!
..

then a clean switch to node 2:

1429045385.39 Effective host <unknown> modulus 5 Selected new host rhel7-2
1429045385.39 Effective host <unknown> modulus 0 Selected new host rhel7-2
1429045385.39 Effective host <unknown> modulus 9 Selected new host rhel7-2
1429045385.39 Effective host <unknown> modulus 6 Selected new host rhel7-2
1429045385.39 Effective host <unknown> modulus 7 Selected new host rhel7-2
1429045385.39 Effective host <unknown> modulus 1 Selected new host rhel7-2
1429045385.39 Effective host <unknown> modulus 3 Selected new host rhel7-2
1429045385.39 Effective host <unknown> modulus 2 Selected new host rhel7-2
1429045385.39 Effective host <unknown> modulus 8 Selected new host rhel7-2
1429045385.39 Effective host <unknown> modulus 4 Selected new host rhel7-2

When we fail back, connections move back to rhel7-1 cleanly; however, in this case we are *still* talking to multiple nodes at once while this happens:

1429045412.99 Effective host rhel7-2 modulus 3 Host switched from rhel7-2 to rhel7-1
1429045415.91 Effective host rhel7-2 modulus 1 Host switched from rhel7-2 to rhel7-1
1429045416.56 Effective host rhel7-2 modulus 8 Host switched from rhel7-2 to rhel7-1
1429045416.79 Effective host rhel7-2 modulus 4 Host switched from rhel7-2 to rhel7-1
1429045416.81 Effective host rhel7-2 modulus 7 Host switched from rhel7-2 to rhel7-1
1429045416.96 Effective host rhel7-2 modulus 2 Host switched from rhel7-2 to rhel7-1
1429045417.02 Effective host rhel7-2 modulus 5 Host switched from rhel7-2 to rhel7-1
1429045417.27 Effective host rhel7-2 modulus 9 Host switched from rhel7-2 to rhel7-1
1429045417.39 Effective host rhel7-2 modulus 6 Host switched from rhel7-2 to rhel7-1
1429045417.46 Effective host rhel7-2 modulus 0 Host switched from rhel7-2 to rhel7-1


If we don't use "nopurge" on the stick table, we'd expect rhel7-2 to be "sticky". Again, this *sort of* happens, but as always, if we restart the script and make lots of concurrent connections, the stick table is ignored:

$ .venv/bin/python show_hosts.py -u root -H rhel7-1 -P 3456 -p root  -n 10
1429045602.11 Effective host <unknown> modulus 1 Selected new host rhel7-2
1429045602.11 Effective host <unknown> modulus 0 Selected new host rhel7-2
1429045602.12 Effective host <unknown> modulus 3 Selected new host rhel7-2
1429045602.12 Effective host <unknown> modulus 5 Selected new host rhel7-2
1429045602.12 Effective host <unknown> modulus 2 Selected new host rhel7-2
1429045602.12 Effective host <unknown> modulus 4 Selected new host rhel7-1
1429045602.12 Effective host <unknown> modulus 8 Selected new host rhel7-1
1429045602.12 Effective host <unknown> modulus 9 Selected new host rhel7-2
1429045602.12 Effective host <unknown> modulus 6 Selected new host rhel7-2
1429045602.12 Effective host <unknown> modulus 7 Selected new host rhel7-2

Comment 1 Michael Bayer 2015-04-14 21:08:53 UTC
Created attachment 1014506 [details]
show hosts script

Comment 2 Michael Bayer 2015-04-14 21:09:14 UTC
Created attachment 1014507 [details]
haproxy config

Comment 3 Michael Bayer 2015-04-14 21:10:27 UTC
Note that the script also has a "delay" setting, which makes it space out connects and reconnects by N seconds. When this setting is greater than or equal to 0.01 seconds, the issue generally goes away and the stick table seems to always take effect:

[mbayer@thinkpad hammer]$ .venv/bin/python show_hosts.py -u root -H rhel7-1 -P 3456 -p root  -n 10 -d0.01
1429045809.14 Effective host <unknown> modulus 0 Selected new host rhel7-2
1429045809.15 Effective host <unknown> modulus 1 Selected new host rhel7-2
1429045809.16 Effective host <unknown> modulus 2 Selected new host rhel7-2
1429045809.17 Effective host <unknown> modulus 3 Selected new host rhel7-2
1429045809.18 Effective host <unknown> modulus 4 Selected new host rhel7-2
1429045809.19 Effective host <unknown> modulus 5 Selected new host rhel7-2
1429045809.2 Effective host <unknown> modulus 6 Selected new host rhel7-2
1429045809.21 Effective host <unknown> modulus 7 Selected new host rhel7-2
1429045809.22 Effective host <unknown> modulus 8 Selected new host rhel7-2
1429045809.23 Effective host <unknown> modulus 9 Selected new host rhel7-2

Comment 5 Ryan O'Hara 2015-04-15 13:22:16 UTC
Are you absolutely sure that all traffic is going through the same haproxy node? I'm asking because we do not have stick tables synchronized across haproxy nodes, so if there was a moment when traffic hit a different haproxy node, it could get redirected to a different backend server. Make sense?

Could you test this with a single haproxy node and/or verify that haproxy logs on the other two nodes show no db traffic?

Comment 6 Michael Bayer 2015-04-15 13:36:46 UTC
> Are you absolutely sure that all traffic is going through the same haproxy node?

yes.   for the series of outputs you see here I disabled it in pacemaker and pointed the script directly at a single HAProxy node, just to make sure.   On these runs you can see I'm pointing the script at the "rhel7-1" node directly with an alternate port.

Also, the difference between running the script with no delay, vs. with a delay, is like night and day.  When you first start the script and ten connections pile on simultaneously, it sends a few to other nodes 99% of the time.  Turn up the delay and this vanishes.

The only test I haven't done is to turn on SQL logging on all three MySQL instances and actually tail their logs to triple check that they are in fact all receiving SQL traffic, to confirm my SELECT of the hostname query on each server is not somehow being corrupted.  I guess you'd see traffic at the Galera level in any case, but this script just does a SELECT anyway.

Comment 7 Michael Bayer 2015-04-15 19:50:19 UTC
Let me point out that one thing that has *not* been tested is this behavior in any environment other than my QEMU VMs running RHEL7, hosted on a Fedora 21 laptop. It seems plausible that networking issues within any of these elements could contribute to what I'm seeing. If someone wants to put me onto a different kind of hosted environment, I can try reproducing it there.

Comment 15 Michael Bayer 2015-05-13 19:11:46 UTC
OK, so as mentioned in the thread, there's no problem setting the stick table size to 1000. This is a table of IP addresses; we're talking less memory than it takes to store the text of a single large SQL statement. It's nothing.

With this config the system stays on one host at all times; on failover, it fails over to a new host, and then there's no failback, so there's never any split situation:

    stick-table type ip size 1000
    stick on dst
    server rhos-node1 rhel7-1:3306 check inter 1s port 9200 backup on-marked-down shutdown-sessions
    server rhos-node2 rhel7-2:3306 check inter 1s port 9200 backup on-marked-down shutdown-sessions
    server rhos-node3 rhel7-3:3306 check inter 1s port 9200 backup on-marked-down shutdown-sessions

we should get this config into our installer / HA setup documentation ASAP.

Comment 16 Ryan O'Hara 2015-05-21 16:00:01 UTC
Moving this to openstack-foreman-installer since it is not an haproxy bug, rather a configuration problem.

Comment 21 Crag Wolfe 2015-05-21 19:25:51 UTC
Merged into staypuft/ofi: https://github.com/redhat-openstack/astapor/pull/518

Comment 22 Ryan O'Hara 2015-07-20 22:00:32 UTC
(In reply to Crag Wolfe from comment #21)
> Merged into staypuft/ofi:
> https://github.com/redhat-openstack/astapor/pull/518

Was this fixed in the RHOS6 A4 release?

Comment 23 Jason Guiditta 2015-07-21 12:17:16 UTC
(In reply to Ryan O'Hara from comment #22)
> (In reply to Crag Wolfe from comment #21)
> > Merged into staypuft/ofi:
> > https://github.com/redhat-openstack/astapor/pull/518
> 
> Was this fixed in the RHOS6 A4 release?

I _think_ so, but will have to defer to Mike on this for OSP 6.  The referenced change is merged and will be in OSP 7 (ofi) release though (it is already in beta builds).

Comment 24 Jason Guiditta 2015-08-06 14:33:33 UTC
Backported to OSP 6

Comment 26 Alexander Chuzhoy 2015-08-18 17:27:48 UTC
Verified:

Environment:
openstack-foreman-installer-3.0.26-1.el7ost.noarch

Based on Comment #15


Verified /etc/haproxy/haproxy.cfg has the following:
listen galera
  bind 192.168.0.13:3306
  mode  tcp
  option  tcplog
  option  httpchk
  option  tcpka
  stick  on dst
  stick-table  type ip size 1000
  timeout  client 90m
  timeout  server 90m
  server pcmk-maca25400702876 192.168.0.7:3306  check inter 1s port 9200 backup on-marked-down shutdown-sessions
  server pcmk-maca25400702877 192.168.0.10:3306  check inter 1s port 9200 backup on-marked-down shutdown-sessions
  server pcmk-maca25400702875 192.168.0.9:3306  check inter 1s port 9200 backup on-marked-down shutdown-sessions
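
A check like the one above can be scripted. The sketch below is an illustration, not part of the installer or of the QA procedure: it looks for the settings from Comment #15 ("stick on dst", a stick-table, and every server marked "backup" with "on-marked-down shutdown-sessions"). The function name is hypothetical, and for simplicity it inspects whatever text it is given rather than isolating the "listen galera" section, so pass it only that block.

```python
def check_galera_block(config_text):
    """Return a list of problems; an empty list means every check passed."""
    # Split each non-blank line into words so spacing differences don't matter.
    rows = [line.split() for line in config_text.splitlines() if line.strip()]
    problems = []

    if ["stick", "on", "dst"] not in rows:
        problems.append("missing 'stick on dst'")
    if not any(row[0] == "stick-table" for row in rows):
        problems.append("missing 'stick-table'")

    servers = [row for row in rows if row[0] == "server" and len(row) > 1]
    if not servers:
        problems.append("no server lines found")
    for row in servers:
        name = row[1]
        if "backup" not in row:
            problems.append("%s is not marked 'backup'" % name)
        # 'on-marked-down' must be followed directly by 'shutdown-sessions'.
        pairs = list(zip(row, row[1:]))
        if ("on-marked-down", "shutdown-sessions") not in pairs:
            problems.append(
                "%s lacks 'on-marked-down shutdown-sessions'" % name)
    return problems
```

For example, the galera block copied out of /etc/haproxy/haproxy.cfg could be read into a string and passed to `check_galera_block`; any returned entries name the server lines that differ from the recommended settings.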

Comment 28 errata-xmlrpc 2015-08-24 15:18:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1662.html

