Description of problem:

Testing HAProxy for its behavior when many connections to a MariaDB Galera cluster arrive at once, the "stick" setting we use to memoize on destination IP is not honored in all cases. I've been playing with variants of this test for a few weeks now, and at this point I've been staring at it too long, so I'm hoping there isn't some obvious thing I'm missing.

How reproducible:

100%

Steps to Reproduce:

1. Start with a three-node Galera setup and an HAProxy gateway, using a config equivalent to the attached haproxy.cfg.

2. Run the attached show_hosts.py script. This script starts a series of Python processes, each of which reconnects every five seconds and continuously runs the query "SHOW VARIABLES WHERE Variable_name = 'hostname'" to show which host we are currently connected to. In my environment, running it looks like:

[mbayer@thinkpad hammer]$ .venv/bin/python show_hosts.py -u root -H rhel7-1 -P 3456 -p root -n 10

3. Restart servers periodically to also watch failover sending connections to random nodes.

Actual results:

In my environment it immediately shows that it is connecting to more than one server, meaning that not every connection is going through the stick setting, which should be pointing everyone to the same server based on dest IP:

1429044327.14 Effective host <unknown> modulus 0 Selected new host rhel7-3
1429044327.14 Effective host <unknown> modulus 4 Selected new host rhel7-2
1429044327.14 Effective host <unknown> modulus 2 Selected new host rhel7-1
1429044327.14 Effective host <unknown> modulus 1 Selected new host rhel7-1
1429044327.14 Effective host <unknown> modulus 3 Selected new host rhel7-1
1429044327.15 Effective host <unknown> modulus 5 Selected new host rhel7-3
1429044327.16 Effective host <unknown> modulus 7 Selected new host rhel7-1
1429044327.16 Effective host <unknown> modulus 6 Selected new host rhel7-1
1429044327.19 Effective host <unknown> modulus 9 Selected new host rhel7-2
1429044327.2 Effective host <unknown> modulus 8 Selected new host rhel7-3
1429044332.18 Effective host rhel7-3 modulus 5 Host switched from rhel7-3 to rhel7-1
1429044332.18 Effective host rhel7-2 modulus 9 Host switched from rhel7-2 to rhel7-1
1429044332.48 Effective host rhel7-2 modulus 4 Host switched from rhel7-2 to rhel7-1
1429044332.54 Effective host rhel7-3 modulus 0 Host switched from rhel7-3 to rhel7-1
1429044332.77 Effective host rhel7-3 modulus 8 Host switched from rhel7-3 to rhel7-1

Then restart the MariaDB node that's getting traffic. In my environment, I'm doing a "kill" and waiting for Pacemaker to restart it. First you'll see all the connection failures; this is normal:

(_mysql_exceptions.OperationalError) (2006, 'MySQL server has gone away')
1429044376.8 Effective host rhel7-1 modulus 1 Error!
(_mysql_exceptions.OperationalError) (2006, 'MySQL server has gone away')
1429044376.89 Effective host rhel7-1 modulus 8 Error!
(_mysql_exceptions.OperationalError) (2006, 'MySQL server has gone away')
1429044377.07 Effective host rhel7-1 modulus 0 Error!
(_mysql_exceptions.OperationalError) (2006, 'MySQL server has gone away')
etc.

But then, you'll see them all fail over to *either* of the two other nodes, not just one - the stick table here refers to a dead server and is completely ignored:

1429044378.65 Effective host <unknown> modulus 5 Selected new host rhel7-3
1429044378.65 Effective host <unknown> modulus 3 Selected new host rhel7-2
1429044378.65 Effective host <unknown> modulus 1 Selected new host rhel7-3
1429044378.66 Effective host <unknown> modulus 8 Selected new host rhel7-3
1429044378.66 Effective host <unknown> modulus 4 Selected new host rhel7-2
1429044378.66 Effective host <unknown> modulus 6 Selected new host rhel7-2

When the cluster is back up, it will stabilize, typically on one of the failover nodes. But if we restart the script, which produces the heaviest burst of concurrent "new connection" activity, we still get a random distribution of nodes:

$ .venv/bin/python show_hosts.py -u root -H rhel7-1 -P 3456 -p root -n 10
1429044520.3 Effective host <unknown> modulus 0 Selected new host rhel7-2
1429044520.3 Effective host <unknown> modulus 2 Selected new host rhel7-1
1429044520.31 Effective host <unknown> modulus 9 Selected new host rhel7-3
1429044520.31 Effective host <unknown> modulus 8 Selected new host rhel7-1
1429044520.31 Effective host <unknown> modulus 1 Selected new host rhel7-2
1429044520.32 Effective host <unknown> modulus 5 Selected new host rhel7-2
1429044520.32 Effective host <unknown> modulus 4 Selected new host rhel7-2
1429044520.32 Effective host <unknown> modulus 6 Selected new host rhel7-2
1429044520.32 Effective host <unknown> modulus 3 Selected new host rhel7-2
1429044520.32 Effective host <unknown> modulus 7 Selected new host rhel7-2

Expected results:

All workers in this script should be connected to only one host at any given time. Any time the host switches, all connections should be forcibly ejected from the previous host.

Additional info:

The configuration that works best is to name two of the servers as "backups", which makes HAProxy much more conservative in its selection of servers, especially during failover, and then to use the stick table with the "nopurge" option, which establishes the "primary" node as the one and only "sticky" node, so that during "fail up" connections go back to the original node.

With this setting, we start cleanly:

[mbayer@thinkpad hammer]$ .venv/bin/python show_hosts.py -u root -H rhel7-1 -P 3456 -p root -n 10
1429045372.81 Effective host <unknown> modulus 0 Selected new host rhel7-1
1429045372.81 Effective host <unknown> modulus 2 Selected new host rhel7-1
1429045372.81 Effective host <unknown> modulus 1 Selected new host rhel7-1
1429045372.81 Effective host <unknown> modulus 3 Selected new host rhel7-1
1429045372.82 Effective host <unknown> modulus 7 Selected new host rhel7-1
1429045372.82 Effective host <unknown> modulus 4 Selected new host rhel7-1
1429045372.82 Effective host <unknown> modulus 9 Selected new host rhel7-1
1429045372.82 Effective host <unknown> modulus 8 Selected new host rhel7-1
1429045372.82 Effective host <unknown> modulus 6 Selected new host rhel7-1
1429045372.82 Effective host <unknown> modulus 5 Selected new host rhel7-1

Shutting down rhel7-1, we see disconnects:

1429045382.06 Effective host rhel7-1 modulus 4 Error!
(_mysql_exceptions.OperationalError) (2006, 'MySQL server has gone away')
1429045382.15 Effective host rhel7-1 modulus 3 Error!
(_mysql_exceptions.OperationalError) (2006, 'MySQL server has gone away')
1429045382.19 Effective host rhel7-1 modulus 6 Error!
(_mysql_exceptions.OperationalError) (2006, 'MySQL server has gone away')
1429045382.21 Effective host rhel7-1 modulus 1 Error!
...

then a clean switch to node 2:

1429045385.39 Effective host <unknown> modulus 5 Selected new host rhel7-2
1429045385.39 Effective host <unknown> modulus 0 Selected new host rhel7-2
1429045385.39 Effective host <unknown> modulus 9 Selected new host rhel7-2
1429045385.39 Effective host <unknown> modulus 6 Selected new host rhel7-2
1429045385.39 Effective host <unknown> modulus 7 Selected new host rhel7-2
1429045385.39 Effective host <unknown> modulus 1 Selected new host rhel7-2
1429045385.39 Effective host <unknown> modulus 3 Selected new host rhel7-2
1429045385.39 Effective host <unknown> modulus 2 Selected new host rhel7-2
1429045385.39 Effective host <unknown> modulus 8 Selected new host rhel7-2
1429045385.39 Effective host <unknown> modulus 4 Selected new host rhel7-2

When we fail back, nodes move back to rhel7-1 cleanly; however, in this case we are *still* talking to multiple nodes at once while this happens:

1429045412.99 Effective host rhel7-2 modulus 3 Host switched from rhel7-2 to rhel7-1
1429045415.91 Effective host rhel7-2 modulus 1 Host switched from rhel7-2 to rhel7-1
1429045416.56 Effective host rhel7-2 modulus 8 Host switched from rhel7-2 to rhel7-1
1429045416.79 Effective host rhel7-2 modulus 4 Host switched from rhel7-2 to rhel7-1
1429045416.81 Effective host rhel7-2 modulus 7 Host switched from rhel7-2 to rhel7-1
1429045416.96 Effective host rhel7-2 modulus 2 Host switched from rhel7-2 to rhel7-1
1429045417.02 Effective host rhel7-2 modulus 5 Host switched from rhel7-2 to rhel7-1
1429045417.27 Effective host rhel7-2 modulus 9 Host switched from rhel7-2 to rhel7-1
1429045417.39 Effective host rhel7-2 modulus 6 Host switched from rhel7-2 to rhel7-1
1429045417.46 Effective host rhel7-2 modulus 0 Host switched from rhel7-2 to rhel7-1

If we don't use "nopurge" on the stick table, we'd expect rhel7-2 to be "sticky" - which, again, *sort of* happens, but as always, if we restart the script and make lots of concurrent connections, the stick table is ignored:

$ .venv/bin/python show_hosts.py -u root -H rhel7-1 -P 3456 -p root -n 10
1429045602.11 Effective host <unknown> modulus 1 Selected new host rhel7-2
1429045602.11 Effective host <unknown> modulus 0 Selected new host rhel7-2
1429045602.12 Effective host <unknown> modulus 3 Selected new host rhel7-2
1429045602.12 Effective host <unknown> modulus 5 Selected new host rhel7-2
1429045602.12 Effective host <unknown> modulus 2 Selected new host rhel7-2
1429045602.12 Effective host <unknown> modulus 4 Selected new host rhel7-1
1429045602.12 Effective host <unknown> modulus 8 Selected new host rhel7-1
1429045602.12 Effective host <unknown> modulus 9 Selected new host rhel7-2
1429045602.12 Effective host <unknown> modulus 6 Selected new host rhel7-2
1429045602.12 Effective host <unknown> modulus 7 Selected new host rhel7-2
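As an aside, the shape of the per-worker output lines above can be captured in a small pure-Python sketch. This is only an illustration of the log format, not the attached show_hosts.py itself; the function name, signature, and the "unchanged" wording are my own inventions:

```python
def status_line(ts, modulus, previous, current):
    """Render one worker status line in the same shape as the output
    quoted above: the worker index is reported as "modulus", and the
    host seen on the previous query round is the "Effective host"."""
    shown = previous if previous is not None else "<unknown>"
    if previous is None:
        # first successful connect for this worker
        event = "Selected new host %s" % current
    elif previous != current:
        # either the stick table failed to pin us, or a failover happened
        event = "Host switched from %s to %s" % (previous, current)
    else:
        # wording here is a guess; the real script may print nothing
        event = "Host %s unchanged" % current
    return "%s Effective host %s modulus %s %s" % (ts, shown, modulus, event)
```

For example, status_line("1429044327.14", 0, None, "rhel7-3") reproduces the first output line quoted above.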
Created attachment 1014506 [details] show hosts script
Created attachment 1014507 [details] haproxy config
Note that the script also has a "delay" setting, which will make it space out connects and reconnects by N seconds. When this setting is greater than or equal to 0.01 seconds, the issue generally goes away and the stick table seems to always take effect:

[mbayer@thinkpad hammer]$ .venv/bin/python show_hosts.py -u root -H rhel7-1 -P 3456 -p root -n 10 -d0.01
1429045809.14 Effective host <unknown> modulus 0 Selected new host rhel7-2
1429045809.15 Effective host <unknown> modulus 1 Selected new host rhel7-2
1429045809.16 Effective host <unknown> modulus 2 Selected new host rhel7-2
1429045809.17 Effective host <unknown> modulus 3 Selected new host rhel7-2
1429045809.18 Effective host <unknown> modulus 4 Selected new host rhel7-2
1429045809.19 Effective host <unknown> modulus 5 Selected new host rhel7-2
1429045809.2 Effective host <unknown> modulus 6 Selected new host rhel7-2
1429045809.21 Effective host <unknown> modulus 7 Selected new host rhel7-2
1429045809.22 Effective host <unknown> modulus 8 Selected new host rhel7-2
1429045809.23 Effective host <unknown> modulus 9 Selected new host rhel7-2
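Incidentally, a quick way to quantify these runs is to count the distinct backends appearing in the script output; with "stick on dst" honored there should be exactly one. A hypothetical helper (not part of the attached script) could be:

```python
import re

# Matches both kinds of host events printed by show_hosts.py:
# "Selected new host X" and "Host switched from X to Y".
EVENT = re.compile(r"Selected new host (\S+)|Host switched from \S+ to (\S+)")

def distinct_hosts(output):
    """Return the sorted list of backend hosts the workers landed on."""
    return sorted({m.group(1) or m.group(2) for m in EVENT.finditer(output)})
```

Feeding it the capture above with -d0.01 would yield a single host; the no-delay captures yield two or three.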
Are you absolutely sure that all traffic is going through the same haproxy node? I'm asking because we do not have stick tables synchronized across haproxy nodes, so if there was a moment when traffic hit a different haproxy node, it could get redirected to a different backend server. Make sense? Could you test this with a single haproxy node and/or verify that the haproxy logs on the other two nodes show no db traffic?
> Are you absolutely sure that all traffic is going through the same haproxy node?

Yes. For the series of outputs you see here, I disabled it in Pacemaker and pointed the script directly at a single HAProxy node, just to make sure. On these runs you can see I'm pointing the script at the "rhel7-1" node directly, on an alternate port.

Also, the difference between running the script with no delay vs. with a delay is like night and day. When you first start the script and ten connections pile on simultaneously, it sends a few to other nodes 99% of the time. Turn up the delay and this vanishes.

The only test I haven't done is to turn on SQL logging on all three MySQL instances and actually tail their logs, to triple-check that they are in fact all receiving SQL traffic and to confirm my SELECT of the hostname on each server is not somehow being corrupted. I guess you'd see traffic at the Galera level in any case, but this script only does a SELECT anyway.
Let me point out that one thing that has *not* been tested is this behavior in any environment other than my QEMU VMs running RHEL 7, hosted on a Fedora 21 laptop. It seems plausible that networking issues within any of these elements could contribute to what I'm seeing. If someone wants to put me onto some other kind of hosted environment, I can try reproducing elsewhere.
OK, so as mentioned in the thread, there's no problem making the stick table size 1000. This is a table of IP addresses; we're talking less memory than it takes to store the text of a single large SQL statement - it's nothing. With this config, the system stays on one host at all times; on failover, it fails over to a new host, and then there's no failback, so there's never any split situation:

  stick-table type ip size 1000
  stick on dst
  server rhos-node1 rhel7-1:3306 check inter 1s port 9200 backup on-marked-down shutdown-sessions
  server rhos-node2 rhel7-2:3306 check inter 1s port 9200 backup on-marked-down shutdown-sessions
  server rhos-node3 rhel7-3:3306 check inter 1s port 9200 backup on-marked-down shutdown-sessions

We should get this config into our installer / HA setup documentation ASAP.
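As a side note on verifying stickiness: if the global section of haproxy.cfg exposes an admin socket (an assumption on my part - the attached config may not have one), HAProxy's runtime "show table" command dumps the stick table contents, so you can see exactly which server each tracked destination IP is stuck to. A sketch:

```shell
# Assumed addition to the haproxy.cfg "global" section:
#     stats socket /var/run/haproxy.sock mode 600 level admin
#
# Then, while the test script runs, dump the stick table for the
# proxy section (substitute the name of your own "listen" section):
echo "show table galera" | socat stdio /var/run/haproxy.sock
```

Each entry in the dump shows a tracked key and the server_id it is pinned to, which would distinguish "table entry never created" from "table entry ignored" in the concurrent-connect case.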
Moving this to openstack-foreman-installer, since it is not an haproxy bug but rather a configuration problem.
Merged into staypuft/ofi: https://github.com/redhat-openstack/astapor/pull/518
(In reply to Crag Wolfe from comment #21)
> Merged into staypuft/ofi:
> https://github.com/redhat-openstack/astapor/pull/518

Was this fixed in the RHOS6 A4 release?
(In reply to Ryan O'Hara from comment #22)
> (In reply to Crag Wolfe from comment #21)
> > Merged into staypuft/ofi:
> > https://github.com/redhat-openstack/astapor/pull/518
>
> Was this fixed in the RHOS6 A4 release?

I _think_ so, but will have to defer to Mike on this for OSP 6. The referenced change is merged and will be in the OSP 7 (ofi) release, though (it is already in beta builds).
Backported to OSP 6
Verified:

Environment: openstack-foreman-installer-3.0.26-1.el7ost.noarch

Based on Comment #15, verified that /etc/haproxy/haproxy.cfg has the following:

listen galera
  bind 192.168.0.13:3306
  mode tcp
  option tcplog
  option httpchk
  option tcpka
  stick on dst
  stick-table type ip size 1000
  timeout client 90m
  timeout server 90m
  server pcmk-maca25400702876 192.168.0.7:3306 check inter 1s port 9200 backup on-marked-down shutdown-sessions
  server pcmk-maca25400702877 192.168.0.10:3306 check inter 1s port 9200 backup on-marked-down shutdown-sessions
  server pcmk-maca25400702875 192.168.0.9:3306 check inter 1s port 9200 backup on-marked-down shutdown-sessions
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1662.html