Bug 1211781

Summary:

"stick" setting in HAProxy fails for highly concurrent MySQL connections

Product:

Red Hat OpenStack

Reporter:

Michael Bayer <mbayer>

Component:

openstack-foreman-installer

Assignee:

Jason Guiditta <jguiditt>

Status:

CLOSED ERRATA

QA Contact:

Leonid Natapov <lnatapov>

Severity:

medium

Docs Contact:

Priority:

high

Version:

6.0 (Juno)

CC:

bperkins, cwolfe, fdinitto, mburns, morazi, ohochman, rhos-maint, rohara, sasha, sputhenp, yeylon

Target Milestone:

Keywords:

ZStream

Target Release:

Installer

Hardware:

Unspecified

OS:

Unspecified

Whiteboard:

Fixed In Version:

openstack-foreman-installer-3.0.25-1.el7ost

Doc Type:

Bug Fix

Last Closed:

2015-08-24 15:18:29 UTC

Type:

Bug

Attachments:

Description	Flags
show hosts script	none
haproxy config	none

Description Michael Bayer 2015-04-14 21:07:51 UTC

Description of problem:

While testing how HAProxy behaves when many connections to a MariaDB-Galera cluster are opened at once, I found that the "stick" setting we use to memoize on destination IP is not honored in all cases.

I've been playing with variants of this test for a few weeks now and at this point I've been staring at it too long; so I'm hoping there isn't some obvious thing I'm missing.


How reproducible:

100%


Steps to Reproduce:

1. Start with a three-node Galera setup and an HAProxy gateway, using a config equivalent to the attached haproxy.cfg.

2. Run the attached show_hosts.py script. It starts a series of Python processes, each of which reconnects every five seconds and continuously runs the query "SHOW VARIABLES WHERE Variable_name = 'hostname'" to show which host it is currently connected to. In my environment, running it looks like:

[mbayer@thinkpad hammer]$ .venv/bin/python show_hosts.py -u root -H rhel7-1 -P 3456 -p root  -n 10

3. Restart servers periodically to also watch failover sending connections to random nodes
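
For readers without the attachment, the worker loop described in step 2 can be sketched roughly as follows. This is not the attached show_hosts.py; it is a minimal sketch that only mirrors the output format seen in this report. The names describe_transition, run_worker and fetch_hostname are hypothetical, and fetch_hostname stands in for whatever code connects through the proxy and runs the hostname query.

```python
import time
from typing import Callable, Optional


def describe_transition(previous: Optional[str], current: str) -> Optional[str]:
    """Return the message a worker would print, or None if the host is unchanged."""
    if previous is None:
        return f"Selected new host {current}"
    if previous != current:
        return f"Host switched from {previous} to {current}"
    return None


def run_worker(modulus: int,
               fetch_hostname: Callable[[], str],
               iterations: int,
               pause: float = 0.0,
               emit: Callable[[str], None] = print) -> None:
    """Poll the current backend hostname and report every change.

    fetch_hostname is a hypothetical callable that connects through the
    proxy, runs SHOW VARIABLES WHERE Variable_name = 'hostname' and returns
    the value. Any exception it raises is treated as a lost connection.
    """
    effective: Optional[str] = None
    for _ in range(iterations):
        label = effective if effective is not None else "<unknown>"
        prefix = f"{time.time():.2f} Effective host {label} modulus {modulus}"
        try:
            current = fetch_hostname()
        except Exception as exc:
            emit(f"{prefix} Error!")
            emit(str(exc))
            # Forget the host so the next success is reported as a new selection.
            effective = None
        else:
            message = describe_transition(effective, current)
            if message is not None:
                emit(f"{prefix} {message}")
            effective = current
        time.sleep(pause)
```

Passing the query function in as a parameter keeps the sketch independent of any particular MySQL driver; with a real driver, fetch_hostname would open the connection and return the second column of the SHOW VARIABLES row.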

Actual results:

In my environment it immediately shows that it is connecting to more than one server, meaning that not every connection is going through the stick setting, which should be pointing everyone to the same server based on dest IP:

1429044327.14 Effective host <unknown> modulus 0 Selected new host rhel7-3
1429044327.14 Effective host <unknown> modulus 4 Selected new host rhel7-2
1429044327.14 Effective host <unknown> modulus 2 Selected new host rhel7-1
1429044327.14 Effective host <unknown> modulus 1 Selected new host rhel7-1
1429044327.14 Effective host <unknown> modulus 3 Selected new host rhel7-1
1429044327.15 Effective host <unknown> modulus 5 Selected new host rhel7-3
1429044327.16 Effective host <unknown> modulus 7 Selected new host rhel7-1
1429044327.16 Effective host <unknown> modulus 6 Selected new host rhel7-1
1429044327.19 Effective host <unknown> modulus 9 Selected new host rhel7-2
1429044327.2 Effective host <unknown> modulus 8 Selected new host rhel7-3
1429044332.18 Effective host rhel7-3 modulus 5 Host switched from rhel7-3 to rhel7-1
1429044332.18 Effective host rhel7-2 modulus 9 Host switched from rhel7-2 to rhel7-1
1429044332.48 Effective host rhel7-2 modulus 4 Host switched from rhel7-2 to rhel7-1
1429044332.54 Effective host rhel7-3 modulus 0 Host switched from rhel7-3 to rhel7-1
1429044332.77 Effective host rhel7-3 modulus 8 Host switched from rhel7-3 to rhel7-1


4. Then restart the MariaDB node that's getting traffic. In my environment, I'm doing a "kill" and waiting for Pacemaker to restart it. First you'll see all the connection failures; this is normal:

(_mysql_exceptions.OperationalError) (2006, 'MySQL server has gone away')
1429044376.8 Effective host rhel7-1 modulus 1 Error!
(_mysql_exceptions.OperationalError) (2006, 'MySQL server has gone away')
1429044376.89 Effective host rhel7-1 modulus 8 Error!
(_mysql_exceptions.OperationalError) (2006, 'MySQL server has gone away')
1429044377.07 Effective host rhel7-1 modulus 0 Error!
(_mysql_exceptions.OperationalError) (2006, 'MySQL server has gone away')

And so on. But then you'll see them all fail over to *either* of the two other nodes, not just one. The stick table here refers to a dead server and is completely ignored:

1429044378.65 Effective host <unknown> modulus 5 Selected new host rhel7-3
1429044378.65 Effective host <unknown> modulus 3 Selected new host rhel7-2
1429044378.65 Effective host <unknown> modulus 1 Selected new host rhel7-3
1429044378.66 Effective host <unknown> modulus 8 Selected new host rhel7-3
1429044378.66 Effective host <unknown> modulus 4 Selected new host rhel7-2
1429044378.66 Effective host <unknown> modulus 6 Selected new host rhel7-2

5. When the cluster is back up, it will stabilize, typically on one of the failover nodes. But if we restart the script, which produces the most concurrent "new connection" activity, we still get a random distribution of nodes:

$ .venv/bin/python show_hosts.py -u root -H rhel7-1 -P 3456 -p root  -n 10
1429044520.3 Effective host <unknown> modulus 0 Selected new host rhel7-2
1429044520.3 Effective host <unknown> modulus 2 Selected new host rhel7-1
1429044520.31 Effective host <unknown> modulus 9 Selected new host rhel7-3
1429044520.31 Effective host <unknown> modulus 8 Selected new host rhel7-1
1429044520.31 Effective host <unknown> modulus 1 Selected new host rhel7-2
1429044520.32 Effective host <unknown> modulus 5 Selected new host rhel7-2
1429044520.32 Effective host <unknown> modulus 4 Selected new host rhel7-2
1429044520.32 Effective host <unknown> modulus 6 Selected new host rhel7-2
1429044520.32 Effective host <unknown> modulus 3 Selected new host rhel7-2
1429044520.32 Effective host <unknown> modulus 7 Selected new host rhel7-2




Expected results:

All workers in this script should be connected to only one host at any given time.  Any time that the host switches, all connections should be forcibly ejected from the previous host.
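
One way to check this expectation against captured output is to replay the log and track which host each worker is on. The following is a minimal sketch that assumes the line format shown under "Actual results"; the function names hosts_in_use and split_moments are hypothetical and not part of the attached script.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple

# Matches lines such as:
# 1429044327.14 Effective host <unknown> modulus 0 Selected new host rhel7-3
LINE_RE = re.compile(
    r"^(?P<ts>\d+(?:\.\d+)?) Effective host (?P<effective>\S+) "
    r"modulus (?P<worker>\d+) (?P<event>.+)$"
)


def hosts_in_use(lines: Iterable[str]) -> List[Tuple[float, Set[str]]]:
    """Replay log lines and return (timestamp, hosts in use) after each event."""
    current: Dict[int, str] = {}  # worker modulus -> host it is connected to
    timeline: List[Tuple[float, Set[str]]] = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # e.g. the exception text printed next to "Error!"
        worker = int(match.group("worker"))
        event = match.group("event")
        if event.startswith("Selected new host "):
            current[worker] = event[len("Selected new host "):]
        elif event.startswith("Host switched from "):
            current[worker] = event.rsplit(" to ", 1)[1]
        elif event == "Error!":
            current.pop(worker, None)  # the worker is no longer connected
        timeline.append((float(match.group("ts")), set(current.values())))
    return timeline


def split_moments(lines: Iterable[str]) -> List[Tuple[float, Set[str]]]:
    """Return only the moments at which more than one host was in use."""
    return [(ts, hosts) for ts, hosts in hosts_in_use(lines) if len(hosts) > 1]
```

With the expected behavior, split_moments returns an empty list for the whole run; any entry is a moment at which workers were spread across more than one Galera node.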


Additional info:

The configuration that works best is to name two of the servers as "backups", which makes HAProxy much more conservative in its selection of servers, especially during failover. Then use the stick table with the "nopurge" option, which establishes the "primary" node as the one and only "sticky" node, so that during "fail up", connections go back to the original node. With this setting, we start cleanly:

[mbayer@thinkpad hammer]$ .venv/bin/python show_hosts.py -u root -H rhel7-1 -P 3456 -p root  -n 10       
1429045372.81 Effective host <unknown> modulus 0 Selected new host rhel7-1
1429045372.81 Effective host <unknown> modulus 2 Selected new host rhel7-1
1429045372.81 Effective host <unknown> modulus 1 Selected new host rhel7-1
1429045372.81 Effective host <unknown> modulus 3 Selected new host rhel7-1
1429045372.82 Effective host <unknown> modulus 7 Selected new host rhel7-1
1429045372.82 Effective host <unknown> modulus 4 Selected new host rhel7-1
1429045372.82 Effective host <unknown> modulus 9 Selected new host rhel7-1
1429045372.82 Effective host <unknown> modulus 8 Selected new host rhel7-1
1429045372.82 Effective host <unknown> modulus 6 Selected new host rhel7-1
1429045372.82 Effective host <unknown> modulus 5 Selected new host rhel7-1


Shutting down rhel7-1, we see disconnects:

1429045382.06 Effective host rhel7-1 modulus 4 Error!
(_mysql_exceptions.OperationalError) (2006, 'MySQL server has gone away')
1429045382.15 Effective host rhel7-1 modulus 3 Error!
(_mysql_exceptions.OperationalError) (2006, 'MySQL server has gone away')
1429045382.19 Effective host rhel7-1 modulus 6 Error!
(_mysql_exceptions.OperationalError) (2006, 'MySQL server has gone away')
1429045382.21 Effective host rhel7-1 modulus 1 Error!
..

Then a clean switch to node 2:

1429045385.39 Effective host <unknown> modulus 5 Selected new host rhel7-2
1429045385.39 Effective host <unknown> modulus 0 Selected new host rhel7-2
1429045385.39 Effective host <unknown> modulus 9 Selected new host rhel7-2
1429045385.39 Effective host <unknown> modulus 6 Selected new host rhel7-2
1429045385.39 Effective host <unknown> modulus 7 Selected new host rhel7-2
1429045385.39 Effective host <unknown> modulus 1 Selected new host rhel7-2
1429045385.39 Effective host <unknown> modulus 3 Selected new host rhel7-2
1429045385.39 Effective host <unknown> modulus 2 Selected new host rhel7-2
1429045385.39 Effective host <unknown> modulus 8 Selected new host rhel7-2
1429045385.39 Effective host <unknown> modulus 4 Selected new host rhel7-2

When we fail back, connections move back to rhel7-1 cleanly; however, in this case we are *still* talking to multiple nodes at once while this happens:

1429045412.99 Effective host rhel7-2 modulus 3 Host switched from rhel7-2 to rhel7-1
1429045415.91 Effective host rhel7-2 modulus 1 Host switched from rhel7-2 to rhel7-1
1429045416.56 Effective host rhel7-2 modulus 8 Host switched from rhel7-2 to rhel7-1
1429045416.79 Effective host rhel7-2 modulus 4 Host switched from rhel7-2 to rhel7-1
1429045416.81 Effective host rhel7-2 modulus 7 Host switched from rhel7-2 to rhel7-1
1429045416.96 Effective host rhel7-2 modulus 2 Host switched from rhel7-2 to rhel7-1
1429045417.02 Effective host rhel7-2 modulus 5 Host switched from rhel7-2 to rhel7-1
1429045417.27 Effective host rhel7-2 modulus 9 Host switched from rhel7-2 to rhel7-1
1429045417.39 Effective host rhel7-2 modulus 6 Host switched from rhel7-2 to rhel7-1
1429045417.46 Effective host rhel7-2 modulus 0 Host switched from rhel7-2 to rhel7-1


If we don't use "nopurge" on the stick table, we'd expect rhel7-2 to be "sticky". Again, this *sort of* happens, but as always, if we restart the script and make lots of concurrent connections, the stick table is ignored:

$ .venv/bin/python show_hosts.py -u root -H rhel7-1 -P 3456 -p root  -n 10
1429045602.11 Effective host <unknown> modulus 1 Selected new host rhel7-2
1429045602.11 Effective host <unknown> modulus 0 Selected new host rhel7-2
1429045602.12 Effective host <unknown> modulus 3 Selected new host rhel7-2
1429045602.12 Effective host <unknown> modulus 5 Selected new host rhel7-2
1429045602.12 Effective host <unknown> modulus 2 Selected new host rhel7-2
1429045602.12 Effective host <unknown> modulus 4 Selected new host rhel7-1
1429045602.12 Effective host <unknown> modulus 8 Selected new host rhel7-1
1429045602.12 Effective host <unknown> modulus 9 Selected new host rhel7-2
1429045602.12 Effective host <unknown> modulus 6 Selected new host rhel7-2
1429045602.12 Effective host <unknown> modulus 7 Selected new host rhel7-2

Comment 1 Michael Bayer 2015-04-14 21:08:53 UTC

Created attachment 1014506
show hosts script

Comment 2 Michael Bayer 2015-04-14 21:09:14 UTC

Created attachment 1014507
haproxy config

Comment 3 Michael Bayer 2015-04-14 21:10:27 UTC

Note that the script also has a "delay" setting, which makes it space out connects and reconnects by N seconds. When this setting is greater than or equal to 0.01 seconds, the issue generally goes away and the stick table seems to always take effect:

[mbayer@thinkpad hammer]$ .venv/bin/python show_hosts.py -u root -H rhel7-1 -P 3456 -p root  -n 10 -d0.01
1429045809.14 Effective host <unknown> modulus 0 Selected new host rhel7-2
1429045809.15 Effective host <unknown> modulus 1 Selected new host rhel7-2
1429045809.16 Effective host <unknown> modulus 2 Selected new host rhel7-2
1429045809.17 Effective host <unknown> modulus 3 Selected new host rhel7-2
1429045809.18 Effective host <unknown> modulus 4 Selected new host rhel7-2
1429045809.19 Effective host <unknown> modulus 5 Selected new host rhel7-2
1429045809.2 Effective host <unknown> modulus 6 Selected new host rhel7-2
1429045809.21 Effective host <unknown> modulus 7 Selected new host rhel7-2
1429045809.22 Effective host <unknown> modulus 8 Selected new host rhel7-2
1429045809.23 Effective host <unknown> modulus 9 Selected new host rhel7-2
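
The timestamps above are 0.01 seconds apart, which is consistent with each worker waiting its index times the delay before its first connect. A minimal sketch of that kind of staggering follows. It is an assumption about how the spacing could be done, not the attached script's code: start_offsets and launch_staggered are hypothetical names, and threads are used here for brevity where the script uses processes.

```python
import threading
import time
from typing import Callable, List


def start_offsets(workers: int, delay: float) -> List[float]:
    """Seconds each worker waits before its first connect (assumed: index * delay)."""
    return [index * delay for index in range(workers)]


def launch_staggered(workers: int, delay: float,
                     connect: Callable[[int], None]) -> None:
    """Start one thread per worker, spacing the first connects by `delay` seconds."""
    def run(index: int, offset: float) -> None:
        time.sleep(offset)
        connect(index)

    threads = [threading.Thread(target=run, args=(index, offset))
               for index, offset in enumerate(start_offsets(workers, delay))]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

With delay set to 0, all workers connect at effectively the same instant, which is the case where the stick table is not honored in the runs above.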

Comment 5 Ryan O'Hara 2015-04-15 13:22:16 UTC

Are you absolutely sure that all traffic is going through the same haproxy node? I'm asking because we do not have stick tables synchronized across haproxy nodes, so if there was a moment when traffic hit a different haproxy node, it could get redirected to a different backend server. Make sense?

Could you test this with a single haproxy node and/or verify that haproxy logs on the other two nodes show no db traffic?

Comment 6 Michael Bayer 2015-04-15 13:36:46 UTC

> Are you absolutely sure that all traffic is going through the same haproxy node?

Yes. For the series of outputs you see here, I disabled it in Pacemaker and pointed the script directly at a single HAProxy node, just to make sure. On these runs you can see I'm pointing the script at the "rhel7-1" node directly with an alternate port.

Also, the difference between running the script with no delay, vs. with a delay, is like night and day.  When you first start the script and ten connections pile on simultaneously, it sends a few to other nodes 99% of the time.  Turn up the delay and this vanishes.

The only test I haven't done is to turn on SQL logging on all three MySQL instances and actually tail their logs to triple check that they are in fact all receiving SQL traffic, to confirm my SELECT of the hostname query on each server is not somehow being corrupted.  I guess you'd see traffic at the Galera level in any case, but this script just does a SELECT anyway.

Comment 7 Michael Bayer 2015-04-15 19:50:19 UTC

Let me point out that one thing that has *not* been tested is this behavior in any environment other than my QEMU VMs running RHEL7, hosted on a Fedora 21 laptop. It seems plausible that networking issues within any of these elements could contribute to what I'm seeing. If someone wants to point me at a different kind of hosted environment, I can try reproducing it elsewhere.

Comment 15 Michael Bayer 2015-05-13 19:11:46 UTC

OK, so as mentioned in the thread, there's no problem making the stick table size 1000. This is a table of IP addresses; we're talking less memory than it takes to store the text of a single large SQL statement. It's nothing.

With this config the system stays on one host at all times. On failover, it fails over to a new host, and then there's no failback, so there's never any split situation:

    stick-table type ip size 1000
    stick on dst
    server rhos-node1 rhel7-1:3306 check inter 1s port 9200 backup on-marked-down shutdown-sessions
    server rhos-node2 rhel7-2:3306 check inter 1s port 9200 backup on-marked-down shutdown-sessions
    server rhos-node3 rhel7-3:3306 check inter 1s port 9200 backup on-marked-down shutdown-sessions

We should get this config into our installer / HA setup documentation ASAP.

Comment 16 Ryan O'Hara 2015-05-21 16:00:01 UTC

Moving this to openstack-foreman-installer since it is not an haproxy bug, rather a configuration problem.

Comment 21 Crag Wolfe 2015-05-21 19:25:51 UTC

Merged into staypuft/ofi: https://github.com/redhat-openstack/astapor/pull/518

Comment 22 Ryan O'Hara 2015-07-20 22:00:32 UTC

(In reply to Crag Wolfe from comment #21)
> Merged into staypuft/ofi:
> https://github.com/redhat-openstack/astapor/pull/518

Was this fixed in the RHOS6 A4 release?

Comment 23 Jason Guiditta 2015-07-21 12:17:16 UTC

(In reply to Ryan O'Hara from comment #22)
> (In reply to Crag Wolfe from comment #21)
> > Merged into staypuft/ofi:
> > https://github.com/redhat-openstack/astapor/pull/518
> 
> Was this fixed in the RHOS6 A4 release?

I _think_ so, but will have to defer to Mike on this for OSP 6. The referenced change is merged and will be in the OSP 7 (ofi) release though (it is already in beta builds).

Comment 24 Jason Guiditta 2015-08-06 14:33:33 UTC

Backported to OSP 6

Comment 26 Alexander Chuzhoy 2015-08-18 17:27:48 UTC

Verified:

Environment:
openstack-foreman-installer-3.0.26-1.el7ost.noarch

Based on Comment #15


Verified /etc/haproxy/haproxy.cfg has the following:
listen galera
  bind 192.168.0.13:3306
  mode  tcp
  option  tcplog
  option  httpchk
  option  tcpka
  stick  on dst
  stick-table  type ip size 1000
  timeout  client 90m
  timeout  server 90m
  server pcmk-maca25400702876 192.168.0.7:3306  check inter 1s port 9200 backup on-marked-down shutdown-sessions
  server pcmk-maca25400702877 192.168.0.10:3306  check inter 1s port 9200 backup on-marked-down shutdown-sessions
  server pcmk-maca25400702875 192.168.0.9:3306  check inter 1s port 9200 backup on-marked-down shutdown-sessions
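
A check like the one above could be scripted along these lines. This is only a sketch under the assumption that the settings from Comment 15 are the ones to verify (a "stick on dst" line, an IP stick table, and every server marked backup with on-marked-down shutdown-sessions); check_galera_section is a hypothetical helper, not part of the installer.

```python
from typing import List


def check_galera_section(config_text: str) -> List[str]:
    """Return a list of problems found; an empty list means the section looks right."""
    # Collapse runs of whitespace so "stick  on dst" and "stick on dst" compare equal.
    lines = [" ".join(line.split()) for line in config_text.splitlines()]
    lines = [line for line in lines if line and not line.startswith("#")]

    problems: List[str] = []
    if "stick on dst" not in lines:
        problems.append("missing 'stick on dst'")
    if not any(line.startswith("stick-table type ip size ") for line in lines):
        problems.append("missing 'stick-table type ip size N'")

    servers = [line for line in lines if line.startswith("server ")]
    if not servers:
        problems.append("no server lines found")
    for server in servers:
        name = server.split()[1]
        if "backup" not in server.split():
            problems.append(f"server {name} is not marked 'backup'")
        if "on-marked-down shutdown-sessions" not in server:
            problems.append(f"server {name} lacks 'on-marked-down shutdown-sessions'")
    return problems
```

The function takes the text of the "listen galera" section only; extracting that section from a full /etc/haproxy/haproxy.cfg is left out of the sketch.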

Comment 28 errata-xmlrpc 2015-08-24 15:18:29 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1662.html