| Summary: | HAProxy configuration for mysql on port 9200 hence HAProxy will not be aware of galera-master service down | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | VIKRANT <vaggarwa> |
| Component: | galera | Assignee: | Damien Ciabrini <dciabrin> |
| Status: | CLOSED DUPLICATE | QA Contact: | Shai Revivo <srevivo> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 9.0 (Mitaka) | CC: | fdinitto, mbayer, srevivo, ushkalim, vaggarwa |
| Hardware: | All | ||
| OS: | Linux | ||
| Last Closed: | 2016-10-04 10:01:05 UTC | Type: | Bug |
The two settings reported as erroneous are there on purpose.

1) haproxy uses an intermediate script called clustercheck to check for the availability of the galera server on a node:

- clustercheck is run on demand via xinetd and its output is exposed over port 9200 on each controller node.
- port 9200 is regularly polled by haproxy to update its view of the galera servers available for forwarding queries to.

When "pcs cluster standby controller-1" is issued, the galera server on controller-1 stops. The next status check performed by haproxy (over port 9200) then reports that galera is not available on this node, and haproxy stops forwarding traffic to it.

2) "stick on dst" is set because we want active/passive behaviour when targeting mariadb/galera in an OpenStack deployment. Since all DB writes are sent to the DB VIP, HAProxy's stick table tells HAProxy to forward all traffic to the same backend until it fails.

*** This bug has been marked as a duplicate of bug 1389413 ***
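The health-check path described in point 1 can be illustrated with a small Python sketch. It is not HAProxy's implementation; it only mimics what an HTTP check against port 9200 does. It assumes that clustercheck answers with an HTTP 200 status when galera is available and a 503 status otherwise, and that the check passes on any 2xx/3xx status (HAProxy's default rule for `option httpchk`). The function names `check_passes` and `probe` are hypothetical.

```python
import socket


def check_passes(status_line: str) -> bool:
    """Return True if an HTTP status line would count as a passing check.

    Assumption: a 2xx or 3xx status passes, anything else fails.
    """
    parts = status_line.split()
    if len(parts) < 2 or not parts[0].startswith("HTTP/"):
        return False
    if not parts[1].isdigit():
        return False
    return 200 <= int(parts[1]) < 400


def probe(host: str, port: int = 9200, timeout: float = 2.0) -> bool:
    """Connect to the clustercheck port and evaluate the first response line."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # Minimal HTTP request, similar in spirit to an httpchk probe.
            sock.sendall(b"OPTIONS / HTTP/1.0\r\n\r\n")
            data = b""
            while b"\n" not in data:
                chunk = sock.recv(1024)
                if not chunk:
                    break
                data += chunk
    except OSError:
        # Connection refused or timed out: this is also a failed check.
        return False
    first_line = data.split(b"\n", 1)[0].decode("latin-1").strip()
    return check_passes(first_line)
```

The point of the sketch is that an open port is not the same as a passing check: xinetd can keep listening on 9200 while the response it relays still marks the node as unavailable. For example, `check_passes("HTTP/1.1 503 Service Unavailable")` is false even though the TCP connection succeeded.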
Description of problem:

Maintenance mode stops the node via "pcs cluster standby controller-1". However, HAProxy is not aware that the galera instance is down, because port 9200 is still being listened on by xinetd. Our local workaround is to shut down xinetd manually while a controller is in maintenance mode.

When galera is down on a controller and HAProxy is not aware of it, HAProxy will still direct DB connections to that controller. This triggers OpenStack API DB connection errors.

Version-Release number of selected component (if applicable):
RHEL OSP 9 and older releases.

How reproducible:
Every time.

Steps to Reproduce:
1. Put the pacemaker node in standby mode.
2. Check the port 9200 listening status.

Actual results:
Port 9200 is in use.

Expected results:
Port 9200 should not be listening.

Additional info:
Results from test lab.

1) Putting controller-0 in standby mode.

~~~
[root@overcloud-controller-0 ~]# pcs cluster standby overcloud-controller-0
~~~

2) galera and the rest of the services are stopped on the node.

~~~
Master/Slave Set: galera-master [galera]
    Masters: [ overcloud-controller-1 overcloud-controller-2 ]
    Stopped: [ overcloud-controller-0 ]
~~~

3) But port 9200 is still in use.

~~~
[root@overcloud-controller-0 ~]# lsof -ni:9200
COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
xinetd  2175 root    5u  IPv6  21972      0t0  TCP *:wap-wsp (LISTEN)
~~~

4) From the haproxy.cfg file, we can see that port 9200 should also stop listening for mysql to be completely shut down on the node.
~~~
listen mysql
  bind 192.168.124.20:3306 transparent
  option tcpka
  option httpchk
  stick on dst
  stick-table type ip size 1000
  timeout client 90m
  timeout server 90m
  server overcloud-controller-0 192.168.124.25:3306 backup check fall 5 inter 2000 on-marked-down shutdown-sessions port 9200 rise 2
  server overcloud-controller-1 192.168.124.24:3306 backup check fall 5 inter 2000 on-marked-down shutdown-sessions port 9200 rise 2
  server overcloud-controller-2 192.168.124.23:3306 backup check fall 5 inter 2000 on-marked-down shutdown-sessions port 9200 rise 2
~~~

The workaround is to manually stop xinetd on the node that has been put in standby mode.

Some recommendations from the customer to circumvent this issue:

You should change the HAProxy configuration so that port 9200 is listened on by HAProxy itself. Also, "stick on dst" is not necessary in our opinion.
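To make the `check fall 5 inter 2000 ... rise 2` parameters in the configuration above more concrete, here is a minimal Python sketch of the consecutive-check counters. It is a simplified model and not HAProxy code: the `ServerHealth` class and `checks_until_down` helper are hypothetical names, and the timing estimate assumes every check is spaced exactly `inter` milliseconds apart.

```python
from dataclasses import dataclass


@dataclass
class ServerHealth:
    """Simplified model of one backend's health state."""
    fall: int = 5      # consecutive failed checks before marking the server down
    rise: int = 2      # consecutive passed checks before marking it up again
    up: bool = True
    streak: int = 0    # consecutive results that contradict the current state

    def record(self, check_ok: bool) -> bool:
        """Feed one check result and return the resulting up/down state."""
        if check_ok == self.up:
            # Result agrees with the current state: reset the counter.
            self.streak = 0
        else:
            self.streak += 1
            threshold = self.fall if self.up else self.rise
            if self.streak >= threshold:
                self.up = not self.up
                self.streak = 0
        return self.up


def checks_until_down(results, fall=5, rise=2):
    """Return how many checks it takes to mark the server down, or None."""
    server = ServerHealth(fall=fall, rise=rise)
    for n, ok in enumerate(results, start=1):
        if not server.record(ok):
            return n
    return None


INTER_MS = 2000  # value of "inter" in the configuration above
n = checks_until_down([False] * 10)
print(n, "failed checks, roughly", n * INTER_MS / 1000, "seconds")
```

Under these assumptions, a node whose check starts failing is taken out of rotation after five consecutive failures, which is on the order of ten seconds with a 2000 ms interval. During that window connections may still be sent to the node, after which `on-marked-down shutdown-sessions` closes the remaining sessions.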