Bug 1299404
Summary: | Galera resource agent cannot connect to custom host/port
---|---
Product: | Red Hat Enterprise Linux 7
Component: | resource-agents
Version: | 7.3
Status: | CLOSED ERRATA
Severity: | urgent
Priority: | urgent
Reporter: | Damien Ciabrini <dciabrin>
Assignee: | Damien Ciabrini <dciabrin>
QA Contact: | Asaf Hirshberg <ahirshbe>
CC: | agk, ahirshbe, cfeist, cluster-maint, dciabrin, fdinitto, mbayer, mkolaja, oalbrigt, royoung, snagar
Target Milestone: | rc
Target Release: | ---
Keywords: | ZStream
Hardware: | Unspecified
OS: | Unspecified
Fixed In Version: | resource-agents-3.9.5-56.el7
Doc Type: | Bug Fix
Story Points: | ---
: | 1304711
Last Closed: | 2016-11-04 00:00:47 UTC
Type: | Bug
Regression: | ---
Mount Type: | ---
Documentation: | ---
Category: | ---
oVirt Team: | ---
Cloudforms Team: | ---
Bug Blocks: | 1304711
Description
Damien Ciabrini
2016-01-18 10:05:04 UTC
fixed upstream in https://github.com/ClusterLabs/resource-agents/pull/680

Hi,
We need instructions on how to test it in OSPd.
Thanks,
Ofer

Instructions for test

While the galera cluster is up, create a new user for the test:
    create user 'testuser'@'%' identified by 'test';

Allow the user to log in through a TCP socket:
    grant all privileges on *.* to 'testuser'@'%';

Stop galera:
    pcs resource disable galera

Once stopped, change /etc/my.cnf.d/galera.cnf to have mysql listen on port 4242 on an available interface:
    bind_address=0.0.0.0
    port=4242

Then configure /etc/sysconfig/clustercheck to make the resource agent contact mysqld on the TCP port above. Modify the entries so that the file looks like:
    MYSQL_USERNAME="testuser"
    MYSQL_PASSWORD="test"
    MYSQL_HOST="127.0.0.1"
    MYSQL_PORT="4242"
(don't set MYSQL_HOST to "localhost", otherwise mysql will try to connect via the UNIX socket)

Start galera again:
    pcs resource enable galera

After all machines are marked as master, verify that mysqld is listening on port 4242:
    netstat -tnlp | grep mysqld

If so, the test is working as expected, since the resource agent polls mysqld regularly to ensure that galera has started properly and is still running.

For cleaning up, stop galera, reset the two config files to their original values, and restart galera.
Once started, you can delete the test user:
    drop user 'testuser'@'%';

After following Damien Ciabrini's instructions in comment 9, the end result is that mysql is listening on the given port:

[root@overcloud-controller-0 ~]# netstat -tnlp | grep mysqld
tcp 0 0 0.0.0.0:4242 0.0.0.0:* LISTEN 20232/mysqld

All machines were marked as masters, but all openstack resources managed by pacemaker are in a failed state:

Master/Slave Set: galera-master [galera]
    Masters: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]

Failed Actions:
* openstack-nova-scheduler_start_0 on overcloud-controller-0 'not running' (7): call=493, status=complete, exitreason='none',
    last-rc-change='Tue Mar 8 10:09:18 2016', queued=0ms, exec=134380ms
* openstack-heat-engine_start_0 on overcloud-controller-0 'not running' (7): call=507, status=complete, exitreason='none',
    last-rc-change='Tue Mar 8 10:09:36 2016', queued=0ms, exec=2108ms
* openstack-cinder-scheduler_monitor_60000 on overcloud-controller-0 'not running' (7): call=483, status=complete, exitreason='none',
    last-rc-change='Tue Mar 8 10:12:11 2016', queued=0ms, exec=0ms
* openstack-cinder-api_monitor_60000 on overcloud-controller-0 'not running' (7): call=476, status=complete, exitreason='none',
    last-rc-change='Tue Mar 8 10:12:09 2016', queued=0ms, exec=0ms
* neutron-server_start_0 on overcloud-controller-0 'not running' (7): call=472, status=complete, exitreason='none',
    last-rc-change='Tue Mar 8 10:09:06 2016', queued=0ms, exec=120568ms
* openstack-nova-scheduler_start_0 on overcloud-controller-1 'not running' (7): call=483, status=complete, exitreason='none',
    last-rc-change='Tue Mar 8 10:09:19 2016', queued=0ms, exec=134432ms
* openstack-heat-engine_start_0 on overcloud-controller-1 'not running' (7): call=497, status=complete, exitreason='none',
    last-rc-change='Tue Mar 8 10:09:36 2016', queued=0ms, exec=2122ms
* openstack-cinder-scheduler_monitor_60000 on overcloud-controller-1 'not running' (7): call=475, status=complete, exitreason='none',
    last-rc-change='Tue Mar 8 10:12:11 2016', queued=0ms, exec=0ms
* openstack-cinder-api_monitor_60000 on overcloud-controller-1 'not running' (7): call=468, status=complete, exitreason='none',
    last-rc-change='Tue Mar 8 10:12:09 2016', queued=0ms, exec=0ms
* neutron-server_start_0 on overcloud-controller-1 'not running' (7): call=464, status=complete, exitreason='none',
    last-rc-change='Tue Mar 8 10:09:06 2016', queued=0ms, exec=120717ms
* openstack-nova-scheduler_start_0 on overcloud-controller-2 'not running' (7): call=474, status=complete, exitreason='none',
    last-rc-change='Tue Mar 8 10:09:19 2016', queued=0ms, exec=134380ms
* openstack-heat-engine_start_0 on overcloud-controller-2 'not running' (7): call=488, status=complete, exitreason='none',
    last-rc-change='Tue Mar 8 10:09:36 2016', queued=0ms, exec=2108ms
* openstack-cinder-scheduler_monitor_60000 on overcloud-controller-2 'not running' (7): call=466, status=complete, exitreason='none',
    last-rc-change='Tue Mar 8 10:12:11 2016', queued=0ms, exec=0ms
* openstack-cinder-api_monitor_60000 on overcloud-controller-2 'not running' (7): call=459, status=complete, exitreason='none',
    last-rc-change='Tue Mar 8 10:12:09 2016', queued=0ms, exec=0ms
* neutron-server_start_0 on overcloud-controller-2 'not running' (7): call=455, status=complete, exitreason='none',
    last-rc-change='Tue Mar 8 10:09:06 2016', queued=0ms, exec=120576ms

After reverting the changes, all resources started. So I want to make sure that this is the expected result of the changes made.

System info:
RHEL-OSP director 8.0 puddle - 2016-03-03.1
[root@overcloud-controller-0 ~]# rpm -qa|grep resource-agents
resource-agents-3.9.5-54.el7_2.6.x86_64

Asaf, I think some steps are missing in #c9. Could you please check whether haproxy is listening as expected on port 3306 on the VIP, and that the haproxy.cfg setting forwards requests to port 4242, i.e.
config looks similar to the lines below:

    server overcloud-controller-0 192.0.2.21:4242 backup check fall 5 inter 2000 on-marked-down shutdown-sessions port 9200 rise 2
    server overcloud-controller-1 192.0.2.20:4242 backup check fall 5 inter 2000 on-marked-down shutdown-sessions port 9200 rise 2
    server overcloud-controller-2 192.0.2.22:4242 backup check fall 5 inter 2000 on-marked-down shutdown-sessions port 9200 rise 2

Verified on RHEL-OSP director 8.0 puddle - 2016-03-03.1

[root@overcloud-controller-0 ~]# rpm -qa|grep resource-agents
resource-agents-3.9.5-54.el7_2.6.x86_64

1) Create a new user in mysql and grant it all privileges:
   i) create user 'testuser'@'%' identified by 'test';
   ii) grant all privileges on *.* to 'testuser'@'%';
2) Stop the galera cluster using pacemaker:
   i) pcs resource disable galera
3) After galera is down, change the port mysql listens on in /etc/my.cnf.d/galera.cnf:
   i) vi /etc/my.cnf.d/galera.cnf :
      bind_address=0.0.0.0
      port=4242
4) Then configure /etc/sysconfig/clustercheck to make the resource agent contact mysqld on the TCP port above. Modify the entries so that the file looks like:
   i) vi /etc/sysconfig/clustercheck
      MYSQL_USERNAME="testuser"
      MYSQL_PASSWORD="test"
      MYSQL_HOST="127.0.0.1"
      MYSQL_PORT="4242"
5) Also change the ports haproxy forwards to, in the mysql section of /etc/haproxy/haproxy.cfg:
   i) vi /etc/haproxy/haproxy.cfg
      listen mysql
        bind 172.17.0.11:3306 transparent
        option tcpka
        option httpchk
        stick on dst
        stick-table type ip size 1000
        timeout client 90m
        timeout server 90m
        server overcloud-controller-0 172.17.0.16:4242 backup check fall 5 inter 2000 on-marked-down shutdown-sessions port 9200 rise 2
        server overcloud-controller-1 172.17.0.17:4242 backup check fall 5 inter 2000 on-marked-down shutdown-sessions port 9200 rise 2
        server overcloud-controller-2 172.17.0.15:4242 backup check fall 5 inter 2000 on-marked-down shutdown-sessions port 9200 rise 2
6) Restart the haproxy service:
   i) systemctl restart haproxy
   ii) systemctl status haproxy
7) Start the galera cluster using pacemaker:
   i) pcs resource enable galera
8) Verify that the galera cluster is up and the machines are marked as master; also make sure that all openstack resources are started.
   i) pcs status
   ii) netstat -tupln |grep haproxy # haproxy should still listen on the old port (3306)
   iii) clustercheck

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2174.html
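
The clustercheck settings and the listener check from the test steps above can be sketched as two small shell helpers. This is only an illustration, not code taken from the galera resource agent or from the upstream pull request: the function names build_mysql_opts and mysqld_listens_on are hypothetical, and the sketch assumes the four MYSQL_* variables are set as in /etc/sysconfig/clustercheck. The --user, --password, --host and --port flags are standard mysql client options.

```shell
#!/bin/sh
# Hypothetical helper (not from the resource agent): turn clustercheck-style
# variables into mysql client options. Unset or empty variables are skipped.
build_mysql_opts() {
    opts=""
    if [ -n "${MYSQL_USERNAME:-}" ]; then opts="$opts --user=$MYSQL_USERNAME"; fi
    if [ -n "${MYSQL_PASSWORD:-}" ]; then opts="$opts --password=$MYSQL_PASSWORD"; fi
    if [ -n "${MYSQL_HOST:-}" ]; then opts="$opts --host=$MYSQL_HOST"; fi
    if [ -n "${MYSQL_PORT:-}" ]; then opts="$opts --port=$MYSQL_PORT"; fi
    # Strip the single leading space before printing.
    printf '%s\n' "${opts# }"
}

# Hypothetical helper: read `netstat -tnlp` output on stdin and succeed only
# if a mysqld process has a listening socket on the given port.
mysqld_listens_on() {
    awk -v port="$1" '
        $6 == "LISTEN" && $7 ~ /mysqld/ && $4 ~ (":" port "$") { found = 1 }
        END { exit !found }
    '
}

# Values from the test instructions in this report.
MYSQL_USERNAME="testuser"
MYSQL_PASSWORD="test"
MYSQL_HOST="127.0.0.1"   # not "localhost", which would select the UNIX socket
MYSQL_PORT="4242"

build_mysql_opts
# prints: --user=testuser --password=test --host=127.0.0.1 --port=4242

# Check against the netstat line quoted in this report.
printf '%s\n' 'tcp 0 0 0.0.0.0:4242 0.0.0.0:* LISTEN 20232/mysqld' \
    | mysqld_listens_on 4242 && echo "mysqld is listening on 4242"
```

On a controller, the second helper would be fed live output, for example `netstat -tnlp | mysqld_listens_on 4242`. Matching on the port suffix of the local address keeps the check independent of which interface mysqld is bound to.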