Bug 1235408

Summary: HAProxy should use clustercheck for galera nodes health checks
Product: Red Hat OpenStack Reporter: Marius Cornea <mcornea>
Component: openstack-tripleo-heat-templates Assignee: Giulio Fidente <gfidente>
Status: CLOSED ERRATA QA Contact: Marius Cornea <mcornea>
Severity: high Docs Contact:
Priority: high    
Version: 7.0 (Kilo) CC: bperkins, calfonso, dmacpher, mburns, ohochman, rhel-osp-director-maint, rrosa, yeylon
Target Milestone: beta Keywords: Triaged
Target Release: Director   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-0.8.6-17.el7ost Doc Type: Bug Fix
Doc Text:
HAProxy did not use clustercheck to check the status of its MariaDB backends. As a result, HAProxy forwarded requests to MariaDB nodes that responded to the TCP check but were not synchronized with the Galera cluster. With this fix, HAProxy uses clustercheck to check the status of the MariaDB backends and forwards requests to MariaDB nodes correctly.
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-08-05 13:55:49 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Marius Cornea 2015-06-24 17:43:22 UTC
Description of problem:
HAProxy should use clustercheck for Galera node health checks in order to get a valid status for the database servers.
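
To illustrate why an HTTP check against clustercheck is more informative than a plain TCP check, here is a minimal sketch of such a probe. It assumes clustercheck is exposed over HTTP (commonly via xinetd on port 9200) and answers 200 when the node is synced and 503 otherwise; the port and the function name galera_node_is_synced are assumptions for illustration, not taken from this report.

```python
import http.client


def galera_node_is_synced(host, port=9200, timeout=2.0):
    """Return True only if the clustercheck endpoint answers HTTP 200.

    Assumption: clustercheck is served over HTTP on `port` and replies
    200 for a synced node and 503 for a node that is not synced.
    A node that accepts TCP connections but replies 503 is reported as
    unhealthy, which a plain TCP check would miss.
    """
    try:
        conn = http.client.HTTPConnection(host, port, timeout=timeout)
        try:
            conn.request("GET", "/")
            return conn.getresponse().status == 200
        finally:
            conn.close()
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, or malformed reply: treat as down.
        return False
```

A load balancer applying the same rule would stop routing to a node as soon as clustercheck reports it as not synced, even though the MariaDB port itself still accepts connections.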

Version-Release number of selected component (if applicable):
openstack-puppet-modules-2015.1.7-2.el7ost.noarch

Additional info:
https://review.openstack.org/#/c/194960/2

Comment 3 Omri Hochman 2015-06-24 19:57:07 UTC
In an HA environment, galera_start failed after rebooting controller_0:

pcs status:
-------------
Failed actions:
    openstack-cinder-volume_start_0 on overcloud-controller-2 'not running' (7): call=314, status=complete, exit-reason='none', last-rc-change='Wed Jun 24 15:50:47 2015', queued=2001ms, exec=4ms
    galera_start_0 on overcloud-controller-0 'unknown error' (1): call=216, status=Timed Out, exit-reason='none', last-rc-change='Wed Jun 24 15:47:30 2015', queued=0ms, exec=120003ms
    redis_start_0 on overcloud-controller-0 'unknown error' (1): call=219, status=complete, exit-reason='none', last-rc-change='Wed Jun 24 15:49:35 2015', queued=0ms, exec=21910ms
    openstack-nova-scheduler_start_0 on overcloud-controller-0 'not running' (7): call=236, status=complete, exit-reason='none', last-rc-change='Wed Jun 24 15:50:30 2015', queued=2001ms, exec=2ms
    openstack-nova-consoleauth_start_0 on overcloud-controller-0 'not running' (7): call=238, status=complete, exit-reason='none', last-rc-change='Wed Jun 24 15:50:34 2015', queued=2001ms, exec=5ms
    openstack-cinder-api_start_0 on overcloud-controller-0 'not running' (7): call=242, status=complete, exit-reason='none', last-rc-change='Wed Jun 24 15:50:39 2015', queued=2002ms, exec=5ms
    neutron-server_start_0 on overcloud-controller-0 'not running' (7): call=246, status=complete, exit-reason='none', last-rc-change='Wed Jun 24 15:50:46 2015', queued=2001ms, exec=3ms
    openstack-cinder-volume_start_0 on overcloud-controller-1 'not running' (7): call=325, status=complete, exit-reason='none', last-rc-change='Wed Jun 24 15:50:41 2015', queued=2001ms, exec=2ms


PCSD Status:
  overcloud-controller-0: Online
  overcloud-controller-1: Online
  overcloud-controller-2: Online
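
For triage, the "Failed actions" block above can be grouped by node. The following is a rough sketch that parses lines in the format shown in this comment; the regular expression and the helper name failed_actions_by_node are illustrative assumptions and may not cover other pcs output formats.

```python
import re

# Matches one "Failed actions" line as printed above, e.g.
#   galera_start_0 on overcloud-controller-0 'unknown error' (1): call=216, status=Timed Out, ...
FAILED_ACTION = re.compile(
    r"^\s*(?P<action>\S+) on (?P<node>\S+) '(?P<reason>[^']*)' \((?P<rc>\d+)\):"
    r".*?status=(?P<status>[^,]+),"
)


def failed_actions_by_node(pcs_status_text):
    """Group failed actions by node as (action, reason, status) tuples."""
    by_node = {}
    for line in pcs_status_text.splitlines():
        match = FAILED_ACTION.match(line)
        if match:
            by_node.setdefault(match.group("node"), []).append(
                (
                    match.group("action"),
                    match.group("reason"),
                    match.group("status").strip(),
                )
            )
    return by_node
```

Applied to the output above, this would show galera_start_0 on overcloud-controller-0 as the only action with status "Timed Out".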

Comment 6 Giulio Fidente 2015-06-25 15:10:11 UTC
The puppet-tripleo change should be included in openstack-puppet-modules-2015.1.7-5.el7ost

Comment 8 Omri Hochman 2015-06-26 21:33:03 UTC
Verified: openstack-tripleo-heat-templates-0.8.6-19.el7ost.noarch

From /etc/haproxy/haproxy.cfg (opened with sudo vi):

listen cinder
  bind 192.168.0.6:8776
  option httpchk GET /
  server overcloud-controller-0 192.168.0.11:8776 check fall 5 inter 2000 rise 2
  server overcloud-controller-1 192.168.0.12:8776 check fall 5 inter 2000 rise 2
  server overcloud-controller-2 192.168.0.10:8776 check fall 5 inter 2000 rise 2


[heat-admin@overcloud-controller-1 ~]$ sudo grep httpchk /etc/haproxy/haproxy.cfg
  option httpchk GET /
  option httpchk GET /
  option httpchk GET /
  option httpchk GET /
  option httpchk GET /
  option httpchk GET /
  option httpchk GET /
  option httpchk GET /
  option httpchk
  option httpchk GET /
  option httpchk GET /
  option httpchk GET /
  option httpchk GET /
  option httpchk GET /
  option httpchk GET /info
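
The grep output above lists the httpchk lines but not the sections they belong to. As a hedged sketch, the helper below maps each section of a haproxy.cfg to whether it sets "option httpchk", which makes it easier to confirm that the Galera listener is covered. The function name sections_with_httpchk and the "mysql" section in the usage note are assumptions for illustration; the report does not show that section.

```python
def sections_with_httpchk(cfg_text):
    """Map each listen/frontend/backend section to whether it sets `option httpchk`."""
    result = {}
    current = None
    for raw in cfg_text.splitlines():
        # Drop trailing comments, then split the directive into words.
        words = raw.split("#", 1)[0].split()
        if not words:
            continue
        if words[0] in ("listen", "frontend", "backend") and len(words) > 1:
            current = words[1]
            result[current] = False
        elif words[0] in ("global", "defaults"):
            # Settings here do not belong to a proxy section of interest.
            current = None
        elif current is not None and words[:2] == ["option", "httpchk"]:
            result[current] = True
    return result
```

For example, calling sections_with_httpchk(open("/etc/haproxy/haproxy.cfg").read()) on a controller should report True for the database listener (named "mysql" here only as a guess) once the fix is in place.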

Comment 10 errata-xmlrpc 2015-08-05 13:55:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2015:1549