Bug 1311025

Summary:	rabbitmq-cluster resource instance may not be able to rejoin the rabbitmq cluster.
Product:	Red Hat OpenStack	Reporter:	Michele Baldessari <michele>
Component:	openstack-tripleo-heat-templates	Assignee:	Jiri Stransky <jstransk>
Status:	CLOSED ERRATA	QA Contact:	Marius Cornea <mcornea>
Severity:	urgent	Docs Contact:
Priority:	urgent
Version:	7.0 (Kilo)	CC:	ealcaniz, jstransk, mburns, mcornea, nlevinki, plemenko, rhel-osp-director-maint, srevivo
Target Milestone:	---
Target Release:	7.0 (Kilo)
Hardware:	All
OS:	All
Whiteboard:
Fixed In Version:	openstack-tripleo-heat-templates-0.8.6-125.el7ost	Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:	1311005	Environment:
Last Closed:	2016-07-06 15:05:40 UTC	Type:	Bug
Bug Depends On:	1311005
Bug Blocks:	1247303

Description Michele Baldessari 2016-02-23 08:12:44 UTC

+++ This bug was initially created as a clone of Bug #1311005 +++

Description of problem:
Sometimes, when a controller is rebooted, the node is unable to rejoin the
rabbitmq cluster. Fixing this issue requires both of the following:

1) The resource agent needs the fix tracked in BZ https://bugzilla.redhat.com/show_bug.cgi?id=1247303 (see that BZ for more history on this issue).

2) The meta parameter "notify=true" needs to be added to the rabbitmq resource on fresh installs and on updates.


Note that if the THT fix lands before the resource agent fix, nothing changes
(i.e. the cluster keeps behaving as before).
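
For the update case in step 2, a minimal sketch of how the meta parameter could be applied to an already running cluster. The function name set_rabbitmq_notify and the PCS override variable are hypothetical and exist only for illustration. The dump/modify/push sequence is an assumption about a reasonable workflow, not a copy of the actual THT change.

```shell
#!/bin/bash
# Hypothetical helper (not taken from THT): set notify=true on the
# rabbitmq resource by editing an offline copy of the CIB.
# PCS may be overridden with a stub for dry runs; defaults to the pcs CLI.
PCS="${PCS:-pcs}"

set_rabbitmq_notify() {
    local dumpfile
    dumpfile=$(mktemp) || return 1
    # Dump the CIB, update the resource in the dump, then push it back.
    "$PCS" cluster cib "$dumpfile" &&
    "$PCS" -f "$dumpfile" resource update rabbitmq meta notify=true &&
    "$PCS" cluster cib-push "$dumpfile"
    local rc=$?
    rm -f "$dumpfile"
    return $rc
}
```

Calling set_rabbitmq_notify as root on one controller should be enough, since the CIB is shared by all cluster nodes.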

Comment 6 Peter Lemenkov 2016-06-24 13:33:41 UTC

If someone still sees this issue, then please test this package:

resource-agents-3.9.5-76.el7

Comment 7 Marius Cornea 2016-06-27 09:09:38 UTC

Checking THT with the latest 7 build:

[stack@undercloud ~]$ rpm -qa | grep tripleo-heat-templates
openstack-tripleo-heat-templates-0.8.6-127.el7ost.noarch

[stack@undercloud ~]$ grep -A6 "pacemaker::resource::ocf { 'rabbitmq':" /usr/share/openstack-tripleo-heat-templates/puppet/manifests/overcloud_controller_pacemaker.pp 
    pacemaker::resource::ocf { 'rabbitmq':
      ocf_agent_name  => 'heartbeat:rabbitmq-cluster',
      resource_params => 'set_policy=\'ha-all ^(?!amq\.).* {"ha-mode":"all"}\'',
      clone_params    => 'ordered=true interleave=true',
      meta_params     => 'notify=true',
      require         => Class['::rabbitmq'],
    }
[stack@undercloud ~]$ grep 'rabbitmq meta notify=true' /usr/share/openstack-tripleo-heat-templates/extraconfig/tasks/yum_update.sh
    pcs -f $pacemaker_dumpfile resource update rabbitmq meta notify=true

This results in the following resource on the deployed overcloud:

[root@overcloud-controller-0 heat-admin]# pcs resource show rabbitmq
 Resource: rabbitmq (class=ocf provider=heartbeat type=rabbitmq-cluster)
  Attributes: set_policy="ha-all ^(?!amq\.).* {"ha-mode":"all"}" 
  Meta Attrs: notify=true 
  Operations: start interval=0s timeout=100 (rabbitmq-start-interval-0s)
              stop interval=0s timeout=90 (rabbitmq-stop-interval-0s)
              monitor interval=10 timeout=40 (rabbitmq-monitor-interval-10)
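
The manual check above could be scripted roughly as follows. This is only a sketch: the function name has_notify_true is hypothetical, and it assumes the "Meta Attrs:" line format shown in the output above.

```shell
#!/bin/bash
# Hypothetical check: reads `pcs resource show rabbitmq` output on stdin
# and succeeds only if the "Meta Attrs:" line contains notify=true.
has_notify_true() {
    grep -E '^[[:space:]]*Meta Attrs:' | grep -qw 'notify=true'
}
```

Usage on a controller would look like: pcs resource show rabbitmq | has_notify_true && echo "notify enabled"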

Comment 9 errata-xmlrpc 2016-07-06 15:05:40 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1387