+++ This bug was initially created as a clone of Bug #1311005 +++

Description of problem:
Sometimes, when a controller is rebooted, the node is unable to join the rabbitmq cluster. Two things need to happen for this issue to be fixed:

1) BZ https://bugzilla.redhat.com/show_bug.cgi?id=1247303 needs a fix in the resource agent (see that BZ for more history about this issue).
2) We need to add the meta parameter "notify=true" on fresh installs and on updates.

Note that if the fix to THT makes it in before the fix to the resource agent, nothing changes (i.e. things keep working as before).
If anyone still sees this issue, please test this package: resource-agents-3.9.5-76.el7
Checking THT with the latest 7 build:

[stack@undercloud ~]$ rpm -qa | grep tripleo-heat-templates
openstack-tripleo-heat-templates-0.8.6-127.el7ost.noarch

[stack@undercloud ~]$ grep -A6 "pacemaker::resource::ocf { 'rabbitmq':" /usr/share/openstack-tripleo-heat-templates/puppet/manifests/overcloud_controller_pacemaker.pp
    pacemaker::resource::ocf { 'rabbitmq':
      ocf_agent_name => 'heartbeat:rabbitmq-cluster',
      resource_params => 'set_policy=\'ha-all ^(?!amq\.).* {"ha-mode":"all"}\'',
      clone_params => 'ordered=true interleave=true',
      meta_params => 'notify=true',
      require => Class['::rabbitmq'],
    }

[stack@undercloud ~]$ grep 'rabbitmq meta notify=true' /usr/share/openstack-tripleo-heat-templates/extraconfig/tasks/yum_update.sh
    pcs -f $pacemaker_dumpfile resource update rabbitmq meta notify=true

Results in the following resource on deployed overcloud:

[root@overcloud-controller-0 heat-admin]# pcs resource show rabbitmq
 Resource: rabbitmq (class=ocf provider=heartbeat type=rabbitmq-cluster)
  Attributes: set_policy="ha-all ^(?!amq\.).* {"ha-mode":"all"}"
  Meta Attrs: notify=true
  Operations: start interval=0s timeout=100 (rabbitmq-start-interval-0s)
              stop interval=0s timeout=90 (rabbitmq-stop-interval-0s)
              monitor interval=10 timeout=40 (rabbitmq-monitor-interval-10)
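For anyone who wants to repeat the check above on an already deployed controller, here is a rough sketch of how it could be scripted. It uses only the two pcs commands quoted in this report (pcs resource show rabbitmq and pcs resource update rabbitmq meta notify=true). The function names has_notify_true and ensure_notify are made up for illustration, and the script assumes the "Meta Attrs:" line looks like the output pasted above; other pcs versions may format it differently.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of pcs or THT.

# Reads `pcs resource show <resource>` output on stdin and succeeds if the
# "Meta Attrs:" line contains notify=true (assumes the output format
# shown in this bug report).
has_notify_true() {
    grep -Eq 'Meta Attrs:.*[[:space:]]notify=true([[:space:]]|$)'
}

# Sets notify=true on the rabbitmq resource only when it is missing.
# Needs to run as root on a controller node where pcs is available.
ensure_notify() {
    if pcs resource show rabbitmq | has_notify_true; then
        echo "rabbitmq: notify=true already set"
    else
        echo "rabbitmq: adding meta notify=true"
        pcs resource update rabbitmq meta notify=true
    fi
}
```

Source the file and call ensure_notify on one controller. As noted in the description, setting the meta parameter without the fixed resource agent installed should not change behaviour.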
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1387