1122314 – RabbitMQ clustering fails depending on which node has the VIP

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-foreman-installer
Sub Component:
Version:	Foreman (RHEL 6)
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	ga
Target Release:	Installer
Assignee:	John Eckersberg
QA Contact:	Leonid Natapov
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1120288
Depends On:
Blocks:

Reported:	2014-07-22 22:35 UTC by John Eckersberg
Modified:	2014-08-21 18:06 UTC
CC List:	6 users
Fixed In Version:	openstack-foreman-installer-2.0.17-1.el6ost
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2014-08-21 18:06:05 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2014:1090	0	normal	SHIPPED_LIVE	Red Hat Enterprise Linux OpenStack Platform Enhancement Advisory	2014-08-22 15:28:08 UTC

Description John Eckersberg 2014-07-22 22:35:30 UTC

I will defer to the comment I just stuck in the code to explain:

# This is very subtle but important. The node that is first in
# lb_backend_server_names needs to come up first. The names
# array and the addrs array are ordered the same, e.g. names[i]
# is the same host as addrs[i] for all i. So the IP we pull off
# the front of addrs will be on the first host in names. This
# matters because the names array is what generates the
# cluster_nodes value in the rabbitmq config. When a node
# starts the first time and it is configured to cluster, it
# tries to join each node in cluster_nodes in succession.
# Whichever node is first to start will try to join a cluster
# with the others, time out against each, and then start a new
# cluster with only itself as a member. Each additional host to
# start will then try each host in order until it gets to a node
# which has already been started, and join the cluster.
#
# However, there is a problem if the first node to start is not
# the first node in the list. Suppose the third node in the
# list starts first, and then the first two nodes in the list
# start up in parallel. The first node will attempt to cluster
# with the second node (it realizes that the first node is
# itself and skips it). The second node tries to cluster with
# the first node. Because neither host has an initialized
# cluster, the clustering operation will fail on both nodes.
#
# By forcing the first node in the config to come up first, the
# others can be started in parallel and be guaranteed to join
# the cluster via the first node and its running cluster.

Presently RabbitMQ starts first on whatever node has the VIP. If that node is not the first in the cluster_nodes list, the problem described above can occur. Change the logic so that the service starts on the first node in the list before the others.
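
To make the race easier to see, here is a toy model of the join behaviour as the code comment describes it. This is not RabbitMQ code and not the installer's logic. It only encodes the rules stated in the comment: a node skips itself, times out against nodes that are down, joins the first node that already has a cluster, and fails against a node that is up but has no initialized cluster. The function name `simulate`, the outcome strings, and the node names `rabbit@c1` to `rabbit@c3` are made up for illustration.

```python
def simulate(cluster_nodes, start_batches):
    """Toy model of the startup race. Nodes in the same batch start in parallel.

    Returns a dict mapping node -> "founded", "joined <founder>", or "failed".
    """
    clustered = {}  # node -> founder of the cluster it belongs to
    outcome = {}
    for batch in start_batches:
        starting = set(batch)
        results = {}
        for node in batch:
            result = None
            for peer in cluster_nodes:
                if peer == node:
                    continue  # a node recognizes itself and skips it
                if peer in clustered:
                    result = "joined " + clustered[peer]
                    break
                if peer in starting:
                    # Peer is up but has no initialized cluster yet.
                    result = "failed"
                    break
                # Otherwise the peer is down: time out and try the next one.
            if result is None:
                result = "founded"  # nobody reachable, start a new cluster
            results[node] = result
        # Commit only after the whole batch, so parallel starters
        # see each other as uninitialized.
        for node, result in results.items():
            outcome[node] = result
            if result == "founded":
                clustered[node] = node
            elif result.startswith("joined "):
                clustered[node] = result[len("joined "):]
    return outcome


nodes = ["rabbit@c1", "rabbit@c2", "rabbit@c3"]

# Third node (e.g. the VIP holder) starts first, then the first two in parallel:
# c1 and c2 each hit the other while uninitialized, and both fail.
bad = simulate(nodes, [["rabbit@c3"], ["rabbit@c1", "rabbit@c2"]])

# First node in the list starts first, the rest in parallel:
# both c2 and c3 join via c1.
good = simulate(nodes, [["rabbit@c1"], ["rabbit@c2", "rabbit@c3"]])
```

In this model the only ordering that is safe for a parallel start of the remaining nodes is the one where the head of cluster_nodes comes up alone first, which is what the proposed change enforces.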

Comment 2 John Eckersberg 2014-07-23 01:10:58 UTC

https://github.com/redhat-openstack/astapor/pull/326

Comment 7 Leonid Natapov 2014-08-18 10:53:09 UTC

openstack-foreman-installer-2.0.20-1.el6ost

Verified according to this:


    Review the puppet logs for the controllers. One host will not run the exec for i-am-first-rabbitmq-node-OR-rabbitmq-is-up-on-first-node. The other two will. Note which one does not.

    Look at the cluster_nodes list in /etc/rabbitmq/rabbitmq.config, and note the first node in the list. This should match the node above.
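
The second check can be scripted. The sketch below pulls the first entry out of the cluster_nodes term so it can be compared against the host noted in the first step. It assumes the node names are written as single-quoted atoms; the helper name `first_cluster_node` and the sample config text are hypothetical and not taken from this bug.

```python
import re


def first_cluster_node(config_text):
    """Return the first node name in the cluster_nodes list, or None.

    Assumes node names are single-quoted atoms, e.g. 'rabbit@host1'.
    """
    # Capture everything between "cluster_nodes," and the end of the list.
    match = re.search(r"\{\s*cluster_nodes\s*,(.*?)\]", config_text, re.DOTALL)
    if match is None:
        return None
    names = re.findall(r"'([^']+)'", match.group(1))
    return names[0] if names else None


# Hypothetical sample; on a controller, read /etc/rabbitmq/rabbitmq.config instead.
sample = """
[
  {rabbit, [
    {cluster_nodes, {['rabbit@c1', 'rabbit@c2', 'rabbit@c3'], disc}}
  ]}
].
"""
print(first_cluster_node(sample))
```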

Comment 8 John Eckersberg 2014-08-19 20:26:04 UTC

*** Bug 1120288 has been marked as a duplicate of this bug. ***

Comment 9 errata-xmlrpc 2014-08-21 18:06:05 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1090.html
