Bug 1122314
| Summary: | RabbitMQ clustering fails depending on which node has the VIP | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | John Eckersberg <jeckersb> |
| Component: | openstack-foreman-installer | Assignee: | John Eckersberg <jeckersb> |
| Status: | CLOSED ERRATA | QA Contact: | Leonid Natapov <lnatapov> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | Foreman (RHEL 6) | CC: | jguiditt, lnatapov, mburns, morazi, rhos-maint, yeylon |
| Target Milestone: | ga | ||
| Target Release: | Installer | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | openstack-foreman-installer-2.0.17-1.el6ost | Doc Type: | Bug Fix |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2014-08-21 18:06:05 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
openstack-foreman-installer-2.0.20-1.el6ost
Verified according to this:

1. Review the puppet logs for the controllers. One host will not run the exec for `i-am-first-rabbitmq-node-OR-rabbitmq-is-up-on-first-node`; the other two will. Note which one does not.
2. Look at the `cluster_nodes` list in /etc/rabbitmq/rabbitmq.config and note the first node in the list. It should match the node above.
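Step 2 amounts to pulling the first entry out of the `cluster_nodes` Erlang term in rabbitmq.config. A minimal regex-based sketch (Python; the sample config line mirrors RabbitMQ's classic config syntax, and the node names are hypothetical):

```python
import re

def first_cluster_node(config_text):
    """Return the first node listed in the cluster_nodes term of a
    rabbitmq.config snippet, or None if the term is not found.
    Regex-based sketch only; not a real Erlang-term parser."""
    m = re.search(r"cluster_nodes[^\[]*\[([^\]]*)\]", config_text)
    if not m:
        return None
    first = m.group(1).split(",")[0].strip()
    return first.strip("'")

sample = "{cluster_nodes, {['rabbit@node1', 'rabbit@node2', 'rabbit@node3'], disc}}"
print(first_cluster_node(sample))  # → rabbit@node1
```

The node printed here should match the controller whose puppet run skipped the exec in step 1.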
*** Bug 1120288 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1090.html
I will defer to the comment I just stuck in the code to explain:

```
# This is very subtle but important.  The node that is first in
# lb_backend_server_names needs to come up first.  The names
# array and the addrs array are ordered the same, e.g. names[i]
# is the same host as addrs[i] for all i.  So the IP we pull off
# the front of addrs will be on the first host in names.  This
# matters because the names array is what generates the
# cluster_nodes value in the rabbitmq config.  When a node
# starts for the first time and it is configured to cluster, it
# tries to join each node in cluster_nodes in succession.
# Whichever node is first to start will try to join a cluster
# with the others, time out against each, and then start a new
# cluster with only itself as a member.  Each additional host to
# start will then try each host in order until it gets to a node
# which has already been started, and joins the cluster.
#
# However, there is a problem if the first node to start is not
# the first node in the list.  Suppose the third node in the
# list starts first, and then the first two nodes in the list
# start up in parallel.  The first node will attempt to cluster
# with the second node (it realizes that the first node is
# itself and skips it).  The second node tries to cluster with
# the first node.  Because neither host has an initialized
# cluster, the clustering operation will fail on both nodes.
#
# By forcing the first node in the config to come up first, the
# others can be started in parallel and be guaranteed to join
# the cluster via the first node and its running cluster.
```

Presently, RabbitMQ starts first on whichever node has the VIP. If that node is not the first in the cluster_nodes list, the problem described above occurs. Change the logic to start the service on the first node before the others.
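The ordering rule in that comment can be illustrated with a toy model (Python; hypothetical node names, and a deliberately simplified stand-in for RabbitMQ's actual join logic — in particular, the assumption that joining a peer that is still booting fails outright):

```python
def boot(node, cluster_nodes, state):
    """Simplified model of RabbitMQ first-boot auto-clustering.
    state maps node -> "down", "booting", or the set of members of
    the cluster it has joined.  Each peer is tried in cluster_nodes
    order, as the Bugzilla comment describes."""
    for peer in cluster_nodes:
        if peer == node:
            continue                 # skip ourselves
        peer_state = state.get(peer, "down")
        if peer_state == "down":
            continue                 # timeout; try the next peer
        if peer_state == "booting":
            return "failed"          # peer has no initialized cluster yet
        peer_state.add(node)         # peer is clustered: join its cluster
        state[node] = peer_state
        return "joined"
    state[node] = {node}             # no peer reachable: new single-node cluster
    return "new-cluster"

nodes = ["rabbit1", "rabbit2", "rabbit3"]  # hypothetical cluster_nodes order

# Fixed behaviour: the first node in cluster_nodes starts first,
# then the rest can come up in any order and always find it.
state = {}
assert boot("rabbit1", nodes, state) == "new-cluster"
assert boot("rabbit2", nodes, state) == "joined"
assert boot("rabbit3", nodes, state) == "joined"
assert state["rabbit1"] is state["rabbit2"] is state["rabbit3"]

# Buggy race: rabbit3 (the VIP holder) starts first, then rabbit1
# and rabbit2 boot in parallel and each hits the other mid-boot.
state = {"rabbit3": {"rabbit3"}, "rabbit1": "booting", "rabbit2": "booting"}
assert boot("rabbit1", nodes, state) == "failed"  # hits rabbit2, still booting
assert boot("rabbit2", nodes, state) == "failed"  # hits rabbit1, still booting
```

Starting the first listed node before the others removes the race: every later node's first reachable peer is an already-initialized cluster.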