Description of problem:

In a three-node cluster configured to auto-correct network partitions (cluster_partition_handling set to pause_minority), the cluster normally resolves a partition by shutting down the TCP listener on the minority node and restarting it once connectivity is regained. If connectivity is lost for roughly 60 seconds or more, this recovery process is never initiated and the partition remains. The impact on OpenStack HA, with queues mirrored across the cluster, is that two separate masters stay active: a split-brain situation that is only resolved by restarting rabbitmq-server on the minority node.

Also see:
http://rabbitmq.1065348.n5.nabble.com/Problems-with-the-cluster-partition-handling-pause-minority-option-td33863.html

Version-Release number of selected component (if applicable):
rabbitmq-server-3.3.5-3.el7ost.noarch

How reproducible:

Install a three-node rabbitmq cluster with this configuration file in /etc/rabbitmq/rabbitmq.config:

% config to configure clustering with defaults, except:
% network partition response (ignore), management console on, management agent on
[
 {rabbit, [
   {cluster_nodes, {['rabbit@rabbit1', 'rabbit@rabbit2', 'rabbit@rabbit3'], disc}},
   {cluster_partition_handling, pause_minority},
   {default_user, <<"guest">>},
   {default_pass, <<"guest">>}
 ]},
 {rabbitmq_management, [{listener, [{port, 15672}]}]},
 {rabbitmq_management_agent, [ {force_fine_statistics, true} ] },
 {kernel, [ ]}
].

Steps to Reproduce:

iptables -A INPUT -s rabbit1 -j DROP; iptables -A OUTPUT -d rabbit1 -j DROP ; iptables -A INPUT -s rabbit2 -j DROP; iptables -A OUTPUT -d rabbit2 -j DROP
sleep 90; systemctl restart firewalld

Result (outage started at Tue Feb 3 18:10:40 CET 2015)

---> node 1

=ERROR REPORT==== 3-Feb-2015::18:14:34 ===
** Node rabbit@rabbit3 not responding **
** Removing (timedout) connection **

=INFO REPORT==== 3-Feb-2015::18:14:34 ===
rabbit on node rabbit@rabbit3 down

=INFO REPORT==== 3-Feb-2015::18:14:34 ===
node rabbit@rabbit3 down: net_tick_timeout

---> node 2

=ERROR REPORT==== 3-Feb-2015::18:14:21 ===
** Node rabbit@rabbit3 not responding **
** Removing (timedout) connection **

=INFO REPORT==== 3-Feb-2015::18:14:21 ===
rabbit on node rabbit@rabbit3 down

=INFO REPORT==== 3-Feb-2015::18:14:21 ===
node rabbit@rabbit3 down: net_tick_timeout

=ERROR REPORT==== 3-Feb-2015::18:14:33 ===
Mnesia(rabbit@rabbit2): ** ERROR ** mnesia_event got {inconsistent_database, running_partitioned_network, rabbit@rabbit3}

---> node 3

=INFO REPORT==== 3-Feb-2015::18:14:20 ===
rabbit on node rabbit@rabbit2 down

=ERROR REPORT==== 3-Feb-2015::18:14:32 ===
Mnesia(rabbit@rabbit3): ** ERROR ** mnesia_event got {inconsistent_database, running_partitioned_network, rabbit@rabbit2}

=INFO REPORT==== 3-Feb-2015::18:14:33 ===
node rabbit@rabbit2 down: connection_closed

=INFO REPORT==== 3-Feb-2015::18:14:33 ===
rabbit on node rabbit@rabbit1 down

=INFO REPORT==== 3-Feb-2015::18:14:33 ===
node rabbit@rabbit1 down: connection_closed

Fix: no auto recovery occurs, so the partition has to be cleared by restarting rabbitmq-server on the minority node:

[root@rabbit3 ~]# rabbitmqctl environment | grep cluster
{cluster_nodes,{[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit3],disc}},
{cluster_partition_handling,pause_minority},

[root@rabbit3 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit3 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit3]}]},
 {running_nodes,[rabbit@rabbit3]},
 {cluster_name,<<"rabbit@rabbit1">>},
 {partitions,[{rabbit@rabbit3,[rabbit@rabbit2]}]}]
...done.
[root@rabbit3 ~]# systemctl restart rabbitmq-server
[root@rabbit3 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit3 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit3]}]},
 {running_nodes,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit3]},
 {cluster_name,<<"rabbit@rabbit1">>},
 {partitions,[]}]
...done.

Expected results:

=INFO REPORT==== 3-Feb-2015::17:53:03 ===
rabbit on node rabbit@rabbit1 down

=INFO REPORT==== 3-Feb-2015::17:53:06 ===
node rabbit@rabbit1 down: connection_closed

=WARNING REPORT==== 3-Feb-2015::17:53:06 ===
Cluster minority status detected - awaiting recovery

=INFO REPORT==== 3-Feb-2015::17:53:06 ===
rabbit on node rabbit@rabbit2 down

=INFO REPORT==== 3-Feb-2015::17:53:06 ===
Stopping RabbitMQ

=INFO REPORT==== 3-Feb-2015::17:53:06 ===
node rabbit@rabbit2 down: connection_closed

=WARNING REPORT==== 3-Feb-2015::17:53:06 ===
Cluster minority status detected - awaiting recovery

=INFO REPORT==== 3-Feb-2015::17:53:13 ===
Statistics database started.

=INFO REPORT==== 3-Feb-2015::17:53:13 ===
stopped TCP Listener on 192.168.122.83:5672

=ERROR REPORT==== 3-Feb-2015::17:53:31 ===
Mnesia(rabbit@rabbit3): ** ERROR ** mnesia_event got {inconsistent_database, starting_partitioned_network, rabbit@rabbit1}

=ERROR REPORT==== 3-Feb-2015::17:53:31 ===
Mnesia(rabbit@rabbit3): ** ERROR ** mnesia_event got {inconsistent_database, starting_partitioned_network, rabbit@rabbit2}

=INFO REPORT==== 3-Feb-2015::17:53:31 ===
Starting RabbitMQ 3.3.5 on Erlang R16B03
Copyright (C) 2007-2014 GoPivotal, Inc.
Licensed under the MPL. See http://www.rabbitmq.com/

=INFO REPORT==== 3-Feb-2015::17:53:31 ===
node           : rabbit@rabbit3
home dir       : /var/lib/rabbitmq
config file(s) : /etc/rabbitmq/rabbitmq.config
cookie hash    : RflLlXitNm70/ikHN/7Tsw==
log            : /var/log/rabbitmq/rabbit
sasl log       : /var/log/rabbitmq/rabbit
database dir   : /var/lib/rabbitmq/mnesia/rabbit@rabbit3

=INFO REPORT==== 3-Feb-2015::17:53:31 ===
Limiting to approx 924 file handles (829 sockets)

=INFO REPORT==== 3-Feb-2015::17:53:31 ===
Memory limit set to 397MB of 993MB total.

=INFO REPORT==== 3-Feb-2015::17:53:31 ===
Disk free limit set to 50MB

=INFO REPORT==== 3-Feb-2015::17:53:31 ===
msg_store_transient: using rabbit_msg_store_ets_index to provide index

=INFO REPORT==== 3-Feb-2015::17:53:31 ===
msg_store_persistent: using rabbit_msg_store_ets_index to provide index

=INFO REPORT==== 3-Feb-2015::17:53:31 ===
started TCP Listener on 192.168.122.83:5672

=INFO REPORT==== 3-Feb-2015::17:53:31 ===
rabbit on node rabbit@rabbit1 up

=INFO REPORT==== 3-Feb-2015::17:53:31 ===
Management plugin started. Port: 15672

=WARNING REPORT==== 3-Feb-2015::17:53:31 ===
The on_load function for module sd_notify returned {error, {upgrade, "Upgrade not supported by this NIF library."}}

=INFO REPORT==== 3-Feb-2015::17:53:31 ===
Server startup complete; 6 plugins started.
 * rabbitmq_management
 * rabbitmq_web_dispatch
 * webmachine
 * mochiweb
 * rabbitmq_management_agent
 * amqp_client

Additional info:
I'm going to move this to OFI and fix it by forcing the TCP timeout down to 5 seconds. If the other node drops off the net (or you firewall it away, as above), the inter-cluster connection will close with a timeout error on both sides, avoiding the asymmetrical disconnect noted in the linked forum thread.
https://github.com/redhat-openstack/astapor/pull/478
Would it be possible to show how to implement the change on the command line so we can test its effectiveness? I believe the solution is not yet 'fixed' in the puppet configuration?
(In reply to Bart van den Heuvel from comment #7)
> Would it be possible to show how to implement the change on the command
> line so we can test its effectiveness? I believe the solution is not yet
> 'fixed' in the puppet configuration?

Add this line to /etc/rabbitmq/rabbitmq-env.conf:

RABBITMQ_SERVER_ERL_ARGS="+K true +A30 +P 1048576 -kernel inet_default_connect_options [{nodelay,true},{raw,6,18,<<5000:64/native>>}] -kernel inet_default_listen_options [{raw,6,18,<<5000:64/native>>}]"
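A note on what those arguments do: the {raw,6,18,<<5000:64/native>>} tuple passes a raw socket option to the Erlang distribution sockets; on Linux, protocol level 6 is IPPROTO_TCP and option 18 is TCP_USER_TIMEOUT, so the 64-bit native-endian value 5000 caps the time unacknowledged data may remain outstanding at 5 seconds. A quick way to confirm the setting was picked up after a restart (a sketch; the grep patterns are only illustrative and assume the env file change above):

# restart so rabbitmq-env.conf is re-read, then look for the raw option on the
# running Erlang VM command line; a non-zero count means the option is active
systemctl restart rabbitmq-server
ps -ef | grep '[b]eam' | grep -c 'raw,6,18'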
Tested the proposed solution. It does not work as expected. See the results below. (I will update the bugzilla)

Three-node rabbitmq cluster:

cat > /etc/rabbitmq/rabbitmq.config << EOF
% config to configure clustering with defaults, except:
% network partition response (ignore), management console on, management agent on
[
 {rabbit, [
   {cluster_nodes, {['rabbit@rabbit1', 'rabbit@rabbit2', 'rabbit@rabbit3'], disc}},
   {cluster_partition_handling, pause_minority},
   {default_user, <<"guest">>},
   {default_pass, <<"guest">>}
 ]},
 {rabbitmq_management, [{listener, [{port, 15672}]}]},
 {rabbitmq_management_agent, [ {force_fine_statistics, true} ] },
 {kernel, [ ]}
].
EOF

scp /etc/rabbitmq/rabbitmq.config rabbit2:/etc/rabbitmq/
scp /etc/rabbitmq/rabbitmq.config rabbit3:/etc/rabbitmq/

Did this on each of the cluster nodes (rabbit1, rabbit2, rabbit3):

echo 'RABBITMQ_SERVER_ERL_ARGS="+K true +A30 +P 1048576 -kernel inet_default_connect_options [{nodelay,true},{raw,6,18,<<5000:64/native>>}] -kernel inet_default_listen_options [{raw,6,18,<<5000:64/native>>}]"' >> /etc/rabbitmq/rabbitmq-env.conf

systemctl stop rabbitmq-server; rm -rf /var/lib/rabbitmq/mnesia/*; systemctl start rabbitmq-server
ssh rabbit2 "systemctl stop rabbitmq-server; rm -rf /var/lib/rabbitmq/mnesia/*; systemctl start rabbitmq-server"
ssh rabbit3 "systemctl stop rabbitmq-server; rm -rf /var/lib/rabbitmq/mnesia/*; systemctl start rabbitmq-server"

rabbitmqctl add_user admin pocroot
rabbitmqctl set_user_tags admin administrator
rabbitmqctl set_policy HA '^(?!amq\.).*' '{"ha-mode": "all"}'
rabbitmqctl environment | grep cluster
rabbitmqctl cluster_status

[root@rabbit1 ~]# rabbitmqctl environment | grep pause_minority
{cluster_partition_handling,pause_minority},
[root@rabbit1 ~]# ssh rabbit2 rabbitmqctl environment | grep pause_minority
{cluster_partition_handling,pause_minority},
[root@rabbit1 ~]# ssh rabbit3 rabbitmqctl environment | grep pause_minority
{cluster_partition_handling,pause_minority},
[root@rabbit1 ~]# rabbitmqctl cluster_status | grep partitions
 {partitions,[]}]

[root@rabbit3 rabbitmq]# date ; iptables -A INPUT -s rabbit1 -j DROP; iptables -A OUTPUT -d rabbit1 -j DROP ; iptables -A INPUT -s rabbit2 -j DROP; iptables -A OUTPUT -d rabbit2 -j DROP
Fri Feb 20 18:04:59 CET 2015
[root@rabbit3 rabbitmq]# sleep 60; systemctl restart firewalld

Results:

[root@rabbit3 rabbitmq]# date; rabbitmqctl cluster_status
Fri Feb 20 18:08:16 CET 2015
Cluster status of node rabbit@rabbit3 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit3]}]},
 {running_nodes,[rabbit@rabbit3]},
 {cluster_name,<<"rabbit@rabbit1">>},
 {partitions,[{rabbit@rabbit3,[rabbit@rabbit2]}]}]
...done.

rabbit1 log

=INFO REPORT==== 20-Feb-2015::18:05:10 ===
rabbit on node rabbit@rabbit3 down

=INFO REPORT==== 20-Feb-2015::18:05:10 ===
node rabbit@rabbit3 down: etimedout

Rabbit2 log

=INFO REPORT==== 20-Feb-2015::18:05:08 ===
rabbit on node rabbit@rabbit3 down

=INFO REPORT==== 20-Feb-2015::18:05:08 ===
node rabbit@rabbit3 down: etimedout

=ERROR REPORT==== 20-Feb-2015::18:06:01 ===
Mnesia(rabbit@rabbit2): ** ERROR ** mnesia_event got {inconsistent_database, running_partitioned_network, rabbit@rabbit3}

Rabbit3 log

=INFO REPORT==== 20-Feb-2015::18:05:08 ===
rabbit on node rabbit@rabbit3 down

=INFO REPORT==== 20-Feb-2015::18:05:08 ===
node rabbit@rabbit3 down: etimedout

=ERROR REPORT==== 20-Feb-2015::18:06:01 ===
Mnesia(rabbit@rabbit2): ** ERROR ** mnesia_event got {inconsistent_database, running_partitioned_network, rabbit@rabbit3}
I can reproduce what you're seeing. The good news is that the TCP timeout on the two "good" nodes is working: they detect the isolated node and flag it down (with etimedout, as expected) within about 10 seconds of it being firewalled off. However, I would expect the "bad" node to notice the other two are gone after about 10 seconds as well, and clearly that's not happening. I suspect there's some weird bug when iptables/netfilter gets involved, probably triggering the same behavior as bug 1189241. Going to try applying that fix and testing again. Stay tuned.
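For anyone repeating this timing measurement, a rough sketch of how to see the detection delay on a node during the test (the log path and node names are taken from this report and are assumptions; adjust for the deployment under test):

# note the wall-clock time of the isolation, then watch the rabbit log on the
# isolated node for the down events to measure the detection delay
date
iptables -A INPUT -s rabbit1 -j DROP; iptables -A OUTPUT -d rabbit1 -j DROP
iptables -A INPUT -s rabbit2 -j DROP; iptables -A OUTPUT -d rabbit2 -j DROP
tail -f /var/log/rabbitmq/rabbit@rabbit3.log | grep --line-buffered 'down:'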
Installing my test kernel with the patch for bug 1189241 seems to fix this.
With the new kernel: Success (using iptables)

# Make sure we run the intended kernel
uname -a
ssh rabbit2 'uname -a'
ssh rabbit3 'uname -a'

Linux rabbit1.zokahn.thinkpad 3.10.0-229.el7.x86_64 #1 SMP Fri Feb 6 15:36:18 EST 2015 x86_64 x86_64 x86_64 GNU/Linux
Linux rabbit2.zokahn.thinkpad 3.10.0-229.el7.x86_64 #1 SMP Fri Feb 6 15:36:18 EST 2015 x86_64 x86_64 x86_64 GNU/Linux
Linux rabbit3.zokahn.thinkpad 3.10.0-229.el7.x86_64 #1 SMP Fri Feb 6 15:36:18 EST 2015 x86_64 x86_64 x86_64 GNU/Linux

# Setup the cluster, reset the state of everything
systemctl stop rabbitmq-server; rm -rf /var/lib/rabbitmq/mnesia/*; systemctl start rabbitmq-server
ssh rabbit2 "systemctl stop rabbitmq-server; rm -rf /var/lib/rabbitmq/mnesia/*; systemctl start rabbitmq-server"
ssh rabbit3 "systemctl stop rabbitmq-server; rm -rf /var/lib/rabbitmq/mnesia/*; systemctl start rabbitmq-server"
rabbitmqctl add_user admin pocroot
rabbitmqctl set_user_tags admin administrator
rabbitmqctl set_policy HA '^(?!amq\.).*' '{"ha-mode": "all"}'
rabbitmqctl environment | grep cluster
rabbitmqctl cluster_status

# check environment is in sync with partition recovery
rabbitmqctl environment | grep pause_minority
ssh rabbit2 rabbitmqctl environment | grep pause_minority
ssh rabbit3 rabbitmqctl environment | grep pause_minority

[root@rabbit1 ~]# rabbitmqctl environment | grep pause_minority
{cluster_partition_handling,pause_minority},
[root@rabbit1 ~]# ssh rabbit2 rabbitmqctl environment | grep pause_minority
{cluster_partition_handling,pause_minority},
[root@rabbit1 ~]# ssh rabbit3 rabbitmqctl environment | grep pause_minority
{cluster_partition_handling,pause_minority},

# make sure no partitions to start
rabbitmqctl cluster_status | grep partitions
[root@rabbit1 ~]# rabbitmqctl cluster_status | grep partitions
 {partitions,[]}]

# First test, isolate rabbit3 using iptables
date ; iptables -A INPUT -s rabbit1 -j DROP; iptables -A OUTPUT -d rabbit1 -j DROP ; iptables -A INPUT -s rabbit2 -j DROP; iptables -A OUTPUT -d rabbit2 -j DROP
sleep 60; systemctl restart firewalld

# second test, isolating rabbit3 by disabling the nic using libvirt

rabbit1 log
---------------------------
=INFO REPORT==== 24-Feb-2015::13:59:36 ===
rabbit on node rabbit@rabbit3 down

=INFO REPORT==== 24-Feb-2015::13:59:36 ===
node rabbit@rabbit3 down: etimedout

=INFO REPORT==== 24-Feb-2015::14:00:28 ===
rabbit on node rabbit@rabbit3 up

rabbit2 log
---------------------------
=INFO REPORT==== 24-Feb-2015::13:59:42 ===
rabbit on node rabbit@rabbit3 down

=INFO REPORT==== 24-Feb-2015::13:59:42 ===
node rabbit@rabbit3 down: etimedout

=INFO REPORT==== 24-Feb-2015::14:00:28 ===
rabbit on node rabbit@rabbit3 up

rabbit3 log
---------------------------
=INFO REPORT==== 24-Feb-2015::13:59:52 ===
rabbit on node rabbit@rabbit2 down

=INFO REPORT==== 24-Feb-2015::13:59:59 ===
node rabbit@rabbit2 down: etimedout

=WARNING REPORT==== 24-Feb-2015::13:59:59 ===
Cluster minority status detected - awaiting recovery

=INFO REPORT==== 24-Feb-2015::13:59:59 ===
rabbit on node rabbit@rabbit1 down

=INFO REPORT==== 24-Feb-2015::13:59:59 ===
Stopping RabbitMQ

=INFO REPORT==== 24-Feb-2015::13:59:59 ===
node rabbit@rabbit1 down: etimedout

=WARNING REPORT==== 24-Feb-2015::13:59:59 ===
Cluster minority status detected - awaiting recovery

=INFO REPORT==== 24-Feb-2015::14:00:06 ===
Statistics database started.
=INFO REPORT==== 24-Feb-2015::14:00:06 ===
stopped TCP Listener on 192.168.122.83:5672

=ERROR REPORT==== 24-Feb-2015::14:00:28 ===
Mnesia(rabbit@rabbit3): ** ERROR ** mnesia_event got {inconsistent_database, starting_partitioned_network, rabbit@rabbit1}

=ERROR REPORT==== 24-Feb-2015::14:00:28 ===
Mnesia(rabbit@rabbit3): ** ERROR ** mnesia_event got {inconsistent_database, starting_partitioned_network, rabbit@rabbit2}

=INFO REPORT==== 24-Feb-2015::14:00:28 ===
Starting RabbitMQ 3.3.5 on Erlang R16B03
Copyright (C) 2007-2014 GoPivotal, Inc.
Licensed under the MPL. See http://www.rabbitmq.com/

=INFO REPORT==== 24-Feb-2015::14:00:28 ===
node           : rabbit@rabbit3
home dir       : /var/lib/rabbitmq
config file(s) : /etc/rabbitmq/rabbitmq.config
cookie hash    : RflLlXitNm70/ikHN/7Tsw==
log            : /var/log/rabbitmq/rabbit
sasl log       : /var/log/rabbitmq/rabbit
database dir   : /var/lib/rabbitmq/mnesia/rabbit@rabbit3

=INFO REPORT==== 24-Feb-2015::14:00:28 ===
Limiting to approx 924 file handles (829 sockets)

=INFO REPORT==== 24-Feb-2015::14:00:28 ===
Memory limit set to 397MB of 993MB total.

=INFO REPORT==== 24-Feb-2015::14:00:28 ===
Disk free limit set to 50MB

=INFO REPORT==== 24-Feb-2015::14:00:28 ===
msg_store_transient: using rabbit_msg_store_ets_index to provide index

=INFO REPORT==== 24-Feb-2015::14:00:28 ===
msg_store_persistent: using rabbit_msg_store_ets_index to provide index

=INFO REPORT==== 24-Feb-2015::14:00:28 ===
started TCP Listener on 192.168.122.83:5672

=INFO REPORT==== 24-Feb-2015::14:00:28 ===
rabbit on node rabbit@rabbit1 up

=INFO REPORT==== 24-Feb-2015::14:00:28 ===
Management plugin started. Port: 15672

=INFO REPORT==== 24-Feb-2015::14:00:28 ===
rabbit on node rabbit@rabbit2 up

=WARNING REPORT==== 24-Feb-2015::14:00:28 ===
The on_load function for module sd_notify returned {error, {upgrade, "Upgrade not supported by this NIF library."}}

=INFO REPORT==== 24-Feb-2015::14:00:28 ===
Server startup complete; 6 plugins started.
 * rabbitmq_management
 * rabbitmq_web_dispatch
 * webmachine
 * mochiweb
 * rabbitmq_management_agent
 * amqp_client
While doing the tests I noticed that several seconds pass between the isolation action and the etimedout detection. Additional tests revealed the following when a reconnect hits that window:

date ; iptables -A INPUT -s rabbit1 -j DROP; iptables -A OUTPUT -d rabbit1 -j DROP ; iptables -A INPUT -s rabbit2 -j DROP; iptables -A OUTPUT -d rabbit2 -j DROP
sleep 30; systemctl restart firewalld

Rabbit1

=INFO REPORT==== 24-Feb-2015::14:08:55 ===
rabbit on node rabbit@rabbit3 down

=INFO REPORT==== 24-Feb-2015::14:08:55 ===
node rabbit@rabbit3 down: etimedout

Rabbit2

=INFO REPORT==== 24-Feb-2015::14:08:58 ===
rabbit on node rabbit@rabbit3 down

=INFO REPORT==== 24-Feb-2015::14:08:58 ===
node rabbit@rabbit3 down: etimedout

=ERROR REPORT==== 24-Feb-2015::14:09:15 ===
Mnesia(rabbit@rabbit2): ** ERROR ** mnesia_event got {inconsistent_database, running_partitioned_network, rabbit@rabbit3}

Rabbit3

=INFO REPORT==== 24-Feb-2015::14:09:07 ===
rabbit on node rabbit@rabbit2 down

=ERROR REPORT==== 24-Feb-2015::14:09:15 ===
Mnesia(rabbit@rabbit3): ** ERROR ** mnesia_event got {inconsistent_database, running_partitioned_network, rabbit@rabbit2}

=INFO REPORT==== 24-Feb-2015::14:09:16 ===
node rabbit@rabbit2 down: etimedout

=INFO REPORT==== 24-Feb-2015::14:09:16 ===
rabbit on node rabbit@rabbit1 down

=INFO REPORT==== 24-Feb-2015::14:09:16 ===
node rabbit@rabbit1 down: connection_closed

Result:

[root@rabbit1 ~]# rabbitmqctl cluster_status | grep partitions
 {partitions,[{rabbit@rabbit2,[rabbit@rabbit3]}]}]
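Since a reconnect inside that detection window can still leave the cluster partitioned, it may be worth polling for partitions after connectivity is restored. A minimal sketch, assuming it runs on one cluster node with rabbitmqctl on the PATH; the loop, interval, and messages are illustrative and not from the report:

# poll cluster_status every 10 seconds and flag a non-empty partitions list
while true; do
    if rabbitmqctl cluster_status 2>/dev/null | grep -q '{partitions,\[\]}'; then
        echo "$(date): no partition"
    else
        echo "$(date): partition detected (or node unreachable), operator action may be needed"
    fi
    sleep 10
done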
Merged
One important thing here, which we've kinda overlooked... OSP is not configuring cluster_partition_handling at all, which means it's using the default value of ignore. That means if the cluster gets partitioned for any reason, it will stay partitioned until an administrator explicitly takes action to correct the partition. I think this is crummy, and we should default to setting cluster_partition_handling to pause_minority. I'll throw together a pull request to do just that.
https://github.com/redhat-openstack/astapor/pull/486
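For anyone testing the effect of that default by hand, the change boils down to one entry in the rabbit application's config. A sketch of applying and verifying it on an existing node (this is not the literal content of the pull request above; the config line is shown as a comment because the exact file layout varies by deployment):

# ensure the rabbit stanza of /etc/rabbitmq/rabbitmq.config contains:
#   {cluster_partition_handling, pause_minority},
# then restart and confirm the running value on every node
systemctl restart rabbitmq-server
rabbitmqctl environment | grep cluster_partition_handling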
John, this BZ depends on BZ #1189241, which is not fixed yet. Can I verify this bug now, or should I wait until BZ #1189241 is fixed? Thanks, Leonid.
Verified:

Environment:
openstack-foreman-installer-3.0.26-1.el7ost.noarch

The cluster recovers after the outage - details are below.

This is from the rabbitmq log on the node where the iptables blocking rules were added:

=WARNING REPORT==== 18-Aug-2015::11:23:42 ===
Cluster minority status detected - awaiting recovery

=INFO REPORT==== 18-Aug-2015::11:23:56 ===
Mirrored queue 'cinder-volume' in vhost '/': Slave <rabbit.1108.0> saw deaths of mirrors <rabbit.991.0> <rabbit.1259.0>

=INFO REPORT==== 18-Aug-2015::11:23:56 ===
Mirrored queue 'cinder-volume' in vhost '/': Promoting slave <rabbit.1108.0> to master

=INFO REPORT==== 18-Aug-2015::11:23:56 ===
Mirrored queue 'engine_fanout_d175b2c76c7c4d6892c05249b3392344' in vhost '/': Slave <rabbit.1847.0> saw deaths of mirrors <rabbit.2003.0> <rabbit.1750.0>

=INFO REPORT==== 18-Aug-2015::11:23:56 ===
Mirrored queue 'engine_fanout_d175b2c76c7c4d6892c05249b3392344' in vhost '/': Promoting slave <rabbit.1847.0> to master

This is from the node where the blocking rules were added:

After adding the blocking rules:

rabbitmqctl cluster_status
Cluster status of node 'rabbit@lb-backend-maca25400702876' ...
[{nodes,[{disc,['rabbit@lb-backend-maca25400702875',
                'rabbit@lb-backend-maca25400702876',
                'rabbit@lb-backend-maca25400702877']}]}]
...done.

Right after restarting the firewall:

rabbitmqctl cluster_status
Cluster status of node 'rabbit@lb-backend-maca25400702876' ...
Error: unable to connect to node 'rabbit@lb-backend-maca25400702876': nodedown

DIAGNOSTICS
===========

attempted to contact: ['rabbit@lb-backend-maca25400702876']

rabbit@lb-backend-maca25400702876:
  * connected to epmd (port 4369) on lb-backend-maca25400702876
  * epmd reports: node 'rabbit' not running at all
                  other nodes on lb-backend-maca25400702876: [rabbitmqctl395]
  * suggestion: start the node

current node details:
- node name: rabbitmqctl395@maca25400702876
- home dir: /var/lib/rabbitmq
- cookie hash: soeIWU2jk2YNseTyDSlsEA==

After restarting the firewall (restored):

rabbitmqctl cluster_status
Cluster status of node 'rabbit@lb-backend-maca25400702876' ...
[{nodes,[{disc,['rabbit@lb-backend-maca25400702875',
                'rabbit@lb-backend-maca25400702876',
                'rabbit@lb-backend-maca25400702877']}]},
 {running_nodes,['rabbit@lb-backend-maca25400702875',
                 'rabbit@lb-backend-maca25400702877',
                 'rabbit@lb-backend-maca25400702876']},
 {cluster_name,<<"rabbit.com">>},
 {partitions,[]}]
...done.

This is on a remote node:

After the blocking rules were added:

rabbitmqctl cluster_status
Cluster status of node 'rabbit@lb-backend-maca25400702875' ...
[{nodes,[{disc,['rabbit@lb-backend-maca25400702875',
                'rabbit@lb-backend-maca25400702876',
                'rabbit@lb-backend-maca25400702877']}]},
 {running_nodes,['rabbit@lb-backend-maca25400702877',
                 'rabbit@lb-backend-maca25400702875']},
 {cluster_name,<<"rabbit.com">>},
 {partitions,[]}]
...done.

After the blocking rules were removed:

rabbitmqctl cluster_status
Cluster status of node 'rabbit@lb-backend-maca25400702875' ...
[{nodes,[{disc,['rabbit@lb-backend-maca25400702875',
                'rabbit@lb-backend-maca25400702876',
                'rabbit@lb-backend-maca25400702877']}]},
 {running_nodes,['rabbit@lb-backend-maca25400702876',
                 'rabbit@lb-backend-maca25400702877',
                 'rabbit@lb-backend-maca25400702875']},
 {cluster_name,<<"rabbit.com">>},
 {partitions,[]}]
...done.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2015-1662.html
May I ask why not use 'autoheal'? Apparently, Mirantis is using 'autoheal' instead of 'pause_minority' as per https://review.openstack.org/#/c/115518/. The problem we are facing with a 3-controller OpenStack cluster deployed with Red Hat Director is that the RabbitMQ cluster does not survive when two nodes go down. It only survives losing a single node. That is, by default, we have N+1 instead of N+2, which is not optimal, IMHO.
(In reply to Felipe Alfaro Solana from comment #33)
> May I ask why not use 'autoheal'? Apparently, Mirantis is using 'autoheal'
> instead of 'pause_minority' as per https://review.openstack.org/#/c/115518/.
> The problem we are facing with a 3-controller OpenStack cluster deployed
> with Red Hat Director is that the RabbitMQ cluster does not survive when
> two nodes go down. It only survives losing a single node. That is, by
> default, we have N+1 instead of N+2, which is not optimal, IMHO.

Hello Felipe,

It's a CAP theorem question. Both give you partition tolerance.

With pause_minority, you get consistency while sacrificing availability. The minority node(s) will pause and disconnect all clients. The clients will reconnect to other nodes in the majority half of the cluster and resume normal operation.

With autoheal, you get availability while sacrificing consistency. The cluster becomes "split-brained". The success of each RPC request is contingent upon all participating connections involved in the request being on the same partition as one another, which is not very likely. So until the partition ends, the system will be in a degraded state and most things are going to fail.

Basically, Red Hat engineering preferred pause_minority and chose it as the default configuration because the architecture of OpenStack RPC means a partitioned-but-inconsistent cluster is almost useless.

Regards,
Pablo.
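For anyone who wants to compare the two modes on a throwaway test cluster, a lab-only sketch (not the recommended OSP setting, per the reasoning above; it assumes the config line is written exactly as in the example rabbitmq.config earlier in this report):

# switch one test cluster to autoheal to compare partition behavior, then verify
sed -i 's/{cluster_partition_handling, pause_minority}/{cluster_partition_handling, autoheal}/' /etc/rabbitmq/rabbitmq.config
systemctl restart rabbitmq-server
rabbitmqctl environment | grep cluster_partition_handling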