Bug 1189480
| Summary: | Rabbitmq cluster remains partitioned after short network partition incident | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Bart van den Heuvel <bvandenh> |
| Component: | openstack-foreman-installer | Assignee: | John Eckersberg <jeckersb> |
| Status: | CLOSED ERRATA | QA Contact: | Leonid Natapov <lnatapov> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 6.0 (Juno) | CC: | aparsons, apevec, dmaley, felipe.alfaro, jeckersb, jguiditt, lhh, martin, mburns, morazi, nbarcet, oblaut, pcaruana, pneedle, racedoro, rhos-maint, sasha, sclewis, yeylon |
| Target Milestone: | z4 | Keywords: | TestOnly, ZStream |
| Target Release: | Installer | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | openstack-foreman-installer-3.0.17-1.el7ost | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-08-24 15:18:08 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1151756, 1189241 | | |
| Bug Blocks: | 1186672 | | |
Description
Bart van den Heuvel
2015-02-05 13:19:12 UTC
I'm going to move this to OFI and fix it by forcing the TCP timeout down to 5 seconds. If the other node drops off the net (or you firewall it away like above), then the inter-cluster connection will close with a timeout error on both sides, avoiding the asymmetrical disconnect noted in the linked forum thread.

Would it be possible to show how to implement the change on the command line so we can test its effectiveness? I believe the implementation of the solution is not 'fixed' in puppet configuration?

(In reply to Bart van den Heuvel from comment #7)
> Would it be possible to show how to implement the change on the command line
> so we can test its effectiveness? I believe the implementation of the
> solution is not 'fixed' in puppet configuration?

Add this line to /etc/rabbitmq/rabbitmq-env.conf:

RABBITMQ_SERVER_ERL_ARGS="+K true +A30 +P 1048576 -kernel inet_default_connect_options [{nodelay,true},{raw,6,18,<<5000:64/native>>}] -kernel inet_default_listen_options [{raw,6,18,<<5000:64/native>>}]"

Tested the proposed solution. It does not work as expected. See the results below. (I will update the bugzilla)
Three node rabbitmq cluster:
cat > /etc/rabbitmq/rabbitmq.config << EOF
% configure clustering with defaults, except:
% network partition response (pause_minority), management console on, management agent on
[
{rabbit, [
{cluster_nodes, {['rabbit@rabbit1', 'rabbit@rabbit2', 'rabbit@rabbit3'], disc}},
{cluster_partition_handling, pause_minority},
{default_user, <<"guest">>},
{default_pass, <<"guest">>}
]},
{rabbitmq_management, [{listener, [{port, 15672}]}]},
{rabbitmq_management_agent, [ {force_fine_statistics, true} ] },
{kernel, [ ]}
].
EOF
scp /etc/rabbitmq/rabbitmq.config rabbit2:/etc/rabbitmq/
scp /etc/rabbitmq/rabbitmq.config rabbit3:/etc/rabbitmq/
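One prerequisite worth stating explicitly: clustering only works if the Erlang cookie is identical on all three nodes. A quick check, as a sketch (not part of the original transcript):
# Sketch: the Erlang cookie must match on every node for clustering to work
md5sum /var/lib/rabbitmq/.erlang.cookie
ssh rabbit2 md5sum /var/lib/rabbitmq/.erlang.cookie
ssh rabbit3 md5sum /var/lib/rabbitmq/.erlang.cookie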
Did this on each of the cluster nodes (rabbit1, rabbit2, rabbit3)
echo 'RABBITMQ_SERVER_ERL_ARGS="+K true +A30 +P 1048576 -kernel inet_default_connect_options [{nodelay,true},{raw,6,18,<<5000:64/native>>}] -kernel inet_default_listen_options [{raw,6,18,<<5000:64/native>>}]"' >>/etc/rabbitmq/rabbitmq-env.conf
systemctl stop rabbitmq-server; rm -rf /var/lib/rabbitmq/mnesia/*; systemctl start rabbitmq-server
ssh rabbit2 "systemctl stop rabbitmq-server; rm -rf /var/lib/rabbitmq/mnesia/*; systemctl start rabbitmq-server"
ssh rabbit3 "systemctl stop rabbitmq-server; rm -rf /var/lib/rabbitmq/mnesia/*; systemctl start rabbitmq-server"
rabbitmqctl add_user admin pocroot
rabbitmqctl set_user_tags admin administrator
rabbitmqctl set_policy HA '^(?!amq\.).*' '{"ha-mode": "all"}'
rabbitmqctl environment | grep cluster
rabbitmqctl cluster_status
[root@rabbit1 ~]# rabbitmqctl environment | grep pause_minority
{cluster_partition_handling,pause_minority},
[root@rabbit1 ~]# ssh rabbit2 rabbitmqctl environment | grep pause_minority
{cluster_partition_handling,pause_minority},
[root@rabbit1 ~]# ssh rabbit3 rabbitmqctl environment | grep pause_minority
{cluster_partition_handling,pause_minority},
[root@rabbit1 ~]# rabbitmqctl cluster_status | grep partitions
{partitions,[]}]
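Before running the isolation test, the raw socket option from RABBITMQ_SERVER_ERL_ARGS can also be confirmed on a live node. Option numbers 6 and 18 correspond to IPPROTO_TCP and TCP_USER_TIMEOUT on Linux, and <<5000:64/native>> encodes the 5000 ms timeout. A minimal check, as a sketch (not from the original transcript):
# Sketch: dump the distribution socket defaults; expect {raw,6,18,<<5000:64/native>>}
rabbitmqctl eval 'application:get_env(kernel, inet_default_connect_options).'
rabbitmqctl eval 'application:get_env(kernel, inet_default_listen_options).'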
[root@rabbit3 rabbitmq]# date ; iptables -A INPUT -s rabbit1 -j DROP; iptables -A OUTPUT -d rabbit1 -j DROP ; iptables -A INPUT -s rabbit2 -j DROP; iptables -A OUTPUT -d rabbit2 -j DROP
Fri Feb 20 18:04:59 CET 2015
[root@rabbit3 rabbitmq]# sleep 60; systemctl restart firewalld
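Restarting firewalld reloads its ruleset and so discards the manually added rules; an equivalent, more explicit teardown would be to delete the same rules (a sketch, not what was run here):
# Sketch: remove the four DROP rules added above instead of restarting firewalld
iptables -D INPUT -s rabbit1 -j DROP; iptables -D OUTPUT -d rabbit1 -j DROP
iptables -D INPUT -s rabbit2 -j DROP; iptables -D OUTPUT -d rabbit2 -j DROP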
results:
[root@rabbit3 rabbitmq]# date; rabbitmqctl cluster_status
Fri Feb 20 18:08:16 CET 2015
Cluster status of node rabbit@rabbit3 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit3]}]},
{running_nodes,[rabbit@rabbit3]},
{cluster_name,<<"rabbit@rabbit1">>},
{partitions,[{rabbit@rabbit3,[rabbit@rabbit2]}]}]
...done.
rabbit1 log
=INFO REPORT==== 20-Feb-2015::18:05:10 ===
rabbit on node rabbit@rabbit3 down
=INFO REPORT==== 20-Feb-2015::18:05:10 ===
node rabbit@rabbit3 down: etimedout
Rabbit2 log
=INFO REPORT==== 20-Feb-2015::18:05:08 ===
rabbit on node rabbit@rabbit3 down
=INFO REPORT==== 20-Feb-2015::18:05:08 ===
node rabbit@rabbit3 down: etimedout
=ERROR REPORT==== 20-Feb-2015::18:06:01 ===
Mnesia(rabbit@rabbit2): ** ERROR ** mnesia_event got {inconsistent_database, running_partitioned_network, rabbit@rabbit3}
Rabbit3 log
=INFO REPORT==== 20-Feb-2015::18:05:08 ===
rabbit on node rabbit@rabbit3 down
=INFO REPORT==== 20-Feb-2015::18:05:08 ===
node rabbit@rabbit3 down: etimedout
=ERROR REPORT==== 20-Feb-2015::18:06:01 ===
Mnesia(rabbit@rabbit2): ** ERROR ** mnesia_event got {inconsistent_database, running_partitioned_network, rabbit@rabbit3}
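For reference, the asymmetric view during the window can also be polled directly from each node while the rules are in place; a sketch, not part of the original transcript (nodes(). lists the Erlang nodes the local node still considers connected):
# Sketch: compare each node's view of its peers during the partition
rabbitmqctl eval 'nodes().'
ssh rabbit2 "rabbitmqctl eval 'nodes().'"
# run this one locally on rabbit3, since rabbit3 is firewalled off from rabbit1/rabbit2:
rabbitmqctl eval 'nodes().'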
I can reproduce what you're seeing. The good news is that the TCP timeout for the two "good" nodes is working: they detect and flag the isolated node down (with etimedout, as expected) within about 10 seconds of it being firewalled off. However, I would expect the "bad" node to notice that the other two are gone after about 10 seconds as well, and clearly that's not happening. I suspect there's some weird bug when iptables/netfilter gets involved, probably triggering the same behavior as bug 1189241. Going to try applying that fix and testing again. Stay tuned.

Installing my test kernel with the patch for bug 1189241 seems to fix this. With the new kernel:

Success (using iptables)
# Make sure we run the intended kernel
uname -a
ssh rabbit2 'uname -a'
ssh rabbit3 'uname -a'
Linux rabbit1.zokahn.thinkpad 3.10.0-229.el7.x86_64 #1 SMP Fri Feb 6 15:36:18 EST 2015 x86_64 x86_64 x86_64 GNU/Linux
Linux rabbit2.zokahn.thinkpad 3.10.0-229.el7.x86_64 #1 SMP Fri Feb 6 15:36:18 EST 2015 x86_64 x86_64 x86_64 GNU/Linux
Linux rabbit3.zokahn.thinkpad 3.10.0-229.el7.x86_64 #1 SMP Fri Feb 6 15:36:18 EST 2015 x86_64 x86_64 x86_64 GNU/Linux
# Setup the cluster, reset the state of everything
systemctl stop rabbitmq-server; rm -rf /var/lib/rabbitmq/mnesia/*; systemctl start rabbitmq-server
ssh rabbit2 "systemctl stop rabbitmq-server; rm -rf /var/lib/rabbitmq/mnesia/*; systemctl start rabbitmq-server"
ssh rabbit3 "systemctl stop rabbitmq-server; rm -rf /var/lib/rabbitmq/mnesia/*; systemctl start rabbitmq-server"
rabbitmqctl add_user admin pocroot
rabbitmqctl set_user_tags admin administrator
rabbitmqctl set_policy HA '^(?!amq\.).*' '{"ha-mode": "all"}'
rabbitmqctl environment | grep cluster
rabbitmqctl cluster_status
# check environment is in sync with partition recovery
rabbitmqctl environment | grep pause_minority
ssh rabbit2 rabbitmqctl environment | grep pause_minority
ssh rabbit3 rabbitmqctl environment | grep pause_minority
[root@rabbit1 ~]# rabbitmqctl environment | grep pause_minority
{cluster_partition_handling,pause_minority},
[root@rabbit1 ~]# ssh rabbit2 rabbitmqctl environment | grep pause_minority
{cluster_partition_handling,pause_minority},
[root@rabbit1 ~]# ssh rabbit3 rabbitmqctl environment | grep pause_minority
{cluster_partition_handling,pause_minority},
# make sure no partitions to start
rabbitmqctl cluster_status | grep partitions
[root@rabbit1 ~]# rabbitmqctl cluster_status | grep partitions
{partitions,[]}]
# First test, isolate rabbit3 using iptables
date ; iptables -A INPUT -s rabbit1 -j DROP; iptables -A OUTPUT -d rabbit1 -j DROP ; iptables -A INPUT -s rabbit2 -j DROP; iptables -A OUTPUT -d rabbit2 -j DROP
sleep 60; systemctl restart firewalld
# Second test: isolate rabbit3 by disabling the NIC using libvirt (see the sketch below)
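A sketch of how that NIC isolation can be done from the KVM host; the domain name and interface device are assumptions, not taken from the report (check them with virsh domiflist rabbit3):
# Hypothetical commands on the hypervisor; "rabbit3" is the libvirt domain name
# and vnet2 its interface device as reported by: virsh domiflist rabbit3
virsh domif-setlink rabbit3 vnet2 down
sleep 60
virsh domif-setlink rabbit3 vnet2 up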
rabbit1 log
---------------------------
=INFO REPORT==== 24-Feb-2015::13:59:36 ===
rabbit on node rabbit@rabbit3 down
=INFO REPORT==== 24-Feb-2015::13:59:36 ===
node rabbit@rabbit3 down: etimedout
=INFO REPORT==== 24-Feb-2015::14:00:28 ===
rabbit on node rabbit@rabbit3 up
rabbit2 log
---------------------------
=INFO REPORT==== 24-Feb-2015::13:59:42 ===
rabbit on node rabbit@rabbit3 down
=INFO REPORT==== 24-Feb-2015::13:59:42 ===
node rabbit@rabbit3 down: etimedout
=INFO REPORT==== 24-Feb-2015::14:00:28 ===
rabbit on node rabbit@rabbit3 up
rabbit3 log
---------------------------
=INFO REPORT==== 24-Feb-2015::13:59:52 ===
rabbit on node rabbit@rabbit2 down
=INFO REPORT==== 24-Feb-2015::13:59:59 ===
node rabbit@rabbit2 down: etimedout
=WARNING REPORT==== 24-Feb-2015::13:59:59 ===
Cluster minority status detected - awaiting recovery
=INFO REPORT==== 24-Feb-2015::13:59:59 ===
rabbit on node rabbit@rabbit1 down
=INFO REPORT==== 24-Feb-2015::13:59:59 ===
Stopping RabbitMQ
=INFO REPORT==== 24-Feb-2015::13:59:59 ===
node rabbit@rabbit1 down: etimedout
=WARNING REPORT==== 24-Feb-2015::13:59:59 ===
Cluster minority status detected - awaiting recovery
=INFO REPORT==== 24-Feb-2015::14:00:06 ===
Statistics database started.
=INFO REPORT==== 24-Feb-2015::14:00:06 ===
stopped TCP Listener on 192.168.122.83:5672
=ERROR REPORT==== 24-Feb-2015::14:00:28 ===
Mnesia(rabbit@rabbit3): ** ERROR ** mnesia_event got {inconsistent_database, starting_partitioned_network, rabbit@rabbit1}
=ERROR REPORT==== 24-Feb-2015::14:00:28 ===
Mnesia(rabbit@rabbit3): ** ERROR ** mnesia_event got {inconsistent_database, starting_partitioned_network, rabbit@rabbit2}
=INFO REPORT==== 24-Feb-2015::14:00:28 ===
Starting RabbitMQ 3.3.5 on Erlang R16B03
Copyright (C) 2007-2014 GoPivotal, Inc.
Licensed under the MPL. See http://www.rabbitmq.com/
=INFO REPORT==== 24-Feb-2015::14:00:28 ===
node : rabbit@rabbit3
home dir : /var/lib/rabbitmq
config file(s) : /etc/rabbitmq/rabbitmq.config
cookie hash : RflLlXitNm70/ikHN/7Tsw==
log : /var/log/rabbitmq/rabbit
sasl log : /var/log/rabbitmq/rabbit
database dir : /var/lib/rabbitmq/mnesia/rabbit@rabbit3
=INFO REPORT==== 24-Feb-2015::14:00:28 ===
Limiting to approx 924 file handles (829 sockets)
=INFO REPORT==== 24-Feb-2015::14:00:28 ===
Memory limit set to 397MB of 993MB total.
=INFO REPORT==== 24-Feb-2015::14:00:28 ===
Disk free limit set to 50MB
=INFO REPORT==== 24-Feb-2015::14:00:28 ===
msg_store_transient: using rabbit_msg_store_ets_index to provide index
=INFO REPORT==== 24-Feb-2015::14:00:28 ===
msg_store_persistent: using rabbit_msg_store_ets_index to provide index
=INFO REPORT==== 24-Feb-2015::14:00:28 ===
started TCP Listener on 192.168.122.83:5672
=INFO REPORT==== 24-Feb-2015::14:00:28 ===
rabbit on node rabbit@rabbit1 up
=INFO REPORT==== 24-Feb-2015::14:00:28 ===
Management plugin started. Port: 15672
=INFO REPORT==== 24-Feb-2015::14:00:28 ===
rabbit on node rabbit@rabbit2 up
=WARNING REPORT==== 24-Feb-2015::14:00:28 ===
The on_load function for module sd_notify returned {error,
{upgrade,
"Upgrade not supported by this NIF library."}}
=INFO REPORT==== 24-Feb-2015::14:00:28 ===
Server startup complete; 6 plugins started.
* rabbitmq_management
* rabbitmq_web_dispatch
* webmachine
* mochiweb
* rabbitmq_management_agent
* amqp_client
Doing the tests, I noticed that there is a gap of several seconds between the isolation action and the etimedout detection, and the gap differs per node. Additional tests revealed the following when trying to hit that window with a reconnect:
date ; iptables -A INPUT -s rabbit1 -j DROP; iptables -A OUTPUT -d rabbit1 -j DROP ; iptables -A INPUT -s rabbit2 -j DROP; iptables -A OUTPUT -d rabbit2 -j DROP
sleep 30; systemctl restart firewalld
Rabbit1
=INFO REPORT==== 24-Feb-2015::14:08:55 ===
rabbit on node rabbit@rabbit3 down
=INFO REPORT==== 24-Feb-2015::14:08:55 ===
node rabbit@rabbit3 down: etimedout
Rabbit2
=INFO REPORT==== 24-Feb-2015::14:08:58 ===
rabbit on node rabbit@rabbit3 down
=INFO REPORT==== 24-Feb-2015::14:08:58 ===
node rabbit@rabbit3 down: etimedout
=ERROR REPORT==== 24-Feb-2015::14:09:15 ===
Mnesia(rabbit@rabbit2): ** ERROR ** mnesia_event got {inconsistent_database, running_partitioned_network, rabbit@rabbit3}
Rabbit3
=INFO REPORT==== 24-Feb-2015::14:09:07 ===
rabbit on node rabbit@rabbit2 down
=ERROR REPORT==== 24-Feb-2015::14:09:15 ===
Mnesia(rabbit@rabbit3): ** ERROR ** mnesia_event got {inconsistent_database, running_partitioned_network, rabbit@rabbit2}
=INFO REPORT==== 24-Feb-2015::14:09:16 ===
node rabbit@rabbit2 down: etimedout
=INFO REPORT==== 24-Feb-2015::14:09:16 ===
rabbit on node rabbit@rabbit1 down
=INFO REPORT==== 24-Feb-2015::14:09:16 ===
node rabbit@rabbit1 down: connection_closed
Result:
[root@rabbit1 ~]# rabbitmqctl cluster_status | grep partitions
{partitions,[{rabbit@rabbit2,[rabbit@rabbit3]}]}]
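When a partition like this persists, the standard manual recovery is to pick the partition you trust and restart the RabbitMQ application on the node(s) in the other partition; a sketch (not taken from this report), run here on rabbit3:
# Sketch: clear a lingering partition by restarting the app on the untrusted side
rabbitmqctl stop_app
rabbitmqctl start_app
rabbitmqctl cluster_status | grep partitions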
Merged

One important thing here, which we've kinda overlooked: OSP is not configuring cluster_partition_handling at all, which means it's using the default value of ignore. That means if the cluster gets partitioned for any reason, it will stay partitioned until an administrator explicitly takes action to correct the partition. I think this is crummy, and we should default to setting cluster_partition_handling to pause_minority. I'll throw together a pull request to do just that.

Merged

John, this BZ depends on BZ #1189241, which is not fixed yet. Can I verify this bug now, or should I wait until BZ #1189241 is fixed? Thanks, Leonid.

Verified:
Environment:
openstack-foreman-installer-3.0.26-1.el7ost.noarch
The cluster is restored after the outage; details are below:
This is from the rabbitmq log on the node where the iptables blocking rules were added:
=WARNING REPORT==== 18-Aug-2015::11:23:42 ===
Cluster minority status detected - awaiting recovery
=INFO REPORT==== 18-Aug-2015::11:23:56 ===
Mirrored queue 'cinder-volume' in vhost '/': Slave <rabbit.1108.0> saw deaths of mirrors <rabbit.991.0> <rabbit.1259.0>
=INFO REPORT==== 18-Aug-2015::11:23:56 ===
Mirrored queue 'cinder-volume' in vhost '/': Promoting slave <rabbit.1108.0> to master
=INFO REPORT==== 18-Aug-2015::11:23:56 ===
Mirrored queue 'engine_fanout_d175b2c76c7c4d6892c05249b3392344' in vhost '/': Slave <rabbit.1847.0> saw deaths of mirrors <rabbit.2003.0> <rabbit.1750.0>
=INFO REPORT==== 18-Aug-2015::11:23:56 ===
Mirrored queue 'engine_fanout_d175b2c76c7c4d6892c05249b3392344' in vhost '/': Promoting slave <rabbit.1847.0> to master
This is from the node where the blocking rules were added:
After adding the blocking rules:
rabbitmqctl cluster_status
Cluster status of node 'rabbit@lb-backend-maca25400702876' ...
[{nodes,[{disc,['rabbit@lb-backend-maca25400702875',
'rabbit@lb-backend-maca25400702876',
'rabbit@lb-backend-maca25400702877']}]}]
...done.
Right after restarting the firewall:
rabbitmqctl cluster_status
Cluster status of node 'rabbit@lb-backend-maca25400702876' ...
Error: unable to connect to node 'rabbit@lb-backend-maca25400702876': nodedown
DIAGNOSTICS
===========
attempted to contact: ['rabbit@lb-backend-maca25400702876']
rabbit@lb-backend-maca25400702876:
* connected to epmd (port 4369) on lb-backend-maca25400702876
* epmd reports: node 'rabbit' not running at all
other nodes on lb-backend-maca25400702876: [rabbitmqctl395]
* suggestion: start the node
current node details:
- node name: rabbitmqctl395@maca25400702876
- home dir: /var/lib/rabbitmq
- cookie hash: soeIWU2jk2YNseTyDSlsEA==
After restarting the firewall (restored):
rabbitmqctl cluster_status
Cluster status of node 'rabbit@lb-backend-maca25400702876' ...
[{nodes,[{disc,['rabbit@lb-backend-maca25400702875',
'rabbit@lb-backend-maca25400702876',
'rabbit@lb-backend-maca25400702877']}]},
{running_nodes,['rabbit@lb-backend-maca25400702875',
'rabbit@lb-backend-maca25400702877',
'rabbit@lb-backend-maca25400702876']},
{cluster_name,<<"rabbit.com">>},
{partitions,[]}]
...done.
This is on remote node:
After the blocking rules were added:
rabbitmqctl cluster_status
Cluster status of node 'rabbit@lb-backend-maca25400702875' ...
[{nodes,[{disc,['rabbit@lb-backend-maca25400702875',
'rabbit@lb-backend-maca25400702876',
'rabbit@lb-backend-maca25400702877']}]},
{running_nodes,['rabbit@lb-backend-maca25400702877',
'rabbit@lb-backend-maca25400702875']},
{cluster_name,<<"rabbit.com">>},
{partitions,[]}]
...done.
After the blocking rules were removed:
rabbitmqctl cluster_status
Cluster status of node 'rabbit@lb-backend-maca25400702875' ...
[{nodes,[{disc,['rabbit@lb-backend-maca25400702875',
'rabbit@lb-backend-maca25400702876',
'rabbit@lb-backend-maca25400702877']}]},
{running_nodes,['rabbit@lb-backend-maca25400702876',
'rabbit@lb-backend-maca25400702877',
'rabbit@lb-backend-maca25400702875']},
{cluster_name,<<"rabbit.com">>},
{partitions,[]}]
...done.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2015-1662.html

May I ask why not use 'autoheal'? Apparently, Mirantis is using 'autoheal' instead of 'pause_minority' as per https://review.openstack.org/#/c/115518/. The problem we are facing with a 3-controller OpenStack cluster deployed with Red Hat Director is that the RabbitMQ cluster does not survive when two nodes go down; it only survives losing a single node. That is, by default, we have N+1 instead of N+2, which is not optimal, IMHO.

(In reply to Felipe Alfaro Solana from comment #33)
> May I ask why not use 'autoheal'? Apparently, Mirantis is using 'autoheal'
> instead of 'pause_minority' as per https://review.openstack.org/#/c/115518/.
> The problem we are facing with a 3-controller OpenStack cluster deployed
> with Red Hat Director is that the RabbitMQ cluster does not survive when two
> nodes go down. It only survives losing a single node. That is, by default,
> we have N+1 instead of N+2, which is not optimal, IMHO.

Hello Felipe,

It's a CAP theorem question. Both settings give you partition tolerance. With pause_minority, you get consistency while sacrificing availability: the minority node(s) will pause and disconnect all clients, and those clients will reconnect to nodes in the majority half of the cluster and resume normal operation. With autoheal, you get availability while sacrificing consistency: the cluster becomes "split-brained", and the success of each RPC request is contingent on all connections participating in that request being on the same partition as one another, which is not very likely. So until the partition ends, the system will be in a degraded state and most things are going to fail. Basically, Red Hat engineering preferred pause_minority and chose it as the default configuration because the architecture of OpenStack RPC means a partitioned-but-inconsistent cluster is almost useless.

Regards,
Pablo.
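For reference, both behaviours discussed above map onto a single configuration key; a minimal check and the alternative value, as a sketch (not from this report):
# Sketch: confirm which partition-handling mode a running node was started with
rabbitmqctl eval 'application:get_env(rabbit, cluster_partition_handling).'
# The autoheal behaviour is the same rabbitmq.config key with a different value:
#   {cluster_partition_handling, autoheal}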