Description of problem:
In a three-node cluster with pause_minority, nodes 1 and 2 become network-isolated from each other; however, both can still talk to node 3. Nodes 1 and 2 both stop due to the pause_minority config, and node 3 seems to crash. The rabbit cluster is unavailable. Will provide full logs.

########### From node1 ###########

=INFO REPORT==== 28-Jun-2017::13:41:17 ===
rabbit on node rabbit@node2 down

=ERROR REPORT==== 28-Jun-2017::13:41:21 ===
Partial partition detected:
 * We saw DOWN from rabbit@node2
 * We can still see rabbit@node3 which can see rabbit@node2
 * pause_minority mode enabled
We will therefore pause until the *entire* cluster recovers

=WARNING REPORT==== 28-Jun-2017::13:41:21 ===
Cluster minority/secondary status detected - awaiting recovery

=INFO REPORT==== 28-Jun-2017::13:41:21 ===
Stopping RabbitMQ

########### From node2 ###########

=INFO REPORT==== 28-Jun-2017::13:41:17 ===
rabbit on node rabbit@node1 down

=ERROR REPORT==== 28-Jun-2017::13:41:21 ===
Partial partition detected:
 * We saw DOWN from rabbit@node1
 * We can still see rabbit@node3 which can see rabbit@node1
 * pause_minority mode enabled
We will therefore pause until the *entire* cluster recovers

=WARNING REPORT==== 28-Jun-2017::13:41:21 ===
Cluster minority/secondary status detected - awaiting recovery

=INFO REPORT==== 28-Jun-2017::13:41:21 ===
Stopping RabbitMQ

########### From node3 ###########

=WARNING REPORT==== 28-Jun-2017::13:41:18 ===
Received a 'DOWN' message from rabbit@node1 but still can communicate with it

=WARNING REPORT==== 28-Jun-2017::13:41:18 ===
Received a 'DOWN' message from rabbit@node2 but still can communicate with it

Then 150+ "Generic server terminating" messages:

=ERROR REPORT==== 28-Jun-2017::13:41:20 ===
** Generic server <0.14879.0> terminating
--
=ERROR REPORT==== 28-Jun-2017::13:41:20 ===
** Generic server <0.22414.0> terminating
--
=ERROR REPORT==== 28-Jun-2017::13:41:20 ===
** Generic server <0.14877.0> terminating
--
=ERROR REPORT==== 28-Jun-2017::13:41:20 ===
** Generic server <0.22412.0> terminating

Version-Release number of selected component (if applicable):
rabbitmq-server-3.6.3-6.el7ost.noarch

How reproducible:
Unknown; it has happened twice in this specific environment.

Steps to Reproduce:
1. Unknown

Actual results:
The cluster crashes and rabbit is unavailable.

Expected results:
A degraded but available rabbit service.

Additional info:
Will provide full logs and sosreports.
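
For reference, the pause decision that node1 and node2 report in their logs can be sketched as a toy model. This is only an illustration based on the wording of the "Partial partition detected" messages, not RabbitMQ's actual implementation; the function name should_pause and the data layout are hypothetical.

```python
def should_pause(down_peers, reachable_views):
    """Toy model of the partial-partition check described in the logs.

    down_peers: peers this node saw DOWN from.
    reachable_views: {peer: set of nodes that peer can still see},
        for every peer this node can still reach.
    Assumption: a node pauses as soon as a peer it can still reach
    can see a node that it saw DOWN itself.
    """
    for lost in down_peers:
        for sees in reachable_views.values():
            if lost in sees:
                return True  # partial partition: pause this node
    return False


# Hypothetical encoding of the situation in this report:
# node1 and node2 lose each other, but both still reach node3.
cluster = {
    "node1": {"down": {"node2"}, "views": {"node3": {"node1", "node2"}}},
    "node2": {"down": {"node1"}, "views": {"node3": {"node1", "node2"}}},
    "node3": {"down": set(), "views": {"node1": {"node3"}, "node2": {"node3"}}},
}

paused = sorted(
    name
    for name, state in cluster.items()
    if should_pause(state["down"], state["views"])
)
print(paused)  # ['node1', 'node2']
```

In this model only node3 keeps running, which matches the expectation of a degraded but available service. It does not explain the "Generic server terminating" errors seen on node3.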
Based on the last comment and the lack of further feedback from the customer, it sounds like we should close this as "can't fix".