Created attachment 654195 [details]
corosync log files
Description of problem:
Version-Release number of selected component (if applicable):
[root@gcluster74 ~]# lsb_release -a
LSB Version: :core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd
Distributor ID: RedHatEnterpriseServer
Description: Red Hat Enterprise Linux Server release 6.2 (Santiago)
[gbase@gcluster77 ~]$ rpm -qa | grep corosync
debug: on  ## only node 74 has debug on; the other nodes have it off
Four cluster nodes, IPs from 192.168.9.71 to 192.168.9.74.
Unplug the network cable on one or two nodes, wait a few seconds, then plug the cable back in.
Steps to Reproduce:
1. unplug the network cable
2. wait a few seconds
3. plug the network cable back in
Actual results:
The cluster node is stuck in an infinite loop of Gather state 11.
Expected results:
The cluster nodes return to a consistent state.
see the logfile attached.
The network interface is down.
Are you running NetworkManager? If so, please turn it off and use a static configuration. Corosync has really big problems if the interface is shut down (at the very least, the routing table changes a lot).
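For example, the switch to a static configuration might look like this on RHEL 6 (a sketch only; `eth0`, the IP address, and the netmask are placeholders for your actual cluster interface settings):

```shell
# Stop NetworkManager and keep it from starting at boot
service NetworkManager stop
chkconfig NetworkManager off

# Configure the interface statically in
# /etc/sysconfig/network-scripts/ifcfg-eth0, e.g.:
#   DEVICE=eth0
#   BOOTPROTO=static
#   IPADDR=192.168.9.71
#   NETMASK=255.255.255.0
#   ONBOOT=yes
#   NM_CONTROLLED=no

# Manage the interface with the classic network service instead
chkconfig network on
service network restart
```

With NM_CONTROLLED=no, NetworkManager leaves the interface alone even if it is later re-enabled, so corosync's bound address and routes stay stable.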
There is a somewhat better explanation here: https://github.com/corosync/corosync/wiki/Corosync-and-ifdown-on-active-network-interface
Fixing the ifdown problem properly is on the TODO list, but even though it is quite high priority, there are bugs with even higher priority.
I will keep this BZ open as TODO.
*** Bug 883080 has been marked as a duplicate of this bug. ***
*** Bug 989934 has been marked as a duplicate of this bug. ***
A proper fix for this bug would mean changing a huge part of very sensitive code. The bug also has well-known causes and a workaround (don't test cluster failover with ifdown, and don't use NetworkManager), so I am closing it as WONTFIX.
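If you need to simulate a link failure without ifdown or pulling the cable, one common approach (a sketch, assuming the default corosync totem port 5405 and `eth0` as the cluster interface; adjust both for your setup) is to drop corosync traffic with iptables, which leaves the interface and routing table intact:

```shell
# Simulate link loss: drop corosync totem traffic (default UDP
# port 5405) in both directions without touching the interface.
iptables -A INPUT  -i eth0 -p udp --dport 5405 -j DROP
iptables -A OUTPUT -o eth0 -p udp --dport 5405 -j DROP

# ...wait for the node to be fenced out of membership, then
# remove the rules to "replug" the link:
iptables -D INPUT  -i eth0 -p udp --dport 5405 -j DROP
iptables -D OUTPUT -o eth0 -p udp --dport 5405 -j DROP
```

Because the interface never goes down, corosync's bound address stays valid and the node can rejoin cleanly once the rules are removed.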