Bug 990623 - corosync does not notice network down
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: corosync
Version: 6.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: Christine Caulfield
QA Contact: Cluster QE
Depends On:
Blocks: 991412
Reported: 2013-07-31 10:59 EDT by michal novacek
Modified: 2013-08-05 04:14 EDT (History)
CC: 5 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 991412 (view as bug list)
Environment:
Last Closed: 2013-08-05 04:14:46 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
relevant part of corosync log. (26.19 KB, text/x-log)
2013-07-31 10:59 EDT, michal novacek

Description michal novacek 2013-07-31 10:59:32 EDT
Created attachment 781194 [details]
relevant part of corosync log.

Description of problem:
I have a pacemaker+corosync cluster of three virtual nodes with no services.
At the same time, on all of them, I run a script that drops all communication
on the INPUT and OUTPUT chains except ssh. Pacemaker never notices that the
network is unreachable and keeps reporting all nodes online and the cluster
quorate. Corosync, however, does seem to notice.


Version-Release number of selected component (if applicable):
corosync-1.4.1-15.el6_4.1.x86_64
pacemaker-1.1.8-7.el6.x86_64

How reproducible: always

Steps to Reproduce:
1. setup corosync+pacemaker cluster
2. Add a DROP rule to the INPUT and OUTPUT chains in iptables on all nodes at
the same time (see the sketch below).
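
For reference, a minimal sketch of the firewall rules applied by that script
(reconstructed from the iptables-save output in the additional info below; the
actual script was not attached):

#!/bin/sh
# Sketch only: drop all traffic in and out except the existing ssh session,
# matching the iptables-save output shown later in this report.
iptables -A INPUT -p tcp -m tcp --dport 22 -j ACCEPT
iptables -A INPUT -j DROP
iptables -A OUTPUT -p tcp -m tcp --sport 22 -m state --state ESTABLISHED -j ACCEPT
iptables -A OUTPUT -j DROP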

Actual results: Pacemaker does not notice anything wrong.

Expected results: Pacemaker notices the fault and each node forms its own
inquorate partition.

Additional info:
State on one node in the cluster:

# pcs status
Last updated: Wed Jul 31 16:43:25 2013
Last change: Wed Jul 31 15:09:27 2013 via crm_resource on virt-064
Stack: cman
Current DC: virt-070 - partition with quorum
Version: 1.1.8-7.el6-394e906
3 Nodes configured, unknown expected votes
1 Resources configured.


Online: [ virt-064 virt-065 virt-070 ]

Full list of resources:

 virt-fencing   (stonith:fence_xvm):    Stopped 

# iptables-save
# Generated by iptables-save v1.4.7 on Wed Jul 31 16:50:16 2013
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -p tcp -m tcp --dport 22 -j ACCEPT 
-A INPUT -j DROP 
-A OUTPUT -p tcp -m tcp --sport 22 -m state --state ESTABLISHED -j ACCEPT 
-A OUTPUT -j DROP 
COMMIT
# Completed on Wed Jul 31 16:50:16 2013

# cat /etc/cluster/cluster.conf 
<?xml version="1.0"?>
<cluster config_version="1" name="STSRHTS7932">
        <cman/>
        <totem token="3000"/>
        <fence_daemon clean_start="0" post_join_delay="20"/>
        <clusternodes>
                <clusternode name="virt-064" nodeid="1" votes="1">
                        <fence>
                                <method name="pcmk-redirect">
                                        <device name="pcmk" port="virt-064"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="virt-065" nodeid="2" votes="1">
                        <fence>
                                <method name="pcmk-redirect">
                                        <device name="pcmk" port="virt-065"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="virt-070" nodeid="3" votes="1">
                        <fence>
                                <method name="pcmk-redirect">
                                        <device name="pcmk" port="virt-070"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <fencedevices>
                <fencedevice agent="fence_pcmk" name="pcmk"/>
        </fencedevices>
</cluster>

There are a lot of "Totem is unable to form a cluster because of an operating
system or network fault. The most common cause of this message is that the
local firewall is configured improperly." messages in corosync.log, so corosync
does seem to be aware of the problem.
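
Corosync's own view can also be checked directly on a node, independent of
pacemaker (a sketch, assuming the standard corosync 1.4 / cman command-line
tools shipped with these packages):

# corosync-cfgtool -s
# corosync-objctl runtime.totem.pg.mrp.srp.members
# cman_tool status

The first prints the totem ring status as corosync sees it, the second lists
the current totem membership, and the third shows the quorum view of the cman
layer that pacemaker sits on here (Stack: cman).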
Comment 2 michal novacek 2013-08-02 07:38:45 EDT
The same behaviour occurs with -j DROP on the INPUT chain only.
Comment 3 Jan Friesse 2013-08-05 04:14:46 EDT
Filtering only INPUT (or OUTPUT) with iptables is unsupported. Also, at least
localhost traffic must not be filtered; otherwise corosync does not work and is
unable to create a single-node membership. This is a configuration error, not a
bug.
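
If the goal is to simulate a network outage for testing, the filter has to be
symmetric and must leave loopback alone. A minimal sketch (the rule set below
is an illustration, not taken from the original report):

# Keep localhost traffic open so corosync can still form a single-node membership.
iptables -A INPUT -i lo -j ACCEPT
iptables -A OUTPUT -o lo -j ACCEPT
# Keep the existing ssh session usable for inspection.
iptables -A INPUT -p tcp -m tcp --dport 22 -j ACCEPT
iptables -A OUTPUT -p tcp -m tcp --sport 22 -m state --state ESTABLISHED -j ACCEPT
# Drop everything else symmetrically on both chains.
iptables -A INPUT -j DROP
iptables -A OUTPUT -j DROP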
