Bug 990623 - corosync does not notice network down
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: corosync
Version: 6.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: Christine Caulfield
QA Contact: Cluster QE
Depends On:
Blocks: 991412
Reported: 2013-07-31 10:59 EDT by michal novacek
Modified: 2013-08-05 04:14 EDT (History)
CC: 5 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 991412 (view as bug list)
Environment:
Last Closed: 2013-08-05 04:14:46 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
relevant part of corosync log. (26.19 KB, text/x-log)
2013-07-31 10:59 EDT, michal novacek

Description michal novacek 2013-07-31 10:59:32 EDT
Created attachment 781194 [details]
relevant part of corosync log.

Description of problem:
I have a pacemaker+corosync cluster of three virtual nodes with no services.
At the same time, on all of them, I run a script that drops all communication
on the INPUT and OUTPUT chains except ssh. Pacemaker never notices that the
network is unreachable and keeps reporting all nodes online and the cluster
quorate. Corosync, however, does seem to notice.


Version-Release number of selected component (if applicable):
corosync-1.4.1-15.el6_4.1.x86_64
pacemaker-1.1.8-7.el6.x86_64

How reproducible: always

Steps to Reproduce:
1. setup corosync+pacemaker cluster
2. Add a DROP rule to the INPUT and OUTPUT chains in iptables on all nodes at
the same time (see the sketch below).
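
For reference, a minimal sketch of the firewall rules applied by that script
(reconstructed from the iptables-save output in the additional info below; the
actual script was not attached):

#!/bin/sh
# Sketch only: drop all traffic in and out except the existing ssh session,
# matching the iptables-save output shown later in this report.
iptables -A INPUT -p tcp -m tcp --dport 22 -j ACCEPT
iptables -A INPUT -j DROP
iptables -A OUTPUT -p tcp -m tcp --sport 22 -m state --state ESTABLISHED -j ACCEPT
iptables -A OUTPUT -j DROP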

Actual results: Pacemaker does not notice anything wrong.

Expected results: Pacemaker notices the fault and each node forms its own
inquorate partition.

Additional info:
State on one node in the cluster:

# pcs status
Last updated: Wed Jul 31 16:43:25 2013
Last change: Wed Jul 31 15:09:27 2013 via crm_resource on virt-064
Stack: cman
Current DC: virt-070 - partition with quorum
Version: 1.1.8-7.el6-394e906
3 Nodes configured, unknown expected votes
1 Resources configured.


Online: [ virt-064 virt-065 virt-070 ]

Full list of resources:

 virt-fencing   (stonith:fence_xvm):    Stopped 

# iptables-save
# Generated by iptables-save v1.4.7 on Wed Jul 31 16:50:16 2013
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -p tcp -m tcp --dport 22 -j ACCEPT 
-A INPUT -j DROP 
-A OUTPUT -p tcp -m tcp --sport 22 -m state --state ESTABLISHED -j ACCEPT 
-A OUTPUT -j DROP 
COMMIT
# Completed on Wed Jul 31 16:50:16 2013

# cat /etc/cluster/cluster.conf 
<?xml version="1.0"?>
<cluster config_version="1" name="STSRHTS7932">
        <cman/>
        <totem token="3000"/>
        <fence_daemon clean_start="0" post_join_delay="20"/>
        <clusternodes>
                <clusternode name="virt-064" nodeid="1" votes="1">
                        <fence>
                                <method name="pcmk-redirect">
                                        <device name="pcmk" port="virt-064"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="virt-065" nodeid="2" votes="1">
                        <fence>
                                <method name="pcmk-redirect">
                                        <device name="pcmk" port="virt-065"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="virt-070" nodeid="3" votes="1">
                        <fence>
                                <method name="pcmk-redirect">
                                        <device name="pcmk" port="virt-070"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <fencedevices>
                <fencedevice agent="fence_pcmk" name="pcmk"/>
        </fencedevices>
</cluster>

There are a lot of "Totem is unable to form a cluster because of an operating
system or network fault. The most common cause of this message is that the
local firewall is configured improperly." messages in corosync.log, so corosync
does seem to be aware of the problem.
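
Corosync's own view can also be checked directly on a node, independent of
pacemaker (a sketch, assuming the standard corosync 1.4 / cman command-line
tools shipped with these packages):

# corosync-cfgtool -s
# corosync-objctl runtime.totem.pg.mrp.srp.members
# cman_tool status

The first prints the totem ring status as corosync sees it, the second lists
the current totem membership, and the third shows the quorum view of the cman
layer that pacemaker sits on here (Stack: cman).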
Comment 2 michal novacek 2013-08-02 07:38:45 EDT
The same behaviour occurs with -j DROP on the INPUT chain only.
Comment 3 Jan Friesse 2013-08-05 04:14:46 EDT
Filtering only INPUT (or OUTPUT) with iptables is unsupported. Also, at least
localhost traffic must not be filtered; otherwise corosync does not work and is
unable to create a single-node membership. This is a configuration error, not a
bug.
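
If the goal is to simulate a network outage for testing, the filter has to be
symmetric and must leave loopback alone. A minimal sketch (the rule set below
is an illustration, not taken from the original report):

# Keep localhost traffic open so corosync can still form a single-node membership.
iptables -A INPUT -i lo -j ACCEPT
iptables -A OUTPUT -o lo -j ACCEPT
# Keep the existing ssh session usable for inspection.
iptables -A INPUT -p tcp -m tcp --dport 22 -j ACCEPT
iptables -A OUTPUT -p tcp -m tcp --sport 22 -m state --state ESTABLISHED -j ACCEPT
# Drop everything else symmetrically on both chains.
iptables -A INPUT -j DROP
iptables -A OUTPUT -j DROP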
