Bug 990623
| Field | Value |
|---|---|
| Summary | corosync does not notice network down |
| Product | Red Hat Enterprise Linux 6 |
| Component | corosync |
| Version | 6.4 |
| Status | CLOSED NOTABUG |
| Severity | unspecified |
| Priority | unspecified |
| Reporter | michal novacek <mnovacek> |
| Assignee | Christine Caulfield <ccaulfie> |
| QA Contact | Cluster QE <mspqa-list> |
| CC | ccaulfie, cluster-maint, dvossel, jfriesse, sdake |
| Target Milestone | rc |
| Target Release | --- |
| Hardware | Unspecified |
| OS | Unspecified |
| Doc Type | Bug Fix |
| Type | Bug |
| Clones | 991412 (view as bug list) |
| Bug Blocks | 991412 |
| Last Closed | 2013-08-05 08:14:46 UTC |
| Attachments | relevant part of corosync log (attachment 781194) |
The same behaviour occurs with -j DROP on the INPUT chain only. Filtering only INPUT (or only OUTPUT) with iptables is unsupported. Also, at least localhost must not be filtered; otherwise corosync does not work and will be unable to create a single-node membership. This is a configuration error, not a bug.
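For reference, a minimal sketch of iptables rules that simulate a network outage while still meeting those constraints (loopback left open, both INPUT and OUTPUT filtered) might look like the following; the ssh exception mirrors the reporter's setup, and the exact rule order is only an illustration, not part of the original report:

    # allow loopback so corosync can still form a single-node membership
    iptables -A INPUT  -i lo -j ACCEPT
    iptables -A OUTPUT -o lo -j ACCEPT
    # keep ssh reachable for the test driver (as in the reporter's rules)
    iptables -A INPUT  -p tcp --dport 22 -j ACCEPT
    iptables -A OUTPUT -p tcp --sport 22 -m state --state ESTABLISHED -j ACCEPT
    # drop all remaining cluster traffic in both directions
    iptables -A INPUT  -j DROP
    iptables -A OUTPUT -j DROP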
Created attachment 781194 [details]
relevant part of corosync log

Description of problem:
I have a pacemaker+corosync cluster of 3 virtual nodes with no services. I run, at the same time on all of them, a script dropping all communication on the INPUT and OUTPUT chains except ssh. Pacemaker never notices that the network is unreachable and reports all the nodes online and the cluster quorate. Corosync does seem to notice, though.

Version-Release number of selected component (if applicable):
corosync-1.4.1-15.el6_4.1.x86_64
pacemaker-1.1.8-7.el6.x86_64

How reproducible:
always

Steps to Reproduce:
1. Set up a corosync+pacemaker cluster.
2. Add a DROP rule to the INPUT and OUTPUT chains in iptables on all nodes at the same time.

Actual results:
Pacemaker does not notice anything wrong.

Expected results:
Pacemaker notices, and each node forms an inquorate island.

Additional info:

Node in the cluster:

# pcs status
Last updated: Wed Jul 31 16:43:25 2013
Last change: Wed Jul 31 15:09:27 2013 via crm_resource on virt-064
Stack: cman
Current DC: virt-070 - partition with quorum
Version: 1.1.8-7.el6-394e906
3 Nodes configured, unknown expected votes
1 Resources configured.

Online: [ virt-064 virt-065 virt-070 ]

Full list of resources:

 virt-fencing   (stonith:fence_xvm):    Stopped

# iptables-save
# Generated by iptables-save v1.4.7 on Wed Jul 31 16:50:16 2013
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -p tcp -m tcp --dport 22 -j ACCEPT
-A INPUT -j DROP
-A OUTPUT -p tcp -m tcp --sport 22 -m state --state ESTABLISHED -j ACCEPT
-A OUTPUT -j DROP
COMMIT
# Completed on Wed Jul 31 16:50:16 2013

# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="1" name="STSRHTS7932">
  <cman/>
  <totem token="3000"/>
  <fence_daemon clean_start="0" post_join_delay="20"/>
  <clusternodes>
    <clusternode name="virt-064" nodeid="1" votes="1">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="virt-064"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="virt-065" nodeid="2" votes="1">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="virt-065"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="virt-070" nodeid="3" votes="1">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="virt-070"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_pcmk" name="pcmk"/>
  </fencedevices>
</cluster>

There are many "Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly." messages in corosync.log, so corosync does seem to know about the problem.
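To cross-check what each layer sees during such a test, commands along these lines can be run on one of the nodes; the log path /var/log/cluster/corosync.log is the usual default for a cman-based RHEL 6 cluster and is an assumption here:

    # membership and ring state as corosync/cman see it
    corosync-cfgtool -s
    cman_tool status
    # pacemaker's view of the cluster (shown above as well)
    pcs status
    # count the Totem firewall warnings mentioned above
    grep -c "Totem is unable to form a cluster" /var/log/cluster/corosync.log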