Bug 842098 - starting cman on one node automatically fences the other node
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: cluster
Version: 6.2
Hardware: x86_64 Linux
Priority: unspecified
Severity: urgent
Target Milestone: rc
Assigned To: Fabio Massimo Di Nitto
QA Contact: Cluster QE
Reported: 2012-07-21 23:21 EDT by feiwang
Modified: 2012-07-22 10:48 EDT
Doc Type: Bug Fix
Last Closed: 2012-07-22 10:48:48 EDT
Type: Bug

Attachments
cluster and qdisk configuration (6.64 KB, application/octet-stream), 2012-07-21 23:21 EDT, feiwang

Description feiwang 2012-07-21 23:21:49 EDT
Created attachment 599562
cluster and qdisk configuration

Description of problem:

I have been trying to set up a two-node cluster, but found that starting cman on one node automatically fences the other node.

Version-Release number of selected component (if applicable):

cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 6.2 (Santiago)

uname -a
Linux arcx375515347 2.6.32-220.el6.x86_64 #1 SMP Wed Nov 9 08:03:13 EST 2011 x86_64 x86_64 x86_64 GNU/Linux

rpm -qa|grep cman
cman-3.0.12.1-23.el6.x86_64

How reproducible:

Every time.

Steps to Reproduce:
1. yum install -y openais cman rgmanager lvm2-cluster gfs2-utils ricci, after the local-media installation repository has been set up properly.
2. Set up /etc/hosts and run passwd ricci to establish trust between the two nodes.
3. Create cluster.conf, which is attached in the attachment section (a generic sketch follows these steps).
4. Run mkqdisk -c /dev/mapper/360050768018685656800000000000381 -l wf0116 to create the qdisk.
5. Run service cman start on one node.
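
For reference, a minimal two-node cluster.conf with a quorum disk generally has the shape sketched below. This is only an illustration with assumed names (the cluster name, node names and fence device names are placeholders; only the qdisk label wf0116 comes from step 4), not the attached configuration:

<?xml version="1.0"?>
<cluster name="wfcluster" config_version="1">
  <!-- with a qdisk vote, expected_votes is 2 node votes + 1 qdisk vote -->
  <cman expected_votes="3"/>
  <clusternodes>
    <clusternode name="node1" nodeid="1">
      <fence>
        <method name="1">
          <device name="fence_dev1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2" nodeid="2">
      <fence>
        <method name="1">
          <device name="fence_dev2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <!-- fence_dev1 / fence_dev2 stand in for real fence agent definitions -->
  </fencedevices>
  <quorumd label="wf0116" interval="1" tko="10" votes="1"/>
</cluster>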
  
Actual results:

cman hangs for some time at the "Joining fence domain" step, and then the other node is fenced shortly after this node joins the fence domain successfully. The same symptom is observed the other way around.


Expected results:

cman should start normally on both nodes without them fencing each other.

Additional info:

- I have basically ruled out the possibility of IGMP snooping blocking multicast traffic, which according to my searching is what solves almost all issues with this failure symptom. The network interfaces used as the cluster interconnect on both nodes are connected to the same switch, and running "nc -u -vvn -z 227.0.0.63 5405" on one node to generate some multicast UDP traffic while running "tcpdump -i eth0 ether multicast" on the other node does capture those multicast packets, although they are hard to pick out because so many packets flood in while "tcpdump -i eth0 ether multicast" is running.
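
To lay that multicast check out more explicitly, the commands above were run roughly as follows (227.0.0.63, UDP port 5405 and eth0 are the values from my setup; narrowing the capture to the multicast address is just a suggestion to cut down the flood of unrelated packets):

# on node B: capture multicast traffic on the interconnect interface
tcpdump -i eth0 ether multicast
# or, to reduce noise, match only the cluster multicast address and port
tcpdump -i eth0 host 227.0.0.63 and udp port 5405

# on node A: generate some multicast UDP traffic towards the cluster address
nc -u -vvn -z 227.0.0.63 5405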

- Although I am pretty sure the two nodes are using a shared disk with the same WWID as the quorum disk, the shared disk presents itself with a different minor number on each node (please see the attachment for the detailed output of mkqdisk -L). To rule out the possibility of this discrepancy causing any problem, I have tried removing the qdisk settings from cluster.conf, but I still get the same symptom.
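
For completeness, the comparison of the shared device on the two nodes was along these lines (device-mapper-multipath naming is assumed, as the /dev/mapper path in step 4 suggests; run the same commands on both nodes and compare the output):

# list the quorum disk labels visible to this node
mkqdisk -L
# confirm the WWID of the multipath device
multipath -ll 360050768018685656800000000000381
# cross-check via persistent device names
ls -l /dev/disk/by-id/ | grep -i 360050768018685656800000000000381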
Comment 2 feiwang 2012-07-21 23:56:16 EDT
I forgot some important information: iptables, ip6tables, the firewall and SELinux are all disabled on these two nodes, and NTP has been set up on both of them.
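
For the record, these are the kinds of checks used to confirm that (standard RHEL 6 commands, nothing cluster specific):

service iptables status
service ip6tables status
getenforce
chkconfig --list | grep -E 'iptables|ip6tables'
ntpstat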

Sometimes I get the following complaints in /var/log/messages and am completely clueless about them:

Jul 19 21:37:50 fenced cpg_mcast_joined retry 804400 protocol
Jul 19 21:37:50 fenced cpg_mcast_joined retry 804500 protocol
Jul 19 21:37:50 fenced cpg_mcast_joined retry 804600 protocol
Jul 19 21:37:50 fenced cpg_mcast_joined retry 804700 protocol
Jul 19 21:37:50 fenced cpg_mcast_joined retry 804800 protocol
Jul 19 21:37:51 fenced cpg_mcast_joined retry 804900 protocol
Jul 19 21:37:51 fenced cpg_mcast_joined retry 805000 protocol
Jul 19 21:37:51 fenced cpg_mcast_joined retry 805100 protocol
Jul 19 21:37:51 fenced cpg_mcast_joined retry 805200 protocol
Jul 19 21:37:51 fenced cpg_mcast_joined retry 805300 protocol
Jul 19 21:37:51 fenced cpg_mcast_joined retry 805400 protocol
Jul 19 21:37:51 fenced cpg_mcast_joined retry 805500 protocol
Jul 19 21:37:51 fenced cpg_mcast_joined retry 805600 protocol
Comment 3 Fabio Massimo Di Nitto 2012-07-22 10:48:48 EDT
(In reply to comment #0)

> Steps to Reproduce:
---
> 5.run service cman start on one node
>   
> Actual results:
> 
> cman hangs for some time at the "Joining fence domain" step, and then the
> other node is fenced shortly after this node joins the fence domain
> successfully. The same symptom is observed the other way around.
> 
> 
> Expected results:
> 
> cman should start normally on both nodes without them fencing each other.

The problem here is that you have incorrect expectations. This is not a bug, it is by design.

If you start the two nodes at the same time, or within the fence delay timeout, they will not fence each other.

This is also a very well documented step of the setup.
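
In practical terms that usually means either starting cman on both nodes at (roughly) the same time:

# run on node1 and node2 together, within the fence daemon's post-join delay
service cman start

or giving fenced a longer grace period at join time via post_join_delay in cluster.conf (the value below is only an example, not a recommended setting):

<fence_daemon post_join_delay="60"/>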

Red Hat has a Global Support organization and a depth of online resources dedicated to addressing technical questions, like yours, from start to finish.

Please start here for knowledgebase, videos, Groups discussions, and many more technical resources:

http://access.redhat.com

Additionally, you can find Red Hat Support phone numbers and case management information here:

https://access.redhat.com/support/

If you have any difficulties or questions, please contact our customer service team, so they may help you further.

https://access.redhat.com/support/contact/customerService.html

Thank you and best regards,
