Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 842098

Summary: starting cman on one node automatically fences the other node
Product: Red Hat Enterprise Linux 6
Reporter: feiwang
Component: cluster
Assignee: Fabio Massimo Di Nitto <fdinitto>
Status: CLOSED NOTABUG
QA Contact: Cluster QE <mspqa-list>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 6.2
CC: ccaulfie, cluster-maint, lhh, rpeterso, teigland
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-07-22 14:48:48 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
cluster and qdisk configuration (flags: none)

Description feiwang 2012-07-22 03:21:49 UTC
Created attachment 599562 [details]
cluster and qdisk configuration

Description of problem:

I have been trying to set up a two-node cluster, but found that starting cman on one node automatically fences the other node.

Version-Release number of selected component (if applicable):

cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 6.2 (Santiago)

uname -a
Linux arcx375515347 2.6.32-220.el6.x86_64 #1 SMP Wed Nov 9 08:03:13 EST 2011 x86_64 x86_64 x86_64 GNU/Linux

rpm -qa|grep cman
cman-3.0.12.1-23.el6.x86_64

How reproducible:

every time

Steps to Reproduce:
1. yum install -y openais cman rgmanager lvm2-cluster gfs2-utils ricci, after the local-media installation repository has been set up properly
2. set up /etc/hosts and run passwd ricci to establish trust between the two nodes
3. create cluster.conf (attached in the attachments section)
4. run mkqdisk -c /dev/mapper/360050768018685656800000000000381 -l wf0116 to create the quorum disk
5. run service cman start on one node
  
Actual results:

It hangs for some time at the "joining fence domain" step; shortly after this node joins the fence domain successfully, the other node is fenced. The same symptom is observed the other way around.


Expected results:

cman should start normally on both nodes without them fencing each other.

Additional info:

- I have basically ruled out IGMP snooping blocking multicast traffic (which, per a Google search, accounts for almost all issues with this failure symptom): the network interfaces used as the cluster interconnect on the two nodes are connected to the same switch, and running "nc -u -vvn -z 227.0.0.63 5405" on one node to generate multicast UDP traffic while running "tcpdump -i eth0 ether multicast" on the other node does capture those multicast packets, although they are hard to pick out because so many unrelated packets flood in while "tcpdump -i eth0 ether multicast" is running.

- Although I am pretty sure the two nodes are using a shared disk with the same WWID as the quorum disk, this shared disk presents itself with a different minor number on each node (please see the attachment for the detailed output of mkqdisk -L). To rule out this discrepancy as the cause, I tried removing the qdisk settings from cluster.conf, but I still get the same symptom.

Comment 2 feiwang 2012-07-22 03:56:16 UTC
I forgot some important information: iptables, ip6tables, the firewall, and SELinux are all disabled on these two nodes, and NTP has been set up on both.

Sometimes I get these complaints in /var/log/messages, and I am completely clueless about them.

Jul 19 21:37:50 fenced cpg_mcast_joined retry 804400 protocol
Jul 19 21:37:50 fenced cpg_mcast_joined retry 804500 protocol
Jul 19 21:37:50 fenced cpg_mcast_joined retry 804600 protocol
Jul 19 21:37:50 fenced cpg_mcast_joined retry 804700 protocol
Jul 19 21:37:50 fenced cpg_mcast_joined retry 804800 protocol
Jul 19 21:37:51 fenced cpg_mcast_joined retry 804900 protocol
Jul 19 21:37:51 fenced cpg_mcast_joined retry 805000 protocol
Jul 19 21:37:51 fenced cpg_mcast_joined retry 805100 protocol
Jul 19 21:37:51 fenced cpg_mcast_joined retry 805200 protocol
Jul 19 21:37:51 fenced cpg_mcast_joined retry 805300 protocol
Jul 19 21:37:51 fenced cpg_mcast_joined retry 805400 protocol
Jul 19 21:37:51 fenced cpg_mcast_joined retry 805500 protocol
Jul 19 21:37:51 fenced cpg_mcast_joined retry 805600 protocol
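These lines show fenced retrying a CPG multicast send that corosync keeps rejecting, typically a flow-control or totem connectivity problem; the retry counter climbing by hundreds within a single second suggests the message is never getting through. A quick way to gauge how stuck fenced is would be to count such lines, a sketch (the two sample lines are copied from the excerpt above; on a live system you would grep /var/log/messages instead):

```shell
# Count fenced cpg_mcast_joined retry lines in a log excerpt. The sample
# below inlines two lines from the report purely so the command is runnable.
log='Jul 19 21:37:50 fenced cpg_mcast_joined retry 804400 protocol
Jul 19 21:37:51 fenced cpg_mcast_joined retry 805600 protocol'
printf '%s\n' "$log" | grep -c 'cpg_mcast_joined retry'   # prints 2
```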

Comment 3 Fabio Massimo Di Nitto 2012-07-22 14:48:48 UTC
(In reply to comment #0)

> Steps to Reproduce:
---
> 5.run service cman start on one node
>   
> Actual results:
> 
> It hangs for some time at the "joining fence domain" step; shortly after
> this node joins the fence domain successfully, the other node is fenced.
> The same symptom is observed the other way around.
> 
> 
> Expected results:
> 
> cman should start normally on both nodes without them fencing each other.

The problem here is that you have incorrect expectations. This is not a bug; it is by design.

If you start the two nodes at the same time, or within the fence delay timeout of each other, they will not fence each other.

This is also a very well-documented step of the setup.
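In practice this means launching "service cman start" on both nodes in parallel rather than one after the other. A minimal sketch, where run_on is a stand-in for however you reach each node and the node names are placeholders:

```shell
# Start cman on both nodes at (nearly) the same time, so each joins the
# fence domain within the other's fence delay window.
# run_on is a stub for illustration; on a real cluster it would be e.g.:
#   run_on() { ssh "$1" "$2"; }
run_on() { echo "[$1] $2"; }
for node in node1 node2; do
  run_on "$node" 'service cman start' &
done
wait
```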

Red Hat has a Global Support organization and a depth of online resources dedicated to addressing technical questions, like yours, from start to finish.

Please start here for knowledgebase, videos, Groups discussions, and many more technical resources:

http://access.redhat.com

Additionally, you can find Red Hat Support phone numbers and case management information here:

https://access.redhat.com/support/

If you have any difficulties or questions, please contact our customer service team, so they may help you further.

https://access.redhat.com/support/contact/customerService.html

Thank you and best regards,