Bug 836429

Summary: hosta can never rejoin the cluster because of "reservation conflict" unless reboot -f hosta.
Product: Red Hat Enterprise Linux 6
Component: cluster
Version: 6.2
Hardware: x86_64
OS: Linux
Status: CLOSED NOTABUG
Severity: high
Priority: unspecified
Target Milestone: rc
Reporter: davidyangyi <davidyangyi>
Assignee: Ryan O'Hara <rohara>
QA Contact: Cluster QE <mspqa-list>
CC: ccaulfie, cluster-maint, davidyangyi, dyasny, fdinitto, lhh, rpeterso, sdake, teigland
Doc Type: Bug Fix
Type: Bug
Last Closed: 2012-06-30 13:36:38 UTC

Description davidyangyi 2012-06-29 05:21:26 UTC
Description of problem:
I have an RHCS cluster on RHEL 6.2, using an iSCSI LUN as the fence device (SCSI reservations). When I unplug the network cable of the master (hosta), the slave (hostb) fences it successfully and becomes the master, while hosta still thinks it is the master.
When I plug the cable back into hosta, hosta can never rejoin the cluster because of a "reservation conflict" unless I run reboot -f on hosta.
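
For reference, the registrations and the reservation that fence_scsi places on the shared LUN can be inspected with sg_persist. A minimal sketch, assuming the shared LUN is the /dev/mapper/mpathap1 device referenced in the cluster.conf below (exact output will differ):

sg_persist --in --read-keys --device=/dev/mapper/mpathap1          # registered keys, one per unfenced node
sg_persist --in --read-reservation --device=/dev/mapper/mpathap1   # current reservation holder and type

A node whose key has been preempted by fencing gets "reservation conflict" errors on writes to the LUN until its key is re-registered (unfencing).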

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. hosta is the master, hostb is the slave
2. use an iSCSI LUN as the fence device (fence_scsi)
3. unplug hosta's network cable and wait for hostb to become the master
4. plug hosta's cable back in
  
Actual results:


Expected results:


Additional info:

Comment 1 davidyangyi 2012-06-29 05:23:24 UTC
cluster.conf

<?xml version="1.0"?>
<cluster config_version="41" name="bcec_img">
        <clusternodes>
                <clusternode name="hosta" nodeid="1">
                        <fence>
                                <method name="scsi">
                                        <device name="scsifence"/>
                                </method>
                        </fence>
                        <unfence>
                                <device action="on" name="scsifence"/>
                        </unfence>
                </clusternode>
                <clusternode name="hostb" nodeid="2">
                        <fence>
                                <method name="scsi">
                                        <device name="scsifence"/>
                                </method>
                        </fence>
                        <unfence>
                                <device action="on" name="scsifence"/>
                        </unfence>
                </clusternode>
        </clusternodes>
        <cman broadcast="yes" expected_votes="1" two_node="1"/>
        <rm>
                <resources>
                        <ip address="172.16.200.31/25" monitor_link="on" sleeptime="10"/>
                        <fs device="/dev/mapper/mpathap1" fsid="7094" mountpoint="/bcec_images" name="bcec_imgages" quick_status="on"/>
                </resources>
                <failoverdomains>
                        <failoverdomain name="bcec_img" nofailback="1" ordered="0" restricted="0">
                                <failoverdomainnode name="hosta" priority="2"/>
                                <failoverdomainnode name="hostb" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <service domain="bcec_img" exclusive="1" name="bcec_img" recovery="relocate">
                        <ip ref="172.16.200.31/25"/>
                </service>
        </rm>
        <fence_daemon post_join_delay="25"/>
        <fencedevices>
                <fencedevice agent="fence_scsi" name="scsifence"/>
        </fencedevices>
        <logging debug="on">
                <logging_daemon debug="on" name="rgmanager"/>
                <logging_daemon debug="on" name="corosync"/>
                <logging_daemon debug="on" name="fenced"/>
                <logging_daemon debug="on" name="dlm_controld"/>
                <logging_daemon debug="on" name="corosync" subsys="CMAN"/>
        </logging>
</cluster>
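
For completeness, a configuration like this can be sanity-checked before it is used; a minimal sketch, assuming the standard RHEL 6 cluster packages are installed:

ccs_config_validate    # validates /etc/cluster/cluster.conf against the cluster schema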

Comment 3 Steven Dake 2012-06-29 18:54:56 UTC
If the problem exists, it is in the base cluster, not corosync. Reassigning to cluster for triage.

Comment 4 Fabio Massimo Di Nitto 2012-06-30 04:01:46 UTC
Reassigning to the fence_scsi maintainer to confirm, but it sounds like the correct behaviour to me.

Comment 5 Fabio Massimo Di Nitto 2012-06-30 04:03:13 UTC
If anything, cman has to be restarted in order to perform "unfencing":

plug the fiber back
restart cman -> unfencing
node rejoins the cluster.
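
For illustration, on a RHEL 6 node that would look roughly like the following; a minimal sketch only, assuming the stock init scripts (the cman start sequence is what performs the unfencing step):

# on hosta, after reconnecting the cable
service cman restart     # stop and start the cluster manager; startup re-registers the node's SCSI key
fence_node -U hosta      # or trigger unfencing for the node explicitly
cman_tool nodes          # confirm both nodes are listed as cluster members again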

Comment 6 davidyangyi 2012-06-30 13:15:40 UTC
Thank you.
To clarify, I pulled out the ethernet cable and plugged it back in, not the fiber to the array.

But I can't just restart cman, or hosta removes hostb from the cluster.

Here is the fenced.log on hostb
Jun 26 22:51:25 fenced daemon node 1 stateful merge
Jun 26 22:51:25 fenced daemon node 1 kill due to stateful merge
Jun 26 22:51:25 fenced telling cman to remove nodeid 1 from cluster
Jun 26 22:51:25 fenced daemon cpg_dispatch error 2
Jun 26 22:51:25 fenced cluster is down, exiting
Jun 26 22:51:25 fenced daemon cpg_dispatch error 2


Here is the fenced.log on hosta
Jun 26 22:42:48 fenced daemon node 2 stateful merge
Jun 26 22:42:48 fenced daemon node 2 kill due to stateful merge
Jun 26 22:42:48 fenced telling cman to remove nodeid 2 from cluster
Jun 26 22:42:48 fenced daemon_member 2 zero proto
Jun 26 22:43:04 fenced cluster node 2 removed seq 1224
Jun 26 22:43:04 fenced receive_protocol from 1 max 1.1.1.0 run 1.1.1.1
Jun 26 22:43:04 fenced daemon node 1 max 1.1.1.0 run 1.1.1.1
Jun 26 22:43:04 fenced daemon node 1 join 1340720582 left 0 local quorum 1340720582
Jun 26 22:43:04 fenced fenced:daemon conf 1 0 1 memb 1 join left 2
Jun 26 22:43:04 fenced fenced:daemon ring 1:1224 1 memb 1

Both nodes think they are the master, and each tries to remove the other from the cluster.

In this situation, I have to reboot hosta so that it rejoins the cluster as the slave.
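
Each node's view of the fence domain can be checked directly with the cluster tools; a small sketch (output omitted here):

fence_tool ls      # fence domain membership as seen by the local node
cman_tool nodes    # cluster membership as seen by the local node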

Comment 7 Fabio Massimo Di Nitto 2012-06-30 13:36:38 UTC
This is absolutely normal behaviour again.

You cannot plug the cable back without stopping the cluster on the failed node first.

Since you are running RHEL 6.2, I strongly recommend that you contact GSS, who will point you towards the correct documentation on how to use and administer a cluster.
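
In other words, the recovery has to start with the cluster stack stopped on the failed node. A minimal sketch of that sequence on hosta, assuming the standard RHEL 6 service scripts:

# on hosta, while the cable is still unplugged
service rgmanager stop
service cman stop
# reconnect the ethernet cable, then bring the stack back up (cman start performs unfencing)
service cman start
service rgmanager start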

Comment 8 davidyangyi 2012-07-01 02:23:05 UTC
Fabio Massimo Di Nitto

Thank you. How can I contact GSS?

Comment 9 Fabio Massimo Di Nitto 2012-07-01 06:15:35 UTC
GSS is Red Hat Global Support Services. You can find information on how to open a ticket on the www.redhat.com website.