Bug 836429 - hosta can never rejoin the cluster because of "reservation conflict" unless reboot -f hosta.
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: cluster
Version: 6.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Ryan O'Hara
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-06-29 05:21 UTC by davidyangyi
Modified: 2012-07-01 06:15 UTC
CC: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-06-30 13:36:38 UTC
Target Upstream Version:
Embargoed:


Attachments: none

Description davidyangyi 2012-06-29 05:21:26 UTC
Description of problem:
I have a RHEL 6.2 RHCS cluster that uses an iSCSI LUN as the fence device (SCSI reservations). When I unplug the network cable of the master (hosta), the slave (hostb) fences it successfully and becomes the master, while hosta still believes it is the master.
When I plug the cable back into hosta, hosta can never rejoin the cluster because of "reservation conflict" unless I run reboot -f on hosta.
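
For reference, the reservation state behind the conflict can be inspected with sg_persist from sg3_utils (a sketch, assuming the shared iSCSI LUN is /dev/sdb; the device name is illustrative):

# List the registration keys currently held on the LUN; fence_scsi
# registers one key per cluster node and removes a node's key to fence it
sg_persist --no-inquiry --in --read-keys --device=/dev/sdb

# Show the active reservation; a node whose key was removed gets
# "reservation conflict" on I/O until it re-registers (unfencing)
sg_persist --no-inquiry --in --read-reservation --device=/dev/sdb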

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. hosta is the master, hostb is the slave
2. Use an iSCSI LUN as the fence device (fence_scsi)
3. Unplug hosta's network cable and wait for hostb to become the master
4. Plug hosta's cable back in (the cable pull can also be simulated, as sketched below)
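
The cable pull in steps 3-4 can also be simulated from hosta's console; a sketch, assuming eth0 is the cluster interconnect interface:

# Drop the interconnect so hostb fences hosta and takes over as master
ip link set eth0 down

# Restore it later; hosta then hits the "reservation conflict"
ip link set eth0 up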
  
Actual results:


Expected results:


Additional info:

Comment 1 davidyangyi 2012-06-29 05:23:24 UTC
cluster.conf

<?xml version="1.0"?>
<cluster config_version="41" name="bcec_img">
        <clusternodes>
                <clusternode name="hosta" nodeid="1">
                        <fence>
                                <method name="scsi">
                                        <device name="scsifence"/>
                                </method>
                        </fence>
                        <unfence>
                                <device action="on" name="scsifence"/>
                        </unfence>
                </clusternode>
                <clusternode name="hostb" nodeid="2">
                        <fence>
                                <method name="scsi">
                                        <device name="scsifence"/>
                                </method>
                        </fence>
                        <unfence>
                                <device action="on" name="scsifence"/>
                        </unfence>
                </clusternode>
        </clusternodes>
        <cman broadcast="yes" expected_votes="1" two_node="1"/>
        <rm>
                <resources>
                        <ip address="172.16.200.31/25" monitor_link="on" sleeptime="10"/>
                        <fs device="/dev/mapper/mpathap1" fsid="7094" mountpoint="/bcec_images" name="bcec_imgages" quick_status="on"/>
                </resources>
                <failoverdomains>
                        <failoverdomain name="bcec_img" nofailback="1" ordered="0" restricted="0">
                                <failoverdomainnode name="hosta" priority="2"/>
                                <failoverdomainnode name="hostb" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <service domain="bcec_img" exclusive="1" name="bcec_img" recovery="relocate">
                        <ip ref="172.16.200.31/25"/>
                </service>
        </rm>
        <fence_daemon post_join_delay="25"/>
        <fencedevices>
                <fencedevice agent="fence_scsi" name="scsifence"/>
        </fencedevices>
        <logging debug="on">
                <logging_daemon debug="on" name="rgmanager"/>
                <logging_daemon debug="on" name="corosync"/>
                <logging_daemon debug="on" name="fenced"/>
                <logging_daemon debug="on" name="dlm_controld"/>
                <logging_daemon debug="on" name="corosync" subsys="CMAN"/>
        </logging>
</cluster>
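
As an aside, a cluster.conf like the one above can be sanity-checked before it is distributed; a sketch using the stock RHEL 6 cluster tools:

# Validate /etc/cluster/cluster.conf against the cluster schema
ccs_config_validate

# Push a bumped config_version out to the running cluster
cman_tool version -r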

Comment 3 Steven Dake 2012-06-29 18:54:56 UTC
If the problem exists, it is in the base cluster, not in corosync. Reassigning to cluster for triage.

Comment 4 Fabio Massimo Di Nitto 2012-06-30 04:01:46 UTC
Reassigning to the fence_scsi maintainer to confirm, but it sounds like the correct behaviour to me.

Comment 5 Fabio Massimo Di Nitto 2012-06-30 04:03:13 UTC
If anything, cman has to be restarted in order to perform "unfencing":

plug the fiber back in
restart cman -> unfencing
the node rejoins the cluster
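
On RHEL 6 that sequence would look roughly like the following on the failed node (a sketch; the init scripts ship with the cluster packages):

# Stop the cluster stack on the failed node before reconnecting it
service rgmanager stop
service cman stop

# ...reconnect the cable here...

# Starting cman performs the configured <unfence> action, re-registering
# the node's key with the SCSI fence device before the node rejoins
service cman start
service rgmanager start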

Comment 6 davidyangyi 2012-06-30 13:15:40 UTC
Thank you.
I meant pulling out the Ethernet cable and plugging it back in, not the fiber to the array.

But I can't restart cman, or hosta will remove hostb from the cluster.

Here is the fenced.log on hostb
Jun 26 22:51:25 fenced daemon node 1 stateful merge
Jun 26 22:51:25 fenced daemon node 1 kill due to stateful merge
Jun 26 22:51:25 fenced telling cman to remove nodeid 1 from cluster
Jun 26 22:51:25 fenced daemon cpg_dispatch error 2
Jun 26 22:51:25 fenced cluster is down, exiting
Jun 26 22:51:25 fenced daemon cpg_dispatch error 2


Here is the fenced.log on hosta
Jun 26 22:42:48 fenced daemon node 2 stateful merge
Jun 26 22:42:48 fenced daemon node 2 kill due to stateful merge
Jun 26 22:42:48 fenced telling cman to remove nodeid 2 from cluster
Jun 26 22:42:48 fenced daemon_member 2 zero proto
Jun 26 22:43:04 fenced cluster node 2 removed seq 1224
Jun 26 22:43:04 fenced receive_protocol from 1 max 1.1.1.0 run 1.1.1.1
Jun 26 22:43:04 fenced daemon node 1 max 1.1.1.0 run 1.1.1.1
Jun 26 22:43:04 fenced daemon node 1 join 1340720582 left 0 local quorum 1340720582
Jun 26 22:43:04 fenced fenced:daemon conf 1 0 1 memb 1 join left 2
Jun 26 22:43:04 fenced fenced:daemon ring 1:1224 1 memb 1

Both nodes think they are the master and will remove each other from the cluster.

In this situation, I have to reboot hosta for it to rejoin the cluster as the slave.
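
For what it's worth, the split is visible from either node with the stock tools; a sketch:

# Each side still reports itself as a cluster member and the peer as gone
cman_tool nodes

# rgmanager's view of which node owns the service
clustat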

Comment 7 Fabio Massimo Di Nitto 2012-06-30 13:36:38 UTC
This is absolutely normal behaviour again.

You cannot plug the cable back in without stopping the cluster on the failed node first.

Since you are running RHEL 6.2, I strongly recommend you contact GSS, which will point you towards the correct documentation on how to use and administer a cluster.

Comment 8 davidyangyi 2012-07-01 02:23:05 UTC
Fabio Massimo Di Nitto

Thank you. How can I contact GSS?

Comment 9 Fabio Massimo Di Nitto 2012-07-01 06:15:35 UTC
GSS is Red Hat Global Support Services. You can find information on how to open a ticket on the www.redhat.com website.

