Bug 836429
| Summary: | hosta can never rejoin the cluster because of "reservation conflict" unless hosta is rebooted with reboot -f. | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | davidyangyi <davidyangyi> |
| Component: | cluster | Assignee: | Ryan O'Hara <rohara> |
| Status: | CLOSED NOTABUG | QA Contact: | Cluster QE <mspqa-list> |
| Severity: | high | Priority: | unspecified |
| Version: | 6.2 | Target Milestone: | rc |
| Hardware: | x86_64 | OS: | Linux |
| Doc Type: | Bug Fix | Type: | Bug |
| Last Closed: | 2012-06-30 13:36:38 UTC | CC: | ccaulfie, cluster-maint, davidyangyi, dyasny, fdinitto, lhh, rpeterso, sdake, teigland |
Description (davidyangyi, 2012-06-29 05:21:26 UTC)
cluster.conf:

```xml
<?xml version="1.0"?>
<cluster config_version="41" name="bcec_img">
<clusternodes>
<clusternode name="hosta" nodeid="1">
<fence>
<method name="scsi">
<device name="scsifence"/>
</method>
</fence>
<unfence>
<device action="on" name="scsifence"/>
</unfence>
</clusternode>
<clusternode name="hostb" nodeid="2">
<fence>
<method name="scsi">
<device name="scsifence"/>
</method>
</fence>
<unfence>
<device action="on" name="scsifence"/>
</unfence>
</clusternode>
</clusternodes>
<cman broadcast="yes" expected_votes="1" two_node="1"/>
<rm>
<resources>
<ip address="172.16.200.31/25" monitor_link="on" sleeptime="10"/>
<fs device="/dev/mapper/mpathap1" fsid="7094" mountpoint="/bcec_images" name="bcec_imgages" quick_status="on"/>
</resources>
<failoverdomains>
<failoverdomain name="bcec_img" nofailback="1" ordered="0" restricted="0">
<failoverdomainnode name="hosta" priority="2"/>
<failoverdomainnode name="hostb" priority="1"/>
</failoverdomain>
</failoverdomains>
<service domain="bcec_img" exclusive="1" name="bcec_img" recovery="relocate">
<ip ref="172.16.200.31/25"/>
</service>
</rm>
<fence_daemon post_join_delay="25"/>
<fencedevices>
<fencedevice agent="fence_scsi" name="scsifence"/>
</fencedevices>
<logging debug="on">
<logging_daemon debug="on" name="rgmanager"/>
<logging_daemon debug="on" name="corosync"/>
<logging_daemon debug="on" name="fenced"/>
<logging_daemon debug="on" name="dlm_controld"/>
<logging_daemon debug="on" name="corosync" subsys="CMAN"/>
</logging>
</cluster>
```
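Some background on the "reservation conflict" in the summary: fence_scsi manages SCSI-3 persistent reservations on the shared storage. Fencing removes the victim node's registration key, and the <unfence> stanza above (action="on") re-registers it when cman starts; until that happens, the node's writes to the device fail with "reservation conflict". A minimal sketch for inspecting the reservation state, assuming the sg3_utils package is installed and using the /dev/mapper/mpathap1 path from the <fs> resource above:

```sh
# List the keys currently registered with the shared device; a healthy
# two-node cluster should show one key per registered node.
sg_persist --in --read-keys --device=/dev/mapper/mpathap1

# Show the active reservation. fence_scsi uses a "Write Exclusive,
# registrants only" reservation, so a node whose key has been removed
# gets "reservation conflict" on write I/O until it is unfenced.
sg_persist --in --read-reservation --device=/dev/mapper/mpathap1
```

A fenced node will be missing from the key list until unfencing re-registers it.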
If the problem exists, it is in the base cluster, not corosync. Reassigning to cluster for triage.

Reassigning to the fence_scsi maintainer to confirm, but it sounds like the correct behaviour to me. If anything, cman has to be restarted in order to perform "unfencing": plug the fiber back, restart cman -> unfencing, and the node rejoins the cluster.

Thank you. I meant pulling out the ethernet cable and plugging it back, not the fiber to the array. But I can't restart cman, or hosta will remove hostb from the cluster.

Here is the fenced.log on hostb:

```
Jun 26 22:51:25 fenced daemon node 1 stateful merge
Jun 26 22:51:25 fenced daemon node 1 kill due to stateful merge
Jun 26 22:51:25 fenced telling cman to remove nodeid 1 from cluster
Jun 26 22:51:25 fenced daemon cpg_dispatch error 2
Jun 26 22:51:25 fenced cluster is down, exiting
Jun 26 22:51:25 fenced daemon cpg_dispatch error 2
```

Here is the fenced.log on hosta:

```
Jun 26 22:42:48 fenced daemon node 2 stateful merge
Jun 26 22:42:48 fenced daemon node 2 kill due to stateful merge
Jun 26 22:42:48 fenced telling cman to remove nodeid 2 from cluster
Jun 26 22:42:48 fenced daemon_member 2 zero proto
Jun 26 22:43:04 fenced cluster node 2 removed seq 1224
Jun 26 22:43:04 fenced receive_protocol from 1 max 1.1.1.0 run 1.1.1.1
Jun 26 22:43:04 fenced daemon node 1 max 1.1.1.0 run 1.1.1.1
Jun 26 22:43:04 fenced daemon node 1 join 1340720582 left 0 local quorum 1340720582
Jun 26 22:43:04 fenced fenced:daemon conf 1 0 1 memb 1 join left 2
Jun 26 22:43:04 fenced fenced:daemon ring 1:1224 1 memb 1
```

Both nodes think they are the master, and will remove each other from the cluster. In this situation, I have to reboot hosta so it rejoins the cluster as the slave.

This is absolutely normal behaviour again. You cannot plug the cable back without stopping the cluster on the failed node first. Since you are running RHEL 6.2, I strongly recommend you contact GSS, who will point you towards the correct documentation on how to use and administer a cluster.

Fabio Massimo Di Nitto

Thank you, how can I contact GSS?

GSS is Red Hat Global Support Services. You can find information on how to open a ticket on the www.redhat.com website.
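To make the recovery procedure described above concrete, here is a sketch of the sequence on the failed node (hosta), assuming the stock RHEL 6 init scripts and that rgmanager is running; adjust for whatever services the node actually runs:

```sh
# On hosta: stop the cluster stack BEFORE plugging the cable back,
# so the rejoining node does not trigger the "stateful merge" kill
# seen in the fenced.log excerpts above.
service rgmanager stop
service cman stop

# ...reconnect the ethernet cable...

# Starting cman runs the <unfence> action from cluster.conf
# (fence_scsi action="on"), re-registering hosta's key with the
# shared storage and clearing the "reservation conflict".
service cman start
service rgmanager start

# Unfencing can also be run by hand for the local node:
fence_node -U
```

If cman cannot be stopped cleanly (for example because services block on the fenced storage), that may be why the reporter had to fall back to reboot -f.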