Bug 918502
Summary: | Node fails to rejoin the cluster after restart | |
---|---|---|---
Product: | Red Hat Enterprise Linux 6 | Reporter: | Leon Fauster <leonfauster>
Component: | corosync | Assignee: | Jan Friesse <jfriesse>
Status: | CLOSED DUPLICATE | QA Contact: | Cluster QE <mspqa-list>
Severity: | unspecified | Docs Contact: |
Priority: | unspecified | |
Version: | 6.4 | CC: | ccaulfie, cluster-maint, dvossel, fdinitto, rpeterso, sdake, teigland
Target Milestone: | rc | |
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2013-06-25 11:29:31 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |
Created attachment 705896 [details]
crm_report: after node1 has joined the cluster / attribute standby still exists (standby_attr_after_join.tar.bz2)

I see several reboots of both nodes in these logs. Which reboot corresponded to the problem occurring?

After burning about an hour digging through the logs in attachment #705895 [details], I discovered that "node 1" in the description is apparently not cn1.localdomain but cn2.localdomain.
It looks like CMAN is taking more than 4 minutes to form a combined cluster (11:56:28 -> 12:00:40). During this time, cn1 is quiet but cn2 is logging the following in a tight loop:
Mar 06 11:57:27 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Mar 06 11:57:27 corosync [CPG ] chosen downlist: sender r(0) ip(192.168.201.21) ; members(old:1 left:0)
Mar 06 11:57:27 corosync [MAIN ] Completed service synchronization, ready to provide service.
Handing off to the corosync/cman team.
Corosync logs from cn1:
Mar 06 11:55:20 corosync [CMAN ] quorum lost, blocking activity
Mar 06 11:55:20 corosync [QUORUM] This node is within the non-primary component and will NOT provide any services.
Mar 06 11:55:20 corosync [QUORUM] Members[1]: 1
Mar 06 11:55:20 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Mar 06 11:55:20 corosync [CPG ] chosen downlist: sender r(0) ip(192.168.201.20) ; members(old:2 left:1)
Mar 06 11:55:20 corosync [MAIN ] Completed service synchronization, ready to provide service.
Mar 06 11:57:55 corosync [CKPT ] ========== Checkpoint Information ===========
Mar 06 11:57:55 corosync [CKPT ] global_ckpt_id: 3
Mar 06 11:58:45 corosync [CKPT ] ========== Checkpoint Information ===========
Mar 06 11:58:45 corosync [CKPT ] global_ckpt_id: 3
Mar 06 12:00:41 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Mar 06 12:00:41 corosync [CMAN ] quorum regained, resuming activity
Mar 06 12:00:41 corosync [QUORUM] This node is within the primary component and will provide service.
Mar 06 12:00:41 corosync [QUORUM] Members[2]: 1 2
Mar 06 12:00:41 corosync [QUORUM] Members[2]: 1 2
Mar 06 12:00:41 corosync [CPG ] chosen downlist: sender r(0) ip(192.168.201.20) ; members(old:1 left:0)
Mar 06 12:00:41 corosync [MAIN ] Completed service synchronization, ready to provide service.
(In reply to comment #2)
> I see several reboots of both nodes in these logs.
> Which reboot corresponded to the problem occurring?

The last one.

<cman two_node="1" expected_votes="1"> should probably be set for this cluster.

Please attach your cluster.conf.

(In reply to comment #3)
> After burning about an hour digging through the logs in attachment #705895 [details],
> I discovered that "node 1" in the description is apparently not
> cn1.localdomain but cn2.localdomain.

Hello Andrew, sorry that it was not clearly described, but it should be cn1.localdomain, as shown in crm_mon.txt in attachment #705896 [details]: "Node cn1.localdomain: standby". The rebooted node gets the standby attribute.

> It looks like CMAN is taking more than 4 minutes to form a combined cluster
> (11:56:28 -> 12:00:40).

I can confirm this - some tests here with the following monitor

$ count=1; while [ 1 = 1 ]; do count=$(($count + 1)); corosync-objctl | grep members ; echo ; echo $count; sleep 1 ; done

show that a rebooted node needs ~216 seconds to join the cluster.

> During this time, cn1 is quiet but cn2 is logging the following in a tight loop:
>
> Mar 06 11:57:27 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Mar 06 11:57:27 corosync [CPG ] chosen downlist: sender r(0) ip(192.168.201.21) ; members(old:1 left:0)
> Mar 06 11:57:27 corosync [MAIN ] Completed service synchronization, ready to provide service.
>
> Handing off to the corosync/cman team.

I expect that the standby attribute will be "deleted" once the node initially joins the cluster after the reboot?

(In reply to comment #6)
> Please attach your cluster.conf

<?xml version="1.0"?>
<cluster name="sheeHA" config_version="8">
  <totem join="1000"/>
  <logging debug="off"/>
  <clusternodes>
    <clusternode name="cn1.localdomain" votes="1" nodeid="1">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="cn1.localdomain"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="cn2.localdomain" votes="1" nodeid="2">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="cn2.localdomain"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="pcmk" agent="fence_pcmk"/>
  </fencedevices>
  <rm>
    <failoverdomains/>
    <resources/>
  </rm>
</cluster>

(In reply to comment #5)
> <cman two_node="1" expected_votes="1"> should probably be set for this
> cluster

Also for my configuration (#8) (cman/corosync + pacemaker as crm)?

(In reply to comment #9)
> (In reply to comment #5)
> > <cman two_node="1" expected_votes="1"> should probably be set for this
> > cluster
>
> also for my configuration (#8) (cman/corosync + pacemaker as crm)?

Yes, 2 node cluster needs some special handling, both for quorum and fencing.

Are those nodes virtual machines?

By the look of the 4 minutes delay and the logs from the other node, it could be a network / firewall problem.

Reassigning to corosync. cman doesn't handle the membership at that level.
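Since corosync membership in this setup relies on multicast, one direct way to check whether multicast traffic actually passes between the nodes is the omping utility; a minimal sketch, assuming the omping package is installed and that 192.168.201.20 and 192.168.201.21 are the cluster interfaces:

# run the same command on both nodes at roughly the same time; it reports
# unicast and multicast round trips between the listed addresses
$ omping -c 20 192.168.201.20 192.168.201.21

Missing or lost multicast responses here would point at the bridge or switch rather than at corosync itself.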
(In reply to comment #10)
> Yes, 2 node cluster needs some special handling, both for quorum and fencing.

Sure, but I was expecting that this is passed through to pacemaker. My cib has

<crm_config>
  <cluster_property_set id="cib-bootstrap-options">
    <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.8-7.el6-394e906"/>
    <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="cman"/>
    <nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2"/>
    <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
    <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/>
  </cluster_property_set>
</crm_config>

In the initial setup I had that two_node config included; after some problems I removed it. I will try it again ...

> Are those nodes virtual machines?

Yes (xen) - to get a feeling about HA I started to deploy an HA environment in a test lab.

> By the look of the 4 minutes delay and the logs from the other node, it could be
> a network / firewall problem.

Well - iptables is disabled and the nodes are directly attached to a /24 network.

(In reply to comment #11)
> (In reply to comment #10)
> > Yes, 2 node cluster needs some special handling, both for quorum and fencing.
>
> sure but i was expecting that this is passed through to pacemaker
>
> my cib has
>
> <crm_config>
>   <cluster_property_set id="cib-bootstrap-options">
>     <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.8-7.el6-394e906"/>
>     <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="cman"/>
>     <nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2"/>
>     <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
>     <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/>
>   </cluster_property_set>
> </crm_config>
>
> in the initial setup i had that two_node config included after some problems
> i remove it. I will try it again ...

AFAIK pacemaker does not configure cman at all. Core parameters need to be configured in cluster.conf.

> > Are those nodes virtual machines?
>
> yes (xen) - to get a feeling about ha i started to deploy a ha environment
> in a test lab.

Be aware of the load on the host/hosts. Recently we have experienced "pauses" due to odd scheduling.

> > By the look of the 4 minutes delay and logs from the other node it could be
> > a network / firewall problem.
>
> well - iptables is disabled and the nodes are direct attach to a /24 network.

Are you talking about the hosts or the guests? Check the iptables on both. Some virt implementations automatically add NAT rules that are undesired.
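For reference, the two-node handling lives in cluster.conf as a single element; a sketch of where it sits, using the cluster name and surrounding elements from the cluster.conf in comment #8 (the bumped config_version value here is only a placeholder, but the version normally has to be incremented whenever the file changes):

<!-- sketch only: remaining sections (clusternodes, fencedevices, rm) unchanged -->
<cluster name="sheeHA" config_version="9">
  <cman two_node="1" expected_votes="1"/>
  <totem join="1000"/>
  <logging debug="off"/>
</cluster>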
I added <cman two_node="1" expected_votes="1"> and after rebooting node2 I started cman manually and waited until node2 had joined the cluster (node1). This happened after ~170 seconds, and immediately node1 began to fence node2.

Mar 7 17:59:48 cn1 corosync[3582]: [CPG ] chosen downlist: sender r(0) ip(192.168.201.20) ; members(old:1 left:0)
Mar 7 17:59:48 cn1 corosync[3582]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 7 17:59:59 cn1 fenced[3625]: telling cman to remove nodeid 2 from cluster
Mar 7 18:00:09 cn1 corosync[3582]: [TOTEM ] A processor failed, forming new configuration.
Mar 7 18:00:11 cn1 kernel: dlm: closing connection to node 2
Mar 7 18:00:11 cn1 corosync[3582]: [QUORUM] Members[1]: 1
Mar 7 18:00:11 cn1 corosync[3582]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Mar 7 18:00:11 cn1 corosync[3582]: [CPG ] chosen downlist: sender r(0) ip(192.168.201.20) ; members(old:2 left:1)
Mar 7 18:00:11 cn1 corosync[3582]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 7 18:00:11 cn1 crmd[4158]: notice: crm_update_peer_state: cman_event_callback: Node cn2.localdomain[2] - state is now lost
Mar 7 18:00:17 cn1 fenced[3625]: fencing node cn2.localdomain
Mar 7 18:00:17 cn1 root: fence_pcmk[5064]: Requesting Pacemaker fence cn2.localdomain (reset)
Mar 7 18:00:17 cn1 stonith_admin[5065]: notice: crm_log_args: Invoked: stonith_admin --reboot cn2.localdomain --tolerance 5s
Mar 7 18:00:17 cn1 stonith-ng[4154]: notice: handle_request: Client stonith_admin.5065.dc69bbd3 wants to fence (reboot) 'cn2.localdomain' with device '(any)'
Mar 7 18:00:17 cn1 stonith-ng[4154]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for cn2.localdomain: ca2c181b-0aa1-430c-807e-17bc2a165169 (0)

and corosync was killed on node2:

$ corosync-objctl | grep members
Could not initialize objdb library. Error 6
$ ps aux | grep coro
root 2552 3.0 0.0 105300 896 pts/1 S+ 18:01 0:00 grep coro

> > > Are those nodes virtual machines?
> >
> > yes (xen) - to get a feeling about ha i started to deploy a ha environment in a test lab.
>
> Be aware of the load on the host/hosts. Recently we have experienced "pauses" due to odd scheduling.

It's a test workstation -> without load. Deployment will happen on real hw.

> > > By the look of the 4 minutes delay and logs from the other node it could be
> > > a network / firewall problem.
> >
> > well - iptables is disabled and the nodes are direct attach to a /24 network.
>
> Are you talking about the hosts or the guests? Check the iptables on both.
> Some virt implementations automatically add NAT rules that are undesired.

It's bridged - without filter rules.

Since I added <cman two_node="1" expected_votes="1">, I do not get the cluster running. Every node is shooting the other node in .... I appreciate any comments.

(In reply to comment #13)
> i added <cman two_node="1" expected_votes="1"> and after rebooting node2 i
> started cman manually and waited until node2 had joined the cluster (node1)
> This happens after ~170 seconds and immediately node1 began to fence node2.

This is a clear sign that something is wrong with the network communication between the nodes.

Also you need to fix fencing and add a delay to one of the nodes to avoid race conditions. There are documents in the RH kbase for it, or contact our customer support.
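A minimal sketch of one way to add such a delay on RHEL 6, assuming the cman init script picks up FENCED_OPTS from /etc/sysconfig/cman (the values match what the reporter ends up using further down in this bug; they are examples, not a recommendation):

# /etc/sysconfig/cman on node 1; node 2 gets a smaller -f value so the two
# nodes do not try to fence each other at the same moment after a split
# -f = post_fail_delay (seconds), -j = post_join_delay (seconds)
FENCED_OPTS="-f 9 -j 30"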
(In reply to comment #14)
> This is a clear sign that something is wrong with the network communication between the nodes.

Mmh - I am sure that this is not the case. Rebooting _both_ nodes at the same time (xen starts the vms sequentially), the cluster is formed! While the last node is coming up:

$ ping cn2
PING cn2.localdomain (192.168.201.21) 56(84) bytes of data.
64 bytes from cn2.localdomain (192.168.201.21): icmp_seq=1 ttl=64 time=0.140 ms
64 bytes from cn2.localdomain (192.168.201.21): icmp_seq=2 ttl=64 time=0.150 ms
64 bytes from cn2.localdomain (192.168.201.21): icmp_seq=3 ttl=64 time=0.846 ms
64 bytes from cn2.localdomain (192.168.201.21): icmp_seq=4 ttl=64 time=0.198 ms
64 bytes from cn2.localdomain (192.168.201.21): icmp_seq=5 ttl=64 time=0.182 ms

[root@cn2 ~]# corosync-objctl | grep members
runtime.totem.pg.mrp.srp.members.1.ip=r(0) ip(192.168.201.20)
runtime.totem.pg.mrp.srp.members.1.join_count=1
runtime.totem.pg.mrp.srp.members.1.status=joined
runtime.totem.pg.mrp.srp.members.2.ip=r(0) ip(192.168.201.21)
runtime.totem.pg.mrp.srp.members.2.join_count=1
runtime.totem.pg.mrp.srp.members.2.status=joined

[root@cn2 ~]# uptime
00:09:17 up 1 min, 1 user, load average: 0.44, 0.16, 0.05

[root@cn2 ~]# cman_tool nodes
Node  Sts   Inc   Joined               Name
   1   M    600   2013-03-08 00:08:30  cn1.localdomain
   2   M    600   2013-03-08 00:08:30  cn2.localdomain

> Also you need to fix fencing and add a delay to one of the nodes to avoid race
> conditions. There are documents in the RH kbase for it, or contact our customer support.

I will look at this. Thanks.

Have you been able to verify your environment? Are you still experiencing the issue?

Hi Fabio, sorry for the delay. I switched over to native hardware and did a fresh setup of a two-node cluster (freshly updated EL6). So far the nodes reconnect to the running cluster after rebooting them. I have tested different reboot scenarios. It looks good so far.

Just for the record:

N1: FENCED_OPTS="-f 9 -j 30"
N2: FENCED_OPTS="-f 3 -j 30"

<?xml version="1.0"?>
<cluster name="foo" config_version="4">
  <cman expected_votes="1" two_node="1"/>
  <logging debug="off"/>
  <totem consensus="6000" secauth="0"/>
  <clusternodes>
    <clusternode name="bar1.foo" votes="1" nodeid="1">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="bar1.foo"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="bar2.foo" votes="1" nodeid="2">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="bar2.foo"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="pcmk" agent="fence_pcmk"/>
  </fencedevices>
  <rm>
    <failoverdomains/>
    <resources/>
  </rm>
</cluster>

The only thing that confuses me: crm_mon shows unknown expected votes.

$ crm_mon -1 | head -6
Last updated: Sat Jun 15 12:44:21 2013
Last change: Sat Jun 15 12:35:49 2013 via crm_resource on bar1.foo
Stack: cman
Current DC: bar1.foo - partition with quorum
Version: 1.1.8-7.el6-394e906
2 Nodes configured, unknown expected votes
                    ^^^^^^^^^^^^^^^^^^^^^^

Anyway - thank you for your time!

After reading [1] I took a look at my test cluster (virtual nodes) again and reconfigured the xen host (dom0):

echo 1 > /sys/class/net/xenbr0/bridge/multicast_querier

This helped to avoid the "Node fails to rejoin the cluster after restart" problem. I couldn't reproduce the problem anymore.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=880035#c9

PS: Xen host (dom0) runs EL5 - the nodes EL6
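Note that multicast_querier is a runtime sysfs setting, so it does not survive a reboot of the dom0 or a re-creation of the bridge; a minimal sketch of making it persistent, assuming an /etc/rc.local based approach on this particular EL5 dom0 and the xenbr0 bridge name from the comment above:

# /etc/rc.local on the Xen dom0: re-enable the IGMP querier on the cluster
# bridge at every boot so the guests' multicast group memberships keep being refreshed
echo 1 > /sys/class/net/xenbr0/bridge/multicast_querier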
(In reply to Leon Fauster from comment #20)
> after reading [1] i took a look at my
> test cluster (virtual nodes) again and
> reconfigured the xen host (dom0)
>
> echo 1 > /sys/class/net/xenbr0/bridge/multicast_querier
>
> this helped to avoid the "Node fails to rejoin the
> cluster after restart"-problem. I could't reproduce
> the problem anymore.
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=880035#c9
>
> PS: Xen host (dom0) runs EL5 - the nodes EL6

Perfect. Marking as clone of bug 902454.

*** This bug has been marked as a duplicate of bug 902454 ***
Created attachment 705895 [details]
rebooted node1 / not joined / resources started while also running on node2

----------------------------------------------------------------------------
Description of problem:
----------------------------------------------------------------------------
The cman_pre_stop function of pacemaker's init script sets the node to standby while rebooting. This should only persist until the node comes back. That is not the case in some (unknown) circumstances.

cman_pre_stop()
{
    cname=`crm_node --name`
    crm_attribute -N $cname -n standby -v true -l reboot
    echo -n "Waiting for shutdown of managed resources"
    ...

----------------------------------------------------------------------------
Version-Release number of selected component (if applicable):
----------------------------------------------------------------------------
pacemaker-1.1.8-7.el6.x86_64
cman-3.0.12.1-49.el6.x86_64
corosync-1.4.1-15.el6.x86_64

----------------------------------------------------------------------------
How reproducible:
----------------------------------------------------------------------------
Steps to Reproduce:
1. Boot the two configured nodes.
2. DC and resources are on node 1.
3. Reboot node 1.
4. Resources will migrate to node 2.
5. The DC role will be taken over by node 2.
6. Node 1 will be marked as standby.

----------------------------------------------------------------------------
Actual results:
----------------------------------------------------------------------------
After the reboot of node 1, that node does not join the cluster immediately and begins to start resources. (At this stage the attached report "standby_attr" was generated.)

After some minutes the cluster detects this unwanted situation (resources running on both nodes) and renegotiates the current state. After recovering, node 1 is still listed as standby. (At this stage the attached report "standby_attr_after_join" was generated.)

----------------------------------------------------------------------------
Expected results:
----------------------------------------------------------------------------
After the reboot the node should join the cluster and the standby attribute should be deleted, because its lifetime is "reboot".

----------------------------------------------------------------------------
Additional info:
----------------------------------------------------------------------------
The process to reproduce this situation is not clear. In some cases the rebooted node behaves as expected (joining the cluster and deleting the standby attribute).
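To see whether the transient standby attribute is still around after the node has rejoined, and to clear it by hand, crm_attribute can be pointed at the same attribute the init script sets; a sketch, assuming cn1.localdomain is the affected node:

# query the standby attribute with lifetime "reboot" (the one cman_pre_stop sets)
$ crm_attribute -N cn1.localdomain -n standby -l reboot --query

# remove it manually if it survived the rejoin
$ crm_attribute -N cn1.localdomain -n standby -l reboot --delete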