Bug 918502
Summary: | Node fails to rejoin the cluster after restart | |
---|---|---|---
Product: | Red Hat Enterprise Linux 6 | Reporter: | Leon Fauster <leonfauster>
Component: | corosync | Assignee: | Jan Friesse <jfriesse>
Status: | CLOSED DUPLICATE | QA Contact: | Cluster QE <mspqa-list>
Severity: | unspecified | Docs Contact: |
Priority: | unspecified | |
Version: | 6.4 | CC: | ccaulfie, cluster-maint, dvossel, fdinitto, rpeterso, sdake, teigland
Target Milestone: | rc | |
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2013-06-25 11:29:31 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |
Created attachment 705896 [details]
crm_report: after node1 has joined the cluster / attribute standby still exists (standby_attr_after_join.tar.bz2)

I see several reboots of both nodes in these logs. Which reboot corresponded to the problem occurring?

After burning about an hour digging through the logs in attachment #705895 [details], I discovered that "node 1" in the description is apparently not cn1.localdomain but cn2.localdomain.
It looks like CMAN is taking more than 4 minutes to form a combined cluster (11:56:28 -> 12:00:40). During this time, cn1 is quiet but cn2 is logging the following in a tight loop:
Mar 06 11:57:27 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Mar 06 11:57:27 corosync [CPG ] chosen downlist: sender r(0) ip(192.168.201.21) ; members(old:1 left:0)
Mar 06 11:57:27 corosync [MAIN ] Completed service synchronization, ready to provide service.
Handing off to the corosync/cman team.
Corosync logs from cn1:
Mar 06 11:55:20 corosync [CMAN ] quorum lost, blocking activity
Mar 06 11:55:20 corosync [QUORUM] This node is within the non-primary component and will NOT provide any services.
Mar 06 11:55:20 corosync [QUORUM] Members[1]: 1
Mar 06 11:55:20 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Mar 06 11:55:20 corosync [CPG ] chosen downlist: sender r(0) ip(192.168.201.20) ; members(old:2 left:1)
Mar 06 11:55:20 corosync [MAIN ] Completed service synchronization, ready to provide service.
Mar 06 11:57:55 corosync [CKPT ] ========== Checkpoint Information ===========
Mar 06 11:57:55 corosync [CKPT ] global_ckpt_id: 3
Mar 06 11:58:45 corosync [CKPT ] ========== Checkpoint Information ===========
Mar 06 11:58:45 corosync [CKPT ] global_ckpt_id: 3
Mar 06 12:00:41 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Mar 06 12:00:41 corosync [CMAN ] quorum regained, resuming activity
Mar 06 12:00:41 corosync [QUORUM] This node is within the primary component and will provide service.
Mar 06 12:00:41 corosync [QUORUM] Members[2]: 1 2
Mar 06 12:00:41 corosync [QUORUM] Members[2]: 1 2
Mar 06 12:00:41 corosync [CPG ] chosen downlist: sender r(0) ip(192.168.201.20) ; members(old:1 left:0)
Mar 06 12:00:41 corosync [MAIN ] Completed service synchronization, ready to provide service.
(In reply to comment #2)
> I see several reboots of both nodes in these logs.
> Which reboot corresponded to the problem occurring?

The last one.

<cman two_node="1" expected_votes="1"> should probably be set for this cluster.

Please attach your cluster.conf.

(In reply to comment #3)
> After burning about an hour digging through the logs in attachment #705895 [details],
> I discovered that "node 1" in the description is apparently not
> cn1.localdomain but cn2.localdomain.

Hello Andrew, sorry that it was not clearly described, but it should be cn1.localdomain, as shown in crm_mon.txt in attachment #705896 [details]: "Node cn1.localdomain: standby". The rebooted node gets the standby attribute.

> It looks like CMAN is taking more than 4 minutes to form a combined cluster
> (11:56:28 -> 12:00:40).

I can confirm this - some tests here with the following monitor

$ count=1; while [ 1 = 1 ]; do count=$(($count + 1)); corosync-objctl | grep members ; echo ; echo $count; sleep 1 ; done

show that a rebooted node needs ~216 seconds to join the cluster.

> During this time, cn1 is quiet but cn2 is logging the following in a tight loop:
>
> Mar 06 11:57:27 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Mar 06 11:57:27 corosync [CPG ] chosen downlist: sender r(0) ip(192.168.201.21) ; members(old:1 left:0)
> Mar 06 11:57:27 corosync [MAIN ] Completed service synchronization, ready to provide service.
>
> Handing off to the corosync/cman team.

I expect that the standby attribute will be "deleted" once the node initially joins the cluster after the reboot?

(In reply to comment #6)
> Please attach your cluster.conf

<?xml version="1.0"?>
<cluster name="sheeHA" config_version="8">
  <totem join="1000"/>
  <logging debug="off"/>
  <clusternodes>
    <clusternode name="cn1.localdomain" votes="1" nodeid="1">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="cn1.localdomain"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="cn2.localdomain" votes="1" nodeid="2">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="cn2.localdomain"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="pcmk" agent="fence_pcmk"/>
  </fencedevices>
  <rm>
    <failoverdomains/>
    <resources/>
  </rm>
</cluster>

(In reply to comment #5)
> <cman two_node="1" expected_votes="1"> should probably be set for this
> cluster

Also for my configuration (#8) (cman/corosync + pacemaker as crm)?

(In reply to comment #9)
> (In reply to comment #5)
> > <cman two_node="1" expected_votes="1"> should probably be set for this
> > cluster
>
> also for my configuration (#8) (cman/corosync + pacemaker as crm)?

Yes, 2 node cluster needs some special handling, both for quorum and fencing.

Are those nodes virtual machines?

By the look of the 4 minutes delay and the logs from the other node, it could be a network / firewall problem.

Reassigning to corosync. cman doesn't handle the membership at that level.
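Since corosync membership in this setup relies on multicast, one direct way to check whether multicast traffic actually passes between the nodes is the omping utility; a minimal sketch, assuming the omping package is installed and that 192.168.201.20 and 192.168.201.21 are the cluster interfaces:

# run the same command on both nodes at roughly the same time; it reports
# unicast and multicast round trips between the listed addresses
$ omping -c 20 192.168.201.20 192.168.201.21

Missing or lost multicast responses here would point at the bridge or switch rather than at corosync itself.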
(In reply to comment #10)
> Yes, 2 node cluster needs some special handling, both for quorum and fencing.

Sure, but I was expecting that this is passed through to pacemaker. My cib has

<crm_config>
  <cluster_property_set id="cib-bootstrap-options">
    <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.8-7.el6-394e906"/>
    <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="cman"/>
    <nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2"/>
    <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
    <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/>
  </cluster_property_set>
</crm_config>

In the initial setup I had that two_node config included; after some problems I removed it. I will try it again ...

> Are those nodes virtual machines?

Yes (xen) - to get a feeling about HA I started to deploy an HA environment in a test lab.

> By the look of the 4 minutes delay and the logs from the other node, it could be
> a network / firewall problem.

Well - iptables is disabled and the nodes are directly attached to a /24 network.

(In reply to comment #11)
> (In reply to comment #10)
> > Yes, 2 node cluster needs some special handling, both for quorum and fencing.
>
> sure but i was expecting that this is passed through to pacemaker
>
> my cib has
>
> <crm_config>
>   <cluster_property_set id="cib-bootstrap-options">
>     <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.8-7.el6-394e906"/>
>     <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="cman"/>
>     <nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2"/>
>     <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
>     <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/>
>   </cluster_property_set>
> </crm_config>
>
> in the initial setup i had that two_node config included after some problems
> i remove it. I will try it again ...

AFAIK pacemaker does not configure cman at all. Core parameters need to be configured in cluster.conf.

> > Are those nodes virtual machines?
>
> yes (xen) - to get a feeling about ha i started to deploy a ha environment
> in a test lab.

Be aware of the load on the host/hosts. Recently we have experienced "pauses" due to odd scheduling.

> > By the look of the 4 minutes delay and logs from the other node it could be
> > a network / firewall problem.
>
> well - iptables is disabled and the nodes are direct attach to a /24 network.

Are you talking about the hosts or the guests? Check the iptables on both. Some virt implementations automatically add NAT rules that are undesired.
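For reference, the two-node handling lives in cluster.conf as a single element; a sketch of where it sits, using the cluster name and surrounding elements from the cluster.conf in comment #8 (the bumped config_version value here is only a placeholder, but the version normally has to be incremented whenever the file changes):

<!-- sketch only: remaining sections (clusternodes, fencedevices, rm) unchanged -->
<cluster name="sheeHA" config_version="9">
  <cman two_node="1" expected_votes="1"/>
  <totem join="1000"/>
  <logging debug="off"/>
</cluster>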
I added <cman two_node="1" expected_votes="1"> and after rebooting node2 I started cman manually and waited until node2 had joined the cluster (node1). This happened after ~170 seconds, and immediately node1 began to fence node2.

Mar 7 17:59:48 cn1 corosync[3582]: [CPG ] chosen downlist: sender r(0) ip(192.168.201.20) ; members(old:1 left:0)
Mar 7 17:59:48 cn1 corosync[3582]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 7 17:59:59 cn1 fenced[3625]: telling cman to remove nodeid 2 from cluster
Mar 7 18:00:09 cn1 corosync[3582]: [TOTEM ] A processor failed, forming new configuration.
Mar 7 18:00:11 cn1 kernel: dlm: closing connection to node 2
Mar 7 18:00:11 cn1 corosync[3582]: [QUORUM] Members[1]: 1
Mar 7 18:00:11 cn1 corosync[3582]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Mar 7 18:00:11 cn1 corosync[3582]: [CPG ] chosen downlist: sender r(0) ip(192.168.201.20) ; members(old:2 left:1)
Mar 7 18:00:11 cn1 corosync[3582]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 7 18:00:11 cn1 crmd[4158]: notice: crm_update_peer_state: cman_event_callback: Node cn2.localdomain[2] - state is now lost
Mar 7 18:00:17 cn1 fenced[3625]: fencing node cn2.localdomain
Mar 7 18:00:17 cn1 root: fence_pcmk[5064]: Requesting Pacemaker fence cn2.localdomain (reset)
Mar 7 18:00:17 cn1 stonith_admin[5065]: notice: crm_log_args: Invoked: stonith_admin --reboot cn2.localdomain --tolerance 5s
Mar 7 18:00:17 cn1 stonith-ng[4154]: notice: handle_request: Client stonith_admin.5065.dc69bbd3 wants to fence (reboot) 'cn2.localdomain' with device '(any)'
Mar 7 18:00:17 cn1 stonith-ng[4154]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for cn2.localdomain: ca2c181b-0aa1-430c-807e-17bc2a165169 (0)

and corosync was killed on node2:

$ corosync-objctl | grep members
Could not initialize objdb library. Error 6
$ ps aux | grep coro
root 2552 3.0 0.0 105300 896 pts/1 S+ 18:01 0:00 grep coro

> > > Are those nodes virtual machines?
> >
> > yes (xen) - to get a feeling about ha i started to deploy a ha environment in a test lab.
>
> Be aware of the load on the host/hosts. Recently we have experienced "pauses" due to odd scheduling.

It's a test workstation -> without load. Deployment will happen on real hw.

> > > By the look of the 4 minutes delay and logs from the other node it could be
> > > a network / firewall problem.
> >
> > well - iptables is disabled and the nodes are direct attach to a /24 network.
>
> Are you talking about the hosts or the guests? Check the iptables on both.
> Some virt implementations automatically add NAT rules that are undesired.

It's bridged - without filter rules.

Since I added <cman two_node="1" expected_votes="1">, I do not get the cluster running. Every node is shooting the other node in .... I appreciate any comments.

(In reply to comment #13)
> i added <cman two_node="1" expected_votes="1"> and after rebooting node2 i
> started cman manually and waited until node2 had joined the cluster (node1)
> This happens after ~170 seconds and immediately node1 began to fence node2.

This is a clear sign that something is wrong with the network communication between the nodes.

Also you need to fix fencing and add a delay to one of the nodes to avoid race conditions. There are documents in the RH kbase for it, or contact our customer support.
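A minimal sketch of one way to add such a delay on RHEL 6, assuming the cman init script picks up FENCED_OPTS from /etc/sysconfig/cman (the values match what the reporter ends up using further down in this bug; they are examples, not a recommendation):

# /etc/sysconfig/cman on node 1; node 2 gets a smaller -f value so the two
# nodes do not try to fence each other at the same moment after a split
# -f = post_fail_delay (seconds), -j = post_join_delay (seconds)
FENCED_OPTS="-f 9 -j 30"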
(In reply to comment #14)
> This is a clear sign that something is wrong with the network communication between the nodes.

Mmh - I am sure that this is not the case. Rebooting _both_ nodes at the same time (xen starts the vms sequentially), the cluster is formed! While the last node is coming up:

$ ping cn2
PING cn2.localdomain (192.168.201.21) 56(84) bytes of data.
64 bytes from cn2.localdomain (192.168.201.21): icmp_seq=1 ttl=64 time=0.140 ms
64 bytes from cn2.localdomain (192.168.201.21): icmp_seq=2 ttl=64 time=0.150 ms
64 bytes from cn2.localdomain (192.168.201.21): icmp_seq=3 ttl=64 time=0.846 ms
64 bytes from cn2.localdomain (192.168.201.21): icmp_seq=4 ttl=64 time=0.198 ms
64 bytes from cn2.localdomain (192.168.201.21): icmp_seq=5 ttl=64 time=0.182 ms

[root@cn2 ~]# corosync-objctl | grep members
runtime.totem.pg.mrp.srp.members.1.ip=r(0) ip(192.168.201.20)
runtime.totem.pg.mrp.srp.members.1.join_count=1
runtime.totem.pg.mrp.srp.members.1.status=joined
runtime.totem.pg.mrp.srp.members.2.ip=r(0) ip(192.168.201.21)
runtime.totem.pg.mrp.srp.members.2.join_count=1
runtime.totem.pg.mrp.srp.members.2.status=joined

[root@cn2 ~]# uptime
00:09:17 up 1 min, 1 user, load average: 0.44, 0.16, 0.05

[root@cn2 ~]# cman_tool nodes
Node  Sts   Inc   Joined               Name
   1   M    600   2013-03-08 00:08:30  cn1.localdomain
   2   M    600   2013-03-08 00:08:30  cn2.localdomain

> Also you need to fix fencing and add a delay to one of the nodes to avoid race
> conditions. There are documents in the RH kbase for it, or contact our customer support.

I will look at this. Thanks.

Have you been able to verify your environment? Are you still experiencing the issue?

Hi Fabio, sorry for the delay. I switched over to native hardware and did a fresh setup of a two-node cluster (freshly updated EL6). So far the nodes reconnect to the running cluster after rebooting them. I have tested different reboot scenarios. It looks good so far.

Just for the record:

N1: FENCED_OPTS="-f 9 -j 30"
N2: FENCED_OPTS="-f 3 -j 30"

<?xml version="1.0"?>
<cluster name="foo" config_version="4">
  <cman expected_votes="1" two_node="1"/>
  <logging debug="off"/>
  <totem consensus="6000" secauth="0"/>
  <clusternodes>
    <clusternode name="bar1.foo" votes="1" nodeid="1">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="bar1.foo"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="bar2.foo" votes="1" nodeid="2">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="bar2.foo"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="pcmk" agent="fence_pcmk"/>
  </fencedevices>
  <rm>
    <failoverdomains/>
    <resources/>
  </rm>
</cluster>

The only thing that confuses me: crm_mon shows unknown expected votes.

$ crm_mon -1 | head -6
Last updated: Sat Jun 15 12:44:21 2013
Last change: Sat Jun 15 12:35:49 2013 via crm_resource on bar1.foo
Stack: cman
Current DC: bar1.foo - partition with quorum
Version: 1.1.8-7.el6-394e906
2 Nodes configured, unknown expected votes
                    ^^^^^^^^^^^^^^^^^^^^^^

Anyway - thank you for your time!

After reading [1] I took a look at my test cluster (virtual nodes) again and reconfigured the xen host (dom0):

echo 1 > /sys/class/net/xenbr0/bridge/multicast_querier

This helped to avoid the "Node fails to rejoin the cluster after restart" problem. I couldn't reproduce the problem anymore.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=880035#c9

PS: Xen host (dom0) runs EL5 - the nodes EL6
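Note that multicast_querier is a runtime sysfs setting, so it does not survive a reboot of the dom0 or a re-creation of the bridge; a minimal sketch of making it persistent, assuming an /etc/rc.local based approach on this particular EL5 dom0 and the xenbr0 bridge name from the comment above:

# /etc/rc.local on the Xen dom0: re-enable the IGMP querier on the cluster
# bridge at every boot so the guests' multicast group memberships keep being refreshed
echo 1 > /sys/class/net/xenbr0/bridge/multicast_querier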
(In reply to Leon Fauster from comment #20)
> after reading [1] i took a look at my
> test cluster (virtual nodes) again and
> reconfigured the xen host (dom0)
>
> echo 1 > /sys/class/net/xenbr0/bridge/multicast_querier
>
> this helped to avoid the "Node fails to rejoin the
> cluster after restart"-problem. I could't reproduce
> the problem anymore.
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=880035#c9
>
> PS: Xen host (dom0) runs EL5 - the nodes EL6

Perfect. Marking as clone of bug 902454.

*** This bug has been marked as a duplicate of bug 902454 ***
Created attachment 705895 [details]
rebooted node1 / not joined / resources started while also running on node2

----------------------------------------------------------------------------
Description of problem:
----------------------------------------------------------------------------
The cman_pre_stop function of pacemaker's init script sets the node to standby while rebooting. This should only persist until the node comes back. That is not the case in some (unknown) circumstances.

cman_pre_stop()
{
    cname=`crm_node --name`
    crm_attribute -N $cname -n standby -v true -l reboot
    echo -n "Waiting for shutdown of managed resources"
    ...

----------------------------------------------------------------------------
Version-Release number of selected component (if applicable):
----------------------------------------------------------------------------
pacemaker-1.1.8-7.el6.x86_64
cman-3.0.12.1-49.el6.x86_64
corosync-1.4.1-15.el6.x86_64

----------------------------------------------------------------------------
How reproducible:
----------------------------------------------------------------------------
Steps to Reproduce:
1. Boot the two configured nodes.
2. DC and resources are on node 1.
3. Reboot node 1.
4. Resources will migrate to node 2.
5. The DC role will be taken over by node 2.
6. Node 1 will be marked as standby.

----------------------------------------------------------------------------
Actual results:
----------------------------------------------------------------------------
After the reboot of node 1, that node does not join the cluster immediately and begins to start resources. (At this stage the attached report "standby_attr" was generated.)

After some minutes the cluster detects this unwanted situation (resources running on both nodes) and renegotiates the current state. After recovering, node 1 is still listed as standby. (At this stage the attached report "standby_attr_after_join" was generated.)

----------------------------------------------------------------------------
Expected results:
----------------------------------------------------------------------------
After the reboot the node should join the cluster and the standby attribute should be deleted, because its lifetime is "reboot".

----------------------------------------------------------------------------
Additional info:
----------------------------------------------------------------------------
The process to reproduce this situation is not clear. In some cases the rebooted node behaves as expected (joining the cluster and deleting the standby attribute).
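To see whether the transient standby attribute is still around after the node has rejoined, and to clear it by hand, crm_attribute can be pointed at the same attribute the init script sets; a sketch, assuming cn1.localdomain is the affected node:

# query the standby attribute with lifetime "reboot" (the one cman_pre_stop sets)
$ crm_attribute -N cn1.localdomain -n standby -l reboot --query

# remove it manually if it survived the rejoin
$ crm_attribute -N cn1.localdomain -n standby -l reboot --delete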